Datasets
| Dataset | Tasks |
|---|---:|
| livecodebench/livecodebench | 100 |
| terminal-bench-pro/terminal-bench-pro | 200 |
| openai/swe-lancer-diamond-all | 463 |
| sldbench/sldbench | 8 |
| openai/simpleqa | 4,326 |
| harbor/hello-world | 1 |
| abundant/swe-gen-js | 1,000 |
| gpqa-diamond/gpqa-diamond | 198 |
| openai/swe-lancer-diamond-manager | 265 |
| openai/swe-lancer-diamond-ic | 198 |
| codepde/codepde | 5 |
| kumo/kumo-hard | 250 |
| swe-bench/swe-bench-verified | 500 |
| swe-bench/swe-smith | 100 |
| abundant/swe-gen-cpp | 999 |
| abundant/swe-gen-go | 1,000 |
| swt-bench/swt-bench-verified | 433 |
| algotune/algotune | 154 |
| abundant/swe-gen-java | 1,000 |
| abundant/swe-gen-rust | 1,000 |
| qcircuitbench/qcircuitbench | 28 |
| quixbugs/quixbugs | 80 |
| deveval/deveval | 63 |
| termigen/termigen-environments | 3,566 |
| vmax/vmax-tasks | 1,043 |
| satbench/satbench | 2,100 |
| scale-ai/swe-bench-pro | 731 |
| featurebench/featurebench | 200 |
| usaco/usaco | 304 |
| lica-world/gdb | 33,786 |
Showing 30 of 193 datasets.