Datasets

Dataset                                 Tasks
livecodebench/livecodebench             100
terminal-bench-pro/terminal-bench-pro   200
openai/swe-lancer-diamond-all           463
sldbench/sldbench                       8
openai/simpleqa                         4,326
harbor/hello-world                      1
abundant/swe-gen-js                     1,000
gpqa-diamond/gpqa-diamond               198
openai/swe-lancer-diamond-manager       265
openai/swe-lancer-diamond-ic            198
codepde/codepde                         5
kumo/kumo-hard                          250
swe-bench/swe-bench-verified            500
swe-bench/swe-smith                     100
abundant/swe-gen-cpp                    999
abundant/swe-gen-go                     1,000
swt-bench/swt-bench-verified            433
algotune/algotune                       154
abundant/swe-gen-java                   1,000
abundant/swe-gen-rust                   1,000
qcircuitbench/qcircuitbench             28
quixbugs/quixbugs                       80
deveval/deveval                         63
termigen/termigen-environments          3,566
vmax/vmax-tasks                         1,043
satbench/satbench                       2,100
scale-ai/swe-bench-pro                  731
featurebench/featurebench               200
usaco/usaco                             304
lica-world/gdb                          33,786