Datasets
kumo/kumo-easy
latest · rev. 1
5,050 tasks
reasoning-gym/reasoning-gym-easy
latest · rev. 1
288 tasks
livecodebench/livecodebench
latest · rev. 1
100 tasks
terminal-bench-pro/terminal-bench-pro
latest · rev. 1
200 tasks
openai/swe-lancer-diamond-all
latest · rev. 1
463 tasks
sldbench/sldbench
latest · rev. 1
8 tasks
openai/simpleqa
latest · rev. 1
4,326 tasks
harbor/hello-world
latest · rev. 1
1 task · Alex Shaw
abundant/swe-gen-js
latest · rev. 1
Dataset of 1000 JS/TS SWE tasks
1,000 tasks · rishi
gpqa-diamond/gpqa-diamond
latest · rev. 1
198 tasks
openai/swe-lancer-diamond-manager
latest · rev. 1
265 tasks
openai/swe-lancer-diamond-ic
latest · rev. 1
198 tasks
codepde/codepde
latest · rev. 1
5 tasks
kumo/kumo-hard
latest · rev. 1
250 tasks
swe-bench/swe-bench-verified
latest · rev. 1
500 tasks
swe-bench/swe-smith
latest · rev. 1
100 tasks
abundant/swe-gen-cpp
latest · rev. 1
Dataset of 1000 C++ SWE tasks
999 tasks · rishi
abundant/swe-gen-go
latest · rev. 1
Dataset of 1000 Go SWE tasks
1,000 tasks · rishi
swt-bench/swt-bench-verified
latest · rev. 1
433 tasks
algotune/algotune
latest · rev. 1
154 tasks
abundant/swe-gen-java
latest · rev. 1
Dataset of 1000 Java SWE tasks
1,000 tasks · rishi
abundant/swe-gen-rust
latest · rev. 1
Dataset of 1000 Rust SWE tasks
1,000 tasks · rishi
qcircuitbench/qcircuitbench
latest · rev. 1
28 tasks
quixbugs/quixbugs
latest · rev. 1
80 tasks
deveval/deveval
latest · rev. 1
63 tasks
termigen/termigen-environments
latest · rev. 1
3,566 tasks
vmax/vmax-tasks
latest · rev. 1
1,043 tasks
satbench/satbench
latest · rev. 1
2,100 tasks
scale-ai/swe-bench-pro
latest · rev. 2
731 tasks
featurebench/featurebench
latest · rev. 1
200 tasks
91 datasets