Datasets
| Dataset | Tasks |
|---|---|
| scale-ai/swe-atlas-rf | 70 |
| scale-ai/swe-atlas-tw | 90 |
| scale-ai/swe-bench-pro | 731 |
| scienceagentbench/scienceagentbench | 102 |
| sierra-research/tau3-bench | 375 |
| sldbench/sldbench | 8 |
| stanford/medagentbench | 300 |
| strongreject/strongreject | 150 |
| swe-bench/swe-bench-verified | 500 |
| swe-bench/swe-smith | 100 |
| swt-bench/swt-bench-verified | 433 |
| tencent/autocodebench | 200 |
| termigen/termigen-environments | 3,566 |
| terminal-bench/terminal-bench-2-1 | 89 |
| terminal-bench-pro/terminal-bench-pro | 200 |
| theagentcompany/theagentcompany | 174 |
| thetalab/vector-edit-gym | 106 |
| usaco/usaco | 304 |
| vals/financeagent | 50 |
| vmax/vmax-tasks | 1,043 |
| webgen-bench/webgen-bench | 101 |
| xiaoboai/pawbench | 150 |
| xlang/ds-1000 | 1,000 |
| yanagiorigami/frontier-cs | 172 |
204 datasets