Datasets
| Dataset | Tasks |
|---|---|
| codepde/codepde | 5 |
| cookbook/test | 1 |
| crustbench/crustbench | 100 |
| dbt-labs/ade-bench | 48 |
| deveval/deveval | 63 |
| evoeval/evoeval | 100 |
| featurebench/featurebench | 200 |
| featurebench/featurebench-lite | 30 |
| featurebench/featurebench-lite-modal | 30 |
| featurebench/featurebench-modal | 200 |
| futurehouse/bixbench | 205 |
| futurehouse/bixbench-cli | 205 |
| futurehouse/labbench | 181 |
| gaia/gaia | 165 |
| gnucleus-ai/cad-bench | 100 |
| gorilla/bfcl | 3,641 |
| gorilla/bfcl_parity | 123 |
| gpqa-diamond/gpqa-diamond | 198 |
| grafana/o11y-bench | 63 |
| harbor/hello-world | 1 |
| harbor/rewardhackbench | 846 |
| harveyai/lab | 1,251 |
| ineqmath/ineqmath | 100 |
| ivanleo/agent-search | 20 |
| kumo/kumo-1 | 5,300 |
| kumo/kumo-easy | 5,050 |
| kumo/kumo-hard | 250 |
| kumo/kumo-parity | 212 |
| lawbench/lawbench | 1,000 |
| lcb/longswebench-32k | 3 |
204 datasets