Datasets
| Dataset | Tasks |
|---|---|
A Terminal-Bench 2.1 subset of coding, software engineering, debugging, and game-like reasoning tasks, curated for the Novita Sandbox Hackathon.
harbor run -d NovitaAI/tb21-code-debug
Coding, software engineering, debugging, build repair, language/runtime work, and game-like reasoning tasks from Terminal-Bench 2.1.
This public Harbor dataset is curated by NovitaAI for a Novita Sandbox hackathon track. It is a category-based subset of Terminal-Bench 2.1. Agents and models are not fixed; the only required runtime environment for the hackathon is Novita Sandbox.
Dataset: NovitaAI/tb21-code-debug
Environment: -e novita
Run the full track once:
harbor run \
-d NovitaAI/tb21-code-debug \
-a <agent> \
-m <model> \
-e novita \
-k 1 \
-n 1 \
-y
Run a small smoke test from the track:
harbor run \
-d NovitaAI/tb21-code-debug \
-a <agent> \
-m <model> \
-e novita \
-l 1 \
-k 1 \
-n 1 \
-y
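The full run and the smoke test differ only in the `-l 1` flag, so both can be generated from one helper. The following is a minimal sketch, not part of Harbor: `build_harbor_cmd` is a hypothetical function name, and `my-agent` and `my-model` are placeholder values standing in for `<agent>` and `<model>`. It only prints the command and does not execute it.

```shell
# Hypothetical helper: print the harbor command for this track.
# $1 = agent, $2 = model, $3 = optional value for -l (smoke test only).
build_harbor_cmd() {
  agent="$1"
  model="$2"
  limit="${3:-}"
  cmd="harbor run -d NovitaAI/tb21-code-debug -a $agent -m $model -e novita"
  # Add -l only when a value was given, mirroring the smoke-test command.
  if [ -n "$limit" ]; then
    cmd="$cmd -l $limit"
  fi
  printf '%s\n' "$cmd -k 1 -n 1 -y"
}

# Placeholder agent and model names, for illustration only.
build_harbor_cmd my-agent my-model 1
```

Calling the helper without a third argument prints the full-track command. Printing rather than executing lets you review the flags before spending sandbox time.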
Upload a public result for the hackathon leaderboard:
harbor upload jobs/<job_name> --public
Submit the resulting Harbor Hub job link to the hackathon leaderboard form.
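A mistyped job name is easy to catch before uploading. The sketch below assumes that job output lives in a `jobs/` directory under the current working directory, as the `jobs/<job_name>` path above suggests. `upload_cmd` is a hypothetical helper name, and it only prints the upload command.

```shell
# Hypothetical helper: print the upload command for an existing job directory.
# Assumes jobs are stored under ./jobs/<job_name>.
upload_cmd() {
  job_dir="jobs/$1"
  # Refuse names that do not correspond to a directory on disk.
  if [ ! -d "$job_dir" ]; then
    echo "no such job directory: $job_dir" >&2
    return 1
  fi
  printf 'harbor upload %s --public\n' "$job_dir"
}
```

For example, `upload_cmd <job_name>` prints the command when the directory exists and returns a non-zero status otherwise.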
A valid track submission should satisfy:
- Dataset: NovitaAI/tb21-code-debug.
- environment.type = "novita".
Suggested ranking fields:
Tasks:
- terminal-bench/build-cython-ext
- terminal-bench/build-pmars
- terminal-bench/build-pov-ray
- terminal-bench/cancel-async-tasks
- terminal-bench/chess-best-move
- terminal-bench/circuit-fibsqrt
- terminal-bench/cobol-modernization
- terminal-bench/code-from-image
- terminal-bench/custom-memory-heap-crash
- terminal-bench/fix-git
- terminal-bench/fix-ocaml-gc
- terminal-bench/git-leak-recovery
- terminal-bench/gpt2-codegolf
- terminal-bench/headless-terminal
- terminal-bench/kv-store-grpc
- terminal-bench/make-doom-for-mips
- terminal-bench/make-mips-interpreter
- terminal-bench/merge-diff-arc-agi-task
- terminal-bench/overfull-hbox
- terminal-bench/path-tracing
- terminal-bench/path-tracing-reverse
- terminal-bench/polyglot-c-py
- terminal-bench/polyglot-rust-c
- terminal-bench/prove-plus-comm
- terminal-bench/pypi-server
- terminal-bench/regex-chess
- terminal-bench/schemelike-metacircular-eval
- terminal-bench/sqlite-db-truncate
- terminal-bench/torch-pipeline-parallelism
- terminal-bench/torch-tensor-parallelism
- terminal-bench/winning-avg-corewars
- terminal-bench/write-compressor
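The two validity conditions stated for a track submission (the dataset name and `environment.type`) can be checked locally before submitting. This is a minimal sketch: `is_valid_submission` is a hypothetical function, and how the two values are read from a job is left to the caller, since the job file layout is not described here.

```shell
# Hypothetical check: succeed only when both stated conditions hold.
# $1 = dataset name used for the run, $2 = value of environment.type.
is_valid_submission() {
  dataset="$1"
  env_type="$2"
  [ "$dataset" = "NovitaAI/tb21-code-debug" ] && [ "$env_type" = "novita" ]
}

if is_valid_submission "NovitaAI/tb21-code-debug" "novita"; then
  echo "submission looks valid"
fi
```

The function returns a non-zero status when either value differs, so it can gate an upload step in a script.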