Datasets
Terminal-Bench 2.1 subset covering systems, operations, security, cryptanalysis, and assistant-style operational tasks, curated for the Novita Sandbox Hackathon.
harbor run -d NovitaAI/tb21-systems-security
Systems administration, web/server operations, security, cryptanalysis, vulnerability repair, and operational assistant tasks from Terminal-Bench 2.1.
This public Harbor dataset is curated by NovitaAI for a Novita Sandbox hackathon track. It is a category-based subset of Terminal-Bench 2.1. Agents and models are not fixed; the only required runtime environment for the hackathon is Novita Sandbox.
Dataset: NovitaAI/tb21-systems-security
Required environment: -e novita
Run the full track once:
harbor run \
-d NovitaAI/tb21-systems-security \
-a <agent> \
-m <model> \
-e novita \
-k 1 \
-n 1 \
-y
Run a small smoke test from the track:
harbor run \
-d NovitaAI/tb21-systems-security \
-a <agent> \
-m <model> \
-e novita \
-l 1 \
-k 1 \
-n 1 \
-y
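The two commands above differ only in the `-l 1` option, so a small helper can build either one from the same template. The sketch below is a hypothetical convenience wrapper, not part of Harbor: the function name `build_track_cmd` and the placeholder values `my-agent` and `my-model` are assumptions for illustration. It only prints the command, so it can be tried without Harbor installed.

```shell
# Dataset name taken from the track description above.
DATASET="NovitaAI/tb21-systems-security"

# build_track_cmd AGENT MODEL [LIMIT]
# Hypothetical helper: prints the harbor command for this track.
# When LIMIT is given, "-l LIMIT" is added, as in the smoke test above.
build_track_cmd() {
  agent="$1"
  model="$2"
  limit="${3:-}"
  cmd="harbor run -d $DATASET -a $agent -m $model -e novita"
  if [ -n "$limit" ]; then
    cmd="$cmd -l $limit"
  fi
  # The remaining options are copied unchanged from the commands above.
  echo "$cmd -k 1 -n 1 -y"
}

# Dry run: print the smoke-test command, then the full-track command.
# "my-agent" and "my-model" are placeholders for <agent> and <model>.
build_track_cmd my-agent my-model 1
build_track_cmd my-agent my-model
```

To execute a printed command instead of only displaying it, copy it into the terminal after replacing the placeholders with a real agent and model.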
Upload a public result for the hackathon leaderboard:
harbor upload jobs/<job_name> --public
Submit the resulting Harbor Hub job link to the hackathon leaderboard form.
A valid track submission should satisfy:
Dataset: NovitaAI/tb21-systems-security.
environment.type = "novita".
Suggested ranking fields:
Tasks:
terminal-bench/break-filter-js-from-html
terminal-bench/compile-compcert
terminal-bench/configure-git-webserver
terminal-bench/constraints-scheduling
terminal-bench/crack-7z-hash
terminal-bench/feal-differential-cryptanalysis
terminal-bench/feal-linear-cryptanalysis
terminal-bench/filter-js-from-html
terminal-bench/fix-code-vulnerability
terminal-bench/git-multibranch
terminal-bench/install-windows-3.11
terminal-bench/largest-eigenval
terminal-bench/mailman
terminal-bench/model-extraction-relu-logits
terminal-bench/nginx-request-logging
terminal-bench/openssl-selfsigned-cert
terminal-bench/password-recovery
terminal-bench/qemu-alpine-ssh
terminal-bench/qemu-startup
terminal-bench/sanitize-git-repo
terminal-bench/sqlite-with-gcov
terminal-bench/vulnerable-secret
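The two validity conditions for a track submission (the dataset name and `environment.type = "novita"`) can be checked before uploading. This is a minimal sketch with a hypothetical function name, `check_submission`. It takes both values as arguments because the on-disk layout of a Harbor job is not described here, so how you read them from your job is left open.

```shell
# check_submission DATASET ENV_TYPE
# Hypothetical pre-upload check: prints "valid" and returns 0 only when
# both values match the track requirements stated above.
check_submission() {
  dataset="$1"
  env_type="$2"
  status=0
  if [ "$dataset" != "NovitaAI/tb21-systems-security" ]; then
    echo "invalid: dataset is '$dataset'"
    status=1
  fi
  if [ "$env_type" != "novita" ]; then
    echo "invalid: environment.type is '$env_type'"
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "valid"
  fi
  return "$status"
}

# Example with the values required by the track.
check_submission "NovitaAI/tb21-systems-security" "novita"
```

A non-zero return status makes the function usable as a guard, for example `check_submission "$d" "$e" && harbor upload jobs/<job_name> --public`.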