NovitaAI/tb21-file-recovery

Terminal-Bench 2.1 subset for file operations, data manipulation, document processing, log analysis, and recovery tasks, curated for the Novita Sandbox Hackathon.

Published 6/4/2026 by Alex

harbor run -d NovitaAI/tb21-file-recovery

TB2.1 File & Recovery Track

File operations, data manipulation, document processing, log analysis, extraction, and recovery tasks from Terminal-Bench 2.1.

This public Harbor dataset is curated by NovitaAI for a Novita Sandbox Hackathon track and is a category-based subset of Terminal-Bench 2.1. Participants may use any agent and model; the only required runtime environment for the hackathon is Novita Sandbox.

Dataset

Harbor dataset: NovitaAI/tb21-file-recovery
Track size: 9 tasks
Source benchmark: Terminal-Bench 2.1
Included source categories: file-operations, data-processing
Required hackathon sandbox: -e novita

Quick Start

Run the full track once:

harbor run \
  -d NovitaAI/tb21-file-recovery \
  -a <agent> \
  -m <model> \
  -e novita \
  -k 1 \
  -n 1 \
  -y

Run a small smoke test from the track:

harbor run \
  -d NovitaAI/tb21-file-recovery \
  -a <agent> \
  -m <model> \
  -e novita \
  -l 1 \
  -k 1 \
  -n 1 \
  -y

Upload a public result for the hackathon leaderboard:

harbor upload jobs/<job_name> --public

Submit the resulting Harbor Hub job link to the hackathon leaderboard form.
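The full-track run and the upload step can be chained in a single shell function. This is a sketch, not part of Harbor: run_and_upload is a hypothetical helper, and it assumes Harbor writes each job into its own directory under jobs/ (as the jobs/<job_name> path in the upload command suggests), so the most recently modified directory there is the job that just finished. It uses only the flags shown in the commands above.

```shell
# Hypothetical helper: run the full track once, then publish the newest job.
run_and_upload() {
  local agent="$1" model="$2"
  local job_dir

  # Same flags as the full-track command above.
  harbor run \
    -d NovitaAI/tb21-file-recovery \
    -a "$agent" \
    -m "$model" \
    -e novita \
    -k 1 \
    -n 1 \
    -y || return 1

  # Assumption: each job gets its own directory under ./jobs, so the most
  # recently modified one is the job that just ran.
  job_dir="$(ls -td jobs/*/ 2>/dev/null | head -n 1)"
  job_dir="${job_dir%/}"
  [ -n "$job_dir" ] || { echo "no job directory found under jobs/" >&2; return 1; }

  # Make the job public so its Harbor Hub link can be submitted.
  harbor upload "$job_dir" --public
}
```

Call it as run_and_upload <agent> <model>. The resulting Harbor Hub job link still has to be submitted to the leaderboard form by hand.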

Valid Submission Rules

A valid track submission should satisfy:

The Harbor job uses this dataset: NovitaAI/tb21-file-recovery.
The job config has environment.type = "novita".
The job does not use extra hints or task-specific instructions.
The submitted Harbor Hub job is public.
The agent and model are the participant's choice unless a specific event round says otherwise.

Suggested ranking fields:

Primary: mean reward
Tie-breaker 1: fewer exceptions/errors
Tie-breaker 2: lower average duration
Tie-breaker 3: fewer output tokens or total tokens, if the event awards an efficiency prize
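The suggested ranking fields amount to a lexicographic sort: compare mean reward first and fall through to each tie-breaker in order. A minimal shell sketch, assuming leaderboard rows have been exported to comma-separated lines with the hypothetical column layout given in the comments (this layout is an assumption for illustration, not a Harbor Hub export format):

```shell
# Hypothetical helper: order submissions by the suggested ranking fields.
# Assumed input (no header), one submission per line:
#   job_link,mean_reward,errors,avg_duration_sec,total_tokens
rank_submissions() {
  # LC_ALL=C keeps '.' as the decimal separator for numeric keys.
  #   -k2,2gr  primary: higher mean reward first
  #   -k3,3n   tie-breaker 1: fewer exceptions/errors
  #   -k4,4g   tie-breaker 2: lower average duration
  #   -k5,5n   tie-breaker 3 (optional efficiency prize): fewer tokens
  LC_ALL=C sort -t, -k2,2gr -k3,3n -k4,4g -k5,5n
}

# Made-up rows for illustration only.
# prints job-d, job-b, job-a, job-c (one per line)
printf '%s\n' \
  'job-a,0.78,2,310,120000' \
  'job-b,0.78,1,450,150000' \
  'job-c,0.67,0,200,90000' \
  'job-d,0.78,1,300,160000' |
  rank_submissions | cut -d, -f1
```

If an event does not award an efficiency prize, the last key can simply be dropped.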

Tasks

terminal-bench/db-wal-recovery
terminal-bench/extract-elf
terminal-bench/extract-moves-from-video
terminal-bench/financial-document-processor
terminal-bench/gcode-to-text
terminal-bench/large-scale-text-editing
terminal-bench/log-summary-date-ranges
terminal-bench/multi-source-data-merger
terminal-bench/regex-log
