Datasets
Evaluation dataset for agent-search task.

`harbor run -d ivanleo/agent-search`

This repository contains the benchmark code, dataset, and results comparing a raw filesystem search setup (Grep) against a database-centric setup (SQLite FTS) across the Google Gemini API & Managed Agents documentation (~300k tokens of text over 94 documents).

We evaluated the performance of an agent (built with the Antigravity SDK) answering 10 complex, multi-hop retrieval questions under two architectures:
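
1. Grep baseline: the agent searches the raw documentation files on the filesystem and reads them with `view_file`.
2. SQLite FTS: the agent queries a SQLite database (`./docs/docs.db`) containing two tables:
   - `docs`: stores the full page text of each document.
   - `doc_search`: a virtual FTS5 table containing chunked headings and content snippets.

   The agent runs its queries against the database with `run_command`.

| Metric | Grep Baseline Run | SQLite FTS Run (Resumed) |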
|---|---|---|
| Mean Reward | 0.800 (8/10 passed) | 0.700 (7/10 passed) |
| Total Input Tokens | 3.67M | 5.86M |
| Total Cost (USD) | $2.45 | $3.48 |
| Harbor Hub Job Link | Job 18c065f1 (Grep) | Job bc96451b (FTS) |
| Harbor Hub Dataset | ivanleo/agent-search | ivanleo/agent-search |
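
For readers unfamiliar with the two-table layout used in the SQLite FTS setup, here is a minimal sketch of how a `docs` table and a `doc_search` FTS5 table could be built and queried with Python's built-in `sqlite3` module. The column names (`path`, `body`, `heading`, `content`), the sample pages, and the one-chunk-per-page loading are assumptions for illustration only; the actual schema and chunking in `./docs/docs.db` may differ. It also assumes the local SQLite build includes FTS5.

```python
import sqlite3

# Hypothetical sample pages standing in for the real documentation corpus.
PAGES = [
    ("docs/function-calling.md", "Function calling",
     "Function calling lets a model request tool invocations."),
    ("docs/caching.md", "Context caching",
     "Context caching reduces cost for repeated long prompts."),
]


def build_index(conn, pages):
    """Create both tables and load one search chunk per page (assumed chunking)."""
    # Full page text, one row per document.
    conn.execute("CREATE TABLE docs (path TEXT PRIMARY KEY, body TEXT)")
    # Full-text index; `path` is stored but not tokenized.
    conn.execute(
        "CREATE VIRTUAL TABLE doc_search "
        "USING fts5(path UNINDEXED, heading, content)"
    )
    for path, heading, body in pages:
        conn.execute("INSERT INTO docs (path, body) VALUES (?, ?)", (path, body))
        conn.execute(
            "INSERT INTO doc_search (path, heading, content) VALUES (?, ?, ?)",
            (path, heading, body),
        )
    conn.commit()


def search(conn, query, limit=5):
    """Return (path, heading) pairs for matching chunks, best match first."""
    rows = conn.execute(
        "SELECT path, heading FROM doc_search "
        "WHERE doc_search MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
build_index(conn, PAGES)
hits = search(conn, "caching")
```

In a layout like this, an agent would typically query `doc_search` first to find candidate pages, then read the full text from `docs` by `path` only for the pages that look relevant.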