minnesotanlp/aar

The Amazing Agent Race (AAR): 1,400 multi-step scavenger-hunt puzzles for evaluating LLM agents on tool use, web navigation, and arithmetic reasoning. Includes 800 linear and 600 DAG variants across 4 difficulty levels.

harbor run -d minnesotanlp/aar
