minnesotanlp/aar
The Amazing Agent Race (AAR): 1,400 multi-step scavenger-hunt puzzles for evaluating LLM agents on tool use, web navigation, and arithmetic reasoning. Includes 800 linear and 600 DAG variants across 4 difficulty levels.
harbor run -d minnesotanlp/aar