grafana/o11y-bench

o11y-bench: an open agentic observability benchmark. Measures how well AI agents perform 63 real-world observability tasks across logs, metrics, traces, dashboards, and incident workflows.

harbor run -d grafana/o11y-bench

o11y-bench is an open agentic observability benchmark from Grafana.

It measures how well AI agents perform 63 real-world observability tasks across logs, metrics, traces, dashboards, and incident workflows.

Source, documentation, and usage instructions are available in the public GitHub repository:

https://github.com/grafana/o11y-bench