Show HN: AI-Evals.io – Evaluate this site with the tools it reviews

ai-evals.io

5 points by alexhans 2 months ago

I've been working on a site [1] to give people control of their LLM workflows through AI evals - automated checks that, once defined, let you move fast without regressions and cut through hype with proof.
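To make "automated checks" concrete, here's a minimal sketch of what an eval loop can look like; the function names and test cases are illustrative stand-ins, not the site's actual API, and `run_model` fakes the LLM call so the example runs on its own:

```python
# Minimal eval harness sketch. run_model is a stand-in for a real LLM call
# (an API or a local model); exact_match is the simplest possible grader.

def run_model(prompt: str) -> str:
    # Hypothetical stub so the example is self-contained.
    return "Paris" if "capital of France" in prompt else "I don't know"

def exact_match(output: str, expected: str) -> bool:
    # Case-insensitive exact match; real evals often use fuzzier graders.
    return output.strip().lower() == expected.strip().lower()

# Each case pins down behavior you don't want to regress.
CASES = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Atlantis?", "I don't know"),
]

def run_evals() -> float:
    passed = sum(exact_match(run_model(p), e) for p, e in CASES)
    return passed / len(CASES)

if __name__ == "__main__":
    print(f"pass rate: {run_evals():.0%}")
```

Once a suite like this is defined, you rerun it after every prompt or model change; a dropping pass rate is the regression signal, and the scores are the "proof" that cuts through hype.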

That one-liner is aimed at software engineers, but I've spent my career helping cross-functional teams collaborate, and that's really what this is about. AI agents put powerful workflows within reach, but only if teams can grow them incrementally without losing control - no vendor lock-in, no discipline silos, no blind trust in outputs.

The site tries to meet different audiences where they are, favoring practice over theory: tool comparisons, minimal approaches, and the freedom to work at whatever level of complexity serves you - whether that's Claude Code with Agent Skills, local models, or custom Python agents.

As a fun "eat your own dog food" experiment, I use the site itself as the reproducible cookbook ("eval-ception") [2]. It's the quickest way to feel what different eval tools are actually like in practice.

I welcome feedback, contributions, or stories; there's more on the project and what's coming at [3]. It's a rewarding area once you realize you can keep control and move methodically, whether you're running the smallest model or a swarm of agents.

[1] https://ai-evals.io/

[2] https://ai-evals.io/cookbook/eval-ception.html

[3] https://ai-evals.io/about/