skillgrade

Sat, 14 Mar 2026 00:00:00 +0000

A few weeks ago I wrote about Skill Eval, a framework for testing AI agent skills. The idea resonated — skills are becoming a critical part of how teams work with agents, and without a way to measure whether they work, you’re guessing.

The problem was that Skill Eval required too much setup. You had to clone a repo, understand a specific directory structure, write TypeScript config, and wire everything together before you could run your first eval. The barrier to entry was high for something that should be simple.

Unit Tests for AI Agent Skills

Thu, 26 Feb 2026 00:00:00 +0000

⭐ Find Skill Eval on GitHub

I’ve been working with AI coding agents daily: Antigravity, Gemini CLI, Claude Code, and others. One pattern I keep seeing is teams building skills for these agents: procedural instructions that teach the model how to use internal tools, follow specific workflows, or comply with team conventions.

The problem? There’s no way to know if they actually work. You write a text file, hand it to an agent, and hope for the best. When you tweak the instructions, you have no signal telling you whether that change made things better or worse. You’re flying blind.

Evals on Minko Gechev's blog