LLMs on Minko Gechev's blog

skillgrade

Sat, 14 Mar 2026 00:00:00 +0000

A few weeks ago I wrote about Skill Eval, a framework for testing AI agent skills. The idea resonated — skills are becoming a critical part of how teams work with agents, and without a way to measure whether they work, you’re guessing.

The problem was that Skill Eval required too much setup. You had to clone a repo, understand a specific directory structure, write TypeScript config, and wire everything together before you could run your first eval. The barrier to entry was high for something that should be simple.

Unit Tests for AI Agent Skills

Thu, 26 Feb 2026 00:00:00 +0000

⭐ Find Skill Eval on GitHub

I’ve been working with AI coding agents daily (Antigravity, Gemini CLI, Claude Code, and others). One pattern I keep seeing is teams building skills for these agents: procedural instructions that teach the model how to use internal tools, follow specific workflows, or comply with team conventions.

The problem? There’s no way to know if they actually work. You write a text file, hand it to an agent, and hope for the best. When you tweak the instructions, you have no signal telling you whether that change made things better or worse. You’re flying blind.

You Should Care About AI

Fri, 07 Nov 2025 00:00:00 +0000

In conversations with developers lately, I’ve noticed a lot of discomfort when bringing up AI. From talking to people and some self-reflection, I think there are a couple of reasons for this:

Overinflation. AI is certainly overhyped by some. People have business interests in raising money or increasing the valuation of their companies, and they tend to exaggerate the capabilities of agents and language models. This makes it feel like we’re in yet another hype cycle that will pass, so why waste energy?
Fear. Developers see how AI automates some of the tasks they do. Combined with the overinflation, this often leads to fear that software engineers will become obsolete. We already see how new grads are struggling to find jobs.

Overinflation

I’ve been using AI daily in my work over the past couple of years: for coding, scaffolding documents, understanding stack traces, and so much more. It’s certainly not perfect and it is certainly overhyped by some. At the same time, there are thousands of use cases that are real and can improve your productivity.

Generative Development

Mon, 20 Oct 2025 00:00:00 +0000

Historically in my blog I’ve been posting 10-20 page deep-dive explorations in type theory, deep neural networks, predictive prefetching, etc. Recently, I’ve been thinking of taking a new approach with short snippets based on my current thinking about a particular topic.

Today, I’ll share high-level takes on developer workflows and how they can impact the tools we use.

LLMs enable velocity

We can certainly speed up our developer velocity using GenAI. Lately, I’ve been consistently using Gemini CLI and Cursor. Currently, I feel resistance when I have to write a piece of trivial logic, like creating a login form or a chat interface from scratch. Agentic tools have enabled me to quickly iterate and test hypotheses. In this phase of the development process I rarely need to look at source code, unless something has gone terribly wrong. My focus in such early stages of exploration is on the output of whatever I’m building. Whether I’m developing an agent, a web UI, or a compiler, I create multiple prototypes using agentic tools and focus entirely on the output of the prototypes without looking at the implementation. Based on the outputs, I decide which prototype I’d like to move forward with.

LLM-first Web Framework

Sat, 19 Apr 2025 00:00:00 +0000

The opinions stated here are my own, not necessarily those of my employer.

Over the past few weeks I’ve been thinking about how we can make a framework easier for AI to work with. In particular, I’ve focused only on building user interfaces. When we add a backend, a database, and a communication protocol between the backend and the frontend, we get another set of problems that could be a good fit for another post and exploration.