Kapacitor/Evals

Stop teaching agents the same repo rules twice.

Kapacitor scores sessions against your rubric and turns durable lessons into repo guidance loaded before the next agent starts.

Without evals

The same mistake comes back next week.

+ jest.mock("./db", () => ({ ... }))   // session 87
+ jest.mock("./db", () => ({ ... }))   // session 134

Your team never mocks the database in this repo. Every new session has to relearn that rule.

Agent

I’ll mock the db module so the checkout tests run faster.

Reviewer

Please don’t mock the database in this repo. Use the integration test harness.

Without evals, lessons stay trapped in individual sessions.

With evals

The useful lesson becomes repo guidance.

Correctness: Did the fix address the real failure?

Test fidelity: Did tests match how this repo works?

Surface area: Did the agent change only what it needed?

Time on task: Did it repeat known dead ends?

The rule moves from a reviewer comment into the repo’s agent guidance. The next agent reads it before writing code.

Agent · next session

This repo’s guidance says not to mock ./db in checkout tests. I’ll use the integration harness instead.

Scored · promoted · loaded next run

With evals, the floor rises without asking every reviewer to repeat themselves.

How it works

Evaluations run once per session, score the session against your rubric questions, and emit judge facts. Repeated high-signal facts can be promoted into guidelines you can curate, which are loaded at SessionStart. Read the mechanics in Evaluations and in Embeddings and guidelines.
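To make the promotion step concrete, here is a minimal sketch of how repeated high-signal facts might be grouped into guideline candidates. Everything in it is an assumption for illustration: the `JudgeFact` and `GuidelineCandidate` shapes, the `promoteCandidates` function, the `signal` score, and both thresholds are hypothetical and are not Kapacitor's actual API or schema. The sample data reuses the `./db` mocking lesson from the example above.

```typescript
// Hypothetical shapes for illustration only; not Kapacitor's real schema.
interface JudgeFact {
  sessionId: string;
  question: string; // rubric question the fact was scored against
  lesson: string;   // normalized lesson text
  signal: number;   // assumed judge confidence in the range 0..1
}

interface GuidelineCandidate {
  lesson: string;
  question: string;
  sessions: string[]; // distinct sessions where the lesson showed up
}

// Keep lessons that appear with high signal in at least `minSessions`
// distinct sessions. Both thresholds are assumed values.
function promoteCandidates(
  facts: JudgeFact[],
  minSignal: number = 0.8,
  minSessions: number = 2,
): GuidelineCandidate[] {
  const byLesson = new Map<string, { question: string; sessions: Set<string> }>();

  for (const fact of facts) {
    if (fact.signal < minSignal) continue; // drop low-signal facts
    const entry =
      byLesson.get(fact.lesson) ??
      { question: fact.question, sessions: new Set<string>() };
    entry.sessions.add(fact.sessionId);
    byLesson.set(fact.lesson, entry);
  }

  const candidates: GuidelineCandidate[] = [];
  byLesson.forEach((entry, lesson) => {
    if (entry.sessions.size >= minSessions) {
      candidates.push({
        lesson,
        question: entry.question,
        sessions: Array.from(entry.sessions),
      });
    }
  });
  return candidates;
}

const dbLesson = "Do not mock ./db; use the integration test harness.";

const candidates = promoteCandidates([
  { sessionId: "session-87", question: "Test fidelity", lesson: dbLesson, signal: 0.92 },
  { sessionId: "session-134", question: "Test fidelity", lesson: dbLesson, signal: 0.88 },
  // High signal, but seen in only one session, so it is not promoted.
  { sessionId: "session-134", question: "Surface area", lesson: "Keep changes scoped to the failing module.", signal: 0.9 },
  // Low signal, so it is ignored entirely.
  { sessionId: "session-140", question: "Test fidelity", lesson: dbLesson, signal: 0.4 },
]);

console.log(candidates);
```

In this sketch a candidate is only a proposal. The curation step, where someone on the team accepts, edits, or rejects it before it becomes repo guidance, is deliberately left out.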

Improve your team's agents, run over run.

Kapacitor scores sessions and promotes what worked. Request access to the private preview to score your team's first 50 sessions.

Rather start a conversation? Talk to the team — we’re building with teams that already use coding agents.

Built by the team behind KurrentDB — event streams in production are what we do. Coding agents just produce a new kind.