Skip to content

Eval runner

The server-side eval runner lets users dispatch evaluations from the dashboard without running a local daemon. The runner uses LLM credentials the admin configures here; users don’t need their own. This page is admin-only.

If no server runner is configured, dashboard evaluations fall back to the daemon runner, which uses the user’s own local Claude/Codex install. Either path produces the same evaluation output.

/admin/settings → Eval runner section.

The runner supports two LLM vendors, configured independently:

VendorBlockUse when
AnthropicEvals:ServerRunner:AnthropicYou want Claude judges (sonnet or opus). Recommended default.
OpenAIEvals:ServerRunner:OpenAIYou want GPT judges, or your team standardises on OpenAI for billing/governance reasons.

Configure either or both. Each block has its own credentials, so an admin can keep them on separate billing accounts.

Per vendor:

KeyRequiredDescription
ApiKeyyesThe vendor API key. Stored encrypted at rest. The block is treated as enabled only when an API key is set.
ModelyesThe judge model. For Anthropic, e.g. claude-sonnet-4-6 or claude-opus-4-7. For OpenAI, e.g. gpt-4o.

And one runner-wide setting:

KeyDefaultDescription
RunTimeoutSeconds300Per-evaluation timeout. Long sessions with --threshold raised may need a higher value.

In the dashboard’s eval dialog:

  • If exactly one vendor is configured, evals dispatch to it implicitly.
  • If both are configured, the dialog shows a vendor selector and the user picks. The EvalRunRequest.Vendor field is required in this case.
  • If neither is configured, the Run evaluation button only routes to the daemon runner; if the user has no daemon, the button is hidden.

There are two ways an evaluation can run, and either is fine:

  • Daemon runner — the user’s daemon connects to the tenant over SignalR and runs the judges using whichever Claude/Codex install the daemon has. No tenant-side credentials. Free if the user has Claude Pro or a Codex subscription.
  • Server runner — the tenant calls Anthropic/OpenAI directly using the credentials configured here. Centralised billing, no daemon needed, works for users without a local Claude/Codex install (read-only reviewers, mobile, etc.).

Mixing both is common: most evals run on daemons; the server runner exists for reviewers and for teams who want the consistency of a single billable account.