Optimizers and Improvement

Agent behavior can improve over time without rewriting product logic. The idea: collect (input, output, outcome) triples from real runs, then run optimizers that search for better prompts, instructions, or routing. The contract (the signature) stays the same; the policy that fulfills it gets better.
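
A minimal sketch of that split, assuming a Python codebase. The names here (Signature, Policy, SUMMARIZE, policy_v1/v2) are illustrative, not an API from this project:

    from dataclasses import dataclass
    from typing import Callable

    @dataclass(frozen=True)
    class Signature:
        # The stable contract: named inputs and outputs that do not change.
        name: str
        inputs: tuple[str, ...]
        outputs: tuple[str, ...]

    # A policy is the part an optimizer may rewrite: prompt, routing, etc.
    Policy = Callable[[dict], dict]

    SUMMARIZE = Signature("summarize", inputs=("document",), outputs=("summary",))

    def policy_v1(example: dict) -> dict:
        # v1: hand-written behavior (stands in for a prompted model call)
        return {"summary": example["document"][:100]}

    def policy_v2(example: dict) -> dict:
        # v2: optimizer-produced behavior; same contract, different policy
        return {"summary": example["document"][:200].rsplit(".", 1)[0] + "."}

Both policies satisfy SUMMARIZE, so callers never notice when one replaces the other.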

Why optimization matters for agents

  • Data-driven — Improvement is driven by real outcomes (did the test pass? did the user accept?), not by hand-tuned prompts.
  • Auditable — Every improved policy is versioned. You can roll back, A/B test, or promote based on scorecards.
  • Reversible — If a new policy is worse, you revert. No “ship and hope.”

Conceptual flow

  1. Capture — Runs produce examples: (input to the signature, output from the model, outcome in the world). The outcome is the ground-truth signal (a verification result, a user action, etc.).
  2. Label — Examples are labeled (success, failure, partial) and metrics (e.g. verification rate, cost, latency) are computed.
  3. Optimize — An optimizer searches over prompts, instructions, or routing to maximize those metrics and outputs a new policy (e.g. a new prompt or routing table); steps 1–3 are sketched after this list.
  4. Compile and deploy — The new policy is compiled into a manifest. You can run it in shadow mode, A/B test it, or promote it when scorecards improve.
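
A hedged sketch of steps 1–3. The Example record shape, the label rules, and the optimize/replay helpers are assumptions for illustration, not a specific library's API:

    from dataclasses import dataclass

    @dataclass
    class Example:
        inputs: dict    # what the signature received
        output: dict    # what the model produced
        outcome: dict   # ground truth from the world, e.g. {"verified": True}

    def label(ex: Example) -> str:
        # Step 2: collapse raw outcomes into labels the metric can score.
        if ex.outcome.get("verified"):
            return "success"
        return "partial" if ex.outcome.get("user_edited") else "failure"

    def metric(examples: list) -> float:
        # e.g. verification rate; cost and latency could be added as penalties
        return sum(label(e) == "success" for e in examples) / max(len(examples), 1)

    def optimize(candidate_prompts: list, replay) -> str:
        # Step 3 in its simplest form: re-run captured inputs under each
        # candidate prompt and keep the best scorer. `replay` (prompt -> list
        # of Examples) stands in for whatever eval harness you already have.
        return max(candidate_prompts, key=lambda p: metric(replay(p)))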

So: same signatures, better policies. The compiler layer (see Compiler Layer) is what makes this possible; optimizers are the engine that turns outcome data into those better policies.
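
For step 4, shadow mode can be as simple as running the candidate beside the live policy on real traffic. A hypothetical harness (serve and log are made-up names):

    def serve(example, live_policy, candidate_policy, log):
        # Shadow mode: the candidate runs on real inputs but its output is
        # only logged for scoring; the caller always gets the live answer.
        live_out = live_policy(example)
        try:
            candidate_out = candidate_policy(example)
            log({"input": example, "live": live_out, "candidate": candidate_out})
        except Exception as err:
            # A failing candidate must never take down live traffic.
            log({"input": example, "candidate_error": repr(err)})
        return live_out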

What optimizers need

  • Typed contracts — Signatures with clear inputs and outputs, so the optimizer knows what to change (prompt, routing) and what to measure (outcome).
  • Outcome labels — Verification results, user feedback, cost, and latency, so the optimizer knows what “better” means.
  • Versioning and rollback — Manifests and scorecards, so every change is traceable and reversible (see the sketch after this list).
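
A sketch of scorecard-gated versioning, assuming a manifest carries its own scores. Manifest, Registry, verify_rate, and the 0.02 margin are all hypothetical choices:

    from dataclasses import dataclass, field

    @dataclass
    class Manifest:
        version: str            # e.g. a content hash of the compiled policy
        prompt: str
        scorecard: dict = field(default_factory=dict)  # {"verify_rate": 0.91, ...}

    class Registry:
        def __init__(self):
            self.history = []   # every deployed Manifest, oldest first

        def promote(self, candidate: Manifest, min_gain: float = 0.02) -> bool:
            # Gate promotion on the scorecard: the candidate must beat the
            # currently deployed manifest by a margin, or it is rejected.
            if self.history:
                current = self.history[-1].scorecard.get("verify_rate", 0.0)
                if candidate.scorecard.get("verify_rate", 0.0) < current + min_gain:
                    return False
            self.history.append(candidate)
            return True

        def rollback(self) -> Manifest:
            # Reversible by construction: drop the latest version and
            # redeploy its predecessor.
            if len(self.history) > 1:
                self.history.pop()
            return self.history[-1]

Keeping the full history (rather than overwriting in place) is what makes “no ship and hope” cheap: reverting is a pop, not a redeploy from scratch.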

Go deeper