#AI
3 posts filed under this tag.
Preregistering Experiment #1: How Fragile Are Production JSON Extraction Prompts?
The first concrete experiment in the precision research line, committed in public before the data is collected. Five perturbation classes, fifteen variants, four metrics, four falsifiable hypotheses, and a runnable companion repo with tests.
The Deterministic Backbone: Why Production AI Systems Are Moving Away From Fully Autonomous Agents
Fully autonomous agents are hard to bound, hard to test, and expensive to operate. A deterministic backbone with narrow agent steps gives you the control flow back while keeping the intelligence where it matters. Here is how to design, test, and migrate toward it.
Memory Evaluation: Measuring How AI Memory Decays Over a Project's Lifetime
Most AI memory benchmarks grade on recall and stop there. That hides the real failure mode: stale facts quietly poisoning the context window. Here is a lifecycle-based evaluation framework that tests recall, revision, and controlled forgetting across the change points every long-lived project goes through.