#LLM

Every post I've written about LLMs: 3 articles across Engineering and AI. Most recent: “Turning LLM Context Engineering Into an Evaluation Loop with DSPy.”

Engineering 01

Turning LLM Context Engineering Into an Evaluation Loop with DSPy

Notes from two weekends of digging into DSPy. I stopped treating prompts as the source of truth and started treating them as compiled output from a typed signature, a metric, and an optimizer. Here is the smallest end-to-end program I kept, how MIPROv2 actually searches, and where the approach breaks down in practice.

May 3

AI 02

The Deterministic Backbone: Why Production AI Systems Are Moving Away From Fully Autonomous Agents

Fully autonomous agents are hard to bound, hard to test, and expensive to operate. A deterministic backbone with narrow agent steps gives you the control flow back while keeping the intelligence where it matters. Here is how to design, test, and migrate toward it.

Apr 19

AI 03

Memory Evaluation: Measuring How AI Memory Decays Over a Project's Lifetime

Most AI memory benchmarks grade on recall and stop there. That hides the real failure mode: stale facts quietly poisoning the context window. Here is a lifecycle-based evaluation framework that tests recall, revision, and controlled forgetting across the change points every long-lived project goes through.

Apr 17
