#RAG
2 posts filed under this tag.
Turning LLM Context Engineering Into an Evaluation Loop with DSPy
Notes from two weekends of digging into DSPy. I stopped treating prompts as the source of truth and started treating them as compiled output from a typed signature, a metric, and an optimizer. Here is the smallest end-to-end program I kept, how MIPROv2 actually searches, and where the approach breaks down in practice.
Memory Evaluation: Measuring How AI Memory Decays Over a Project's Lifetime
Most AI memory benchmarks grade on recall and stop there. That hides the real failure mode: stale facts quietly poisoning the context window. Here is a lifecycle-based evaluation framework that tests recall, revision, and controlled forgetting across the change points every long-lived project goes through.