Drop the Right Requests First: Priority-Aware Load Shedding Under Overload
Static RPS caps shed the wrong traffic. Concurrency is what saturates a service, not request rate. From my notes after reading the InfoQ piece on overload protection, Uber's January writeup on Cinnamon, and Netflix's QCon SF talk on service-level prioritized load shedding, here is why latency is the right control signal, and how a small priority taxonomy plus an adaptive concurrency limit make sure the traffic that is cheapest to drop gets shed first.
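
To make the idea concrete before the details, here is a minimal sketch of how a priority taxonomy and a latency-driven concurrency limit could fit together. It is not Cinnamon's or Netflix's implementation. The tier names, the `TIER_SHARE` thresholds, the 100 ms latency target, and the AIMD (additive-increase, multiplicative-decrease) constants are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical three-level priority taxonomy (lower value = more important)."""
    CRITICAL = 0
    DEFAULT = 1
    BEST_EFFORT = 2


# Assumed fraction of the concurrency limit each tier is allowed to fill.
# Lower tiers hit their ceiling first, so they are the first to be shed.
TIER_SHARE = {Tier.CRITICAL: 1.0, Tier.DEFAULT: 0.8, Tier.BEST_EFFORT: 0.5}


@dataclass
class PriorityLimiter:
    limit: float = 20.0            # current concurrency limit, adapted at runtime
    min_limit: float = 4.0
    max_limit: float = 200.0
    target_latency_s: float = 0.1  # assumed latency objective
    backoff: float = 0.9           # multiplicative decrease factor
    inflight: int = 0              # requests currently being served

    def try_acquire(self, tier: Tier) -> bool:
        """Admit the request only if in-flight work is below this tier's ceiling."""
        if self.inflight >= self.limit * TIER_SHARE[tier]:
            return False  # shed: caller should fail fast (e.g. HTTP 503)
        self.inflight += 1
        return True

    def release(self, latency_s: float) -> None:
        """Record a finished request and adapt the limit from its latency."""
        self.inflight -= 1
        if latency_s > self.target_latency_s:
            # Latency above target: shrink the limit multiplicatively.
            self.limit = max(self.min_limit, self.limit * self.backoff)
        else:
            # Latency healthy: probe upward slowly (additive increase).
            self.limit = min(self.max_limit, self.limit + 1.0 / self.limit)
```

A handler would call `try_acquire(tier)` on arrival, reject immediately when it returns `False`, and call `release(latency)` when the request completes. Because the limit shrinks as latency rises, the lower tiers' ceilings drop below the current in-flight count first, which is what sheds best-effort traffic before critical traffic. The sketch is single-threaded; a real service would need a lock or atomic counters around `inflight` and `limit`.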
