The Circuit Breaker Pattern is a resilience design pattern used in distributed systems to prevent cascading failures by automatically stopping requests to a failing service and allowing it time to recover.
The pattern places a proxy between a caller and a remote service and monitors the success and failure rates of requests. When failures exceed a defined threshold, the circuit 'opens' and subsequent calls are rejected immediately without reaching the downstream service. Michael Nygard popularized it in his book 'Release It!', and it is now a cornerstone of fault-tolerant microservices architecture.
In distributed systems, a slow or unavailable dependency can exhaust thread pools, consume resources, and cause failures to cascade across the entire application. Without a circuit breaker, callers keep retrying a dead service, amplifying the problem. The pattern protects system stability by failing fast and shedding load before damage spreads.
A circuit breaker operates in three distinct states: Closed (normal operation, requests flow through and failures are counted), Open (the circuit has tripped, all requests fail immediately without calling the service), and Half-Open (a limited number of trial requests are allowed through to test whether the service has recovered). If trial requests succeed, the circuit resets to Closed; if they fail, it returns to Open.
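The three states and their transitions can be sketched as a small state machine. This is a minimal illustration, not any library's implementation: the names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `wait_seconds` are hypothetical, the breaker trips on consecutive failures rather than a failure rate, it allows a single trial call in Half-Open, and it is not thread-safe.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker: trips after N consecutive failures."""

    def __init__(self, failure_threshold=3, wait_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.wait_seconds = wait_seconds
        self.clock = clock  # injectable so tests need not sleep
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.wait_seconds:
                # Fail fast: the downstream service is never contacted.
                raise CircuitOpenError("circuit is open; call rejected")
            # Wait timeout elapsed: let one trial call through.
            self.state = State.HALF_OPEN
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success, including a Half-Open trial, resets the breaker.
        self.state = State.CLOSED
        self.failures = 0

    def _on_failure(self):
        self.failures += 1
        # A failed trial reopens at once; otherwise wait for the threshold.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

A caller would wrap each remote call, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a placeholder for the real client function, and treat `CircuitOpenError` as the signal to fail fast.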
The breaker tracks a rolling window of recent call outcomes, which can be count-based (the last N calls) or time-based (the calls made in the last N seconds). When the failure rate or slow-call rate breaches a configured threshold (e.g., 50% failures in the last 10 seconds), it transitions to Open. A configurable wait timeout determines how long the circuit stays Open before entering Half-Open to probe recovery. Libraries such as Resilience4j (Java) and Polly (.NET) implement this logic out of the box; Netflix's Hystrix does too, although it has been in maintenance mode since 2018.
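As a rough illustration of the count-based variant, the sketch below keeps the most recent outcomes in a fixed-size window and reports when the failure rate reaches the threshold. The class and parameter names (`CountBasedWindow`, `size`, `failure_rate_threshold`, `minimum_calls`) are hypothetical, and the minimum-calls guard is an added assumption so that one early failure does not register as a 100% failure rate.

```python
from collections import deque


class CountBasedWindow:
    """Hypothetical rolling window over the last `size` call outcomes."""

    def __init__(self, size=10, failure_rate_threshold=0.5, minimum_calls=5):
        # deque(maxlen=...) drops the oldest outcome automatically.
        self.outcomes = deque(maxlen=size)
        self.failure_rate_threshold = failure_rate_threshold
        self.minimum_calls = minimum_calls

    def record(self, failed):
        """Store one outcome: True for a failure, False for a success."""
        self.outcomes.append(bool(failed))

    def failure_rate(self):
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_open(self):
        # Too few samples to judge: keep the circuit closed.
        if len(self.outcomes) < self.minimum_calls:
            return False
        return self.failure_rate() >= self.failure_rate_threshold
```

A time-based window would instead store timestamped outcomes and discard those older than the window length before computing the rate.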
Opening the circuit only prevents further damage — callers still need a meaningful response. Always pair the circuit breaker with a fallback: return a cached result, a default value, or a graceful error message to the end user. Without a fallback, an open circuit simply trades a slow failure for a fast one, which may be equally harmful from a user-experience perspective.
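One way to pair a protected call with a fallback is to serve the last known good value, and a default when nothing is cached. The sketch below assumes the protected call raises an exception when the circuit is open or the service fails; `CircuitOpenError` and `CachedFallback` are hypothetical names, and catching every `Exception` is a simplification that a real service would likely narrow.

```python
class CircuitOpenError(Exception):
    """Hypothetical error a breaker raises when it rejects a call."""


class CachedFallback:
    """Serve the last good result when the protected call fails or is rejected."""

    def __init__(self, protected_call, default):
        self.protected_call = protected_call  # assumed to be wrapped by a breaker
        self.default = default
        self.cache = {}

    def get(self, key):
        try:
            value = self.protected_call(key)
        except Exception:
            # Covers both an open circuit and a genuine downstream error:
            # prefer a stale cached value, otherwise the default.
            return self.cache.get(key, self.default)
        self.cache[key] = value  # remember the latest good result
        return value
```

Whether stale data is acceptable depends on the feature: a product description can usually be served from cache, while an account balance probably should produce a clear error message instead.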
Tune thresholds carefully: overly sensitive settings cause the circuit to trip on transient blips (false positives), while loose settings let real failures linger too long. Expose circuit state via health endpoints and metrics dashboards so on-call engineers can observe trips in real time. Combine the pattern with retries using exponential backoff and jitter, but route every retry attempt through the circuit breaker rather than around it, so that each attempt counts toward the failure budget and retrying stops once the circuit opens.
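The retry guidance can be sketched as follows, assuming `guarded_call` is a function that already passes through a circuit breaker and raises a hypothetical `CircuitOpenError` when the circuit is open. The "full jitter" formula (a random delay between zero and the capped exponential bound) is one common choice, and the `base`, `cap`, and `max_attempts` values are illustrative rather than recommended.

```python
import random
import time


class CircuitOpenError(Exception):
    """Hypothetical error a breaker raises when it rejects a call."""


def backoff_delay(attempt, base=0.1, cap=5.0, rng=random.random):
    """Full jitter: rng() scaled by min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def retry_through_breaker(guarded_call, max_attempts=4,
                          sleep=time.sleep, rng=random.random):
    """Retry guarded_call; each attempt is assumed to go through the breaker."""
    for attempt in range(max_attempts):
        try:
            return guarded_call()
        except CircuitOpenError:
            # The breaker has tripped: more attempts would only add load.
            raise
        except Exception:
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: surface the last error
            sleep(backoff_delay(attempt, rng=rng))
```

Because each attempt goes through the breaker, a burst of failed retries pushes the failure count toward the threshold, and once the circuit opens the loop exits on the first rejected call instead of sleeping and trying again.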
© RM Full Stack & AI Engineer