The Saga Pattern is a microservices design pattern for managing long-running, distributed transactions without relying on a single atomic database commit. It breaks a transaction into a sequence of smaller local transactions, each confined to one service and coordinated through events or commands, so that the services eventually converge on a consistent state.
A Saga is a sequence of local transactions where each step updates a single service's database and then triggers the next step via an event or message. If any step fails, the saga executes compensating transactions to undo the work already completed. This approach replaces the traditional two-phase commit (2PC) protocol, which is impractical across independent microservices. The name derives from the academic paper 'Sagas' by Hector Garcia-Molina and Kenneth Salem (1987).
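The sequence-plus-compensation idea described above can be sketched in a few lines of Python. This is a minimal, in-process illustration rather than a production implementation: the `Step` class, the `run_saga` function, and the three order steps are hypothetical names chosen for the example, and the "services" are plain functions that append to a list instead of writing to real databases.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # forward local transaction
    compensate: Callable[[], None]  # semantic undo of the action


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
            completed.append(step)
        except Exception:
            # The failed step did not commit, so only earlier steps are undone.
            for done in reversed(completed):
                done.compensate()
            return False
    return True


log: List[str] = []


def fail_payment() -> None:
    # Simulated failure in the third local transaction.
    raise RuntimeError("card declined")


order_saga = [
    Step("create_order",
         lambda: log.append("order created"),
         lambda: log.append("order cancelled")),
    Step("reserve_inventory",
         lambda: log.append("inventory reserved"),
         lambda: log.append("inventory released")),
    Step("charge_payment",
         fail_payment,
         lambda: log.append("payment refunded")),
]

succeeded = run_saga(order_saga)
print(succeeded, log)
```

Because the payment step fails, the two completed steps are compensated in reverse order and the payment compensation is never called. In a real system each action and compensation would be a message to a separate service, not a local function call.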
In a microservices architecture each service owns its own data store, so a single ACID transaction spanning multiple services is rarely practical. Without a coordination strategy, partial failures leave data inconsistent across services. The Saga Pattern provides a structured, reliable way to achieve eventual consistency in these environments. It is especially critical in domains like e-commerce order processing, financial transfers, and travel booking.
There are two main implementation styles. In choreography, each service publishes domain events and other services react to them. There is no central coordinator, so the flow is decentralized and loosely coupled, but the overall sequence is implicit and becomes harder to trace as participants are added. In orchestration, a dedicated Saga Orchestrator sends commands to each service and listens for replies, giving a single place to track the overall workflow. Choreography suits simpler flows; orchestration is easier to reason about for complex, multi-step transactions.
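To make the choreography style concrete, here is a small sketch in which services react to each other's events with no coordinator. Everything in it is an assumption made for illustration: `EventBus` is a synchronous in-memory stand-in for a real broker, and the event names (`OrderCreated`, `InventoryReserved`, `InventoryRejected`, `PaymentCompleted`) and handler functions are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]


class EventBus:
    """Synchronous in-memory stand-in for a message broker (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict) -> None:
        self.history.append(event_type)
        for handler in list(self._handlers[event_type]):
            handler(payload)


bus = EventBus()
stock: Dict[str, int] = {"widget": 1}
orders: Dict[str, str] = {}


def on_order_created(event: Dict) -> None:
    # Inventory service: reserve stock locally, then announce the outcome.
    if stock.get(event["sku"], 0) >= event["qty"]:
        stock[event["sku"]] -= event["qty"]
        bus.publish("InventoryReserved", event)
    else:
        bus.publish("InventoryRejected", event)


def on_inventory_reserved(event: Dict) -> None:
    # Payment service: this sketch assumes the charge always succeeds.
    bus.publish("PaymentCompleted", event)


def on_inventory_rejected(event: Dict) -> None:
    # Order service: compensate by cancelling the pending order.
    orders[event["order_id"]] = "CANCELLED"


def on_payment_completed(event: Dict) -> None:
    # Order service: final step of the happy path.
    orders[event["order_id"]] = "CONFIRMED"


bus.subscribe("OrderCreated", on_order_created)
bus.subscribe("InventoryReserved", on_inventory_reserved)
bus.subscribe("InventoryRejected", on_inventory_rejected)
bus.subscribe("PaymentCompleted", on_payment_completed)


def place_order(order_id: str, sku: str, qty: int) -> None:
    # Order service: commit the local transaction, then publish the event.
    orders[order_id] = "PENDING"
    bus.publish("OrderCreated", {"order_id": order_id, "sku": sku, "qty": qty})


place_order("o1", "widget", 1)  # stock available
place_order("o2", "widget", 1)  # stock exhausted
print(orders)
```

Note that no single function describes the whole workflow; it only emerges from the subscriptions. An orchestrated version would move that sequence into one coordinator that issues commands and handles replies.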
Because sagas cannot roll back like a database transaction, each forward step must have a corresponding compensating transaction that semantically undoes its effect. For example, if 'reserve inventory' is a forward step, 'release inventory' is its compensating counterpart. Compensating transactions must be idempotent and retryable, since they may be invoked multiple times due to network failures or retries. Designing good compensating transactions upfront is one of the most challenging aspects of the pattern.
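One common way to make the 'reserve inventory' and 'release inventory' pair idempotent is to key every operation by a saga identifier and remember what has already been applied. The sketch below assumes a hypothetical in-memory `InventoryService`; the `saga_id` parameter and the bookkeeping fields are illustrative, and a real service would store them in the same database transaction as the stock change.

```python
from typing import Dict, Set, Tuple


class InventoryService:
    """Hypothetical in-memory service with idempotent reserve and release."""

    def __init__(self, stock: Dict[str, int]) -> None:
        self.stock = dict(stock)
        self._reservations: Dict[str, Tuple[str, int]] = {}  # saga_id -> (sku, qty)
        self._released: Set[str] = set()                      # compensated sagas

    def reserve(self, saga_id: str, sku: str, qty: int) -> None:
        # Ignore duplicate deliveries and reservations for sagas that were
        # already compensated (the release message may arrive first).
        if saga_id in self._reservations or saga_id in self._released:
            return
        if self.stock.get(sku, 0) < qty:
            raise ValueError("insufficient stock")
        self.stock[sku] -= qty
        self._reservations[saga_id] = (sku, qty)

    def release(self, saga_id: str) -> None:
        # Compensating transaction: safe to call any number of times.
        self._released.add(saga_id)
        reservation = self._reservations.pop(saga_id, None)
        if reservation is None:
            return  # nothing reserved, or already released
        sku, qty = reservation
        self.stock[sku] += qty


inventory = InventoryService({"widget": 5})
inventory.reserve("saga-1", "widget", 2)
inventory.reserve("saga-1", "widget", 2)  # duplicate delivery, no effect
after_reserve = inventory.stock["widget"]

inventory.release("saga-1")
inventory.release("saga-1")               # retried compensation, no effect
after_release = inventory.stock["widget"]
print(after_reserve, after_release)
```

Calling `release` twice returns the stock only once, which is the property that makes retrying a compensation safe.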
Sagas provide eventual consistency, not isolation: intermediate states are visible to other processes, which can cause dirty reads or lost updates if not carefully handled. Mitigate these anomalies with countermeasures such as semantic locks (flagging a record as pending while a saga is in flight), commutative updates, or rereading a value before changing it. Always make all saga steps and compensating actions idempotent, and use a reliable message broker (e.g., Kafka, RabbitMQ) with at-least-once delivery to avoid lost events. Persist saga state so it can be resumed after a crash.
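The advice to persist saga state can be illustrated with a small checkpointing sketch. It assumes a JSON file as the state store and hypothetical helper names (`load_state`, `save_state`, `resume_saga`); a real orchestrator would more likely keep this record in a database table. The crash is simulated by a step that raises on its first call.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def load_state(path: str) -> Dict[str, int]:
    # A missing file means the saga has not started yet.
    if not os.path.exists(path):
        return {"next_step": 0}
    with open(path) as f:
        return json.load(f)


def save_state(path: str, state: Dict[str, int]) -> None:
    # Write to a temporary file and rename it, so a crash cannot leave
    # a half-written state file behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def resume_saga(path: str, steps: List[Callable[[], None]]) -> None:
    """Run the remaining steps, checkpointing after each one."""
    state = load_state(path)
    for index in range(state["next_step"], len(steps)):
        steps[index]()
        # A crash between the step and this save re-runs the step on
        # resume, which is why steps must be idempotent.
        state["next_step"] = index + 1
        save_state(path, state)


executed: List[str] = []
crash_once = {"armed": True}


def create_order() -> None:
    executed.append("create_order")


def reserve_inventory() -> None:
    if crash_once["armed"]:
        crash_once["armed"] = False
        raise RuntimeError("simulated crash")
    executed.append("reserve_inventory")


def charge_payment() -> None:
    executed.append("charge_payment")


steps = [create_order, reserve_inventory, charge_payment]
state_path = os.path.join(tempfile.mkdtemp(), "saga-42.json")

try:
    resume_saga(state_path, steps)  # first run stops at the simulated crash
except RuntimeError:
    pass

resume_saga(state_path, steps)      # second run continues from the checkpoint
print(executed)
```

The second call skips `create_order` because the checkpoint already records it as done, so the order is created exactly once even though the saga was started twice.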
© RM Full Stack & AI Engineer