🎯 Introduction
In the era of microservices architecture, managing transactions across multiple services presents significant challenges. Traditional distributed transaction mechanisms like Two-Phase Commit (2PC) often lead to tight coupling, reduced availability, and poor performance. The SAGA Pattern emerges as a powerful alternative, providing a way to manage distributed transactions through a sequence of local transactions, each with compensating actions for rollback scenarios.
📚 What is the SAGA Pattern?
🔍 Core Concepts
The SAGA pattern is a design pattern for managing long-running distributed transactions across multiple microservices. Instead of using distributed locks or two-phase commits, SAGA breaks down a complex business transaction into a series of smaller, local transactions that can be independently committed or rolled back.
graph TD
A[Start SAGA Transaction] --> B[Local Transaction 1]
B --> C{Success?}
C -->|Yes| D[Local Transaction 2]
C -->|No| E[Compensate T1]
D --> F{Success?}
F -->|Yes| G[Local Transaction 3]
F -->|No| H[Compensate T2]
H --> I[Compensate T1]
G --> J{Success?}
J -->|Yes| K[SAGA Complete]
J -->|No| L[Compensate T3]
L --> M[Compensate T2]
M --> N[Compensate T1]
style A fill:#ff6b6b
style K fill:#4ecdc4
style E fill:#feca57
style I fill:#feca57
style N fill:#feca57
🏗️ SAGA vs Traditional Approaches
Key Differences:
Aspect | Two-Phase Commit | SAGA Pattern |
---|---|---|
Consistency | Strong (ACID) | Eventually Consistent |
Availability | Reduced during locks | High availability |
Performance | Blocking operations | Non-blocking |
Complexity | Protocol complexity | Business logic complexity |
Fault Tolerance | Coordinator failure risks | Resilient to service failures |
Scalability | Limited by lock duration | Highly scalable |
📋 SAGA Properties
SAGA transactions must satisfy:
- Atomicity: Either all transactions complete, or all compensations are executed
- Consistency: System maintains consistent state through compensations
- Isolation: Individual transactions may see intermediate states
- Durability: Each local transaction and compensation is durable
🎭 SAGA Implementation Patterns
🎼 1. Orchestration Pattern
In the orchestration approach, a central SAGA Orchestrator coordinates the entire transaction flow, explicitly calling each service and managing the sequence of operations.
📋 Orchestration Architecture
graph TD
A[SAGA Orchestrator] --> B[Order Service]
A --> C[Payment Service]
A --> D[Inventory Service]
A --> E[Shipping Service]
B --> F[Create Order]
C --> G[Process Payment]
D --> H[Reserve Inventory]
E --> I[Schedule Shipment]
F --> A
G --> A
H --> A
I --> A
A --> J[Success Path]
A --> K[Compensation Path]
K --> L[Cancel Shipment]
K --> M[Release Inventory]
K --> N[Refund Payment]
K --> O[Cancel Order]
style A fill:#ff6b6b
style J fill:#4ecdc4
style K fill:#feca57
🛠️ Java Spring Boot Implementation
SAGA Orchestrator Service:
1@Service
2@Transactional
3public class OrderSagaOrchestrator {
4
5 private final OrderService orderService;
6 private final PaymentService paymentService;
7 private final InventoryService inventoryService;
8 private final ShippingService shippingService;
9 private final SagaTransactionRepository sagaRepository;
10 private final ApplicationEventPublisher eventPublisher;
11
12 public SagaExecutionResult executeOrderSaga(OrderSagaRequest request) {
13 String sagaId = UUID.randomUUID().toString();
14 SagaTransaction saga = createSagaTransaction(sagaId, request);
15
16 try {
17 return executeOrderFlow(saga);
18 } catch (Exception e) {
19 return executeCompensationFlow(saga, e);
20 }
21 }
22
23 private SagaExecutionResult executeOrderFlow(SagaTransaction saga) {
24 // Step 1: Create Order
25 SagaStep orderStep = executeStep(saga, "CREATE_ORDER", () -> {
26 OrderResponse order = orderService.createOrder(saga.getRequest().toOrderRequest());
27 saga.setOrderId(order.getOrderId());
28 return order;
29 });
30
31 if (!orderStep.isSuccess()) {
32 return compensateFromStep(saga, orderStep);
33 }
34
35 // Step 2: Reserve Inventory
36 SagaStep inventoryStep = executeStep(saga, "RESERVE_INVENTORY", () -> {
37 return inventoryService.reserveItems(
38 saga.getRequest().getItems(),
39 saga.getOrderId()
40 );
41 });
42
43 if (!inventoryStep.isSuccess()) {
44 return compensateFromStep(saga, inventoryStep);
45 }
46
47 // Step 3: Process Payment
48 SagaStep paymentStep = executeStep(saga, "PROCESS_PAYMENT", () -> {
49 return paymentService.processPayment(
50 saga.getRequest().getPaymentDetails(),
51 saga.getOrderId(),
52 saga.getRequest().getTotalAmount()
53 );
54 });
55
56 if (!paymentStep.isSuccess()) {
57 return compensateFromStep(saga, paymentStep);
58 }
59
60 // Step 4: Schedule Shipping
61 SagaStep shippingStep = executeStep(saga, "SCHEDULE_SHIPPING", () -> {
62 return shippingService.scheduleShipping(
63 saga.getOrderId(),
64 saga.getRequest().getShippingAddress()
65 );
66 });
67
68 if (!shippingStep.isSuccess()) {
69 return compensateFromStep(saga, shippingStep);
70 }
71
72 // All steps successful
73 saga.setStatus(SagaStatus.COMPLETED);
74 sagaRepository.save(saga);
75
76 eventPublisher.publishEvent(new OrderSagaCompletedEvent(saga.getSagaId()));
77
78 return SagaExecutionResult.success(saga);
79 }
80
81 private SagaStep executeStep(SagaTransaction saga, String stepName, Supplier<Object> operation) {
82 SagaStep step = SagaStep.builder()
83 .stepName(stepName)
84 .sagaId(saga.getSagaId())
85 .status(SagaStepStatus.IN_PROGRESS)
86 .startTime(LocalDateTime.now())
87 .build();
88
89 try {
90 Object result = operation.get();
91 step.setStatus(SagaStepStatus.COMPLETED);
92 step.setResult(result);
93 step.setEndTime(LocalDateTime.now());
94
95 saga.addStep(step);
96 sagaRepository.save(saga);
97
98 return step;
99
100 } catch (Exception e) {
101 step.setStatus(SagaStepStatus.FAILED);
102 step.setErrorMessage(e.getMessage());
103 step.setEndTime(LocalDateTime.now());
104
105 saga.addStep(step);
106 sagaRepository.save(saga);
107
108 throw new SagaStepExecutionException(stepName, e);
109 }
110 }
111
112 private SagaExecutionResult compensateFromStep(SagaTransaction saga, SagaStep failedStep) {
113 saga.setStatus(SagaStatus.COMPENSATING);
114 sagaRepository.save(saga);
115
116 // Execute compensations in reverse order
117 List<SagaStep> completedSteps = saga.getSteps().stream()
118 .filter(step -> step.getStatus() == SagaStepStatus.COMPLETED)
119 .sorted(Comparator.comparing(SagaStep::getStartTime).reversed())
120 .toList();
121
122 for (SagaStep step : completedSteps) {
123 try {
124 executeCompensation(saga, step);
125 } catch (Exception e) {
126 // Log compensation failure but continue with other compensations
127 log.error("Compensation failed for step: {}, saga: {}",
128 step.getStepName(), saga.getSagaId(), e);
129 }
130 }
131
132 saga.setStatus(SagaStatus.COMPENSATED);
133 sagaRepository.save(saga);
134
135 eventPublisher.publishEvent(new OrderSagaCompensatedEvent(saga.getSagaId(), failedStep.getStepName()));
136
137 return SagaExecutionResult.compensated(saga, failedStep.getErrorMessage());
138 }
139
140 private void executeCompensation(SagaTransaction saga, SagaStep step) {
141 switch (step.getStepName()) {
142 case "CREATE_ORDER":
143 orderService.cancelOrder(saga.getOrderId());
144 break;
145 case "RESERVE_INVENTORY":
146 inventoryService.releaseReservation(saga.getOrderId());
147 break;
148 case "PROCESS_PAYMENT":
149 paymentService.refundPayment(saga.getOrderId());
150 break;
151 case "SCHEDULE_SHIPPING":
152 shippingService.cancelShipment(saga.getOrderId());
153 break;
154 default:
155 log.warn("No compensation defined for step: {}", step.getStepName());
156 }
157 }
158}
SAGA Data Models:
1@Entity
2@Table(name = "saga_transactions")
3public class SagaTransaction {
4
5 @Id
6 private String sagaId;
7
8 @Enumerated(EnumType.STRING)
9 private SagaStatus status;
10
11 @Column(name = "order_id")
12 private String orderId;
13
14 @Column(columnDefinition = "JSON")
15 @Convert(converter = OrderSagaRequestConverter.class)
16 private OrderSagaRequest request;
17
18 @OneToMany(mappedBy = "sagaTransaction", cascade = CascadeType.ALL, fetch = FetchType.LAZY)
19 private List<SagaStep> steps = new ArrayList<>();
20
21 @Column(name = "created_at")
22 private LocalDateTime createdAt;
23
24 @Column(name = "updated_at")
25 private LocalDateTime updatedAt;
26
27 public void addStep(SagaStep step) {
28 step.setSagaTransaction(this);
29 this.steps.add(step);
30 this.updatedAt = LocalDateTime.now();
31 }
32
33 // Constructors, getters, setters
34}
35
36@Entity
37@Table(name = "saga_steps")
38public class SagaStep {
39
40 @Id
41 @GeneratedValue(strategy = GenerationType.IDENTITY)
42 private Long id;
43
44 @Column(name = "step_name")
45 private String stepName;
46
47 @Enumerated(EnumType.STRING)
48 private SagaStepStatus status;
49
50 @Column(name = "start_time")
51 private LocalDateTime startTime;
52
53 @Column(name = "end_time")
54 private LocalDateTime endTime;
55
56 @Column(columnDefinition = "TEXT")
57 private String errorMessage;
58
59 @Column(columnDefinition = "JSON")
60 private String result;
61
62 @ManyToOne(fetch = FetchType.LAZY)
63 @JoinColumn(name = "saga_id", referencedColumnName = "sagaId")
64 private SagaTransaction sagaTransaction;
65
66 // Constructors, getters, setters
67}
68
69public enum SagaStatus {
70 STARTED, IN_PROGRESS, COMPLETED, COMPENSATING, COMPENSATED, FAILED
71}
72
73public enum SagaStepStatus {
74 PENDING, IN_PROGRESS, COMPLETED, FAILED, COMPENSATED
75}
REST Controller:
1@RestController
2@RequestMapping("/api/orders")
3public class OrderSagaController {
4
5 private final OrderSagaOrchestrator orchestrator;
6
7 @PostMapping("/saga")
8 public ResponseEntity<SagaResponse> createOrderWithSaga(@RequestBody OrderSagaRequest request) {
9 try {
10 SagaExecutionResult result = orchestrator.executeOrderSaga(request);
11
12 if (result.isSuccess()) {
13 return ResponseEntity.ok(SagaResponse.success(result.getSagaId()));
14 } else {
15 return ResponseEntity.status(HttpStatus.BAD_REQUEST)
16 .body(SagaResponse.failed(result.getSagaId(), result.getErrorMessage()));
17 }
18
19 } catch (Exception e) {
20 return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
21 .body(SagaResponse.error("SAGA execution failed: " + e.getMessage()));
22 }
23 }
24
25 @GetMapping("/saga/{sagaId}")
26 public ResponseEntity<SagaStatusResponse> getSagaStatus(@PathVariable String sagaId) {
27 Optional<SagaTransaction> saga = orchestrator.getSagaStatus(sagaId);
28
29 if (saga.isPresent()) {
30 return ResponseEntity.ok(SagaStatusResponse.from(saga.get()));
31 } else {
32 return ResponseEntity.notFound().build();
33 }
34 }
35}
💃 2. Choreography Pattern
In the choreography approach, there’s no central coordinator. Each service publishes and listens to events, and the SAGA progresses through a series of event-driven interactions between services.
📋 Choreography Architecture
graph TD
A[Order Created Event] --> B[Order Service]
A --> C[Inventory Service]
C --> D[Inventory Reserved Event]
D --> E[Payment Service]
E --> F[Payment Processed Event]
F --> G[Shipping Service]
G --> H[Shipment Scheduled Event]
H --> I[Order Complete]
J[Inventory Failed Event] --> K[Order Service]
K --> L[Order Cancelled Event]
M[Payment Failed Event] --> N[Inventory Service]
N --> O[Inventory Released Event]
O --> K
P[Shipping Failed Event] --> Q[Payment Service]
Q --> R[Payment Refunded Event]
R --> N
style I fill:#4ecdc4
style L fill:#feca57
style O fill:#feca57
style R fill:#feca57
🛠️ Java Spring Boot Implementation
Event-Driven Order Service:
1@Service
2@Transactional
3public class ChoreographyOrderService {
4
5 private final OrderRepository orderRepository;
6 private final ApplicationEventPublisher eventPublisher;
7 private final SagaEventRepository sagaEventRepository;
8
9 @EventListener
10 public void handleOrderCreationRequest(OrderCreationRequestEvent event) {
11 try {
12 Order order = createOrder(event.getOrderRequest());
13
14 // Publish order created event
15 OrderCreatedEvent orderCreatedEvent = OrderCreatedEvent.builder()
16 .sagaId(event.getSagaId())
17 .orderId(order.getId())
18 .customerId(order.getCustomerId())
19 .items(order.getItems())
20 .totalAmount(order.getTotalAmount())
21 .timestamp(LocalDateTime.now())
22 .build();
23
24 recordSagaEvent(orderCreatedEvent);
25 eventPublisher.publishEvent(orderCreatedEvent);
26
27 } catch (Exception e) {
28 // Publish order creation failed event
29 OrderCreationFailedEvent failedEvent = OrderCreationFailedEvent.builder()
30 .sagaId(event.getSagaId())
31 .reason(e.getMessage())
32 .timestamp(LocalDateTime.now())
33 .build();
34
35 recordSagaEvent(failedEvent);
36 eventPublisher.publishEvent(failedEvent);
37 }
38 }
39
40 @EventListener
41 public void handleInventoryFailure(InventoryReservationFailedEvent event) {
42 try {
43 // Cancel the order
44 Order order = orderRepository.findBySagaId(event.getSagaId())
45 .orElseThrow(() -> new OrderNotFoundException(event.getSagaId()));
46
47 order.setStatus(OrderStatus.CANCELLED);
48 order.setCancellationReason("Inventory reservation failed: " + event.getReason());
49 orderRepository.save(order);
50
51 // Publish order cancelled event
52 OrderCancelledEvent cancelledEvent = OrderCancelledEvent.builder()
53 .sagaId(event.getSagaId())
54 .orderId(order.getId())
55 .reason("Inventory unavailable")
56 .timestamp(LocalDateTime.now())
57 .build();
58
59 recordSagaEvent(cancelledEvent);
60 eventPublisher.publishEvent(cancelledEvent);
61
62 } catch (Exception e) {
63 log.error("Failed to handle inventory failure for saga: {}", event.getSagaId(), e);
64 }
65 }
66
67 @EventListener
68 public void handlePaymentFailure(PaymentFailedEvent event) {
69 // Similar compensation logic for payment failures
70 handleCompensationScenario(event.getSagaId(), "Payment failed: " + event.getReason());
71 }
72
73 @EventListener
74 public void handleShippingFailure(ShippingFailedEvent event) {
75 // Similar compensation logic for shipping failures
76 handleCompensationScenario(event.getSagaId(), "Shipping failed: " + event.getReason());
77 }
78
79 @EventListener
80 public void handleSagaCompletion(ShipmentScheduledEvent event) {
81 try {
82 Order order = orderRepository.findBySagaId(event.getSagaId())
83 .orElseThrow(() -> new OrderNotFoundException(event.getSagaId()));
84
85 order.setStatus(OrderStatus.CONFIRMED);
86 order.setShipmentId(event.getShipmentId());
87 orderRepository.save(order);
88
89 // Publish final completion event
90 OrderSagaCompletedEvent completedEvent = OrderSagaCompletedEvent.builder()
91 .sagaId(event.getSagaId())
92 .orderId(order.getId())
93 .timestamp(LocalDateTime.now())
94 .build();
95
96 recordSagaEvent(completedEvent);
97 eventPublisher.publishEvent(completedEvent);
98
99 } catch (Exception e) {
100 log.error("Failed to complete saga: {}", event.getSagaId(), e);
101 }
102 }
103
104 private void recordSagaEvent(SagaEvent event) {
105 SagaEventRecord record = SagaEventRecord.builder()
106 .sagaId(event.getSagaId())
107 .eventType(event.getClass().getSimpleName())
108 .eventData(JsonUtils.toJson(event))
109 .timestamp(event.getTimestamp())
110 .build();
111
112 sagaEventRepository.save(record);
113 }
114}
Inventory Service with Event Handling:
1@Service
2@Transactional
3public class ChoreographyInventoryService {
4
5 private final InventoryRepository inventoryRepository;
6 private final ReservationRepository reservationRepository;
7 private final ApplicationEventPublisher eventPublisher;
8
9 @EventListener
10 @Async
11 public void handleOrderCreated(OrderCreatedEvent event) {
12 try {
13 // Check inventory availability
14 boolean allItemsAvailable = event.getItems().stream()
15 .allMatch(item -> checkInventoryAvailability(item.getProductId(), item.getQuantity()));
16
17 if (!allItemsAvailable) {
18 publishInventoryFailure(event.getSagaId(), "Insufficient inventory");
19 return;
20 }
21
22 // Reserve inventory
23 List<InventoryReservation> reservations = event.getItems().stream()
24 .map(item -> reserveInventory(item, event.getSagaId(), event.getOrderId()))
25 .toList();
26
27 // Publish success event
28 InventoryReservedEvent reservedEvent = InventoryReservedEvent.builder()
29 .sagaId(event.getSagaId())
30 .orderId(event.getOrderId())
31 .reservations(reservations.stream()
32 .map(this::toReservationInfo)
33 .toList())
34 .timestamp(LocalDateTime.now())
35 .build();
36
37 eventPublisher.publishEvent(reservedEvent);
38
39 } catch (Exception e) {
40 publishInventoryFailure(event.getSagaId(), e.getMessage());
41 }
42 }
43
44 @EventListener
45 public void handlePaymentFailure(PaymentFailedEvent event) {
46 try {
47 // Release inventory reservations
48 List<InventoryReservation> reservations = reservationRepository
49 .findBySagaId(event.getSagaId());
50
51 for (InventoryReservation reservation : reservations) {
52 releaseReservation(reservation);
53 }
54
55 // Publish inventory released event
56 InventoryReleasedEvent releasedEvent = InventoryReleasedEvent.builder()
57 .sagaId(event.getSagaId())
58 .reason("Payment failed")
59 .timestamp(LocalDateTime.now())
60 .build();
61
62 eventPublisher.publishEvent(releasedEvent);
63
64 } catch (Exception e) {
65 log.error("Failed to release inventory for saga: {}", event.getSagaId(), e);
66 }
67 }
68
69 private void publishInventoryFailure(String sagaId, String reason) {
70 InventoryReservationFailedEvent failedEvent = InventoryReservationFailedEvent.builder()
71 .sagaId(sagaId)
72 .reason(reason)
73 .timestamp(LocalDateTime.now())
74 .build();
75
76 eventPublisher.publishEvent(failedEvent);
77 }
78
79 private InventoryReservation reserveInventory(OrderItem item, String sagaId, String orderId) {
80 // Update inventory count
81 Inventory inventory = inventoryRepository.findByProductId(item.getProductId())
82 .orElseThrow(() -> new ProductNotFoundException(item.getProductId()));
83
84 if (inventory.getAvailableQuantity() < item.getQuantity()) {
85 throw new InsufficientInventoryException(item.getProductId(), item.getQuantity());
86 }
87
88 inventory.setAvailableQuantity(inventory.getAvailableQuantity() - item.getQuantity());
89 inventory.setReservedQuantity(inventory.getReservedQuantity() + item.getQuantity());
90 inventoryRepository.save(inventory);
91
92 // Create reservation record
93 InventoryReservation reservation = InventoryReservation.builder()
94 .sagaId(sagaId)
95 .orderId(orderId)
96 .productId(item.getProductId())
97 .quantity(item.getQuantity())
98 .status(ReservationStatus.RESERVED)
99 .createdAt(LocalDateTime.now())
100 .build();
101
102 return reservationRepository.save(reservation);
103 }
104}
Event Configuration with Spring Boot:
1@Configuration
2@EnableAsync
3public class SagaEventConfiguration {
4
5 @Bean
6 public ApplicationEventMulticaster applicationEventMulticaster() {
7 SimpleApplicationEventMulticaster eventMulticaster = new SimpleApplicationEventMulticaster();
8 eventMulticaster.setTaskExecutor(sagaEventExecutor());
9 eventMulticaster.setErrorHandler(new SagaEventErrorHandler());
10 return eventMulticaster;
11 }
12
13 @Bean
14 public TaskExecutor sagaEventExecutor() {
15 ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
16 executor.setCorePoolSize(10);
17 executor.setMaxPoolSize(50);
18 executor.setQueueCapacity(200);
19 executor.setThreadNamePrefix("saga-event-");
20 executor.setWaitForTasksToCompleteOnShutdown(true);
21 executor.setAwaitTerminationSeconds(60);
22 executor.initialize();
23 return executor;
24 }
25
26 @Component
27 public static class SagaEventErrorHandler implements ErrorHandler {
28
29 private final Logger log = LoggerFactory.getLogger(SagaEventErrorHandler.class);
30
31 @Override
32 public void handleError(Throwable throwable) {
33 log.error("Error processing SAGA event", throwable);
34 // Implement dead letter queue or retry mechanism
35 }
36 }
37}
🔄 Advanced SAGA Patterns
📊 SAGA State Management
For complex scenarios, implementing explicit state management helps track SAGA progress and handle failures more effectively.
1@Entity
2@Table(name = "saga_state")
3public class SagaState {
4
5 @Id
6 private String sagaId;
7
8 @Enumerated(EnumType.STRING)
9 private SagaPhase currentPhase;
10
11 @Column(columnDefinition = "JSON")
12 @Convert(converter = SagaContextConverter.class)
13 private SagaContext context;
14
15 @ElementCollection
16 @Enumerated(EnumType.STRING)
17 private Set<SagaStepType> completedSteps = new HashSet<>();
18
19 @ElementCollection
20 @Enumerated(EnumType.STRING)
21 private Set<SagaStepType> compensatedSteps = new HashSet<>();
22
23 @Column(name = "created_at")
24 private LocalDateTime createdAt;
25
26 @Column(name = "updated_at")
27 private LocalDateTime updatedAt;
28
29 public boolean isStepCompleted(SagaStepType stepType) {
30 return completedSteps.contains(stepType);
31 }
32
33 public boolean isStepCompensated(SagaStepType stepType) {
34 return compensatedSteps.contains(stepType);
35 }
36
37 public void markStepCompleted(SagaStepType stepType) {
38 completedSteps.add(stepType);
39 updatedAt = LocalDateTime.now();
40 }
41
42 public void markStepCompensated(SagaStepType stepType) {
43 compensatedSteps.add(stepType);
44 completedSteps.remove(stepType);
45 updatedAt = LocalDateTime.now();
46 }
47}
48
49public enum SagaPhase {
50 STARTED,
51 ORDER_CREATION,
52 INVENTORY_RESERVATION,
53 PAYMENT_PROCESSING,
54 SHIPPING_SCHEDULING,
55 COMPLETED,
56 COMPENSATING,
57 COMPENSATED,
58 FAILED
59}
60
61public enum SagaStepType {
62 CREATE_ORDER,
63 RESERVE_INVENTORY,
64 PROCESS_PAYMENT,
65 SCHEDULE_SHIPPING
66}
🔄 SAGA Recovery Mechanism
Implementing recovery mechanisms for handling partial failures and system restarts:
1@Component
2@Scheduled(fixedRate = 30000) // Run every 30 seconds
3public class SagaRecoveryService {
4
5 private final SagaStateRepository sagaStateRepository;
6 private final OrderSagaOrchestrator orchestrator;
7 private final SagaEventPublisher eventPublisher;
8
9 @Scheduled(fixedRate = 30000)
10 public void recoverStuckSagas() {
11 LocalDateTime cutoffTime = LocalDateTime.now().minusMinutes(10);
12
13 List<SagaState> stuckSagas = sagaStateRepository
14 .findByUpdatedAtBeforeAndCurrentPhaseIn(
15 cutoffTime,
16 Arrays.asList(SagaPhase.ORDER_CREATION, SagaPhase.INVENTORY_RESERVATION,
17 SagaPhase.PAYMENT_PROCESSING, SagaPhase.SHIPPING_SCHEDULING)
18 );
19
20 for (SagaState saga : stuckSagas) {
21 try {
22 recoverSaga(saga);
23 } catch (Exception e) {
24 log.error("Failed to recover saga: {}", saga.getSagaId(), e);
25 }
26 }
27 }
28
29 private void recoverSaga(SagaState saga) {
30 log.info("Recovering stuck saga: {}, phase: {}", saga.getSagaId(), saga.getCurrentPhase());
31
32 switch (saga.getCurrentPhase()) {
33 case ORDER_CREATION:
34 if (!saga.isStepCompleted(SagaStepType.CREATE_ORDER)) {
35 retryOrderCreation(saga);
36 } else {
37 moveToNextPhase(saga, SagaPhase.INVENTORY_RESERVATION);
38 }
39 break;
40
41 case INVENTORY_RESERVATION:
42 if (!saga.isStepCompleted(SagaStepType.RESERVE_INVENTORY)) {
43 retryInventoryReservation(saga);
44 } else {
45 moveToNextPhase(saga, SagaPhase.PAYMENT_PROCESSING);
46 }
47 break;
48
49 case PAYMENT_PROCESSING:
50 if (!saga.isStepCompleted(SagaStepType.PROCESS_PAYMENT)) {
51 retryPaymentProcessing(saga);
52 } else {
53 moveToNextPhase(saga, SagaPhase.SHIPPING_SCHEDULING);
54 }
55 break;
56
57 case SHIPPING_SCHEDULING:
58 if (!saga.isStepCompleted(SagaStepType.SCHEDULE_SHIPPING)) {
59 retryShippingScheduling(saga);
60 } else {
61 moveToNextPhase(saga, SagaPhase.COMPLETED);
62 }
63 break;
64
65 default:
66 log.warn("Unknown phase for recovery: {}", saga.getCurrentPhase());
67 }
68 }
69
70 private void retryOrderCreation(SagaState saga) {
71 try {
72 eventPublisher.publishOrderCreationRetry(saga.getSagaId(), saga.getContext());
73 } catch (Exception e) {
74 initiateSagaCompensation(saga, "Order creation retry failed");
75 }
76 }
77
78 private void initiateSagaCompensation(SagaState saga, String reason) {
79 saga.setCurrentPhase(SagaPhase.COMPENSATING);
80 sagaStateRepository.save(saga);
81
82 eventPublisher.publishSagaCompensationInitiated(saga.getSagaId(), reason);
83 }
84}
📈 Performance Optimization & Monitoring
🔍 SAGA Metrics and Monitoring
1@Component
2public class SagaMetrics {
3
4 private final MeterRegistry meterRegistry;
5 private final Timer sagaExecutionTimer;
6 private final Counter sagaSuccessCounter;
7 private final Counter sagaFailureCounter;
8 private final Gauge activeSagasGauge;
9
10 public SagaMetrics(MeterRegistry meterRegistry, SagaStateRepository sagaStateRepository) {
11 this.meterRegistry = meterRegistry;
12 this.sagaExecutionTimer = Timer.builder("saga.execution.time")
13 .description("Time taken to execute SAGA transactions")
14 .register(meterRegistry);
15
16 this.sagaSuccessCounter = Counter.builder("saga.executions")
17 .tag("result", "success")
18 .description("Successful SAGA executions")
19 .register(meterRegistry);
20
21 this.sagaFailureCounter = Counter.builder("saga.executions")
22 .tag("result", "failure")
23 .description("Failed SAGA executions")
24 .register(meterRegistry);
25
26 this.activeSagasGauge = Gauge.builder("saga.active.count")
27 .description("Number of active SAGA transactions")
28 .register(meterRegistry, sagaStateRepository, repo ->
29 repo.countByCurrentPhaseIn(Arrays.asList(
30 SagaPhase.STARTED, SagaPhase.ORDER_CREATION,
31 SagaPhase.INVENTORY_RESERVATION, SagaPhase.PAYMENT_PROCESSING,
32 SagaPhase.SHIPPING_SCHEDULING)));
33 }
34
35 public Timer.Sample startSagaTimer() {
36 return Timer.start(meterRegistry);
37 }
38
39 public void recordSagaSuccess(Timer.Sample sample, String sagaType) {
40 sample.stop(Timer.builder("saga.execution.time")
41 .tag("type", sagaType)
42 .tag("result", "success")
43 .register(meterRegistry));
44 sagaSuccessCounter.increment();
45 }
46
47 public void recordSagaFailure(Timer.Sample sample, String sagaType, String errorType) {
48 sample.stop(Timer.builder("saga.execution.time")
49 .tag("type", sagaType)
50 .tag("result", "failure")
51 .tag("error", errorType)
52 .register(meterRegistry));
53 sagaFailureCounter.increment();
54 }
55
56 public void recordCompensationEvent(String stepType, String reason) {
57 Counter.builder("saga.compensations")
58 .tag("step", stepType)
59 .tag("reason", reason)
60 .register(meterRegistry)
61 .increment();
62 }
63}
🔧 Configuration Optimization
1# application.yml
2spring:
3 application:
4 name: saga-orchestrator
5
6 datasource:
7 hikari:
8 maximum-pool-size: 50
9 minimum-idle: 10
10 connection-timeout: 30000
11 idle-timeout: 600000
12 max-lifetime: 1800000
13
14 kafka:
15 producer:
16 bootstrap-servers: ${KAFKA_BOOTSTRAP_SERVERS:localhost:9092}
17 key-serializer: org.apache.kafka.common.serialization.StringSerializer
18 value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
19 acks: all
20 retries: 3
21 batch-size: 16384
22 linger-ms: 5
23 buffer-memory: 33554432
24
25 consumer:
26 bootstrap-servers: ${KAFKA_BOOTSTRAP_SERVERS:localhost:9092}
27 key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
28 value-deserializer: org.springframework.kafka.support.serializer.JsonDeserializer
29 properties:
30 spring.json.trusted.packages: "com.example.saga.events"
31 group-id: saga-choreography-group
32 auto-offset-reset: earliest
33 enable-auto-commit: false
34 max-poll-records: 10
35
36saga:
37 orchestration:
38 timeout: 300000 # 5 minutes
39 retry:
40 max-attempts: 3
41 backoff-delay: 1000
42 multiplier: 2
43
44 choreography:
45 event-store:
46 retention-days: 30
47 recovery:
48 check-interval: 30000
49 stuck-timeout: 600000 # 10 minutes
50
51 compensation:
52 timeout: 60000 # 1 minute per step
53 max-retries: 3
54
55management:
56 endpoints:
57 web:
58 exposure:
59 include: health, metrics, prometheus, saga-status
60 endpoint:
61 health:
62 show-details: always
63 metrics:
64 export:
65 prometheus:
66 enabled: true
⚖️ SAGA Pattern Trade-offs
📊 Orchestration vs Choreography Comparison
graph TD
A[SAGA Implementation Choice] --> B{System Characteristics}
B -->|Simple Flow, Few Services| C[Orchestration]
B -->|Complex Flow, Many Services| D[Choreography]
B -->|Mixed Requirements| E[Hybrid Approach]
C --> F[Centralized Control]
C --> G[Easy Debugging]
C --> H[Single Point of Failure]
D --> I[Distributed Control]
D --> J[High Resilience]
D --> K[Complex Event Chains]
E --> L[Service-Specific Choice]
E --> M[Gradual Migration]
style C fill:#4ecdc4
style D fill:#45b7d1
style E fill:#feca57
style H fill:#ff6b35
style K fill:#ff9ff3
📈 Performance Comparison
Aspect | Orchestration | Choreography | Two-Phase Commit |
---|---|---|---|
Latency | Medium | Low | High |
Throughput | Medium | High | Low |
Complexity | Medium | High | Low |
Debugging | Easy | Difficult | Easy |
Fault Tolerance | Medium | High | Low |
Consistency | Eventual | Eventual | Strong |
✅ SAGA Pattern Pros & Cons
Advantages:
- High Availability: No blocking distributed locks
- Scalability: Services can scale independently
- Fault Tolerance: Resilient to individual service failures
- Performance: Better throughput than 2PC
- Flexibility: Business logic can be distributed
Disadvantages:
- Complexity: More complex error handling and compensation logic
- Eventual Consistency: Temporary inconsistent states during execution
- Debugging: Harder to trace and debug distributed flows
- Data Isolation: Lack of isolation between concurrent SAGAs
- Compensation Logic: Requires careful design of compensating actions
🎯 Best Practices and Recommendations
🔧 Implementation Guidelines
1. Idempotency Design:
1@Service
2public class IdempotentPaymentService {
3
4 private final PaymentRepository paymentRepository;
5 private final IdempotencyKeyRepository idempotencyKeyRepository;
6
7 @Transactional
8 public PaymentResult processPayment(String idempotencyKey, PaymentRequest request) {
9 // Check for existing payment with same idempotency key
10 Optional<Payment> existingPayment = paymentRepository
11 .findByIdempotencyKey(idempotencyKey);
12
13 if (existingPayment.isPresent()) {
14 return PaymentResult.fromExisting(existingPayment.get());
15 }
16
17 // Record idempotency key to prevent concurrent processing
18 IdempotencyKey key = new IdempotencyKey(idempotencyKey, PaymentRequest.class.getSimpleName());
19 idempotencyKeyRepository.saveWithLock(key);
20
21 try {
22 Payment payment = executePayment(request);
23 payment.setIdempotencyKey(idempotencyKey);
24 paymentRepository.save(payment);
25
26 return PaymentResult.success(payment);
27
28 } catch (Exception e) {
29 idempotencyKeyRepository.delete(key);
30 throw new PaymentProcessingException("Payment failed", e);
31 }
32 }
33}
2. Compensation Design:
1public interface CompensatingAction {
2
3 CompensationResult execute(CompensationContext context);
4
5 boolean isCompensatable(CompensationContext context);
6
7 default int getRetryAttempts() {
8 return 3;
9 }
10
11 default Duration getRetryDelay() {
12 return Duration.ofSeconds(5);
13 }
14}
15
16@Component
17public class OrderCancellationCompensation implements CompensatingAction {
18
19 @Override
20 public CompensationResult execute(CompensationContext context) {
21 try {
22 String orderId = context.getOrderId();
23 Order order = orderRepository.findById(orderId)
24 .orElse(null);
25
26 if (order == null) {
27 return CompensationResult.success("Order already cancelled or not found");
28 }
29
30 if (order.getStatus() == OrderStatus.SHIPPED) {
31 // Cannot cancel shipped orders - need manual intervention
32 return CompensationResult.failed("Cannot cancel shipped order - manual intervention required");
33 }
34
35 order.cancel("SAGA compensation");
36 orderRepository.save(order);
37
38 return CompensationResult.success("Order cancelled successfully");
39
40 } catch (Exception e) {
41 return CompensationResult.failed("Failed to cancel order: " + e.getMessage());
42 }
43 }
44
45 @Override
46 public boolean isCompensatable(CompensationContext context) {
47 // Check if compensation is still possible
48 String orderId = context.getOrderId();
49 return orderRepository.findById(orderId)
50 .map(order -> order.getStatus() != OrderStatus.SHIPPED)
51 .orElse(false);
52 }
53}
3. Testing Strategy:
1@SpringBootTest
2@TestPropertySource(properties = {
3 "saga.orchestration.timeout=5000",
4 "saga.compensation.timeout=3000"
5})
6class SagaIntegrationTest {
7
8 @Autowired
9 private OrderSagaOrchestrator orchestrator;
10
11 @MockBean
12 private PaymentService paymentService;
13
14 @Test
15 void testSuccessfulSagaExecution() {
16 // Arrange
17 OrderSagaRequest request = createValidOrderRequest();
18
19 // Act
20 SagaExecutionResult result = orchestrator.executeOrderSaga(request);
21
22 // Assert
23 assertThat(result.isSuccess()).isTrue();
24 assertThat(result.getSaga().getStatus()).isEqualTo(SagaStatus.COMPLETED);
25 }
26
27 @Test
28 void testSagaCompensationOnPaymentFailure() {
29 // Arrange
30 OrderSagaRequest request = createValidOrderRequest();
31 when(paymentService.processPayment(any(), any(), any()))
32 .thenThrow(new PaymentProcessingException("Insufficient funds"));
33
34 // Act
35 SagaExecutionResult result = orchestrator.executeOrderSaga(request);
36
37 // Assert
38 assertThat(result.isSuccess()).isFalse();
39 assertThat(result.getSaga().getStatus()).isEqualTo(SagaStatus.COMPENSATED);
40
41 // Verify compensations were called
42 verify(inventoryService).releaseReservation(any());
43 verify(orderService).cancelOrder(any());
44 }
45
46 @Test
47 void testConcurrentSagaExecution() throws InterruptedException {
48 // Test multiple concurrent SAGAs to ensure isolation
49 int threadCount = 10;
50 CountDownLatch latch = new CountDownLatch(threadCount);
51 AtomicInteger successCount = new AtomicInteger(0);
52
53 ExecutorService executor = Executors.newFixedThreadPool(threadCount);
54
55 for (int i = 0; i < threadCount; i++) {
56 executor.submit(() -> {
57 try {
58 OrderSagaRequest request = createValidOrderRequest();
59 SagaExecutionResult result = orchestrator.executeOrderSaga(request);
60
61 if (result.isSuccess()) {
62 successCount.incrementAndGet();
63 }
64 } finally {
65 latch.countDown();
66 }
67 });
68 }
69
70 latch.await(30, TimeUnit.SECONDS);
71 assertThat(successCount.get()).isGreaterThan(0);
72 }
73}
🎯 Conclusion
The SAGA pattern provides a robust alternative to traditional distributed transaction mechanisms, offering better availability and scalability at the cost of increased complexity. When implementing SAGA patterns in Spring Boot applications:
🏆 Key Takeaways
- Choose the Right Pattern: Orchestration for simpler, centralized control; Choreography for distributed, resilient systems
- Design for Idempotency: Ensure all operations can be safely retried
- Implement Comprehensive Compensation: Plan for all possible failure scenarios
- Monitor and Observe: Use metrics and logging to track SAGA execution
- Test Thoroughly: Include failure scenarios and compensation paths in testing
🚀 When to Use SAGA Pattern
Ideal Scenarios:
- Microservices architecture with multiple data stores
- Long-running business processes
- Systems requiring high availability
- Applications with complex business workflows
Avoid SAGA When:
- Strong consistency is mandatory
- Simple, single-service transactions
- Systems with simple failure scenarios
- Applications requiring strict ACID properties
By implementing SAGA patterns correctly in your Spring Boot applications, you can achieve distributed transaction management that scales with your microservices architecture while maintaining business consistency requirements.