The previous article discussed the Bulkhead pattern. This one covers the sliding window.
The sliding window is the core internal logic of the CircuitBreaker in Resilience4j.
Most developers use @CircuitBreaker but don’t understand how the sliding window actually calculates the failure rate.
Let’s break it down clearly.
🔥 What is Sliding Window in Resilience4j?
Sliding window is the statistical window used by CircuitBreaker to decide:
Should we OPEN the circuit or keep it CLOSED?
It calculates:
- Failure rate %
- Slow call rate %
- Total calls count

All three are computed over the last N calls or the last N seconds.
📌 Two Types of Sliding Windows
1️⃣ COUNT_BASED Sliding Window
Based on number of calls.
Example:
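import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig.SlidingWindowType;

CircuitBreakerConfig config = CircuitBreakerConfig.custom()
    .slidingWindowType(SlidingWindowType.COUNT_BASED) // aggregate the last N calls
    .slidingWindowSize(10)                            // N = 10 calls
    .failureRateThreshold(50)                         // percent
    .build();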
Meaning:
- Observe the last 10 calls
- If 50% or more of them fail (the threshold is inclusive)
- Circuit goes OPEN
Example Scenario:
Last 10 calls:
S F S F F S F F S F
Failures = 6
Failure rate = 60%
If threshold = 50% → Circuit OPEN
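
The scenario above can be checked with a few lines of plain Java. This is only a sketch of the arithmetic, not Resilience4j code: `FailureRateExample` and `failureRate` are hypothetical names, and the string of `S`/`F` letters stands in for real call outcomes.

```java
// Hypothetical helper, not part of Resilience4j: reproduces the arithmetic only.
class FailureRateExample {

    // Failure rate (%) over the last `windowSize` outcomes; "F" marks a failed call.
    static float failureRate(String calls, int windowSize) {
        String[] all = calls.trim().split("\\s+");
        int from = Math.max(0, all.length - windowSize); // ignore calls older than the window
        int failures = 0;
        for (int i = from; i < all.length; i++) {
            if (all[i].equals("F")) {
                failures++;
            }
        }
        int total = all.length - from;
        return failures * 100f / total;
    }

    public static void main(String[] args) {
        float rate = failureRate("S F S F F S F F S F", 10);
        // The comparison is inclusive: a rate equal to the threshold also opens the circuit.
        System.out.println(rate + "% -> " + (rate >= 50f ? "OPEN" : "CLOSED"));
        // prints "60.0% -> OPEN"
    }
}
```

Only the last 10 outcomes count, so an old failure stops mattering once 10 newer calls have pushed it out of the window.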
2️⃣ TIME_BASED Sliding Window
Based on time duration.
Example:
.slidingWindowType(SlidingWindowType.TIME_BASED)
.slidingWindowSize(10)
Meaning:
- Observe the calls made in the last 10 seconds
- Calculate the failure rate
- If the threshold is reached → OPEN
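
To make the time-based idea concrete, here is a minimal sketch with one bucket per second. It is an illustration under stated assumptions, not the library's implementation: `TimeWindow` is a hypothetical class, the current second is passed in by the caller so the behaviour is easy to follow, and returning `-1` for an empty window is this sketch's own convention. It also sums the buckets on every read, whereas a production implementation would normally keep a running total.

```java
import java.util.Arrays;

// Hypothetical sketch, not Resilience4j code: one bucket per second.
class TimeWindow {
    private final int windowSeconds;
    private final long[] bucketSecond; // the second each bucket currently represents
    private final int[] calls;
    private final int[] failures;

    TimeWindow(int windowSeconds) {
        this.windowSeconds = windowSeconds;
        this.bucketSecond = new long[windowSeconds];
        this.calls = new int[windowSeconds];
        this.failures = new int[windowSeconds];
        Arrays.fill(bucketSecond, -1L); // -1 = bucket never used
    }

    void record(long nowSecond, boolean failed) {
        int i = (int) (nowSecond % windowSeconds);
        if (bucketSecond[i] != nowSecond) {
            // The slot still holds an older second: reset it before reuse.
            bucketSecond[i] = nowSecond;
            calls[i] = 0;
            failures[i] = 0;
        }
        calls[i]++;
        if (failed) {
            failures[i]++;
        }
    }

    // Failure rate (%) over the last `windowSeconds` seconds, or -1 if no calls are in range.
    float failureRate(long nowSecond) {
        int total = 0;
        int failed = 0;
        for (int i = 0; i < windowSeconds; i++) {
            boolean inWindow = bucketSecond[i] >= 0 && nowSecond - bucketSecond[i] < windowSeconds;
            if (inWindow) {
                total += calls[i];
                failed += failures[i];
            }
        }
        return total == 0 ? -1f : failed * 100f / total;
    }
}
```

Unlike the count-based window, the number of calls inside the window is not fixed: a quiet 10 seconds may hold two calls, a busy one thousands.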
🧠 How Sliding Window Internally Works
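Internally it maintains:
- Circular array (ring buffer) for count-based windows
- Buckets (one partial aggregation per second) for time-based windows
- Atomic counters
Every new call:
- Old data expires
- New result added
- Failure rate recalculated
- Decision made
Each update is O(1), because the totals are adjusted incrementally instead of being recomputed from every stored call.
Very efficient.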
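
The four steps can be sketched with a small ring buffer. This is a simplified illustration, not the Resilience4j source: `RingBufferBreaker` is a hypothetical class, it only evaluates the rate once the window is full, and it has no HALF_OPEN state or wait duration, so once it opens it stays open.

```java
// Hypothetical sketch of the "expire, add, recalculate, decide" cycle.
class RingBufferBreaker {
    enum State { CLOSED, OPEN }

    private final boolean[] failed; // circular array: true = failed call
    private final float threshold;  // failure rate threshold in percent
    private int next = 0;           // slot that the next outcome overwrites
    private int count = 0;          // outcomes stored so far (at most failed.length)
    private int failures = 0;       // running total, adjusted incrementally
    private State state = State.CLOSED;

    RingBufferBreaker(int size, float threshold) {
        this.failed = new boolean[size];
        this.threshold = threshold;
    }

    synchronized State onResult(boolean callFailed) {
        if (count == failed.length) {
            if (failed[next]) {
                failures--;               // 1. the oldest outcome expires
            }
        } else {
            count++;
        }
        failed[next] = callFailed;        // 2. the new result is added
        if (callFailed) {
            failures++;
        }
        next = (next + 1) % failed.length;

        float rate = failures * 100f / count; // 3. rate recalculated from the running total
        if (count == failed.length && rate >= threshold) {
            state = State.OPEN;           // 4. decision
        }
        return state;
    }
}
```

No loop over the stored outcomes is needed: each call touches one slot and two counters, which is where the constant cost per update comes from.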
🎯 Important Configurations (Architect Level)
🔹 Minimum Number of Calls
.minimumNumberOfCalls(5)
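The circuit does not evaluate the failure rate until at least 5 calls have been recorded in the window.
This avoids false positives in low-traffic systems, where one or two failed calls would otherwise look like a 100% failure rate.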
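
A small sketch of that gate, in plain Java. The names `MinimumCallsGate`, `failureRate` and `shouldOpen` are hypothetical. Returning `-1` for "not enough calls yet" is meant to mirror how Resilience4j's metrics report a rate that is not available, but treat that detail as an assumption and check the Javadoc for your version.

```java
// Hypothetical helper, not part of Resilience4j.
class MinimumCallsGate {

    // Failure rate in percent, or -1 when too few calls have been recorded.
    static float failureRate(int failedCalls, int totalCalls, int minimumNumberOfCalls) {
        if (totalCalls == 0 || totalCalls < minimumNumberOfCalls) {
            return -1f;
        }
        return failedCalls * 100f / totalCalls;
    }

    static boolean shouldOpen(int failedCalls, int totalCalls,
                              int minimumNumberOfCalls, float threshold) {
        float rate = failureRate(failedCalls, totalCalls, minimumNumberOfCalls);
        return rate >= 0 && rate >= threshold; // a rate of -1 never opens the circuit
    }

    public static void main(String[] args) {
        // 3 failures out of 3 calls, minimum 5: not evaluated yet
        System.out.println(shouldOpen(3, 3, 5, 50f)); // prints "false"
        // 3 failures out of 5 calls, minimum 5: 60% >= 50%
        System.out.println(shouldOpen(3, 5, 5, 50f)); // prints "true"
    }
}
```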
🔹 Failure Rate Threshold
.failureRateThreshold(50)
If failure % ≥ threshold → OPEN
🔹 Slow Call Rate Threshold
.slowCallRateThreshold(60)
.slowCallDurationThreshold(Duration.ofSeconds(2))
If at least 60% of calls take longer than 2 seconds → OPEN
This protects against latency spikes.
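
The slow call rate is computed the same way as the failure rate, except that a call is classified by its duration. A minimal sketch, assuming a call counts as slow only when its duration is strictly greater than the threshold; `SlowCallRate` is a hypothetical helper and the `-1` for an empty list is this sketch's own convention.

```java
import java.time.Duration;
import java.util.List;

// Hypothetical helper, not part of Resilience4j.
class SlowCallRate {

    // Percentage of calls slower than the threshold, or -1 when there are no calls.
    static float slowCallRate(List<Duration> durations, Duration slowCallDurationThreshold) {
        if (durations.isEmpty()) {
            return -1f;
        }
        long slow = durations.stream()
                .filter(d -> d.compareTo(slowCallDurationThreshold) > 0) // strictly greater
                .count();
        return slow * 100f / durations.size();
    }

    public static void main(String[] args) {
        List<Duration> calls = List.of(
                Duration.ofMillis(2500), Duration.ofMillis(2500), Duration.ofMillis(2500),
                Duration.ofMillis(500), Duration.ofMillis(500));
        float rate = slowCallRate(calls, Duration.ofSeconds(2));
        System.out.println(rate + "% -> " + (rate >= 60f ? "OPEN" : "CLOSED"));
        // prints "60.0% -> OPEN"
    }
}
```

Note that a slow call can still be a successful call, so a service that answers correctly but late can open the circuit without a single failure.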
🏦 Real Banking Example (APS Context)
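Let’s say:
Loan SOR:
- Sliding window size = 20 calls
- Failure threshold = 40%
- Minimum calls = 10
If the last 20 calls contain:
- 7 failures
- Failure rate = 35%
Circuit remains CLOSED.
But with 8 failures:
- Failure rate = 40%
- Circuit OPEN, because Resilience4j opens the circuit when the failure rate is equal to or greater than the threshold
With 9 failures (45%) the circuit is OPEN as well.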
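
Resilience4j documents the failure rate threshold as inclusive (the circuit opens when the rate is equal to or greater than the threshold), so the boundary case in the Loan SOR example is worth checking by hand. The sketch below only does the arithmetic for a full window of 20 calls; `LoanSorExample` and `stateFor` are hypothetical names.

```java
// Hypothetical worked example for the Loan SOR numbers, not Resilience4j code.
class LoanSorExample {
    static final int WINDOW_SIZE = 20;
    static final int MINIMUM_CALLS = 10;
    static final float FAILURE_THRESHOLD = 40f; // percent

    // State after a full window of 20 calls with the given number of failures.
    static String stateFor(int failuresInWindow) {
        float rate = failuresInWindow * 100f / WINDOW_SIZE;
        boolean enoughCalls = WINDOW_SIZE >= MINIMUM_CALLS; // 20 calls satisfy the minimum of 10
        return enoughCalls && rate >= FAILURE_THRESHOLD ? "OPEN" : "CLOSED";
    }

    public static void main(String[] args) {
        for (int failures : new int[] {7, 8, 9}) {
            float rate = failures * 100f / WINDOW_SIZE;
            System.out.println(failures + " failures -> " + rate + "% -> " + stateFor(failures));
        }
        // prints:
        // 7 failures -> 35.0% -> CLOSED
        // 8 failures -> 40.0% -> OPEN
        // 9 failures -> 45.0% -> OPEN
    }
}
```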
🔄 Difference Between Count vs Time Based
| Feature | COUNT_BASED | TIME_BASED |
|---|---|---|
| Best For | Stable traffic | Variable traffic |
| Banking Core APIs | ✅ Good | ⚠️ Depends |
| High burst systems | ❌ Risky | ✅ Better |
| Predictability | High | Medium |