Sunday, 15 February 2026

Resilience4J - Sliding Window Pattern with CircuitBreaker

In the previous article we discussed the Bulkhead Pattern. Now let's look at the Sliding Window.

Good πŸ‘ this is core internal logic of CircuitBreaker in Resilience4j.

Most developers use @CircuitBreaker but don't understand how the sliding window actually calculates the failure rate.

Let's break it down clearly.



🔥 What Is a Sliding Window in Resilience4j?

The sliding window is the statistical window the CircuitBreaker uses to decide:

Should we OPEN the circuit or keep it CLOSED?

It calculates:

  • Failure rate %

  • Slow call rate %

  • Total calls count

All based on the last N calls or the last N seconds.


📌 Two Types of Sliding Windows

1️⃣ COUNT_BASED Sliding Window

Based on number of calls.

Example:

CircuitBreakerConfig config = CircuitBreakerConfig.custom()
    .slidingWindowType(SlidingWindowType.COUNT_BASED)
    .slidingWindowSize(10)
    .failureRateThreshold(50)
    .build();

Meaning:

  • Observe last 10 calls

  • If 50% or more fail

  • Circuit goes OPEN


Example Scenario:

Last 10 calls:

S F S F F S F F S F

Failures = 6
Failure rate = 60%

If threshold = 50% → Circuit OPEN
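The arithmetic above can be checked with a small standalone sketch (plain Java; this mimics the failure-rate calculation for illustration, it is not Resilience4j's actual code):

```java
public class CountBasedWindowDemo {

    // Failure rate (%) over a window of outcomes: 'S' = success, 'F' = failure.
    static double failureRate(char[] outcomes) {
        int failures = 0;
        for (char c : outcomes) {
            if (c == 'F') failures++;
        }
        return 100.0 * failures / outcomes.length;
    }

    public static void main(String[] args) {
        // The scenario above: S F S F F S F F S F
        char[] lastTenCalls = {'S','F','S','F','F','S','F','F','S','F'};

        double rate = failureRate(lastTenCalls);   // 6 failures out of 10 -> 60.0
        // Resilience4j opens the circuit when the rate reaches the threshold.
        String state = rate >= 50 ? "OPEN" : "CLOSED";

        System.out.println(rate + " -> " + state); // 60.0 -> OPEN
    }
}
```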


2️⃣ TIME_BASED Sliding Window

Based on time duration.

Example:

.slidingWindowType(SlidingWindowType.TIME_BASED)
.slidingWindowSize(10) // interpreted as 10 seconds for TIME_BASED

Meaning:

  • Observe calls in last 10 seconds

  • Calculate failure rate

  • If threshold crossed → OPEN
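Resilience4j implements the time-based window as a circular array of per-second buckets (partial aggregations). The idea can be sketched like this (a simplified, invented illustration; class and method names are not from the library):

```java
// Simplified TIME_BASED window: one bucket per second, N buckets total.
// A bucket is reused for a new second once its old second falls out of the window.
public class TimeBucketWindow {
    private final int[] totals;            // calls recorded in each bucket
    private final int[] failures;          // failures recorded in each bucket
    private final long[] bucketSecond;     // which epoch second each bucket holds

    public TimeBucketWindow(int seconds) {
        totals = new int[seconds];
        failures = new int[seconds];
        bucketSecond = new long[seconds];
    }

    public void record(boolean failure, long epochSecond) {
        int i = (int) (epochSecond % totals.length);
        if (bucketSecond[i] != epochSecond) { // stale bucket: reset for the new second
            bucketSecond[i] = epochSecond;
            totals[i] = 0;
            failures[i] = 0;
        }
        totals[i]++;
        if (failure) failures[i]++;
    }

    // Aggregate only buckets that are still inside the window; -1 means no data.
    public double failureRate(long nowEpochSecond) {
        int total = 0, failed = 0;
        for (int i = 0; i < totals.length; i++) {
            if (nowEpochSecond - bucketSecond[i] < totals.length) {
                total += totals[i];
                failed += failures[i];
            }
        }
        return total == 0 ? -1 : 100.0 * failed / total;
    }
}
```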


🧠 How the Sliding Window Works Internally

Internally it maintains:

  • Circular array (ring buffer)

  • Buckets for time-based

  • Atomic counters

Every new call:

  1. Old data expires

  2. New result added

  3. Failure rate recalculated

  4. Decision made

This is O(1) time complexity per update.

Very efficient.
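The count-based variant of these steps can be sketched as a ring buffer with running counters (a simplified illustration under assumed names, not Resilience4j's actual implementation):

```java
// Simplified count-based sliding window: fixed-size ring buffer plus running
// counters, so every update (expire oldest + add newest) is O(1).
public class SlidingWindow {
    private final int[] outcomes; // 0 = empty slot, 1 = success, 2 = failure
    private int index = 0;
    private int total = 0;
    private int failures = 0;
    private final int minimumNumberOfCalls;

    public SlidingWindow(int size, int minimumNumberOfCalls) {
        this.outcomes = new int[size];
        this.minimumNumberOfCalls = minimumNumberOfCalls;
    }

    // O(1): the oldest entry expires by being overwritten; counters are adjusted.
    public void record(boolean failure) {
        int old = outcomes[index];
        if (old != 0) {
            total--;
            if (old == 2) failures--;
        }
        outcomes[index] = failure ? 2 : 1;
        total++;
        if (failure) failures++;
        index = (index + 1) % outcomes.length;
    }

    // Returns -1 until minimumNumberOfCalls is reached (no decision yet).
    public double failureRate() {
        if (total < minimumNumberOfCalls) return -1;
        return 100.0 * failures / total;
    }
}
```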


🎯 Important Configurations (Architect Level)

🔹 Minimum Number of Calls

.minimumNumberOfCalls(5)

The circuit breaker will not evaluate the failure rate until at least 5 calls have been recorded.

This avoids false positives in low-traffic systems.


🔹 Failure Rate Threshold

.failureRateThreshold(50)

If failure % ≥ threshold → OPEN

Note: Resilience4j opens the circuit when the failure rate is equal to or greater than the threshold, not strictly greater.


🔹 Slow Call Rate Threshold

.slowCallRateThreshold(60)
.slowCallDurationThreshold(Duration.ofSeconds(2))

If 60% or more of the calls take longer than 2 seconds → OPEN

This protects against latency spikes.
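Putting the thresholds together, a combined configuration might look like this (a sketch using the Resilience4j builder API; the tuning values are illustrative, not recommendations):

```java
import java.time.Duration;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig.SlidingWindowType;

CircuitBreakerConfig config = CircuitBreakerConfig.custom()
    .slidingWindowType(SlidingWindowType.COUNT_BASED)
    .slidingWindowSize(10)                            // last 10 calls
    .minimumNumberOfCalls(5)                          // no decision before 5 calls
    .failureRateThreshold(50)                         // OPEN at >= 50% failures
    .slowCallRateThreshold(60)                        // OPEN at >= 60% slow calls
    .slowCallDurationThreshold(Duration.ofSeconds(2)) // "slow" = over 2 seconds
    .build();
```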


🏦 Real Banking Example (APS Context)

Let’s say:

Loan SOR:

  • Sliding window size = 20 calls

  • Failure threshold = 40%

  • Minimum calls = 10

If out of the last 20 calls:

  • 7 failures

  • Failure rate = 35%

Circuit remains CLOSED.

But with 8 failures:

  • Failure rate = 40% (threshold reached)

  • Circuit OPEN


🔄 Difference Between COUNT_BASED and TIME_BASED

Feature            | COUNT_BASED    | TIME_BASED
Best for           | Stable traffic | Variable traffic
Banking core APIs  | ✅ Good        | ⚠️ Depends
High-burst systems | ❌ Risky       | ✅ Better
Predictability     | High           | Medium

Saturday, 14 February 2026

Bulkhead Pattern – Every SOR Should Have a Separate Thread Pool/Executor Service

Resilience4J Patterns - these patterns focus on thread isolation.


In enterprise banking systems, a single application often communicates with multiple Systems of Record (SORs) — such as Customer SOR, Loan SOR, Payment SOR, or Core Banking.

If one SOR becomes slow or unavailable, it should not impact other SOR integrations.

This is where the Bulkhead Pattern becomes critical.




Example:

APS Service

   ├── Customer SOR

   ├── Loan SOR

   ├── Payment SOR

   └── Notification SOR

All SOR calls share the same thread pool.

❌ What happens if the Loan SOR becomes slow?

  • Threads get blocked

  • Thread pool gets exhausted

  • Customer SOR calls start waiting

  • Entire APS system becomes unresponsive

  • Production incident

This is called resource starvation.


✅ Solution: Separate Thread Pool Per SOR

Each SOR should have:

  • Dedicated thread pool

  • Dedicated timeout

  • Dedicated circuit breaker

  • Dedicated monitoring metrics

Architecture becomes:

Customer SOR → ThreadPool-A
Loan SOR → ThreadPool-B
Payment SOR → ThreadPool-C
Notification → ThreadPool-D

Now:

  • Loan SOR failure affects only ThreadPool-B

  • Other SORs continue working normally

  • System stability increases dramatically
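The isolation above can be sketched with plain java.util.concurrent executors (a simplified illustration; pool sizes and SOR names are invented, and in a real Resilience4j setup you would use its ThreadPoolBulkhead instead of raw executors):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BulkheadDemo {

    // One dedicated pool per SOR (sizes illustrative):
    // a stuck Loan SOR can only exhaust its own pool.
    static final ExecutorService customerPool = Executors.newFixedThreadPool(4);
    static final ExecutorService loanPool     = Executors.newFixedThreadPool(4);

    // Simulate a slow Loan SOR by filling its pool with hanging tasks.
    static void saturateLoanPool() {
        for (int i = 0; i < 8; i++) {
            loanPool.submit(() -> {
                try { Thread.sleep(60_000); } catch (InterruptedException ignored) { }
            });
        }
    }

    // Customer SOR call: unaffected, because it runs on its own pool.
    static String callCustomer() {
        try {
            return customerPool.submit(() -> "customer-ok").get(1, TimeUnit.SECONDS);
        } catch (Exception e) {
            return "customer-timeout";
        }
    }

    public static void main(String[] args) {
        saturateLoanPool();
        System.out.println(callCustomer()); // completes despite the Loan SOR hanging
        loanPool.shutdownNow();
        customerPool.shutdownNow();
    }
}
```

With a single shared pool, the eight hanging loan tasks would occupy the threads the customer call needs, and the same call would time out instead.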