Replicated Load-Balanced Services: High Availability Through Horizontal Scaling
Core Concept
- Replicated load-balanced service = multiple identical servers + load balancer
- Any instance can handle any request
- Load balancer distributes traffic (round-robin or sticky sessions)
Why Replication Matters
1. High availability (HA)
- Single instance = downtime during:
- Deployments
- Crashes
- Minimum 2 replicas needed for reliability
2. Horizontal scaling
- Handle more users by adding replicas
- Scale out instead of scaling up
Stateless Services (Ideal Case)
- No dependency on stored state
- Any request → any replica
- Examples:
- APIs
- Web servers
- Middleware
Makes load balancing simple and efficient
Readiness Probes (Very Important)
- Liveness ≠ Readiness
- Liveness → is app alive?
- Readiness → can it serve traffic?
- Load balancer should only send traffic when:
- App is fully initialized
Prevents broken/slow startups from affecting users
Session Tracking (When Needed)
- Some apps need sticky sessions
- Same user → same replica
Methods:
- IP hashing (limited due to NAT)
- Cookies / headers (preferred)
Often needed for:
- Caching
- Long-running sessions
Application-Layer Enhancements
1. Caching Layer
- Add cache between users and backend
- Example: Varnish proxy
Benefits:
- Faster responses
- Reduced backend load
Best practice:
- Few large cache nodes (better hit rate)
- Not many small ones
2. Rate Limiting & DDoS Protection
- Prevent abuse or overload
- Return HTTP 429 (Too Many Requests)
- Often implemented in proxy layer
3. SSL Termination Layer
- Handle HTTPS at edge (e.g., nginx)
- Forward internal traffic as HTTP
Separates:
- Security concerns
- Application logic
Final Architecture (Layered)
[ Client ]
↓
[ Load Balancer / SSL (nginx) ]
↓
[ Cache Layer (Varnish) ]
↓
[ App Replicas (stateless servers) ]
Each layer:
- Scales independently
- Has a clear responsibility
Trade-offs
Pros
- High availability
- Easy scaling
- Fault tolerance
- Performance optimization (via cache)
Cons
- More infrastructure layers
- Cache invalidation complexity
- Session handling complexity
Mental Model
“Clone your service many times and put a smart traffic distributor in front.”
One-line Summary
Replicated load-balanced services use multiple identical stateless instances behind a load balancer to achieve high availability, scalability, and reliability, often enhanced with caching and SSL layers.
#Distributed Systems #System Design #Load Balancing #High Availability #Scaling