Scaling WebSockets to 100k Connections: Architecture Lessons from Production

When Virat Kohli walks to the crease during an IPL match, something remarkable happens to cricket scoring apps: traffic doesn't climb gradually—it explodes. Thousands of fans refresh simultaneously, desperate for real-time updates. For applications built on WebSockets, this isn't just a scaling challenge—it's a stress test that exposes every architectural weakness.

A recent case study from a production cricket scoring application reveals the hard-won lessons of scaling WebSocket connections to 100,000+ concurrent users. Here's what actually works when theory meets reality.

The WebSocket Scaling Problem

HTTP's request-response model scales horizontally with relative ease. Add more servers, distribute traffic with a load balancer, and you're mostly done. WebSockets break this simplicity.

Unlike HTTP, WebSocket connections are stateful and persistent. Each connection consumes server resources—file descriptors, memory buffers, and CPU cycles for heartbeat checks. More critically, a single connection can live for hours. When you're serving 100,000 users, you're not handling 100,000 requests—you're maintaining 100,000 simultaneous relationships.

The cricket app team discovered this the hard way. Their initial Node.js deployment handled 5,000 connections comfortably on a single EC2 instance. But at 15,000 connections, response times degraded. At 25,000, the event loop began blocking. The math was brutal: to reach 100,000 connections, they'd need 20+ instances. But horizontal scaling introduced a new problem—how do you broadcast a score update to 100,000 users when they're distributed across 20 different servers?

Architecture Patterns That Actually Scale

Sticky Sessions with Redis Pub/Sub

The team's breakthrough came from separating connection management from message distribution. Their final architecture uses:

Layer 1: Load Balancer with Sticky Sessions
ALB pins each user to a specific WebSocket server using its sticky-session (cookie-based) affinity. Once connected, a user stays on that server for the session duration. This eliminates the reconnection overhead that plagued their early attempts.
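Whatever mechanism provides the affinity, the property that matters is that a given client keeps landing on the same server. When that mapping has to be computed in application code (for example in a custom router, or in a client that chooses its own endpoint), rendezvous hashing is one simple option. This is an illustration under stated assumptions, not the team's setup: the stable client identifier, the server names, the hash (FNV-1a followed by a MurmurHash3-style finalizer), and the `pickServer` helper are all hypothetical.

```typescript
// Hypothetical sketch: deterministic client-to-server affinity using
// rendezvous (highest-random-weight) hashing. Server names are illustrative.

// 32-bit FNV-1a over UTF-16 code units, followed by the MurmurHash3 fmix32
// finalizer so inputs that differ only in their last characters still get
// well-spread scores.
function hash32(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // reinterpret as unsigned 32-bit
}

// Each server scores hash(clientId|server); the highest score wins.
// Removing a server only remaps the clients whose winner it was.
function pickServer(clientId: string, servers: readonly string[]): string {
  if (servers.length === 0) throw new Error("no WebSocket servers available");
  let best = servers[0];
  let bestScore = -1;
  for (const server of servers) {
    const score = hash32(`${clientId}|${server}`);
    if (score > bestScore) {
      bestScore = score;
      best = server;
    }
  }
  return best;
}

// Usage: the same user always lands on the same server.
console.log(pickServer("user-42", ["ws-1", "ws-2", "ws-3"]));
```

Compared with `hash % servers.length`, rendezvous hashing keeps most assignments stable when a server is added or removed, which matters when every remapped user means a dropped connection and a reconnect.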

Layer 2: WebSocket Servers (Stateful)
Each Node.js server maintains 10,000 to 15,000 connections. The team raised the open-file limit (ulimit -n) to 65,535 and kept the event loop responsive by offloading CPU-intensive JSON parsing to worker threads. One critical setting: maxPayload: 1024 (bytes) in the ws server options, which caps the size of an incoming message so malicious payloads cannot exhaust memory.
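In ws, the `maxPayload` limit is enforced by the library itself when it is passed in the server options. The sketch below shows the same idea one layer up, as an application-level check that runs before parsing. The 1024-byte limit comes from the article; the `ClientMessage` schema and the function names are assumptions for illustration.

```typescript
// Hypothetical application-level guard mirroring ws's `maxPayload` limit:
// reject oversized or malformed client messages before they reach app logic.
const MAX_PAYLOAD_BYTES = 1024;

// Assumed message schema for this sketch (not from the article).
type ClientMessage = { type: "subscribe" | "unsubscribe"; matchId: string };

// UTF-8 byte length of a string, computed per code point.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0) ?? 0;
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

function parseClientMessage(raw: string, maxBytes: number = MAX_PAYLOAD_BYTES): ClientMessage | null {
  // Size check first, so an oversized payload is never handed to JSON.parse.
  if (utf8ByteLength(raw) > maxBytes) return null;
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof value !== "object" || value === null) return null;
  const record = value as Record<string, unknown>;
  const type = record["type"];
  const matchId = record["matchId"];
  // Only accept the shapes this sketch's schema allows.
  if ((type === "subscribe" || type === "unsubscribe") && typeof matchId === "string") {
    return { type, matchId };
  }
  return null;
}
```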

Layer 3: Redis Pub/Sub (Message Bus)
When a wicket falls or a boundary is scored, the event service publishes once to Redis. Every WebSocket server subscribes to the relevant channels (match-specific topics such as match:12345:score) and forwards each message to its local connections. This shift reduced database queries by 98%: instead of the WebSocket servers polling the database for updates on behalf of 100,000 connected users, one published event triggers a fan-out.
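The fan-out path can be sketched without a Redis dependency. In this minimal illustration, an in-memory `MessageBus` stands in for Redis pub/sub (an assumption, not the team's code): each WebSocket server subscribes to a channel once, however many of its local connections watch that match, and forwards each published event to them. Class and method names are hypothetical.

```typescript
// In-process stand-in for Redis pub/sub, for illustration only; in production
// the publish/subscribe calls would go to Redis. All names are hypothetical.
type Handler = (message: string) => void;

class MessageBus {
  private channels = new Map<string, Set<Handler>>();

  subscribe(channel: string, handler: Handler): void {
    let handlers = this.channels.get(channel);
    if (!handlers) {
      handlers = new Set<Handler>();
      this.channels.set(channel, handlers);
    }
    handlers.add(handler);
  }

  // Like Redis PUBLISH, returns how many subscribers received the message.
  publish(channel: string, message: string): number {
    const handlers = this.channels.get(channel);
    if (!handlers) return 0;
    for (const handler of handlers) handler(message);
    return handlers.size;
  }
}

// One WebSocket server: subscribes to a channel once, then fans each
// message out to every local connection watching that match.
class WebSocketNode {
  private bus: MessageBus;
  private local = new Map<string, Set<Handler>>();

  constructor(bus: MessageBus) {
    this.bus = bus;
  }

  join(channel: string, send: Handler): void {
    let conns = this.local.get(channel);
    if (!conns) {
      conns = new Set<Handler>();
      this.local.set(channel, conns);
      // First local viewer of this match: subscribe this server once.
      this.bus.subscribe(channel, (message) => this.fanOut(channel, message));
    }
    conns.add(send);
  }

  private fanOut(channel: string, message: string): void {
    const conns = this.local.get(channel);
    if (!conns) return;
    for (const send of conns) send(message);
  }
}
```

The key property is that a score event costs one publish plus one delivery per subscribed server, independent of how many users are connected.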

Connection Lifecycle Optimizations

The team implemented aggressive connection hygiene:

  • Heartbeat tuning: 30-second ping/pong intervals with 45-second timeout. This culls dead connections quickly without overwhelming the network.
  • Graceful degradation: When a server reaches 90% of its connection capacity, it rejects new connection attempts with a Retry-After header, and clients fall back to an HTTP polling endpoint until they can reconnect.
  • Client-side backoff: Exponential retry with jitter prevents thundering herd problems during server restarts.
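The three rules above can be sketched as small pure functions. With the ws library, the usual heartbeat pattern is to call `ping()` on each socket on an interval, record the `'pong'` event, and `terminate()` sockets that stopped answering; the code below keeps only that bookkeeping. The 30-second and 45-second timings and the 90% threshold come from the article; the 503 status, the 10-second Retry-After value, and the backoff base and cap are assumptions.

```typescript
// Hypothetical sketch of the three hygiene rules. Timings and the 90%
// threshold follow the article; status code, Retry-After value and backoff
// constants are assumed.
const PING_INTERVAL_MS = 30000; // how often the server pings every socket
const PONG_TIMEOUT_MS = 45000; // how long a socket may go without a pong

interface TrackedConnection {
  id: string;
  lastPongAt: number; // epoch ms of the most recent pong
}

// Run on each PING_INTERVAL_MS tick: returns the ids to terminate.
function findDeadConnections(conns: TrackedConnection[], now: number): string[] {
  return conns.filter((c) => now - c.lastPongAt > PONG_TIMEOUT_MS).map((c) => c.id);
}

type Admission =
  | { accept: true }
  | { accept: false; status: 503; retryAfterSeconds: number };

// Refuse new connections once the server is at 90% of capacity
// (integer comparison avoids floating-point rounding at the boundary).
function admitConnection(current: number, capacity: number): Admission {
  if (current * 10 < capacity * 9) return { accept: true };
  return { accept: false, status: 503, retryAfterSeconds: 10 };
}

// Client side: exponential backoff with "full jitter", i.e. a uniformly
// random delay in [0, min(capMs, baseMs * 2^attempt)), so clients reconnecting
// after a restart spread out instead of arriving together.
function reconnectDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 30000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```

Full jitter trades a slightly longer average wait for the widest possible spread of retry times, which is what defuses the thundering herd.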

The State Management Breakthrough

Initially, each WebSocket server queried the database for user preferences (favorite teams, notification settings). At scale, this created a secondary bottleneck. The solution:

  1. User preferences cached in Redis (TTL: 5 minutes)
  2. Connection metadata stored in memory on the WebSocket server
  3. Only score data flows through the pub/sub system

This separation of concerns reduced Redis memory usage from 8GB to 1.2GB and eliminated 90% of preference-related queries.
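A read-through cache with a TTL captures the preference-lookup half of this design. In this sketch a `Map` stands in for Redis and the loader stands in for the database query; the `UserPrefs` fields, the injected clock, and the synchronous loader are simplifications assumed for illustration (a real Redis or database call would be asynchronous).

```typescript
// Hypothetical read-through cache with a 5-minute TTL, standing in for the
// Redis preference cache. UserPrefs fields are assumptions.
const PREFS_TTL_MS = 5 * 60 * 1000;

interface UserPrefs {
  favoriteTeams: string[];
  notificationsEnabled: boolean;
}

class TtlCache<V> {
  // Unlike Redis, this map never evicts entries nobody asks for again.
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value while it is fresh; otherwise call `load`
  // (the database in this story) and cache the result for ttlMs.
  getOrLoad(key: string, load: (key: string) => V): V {
    const entry = this.entries.get(key);
    if (entry && this.now() < entry.expiresAt) return entry.value;
    const value = load(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```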

Lessons from the Trenches

Monitoring is non-negotiable
The team's CloudWatch dashboard tracks connections per server, message latency (p50, p95, p99), and Redis pub/sub lag. During the Kohli spike, p99 latency jumped from 45ms to 320ms—still acceptable, but a warning sign that triggered auto-scaling.
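The p50/p95/p99 figures are percentiles over a window of latency samples. CloudWatch computes percentile statistics itself, so this is illustrative rather than part of the team's stack; it simply makes the nearest-rank definition concrete. The function names are hypothetical.

```typescript
// Nearest-rank percentile over a window of latency samples in milliseconds.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // 1-based rank; multiplying before dividing keeps integer p exact.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

function latencySummary(samples: readonly number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}
```

Tracking p99 alongside p50 is what surfaces a spike like the one described: the median can stay flat while the slowest 1% of deliveries degrade.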

WebSocket libraries matter
Switching from socket.io to ws (a minimal WebSocket library) reduced memory footprint by 40%. Socket.io's automatic fallback features are convenient for development but wasteful at scale when 99.9% of clients support native WebSockets.

Vertical scaling has a place
Counter-intuitively, fewer, larger instances outperformed many small ones. A c5.4xlarge instance (16 vCPU, 32GB RAM) handled 15,000 connections more efficiently than three c5.xlarge instances handling 5,000 each. The overhead of Redis subscriptions and load balancer health checks favored consolidation.

Test with realistic traffic patterns
Load testing with gradual ramps missed the real problem. Cricket traffic doesn't climb—it jumps. Their final test suite simulates 10,000 connections establishing within 30 seconds, mirroring actual celebrity-player-induced spikes.
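The difference between a ramp and a spike shows up in the peak arrival rate. This sketch generates connection start offsets for a load test and measures the busiest second. The 10,000-connections-in-30-seconds spike comes from the article; the 30-minute ramp used for comparison and the helper names are assumptions.

```typescript
// Hypothetical load-test schedule: start offsets (ms) for `total` connections
// placed uniformly at random inside a window of `windowMs`.
function arrivalOffsets(total: number, windowMs: number, random: () => number = Math.random): number[] {
  return Array.from({ length: total }, () => Math.floor(random() * windowMs)).sort((a, b) => a - b);
}

// Largest number of connection attempts landing in any one-second bucket:
// the figure a spike test stresses and a gradual ramp hides.
function peakPerSecond(offsets: readonly number[]): number {
  const buckets = new Map<number, number>();
  for (const t of offsets) {
    const second = Math.floor(t / 1000);
    buckets.set(second, (buckets.get(second) ?? 0) + 1);
  }
  let peak = 0;
  for (const count of buckets.values()) peak = Math.max(peak, count);
  return peak;
}

// Spike (per the article): 10,000 connections within 30 seconds.
const spike = arrivalOffsets(10000, 30000);
// Gradual ramp for comparison (assumed): the same 10,000 over 30 minutes.
const ramp = arrivalOffsets(10000, 30 * 60 * 1000);
console.log(peakPerSecond(spike), peakPerSecond(ramp));
```

A spike of 10,000 connections in 30 seconds forces at least 334 connection attempts into some single second (pigeonhole over 30 one-second buckets), while spreading the same total over 30 minutes averages about 6 per second.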

The Takeaway

Scaling WebSockets to 100,000 connections isn't about finding a single silver bullet—it's about addressing every bottleneck systematically. Sticky sessions prevent connection churn. Redis pub/sub decouples message distribution from connection management. Aggressive connection hygiene prevents resource leaks. And crucially, vertical scaling of WebSocket servers combined with horizontal scaling via load balancing gives you the best of both worlds.

The cricket app now handles peak loads of 120,000 concurrent connections with p99 latency under 200ms. When the next cricket star walks to the crease, the system doesn't blink.

For teams building real-time features—chat systems, live dashboards, multiplayer games—these patterns transfer directly. The infrastructure that handles cricket fans can handle your users too.