Scaling WebSockets to 100k Concurrent Connections: A Real-Time Architecture Deep Dive

When Virat Kohli walks to the crease, traffic on a cricket scoring app doesn't climb gradually—it explodes. Millions of fans worldwide refresh their apps simultaneously, each expecting real-time ball-by-ball updates. For the engineering team behind one such cricket platform, this scenario wasn't hypothetical. It was a recurring challenge that forced them to rethink their entire WebSocket architecture.

The story, shared on Dev.to, offers a masterclass in real-time system design. While cricket might be the domain, the lessons apply to any platform handling live events—trading dashboards, multiplayer games, live auctions, or collaborative editing tools.

The WebSocket Scaling Problem

WebSockets provide full-duplex communication channels over a single TCP connection, making them ideal for real-time updates. But they come with a hidden cost: persistent connections consume memory.

Unlike HTTP requests that complete in milliseconds, each WebSocket connection stays open—sometimes for hours. At scale, this creates three critical bottlenecks:

Memory per connection: Even an idle WebSocket consumes kernel buffers, application-level read/write buffers, and framework overhead. In Node.js, this can range from roughly 10 to 50 KB per connection. At 100k connections, that's 1 to 5 GB before you've sent a single byte of application data.

File descriptor limits: Every socket is a file descriptor. Many Linux distributions default to a soft limit of 1024 open files per process. The limit is tunable via ulimit, but once a process reaches its ceiling, new connections are rejected no matter how much RAM you have.

CPU overhead: Managing connection lifecycle events (ping/pong heartbeats, reconnection logic, authentication token checks) generates constant background work. At 100k connections sending a heartbeat every 30 seconds, your server handles roughly 3,300 heartbeat events per second just to keep connections alive.
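The arithmetic behind these three bottlenecks is easy to reproduce. The following is a minimal sketch, not the team's tooling: the function names are made up for illustration, the per-connection figures are the article's estimates, and the file descriptor check relies on Python's Unix-only `resource` module.

```python
import resource  # Unix-only standard-library module


def estimate_capacity(connections, kb_per_conn_low, kb_per_conn_high, heartbeat_interval_s):
    """Return rough memory bounds (GB) and heartbeat events per second."""
    kb_per_gb = 1_000_000  # decimal units, matching the article's rounding
    mem_low_gb = connections * kb_per_conn_low / kb_per_gb
    mem_high_gb = connections * kb_per_conn_high / kb_per_gb
    heartbeats_per_s = connections / heartbeat_interval_s
    return mem_low_gb, mem_high_gb, heartbeats_per_s


def fd_headroom(target_connections):
    """Compare this process's soft open-file limit with a target connection count."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    enough = soft == resource.RLIM_INFINITY or soft >= target_connections
    return soft, hard, enough


low, high, beats = estimate_capacity(100_000, 10, 50, 30)
print(f"memory: {low:.0f}-{high:.0f} GB, heartbeats: {beats:.0f}/s")

soft, hard, ok = fd_headroom(100_000)
print(f"RLIMIT_NOFILE soft={soft} hard={hard} enough_for_100k={ok}")
```

Running the second check on a freshly provisioned box is a quick way to see whether the file descriptor limit will bite long before memory does.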

The cricket app faced an additional challenge: traffic isn't evenly distributed. A boundary hit might generate 500 messages/second. A wicket? 5,000 messages/second. Star player introductions triggered avalanche reconnects as users refreshed their apps.

Architecture for 100k Connections

The solution wasn't a single technology choice—it was a distributed architecture addressing each bottleneck:

Horizontal Scaling with Sticky Sessions

No single server instance handled all connections. The team deployed a cluster of WebSocket servers behind a load balancer configured for sticky sessions (session affinity). This ensured that once a client connected to a server, subsequent reconnections landed on the same instance—critical for maintaining stateful connections.

They used IP hash-based routing rather than cookie-based affinity. The WebSocket handshake is an ordinary HTTP Upgrade request and can carry cookies, but cookie affinity needs a layer-7 load balancer and clients that reliably send cookies, which is not guaranteed for non-browser clients. The trade-off? Users behind shared NAT (corporate networks, mobile carriers) hash to the same server and can overload it. The team mitigated this with per-server connection limits and graceful degradation to HTTP polling when WebSocket servers reached capacity.
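To make the routing behaviour concrete, here is a small sketch of IP-hash affinity with a per-server cap and a polling fallback. In practice this logic lives in the load balancer's configuration rather than in application code; the class name, server names, and the cap of 2 are all hypothetical.

```python
import hashlib


class IpHashRouter:
    """Route clients to servers by hashing their IP, with a per-server cap."""

    def __init__(self, servers, max_connections_per_server):
        self.servers = list(servers)
        self.max_connections = max_connections_per_server
        self.active = {name: 0 for name in self.servers}

    def _preferred(self, client_ip):
        # A stable hash keeps the same IP on the same server across reconnects.
        digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]

    def route(self, client_ip):
        """Return ("websocket", server), or ("polling", None) when that server is full."""
        server = self._preferred(client_ip)
        if self.active[server] >= self.max_connections:
            return ("polling", None)  # graceful degradation to HTTP polling
        self.active[server] += 1
        return ("websocket", server)


router = IpHashRouter(["ws-1", "ws-2", "ws-3"], max_connections_per_server=2)
first = router.route("203.0.113.7")
second = router.route("203.0.113.7")  # same IP lands on the same server
third = router.route("203.0.113.7")   # shared-NAT case: cap reached, fall back
print(first, second, third)
```

Plain modulo hashing remaps most clients whenever the server count changes, so a consistent-hashing scheme is worth considering if the cluster auto-scales.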

Pub/Sub for Message Distribution

With connections distributed across 20+ servers, how do you broadcast a wicket update to 100k clients? The answer: Redis Pub/Sub.

Each WebSocket server subscribed to match-specific Redis channels. When a scorer submitted an update, it published to Redis once. All WebSocket servers received the message and fanned it out to their connected clients. This architecture decoupled the "source of truth" (the scoring backend) from the distribution layer (WebSocket servers).

Critical optimization: message serialization happened once, before publishing to Redis. Servers received JSON payloads ready to forward, avoiding 20 redundant serialization operations per event.
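The fan-out pattern can be sketched without a running Redis instance. In the example below, `InMemoryBroker` is a stand-in for Redis Pub/Sub and `WebSocketServer` is a hypothetical node that just records what it would send; the channel name and payload fields are assumptions. The point it illustrates is that `json.dumps` runs once at the publisher, and every server forwards the same string.

```python
import json
from collections import defaultdict


class InMemoryBroker:
    """Stand-in for Redis Pub/Sub: delivers each payload to every subscriber."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, payload):
        for callback in self.subscribers[channel]:
            callback(payload)
        return len(self.subscribers[channel])


class WebSocketServer:
    """Hypothetical server node that forwards payloads to its local clients."""

    def __init__(self, name, client_ids):
        self.name = name
        self.outbox = {client_id: [] for client_id in client_ids}

    def on_message(self, payload):
        # The payload is already a JSON string, so it is forwarded untouched.
        for queue in self.outbox.values():
            queue.append(payload)


broker = InMemoryBroker()
servers = [WebSocketServer(f"ws-{i}", [f"c{i}-{j}" for j in range(3)]) for i in range(2)]
for server in servers:
    broker.subscribe("match:42", server.on_message)

# Serialize once at the publisher, not once per server.
payload = json.dumps({"matchId": 42, "event": "wicket", "over": "17.3"})
receivers = broker.publish("match:42", payload)
delivered = sum(len(queue) for s in servers for queue in s.outbox.values())
print(receivers, delivered)
```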

Connection State Offloading

Storing user context (authentication, subscribed matches, preferences) in-memory per connection doesn't scale. The team moved connection state to Redis Hashes, keyed by connection ID.

When a client connected:

Server authenticated via JWT
Created a Redis hash with {userId, matchIds, tier}
Added connection ID to a Redis Set for each subscribed match

When broadcasting updates, servers queried SMEMBERS match:{matchId}:connections to determine which local connections needed the message. This kept per-connection memory footprint minimal and enabled seamless failover—if a server crashed, clients reconnected to a different instance that rehydrated state from Redis.
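A rough sketch of that flow follows. `FakeRedis` is an in-memory stand-in that mimics only the handful of commands used here, and the `conn:{id}` key name is an assumption; the `match:{matchId}:connections` set name comes from the description above. JWT verification is assumed to have happened before `register_connection` is called.

```python
class FakeRedis:
    """Tiny in-memory stand-in for the few Redis commands used here."""

    def __init__(self):
        self.hashes = {}
        self.sets = {}

    def hset(self, key, mapping):
        self.hashes.setdefault(key, {}).update(mapping)

    def hgetall(self, key):
        return dict(self.hashes.get(key, {}))

    def sadd(self, key, member):
        self.sets.setdefault(key, set()).add(member)

    def smembers(self, key):
        return set(self.sets.get(key, set()))


def register_connection(store, conn_id, user_id, match_ids, tier):
    """Persist connection state after the JWT has been verified (not shown)."""
    state = {
        "userId": user_id,
        "matchIds": ",".join(str(m) for m in match_ids),
        "tier": tier,
    }
    store.hset(f"conn:{conn_id}", state)
    for match_id in match_ids:
        store.sadd(f"match:{match_id}:connections", conn_id)


def local_recipients(store, match_id, local_conn_ids):
    """Connections held by this server that are subscribed to the match."""
    return store.smembers(f"match:{match_id}:connections") & set(local_conn_ids)


store = FakeRedis()
register_connection(store, "conn-a", "user-1", [42], "premium")
register_connection(store, "conn-b", "user-2", [42, 7], "free")
register_connection(store, "conn-c", "user-3", [7], "free")

# This server holds conn-a and conn-c; only conn-a follows match 42.
recipients = local_recipients(store, 42, ["conn-a", "conn-c"])
print(recipients, store.hgetall("conn:conn-b"))
```

Because the state lives outside the process, any server can rebuild a client's context with a single hash lookup after a failover.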

Key Optimizations and Lessons Learned

1. Aggressive Heartbeat Tuning

The default ping/pong interval (30 seconds) was more frequent than the workload needed. The team increased it to 60 seconds, halving heartbeat CPU overhead at the cost of slower detection of dead connections. For mobile clients on flaky networks, they implemented exponential backoff reconnection with jitter to avoid thundering herds after network blips.
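Client-side backoff with jitter fits in a few lines. This sketch uses the "full jitter" variant, which is one common choice and not necessarily the one the team used; the 1-second base and 30-second cap are assumed values.

```python
import random


def reconnect_delay(attempt, base_s=1.0, cap_s=30.0, rng=random):
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


rng = random.Random(7)  # seeded only so the example is repeatable
delays = [reconnect_delay(attempt, rng=rng) for attempt in range(8)]
print([round(d, 2) for d in delays])
```

The randomness matters as much as the exponent: without it, every client that dropped at the same moment would retry at the same moment too.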

2. Message Batching

During high-intensity overs, individual ball updates arrived milliseconds apart. Instead of sending 6 separate WebSocket frames, servers batched updates over a 200ms window. This reduced syscall overhead and improved mobile battery life by minimizing radio wake-ups.
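A windowed batcher along those lines might look like the sketch below. The class and its `tick` method are hypothetical; timestamps are passed in explicitly so the example is deterministic, whereas a real server would drive `tick` from a timer.

```python
import json


class BatchingSender:
    """Collect updates and flush them as one frame per window."""

    def __init__(self, send_frame, window_s=0.2):
        self.send_frame = send_frame
        self.window_s = window_s
        self.pending = []
        self.window_started_at = None

    def add(self, update, now):
        if self.window_started_at is None:
            self.window_started_at = now  # first update opens the window
        self.pending.append(update)

    def tick(self, now):
        """Call periodically; flushes once the window has elapsed."""
        if self.pending and now - self.window_started_at >= self.window_s:
            self.send_frame(json.dumps(self.pending))
            self.pending = []
            self.window_started_at = None


frames = []
sender = BatchingSender(frames.append)
# Three updates arrive 50 ms apart.
for i, t in enumerate([0.00, 0.05, 0.10]):
    sender.add({"ball": i + 1}, now=t)
    sender.tick(now=t)
sender.tick(now=0.20)  # window elapsed: one frame carries all three updates
print(len(frames), frames[0])
```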

3. Tiered Broadcasting

Not all clients need sub-second updates. Free users received batched updates every 5 seconds. Premium users got real-time streams. This tiered approach cut peak message volume by 60% without degrading perceived performance for most users.
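The tiering logic can be illustrated with a small dispatcher. This is a sketch under assumptions: the class name and the `sent` log are invented for the example, and the free tier is flushed when an update arrives after the interval has passed, where a production system would more likely use a timer.

```python
class TieredDispatcher:
    """Premium clients get each update at once; free clients get periodic batches."""

    def __init__(self, free_interval_s=5.0):
        self.free_interval_s = free_interval_s
        self.free_buffer = []
        self.last_free_flush = 0.0
        self.sent = {"premium": [], "free": []}

    def on_update(self, update, now):
        self.sent["premium"].append([update])  # one message per update
        self.free_buffer.append(update)
        if now - self.last_free_flush >= self.free_interval_s:
            self.sent["free"].append(self.free_buffer)  # one message per interval
            self.free_buffer = []
            self.last_free_flush = now


dispatcher = TieredDispatcher()
for second in range(1, 11):  # one update per second for ten seconds
    dispatcher.on_update({"t": second}, now=float(second))
print(len(dispatcher.sent["premium"]), len(dispatcher.sent["free"]))
```

In this toy run the free tier receives the same ten updates in two messages instead of ten, which is where the reduction in message volume comes from.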

4. Observability as a First-Class Concern

Much of the work in scaling WebSockets is debugging connection state across distributed servers. The team instrumented:

Per-server connection counts (exposed via Prometheus)
Message delivery latency (p50, p95, p99)
Reconnection rates (spike detection for infrastructure issues)
Redis Pub/Sub lag (early warning for message backlog)

When Kohli's century celebration caused a 30-second traffic spike, dashboards showed exactly which servers approached capacity and triggered auto-scaling 90 seconds before user impact.
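Two of the metrics listed above are simple to compute from raw samples. The sketch below uses only the standard library rather than a Prometheus client; the function names, the synthetic latency data, and the 3x spike threshold are all assumptions for illustration.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return p50, p95 and p99 from a list of delivery latencies in ms."""
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def reconnect_spike(rate_per_min, baseline_per_min, factor=3.0):
    """Flag a spike when the reconnect rate exceeds a multiple of the baseline."""
    return rate_per_min > baseline_per_min * factor


samples = list(range(1, 101))  # synthetic latencies: 1 ms .. 100 ms
stats = latency_percentiles(samples)
print(stats, reconnect_spike(900, 200))
```

Tracking the tail (p95, p99) alongside the median is what exposes a single overloaded server, since its slow deliveries barely move the average.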

Takeaways for Your Real-Time System

Scaling WebSockets to 100k connections isn't about finding the "right" framework—it's about distributed systems fundamentals:

Horizontal scaling is non-negotiable. Plan for sticky sessions and state externalization from day one.
Pub/Sub decouples message sources from distribution. Redis works for most workloads; consider Kafka or NATS for higher throughput.
Observability surfaces bottlenecks before users complain. Instrument connection lifecycle, not just message volume.
Tiered service lets you scale economics, not just infrastructure. Real-time for those who pay; near-real-time for everyone else.

The cricket app's architecture isn't exotic—Redis, Node.js, and load balancers are industry-standard tools. What's notable is the disciplined approach to measuring, isolating, and optimizing each constraint. That's the real lesson: WebSocket scaling is a marathon of incremental improvements, not a sprint to the latest framework.

And when the next cricket superstar walks out to bat? The servers are ready.
