Introduction
Scaling standard HTTP applications on Kubernetes is a well-understood problem. However, scaling long-lived, stateful connections like WebSockets—especially for resource-intensive applications like cloud gaming emulators—presents unique challenges. Standard horizontal pod autoscalers (HPA) often fail to account for the actual “session” count, and simple load balancers can break connection stickiness when multiple clients share a single public IP (common in residential NAT environments).
In this post, I’ll break down the architecture of the <your-project> project, where I implemented a custom Session Dispatcher to manage a pool of GameCube emulators that can scale from zero to five based on active user sessions.
The Architecture Goals
The project required a pool of Dolphin emulators accessible via the web. Unlike my previous per-user deployment model, this setup uses a shared StatefulSet to provide predictable pod identities and persistent storage mappings while remaining highly efficient with resources.
- Scale to Zero: No pods should run if no one is playing.
- Stateful Stickiness: Once a user is assigned to Pod 0, all their traffic (and their WebSocket controller input) must stay on Pod 0.
- NAT Compatibility: Stickiness must be based on browser cookies, not IP addresses, to support multiple users behind the same router.
- Metric-Driven Scaling: Use KEDA to drive the StatefulSet replica count based on real-time session metrics.
- Graceful Cold Start: Provide a dedicated landing page while the KEDA-scaled environment initializes.
The Component Breakdown
1. The Custom Session Dispatcher (Traffic Manager)
The heart of this architecture is a lightweight Node.js proxy. While KEDA’s HTTP Add-on is great for standard traffic, it didn’t provide the granular control needed for WebSocket session cleanup and cookie-based sticky mapping to specific StatefulSet indices.
I built a custom Dispatcher that performs the following:
- Initial Session Creation: If a user arrives without a `GC_SESSION_ID` cookie, the Dispatcher sets one and proxies the request to a dedicated landing page service (`gamecube-landing-page`).
- Sticky Allocation: Maps the cookie to an available pod index (0 through 4).
- WebSocket Proxying: Uses `http-proxy` to handle the transparent upgrade of HTTP connections to WebSockets.
- Inactivity Cleanup: Monitors both HTTP heartbeat intervals and WebSocket closures to release pod allocations after a grace period.
- Metrics Exposure: Exposes a `/metrics` endpoint for Prometheus, reporting the gauge `gamecube_active_sessions`.
```js
// High-level logic for the Dispatcher (using http-proxy)
const httpProxy = require('http-proxy');
const proxy = httpProxy.createProxyServer({});

function handleRequest(req, res, isWs = false, head = null) {
  let sid = getSessionId(req);

  // 🟢 INITIAL HIT: set session cookie and show the landing page
  if (!sid && !isWs) {
    sid = Math.random().toString(36).substring(2);
    res.setHeader('Set-Cookie', `GC_SESSION_ID=${sid}; Path=/; Max-Age=3600`);
    const target = `http://gamecube-landing-page.<your-namespace>.svc.cluster.local:80`;
    return proxy.web(req, res, { target });
  }

  // ... allocate podIndex (0-4) if this session has none yet ...
  const podIndex = allocatePodIndex(sid);
  const target = `http://gamecube-${podIndex}.gamecube-service.<your-namespace>.svc.cluster.local:3000`;

  if (isWs) {
    // For upgrade requests, `res` is the raw socket from the 'upgrade' event
    proxy.ws(req, res, head, { target });
    // Release the session 10s after the socket closes
    res.on('close', () => setTimeout(() => cleanup(sid), 10000));
  } else {
    proxy.web(req, res, { target });
  }
}
```
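The sticky-allocation, cleanup, and metrics pieces referenced above can be sketched as follows. This is a minimal illustration, not the project's actual code: `MAX_PODS`, `sessions`, `allocatePodIndex`, and `renderMetrics` are names I'm assuming for the sketch.

```javascript
// Minimal sketch: cookie -> pod-index allocation plus the /metrics payload.
const MAX_PODS = 5;
const sessions = new Map(); // sid -> podIndex

function allocatePodIndex(sid) {
  if (sessions.has(sid)) return sessions.get(sid); // sticky: reuse existing mapping
  const used = new Set(sessions.values());
  for (let i = 0; i < MAX_PODS; i++) {
    if (!used.has(i)) {           // lowest free StatefulSet ordinal
      sessions.set(sid, i);
      return i;
    }
  }
  return null; // pool exhausted; caller should respond 503
}

function cleanup(sid) {
  sessions.delete(sid); // frees the pod index for the next session
}

// Prometheus text-format gauge, scraped via the /metrics endpoint
function renderMetrics() {
  return `# TYPE gamecube_active_sessions gauge\ngamecube_active_sessions ${sessions.size}\n`;
}
```

Allocating the lowest free ordinal keeps sessions packed onto `gamecube-0`, `gamecube-1`, …, which matters because KEDA scales the StatefulSet by count: ordinals are always created and torn down from the top.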
2. Auto-Scaling with KEDA and Prometheus
With the Dispatcher exposing the number of active sessions, we can use KEDA’s Prometheus trigger to scale the StatefulSet.
The ScaledObject queries the Dispatcher’s metrics through the cluster’s Prometheus instance. With a threshold of 1, KEDA sets the replica count to roughly `ceil(metric / threshold)`: one active session yields one pod, five sessions yield five pods, capped at `maxReplicaCount`.
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: gamecube-scale
spec:
  scaleTargetRef:
    kind: StatefulSet
    name: gamecube
  minReplicaCount: 0
  maxReplicaCount: 5
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-prometheus.monitoring:9090
        threshold: '1'
        query: sum(gamecube_active_sessions)
```
3. Persistent Storage and GPU Infrastructure
To ensure performance, the emulators run on worker nodes with physical GPUs.
- Local PVs: We use local PersistentVolumes mapped to `/mnt/data/<your-project>/config-{0,1,2,3,4}` on the host. This provides the low-latency disk I/O required for smooth game loading and save-state creation.
- NFS ROMs: The actual game library is mounted via a shared NFS `ReadOnlyMany` volume, ensuring all pods have access to the same 100GB+ collection without duplicating data.
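A local PersistentVolume for one ordinal might look roughly like the fragment below. Treat it as a sketch: the capacity, `storageClassName`, and node hostname are assumptions, not the project's actual manifest.

```yaml
# Hypothetical local PV for ordinal 0; sizes and names are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gamecube-config-0
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/data/<your-project>/config-0
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - gpu-worker-1
```

The `nodeAffinity` block is mandatory for `local` volumes: it pins each config volume to the GPU node that actually hosts the directory, which in turn pins the matching StatefulSet ordinal to that node.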
Challenges Overcome
- Cold Start Latency: Waking up a pod, initializing the GPU, and mounting the volume takes time. Instead of showing a generic error, the Dispatcher now proxies initial hits to a lightweight Nginx container running a “Waking up…” landing page. This page auto-refreshes every 10 seconds, providing a professional and informative experience during the scale-up event.
- Service Discovery: Because pods in a StatefulSet are reached via their stable network names (e.g., `gamecube-0`), the Dispatcher must ensure these pods are actually “Ready” before attempting to proxy traffic. I added robust error handling to the proxy logic (returning a 503 with a retry script) to manage the brief window between pod creation and service readiness.
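The 503-with-retry fallback described above can be sketched like this. The page markup, the 5-second interval, and the `renderRetryPage` name are illustrative assumptions, not the project's exact implementation:

```javascript
// Sketch: fallback page for the window between pod creation and readiness.
// The interval and markup are illustrative assumptions.
function renderRetryPage(seconds = 5) {
  return [
    '<!doctype html><html><head>',
    `<meta http-equiv="refresh" content="${seconds}">`,
    '<title>Waking up\u2026</title></head>',
    '<body><p>Your emulator is starting. Retrying in',
    ` ${seconds} seconds\u2026</p></body></html>`,
  ].join('');
}

// Wired into http-proxy's 'error' event, which fires when the upstream
// pod refuses or drops the connection:
// proxy.on('error', (err, req, res) => {
//   res.writeHead(503, { 'Content-Type': 'text/html', 'Retry-After': '5' });
//   res.end(renderRetryPage(5));
// });
```

Using a meta refresh (rather than JavaScript polling) keeps the fallback working even on minimal clients, and the `Retry-After` header lets well-behaved HTTP clients back off correctly.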
Conclusion
By decoupling the traffic management from the standard ingress and using KEDA to bridge the gap between application-level sessions and infrastructure-level replicas, we’ve created a cloud gaming environment that is both user-friendly and cost-effective. The system remains dormant and consumes zero GPU resources until the moment a user hits the URL, at which point it dynamically breathes life into the necessary resources.
This pattern—Cookie-based Dispatcher + Custom Metrics + KEDA—is a powerful tool for any stateful application that needs to scale dynamically on Kubernetes.