Introduction
Self-hosting a single cloud gaming instance (like a GameCube emulator accessed via a web browser) is a fun homelab project. However, scaling that setup to support multiple independent users—each with their own save files, controller configurations, and dedicated subdomains, all while sharing the same hardware—presents a unique set of distributed systems challenges.
In this post, I’ll walk through how I evolved a standalone GameCube emulator pod into a multi-tenant, scale-to-zero architecture on Kubernetes. We’ll cover GPU time-slicing, storage strategies for shared ROMs vs. isolated configs, advanced routing with the Gateway API, and overcoming KEDA scaling conflicts.
The Architecture Goals
The objective was to create a scalable platform where:
- Multiple Users can play simultaneously.
- Hardware is Shared: Everyone shares the same GPU and the same massive library of ROMs.
- State is Isolated: User A’s save games and controller mappings cannot interfere with User B’s.
- Resource Efficiency: If a user isn’t playing, their instance scales to zero to save compute and VRAM.
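The shared-GPU goal deserves a quick note. On NVIDIA hardware, time-slicing is typically enabled through the device plugin's configuration. As a rough sketch (assuming the NVIDIA k8s-device-plugin; the ConfigMap name and the replica count are illustrative, not the values from my cluster), advertising one physical GPU as four schedulable replicas looks like this:

```yaml
# Assumed setup: NVIDIA k8s-device-plugin reading its config from a ConfigMap.
# With replicas: 4, one physical GPU is advertised as four nvidia.com/gpu
# resources, so up to four emulator pods can request a GPU slice concurrently.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config   # hypothetical name
  namespace: kube-system
data:
  config.yaml: |
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4
```

Time-slicing gives no memory isolation between pods sharing the GPU, which is acceptable here since every tenant runs the same trusted emulator image.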
Storage: Shared vs. Isolated
Storage in a multi-tenant emulator environment requires two distinct approaches:
1. Shared ROMs (NFS)
ROMs are large and immutable. It makes no sense to duplicate a 50GB game library for every user. We used an NFS share mapped via a PersistentVolume (PV) and claimed by a PersistentVolumeClaim (PVC) with the ReadOnlyMany (ROX) access mode.
- Why Read-Only? It prevents one user’s emulator from accidentally deleting or corrupting the shared library.
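Here's roughly what that PV/PVC pair looks like. The NFS server address, export path, and resource names below are placeholders, not the actual values from my cluster:

```yaml
# Static NFS PersistentVolume for the shared ROM library (names/paths assumed).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gamecube-roms-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadOnlyMany          # ROX: many pods may read, none may write
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.internal    # placeholder NFS server
    path: /exports/roms     # placeholder export path
---
# Claim bound statically to the PV above so every user pod can mount it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gamecube-roms
  namespace: gamecube
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""      # disable dynamic provisioning; bind by name
  volumeName: gamecube-roms-pv
  resources:
    requests:
      storage: 50Gi
```

Because the access mode is ReadOnlyMany, the same claim can be mounted by every user's pod simultaneously.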
2. Isolated Configs and Saves (Local Storage)
Emulator configurations, memory cards, and save states are small but highly specific to the user. Furthermore, emulators often expect strict file ownership (e.g., UID/GID 1000). Using NFS for this often results in chown: Operation not permitted errors if root_squash is enforced.
- The Solution: We provisioned dedicated local storage (zlocal-sc) PersistentVolumes on the GPU node for each user (e.g., /mnt/data/gamecube/config-user0). These are mounted as ReadWriteOnce (RWO), ensuring blazing-fast reads and writes for save states and zero permission conflicts.
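A per-user local PersistentVolume along these lines would look like the following. The node hostname is an assumption; local volumes require a nodeAffinity pinning them to the node that physically owns the disk:

```yaml
# Per-user local PV for configs and saves (hostname and sizes are assumptions).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gamecube-config-user0
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce         # RWO: exactly one node mounts it read-write
  persistentVolumeReclaimPolicy: Retain
  storageClassName: zlocal-sc
  local:
    path: /mnt/data/gamecube/config-user0
  nodeAffinity:             # required for local volumes: pin to the GPU node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - gpu-node-1   # assumed hostname of the GPU node
```

Since the path lives on the same node as the GPU, save-state writes never cross the network, and file ownership can simply be chowned to UID/GID 1000 on the host.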
Overcoming KEDA Scaling Conflicts
Initially, the plan was to use a single StatefulSet with volumeClaimTemplates to handle multiple users (Pod 0 for User 0, Pod 1 for User 1). However, when integrating KEDA for HTTP-based scale-to-zero, we hit a roadblock.
The Problem with StatefulSets and KEDA
KEDA’s HTTPScaledObject maps an incoming HTTP request (via a specific hostname) to a target workload. If you try to create two HTTPScaledObjects (one for user0.domain.com and one for user1.domain.com) that both target the same StatefulSet, KEDA’s admission webhook blocks it:
“the workload ‘gamecube’ is already managed by the ScaledObject ‘gamecube-scale-user0’”
Furthermore, StatefulSets start sequentially. User 1 would have to wait for User 0’s pod to fully initialize before their own pod could even begin scheduling.
The “Deployment-per-User” Pattern
To achieve true isolation and rapid scaling, we pivoted to a Deployment-per-User model.
- gamecube-user0 Deployment + KEDA HTTPScaledObject
- gamecube-user1 Deployment + KEDA HTTPScaledObject
This allows KEDA to scale User 1’s instance from 0 to 1 entirely independently of User 0’s activity.
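Each user's HTTPScaledObject then targets only that user's Deployment, so the admission webhook conflict disappears. A sketch for user 0 (the service name, port, and hostname are illustrative, and the exact scaleTargetRef fields vary slightly between KEDA http-add-on versions):

```yaml
# One HTTPScaledObject per user, each owning exactly one Deployment
# (hostname, service name, and port below are assumptions).
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
  name: gamecube-scale-user0
  namespace: gamecube
spec:
  hosts:
    - user0.domain.com        # requests for this host wake this workload
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gamecube-user0      # this user's dedicated Deployment
    service: gamecube-user0   # the ClusterIP service traffic is forwarded to
    port: 8080                # assumed emulator web port
  replicas:
    min: 0                    # scale to zero when idle
    max: 1                    # a single-player instance never needs more
```

An identical object with user1's names completes the pair, and each scales from 0 to 1 independently.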
Handling the “Cold Start” with Landing Pages
Waking up a container that requires GPU initialization and NFS mounting takes time (often 30–45 seconds). Without intervention, a user navigating to their sleeping emulator would just get a browser timeout.
KEDA provides a brilliant feature for this: the coldStartTimeoutFailoverRef. We deployed a lightweight Nginx container hosting a simple HTML “spinner” page. While KEDA is waking up the emulator deployment, the HTTP Interceptor automatically routes the user to this landing page. We increased the timeout to 60 seconds to comfortably accommodate the GPU spin-up.
```yaml
coldStartTimeoutFailoverRef:
  service: gamecube-landing-page
  port: 80
  timeoutSeconds: 60
```
Advanced Routing with Gateway API
Routing traffic to dynamically scaling, multi-namespace workloads requires precise configuration, especially when using modern standards like the Kubernetes Gateway API (implemented via Cilium).
The HTTPRoute
We created specific HTTPRoute resources for each user. For example, user0-gamecube.<your-domain>.com routes directly to KEDA’s keda-add-ons-http-interceptor-proxy. The interceptor holds the connection, scales the deployment, and then forwards the traffic to the user’s dedicated ClusterIP service.
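A per-user HTTPRoute along these lines would do the job. The Gateway name, namespaces, and interceptor port are assumptions based on common defaults, not values copied from my cluster:

```yaml
# HTTPRoute sending user0's hostname to the KEDA HTTP interceptor
# (Gateway name, namespaces, and port are assumptions).
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: gamecube-user0
  namespace: prod-gateway     # lives alongside the Gateway
spec:
  parentRefs:
    - name: main-gateway      # assumed Gateway name
      namespace: prod-gateway
  hostnames:
    - user0-gamecube.example.com   # placeholder for your domain
  rules:
    - backendRefs:
        - name: keda-add-ons-http-interceptor-proxy
          namespace: gamecube      # assumed interceptor location
          port: 8080               # interceptor proxy's default port
```

The interceptor inspects the Host header, matches it against the HTTPScaledObject's hosts list, and scales the right Deployment before releasing the request.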
The Importance of ReferenceGrants
One of the most critical security features (and common stumbling blocks) of the Gateway API is cross-namespace isolation. If your Gateway lives in the prod-gateway namespace, but your Service lives in the gamecube namespace, traffic is blocked by default.
To authorize this connection, you must explicitly create a ReferenceGrant in the destination namespace, declaring that HTTPRoutes from the Gateway's namespace are allowed to reference Services across the boundary.
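For the prod-gateway → gamecube case described above, a minimal ReferenceGrant looks like this (the resource name is arbitrary; what matters is the namespace it lives in and the from/to pairing):

```yaml
# Lives in the DESTINATION namespace and names who may reach in.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-prod-gateway-routes   # hypothetical name
  namespace: gamecube               # where the target Services live
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: prod-gateway       # routes in the Gateway's namespace
  to:
    - group: ""                     # core API group
      kind: Service                 # any Service in this namespace
```

Without this object, the route's backendRef resolves but traffic is refused, which makes a missing ReferenceGrant one of the sneakier failure modes to debug.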
Conclusion
Scaling cloud gaming on Kubernetes is a fantastic exercise in resource management and routing. By combining GPU time-slicing, intelligent storage separation (NFS for read-heavy bulk data, local disks for write-heavy state), and KEDA’s HTTP interceptors, you can build a highly efficient, multi-tenant gaming cluster that consumes zero resources when everyone is logged off.