
Running an AI Agent Gateway on Kubernetes with OpenClaw

Introduction

Managing a homelab cluster used to mean keeping a terminal open. Something is consuming memory on worker2? SSH in, run kubectl top pods, grep through logs. Want to trigger the morning brief job before the cron fires? Open another tab, construct the kubectl create job command, get the namespace right. It works, but it’s friction — and friction is what kills the habit of actually operating your infrastructure.

The shift that made this feel different: teaching an agent your cluster’s specific operations once, then driving everything through a conversation. OpenClaw is what made that practical. It runs as a Kubernetes StatefulSet, exposes an API behind a Gateway API HTTPRoute, and connects a Telegram bot to a model-backed agent that has genuine cluster access — kubectl binary, NFS shares, mounted secrets, the whole thing. When you message it to queue a job, it runs the job. No SSH required.

This post covers the architecture of that setup: the pod structure, model routing, volume strategy, and the AGENTS.md pattern that makes the agent useful rather than generic.


The Core Problem: Generic Agents Are Not Useful

Most self-hosted LLM agent setups suffer from the same failure mode: the agent knows how to use tools in the abstract but has no idea what your cluster actually does. It does not know that the morning brief lives in the zinfra namespace as a CronJob named morning-brief, or that web-to-PDF jobs are triggered by running node /scripts/trigger_articles.js, or that resume pipeline data lives at /mnt/gigs.

You can prompt it every time, or you can codify that knowledge once into a workspace file the agent reads at startup. The latter is the AGENTS.md pattern, and it is what separates a useful homelab agent from a novelty.


Architecture

OpenClaw runs in the zinfra namespace as a StatefulSet. The pod has three containers:

| Container | Image                               | Role                                              |
|-----------|-------------------------------------|---------------------------------------------------|
| gateway   | ghcr.io/openclaw/openclaw:2026.4.14 | Serves the OpenClaw API on port 18789             |
| node      | ghcr.io/openclaw/openclaw:2026.4.14 | Agent executor; connects to the gateway           |
| chrome    | zenika/alpine-chrome                | Headless Chromium on port 9222 for web automation |

The gateway container is exposed externally via a Kubernetes Gateway API HTTPRoute at oc.<your-domain>.com. The node container handles all agent execution — tool calls, model inference, job dispatch.

The pod is scheduled on node-availability: 24x7 nodes (worker1 and worker2). The GPU nodes are deliberately excluded; inference for this workload routes to external APIs and a local LiteLLM service rather than local GPU compute.
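Under stated assumptions (the node-availability label key is from the post; the surrounding spec shape is illustrative), the scheduling constraint might look like:

```yaml
# StatefulSet pod template excerpt (sketch) — keeps the pod off GPU nodes
# by selecting only nodes labeled for 24x7 availability.
spec:
  template:
    spec:
      nodeSelector:
        node-availability: 24x7   # only worker1 and worker2 carry this label
```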


Model Routing

Model selection is controlled by the openclaw-config ConfigMap. The primary model is google/gemini-2.5-flash, accessed via the Google Generative AI API. Local models are served through a LiteLLM instance at http://litellm-service.llama.svc.cluster.local:4000 and act as fallbacks:

# openclaw-config ConfigMap (excerpt)
primary_model: google/gemini-2.5-flash
fallbacks:
  - openai/llama3.1:latest
  - openai/qwen3:latest
  - openai/qwen2.5-coder:14b
litellm_base_url: http://litellm-service.llama.svc.cluster.local:4000

Gemini handles the majority of requests. The local fallbacks cover cases where the Gemini API is unavailable or where a task is better served by a code-focused model — qwen2.5-coder:14b in particular handles structured output generation reliably.
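The fallback order above amounts to a try-in-order loop. A minimal sketch (illustrative only, not OpenClaw's actual routing code; `call_model` is a hypothetical callable standing in for the real API client):

```python
# Illustrative fallback routing: try the primary model, then walk the
# fallback list in order, returning the first successful response.

def route_request(prompt, models, call_model):
    """models: list of model names, primary first.
    call_model: callable(model, prompt) that raises on failure."""
    last_error = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # API unreachable, timeout, etc.
            last_error = exc
    raise RuntimeError(f"all models failed: {last_error}")

# Order mirrors the openclaw-config ConfigMap excerpt above.
MODELS = [
    "google/gemini-2.5-flash",      # primary
    "openai/llama3.1:latest",       # fallbacks, in order
    "openai/qwen3:latest",
    "openai/qwen2.5-coder:14b",
]
```

The key property is that failure is handled per-request: a transient Gemini outage degrades to local models without any operator intervention.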


Gateway Startup and Tool Access

The gateway container’s startup command handles several one-time provisioning steps before launching the server:

cp /tmp/openclaw.json /home/node/.openclaw/openclaw.json
cp /tmp/AGENTS.md /home/node/.openclaw/workspace/AGENTS.md
curl -sLo /home/node/bin/kubectl \
  "https://dl.k8s.io/release/v1.34.3/bin/linux/amd64/kubectl"
chmod +x /home/node/bin/kubectl   # downloaded binary is not executable by default
openclaw channels add --channel telegram --use-env
exec openclaw gateway run --bind lan --allow-unconfigured

Breaking this down:

  • openclaw.json is mounted from a ConfigMap and copied to the expected config path.
  • AGENTS.md is mounted from a ConfigMap and placed in the agent’s workspace directory. This is the file that teaches the agent your homelab.
  • kubectl v1.34.3 is downloaded fresh at startup. Rather than baking it into the image, it is pulled at boot time so the binary version can be updated via the ConfigMap without rebuilding the image.
  • Telegram channel registration reads the bot token from the environment (backed by a SealedSecret).
  • The gateway starts with --bind lan, making it reachable on the cluster network, and --allow-unconfigured permits routing to models that are not explicitly pre-validated.
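Since --use-env reads the bot token from the environment, the SealedSecret has to surface as an environment variable on the gateway container. A sketch of that wiring (the variable name and secret key name are assumptions):

```yaml
# Hypothetical env wiring for the gateway container; the referenced Secret
# is the unsealed output of the telegram-bot-token SealedSecret.
env:
  - name: TELEGRAM_BOT_TOKEN        # assumed variable name
    valueFrom:
      secretKeyRef:
        name: telegram-bot-token
        key: token                  # assumed key name
```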

Volume Strategy

The pod mounts several volumes that give the agent genuine access to cluster state and shared data:

volumes:
  - name: nfs-share
    nfs:                             # NFS volumes take a server plus export path
      server: <nfs-server>
      path: /mnt/nfs_share           # shared with other workloads
  - name: gigs-pvc
    persistentVolumeClaim:
      claimName: gigs-pvc            # resume pipeline data at /mnt/gigs
  - name: ssh-key
    secret:
      secretName: openclaw-ssh-key   # git push access to the Jekyll blog repo
  - name: google-cookies
    secret:
      secretName: google-cookies     # authenticated web scraping sessions
  - name: agents-md
    configMap:
      name: openclaw-agents-md       # workspace definition file

The NFS share at /mnt/nfs_share is the same mount used by other workloads in the cluster — the web-to-PDF job, the morning brief job, etc. The agent can read and write to shared data without going through any intermediary API. The gigs PVC at /mnt/gigs holds resume pipeline artifacts; the agent can inspect pipeline state and queue new runs directly.

The SSH key gives the agent the ability to push commits to the Jekyll blog repository, which is how it can publish content without a human in the loop.
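For completeness, a sketch of how these volumes might land in the node container. Only /mnt/nfs_share and /mnt/gigs are paths stated in this post; the remaining mount paths are assumptions:

```yaml
# Hypothetical volumeMounts for the node container.
volumeMounts:
  - name: nfs-share
    mountPath: /mnt/nfs_share
  - name: gigs-pvc
    mountPath: /mnt/gigs
  - name: ssh-key
    mountPath: /home/node/.ssh      # assumed location for the deploy key
    readOnly: true
  - name: google-cookies
    mountPath: /home/node/.cookies  # assumed location for the cookie jar
    readOnly: true
```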


The AGENTS.md Pattern

This is the most important piece of the setup. AGENTS.md is a plain text file mounted into the agent’s workspace at startup. It describes your homelab in the agent’s own terms: what jobs exist, how to trigger them, what the directory layout means, what’s normal and what’s a problem.

A condensed example of what that file teaches the agent:

# Homelab Operations

## Trigger web-to-PDF article fetch
node /scripts/trigger_articles.js

## Run the morning brief manually
kubectl create job --from=cronjob/morning-brief morning-brief-manual \
  -n zinfra

## Parse job posting URLs for the resume pipeline
node /scripts/parse_job_links.js <url1> [url2] ...

## Trigger the gigs resume tailor pipeline
# pipeline artifacts land at /mnt/gigs
# check status: kubectl get jobs -n zinfra -l app=gigs-tailor

Without this file, the agent can call kubectl but has no idea what to call it with. With it, a Telegram message of “fetch today’s articles” maps directly to a known operation. The agent does not need to reason about your cluster topology from first principles every time.

The pattern scales. Each time you add a new recurring operation to the cluster, you add a stanza to AGENTS.md. The agent picks it up on next restart. This is infrastructure-as-documentation, except the documentation is executable.
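Because AGENTS.md lives in a ConfigMap, adding a stanza is an ordinary GitOps commit. A sketch of the wrapper manifest:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: openclaw-agents-md
  namespace: zinfra
data:
  AGENTS.md: |
    # Homelab Operations

    ## Trigger web-to-PDF article fetch
    node /scripts/trigger_articles.js
    # ...one stanza per recurring operation
```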


Secrets Management

All sensitive values are managed via SealedSecrets and unsealed by the Bitnami Sealed Secrets controller running in the cluster. The secret surface for this workload is:

  • openclaw-gateway-token — the gateway’s API token
  • gemini-api-key — Google Generative AI API key
  • litellm-api-key — key for the local LiteLLM service
  • telegram-bot-token — the Telegram bot token read by --use-env at startup
  • openclaw-ssh-key — deploy key for git push access to the blog repo
  • google-cookies — session cookies for authenticated scraping

None of these values appear in plaintext in the GitOps repository. SealedSecrets are encrypted with the cluster’s public key and can only be decrypted by the controller in that specific cluster.
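A sealed secret in the GitOps repo looks roughly like this (ciphertext elided; the manifest is produced by piping a plain Secret through the kubeseal CLI):

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: telegram-bot-token
  namespace: zinfra
spec:
  encryptedData:
    token: <ciphertext>   # only the in-cluster controller can decrypt this
  template:
    metadata:
      name: telegram-bot-token
      namespace: zinfra
```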


Headless Browser Integration

The chrome sidecar runs zenika/alpine-chrome and exposes the Chrome DevTools Protocol on port 9222. The node container connects to it for tasks that require a real browser — JavaScript-heavy pages that cannot be scraped with a plain HTTP client, authenticated sessions that depend on cookie state, or page rendering for PDF generation.

- name: chrome
  image: zenika/alpine-chrome
  ports:
    - containerPort: 9222
  args:
    - --no-sandbox
    - --remote-debugging-port=9222
    - --remote-debugging-address=0.0.0.0

The Google cookies secret is mounted into the node container and injected into the browser session when authenticated scraping is needed. This is how the web-to-PDF job fetches articles behind Google login without manual cookie refresh — as long as the session is valid, the agent handles it automatically.


Gateway API Routing

External access is handled by a Kubernetes Gateway API HTTPRoute, not an Ingress. The HTTPRoute routes traffic from the shared gateway to the OpenClaw service in zinfra:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openclaw
  namespace: zinfra
spec:
  parentRefs:
    - name: homelab-gateway
      namespace: gateway
  hostnames:
    - oc.<your-domain>.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: openclaw-gateway
          port: 18789

This keeps the agent’s API accessible from outside the cluster — useful for integrations beyond Telegram, or for direct API calls during debugging.


Conclusion

The value of this setup is not that the agent is smart. It is that the agent is situated. It knows the cluster because AGENTS.md tells it what the cluster does. It has cluster access because the pod is built with the right tools mounted at the right paths. It has continuity because the StatefulSet keeps it running across restarts without losing workspace state.

The result is a conversational interface to your homelab that does not require you to remember namespaces, job names, or script paths. You built the infrastructure; AGENTS.md is how you teach the agent to operate it on your behalf.

For platform engineers already comfortable with Kubernetes primitives — StatefulSets, Gateway API, SealedSecrets, PVCs — wiring this up is mostly configuration work, not novel engineering. The tooling is the same. The model is just another workload.

This post is licensed under CC BY 4.0 by the author.