## Introduction
This post explores the deployment of Stable Diffusion WebUI (AUTOMATIC1111) on a Kubernetes cluster. By leveraging GitOps principles with ArgoCD, we ensure a declarative and reproducible environment for high-performance AI image generation.
## Project Overview
The Stable Diffusion deployment is part of the broader llama project, which aims to provide a comprehensive local AI infrastructure. Key features of this setup include:
- Stable Diffusion WebUI: A feature-rich interface for interacting with Stable Diffusion models.
- Hardware Acceleration: Direct access to NVIDIA GPUs within the Kubernetes cluster for rapid image generation.
- Persistent Model Storage: Centralized storage for large model checkpoints using NFS.
- GitOps Management: Automated synchronization and drift detection via ArgoCD.
## Architecture and Components
The deployment is defined through a set of Kubernetes manifests, optimized for performance and stability:
### 1. Storage Configuration
Large AI models require significant storage that remains accessible across the cluster. We utilize a PersistentVolume (PV) and PersistentVolumeClaim (PVC) backed by an NFS server:
- `sd-webui-pv` & `sd-webui-pvc`: Provisioned with 40Gi of storage using the `kubenfs` storage class. The access mode is set to `ReadWriteMany` (RWX), allowing the volume to be mounted read-write by multiple pods across the cluster. The models are stored on an external ZFS pool (`<nfs-server-ip>:<nfs-path>/llama/sd-webui`).
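A minimal sketch of such a PV/PVC pair might look like the following. The names, size, storage class, and access mode come from the description above; the explicit `volumeName` binding and the placeholder NFS server/path are assumptions:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sd-webui-pv
spec:
  capacity:
    storage: 40Gi
  accessModes:
    - ReadWriteMany
  storageClassName: kubenfs
  nfs:
    server: <nfs-server-ip>          # placeholder: your NFS server
    path: <nfs-path>/llama/sd-webui  # placeholder: export on the ZFS pool
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sd-webui-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: kubenfs
  volumeName: sd-webui-pv  # bind directly to the pre-provisioned PV (assumed)
  resources:
    requests:
      storage: 40Gi
```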
### 2. Stable Diffusion Deployment
The core of the application is the `sd-webui-deployment`, which uses a specialized Docker image from ai-dock:

- Image: `ghcr.io/ai-dock/stable-diffusion-webui:latest-cuda`
- Resource Management:
  - CPU: Requests of `2000m` and limits of `4000m`.
  - Memory: Requests of `8Gi` and limits of `16Gi`.
  - GPU Acceleration: Specifically requests `nvidia.com/gpu: 1`, ensuring the pod is scheduled on a node with an available NVIDIA GPU.
- Environment Variables:
  - `WEBUI_FLAGS`: Configured with `--listen --api --xformers --enable-insecure-extension-access` to enable remote access and optimize performance with xFormers.
  - `NVIDIA_VISIBLE_DEVICES` & `NVIDIA_DRIVER_CAPABILITIES`: Set to `all` to ensure full GPU functionality within the container.
- Volume Mounting: The model storage is mounted at `/workspace/stable-diffusion-webui/models`, ensuring that downloaded checkpoints persist across pod restarts.
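Putting those pieces together, the Deployment spec could be sketched roughly as below. The image, resources, environment variables, and mount path are taken from the list above; the `app: sd-webui` labels and container name are illustrative assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sd-webui-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sd-webui  # assumed label
  template:
    metadata:
      labels:
        app: sd-webui
    spec:
      containers:
        - name: sd-webui
          image: ghcr.io/ai-dock/stable-diffusion-webui:latest-cuda
          env:
            - name: WEBUI_FLAGS
              value: "--listen --api --xformers --enable-insecure-extension-access"
            - name: NVIDIA_VISIBLE_DEVICES
              value: "all"
            - name: NVIDIA_DRIVER_CAPABILITIES
              value: "all"
          resources:
            requests:
              cpu: 2000m
              memory: 8Gi
            limits:
              cpu: 4000m
              memory: 16Gi
              nvidia.com/gpu: 1  # schedules the pod onto a GPU node
          volumeMounts:
            - name: models
              mountPath: /workspace/stable-diffusion-webui/models
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: sd-webui-pvc
```

Note that `nvidia.com/gpu` only needs to appear under `limits`; Kubernetes treats extended resources as having equal requests and limits.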
### 3. Service Exposure
Internal communication is handled by a ClusterIP service:
- `sd-webui-service`: Exposes the WebUI on its default port `17860`. This service can be further fronted by an Ingress or HTTPRoute for external access.
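As a sketch, the ClusterIP Service might look like this (the `app: sd-webui` selector is an assumed pod label):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sd-webui-service
spec:
  type: ClusterIP
  selector:
    app: sd-webui  # assumed: must match the Deployment's pod labels
  ports:
    - name: http
      port: 17860
      targetPort: 17860
```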
## GitOps with ArgoCD
Like the rest of the llama stack, Stable Diffusion is managed as an ArgoCD Application. This allows for:
- Version Control: All manifest changes are tracked in Git.
- Automated Sync: ArgoCD ensures the cluster state matches the Git repository.
- Sync Waves: We utilize ArgoCD sync waves to ensure resources are created in the correct order (e.g., the PV at wave `0`, the PVC at wave `1`, and the Deployment at wave `2`).
- Easy Rollbacks: Quickly revert to a previous configuration if needed.
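Sync waves are expressed as an annotation on each resource. For example, the PV (wave `0`) would carry:

```yaml
metadata:
  name: sd-webui-pv
  annotations:
    argocd.argoproj.io/sync-wave: "0"  # PVC uses "1", Deployment uses "2"
```

ArgoCD applies all resources in one wave and waits for them to be healthy before moving on to the next, so the claim never tries to bind before its volume exists.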
## Conclusion
Deploying Stable Diffusion WebUI on Kubernetes provides a scalable and robust platform for AI image generation. By combining GPU passthrough with persistent NFS storage and GitOps management, we’ve created a high-performance environment that is both easy to manage and highly resilient.