06 · StatefulSets

Deployments treat Pods as identical and interchangeable. StatefulSets are for workloads where each Pod needs a stable identity, ordered lifecycle, and its own persistent storage.


The Problem with Stateful Apps in Kubernetes

A MySQL replica set is not like a web server. Pod 0 is the primary; Pod 1 and Pod 2 are replicas that replicate from Pod 0. They are not interchangeable. If Pod 0 is killed and replaced, the new Pod must rejoin the cluster as the primary with the same name — not as a fresh instance.

Deployments can't provide this. Their Pods get randomly generated name suffixes, are created in no particular order, and are treated as interchangeable; any PVC named in the Pod template is shared by every replica.


What StatefulSet Provides

Stable, unique Pod names: pods are named <statefulset>-0, <statefulset>-1, etc. The names are deterministic and persistent across rescheduling.
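Because the names are deterministic, other objects can point at a specific Pod. The StatefulSet controller labels every Pod it creates with statefulset.kubernetes.io/pod-name, set to the Pod's name. A minimal sketch of a Service that always targets mysql-0 (the Service name mysql-primary is hypothetical):

```yaml
# Hypothetical Service that always routes to the Pod named mysql-0
apiVersion: v1
kind: Service
metadata:
  name: mysql-primary            # hypothetical name
spec:
  selector:
    # Label added automatically by the StatefulSet controller to each Pod
    statefulset.kubernetes.io/pod-name: mysql-0
  ports:
  - port: 3306
```

Unlike the headless Service, this one gets an ordinary ClusterIP, which suits clients that should only ever talk to Pod 0.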

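Stable network identity: through the StatefulSet's headless Service, each Pod gets a DNS name of the form <pod>.<service>.<namespace>.svc.cluster.local. mysql-0.mysql.default.svc.cluster.local always resolves to whichever Pod currently holds the name mysql-0, regardless of which node it runs on or which IP it was given.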

Ordered deployment and scaling: with the default OrderedReady policy, Pods are created in order (0, 1, 2) and terminated in reverse order (2, 1, 0). Pod N is not started until Pod N-1 is Running and Ready.
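The ordering is controlled by the podManagementPolicy field, and "Ready" is decided by the container's readiness probe. A sketch of the relevant fragment, assuming a plain TCP check on port 3306 is an acceptable readiness signal (production setups often run a real query instead):

```yaml
# Fragment of a StatefulSet spec; other fields as in the full manifest
spec:
  podManagementPolicy: OrderedReady   # default: create 0, 1, 2 in order; remove 2, 1, 0
  # podManagementPolicy: Parallel     # alternative: create/delete Pods without waiting
  template:
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        readinessProbe:               # the next Pod waits until this probe succeeds
          tcpSocket:
            port: 3306
          initialDelaySeconds: 10
          periodSeconds: 5
```

Parallel only changes how Pods are created and deleted when scaling; rolling updates are not affected by it.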

Per-Pod persistent storage: each Pod gets its own PersistentVolumeClaim. The PVC survives Pod deletion — if Pod 1 is deleted and recreated, it gets the same PVC with its data intact.
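By default the PVCs are also kept when the StatefulSet is scaled down or deleted. Recent Kubernetes releases expose this as persistentVolumeClaimRetentionPolicy; the fragment below simply spells out the default behavior (check that your cluster version supports the field before relying on it):

```yaml
# Fragment of a StatefulSet spec
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain   # keep PVCs when the StatefulSet itself is deleted (default)
    whenScaled: Retain    # keep PVCs of Pods removed by a scale-down (default)
```

Switching either field to Delete tells Kubernetes to remove the matching PVCs, so the data goes with them unless the underlying volume's reclaim policy keeps it.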


A Minimal StatefulSet

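apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql          # the headless Service that gives each Pod its DNS name
  replicas: 3
  selector:
    matchLabels:
      app: mysql              # must match the Pod template labels
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD   # the image will not initialize without a root password option
          valueFrom:
            secretKeyRef:
              name: mysql-secret      # assumed: a Secret you create beforehand
              key: root-password
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: data                  # matches the volumeClaimTemplates entry below
          mountPath: /var/lib/mysql
          subPath: mysql              # precaution: mount a subdirectory, not a possibly non-empty volume root
  volumeClaimTemplates:       # one PVC per Pod
  - metadata:
      name: data
    spec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 10Gi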

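The volumeClaimTemplates field is the key difference from a Deployment. Instead of one shared claim, the controller creates a PVC per Pod, named <claim template name>-<Pod name> (data-mysql-0, data-mysql-1, data-mysql-2). Note that this manifest only gives each Pod its own identity and disk; it does not configure MySQL replication, so the three Pods run as independent servers until replication is set up (for example by init scripts or an Operator).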
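To make the naming concrete: for Pod mysql-0 the controller creates a claim roughly equivalent to the following. This is simplified; the real object also carries labels, annotations, and status filled in by Kubernetes.

```yaml
# Approximately what the StatefulSet controller creates for Pod mysql-0
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-mysql-0          # <claim template name>-<Pod name>
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi           # copied from the volumeClaimTemplates entry
```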


Headless Service

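StatefulSets require a headless Service (clusterIP: None) to provide stable DNS for each Pod. You must create this Service yourself; the StatefulSet controller does not create it, and its name must match spec.serviceName.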

apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  clusterIP: None    # headless: no virtual IP
  selector:
    app: mysql
  ports:
  - port: 3306

With a headless Service, DNS resolves directly to Pod IPs instead of a single virtual IP. A lookup of mysql.default.svc.cluster.local returns the IPs of all ready Pods, and each Pod is also reachable individually:

mysql-0.mysql.default.svc.cluster.local  →  Pod 0 IP
mysql-1.mysql.default.svc.cluster.local  →  Pod 1 IP
mysql-2.mysql.default.svc.cluster.local  →  Pod 2 IP

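This is how the primary in the MySQL example stays addressable. Clients in the same namespace can use the short name mysql-0.mysql (the Pod's DNS search path fills in the rest); from another namespace, use mysql-0.mysql.default or the full name.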
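One way to check a per-Pod record from inside the cluster is a throwaway Pod that runs a single lookup. A minimal sketch; the Pod name dns-check is hypothetical, and busybox:1.28 is the image the Kubernetes DNS debugging guide uses:

```yaml
# Hypothetical one-shot Pod that resolves mysql-0's stable DNS name
apiVersion: v1
kind: Pod
metadata:
  name: dns-check
spec:
  restartPolicy: Never          # run the lookup once and stop
  containers:
  - name: lookup
    image: busybox:1.28
    command: ["nslookup", "mysql-0.mysql.default.svc.cluster.local"]
```

Once it completes, kubectl logs dns-check shows the answer, which should be mysql-0's current IP, assuming that Pod is Ready.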


Update Strategy

StatefulSets support two update strategies:

  • RollingUpdate (default): updates Pods one at a time in reverse ordinal order (2, 1, 0), waiting for each updated Pod to be Running and Ready before proceeding. In the MySQL example, this means the primary (Pod 0) is updated last.
  • OnDelete: Pods are only updated when manually deleted. Useful when you need fine-grained control over the update sequence.
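Both strategies are selected through the updateStrategy field of the StatefulSet spec. A fragment showing each option (use one or the other):

```yaml
# Fragment of a StatefulSet spec
spec:
  updateStrategy:
    type: RollingUpdate   # default: replace Pods from the highest ordinal down
# Alternative:
#   updateStrategy:
#     type: OnDelete      # new template applies only to Pods you delete yourself
```

RollingUpdate is what you get if the field is omitted.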

Common Use Cases

Application         | Why StatefulSet
--------------------+-------------------------------------------------------------
MySQL, PostgreSQL   | Stable identity for replication roles
MongoDB, Cassandra  | Cluster membership by stable hostname
Redis Sentinel      | Primary/replica roles with stable addresses
Kafka, ZooKeeper    | Broker IDs (Kafka) and server IDs (ZooKeeper) tied to Pod names
Elasticsearch       | Node roles, shard allocation by hostname
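Tying an application ID to the Pod name usually means deriving it from the ordinal suffix at startup. A sketch of a container fragment for a hypothetical Kafka-style StatefulSet; the image and the start-broker command are placeholders, not a real broker's image or CLI:

```yaml
# Fragment of a Pod template for a hypothetical Kafka-style StatefulSet
containers:
- name: broker
  image: example.com/broker:1.0       # hypothetical image
  env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name      # e.g. "kafka-2"
  command: ["sh", "-c"]
  args:
  - |
    # Strip everything up to the last "-" so "kafka-2" becomes "2"
    BROKER_ID="${POD_NAME##*-}"
    exec start-broker --broker-id "$BROKER_ID"   # hypothetical start command
```

Because the Pod name survives rescheduling, the derived ID stays the same every time the Pod comes back.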

StatefulSet vs Deployment Summary

               | Deployment            | StatefulSet
---------------+-----------------------+---------------------------
Pod names      | Random                | Predictable (pod-0, pod-1)
Pod DNS        | Shared service        | Per-pod stable hostname
Storage        | Shared or none        | One PVC per Pod
Startup order  | Parallel              | Sequential
Use case       | Stateless (API, web)  | Stateful (DB, queue)

Operators for complex stateful apps

For production databases, consider using a Kubernetes Operator (e.g., Percona Operator for MySQL, Strimzi for Kafka) instead of writing your own StatefulSet. Operators encode the operational knowledge of the application — including failover, backups, and scaling — as code.