k3s-demo

Kubernetes on a single k3s node - press the button and watch the load drive autoscaling, live.

Active app pods

- / 6

CPU vs 70% target

Redis counter

Load & autoscaling (last ~90s). Chart series: CPU %, pods, 70% target.

Pressing the button sends many concurrent /burn requests, which makes the app pods work hard. Watch CPU spike past the 70% line on the chart, and the pod count rise behind it as the autoscaler reacts. After the load stops, the pods scale back to 2 in about 30-60s (this HPA's scale-down is tuned for the demo; Kubernetes' default scale-down stabilization window is a cautious 5 minutes, to avoid flapping). Everything runs on one node, so scaling is also capped by what that node can hold.
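
How a CPU spike turns into a pod count can be sketched with the replica formula from the Kubernetes HPA documentation: desired = ceil(current pods x current utilization / target utilization), skipped when the ratio is within a tolerance of 1. The sketch below is a simplified model, not this demo's code. The function names are made up for illustration, the 0.1 tolerance is the Kubernetes default (assumed unchanged here), and stabilization windows, unready pods, and missing metrics are ignored. The 50m request, 70% target, and 2-6 range are the values quoted on this page.

```python
import math


def cpu_utilization_percent(usage_millicores, request_millicores):
    """CPU use as a percentage of the pod's request, not of a whole core."""
    return 100.0 * usage_millicores / request_millicores


def desired_replicas(current_replicas, current_util, target_util=70.0,
                     min_replicas=2, max_replicas=6, tolerance=0.1):
    """Simplified HPA calculation (assumes the default 0.1 tolerance)."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the HPA's configured range.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # A pod bursting to its 250m limit reads as 500% of its 50m request.
    burst = cpu_utilization_percent(250, 50)
    print(burst)                        # 500.0
    print(desired_replicas(2, burst))   # 6 (raw result 15, capped at 6)
    print(desired_replicas(2, 105.0))   # 3
    print(desired_replicas(2, 75.0))    # 2 (within tolerance, no change)
    print(desired_replicas(6, 10.0))    # 2 (raw result 1, floored at 2)
```

In this model a short burst at the limit asks for far more than 6 pods, which is why the count climbs straight to the maximum, while a reading only slightly above 70% changes nothing.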

What am I looking at?

Pod - the smallest unit Kubernetes deploys: one or more containers that run together. Here each pod holds a single container, i.e. one running copy of this app. Normally 2 copies share the traffic; under load Kubernetes starts more so the work is spread out.
HorizontalPodAutoscaler (HPA) - a Kubernetes controller that automatically adds or removes pods based on how busy they are. This one watches CPU, targets 70%, and ranges from 2 to 6 pods. It's what makes the count change on its own.
CPU % - how hard the app pods are working, measured against the CPU each pod reserves (its request, 50m = 0.05 of a core here), not against a whole CPU - so it can go over 100%. A pod is allowed to burst up to its limit (250m), which reads as ~500% of its 50m request. When the average crosses 70% (the green dashed line) the HPA adds pods; when it stays low, it removes them.
Redis - a fast in-memory datastore running next to the app (as a second tier). It holds the shared counter and the list of currently-active pods, so every pod sees the same state instead of each keeping its own.
Container runtime - this app is a single static Go binary in a FROM scratch image (10.9 MB; no shell, interpreter, or libc inside - there is nothing in the container to attack but the binary). The image is built with Docker, but this node runs it with containerd (k3s's built-in runtime, via Kubernetes' CRI), not the Docker daemon - Kubernetes removed dockershim, its built-in Docker Engine integration, in v1.24. Docker-built (OCI) images run unchanged.
Policy as code - an admission gate (OPA/Gatekeeper) that rejects a non-compliant workload before it runs: every container here must be non-root, drop all Linux capabilities, have a read-only root filesystem, declare CPU/memory limits, carry health probes, and pin an explicit image tag. The same default-deny rule applies whether a human or an automated change tries to weaken it. (One rule is also written as a built-in ValidatingAdmissionPolicy to compare the two engines - see the repo's policy/.)
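
The six rules in the policy entry above can be illustrated as a plain check over a container spec. This is only a sketch in Python to show the logic: the real gate here is OPA/Gatekeeper, and the repo's actual policy is not reproduced. The function name is hypothetical. Two details are assumptions: "health probes" is read as both a liveness and a readiness probe, and "explicit image tag" is read as "has a tag other than latest". Field names follow the Kubernetes container spec.

```python
def violations(container):
    """Return a list of rule violations for one container spec (a dict)."""
    problems = []
    sec = container.get("securityContext", {})

    if sec.get("runAsNonRoot") is not True:
        problems.append("must run as non-root")
    if "ALL" not in sec.get("capabilities", {}).get("drop", []):
        problems.append("must drop all Linux capabilities")
    if sec.get("readOnlyRootFilesystem") is not True:
        problems.append("root filesystem must be read-only")

    limits = container.get("resources", {}).get("limits", {})
    for resource in ("cpu", "memory"):
        if resource not in limits:
            problems.append("missing %s limit" % resource)

    # Assumption: both probe types are required.
    for probe in ("livenessProbe", "readinessProbe"):
        if probe not in container:
            problems.append("missing %s" % probe)

    # Look for a tag only in the last path segment, so a registry
    # port such as registry:5000/app is not mistaken for a tag.
    last_segment = container.get("image", "").rsplit("/", 1)[-1]
    _, sep, tag = last_segment.partition(":")
    if not sep or tag in ("", "latest"):
        problems.append("image must pin an explicit tag")

    return problems


if __name__ == "__main__":
    compliant = {
        "image": "registry:5000/k3s-demo:1.0.0",  # hypothetical image name
        "securityContext": {
            "runAsNonRoot": True,
            "readOnlyRootFilesystem": True,
            "capabilities": {"drop": ["ALL"]},
        },
        "resources": {"limits": {"cpu": "250m", "memory": "64Mi"}},
        "livenessProbe": {},
        "readinessProbe": {},
    }
    print(violations(compliant))           # []
    print(violations({"image": "nginx"}))  # every rule fails
```

Default-deny means the workload is admitted only when this list is empty, regardless of who or what submitted it.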

Seeing more than 2 pods before pressing the button? A recent load test is still scaling back down - here that takes about 30-60s (Kubernetes' default is a cautious 5 minutes; this HPA is tuned faster for the demo). Briefly seeing more than 6? During a code deploy Kubernetes runs the old and new pods at once (a zero-downtime rolling update with maxSurge), so the count can momentarily exceed the 6-pod max before settling.
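
The brief overshoot during a deploy is simple arithmetic. A rolling update may run up to replicas + maxSurge pods at once, and when maxSurge is a percentage Kubernetes rounds it up to a whole pod. The sketch below assumes the Deployment default of 25%, since this page does not state the demo's actual maxSurge; the function name is made up for illustration.

```python
import math


def max_pods_during_rollout(replicas, max_surge="25%"):
    """Upper bound on old + new pods during a rolling update.

    max_surge is either an absolute count (int) or a percentage
    string such as "25%". Percentages are rounded up.
    """
    if isinstance(max_surge, str) and max_surge.endswith("%"):
        surge = math.ceil(replicas * int(max_surge[:-1]) / 100)
    else:
        surge = int(max_surge)
    return replicas + surge


if __name__ == "__main__":
    print(max_pods_during_rollout(6))      # 8: 25% of 6 is 1.5, rounded up to 2
    print(max_pods_during_rollout(6, 1))   # 7
    print(max_pods_during_rollout(2))      # 3
```

Under that assumption, a deploy that lands while the HPA is already at 6 pods could show up to 8 for a moment before the old pods terminate.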