Introduction

Running Kubernetes in production without a testing strategy is an act of faith. You deploy workloads, apply configurations, and assume they'll behave. Sometimes they do. Then a node goes down at 3am, a deployment gets OOMKilled under load, or a container image ships with a critical CVE that nobody scanned.

Testing Kubernetes is not a single discipline. It spans four distinct problem domains: resilience, performance, security, and resource efficiency. Each requires different tools, different mental models, and different workflows. Most teams pick one and ignore the rest.

This article covers all four. We'll examine five tools — Chaos Mesh, k6, Trivy, kube-bench, and Goldilocks — in the context of a real DigitalOcean Kubernetes cluster provisioned with Terraform. For each tool we'll describe what it actually does, when to use it, what the output looks like, and how to interpret results.

No installation walkthroughs. No Helm chart configuration deep dives. Just the tools themselves — what they measure, what they tell you, and what to do with the information.

Source Code

All Terraform files, Helm configurations, k6 test scripts, and Makefile from this article are available on GitHub:

github.com/vladlevinas/Kubernetes-stresstest

The repository includes:

  • main.tf — cluster + all tools in one pass
  • variables.tf / outputs.tf
  • terraform.tfvars.example
  • k6-test.yaml — load test example
  • Makefile — shortcuts for every operation
  • README.md — quick start guide

The Test Environment

Before covering the tools, it's worth describing the infrastructure they run on. Understanding the cluster configuration helps interpret test results correctly — resource constraints on small nodes affect what's measurable and what's noise.

What Terraform Provisions

The Terraform configuration creates a complete Kubernetes testing platform on DigitalOcean in a single terraform apply. Here's what gets built:

┌─────────────────────────────────────────────────────────────────┐
│  DigitalOcean DOKS — fra1 (Frankfurt)                          │
│                                                                 │
│  Control Plane (managed by DigitalOcean, no cost)              │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Kubernetes API Server  │  etcd  │  Scheduler  │  CM     │  │
│  └───────────────────────────────────────────────────────────┘  │
│                          │                                      │
│            ┌─────────────┴──────────────┐                       │
│            ▼                            ▼                       │
│   ┌─────────────────┐        ┌─────────────────┐               │
│   │   Node 1        │        │   Node 2        │               │
│   │   s-1vcpu-2gb   │        │   s-1vcpu-2gb   │               │
│   │   1 vCPU        │        │   1 vCPU        │               │
│   │   2 GB RAM      │        │   2 GB RAM      │               │
│   │   containerd    │        │   containerd    │               │
│   └────────┬────────┘        └────────┬────────┘               │
│            │                          │                         │
│   ┌────────▼──────────────────────────▼────────┐               │
│   │              Testing Stack                 │               │
│   │                                            │               │
│   │  chaos-mesh ns    k6 ns    trivy-system ns │               │
│   │  goldilocks ns    default ns               │               │
│   └────────────────────────────────────────────┘               │
│                                                                 │
│  NodePort services: :32333 (Chaos Mesh) :32080 (Goldilocks)    │
│  Cost: ~$24/mo while running | $0 when destroyed               │
└─────────────────────────────────────────────────────────────────┘

Cluster Specifications

Property             Value
Provider             DigitalOcean DOKS
Region               fra1 (Frankfurt)
Node size            s-1vcpu-2gb
Node count           2
Kubernetes version   1.31.x
Container runtime    containerd
Control plane        Managed (DigitalOcean)
HA control plane     No
Total RAM            4 GB
Total vCPU           2
Monthly cost         ~$24


What the Terraform Creates

The configuration provisions these Kubernetes resources:

digitalocean_kubernetes_cluster     — the cluster itself
kubernetes_namespace (chaos-mesh)   — isolated namespace
helm_release (chaos-mesh)           — Chaos Mesh operator + dashboard
helm_release (k6-operator)          — k6 operator with own namespace
helm_release (trivy-operator)       — Trivy with own namespace
helm_release (goldilocks)           — Goldilocks with own namespace
kubernetes_namespace (goldilocks)   — resource advisor namespace
kubernetes_labels (default)         — enables Goldilocks on default ns
kubernetes_job (kube-bench)         — one-shot CIS audit job
kubernetes_service_account          — Chaos Mesh dashboard auth
kubernetes_cluster_role_binding     — cluster-admin for dashboard
local_file (kubeconfig.yaml)        — written to project directory

All five tools deploy independently. If one fails, the others continue. kube-bench runs immediately after cluster creation and exits — it's a one-shot job, not a long-running operator.
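The namespace-labeling resource mentioned above is a one-liner in HCL. A sketch of what it might look like — resource names are assumptions, and the authoritative version lives in main.tf in the repository; the label key is the one Goldilocks documents for opting a namespace in:

```hcl
# Opt the default namespace into Goldilocks scanning.
# (Resource name assumed; see main.tf for the real configuration.)
resource "kubernetes_labels" "default" {
  api_version = "v1"
  kind        = "Namespace"
  metadata {
    name = "default"
  }
  labels = {
    "goldilocks.fairwinds.com/enabled" = "true"
  }
}
```

Any namespace carrying this label gets VPA objects created for its workloads; removing the label stops new recommendations without touching running pods.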

Comparative Analysis

Tool Comparison Table

                     Chaos Mesh         k6                    Trivy                  kube-bench       Goldilocks
Category             Resilience         Performance           Security               Security         Efficiency
What it tests        Failure recovery   Load capacity         Image CVEs + config    CIS compliance   Resource sizing
When it runs         On demand          On demand             Continuously           One-shot         Continuously
Output type          Pod events,        Metrics, pass/fail    VulnerabilityReports   Log output       Dashboard
                     dashboard
Affects workloads?   Yes (by design)    Yes (generates load)  No                     No               No
Requires traffic     No                 Generates it          No                     No               Yes
CI/CD friendly       Partially          Yes                   Yes                    Yes              No
Dashboard            Yes (:32333)       No                    No                     No               Yes (:32080)
External access      NodePort           No                    No                     No               NodePort

Why This Node Size Matters for Testing

Two s-1vcpu-2gb nodes give 4 GB total RAM and 2 vCPUs for the entire cluster including system pods and all testing tools. This is intentional — it mirrors a constrained environment where resource pressure is real. Tests run against this backdrop:

  • Chaos experiments that kill pods are meaningful because the cluster has limited headroom
  • Load tests produce realistic throttling behavior
  • Goldilocks recommendations reflect actual resource competition
  • kube-bench results reflect a managed K8s environment with provider-controlled settings

If you test on a 16-core, 64GB cluster with nothing running, you're not testing — you're confirming that idle systems don't break.


The Four Dimensions of Kubernetes Testing

                    ┌─────────────────────────┐
                    │   Kubernetes Testing     │
                    └───────────┬─────────────┘
                                │
          ┌─────────────────────┼─────────────────────┐
          │                     │                     │
          ▼                     ▼                     ▼
   ┌─────────────┐      ┌─────────────┐      ┌─────────────┐
   │ Resilience  │      │ Performance │      │  Security   │
   │             │      │             │      │             │
   │ Chaos Mesh  │      │     k6      │      │    Trivy    │
   │             │      │             │      │  kube-bench │
   └─────────────┘      └─────────────┘      └─────────────┘
          │
          ▼
   ┌─────────────┐
   │  Efficiency │
   │             │
   │ Goldilocks  │
   └─────────────┘

Each dimension answers a different question:

  • Resilience: Does the system recover when things break?
  • Performance: Does the system hold up under real load?
  • Security: Are there known vulnerabilities or misconfigurations?
  • Efficiency: Are resources allocated correctly?

Most teams test performance. Few test resilience. Almost none test all four together.


Chaos Mesh — Resilience Testing Through Deliberate Failure

What It Is

Chaos Mesh is a cloud-native chaos engineering platform for Kubernetes. It injects faults at the infrastructure level — killing pods, corrupting network traffic, stressing CPU and memory, filling disks — and observes how the system responds.

The core insight behind chaos engineering is that distributed systems fail in ways that are impossible to predict from code review or load testing alone. The only way to know how a system behaves under failure is to make it fail, intentionally, in a controlled way.

┌────────────────────────────────────────────────────────┐
│                  Chaos Mesh Architecture               │
│                                                        │
│  Dashboard (:32333)                                    │
│       │                                                │
│       ▼                                                │
│  Chaos Controller Manager (Deployment)                 │
│       │  watches CRDs                                  │
│       ▼                                                │
│  ┌──────────────────────────────────────────────┐     │
│  │  Chaos CRDs                                  │     │
│  │  PodChaos │ NetworkChaos │ StressChaos │ ... │     │
│  └──────────────────────────────────────────────┘     │
│       │  instructs                                     │
│       ▼                                                │
│  Chaos Daemon (DaemonSet — runs on every node)         │
│       │  directly injects faults via                  │
│       ▼                                                │
│  Container Runtime (containerd)                        │
└────────────────────────────────────────────────────────┘


Fault Types

Chaos Mesh supports eight categories of chaos:

PodChaos — direct pod lifecycle manipulation:

  • pod-kill: terminates pods matching a selector
  • pod-failure: makes pods enter a failure state without killing them
  • container-kill: kills specific containers within a pod

NetworkChaos — traffic manipulation at the network layer:

  • delay: adds configurable latency to all traffic in/out of a pod
  • loss: randomly drops a percentage of packets
  • duplicate: duplicates packets
  • corrupt: corrupts packet content
  • partition: completely cuts network between pods

StressChaos — resource exhaustion:

  • CPU workers: spawns processes that consume CPU cycles
  • Memory workers: allocates and holds memory to trigger OOM conditions

HTTPChaos — HTTP-level injection:

  • Delays, aborts, and request/response modifications at Layer 7

TimeChaos — clock skew injection for testing time-sensitive logic

IOChaos — filesystem I/O fault injection: delays, errors, and attribute modifications

DNSChaos — DNS resolution failures and random errors

KernelChaos — kernel-level fault injection (requires privileged access)


Using Chaos Mesh

Chaos Mesh has two interfaces: the dashboard and Kubernetes CRDs. Both create the same underlying objects.

Via dashboard (http://NODE-IP:32333):

The dashboard provides a visual workflow: choose fault type → configure selector → set mode → set duration → submit. The "Preview of Pods to be injected" section shows exactly which pods will be targeted before you commit.

Via kubectl and YAML:

apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-nginx
  namespace: chaos-mesh
spec:
  action: pod-kill
  mode: one
  selector:
    namespaces:
      - default
    labelSelectors:
      app: nginx-test
  duration: "30s"
kubectl apply -f pod-chaos.yaml
kubectl get podchaos -n chaos-mesh
NAME              ACTION     DURATION   AGE
pod-kill-nginx    pod-kill   30s        12s

Network delay experiment:

apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-delay-100ms
  namespace: chaos-mesh
spec:
  action: delay
  mode: one
  selector:
    namespaces: [default]
    labelSelectors:
      app: nginx-test
  delay:
    latency: "100ms"
    jitter: "10ms"
  duration: "5m"

CPU stress experiment:

apiVersion: chaos-mesh.org/v1alpha1
kind: StressChaos
metadata:
  name: cpu-stress-80pct
  namespace: chaos-mesh
spec:
  mode: one
  selector:
    namespaces: [default]
    labelSelectors:
      app: nginx-test
  stressors:
    cpu:
      workers: 1
      load: 80
  duration: "3m"

Reading Results

During an experiment, watch what happens:

kubectl get pods -w --kubeconfig=kubeconfig.yaml
NAME                          READY   STATUS      RESTARTS   AGE
nginx-test-6ff8-slwrl         1/1     Running     0          5m
nginx-test-6ff8-slwrl         1/1     Terminating 0          5m
nginx-test-6ff8-abc12         0/1     Pending     0          0s
nginx-test-6ff8-abc12         0/1     ContainerCreating   0  1s
nginx-test-6ff8-abc12         1/1     Running     0          4s

This is the expected sequence for a healthy deployment. If the old pod hangs in Terminating or the replacement takes more than 30 seconds to reach Running, you have a problem: missing readiness probes, too few replicas to absorb the loss, or no PodDisruptionBudget protecting availability.
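The availability guardrail from that list can be expressed as a manifest. A minimal sketch, assuming the nginx-test Deployment from the experiments above runs with at least two replicas:

```yaml
# PodDisruptionBudget: keep at least one nginx-test pod available
# during voluntary disruptions (drains, rollouts). Pair it with a
# readiness probe so the Service only routes to pods that can serve.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-test-pdb
  namespace: default
spec:
  minAvailable: 1          # never let voluntary evictions drop below one pod
  selector:
    matchLabels:
      app: nginx-test
```

Note that a PDB guards against voluntary evictions; a direct pod-kill from Chaos Mesh bypasses it, which is exactly why the chaos experiment is worth running — it tests the path the PDB cannot protect.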

The Chaos Mesh dashboard's Events tab shows a timeline of all injected faults with start/end times, which you can correlate against monitoring data.

When to Use Chaos Mesh

Use Chaos Mesh when you need to answer these questions:

  • Does my Deployment recover automatically when pods are killed?
  • What happens to my service when a node disappears?
  • How does my application respond to 200ms added latency on downstream services?
  • Does my circuit breaker actually trip under real network conditions?
  • What's my real RTO (Recovery Time Objective) for a pod failure?

Do not use Chaos Mesh as a replacement for monitoring. It reveals failure modes; it does not tell you the business impact. Run Chaos Mesh alongside a load test to measure impact quantitatively.


k6 — Performance Testing Under Real Load

What It Is

k6 is a developer-centric load testing tool. Scripts are written in JavaScript, tests run as Kubernetes Jobs (via the k6 Operator), and results come out as structured metrics. It's purpose-built for testing services that run inside Kubernetes — no external load generator needed, no egress costs, no network hops across datacenters.

┌─────────────────────────────────────────────────────┐
│  k6 Operator Architecture                          │
│                                                     │
│  ┌─────────────┐                                   │
│  │  TestRun    │  (CRD you create)                 │
│  │  CRD        │                                   │
│  └──────┬──────┘                                   │
│         │  operator watches                        │
│         ▼                                          │
│  ┌─────────────────┐                               │
│  │  k6 Operator    │  (Deployment)                 │
│  └──────┬──────────┘                               │
│         │  creates                                 │
│         ▼                                          │
│  ┌──────────────────────────────────────┐          │
│  │  k6 Jobs (parallelism: N)            │          │
│  │  ┌────────┐ ┌────────┐ ┌────────┐   │          │
│  │  │ Job 1  │ │ Job 2  │ │ Job N  │   │          │
│  │  │ VUs:10 │ │ VUs:10 │ │ VUs:10 │   │          │
│  │  └────────┘ └────────┘ └────────┘   │          │
│  └──────────────────────────────────────┘          │
│         │  HTTP traffic                            │
│         ▼                                          │
│  ┌─────────────────┐                               │
│  │  Target Service │  (your workload)              │
│  └─────────────────┘                               │
└─────────────────────────────────────────────────────┘

Test Structure

A k6 test has three parts: options (what load to generate), default function (what to do per VU per iteration), and checks (assertions on responses).

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 10 },   // ramp up: 0 → 10 virtual users
    { duration: '1m',  target: 10 },   // hold: 10 VUs for 1 minute
    { duration: '30s', target: 0  },   // ramp down: 10 → 0
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],  // 95% of requests under 500ms
    http_req_failed:   ['rate<0.01'],  // error rate under 1%
  },
};

export default function () {
  const res = http.get('http://nginx-test.default.svc.cluster.local');
  check(res, {
    'status 200':       (r) => r.status === 200,
    'duration < 200ms': (r) => r.timings.duration < 200,
  });
  sleep(1);
}

The stages array defines a load profile. Thresholds define pass/fail criteria — if either threshold is violated, the test exits with a non-zero code, which makes it CI-friendly.
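The script itself is mounted from a ConfigMap and launched through a TestRun object. A sketch of the manifest — the ConfigMap name and parallelism value are assumptions; field names follow the k6-operator CRD:

```yaml
apiVersion: k6.io/v1alpha1
kind: TestRun
metadata:
  name: nginx-load-test
  namespace: k6
spec:
  parallelism: 1             # number of runner Jobs the VUs are split across
  script:
    configMap:
      name: k6-test-script   # ConfigMap holding the script (assumed name)
      file: test.js
```

With parallelism above 1, the operator splits the configured VUs evenly across runner Jobs — useful when a single pod on an s-1vcpu-2gb node cannot generate enough load by itself.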

Running Tests

kubectl apply -f k6-test.yaml
kubectl get testrun -n k6 -w
NAME               STAGE      AGE
nginx-load-test    started    5s
nginx-load-test    running    12s
nginx-load-test    finished   2m18s

Collect results:

kubectl logs -n k6 -l k6_cr=nginx-load-test
✓ status 200
✓ duration < 200ms

checks.........................: 100.00% ✓ 620   ✗ 0
data_received..................: 524 kB  3.9 kB/s
data_sent......................: 52 kB   388 B/s
http_req_blocked...............: avg=122µs  min=2µs    med=4µs    max=12ms
http_req_duration..............: avg=12ms   min=3ms    med=10ms   max=89ms
                              { expected_response:true }: avg=12ms
http_req_failed................: 0.00%   ✓ 0     ✗ 620
http_reqs......................: 620     4.63/s
iteration_duration.............: avg=1.01s  min=1s     med=1s     max=1.09s
vus............................: 1       min=1       max=10

Reading k6 Metrics

The most important metrics:

http_req_duration — end-to-end request latency. Look at p(95) and p(99), not avg. Average latency hides tail latency that real users experience.

http_req_failed — percentage of requests that returned errors (status >= 400 or network errors). Even 0.1% failure rate at scale is significant.

http_req_blocked — time waiting for a TCP connection slot. High values indicate connection pool exhaustion.

iteration_duration — total time per VU iteration including sleep. Use this to calculate effective throughput.

Combining k6 with Chaos

The real power of k6 in a testing lab is running it simultaneously with Chaos Mesh experiments:

Timeline:
00:00  k6 load test starts — 10 VUs hitting nginx-test
01:00  Chaos Mesh injects pod-kill on nginx-test
01:04  Pod recovered — k6 shows error spike then recovery
02:00  Chaos Mesh injects 100ms network delay
02:30  k6 p(95) climbs from 12ms to 118ms
03:00  Network chaos ends — k6 metrics normalize
04:00  k6 test ends

This workflow answers the question that neither tool answers alone: not just "does the pod restart?" but "how many requests failed during the restart, and how long did recovery take?"

When to Use k6

Use k6 when you need to answer:

  • What is my service's throughput at 10, 50, 100 concurrent users?
  • Where does latency start to degrade?
  • What's the actual error rate under load, not just under zero load?
  • Does my HPA (Horizontal Pod Autoscaler) trigger at the right time?
  • How does my service behave when a downstream dependency is slow?

k6 is not a monitoring tool and not a synthetic uptime checker. It generates sustained load to find the breaking point. Run it in your test lab, not against production, unless you have very carefully scoped test scripts.


Trivy Operator — Continuous Vulnerability Scanning

What It Is

Trivy is a vulnerability scanner. The Trivy Operator runs as a controller inside Kubernetes and automatically scans every container image deployed in the cluster. It doesn't require you to trigger scans — it watches for new workloads and scans them as they appear.

Beyond container images, Trivy also scans Kubernetes resource configurations (Deployments, ConfigMaps, RBAC objects) against a set of misconfiguration rules derived from NSA/CISA Kubernetes hardening guidance and CIS benchmarks.

┌────────────────────────────────────────────────────────┐
│  Trivy Operator Architecture                           │
│                                                        │
│  Kubernetes API Server                                 │
│       │  watches                                       │
│       ▼                                                │
│  Trivy Operator (Deployment in trivy-system ns)        │
│       │                                                │
│       ├── detects new Pod/ReplicaSet                   │
│       │       │                                        │
│       │       ▼                                        │
│       │   Scan Job (ephemeral)                         │
│       │       │  pulls image + scans                  │
│       │       ▼                                        │
│       │   VulnerabilityReport CRD                      │
│       │                                                │
│       └── detects new Deployment/ConfigMap/RBAC        │
│               │                                        │
│               ▼                                        │
│           ConfigAuditReport CRD                        │
└────────────────────────────────────────────────────────┘

Reading Vulnerability Reports

kubectl get vulnerabilityreports -A
NAMESPACE      NAME                               CRITICAL  HIGH  MEDIUM  LOW
default        replicaset-nginx-test-abc123       0         3     12      8
trivy-system   replicaset-trivy-operator-xyz456   0         1     4       2
chaos-mesh     daemonset-chaos-daemon-abc789      0         0     2       1
k6             deployment-k6-operator-def012      0         2     6       3

Drill into a specific report:

kubectl describe vulnerabilityreport replicaset-nginx-test-abc123 -n default
Spec:
  Artifact:
    Digest:     sha256:4bf0762cb...
    Repository: library/nginx
    Tag:        latest

Report:
  Vulnerabilities:
    - VulnerabilityID: CVE-2023-44487
      Severity:        HIGH
      Title:           HTTP/2 Rapid Reset Attack
      InstalledVersion: 1.25.3
      FixedVersion:    1.25.4
      Description:     ...

    - VulnerabilityID: CVE-2024-21626
      Severity:        HIGH
      Title:           runc container breakout
      InstalledVersion: 1.1.9
      FixedVersion:    1.1.12

Reading Config Audit Reports

kubectl get configauditreports -A
NAMESPACE   NAME                              CRITICAL  HIGH  MEDIUM  LOW
default     replicaset-nginx-test-abc123      0         2     3       5
kubectl describe configauditreport replicaset-nginx-test-abc123 -n default
Report:
  Checks:
    - CheckID: KSV014
      Severity: HIGH
      Title:    Root file system is not read-only
      Message:  Container 'nginx' of ReplicaSet 'nginx-test-abc123'
                should set 'securityContext.readOnlyRootFilesystem' to true

    - CheckID: KSV003
      Severity: HIGH
      Title:    No capabilities drop defined
      Message:  Container 'nginx' should drop ALL capabilities

These findings are actionable: add readOnlyRootFilesystem: true and capabilities: {drop: [ALL]} to the container security context.
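Applied to the Deployment, those two fixes are a fragment of the container spec. A sketch — if the image writes to paths like /var/cache/nginx, you would also need to mount emptyDir volumes there, which is left out here:

```yaml
# Container-level securityContext addressing KSV014 and KSV003.
securityContext:
  readOnlyRootFilesystem: true   # KSV014: no writes to the root filesystem
  capabilities:
    drop:
      - ALL                      # KSV003: start from zero capabilities
```

After redeploying, the next ConfigAuditReport generated by the operator should no longer list these two checks.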

Trivy Severity Levels

Severity   Meaning                                          Action
CRITICAL   Remote code execution, privilege escalation      Fix immediately — update image
HIGH       Significant data exposure or system compromise   Fix within sprint
MEDIUM     Limited impact, requires other conditions        Fix in regular maintenance
LOW        Minimal impact or theoretical                    Track and accept or fix
UNKNOWN    Insufficient data to score                       Investigate manually

The ignoreUnfixed: true setting (enabled in the Terraform config) filters out CVEs that have no available fix — these clutter reports without providing actionable guidance.
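In Helm-values form this is a one-line fragment. A sketch using the key exposed by the trivy-operator chart:

```yaml
trivy:
  ignoreUnfixed: true   # drop CVEs that have no released fix from reports
```

The trade-off: you lose visibility into unfixable CVEs, but every remaining finding in a report has a concrete remediation (update to FixedVersion).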

When to Use Trivy

Trivy runs continuously — you don't "use" it so much as read it periodically. Build it into your workflow:

  • Daily: check for new CRITICAL/HIGH findings via kubectl get vulnerabilityreports -A
  • Before deploying new images: add trivy image <image> to your CI pipeline
  • After major Kubernetes upgrades: re-scan all workloads for new CVEs against updated components
  • After security incidents: use ConfigAuditReports to check for misconfigurations that may have contributed

The key discipline with Trivy is not letting reports accumulate without action. A list of 200 unaddressed findings becomes noise. Triage weekly, fix CRITICAL findings same-day, and track HIGH findings in your issue tracker.


kube-bench — CIS Security Benchmark Auditing

What It Is

kube-bench runs the CIS (Center for Internet Security) Kubernetes Benchmark against your cluster. The CIS Benchmark is the industry standard for Kubernetes security hardening — 300+ checks across control plane, worker nodes, etcd, and policies.

Note: a full example report is available in the repository as kube-bench.sh (github.com/vladlevinas/Kubernetes-stresstest).

Unlike Trivy which scans workload content, kube-bench audits the cluster configuration itself: kubelet settings, API server flags, file permissions, authentication configuration, network policies, and RBAC setup.

On managed Kubernetes like DOKS, many checks are controlled by the provider and not configurable by the user. kube-bench correctly identifies these and marks them as warnings with remediation notes that say "this is controlled by your provider."

Running kube-bench

In the Terraform setup, kube-bench runs as a Kubernetes Job immediately after cluster creation. It completes in about 60 seconds and exits.

kubectl logs job/kube-bench --kubeconfig=kubeconfig.yaml
[INFO] 4 Worker Node Security Configuration
[INFO] 4.1 Worker Node Configuration Files

[PASS] 4.1.1 Ensure that the kubelet service file permissions are set to 600
[PASS] 4.1.2 Ensure that the kubelet service file ownership is set to root:root
[WARN] 4.1.3 If proxy kubeconfig file exists ensure permissions are set to 600
[PASS] 4.2.1 Ensure that the --anonymous-auth argument is set to false
[PASS] 4.2.2 Ensure that the --authorization-mode argument is not set to AlwaysAllow
[PASS] 4.2.6 Ensure that the --protect-kernel-defaults is set to true
[FAIL] 4.2.11 Ensure that the RotateKubeletServerCertificate is set to true

[INFO] 5 Kubernetes Policies
[INFO] 5.1 RBAC and Service Accounts

[WARN] 5.1.1 Ensure that the cluster-admin role is only used where required
[FAIL] 5.1.6 Ensure that Service Account Tokens are not automatically mounted
[PASS] 5.2.2 Minimize the admission of containers wishing to share the host PID
[PASS] 5.2.3 Minimize the admission of containers with added capability
[FAIL] 5.4.2 Ensure that all Namespaces have Network Policies defined

== Remediations ==
4.2.11 Edit the kubelet configuration file /var/lib/kubelet/config.yaml
       and set: RotateKubeletServerCertificate: true
       Note: On managed clusters (DOKS, GKE, EKS), this may be
       controlled by the provider.

5.1.6  Apply automountServiceAccountToken: false to service accounts
       that do not require API access.

5.4.2  Create NetworkPolicy objects for each namespace to restrict
       pod-to-pod traffic appropriately.

== Summary node ==
19 checks PASS
3  checks FAIL
4  checks WARN
0  checks INFO

Interpreting Results

kube-bench output has four categories:

PASS — configuration matches the CIS recommendation. No action needed.

FAIL — configuration does not match. Remediation is described in the output. On managed Kubernetes, some FAILs are expected because the provider controls those settings (kubelet configuration, API server flags).

WARN — check could not be fully automated or requires manual verification. The output explains what to check manually.

INFO — informational finding, no action required.

Actionable vs Non-Actionable Findings on DOKS

On DigitalOcean managed Kubernetes, expect approximately:

  • 15-20 PASS on worker node checks
  • 2-4 FAIL on checks controlled by the provider (non-actionable)
  • 3-5 FAIL on policy checks (actionable — NetworkPolicies, RBAC, service account tokens)
  • Several WARN on configuration that requires manual inspection

The actionable failures for a typical cluster are:

Check   Finding                               Fix
5.1.6   Service Account Tokens auto-mounted   Add automountServiceAccountToken: false to service accounts
5.4.2   No NetworkPolicies defined            Create default-deny NetworkPolicy per namespace
5.1.1   cluster-admin used broadly            Audit ClusterRoleBindings, replace with least-privilege roles
5.2.6   Containers running as root            Add runAsNonRoot: true to pod security context
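Two of these fixes can be expressed directly as manifests. A sketch — the ServiceAccount name is an assumption; the NetworkPolicy uses the standard empty-selector pattern for default-deny:

```yaml
# 5.4.2: default-deny NetworkPolicy. Blocks all ingress and egress
# for every pod in the namespace until explicit allow rules exist.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: default
spec:
  podSelector: {}          # empty selector = all pods in this namespace
  policyTypes:
    - Ingress
    - Egress
---
# 5.1.6: stop auto-mounting the API token for workloads that
# never need to talk to the Kubernetes API.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa             # assumed name; use your workload's SA
  namespace: default
automountServiceAccountToken: false
```

Apply default-deny carefully: it also cuts DNS egress, so most namespaces need a follow-up allow rule for kube-dns before workloads function normally.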

When to Use kube-bench

kube-bench is not a continuous monitoring tool — it's a point-in-time audit. Run it:

  • After initial cluster provisioning (already done in the Terraform setup)
  • After major Kubernetes version upgrades
  • Before production go-live or SOC2/ISO27001 audits
  • After significant RBAC or workload configuration changes
  • Quarterly as part of a security review cycle

The output doubles as a hardening checklist. Work through the FAIL items one by one, distinguishing between provider-controlled (document and accept) and user-controlled (fix).


Goldilocks — Resource Optimization via VPA Recommendations

What It Is

Goldilocks solves the resource request guessing problem. Most engineers set CPU and memory requests/limits based on intuition, copy-paste from documentation, or not at all. Both extremes are harmful: over-provisioned workloads waste money and reduce scheduling density; under-provisioned workloads get OOMKilled or CPU-throttled under load.

Goldilocks runs the Kubernetes Vertical Pod Autoscaler (VPA) in recommendation-only mode on every workload in labeled namespaces, then presents those recommendations in a dashboard. The VPA watches actual CPU and memory usage over time and produces statistically sound recommendations based on real consumption patterns.

┌─────────────────────────────────────────────────────────┐
│  Goldilocks Architecture                               │
│                                                         │
│  ┌─────────────────────────────────────┐               │
│  │  Namespace (label: goldilocks=true) │               │
│  │                                     │               │
│  │  Deployment: nginx-test             │               │
│  │  Pod: running, consuming resources  │               │
│  └──────────────┬──────────────────────┘               │
│                 │  metrics                             │
│                 ▼                                       │
│  VPA (recommendation mode only — never resizes pods)   │
│                 │  recommendations                     │
│                 ▼                                       │
│  Goldilocks Controller                                  │
│                 │  reads + aggregates                  │
│                 ▼                                       │
│  Goldilocks Dashboard (:32080)                          │
│                 │  displays per deployment             │
│                 ▼                                       │
│  ┌─────────────────────────────────┐                   │
│  │  nginx-test                     │                   │
│  │  CPU req: 15m  lim: 15m         │                   │
│  │  Mem req: 32Mi lim: 32Mi        │                   │
│  └─────────────────────────────────┘                   │
└─────────────────────────────────────────────────────────┘

Critical distinction: VPA in recommendation mode never changes anything. It only watches and suggests. No pods are restarted, no resources are modified. This is the safe way to use VPA.

Reading Goldilocks Output

Open http://NODE-IP:32080:

Namespace: default

Deployment: nginx-test
  Container: nginx

  QoS Policy: Guaranteed (request == limit)

  Current settings:   CPU req: -    CPU lim: -    Mem req: -    Mem lim: -
  Recommended:        CPU req: 15m  CPU lim: 15m  Mem req: 32Mi Mem lim: 32Mi

  Burstable policy:   CPU req: 15m  CPU lim: 1000m  Mem req: 32Mi  Mem lim: 500Mi

Goldilocks shows two recommendation modes:

Guaranteed — request equals limit. The pod gets exactly what it asks for, no more. Best for predictable, steady-state workloads. The pod is never starved below its guarantee, but CPU use is throttled at the limit and memory spikes above it are OOMKilled.

Burstable — request is the minimum guarantee, limit is the ceiling. The pod can burst above its request when node resources are available. Best for workloads with variable load patterns.
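In manifest terms, the two policies differ only in the container's resources block. A sketch using the recommended values above (the Burstable ceilings come from the dashboard's Burstable policy line; everything else in the deployment is unchanged):

```yaml
# Guaranteed: request == limit — Kubernetes assigns QoS class "Guaranteed"
resources:
  requests:
    cpu: 15m
    memory: 32Mi
  limits:
    cpu: 15m
    memory: 32Mi

# Burstable: request is the floor, limit is the ceiling — QoS class "Burstable"
resources:
  requests:
    cpu: 15m
    memory: 32Mi
  limits:
    cpu: 1000m
    memory: 500Mi
```

Pick one variant per container; the two fragments above are alternatives, not meant to coexist in a single spec.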

What the Metrics Mean

CPU request — the minimum CPU the scheduler guarantees. If you set 15m (millicores), the pod is guaranteed 1.5% of a vCPU.

CPU limit — the hard cap. CPU is throttled at this value even if the node has free capacity. Setting CPU limits too low causes CPU throttling — the application runs slowly without any visible error.

Memory request — the scheduler guarantee. Used for bin-packing decisions.

Memory limit — the hard cap. Exceeding this kills the pod with OOMKilled. Unlike CPU throttling, there's no graceful degradation — the process is terminated immediately.

Workflow: Optimize Then Test

The correct workflow for Goldilocks is iterative:

1. Deploy workload with no resource limits
2. Run k6 load test (realistic traffic)
3. Wait 10-15 minutes for VPA to collect data
4. Read Goldilocks recommendations
5. Update deployment manifests with recommended values
6. Run k6 load test again
7. Verify: no OOMKills, no CPU throttling, no degraded latency
8. Repeat for each environment (test → staging → production)
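Step 5 doesn't require hand-editing the full manifest. One option is a strategic-merge patch, applied with `kubectl patch deployment nginx-test --patch-file patch-resources.yaml` — a sketch using the recommended values (the container name is an assumption):

```yaml
# patch-resources.yaml — Guaranteed sizing from the Goldilocks recommendation
spec:
  template:
    spec:
      containers:
        - name: nginx            # container name assumed from the nginx-test deployment
          resources:
            requests:
              cpu: 15m
              memory: 32Mi
            limits:
              cpu: 15m
              memory: 32Mi
```

Note that patching resources triggers a rolling restart of the deployment, which is why step 6 re-validates under load.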

Step 6 is often skipped. Don't skip it. Adding resource limits to a previously unconstrained pod can cause surprising performance regressions if the limits are tighter than the workload's actual burst behavior.

When to Use Goldilocks

Use Goldilocks when:

  • You're setting resource limits for the first time on a new workload
  • You've been running workloads with no limits and want to add them safely
  • You're experiencing OOMKills but don't know the right memory limit
  • Nodes are running out of capacity and you suspect over-provisioning
  • You're preparing cost optimization work — right-sized requests reduce waste

Don't trust Goldilocks recommendations on workloads that have only been running for a few minutes. The VPA needs sustained traffic — ideally covering your peak load period — to produce accurate recommendations. Run your k6 load tests before reading Goldilocks output.


Comparative Analysis

Tool Comparison Table

                    Chaos Mesh             k6                    Trivy                 kube-bench      Goldilocks
Category            Resilience             Performance           Security              Security        Efficiency
What it tests       Failure recovery       Load capacity         Image CVEs + config   CIS compliance  Resource sizing
When it runs        On demand              On demand             Continuously          One-shot        Continuously
Output type         Pod events, dashboard  Metrics, pass/fail    VulnerabilityReports  Log output      Dashboard
Affects workloads?  Yes (by design)        Yes (generates load)  No                    No              No
Requires traffic    No                     Generates it          No                    No              Yes
CI/CD friendly      Partially              Yes                   Yes                   Yes             No
Dashboard           Yes (:32333)           No                    No                    No              Yes (:32080)
External access     NodePort               No                    No                    No              NodePort

Decision Guide: Which Tool for Which Problem

Problem: "We don't know if our pods recover after a crash"
→ Chaos Mesh (PodChaos, pod-kill action)

Problem: "We don't know how many users our service can handle"
→ k6 (load test with ramp-up stages and thresholds)

Problem: "Our container images might have unpatched CVEs"
→ Trivy Operator (scan all running images)

Problem: "We need to pass a security audit"
→ kube-bench (CIS benchmark, document findings)

Problem: "Our pods keep getting OOMKilled"
→ Goldilocks (read VPA memory recommendations)

Problem: "We need to prove our service handles failure gracefully"
→ Chaos Mesh + k6 running simultaneously

Problem: "We're over-spending on node capacity"
→ Goldilocks (identify over-provisioned workloads)

Problem: "Security team wants CVE report for all running workloads"
→ kubectl get vulnerabilityreports -A -o json
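The last answer dumps everything; turning it into an actionable list takes a little filtering. A minimal Node.js sketch — the field names follow the Trivy Operator's VulnerabilityReport structure, but the sample data here is illustrative, not output from a real cluster:

```javascript
// Illustrative slice of `kubectl get vulnerabilityreports -A -o json` output.
// Real reports carry many more fields; only the ones we filter on are shown.
const sample = {
  items: [
    {
      metadata: { namespace: "default", name: "replicaset-nginx-test-nginx" },
      report: {
        vulnerabilities: [
          { vulnerabilityID: "CVE-2023-44487", severity: "CRITICAL" },
          { vulnerabilityID: "CVE-2023-38545", severity: "LOW" },
        ],
      },
    },
  ],
};

// Return "namespace/workload: CVE-id" for every CRITICAL finding.
function criticalCves(reportList) {
  const findings = [];
  for (const item of reportList.items) {
    const workload = `${item.metadata.namespace}/${item.metadata.name}`;
    for (const vuln of item.report.vulnerabilities) {
      if (vuln.severity === "CRITICAL") {
        findings.push(`${workload}: ${vuln.vulnerabilityID}`);
      }
    }
  }
  return findings;
}

console.log(criticalCves(sample));
```

The same filter is one `jq` expression if you prefer shell, but a script is easier to extend into the report format your security team actually wants.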

Combining Tools for Maximum Coverage

The tools are most powerful in combination. Three workflows cover most production readiness requirements:

Workflow 1: Pre-production resilience gate

  1. Deploy workload to test cluster
  2. Run k6 at expected production load (baseline)
  3. Run Chaos Mesh pod-kill during k6 test
  4. Assert: error rate stays below 1%, recovery time under 10 seconds
  5. Run Chaos Mesh network delay (100ms) during k6 test
  6. Assert: p95 latency stays below SLA
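Step 3's fault injection is a single Chaos Mesh manifest applied while k6 is running. A hedged sketch — the namespace and label selector are assumptions for the nginx-test deployment; the field names follow the PodChaos CRD:

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-during-load
  namespace: default
spec:
  action: pod-kill
  mode: one                  # kill one matching pod per trigger
  selector:
    namespaces:
      - default
    labelSelectors:
      app: nginx-test        # assumed label on the target deployment
```

Step 5 swaps this for a NetworkChaos resource with `action: delay` and a 100ms latency spec; the assertion logic on the k6 side stays the same.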

Workflow 2: Security clearance before deployment

  1. Run trivy image <new-image> in CI — fail pipeline on CRITICAL
  2. After deploy to test cluster, check kubectl get vulnerabilityreports
  3. Run kube-bench if cluster configuration changed
  4. Review ConfigAuditReports for new misconfigurations
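Step 1 is a one-line gate in any CI system. A GitLab-style sketch (the job name and `$IMAGE` variable are placeholders); Trivy's `--exit-code` flag turns the severity filter into a pipeline gate:

```yaml
scan-image:
  stage: test
  script:
    # exits non-zero — failing the pipeline — only when CRITICAL CVEs are found
    - trivy image --severity CRITICAL --exit-code 1 "$IMAGE"
```

The same command works locally before you ever push, which is usually the cheapest place to catch a bad base image.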

Workflow 3: Resource right-sizing

  1. Deploy workload without resource limits
  2. Run k6 at peak expected load for 10 minutes
  3. Read Goldilocks dashboard
  4. Apply recommended limits to deployment
  5. Run k6 again — verify no regressions
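Step 2's load shape and step 5's regression check can both live in the k6 script's options. A sketch — the stage durations, VU counts, and 500ms SLA are illustrative assumptions, not values from the repository's k6-test.yaml:

```javascript
// Ramp to peak, hold for 10 minutes, ramp down. Thresholds fail the run
// (non-zero exit) when breached, which is what step 5 asserts on.
export const options = {
  stages: [
    { duration: '2m', target: 100 },  // ramp up to 100 virtual users
    { duration: '10m', target: 100 }, // hold at assumed peak
    { duration: '1m', target: 0 },    // ramp down
  ],
  thresholds: {
    http_req_failed: ['rate<0.01'],   // error rate under 1%
    http_req_duration: ['p(95)<500'], // assumed SLA: p95 under 500ms
  },
};
```

Run the identical script before and after applying limits; if the thresholds pass in the first run and fail in the second, the limits are tighter than the workload's real burst behavior.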

The Testing Loop

All five tools together form a testing loop that covers the full lifecycle of a workload:

┌─────────────────────────────────────────────────────────────────┐
│                      Testing Loop                              │
│                                                                 │
│  ┌─────────┐     ┌──────────┐     ┌──────────┐                │
│  │  Deploy │────►│  k6 load │────►│Goldilocks│                │
│  │workload │     │   test   │     │  sizing  │                │
│  └─────────┘     └──────────┘     └────┬─────┘                │
│       ▲                                │                       │
│       │ apply limits                   │ recommendations       │
│       └────────────────────────────────┘                       │
│                                                                 │
│  ┌──────────┐     ┌──────────┐     ┌──────────┐               │
│  │  Chaos   │     │  Trivy   │     │  kube-   │               │
│  │  Mesh    │     │  scan    │     │  bench   │               │
│  │resilience│     │ security │     │  audit   │               │
│  └──────────┘     └──────────┘     └──────────┘               │
│       │                │                │                      │
│       └────────────────┴────────────────┘                      │
│                        │                                       │
│                  fix findings                                  │
│                        │                                       │
│                        ▼                                       │
│                  production ready                              │
└─────────────────────────────────────────────────────────────────┘

No workload should reach production without passing through all five lenses. In practice, run Trivy and kube-bench first (they find blocking issues fastest), then size with Goldilocks, then validate resilience and performance with Chaos Mesh and k6.


Conclusion

Kubernetes testing is not one thing. A pod that survives chaos experiments might still fall over under load. A load-tested service might run with unpatched CVEs. A security-audited cluster might have workloads without resource limits, causing cascading evictions under traffic spikes.

The five tools in this stack cover the blind spots that individual approaches miss. More importantly, they're not expensive or complex to run — the entire stack deploys in under 10 minutes on a $24/month cluster and costs nothing when idle.

The infrastructure-as-code approach (single terraform apply, single terraform destroy) removes the friction that usually prevents teams from building test environments. There's no excuse to skip testing when the environment is disposable.


Full Terraform source code and setup guide: doc.thedevops.dev

Follow for more content on Kubernetes, AI infrastructure, and DevOps automation.