Kubernetes Autoscaling Complete Guide (Part 5): Vertical Pod Autoscaler & Resource Optimization

Series Overview

This is Part 5 of the Kubernetes Autoscaling Complete Guide series.


While Horizontal Pod Autoscaler (HPA) scales the number of pod replicas, Vertical Pod Autoscaler (VPA) optimizes resource requests and limits for individual pods. This guide explores VPA architecture, implementation strategies, safe combination with HPA, and comprehensive resource optimization techniques.

The Resource Management Challenge

The Cost of Misconfigured Resources

OVER-PROVISIONED SCENARIO:
┌─────────────────────────────────────────────────────────────┐
│  Pod Resource Configuration                                 │
│                                                              │
│  Requested: 2 CPU, 4GB RAM                                  │
│  Actual Usage: 0.3 CPU (15%), 800MB RAM (20%)              │
│                                                              │
│  Waste: 1.7 CPU (85%), 3.2GB RAM (80%)                     │
│  Monthly Cost: $120                                          │
│  Wasted Cost: $102/month per pod                            │
│                                                              │
│  With 100 pods: $10,200/month wasted                        │
└─────────────────────────────────────────────────────────────┘

UNDER-PROVISIONED SCENARIO:
┌─────────────────────────────────────────────────────────────┐
│  Pod Resource Configuration                                 │
│                                                              │
│  Requested: 0.5 CPU, 512MB RAM                              │
│  Actual Usage: 0.8 CPU (160%), 1.2GB RAM (240%)            │
│                                                              │
│  Problems:                                                   │
│  • CPU throttling → slow response times                     │
│  • OOMKilled → pod restarts                                 │
│  • Service degradation                                       │
│  • Customer impact → lost revenue                           │
└─────────────────────────────────────────────────────────────┘

VPA OPTIMIZED:
┌─────────────────────────────────────────────────────────────┐
│  Pod Resource Configuration                                 │
│                                                              │
│  Requested: 0.4 CPU, 1GB RAM                                │
│  Actual Usage: 0.35 CPU (87%), 900MB RAM (90%)             │
│                                                              │
│  Result:                                                     │
│  • 80% cost savings vs over-provisioned                     │
│  • No throttling or OOM issues                              │
│  • Optimal resource utilization                             │
└─────────────────────────────────────────────────────────────┘

Business Impact

| Metric | Without VPA | With VPA | Impact |
|---|---|---|---|
| Resource Waste | 40-70% typical | 5-15% | 60%+ cost reduction |
| OOMKilled Events | Common | Rare | Better reliability |
| CPU Throttling | Frequent | Minimal | Improved performance |
| Manual Tuning Time | Hours/week | Automated | Operational efficiency |
| Right-sizing Accuracy | Guesswork | Data-driven | Precision optimization |

Understanding Vertical Pod Autoscaler

VPA Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                        VPA ARCHITECTURE                              │
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │                   VPA ADMISSION CONTROLLER                     │ │
│  │                                                                 │ │
│  │  • Intercepts pod creation requests                            │ │
│  │  • Injects resource requests/limits                            │ │
│  │  • Works at pod admission time                                 │ │
│  └────────────────┬───────────────────────────────────────────────┘ │
│                   │                                                  │
│                   ↓                                                  │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │                   VPA RECOMMENDER                              │ │
│  │                                                                 │ │
│  │  • Monitors pod resource usage (from Metrics Server)           │ │
│  │  • Analyzes historical metrics                                 │ │
│  │  • Calculates optimal resource requests                        │ │
│  │  • Stores recommendations in VPA objects                       │ │
│  └────────────────┬───────────────────────────────────────────────┘ │
│                   │                                                  │
│                   ↓                                                  │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │                   VPA UPDATER                                  │ │
│  │                                                                 │ │
│  │  • Checks if pods need resource updates                        │ │
│  │  • Evicts pods with outdated resource configs                  │ │
│  │  • Triggers pod recreation with new resources                  │ │
│  │  • Respects PodDisruptionBudgets                              │ │
│  └────────────────┬───────────────────────────────────────────────┘ │
│                   │                                                  │
│                   ↓                                                  │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │              KUBERNETES API & METRICS                          │ │
│  │                                                                 │ │
│  │  Metrics Server → VPA Recommender → VPA Object → Updater       │ │
│  └────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘

VPA vs HPA Comparison

| Aspect | VPA | HPA |
|---|---|---|
| Scaling Direction | Vertical (resources per pod) | Horizontal (number of pods) |
| What it Changes | CPU/memory requests & limits | Replica count |
| Pod Disruption | Yes (recreation required) | No (gradual) |
| Best For | Right-sizing, cost optimization | Traffic scaling, load handling |
| Stateful Apps | Suitable | Complex |
| Response Time | Minutes (pod restart) | Seconds to minutes |
| Use Case | Unknown resource needs | Known scaling patterns |
| Combine with Other | Can combine with HPA (carefully) | Can combine with VPA |

Part 1: Installing VPA

Prerequisites

# Ensure Metrics Server is installed
kubectl get deployment metrics-server -n kube-system

# Verify metrics are available
kubectl top nodes
kubectl top pods -A
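
If kubectl top returns errors, it can help to query the Metrics API directly to confirm the API service is registered and responding (a quick sanity check; trim the output however you like):

# Query the Metrics API directly
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | head -c 300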

Installation via Manifests

# Clone VPA repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

# Install VPA components
./hack/vpa-up.sh

# Verify installation
kubectl get pods -n kube-system | grep vpa

# Expected output:
# vpa-admission-controller-xxx   1/1     Running   0          2m
# vpa-recommender-xxx            1/1     Running   0          2m
# vpa-updater-xxx                1/1     Running   0          2m

# Verify CRDs
kubectl get crd | grep verticalpodautoscaler

# Expected:
# verticalpodautoscalercheckpoints.autoscaling.k8s.io
# verticalpodautoscalers.autoscaling.k8s.io

Installation via Helm

# Add VPA Helm repository
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update

# Install VPA
helm install vpa fairwinds-stable/vpa \
  --namespace kube-system \
  --set recommender.enabled=true \
  --set updater.enabled=true \
  --set admissionController.enabled=true

# Verify installation
helm status vpa -n kube-system
kubectl get pods -n kube-system -l app.kubernetes.io/name=vpa

Installation via AWS CDK (EKS Integration)

Add to your CDK stack from Part 3:

import * as cdk from 'aws-cdk-lib';
import * as eks from 'aws-cdk-lib/aws-eks';
import { Construct } from 'constructs';

export class EksVpaStack extends cdk.Stack {
  constructor(scope: Construct, id: string, cluster: eks.Cluster, props?: cdk.StackProps) {
    super(scope, id, props);

    // Install VPA using Helm
    const vpa = cluster.addHelmChart('VPA', {
      chart: 'vpa',
      repository: 'https://charts.fairwinds.com/stable',
      namespace: 'kube-system',
      release: 'vpa',
      version: '4.4.6', // Check for latest version

      values: {
        // Recommender configuration
        recommender: {
          enabled: true,
          extraArgs: {
            'v': '4', // Verbose logging
            'pod-recommendation-min-cpu-millicores': '25', // Minimum CPU recommendation
            'pod-recommendation-min-memory-mb': '100', // Minimum memory recommendation
            'recommendation-margin-fraction': '0.15', // 15% safety margin
            'storage': 'prometheus', // Optional: Use Prometheus for history
          },
          resources: {
            requests: {
              cpu: '200m',
              memory: '512Mi',
            },
            limits: {
              cpu: '500m',
              memory: '1Gi',
            },
          },
        },

        // Updater configuration
        updater: {
          enabled: true,
          extraArgs: {
            'min-replicas': '2', // Only update deployments with 2+ replicas
            'eviction-tolerance': '0.5', // At most 50% of pods may be evicted at a time
          },
          resources: {
            requests: {
              cpu: '100m',
              memory: '256Mi',
            },
            limits: {
              cpu: '200m',
              memory: '512Mi',
            },
          },
        },

        // Admission Controller configuration
        admissionController: {
          enabled: true,
          generateCertificate: true,
          resources: {
            requests: {
              cpu: '100m',
              memory: '256Mi',
            },
            limits: {
              cpu: '200m',
              memory: '512Mi',
            },
          },
        },

        // Metrics Server dependency
        metrics: {
          enabled: false, // Assuming already installed
        },
      },
    });

    // Output VPA status check command
    new cdk.CfnOutput(this, 'VPAStatusCommand', {
      value: 'kubectl get pods -n kube-system -l app.kubernetes.io/name=vpa',
      description: 'Command to check VPA pods status',
    });
  }
}

Part 2: VPA Update Modes

VPA supports four update modes that control how it applies recommendations:

Mode 1: Off (Recommendation Only)

Use Case: Testing VPA without impacting workloads

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-off
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  updatePolicy:
    updateMode: "Off"  # Only generate recommendations

# VPA will NOT modify pods, only provide recommendations
# Check recommendations:
# kubectl describe vpa my-app-vpa-off

Benefits:

  • Safe exploration of VPA recommendations
  • No disruption to running workloads
  • Understand resource usage patterns
  • Plan resource adjustments

Example Output:

kubectl describe vpa my-app-vpa-off

# Output shows recommendations:
Recommendation:
  Container Recommendations:
    Container Name: my-app
    Lower Bound:
      Cpu:     150m
      Memory:  256Mi
    Target:
      Cpu:     300m      # Recommended request
      Memory:  512Mi     # Recommended request
    Uncapped Target:
      Cpu:     300m
      Memory:  512Mi
    Upper Bound:
      Cpu:     1
      Memory:  2Gi

Mode 2: Initial (Apply on Pod Creation Only)

Use Case: New deployments, gradual rollout

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-initial
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  updatePolicy:
    updateMode: "Initial"  # Apply only when pods are created

  resourcePolicy:
    containerPolicies:
    - containerName: my-app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi

Behavior:

  • VPA sets resource requests when pods are first created
  • No changes to existing running pods
  • Useful for new deployments or scaling events
  • Safe for production workloads

When to Use:

  • Initial deployment with unknown resource needs
  • Canary deployments
  • Blue/green deployments
  • When combined with HPA (new pods created during scale-out are right-sized at admission; see the check below)
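
To confirm the admission controller actually injected resources into a newly created pod, inspect the pod's annotations. The upstream admission controller records its changes in a vpaUpdates annotation (verify the annotation name on your VPA version):

# Show what VPA changed at admission time
kubectl get pod <pod-name> -n default \
  -o jsonpath='{.metadata.annotations.vpaUpdates}'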

Mode 3: Recreate (Apply by Restarting Pods)

Use Case: Production optimization with controlled disruption

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-recreate
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  updatePolicy:
    updateMode: "Recreate"  # VPA will evict and recreate pods

  resourcePolicy:
    containerPolicies:
    - containerName: my-app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
      mode: Auto  # VPA manages both requests and limits

# PodDisruptionBudget to control eviction rate
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: default
spec:
  minAvailable: 2  # At least 2 pods must remain available
  selector:
    matchLabels:
      app: my-app

Behavior:

  • VPA evicts pods with outdated resource configuration
  • Pods are recreated with new resource requests
  • Respects PodDisruptionBudgets
  • Gradual rollout to maintain availability

Important Considerations:

  • Disruption: Pods will be restarted
  • Stateful Apps: Handle with care (use PVCs, proper shutdown)
  • PDBs Required: Prevent cascading failures
  • Monitoring: Watch for elevated pod restart rates (see the event query below)
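
One way to keep an eye on VPA-initiated evictions is to filter events by reason. Recent VPA updaters emit events with the EvictedByVPA reason; if yours does not, fall back to the generic Evicted reason:

# Watch evictions triggered by the VPA updater
kubectl get events -A --field-selector reason=EvictedByVPA --watch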

Mode 4: Auto (Currently Equivalent to Recreate)

Status: "Auto" is an accepted update mode today, but it does not yet perform in-place updates. Until Kubernetes in-place pod resource resizing is fully supported by VPA, Auto behaves like Recreate: pods are evicted and recreated to apply new resources.

# "Auto" is valid today but currently falls back to eviction
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-auto
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  updatePolicy:
    updateMode: "Auto"  # Today: equivalent to Recreate

# Once in-place resource updates land, Auto is expected to apply
# new resources WITHOUT pod eviction

Expected Behavior (once in-place updates are supported):

  • Update pod resources without restart
  • Zero disruption
  • Immediate application of new limits

Part 3: VPA Configuration Deep Dive

Basic VPA Configuration

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: default
spec:
  # Target workload
  targetRef:
    apiVersion: apps/v1
    kind: Deployment  # Can be: Deployment, StatefulSet, DaemonSet, ReplicaSet
    name: my-app

  # Update policy
  updatePolicy:
    updateMode: "Auto"  # Off, Initial, Recreate, Auto

  # Resource policy (constraints and rules)
  resourcePolicy:
    containerPolicies:
    - containerName: '*'  # Apply to all containers, or specify name
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
      controlledResources: ["cpu", "memory"]  # What VPA should manage

      # Per-container scaling mode
      mode: Auto  # Auto (apply recommendations) or Off (opt this container out)

Advanced VPA Configuration

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: advanced-vpa
  namespace: production
  labels:
    app: my-app
    environment: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  updatePolicy:
    updateMode: "Recreate"

    # Minimum number of replicas required
    minReplicas: 2  # Don't update if fewer than 2 replicas

  resourcePolicy:
    containerPolicies:

    # Application container
    - containerName: app
      minAllowed:
        cpu: 200m
        memory: 256Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi
      controlledResources: ["cpu", "memory"]
      mode: Auto

      # Resource scaling factors
      controlledValues: RequestsAndLimits  # or RequestsOnly

    # Sidecar container (different policy)
    - containerName: sidecar
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 500m
        memory: 512Mi
      controlledResources: ["cpu", "memory"]
      mode: Auto

  # Recommender configuration
  recommenders:
  - name: custom-recommender  # Use custom recommender if deployed

Resource Policy Options Explained

controlledResources

# Option 1: Manage both CPU and memory
controlledResources: ["cpu", "memory"]

# Option 2: CPU only
controlledResources: ["cpu"]

# Option 3: Memory only
controlledResources: ["memory"]

controlledValues

# Option 1: Manage both requests and limits (default)
controlledValues: RequestsAndLimits
# VPA sets both resource requests and limits
# Limit = Request * current limit/request ratio

# Option 2: Manage requests only
controlledValues: RequestsOnly
# VPA only sets resource requests
# Limits remain as defined in pod spec

Example:

# Original pod spec:
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m     # 5x request
    memory: 512Mi # 4x request

# With controlledValues: RequestsAndLimits
# VPA recommendation: 200m CPU, 256Mi memory
# VPA sets:
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 1000m    # 5x request (ratio preserved)
    memory: 1Gi   # 4x request (ratio preserved)

# With controlledValues: RequestsOnly
# VPA sets:
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 500m     # Original limit (unchanged)
    memory: 512Mi # Original limit (unchanged)

Part 4: Combining VPA with HPA

The Challenge

VPA and HPA can conflict when both try to manage the same workload:

Conflict Scenario:
┌────────────────────────────────────────────────────────────┐
│  Time: 10:00 - High CPU usage detected                    │
│                                                             │
│  HPA: "CPU is high, scale from 3 to 6 pods"               │
│  VPA: "CPU is high, increase CPU requests from 100m to 200m"│
│                                                             │
│  Result: Both scale simultaneously                         │
│  • HPA adds 3 pods with old 100m requests                 │
│  • VPA tries to recreate all 6 pods with 200m requests    │
│  • Cascading pod restarts                                  │
│  • Service disruption                                      │
└────────────────────────────────────────────────────────────┘

Safe Combination Strategies

Strategy 1: VPA for CPU, HPA for Custom Metrics

Recommendation: Most common and safest approach

# VPA configuration
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  updatePolicy:
    updateMode: "Initial"  # Only apply on new pods (from HPA scaling)

  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 100m
        memory: 256Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
      # KEY: Only manage CPU
      controlledResources: ["cpu"]
      controlledValues: RequestsOnly

---
# HPA configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  minReplicas: 2
  maxReplicas: 20

  # KEY: Use custom metrics, NOT CPU
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"

  # Or use memory (since VPA manages CPU)
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70

Why This Works:

  • VPA optimizes CPU requests based on actual usage
  • HPA scales replicas based on request rate or memory
  • No conflict: they manage different dimensions (the quick check below verifies this)
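
A quick way to confirm the two autoscalers are not watching the same signal, using field paths from the manifests above:

# VPA: which resources it controls
kubectl get vpa my-app-vpa \
  -o jsonpath='{.spec.resourcePolicy.containerPolicies[*].controlledResources}'

# HPA: which metric types it scales on
kubectl get hpa my-app-hpa -o jsonpath='{.spec.metrics[*].type}'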

Strategy 2: VPA Off Mode + Manual Right-sizing

# VPA in recommendation-only mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-readonly
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  updatePolicy:
    updateMode: "Off"  # Recommendations only

---
# HPA manages scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  minReplicas: 3
  maxReplicas: 50

  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Process:

  1. VPA generates recommendations
  2. Review recommendations weekly/monthly
  3. Manually update deployment resource requests (see the extraction command below)
  4. HPA continues to scale horizontally

Benefits:

  • Zero conflict
  • Full control over resource changes
  • Suitable for conservative environments
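
For step 3 of the process, the current recommendation can be pulled straight from the VPA status for manual review:

# Extract the target recommendation for the first container
kubectl get vpa my-app-vpa-readonly \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'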

Strategy 3: Separate Workloads

Best Practice: Use VPA and HPA on different workloads

# VPA for stateful workloads (vertical scaling)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: database-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: postgres
  updatePolicy:
    updateMode: "Recreate"
  resourcePolicy:
    containerPolicies:
    - containerName: postgres
      minAllowed:
        cpu: 1
        memory: 2Gi
      maxAllowed:
        cpu: 8
        memory: 32Gi

---
# HPA for stateless workloads (horizontal scaling)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 5
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Configuration Matrix

| VPA Mode | HPA Metric | Result | Recommendation |
|---|---|---|---|
| Off | CPU | ✅ Safe | VPA provides insights, HPA scales |
| Initial | Custom (requests/sec) | ✅ Safe | VPA right-sizes on scale events |
| Initial | Memory | ✅ Safe | Different resources managed |
| Recreate | CPU | ⚠️ Risky | Can cause thrashing |
| Recreate | Custom | ✅ Safe | VPA updates resources, HPA scales on different metric |
| Recreate | Memory | ⚠️ Moderate | Monitor closely |

Part 5: Production VPA Examples

Example 1: Stateless Web Application

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
spec:
  replicas: 5
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        resources:
          requests:
            cpu: 100m      # Initial guess
            memory: 128Mi  # Initial guess
          limits:
            cpu: 500m
            memory: 512Mi

---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app

  updatePolicy:
    updateMode: "Recreate"

  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 1
        memory: 1Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits

---
# PDB to ensure availability during updates
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
  namespace: production
spec:
  minAvailable: 3  # Keep at least 3 pods running
  selector:
    matchLabels:
      app: web-app

Example 2: Stateful Database

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: databases
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        resources:
          requests:
            cpu: 2
            memory: 4Gi
          limits:
            cpu: 4
            memory: 16Gi
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data

  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi

---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres-vpa
  namespace: databases
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: postgres

  updatePolicy:
    updateMode: "Initial"  # Safer for stateful apps

  resourcePolicy:
    containerPolicies:
    - containerName: postgres
      minAllowed:
        cpu: 1
        memory: 2Gi
      maxAllowed:
        cpu: 8
        memory: 32Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsOnly  # Keep original limits

---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: databases
spec:
  maxUnavailable: 1  # Only 1 pod can be down at a time
  selector:
    matchLabels:
      app: postgres

Example 3: Microservices with Different Profiles

# CPU-intensive service
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: image-processor-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: image-processor

  updatePolicy:
    updateMode: "Recreate"

  resourcePolicy:
    containerPolicies:
    - containerName: processor
      minAllowed:
        cpu: 500m      # Higher CPU baseline
        memory: 256Mi
      maxAllowed:
        cpu: 8         # Allow significant CPU growth
        memory: 2Gi
      controlledResources: ["cpu", "memory"]

---
# Memory-intensive service
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: cache-service-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cache-service

  updatePolicy:
    updateMode: "Recreate"

  resourcePolicy:
    containerPolicies:
    - containerName: redis
      minAllowed:
        cpu: 100m
        memory: 1Gi       # Higher memory baseline
      maxAllowed:
        cpu: 2
        memory: 16Gi      # Allow significant memory growth
      controlledResources: ["cpu", "memory"]

---
# Balanced service
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service

  updatePolicy:
    updateMode: "Initial"  # Apply on HPA scale events

  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi
      controlledResources: ["cpu", "memory"]

Part 6: Resource Optimization Strategies

Strategy 1: Rightsizing Workflow

Phase 1: Discovery (Week 1)

# Step 1: Deploy VPA in "Off" mode for all deployments
for deployment in $(kubectl get deployments -n production -o name); do
  cat <<EOF | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: $(basename $deployment)-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: $(basename $deployment)
  updatePolicy:
    updateMode: "Off"
EOF
done

# Step 2: Wait for 7 days to collect data

# Step 3: Collect recommendations
kubectl get vpa -n production -o yaml > vpa-recommendations.yaml

# Step 4: Analyze recommendations
for vpa in $(kubectl get vpa -n production -o name); do
  echo "=== $vpa ==="
  kubectl describe $vpa -n production | grep -A 20 "Target:"
done

Phase 2: Analysis (Week 2)

# Generate resource optimization report
cat > analyze-vpa.sh <<'EOF'
#!/bin/bash

echo "VPA Recommendations Analysis"
echo "============================="
echo ""

for vpa in $(kubectl get vpa -n production -o name); do
  deployment=$(kubectl get $vpa -n production -o jsonpath='{.spec.targetRef.name}')

  echo "Deployment: $deployment"

  # Current requests
  current_cpu=$(kubectl get deployment $deployment -n production -o jsonpath='{.spec.template.spec.containers[0].resources.requests.cpu}')
  current_mem=$(kubectl get deployment $deployment -n production -o jsonpath='{.spec.template.spec.containers[0].resources.requests.memory}')

  # VPA recommendations
  target_cpu=$(kubectl get $vpa -n production -o jsonpath='{.status.recommendation.containerRecommendations[0].target.cpu}')
  target_mem=$(kubectl get $vpa -n production -o jsonpath='{.status.recommendation.containerRecommendations[0].target.memory}')

  echo "  Current: CPU=$current_cpu, Memory=$current_mem"
  echo "  Target:  CPU=$target_cpu, Memory=$target_mem"
  echo ""
done
EOF

chmod +x analyze-vpa.sh
./analyze-vpa.sh

Phase 3: Implementation (Week 3)

# Apply recommendations gradually
# Start with non-critical services

# 1. Test environment first
kubectl patch deployment my-app -n production -p '{
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "my-app",
            "resources": {
              "requests": {
                "cpu": "300m",
                "memory": "512Mi"
              }
            }
          }
        ]
      }
    }
  }
}'

# 2. Monitor for issues
kubectl top pods -n production -l app=my-app --watch

# 3. If stable, proceed with production

Strategy 2: Cluster-Wide Optimization

# Create VPA for all deployments using a script
apiVersion: v1
kind: ConfigMap
metadata:
  name: vpa-automation
  namespace: kube-system
data:
  create-vpas.sh: |
    #!/bin/bash

    # Create VPA for all deployments in specific namespaces
    NAMESPACES="production staging development"

    for ns in $NAMESPACES; do
      for deployment in $(kubectl get deployments -n $ns -o name); do
        deployment_name=$(basename $deployment)

        cat <<EOF | kubectl apply -f -
    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: ${deployment_name}-vpa
      namespace: $ns
      labels:
        managed-by: vpa-automation
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: ${deployment_name}
      updatePolicy:
        updateMode: "Initial"  # Safe default
      resourcePolicy:
        containerPolicies:
        - containerName: '*'
          minAllowed:
            cpu: 50m
            memory: 64Mi
          maxAllowed:
            cpu: 4
            memory: 8Gi
    EOF
      done
    done

    echo "VPA objects created for all deployments"

---
# CronJob to run automation weekly
apiVersion: batch/v1
kind: CronJob
metadata:
  name: vpa-optimizer
  namespace: kube-system
spec:
  schedule: "0 2 * * 0"  # Every Sunday at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: vpa-automation
          containers:
          - name: optimizer
            image: bitnami/kubectl:latest
            command:
            - /bin/bash
            - -c
            - |
              # Generate cost savings report
              echo "Weekly VPA Optimization Report"
              echo "=============================="

              total_savings=0

              for ns in production staging; do
                echo ""
                echo "Namespace: $ns"
                echo "---"

                for vpa in $(kubectl get vpa -n $ns -o name); do
                  deployment=$(kubectl get $vpa -n $ns -o jsonpath='{.spec.targetRef.name}')

                  # Calculate potential savings
                  # (This is simplified; real calculation would be more complex)

                  echo "  $deployment: Review recommendations"
                done
              done
          restartPolicy: OnFailure

Strategy 3: Cost Attribution & Showback

# Prometheus rules for cost tracking
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: resource-cost-tracking
  namespace: monitoring
spec:
  groups:
  - name: resource-costs
    interval: 5m
    rules:

    # CPU cost per namespace
    - record: namespace:cpu_cost:sum
      expr: |
        sum(
          kube_pod_container_resource_requests{resource="cpu", unit="core"}
          * 0.04  # $0.04 per CPU hour
        ) by (namespace)

    # Memory cost per namespace
    - record: namespace:memory_cost:sum
      expr: |
        sum(
          kube_pod_container_resource_requests{resource="memory", unit="byte"}
          / (1024*1024*1024)  # Convert to GB
          * 0.005  # $0.005 per GB hour
        ) by (namespace)

    # Total cost per namespace
    - record: namespace:total_cost:sum
      expr: |
        namespace:cpu_cost:sum + namespace:memory_cost:sum

    # VPA optimization potential (simplified; assumes one VPA per
    # container name per namespace)
    - record: namespace:vpa_savings_potential:sum
      expr: |
        sum(
          kube_pod_container_resource_requests{resource="cpu"}
          - on(namespace, container) group_left()
          kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource="cpu"}
        ) by (namespace)
        * 0.04  # CPU price

---
# Grafana dashboard for cost tracking (ConfigMap)
apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  cost-dashboard.json: |
    {
      "dashboard": {
        "title": "Kubernetes Cost & VPA Savings",
        "panels": [
          {
            "title": "Monthly Cost by Namespace",
            "targets": [
              {
                "expr": "namespace:total_cost:sum * 730",
                "legendFormat": "{{ namespace }}"
              }
            ]
          },
          {
            "title": "VPA Potential Savings",
            "targets": [
              {
                "expr": "namespace:vpa_savings_potential:sum * 730",
                "legendFormat": "{{ namespace }}"
              }
            ]
          }
        ]
      }
    }

Part 7: Monitoring VPA

VPA Metrics

# ServiceMonitor for VPA components
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: vpa-metrics
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: vpa
  endpoints:
  - port: metrics
    interval: 30s

---
# PrometheusRule for VPA alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: vpa-alerts
  namespace: monitoring
spec:
  groups:
  - name: vpa-health
    interval: 30s
    rules:

    # VPA recommender not running
    - alert: VPARecommenderDown
      expr: up{job="vpa-recommender"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "VPA Recommender is down"
        description: "VPA Recommender has been down for 5 minutes"

    # VPA updater not running
    - alert: VPAUpdaterDown
      expr: up{job="vpa-updater"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "VPA Updater is down"
        description: "VPA Updater has been down for 5 minutes"

    # Large discrepancy between current and recommended
    - alert: VPARecommendationMismatch
      expr: |
        (
          kube_pod_container_resource_requests{resource="cpu"}
          /
          kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource="cpu"}
        ) > 2 or
        (
          kube_pod_container_resource_requests{resource="cpu"}
          /
          kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource="cpu"}
        ) < 0.5
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: "Pod resources deviate significantly from VPA recommendation"
        description: "Pod {{ $labels.pod }} in {{ $labels.namespace }} deviates from the VPA CPU target by more than 2x"

    # OOMKilled pods that VPA should have prevented
    - alert: OOMKilledDespiteVPA
      expr: |
        increase(kube_pod_container_status_terminated_reason{reason="OOMKilled"}[1h]) > 0
        and on(pod, namespace)
        kube_verticalpodautoscaler_spec_updatepolicy_updatemode{update_mode!="Off"} == 1
      labels:
        severity: warning
      annotations:
        summary: "Pod OOMKilled despite VPA enabled"
        description: "Pod {{ $labels.pod }} was OOMKilled even though VPA is active. Review VPA maxAllowed settings."

Grafana Dashboard for VPA

# Dashboard showing VPA effectiveness
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: vpa-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  vpa-overview.json: |
    {
      "dashboard": {
        "title": "VPA Overview",
        "panels": [
          {
            "title": "VPA Recommendations vs Actual",
            "type": "graph",
            "targets": [
              {
                "expr": "kube_pod_container_resource_requests{resource='cpu'}",
                "legendFormat": "Actual - {{ pod }}"
              },
              {
                "expr": "kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource='cpu'}",
                "legendFormat": "VPA Target - {{ target_name }}"
              }
            ]
          },
          {
            "title": "VPA Update Events",
            "type": "table",
            "targets": [
              {
                "expr": "changes(kube_pod_container_resource_requests[1h])",
                "format": "table"
              }
            ]
          },
          {
            "title": "Cost Savings from VPA",
            "type": "stat",
            "targets": [
              {
                "expr": "sum(namespace:vpa_savings_potential:sum) * 730"
              }
            ]
          }
        ]
      }
    }
EOF

Part 8: Troubleshooting VPA

Common Issues

Issue 1: VPA Not Generating Recommendations

Symptoms:

kubectl describe vpa my-app-vpa

# Shows:
# Recommendation: <none>

Diagnosis:

# Check VPA recommender logs
kubectl logs -n kube-system deployment/vpa-recommender

# Check if Metrics Server is working
kubectl top pods -n default

# Verify VPA can access metrics
kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods

Solutions:

# 1. Ensure Metrics Server is installed
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# 2. Wait for sufficient data collection (minimum 24 hours)

# 3. Verify pod has resource requests defined
kubectl get deployment my-app -o yaml | grep -A 5 resources

# 4. Restart VPA recommender
kubectl rollout restart deployment/vpa-recommender -n kube-system

Issue 2: VPA Causing Excessive Pod Restarts

Symptoms:

  • Frequent pod evictions
  • Service disruption
  • High pod restart counts

Diagnosis:

# Check pod restart events
kubectl get events --field-selector reason=Evicted -n production

# View VPA updater logs
kubectl logs -n kube-system deployment/vpa-updater

# Check PodDisruptionBudget
kubectl get pdb -n production

Solutions:

# 1. Add/update PodDisruptionBudget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2  # Ensure minimum availability
  selector:
    matchLabels:
      app: my-app

---
# 2. Change VPA update mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Initial"  # Less disruptive

---
# 3. Increase minReplicas (fragment: add to the VPA spec above)
spec:
  updatePolicy:
    minReplicas: 3  # Don't update if fewer than 3 replicas

Issue 3: VPA and HPA Conflict

Symptoms:

  • Thrashing (rapid scale up/down)
  • Unexpected pod restarts
  • Resource request fluctuations

Diagnosis:

# Check both VPA and HPA status
kubectl get vpa,hpa -n production

# View scaling events
kubectl get events --sort-by='.lastTimestamp' | grep -E 'Scaled|Evicted'

# Check if both manage same resources
kubectl describe vpa my-app-vpa | grep -i controlled
kubectl describe hpa my-app-hpa | grep -i metrics

Solutions:

# Option 1: VPA for CPU, HPA for custom metrics
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      controlledResources: ["cpu"]  # VPA manages CPU only

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second  # HPA uses custom metric
      target:
        type: AverageValue
        averageValue: "1000"

---
# Option 2: Use VPA in "Off" mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Recommendations only

Part 9: Best Practices

Production Checklist

Before Enabling VPA:

  • Metrics Server installed and verified
  • Baseline metrics collected (minimum 7 days)
  • PodDisruptionBudgets configured
  • Resource limits defined in pod specs
  • Monitoring and alerting in place

VPA Configuration:

  • Start with “Off” mode for analysis
  • Set appropriate min/max bounds
  • Use “Initial” mode for safety
  • Configure PDBs for “Recreate” mode
  • Test in non-production first

When Combining VPA + HPA:

  • VPA manages different resources than HPA
  • Use “Initial” update mode
  • Monitor for conflicts
  • Document the strategy

Monitoring:

  • Track VPA recommendations vs actual (see the audit one-liner below)
  • Alert on excessive evictions
  • Monitor OOMKilled events
  • Track cost savings
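
A simple cluster-wide audit of VPA coverage and modes, using only fields shown in the manifests above:

# List every VPA with its update mode and target workload
kubectl get vpa -A -o custom-columns=\
NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
MODE:.spec.updatePolicy.updateMode,\
TARGET:.spec.targetRef.name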

Deployment Patterns

Pattern 1: Gradual Rollout

# Week 1: Analysis only
kubectl apply -f vpa-off-mode.yaml

# Week 2: Apply to test environment
kubectl apply -f vpa-initial-mode-test.yaml

# Week 3: Apply to production (low-risk services)
kubectl apply -f vpa-initial-mode-prod.yaml

# Week 4: Expand to more services
kubectl apply -f vpa-recreate-mode-prod.yaml

Pattern 2: Service Tiers

# Tier 1: Critical services - VPA Off mode
# (manual review required)

# Tier 2: Important services - VPA Initial mode
# (apply on scale events only)

# Tier 3: Standard services - VPA Recreate mode
# (automatic updates with PDB protection)
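
One way to implement the tiers is to label each Deployment with its tier and generate matching VPAs; vpa-tier is a naming convention assumed here for illustration, not a VPA feature:

# Generate Recreate-mode VPAs for "standard" tier deployments
for deployment in $(kubectl get deployments -n production -l vpa-tier=standard -o name); do
  name=$(basename $deployment)
  cat <<EOF | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: ${name}-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${name}
  updatePolicy:
    updateMode: "Recreate"
EOF
done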

Key Takeaways

VPA Value Proposition

  1. Cost Optimization: 40-70% reduction in wasted resources
  2. Performance: Right-sized pods perform better
  3. Automation: Reduces manual resource tuning effort
  4. Reliability: Prevents OOMKilled events

When to Use VPA

Good Fit:

  • Unknown resource requirements
  • Variable workload patterns
  • Stateful applications
  • Long-running services
  • Cost optimization initiatives

Not Recommended:

  • Short-lived jobs (insufficient data)
  • Rapidly fluctuating workloads (recommendations lag behind, causing frequent restarts)
  • Critical services without PDBs
  • When combined with HPA on the same metric

VPA Mode Selection Guide

| Scenario | Recommended Mode | Rationale |
|---|---|---|
| Initial deployment | Off → Initial | Learn first, then apply |
| Stateless apps | Recreate | Safe with PDBs |
| Stateful apps | Initial | Minimize disruption |
| Critical services | Off | Manual control |
| With HPA | Initial + custom HPA metrics | Avoid conflicts |
| Testing | Off | No impact |

Conclusion

Vertical Pod Autoscaler is a powerful tool for resource optimization in Kubernetes, enabling:

  1. Automated Right-Sizing: Data-driven resource allocation
  2. Cost Reduction: Eliminate over-provisioning waste
  3. Performance Improvement: Prevent throttling and OOMKills
  4. Operational Efficiency: Reduce manual tuning effort

Implementation Roadmap

Month 1: Foundation

  • Install VPA components
  • Deploy in “Off” mode cluster-wide
  • Collect baseline recommendations

Month 2: Testing

  • Enable “Initial” mode in test environment
  • Validate recommendations
  • Establish monitoring

Month 3: Production

  • Gradual rollout to production
  • Start with non-critical services
  • Expand based on success

Month 4: Optimization

  • Fine-tune min/max bounds
  • Combine with HPA where appropriate
  • Measure cost savings

Next Steps

  1. Install VPA: Follow installation guide for your platform
  2. Start Small: Enable “Off” mode for a few deployments
  3. Analyze Data: Review recommendations after 7 days
  4. Implement Gradually: Move to “Initial” or “Recreate” mode
  5. Monitor & Iterate: Track savings and adjust

VPA transforms resource management from guesswork to data-driven optimization, delivering significant cost savings while improving application reliability. Combined with HPA and Cluster Autoscaler, it completes the Kubernetes autoscaling toolkit.

Happy optimizing! 💰📊