Series Overview
This is Part 5 of the Kubernetes Autoscaling Complete Guide series:
- Part 1: Horizontal Pod Autoscaler - Application-level autoscaling theory
- Part 2: Cluster Autoscaling & Cloud Providers - Infrastructure-level autoscaling
- Part 3: Hands-On HPA Demo - Practical implementation with Apache-PHP
- Part 4: Monitoring, Alerting & Threshold Tuning - Production observability
- Part 5 (This Post): VPA & Resource Optimization - Right-sizing and cost optimization
While Horizontal Pod Autoscaler (HPA) scales the number of pod replicas, Vertical Pod Autoscaler (VPA) optimizes resource requests and limits for individual pods. This guide explores VPA architecture, implementation strategies, safe combination with HPA, and comprehensive resource optimization techniques.
The Resource Management Challenge
The Cost of Misconfigured Resources
OVER-PROVISIONED SCENARIO:
┌─────────────────────────────────────────────────────────────┐
│ Pod Resource Configuration │
│ │
│ Requested: 2 CPU, 4GB RAM │
│ Actual Usage: 0.3 CPU (15%), 800MB RAM (20%) │
│ │
│ Waste: 1.7 CPU (85%), 3.2GB RAM (80%) │
│ Monthly Cost: $120 │
│ Wasted Cost: $102/month per pod │
│ │
│ With 100 pods: $10,200/month wasted │
└─────────────────────────────────────────────────────────────┘
UNDER-PROVISIONED SCENARIO:
┌─────────────────────────────────────────────────────────────┐
│ Pod Resource Configuration │
│ │
│ Requested: 0.5 CPU, 512MB RAM │
│ Actual Usage: 0.8 CPU (160%), 1.2GB RAM (240%) │
│ │
│ Problems: │
│ • CPU throttling → slow response times │
│ • OOMKilled → pod restarts │
│ • Service degradation │
│ • Customer impact → lost revenue │
└─────────────────────────────────────────────────────────────┘
VPA OPTIMIZED:
┌─────────────────────────────────────────────────────────────┐
│ Pod Resource Configuration │
│ │
│ Requested: 0.4 CPU, 1GB RAM │
│ Actual Usage: 0.35 CPU (87%), 900MB RAM (90%) │
│ │
│ Result: │
│ • 80% cost savings vs over-provisioned │
│ • No throttling or OOM issues │
│ • Optimal resource utilization │
└─────────────────────────────────────────────────────────────┘
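The waste figures in these boxes follow from simple arithmetic. A minimal sketch using the illustrative numbers above (the $120/month pod price is an assumption, and CPU is treated as the dominant cost driver, as the box does):

```shell
# Sketch of the waste arithmetic above. All numbers are the
# illustrative figures from the boxes, not real billing data.
req_cpu=2.0; use_cpu=0.3          # cores requested vs. actually used
monthly_cost=120                  # assumed $/month for the full request
pods=100

# Treat the unused CPU fraction as the wasted share of the pod's cost.
waste=$(awk -v r=$req_cpu -v u=$use_cpu -v c=$monthly_cost \
  'BEGIN { printf "%.0f", c * (1 - u / r) }')

echo "wasted per pod: \$${waste}/month"
echo "fleet of $pods pods: \$$((waste * pods))/month"
```

Running it reproduces the box's figures: $102/month wasted per pod, $10,200/month across a 100-pod fleet.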
Business Impact
| Metric | Without VPA | With VPA | Impact |
|---|---|---|---|
| Resource Waste | 40-70% typical | 5-15% | 60%+ cost reduction |
| OOMKilled Events | Common | Rare | Better reliability |
| CPU Throttling | Frequent | Minimal | Improved performance |
| Manual Tuning Time | Hours/week | Automated | Operational efficiency |
| Right-sizing Accuracy | Guesswork | Data-driven | Precision optimization |
Understanding Vertical Pod Autoscaler
VPA Architecture
┌──────────────────────────────────────────────────────────────────────┐
│ VPA ARCHITECTURE │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ VPA ADMISSION CONTROLLER │ │
│ │ │ │
│ │ • Intercepts pod creation requests │ │
│ │ • Injects resource requests/limits │ │
│ │ • Works at pod admission time │ │
│ └────────────────┬───────────────────────────────────────────────┘ │
│ │ │
│ ↓ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ VPA RECOMMENDER │ │
│ │ │ │
│ │ • Monitors pod resource usage (from Metrics Server) │ │
│ │ • Analyzes historical metrics │ │
│ │ • Calculates optimal resource requests │ │
│ │ • Stores recommendations in VPA objects │ │
│ └────────────────┬───────────────────────────────────────────────┘ │
│ │ │
│ ↓ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ VPA UPDATER │ │
│ │ │ │
│ │ • Checks if pods need resource updates │ │
│ │ • Evicts pods with outdated resource configs │ │
│ │ • Triggers pod recreation with new resources │ │
│ │ • Respects PodDisruptionBudgets │ │
│ └────────────────┬───────────────────────────────────────────────┘ │
│ │ │
│ ↓ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ KUBERNETES API & METRICS │ │
│ │ │ │
│ │ Metrics Server → VPA Recommender → VPA Object → Updater │ │
│ └────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
VPA vs HPA Comparison
| Aspect | VPA | HPA |
|---|---|---|
| Scaling Direction | Vertical (resources per pod) | Horizontal (number of pods) |
| What it Changes | CPU/memory requests & limits | Replica count |
| Pod Disruption | Yes (recreation required) | No (gradual) |
| Best For | Right-sizing, cost optimization | Traffic scaling, load handling |
| Stateful Apps | Suitable | Complex |
| Response Time | Minutes (pod restart) | Seconds to minutes |
| Use Case | Unknown resource needs | Known scaling patterns |
| Combine with Other | Can combine with HPA (carefully) | Can combine with VPA |
Part 1: Installing VPA
Prerequisites
# Ensure Metrics Server is installed
kubectl get deployment metrics-server -n kube-system

# Verify metrics are available
kubectl top nodes
kubectl top pods -A
Installation via Manifests
# Clone VPA repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

# Install VPA components
./hack/vpa-up.sh

# Verify installation
kubectl get pods -n kube-system | grep vpa

# Expected output:
# vpa-admission-controller-xxx   1/1   Running   0   2m
# vpa-recommender-xxx            1/1   Running   0   2m
# vpa-updater-xxx                1/1   Running   0   2m

# Verify CRDs
kubectl get crd | grep verticalpodautoscaler

# Expected:
# verticalpodautoscalercheckpoints.autoscaling.k8s.io
# verticalpodautoscalers.autoscaling.k8s.io
Installation via Helm
# Add VPA Helm repository
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update

# Install VPA
helm install vpa fairwinds-stable/vpa \
  --namespace kube-system \
  --set recommender.enabled=true \
  --set updater.enabled=true \
  --set admissionController.enabled=true

# Verify installation
helm status vpa -n kube-system
kubectl get pods -n kube-system -l app.kubernetes.io/name=vpa
Installation via AWS CDK (EKS Integration)
Add to your CDK stack from Part 3:
import * as cdk from 'aws-cdk-lib';
import * as eks from 'aws-cdk-lib/aws-eks';
import { Construct } from 'constructs';

export class EksVpaStack extends cdk.Stack {
  constructor(scope: Construct, id: string, cluster: eks.Cluster, props?: cdk.StackProps) {
    super(scope, id, props);

    // Install VPA using Helm
    cluster.addHelmChart('VPA', {
      chart: 'vpa',
      repository: 'https://charts.fairwinds.com/stable',
      namespace: 'kube-system',
      release: 'vpa',
      version: '4.4.6', // Check for latest version

      values: {
        // Recommender configuration
        recommender: {
          enabled: true,
          extraArgs: {
            'v': '4', // Verbose logging
            'pod-recommendation-min-cpu-millicores': '25', // Minimum CPU recommendation
            'pod-recommendation-min-memory-mb': '100', // Minimum memory recommendation
            'recommendation-margin-fraction': '0.15', // 15% safety margin
            'storage': 'prometheus', // Optional: use Prometheus for history (pair with prometheus-address)
          },
          resources: {
            requests: { cpu: '200m', memory: '512Mi' },
            limits: { cpu: '500m', memory: '1Gi' },
          },
        },

        // Updater configuration
        updater: {
          enabled: true,
          extraArgs: {
            'min-replicas': '2', // Only update workloads with 2+ replicas
            'eviction-tolerance': '0.5', // At most 50% of pods may be evicted at once
          },
          resources: {
            requests: { cpu: '100m', memory: '256Mi' },
            limits: { cpu: '200m', memory: '512Mi' },
          },
        },

        // Admission Controller configuration
        admissionController: {
          enabled: true,
          generateCertificate: true,
          resources: {
            requests: { cpu: '100m', memory: '256Mi' },
            limits: { cpu: '200m', memory: '512Mi' },
          },
        },

        // Metrics Server dependency
        metrics: {
          enabled: false, // Assuming Metrics Server is already installed
        },
      },
    });

    // Output VPA status check command
    new cdk.CfnOutput(this, 'VPAStatusCommand', {
      value: 'kubectl get pods -n kube-system -l app.kubernetes.io/name=vpa',
      description: 'Command to check VPA pods status',
    });
  }
}
Part 2: VPA Update Modes
VPA supports four update modes that control how it applies recommendations:
Mode 1: Off (Recommendation Only)
Use Case: Testing VPA without impacting workloads
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-off
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  updatePolicy:
    updateMode: "Off"  # Only generate recommendations

# VPA will NOT modify pods, only provide recommendations
# Check recommendations:
#   kubectl describe vpa my-app-vpa-off
Benefits:
- Safe exploration of VPA recommendations
- No disruption to running workloads
- Understand resource usage patterns
- Plan resource adjustments
Example Output:
kubectl describe vpa my-app-vpa-off

# Output shows recommendations:
Recommendation:
  Container Recommendations:
    Container Name:  my-app
    Lower Bound:
      Cpu:     150m
      Memory:  256Mi
    Target:
      Cpu:     300m   # Recommended request
      Memory:  512Mi  # Recommended request
    Uncapped Target:
      Cpu:     300m
      Memory:  512Mi
    Upper Bound:
      Cpu:     1
      Memory:  2Gi
Mode 2: Initial (Apply on Pod Creation Only)
Use Case: New deployments, gradual rollout
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-initial
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  updatePolicy:
    updateMode: "Initial"  # Apply only when pods are created

  resourcePolicy:
    containerPolicies:
      - containerName: my-app
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
Behavior:
- VPA sets resource requests when pods are first created
- No changes to existing running pods
- Useful for new deployments or scaling events
- Safe for production workloads
When to Use:
- Initial deployment with unknown resource needs
- Canary deployments
- Blue/green deployments
- When combined with HPA (pods recreated during scale events)
Mode 3: Recreate (Apply by Restarting Pods)
Use Case: Production optimization with controlled disruption
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-recreate
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  updatePolicy:
    updateMode: "Recreate"  # VPA will evict and recreate pods

  resourcePolicy:
    containerPolicies:
      - containerName: my-app
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
        controlledResources: ["cpu", "memory"]
        mode: Auto  # Auto = VPA manages this container (use Off to exclude it)

# PodDisruptionBudget to control eviction rate
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: default
spec:
  minAvailable: 2  # At least 2 pods must remain available
  selector:
    matchLabels:
      app: my-app
Behavior:
- VPA evicts pods with outdated resource configuration
- Pods are recreated with new resource requests
- Respects PodDisruptionBudgets
- Gradual rollout to maintain availability
Important Considerations:
- Disruption: Pods will be restarted
- Stateful Apps: Handle with care (use PVCs, proper shutdown)
- PDBs Required: Prevent cascading failures
- Monitoring: Watch for elevated pod restart rates
Mode 4: Auto (Eviction-Based Today, In-Place Planned)
Status: "Auto" is a valid update mode today, but it currently behaves like Recreate — updates are still applied by evicting pods. True in-place updates depend on the Kubernetes in-place pod resize feature and are not yet used by VPA.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-auto
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  updatePolicy:
    updateMode: "Auto"  # Today: equivalent to Recreate (pod eviction)

# Once VPA supports Kubernetes in-place resource updates,
# "Auto" is intended to apply new resources WITHOUT pod eviction
Expected Behavior (once in-place updates land):
- Update pod resources without restart
- Zero disruption
- Immediate application of new limits
Part 3: VPA Configuration Deep Dive
Basic VPA Configuration
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: default
spec:
  # Target workload
  targetRef:
    apiVersion: apps/v1
    kind: Deployment  # Can be: Deployment, StatefulSet, DaemonSet, ReplicaSet
    name: my-app

  # Update policy
  updatePolicy:
    updateMode: "Auto"  # Off, Initial, Recreate, Auto

  # Resource policy (constraints and rules)
  resourcePolicy:
    containerPolicies:
      - containerName: '*'  # Apply to all containers, or specify a name
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
        controlledResources: ["cpu", "memory"]  # What VPA should manage

        # Per-container scaling mode
        mode: Auto  # Auto (VPA manages this container) or Off (exclude it)
Advanced VPA Configuration
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: advanced-vpa
  namespace: production
  labels:
    app: my-app
    environment: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  updatePolicy:
    updateMode: "Recreate"

    # Minimum number of replicas required
    minReplicas: 2  # Don't update if fewer than 2 replicas

  resourcePolicy:
    containerPolicies:

      # Application container
      - containerName: app
        minAllowed:
          cpu: 200m
          memory: 256Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
        mode: Auto

        # What VPA sets: requests and limits, or requests only
        controlledValues: RequestsAndLimits  # or RequestsOnly

      # Sidecar container (different policy)
      - containerName: sidecar
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 500m
          memory: 512Mi
        controlledResources: ["cpu", "memory"]
        mode: Auto

  # Recommender configuration
  recommenders:
    - name: custom-recommender  # Use custom recommender if deployed
Resource Policy Options Explained
controlledResources
# Option 1: Manage both CPU and memory
controlledResources: ["cpu", "memory"]

# Option 2: CPU only
controlledResources: ["cpu"]

# Option 3: Memory only
controlledResources: ["memory"]
controlledValues
# Option 1: Manage both requests and limits (default)
controlledValues: RequestsAndLimits
# VPA sets both resource requests and limits
# Limit = Request * original limit/request ratio

# Option 2: Manage requests only
controlledValues: RequestsOnly
# VPA only sets resource requests
# Limits remain as defined in the pod spec
Example:
# Original pod spec:
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m      # 5x request
    memory: 512Mi  # 4x request

# With controlledValues: RequestsAndLimits
# VPA recommendation: 200m CPU, 256Mi memory
# VPA sets:
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 1000m     # 5x request (ratio preserved)
    memory: 1Gi    # 4x request (ratio preserved)

# With controlledValues: RequestsOnly
# VPA sets:
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 500m      # Original limit (unchanged)
    memory: 512Mi  # Original limit (unchanged)
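The ratio-preserving rule for RequestsAndLimits can be sketched in a couple of lines, using the CPU numbers from the example above:

```shell
# Sketch: with RequestsAndLimits, VPA keeps the original
# limit/request ratio when it raises the request.
orig_req=100; orig_lim=500   # millicores from the original pod spec (5x ratio)
new_req=200                  # VPA's recommended request

new_lim=$((new_req * orig_lim / orig_req))
echo "new limit: ${new_lim}m"
```

With a 5x ratio and a 200m recommended request, the computed limit is 1000m, matching the example.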
Part 4: Combining VPA with HPA
The Challenge
VPA and HPA can conflict when both try to manage the same workload:
Conflict Scenario:
┌────────────────────────────────────────────────────────────┐
│ Time: 10:00 - High CPU usage detected │
│ │
│ HPA: "CPU is high, scale from 3 to 6 pods" │
│ VPA: "CPU is high, increase CPU requests from 100m to 200m"│
│ │
│ Result: Both scale simultaneously │
│ • HPA adds 3 pods with old 100m requests │
│ • VPA tries to recreate all 6 pods with 200m requests │
│ • Cascading pod restarts │
│ • Service disruption │
└────────────────────────────────────────────────────────────┘
Safe Combination Strategies
Strategy 1: VPA for CPU, HPA for Custom Metrics
Recommendation: Most common and safest approach
# VPA configuration
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  updatePolicy:
    updateMode: "Initial"  # Only apply on new pods (from HPA scaling)

  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 100m
          memory: 256Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
        # KEY: Only manage CPU
        controlledResources: ["cpu"]
        controlledValues: RequestsOnly

---
# HPA configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  minReplicas: 2
  maxReplicas: 20

  # KEY: Use custom metrics, NOT CPU
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"

    # Or use memory (since VPA manages CPU)
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
Why This Works:
- VPA optimizes CPU requests based on actual usage
- HPA scales replicas based on request rate or memory
- No conflict: they manage different dimensions
Strategy 2: VPA Off Mode + Manual Right-sizing
# VPA in recommendation-only mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-readonly
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  updatePolicy:
    updateMode: "Off"  # Recommendations only

---
# HPA manages scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  minReplicas: 3
  maxReplicas: 50

  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Process:
- VPA generates recommendations
- Review recommendations weekly/monthly
- Manually update deployment resource requests
- HPA continues to scale horizontally
Benefits:
- Zero conflict
- Full control over resource changes
- Suitable for conservative environments
Strategy 3: Separate Workloads
Best Practice: Use VPA and HPA on different workloads
# VPA for stateful workloads (vertical scaling)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: database-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: postgres
  updatePolicy:
    updateMode: "Recreate"
  resourcePolicy:
    containerPolicies:
      - containerName: postgres
        minAllowed:
          cpu: 1
          memory: 2Gi
        maxAllowed:
          cpu: 8
          memory: 32Gi

---
# HPA for stateless workloads (horizontal scaling)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 5
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Configuration Matrix
| VPA Mode | HPA Metric | Result | Recommendation |
|---|---|---|---|
| Off | CPU | ✅ Safe | VPA provides insights, HPA scales |
| Initial | Custom (requests/sec) | ✅ Safe | VPA right-sizes on scale events |
| Initial | Memory | ✅ Safe | Different resources managed |
| Recreate | CPU | ⚠️ Risky | Can cause thrashing |
| Recreate | Custom | ✅ Safe | VPA updates resources, HPA scales on different metric |
| Recreate | Memory | ⚠️ Moderate | Monitor closely |
Part 5: Production VPA Examples
Example 1: Stateless Web Application
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
spec:
  replicas: 5
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
          resources:
            requests:
              cpu: 100m      # Initial guess
              memory: 128Mi  # Initial guess
            limits:
              cpu: 500m
              memory: 512Mi

---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app

  updatePolicy:
    updateMode: "Recreate"

  resourcePolicy:
    containerPolicies:
      - containerName: nginx
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 1
          memory: 1Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits

---
# PDB to ensure availability during updates
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
  namespace: production
spec:
  minAvailable: 3  # Keep at least 3 pods running
  selector:
    matchLabels:
      app: web-app
Example 2: Stateful Database
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: databases
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          resources:
            requests:
              cpu: 2
              memory: 4Gi
            limits:
              cpu: 4
              memory: 16Gi
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data

  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi

---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres-vpa
  namespace: databases
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: postgres

  updatePolicy:
    updateMode: "Initial"  # Safer for stateful apps

  resourcePolicy:
    containerPolicies:
      - containerName: postgres
        minAllowed:
          cpu: 1
          memory: 2Gi
        maxAllowed:
          cpu: 8
          memory: 32Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsOnly  # Keep original limits

---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: databases
spec:
  maxUnavailable: 1  # Only 1 pod can be down at a time
  selector:
    matchLabels:
      app: postgres
Example 3: Microservices with Different Profiles
# CPU-intensive service
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: image-processor-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: image-processor

  updatePolicy:
    updateMode: "Recreate"

  resourcePolicy:
    containerPolicies:
      - containerName: processor
        minAllowed:
          cpu: 500m      # Higher CPU baseline
          memory: 256Mi
        maxAllowed:
          cpu: 8         # Allow significant CPU growth
          memory: 2Gi
        controlledResources: ["cpu", "memory"]

---
# Memory-intensive service
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: cache-service-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cache-service

  updatePolicy:
    updateMode: "Recreate"

  resourcePolicy:
    containerPolicies:
      - containerName: redis
        minAllowed:
          cpu: 100m
          memory: 1Gi    # Higher memory baseline
        maxAllowed:
          cpu: 2
          memory: 16Gi   # Allow significant memory growth
        controlledResources: ["cpu", "memory"]

---
# Balanced service
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service

  updatePolicy:
    updateMode: "Initial"  # Apply on HPA scale events

  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 2Gi
        controlledResources: ["cpu", "memory"]
Part 6: Resource Optimization Strategies
Strategy 1: Rightsizing Workflow
Phase 1: Discovery (Week 1)
# Step 1: Deploy VPA in "Off" mode for all deployments
for deployment in $(kubectl get deployments -n production -o name); do
  cat <<EOF | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: $(basename $deployment)-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: $(basename $deployment)
  updatePolicy:
    updateMode: "Off"
EOF
done

# Step 2: Wait for 7 days to collect data

# Step 3: Collect recommendations
kubectl get vpa -n production -o yaml > vpa-recommendations.yaml

# Step 4: Analyze recommendations
for vpa in $(kubectl get vpa -n production -o name); do
  echo "=== $vpa ==="
  kubectl describe $vpa -n production | grep -A 20 "Target:"
done
Phase 2: Analysis (Week 2)
# Generate resource optimization report
cat > analyze-vpa.sh <<'EOF'
#!/bin/bash

echo "VPA Recommendations Analysis"
echo "============================"
echo ""

for vpa in $(kubectl get vpa -n production -o name); do
  deployment=$(kubectl get $vpa -n production -o jsonpath='{.spec.targetRef.name}')

  echo "Deployment: $deployment"

  # Current requests
  current_cpu=$(kubectl get deployment $deployment -n production -o jsonpath='{.spec.template.spec.containers[0].resources.requests.cpu}')
  current_mem=$(kubectl get deployment $deployment -n production -o jsonpath='{.spec.template.spec.containers[0].resources.requests.memory}')

  # VPA recommendations
  target_cpu=$(kubectl get $vpa -n production -o jsonpath='{.status.recommendation.containerRecommendations[0].target.cpu}')
  target_mem=$(kubectl get $vpa -n production -o jsonpath='{.status.recommendation.containerRecommendations[0].target.memory}')

  echo "  Current: CPU=$current_cpu, Memory=$current_mem"
  echo "  Target:  CPU=$target_cpu, Memory=$target_mem"
  echo ""
done
EOF

chmod +x analyze-vpa.sh
./analyze-vpa.sh
Phase 3: Implementation (Week 3)
# Apply recommendations gradually
# Start with non-critical services

# 1. Apply the recommended requests (validate in a test environment first)
kubectl patch deployment my-app -n production -p '{
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "my-app",
            "resources": {
              "requests": {
                "cpu": "300m",
                "memory": "512Mi"
              }
            }
          }
        ]
      }
    }
  }
}'

# 2. Monitor for issues (kubectl top has no --watch flag; use watch)
watch kubectl top pods -n production -l app=my-app

# 3. If stable, proceed with the remaining services
Strategy 2: Cluster-Wide Optimization
# Create VPA for all deployments using a script
apiVersion: v1
kind: ConfigMap
metadata:
  name: vpa-automation
  namespace: kube-system
data:
  create-vpas.sh: |
    #!/bin/bash

    # Create VPA for all deployments in specific namespaces
    NAMESPACES="production staging development"

    for ns in $NAMESPACES; do
      for deployment in $(kubectl get deployments -n $ns -o name); do
        deployment_name=$(basename $deployment)

        cat <<EOF | kubectl apply -f -
    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: ${deployment_name}-vpa
      namespace: $ns
      labels:
        managed-by: vpa-automation
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: ${deployment_name}
      updatePolicy:
        updateMode: "Initial"  # Safe default
      resourcePolicy:
        containerPolicies:
          - containerName: '*'
            minAllowed:
              cpu: 50m
              memory: 64Mi
            maxAllowed:
              cpu: 4
              memory: 8Gi
    EOF
      done
    done

    echo "VPA objects created for all deployments"

---
# CronJob to run automation weekly
apiVersion: batch/v1
kind: CronJob
metadata:
  name: vpa-optimizer
  namespace: kube-system
spec:
  schedule: "0 2 * * 0"  # Every Sunday at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: vpa-automation
          containers:
            - name: optimizer
              image: bitnami/kubectl:latest
              command:
                - /bin/bash
                - -c
                - |
                  # Generate cost savings report
                  echo "Weekly VPA Optimization Report"
                  echo "=============================="

                  for ns in production staging; do
                    echo ""
                    echo "Namespace: $ns"
                    echo "---"

                    for vpa in $(kubectl get vpa -n $ns -o name); do
                      deployment=$(kubectl get $vpa -n $ns -o jsonpath='{.spec.targetRef.name}')

                      # Calculate potential savings
                      # (This is simplified; a real calculation would be more complex)

                      echo "  $deployment: Review recommendations"
                    done
                  done
          restartPolicy: OnFailure
Strategy 3: Cost Attribution & Showback
# Prometheus rules for cost tracking
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: resource-cost-tracking
  namespace: monitoring
spec:
  groups:
    - name: resource-costs
      interval: 5m
      rules:

        # CPU cost per namespace ($/hour)
        - record: namespace:cpu_cost:sum
          expr: |
            sum(
              kube_pod_container_resource_requests{resource="cpu", unit="core"}
              * 0.04  # $0.04 per CPU hour
            ) by (namespace)

        # Memory cost per namespace ($/hour)
        - record: namespace:memory_cost:sum
          expr: |
            sum(
              kube_pod_container_resource_requests{resource="memory", unit="byte"}
              / (1024*1024*1024)  # Convert to GB
              * 0.005             # $0.005 per GB hour
            ) by (namespace)

        # Total cost per namespace
        - record: namespace:total_cost:sum
          expr: |
            namespace:cpu_cost:sum + namespace:memory_cost:sum

        # VPA optimization potential: summed requests minus per-pod
        # recommendations (rough; scale recommendations by replica
        # count for an accurate figure)
        - record: namespace:vpa_savings_potential:sum
          expr: |
            (
              sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace)
              -
              sum(kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource="cpu"}) by (namespace)
            )
            * 0.04  # CPU price

---
# Grafana dashboard for cost tracking (ConfigMap)
apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  cost-dashboard.json: |
    {
      "dashboard": {
        "title": "Kubernetes Cost & VPA Savings",
        "panels": [
          {
            "title": "Monthly Cost by Namespace",
            "targets": [
              {
                "expr": "namespace:total_cost:sum * 730",
                "legendFormat": "{{ namespace }}"
              }
            ]
          },
          {
            "title": "VPA Potential Savings",
            "targets": [
              {
                "expr": "namespace:vpa_savings_potential:sum * 730",
                "legendFormat": "{{ namespace }}"
              }
            ]
          }
        ]
      }
    }
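The price constants in these rules ($0.04 per CPU-hour, $0.005 per GB-hour) are assumed figures, not real cloud prices. A quick sketch of what they imply per month, using the same 730-hours-per-month factor the dashboards multiply by:

```shell
# Sanity-check the hypothetical price constants used in the rules above.
hours_per_month=730
cpu_month=$(awk -v h=$hours_per_month 'BEGIN { printf "%.2f", 0.04 * h }')
mem_month=$(awk -v h=$hours_per_month 'BEGIN { printf "%.2f", 0.005 * h }')

echo "1 CPU core  ~= \$${cpu_month}/month"
echo "1 GB memory ~= \$${mem_month}/month"
```

So one over-provisioned core costs roughly $29/month under these assumptions; substitute your provider's actual rates before using the dashboards for showback.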
Part 7: Monitoring VPA
VPA Metrics
# ServiceMonitor for VPA components
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: vpa-metrics
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: vpa
  endpoints:
  - port: metrics
    interval: 30s

---
# PrometheusRule for VPA alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: vpa-alerts
  namespace: monitoring
spec:
  groups:
  - name: vpa-health
    interval: 30s
    rules:

    # VPA recommender not running
    - alert: VPARecommenderDown
      expr: up{job="vpa-recommender"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "VPA Recommender is down"
        description: "VPA Recommender has been down for 5 minutes"

    # VPA updater not running
    - alert: VPAUpdaterDown
      expr: up{job="vpa-updater"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "VPA Updater is down"
        description: "VPA Updater has been down for 5 minutes"

    # Large discrepancy between current and recommended
    - alert: VPARecommendationMismatch
      expr: |
        (
          kube_pod_container_resource_requests{resource="cpu"}
          /
          kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource="cpu"}
        ) > 2 or
        (
          kube_pod_container_resource_requests{resource="cpu"}
          /
          kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource="cpu"}
        ) < 0.5
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: "Pod resources deviate significantly from VPA recommendation"
        description: "Pod {{ $labels.pod }} in {{ $labels.namespace }} has CPU requests more than 2x above or below the VPA target"

    # OOMKilled pods that VPA should have prevented
    - alert: OOMKilledDespiteVPA
      expr: |
        increase(kube_pod_container_status_terminated_reason{reason="OOMKilled"}[1h]) > 0
        and on(pod, namespace)
        kube_verticalpodautoscaler_spec_updatepolicy_updatemode{update_mode!="Off"} == 1
      labels:
        severity: warning
      annotations:
        summary: "Pod OOMKilled despite VPA enabled"
        description: "Pod {{ $labels.pod }} was OOMKilled even though VPA is active. Review VPA maxAllowed settings."
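The VPARecommendationMismatch rule flags pods whose CPU requests drift outside a 0.5x–2x band around the VPA target. The same check, as a plain-Python sketch (the function name and default thresholds mirror the alert expression and are not part of any VPA API):

```python
def recommendation_mismatch(request_cores: float, target_cores: float,
                            upper: float = 2.0, lower: float = 0.5) -> bool:
    """Mirror the VPARecommendationMismatch alert: true when the actual
    request is more than 2x above or below the VPA target."""
    if target_cores <= 0:
        return False  # no recommendation yet; nothing to compare against
    ratio = request_cores / target_cores
    return ratio > upper or ratio < lower

# A pod requesting 2 CPU against a 0.4 CPU target is 5x over-provisioned:
print(recommendation_mismatch(2.0, 0.4))  # True
# A pod requesting 0.5 CPU against the same target is within the band:
print(recommendation_mismatch(0.5, 0.4))  # False
```

The one-hour `for:` clause in the alert serves the same purpose as debouncing here: a brief spike past the band should not page anyone.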
Grafana Dashboard for VPA
# Dashboard showing VPA effectiveness
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: vpa-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  vpa-overview.json: |
    {
      "dashboard": {
        "title": "VPA Overview",
        "panels": [
          {
            "title": "VPA Recommendations vs Actual",
            "type": "graph",
            "targets": [
              {
                "expr": "kube_pod_container_resource_requests{resource='cpu'}",
                "legendFormat": "Actual - {{ pod }}"
              },
              {
                "expr": "kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource='cpu'}",
                "legendFormat": "VPA Target - {{ target_name }}"
              }
            ]
          },
          {
            "title": "VPA Update Events",
            "type": "table",
            "targets": [
              {
                "expr": "changes(kube_pod_container_resource_requests[1h])",
                "format": "table"
              }
            ]
          },
          {
            "title": "Cost Savings from VPA",
            "type": "stat",
            "targets": [
              {
                "expr": "sum(namespace:vpa_savings_potential:sum) * 730"
              }
            ]
          }
        ]
      }
    }
EOF
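The `* 730` factor in the savings panel converts an hourly savings rate into a monthly figure (24 h x ~30.4 days ≈ 730 hours per month). A minimal sketch of the same arithmetic, using the over-provisioned pod from the introduction; the per-unit prices are illustrative assumptions, not real cloud rates:

```python
HOURS_PER_MONTH = 730  # the same factor the dashboard multiplies by

def monthly_savings(requested_cpu, used_cpu, requested_gb, used_gb,
                    cpu_price_hr=0.03, gb_price_hr=0.005):
    """Monthly cost of the gap between requested and actually used
    resources. Prices are hypothetical per-hour unit costs."""
    hourly_waste = (max(requested_cpu - used_cpu, 0) * cpu_price_hr
                    + max(requested_gb - used_gb, 0) * gb_price_hr)
    return hourly_waste * HOURS_PER_MONTH

# The over-provisioned pod from the intro: 2 CPU / 4 GB requested,
# 0.3 CPU / 0.8 GB actually used.
print(round(monthly_savings(2.0, 0.3, 4.0, 0.8), 2))  # 48.91
```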
Part 8: Troubleshooting VPA
Common Issues
Issue 1: VPA Not Generating Recommendations
Symptoms:
kubectl describe vpa my-app-vpa

# Shows:
# Recommendation: <none>
Diagnosis:
# Check VPA recommender logs
kubectl logs -n kube-system deployment/vpa-recommender

# Check if Metrics Server is working
kubectl top pods -n default

# Verify VPA can access metrics
kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods
Solutions:
# 1. Ensure Metrics Server is installed
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# 2. Wait for sufficient data collection (minimum 24 hours)

# 3. Verify the pod has resource requests defined
kubectl get deployment my-app -o yaml | grep -A 5 resources

# 4. Restart the VPA recommender
kubectl rollout restart deployment/vpa-recommender -n kube-system
Issue 2: VPA Causing Excessive Pod Restarts
Symptoms:
- Frequent pod evictions
- Service disruption
- High pod restart counts
Diagnosis:
# Check pod eviction events
kubectl get events --field-selector reason=Evicted -n production

# View VPA updater logs
kubectl logs -n kube-system deployment/vpa-updater

# Check PodDisruptionBudget
kubectl get pdb -n production
Solutions:
# 1. Add/update PodDisruptionBudget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2  # Ensure minimum availability
  selector:
    matchLabels:
      app: my-app

---
# 2. Change VPA update mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Initial"  # Less disruptive: applied only at pod creation

---
# 3. Increase minReplicas
spec:
  updatePolicy:
    minReplicas: 3  # Don't evict for updates if fewer than 3 replicas
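The PodDisruptionBudget works here because the VPA updater evicts pods through the eviction API, which respects PDBs. A toy sketch of that budget check (simplified; the real eviction logic also accounts for unhealthy pods):

```python
def eviction_allowed(healthy_pods: int, min_available: int) -> bool:
    """A VPA-driven eviction is permitted only if the pod count after
    evicting one pod still satisfies the PDB's minAvailable."""
    return healthy_pods - 1 >= min_available

# With minAvailable: 2, a 3-replica deployment can lose one pod at a time...
print(eviction_allowed(3, 2))  # True
# ...but a 2-replica deployment cannot be disrupted at all:
print(eviction_allowed(2, 2))  # False
```

This is why the checklist below insists on PDBs before enabling "Recreate" mode: without one, the updater may evict every replica of an undersized service at once.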
Issue 3: VPA and HPA Conflict
Symptoms:
- Thrashing (rapid scale up/down)
- Unexpected pod restarts
- Resource request fluctuations
Diagnosis:
# Check both VPA and HPA status
kubectl get vpa,hpa -n production

# View scaling events
kubectl get events --sort-by='.lastTimestamp' | grep -E 'Scaled|Evicted'

# Check if both manage same resources
kubectl describe vpa my-app-vpa | grep controlledResources
kubectl describe hpa my-app-hpa | grep metrics
Solutions:
# Option 1: VPA for CPU, HPA for custom metrics
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      controlledResources: ["cpu"]  # VPA manages CPU only

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second  # HPA uses custom metric
      target:
        type: AverageValue
        averageValue: "1000"

---
# Option 2: Use VPA in "Off" mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Recommendations only
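Option 1 works because the two autoscalers never act on the same signal. A small sketch of that invariant, checking a VPA's `controlledResources` against the resources an HPA scales on (a hypothetical helper, not part of either API):

```python
def scaling_conflict(vpa_resources: set[str],
                     hpa_metric_resources: set[str]) -> set[str]:
    """Resources managed by both autoscalers; a non-empty result
    signals thrashing risk."""
    return vpa_resources & hpa_metric_resources

# VPA on CPU only, HPA on a custom metric: no overlap, safe to combine.
print(scaling_conflict({"cpu"}, set()))              # set()
# VPA on CPU+memory, HPA on CPU utilization: conflict on "cpu".
print(scaling_conflict({"cpu", "memory"}, {"cpu"}))  # {'cpu'}
```

A pre-deployment check like this can run in CI against rendered manifests to catch the conflict before it reaches a cluster.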
Part 9: Best Practices
Production Checklist
✅ Before Enabling VPA:
- Metrics Server installed and verified
- Baseline metrics collected (minimum 7 days)
- PodDisruptionBudgets configured
- Resource limits defined in pod specs
- Monitoring and alerting in place
✅ VPA Configuration:
- Start with “Off” mode for analysis
- Set appropriate min/max bounds
- Use “Initial” mode for safety
- Configure PDBs for “Recreate” mode
- Test in non-production first
✅ When Combining VPA + HPA:
- VPA manages different resources than HPA
- Use “Initial” update mode
- Monitor for conflicts
- Document the strategy
✅ Monitoring:
- Track VPA recommendations vs actual
- Alert on excessive evictions
- Monitor OOMKilled events
- Track cost savings
Deployment Patterns
Pattern 1: Gradual Rollout
1# Week 1: Analysis only
2kubectl apply -f vpa-off-mode.yaml
3
4# Week 2: Apply to test environment
5kubectl apply -f vpa-initial-mode-test.yaml
6
7# Week 3: Apply to production (low-risk services)
8kubectl apply -f vpa-initial-mode-prod.yaml
9
10# Week 4: Expand to more services
11kubectl apply -f vpa-recreate-mode-prod.yaml
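Week 1's analysis-only rollout can be bootstrapped by generating one Off-mode VPA per deployment. A sketch using simple string templating; the deployment names are placeholders, and in practice you would enumerate them with `kubectl get deployments`:

```python
# Hypothetical manifest template for the week-1, Off-mode rollout.
TEMPLATE = """apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: {name}-vpa
  namespace: {namespace}
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {name}
  updatePolicy:
    updateMode: "Off"  # recommendations only, no evictions
"""

def off_mode_vpas(deployments, namespace="production"):
    """Render one Off-mode VPA manifest per deployment, joined with '---'."""
    return "---\n".join(TEMPLATE.format(name=d, namespace=namespace)
                        for d in deployments)

# Hypothetical deployment names:
print(off_mode_vpas(["checkout", "cart"]))
```

Pipe the output to `kubectl apply -f -` once it has been reviewed.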
Pattern 2: Service Tiers
# Tier 1: Critical services - VPA Off mode
# (manual review required)

# Tier 2: Important services - VPA Initial mode
# (recommendations applied only at pod creation)

# Tier 3: Standard services - VPA Recreate mode
# (automatic updates with PDB protection)
Key Takeaways
VPA Value Proposition
- Cost Optimization: 40-70% reduction in wasted resources
- Performance: Right-sized pods perform better
- Automation: Reduces manual resource tuning effort
- Reliability: Prevents OOMKilled events
When to Use VPA
✅ Good Fit:
- Unknown resource requirements
- Variable workload patterns
- Stateful applications
- Long-running services
- Cost optimization initiatives
❌ Not Recommended:
- Short-lived jobs (insufficient data)
- Rapidly fluctuating workloads (churning recommendations cause frequent restarts)
- Critical services without PDBs
- When combined with HPA on same metric
VPA Mode Selection Guide
| Scenario | Recommended Mode | Rationale |
|---|---|---|
| Initial deployment | Off → Initial | Learn first, then apply |
| Stateless apps | Recreate | Safe with PDBs |
| Stateful apps | Initial | Minimize disruption |
| Critical services | Off | Manual control |
| With HPA | Initial + Custom HPA metrics | Avoid conflicts |
| Testing | Off | No impact |
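To keep the choice consistent across teams, the mode-selection table can be encoded as a simple lookup. A sketch; the scenario keys are this post's wording, not a VPA concept, and "Off -> Initial" means start in Off mode and graduate to Initial once recommendations look sane:

```python
# Mapping taken directly from the mode-selection table above.
MODE_GUIDE = {
    "initial deployment": "Off -> Initial",
    "stateless": "Recreate",
    "stateful": "Initial",
    "critical": "Off",
    "with hpa": "Initial",
    "testing": "Off",
}

def recommended_mode(scenario: str) -> str:
    """Look up the suggested VPA updateMode, defaulting to the safe choice."""
    return MODE_GUIDE.get(scenario.lower(), "Off")

print(recommended_mode("Stateless"))  # Recreate
print(recommended_mode("unknown"))    # Off
```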
Related Topics
Autoscaling Series
- Part 1: Horizontal Pod Autoscaler - HPA theory and approaches
- Part 2: Cluster Autoscaling - Node-level autoscaling
- Part 3: Hands-On HPA Demo - Practical implementation
- Part 4: Monitoring & Alerting - Observability
Conclusion
Vertical Pod Autoscaler is a powerful tool for resource optimization in Kubernetes, enabling:
- Automated Right-Sizing: Data-driven resource allocation
- Cost Reduction: Eliminate over-provisioning waste
- Performance Improvement: Prevent throttling and OOMKills
- Operational Efficiency: Reduce manual tuning effort
Implementation Roadmap
Month 1: Foundation
- Install VPA components
- Deploy in “Off” mode cluster-wide
- Collect baseline recommendations
Month 2: Testing
- Enable “Initial” mode in test environment
- Validate recommendations
- Establish monitoring
Month 3: Production
- Gradual rollout to production
- Start with non-critical services
- Expand based on success
Month 4: Optimization
- Fine-tune min/max bounds
- Combine with HPA where appropriate
- Measure cost savings
Next Steps
- Install VPA: Follow installation guide for your platform
- Start Small: Enable “Off” mode for a few deployments
- Analyze Data: Review recommendations after 7 days
- Implement Gradually: Move to “Initial” or “Recreate” mode
- Monitor & Iterate: Track savings and adjust
VPA transforms resource management from guesswork to data-driven optimization, delivering significant cost savings while improving application reliability. Combined with HPA and Cluster Autoscaler, it completes the Kubernetes autoscaling toolkit.
Happy optimizing! 💰📊