Series Overview
This is Part 2 of the Kubernetes Autoscaling Complete Guide series:
- Part 1: Horizontal Pod Autoscaler - Application-level autoscaling with HPA, custom metrics, and KEDA
- Part 2 (This Post): Cluster Autoscaling & Cloud Providers - Infrastructure-level autoscaling with Cluster Autoscaler, Karpenter, and cloud-specific solutions
While Horizontal Pod Autoscaler (HPA) manages application-level scaling by adjusting pod replicas (covered in Part 1), production Kubernetes environments require intelligent cluster-level autoscaling that dynamically provisions and deprovisions compute resources. This comprehensive guide explores advanced autoscaling strategies across node management, cloud provider integrations, and cutting-edge autoscaling technologies.
The Complete Autoscaling Picture
Multi-Layer Autoscaling Architecture
Effective Kubernetes autoscaling operates across three interconnected layers:
┌─────────────────────────────────────────────────────────────────────────┐
│ KUBERNETES AUTOSCALING LAYERS │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ LAYER 3: APPLICATION AUTOSCALING │ │
│ │ • HPA (Horizontal Pod Autoscaler) │ │
│ │ • VPA (Vertical Pod Autoscaler) │ │
│ │ • KEDA (Event-Driven Autoscaling) │ │
│ │ ↓ Scales pod replicas based on metrics │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ LAYER 2: CLUSTER AUTOSCALING (This Guide's Focus) │ │
│ │ • Cluster Autoscaler │ │
│ │ • Karpenter │ │
│ │ • Cloud Provider Native Autoscaling │ │
│ │ ↓ Provisions/deprovisions nodes based on pod scheduling │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ LAYER 1: INFRASTRUCTURE AUTOSCALING │ │
│ │ • VM Instance Groups │ │
│ │ • AWS Auto Scaling Groups │ │
│ │ • Azure VM Scale Sets │ │
│ │ ↓ Manages underlying compute infrastructure │ │
│ └──────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
Why Cluster Autoscaling Matters
Business Impact:
| Metric | Without Cluster Autoscaling | With Cluster Autoscaling |
|---|---|---|
| Infrastructure Costs | Over-provisioned 24/7 | 40-60% cost reduction |
| Incident Response | Manual node provisioning | Automated capacity addition |
| Resource Utilization | 20-30% average utilization | 60-80% utilization |
| Scaling Time | Hours (manual) | Minutes (automated) |
| Operational Burden | High (capacity planning) | Low (self-managing) |
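The utilization gains in the table translate directly into fleet size: serving the same steady demand at higher average utilization needs proportionally fewer nodes. A rough back-of-envelope sketch (hypothetical demand and node size):

```python
import math

def nodes_needed(demand_cores: float, cores_per_node: int,
                 target_utilization: float) -> int:
    """Nodes required to serve steady demand at a given average utilization."""
    return math.ceil(demand_cores / (cores_per_node * target_utilization))

# Hypothetical fleet: 400 cores of steady demand on 16-core nodes
before = nodes_needed(400, 16, 0.25)   # statically provisioned, ~25% utilized
after = nodes_needed(400, 16, 0.70)    # autoscaled, ~70% utilized
savings = 1 - after / before           # fraction of the fleet eliminated
```

With these numbers the fleet shrinks from 100 nodes to 36, in line with the 40-60% cost-reduction range quoted above.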
Approach 1: Kubernetes Cluster Autoscaler (CA)
Overview and Architecture
The Cluster Autoscaler is an official Kubernetes project that automatically adjusts cluster size in response to pod scheduling pressure. It is the most mature and widely adopted cluster autoscaling solution.
How Cluster Autoscaler Works:
┌─────────────────────────────────────────────────────────────────────┐
│ CLUSTER AUTOSCALER DECISION FLOW │
│ │
│ Pod Created → Pending State → CA Detects → Check Node Groups │
│ ↓ ↓ ↓ ↓ │
│ Scheduler No Resources Evaluation Available Types │
│ Attempts Available Logic & Constraints │
│ ↓ ↓ ↓ ↓ │
│ Fails to Triggers CA Simulates Selects Best │
│ Schedule Scale-Up Placement Node Group │
│ ↓ ↓ ↓ ↓ │
│ Remains Provisions Tests Fit Expands Group │
│ Pending New Node Scenarios (Cloud API) │
│ ↓ ↓ ↓ │
│ Node Joins Pod Scheduled Pod Running │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ SCALE-DOWN LOGIC (Proactive) │ │
│ │ │ │
│ │ Every 10s: Check node utilization │ │
│ │ ↓ │ │
│ │ Node < 50% utilized for 10+ minutes? │ │
│ │ ↓ │ │
│ │ Can all pods be rescheduled elsewhere? │ │
│ │ ↓ │ │
│ │ Safe to drain? (PDBs, local storage, etc.) │ │
│ │ ↓ │ │
│ │ Cordon → Drain → Terminate Node │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
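The scale-down branch of the flow boils down to three sequential checks. A minimal sketch of that logic (simplified to CPU only; the real autoscaler also simulates rescheduling of every pod):

```python
from dataclasses import dataclass

@dataclass
class Node:
    cpu_requested: float    # sum of pod CPU requests scheduled on the node
    cpu_allocatable: float  # node's allocatable CPU
    idle_minutes: float     # time spent below the utilization threshold
    pods_evictable: bool    # PDBs satisfied, no blocking local storage, etc.

def safe_to_scale_down(node: Node,
                       utilization_threshold: float = 0.5,
                       unneeded_minutes: float = 10.0) -> bool:
    """Apply the three checks from the scale-down flow above, in order."""
    utilization = node.cpu_requested / node.cpu_allocatable
    return (utilization < utilization_threshold
            and node.idle_minutes >= unneeded_minutes
            and node.pods_evictable)

# A 12.5%-utilized node, idle for 15 minutes, whose pods can move elsewhere:
candidate = Node(cpu_requested=1.0, cpu_allocatable=8.0,
                 idle_minutes=15, pods_evictable=True)
```

Only when all three checks pass does the node proceed to cordon, drain, and terminate.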
Implementation: Cluster Autoscaler on Self-Managed Kubernetes
Step 1: IAM Setup (AWS Example)
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:DescribeTags",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeImages",
        "ec2:GetInstanceTypesFromInstanceRequirements",
        "eks:DescribeNodegroup"
      ],
      "Resource": ["*"]
    }
  ]
}
```
Step 2: Auto Scaling Group Tags
```bash
# Tag the ASG for Cluster Autoscaler discovery (each --tags entry uses
# the AWS CLI comma-separated shorthand syntax)
aws autoscaling create-or-update-tags \
  --tags \
    "ResourceId=my-asg-name,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=false" \
    "ResourceId=my-asg-name,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/my-cluster-name,Value=owned,PropagateAtLaunch=false"
```
Step 3: Cluster Autoscaler Deployment
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/cluster-autoscaler-role

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler
rules:
- apiGroups: [""]
  resources: ["events", "endpoints"]
  verbs: ["create", "patch"]
- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs: ["create"]
- apiGroups: [""]
  resources: ["pods/status"]
  verbs: ["update"]
- apiGroups: [""]
  resources: ["endpoints"]
  resourceNames: ["cluster-autoscaler"]
  verbs: ["get", "update"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["watch", "list", "get", "update"]
- apiGroups: [""]
  resources: ["namespaces", "pods", "services", "replicationcontrollers", "persistentvolumeclaims", "persistentvolumes"]
  verbs: ["watch", "list", "get"]
- apiGroups: ["extensions"]
  resources: ["replicasets", "daemonsets"]
  verbs: ["watch", "list", "get"]
- apiGroups: ["policy"]
  resources: ["poddisruptionbudgets"]
  verbs: ["watch", "list"]
- apiGroups: ["apps"]
  resources: ["statefulsets", "replicasets", "daemonsets"]
  verbs: ["watch", "list", "get"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses", "csinodes", "csidrivers", "csistoragecapacities"]
  verbs: ["watch", "list", "get"]
- apiGroups: ["batch"]
  resources: ["jobs", "cronjobs"]
  verbs: ["watch", "list", "get"]
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["create"]
- apiGroups: ["coordination.k8s.io"]
  resourceNames: ["cluster-autoscaler"]
  resources: ["leases"]
  verbs: ["get", "update"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-autoscaler
subjects:
- kind: ServiceAccount
  name: cluster-autoscaler
  namespace: kube-system

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8085"
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: cluster-autoscaler
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.2
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 600Mi
          requests:
            cpu: 100m
            memory: 600Mi
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster-name
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false
        # Scale-down configuration
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --scale-down-utilization-threshold=0.5
        # Advanced options
        - --max-node-provision-time=15m
        - --max-graceful-termination-sec=600
        - --max-empty-bulk-delete=10
        - --max-total-unready-percentage=45
        - --ok-total-unready-count=3
        - --new-pod-scale-up-delay=0s
        env:
        - name: AWS_REGION
          value: us-west-2
        volumeMounts:
        - name: ssl-certs
          mountPath: /etc/ssl/certs/ca-certificates.crt
          readOnly: true
      volumes:
      - name: ssl-certs
        hostPath:
          path: /etc/ssl/certs/ca-bundle.crt
```
Configuration Options Explained
Expander Strategies:
| Expander | Selection Logic | Use Case |
|---|---|---|
| least-waste | Minimize unused resources | Cost optimization |
| most-pods | Fit most pending pods | High pod density |
| priority | User-defined priorities | Multi-tier workloads |
| random | Random selection | Testing/development |
| price | Lowest cost nodes | Budget-constrained |
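The default `least-waste` strategy can be pictured as scoring each candidate node group by the capacity left unused after placing the pending pods, then picking the minimum. A simplified sketch with hypothetical node shapes:

```python
def wasted_fraction(pending: dict, node: dict) -> float:
    """Average fraction of a node left unused after placing the pending pods."""
    waste_cpu = (node["cpu"] - pending["cpu"]) / node["cpu"]
    waste_mem = (node["mem"] - pending["mem"]) / node["mem"]
    return (waste_cpu + waste_mem) / 2

def least_waste(pending: dict, node_groups: dict) -> str:
    """Among groups whose node shape fits, pick the one wasting the least."""
    fitting = {name: shape for name, shape in node_groups.items()
               if shape["cpu"] >= pending["cpu"] and shape["mem"] >= pending["mem"]}
    return min(fitting, key=lambda name: wasted_fraction(pending, fitting[name]))

# Hypothetical node shapes (vCPU, GiB) for three node groups
groups = {
    "c5.xlarge":  {"cpu": 4, "mem": 8},
    "m5.2xlarge": {"cpu": 8, "mem": 32},
    "r5.2xlarge": {"cpu": 8, "mem": 64},
}
best = least_waste({"cpu": 3.5, "mem": 7}, groups)   # → "c5.xlarge"
```

For 3.5 vCPU / 7 GiB of pending requests, the 4-vCPU / 8-GiB shape wastes only ~12.5% of a node, so it wins over the larger instance types.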
Scale-Down Configuration:
```bash
# Conservative scale-down (production)
--scale-down-delay-after-add=15m        # Wait 15 min after scale-up
--scale-down-unneeded-time=20m          # Node idle for 20 min
--scale-down-utilization-threshold=0.5  # Below 50% utilization

# Aggressive scale-down (dev/staging)
--scale-down-delay-after-add=5m
--scale-down-unneeded-time=5m
--scale-down-utilization-threshold=0.3  # Below 30% utilization
```
Advanced: Multi-Node Group Configuration
```yaml
# Multiple node groups with different characteristics
command:
- ./cluster-autoscaler
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster

# Manual node group specification (min:max:ASG-name)
- --nodes=1:10:my-cluster-general-asg   # General purpose
- --nodes=0:20:my-cluster-spot-asg      # Spot instances
- --nodes=0:5:my-cluster-gpu-asg        # GPU nodes
- --nodes=2:8:my-cluster-memory-asg     # Memory-optimized
```

```yaml
# Priority-based expander configuration: the matching group with the
# HIGHEST priority value is chosen first
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |
    100:
      - .*-spot-.*      # Prefer spot instances
    50:
      - .*-general-.*   # Then general purpose
    10:
      - .*-gpu-.*       # GPU nodes last resort
```
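Note that the priority expander prefers the matching node group with the highest priority value. Its selection rule can be sketched in a few lines (group names here are hypothetical):

```python
import re

def pick_by_priority(candidates: list, priorities: dict) -> str:
    """Mirror the priority expander: among candidate node groups that match
    a regex, the entry with the HIGHEST priority value wins."""
    best_group, best_prio = None, float("-inf")
    for prio, patterns in priorities.items():
        if prio <= best_prio:
            continue
        for group in candidates:
            if any(re.search(p, group) for p in patterns):
                best_group, best_prio = group, prio
                break
    return best_group

priorities = {
    100: [r".*-spot-.*"],     # preferred
    50:  [r".*-general-.*"],
    10:  [r".*-gpu-.*"],      # last resort
}
chosen = pick_by_priority(
    ["my-cluster-general-asg", "my-cluster-spot-asg"], priorities)
```

Here the spot group matches the highest-priority pattern, so it is chosen even though the general-purpose group could also satisfy the scale-up.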
Preventing Unwanted Scale-Down
Node Annotations:
```bash
# Prevent a node from being scaled down
kubectl annotate node ip-10-0-1-234.ec2.internal \
  cluster-autoscaler.kubernetes.io/scale-down-disabled=true

# Allow scale-down again (the trailing "-" removes the annotation)
kubectl annotate node ip-10-0-1-234.ec2.internal \
  cluster-autoscaler.kubernetes.io/scale-down-disabled-
```
Pod Annotations:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: critical-pod
  annotations:
    # Prevent the node hosting this pod from scaling down
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
  - name: app
    image: myapp:v1.0
```
Pros and Cons
Advantages:
| Benefit | Description | Value |
|---|---|---|
| Mature & Stable | 5+ years production use | Battle-tested reliability |
| Cloud-Agnostic | Works on all major clouds | Portability across providers |
| Active Community | Official CNCF project | Regular updates, wide support |
| Cost Optimization | Automatic scale-down | 40-60% infrastructure savings |
| PDB Awareness | Respects disruption budgets | Safe scaling operations |
Limitations:
| Challenge | Impact | Mitigation |
|---|---|---|
| Slow Provisioning | 2-5 min node startup | Use warm pools, overprovisioning |
| ASG-Based | Rigid node group structure | Use Karpenter for flexibility |
| Limited Intelligence | Basic bin-packing | Priority expander for multi-tier |
| Scale-Down Delays | Capacity retained longer | Tune thresholds for workload |
| Node Group Fragmentation | Many ASGs to manage | Consolidate where possible |
When to Use Cluster Autoscaler
Ideal Scenarios:
- Traditional Kubernetes Clusters (self-managed or early EKS/GKE)
- Regulated Environments requiring stable, proven technology
- Multi-Cloud Deployments needing consistent behavior
- Existing ASG Infrastructure already in place
Not Recommended For:
- Highly Dynamic Workloads → Use Karpenter
- Spot-Heavy Strategies → Karpenter better handles interruptions
- Complex Scheduling Requirements → Karpenter’s just-in-time provisioning
Monitoring Cluster Autoscaler
```yaml
# Prometheus metrics scraping
apiVersion: v1
kind: Service
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  ports:
  - port: 8085
    protocol: TCP
    targetPort: 8085
    name: metrics
  selector:
    app: cluster-autoscaler

---
# ServiceMonitor for Prometheus Operator
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cluster-autoscaler
  endpoints:
  - port: metrics
    interval: 30s
```
Key Metrics:
```yaml
# Cluster Autoscaler specific metrics:
#   cluster_autoscaler_scaled_up_nodes_total
#   cluster_autoscaler_scaled_down_nodes_total
#   cluster_autoscaler_unschedulable_pods_count
#   cluster_autoscaler_nodes_count
#   cluster_autoscaler_failed_scale_ups_total

# Alert examples
- alert: ClusterAutoscalerErrors
  expr: rate(cluster_autoscaler_errors_total[15m]) > 0
  for: 15m
  annotations:
    summary: "Cluster Autoscaler experiencing errors"

- alert: UnschedulablePods
  expr: cluster_autoscaler_unschedulable_pods_count > 0
  for: 10m
  annotations:
    summary: "{{ $value }} pods unable to schedule"
```
Approach 2: Karpenter (Next-Generation Cluster Autoscaling)
Overview and Architecture
Karpenter is a modern, high-performance Kubernetes cluster autoscaler created by AWS that provisions just-in-time compute resources directly without relying on node groups. It represents a paradigm shift in cluster autoscaling.
Karpenter vs Cluster Autoscaler:
CLUSTER AUTOSCALER APPROACH:
┌─────────────────────────────────────────────────────┐
│ Pending Pod → Check ASGs → Select ASG → Scale ASG │
│ ↓ ↓ ↓ ↓ │
│ Fixed Pre-defined Limited Slow (3-5 │
│ Node Types Configs Choices minutes) │
└─────────────────────────────────────────────────────┘
KARPENTER APPROACH:
┌─────────────────────────────────────────────────────┐
│ Pending Pod → Analyze Needs → Provision Exactly │
│ ↓ ↓ ↓ │
│ Dynamic Pod Requests Right-sized │
│ Selection Constraints Node (30-60s) │
└─────────────────────────────────────────────────────┘
Key Innovations:
- Just-in-Time Provisioning: Creates nodes tailored to pending pods
- No Node Groups: Direct EC2 API interaction
- Bin-Packing Optimization: Intelligent consolidation
- Fast Provisioning: 30-60 second node startup
- Spot Optimization: Intelligent diversification
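The core of just-in-time provisioning is choosing an instance shape that covers the aggregate pending requests at the lowest price, rather than expanding a fixed node group. A toy version of that decision (prices and shapes here are illustrative; real Karpenter also weighs constraints, zones, and spot offerings):

```python
def cheapest_fit(pending_pods: list, instance_types: list) -> str:
    """Pick the lowest-cost instance type whose capacity covers the
    aggregate pending requests."""
    need_cpu = sum(p["cpu"] for p in pending_pods)
    need_mem = sum(p["mem"] for p in pending_pods)
    fitting = [t for t in instance_types
               if t["cpu"] >= need_cpu and t["mem"] >= need_mem]
    return min(fitting, key=lambda t: t["price"])["name"]

# Three pending pods totalling 3.5 vCPU and 7 GiB
pods = [{"cpu": 1.0, "mem": 2}, {"cpu": 0.5, "mem": 1}, {"cpu": 2.0, "mem": 4}]
types = [
    {"name": "c5.large",   "cpu": 2, "mem": 4,  "price": 0.085},
    {"name": "m5.xlarge",  "cpu": 4, "mem": 16, "price": 0.192},
    {"name": "c5.2xlarge", "cpu": 8, "mem": 16, "price": 0.340},
]
node = cheapest_fit(pods, types)   # → "m5.xlarge"
```

The smallest instance is rejected because it cannot hold the pending pods, and the cheapest fitting shape wins.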
Architecture Overview
┌──────────────────────────────────────────────────────────────────┐
│ KARPENTER ARCHITECTURE │
│ │
│ ┌────────────────┐ ┌────────────────┐ │
│ │ KARPENTER │ │ PROVISIONER │ │
│ │ CONTROLLER │───────▶│ RESOURCES │ │
│ │ │ │ (CRDs) │ │
│ │ • Watch Pods │ │ │ │
│ │ • Scheduling │ │ • NodePool │ │
│ │ • Bin-packing │ │ • EC2NodeClass │ │
│ └────────────────┘ └────────────────┘ │
│ ↓ ↓ │
│ ┌────────────────────────────────────────┐ │
│ │ DECISION ENGINE │ │
│ │ │ │
│ │ 1. Analyze pending pod requirements │ │
│ │ 2. Calculate optimal instance types │ │
│ │ 3. Check spot/on-demand availability │ │
│ │ 4. Provision via EC2 API │ │
│ │ 5. Register node to cluster │ │
│ └────────────────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────┐ │
│ │ CONSOLIDATION ENGINE │ │
│ │ │ │
│ │ • Continuously analyze utilization │ │
│ │ • Replace with cheaper instances │ │
│ │ • Bin-pack to fewer nodes │ │
│ │ • Handle spot interruptions │ │
│ └────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
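The consolidation engine's key question is a bin-packing check: can every pod on a candidate node be rehomed into spare capacity elsewhere? A first-fit-decreasing sketch of that check (CPU only, for illustration):

```python
def can_consolidate(pod_requests: list, spare_capacity: list) -> bool:
    """First-fit-decreasing: place each pod (largest first) into the
    largest remaining gap on other nodes; fail if any pod has no home."""
    free = sorted(spare_capacity, reverse=True)
    for pod in sorted(pod_requests, reverse=True):
        for i, gap in enumerate(free):
            if gap >= pod:
                free[i] -= pod
                break
        else:
            return False   # some pod cannot be rehomed; keep the node
    return True

fits = can_consolidate([1.0, 0.5], [1.2, 0.6])   # node can be drained
stuck = can_consolidate([2.0], [1.5, 1.0])       # no single gap fits the pod
```

Note the second case: 2.5 cores of spare capacity exist in total, but no single node has room for the 2.0-core pod, so the node must stay.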
Implementation: Karpenter on EKS
Step 1: Prerequisites and IAM Setup
```bash
# Set environment variables
export CLUSTER_NAME=my-eks-cluster
export AWS_REGION=us-west-2
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export KARPENTER_VERSION=v0.32.1

# Create the Karpenter controller trust policy (replace OIDC_ID with
# your cluster's OIDC provider ID)
cat <<EOF > karpenter-controller-trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/oidc.eks.${AWS_REGION}.amazonaws.com/id/OIDC_ID"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.${AWS_REGION}.amazonaws.com/id/OIDC_ID:aud": "sts.amazonaws.com",
          "oidc.eks.${AWS_REGION}.amazonaws.com/id/OIDC_ID:sub": "system:serviceaccount:karpenter:karpenter"
        }
      }
    }
  ]
}
EOF

# Create IAM role
aws iam create-role \
  --role-name KarpenterControllerRole-${CLUSTER_NAME} \
  --assume-role-policy-document file://karpenter-controller-trust-policy.json

# Attach policies
aws iam attach-role-policy \
  --role-name KarpenterControllerRole-${CLUSTER_NAME} \
  --policy-arn arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy
```
Step 2: Install Karpenter via Helm
```bash
# Karpenter charts are published to an OCI registry; the legacy
# charts.karpenter.sh Helm repo only hosts versions up to v0.16
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter \
  --create-namespace \
  --version "${KARPENTER_VERSION}" \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-${CLUSTER_NAME} \
  --set settings.clusterName=${CLUSTER_NAME} \
  --set settings.interruptionQueue=${CLUSTER_NAME} \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait

# Note: in the v1beta1 APIs the node instance profile is configured on
# the EC2NodeClass, not via Helm values
```
Step 3: Create NodePool Configuration
```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  # Template for nodes
  template:
    metadata:
      labels:
        workload-type: general
    spec:
      # Requirements for node selection
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["c", "m", "r"]
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values: ["5"]

      # Node configuration
      nodeClassRef:
        name: default

      # Taints for specialized workloads
      taints: []

      # Kubelet configuration
      kubelet:
        clusterDNS: ["10.100.0.10"]
        maxPods: 110

  # Limits for this NodePool
  limits:
    cpu: "1000"
    memory: 1000Gi

  # Disruption budget
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h # 30 days

---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  # AMI selection
  amiFamily: AL2

  # Subnet discovery
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: ${CLUSTER_NAME}

  # Security group discovery
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: ${CLUSTER_NAME}

  # IAM instance profile
  instanceProfile: KarpenterNodeInstanceProfile-${CLUSTER_NAME}

  # User data for node initialization
  userData: |
    #!/bin/bash
    /etc/eks/bootstrap.sh ${CLUSTER_NAME}

  # Block device mappings
  blockDeviceMappings:
  - deviceName: /dev/xvda
    ebs:
      volumeSize: 50Gi
      volumeType: gp3
      encrypted: true
      deleteOnTermination: true

  # Metadata options
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required

  # Tags applied to EC2 instances
  tags:
    Team: platform
    Environment: production
    ManagedBy: karpenter
```
Advanced: Multi-NodePool Strategy
Production-Ready Multi-Tier Configuration:
```yaml
# General purpose workloads (spot-optimized)
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general-spot
spec:
  template:
    metadata:
      labels:
        workload-type: general
        capacity-type: spot
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["c", "m", "r"]
      - key: karpenter.k8s.aws/instance-cpu
        operator: In
        values: ["4", "8", "16"]
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values: ["5"]
      nodeClassRef:
        name: general

  limits:
    cpu: "500"
    memory: 500Gi

  disruption:
    # Note: in v1beta1, consolidateAfter may only be combined with WhenEmpty
    consolidationPolicy: WhenUnderutilized

---
# On-demand for critical workloads
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: critical-ondemand
spec:
  template:
    metadata:
      labels:
        workload-type: critical
        capacity-type: on-demand
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["c", "m"]
      - key: karpenter.k8s.aws/instance-size
        operator: In
        values: ["large", "xlarge", "2xlarge"]
      nodeClassRef:
        name: general
      taints:
      - key: workload
        value: critical
        effect: NoSchedule

  weight: 50 # Higher priority than spot

  limits:
    cpu: "200"

  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 300s

---
# GPU workloads
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    metadata:
      labels:
        workload-type: gpu
        nvidia.com/gpu: "true"
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand", "spot"]
      - key: karpenter.k8s.aws/instance-family
        operator: In
        values: ["p3", "p4", "g5"]
      - key: node.kubernetes.io/instance-type
        operator: In
        values: ["p3.2xlarge", "g5.xlarge", "g5.2xlarge"]
      nodeClassRef:
        name: gpu
      taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule
      kubelet:
        maxPods: 50

  limits:
    cpu: "100"
    nvidia.com/gpu: "16"

  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 600s

---
# Memory-optimized for caching/databases
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: memory-optimized
spec:
  template:
    metadata:
      labels:
        workload-type: memory-intensive
    spec:
      requirements:
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["r", "x"]
      - key: karpenter.k8s.aws/instance-memory
        operator: Gt
        values: ["32768"] # > 32 GiB RAM (value is in MiB)
      nodeClassRef:
        name: general
      taints:
      - key: workload
        value: memory-intensive
        effect: NoSchedule

  limits:
    memory: 1000Gi

  disruption:
    # consolidateAfter omitted: only valid with WhenEmpty in v1beta1
    consolidationPolicy: WhenUnderutilized
```
Pod Configuration for Karpenter
Using NodePools Effectively:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 10
  template:
    spec:
      # Select spot nodes
      nodeSelector:
        karpenter.sh/capacity-type: spot
        workload-type: general

      # Tolerate spot interruptions
      tolerations:
      - key: karpenter.sh/disruption
        operator: Exists
        effect: NoSchedule

      containers:
      - name: app
        image: myapp:v1.0
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "1000m"
            memory: "1Gi"

---
# Critical database workload
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database
spec:
  replicas: 3
  template:
    spec:
      # Force on-demand nodes
      nodeSelector:
        karpenter.sh/capacity-type: on-demand
        workload-type: critical

      # Require critical node pool
      tolerations:
      - key: workload
        value: critical
        effect: NoSchedule

      affinity:
        # Spread across availability zones
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values: ["database"]
            topologyKey: topology.kubernetes.io/zone

      containers:
      - name: postgres
        image: postgres:14
        resources:
          requests:
            cpu: "4000m"
            memory: "16Gi"
```
Karpenter Best Practices
1. Consolidation Configuration:
```yaml
# Aggressive consolidation (cost-optimized): acts whenever a cheaper
# layout exists (in v1beta1, consolidateAfter cannot be combined with
# WhenUnderutilized)
disruption:
  consolidationPolicy: WhenUnderutilized

# Conservative consolidation (stability-focused)
disruption:
  consolidationPolicy: WhenEmpty
  consolidateAfter: 600s

# Disabled consolidation (manual control)
disruption:
  consolidationPolicy: WhenEmpty
  consolidateAfter: Never
```
2. Spot Interruption Handling:
```yaml
# Karpenter automatically handles spot interruptions once an SQS
# interruption queue is configured. The legacy karpenter-global-settings
# ConfigMap is shown here; newer releases set these via Helm values.
apiVersion: v1
kind: ConfigMap
metadata:
  name: karpenter-global-settings
  namespace: karpenter
data:
  # AWS SQS queue for spot interruption notifications
  aws.interruptionQueueName: ${CLUSTER_NAME}
  # Enable drift detection so nodes are replaced when their spec changes
  featureGates.driftEnabled: "true"
```
3. Instance Diversification:
```yaml
requirements:
# Allow many instance types for better spot availability
- key: karpenter.k8s.aws/instance-category
  operator: In
  values: ["c", "m", "r"]
- key: karpenter.k8s.aws/instance-generation
  operator: Gt
  values: ["5"] # Only use generation 6+
- key: karpenter.k8s.aws/instance-size
  operator: In
  values: ["large", "xlarge", "2xlarge", "4xlarge"]
```
Pros and Cons
Advantages:
| Benefit | Description | Impact |
|---|---|---|
| Fast Provisioning | 30-60s vs 3-5min | 5x faster scale-out |
| Cost Optimization | Right-sized nodes | 20-40% additional savings |
| No Node Groups | Direct EC2 API | Simplified management |
| Intelligent Consolidation | Automatic bin-packing | Continuous optimization |
| Spot Optimization | Diversification + handling | 70-90% cost reduction |
| Just-in-Time | Provisions exact needs | Eliminates waste |
Limitations:
| Challenge | Impact | Consideration |
|---|---|---|
| AWS-Specific | EKS only (currently) | Not portable to other clouds |
| Newer Technology | Less battle-tested | Thorough testing required |
| Complexity | More configuration options | Learning curve |
| Breaking Changes | Rapid API evolution | Stay updated on versions |
When to Use Karpenter
Ideal Scenarios:
- AWS EKS Clusters (native integration)
- Highly Dynamic Workloads with variable requirements
- Spot-Heavy Strategies needing intelligent diversification
- Cost Optimization Focus as primary driver
- Modern Architectures embracing latest technologies
Migration Path from Cluster Autoscaler:
```bash
# Phase 1: Deploy Karpenter alongside Cluster Autoscaler
# Phase 2: Create NodePools for new workloads
# Phase 3: Gradually migrate workloads to Karpenter nodes
# Phase 4: Scale down old ASGs
# Phase 5: Remove Cluster Autoscaler

# Coexistence example
kubectl label nodes -l eks.amazonaws.com/nodegroup=old-ng \
  karpenter.sh/managed=false
```
Monitoring Karpenter
```yaml
# Prometheus metrics
apiVersion: v1
kind: Service
metadata:
  name: karpenter-metrics
  namespace: karpenter
spec:
  selector:
    app.kubernetes.io/name: karpenter
  ports:
  - port: 8080
    name: metrics
```

```
# Key Karpenter metrics
karpenter_nodes_created
karpenter_nodes_terminated
karpenter_pods_state
karpenter_disruption_decisions_total
karpenter_interruption_received_messages

# Grafana dashboard:
# https://github.com/aws/karpenter/tree/main/website/content/en/preview/getting-started/getting-started-with-karpenter/grafana-dashboard
```
Approach 3: AWS EKS-Specific Autoscaling
Managed Node Groups Autoscaling
Native EKS Integration:
```typescript
// AWS CDK example
import * as eks from 'aws-cdk-lib/aws-eks';
import * as ec2 from 'aws-cdk-lib/aws-ec2';

// Create a managed node group; minSize/maxSize define the range within
// which the Cluster Autoscaler can scale the group
const nodeGroup = cluster.addNodegroupCapacity('standard-nodes', {
  instanceTypes: [
    ec2.InstanceType.of(ec2.InstanceClass.M5, ec2.InstanceSize.LARGE),
    ec2.InstanceType.of(ec2.InstanceClass.M5, ec2.InstanceSize.XLARGE),
  ],
  minSize: 2,
  maxSize: 20,
  desiredSize: 5,

  // Spot instances
  capacityType: eks.CapacityType.SPOT,

  // Scaling configuration
  amiType: eks.NodegroupAmiType.AL2_X86_64,
  diskSize: 50,

  // Labels and taints
  labels: {
    'workload-type': 'general',
  },

  // Remote access
  remoteAccess: {
    sshKeyName: 'my-key',
  },
});
```
EKS Auto Mode (Preview)
Fully Managed Compute:
```bash
# EKS Auto Mode removes the need for node management entirely.
# AWS manages:
# - Node provisioning
# - Auto-scaling
# - Security patching
# - Capacity optimization

# Enable during cluster creation
aws eks create-cluster \
  --name my-cluster \
  --compute-config enabled=true
```

```yaml
# Workload specifications drive capacity
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 10
  template:
    spec:
      containers:
      - name: app
        resources:
          requests:
            cpu: "1000m"
            memory: "2Gi"
# EKS Auto Mode handles the rest
```
AWS Fargate for EKS
Serverless Kubernetes:
```bash
# Fargate profiles are created through the EKS API, not as in-cluster
# objects; pods matching a profile's selectors are scheduled onto Fargate
aws eks create-fargate-profile \
  --cluster-name my-cluster \
  --fargate-profile-name serverless-apps \
  --pod-execution-role-arn arn:aws:iam::123456789012:role/eks-fargate-pod-execution-role \
  --selectors 'namespace=serverless,labels={compute-type=fargate}'
```

```yaml
# Pods in the selected namespace with matching labels run on Fargate
apiVersion: v1
kind: Pod
metadata:
  name: serverless-app
  namespace: serverless
  labels:
    compute-type: fargate
spec:
  containers:
  - name: app
    image: myapp:v1.0
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
# No node management needed!
```
Fargate Pricing Model:
Cost = (vCPU × $0.04048/hour) + (GB RAM × $0.004445/hour)
Example:
2 vCPU + 4GB RAM = (2 × $0.04048) + (4 × $0.004445)
= $0.08096 + $0.01778
= $0.09874 per hour
= $71/month (24/7)
vs EC2 t3.medium (2vCPU, 4GB) = $30/month
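The pricing arithmetic above is easy to parameterize. A small sketch using the same published rates, so you can plug in your own pod sizes:

```python
# Fargate rates used in the example above (us-east-1, Linux/x86)
FARGATE_VCPU_HOUR = 0.04048
FARGATE_GB_HOUR = 0.004445

def fargate_monthly_cost(vcpu: float, gb: float, hours: float = 720) -> float:
    """Monthly cost of a pod running around the clock (~720 hours)."""
    return (vcpu * FARGATE_VCPU_HOUR + gb * FARGATE_GB_HOUR) * hours

cost = fargate_monthly_cost(2, 4)   # the 2 vCPU + 4 GB example
```

This reproduces the ~$71/month figure quoted above, making the EC2 comparison straightforward for other shapes.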
Fargate Cost-Effective When:
- Intermittent workloads (not 24/7)
- Need zero operational overhead
- Compliance/isolation requirements
Approach 4: GKE-Specific Autoscaling
GKE Cluster Autoscaler
Native GKE Integration:
```bash
# GKE cluster with autoscaling
gcloud container clusters create my-cluster \
  --enable-autoscaling \
  --min-nodes=1 \
  --max-nodes=10 \
  --zone=us-central1-a \
  --machine-type=n1-standard-4 \
  --enable-autoprovisioning \
  --min-cpu=1 \
  --max-cpu=100 \
  --min-memory=1 \
  --max-memory=1000 \
  --autoprovisioning-scopes=https://www.googleapis.com/auth/compute
```
Node Auto-Provisioning (NAP)
Intelligent Node Pool Creation:
```bash
# GKE automatically creates node pools based on workload needs
gcloud container clusters update my-cluster \
  --enable-autoprovisioning \
  --autoprovisioning-config-file=config.yaml
```

```yaml
# config.yaml
resourceLimits:
- resourceType: cpu
  minimum: 1
  maximum: 100
- resourceType: memory
  minimum: 1
  maximum: 1000
- resourceType: nvidia-tesla-k80
  minimum: 0
  maximum: 4

autoscalingProfile: OPTIMIZE_UTILIZATION # or BALANCED

management:
  autoUpgrade: true
  autoRepair: true
```
How NAP Works:
Pod with GPU → No suitable node → NAP creates GPU node pool → Pod schedules
↓ ↓ ↓ ↓
Specific Analyze pod Choose optimal Auto-scale
Requirements requirements instance type as needed
GKE Autopilot
Fully Managed GKE:
```bash
# Create Autopilot cluster
gcloud container clusters create-auto my-autopilot-cluster \
  --region=us-central1

# Autopilot handles:
# - Node provisioning
# - Auto-scaling
# - Security hardening
# - Capacity optimization
# - Networking configuration

# You only manage workloads
kubectl apply -f deployment.yaml

# Autopilot automatically:
# - Provisions right-sized nodes
# - Scales based on pod needs
# - Optimizes cost and performance
# - Handles node upgrades
```
Autopilot Pricing:
Cost = Sum of pod resource requests
Example Deployment:
10 pods × (0.5 vCPU + 1GB RAM)
= 5 vCPU + 10GB RAM
= (5 × $0.04208) + (10 × $0.00463)
= $0.2104 + $0.0463
= $0.2567 per hour
= $185/month
Includes:
- Compute resources
- GKE management fee
- Networking egress (within limits)
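The arithmetic above is easy to generalize. This snippet reproduces the example using the same illustrative list prices (actual Autopilot rates vary by region and change over time):

```python
# Reproduce the Autopilot pricing example: cost is the sum of pod
# resource *requests*, billed per vCPU-hour and per GB-hour.
# Rates below are the illustrative list prices from the text.
VCPU_HOUR = 0.04208
GB_HOUR = 0.00463
HOURS_PER_MONTH = 720  # 30-day month

def autopilot_monthly(pods: int, vcpu_per_pod: float, gb_per_pod: float) -> float:
    """Estimated monthly cost for identical pods billed on requests."""
    hourly = pods * (vcpu_per_pod * VCPU_HOUR + gb_per_pod * GB_HOUR)
    return hourly * HOURS_PER_MONTH

monthly = autopilot_monthly(10, 0.5, 1.0)
print(f"${monthly:.0f}/month")  # ~$185/month, matching the example
```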
Pros and Cons
GKE Autoscaling Advantages:
| Feature | Benefit |
|---|---|
| Node Auto-Provisioning | Creates optimal node pools automatically |
| Autopilot Mode | Zero node management |
| Integrated Monitoring | Built-in Cloud Monitoring |
| Fast Provisioning | GCE startup optimization |
| Preemptible VM Support | 80% cost savings |
Limitations:
| Challenge | Impact |
|---|---|
| GCP Lock-in | Not portable |
| Autopilot Constraints | Limited customization |
| Cost | Premium pricing for convenience |
Approach 5: Azure AKS-Specific Autoscaling
AKS Cluster Autoscaler
# Enable cluster autoscaler
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10

# Multiple node pools
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name spotpool \
  --enable-cluster-autoscaler \
  --min-count 0 \
  --max-count 20 \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --node-vm-size Standard_DS2_v2
Azure Container Instances (ACI) Integration
Virtual Nodes (Serverless):
# Enable virtual nodes
az aks enable-addons \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --addons virtual-node \
  --subnet-name VirtualNodeSubnet

# Pods with the virtual-kubelet toleration run on ACI
apiVersion: v1
kind: Pod
metadata:
  name: serverless-pod
spec:
  containers:
  - name: app
    image: myapp:v1.0
  tolerations:
  - key: virtual-kubelet.io/provider
    operator: Equal
    value: azure
    effect: NoSchedule
  nodeSelector:
    type: virtual-kubelet
Comparison: Cloud Provider Autoscaling Solutions
| Feature | EKS | GKE | AKS |
|---|---|---|---|
| Cluster Autoscaler | ✅ Standard | ✅ Standard | ✅ Standard |
| Advanced Autoscaler | Karpenter | NAP | Standard CA |
| Serverless Pods | Fargate | Autopilot | ACI Virtual Nodes |
| Fully Managed | EKS Auto Mode | Autopilot | AKS Automatic |
| Spot Instance Support | ✅ Excellent | ✅ Preemptible | ✅ Spot VMs |
| Provisioning Speed | 2-5 min (30s Karpenter) | 1-3 min | 2-4 min |
| Cost Optimization | Karpenter best-in-class | NAP intelligent | Standard |
| Multi-Architecture | ✅ ARM64 support | ✅ ARM64 support | Limited |
Emerging Autoscaling Technologies
1. Kamaji (Multi-Tenant Control Planes)
# Virtual control plane per tenant
apiVersion: kamaji.clastix.io/v1alpha1
kind: TenantControlPlane
metadata:
  name: tenant-a
spec:
  controlPlane:
    deployment:
      replicas: 2
  network:
    serviceType: LoadBalancer
  addons:
    coreDNS: {}
    konnectivity: {}

# Each tenant gets isolated autoscaling
2. Kwok (Kubernetes WithOut Kubelet)
# Simulate thousands of nodes for testing autoscaling
kwok \
  --kubeconfig=~/.kube/config \
  --manage-all-nodes=false \
  --manage-nodes-with-annotation-selector=kwok.x-k8s.io/node=fake \
  --disregard-status-with-annotation-selector=kwok.x-k8s.io/status=custom

# Test autoscaling logic without real infrastructure cost
3. Volcano (Batch Job Scheduling)
1# Advanced scheduling for ML/batch workloads
2apiVersion: batch.volcano.sh/v1alpha1
3kind: Job
4metadata:
5 name: ml-training
6spec:
7 minAvailable: 4
8 schedulerName: volcano
9 policies:
10 - event: PodEvicted
11 action: RestartJob
12 tasks:
13 - replicas: 8
14 name: worker
15 template:
16 spec:
17 containers:
18 - name: worker
19 image: ml-trainer:v1.0
20 resources:
21 requests:
22 nvidia.com/gpu: 1
23
24# Volcano coordinates autoscaling with job scheduling
Production Best Practices
1. Hybrid Autoscaling Strategy
# Baseline: Cluster Autoscaler for stability
# Dynamic: Karpenter for optimization
# Serverless: Fargate/Autopilot for burstiness

apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaling-strategy
data:
  strategy: |
    Tier 1 (Critical): On-demand nodes, Cluster Autoscaler
    Tier 2 (Standard): Mix spot/on-demand, Karpenter
    Tier 3 (Batch): Pure spot, Karpenter with aggressive consolidation
    Tier 4 (Burst): Fargate/Autopilot, scale-to-zero
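The tiering above can be expressed as a small routing function — a hypothetical sketch (the tier names and strategy fields are illustrative, not a real admission-controller API) that maps a workload's tier to its capacity strategy:

```python
# Hypothetical sketch: route a workload tier to its capacity strategy,
# mirroring the four tiers in the ConfigMap above. Field names are
# illustrative, not a real API.
TIER_STRATEGY = {
    "critical": {"capacity": "on-demand",      "autoscaler": "cluster-autoscaler"},
    "standard": {"capacity": "spot+on-demand", "autoscaler": "karpenter"},
    "batch":    {"capacity": "spot",           "autoscaler": "karpenter",
                 "consolidation": "aggressive"},
    "burst":    {"capacity": "serverless",     "autoscaler": "fargate/autopilot",
                 "scale_to_zero": True},
}

def strategy_for(tier: str) -> dict:
    """Look up the capacity strategy for a workload tier."""
    try:
        return TIER_STRATEGY[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}")

print(strategy_for("batch")["capacity"])  # spot
```

In practice this mapping would live in labels or namespaces and be enforced by node selectors, taints, and Karpenter NodePool requirements rather than application code.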
2. Cost Optimization Tactics
# Multi-dimensional cost optimization
priorities:
  1. Spot instances (70-90% savings)
  2. Right-sizing via Karpenter
  3. Consolidation during low traffic
  4. Reserved instances for baseline
  5. Savings Plans for predictable workloads

# Example cost breakdown
baseline: 10 on-demand nodes (reserved) = $1,500/month
dynamic: 0-50 spot nodes (Karpenter) = $500-3,000/month
burst: Fargate for spikes = $200/month
Total: $2,200-4,700/month vs $15,000 static
Savings: 68-85%
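The savings range quoted above follows directly from the breakdown; a quick calculation confirms it:

```python
# Sanity-check the savings range above: hybrid autoscaled cost vs. a
# statically provisioned cluster, using the example figures.
STATIC_MONTHLY = 15_000
hybrid_low = 1_500 + 500 + 200     # baseline + minimum dynamic + burst
hybrid_high = 1_500 + 3_000 + 200  # baseline + maximum dynamic + burst

def savings_pct(hybrid: float, static: float = STATIC_MONTHLY) -> float:
    """Percentage saved relative to the static cluster cost."""
    return (1 - hybrid / static) * 100

print(f"{savings_pct(hybrid_high):.0f}%-{savings_pct(hybrid_low):.0f}%")  # ~69%-85%
```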
3. Monitoring and Alerting
# Comprehensive autoscaling observability
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: autoscaling-alerts
spec:
  groups:
  - name: cluster-autoscaling
    rules:
    - alert: ClusterFullCapacity
      expr: |
        sum(kube_node_status_allocatable{resource="cpu"})
        - sum(kube_pod_container_resource_requests{resource="cpu"})
        < 10
      for: 5m
      annotations:
        summary: "Cluster near full capacity"

    - alert: HighSpotInterruptionRate
      expr: rate(karpenter_interruption_received_messages[5m]) > 0.1
      annotations:
        summary: "High spot interruption rate"

    - alert: AutoscalingDisabled
      expr: up{job="cluster-autoscaler"} == 0
      for: 5m
      annotations:
        summary: "Cluster autoscaler is down"

    - alert: NodeProvisioningDelayed
      expr: |
        sum(karpenter_pending_pods_total) > 10
        and on()
        sum(rate(karpenter_nodes_created[5m])) == 0
      for: 10m
      annotations:
        summary: "Nodes not provisioning despite pending pods"
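The ClusterFullCapacity expression boils down to simple headroom arithmetic: total allocatable CPU minus total requested CPU, alerting when the difference drops below a threshold. The same check, sketched outside Prometheus with made-up node and pod figures:

```python
# Illustrative CPU-headroom check mirroring the ClusterFullCapacity rule:
# alert when allocatable CPU minus requested CPU drops below a threshold.
def cpu_headroom(allocatable_cores: list, requested_cores: list) -> float:
    """Total allocatable CPU minus total requested CPU, in cores."""
    return sum(allocatable_cores) - sum(requested_cores)

def near_full_capacity(allocatable, requested, threshold_cores: float = 10.0) -> bool:
    """True when remaining schedulable CPU is below the alert threshold."""
    return cpu_headroom(allocatable, requested) < threshold_cores

# Made-up example: three 16-core nodes with 42 cores requested -> 6 cores free
nodes = [16.0, 16.0, 16.0]
requests = [14.0, 15.0, 13.0]
print(near_full_capacity(nodes, requests))  # True -> the alert would fire
```

Note the rule keys off resource *requests*, not actual usage — a cluster can hit this alert while real CPU utilization is low, which is exactly the signal the autoscaler acts on.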
4. Testing Autoscaling
#!/bin/bash
# Load testing script

# Test scale-up
kubectl run load-generator-1 --image=busybox:1.28 \
  --restart=Never --rm -i -- /bin/sh -c \
  "while true; do wget -q -O- http://test-service; sleep 0.01; done" &

# Monitor scaling
watch -n 5 'kubectl get nodes; kubectl get hpa; kubectl top nodes'

# Test scale-down
# Stop the load and observe consolidation

# Test spot interruption (Karpenter)
# Manually terminate a spot instance to verify graceful handling
aws ec2 terminate-instances --instance-ids i-xxxxx

# Verify:
# - New node provisions
# - Pods reschedule
# - No downtime
Related Topics
For comprehensive Kubernetes knowledge, explore these related posts:
Horizontal Pod Autoscaling
- Part 1: Horizontal Pod Autoscaler - Deep dive into HPA, KEDA, custom metrics, and event-driven autoscaling
Kubernetes Fundamentals
- Kubernetes Complete Guide (Part 1): Introduction - Architecture, concepts, installation (Traditional Chinese)
- Kubernetes Complete Guide (Part 3): Advanced Features - RBAC, monitoring, production practices (Traditional Chinese)
Production Kubernetes
- Building Production Kubernetes Platform on AWS EKS - Complete EKS architecture with CDK implementation
Conclusion
Cluster-level autoscaling has evolved significantly, offering multiple approaches for different needs:
Decision Framework
Choose Cluster Autoscaler when:
- Running on any cloud or on-premises
- Need stable, proven technology
- Existing ASG/node group infrastructure
- Regulatory requirements for specific tech
Choose Karpenter when:
- On AWS EKS
- Cost optimization is critical
- Dynamic, unpredictable workloads
- Want latest autoscaling capabilities
Choose Cloud Provider Solutions when:
- Deep cloud integration needed
- Minimal operational overhead desired
- Willing to accept vendor lock-in
- Budget allows premium pricing
Key Takeaways
- Layer Your Autoscaling: Combine pod (HPA) and cluster autoscaling
- Start Simple: Begin with Cluster Autoscaler, evolve to Karpenter/cloud solutions
- Embrace Spot/Preemptible: 70-90% cost savings possible
- Monitor Comprehensively: Autoscaling health is critical
- Test Under Load: Validate behavior before production
Future of Kubernetes Autoscaling
The autoscaling landscape continues evolving:
- AI-Driven Autoscaling: Predictive scaling using ML models
- Multi-Cluster Autoscaling: Federated capacity management
- Sustainability-Aware: Carbon-optimized instance selection
- FinOps Integration: Real-time cost optimization
- Edge Computing: Autoscaling for edge Kubernetes
By understanding the full spectrum of autoscaling approaches—from traditional Cluster Autoscaler to cutting-edge Karpenter and cloud-native solutions—you can architect Kubernetes platforms that automatically adapt to demand while optimizing costs and maintaining reliability.
The future belongs to intelligent, multi-layered autoscaling strategies that combine the best of open-source innovation with cloud provider capabilities, delivering both operational excellence and cost efficiency at scale.