Series Overview
This is Part 4 of the Kubernetes Autoscaling Complete Guide series:
- Part 1: Horizontal Pod Autoscaler - Application-level autoscaling theory
- Part 2: Cluster Autoscaling & Cloud Providers - Infrastructure-level autoscaling
- Part 3: Hands-On HPA Demo - Practical implementation with Apache-PHP
- Part 4 (This Post): Monitoring, Alerting & Threshold Tuning - Production observability
Building on the HPA demo from Part 3, this guide implements a complete monitoring and alerting stack for your EKS cluster. We'll deploy Prometheus for metrics collection, Grafana for visualization, and AlertManager for notifications, then establish best practices for threshold tuning.
What We’ll Build
┌──────────────────────────────────────────────────────────────────────┐
│ MONITORING ARCHITECTURE │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ DATA COLLECTION LAYER │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Metrics │ │ Node │ │ kube │ │ HPA │ │ │
│ │ │ Server │ │ Exporter │ │ state │ │ metrics │ │ │
│ │ │ │ │ │ │ metrics │ │ │ │ │
│ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ │
│ │ │ │ │ │ │ │
│ │ └─────────────┴─────────────┴─────────────┘ │ │
│ │ ↓ │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ PROMETHEUS (Storage & Queries) │ │
│ │ │ │
│ │ • Time-series database │ │
│ │ • PromQL query engine │ │
│ │ • Service discovery │ │
│ │ • Recording rules │ │
│ └────────────┬───────────────────────┬──────────────────────────┘ │
│ │ │ │
│ ↓ ↓ │
│ ┌────────────────────────┐ ┌──────────────────────────────────┐ │
│ │ ALERTMANAGER │ │ GRAFANA │ │
│ │ │ │ │ │
│ │ • Alert routing │ │ • Dashboards │ │
│ │ • Grouping │ │ • Data sources │ │
│ │ • Deduplication │ │ • Annotations │ │
│ │ • Silencing │ │ • Variables │ │
│ └────┬───────────────────┘ └──────────────────────────────────┘ │
│ │ │
│ ↓ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ NOTIFICATION CHANNELS │ │
│ │ │ │
│ │ Email │ Slack │ PagerDuty │ OpsGenie │ Webhook │ │
│ └────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
Prerequisites
Starting from the Part 3 setup, ensure you have:
1# EKS cluster from Part 3 running
2kubectl get nodes
3
4# Helm installed
5helm version
6
7# kubectl configured
8kubectl config current-context
9
10# Part 3 application deployed
11kubectl get deployment php-apache
12kubectl get hpa php-apache-hpa
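Before layering Prometheus on top, it's worth a quick sanity check that the metrics pipeline from Part 3 is actually serving data; something along these lines (resource names assume the Part 3 setup):

```bash
# The resource metrics API should be registered and available
kubectl get apiservice v1beta1.metrics.k8s.io

# Metrics Server should return live usage for nodes and the demo pods
kubectl top nodes
kubectl top pods -l app=php-apache

# The HPA should report a current CPU percentage rather than "unknown"
kubectl get hpa php-apache-hpa
```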
Part 1: Prometheus Stack Setup
Option A: Using kube-prometheus-stack (Recommended)
The kube-prometheus-stack includes Prometheus, Grafana, AlertManager, and exporters in one package.
Step 1: Update CDK Stack
Add Prometheus stack to lib/eks-hpa-demo-stack.ts:
1import * as cdk from 'aws-cdk-lib';
2import * as eks from 'aws-cdk-lib/aws-eks';
3import * as ec2 from 'aws-cdk-lib/aws-ec2';
4import * as iam from 'aws-cdk-lib/aws-iam';
5import { Construct } from 'constructs';
6
7export class EksHpaDemoStack extends cdk.Stack {
8 public readonly cluster: eks.Cluster;
9
10 constructor(scope: Construct, id: string, props?: cdk.StackProps) {
11 super(scope, id, props);
12
13 // ... existing VPC and cluster code from Part 3 ...
14
15 // Create namespace for monitoring
16 const monitoringNamespace = this.cluster.addManifest('monitoring-namespace', {
17 apiVersion: 'v1',
18 kind: 'Namespace',
19 metadata: {
20 name: 'monitoring',
21 labels: {
22 name: 'monitoring',
23 },
24 },
25 });
26
27 // Install kube-prometheus-stack using Helm
28 const prometheusStack = this.cluster.addHelmChart('PrometheusStack', {
29 chart: 'kube-prometheus-stack',
30 repository: 'https://prometheus-community.github.io/helm-charts',
31 namespace: 'monitoring',
32 release: 'prometheus',
33 version: '54.2.2', // Check for latest version
34 wait: true,
35 timeout: cdk.Duration.minutes(15),
36
37 values: {
38 // Prometheus configuration
39 prometheus: {
40 prometheusSpec: {
41 // Retention period
42 retention: '30d',
43 retentionSize: '50GB',
44
45 // Storage
46 storageSpec: {
47 volumeClaimTemplate: {
48 spec: {
49 accessModes: ['ReadWriteOnce'],
50 resources: {
51 requests: {
52 storage: '50Gi',
53 },
54 },
55 storageClassName: 'gp3', // AWS EBS gp3
56 },
57 },
58 },
59
60 // Resource limits
61 resources: {
62 requests: {
63 cpu: '500m',
64 memory: '2Gi',
65 },
66 limits: {
67 cpu: '2000m',
68 memory: '4Gi',
69 },
70 },
71
72 // Service monitors to scrape
73 serviceMonitorSelectorNilUsesHelmValues: false,
74 podMonitorSelectorNilUsesHelmValues: false,
75
76 // Additional scrape configs
77 additionalScrapeConfigs: [
78 {
79 job_name: 'kubernetes-pods',
80 kubernetes_sd_configs: [
81 {
82 role: 'pod',
83 },
84 ],
85 relabel_configs: [
86 {
87 source_labels: ['__meta_kubernetes_pod_annotation_prometheus_io_scrape'],
88 action: 'keep',
89 regex: 'true',
90 },
91 {
92 source_labels: ['__meta_kubernetes_pod_annotation_prometheus_io_path'],
93 action: 'replace',
94 target_label: '__metrics_path__',
95 regex: '(.+)',
96 },
97 {
98 source_labels: ['__address__', '__meta_kubernetes_pod_annotation_prometheus_io_port'],
99 action: 'replace',
100 regex: '([^:]+)(?::\\d+)?;(\\d+)',
101 replacement: '$1:$2',
102 target_label: '__address__',
103 },
104 ],
105 },
106 ],
107 },
108
109 // Service configuration
110 service: {
111 type: 'LoadBalancer', // Or ClusterIP with ingress
112 annotations: {
113 'service.beta.kubernetes.io/aws-load-balancer-type': 'nlb',
114 'service.beta.kubernetes.io/aws-load-balancer-internal': 'true',
115 },
116 },
117 },
118
119 // Grafana configuration
120 grafana: {
121 enabled: true,
122 adminPassword: 'admin123', // Change in production!
123
124 persistence: {
125 enabled: true,
126 storageClassName: 'gp3',
127 size: '10Gi',
128 },
129
130 resources: {
131 requests: {
132 cpu: '250m',
133 memory: '512Mi',
134 },
135 limits: {
136 cpu: '500m',
137 memory: '1Gi',
138 },
139 },
140
141 service: {
142 type: 'LoadBalancer',
143 annotations: {
144 'service.beta.kubernetes.io/aws-load-balancer-type': 'nlb',
145 'service.beta.kubernetes.io/aws-load-balancer-internal': 'true',
146 },
147 },
148
149 // Pre-configured data sources
150 datasources: {
151 'datasources.yaml': {
152 apiVersion: 1,
153 datasources: [
154 {
155 name: 'Prometheus',
156 type: 'prometheus',
157 url: 'http://prometheus-kube-prometheus-prometheus.monitoring:9090',
158 access: 'proxy',
159 isDefault: true,
160 },
161 ],
162 },
163 },
164
165 // Default dashboards
166 defaultDashboardsEnabled: true,
167 defaultDashboardsTimezone: 'UTC',
168
169 // Additional dashboard providers
170 dashboardProviders: {
171 'dashboardproviders.yaml': {
172 apiVersion: 1,
173 providers: [
174 {
175 name: 'default',
176 orgId: 1,
177 folder: '',
178 type: 'file',
179 disableDeletion: false,
180 editable: true,
181 options: {
182 path: '/var/lib/grafana/dashboards/default',
183 },
184 },
185 ],
186 },
187 },
188 },
189
190 // AlertManager configuration
191 alertmanager: {
192 enabled: true,
193
194 alertmanagerSpec: {
195 storage: {
196 volumeClaimTemplate: {
197 spec: {
198 accessModes: ['ReadWriteOnce'],
199 resources: {
200 requests: {
201 storage: '10Gi',
202 },
203 },
204 storageClassName: 'gp3',
205 },
206 },
207 },
208
209 resources: {
210 requests: {
211 cpu: '100m',
212 memory: '256Mi',
213 },
214 limits: {
215 cpu: '200m',
216 memory: '512Mi',
217 },
218 },
219 },
220
221 config: {
222 global: {
223 resolve_timeout: '5m',
224 },
225 route: {
226 group_by: ['alertname', 'cluster', 'service'],
227 group_wait: '10s',
228 group_interval: '10s',
229 repeat_interval: '12h',
230 receiver: 'default',
231 routes: [
232 {
233 match: {
234 alertname: 'Watchdog',
235 },
236 receiver: 'null',
237 },
238 {
239 match: {
240 severity: 'critical',
241 },
242 receiver: 'critical',
243 continue: true,
244 },
245 {
246 match: {
247 severity: 'warning',
248 },
249 receiver: 'warning',
250 },
251 ],
252 },
253 receivers: [
254 {
255 name: 'null',
256 },
257 {
258 name: 'default',
259 // Configure in next section
260 },
261 {
262 name: 'critical',
263 // Configure in next section
264 },
265 {
266 name: 'warning',
267 // Configure in next section
268 },
269 ],
270 },
271 },
272
273 // Node exporter (collects node metrics)
274 nodeExporter: {
275 enabled: true,
276 },
277
278 // Kube-state-metrics (K8s object metrics)
279 kubeStateMetrics: {
280 enabled: true,
281 },
282
283 // Prometheus operator
284 prometheusOperator: {
285 resources: {
286 requests: {
287 cpu: '200m',
288 memory: '256Mi',
289 },
290 limits: {
291 cpu: '500m',
292 memory: '512Mi',
293 },
294 },
295 },
296 },
297 });
298
299 prometheusStack.node.addDependency(monitoringNamespace);
300
301 // Output monitoring URLs
302 new cdk.CfnOutput(this, 'PrometheusURL', {
303 value: 'http://prometheus-kube-prometheus-prometheus.monitoring:9090',
304 description: 'Prometheus internal URL',
305 });
306
307 new cdk.CfnOutput(this, 'GrafanaURL', {
308 value: 'Access via: kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80',
309 description: 'Grafana port-forward command',
310 });
311
312 new cdk.CfnOutput(this, 'AlertManagerURL', {
313 value: 'http://prometheus-kube-prometheus-alertmanager.monitoring:9093',
314 description: 'AlertManager internal URL',
315 });
316 }
317}
Step 2: Deploy Updated Stack
1cd cdk
2
3# Deploy monitoring stack
4cdk deploy
5
6# Wait for Helm chart installation (takes 5-10 minutes)
7
8# Verify installation
9kubectl get pods -n monitoring
10
11# Expected output:
12# NAME READY STATUS RESTARTS AGE
13# alertmanager-prometheus-kube-prom-alertmanager-0 2/2 Running 0 5m
14# prometheus-grafana-xxx 3/3 Running 0 5m
15# prometheus-kube-prom-operator-xxx 1/1 Running 0 5m
16# prometheus-kube-state-metrics-xxx 1/1 Running 0 5m
17# prometheus-prometheus-node-exporter-xxx 1/1 Running 0 5m
18# prometheus-prometheus-kube-prom-prometheus-0 2/2 Running 0 5m
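Once the pods are Running, you can confirm Prometheus has actually discovered its scrape targets through its HTTP API; a quick sketch, assuming the `prometheus` release name used above:

```bash
# Port-forward Prometheus in the background
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 &

# Summarize active scrape targets and their health per job
curl -s http://localhost:9090/api/v1/targets \
  | jq -r '.data.activeTargets[] | "\(.labels.job): \(.health)"' \
  | sort | uniq -c
```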
Option B: Manual Helm Installation
If you prefer manual installation:
1# Add Prometheus community Helm repository
2helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
3helm repo update
4
5# Create monitoring namespace
6kubectl create namespace monitoring
7
8# Install kube-prometheus-stack
9helm install prometheus prometheus-community/kube-prometheus-stack \
10 --namespace monitoring \
11 --set prometheus.prometheusSpec.retention=30d \
12 --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
13 --set grafana.adminPassword=admin123 \
14 --set grafana.persistence.enabled=true \
15 --set grafana.persistence.size=10Gi \
16 --wait
17
18# Verify installation
19kubectl get pods -n monitoring
20kubectl get svc -n monitoring
Part 2: Accessing Monitoring Tools
Access Grafana
1# Method 1: Port forwarding (recommended for testing)
2kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
3
4# Access at: http://localhost:3000
5# Username: admin
6# Password: admin123 (or what you set in values)
7
8# Method 2: LoadBalancer (if configured)
9kubectl get svc -n monitoring prometheus-grafana
10
11# Get external IP/DNS
12export GRAFANA_URL=$(kubectl get svc -n monitoring prometheus-grafana -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
13echo "Grafana: http://$GRAFANA_URL"
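If you skip setting `adminPassword` in the values, the chart generates one for you; assuming the default `prometheus-grafana` Secret created by the chart, it can be read back like this:

```bash
# Retrieve the Grafana admin password from the chart-managed Secret
kubectl get secret -n monitoring prometheus-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo
```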
Access Prometheus
1# Port forward Prometheus
2kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090
3
4# Access at: http://localhost:9090
5
6# Query examples:
7# - up{job="kubernetes-nodes"}
8# - kube_pod_container_resource_requests_cpu_cores
9# - rate(container_cpu_usage_seconds_total[5m])
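The same queries can also be run without the UI through Prometheus's instant-query API, which is convenient for scripting the threshold checks used later in this guide:

```bash
# Instant query: current CPU usage rate across the php-apache pods
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=sum(rate(container_cpu_usage_seconds_total{pod=~"php-apache.*"}[5m]))' \
  | jq '.data.result[] | {metric: .metric, value: .value[1]}'
```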
Access AlertManager
1# Port forward AlertManager
2kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-alertmanager 9093:9093
3
4# Access at: http://localhost:9093
Part 3: HPA-Specific Monitoring
Create ServiceMonitor for PHP-Apache
Create k8s/servicemonitor.yaml:
1apiVersion: monitoring.coreos.com/v1
2kind: ServiceMonitor
3metadata:
4 name: php-apache-monitor
5 namespace: default
6 labels:
7 app: php-apache
8 release: prometheus # Must match Prometheus release name
9spec:
10 selector:
11 matchLabels:
12 app: php-apache
13 endpoints:
14 - port: http
15 interval: 15s
16 path: /metrics # If your app exposes metrics
17 # Or use a sidecar exporter
18
19 namespaceSelector:
20 matchNames:
21 - default
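One caveat: the ServiceMonitor selects the Service's port by name (`http`), and the stock php-apache image from Part 3 does not expose `/metrics` on its own, so this target will stay down until you add an exporter. At minimum, verify the Service actually defines a named port; a rough sketch (port index assumed to be 0):

```bash
# The ServiceMonitor references the port by name, so the Service must define one
kubectl get svc php-apache -o jsonpath='{range .spec.ports[*]}{.name}{" -> "}{.port}{"\n"}{end}'

# If the name is missing, patch it in
kubectl patch svc php-apache --type='json' \
  -p='[{"op": "add", "path": "/spec/ports/0/name", "value": "http"}]'
```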
Create PrometheusRule for HPA Alerts
Create k8s/prometheus-rules.yaml:
1apiVersion: monitoring.coreos.com/v1
2kind: PrometheusRule
3metadata:
4 name: hpa-alerts
5 namespace: monitoring
6 labels:
7 release: prometheus # Must match Prometheus release name
8spec:
9 groups:
10 - name: hpa-autoscaling
11 interval: 30s
12 rules:
13 # Alert when HPA is at maximum replicas
14 - alert: HPAMaxedOut
15 expr: |
16 (
17 kube_horizontalpodautoscaler_status_current_replicas{namespace="default"}
18 /
19 kube_horizontalpodautoscaler_spec_max_replicas{namespace="default"}
20 ) >= 1
21 for: 5m
22 labels:
23 severity: warning
24 component: hpa
25 annotations:
26 summary: "HPA {{ $labels.horizontalpodautoscaler }} at maximum capacity"
27 description: "HPA {{ $labels.horizontalpodautoscaler }} in namespace {{ $labels.namespace }} has been at maximum replicas ({{ $value }}) for more than 5 minutes. Consider increasing maxReplicas or adding more nodes."
28 dashboard_url: "http://grafana/d/hpa-dashboard"
29
30 # Alert when HPA cannot scale
31 - alert: HPAUnableToScale
32 expr: |
33 kube_horizontalpodautoscaler_status_condition{condition="ScalingLimited",status="true"} == 1
34 for: 10m
35 labels:
36 severity: warning
37 component: hpa
38 annotations:
39 summary: "HPA {{ $labels.horizontalpodautoscaler }} unable to scale"
40 description: "HPA {{ $labels.horizontalpodautoscaler }} has been unable to scale for 10 minutes. Check for resource constraints or scaling limits."
41
42 # Alert when HPA cannot fetch metrics
43 - alert: HPAMetricsUnavailable
44 expr: |
45 kube_horizontalpodautoscaler_status_condition{condition="ScalingActive",status="false"} == 1
46 for: 5m
47 labels:
48 severity: critical
49 component: hpa
50 annotations:
51 summary: "HPA {{ $labels.horizontalpodautoscaler }} metrics unavailable"
52 description: "HPA {{ $labels.horizontalpodautoscaler }} cannot fetch metrics. Check Metrics Server status."
53 runbook_url: "https://docs/troubleshooting/hpa-metrics"
54
55 # Alert on rapid scaling activity (thrashing)
56 - alert: HPAScalingThrashing
57 expr: |
58 changes(kube_horizontalpodautoscaler_status_current_replicas[15m]) / 15 > 0.5
59 for: 30m
60 labels:
61 severity: warning
62 component: hpa
63 annotations:
64 summary: "HPA {{ $labels.horizontalpodautoscaler }} scaling too frequently"
65 description: "HPA is scaling up/down frequently ({{ $value }} changes/min), indicating possible threshold misconfiguration or unstable load."
66
67 # Alert when CPU usage is consistently high
68 - alert: HighCPUUsageBeforeScaling
69 expr: |
70 (
71 sum(rate(container_cpu_usage_seconds_total{namespace="default",pod=~"php-apache.*"}[5m])) by (pod)
72 /
73 sum(kube_pod_container_resource_requests{namespace="default",pod=~"php-apache.*",resource="cpu"}) by (pod)
74 ) > 0.9
75 for: 3m
76 labels:
77 severity: warning
78 component: application
79 annotations:
80 summary: "Pod {{ $labels.pod }} CPU usage very high"
81 description: "CPU usage is at {{ $value | humanizePercentage }} of requested resources. Scaling may be delayed."
82
83 # Alert when memory usage is high
84 - alert: HighMemoryUsage
85 expr: |
86 (
87 sum(container_memory_working_set_bytes{namespace="default",pod=~"php-apache.*"}) by (pod)
88 /
89 sum(kube_pod_container_resource_limits{namespace="default",pod=~"php-apache.*",resource="memory"}) by (pod)
90 ) > 0.9
91 for: 5m
92 labels:
93 severity: warning
94 component: application
95 annotations:
96 summary: "Pod {{ $labels.pod }} memory usage critical"
97 description: "Memory usage is at {{ $value | humanizePercentage }} of limits. Pod may be OOMKilled."
98
99 # Alert when pods are pending (cannot be scheduled)
100 - alert: PodsPendingScheduling
101 expr: |
102 sum(kube_pod_status_phase{namespace="default",pod=~"php-apache.*",phase="Pending"}) > 0
103 for: 10m
104 labels:
105 severity: critical
106 component: scheduler
107 annotations:
108 summary: "{{ $value }} php-apache pods pending scheduling"
109 description: "Pods cannot be scheduled. Check node resources and Cluster Autoscaler status."
110
111 - name: cluster-resources
112 interval: 30s
113 rules:
114 # Alert when cluster CPU is near capacity
115 - alert: ClusterCPUPressure
116 expr: |
117 (
118 sum(kube_node_status_allocatable{resource="cpu"})
119 -
120 sum(kube_pod_container_resource_requests{resource="cpu"})
121 ) < 2
122 for: 5m
123 labels:
124 severity: warning
125 component: cluster
126 annotations:
127 summary: "Cluster CPU capacity low"
128 description: "Only {{ $value }} CPU cores available cluster-wide. Consider adding nodes or enabling Cluster Autoscaler."
129
130 # Alert when cluster memory is near capacity
131 - alert: ClusterMemoryPressure
132 expr: |
133 (
134 sum(kube_node_status_allocatable{resource="memory"})
135 -
136 sum(kube_pod_container_resource_requests{resource="memory"})
137 ) / (1024 * 1024 * 1024) < 4
138 for: 5m
139 labels:
140 severity: warning
141 component: cluster
142 annotations:
143 summary: "Cluster memory capacity low"
144 description: "Only {{ $value }}GB memory available cluster-wide."
145
146 # Alert on node not ready
147 - alert: NodeNotReady
148 expr: |
149 kube_node_status_condition{condition="Ready",status="true"} == 0
150 for: 5m
151 labels:
152 severity: critical
153 component: node
154 annotations:
155 summary: "Node {{ $labels.node }} not ready"
156 description: "Node has been not ready for 5 minutes."
Apply the monitoring configurations:
1kubectl apply -f k8s/servicemonitor.yaml
2kubectl apply -f k8s/prometheus-rules.yaml
3
4# Verify PrometheusRule is loaded
5kubectl get prometheusrule -n monitoring
6
7# Check if rules are active in Prometheus
8# Port forward and visit: http://localhost:9090/rules
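You can also confirm the rule groups loaded without opening the UI by querying the rules API (this assumes the Prometheus port-forward from earlier is still running):

```bash
# List the loaded rule groups and the alert names inside them
curl -s http://localhost:9090/api/v1/rules \
  | jq -r '.data.groups[] | select(.name | test("hpa|cluster")) | .name, (.rules[] | "  " + .name)'
```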
Part 4: Custom Grafana Dashboards
Dashboard 1: HPA Overview
Create grafana-dashboards/hpa-overview.json:
1{
2 "dashboard": {
3 "title": "HPA Autoscaling Overview",
4 "tags": ["kubernetes", "hpa", "autoscaling"],
5 "timezone": "browser",
6 "panels": [
7 {
8 "title": "Current vs Desired Replicas",
9 "type": "graph",
10 "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
11 "targets": [
12 {
13 "expr": "kube_horizontalpodautoscaler_status_current_replicas{namespace=\"default\",horizontalpodautoscaler=\"php-apache-hpa\"}",
14 "legendFormat": "Current Replicas",
15 "refId": "A"
16 },
17 {
18 "expr": "kube_horizontalpodautoscaler_status_desired_replicas{namespace=\"default\",horizontalpodautoscaler=\"php-apache-hpa\"}",
19 "legendFormat": "Desired Replicas",
20 "refId": "B"
21 },
22 {
23 "expr": "kube_horizontalpodautoscaler_spec_min_replicas{namespace=\"default\",horizontalpodautoscaler=\"php-apache-hpa\"}",
24 "legendFormat": "Min Replicas",
25 "refId": "C"
26 },
27 {
28 "expr": "kube_horizontalpodautoscaler_spec_max_replicas{namespace=\"default\",horizontalpodautoscaler=\"php-apache-hpa\"}",
29 "legendFormat": "Max Replicas",
30 "refId": "D"
31 }
32 ],
33 "fieldConfig": {
34 "defaults": {
35 "unit": "short",
36 "min": 0
37 }
38 }
39 },
40 {
41 "title": "CPU Utilization vs Target",
42 "type": "graph",
43 "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
44 "targets": [
45 {
46 "expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"default\",pod=~\"php-apache.*\"}[5m])) by (pod) / sum(kube_pod_container_resource_requests{namespace=\"default\",pod=~\"php-apache.*\",resource=\"cpu\"}) by (pod) * 100",
47 "legendFormat": "{{ pod }} CPU %",
48 "refId": "A"
49 },
50 {
51 "expr": "kube_horizontalpodautoscaler_spec_target_metric{namespace=\"default\",horizontalpodautoscaler=\"php-apache-hpa\"}",
52 "legendFormat": "HPA Target (%)",
53 "refId": "B"
54 }
55 ],
56 "fieldConfig": {
57 "defaults": {
58 "unit": "percent",
59 "min": 0,
60 "max": 100
61 }
62 }
63 },
64 {
65 "title": "Scaling Events Timeline",
66 "type": "table",
67 "gridPos": {"h": 8, "w": 24, "x": 0, "y": 8},
68 "targets": [
69 {
70 "expr": "changes(kube_horizontalpodautoscaler_status_current_replicas{namespace=\"default\",horizontalpodautoscaler=\"php-apache-hpa\"}[1h]) > 0",
71 "format": "table",
72 "instant": true
73 }
74 ]
75 },
76 {
77 "title": "Pod Count by Status",
78 "type": "stat",
79 "gridPos": {"h": 4, "w": 6, "x": 0, "y": 16},
80 "targets": [
81 {
82 "expr": "count(kube_pod_info{namespace=\"default\",pod=~\"php-apache.*\"})",
83 "legendFormat": "Total Pods"
84 }
85 ],
86 "fieldConfig": {
87 "defaults": {
88 "color": {"mode": "thresholds"},
89 "thresholds": {
90 "mode": "absolute",
91 "steps": [
92 {"value": null, "color": "green"},
93 {"value": 8, "color": "yellow"},
94 {"value": 10, "color": "red"}
95 ]
96 }
97 }
98 }
99 },
100 {
101 "title": "Average Response Time",
102 "type": "graph",
103 "gridPos": {"h": 8, "w": 12, "x": 0, "y": 20},
104 "targets": [
105 {
106 "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{namespace=\"default\",pod=~\"php-apache.*\"}[5m]))",
107 "legendFormat": "P95 Latency",
108 "refId": "A"
109 },
110 {
111 "expr": "histogram_quantile(0.99, rate(http_request_duration_seconds_bucket{namespace=\"default\",pod=~\"php-apache.*\"}[5m]))",
112 "legendFormat": "P99 Latency",
113 "refId": "B"
114 }
115 ]
116 },
117 {
118 "title": "Request Rate",
119 "type": "graph",
120 "gridPos": {"h": 8, "w": 12, "x": 12, "y": 20},
121 "targets": [
122 {
123 "expr": "sum(rate(http_requests_total{namespace=\"default\",pod=~\"php-apache.*\"}[5m]))",
124 "legendFormat": "Requests/sec"
125 }
126 ],
127 "fieldConfig": {
128 "defaults": {
129 "unit": "reqps"
130 }
131 }
132 }
133 ],
134 "refresh": "30s",
135 "time": {
136 "from": "now-1h",
137 "to": "now"
138 }
139 }
140}
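Rather than clicking through the UI, the JSON file can also be pushed with Grafana's dashboard API; a rough sketch, assuming the port-forward on 3000 and the admin credentials shown earlier:

```bash
# Wrap the dashboard JSON in the payload Grafana's import endpoint expects and POST it
jq '{dashboard: .dashboard, overwrite: true}' grafana-dashboards/hpa-overview.json \
  | curl -s -X POST http://admin:admin123@localhost:3000/api/dashboards/db \
      -H 'Content-Type: application/json' -d @-
```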
Dashboard 2: Resource Utilization
Rather than building these from scratch, import pre-built community dashboards for cluster and resource monitoring:
1# Download pre-built Kubernetes dashboards
2# Dashboard ID 15661 - Kubernetes Cluster Monitoring
3# Dashboard ID 15760 - Kubernetes Views / Global
4
5# Import via Grafana UI:
6# 1. Go to Dashboards → Import
7# 2. Enter dashboard ID
8# 3. Select Prometheus data source
9# 4. Click Import
Import Dashboards via ConfigMap
Create k8s/grafana-dashboards.yaml:
1apiVersion: v1
2kind: ConfigMap
3metadata:
4 name: hpa-dashboard
5 namespace: monitoring
6 labels:
7 grafana_dashboard: "1"
8data:
9 hpa-dashboard.json: |
10 {
11 "annotations": {
12 "list": [
13 {
14 "builtIn": 1,
15 "datasource": "-- Grafana --",
16 "enable": true,
17 "hide": true,
18 "iconColor": "rgba(0, 211, 255, 1)",
19 "name": "Annotations & Alerts",
20 "type": "dashboard"
21 }
22 ]
23 },
24 "editable": true,
25 "gnetId": null,
26 "graphTooltip": 0,
27 "id": null,
28 "links": [],
29 "panels": [
30 {
31 "datasource": "Prometheus",
32 "fieldConfig": {
33 "defaults": {
34 "color": {
35 "mode": "palette-classic"
36 },
37 "custom": {
38 "axisLabel": "",
39 "axisPlacement": "auto",
40 "barAlignment": 0,
41 "drawStyle": "line",
42 "fillOpacity": 10,
43 "gradientMode": "none",
44 "hideFrom": {
45 "tooltip": false,
46 "viz": false,
47 "legend": false
48 },
49 "lineInterpolation": "linear",
50 "lineWidth": 1,
51 "pointSize": 5,
52 "scaleDistribution": {
53 "type": "linear"
54 },
55 "showPoints": "never",
56 "spanNulls": true
57 },
58 "mappings": [],
59 "thresholds": {
60 "mode": "absolute",
61 "steps": [
62 {
63 "color": "green",
64 "value": null
65 }
66 ]
67 },
68 "unit": "short"
69 },
70 "overrides": []
71 },
72 "gridPos": {
73 "h": 9,
74 "w": 12,
75 "x": 0,
76 "y": 0
77 },
78 "id": 2,
79 "options": {
80 "legend": {
81 "calcs": [],
82 "displayMode": "list",
83 "placement": "bottom"
84 },
85 "tooltip": {
86 "mode": "single"
87 }
88 },
89 "pluginVersion": "8.0.0",
90 "targets": [
91 {
92 "expr": "kube_horizontalpodautoscaler_status_current_replicas{namespace=\"default\"}",
93 "interval": "",
94 "legendFormat": "{{ horizontalpodautoscaler }} - Current",
95 "refId": "A"
96 },
97 {
98 "expr": "kube_horizontalpodautoscaler_status_desired_replicas{namespace=\"default\"}",
99 "interval": "",
100 "legendFormat": "{{ horizontalpodautoscaler }} - Desired",
101 "refId": "B"
102 }
103 ],
104 "title": "HPA Replica Count",
105 "type": "timeseries"
106 }
107 ],
108 "refresh": "30s",
109 "schemaVersion": 27,
110 "style": "dark",
111 "tags": ["kubernetes", "hpa"],
112 "templating": {
113 "list": []
114 },
115 "time": {
116 "from": "now-6h",
117 "to": "now"
118 },
119 "timepicker": {},
120 "timezone": "",
121 "title": "HPA Monitoring",
122 "uid": "hpa-monitoring",
123 "version": 1
124 }
Apply dashboard:
1kubectl apply -f k8s/grafana-dashboards.yaml
2
3# Restart Grafana to pick up new dashboard
4kubectl rollout restart deployment -n monitoring prometheus-grafana
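The chart's dashboard sidecar watches for ConfigMaps carrying the `grafana_dashboard` label, so the restart is often unnecessary; assuming the default sidecar container name, you can check whether it picked the dashboard up:

```bash
# The sidecar logs each dashboard file it writes out
kubectl logs -n monitoring deploy/prometheus-grafana -c grafana-sc-dashboard \
  | grep -i hpa-dashboard
```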
Part 5: AlertManager Configuration
Configure Notification Channels
Slack Integration
Update the AlertManager config in the Helm values, or provide it via a Secret:
1apiVersion: v1
2kind: Secret
3metadata:
4 name: alertmanager-config
5 namespace: monitoring
6type: Opaque
7stringData:
8 alertmanager.yaml: |
9 global:
10 resolve_timeout: 5m
11 slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
12
13 route:
14 group_by: ['alertname', 'cluster', 'service']
15 group_wait: 10s
16 group_interval: 10s
17 repeat_interval: 12h
18 receiver: 'slack-notifications'
19 routes:
20 - match:
21 alertname: Watchdog
22 receiver: 'null'
23 - match:
24 severity: critical
25 receiver: 'slack-critical'
26 continue: true
27 - match:
28 severity: warning
29 receiver: 'slack-warnings'
30
31 receivers:
32 - name: 'null'
33
34 - name: 'slack-notifications'
35 slack_configs:
36 - channel: '#kubernetes-alerts'
37 title: 'Kubernetes Alert'
38 text: '{{ range .Alerts }}{{ .Annotations.summary }}\n{{ .Annotations.description }}\n{{ end }}'
39 send_resolved: true
40
41 - name: 'slack-critical'
42 slack_configs:
43 - channel: '#kubernetes-critical'
44 title: ':fire: CRITICAL Alert'
45 text: '{{ range .Alerts }}*{{ .Labels.alertname }}*\n{{ .Annotations.summary }}\n{{ .Annotations.description }}\n{{ end }}'
46 send_resolved: true
47 color: 'danger'
48
49 - name: 'slack-warnings'
50 slack_configs:
51 - channel: '#kubernetes-warnings'
52 title: ':warning: Warning Alert'
53 text: '{{ range .Alerts }}*{{ .Labels.alertname }}*\n{{ .Annotations.summary }}\n{{ end }}'
54 send_resolved: true
55 color: 'warning'
56
57 inhibit_rules:
58 - source_match:
59 severity: 'critical'
60 target_match:
61 severity: 'warning'
62 equal: ['alertname', 'cluster', 'service']
Email Notifications
1global:
2 smtp_smarthost: 'smtp.gmail.com:587'
3 smtp_from: 'alerts@example.com'
4 smtp_auth_username: 'alerts@example.com'
5 smtp_auth_password: 'your-app-password'
6
7receivers:
8- name: 'email-notifications'
9 email_configs:
10 - to: 'team@example.com'
11 headers:
12 Subject: '[{{ .Status }}] {{ .GroupLabels.alertname }}'
13 html: |
14 <h2>Alert: {{ .GroupLabels.alertname }}</h2>
15 {{ range .Alerts }}
16 <h3>{{ .Annotations.summary }}</h3>
17 <p>{{ .Annotations.description }}</p>
18 <p><strong>Severity:</strong> {{ .Labels.severity }}</p>
19 <p><strong>Started:</strong> {{ .StartsAt }}</p>
20 {{ end }}
PagerDuty Integration
1receivers:
2- name: 'pagerduty'
3 pagerduty_configs:
4 - service_key: 'YOUR_PAGERDUTY_SERVICE_KEY'
5 description: '{{ .GroupLabels.alertname }}: {{ .Annotations.summary }}'
6 severity: '{{ .Labels.severity }}'
Apply AlertManager Configuration
1# Apply secret
2kubectl apply -f k8s/alertmanager-secret.yaml
3
4# Or update via Helm
5helm upgrade prometheus prometheus-community/kube-prometheus-stack \
6 --namespace monitoring \
7 --reuse-values \
8 --set alertmanager.config.global.slack_api_url='https://hooks.slack.com/...' \
9 --set alertmanager.config.route.receiver='slack-notifications'
10
11# Restart AlertManager
12kubectl rollout restart statefulset -n monitoring alertmanager-prometheus-kube-prom-alertmanager
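A simple end-to-end test of the routing tree is to push a synthetic alert straight into Alertmanager's v2 API and watch which channel it lands in:

```bash
# Port-forward AlertManager in the background
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-alertmanager 9093:9093 &

# Fire a synthetic warning alert; it should be routed to the slack-warnings receiver
curl -s -X POST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels":{"alertname":"RoutingTest","severity":"warning"},"annotations":{"summary":"AlertManager routing test"}}]'

# Confirm AlertManager accepted it
curl -s http://localhost:9093/api/v2/alerts | jq '.[].labels'
```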
Part 6: Threshold Tuning Strategies
Understanding the HPA Formula
desiredReplicas = ceil[currentReplicas * (currentMetricValue / targetMetricValue)]
Current Utilization = (Sum of Pod Resource Usage) / (Sum of Pod Resource Requests)
The HPA measures current utilization with the second formula, compares it against the target you configure, and plugs both into the first formula to pick the new replica count.
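To make the arithmetic concrete, here is a worked example with assumed numbers (3 replicas running at 80% average CPU against a 50% target):

```bash
# desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
# 3 replicas at 80% average CPU, target 50%:
echo '3 * 80 / 50' | bc -l    # -> 4.80
# ceil(4.8) = 5, so the HPA scales from 3 to 5 replicas
```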
Step 1: Baseline Measurement
1# Run application under normal load for 1 hour
2kubectl run baseline-load --image=busybox:1.36 --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://php-apache; sleep 0.1; done"
3
4# Collect metrics
5kubectl top pods -l app=php-apache --watch > baseline-metrics.txt
6
7# Query Prometheus for average CPU usage
8# PromQL: avg(rate(container_cpu_usage_seconds_total{pod=~"php-apache.*"}[1h]))
9
10# Example result: 150m (0.15 cores)
Step 2: Load Testing
1# Install hey (HTTP load generator)
2# macOS: brew install hey
3# Linux: wget https://hey-release.s3.us-east-2.amazonaws.com/hey_linux_amd64
4
5# Test different load levels
6# Light load: 10 req/s
7hey -z 5m -q 10 http://$(kubectl get svc php-apache -o jsonpath='{.spec.clusterIP}')
8
9# Medium load: 50 req/s
10hey -z 5m -q 50 http://$(kubectl get svc php-apache -o jsonpath='{.spec.clusterIP}')
11
12# Heavy load: 200 req/s
13hey -z 5m -q 200 http://$(kubectl get svc php-apache -o jsonpath='{.spec.clusterIP}')
14
15# Record CPU usage at each level
16kubectl top pods -l app=php-apache
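Watching `kubectl top` by hand during a five-minute run gets tedious; a small loop can capture replica counts and CPU alongside each load level instead. A rough sketch, assuming `hey` is on your PATH (note that `-q` in hey is a per-worker rate, so the total request rate also depends on `-c`):

```bash
SVC_IP=$(kubectl get svc php-apache -o jsonpath='{.spec.clusterIP}')

for QPS in 10 50 200; do
  echo "=== target rate: ${QPS} req/s per worker ==="
  hey -z 5m -q "$QPS" "http://${SVC_IP}" > "hey-${QPS}.txt" &
  HEY_PID=$!
  # Sample HPA state and pod usage every 30s while the load runs
  while kill -0 "$HEY_PID" 2>/dev/null; do
    date +%T
    kubectl get hpa php-apache-hpa --no-headers
    kubectl top pods -l app=php-apache --no-headers
    sleep 30
  done
done
```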
Step 3: Calculate Optimal Thresholds
Example Data:
| Load Level | Requests/sec | Avg CPU per Pod | Replicas | CPU % of Request (200m) |
|---|---|---|---|---|
| Baseline | 10 | 50m | 1 | 25% |
| Light | 50 | 120m | 1 | 60% |
| Medium | 100 | 180m | 2 | 90% |
| Heavy | 200 | 160m | 3 | 80% |
Analysis:
# Current target: 50% CPU utilization (100m of 200m request)
# At 50 req/s:
# - CPU usage: 120m (60%)
# - HPA triggers scale-up to 2 pods
# - New CPU per pod: 60m (30%)
# - System stable
# Conclusion: 50% target is appropriate
# If we used 70% target:
# - At 50 req/s, CPU would be 120m (60% < 70%)
# - No scale-up
# - At 100 req/s, CPU hits 180m (90%)
# - Late scale-up, potential latency spike
Step 4: Recommended Thresholds by Application Type
CPU-Bound Applications
1# Conservative (prioritize availability)
2metrics:
3- type: Resource
4 resource:
5 name: cpu
6 target:
7 type: Utilization
8 averageUtilization: 50 # Scale at 50%
9
10# Balanced (cost + performance)
11metrics:
12- type: Resource
13 resource:
14 name: cpu
15 target:
16 type: Utilization
17 averageUtilization: 70 # Scale at 70%
18
19# Aggressive (cost-optimized)
20metrics:
21- type: Resource
22 resource:
23 name: cpu
24 target:
25 type: Utilization
26 averageUtilization: 80 # Scale at 80%
Memory-Bound Applications
1metrics:
2- type: Resource
3 resource:
4 name: memory
5 target:
6 type: Utilization
7 averageUtilization: 75 # Memory typically more stable
8
9# Note: Memory scaling is tricky because:
10# 1. Memory doesn't "free up" like CPU
11# 2. Pods must be restarted to reduce memory
12# 3. Consider VPA for memory optimization
Latency-Sensitive Applications
1# Use custom metrics for response time
2metrics:
3- type: Pods
4 pods:
5 metric:
6 name: http_request_duration_p99_seconds
7 target:
8 type: AverageValue
9 averageValue: "0.2" # 200ms P99 latency
10
11# Or request rate
12- type: Pods
13 pods:
14 metric:
15 name: http_requests_per_second
16 target:
17 type: AverageValue
18 averageValue: "100" # 100 req/s per pod
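Keep in mind that `Pods`-type metrics like these only work if something (typically prometheus-adapter) is serving the custom metrics API; the metric names above are illustrative and must match whatever your adapter exposes. Before wiring them into an HPA, confirm the API is actually there:

```bash
# Is a custom metrics API registered at all?
kubectl get apiservice v1beta1.custom.metrics.k8s.io

# List the metrics the adapter exposes and look for the ones referenced above
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 \
  | jq -r '.resources[].name' | grep -E 'http_request' || echo "metric not exposed yet"
```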
Step 5: Tuning Scaling Behavior
Fast-Scaling Workloads (E-commerce, APIs)
1behavior:
2 scaleUp:
3 stabilizationWindowSeconds: 0 # Immediate
4 policies:
5 - type: Percent
6 value: 100 # Double capacity
7 periodSeconds: 15 # Every 15s
8 selectPolicy: Max
9
10 scaleDown:
11 stabilizationWindowSeconds: 300 # 5 minutes
12 policies:
13 - type: Percent
14 value: 25 # Max 25% reduction
15 periodSeconds: 60
16 selectPolicy: Min
Batch Processing Workloads
1behavior:
2 scaleUp:
3 stabilizationWindowSeconds: 60 # Wait 1 min
4 policies:
5 - type: Pods
6 value: 2 # Add 2 pods at a time
7 periodSeconds: 60
8 selectPolicy: Max
9
10 scaleDown:
11 stabilizationWindowSeconds: 600 # 10 minutes
12 policies:
13 - type: Pods
14 value: 1 # Remove 1 pod at a time
15 periodSeconds: 120 # Every 2 minutes
16 selectPolicy: Min
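Whichever behavior profile you choose, watch how it plays out during a load test; every scaling decision is recorded as an event on the HPA object:

```bash
# Follow replica changes live
kubectl get hpa php-apache-hpa -w

# Scaling decisions and their reasons (SuccessfulRescale events, limit messages, etc.)
kubectl describe hpa php-apache-hpa | sed -n '/Events:/,$p'
kubectl get events --field-selector involvedObject.name=php-apache-hpa --sort-by=.lastTimestamp
```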
Step 6: Continuous Optimization
Track these PromQL queries over time (for example, on a dedicated Grafana dashboard) to keep tuning thresholds:
1# 1. Average CPU utilization over time
2avg(rate(container_cpu_usage_seconds_total{pod=~"php-apache.*"}[5m]))
3 /
4avg(kube_pod_container_resource_requests{pod=~"php-apache.*",resource="cpu"})
5
6# 2. HPA scaling frequency
7changes(kube_horizontalpodautoscaler_status_current_replicas[1h])
8
9 # 3. Average replica count over the last day
10avg_over_time(kube_horizontalpodautoscaler_status_current_replicas[24h])
11
12# 4. Cost per request (estimate)
13(
14 sum(kube_pod_container_resource_requests{pod=~"php-apache.*",resource="cpu"}) * 0.04
15)
16/
17sum(rate(http_requests_total{pod=~"php-apache.*"}[5m]))
Part 7: Testing the Complete Setup
Scenario 1: Normal Traffic Pattern
1# Generate steady load
2kubectl run load-test-normal --image=busybox:1.36 --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://php-apache; sleep 0.05; done"
3
4# Monitor in Grafana
5# - HPA dashboard shows gradual scale-up
6# - CPU stays around target (50%)
7# - No alerts triggered
8# - Replicas: 1 → 2 → 3 (stabilizes)
9
10# Clean up
11kubectl delete pod load-test-normal
Scenario 2: Traffic Spike
1# Generate sudden spike
2for i in {1..10}; do
3 kubectl run load-spike-$i --image=busybox:1.36 --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://php-apache; done" &
4done
5
6# Expected behavior:
7# T+0s: Spike detected
8# T+30s: HPA scales to 5 replicas
9# T+60s: HPA scales to 8 replicas
10# T+90s: Stable at 8 replicas
11
12# Alerts triggered:
13# - HPAMaxedOut (if hits 10 replicas)
14# - HighCPUUsageBeforeScaling
15
16# Stop spike
17for i in {1..10}; do kubectl delete pod load-spike-$i; done
Scenario 3: Cluster Capacity Test
1# Scale up beyond cluster capacity
2kubectl scale deployment php-apache --replicas=20
3
4# Expected:
5# - Pods go to Pending state
6# - PodsPendingScheduling alert fires
7# - ClusterCPUPressure alert fires (if no Cluster Autoscaler)
8# - Cluster Autoscaler adds nodes (if enabled)
9
10# Check pending pods
11kubectl get pods | grep Pending
12
13# Scale back down
14kubectl scale deployment php-apache --replicas=2
Part 8: Cleanup
Remove Monitoring Stack
1# Delete PrometheusRules
2kubectl delete prometheusrule -n monitoring hpa-alerts
3
4# Delete ServiceMonitor
5kubectl delete servicemonitor -n default php-apache-monitor
6
7# Uninstall Prometheus stack
8helm uninstall prometheus -n monitoring
9
10# Or via CDK (remove from stack and redeploy)
11# Comment out Prometheus Helm chart in CDK code
12cdk deploy
13
14# Delete monitoring namespace
15kubectl delete namespace monitoring
Key Takeaways
Monitoring Checklist
✅ Metrics Collection
- Metrics Server installed and healthy
- Node Exporter running on all nodes
- kube-state-metrics deployed
- Application metrics exposed (if using custom metrics)
✅ Storage & Retention
- Prometheus storage configured (50GB recommended)
- Retention period set (30 days minimum)
- Grafana dashboards backed up
✅ Alerting
- PrometheusRules deployed and active
- AlertManager configured with notification channels
- Alert routing rules tested
- Runbooks documented
✅ Dashboards
- HPA overview dashboard imported
- Resource utilization dashboard configured
- Cluster health dashboard available
- Application-specific dashboards created
✅ Threshold Tuning
- Baseline metrics collected
- Load testing performed
- Thresholds calculated and documented
- Scaling behavior tuned for workload type
Recommended Alert Thresholds
| Alert | Threshold | Rationale |
|---|---|---|
| HPAMaxedOut | 95% of maxReplicas for 5 min | Early warning before hitting limit |
| HighCPUUsage | >90% of requests for 3 min | Indicates scaling may be delayed |
| HighMemoryUsage | >90% of limits for 5 min | Prevent OOMKills |
| PodsPending | Any pods pending for 10 min | Capacity issue |
| ClusterCPUPressure | <2 cores available | Proactive capacity planning |
| HPAScalingThrashing | >0.5 changes/min for 30 min | Configuration issue |
Cost Optimization via Monitoring
1# Query to identify over-provisioned resources
2# (Requested but not used)
3
4# CPU waste
5sum(kube_pod_container_resource_requests{resource="cpu"})
6-
7sum(rate(container_cpu_usage_seconds_total[1d]))
8
9# Memory waste
10sum(kube_pod_container_resource_requests{resource="memory"})
11-
12sum(container_memory_working_set_bytes)
13
14# Right-sizing recommendation:
15# Set requests to P95 usage + 20% buffer
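The "P95 + 20%" rule of thumb can be pulled straight out of Prometheus with a subquery; a sketch (the 7-day window and 5-minute step are arbitrary choices, and the Prometheus port-forward is assumed):

```bash
# Suggested CPU request per pod: 95th percentile of usage over 7 days, plus 20% headroom
curl -s http://localhost:9090/api/v1/query --data-urlencode \
  'query=quantile_over_time(0.95, sum by (pod) (rate(container_cpu_usage_seconds_total{pod=~"php-apache.*"}[5m]))[7d:5m]) * 1.2' \
  | jq -r '.data.result[] | "\(.metric.pod): \(.value[1]) cores"'
```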
Related Topics
Autoscaling Series
- Part 1: Horizontal Pod Autoscaler - Theory and approaches
- Part 2: Cluster Autoscaling - Node-level autoscaling
- Part 3: Hands-On HPA Demo - Implementation guide
Kubernetes Monitoring
- Kubernetes Complete Guide (Part 3): Advanced Features - Includes monitoring setup (Traditional Chinese)
- Building Production Kubernetes Platform on AWS EKS - Production observability patterns
Conclusion
This guide established production-grade monitoring for Kubernetes autoscaling:
- Metrics Collection: Deployed complete Prometheus stack with exporters
- Visualization: Created Grafana dashboards for real-time visibility
- Alerting: Configured AlertManager with multi-channel notifications
- Threshold Tuning: Established data-driven approach to optimization
- Testing: Validated monitoring under various load scenarios
Implementation Checklist
Week 1: Foundation
- Deploy Prometheus stack
- Configure basic dashboards
- Verify metrics collection
Week 2: Alerting
- Create PrometheusRules
- Configure notification channels
- Test alert routing
Week 3: Optimization
- Collect baseline metrics
- Perform load testing
- Tune HPA thresholds
Week 4: Production
- Document runbooks
- Train team on dashboards
- Establish review cadence
Next Steps
- Integrate with CI/CD: Automatic threshold updates based on load tests
- Add Custom Metrics: Application-specific business metrics
- Implement SLOs: Service Level Objectives with error budgets
- Cost Optimization: Continuous right-sizing based on actual usage
- ML-Based Autoscaling: Predictive scaling using historical patterns
With comprehensive monitoring in place, you can confidently operate Kubernetes autoscaling in production, quickly identify issues, and continuously optimize for performance and cost.
Happy monitoring! 📊