Kubernetes Autoscaling Complete Guide (Part 4): Monitoring, Alerting & Threshold Tuning

Series Overview

This is Part 4 of the Kubernetes Autoscaling Complete Guide series.

Building on the HPA demo from Part 3, this guide implements a complete monitoring and alerting stack for your EKS cluster. We’ll deploy Prometheus for metrics collection, Grafana for visualization, AlertManager for notifications, and establish best practices for threshold tuning.

What We’ll Build

┌──────────────────────────────────────────────────────────────────────┐
│                  MONITORING ARCHITECTURE                            │
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │                    DATA COLLECTION LAYER                       │ │
│  │                                                                 │ │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐      │ │
│  │  │  Metrics │  │   Node   │  │   kube   │  │   HPA    │      │ │
│  │  │  Server  │  │ Exporter │  │  state   │  │  metrics │      │ │
│  │  │          │  │          │  │  metrics │  │          │      │ │
│  │  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘      │ │
│  │       │             │             │             │             │ │
│  │       └─────────────┴─────────────┴─────────────┘             │ │
│  │                            ↓                                   │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                               ↓                                      │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │                  PROMETHEUS (Storage & Queries)                │ │
│  │                                                                 │ │
│  │  • Time-series database                                        │ │
│  │  • PromQL query engine                                         │ │
│  │  • Service discovery                                            │ │
│  │  • Recording rules                                              │ │
│  └────────────┬───────────────────────┬──────────────────────────┘ │
│               │                       │                             │
│               ↓                       ↓                             │
│  ┌────────────────────────┐  ┌──────────────────────────────────┐ │
│  │    ALERTMANAGER        │  │          GRAFANA                 │ │
│  │                        │  │                                   │ │
│  │  • Alert routing       │  │  • Dashboards                    │ │
│  │  • Grouping            │  │  • Data sources                  │ │
│  │  • Deduplication       │  │  • Annotations                   │ │
│  │  • Silencing           │  │  • Variables                     │ │
│  └────┬───────────────────┘  └──────────────────────────────────┘ │
│       │                                                             │
│       ↓                                                             │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │               NOTIFICATION CHANNELS                            │ │
│  │                                                                 │ │
│  │  Email  │  Slack  │  PagerDuty  │  OpsGenie  │  Webhook       │ │
│  └────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘

Prerequisites

Starting from the Part 3 setup, ensure you have:

 1# EKS cluster from Part 3 running
 2kubectl get nodes
 3
 4# Helm installed
 5helm version
 6
 7# kubectl configured
 8kubectl config current-context
 9
10# Part 3 application deployed
11kubectl get deployment php-apache
12kubectl get hpa php-apache-hpa

Part 1: Prometheus Stack Setup

The kube-prometheus-stack Helm chart bundles Prometheus, Grafana, AlertManager, and the standard exporters (node-exporter and kube-state-metrics) in a single installable package.

Step 1: Update CDK Stack

Add Prometheus stack to lib/eks-hpa-demo-stack.ts:

  1import * as cdk from 'aws-cdk-lib';
  2import * as eks from 'aws-cdk-lib/aws-eks';
  3import * as ec2 from 'aws-cdk-lib/aws-ec2';
  4import * as iam from 'aws-cdk-lib/aws-iam';
  5import { Construct } from 'constructs';
  6
  7export class EksHpaDemoStack extends cdk.Stack {
  8  public readonly cluster: eks.Cluster;
  9
 10  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
 11    super(scope, id, props);
 12
 13    // ... existing VPC and cluster code from Part 3 ...
 14
 15    // Create namespace for monitoring
 16    const monitoringNamespace = this.cluster.addManifest('monitoring-namespace', {
 17      apiVersion: 'v1',
 18      kind: 'Namespace',
 19      metadata: {
 20        name: 'monitoring',
 21        labels: {
 22          name: 'monitoring',
 23        },
 24      },
 25    });
 26
 27    // Install kube-prometheus-stack using Helm
 28    const prometheusStack = this.cluster.addHelmChart('PrometheusStack', {
 29      chart: 'kube-prometheus-stack',
 30      repository: 'https://prometheus-community.github.io/helm-charts',
 31      namespace: 'monitoring',
 32      release: 'prometheus',
 33      version: '54.2.2', // Check for latest version
 34      wait: true,
 35      timeout: cdk.Duration.minutes(15),
 36
 37      values: {
 38        // Prometheus configuration
 39        prometheus: {
 40          prometheusSpec: {
 41            // Retention period
 42            retention: '30d',
 43            retentionSize: '50GB',
 44
 45            // Storage
 46            storageSpec: {
 47              volumeClaimTemplate: {
 48                spec: {
 49                  accessModes: ['ReadWriteOnce'],
 50                  resources: {
 51                    requests: {
 52                      storage: '50Gi',
 53                    },
 54                  },
 55                  storageClassName: 'gp3', // AWS EBS gp3
 56                },
 57              },
 58            },
 59
 60            // Resource limits
 61            resources: {
 62              requests: {
 63                cpu: '500m',
 64                memory: '2Gi',
 65              },
 66              limits: {
 67                cpu: '2000m',
 68                memory: '4Gi',
 69              },
 70            },
 71
 72            // Service monitors to scrape
 73            serviceMonitorSelectorNilUsesHelmValues: false,
 74            podMonitorSelectorNilUsesHelmValues: false,
 75
 76            // Additional scrape configs
 77            additionalScrapeConfigs: [
 78              {
 79                job_name: 'kubernetes-pods',
 80                kubernetes_sd_configs: [
 81                  {
 82                    role: 'pod',
 83                  },
 84                ],
 85                relabel_configs: [
 86                  {
 87                    source_labels: ['__meta_kubernetes_pod_annotation_prometheus_io_scrape'],
 88                    action: 'keep',
 89                    regex: 'true',
 90                  },
 91                  {
 92                    source_labels: ['__meta_kubernetes_pod_annotation_prometheus_io_path'],
 93                    action: 'replace',
 94                    target_label: '__metrics_path__',
 95                    regex: '(.+)',
 96                  },
 97                  {
 98                    source_labels: ['__address__', '__meta_kubernetes_pod_annotation_prometheus_io_port'],
 99                    action: 'replace',
100                    regex: '([^:]+)(?::\\d+)?;(\\d+)',
101                    replacement: '$1:$2',
102                    target_label: '__address__',
103                  },
104                ],
105              },
106            ],
107          },
108
109          // Service configuration
110          service: {
111            type: 'LoadBalancer', // Or ClusterIP with ingress
112            annotations: {
113              'service.beta.kubernetes.io/aws-load-balancer-type': 'nlb',
114              'service.beta.kubernetes.io/aws-load-balancer-internal': 'true',
115            },
116          },
117        },
118
119        // Grafana configuration
120        grafana: {
121          enabled: true,
122          adminPassword: 'admin123', // Change in production!
123
124          persistence: {
125            enabled: true,
126            storageClassName: 'gp3',
127            size: '10Gi',
128          },
129
130          resources: {
131            requests: {
132              cpu: '250m',
133              memory: '512Mi',
134            },
135            limits: {
136              cpu: '500m',
137              memory: '1Gi',
138            },
139          },
140
141          service: {
142            type: 'LoadBalancer',
143            annotations: {
144              'service.beta.kubernetes.io/aws-load-balancer-type': 'nlb',
145              'service.beta.kubernetes.io/aws-load-balancer-internal': 'true',
146            },
147          },
148
149          // Pre-configured data sources
150          datasources: {
151            'datasources.yaml': {
152              apiVersion: 1,
153              datasources: [
154                {
155                  name: 'Prometheus',
156                  type: 'prometheus',
157                  url: 'http://prometheus-kube-prometheus-prometheus.monitoring:9090',
158                  access: 'proxy',
159                  isDefault: true,
160                },
161              ],
162            },
163          },
164
165          // Default dashboards
166          defaultDashboardsEnabled: true,
167          defaultDashboardsTimezone: 'UTC',
168
169          // Additional dashboard providers
170          dashboardProviders: {
171            'dashboardproviders.yaml': {
172              apiVersion: 1,
173              providers: [
174                {
175                  name: 'default',
176                  orgId: 1,
177                  folder: '',
178                  type: 'file',
179                  disableDeletion: false,
180                  editable: true,
181                  options: {
182                    path: '/var/lib/grafana/dashboards/default',
183                  },
184                },
185              ],
186            },
187          },
188        },
189
190        // AlertManager configuration
191        alertmanager: {
192          enabled: true,
193
194          alertmanagerSpec: {
195            storage: {
196              volumeClaimTemplate: {
197                spec: {
198                  accessModes: ['ReadWriteOnce'],
199                  resources: {
200                    requests: {
201                      storage: '10Gi',
202                    },
203                  },
204                  storageClassName: 'gp3',
205                },
206              },
207            },
208
209            resources: {
210              requests: {
211                cpu: '100m',
212                memory: '256Mi',
213              },
214              limits: {
215                cpu: '200m',
216                memory: '512Mi',
217              },
218            },
219          },
220
221          config: {
222            global: {
223              resolve_timeout: '5m',
224            },
225            route: {
226              group_by: ['alertname', 'cluster', 'service'],
227              group_wait: '10s',
228              group_interval: '10s',
229              repeat_interval: '12h',
230              receiver: 'default',
231              routes: [
232                {
233                  match: {
234                    alertname: 'Watchdog',
235                  },
236                  receiver: 'null',
237                },
238                {
239                  match: {
240                    severity: 'critical',
241                  },
242                  receiver: 'critical',
243                  continue: true,
244                },
245                {
246                  match: {
247                    severity: 'warning',
248                  },
249                  receiver: 'warning',
250                },
251              ],
252            },
253            receivers: [
254              {
255                name: 'null',
256              },
257              {
258                name: 'default',
259                // Configure in next section
260              },
261              {
262                name: 'critical',
263                // Configure in next section
264              },
265              {
266                name: 'warning',
267                // Configure in next section
268              },
269            ],
270          },
271        },
272
273        // Node exporter (collects node metrics)
274        nodeExporter: {
275          enabled: true,
276        },
277
278        // Kube-state-metrics (K8s object metrics)
279        kubeStateMetrics: {
280          enabled: true,
281        },
282
283        // Prometheus operator
284        prometheusOperator: {
285          resources: {
286            requests: {
287              cpu: '200m',
288              memory: '256Mi',
289            },
290            limits: {
291              cpu: '500m',
292              memory: '512Mi',
293            },
294          },
295        },
296      },
297    });
298
299    prometheusStack.node.addDependency(monitoringNamespace);
300
301    // Output monitoring URLs
302    new cdk.CfnOutput(this, 'PrometheusURL', {
303      value: 'http://prometheus-kube-prometheus-prometheus.monitoring:9090',
304      description: 'Prometheus internal URL',
305    });
306
307    new cdk.CfnOutput(this, 'GrafanaURL', {
308      value: 'Access via: kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80',
309      description: 'Grafana port-forward command',
310    });
311
312    new cdk.CfnOutput(this, 'AlertManagerURL', {
313      value: 'http://prometheus-kube-prometheus-alertmanager.monitoring:9093',
314      description: 'AlertManager internal URL',
315    });
316  }
317}

Step 2: Deploy Updated Stack

 1cd cdk
 2
 3# Deploy monitoring stack
 4cdk deploy
 5
 6# Wait for Helm chart installation (takes 5-10 minutes)
 7
 8# Verify installation
 9kubectl get pods -n monitoring
10
11# Expected output:
12# NAME                                                     READY   STATUS    RESTARTS   AGE
13# alertmanager-prometheus-kube-prom-alertmanager-0         2/2     Running   0          5m
14# prometheus-grafana-xxx                                   3/3     Running   0          5m
15# prometheus-kube-prom-operator-xxx                        1/1     Running   0          5m
16# prometheus-kube-state-metrics-xxx                        1/1     Running   0          5m
17# prometheus-prometheus-node-exporter-xxx                  1/1     Running   0          5m
18# prometheus-prometheus-kube-prom-prometheus-0             2/2     Running   0          5m
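
The ServiceMonitor and PrometheusRule objects created later in this guide depend on the operator's CRDs being installed. A quick sanity check after the deploy (the CRD names below are the standard ones shipped with kube-prometheus-stack):

# Confirm the Prometheus Operator CRDs are present
kubectl get crd | grep monitoring.coreos.com

# The list should include, among others:
# prometheusrules.monitoring.coreos.com
# servicemonitors.monitoring.coreos.com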

Alternative: Manual Helm Installation

If you prefer installing the chart manually with Helm instead of through the CDK stack above:

 1# Add Prometheus community Helm repository
 2helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
 3helm repo update
 4
 5# Create monitoring namespace
 6kubectl create namespace monitoring
 7
 8# Install kube-prometheus-stack
 9helm install prometheus prometheus-community/kube-prometheus-stack \
10  --namespace monitoring \
11  --set prometheus.prometheusSpec.retention=30d \
12  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
13  --set grafana.adminPassword=admin123 \
14  --set grafana.persistence.enabled=true \
15  --set grafana.persistence.size=10Gi \
16  --wait
17
18# Verify installation
19kubectl get pods -n monitoring
20kubectl get svc -n monitoring

Part 2: Accessing Monitoring Tools

Access Grafana

 1# Method 1: Port forwarding (recommended for testing)
 2kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
 3
 4# Access at: http://localhost:3000
 5# Username: admin
 6# Password: admin123 (or what you set in values)
 7
 8# Method 2: LoadBalancer (if configured)
 9kubectl get svc -n monitoring prometheus-grafana
10
11# Get external IP/DNS
12export GRAFANA_URL=$(kubectl get svc -n monitoring prometheus-grafana -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
13echo "Grafana: http://$GRAFANA_URL"

Access Prometheus

1# Port forward Prometheus
2kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090
3
4# Access at: http://localhost:9090
5
6# Query examples:
7# - up{job="node-exporter"}
8# - kube_pod_container_resource_requests{resource="cpu"}
9# - rate(container_cpu_usage_seconds_total[5m])
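
The same data is available from the CLI through the HTTP API. A small sketch using curl and jq (both assumed to be installed locally, with the port-forward above still running):

# Current HPA replica count straight from the Prometheus API
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=kube_horizontalpodautoscaler_status_current_replicas{namespace="default"}' \
  | jq '.data.result[] | {hpa: .metric.horizontalpodautoscaler, replicas: .value[1]}'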

Access AlertManager

1# Port forward AlertManager
2kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-alertmanager 9093:9093
3
4# Access at: http://localhost:9093

Part 3: HPA-Specific Monitoring

Create ServiceMonitor for PHP-Apache

Create k8s/servicemonitor.yaml:

 1apiVersion: monitoring.coreos.com/v1
 2kind: ServiceMonitor
 3metadata:
 4  name: php-apache-monitor
 5  namespace: default
 6  labels:
 7    app: php-apache
 8    release: prometheus  # Must match Prometheus release name
 9spec:
10  selector:
11    matchLabels:
12      app: php-apache
13  endpoints:
14  - port: http
15    interval: 15s
16    path: /metrics  # If your app exposes metrics
17    # Or use a sidecar exporter
18
19  namespaceSelector:
20    matchNames:
21    - default
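
A ServiceMonitor selects Services by label and scrapes the endpoint by port name, so this only works if the php-apache Service actually carries the app: php-apache label and a port named http. A minimal sketch of the Service shape it expects (the port numbers and pod selector are assumptions — align them with your Part 3 manifests):

apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    app: php-apache        # matched by spec.selector.matchLabels above
spec:
  selector:
    run: php-apache        # adjust to your Part 3 pod labels
  ports:
  - name: http             # referenced by the ServiceMonitor endpoint
    port: 80
    targetPort: 80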

Create PrometheusRule for HPA Alerts

Create k8s/prometheus-rules.yaml:

  1apiVersion: monitoring.coreos.com/v1
  2kind: PrometheusRule
  3metadata:
  4  name: hpa-alerts
  5  namespace: monitoring
  6  labels:
  7    release: prometheus  # Must match Prometheus release name
  8spec:
  9  groups:
 10  - name: hpa-autoscaling
 11    interval: 30s
 12    rules:
 13    # Alert when HPA is at maximum replicas
 14    - alert: HPAMaxedOut
 15      expr: |
 16        (
 17          kube_horizontalpodautoscaler_status_current_replicas{namespace="default"}
 18          /
 19          kube_horizontalpodautoscaler_spec_max_replicas{namespace="default"}
 20        ) >= 1        
 21      for: 5m
 22      labels:
 23        severity: warning
 24        component: hpa
 25      annotations:
 26        summary: "HPA {{ $labels.horizontalpodautoscaler }} at maximum capacity"
 27        description: "HPA {{ $labels.horizontalpodautoscaler }} in namespace {{ $labels.namespace }} has been at its maximum replica count for more than 5 minutes. Consider increasing maxReplicas or adding more nodes."
 28        dashboard_url: "http://grafana/d/hpa-dashboard"
 29
 30    # Alert when HPA cannot scale
 31    - alert: HPAUnableToScale
 32      expr: |
 33        kube_horizontalpodautoscaler_status_condition{condition="ScalingLimited",status="true"} == 1        
 34      for: 10m
 35      labels:
 36        severity: warning
 37        component: hpa
 38      annotations:
 39        summary: "HPA {{ $labels.horizontalpodautoscaler }} unable to scale"
 40        description: "HPA {{ $labels.horizontalpodautoscaler }} has been unable to scale for 10 minutes. Check for resource constraints or scaling limits."
 41
 42    # Alert when HPA cannot fetch metrics
 43    - alert: HPAMetricsUnavailable
 44      expr: |
 45        kube_horizontalpodautoscaler_status_condition{condition="ScalingActive",status="false"} == 1        
 46      for: 5m
 47      labels:
 48        severity: critical
 49        component: hpa
 50      annotations:
 51        summary: "HPA {{ $labels.horizontalpodautoscaler }} metrics unavailable"
 52        description: "HPA {{ $labels.horizontalpodautoscaler }} cannot fetch metrics. Check Metrics Server status."
 53        runbook_url: "https://docs/troubleshooting/hpa-metrics"
 54
 55    # Alert on rapid scaling activity (thrashing)
 56    - alert: HPAScalingThrashing
 57      expr: |
 58        changes(kube_horizontalpodautoscaler_status_current_replicas[15m]) > 8        
 59      for: 30m
 60      labels:
 61        severity: warning
 62        component: hpa
 63      annotations:
 64        summary: "HPA {{ $labels.horizontalpodautoscaler }} scaling too frequently"
 65        description: "HPA {{ $labels.horizontalpodautoscaler }} replica count has changed {{ $value }} times in the last 15 minutes, indicating possible threshold misconfiguration or unstable load."
 66
 67    # Alert when CPU usage is consistently high
 68    - alert: HighCPUUsageBeforeScaling
 69      expr: |
 70        (
 71          sum(rate(container_cpu_usage_seconds_total{namespace="default",pod=~"php-apache.*"}[5m])) by (pod)
 72          /
 73          sum(kube_pod_container_resource_requests{namespace="default",pod=~"php-apache.*",resource="cpu"}) by (pod)
 74        ) > 0.9        
 75      for: 3m
 76      labels:
 77        severity: warning
 78        component: application
 79      annotations:
 80        summary: "Pod {{ $labels.pod }} CPU usage very high"
 81        description: "CPU usage is at {{ $value | humanizePercentage }} of requested resources. Scaling may be delayed."
 82
 83    # Alert when memory usage is high
 84    - alert: HighMemoryUsage
 85      expr: |
 86        (
 87          sum(container_memory_working_set_bytes{namespace="default",pod=~"php-apache.*"}) by (pod)
 88          /
 89          sum(kube_pod_container_resource_limits{namespace="default",pod=~"php-apache.*",resource="memory"}) by (pod)
 90        ) > 0.9        
 91      for: 5m
 92      labels:
 93        severity: warning
 94        component: application
 95      annotations:
 96        summary: "Pod {{ $labels.pod }} memory usage critical"
 97        description: "Memory usage is at {{ $value | humanizePercentage }} of limits. Pod may be OOMKilled."
 98
 99    # Alert when pods are pending (cannot be scheduled)
100    - alert: PodsPendingScheduling
101      expr: |
102        sum(kube_pod_status_phase{namespace="default",pod=~"php-apache.*",phase="Pending"}) > 0        
103      for: 10m
104      labels:
105        severity: critical
106        component: scheduler
107      annotations:
108        summary: "{{ $value }} php-apache pods pending scheduling"
109        description: "Pods cannot be scheduled. Check node resources and Cluster Autoscaler status."
110
111  - name: cluster-resources
112    interval: 30s
113    rules:
114    # Alert when cluster CPU is near capacity
115    - alert: ClusterCPUPressure
116      expr: |
117        (
118          sum(kube_node_status_allocatable{resource="cpu"})
119          -
120          sum(kube_pod_container_resource_requests{resource="cpu"})
121        ) < 2        
122      for: 5m
123      labels:
124        severity: warning
125        component: cluster
126      annotations:
127        summary: "Cluster CPU capacity low"
128        description: "Only {{ $value }} CPU cores available cluster-wide. Consider adding nodes or enabling Cluster Autoscaler."
129
130    # Alert when cluster memory is near capacity
131    - alert: ClusterMemoryPressure
132      expr: |
133        (
134          sum(kube_node_status_allocatable{resource="memory"})
135          -
136          sum(kube_pod_container_resource_requests{resource="memory"})
137        ) / (1024 * 1024 * 1024) < 4        
138      for: 5m
139      labels:
140        severity: warning
141        component: cluster
142      annotations:
143        summary: "Cluster memory capacity low"
144        description: "Only {{ $value }}GB memory available cluster-wide."
145
146    # Alert on node not ready
147    - alert: NodeNotReady
148      expr: |
149        kube_node_status_condition{condition="Ready",status="true"} == 0        
150      for: 5m
151      labels:
152        severity: critical
153        component: node
154      annotations:
155        summary: "Node {{ $labels.node }} not ready"
156        description: "Node has been not ready for 5 minutes."

Apply the monitoring configurations:

1kubectl apply -f k8s/servicemonitor.yaml
2kubectl apply -f k8s/prometheus-rules.yaml
3
4# Verify PrometheusRule is loaded
5kubectl get prometheusrule -n monitoring
6
7# Check if rules are active in Prometheus
8# Port forward and visit: http://localhost:9090/rules
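
The rules API gives the same confirmation without the UI. A quick check, assuming the Prometheus port-forward from Part 2 is running and jq is installed:

# List the alert names Prometheus loaded from the hpa-autoscaling group
curl -s http://localhost:9090/api/v1/rules \
  | jq '.data.groups[] | select(.name == "hpa-autoscaling") | .rules[].name'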

Part 4: Custom Grafana Dashboards

Dashboard 1: HPA Overview

Create grafana-dashboards/hpa-overview.json:

  1{
  2  "dashboard": {
  3    "title": "HPA Autoscaling Overview",
  4    "tags": ["kubernetes", "hpa", "autoscaling"],
  5    "timezone": "browser",
  6    "panels": [
  7      {
  8        "title": "Current vs Desired Replicas",
  9        "type": "graph",
 10        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
 11        "targets": [
 12          {
 13            "expr": "kube_horizontalpodautoscaler_status_current_replicas{namespace=\"default\",horizontalpodautoscaler=\"php-apache-hpa\"}",
 14            "legendFormat": "Current Replicas",
 15            "refId": "A"
 16          },
 17          {
 18            "expr": "kube_horizontalpodautoscaler_status_desired_replicas{namespace=\"default\",horizontalpodautoscaler=\"php-apache-hpa\"}",
 19            "legendFormat": "Desired Replicas",
 20            "refId": "B"
 21          },
 22          {
 23            "expr": "kube_horizontalpodautoscaler_spec_min_replicas{namespace=\"default\",horizontalpodautoscaler=\"php-apache-hpa\"}",
 24            "legendFormat": "Min Replicas",
 25            "refId": "C"
 26          },
 27          {
 28            "expr": "kube_horizontalpodautoscaler_spec_max_replicas{namespace=\"default\",horizontalpodautoscaler=\"php-apache-hpa\"}",
 29            "legendFormat": "Max Replicas",
 30            "refId": "D"
 31          }
 32        ],
 33        "fieldConfig": {
 34          "defaults": {
 35            "unit": "short",
 36            "min": 0
 37          }
 38        }
 39      },
 40      {
 41        "title": "CPU Utilization vs Target",
 42        "type": "graph",
 43        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
 44        "targets": [
 45          {
 46            "expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"default\",pod=~\"php-apache.*\"}[5m])) by (pod) / sum(kube_pod_container_resource_requests{namespace=\"default\",pod=~\"php-apache.*\",resource=\"cpu\"}) by (pod) * 100",
 47            "legendFormat": "{{ pod }} CPU %",
 48            "refId": "A"
 49          },
 50          {
 51            "expr": "kube_horizontalpodautoscaler_spec_target_metric{namespace=\"default\",horizontalpodautoscaler=\"php-apache-hpa\"}",
 52            "legendFormat": "HPA Target (%)",
 53            "refId": "B"
 54          }
 55        ],
 56        "fieldConfig": {
 57          "defaults": {
 58            "unit": "percent",
 59            "min": 0,
 60            "max": 100
 61          }
 62        }
 63      },
 64      {
 65        "title": "Scaling Events Timeline",
 66        "type": "table",
 67        "gridPos": {"h": 8, "w": 24, "x": 0, "y": 8},
 68        "targets": [
 69          {
 70            "expr": "changes(kube_horizontalpodautoscaler_status_current_replicas{namespace=\"default\",horizontalpodautoscaler=\"php-apache-hpa\"}[1h]) > 0",
 71            "format": "table",
 72            "instant": true
 73          }
 74        ]
 75      },
 76      {
 77        "title": "Pod Count by Status",
 78        "type": "stat",
 79        "gridPos": {"h": 4, "w": 6, "x": 0, "y": 16},
 80        "targets": [
 81          {
 82            "expr": "count(kube_pod_info{namespace=\"default\",pod=~\"php-apache.*\"})",
 83            "legendFormat": "Total Pods"
 84          }
 85        ],
 86        "fieldConfig": {
 87          "defaults": {
 88            "color": {"mode": "thresholds"},
 89            "thresholds": {
 90              "mode": "absolute",
 91              "steps": [
 92                {"value": null, "color": "green"},
 93                {"value": 8, "color": "yellow"},
 94                {"value": 10, "color": "red"}
 95              ]
 96            }
 97          }
 98        }
 99      },
100      {
101        "title": "Average Response Time",
102        "type": "graph",
103        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 20},
104        "targets": [
105          {
106            "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{namespace=\"default\",pod=~\"php-apache.*\"}[5m]))",
107            "legendFormat": "P95 Latency",
108            "refId": "A"
109          },
110          {
111            "expr": "histogram_quantile(0.99, rate(http_request_duration_seconds_bucket{namespace=\"default\",pod=~\"php-apache.*\"}[5m]))",
112            "legendFormat": "P99 Latency",
113            "refId": "B"
114          }
115        ]
116      },
117      {
118        "title": "Request Rate",
119        "type": "graph",
120        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 20},
121        "targets": [
122          {
123            "expr": "sum(rate(http_requests_total{namespace=\"default\",pod=~\"php-apache.*\"}[5m]))",
124            "legendFormat": "Requests/sec"
125          }
126        ],
127        "fieldConfig": {
128          "defaults": {
129            "unit": "reqps"
130          }
131        }
132      }
133    ],
134    "refresh": "30s",
135    "time": {
136      "from": "now-1h",
137      "to": "now"
138    }
139  }
140}

Dashboard 2: Resource Utilization

Create comprehensive resource monitoring dashboard:

1# Download pre-built Kubernetes dashboards
2# Dashboard ID 15661 - Kubernetes Cluster Monitoring
3# Dashboard ID 15760 - Kubernetes Views / Global
4
5# Import via Grafana UI:
6# 1. Go to Dashboards → Import
7# 2. Enter dashboard ID
8# 3. Select Prometheus data source
9# 4. Click Import

Import Dashboards via ConfigMap

Create k8s/grafana-dashboards.yaml:

  1apiVersion: v1
  2kind: ConfigMap
  3metadata:
  4  name: hpa-dashboard
  5  namespace: monitoring
  6  labels:
  7    grafana_dashboard: "1"
  8data:
  9  hpa-dashboard.json: |
 10    {
 11      "annotations": {
 12        "list": [
 13          {
 14            "builtIn": 1,
 15            "datasource": "-- Grafana --",
 16            "enable": true,
 17            "hide": true,
 18            "iconColor": "rgba(0, 211, 255, 1)",
 19            "name": "Annotations & Alerts",
 20            "type": "dashboard"
 21          }
 22        ]
 23      },
 24      "editable": true,
 25      "gnetId": null,
 26      "graphTooltip": 0,
 27      "id": null,
 28      "links": [],
 29      "panels": [
 30        {
 31          "datasource": "Prometheus",
 32          "fieldConfig": {
 33            "defaults": {
 34              "color": {
 35                "mode": "palette-classic"
 36              },
 37              "custom": {
 38                "axisLabel": "",
 39                "axisPlacement": "auto",
 40                "barAlignment": 0,
 41                "drawStyle": "line",
 42                "fillOpacity": 10,
 43                "gradientMode": "none",
 44                "hideFrom": {
 45                  "tooltip": false,
 46                  "viz": false,
 47                  "legend": false
 48                },
 49                "lineInterpolation": "linear",
 50                "lineWidth": 1,
 51                "pointSize": 5,
 52                "scaleDistribution": {
 53                  "type": "linear"
 54                },
 55                "showPoints": "never",
 56                "spanNulls": true
 57              },
 58              "mappings": [],
 59              "thresholds": {
 60                "mode": "absolute",
 61                "steps": [
 62                  {
 63                    "color": "green",
 64                    "value": null
 65                  }
 66                ]
 67              },
 68              "unit": "short"
 69            },
 70            "overrides": []
 71          },
 72          "gridPos": {
 73            "h": 9,
 74            "w": 12,
 75            "x": 0,
 76            "y": 0
 77          },
 78          "id": 2,
 79          "options": {
 80            "legend": {
 81              "calcs": [],
 82              "displayMode": "list",
 83              "placement": "bottom"
 84            },
 85            "tooltip": {
 86              "mode": "single"
 87            }
 88          },
 89          "pluginVersion": "8.0.0",
 90          "targets": [
 91            {
 92              "expr": "kube_horizontalpodautoscaler_status_current_replicas{namespace=\"default\"}",
 93              "interval": "",
 94              "legendFormat": "{{ horizontalpodautoscaler }} - Current",
 95              "refId": "A"
 96            },
 97            {
 98              "expr": "kube_horizontalpodautoscaler_status_desired_replicas{namespace=\"default\"}",
 99              "interval": "",
100              "legendFormat": "{{ horizontalpodautoscaler }} - Desired",
101              "refId": "B"
102            }
103          ],
104          "title": "HPA Replica Count",
105          "type": "timeseries"
106        }
107      ],
108      "refresh": "30s",
109      "schemaVersion": 27,
110      "style": "dark",
111      "tags": ["kubernetes", "hpa"],
112      "templating": {
113        "list": []
114      },
115      "time": {
116        "from": "now-6h",
117        "to": "now"
118      },
119      "timepicker": {},
120      "timezone": "",
121      "title": "HPA Monitoring",
122      "uid": "hpa-monitoring",
123      "version": 1
124    }    

Apply dashboard:

1kubectl apply -f k8s/grafana-dashboards.yaml
2
3# Restart Grafana to pick up new dashboard
4kubectl rollout restart deployment -n monitoring prometheus-grafana

Part 5: AlertManager Configuration

Configure Notification Channels

Slack Integration

Update the AlertManager config either through Helm values or with a Kubernetes Secret. If you use a standalone Secret like the one below, reference it from the chart via alertmanager.alertmanagerSpec.configSecret so the operator mounts it:

 1apiVersion: v1
 2kind: Secret
 3metadata:
 4  name: alertmanager-config
 5  namespace: monitoring
 6type: Opaque
 7stringData:
 8  alertmanager.yaml: |
 9    global:
10      resolve_timeout: 5m
11      slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
12
13    route:
14      group_by: ['alertname', 'cluster', 'service']
15      group_wait: 10s
16      group_interval: 10s
17      repeat_interval: 12h
18      receiver: 'slack-notifications'
19      routes:
20      - match:
21          alertname: Watchdog
22        receiver: 'null'
23      - match:
24          severity: critical
25        receiver: 'slack-critical'
26        continue: true
27      - match:
28          severity: warning
29        receiver: 'slack-warnings'
30
31    receivers:
32    - name: 'null'
33
34    - name: 'slack-notifications'
35      slack_configs:
36      - channel: '#kubernetes-alerts'
37        title: 'Kubernetes Alert'
38        text: '{{ range .Alerts }}{{ .Annotations.summary }}\n{{ .Annotations.description }}\n{{ end }}'
39        send_resolved: true
40
41    - name: 'slack-critical'
42      slack_configs:
43      - channel: '#kubernetes-critical'
44        title: ':fire: CRITICAL Alert'
45        text: '{{ range .Alerts }}*{{ .Labels.alertname }}*\n{{ .Annotations.summary }}\n{{ .Annotations.description }}\n{{ end }}'
46        send_resolved: true
47        color: 'danger'
48
49    - name: 'slack-warnings'
50      slack_configs:
51      - channel: '#kubernetes-warnings'
52        title: ':warning: Warning Alert'
53        text: '{{ range .Alerts }}*{{ .Labels.alertname }}*\n{{ .Annotations.summary }}\n{{ end }}'
54        send_resolved: true
55        color: 'warning'
56
57    inhibit_rules:
58    - source_match:
59        severity: 'critical'
60      target_match:
61        severity: 'warning'
62      equal: ['alertname', 'cluster', 'service']    

Email Notifications

 1global:
 2  smtp_smarthost: 'smtp.gmail.com:587'
 3  smtp_from: 'alerts@example.com'
 4  smtp_auth_username: 'alerts@example.com'
 5  smtp_auth_password: 'your-app-password'
 6
 7receivers:
 8- name: 'email-notifications'
 9  email_configs:
10  - to: 'team@example.com'
11    headers:
12      Subject: '[{{ .Status }}] {{ .GroupLabels.alertname }}'
13    html: |
14      <h2>Alert: {{ .GroupLabels.alertname }}</h2>
15      {{ range .Alerts }}
16      <h3>{{ .Annotations.summary }}</h3>
17      <p>{{ .Annotations.description }}</p>
18      <p><strong>Severity:</strong> {{ .Labels.severity }}</p>
19      <p><strong>Started:</strong> {{ .StartsAt }}</p>
20      {{ end }}      

PagerDuty Integration

1receivers:
2- name: 'pagerduty'
3  pagerduty_configs:
4  - service_key: 'YOUR_PAGERDUTY_SERVICE_KEY'
5    description: '{{ .GroupLabels.alertname }}: {{ .CommonAnnotations.summary }}'
6    severity: '{{ .CommonLabels.severity }}'

Apply AlertManager Configuration

 1# Apply secret
 2kubectl apply -f k8s/alertmanager-secret.yaml
 3
 4# Or update via Helm
 5helm upgrade prometheus prometheus-community/kube-prometheus-stack \
 6  --namespace monitoring \
 7  --reuse-values \
 8  --set alertmanager.config.global.slack_api_url='https://hooks.slack.com/...' \
 9  --set alertmanager.config.route.receiver='slack-notifications'
10
11# Restart AlertManager
12kubectl rollout restart statefulset -n monitoring alertmanager-prometheus-kube-prometheus-alertmanager
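
Before waiting for a real incident, routing can be verified end to end by pushing a synthetic alert straight into the AlertManager API (with the port-forward from Part 2 running). This exercises only the routing and receivers, not the PrometheusRules themselves:

# Send a fake warning alert and watch for it in Slack / the AlertManager UI
curl -s -X POST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels":{"alertname":"RoutingTest","severity":"warning"},"annotations":{"summary":"Synthetic alert to test AlertManager routing"}}]'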

Part 6: Threshold Tuning Strategies

Understanding the HPA Formula

desiredReplicas = ceil[currentReplicas * (currentMetricValue / targetMetricValue)]

Current Utilization (%) = (Sum of Pod Resource Usage) / (Sum of Pod Resource Requests) × 100
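
Plugging in numbers makes the behaviour concrete. Suppose 3 replicas are averaging 90% CPU against a 50% target:

desiredReplicas = ceil[3 * (90 / 50)] = ceil[5.4] = 6

Because the result is rounded up and re-evaluated every sync period (15 seconds by default), even small overshoots above the target trigger a scale-up on the next evaluation.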

Step 1: Baseline Measurement

 1# Run application under normal load for 1 hour
 2kubectl run baseline-load --image=busybox:1.36 --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://php-apache; sleep 0.1; done"
 3
 4# Sample pod CPU/memory every 30 seconds (kubectl top has no --watch flag)
 5while true; do kubectl top pods -l app=php-apache >> baseline-metrics.txt; sleep 30; done
 6
 7# Query Prometheus for average CPU usage
 8# PromQL: avg(rate(container_cpu_usage_seconds_total{pod=~"php-apache.*"}[1h]))
 9
10# Example result: 150m (0.15 cores)
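
The same ratio is a good candidate for a recording rule, so the baseline stays cheap to chart over long ranges. A sketch, assuming the stack from Part 1 (the rule name php_apache:cpu_utilization:ratio is my own convention):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hpa-recording-rules
  namespace: monitoring
  labels:
    release: prometheus  # must match Prometheus release name
spec:
  groups:
  - name: hpa-recording
    interval: 30s
    rules:
    - record: php_apache:cpu_utilization:ratio
      expr: |
        sum(rate(container_cpu_usage_seconds_total{namespace="default",pod=~"php-apache.*"}[5m]))
        /
        sum(kube_pod_container_resource_requests{namespace="default",pod=~"php-apache.*",resource="cpu"})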

Step 2: Load Testing

 1# Install hey (HTTP load generator)
 2# macOS: brew install hey
 3# Linux: wget https://hey-release.s3.us-east-2.amazonaws.com/hey_linux_amd64
 4
 5# The php-apache ClusterIP is only reachable inside the cluster, so expose it locally first
 6kubectl port-forward svc/php-apache 8080:80 &
 7
 8# Test different load levels (-q is QPS per worker, so total rate ~= -c * -q)
 9# Light load: ~10 req/s
10hey -z 5m -c 10 -q 1 http://localhost:8080/
11
12# Medium load: ~50 req/s
13hey -z 5m -c 10 -q 5 http://localhost:8080/
14
15# Heavy load: ~200 req/s
16hey -z 5m -c 20 -q 10 http://localhost:8080/
17
18# Record CPU usage at each level
19kubectl top pods -l app=php-apache

Step 3: Calculate Optimal Thresholds

Example Data:

Load Level | Requests/sec | Avg CPU per Pod | Replicas | CPU % of Request (200m)
-----------|--------------|-----------------|----------|------------------------
Baseline   | 10           | 50m             | 1        | 25%
Light      | 50           | 120m            | 1        | 60%
Medium     | 100          | 180m            | 2        | 90%
Heavy      | 200          | 160m            | 3        | 80%

Analysis:

# Current target: 50% CPU utilization (100m of 200m request)

# At 50 req/s:
# - CPU usage: 120m (60%)
# - HPA triggers scale-up to 2 pods
# - New CPU per pod: 60m (30%)
# - System stable

# Conclusion: 50% target is appropriate

# If we used 70% target:
# - At 50 req/s, CPU would be 120m (60% < 70%)
# - No scale-up
# - At 100 req/s, CPU hits 180m (90%)
# - Late scale-up, potential latency spike

Step 4: Choose Targets by Workload Type

CPU-Bound Applications

 1# Conservative (prioritize availability)
 2metrics:
 3- type: Resource
 4  resource:
 5    name: cpu
 6    target:
 7      type: Utilization
 8      averageUtilization: 50  # Scale at 50%
 9
10# Balanced (cost + performance)
11metrics:
12- type: Resource
13  resource:
14    name: cpu
15    target:
16      type: Utilization
17      averageUtilization: 70  # Scale at 70%
18
19# Aggressive (cost-optimized)
20metrics:
21- type: Resource
22  resource:
23    name: cpu
24    target:
25      type: Utilization
26      averageUtilization: 80  # Scale at 80%

Memory-Bound Applications

 1metrics:
 2- type: Resource
 3  resource:
 4    name: memory
 5    target:
 6      type: Utilization
 7      averageUtilization: 75  # Memory typically more stable
 8
 9# Note: Memory scaling is tricky because:
10# 1. Memory doesn't "free up" like CPU
11# 2. Pods must be restarted to reduce memory
12# 3. Consider VPA for memory optimization
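
If you do want VPA input on memory sizing, it can run in recommendation-only mode alongside the HPA. A sketch, assuming the VPA components are installed in the cluster:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: php-apache-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  updatePolicy:
    updateMode: "Off"  # recommendations only; never evicts pods, so it will not fight the HPA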

Latency-Sensitive Applications

 1# Use custom metrics for response time
 2metrics:
 3- type: Pods
 4  pods:
 5    metric:
 6      name: http_request_duration_p99_seconds
 7    target:
 8      type: AverageValue
 9      averageValue: "0.2"  # 200ms P99 latency
10
11# Or request rate
12- type: Pods
13  pods:
14    metric:
15      name: http_requests_per_second
16    target:
17      type: AverageValue
18      averageValue: "100"  # 100 req/s per pod
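
Pods-type metrics like these only resolve if something serves the custom.metrics.k8s.io API — typically prometheus-adapter deployed next to this stack. A sketch of an adapter rule that would derive http_requests_per_second from an application counter, assuming the app actually exports http_requests_total (where this rule lives depends on how you install the adapter, e.g. under rules.custom in its Helm values):

rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'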

Step 5: Tuning Scaling Behavior

Fast-Scaling Workloads (E-commerce, APIs)

 1behavior:
 2  scaleUp:
 3    stabilizationWindowSeconds: 0    # Immediate
 4    policies:
 5    - type: Percent
 6      value: 100                      # Double capacity
 7      periodSeconds: 15               # Every 15s
 8    selectPolicy: Max
 9
10  scaleDown:
11    stabilizationWindowSeconds: 300  # 5 minutes
12    policies:
13    - type: Percent
14      value: 25                       # Max 25% reduction
15      periodSeconds: 60
16    selectPolicy: Min

Batch Processing Workloads

 1behavior:
 2  scaleUp:
 3    stabilizationWindowSeconds: 60   # Wait 1 min
 4    policies:
 5    - type: Pods
 6      value: 2                        # Add 2 pods at a time
 7      periodSeconds: 60
 8    selectPolicy: Max
 9
10  scaleDown:
11    stabilizationWindowSeconds: 600  # 10 minutes
12    policies:
13    - type: Pods
14      value: 1                        # Remove 1 pod at a time
15      periodSeconds: 120              # Every 2 minutes
16    selectPolicy: Min
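
For reference, here is one way the 50% CPU target and the fast-scaling behavior above combine into a complete autoscaling/v2 manifest for the Part 3 demo (minReplicas and maxReplicas are assumptions — keep whatever Part 3 used):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60
      selectPolicy: Min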

Step 6: Continuous Optimization

Create a monitoring query dashboard:

 1# 1. Average CPU utilization over time
 2avg(rate(container_cpu_usage_seconds_total{pod=~"php-apache.*"}[5m]))
 3  /
 4avg(kube_pod_container_resource_requests{pod=~"php-apache.*",resource="cpu"})
 5
 6# 2. HPA scaling frequency
 7changes(kube_horizontalpodautoscaler_status_current_replicas[1h])
 8
 9# 3. Replica-count samples over the last day (chart as a histogram to see time spent at each count)
10count_over_time(kube_horizontalpodautoscaler_status_current_replicas[24h])
11
12# 4. Cost per request (rough estimate; 0.04 ~= on-demand $ per vCPU-hour, adjust for your instance pricing)
13(
14  sum(kube_pod_container_resource_requests{pod=~"php-apache.*",resource="cpu"}) * 0.04
15)
16/
17sum(rate(http_requests_total{pod=~"php-apache.*"}[5m]))

Part 7: Testing the Complete Setup

Scenario 1: Normal Traffic Pattern

 1# Generate steady load
 2kubectl run load-test-normal --image=busybox:1.36 --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://php-apache; sleep 0.05; done"
 3
 4# Monitor in Grafana
 5# - HPA dashboard shows gradual scale-up
 6# - CPU stays around target (50%)
 7# - No alerts triggered
 8# - Replicas: 1 → 2 → 3 (stabilizes)
 9
10# Clean up
11kubectl delete pod load-test-normal

Scenario 2: Traffic Spike

 1# Generate sudden spike
 2for i in {1..10}; do
 3  kubectl run load-spike-$i --image=busybox:1.36 --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://php-apache; done" &
 4done
 5
 6# Expected behavior:
 7# T+0s:   Spike detected
 8# T+30s:  HPA scales to 5 replicas
 9# T+60s:  HPA scales to 8 replicas
10# T+90s:  Stable at 8 replicas
11
12# Alerts triggered:
13# - HPAMaxedOut (if hits 10 replicas)
14# - HighCPUUsageBeforeScaling
15
16# Stop spike
17for i in {1..10}; do kubectl delete pod load-spike-$i; done

Scenario 3: Cluster Capacity Test

 1# Scale the deployment beyond cluster capacity (delete the HPA first, otherwise it will reconcile replicas back down to maxReplicas)
 2kubectl scale deployment php-apache --replicas=20
 3
 4# Expected:
 5# - Pods go to Pending state
 6# - PodsPendingScheduling alert fires
 7# - ClusterCPUPressure alert fires (if no Cluster Autoscaler)
 8# - Cluster Autoscaler adds nodes (if enabled)
 9
10# Check pending pods
11kubectl get pods | grep Pending
12
13# Scale back down
14kubectl scale deployment php-apache --replicas=2

Part 8: Cleanup

Remove Monitoring Stack

 1# Delete PrometheusRules
 2kubectl delete prometheusrule -n monitoring hpa-alerts
 3
 4# Delete ServiceMonitor
 5kubectl delete servicemonitor -n default php-apache-monitor
 6
 7# Uninstall Prometheus stack
 8helm uninstall prometheus -n monitoring
 9
10# Or via CDK (remove from stack and redeploy)
11# Comment out Prometheus Helm chart in CDK code
12cdk deploy
13
14# Delete monitoring namespace
15kubectl delete namespace monitoring

Key Takeaways

Monitoring Checklist

Metrics Collection

  • Metrics Server installed and healthy
  • Node Exporter running on all nodes
  • kube-state-metrics deployed
  • Application metrics exposed (if using custom metrics)

Storage & Retention

  • Prometheus storage configured (50GB recommended)
  • Retention period set (30 days minimum)
  • Grafana dashboards backed up

Alerting

  • PrometheusRules deployed and active
  • AlertManager configured with notification channels
  • Alert routing rules tested
  • Runbooks documented

Dashboards

  • HPA overview dashboard imported
  • Resource utilization dashboard configured
  • Cluster health dashboard available
  • Application-specific dashboards created

Threshold Tuning

  • Baseline metrics collected
  • Load testing performed
  • Thresholds calculated and documented
  • Scaling behavior tuned for workload type

Recommended alert thresholds:

Alert                     | Threshold                                 | Rationale
--------------------------|-------------------------------------------|------------------------------------------------
HPAMaxedOut               | current = maxReplicas for 5 min           | Warns when the HPA can no longer add replicas
HighCPUUsageBeforeScaling | >90% of requests for 3 min                | Indicates scaling may be delayed
HighMemoryUsage           | >90% of limits for 5 min                  | Prevent OOMKills
PodsPendingScheduling     | Any pods pending for 10 min               | Capacity issue
ClusterCPUPressure        | <2 cores available                        | Proactive capacity planning
HPAScalingThrashing       | >8 replica changes in 15 min, for 30 min  | Configuration issue

Cost Optimization via Monitoring

 1# Query to identify over-provisioned resources
 2# (Requested but not used)
 3
 4# CPU waste
 5sum(kube_pod_container_resource_requests{resource="cpu"})
 6-
 7sum(rate(container_cpu_usage_seconds_total[1d]))
 8
 9# Memory waste
10sum(kube_pod_container_resource_requests{resource="memory"})
11-
12sum(container_memory_working_set_bytes)
13
14# Right-sizing recommendation:
15# Set requests to P95 usage + 20% buffer
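
The "P95 usage + 20% buffer" guidance translates directly into a PromQL subquery you can run before editing requests. A sketch for the demo pods (result is in cores):

# P95 of php-apache CPU usage over the last 7 days, plus a 20% buffer
quantile_over_time(0.95,
  sum(rate(container_cpu_usage_seconds_total{pod=~"php-apache.*"}[5m]))[7d:5m]
) * 1.2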

Conclusion

This guide established production-grade monitoring for Kubernetes autoscaling:

  1. Metrics Collection: Deployed complete Prometheus stack with exporters
  2. Visualization: Created Grafana dashboards for real-time visibility
  3. Alerting: Configured AlertManager with multi-channel notifications
  4. Threshold Tuning: Established data-driven approach to optimization
  5. Testing: Validated monitoring under various load scenarios

Implementation Checklist

Week 1: Foundation

  • Deploy Prometheus stack
  • Configure basic dashboards
  • Verify metrics collection

Week 2: Alerting

  • Create PrometheusRules
  • Configure notification channels
  • Test alert routing

Week 3: Optimization

  • Collect baseline metrics
  • Perform load testing
  • Tune HPA thresholds

Week 4: Production

  • Document runbooks
  • Train team on dashboards
  • Establish review cadence

Next Steps

  1. Integrate with CI/CD: Automatic threshold updates based on load tests
  2. Add Custom Metrics: Application-specific business metrics
  3. Implement SLOs: Service Level Objectives with error budgets
  4. Cost Optimization: Continuous right-sizing based on actual usage
  5. ML-Based Autoscaling: Predictive scaling using historical patterns

With comprehensive monitoring in place, you can confidently operate Kubernetes autoscaling in production, quickly identify issues, and continuously optimize for performance and cost.

Happy monitoring! 📊