Kubernetes Autoscaling Complete Guide (Part 6): Advanced Autoscaling Patterns

Series Overview

This is Part 6 of the Kubernetes Autoscaling Complete Guide series.


Beyond basic HPA and cluster autoscaling, production Kubernetes deployments require sophisticated patterns for stateful workloads, multi-cluster architectures, aggressive cost optimization, and specialized workload types. This guide explores advanced autoscaling strategies used by leading organizations.

Pattern 1: Stateful Application Autoscaling

The StatefulSet Challenge

Traditional HPA with StatefulSets:
┌────────────────────────────────────────────────────────────┐
│  CHALLENGES                                                │
│                                                             │
│  1. Ordered Pod Creation/Deletion                          │
│     • pod-0 must exist before pod-1                        │
│     • Slow scale-up during traffic spikes                  │
│                                                             │
│  2. Persistent Volumes                                      │
│     • Each pod has unique PVC                              │
│     • Storage costs accumulate                             │
│     • PVCs remain after scale-down                         │
│                                                             │
│  3. State Synchronization                                   │
│     • New pods must sync state (databases, caches)         │
│     • Sync time adds to scale-up latency                   │
│     • Potential data consistency issues                    │
│                                                             │
│  4. Service Discovery                                       │
│     • Clients must discover new pods                       │
│     • DNS updates take time                                │
│     • Connection draining needed on scale-down             │
└────────────────────────────────────────────────────────────┘

Pattern 1A: Database Scaling with StatefulSet

Scenario: PostgreSQL cluster with read replicas that scale based on read query load.

# PostgreSQL StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-replicas
  namespace: databases
spec:
  serviceName: postgres-replicas
  replicas: 2  # Initial read replica count (the primary runs separately)

  selector:
    matchLabels:
      app: postgres
      role: replica

  template:
    metadata:
      labels:
        app: postgres
        role: replica
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9187"  # postgres_exporter
    spec:
      initContainers:
      # Initialize replica from primary
      - name: init-replica
        image: postgres:15
        command:
        - bash
        - -c
        - |
          # Clone into the same PGDATA path the main container uses
          if [ ! -f /var/lib/postgresql/data/pgdata/PG_VERSION ]; then
            # Clone from primary
            pg_basebackup -h postgres-primary -D /var/lib/postgresql/data/pgdata -U replication -v -P
            # Create recovery signal
            touch /var/lib/postgresql/data/pgdata/standby.signal
          fi
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data

      containers:
      # PostgreSQL replica
      - name: postgres
        image: postgres:15
        env:
        - name: POSTGRES_USER
          value: postgres
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata

        ports:
        - containerPort: 5432
          name: postgres

        resources:
          requests:
            cpu: 1
            memory: 2Gi
          limits:
            cpu: 4
            memory: 8Gi

        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
        - name: config
          mountPath: /etc/postgresql/postgresql.conf
          subPath: postgresql.conf

      # Postgres Exporter for metrics
      - name: postgres-exporter
        image: prometheuscommunity/postgres-exporter:latest
        env:
        # $(POSTGRES_PASSWORD) only expands from variables defined in this
        # container, so the secret must be referenced here as well
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        - name: DATA_SOURCE_NAME
          value: "postgresql://postgres:$(POSTGRES_PASSWORD)@localhost:5432/postgres?sslmode=disable"
        ports:
        - containerPort: 9187
          name: metrics
        resources:
          requests:
            cpu: 100m
            memory: 128Mi

      volumes:
      - name: config
        configMap:
          name: postgres-config

  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3-encrypted
      resources:
        requests:
          storage: 100Gi

---
# Headless service for StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: postgres-replicas
  namespace: databases
spec:
  clusterIP: None
  selector:
    app: postgres
    role: replica
  ports:
  - port: 5432
    name: postgres

---
# Regular service for read traffic (load balanced)
apiVersion: v1
kind: Service
metadata:
  name: postgres-read
  namespace: databases
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9187"
spec:
  type: ClusterIP
  selector:
    app: postgres
    role: replica
  ports:
  - port: 5432
    name: postgres

---
# HPA for read replicas based on custom metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: postgres-replicas-hpa
  namespace: databases
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: postgres-replicas

  minReplicas: 2   # Always keep at least 2 read replicas
  maxReplicas: 10  # Max read replicas

  metrics:
  # Scale based on active connections
  - type: Pods
    pods:
      metric:
        name: pg_stat_database_numbackends
      target:
        type: AverageValue
        averageValue: "50"  # 50 connections per replica

  # Scale based on replication lag
  - type: Pods
    pods:
      metric:
        name: pg_replication_lag_seconds
      target:
        type: AverageValue
        averageValue: "5"  # Keep lag under 5 seconds

  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60  # Wait 1 min before scale-up
      policies:
      - type: Pods
        value: 1                       # Add 1 replica at a time
        periodSeconds: 60
      selectPolicy: Min

    scaleDown:
      stabilizationWindowSeconds: 600  # Wait 10 min before scale-down
      policies:
      - type: Pods
        value: 1                        # Remove 1 replica at a time
        periodSeconds: 300              # Every 5 minutes
      selectPolicy: Min

---
# PrometheusRule for PostgreSQL monitoring
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: postgres-autoscaling-rules
  namespace: monitoring
spec:
  groups:
  - name: postgres-custom-metrics
    interval: 15s
    rules:
    # Active connections per pod
    - record: pg_stat_database_numbackends
      expr: |
        sum(pg_stat_database_numbackends{datname="postgres"}) by (pod, namespace)

    # Replication lag in seconds
    - record: pg_replication_lag_seconds
      expr: |
        pg_replication_lag

  - name: postgres-alerts
    rules:
    # Alert when the HPA is running close to its configured maximum
    - alert: PostgresReplicasMaxedOut
      expr: |
        (
          kube_horizontalpodautoscaler_status_current_replicas{horizontalpodautoscaler="postgres-replicas-hpa"}
          /
          kube_horizontalpodautoscaler_spec_max_replicas{horizontalpodautoscaler="postgres-replicas-hpa"}
        ) >= 0.9
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "PostgreSQL replicas near maximum capacity"
        description: "Consider increasing maxReplicas or optimizing queries"

    # Alert on high replication lag
    - alert: PostgresHighReplicationLag
      expr: pg_replication_lag_seconds > 30
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "PostgreSQL replication lag is high"
        description: "Replication lag is {{ $value }}s, may impact read consistency"
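
Note that Pods-type metrics like pg_stat_database_numbackends only reach the HPA through the custom metrics API. A minimal prometheus-adapter rule sketch for the two metrics above, assuming prometheus-adapter is installed in the monitoring namespace (the ConfigMap name is illustrative):

# Sketch: prometheus-adapter rules exposing the PostgreSQL metrics to the HPA
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter-postgres
  namespace: monitoring
data:
  config.yaml: |
    rules:
    - seriesQuery: 'pg_stat_database_numbackends{pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      metricsQuery: |
        sum(pg_stat_database_numbackends) by (pod, namespace)
    - seriesQuery: 'pg_replication_lag_seconds{pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      metricsQuery: |
        max(pg_replication_lag_seconds) by (pod, namespace)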

Pattern 1B: Redis Cache Cluster Autoscaling

# Redis Cluster with dynamic scaling
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
  namespace: caching
spec:
  serviceName: redis-cluster
  replicas: 6  # 3 masters + 3 replicas

  selector:
    matchLabels:
      app: redis-cluster

  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        command:
        - redis-server
        args:
        - /conf/redis.conf
        - --cluster-enabled
        - "yes"
        - --cluster-config-file
        - /data/nodes.conf
        - --cluster-node-timeout
        - "5000"
        - --maxmemory
        - "2gb"
        - --maxmemory-policy
        - "allkeys-lru"

        ports:
        - containerPort: 6379
          name: client
        - containerPort: 16379
          name: gossip

        resources:
          requests:
            cpu: 500m
            memory: 2Gi
          limits:
            cpu: 2
            memory: 4Gi

        volumeMounts:
        - name: data
          mountPath: /data
        - name: conf
          mountPath: /conf

      # Redis Exporter sidecar
      - name: redis-exporter
        image: oliver006/redis_exporter:latest
        ports:
        - containerPort: 9121
          name: metrics
        resources:
          requests:
            cpu: 100m
            memory: 128Mi

  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 50Gi

---
# Custom metrics based on Redis metrics
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter-redis
  namespace: monitoring
data:
  config.yaml: |
    rules:
    # Redis memory usage percentage
    - seriesQuery: 'redis_memory_used_bytes'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        as: "redis_memory_usage_percentage"
      metricsQuery: |
        (redis_memory_used_bytes / redis_memory_max_bytes) * 100

    # Redis connected clients
    - seriesQuery: 'redis_connected_clients'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        as: "redis_clients_per_pod"
      metricsQuery: |
        sum(redis_connected_clients) by (pod, namespace)

    # Redis operations per second
    - seriesQuery: 'redis_instantaneous_ops_per_sec'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        as: "redis_ops_per_second"
      metricsQuery: |
        sum(redis_instantaneous_ops_per_sec) by (pod, namespace)

---
# HPA for Redis based on memory and ops
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: redis-cluster-hpa
  namespace: caching
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: redis-cluster

  minReplicas: 6   # Minimum cluster size (3 masters + 3 replicas)
  maxReplicas: 18  # Max 9 masters + 9 replicas

  # Note: new pods join the cluster without hash slots; rebalance after
  # scale-up (see the Job sketch below) before they serve traffic
  metrics:
  # Memory usage
  - type: Pods
    pods:
      metric:
        name: redis_memory_usage_percentage
      target:
        type: AverageValue
        averageValue: "75"  # Scale when memory > 75%

  # Operations per second
  - type: Pods
    pods:
      metric:
        name: redis_ops_per_second
      target:
        type: AverageValue
        averageValue: "10000"  # Scale at 10k ops/sec per pod

  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120
      policies:
      - type: Pods
        value: 2  # Add 2 pods at a time (1 master + 1 replica)
        periodSeconds: 120

    scaleDown:
      stabilizationWindowSeconds: 600
      policies:
      - type: Pods
        value: 2  # Remove 2 pods at a time
        periodSeconds: 300
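
One caveat the HPA cannot handle on its own: pods added to a Redis Cluster join with no hash slots, so the cluster must be rebalanced after scale-up. A one-shot rebalance Job sketch (endpoint and names are illustrative; in practice an operator or post-scale hook would trigger this):

# Sketch: rebalance hash slots onto newly added masters after scale-up
apiVersion: batch/v1
kind: Job
metadata:
  name: redis-cluster-rebalance
  namespace: caching
spec:
  template:
    spec:
      containers:
      - name: rebalance
        image: redis:7-alpine
        command:
        - redis-cli
        - --cluster
        - rebalance
        - redis-cluster-0.redis-cluster.caching.svc.cluster.local:6379
        - --cluster-use-empty-masters  # Assign slots to empty (new) masters
      restartPolicy: OnFailure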

Key Considerations for Stateful Autoscaling

  1. Data Synchronization Time: Account for data replication delays
  2. Ordered Scaling: StatefulSets scale sequentially, slower than Deployments
  3. Storage Management: Implement PVC cleanup policies (see the sketch after this list)
  4. State Warmup: Consider warm-up time for caches/databases
  5. Split Read/Write: Scale read replicas independently from write nodes
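
For point 3, Kubernetes can clean up claims automatically on scale-down via the StatefulSetAutoDeletePVC feature (beta and on by default from 1.27). A minimal sketch, assuming the feature is available on your cluster:

# Sketch: automatic PVC cleanup when a StatefulSet scales down
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-replicas
  namespace: databases
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain  # Keep data if the StatefulSet itself is deleted
    whenScaled: Delete   # Remove the PVC when its pod is scaled away
  # ... remainder of the StatefulSet spec as shown earlier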

Pattern 2: Multi-Cluster & Multi-Region Autoscaling

Architecture Overview

┌──────────────────────────────────────────────────────────────────────┐
│              MULTI-CLUSTER AUTOSCALING ARCHITECTURE                 │
│                                                                      │
│  ┌────────────────┐     ┌────────────────┐     ┌────────────────┐  │
│  │   REGION 1     │     │   REGION 2     │     │   REGION 3     │  │
│  │   (US-EAST)    │     │   (EU-WEST)    │     │   (AP-SOUTH)   │  │
│  │                │     │                │     │                │  │
│  │  EKS Cluster 1 │     │  EKS Cluster 2 │     │  EKS Cluster 3 │  │
│  │  • HPA         │     │  • HPA         │     │  • HPA         │  │
│  │  • Karpenter   │     │  • Karpenter   │     │  • Karpenter   │  │
│  │  • Local LB    │     │  • Local LB    │     │  • Local LB    │  │
│  └───────┬────────┘     └───────┬────────┘     └───────┬────────┘  │
│          │                      │                      │            │
│          └──────────────────────┴──────────────────────┘            │
│                               ↓                                     │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │               GLOBAL LOAD BALANCER                             │ │
│  │                                                                 │ │
│  │  • Route 53 / CloudFlare / Global Accelerator                 │ │
│  │  • Geographic routing                                          │ │
│  │  • Latency-based routing                                       │ │
│  │  • Weighted routing (for gradual shifts)                      │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                               ↓                                     │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │            CENTRALIZED AUTOSCALING CONTROLLER                  │ │
│  │                                                                 │ │
│  │  • Aggregate metrics from all clusters                         │ │
│  │  • Intelligent workload distribution                           │ │
│  │  • Cost-aware cluster selection                               │ │
│  │  • Capacity prediction                                         │ │
│  └────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘

Pattern 2A: Federated HPA with Cluster API

# Install Cluster API
---
apiVersion: v1
kind: Namespace
metadata:
  name: cluster-api-system

---
# Management cluster setup
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: workload-cluster-us-east
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: workload-cluster-us-east
  controlPlaneRef:
    kind: KubeadmControlPlane
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    name: workload-cluster-us-east-control-plane

---
# Multi-cluster autoscaling with KubeFed
apiVersion: types.kubefed.io/v1beta1
kind: FederatedHorizontalPodAutoscaler
metadata:
  name: federated-app-hpa
  namespace: default
spec:
  # Target clusters (weighted replica distribution is handled separately,
  # e.g. via a ReplicaSchedulingPreference)
  placement:
    clusters:
    - name: us-east-1-cluster
    - name: eu-west-1-cluster
    - name: ap-south-1-cluster

  template:
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app

      minReplicas: 3  # Per cluster minimum
      maxReplicas: 20 # Per cluster maximum

      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70

  # Override for specific clusters
  overrides:
  - clusterName: us-east-1-cluster
    clusterOverrides:
    - path: "/spec/minReplicas"
      value: 5  # Higher baseline in primary region
    - path: "/spec/maxReplicas"
      value: 50

---
# Federated deployment
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: my-app
  namespace: default
spec:
  placement:
    clusters:
    - name: us-east-1-cluster
    - name: eu-west-1-cluster
    - name: ap-south-1-cluster

  template:
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: app
            image: myapp:v1.0
            resources:
              requests:
                cpu: 500m
                memory: 512Mi
Pattern 2B: Custom Multi-Cluster Autoscaler

// Custom multi-cluster autoscaling controller
package main

import (
    "context"
    "fmt"
    "sort"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

type ClusterConfig struct {
    Name           string
    KubeConfig     string
    Region         string
    CostPerCPUHour float64
    Latency        time.Duration
}

type MultiClusterAutoscaler struct {
    clusters map[string]*kubernetes.Clientset
    configs  []ClusterConfig
}

func NewMultiClusterAutoscaler(configs []ClusterConfig) (*MultiClusterAutoscaler, error) {
    mca := &MultiClusterAutoscaler{
        clusters: make(map[string]*kubernetes.Clientset),
        configs:  configs,
    }

    // Initialize clients for each cluster
    for _, config := range configs {
        clientConfig, err := clientcmd.BuildConfigFromFlags("", config.KubeConfig)
        if err != nil {
            return nil, err
        }

        clientset, err := kubernetes.NewForConfig(clientConfig)
        if err != nil {
            return nil, err
        }

        mca.clusters[config.Name] = clientset
    }

    return mca, nil
}

// Decision algorithm: cost-aware + latency-aware scaling
func (mca *MultiClusterAutoscaler) ScaleDecision(
    ctx context.Context,
    totalReplicas int,
    userRegion string,
) (map[string]int, error) {

    allocation := make(map[string]int)

    // Step 1: Get current capacity in each cluster
    capacities := make(map[string]int)
    for name, client := range mca.clusters {
        nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
        if err != nil {
            return nil, err
        }

        // Calculate available capacity
        var availableCPU int64
        for _, node := range nodes.Items {
            availableCPU += node.Status.Allocatable.Cpu().MilliValue()
        }
        capacities[name] = int(availableCPU / 500) // Assume 500m per pod
    }

    // Step 2: Cost-aware allocation
    // Prioritize cheapest region first
    sortedConfigs := sortByCost(mca.configs)

    remaining := totalReplicas
    for _, config := range sortedConfigs {
        available := capacities[config.Name]

        // Allocate up to available capacity
        allocated := min(remaining, available)
        allocation[config.Name] = allocated
        remaining -= allocated

        if remaining == 0 {
            break
        }
    }

    // Step 3: Latency-aware adjustment
    // If user is in specific region, ensure minimum local replicas
    if userRegion != "" {
        minLocal := max(3, totalReplicas/10) // At least 10% or 3 replicas
        if allocation[userRegion] < minLocal {
            allocation[userRegion] = minLocal
        }
    }

    return allocation, nil
}

// Apply scaling decisions to clusters
func (mca *MultiClusterAutoscaler) ApplyScaling(
    ctx context.Context,
    allocation map[string]int,
    deployment string,
    namespace string,
) error {

    for clusterName, replicas := range allocation {
        client := mca.clusters[clusterName]

        // Update deployment replica count
        scale, err := client.AppsV1().Deployments(namespace).
            GetScale(ctx, deployment, metav1.GetOptions{})
        if err != nil {
            return fmt.Errorf("failed to get scale for %s in %s: %v",
                deployment, clusterName, err)
        }

        scale.Spec.Replicas = int32(replicas)

        _, err = client.AppsV1().Deployments(namespace).
            UpdateScale(ctx, deployment, scale, metav1.UpdateOptions{})
        if err != nil {
            return fmt.Errorf("failed to update scale for %s in %s: %v",
                deployment, clusterName, err)
        }

        fmt.Printf("Scaled %s in %s to %d replicas\n",
            deployment, clusterName, replicas)
    }

    return nil
}

func main() {
    configs := []ClusterConfig{
        {
            Name:           "us-east-1",
            KubeConfig:     "/home/user/.kube/us-east-1",
            Region:         "us-east-1",
            CostPerCPUHour: 0.04,
            Latency:        50 * time.Millisecond,
        },
        {
            Name:           "eu-west-1",
            KubeConfig:     "/home/user/.kube/eu-west-1",
            Region:         "eu-west-1",
            CostPerCPUHour: 0.045,
            Latency:        100 * time.Millisecond,
        },
        {
            Name:           "ap-south-1",
            KubeConfig:     "/home/user/.kube/ap-south-1",
            Region:         "ap-south-1",
            CostPerCPUHour: 0.038, // Cheapest
            Latency:        150 * time.Millisecond,
        },
    }

    autoscaler, err := NewMultiClusterAutoscaler(configs)
    if err != nil {
        panic(err)
    }

    ctx := context.Background()

    // Main reconciliation loop
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop()

    for range ticker.C {
        // Get total desired replicas from global metrics
        totalReplicas := calculateGlobalReplicas()

        // Determine optimal allocation
        allocation, err := autoscaler.ScaleDecision(
            ctx,
            totalReplicas,
            "us-east-1", // Primary user region
        )
        if err != nil {
            fmt.Printf("Error in scale decision: %v\n", err)
            continue
        }

        // Apply scaling
        err = autoscaler.ApplyScaling(
            ctx,
            allocation,
            "my-app",
            "production",
        )
        if err != nil {
            fmt.Printf("Error applying scaling: %v\n", err)
        }
    }
}

func calculateGlobalReplicas() int {
    // Aggregate metrics from all clusters
    // Calculate desired total replicas
    // This would query Prometheus/Thanos for global metrics
    return 50 // Placeholder
}

func sortByCost(configs []ClusterConfig) []ClusterConfig {
    // Sort by cost (cheapest first)
    sorted := make([]ClusterConfig, len(configs))
    copy(sorted, configs)
    sort.Slice(sorted, func(i, j int) bool {
        return sorted[i].CostPerCPUHour < sorted[j].CostPerCPUHour
    })
    return sorted
}

func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}

func max(a, b int) int {
    if a > b {
        return a
    }
    return b
}

Pattern 2C: Global Metrics Aggregation with Thanos

# Thanos setup for multi-cluster metrics
---
# Thanos Sidecar on each cluster's Prometheus
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceName: prometheus
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      # Prometheus
      - name: prometheus
        image: prom/prometheus:latest
        args:
        - --storage.tsdb.path=/prometheus
        - --storage.tsdb.min-block-duration=2h
        - --storage.tsdb.max-block-duration=2h
        volumeMounts:
        - name: storage
          mountPath: /prometheus

      # Thanos Sidecar
      - name: thanos-sidecar
        image: thanosio/thanos:latest
        args:
        - sidecar
        - --prometheus.url=http://localhost:9090
        - --tsdb.path=/prometheus
        - --objstore.config-file=/etc/thanos/objstore.yaml
        - --grpc-address=0.0.0.0:10901
        volumeMounts:
        - name: storage
          mountPath: /prometheus
        - name: objstore-config
          mountPath: /etc/thanos
        ports:
        - containerPort: 10901
          name: grpc

      volumes:
      - name: objstore-config
        secret:
          secretName: thanos-objstore-config  # Bucket config, provisioned separately

  volumeClaimTemplates:
  - metadata:
      name: storage
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi

---
# Thanos Query (global query layer)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
  namespace: monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      app: thanos-query
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
      - name: thanos-query
        image: thanosio/thanos:latest
        args:
        - query
        - --http-address=0.0.0.0:9090
        - --grpc-address=0.0.0.0:10901
        # Connect to all cluster Prometheus instances
        - --store=prometheus-us-east-1.monitoring.svc.cluster.local:10901
        - --store=prometheus-eu-west-1.monitoring.svc.cluster.local:10901
        - --store=prometheus-ap-south-1.monitoring.svc.cluster.local:10901
        - --query.replica-label=replica
        ports:
        - containerPort: 9090
          name: http
        - containerPort: 10901
          name: grpc

---
# Adapter rules for global metrics served by Thanos
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter-thanos
  namespace: monitoring
data:
  config.yaml: |
    rules:
    # Global request rate across all clusters
    - seriesQuery: 'http_requests_total{job="my-app"}'
      resources:
        template: <<.Resource>>
      name:
        as: "global_requests_per_second"
      metricsQuery: |
        sum(rate(http_requests_total{job="my-app"}[2m]))

    # Global CPU usage
    - seriesQuery: 'container_cpu_usage_seconds_total{pod=~"my-app.*"}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
      name:
        as: "global_cpu_usage"
      metricsQuery: |
        sum(rate(container_cpu_usage_seconds_total{pod=~"my-app.*"}[5m]))
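
To close the loop, an HPA can consume the aggregated series. A sketch assuming the adapter serves global_requests_per_second through the external metrics API (target value illustrative):

# Sketch: HPA consuming the cluster-wide request rate as an External metric
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-global-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: External
    external:
      metric:
        name: global_requests_per_second
      target:
        type: Value
        value: "5000"  # Total RPS across all clusters before scaling out locally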

Pattern 3: Aggressive Cost Optimization

Spot Instance Strategy with Multiple Fallbacks

# Karpenter NodePool with spot + on-demand mix
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: cost-optimized-spot
spec:
  template:
    metadata:
      labels:
        workload-type: spot-eligible
        cost-optimized: "true"
    spec:
      requirements:
      # Maximize spot instance types for availability
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]

      # Allow wide range of instance types
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["c", "m", "r", "t"]  # Compute, general, memory, burstable

      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values: ["4"]  # Generation 5+

      # Size flexibility
      - key: karpenter.k8s.aws/instance-size
        operator: In
        values: ["large", "xlarge", "2xlarge", "4xlarge"]

      nodeClassRef:
        name: cost-optimized

  # Aggressive consolidation
  # (consolidateAfter is only valid with WhenEmpty in v1beta1, so it is omitted)
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 12h  # Refresh nodes every 12 hours

  limits:
    cpu: "500"
    memory: 1000Gi

---
# On-demand fallback NodePool
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: on-demand-fallback
spec:
  template:
    metadata:
      labels:
        workload-type: on-demand-fallback
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]

      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["m", "c"]

      nodeClassRef:
        name: cost-optimized

  weight: 10  # Lower priority, used when spot unavailable

  limits:
    cpu: "200"

---
# Application deployment with spot tolerance
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cost-sensitive-app
  namespace: production
spec:
  replicas: 10
  selector:
    matchLabels:
      app: cost-sensitive-app
  template:
    metadata:
      labels:
        app: cost-sensitive-app
    spec:
      # Prefer spot nodes
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: karpenter.sh/capacity-type
                operator: In
                values: ["spot"]

          # Fallback to on-demand if needed
          - weight: 50
            preference:
              matchExpressions:
              - key: workload-type
                operator: In
                values: ["on-demand-fallback"]

      # Tolerate spot interruptions
      tolerations:
      - key: karpenter.sh/disruption
        operator: Exists
        effect: NoSchedule

      # Topology spread for availability
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: cost-sensitive-app

      containers:
      - name: app
        image: myapp:v1.0
        resources:
          requests:
            cpu: 500m
            memory: 512Mi

---
# PDB to handle spot interruptions gracefully
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cost-sensitive-app-pdb
  namespace: production
spec:
  minAvailable: 70%  # Keep 70% pods running during spot interruptions
  selector:
    matchLabels:
      app: cost-sensitive-app
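
Spot capacity typically disappears with about two minutes' notice, so spot-hosted pods should drain fast when evicted. A minimal graceful-shutdown sketch (delay values illustrative):

# Sketch: fast, graceful termination for spot-hosted pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cost-sensitive-app
  namespace: production
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 90  # Finish in-flight work within the spot notice window
      containers:
      - name: app
        image: myapp:v1.0
        lifecycle:
          preStop:
            exec:
              # Give load balancers time to deregister the pod before SIGTERM
              command: ["sh", "-c", "sleep 10"]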

Cost-Aware Scheduling with Custom Scheduler

// Custom scheduler plugin for cost-aware pod placement
package main

import (
    "context"
    "fmt"

    v1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/kubernetes/pkg/scheduler/framework"
)

type CostAwarePlugin struct {
    handle framework.Handle
}

var _ framework.ScorePlugin = &CostAwarePlugin{}

// Pricing data (could be fetched from external API)
var instancePricing = map[string]float64{
    "t3.large":      0.0832,
    "m5.large":      0.096,
    "c5.large":      0.085,
    "m5.xlarge":     0.192,
    "c5.xlarge":     0.17,
    "r5.large":      0.126,
    "spot-t3.large": 0.0250, // ~70% savings
    "spot-m5.large": 0.0288,
    "spot-c5.large": 0.0255,
}

func (c *CostAwarePlugin) Name() string {
    return "CostAwarePlugin"
}

// Score nodes based on cost
func (c *CostAwarePlugin) Score(
    ctx context.Context,
    state *framework.CycleState,
    pod *v1.Pod,
    nodeName string,
) (int64, *framework.Status) {

    nodeInfo, err := c.handle.SnapshotSharedLister().NodeInfos().Get(nodeName)
    if err != nil {
        return 0, framework.NewStatus(framework.Error, fmt.Sprintf("getting node %q: %v", nodeName, err))
    }

    node := nodeInfo.Node()

    // Get instance type from node labels
    instanceType := node.Labels["node.kubernetes.io/instance-type"]
    capacityType := node.Labels["karpenter.sh/capacity-type"]

    // Determine pricing key
    pricingKey := instanceType
    if capacityType == "spot" {
        pricingKey = "spot-" + instanceType
    }

    // Get cost
    cost, exists := instancePricing[pricingKey]
    if !exists {
        cost = 0.1 // Default cost if unknown
    }

    // Convert to score (lower cost = higher score)
    // Normalize: max price 0.2, min price 0.02
    // Score range: 0-100
    normalizedCost := (cost - 0.02) / (0.2 - 0.02)
    score := int64((1 - normalizedCost) * 100)

    // Bonus for spot instances
    if capacityType == "spot" {
        score += 20
    }

    // Clamp to the framework's valid score range [0, MaxNodeScore]
    if score > framework.MaxNodeScore {
        score = framework.MaxNodeScore
    }
    if score < 0 {
        score = 0
    }

    return score, framework.NewStatus(framework.Success)
}

// ScoreExtensions of the Score plugin
func (c *CostAwarePlugin) ScoreExtensions() framework.ScoreExtensions {
    return c
}

// NormalizeScore is called after scoring all nodes
func (c *CostAwarePlugin) NormalizeScore(
    ctx context.Context,
    state *framework.CycleState,
    pod *v1.Pod,
    scores framework.NodeScoreList,
) *framework.Status {
    // Scores are already normalized in Score()
    return framework.NewStatus(framework.Success)
}

func New(_ runtime.Object, h framework.Handle) (framework.Plugin, error) {
    return &CostAwarePlugin{handle: h}, nil
}
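
To use the plugin, it must be compiled into a scheduler binary and enabled through a scheduler profile; pods then opt in with spec.schedulerName. A configuration sketch (profile name and weight illustrative):

# Sketch: scheduler profile enabling the plugin (requires a scheduler
# binary built with CostAwarePlugin registered)
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: cost-aware-scheduler
  plugins:
    score:
      enabled:
      - name: CostAwarePlugin
        weight: 5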

FinOps Dashboard and Automation

# CronJob for daily cost optimization report
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cost-optimization-report
  namespace: finops
spec:
  schedule: "0 9 * * *"  # Daily at 9 AM
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: finops-reporter
          containers:
          - name: reporter
            image: finops-reporter:latest
            env:
            - name: PROMETHEUS_URL
              value: "http://prometheus.monitoring:9090"
            - name: SLACK_WEBHOOK
              valueFrom:
                secretKeyRef:
                  name: slack-webhook
                  key: url
            command:
            - /bin/bash
            - -c
            - |
              #!/bin/bash

              echo "=== Daily Cost Optimization Report ==="
              echo ""

              # Calculate total cluster cost
              TOTAL_CPU=$(curl -s "$PROMETHEUS_URL/api/v1/query?query=sum(kube_pod_container_resource_requests{resource='cpu'})" | jq -r '.data.result[0].value[1]')
              TOTAL_MEM=$(curl -s "$PROMETHEUS_URL/api/v1/query?query=sum(kube_pod_container_resource_requests{resource='memory'})" | jq -r '.data.result[0].value[1]')

              # bc -l enables fractional arithmetic (plain bc truncates division)
              CPU_COST=$(echo "$TOTAL_CPU * 0.04 * 24" | bc -l)
              MEM_COST=$(echo "$TOTAL_MEM / 1073741824 * 0.005 * 24" | bc -l)
              DAILY_COST=$(echo "$CPU_COST + $MEM_COST" | bc -l)

              echo "Daily Cost: \$${DAILY_COST}"
              echo ""

              # Identify optimization opportunities
              echo "=== Optimization Opportunities ==="

              # Over-provisioned workloads (VPA recommendations)
              curl -s "$PROMETHEUS_URL/api/v1/query?query=(kube_pod_container_resource_requests{resource='cpu'} - on(pod) kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource='cpu'}) / kube_pod_container_resource_requests{resource='cpu'} > 0.5" \
                | jq -r '.data.result[] | "\(.metric.namespace)/\(.metric.pod): \((.value[1] | tonumber) * 100)% over-provisioned"'

              # Spot instance opportunities
              ONDEMAND_COUNT=$(curl -s "$PROMETHEUS_URL/api/v1/query?query=count(kube_node_labels{label_karpenter_sh_capacity_type='on-demand'})" | jq -r '.data.result[0].value[1]')
              echo ""
              echo "On-demand nodes: $ONDEMAND_COUNT (Consider spot instances for 70% savings)"

              # Send to Slack
              curl -X POST "$SLACK_WEBHOOK" \
                -H 'Content-Type: application/json' \
                -d "{\"text\": \"Daily Cost Report: \\\$${DAILY_COST}\"}"

          restartPolicy: OnFailure

---
# PrometheusRule for cost alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cost-alerts
  namespace: monitoring
spec:
  groups:
  - name: cost-optimization
    interval: 1h
    rules:
    # Alert when daily cost exceeds budget
    - alert: DailyCostExceedsBudget
      expr: |
        (
          sum(kube_pod_container_resource_requests{resource="cpu"}) * 0.04 +
          sum(kube_pod_container_resource_requests{resource="memory"}) / 1073741824 * 0.005
        ) * 24 > 1000
      labels:
        severity: warning
        team: finops
      annotations:
        summary: "Daily infrastructure cost exceeds $1000"
        description: "Current daily cost: ${{ $value }}"

    # Alert on underutilized nodes
    - alert: UnderutilizedNodes
      expr: |
        (
          sum(kube_node_status_allocatable{resource="cpu"}) -
          sum(kube_pod_container_resource_requests{resource="cpu"})
        ) / sum(kube_node_status_allocatable{resource="cpu"}) > 0.5
      for: 2h
      labels:
        severity: info
        team: platform
      annotations:
        summary: "Cluster has >50% unused CPU capacity"
        description: "Consider scaling down or consolidating workloads"

    # Spot savings opportunity
    - alert: SpotSavingsOpportunity
      expr: |
        count(kube_node_labels{label_karpenter_sh_capacity_type="on-demand"})
        /
        count(kube_node_labels)
        > 0.3
      for: 4h
      labels:
        severity: info
        team: finops
      annotations:
        summary: ">30% on-demand nodes detected"
        description: "Evaluate workloads for spot instance eligibility (70% potential savings)"

Pattern 4: Batch Job & Queue-Based Autoscaling

Pattern 4A: Kubernetes Job Autoscaling with KEDA

# KEDA ScaledJob for queue-driven batch processing
---
apiVersion: v1
kind: Secret
metadata:
  name: aws-sqs-credentials
  namespace: batch-processing
type: Opaque
stringData:
  # Example values from the AWS docs; prefer IRSA or an external secret store
  AWS_ACCESS_KEY_ID: "AKIAIOSFODNN7EXAMPLE"
  AWS_SECRET_ACCESS_KEY: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-sqs-auth
  namespace: batch-processing
spec:
  secretTargetRef:
  - parameter: awsAccessKeyID
    name: aws-sqs-credentials
    key: AWS_ACCESS_KEY_ID
  - parameter: awsSecretAccessKey
    name: aws-sqs-credentials
    key: AWS_SECRET_ACCESS_KEY

---
# ScaledJob (not Deployment) for batch processing
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: image-processing-job
  namespace: batch-processing
spec:
  # Job template
  jobTargetRef:
    template:
      spec:
        containers:
        - name: processor
          image: image-processor:v1.0
          env:
          - name: SQS_QUEUE_URL
            value: "https://sqs.us-west-2.amazonaws.com/123456789/image-queue"
          - name: AWS_REGION
            value: "us-west-2"
          resources:
            requests:
              cpu: 2
              memory: 4Gi
            limits:
              cpu: 4
              memory: 8Gi
        restartPolicy: OnFailure

  # Polling interval (note: cooldownPeriod applies to ScaledObjects, not
  # ScaledJobs; spawned Jobs simply run to completion)
  pollingInterval: 10  # Check queue every 10 seconds

  # Max replicas
  maxReplicaCount: 100

  # Successful job retention
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5

  # Scaling strategy
  scalingStrategy:
    strategy: "accurate"  # Create jobs based on queue length
    # "default" = one job per event
    # "custom" = custom logic
    # "accurate" = jobs = queue length / messages per job

  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: aws-sqs-auth
    metadata:
      queueURL: "https://sqs.us-west-2.amazonaws.com/123456789/image-queue"
      queueLength: "5"     # Process 5 messages per job
      awsRegion: "us-west-2"
      identityOwner: "operator"

---
# Alternative: Kafka-based job scaling
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: kafka-consumer-job
  namespace: batch-processing
spec:
  jobTargetRef:
    template:
      spec:
        containers:
        - name: consumer
          image: kafka-consumer:v1.0
          env:
          - name: KAFKA_BROKERS
            value: "kafka:9092"
          - name: KAFKA_TOPIC
            value: "events"
          - name: KAFKA_CONSUMER_GROUP
            value: "batch-processors"
        restartPolicy: OnFailure

  pollingInterval: 15
  maxReplicaCount: 50

  triggers:
  - type: kafka
    metadata:
      bootstrapServers: "kafka:9092"
      consumerGroup: "batch-processors"
      topic: "events"
      lagThreshold: "100"  # Create job when lag > 100 messages
      offsetResetPolicy: "latest"
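
To make the "accurate" strategy concrete: with queueLength: 5 and, say, 47 visible messages, KEDA targets roughly ceil(47 / 5) = 10 parallel Jobs (less any Jobs still running, and capped by maxReplicaCount), rather than spawning one Job per polling cycle.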

Pattern 4B: ML Training Job Autoscaling with Volcano

# Install Volcano scheduler
---
apiVersion: v1
kind: Namespace
metadata:
  name: volcano-system

---
# Volcano scheduler deployment
# (Use official Volcano installation)

---
# ML Training job with gang scheduling
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: distributed-training
  namespace: ml-training
spec:
  # Minimum pods required to start job
  minAvailable: 4  # 1 master + 3 workers minimum

  schedulerName: volcano

  # Queue for resource management
  queue: ml-training-queue

  # Plugins
  plugins:
    ssh: []        # Enable SSH between pods
    svc: []        # Create service for pod communication
    env: []        # Environment variable injection

  # Policies
  policies:
  - event: PodEvicted
    action: RestartJob
  - event: PodFailed
    action: RestartJob

  # Task groups
  tasks:
  # Master task
  - name: master
    replicas: 1
    template:
      spec:
        containers:
        - name: tensorflow
          image: tensorflow/tensorflow:latest-gpu
          command:
          - python
          - train.py
          - --role=master
          resources:
            requests:
              cpu: 4
              memory: 16Gi
              nvidia.com/gpu: 1
            limits:
              cpu: 8
              memory: 32Gi
              nvidia.com/gpu: 1

  # Worker tasks (auto-scalable)
  - name: worker
    replicas: 3
    minAvailable: 1  # At least 1 worker
    template:
      spec:
        containers:
        - name: tensorflow
          image: tensorflow/tensorflow:latest-gpu
          command:
          - python
          - train.py
          - --role=worker
          resources:
            requests:
              cpu: 8
              memory: 32Gi
              nvidia.com/gpu: 2
            limits:
              cpu: 16
              memory: 64Gi
              nvidia.com/gpu: 2

  # Parameter server tasks
  - name: ps
    replicas: 2
    template:
      spec:
        containers:
        - name: tensorflow
          image: tensorflow/tensorflow:latest
          command:
          - python
          - train.py
          - --role=ps
          resources:
            requests:
              cpu: 2
              memory: 8Gi
            limits:
              cpu: 4
              memory: 16Gi

---
# Queue with capacity limits
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: ml-training-queue
spec:
  weight: 1
  capability:
    cpu: "100"
    memory: "500Gi"
    nvidia.com/gpu: "20"

---
# HPA for worker pods (scale workers based on GPU utilization)
# Note: this assumes the Volcano Job CRD exposes the scale subresource;
# otherwise adjust task replicas through the Volcano API instead
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: training-workers-hpa
  namespace: ml-training
spec:
  scaleTargetRef:
    apiVersion: batch.volcano.sh/v1alpha1
    kind: Job
    name: distributed-training

  minReplicas: 3
  maxReplicas: 20

  metrics:
  # GPU utilization
  - type: Pods
    pods:
      metric:
        name: DCGM_FI_DEV_GPU_UTIL
      target:
        type: AverageValue
        averageValue: "80"  # Target 80% GPU utilization

  # Training throughput
  - type: Pods
    pods:
      metric:
        name: training_samples_per_second
      target:
        type: AverageValue
        averageValue: "1000"

Pattern 4C: Scheduled Autoscaling (Predictive)

# CronHPA for scheduled scaling
---
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: business-hours-scaling
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server

  # Business hours scaling schedule
  jobs:
  # Scale up for morning traffic (8 AM)
  - name: morning-scale-up
    schedule: "0 8 * * 1-5"  # Weekdays at 8 AM
    targetSize: 20

  # Scale up for lunch traffic (12 PM)
  - name: lunch-scale-up
    schedule: "0 12 * * 1-5"
    targetSize: 30

  # Scale down for evening (6 PM)
  - name: evening-scale-down
    schedule: "0 18 * * 1-5"
    targetSize: 15

  # Scale down for night (10 PM)
  - name: night-scale-down
    schedule: "0 22 * * *"
    targetSize: 5

  # Weekend minimal scaling
  - name: weekend-minimal
    schedule: "0 0 * * 0,6"  # Midnight on Sat/Sun
    targetSize: 3

---
# Alternative: Using native CronJob + kubectl scale
apiVersion: batch/v1
kind: CronJob
metadata:
  name: morning-scale-up
  namespace: production
spec:
  schedule: "0 8 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: autoscaler
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command:
            - kubectl
            - scale
            - deployment/api-server
            - --replicas=20
            - -n
            - production
          restartPolicy: OnFailure
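
The kubectl-based variant needs RBAC that lets the autoscaler ServiceAccount resize the Deployment; a minimal least-privilege sketch:

# Sketch: RBAC for the scheduled scaling CronJob
apiVersion: v1
kind: ServiceAccount
metadata:
  name: autoscaler
  namespace: production
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-scaler
  namespace: production
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "deployments/scale"]
  verbs: ["get", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: autoscaler-can-scale
  namespace: production
subjects:
- kind: ServiceAccount
  name: autoscaler
  namespace: production
roleRef:
  kind: Role
  name: deployment-scaler
  apiGroup: rbac.authorization.k8s.io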

Pattern 5: Emerging Technologies & Future Patterns

Pattern 5A: Predictive Autoscaling with Machine Learning

# ML-based predictive autoscaling model
import datetime
import time

import numpy as np
import pandas as pd
from kubernetes import client, config
from sklearn.ensemble import RandomForestRegressor

class PredictiveAutoscaler:
    def __init__(self):
        # Prefer in-cluster config when running inside Kubernetes
        try:
            config.load_incluster_config()
        except config.ConfigException:
            config.load_kube_config()
        self.apps_v1 = client.AppsV1Api()
        self.model = RandomForestRegressor(n_estimators=100)
        self.is_trained = False

    def collect_training_data(self, days=30):
        """Collect historical data for training"""
        # Query Prometheus for historical metrics
        # Features: hour, day_of_week, month, previous_load, etc.
        # Target: actual_replicas_needed

        data = {
            'hour': [],
            'day_of_week': [],
            'month': [],
            'previous_load': [],
            'previous_replicas': [],
            'actual_replicas': []
        }

        # Fetch from Prometheus
        # ... (implementation details)

        return pd.DataFrame(data)

    def get_current_load(self, deployment, namespace):
        """Fetch the deployment's current load (e.g. RPS) from Prometheus"""
        # ... query Prometheus for the current load metric
        return 0.0  # Placeholder

    def train(self):
        """Train the prediction model"""
        df = self.collect_training_data()

        X = df[['hour', 'day_of_week', 'month', 'previous_load', 'previous_replicas']]
        y = df['actual_replicas']

        self.model.fit(X, y)
        self.is_trained = True

        print(f"Model trained with {len(df)} samples")
        print(f"Feature importances: {self.model.feature_importances_}")

    def predict_replicas(self, deployment, namespace):
        """Predict required replicas for next hour"""
        if not self.is_trained:
            raise RuntimeError("Model not trained")

        now = datetime.datetime.now()

        # Current state
        deployment_obj = self.apps_v1.read_namespaced_deployment(
            deployment, namespace
        )
        current_replicas = deployment_obj.spec.replicas

        # Get current load from Prometheus
        current_load = self.get_current_load(deployment, namespace)

        # Prepare features
        features = np.array([[
            now.hour,
            now.weekday(),
            now.month,
            current_load,
            current_replicas
        ]])

        # Predict
        predicted_replicas = int(self.model.predict(features)[0])

        # Apply safety bounds
        min_replicas = 2
        max_replicas = 100
        predicted_replicas = max(min_replicas, min(predicted_replicas, max_replicas))

        return predicted_replicas

    def apply_scaling(self, deployment, namespace, replicas):
        """Apply predicted scaling"""
        body = {
            'spec': {
                'replicas': replicas
            }
        }

        self.apps_v1.patch_namespaced_deployment_scale(
            deployment,
            namespace,
            body
        )

        print(f"Scaled {deployment} to {replicas} replicas")

    def run(self, deployment, namespace, interval=300):
        """Main loop"""
        while True:
            try:
                predicted = self.predict_replicas(deployment, namespace)
                self.apply_scaling(deployment, namespace, predicted)

                print(f"[{datetime.datetime.now()}] Scaled to {predicted} replicas")

            except Exception as e:
                print(f"Error: {e}")

            time.sleep(interval)  # Every 5 minutes

# Usage
if __name__ == "__main__":
    autoscaler = PredictiveAutoscaler()
    autoscaler.train()
    autoscaler.run("api-server", "production")
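
A sketch for running this controller in-cluster (image name and namespace illustrative; the ServiceAccount needs permission to patch the deployment's scale subresource, similar to the Role shown in Pattern 4C):

# Sketch: in-cluster deployment of the predictive autoscaler
apiVersion: apps/v1
kind: Deployment
metadata:
  name: predictive-autoscaler
  namespace: production
spec:
  replicas: 1  # Single instance to avoid conflicting scale decisions
  selector:
    matchLabels:
      app: predictive-autoscaler
  template:
    metadata:
      labels:
        app: predictive-autoscaler
    spec:
      serviceAccountName: predictive-autoscaler
      containers:
      - name: autoscaler
        image: predictive-autoscaler:v1.0  # Illustrative image name
        resources:
          requests:
            cpu: 250m
            memory: 512Mi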

Pattern 5B: Serverless Kubernetes with Knative

# Knative Service with autoscaling
---
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: knative-app
  namespace: serverless
spec:
  template:
    metadata:
      annotations:
        # Autoscaling configuration
        autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"
        autoscaling.knative.dev/metric: "concurrency"
        autoscaling.knative.dev/target: "10"  # Target 10 concurrent requests
        autoscaling.knative.dev/min-scale: "0"  # Scale to zero
        autoscaling.knative.dev/max-scale: "100"
        autoscaling.knative.dev/scale-down-delay: "30s"
        autoscaling.knative.dev/window: "60s"  # Evaluation window

    spec:
      containers:
      - image: myapp:v1.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 1000m
            memory: 512Mi

---
# Advanced: RPS-based autoscaling
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: rps-based-app
  namespace: serverless
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/metric: "rps"  # Requests per second
        autoscaling.knative.dev/target: "100"  # Target 100 RPS per pod
        autoscaling.knative.dev/target-utilization-percentage: "70"
    spec:
      containers:
      - image: myapp:v1.0
        resources:
          requests:
            cpu: 200m
            memory: 256Mi

Pattern 5C: Service Mesh Integration (Istio)

# Istio VirtualService with traffic-based autoscaling
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
  namespace: production
spec:
  hosts:
  - my-app.example.com
  http:
  - match:
    - headers:
        x-version:
          exact: canary
    route:
    - destination:
        host: my-app
        subset: canary
      weight: 10  # 10% traffic to canary
  - route:
    - destination:
        host: my-app
        subset: stable
      weight: 90

---
# DestinationRule
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-app
  namespace: production
spec:
  host: my-app
  subsets:
  - name: stable
    labels:
      version: stable
  - name: canary
    labels:
      version: canary

---
# HPA using Istio metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-istio-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  minReplicas: 2
  maxReplicas: 50

  metrics:
  # Istio request rate
  - type: Pods
    pods:
      metric:
        name: istio_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"

  # Istio P99 latency
  - type: Pods
    pods:
      metric:
        name: istio_request_duration_p99
      target:
        type: AverageValue
        averageValue: "200"  # 200ms (the recording rule below is in milliseconds)

---
# Prometheus rules for Istio metrics
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: istio-custom-metrics
  namespace: monitoring
spec:
  groups:
  - name: istio-autoscaling
    interval: 15s
    rules:
    - record: istio_requests_per_second
      expr: |
        sum(rate(istio_requests_total{destination_workload="my-app"}[2m])) by (pod)

    - record: istio_request_duration_p99
      expr: |
        histogram_quantile(0.99,
          sum(rate(istio_request_duration_milliseconds_bucket{destination_workload="my-app"}[2m])) by (pod, le)
        )

Best Practices Summary

Stateful Applications

✅ Use conservative scaling policies (slower scale-up/down)
✅ Implement proper health checks and readiness probes (sketch below)
✅ Plan for data synchronization time
✅ Use PVCs with appropriate storage classes
✅ Consider split architectures (read/write separation)
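
For the health-check point, a readiness probe keeps traffic away from replicas that are still syncing. A sketch for the PostgreSQL replicas from Pattern 1A (thresholds illustrative):

# Sketch: readiness gate so only healthy, reachable replicas serve reads
readinessProbe:
  exec:
    command: ["pg_isready", "-U", "postgres"]
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3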

Multi-Cluster

✅ Centralize metrics with Thanos or Prometheus federation
✅ Implement intelligent routing with global load balancers
✅ Use cost-aware scheduling
✅ Plan for cross-cluster failover
✅ Monitor inter-cluster latency

Cost Optimization

✅ Maximize spot instance usage (70-90% savings)
✅ Implement aggressive consolidation
✅ Use FinOps dashboards for visibility
✅ Set up cost alerts and budgets
✅ Regular right-sizing reviews

Batch Jobs

✅ Use KEDA ScaledJobs for queue-driven processing
✅ Implement proper job cleanup policies
✅ Set resource limits to prevent runaway costs
✅ Use gang scheduling for distributed jobs
✅ Monitor job success rates

Key Takeaways

  1. Stateful Scaling: Requires careful planning, slower policies, and split read/write architectures
  2. Multi-Cluster: Centralized metrics and intelligent distribution critical for success
  3. Cost Optimization: Spot instances + right-sizing + consolidation = 60-80% savings
  4. Batch Processing: Queue-based autoscaling with KEDA scales jobs efficiently
  5. Future: ML-based prediction, serverless K8s, and service mesh integration emerging

Conclusion

Advanced autoscaling patterns unlock significant value:

  • Stateful applications can scale safely with proper planning
  • Multi-cluster deployments enable global scale and resilience
  • Cost optimization delivers 60-80% infrastructure savings
  • Batch processing scales efficiently with queue-based triggers
  • Emerging technologies push boundaries of what’s possible

These patterns, combined with foundational HPA and VPA, create comprehensive autoscaling architectures that balance performance, cost, and reliability at scale.

Next up: Part 7 - Production Troubleshooting & War Stories 🔧

Happy scaling! 🚀