Kubernetes Autoscaling Complete Guide (Part 6): Advanced Autoscaling Patterns

Series Overview

This is Part 6 of the Kubernetes Autoscaling Complete Guide series.


Beyond basic HPA and cluster autoscaling, production Kubernetes deployments require sophisticated patterns for stateful workloads, multi-cluster architectures, aggressive cost optimization, and specialized workload types. This guide explores advanced autoscaling strategies used by leading organizations.

Pattern 1: Stateful Application Autoscaling

The StatefulSet Challenge

Traditional HPA with StatefulSets:
┌────────────────────────────────────────────────────────────┐
│  CHALLENGES                                                │
│                                                             │
│  1. Ordered Pod Creation/Deletion                          │
│     • pod-0 must exist before pod-1                        │
│     • Slow scale-up during traffic spikes                  │
│                                                             │
│  2. Persistent Volumes                                      │
│     • Each pod has unique PVC                              │
│     • Storage costs accumulate                             │
│     • PVCs remain after scale-down                         │
│                                                             │
│  3. State Synchronization                                   │
│     • New pods must sync state (databases, caches)         │
│     • Sync time adds to scale-up latency                   │
│     • Potential data consistency issues                    │
│                                                             │
│  4. Service Discovery                                       │
│     • Clients must discover new pods                       │
│     • DNS updates take time                                │
│     • Connection draining needed on scale-down             │
└────────────────────────────────────────────────────────────┘

Pattern 1A: Database Scaling with StatefulSet

Scenario: PostgreSQL cluster with read replicas that scale based on read query load.

# PostgreSQL StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-replicas
  namespace: databases
spec:
  serviceName: postgres-replicas
  replicas: 2  # Initial read replica count (the primary runs separately)

  selector:
    matchLabels:
      app: postgres
      role: replica

  template:
    metadata:
      labels:
        app: postgres
        role: replica
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9187"  # postgres_exporter
    spec:
      initContainers:
      # Initialize replica from primary
      - name: init-replica
        image: postgres:15
        command:
        - bash
        - -c
        - |
          # Clone into the same PGDATA path the main container uses
          if [ ! -f /var/lib/postgresql/data/pgdata/PG_VERSION ]; then
            # Clone from primary
            pg_basebackup -h postgres-primary -D /var/lib/postgresql/data/pgdata -U replication -v -P
            # Create recovery signal
            touch /var/lib/postgresql/data/pgdata/standby.signal
          fi
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data

      containers:
      # PostgreSQL replica
      - name: postgres
        image: postgres:15
        env:
        - name: POSTGRES_USER
          value: postgres
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata

        ports:
        - containerPort: 5432
          name: postgres

        resources:
          requests:
            cpu: 1
            memory: 2Gi
          limits:
            cpu: 4
            memory: 8Gi

        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
        - name: config
          mountPath: /etc/postgresql/postgresql.conf
          subPath: postgresql.conf

      # Postgres Exporter for metrics
      - name: postgres-exporter
        image: prometheuscommunity/postgres-exporter:latest
        env:
        # $(POSTGRES_PASSWORD) only expands from variables defined in this
        # container, so the secret must be referenced here as well
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        - name: DATA_SOURCE_NAME
          value: "postgresql://postgres:$(POSTGRES_PASSWORD)@localhost:5432/postgres?sslmode=disable"
        ports:
        - containerPort: 9187
          name: metrics
        resources:
          requests:
            cpu: 100m
            memory: 128Mi

      volumes:
      - name: config
        configMap:
          name: postgres-config

  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3-encrypted
      resources:
        requests:
          storage: 100Gi

---
# Headless service for StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: postgres-replicas
  namespace: databases
spec:
  clusterIP: None
  selector:
    app: postgres
    role: replica
  ports:
  - port: 5432
    name: postgres

---
# Regular service for read traffic (load balanced)
apiVersion: v1
kind: Service
metadata:
  name: postgres-read
  namespace: databases
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9187"
spec:
  type: ClusterIP
  selector:
    app: postgres
    role: replica
  ports:
  - port: 5432
    name: postgres

---
# HPA for read replicas based on custom metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: postgres-replicas-hpa
  namespace: databases
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: postgres-replicas

  minReplicas: 2   # Always keep at least 2 read replicas
  maxReplicas: 10  # Max read replicas

  metrics:
  # Scale based on active connections
  - type: Pods
    pods:
      metric:
        name: pg_stat_database_numbackends
      target:
        type: AverageValue
        averageValue: "50"  # 50 connections per replica

  # Scale based on replication lag
  - type: Pods
    pods:
      metric:
        name: pg_replication_lag_seconds
      target:
        type: AverageValue
        averageValue: "5"  # Keep lag under 5 seconds

  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60  # Wait 1 min before scale-up
      policies:
      - type: Pods
        value: 1                       # Add 1 replica at a time
        periodSeconds: 60
      selectPolicy: Min

    scaleDown:
      stabilizationWindowSeconds: 600  # Wait 10 min before scale-down
      policies:
      - type: Pods
        value: 1                        # Remove 1 replica at a time
        periodSeconds: 300              # Every 5 minutes
      selectPolicy: Min

---
# PrometheusRule for PostgreSQL monitoring
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: postgres-autoscaling-rules
  namespace: monitoring
spec:
  groups:
  - name: postgres-custom-metrics
    interval: 15s
    rules:
    # Active connections per pod
    - record: pg_stat_database_numbackends
      expr: |
        sum(pg_stat_database_numbackends{datname="postgres"}) by (pod, namespace)

    # Replication lag in seconds
    - record: pg_replication_lag_seconds
      expr: |
        pg_replication_lag

  - name: postgres-alerts
    rules:
    # Alert when the HPA is running close to its configured maximum
    - alert: PostgresReplicasMaxedOut
      expr: |
        (
          kube_horizontalpodautoscaler_status_current_replicas{horizontalpodautoscaler="postgres-replicas-hpa"}
          /
          kube_horizontalpodautoscaler_spec_max_replicas{horizontalpodautoscaler="postgres-replicas-hpa"}
        ) >= 0.9
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "PostgreSQL replicas near maximum capacity"
        description: "Consider increasing maxReplicas or optimizing queries"

    # Alert on high replication lag
    - alert: PostgresHighReplicationLag
      expr: pg_replication_lag_seconds > 30
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "PostgreSQL replication lag is high"
        description: "Replication lag is {{ $value }}s, may impact read consistency"
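
Note that Pods-type metrics like pg_stat_database_numbackends only reach the HPA through the custom metrics API. A minimal prometheus-adapter rule sketch for the two metrics above, assuming prometheus-adapter is installed in the monitoring namespace (the ConfigMap name is illustrative):

# Sketch: prometheus-adapter rules exposing the PostgreSQL metrics to the HPA
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter-postgres
  namespace: monitoring
data:
  config.yaml: |
    rules:
    - seriesQuery: 'pg_stat_database_numbackends{pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      metricsQuery: |
        sum(pg_stat_database_numbackends) by (pod, namespace)
    - seriesQuery: 'pg_replication_lag_seconds{pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      metricsQuery: |
        max(pg_replication_lag_seconds) by (pod, namespace)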

Pattern 1B: Redis Cache Cluster Autoscaling

# Redis Cluster with dynamic scaling
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
  namespace: caching
spec:
  serviceName: redis-cluster
  replicas: 6  # 3 masters + 3 replicas

  selector:
    matchLabels:
      app: redis-cluster

  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        command:
        - redis-server
        args:
        - /conf/redis.conf
        - --cluster-enabled
        - "yes"
        - --cluster-config-file
        - /data/nodes.conf
        - --cluster-node-timeout
        - "5000"
        - --maxmemory
        - "2gb"
        - --maxmemory-policy
        - "allkeys-lru"

        ports:
        - containerPort: 6379
          name: client
        - containerPort: 16379
          name: gossip

        resources:
          requests:
            cpu: 500m
            memory: 2Gi
          limits:
            cpu: 2
            memory: 4Gi

        volumeMounts:
        - name: data
          mountPath: /data
        - name: conf
          mountPath: /conf

      # Redis Exporter sidecar
      - name: redis-exporter
        image: oliver006/redis_exporter:latest
        ports:
        - containerPort: 9121
          name: metrics
        resources:
          requests:
            cpu: 100m
            memory: 128Mi

  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 50Gi

---
# Custom metrics based on Redis metrics
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter-redis
  namespace: monitoring
data:
  config.yaml: |
    rules:
    # Redis memory usage percentage
    - seriesQuery: 'redis_memory_used_bytes'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        as: "redis_memory_usage_percentage"
      metricsQuery: |
        (redis_memory_used_bytes / redis_memory_max_bytes) * 100

    # Redis connected clients
    - seriesQuery: 'redis_connected_clients'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        as: "redis_clients_per_pod"
      metricsQuery: |
        sum(redis_connected_clients) by (pod, namespace)

    # Redis operations per second
    - seriesQuery: 'redis_instantaneous_ops_per_sec'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        as: "redis_ops_per_second"
      metricsQuery: |
        sum(redis_instantaneous_ops_per_sec) by (pod, namespace)

---
# HPA for Redis based on memory and ops
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: redis-cluster-hpa
  namespace: caching
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: redis-cluster

  minReplicas: 6   # Minimum cluster size (3 masters + 3 replicas)
  maxReplicas: 18  # Max 9 masters + 9 replicas

  # Note: new pods join the cluster without hash slots; rebalance after
  # scale-up (see the Job sketch below) before they serve traffic
  metrics:
  # Memory usage
  - type: Pods
    pods:
      metric:
        name: redis_memory_usage_percentage
      target:
        type: AverageValue
        averageValue: "75"  # Scale when memory > 75%

  # Operations per second
  - type: Pods
    pods:
      metric:
        name: redis_ops_per_second
      target:
        type: AverageValue
        averageValue: "10000"  # Scale at 10k ops/sec per pod

  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120
      policies:
      - type: Pods
        value: 2  # Add 2 pods at a time (1 master + 1 replica)
        periodSeconds: 120

    scaleDown:
      stabilizationWindowSeconds: 600
      policies:
      - type: Pods
        value: 2  # Remove 2 pods at a time
        periodSeconds: 300
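
One caveat the HPA cannot handle on its own: pods added to a Redis Cluster join with no hash slots, so the cluster must be rebalanced after scale-up. A one-shot rebalance Job sketch (endpoint and names are illustrative; in practice an operator or post-scale hook would trigger this):

# Sketch: rebalance hash slots onto newly added masters after scale-up
apiVersion: batch/v1
kind: Job
metadata:
  name: redis-cluster-rebalance
  namespace: caching
spec:
  template:
    spec:
      containers:
      - name: rebalance
        image: redis:7-alpine
        command:
        - redis-cli
        - --cluster
        - rebalance
        - redis-cluster-0.redis-cluster.caching.svc.cluster.local:6379
        - --cluster-use-empty-masters  # Assign slots to empty (new) masters
      restartPolicy: OnFailure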

Key Considerations for Stateful Autoscaling

  1. Data Synchronization Time: Account for data replication delays
  2. Ordered Scaling: StatefulSets scale sequentially, slower than Deployments
  3. Storage Management: Implement PVC cleanup policies (see the sketch after this list)
  4. State Warmup: Consider warm-up time for caches/databases
  5. Split Read/Write: Scale read replicas independently from write nodes
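
For point 3, Kubernetes can clean up claims automatically on scale-down via the StatefulSetAutoDeletePVC feature (beta and on by default from 1.27). A minimal sketch, assuming the feature is available on your cluster:

# Sketch: automatic PVC cleanup when a StatefulSet scales down
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-replicas
  namespace: databases
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain  # Keep data if the StatefulSet itself is deleted
    whenScaled: Delete   # Remove the PVC when its pod is scaled away
  # ... remainder of the StatefulSet spec as shown earlier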

Pattern 2: Multi-Cluster & Multi-Region Autoscaling

Architecture Overview

┌──────────────────────────────────────────────────────────────────────┐
│              MULTI-CLUSTER AUTOSCALING ARCHITECTURE                 │
│                                                                      │
│  ┌────────────────┐     ┌────────────────┐     ┌────────────────┐  │
│  │   REGION 1     │     │   REGION 2     │     │   REGION 3     │  │
│  │   (US-EAST)    │     │   (EU-WEST)    │     │   (AP-SOUTH)   │  │
│  │                │     │                │     │                │  │
│  │  EKS Cluster 1 │     │  EKS Cluster 2 │     │  EKS Cluster 3 │  │
│  │  • HPA         │     │  • HPA         │     │  • HPA         │  │
│  │  • Karpenter   │     │  • Karpenter   │     │  • Karpenter   │  │
│  │  • Local LB    │     │  • Local LB    │     │  • Local LB    │  │
│  └───────┬────────┘     └───────┬────────┘     └───────┬────────┘  │
│          │                      │                      │            │
│          └──────────────────────┴──────────────────────┘            │
│                               ↓                                     │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │               GLOBAL LOAD BALANCER                             │ │
│  │                                                                 │ │
│  │  • Route 53 / CloudFlare / Global Accelerator                 │ │
│  │  • Geographic routing                                          │ │
│  │  • Latency-based routing                                       │ │
│  │  • Weighted routing (for gradual shifts)                      │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                               ↓                                     │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │            CENTRALIZED AUTOSCALING CONTROLLER                  │ │
│  │                                                                 │ │
│  │  • Aggregate metrics from all clusters                         │ │
│  │  • Intelligent workload distribution                           │ │
│  │  • Cost-aware cluster selection                               │ │
│  │  • Capacity prediction                                         │ │
│  └────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘

Pattern 2A: Federated HPA with Cluster API

# Install Cluster API
---
apiVersion: v1
kind: Namespace
metadata:
  name: cluster-api-system

---
# Management cluster setup
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: workload-cluster-us-east
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: workload-cluster-us-east
  controlPlaneRef:
    kind: KubeadmControlPlane
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    name: workload-cluster-us-east-control-plane

---
# Multi-cluster autoscaling with KubeFed
apiVersion: types.kubefed.io/v1beta1
kind: FederatedHorizontalPodAutoscaler
metadata:
  name: federated-app-hpa
  namespace: default
spec:
  # Target clusters (weighted replica distribution is handled separately,
  # e.g. via a ReplicaSchedulingPreference)
  placement:
    clusters:
    - name: us-east-1-cluster
    - name: eu-west-1-cluster
    - name: ap-south-1-cluster

  template:
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app

      minReplicas: 3  # Per cluster minimum
      maxReplicas: 20 # Per cluster maximum

      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70

  # Override for specific clusters
  overrides:
  - clusterName: us-east-1-cluster
    clusterOverrides:
    - path: "/spec/minReplicas"
      value: 5  # Higher baseline in primary region
    - path: "/spec/maxReplicas"
      value: 50

---
# Federated deployment
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: my-app
  namespace: default
spec:
  placement:
    clusters:
    - name: us-east-1-cluster
    - name: eu-west-1-cluster
    - name: ap-south-1-cluster

  template:
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: app
            image: myapp:v1.0
            resources:
              requests:
                cpu: 500m
                memory: 512Mi
Pattern 2B: Custom Multi-Cluster Autoscaler

// Custom multi-cluster autoscaling controller
package main

import (
    "context"
    "fmt"
    "sort"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

type ClusterConfig struct {
    Name           string
    KubeConfig     string
    Region         string
    CostPerCPUHour float64
    Latency        time.Duration
}

type MultiClusterAutoscaler struct {
    clusters map[string]*kubernetes.Clientset
    configs  []ClusterConfig
}

func NewMultiClusterAutoscaler(configs []ClusterConfig) (*MultiClusterAutoscaler, error) {
    mca := &MultiClusterAutoscaler{
        clusters: make(map[string]*kubernetes.Clientset),
        configs:  configs,
    }

    // Initialize clients for each cluster
    for _, config := range configs {
        clientConfig, err := clientcmd.BuildConfigFromFlags("", config.KubeConfig)
        if err != nil {
            return nil, err
        }

        clientset, err := kubernetes.NewForConfig(clientConfig)
        if err != nil {
            return nil, err
        }

        mca.clusters[config.Name] = clientset
    }

    return mca, nil
}

// Decision algorithm: cost-aware + latency-aware scaling
func (mca *MultiClusterAutoscaler) ScaleDecision(
    ctx context.Context,
    totalReplicas int,
    userRegion string,
) (map[string]int, error) {

    allocation := make(map[string]int)

    // Step 1: Get current capacity in each cluster
    capacities := make(map[string]int)
    for name, client := range mca.clusters {
        nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
        if err != nil {
            return nil, err
        }

        // Calculate available capacity
        var availableCPU int64
        for _, node := range nodes.Items {
            availableCPU += node.Status.Allocatable.Cpu().MilliValue()
        }
        capacities[name] = int(availableCPU / 500) // Assume 500m per pod
    }

    // Step 2: Cost-aware allocation
    // Prioritize cheapest region first
    sortedConfigs := sortByCost(mca.configs)

    remaining := totalReplicas
    for _, config := range sortedConfigs {
        available := capacities[config.Name]

        // Allocate up to available capacity
        allocated := min(remaining, available)
        allocation[config.Name] = allocated
        remaining -= allocated

        if remaining == 0 {
            break
        }
    }

    // Step 3: Latency-aware adjustment
    // If user is in specific region, ensure minimum local replicas
    if userRegion != "" {
        minLocal := max(3, totalReplicas/10) // At least 10% or 3 replicas
        if allocation[userRegion] < minLocal {
            allocation[userRegion] = minLocal
        }
    }

    return allocation, nil
}

// Apply scaling decisions to clusters
func (mca *MultiClusterAutoscaler) ApplyScaling(
    ctx context.Context,
    allocation map[string]int,
    deployment string,
    namespace string,
) error {

    for clusterName, replicas := range allocation {
        client := mca.clusters[clusterName]

        // Update deployment replica count
        scale, err := client.AppsV1().Deployments(namespace).
            GetScale(ctx, deployment, metav1.GetOptions{})
        if err != nil {
            return fmt.Errorf("failed to get scale for %s in %s: %v",
                deployment, clusterName, err)
        }

        scale.Spec.Replicas = int32(replicas)

        _, err = client.AppsV1().Deployments(namespace).
            UpdateScale(ctx, deployment, scale, metav1.UpdateOptions{})
        if err != nil {
            return fmt.Errorf("failed to update scale for %s in %s: %v",
                deployment, clusterName, err)
        }

        fmt.Printf("Scaled %s in %s to %d replicas\n",
            deployment, clusterName, replicas)
    }

    return nil
}

func main() {
    configs := []ClusterConfig{
        {
            Name:           "us-east-1",
            KubeConfig:     "/home/user/.kube/us-east-1",
            Region:         "us-east-1",
            CostPerCPUHour: 0.04,
            Latency:        50 * time.Millisecond,
        },
        {
            Name:           "eu-west-1",
            KubeConfig:     "/home/user/.kube/eu-west-1",
            Region:         "eu-west-1",
            CostPerCPUHour: 0.045,
            Latency:        100 * time.Millisecond,
        },
        {
            Name:           "ap-south-1",
            KubeConfig:     "/home/user/.kube/ap-south-1",
            Region:         "ap-south-1",
            CostPerCPUHour: 0.038, // Cheapest
            Latency:        150 * time.Millisecond,
        },
    }

    autoscaler, err := NewMultiClusterAutoscaler(configs)
    if err != nil {
        panic(err)
    }

    ctx := context.Background()

    // Main reconciliation loop
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop()

    for range ticker.C {
        // Get total desired replicas from global metrics
        totalReplicas := calculateGlobalReplicas()

        // Determine optimal allocation
        allocation, err := autoscaler.ScaleDecision(
            ctx,
            totalReplicas,
            "us-east-1", // Primary user region
        )
        if err != nil {
            fmt.Printf("Error in scale decision: %v\n", err)
            continue
        }

        // Apply scaling
        err = autoscaler.ApplyScaling(
            ctx,
            allocation,
            "my-app",
            "production",
        )
        if err != nil {
            fmt.Printf("Error applying scaling: %v\n", err)
        }
    }
}

func calculateGlobalReplicas() int {
    // Aggregate metrics from all clusters
    // Calculate desired total replicas
    // This would query Prometheus/Thanos for global metrics
    return 50 // Placeholder
}

func sortByCost(configs []ClusterConfig) []ClusterConfig {
    // Sort by cost (cheapest first)
    sorted := make([]ClusterConfig, len(configs))
    copy(sorted, configs)
    sort.Slice(sorted, func(i, j int) bool {
        return sorted[i].CostPerCPUHour < sorted[j].CostPerCPUHour
    })
    return sorted
}

func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}

func max(a, b int) int {
    if a > b {
        return a
    }
    return b
}

Pattern 2C: Global Metrics Aggregation with Thanos

# Thanos setup for multi-cluster metrics
---
# Thanos Sidecar on each cluster's Prometheus
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceName: prometheus
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      # Prometheus
      - name: prometheus
        image: prom/prometheus:latest
        args:
        - --storage.tsdb.path=/prometheus
        - --storage.tsdb.min-block-duration=2h
        - --storage.tsdb.max-block-duration=2h
        volumeMounts:
        - name: storage
          mountPath: /prometheus

      # Thanos Sidecar
      - name: thanos-sidecar
        image: thanosio/thanos:latest
        args:
        - sidecar
        - --prometheus.url=http://localhost:9090
        - --tsdb.path=/prometheus
        - --objstore.config-file=/etc/thanos/objstore.yaml
        - --grpc-address=0.0.0.0:10901
        volumeMounts:
        - name: storage
          mountPath: /prometheus
        - name: objstore-config
          mountPath: /etc/thanos
        ports:
        - containerPort: 10901
          name: grpc

      volumes:
      - name: objstore-config
        secret:
          secretName: thanos-objstore-config  # Bucket config, provisioned separately

  volumeClaimTemplates:
  - metadata:
      name: storage
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi

---
# Thanos Query (global query layer)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
  namespace: monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      app: thanos-query
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
      - name: thanos-query
        image: thanosio/thanos:latest
        args:
        - query
        - --http-address=0.0.0.0:9090
        - --grpc-address=0.0.0.0:10901
        # Connect to all cluster Prometheus instances
        - --store=prometheus-us-east-1.monitoring.svc.cluster.local:10901
        - --store=prometheus-eu-west-1.monitoring.svc.cluster.local:10901
        - --store=prometheus-ap-south-1.monitoring.svc.cluster.local:10901
        - --query.replica-label=replica
        ports:
        - containerPort: 9090
          name: http
        - containerPort: 10901
          name: grpc

---
# Adapter rules for global metrics served by Thanos
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter-thanos
  namespace: monitoring
data:
  config.yaml: |
    rules:
    # Global request rate across all clusters
    - seriesQuery: 'http_requests_total{job="my-app"}'
      resources:
        template: <<.Resource>>
      name:
        as: "global_requests_per_second"
      metricsQuery: |
        sum(rate(http_requests_total{job="my-app"}[2m]))

    # Global CPU usage
    - seriesQuery: 'container_cpu_usage_seconds_total{pod=~"my-app.*"}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
      name:
        as: "global_cpu_usage"
      metricsQuery: |
        sum(rate(container_cpu_usage_seconds_total{pod=~"my-app.*"}[5m]))
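
To close the loop, an HPA can consume the aggregated series. A sketch assuming the adapter serves global_requests_per_second through the external metrics API (target value illustrative):

# Sketch: HPA consuming the cluster-wide request rate as an External metric
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-global-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: External
    external:
      metric:
        name: global_requests_per_second
      target:
        type: Value
        value: "5000"  # Total RPS across all clusters before scaling out locally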

Pattern 3: Aggressive Cost Optimization

Spot Instance Strategy with Multiple Fallbacks

# Karpenter NodePool with spot + on-demand mix
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: cost-optimized-spot
spec:
  template:
    metadata:
      labels:
        workload-type: spot-eligible
        cost-optimized: "true"
    spec:
      requirements:
      # Maximize spot instance types for availability
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]

      # Allow wide range of instance types
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["c", "m", "r", "t"]  # Compute, general, memory, burstable

      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values: ["4"]  # Generation 5+

      # Size flexibility
      - key: karpenter.k8s.aws/instance-size
        operator: In
        values: ["large", "xlarge", "2xlarge", "4xlarge"]

      nodeClassRef:
        name: cost-optimized

  # Aggressive consolidation
  # (consolidateAfter is only valid with WhenEmpty in v1beta1, so it is omitted)
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 12h  # Refresh nodes every 12 hours

  limits:
    cpu: "500"
    memory: 1000Gi

---
# On-demand fallback NodePool
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: on-demand-fallback
spec:
  template:
    metadata:
      labels:
        workload-type: on-demand-fallback
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]

      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["m", "c"]

      nodeClassRef:
        name: cost-optimized

  weight: 10  # Lower priority, used when spot unavailable

  limits:
    cpu: "200"

---
# Application deployment with spot tolerance
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cost-sensitive-app
  namespace: production
spec:
  replicas: 10
  selector:
    matchLabels:
      app: cost-sensitive-app
  template:
    metadata:
      labels:
        app: cost-sensitive-app
    spec:
      # Prefer spot nodes
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: karpenter.sh/capacity-type
                operator: In
                values: ["spot"]

          # Fallback to on-demand if needed
          - weight: 50
            preference:
              matchExpressions:
              - key: workload-type
                operator: In
                values: ["on-demand-fallback"]

      # Tolerate spot interruptions
      tolerations:
      - key: karpenter.sh/disruption
        operator: Exists
        effect: NoSchedule

      # Topology spread for availability
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: cost-sensitive-app

      containers:
      - name: app
        image: myapp:v1.0
        resources:
          requests:
            cpu: 500m
            memory: 512Mi

---
# PDB to handle spot interruptions gracefully
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cost-sensitive-app-pdb
  namespace: production
spec:
  minAvailable: 70%  # Keep 70% pods running during spot interruptions
  selector:
    matchLabels:
      app: cost-sensitive-app
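
Spot capacity typically disappears with about two minutes' notice, so spot-hosted pods should drain fast when evicted. A minimal graceful-shutdown sketch (delay values illustrative):

# Sketch: fast, graceful termination for spot-hosted pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cost-sensitive-app
  namespace: production
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 90  # Finish in-flight work within the spot notice window
      containers:
      - name: app
        image: myapp:v1.0
        lifecycle:
          preStop:
            exec:
              # Give load balancers time to deregister the pod before SIGTERM
              command: ["sh", "-c", "sleep 10"]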

Cost-Aware Scheduling with Custom Scheduler

// Custom scheduler plugin for cost-aware pod placement
package main

import (
    "context"
    "fmt"

    v1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/kubernetes/pkg/scheduler/framework"
)

type CostAwarePlugin struct {
    handle framework.Handle
}

var _ framework.ScorePlugin = &CostAwarePlugin{}

// Pricing data (could be fetched from external API)
var instancePricing = map[string]float64{
    "t3.large":      0.0832,
    "m5.large":      0.096,
    "c5.large":      0.085,
    "m5.xlarge":     0.192,
    "c5.xlarge":     0.17,
    "r5.large":      0.126,
    "spot-t3.large": 0.0250, // ~70% savings
    "spot-m5.large": 0.0288,
    "spot-c5.large": 0.0255,
}

func (c *CostAwarePlugin) Name() string {
    return "CostAwarePlugin"
}

// Score nodes based on cost
func (c *CostAwarePlugin) Score(
    ctx context.Context,
    state *framework.CycleState,
    pod *v1.Pod,
    nodeName string,
) (int64, *framework.Status) {

    nodeInfo, err := c.handle.SnapshotSharedLister().NodeInfos().Get(nodeName)
    if err != nil {
        return 0, framework.NewStatus(framework.Error, fmt.Sprintf("getting node %q: %v", nodeName, err))
    }

    node := nodeInfo.Node()

    // Get instance type from node labels
    instanceType := node.Labels["node.kubernetes.io/instance-type"]
    capacityType := node.Labels["karpenter.sh/capacity-type"]

    // Determine pricing key
    pricingKey := instanceType
    if capacityType == "spot" {
        pricingKey = "spot-" + instanceType
    }

    // Get cost
    cost, exists := instancePricing[pricingKey]
    if !exists {
        cost = 0.1 // Default cost if unknown
    }

    // Convert to score (lower cost = higher score)
    // Normalize: max price 0.2, min price 0.02
    // Score range: 0-100
    normalizedCost := (cost - 0.02) / (0.2 - 0.02)
    score := int64((1 - normalizedCost) * 100)

    // Bonus for spot instances
    if capacityType == "spot" {
        score += 20
    }

    // Clamp to the framework's valid score range [0, MaxNodeScore]
    if score > framework.MaxNodeScore {
        score = framework.MaxNodeScore
    }
    if score < 0 {
        score = 0
    }

    return score, framework.NewStatus(framework.Success)
}

// ScoreExtensions of the Score plugin
func (c *CostAwarePlugin) ScoreExtensions() framework.ScoreExtensions {
    return c
}

// NormalizeScore is called after scoring all nodes
func (c *CostAwarePlugin) NormalizeScore(
    ctx context.Context,
    state *framework.CycleState,
    pod *v1.Pod,
    scores framework.NodeScoreList,
) *framework.Status {
    // Scores are already normalized in Score()
    return framework.NewStatus(framework.Success)
}

func New(_ runtime.Object, h framework.Handle) (framework.Plugin, error) {
    return &CostAwarePlugin{handle: h}, nil
}
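
To use the plugin, it must be compiled into a scheduler binary and enabled through a scheduler profile; pods then opt in with spec.schedulerName. A configuration sketch (profile name and weight illustrative):

# Sketch: scheduler profile enabling the plugin (requires a scheduler
# binary built with CostAwarePlugin registered)
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: cost-aware-scheduler
  plugins:
    score:
      enabled:
      - name: CostAwarePlugin
        weight: 5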

FinOps Dashboard and Automation

# CronJob for daily cost optimization report
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cost-optimization-report
  namespace: finops
spec:
  schedule: "0 9 * * *"  # Daily at 9 AM
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: finops-reporter
          containers:
          - name: reporter
            image: finops-reporter:latest
            env:
            - name: PROMETHEUS_URL
              value: "http://prometheus.monitoring:9090"
            - name: SLACK_WEBHOOK
              valueFrom:
                secretKeyRef:
                  name: slack-webhook
                  key: url
            command:
            - /bin/bash
            - -c
            - |
              #!/bin/bash

              echo "=== Daily Cost Optimization Report ==="
              echo ""

              # Calculate total cluster cost
              TOTAL_CPU=$(curl -s "$PROMETHEUS_URL/api/v1/query?query=sum(kube_pod_container_resource_requests{resource='cpu'})" | jq -r '.data.result[0].value[1]')
              TOTAL_MEM=$(curl -s "$PROMETHEUS_URL/api/v1/query?query=sum(kube_pod_container_resource_requests{resource='memory'})" | jq -r '.data.result[0].value[1]')

              # bc -l enables fractional arithmetic (plain bc truncates division)
              CPU_COST=$(echo "$TOTAL_CPU * 0.04 * 24" | bc -l)
              MEM_COST=$(echo "$TOTAL_MEM / 1073741824 * 0.005 * 24" | bc -l)
              DAILY_COST=$(echo "$CPU_COST + $MEM_COST" | bc -l)

              echo "Daily Cost: \$${DAILY_COST}"
              echo ""

              # Identify optimization opportunities
              echo "=== Optimization Opportunities ==="

              # Over-provisioned workloads (VPA recommendations)
              curl -s "$PROMETHEUS_URL/api/v1/query?query=(kube_pod_container_resource_requests{resource='cpu'} - on(pod) kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource='cpu'}) / kube_pod_container_resource_requests{resource='cpu'} > 0.5" \
                | jq -r '.data.result[] | "\(.metric.namespace)/\(.metric.pod): \((.value[1] | tonumber) * 100)% over-provisioned"'

              # Spot instance opportunities
              ONDEMAND_COUNT=$(curl -s "$PROMETHEUS_URL/api/v1/query?query=count(kube_node_labels{label_karpenter_sh_capacity_type='on-demand'})" | jq -r '.data.result[0].value[1]')
              echo ""
              echo "On-demand nodes: $ONDEMAND_COUNT (Consider spot instances for 70% savings)"

              # Send to Slack
              curl -X POST "$SLACK_WEBHOOK" \
                -H 'Content-Type: application/json' \
                -d "{\"text\": \"Daily Cost Report: \\\$${DAILY_COST}\"}"

          restartPolicy: OnFailure

---
# PrometheusRule for cost alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cost-alerts
  namespace: monitoring
spec:
  groups:
  - name: cost-optimization
    interval: 1h
    rules:
    # Alert when daily cost exceeds budget
    - alert: DailyCostExceedsBudget
      expr: |
        (
          sum(kube_pod_container_resource_requests{resource="cpu"}) * 0.04 +
          sum(kube_pod_container_resource_requests{resource="memory"}) / 1073741824 * 0.005
        ) * 24 > 1000
      labels:
        severity: warning
        team: finops
      annotations:
        summary: "Daily infrastructure cost exceeds $1000"
        description: "Current daily cost: ${{ $value }}"

    # Alert on underutilized nodes
    - alert: UnderutilizedNodes
      expr: |
        (
          sum(kube_node_status_allocatable{resource="cpu"}) -
          sum(kube_pod_container_resource_requests{resource="cpu"})
        ) / sum(kube_node_status_allocatable{resource="cpu"}) > 0.5
      for: 2h
      labels:
        severity: info
        team: platform
      annotations:
        summary: "Cluster has >50% unused CPU capacity"
        description: "Consider scaling down or consolidating workloads"

    # Spot savings opportunity
    - alert: SpotSavingsOpportunity
      expr: |
        count(kube_node_labels{label_karpenter_sh_capacity_type="on-demand"})
        /
        count(kube_node_labels)
        > 0.3
      for: 4h
      labels:
        severity: info
        team: finops
      annotations:
        summary: ">30% on-demand nodes detected"
        description: "Evaluate workloads for spot instance eligibility (70% potential savings)"

Pattern 4: Batch Job & Queue-Based Autoscaling

Pattern 4A: Kubernetes Job Autoscaling with KEDA

# KEDA ScaledJob for queue-driven batch processing
---
apiVersion: v1
kind: Secret
metadata:
  name: aws-sqs-credentials
  namespace: batch-processing
type: Opaque
stringData:
  # Example values from the AWS docs; prefer IRSA or an external secret store
  AWS_ACCESS_KEY_ID: "AKIAIOSFODNN7EXAMPLE"
  AWS_SECRET_ACCESS_KEY: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-sqs-auth
  namespace: batch-processing
spec:
  secretTargetRef:
  - parameter: awsAccessKeyID
    name: aws-sqs-credentials
    key: AWS_ACCESS_KEY_ID
  - parameter: awsSecretAccessKey
    name: aws-sqs-credentials
    key: AWS_SECRET_ACCESS_KEY

---
# ScaledJob (not Deployment) for batch processing
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: image-processing-job
  namespace: batch-processing
spec:
  # Job template
  jobTargetRef:
    template:
      spec:
        containers:
        - name: processor
          image: image-processor:v1.0
          env:
          - name: SQS_QUEUE_URL
            value: "https://sqs.us-west-2.amazonaws.com/123456789/image-queue"
          - name: AWS_REGION
            value: "us-west-2"
          resources:
            requests:
              cpu: 2
              memory: 4Gi
            limits:
              cpu: 4
              memory: 8Gi
        restartPolicy: OnFailure

  # Polling interval (note: cooldownPeriod applies to ScaledObjects, not
  # ScaledJobs; spawned Jobs simply run to completion)
  pollingInterval: 10  # Check queue every 10 seconds

  # Max replicas
  maxReplicaCount: 100

  # Successful job retention
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5

  # Scaling strategy
  scalingStrategy:
    strategy: "accurate"  # Create jobs based on queue length
    # "default" = one job per event
    # "custom" = custom logic
    # "accurate" = jobs = queue length / messages per job

  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: aws-sqs-auth
    metadata:
      queueURL: "https://sqs.us-west-2.amazonaws.com/123456789/image-queue"
      queueLength: "5"     # Process 5 messages per job
      awsRegion: "us-west-2"
      identityOwner: "operator"

---
# Alternative: Kafka-based job scaling
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: kafka-consumer-job
  namespace: batch-processing
spec:
  jobTargetRef:
    template:
      spec:
        containers:
        - name: consumer
          image: kafka-consumer:v1.0
          env:
          - name: KAFKA_BROKERS
            value: "kafka:9092"
          - name: KAFKA_TOPIC
            value: "events"
          - name: KAFKA_CONSUMER_GROUP
            value: "batch-processors"
        restartPolicy: OnFailure

  pollingInterval: 15
  maxReplicaCount: 50

  triggers:
  - type: kafka
    metadata:
      bootstrapServers: "kafka:9092"
      consumerGroup: "batch-processors"
      topic: "events"
      lagThreshold: "100"  # Create job when lag > 100 messages
      offsetResetPolicy: "latest"
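
To make the "accurate" strategy concrete: with queueLength: 5 and, say, 47 visible messages, KEDA targets roughly ceil(47 / 5) = 10 parallel Jobs (less any Jobs still running, and capped by maxReplicaCount), rather than spawning one Job per polling cycle.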

Pattern 4B: ML Training Job Autoscaling with Volcano

# Install Volcano scheduler
---
apiVersion: v1
kind: Namespace
metadata:
  name: volcano-system

---
# Volcano scheduler deployment
# (Use official Volcano installation)

---
# ML Training job with gang scheduling
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: distributed-training
  namespace: ml-training
spec:
  # Minimum pods required to start job
  minAvailable: 4  # 1 master + 3 workers minimum

  schedulerName: volcano

  # Queue for resource management
  queue: ml-training-queue

  # Plugins
  plugins:
    ssh: []        # Enable SSH between pods
    svc: []        # Create service for pod communication
    env: []        # Environment variable injection

  # Policies
  policies:
  - event: PodEvicted
    action: RestartJob
  - event: PodFailed
    action: RestartJob

  # Task groups
  tasks:
  # Master task
  - name: master
    replicas: 1
    template:
      spec:
        containers:
        - name: tensorflow
          image: tensorflow/tensorflow:latest-gpu
          command:
          - python
          - train.py
          - --role=master
          resources:
            requests:
              cpu: 4
              memory: 16Gi
              nvidia.com/gpu: 1
            limits:
              cpu: 8
              memory: 32Gi
              nvidia.com/gpu: 1

  # Worker tasks (auto-scalable)
  - name: worker
    replicas: 3
    minAvailable: 1  # At least 1 worker
    template:
      spec:
        containers:
        - name: tensorflow
          image: tensorflow/tensorflow:latest-gpu
          command:
          - python
          - train.py
          - --role=worker
          resources:
            requests:
              cpu: 8
              memory: 32Gi
              nvidia.com/gpu: 2
            limits:
              cpu: 16
              memory: 64Gi
              nvidia.com/gpu: 2

  # Parameter server tasks
  - name: ps
    replicas: 2
    template:
      spec:
        containers:
        - name: tensorflow
          image: tensorflow/tensorflow:latest
          command:
          - python
          - train.py
          - --role=ps
          resources:
            requests:
              cpu: 2
              memory: 8Gi
            limits:
              cpu: 4
              memory: 16Gi

---
# Queue with capacity limits
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: ml-training-queue
spec:
  weight: 1
  capability:
    cpu: "100"
    memory: "500Gi"
    nvidia.com/gpu: "20"

---
# HPA for worker pods (scale workers based on GPU utilization)
# Note: this assumes the Volcano Job CRD exposes the scale subresource;
# otherwise adjust task replicas through the Volcano API instead
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: training-workers-hpa
  namespace: ml-training
spec:
  scaleTargetRef:
    apiVersion: batch.volcano.sh/v1alpha1
    kind: Job
    name: distributed-training

  minReplicas: 3
  maxReplicas: 20

  metrics:
  # GPU utilization
  - type: Pods
    pods:
      metric:
        name: DCGM_FI_DEV_GPU_UTIL
      target:
        type: AverageValue
        averageValue: "80"  # Target 80% GPU utilization

  # Training throughput
  - type: Pods
    pods:
      metric:
        name: training_samples_per_second
      target:
        type: AverageValue
        averageValue: "1000"

Pattern 4C: Scheduled Autoscaling (Predictive)

# CronHPA for scheduled scaling
---
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: business-hours-scaling
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server

  # Business hours scaling schedule
  jobs:
  # Scale up for morning traffic (8 AM)
  - name: morning-scale-up
    schedule: "0 8 * * 1-5"  # Weekdays at 8 AM
    targetSize: 20

  # Scale up for lunch traffic (12 PM)
  - name: lunch-scale-up
    schedule: "0 12 * * 1-5"
    targetSize: 30

  # Scale down for evening (6 PM)
  - name: evening-scale-down
    schedule: "0 18 * * 1-5"
    targetSize: 15

  # Scale down for night (10 PM)
  - name: night-scale-down
    schedule: "0 22 * * *"
    targetSize: 5

  # Weekend minimal scaling
  - name: weekend-minimal
    schedule: "0 0 * * 0,6"  # Midnight on Sat/Sun
    targetSize: 3

---
# Alternative: Using native CronJob + kubectl scale
apiVersion: batch/v1
kind: CronJob
metadata:
  name: morning-scale-up
  namespace: production
spec:
  schedule: "0 8 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: autoscaler
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command:
            - kubectl
            - scale
            - deployment/api-server
            - --replicas=20
            - -n
            - production
          restartPolicy: OnFailure
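
The kubectl-based variant needs RBAC that lets the autoscaler ServiceAccount resize the Deployment; a minimal least-privilege sketch:

# Sketch: RBAC for the scheduled scaling CronJob
apiVersion: v1
kind: ServiceAccount
metadata:
  name: autoscaler
  namespace: production
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-scaler
  namespace: production
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "deployments/scale"]
  verbs: ["get", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: autoscaler-can-scale
  namespace: production
subjects:
- kind: ServiceAccount
  name: autoscaler
  namespace: production
roleRef:
  kind: Role
  name: deployment-scaler
  apiGroup: rbac.authorization.k8s.io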

Pattern 5: Emerging Technologies & Future Patterns

Pattern 5A: Predictive Autoscaling with Machine Learning

# ML-based predictive autoscaling model
import datetime
import time

import numpy as np
import pandas as pd
from kubernetes import client, config
from sklearn.ensemble import RandomForestRegressor

class PredictiveAutoscaler:
    def __init__(self):
        # Prefer in-cluster config when running inside Kubernetes
        try:
            config.load_incluster_config()
        except config.ConfigException:
            config.load_kube_config()
        self.apps_v1 = client.AppsV1Api()
        self.model = RandomForestRegressor(n_estimators=100)
        self.is_trained = False

    def collect_training_data(self, days=30):
        """Collect historical data for training"""
        # Query Prometheus for historical metrics
        # Features: hour, day_of_week, month, previous_load, etc.
        # Target: actual_replicas_needed

        data = {
            'hour': [],
            'day_of_week': [],
            'month': [],
            'previous_load': [],
            'previous_replicas': [],
            'actual_replicas': []
        }

        # Fetch from Prometheus
        # ... (implementation details)

        return pd.DataFrame(data)

    def get_current_load(self, deployment, namespace):
        """Fetch the deployment's current load (e.g. RPS) from Prometheus"""
        # ... query Prometheus for the current load metric
        return 0.0  # Placeholder

    def train(self):
        """Train the prediction model"""
        df = self.collect_training_data()

        X = df[['hour', 'day_of_week', 'month', 'previous_load', 'previous_replicas']]
        y = df['actual_replicas']

        self.model.fit(X, y)
        self.is_trained = True

        print(f"Model trained with {len(df)} samples")
        print(f"Feature importances: {self.model.feature_importances_}")

    def predict_replicas(self, deployment, namespace):
        """Predict required replicas for next hour"""
        if not self.is_trained:
            raise RuntimeError("Model not trained")

        now = datetime.datetime.now()

        # Current state
        deployment_obj = self.apps_v1.read_namespaced_deployment(
            deployment, namespace
        )
        current_replicas = deployment_obj.spec.replicas

        # Get current load from Prometheus
        current_load = self.get_current_load(deployment, namespace)

        # Prepare features
        features = np.array([[
            now.hour,
            now.weekday(),
            now.month,
            current_load,
            current_replicas
        ]])

        # Predict
        predicted_replicas = int(self.model.predict(features)[0])

        # Apply safety bounds
        min_replicas = 2
        max_replicas = 100
        predicted_replicas = max(min_replicas, min(predicted_replicas, max_replicas))

        return predicted_replicas

    def apply_scaling(self, deployment, namespace, replicas):
        """Apply predicted scaling"""
        body = {
            'spec': {
                'replicas': replicas
            }
        }

        self.apps_v1.patch_namespaced_deployment_scale(
            deployment,
            namespace,
            body
        )

        print(f"Scaled {deployment} to {replicas} replicas")

    def run(self, deployment, namespace, interval=300):
        """Main loop"""
        while True:
            try:
                predicted = self.predict_replicas(deployment, namespace)
                self.apply_scaling(deployment, namespace, predicted)

                print(f"[{datetime.datetime.now()}] Scaled to {predicted} replicas")

            except Exception as e:
                print(f"Error: {e}")

            time.sleep(interval)  # Every 5 minutes

# Usage
if __name__ == "__main__":
    autoscaler = PredictiveAutoscaler()
    autoscaler.train()
    autoscaler.run("api-server", "production")
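
A sketch for running this controller in-cluster (image name and namespace illustrative; the ServiceAccount needs permission to patch the deployment's scale subresource, similar to the Role shown in Pattern 4C):

# Sketch: in-cluster deployment of the predictive autoscaler
apiVersion: apps/v1
kind: Deployment
metadata:
  name: predictive-autoscaler
  namespace: production
spec:
  replicas: 1  # Single instance to avoid conflicting scale decisions
  selector:
    matchLabels:
      app: predictive-autoscaler
  template:
    metadata:
      labels:
        app: predictive-autoscaler
    spec:
      serviceAccountName: predictive-autoscaler
      containers:
      - name: autoscaler
        image: predictive-autoscaler:v1.0  # Illustrative image name
        resources:
          requests:
            cpu: 250m
            memory: 512Mi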

Pattern 5B: Serverless Kubernetes with Knative

# Knative Service with autoscaling
---
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: knative-app
  namespace: serverless
spec:
  template:
    metadata:
      annotations:
        # Autoscaling configuration
        autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"
        autoscaling.knative.dev/metric: "concurrency"
        autoscaling.knative.dev/target: "10"  # Target 10 concurrent requests
        autoscaling.knative.dev/min-scale: "0"  # Scale to zero
        autoscaling.knative.dev/max-scale: "100"
        autoscaling.knative.dev/scale-down-delay: "30s"
        autoscaling.knative.dev/window: "60s"  # Evaluation window

    spec:
      containers:
      - image: myapp:v1.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 1000m
            memory: 512Mi

---
# Advanced: RPS-based autoscaling
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: rps-based-app
  namespace: serverless
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/metric: "rps"  # Requests per second
        autoscaling.knative.dev/target: "100"  # Target 100 RPS per pod
        autoscaling.knative.dev/target-utilization-percentage: "70"
    spec:
      containers:
      - image: myapp:v1.0
        resources:
          requests:
            cpu: 200m
            memory: 256Mi

Pattern 5C: Service Mesh Integration (Istio)

# Istio VirtualService with traffic-based autoscaling
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
  namespace: production
spec:
  hosts:
  - my-app.example.com
  http:
  - match:
    - headers:
        x-version:
          exact: canary
    route:
    - destination:
        host: my-app
        subset: canary
      weight: 10  # 10% traffic to canary
  - route:
    - destination:
        host: my-app
        subset: stable
      weight: 90

---
# DestinationRule
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-app
  namespace: production
spec:
  host: my-app
  subsets:
  - name: stable
    labels:
      version: stable
  - name: canary
    labels:
      version: canary

---
# HPA using Istio metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-istio-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

  minReplicas: 2
  maxReplicas: 50

  metrics:
  # Istio request rate
  - type: Pods
    pods:
      metric:
        name: istio_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"

  # Istio P99 latency
  - type: Pods
    pods:
      metric:
        name: istio_request_duration_p99
      target:
        type: AverageValue
        averageValue: "200"  # 200ms (the recording rule below is in milliseconds)

---
# Prometheus rules for Istio metrics
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: istio-custom-metrics
  namespace: monitoring
spec:
  groups:
  - name: istio-autoscaling
    interval: 15s
    rules:
    - record: istio_requests_per_second
      expr: |
        sum(rate(istio_requests_total{destination_workload="my-app"}[2m])) by (pod)

    - record: istio_request_duration_p99
      expr: |
        histogram_quantile(0.99,
          sum(rate(istio_request_duration_milliseconds_bucket{destination_workload="my-app"}[2m])) by (pod, le)
        )

Best Practices Summary

Stateful Applications

✅ Use conservative scaling policies (slower scale-up/down)
✅ Implement proper health checks and readiness probes (sketch below)
✅ Plan for data synchronization time
✅ Use PVCs with appropriate storage classes
✅ Consider split architectures (read/write separation)
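
For the health-check point, a readiness probe keeps traffic away from replicas that are still syncing. A sketch for the PostgreSQL replicas from Pattern 1A (thresholds illustrative):

# Sketch: readiness gate so only healthy, reachable replicas serve reads
readinessProbe:
  exec:
    command: ["pg_isready", "-U", "postgres"]
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3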

Multi-Cluster

✅ Centralize metrics with Thanos or Prometheus federation
✅ Implement intelligent routing with global load balancers
✅ Use cost-aware scheduling
✅ Plan for cross-cluster failover
✅ Monitor inter-cluster latency

Cost Optimization

✅ Maximize spot instance usage (70-90% savings)
✅ Implement aggressive consolidation
✅ Use FinOps dashboards for visibility
✅ Set up cost alerts and budgets
✅ Regular right-sizing reviews

Batch Jobs

✅ Use KEDA ScaledJobs for queue-driven processing
✅ Implement proper job cleanup policies
✅ Set resource limits to prevent runaway costs
✅ Use gang scheduling for distributed jobs
✅ Monitor job success rates

Key Takeaways

  1. Stateful Scaling: Requires careful planning, slower policies, and split read/write architectures
  2. Multi-Cluster: Centralized metrics and intelligent distribution critical for success
  3. Cost Optimization: Spot instances + right-sizing + consolidation = 60-80% savings
  4. Batch Processing: Queue-based autoscaling with KEDA scales jobs efficiently
  5. Future: ML-based prediction, serverless K8s, and service mesh integration emerging

Conclusion

Advanced autoscaling patterns unlock significant value:

  • Stateful applications can scale safely with proper planning
  • Multi-cluster deployments enable global scale and resilience
  • Cost optimization delivers 60-80% infrastructure savings
  • Batch processing scales efficiently with queue-based triggers
  • Emerging technologies push boundaries of what’s possible

These patterns, combined with foundational HPA and VPA, create comprehensive autoscaling architectures that balance performance, cost, and reliability at scale.

Next up: Part 7 - Production Troubleshooting & War Stories 🔧

Happy scaling! 🚀