Kubernetes Complete Guide (Part 3): Advanced Features and Production Practices

🎯 Introduction

In the first two parts we covered Kubernetes fundamentals and core resource operations. This part explores advanced features and production-environment practices to help you build an enterprise-grade container platform.

Topics covered:

  • Autoscaling (HPA/VPA/CA)
  • RBAC access control
  • Network Policies
  • Helm package management
  • Monitoring and alerting
  • Log collection
  • CI/CD integration
  • Production best practices

⚡ Autoscaling

Types of autoscaling

graph TB
    A[Kubernetes Autoscaling] --> B[HPA<br/>Horizontal Pod Autoscaling]
    A --> C[VPA<br/>Vertical Pod Autoscaling]
    A --> D[CA<br/>Cluster Autoscaling]

    B --> B1[Adjusts Pod count<br/>based on CPU/memory]
    C --> C1[Adjusts Pod resource limits<br/>based on usage]
    D --> D1[Adds/removes nodes<br/>based on load]

    style A fill:#326ce5
    style B fill:#4ecdc4
    style C fill:#feca57
    style D fill:#ff6b6b

HPA (Horizontal Pod Autoscaler)

CPU-based HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: default
spec:
  # Target Deployment
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx

  # Replica range
  minReplicas: 2
  maxReplicas: 10

  # Scaling behavior
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # scale-down stabilization window
      policies:
      - type: Percent
        value: 50  # remove at most 50% of Pods per period
        periodSeconds: 60
      - type: Pods
        value: 2   # remove at most 2 Pods per period
        periodSeconds: 60
      selectPolicy: Min  # pick the smaller change

    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100  # add at most 100% more Pods per period
        periodSeconds: 30
      - type: Pods
        value: 4    # add at most 4 Pods per period
        periodSeconds: 30
      selectPolicy: Max  # pick the larger change

  # Metrics
  metrics:
  # CPU utilization
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # target CPU utilization 70%

  # Memory utilization
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80  # target memory utilization 80%

  # Custom metric (Prometheus)
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"

  # External metric
  - type: External
    external:
      metric:
        name: queue_length
        selector:
          matchLabels:
            queue: worker_tasks
      target:
        type: AverageValue
        averageValue: "30"
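The HPA controller derives the target replica count from the ratio of observed to target metric values: desired = ceil(currentReplicas × currentMetric / targetMetric), clamped to the configured range. A minimal sketch of that arithmetic (illustrative, not the controller's actual code):

```python
import math

def desired_replicas(current_replicas: int, observed: float, target: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Approximate the HPA scaling decision for a single metric."""
    desired = math.ceil(current_replicas * observed / target)
    # Clamp to the configured minReplicas/maxReplicas window.
    return max(min_replicas, min(max_replicas, desired))

# 4 Pods averaging 90% CPU against a 70% target -> scale up to 6.
print(desired_replicas(4, 90, 70, min_replicas=2, max_replicas=10))  # 6
```

When several metrics are configured, the controller computes this per metric and takes the largest result, which the scale-up/scale-down policies above then rate-limit.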

HPA commands

# Create an HPA (quick form)
kubectl autoscale deployment nginx --min=2 --max=10 --cpu-percent=70

# Create an HPA (YAML)
kubectl apply -f hpa.yaml

# Inspect HPAs
kubectl get hpa
kubectl describe hpa nginx-hpa

# Watch an HPA react
kubectl get hpa --watch

# Manual test (then run the loop inside the container's shell to generate load)
kubectl run -it --rm load-generator --image=busybox -- /bin/sh
while true; do wget -q -O- http://nginx-service; done

# Delete the HPA
kubectl delete hpa nginx-hpa

VPA (Vertical Pod Autoscaler)

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx

  # Update policy
  updatePolicy:
    updateMode: "Auto"  # Auto, Recreate, Initial, Off

  # Resource policy
  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      minAllowed:
        cpu: 100m
        memory: 50Mi
      maxAllowed:
        cpu: 2
        memory: 1Gi
      mode: Auto

VPA update modes:

Mode     | Description           | Behavior
Off      | Recommendations only  | Never changes Pods
Initial  | Set at creation       | Applied only when Pods are created
Recreate | Recreate Pods         | Deletes and recreates Pods to apply changes
Auto     | Automatic             | In-place update where supported, otherwise recreate
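Whatever the VPA recommender proposes is bounded by the `resourcePolicy` shown earlier: recommendations are clamped into the [minAllowed, maxAllowed] window per container. A hedged sketch of that clamping step (values in CPU millicores and bytes; the helper name is illustrative, not VPA source code):

```python
def clamp_recommendation(recommended: dict, min_allowed: dict, max_allowed: dict) -> dict:
    """Clamp a VPA recommendation into the [minAllowed, maxAllowed] window."""
    return {
        resource: max(min_allowed[resource], min(max_allowed[resource], value))
        for resource, value in recommended.items()
    }

# Recommender asks for 3000m CPU / 32Mi memory; the policy allows
# 100m-2000m CPU and 50Mi-1Gi memory, so both get clamped.
print(clamp_recommendation(
    {"cpu": 3000, "memory": 32 * 2**20},
    {"cpu": 100, "memory": 50 * 2**20},
    {"cpu": 2000, "memory": 1 * 2**30},
))  # {'cpu': 2000, 'memory': 52428800}
```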

Cluster Autoscaler

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
        name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false
        env:
        - name: AWS_REGION
          value: us-west-2

🔐 RBAC Access Control

RBAC architecture

graph TB
    subgraph "Subjects"
        U[User]
        G[Group]
        SA[ServiceAccount]
    end

    subgraph "Bindings"
        RB[RoleBinding<br/>namespace-scoped]
        CRB[ClusterRoleBinding<br/>cluster-scoped]
    end

    subgraph "Roles"
        R[Role<br/>namespace-scoped]
        CR[ClusterRole<br/>cluster-scoped]
    end

    subgraph "Resources"
        P[Pods]
        D[Deployments]
        S[Services]
        N[Nodes]
    end

    U --> RB
    G --> RB
    SA --> RB

    U --> CRB
    G --> CRB
    SA --> CRB

    RB --> R
    CRB --> CR

    R -.->|access| P
    R -.->|access| D
    R -.->|access| S

    CR -.->|access| N
    CR -.->|access| P
    CR -.->|access| D

    style U fill:#326ce5
    style RB fill:#4ecdc4
    style R fill:#feca57
    style P fill:#ff6b6b

Role and ClusterRole

Role (namespace-scoped):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
# Pods
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]

# Pod logs
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get", "list"]

# ConfigMaps and Secrets
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["get"]

# Deployments
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

# Services
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list", "create", "delete"]

ClusterRole (cluster-scoped):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-admin-custom
rules:
# Full access to all resources
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]

# Non-resource URLs
- nonResourceURLs: ["*"]
  verbs: ["*"]

RoleBinding and ClusterRoleBinding

RoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
# User
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io

# Group
- kind: Group
  name: developers
  apiGroup: rbac.authorization.k8s.io

# ServiceAccount
- kind: ServiceAccount
  name: my-service-account
  namespace: default

roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

ClusterRoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-all-pods
subjects:
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io

ServiceAccount

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: default
automountServiceAccountToken: true
secrets:
- name: my-app-token

---
apiVersion: v1
kind: Secret
metadata:
  name: my-app-token
  namespace: default
  annotations:
    kubernetes.io/service-account.name: my-app-sa
type: kubernetes.io/service-account-token

RBAC commands

# List roles
kubectl get roles
kubectl get clusterroles
kubectl describe role pod-reader

# List bindings
kubectl get rolebindings
kubectl get clusterrolebindings
kubectl describe rolebinding read-pods

# List ServiceAccounts
kubectl get serviceaccounts
kubectl get sa  # shorthand
kubectl describe sa my-app-sa

# Check permissions
kubectl auth can-i create deployments
kubectl auth can-i delete pods --namespace=default
kubectl auth can-i '*' '*' --all-namespaces

# Check as a specific user
kubectl auth can-i list pods --as=jane
kubectl auth can-i create deployments --as=system:serviceaccount:default:my-app-sa

# Create a ServiceAccount token
kubectl create token my-app-sa --duration=24h

# Show the current user
kubectl config view --minify -o jsonpath='{.contexts[0].context.user}'

Built-in ClusterRoles

ClusterRole   | Description      | Scope of permissions
cluster-admin | Superuser        | Full access to everything
admin         | Namespace admin  | Full access within a namespace
edit          | Editor           | Read/write most resources
view          | Viewer           | Read-only access
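Conceptually, `kubectl auth can-i` succeeds when any rule in any role bound to the subject matches the requested verb, API group, and resource, with `*` acting as a wildcard. A simplified model of that check (not the real authorizer; the rule data mirrors the `pod-reader` Role above):

```python
def rule_matches(rule: dict, verb: str, group: str, resource: str) -> bool:
    """True when one RBAC rule grants verb on group/resource ('*' is a wildcard)."""
    def allowed(values, wanted):
        return "*" in values or wanted in values
    return (allowed(rule["apiGroups"], group)
            and allowed(rule["resources"], resource)
            and allowed(rule["verbs"], verb))

def can_i(rules, verb, group, resource):
    """Permission is the union of all matching rules."""
    return any(rule_matches(r, verb, group, resource) for r in rules)

pod_reader = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "watch", "list"]},
    {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["*"]},
]
print(can_i(pod_reader, "list", "", "pods"))               # True
print(can_i(pod_reader, "delete", "", "pods"))             # False
print(can_i(pod_reader, "delete", "apps", "deployments"))  # True
```

Because rules are purely additive, there is no "deny" rule in RBAC; anything not explicitly granted is refused.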

🌐 Network Policy

Network Policy concept

graph TB
    subgraph "Frontend Namespace"
        WEB[Web Pod]
    end

    subgraph "Backend Namespace"
        API[API Pod]
        CACHE[Cache Pod]
    end

    subgraph "Database Namespace"
        DB[(Database Pod)]
    end

    WEB -->|allow| API
    API -->|allow| DB
    API -->|allow| CACHE
    WEB -.->|deny| DB
    WEB -.->|deny| CACHE

    INTERNET[Internet] -->|allow| WEB
    INTERNET -.->|deny| API
    INTERNET -.->|deny| DB

    style WEB fill:#4ecdc4
    style API fill:#feca57
    style DB fill:#ff6b6b

Full Network Policy example

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-network-policy
  namespace: backend
spec:
  # Which Pods this policy applies to
  podSelector:
    matchLabels:
      app: api
      tier: backend

  # Policy types
  policyTypes:
  - Ingress  # inbound traffic
  - Egress   # outbound traffic

  # Ingress rules
  ingress:
  # Rule 1: allow traffic from the frontend
  - from:
    - namespaceSelector:
        matchLabels:
          name: frontend
      podSelector:
        matchLabels:
          app: web
    ports:
    - protocol: TCP
      port: 8080

  # Rule 2: allow traffic from a specific IP range
  - from:
    - ipBlock:
        cidr: 10.0.0.0/16
        except:
        - 10.0.1.0/24
    ports:
    - protocol: TCP
      port: 8080

  # Egress rules
  egress:
  # Rule 1: allow access to the database
  - to:
    - namespaceSelector:
        matchLabels:
          name: database
      podSelector:
        matchLabels:
          app: postgres
    ports:
    - protocol: TCP
      port: 5432

  # Rule 2: allow access to Redis
  - to:
    - podSelector:
        matchLabels:
          app: redis
    ports:
    - protocol: TCP
      port: 6379

  # Rule 3: allow DNS queries
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53

  # Rule 4: allow access to external APIs
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 10.0.0.0/8
        - 172.16.0.0/12
        - 192.168.0.0/16
    ports:
    - protocol: TCP
      port: 443
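An `ipBlock` peer matches source (or destination) addresses that fall inside `cidr` but outside every `except` range. Using Python's standard `ipaddress` module, a hedged sketch of that matching logic:

```python
import ipaddress

def ip_block_matches(ip, cidr, excepts):
    """True if ip falls inside cidr and outside all except ranges."""
    addr = ipaddress.ip_address(ip)
    if addr not in ipaddress.ip_network(cidr):
        return False
    return not any(addr in ipaddress.ip_network(ex) for ex in excepts)

# Ingress rule 2 above: allow 10.0.0.0/16 except 10.0.1.0/24.
print(ip_block_matches("10.0.2.5", "10.0.0.0/16", ["10.0.1.0/24"]))  # True
print(ip_block_matches("10.0.1.5", "10.0.0.0/16", ["10.0.1.0/24"]))  # False
```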

Common Network Policy patterns

1. Deny all traffic by default:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

2. Allow traffic from a specific namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-namespace
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          environment: production

3. Allow DNS (note: namespaceSelector and podSelector in the same peer, so both must match):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
spec:
  podSelector: {}
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53

Network Policy commands

# Create a Network Policy
kubectl apply -f network-policy.yaml

# Inspect Network Policies
kubectl get networkpolicies
kubectl get netpol  # shorthand
kubectl describe networkpolicy api-network-policy

# Test connectivity (from pod-a to pod-b)
kubectl exec -it pod-a -- curl http://pod-b-service

# Delete a Network Policy
kubectl delete networkpolicy api-network-policy

📦 Helm Package Management

Helm architecture

graph TB
    H[Helm CLI] --> CHART[Chart<br/>package definition]
    CHART --> TEMPLATE[Templates<br/>YAML templates]
    CHART --> VALUES[values.yaml<br/>configuration values]
    CHART --> CHART_YAML[Chart.yaml<br/>metadata]

    H -->|helm install| K8S[Kubernetes API]
    K8S --> RELEASE[Release<br/>deployed instance]

    REPO[Helm Repository] -.->|helm pull| CHART

    style H fill:#326ce5
    style CHART fill:#4ecdc4
    style K8S fill:#feca57
    style RELEASE fill:#ff6b6b

Helm chart structure

my-app/
├── Chart.yaml          # chart metadata
├── values.yaml         # default values
├── values-dev.yaml     # development values
├── values-prod.yaml    # production values
├── templates/          # K8s resource templates
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── configmap.yaml
│   ├── secret.yaml
│   ├── hpa.yaml
│   ├── _helpers.tpl   # helper functions
│   ├── NOTES.txt      # post-install notes
│   └── tests/         # tests
│       └── test-connection.yaml
├── charts/            # dependency charts
├── .helmignore        # files to ignore
└── README.md

Chart.yaml example

apiVersion: v2
name: my-app
description: My Application Helm Chart
type: application
version: 1.0.0
appVersion: "1.24.0"
keywords:
  - web
  - application
home: https://example.com
sources:
  - https://github.com/example/my-app
maintainers:
  - name: DevOps Team
    email: devops@example.com
dependencies:
  - name: postgresql
    version: "12.1.0"
    repository: "https://charts.bitnami.com/bitnami"
    condition: postgresql.enabled
  - name: redis
    version: "17.0.0"
    repository: "https://charts.bitnami.com/bitnami"
    condition: redis.enabled

values.yaml example

# Replica count
replicaCount: 3

# Image configuration
image:
  repository: myapp
  pullPolicy: IfNotPresent
  tag: "1.24.0"

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

# ServiceAccount
serviceAccount:
  create: true
  annotations: {}
  name: ""

# Pod annotations
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "9090"

# Pod security context
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 2000

securityContext:
  capabilities:
    drop:
    - ALL
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false

# Service configuration
service:
  type: ClusterIP
  port: 80
  targetPort: 8080

# Ingress configuration
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: app.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: app-tls
      hosts:
        - app.example.com

# Resource limits
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

# HPA
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

# NodeSelector
nodeSelector: {}

# Tolerations
tolerations: []

# Affinity
affinity: {}

# Environment variables
env:
  - name: ENVIRONMENT
    value: "production"
  - name: LOG_LEVEL
    value: "info"

# ConfigMap
configMap:
  data:
    app.properties: |
      server.port=8080
      server.host=0.0.0.0

# Secret
secret:
  data:
    database-password: ""
    api-key: ""

# PostgreSQL dependency
postgresql:
  enabled: true
  auth:
    username: myapp
    password: ""
    database: myapp
  primary:
    persistence:
      enabled: true
      size: 10Gi

# Redis dependency
redis:
  enabled: true
  auth:
    enabled: true
    password: ""
  master:
    persistence:
      enabled: true
      size: 8Gi

templates/deployment.yaml example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "my-app.fullname" . }}
  labels:
    {{- include "my-app.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "my-app.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
        {{- with .Values.podAnnotations }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
      labels:
        {{- include "my-app.selectorLabels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "my-app.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
      - name: {{ .Chart.Name }}
        securityContext:
          {{- toYaml .Values.securityContext | nindent 12 }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
        imagePullPolicy: {{ .Values.image.pullPolicy }}
        ports:
        - name: http
          containerPort: {{ .Values.service.targetPort }}
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
        env:
          {{- toYaml .Values.env | nindent 12 }}
        resources:
          {{- toYaml .Values.resources | nindent 12 }}
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
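The `nindent N` pipeline used throughout this template prepends a newline and indents every line of its input by N spaces, which is what keeps nested YAML blocks aligned under their parent key. A rough Python equivalent (illustrative only; Helm uses the Sprig library's Go implementation):

```python
def nindent(n: int, text: str) -> str:
    """Approximate Sprig's nindent: a leading newline, then each line indented n spaces."""
    indented = "\n".join(" " * n + line for line in text.splitlines())
    return "\n" + indented

# Rendering something like: labels:{{- include "my-app.labels" . | nindent 4 }}
labels = "app: my-app\ntier: backend"
print("metadata:\n  labels:" + nindent(4, labels))
```

This is why the template writes `labels:` with no value on the same line: the rendered block supplies its own newline and indentation.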

Helm commands

# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Add repositories
helm repo add stable https://charts.helm.sh/stable
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# Search for charts
helm search repo nginx
helm search hub wordpress

# Inspect a chart
helm show chart bitnami/nginx
helm show values bitnami/nginx
helm show readme bitnami/nginx

# Install a chart
helm install my-release bitnami/nginx
helm install my-app ./my-app-chart
helm install my-app ./my-app-chart -f values-prod.yaml
helm install my-app ./my-app-chart --set replicaCount=5
helm install my-app ./my-app-chart --namespace production --create-namespace

# Inspect releases
helm list
helm list -A  # all namespaces
helm status my-app
helm get values my-app
helm get manifest my-app

# Upgrade a release
helm upgrade my-app ./my-app-chart
helm upgrade my-app ./my-app-chart -f values-prod.yaml
helm upgrade my-app ./my-app-chart --set image.tag=1.25.0
helm upgrade --install my-app ./my-app-chart  # install if not present

# Roll back a release
helm rollback my-app
helm rollback my-app 2  # roll back to revision 2
helm history my-app

# Delete a release
helm uninstall my-app
helm uninstall my-app --keep-history

# Validate a chart
helm lint ./my-app-chart
helm template my-app ./my-app-chart
helm install --dry-run --debug my-app ./my-app-chart

# Package a chart
helm package ./my-app-chart
helm package ./my-app-chart --version 1.0.1

# Create a chart
helm create my-new-chart

# Manage dependencies
helm dependency update ./my-app-chart
helm dependency build ./my-app-chart
helm dependency list ./my-app-chart

📊 Monitoring and Alerting

Prometheus + Grafana architecture

graph TB
    subgraph "Data Collection"
        NE[Node Exporter<br/>node metrics]
        KSM[Kube State Metrics<br/>K8s resource state]
        CA[cAdvisor<br/>container metrics]
        APP[Application<br/>custom metrics]
    end

    subgraph "Prometheus"
        PROM[Prometheus Server<br/>time-series database]
        ALERT[Alertmanager<br/>alert routing]
    end

    subgraph "Visualization"
        GRAF[Grafana<br/>dashboards]
    end

    NE -->|metrics| PROM
    KSM -->|metrics| PROM
    CA -->|metrics| PROM
    APP -->|metrics| PROM

    PROM -->|alerts| ALERT
    ALERT -->|notify| EMAIL[Email]
    ALERT -->|notify| SLACK[Slack]
    ALERT -->|notify| WEBHOOK[Webhook]

    GRAF -->|query| PROM

    style PROM fill:#326ce5
    style GRAF fill:#ff6b6b
    style ALERT fill:#feca57

Installing Prometheus (Helm)

# Add the prometheus-community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
  --set grafana.adminPassword=admin123

# Inspect the installed resources
kubectl get all -n monitoring

# Access Prometheus
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090

# Access Grafana
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

ServiceMonitor configuration

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-metrics
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
  namespaceSelector:
    matchNames:
    - default

PrometheusRule alert rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: monitoring
  labels:
    release: prometheus
spec:
  groups:
  - name: my-app
    interval: 30s
    rules:
    # Pods restarting repeatedly
    - alert: PodRestarting
      expr: |
        rate(kube_pod_container_status_restarts_total[15m]) > 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} restarting"
        description: "Pod has restarted {{ $value }} times in the last 15 minutes"

    # High CPU usage
    - alert: HighCPUUsage
      expr: |
        sum(rate(container_cpu_usage_seconds_total{namespace="default"}[5m])) by (pod) > 0.8
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High CPU usage detected"
        description: "Pod {{ $labels.pod }} CPU usage is {{ $value | humanizePercentage }}"

    # High memory usage
    - alert: HighMemoryUsage
      expr: |
        sum(container_memory_working_set_bytes{namespace="default"}) by (pod) /
        sum(container_spec_memory_limit_bytes{namespace="default"}) by (pod) > 0.9
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "High memory usage detected"
        description: "Pod {{ $labels.pod }} memory usage is {{ $value | humanizePercentage }}"

    # Service down
    - alert: ServiceDown
      expr: |
        up{job="my-app"} == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "Service is down"
        description: "The service {{ $labels.job }} has been down for more than 1 minute"

    # High HTTP error rate
    - alert: HighErrorRate
      expr: |
        sum(rate(http_requests_total{status=~"5.."}[5m])) by (service) /
        sum(rate(http_requests_total[5m])) by (service) > 0.05
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High HTTP error rate"
        description: "Service {{ $labels.service }} error rate is {{ $value | humanizePercentage }}"
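The `HighErrorRate` expression divides the per-second rate of 5xx requests by the rate of all requests. The same arithmetic on raw counter samples, as a hedged illustration of how `rate()` over a window feeds that ratio (this ignores counter resets, which the real `rate()` handles):

```python
def rate(start_count: float, end_count: float, window_seconds: float) -> float:
    """Per-second increase of a monotonic counter over a window (ignores resets)."""
    return (end_count - start_count) / window_seconds

# Over 5 minutes: the 5xx counter went 100 -> 190, the total counter 10_000 -> 11_200.
error_rate = rate(100, 190, 300) / rate(10_000, 11_200, 300)
print(round(error_rate, 3))  # 0.075 -> above the 0.05 alert threshold
```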

📝 Log Collection

EFK stack architecture

graph TB
    subgraph "Log Sources"
        POD1[Pod 1]
        POD2[Pod 2]
        POD3[Pod 3]
        NODE[Node Logs]
    end

    subgraph "Collection"
        FB[Fluent Bit<br/>DaemonSet]
    end

    subgraph "Storage & Indexing"
        ES[Elasticsearch]
    end

    subgraph "Visualization"
        KB[Kibana<br/>query & analysis]
    end

    POD1 -->|stdout/stderr| FB
    POD2 -->|stdout/stderr| FB
    POD3 -->|stdout/stderr| FB
    NODE -->|/var/log| FB

    FB -->|forward| ES
    KB -->|query| ES

    style FB fill:#326ce5
    style ES fill:#4ecdc4
    style KB fill:#ff6b6b

Fluent Bit configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Daemon Off
        Flush 1
        Log_Level info
        Parsers_File parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port 2020
        Health_Check On

    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        multiline.parser  docker, cri
        Tag               kube.*
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Keep_Log            Off
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On

    [OUTPUT]
        Name            es
        Match           *
        Host            elasticsearch.logging.svc.cluster.local
        Port            9200
        Logstash_Format On
        Retry_Limit     False
        Type            _doc

  parsers.conf: |
    [PARSER]
        Name   json
        Format json
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name        syslog
        Format      regex
        Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S
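The syslog parser above uses Onigmo-style `(?<name>...)` named groups; in Python the equivalent syntax is `(?P<name>...)`. A quick check that the translated pattern extracts the expected fields from a classic syslog line (the sample message is illustrative):

```python
import re

# Same pattern as the [PARSER] syslog Regex, with groups renamed for Python.
SYSLOG = re.compile(
    r"^\<(?P<pri>[0-9]+)\>(?P<time>[^ ]* {1,2}[^ ]* [^ ]*) (?P<host>[^ ]*) "
    r"(?P<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?P<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?P<message>.*)$"
)

m = SYSLOG.match("<34>Oct 11 22:14:15 mymachine su[230]: 'su root' failed")
print(m.group("pri"), m.group("host"), m.group("ident"), m.group("pid"))  # 34 mymachine su 230
print(m.group("message"))  # 'su root' failed
```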

🚀 CI/CD Integration

GitLab CI/CD Pipeline

# .gitlab-ci.yml
variables:
  DOCKER_REGISTRY: registry.example.com
  IMAGE_NAME: ${DOCKER_REGISTRY}/myapp
  KUBE_NAMESPACE: production
  KUBECONFIG: /etc/deploy/config

stages:
  - test
  - build
  - deploy

# Test stage
test:
  stage: test
  image: node:18
  script:
    - npm ci
    - npm run lint
    - npm run test
    - npm run test:coverage
  coverage: '/Statements\s+:\s+(\d+\.\d+)%/'
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml
    paths:
      - coverage/
  only:
    - branches

# Build the image
build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  before_script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $DOCKER_REGISTRY
  script:
    - docker build
        --build-arg VERSION=${CI_COMMIT_SHORT_SHA}
        -t ${IMAGE_NAME}:${CI_COMMIT_SHORT_SHA}
        -t ${IMAGE_NAME}:latest
        .
    - docker push ${IMAGE_NAME}:${CI_COMMIT_SHORT_SHA}
    - docker push ${IMAGE_NAME}:latest
  only:
    - main
    - develop

# Deploy to development
deploy:dev:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl config use-context dev-cluster
    - kubectl set image deployment/myapp myapp=${IMAGE_NAME}:${CI_COMMIT_SHORT_SHA} -n development
    - kubectl rollout status deployment/myapp -n development
  environment:
    name: development
    url: https://dev.example.com
  only:
    - develop

# Deploy to production
deploy:prod:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl config use-context prod-cluster
    - |
      helm upgrade --install myapp ./helm/myapp \
        --namespace ${KUBE_NAMESPACE} \
        --set image.tag=${CI_COMMIT_SHORT_SHA} \
        --set replicaCount=3 \
        --values ./helm/myapp/values-prod.yaml \
        --wait \
        --timeout 5m
    - kubectl get pods -n ${KUBE_NAMESPACE} -l app=myapp
  environment:
    name: production
    url: https://example.com
  when: manual
  only:
    - main

GitHub Actions Workflow

# .github/workflows/deploy.yml
name: Build and Deploy to Kubernetes

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run tests
        run: |
          npm run lint
          npm run test
          npm run test:coverage

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/lcov.info

  build:
    needs: test
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Log in to Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=sha,prefix={{branch}}-

      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v3

      - name: Setup kubectl
        uses: azure/setup-kubectl@v3

      - name: Configure kubectl
        run: |
          echo "${{ secrets.KUBECONFIG }}" | base64 -d > kubeconfig
          # A plain `export` would not survive into later steps; persist it via GITHUB_ENV.
          echo "KUBECONFIG=$PWD/kubeconfig" >> "$GITHUB_ENV"

      - name: Deploy to Kubernetes
        run: |
          kubectl set image deployment/myapp \
            myapp=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:main-${{ github.sha }} \
            -n production

          kubectl rollout status deployment/myapp -n production

      - name: Verify deployment
        run: |
          kubectl get pods -n production -l app=myapp
          kubectl get svc -n production -l app=myapp

🎯 Production Best Practices

Security best-practices checklist

Category       | Best practice                      | How to implement
Image security | Use minimal base images            | Alpine, Distroless
Image security | Scan for vulnerabilities regularly | Trivy, Clair
Image security | Use a private registry             | Harbor, ECR
RBAC           | Principle of least privilege       | Role, RoleBinding
RBAC           | Avoid cluster-admin                | Custom ClusterRoles
Network        | Use Network Policies               | Restrict Pod-to-Pod traffic
Network        | Use a service mesh                 | Istio, Linkerd
Resources      | Set resource limits                | requests/limits
Resources      | Use LimitRange                     | Namespace defaults
Secrets        | Encrypt Secrets                    | KMS, Sealed Secrets
Secrets        | Rotate credentials                 | Regular updates
Auditing       | Enable audit logging               | Audit Policy
Auditing       | Monitor anomalous behavior         | Falco

High availability

Multi-replica deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-app
spec:
  replicas: 5
  strategy:
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 1
  template:
    spec:
      # Pod anti-affinity: spread replicas across nodes
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - critical-app
            topologyKey: kubernetes.io/hostname
      # Scheduling priority
      priorityClassName: high-priority
      # Graceful shutdown window
      terminationGracePeriodSeconds: 60
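With replicas: 5, maxSurge: 2 and maxUnavailable: 1, a rolling update keeps the running Pod count between 4 and 7. The general bounds can be sketched as follows (percentage values round up for surge and down for unavailable, per the Deployment docs):

```python
import math

def rolling_update_bounds(replicas: int, max_surge, max_unavailable):
    """Pod-count window during a rolling update; '25%'-style strings are percentages."""
    def resolve(value, round_up):
        if isinstance(value, str) and value.endswith("%"):
            fraction = replicas * int(value[:-1]) / 100
            return math.ceil(fraction) if round_up else math.floor(fraction)
        return value
    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    return replicas - unavailable, replicas + surge

print(rolling_update_bounds(5, 2, 1))          # (4, 7) -- the Deployment above
print(rolling_update_bounds(4, "25%", "25%"))  # (3, 5)
```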

Pod Disruption Budget:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-app-pdb
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: critical-app
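The eviction API consults this PDB: with minAvailable: 3, voluntary disruptions (node drains, evictions) are permitted only while more than 3 Pods stay healthy. A small sketch of that bookkeeping:

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """How many voluntary evictions the PDB currently permits."""
    return max(0, healthy_pods - min_available)

print(allowed_disruptions(5, 3))  # 2 evictions allowed
print(allowed_disruptions(3, 3))  # 0 -> eviction requests are refused
```

Note that PDBs only gate voluntary disruptions; node crashes and OOM kills are not held back by a budget.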

Resource quota management

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    persistentvolumeclaims: "20"
    pods: "100"
    services: "50"
    configmaps: "50"
    secrets: "50"

---
apiVersion: v1
kind: LimitRange
metadata:
  name: production-limits
  namespace: production
spec:
  limits:
  # Pod limits
  - max:
      cpu: "4"
      memory: 8Gi
    min:
      cpu: 100m
      memory: 128Mi
    type: Pod
  # Container limits
  - default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 250m
      memory: 256Mi
    max:
      cpu: "2"
      memory: 4Gi
    min:
      cpu: 50m
      memory: 64Mi
    type: Container
  # PVC limits
  - max:
      storage: 100Gi
    min:
      storage: 1Gi
    type: PersistentVolumeClaim
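When a container omits its resources, the admission controller fills them in from the LimitRange's `default` (for limits) and `defaultRequest` (for requests). A hedged sketch of that defaulting step:

```python
def apply_limit_range(container: dict, default_limit: dict, default_request: dict) -> dict:
    """Fill missing resources.limits/requests the way LimitRange defaulting does."""
    resources = container.setdefault("resources", {})
    resources.setdefault("limits", dict(default_limit))
    resources.setdefault("requests", dict(default_request))
    return container

# A container spec with no resources picks up the Container defaults above.
c = apply_limit_range({"name": "web"},
                      {"cpu": "500m", "memory": "512Mi"},
                      {"cpu": "250m", "memory": "256Mi"})
print(c["resources"])
```

Values the Pod author sets explicitly are kept; only missing fields are defaulted, and the result must still fit inside the LimitRange's min/max bounds.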

Backup and disaster recovery

Velero backups:

# Install Velero
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.7.0 \
  --bucket velero-backups \
  --backup-location-config region=us-west-2 \
  --snapshot-location-config region=us-west-2 \
  --secret-file ./credentials-velero

# Create a one-off backup and a recurring schedule
velero backup create full-backup --include-namespaces production
velero schedule create daily-backup --schedule="0 2 * * *"

# Restore from a backup
velero restore create --from-backup full-backup

# List backups and restores
velero backup get
velero restore get

📊 Summary and Checklists

Core knowledge recap

This three-part series has covered Kubernetes from the basics through to production:

Part 1: Fundamentals

  • K8s architecture and components
  • Core resource concepts
  • Installation and configuration

Part 2: Core resource operations

  • kubectl command reference
  • Pods, Deployments, Services
  • Ingress, Volumes, ConfigMaps

Part 3: Advanced features (this article)

  • Autoscaling (HPA/VPA/CA)
  • RBAC access control
  • Network Policies
  • Helm package management
  • Monitoring and alerting
  • Log collection
  • CI/CD integration
  • Production best practices

Production checklist

📋 Pre-deployment

  • Configure RBAC permissions
  • Define Network Policies
  • Set resource requests/limits
  • Create PodDisruptionBudgets
  • Configure health checks (liveness/readiness)
  • Enable HPA autoscaling
  • Run multiple replicas for high availability
  • Use Pod anti-affinity

🔐 Security

  • Scan images for vulnerabilities
  • Encrypt Secrets
  • Restrict privileged containers
  • Configure security contexts
  • Enable audit logging
  • Rotate credentials regularly
  • Use a private registry

📊 Monitoring and logging

  • Deploy Prometheus + Grafana
  • Configure alert rules
  • Deploy a log collection stack
  • Define log retention policies
  • Build dashboards
  • Wire up alert notifications

🔄 Backup and recovery

  • Configure etcd backups
  • Define a resource backup strategy
  • Test the disaster-recovery procedure
  • Document the recovery steps

Recommended learning resources

Certifications:

  • CKA (Administrator)
  • CKAD (Application Developer)
  • CKS (Security Specialist)

Further study:

  • Service mesh (Istio, Linkerd)
  • The Operator pattern
  • GitOps (ArgoCD, Flux)
  • Serverless (Knative)

🎉 Closing Thoughts

Kubernetes is a powerful but complex platform. Through this three-part series you have learned:

  1. Fundamentals: understanding the K8s architecture and core concepts
  2. Hands-on operations: managing resources fluently with kubectl
  3. Advanced skills: production deployment and best practices

Next steps

  • Practice: apply K8s in real projects
  • Go deeper: explore service meshes and Operators
  • Join the community: contribute to open source and share your experience
  • Keep optimizing: focus on performance, security, and cost

Learning Kubernetes is an ongoing journey; as your hands-on experience grows, you will be able to build ever more powerful and reliable cloud-native platforms.

Good luck on your cloud-native journey! 🚀