🎯 Introduction
In the previous two articles we covered Kubernetes fundamentals and day-to-day resource operations. This article digs into advanced features and production practices to help you build an enterprise-grade container platform.
This article covers:
- Autoscaling (HPA/VPA/CA)
- RBAC access control
- Network Policy
- Helm package management
- Monitoring and alerting
- Log collection
- CI/CD integration
- Production best practices
⚡ Autoscaling
Scaling types at a glance
```mermaid
graph TB
    A[Kubernetes autoscaling] --> B[HPA<br/>Horizontal Pod autoscaling]
    A --> C[VPA<br/>Vertical Pod autoscaling]
    A --> D[CA<br/>Cluster autoscaling]
    B --> B1[Adjusts Pod count<br/>based on CPU/memory]
    C --> C1[Adjusts Pod resource limits<br/>based on actual usage]
    D --> D1[Adds or removes nodes<br/>based on load]
    style A fill:#326ce5
    style B fill:#4ecdc4
    style C fill:#feca57
    style D fill:#ff6b6b
```
HPA (Horizontal Pod Autoscaler)
A CPU-based HPA:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: default
spec:
  # Target Deployment
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx

  # Replica range
  minReplicas: 2
  maxReplicas: 10

  # Scaling behavior
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # stabilization window before scaling down
      policies:
      - type: Percent
        value: 50                      # remove at most 50% of replicas at a time
        periodSeconds: 60
      - type: Pods
        value: 2                       # remove at most 2 Pods at a time
        periodSeconds: 60
      selectPolicy: Min                # pick the smaller change

    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100                     # add at most 100% more replicas at a time
        periodSeconds: 30
      - type: Pods
        value: 4                       # add at most 4 Pods at a time
        periodSeconds: 30
      selectPolicy: Max                # pick the larger change

  # Metrics
  metrics:
  # CPU utilization
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70         # target CPU utilization: 70%

  # Memory utilization
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80         # target memory utilization: 80%

  # Custom metric (Prometheus)
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"

  # External metric
  - type: External
    external:
      metric:
        name: queue_length
        selector:
          matchLabels:
            queue: worker_tasks
      target:
        type: AverageValue
        averageValue: "30"
```
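The Resource metrics above come from metrics-server, while the Pods and External metrics only resolve once a metrics adapter (for example prometheus-adapter) is installed. A quick sanity check of which metric APIs the cluster actually serves:

```bash
# Resource metrics (served by metrics-server)
kubectl get --raw /apis/metrics.k8s.io/v1beta1 | head

# Custom metrics (require an adapter such as prometheus-adapter)
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | head

# External metrics
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1 | head
```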
HPA commands
```bash
# Create an HPA (imperative shortcut)
kubectl autoscale deployment nginx --min=2 --max=10 --cpu-percent=70

# Create an HPA (from YAML)
kubectl apply -f hpa.yaml

# Inspect HPAs
kubectl get hpa
kubectl describe hpa nginx-hpa

# Watch an HPA react
kubectl get hpa --watch

# Generate test load (the while loop runs inside the pod's shell)
kubectl run -it --rm load-generator --image=busybox --restart=Never -- /bin/sh
while true; do wget -q -O- http://nginx-service; done

# Delete an HPA
kubectl delete hpa nginx-hpa
```
VPA (Vertical Pod Autoscaler)
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx

  # Update policy
  updatePolicy:
    updateMode: "Auto"   # Auto, Recreate, Initial, Off

  # Resource policy
  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      minAllowed:
        cpu: 100m
        memory: 50Mi
      maxAllowed:
        cpu: 2
        memory: 1Gi
      mode: Auto
```
VPA update modes:

| Mode | Description | Behavior |
|---|---|---|
| Off | Recommendations only | No automatic adjustment |
| Initial | Set at creation | Applied only when Pods are created |
| Recreate | Recreate Pods | Deletes and recreates Pods to apply changes |
| Auto | Automatic | In-place update or recreation of Pods |
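VPA is not part of core Kubernetes; it ships as CRDs plus controllers from the autoscaler project. Once installed, the recommender's suggestions can be read back even in Off mode, a sketch:

```bash
# Apply the VPA object defined above
kubectl apply -f vpa.yaml

# Read the recommendations (target / lower bound / upper bound)
kubectl get vpa
kubectl describe vpa nginx-vpa
```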
Cluster Autoscaler
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      # registry.k8s.io replaces the frozen k8s.gcr.io registry
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
        name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false
        env:
        - name: AWS_REGION
          value: us-west-2
```
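This manifest assumes a cluster-autoscaler ServiceAccount bound to the RBAC rules shipped with the upstream autoscaler project. Once the autoscaler is running, two quick ways to see what it is deciding:

```bash
# Follow scale-up / scale-down decisions in the logs
kubectl -n kube-system logs -f deployment/cluster-autoscaler

# The autoscaler also publishes its view of node groups in a status ConfigMap
kubectl -n kube-system describe configmap cluster-autoscaler-status
```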
🔐 RBAC access control
RBAC architecture
```mermaid
graph TB
    subgraph "Subjects"
        U[User]
        G[Group]
        SA[ServiceAccount]
    end
    subgraph "Bindings"
        RB[RoleBinding<br/>namespace-scoped]
        CRB[ClusterRoleBinding<br/>cluster-scoped]
    end
    subgraph "Roles"
        R[Role<br/>namespace-scoped]
        CR[ClusterRole<br/>cluster-scoped]
    end
    subgraph "Resources"
        P[Pods]
        D[Deployments]
        S[Services]
        N[Nodes]
    end
    U --> RB
    G --> RB
    SA --> RB
    U --> CRB
    G --> CRB
    SA --> CRB
    RB --> R
    CRB --> CR
    R -.->|access| P
    R -.->|access| D
    R -.->|access| S
    CR -.->|access| N
    CR -.->|access| P
    CR -.->|access| D
    style U fill:#326ce5
    style RB fill:#4ecdc4
    style R fill:#feca57
    style P fill:#ff6b6b
```
Role and ClusterRole
Role (namespace-scoped):
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
# Pods
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]

# Pod logs
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get", "list"]

# ConfigMaps and Secrets
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["get"]

# Deployments
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

# Services
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list", "create", "delete"]
```
ClusterRole (cluster-scoped):
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-admin-custom
rules:
# Full access to all resources
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]

# Non-resource URLs
- nonResourceURLs: ["*"]
  verbs: ["*"]
```
RoleBinding and ClusterRoleBinding
RoleBinding:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
# User
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io

# Group
- kind: Group
  name: developers
  apiGroup: rbac.authorization.k8s.io

# ServiceAccount
- kind: ServiceAccount
  name: my-service-account
  namespace: default

roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```
ClusterRoleBinding:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-all-pods
subjects:
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io
```
ServiceAccount
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: default
automountServiceAccountToken: true
secrets:
- name: my-app-token

---
apiVersion: v1
kind: Secret
metadata:
  name: my-app-token
  namespace: default
  annotations:
    kubernetes.io/service-account.name: my-app-sa
type: kubernetes.io/service-account-token
```
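Since Kubernetes 1.24, ServiceAccounts no longer receive long-lived token Secrets automatically, which is why the Secret above is created explicitly. To read the token back, or to mint a short-lived token instead, a sketch using the names above:

```bash
# The token controller populates the Secret; decode the token from it
kubectl get secret my-app-token -o jsonpath='{.data.token}' | base64 -d

# Or request a short-lived token (the preferred approach since 1.24)
kubectl create token my-app-sa --duration=1h
```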
RBAC commands
```bash
# List roles
kubectl get roles
kubectl get clusterroles
kubectl describe role pod-reader

# List bindings
kubectl get rolebindings
kubectl get clusterrolebindings
kubectl describe rolebinding read-pods

# List ServiceAccounts
kubectl get serviceaccounts
kubectl get sa   # short form
kubectl describe sa my-app-sa

# Check permissions
kubectl auth can-i create deployments
kubectl auth can-i delete pods --namespace=default
kubectl auth can-i '*' '*' --all-namespaces

# Check as another identity
kubectl auth can-i list pods --as=jane
kubectl auth can-i create deployments --as=system:serviceaccount:default:my-app-sa

# Create a ServiceAccount token
kubectl create token my-app-sa --duration=24h

# Show the current user
kubectl config view --minify -o jsonpath='{.contexts[0].context.user}'
```
Built-in ClusterRoles

| ClusterRole | Description | Scope of access |
|---|---|---|
| cluster-admin | Superuser | Full access to everything |
| admin | Namespace admin | Full access within a namespace |
| edit | Editor | Read/write access to most resources |
| view | Viewer | Read-only access |
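These built-in roles are usually granted through bindings rather than by writing custom rules. A short sketch (the group and ServiceAccount names are hypothetical):

```bash
# Grant the developers group edit rights in one namespace
kubectl create rolebinding dev-edit \
  --clusterrole=edit --group=developers -n default

# Grant a ServiceAccount read-only access cluster-wide
kubectl create clusterrolebinding ci-view \
  --clusterrole=view --serviceaccount=default:my-app-sa
```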
🌐 Network Policy
How Network Policy works
```mermaid
graph TB
    subgraph "Frontend Namespace"
        WEB[Web Pod]
    end
    subgraph "Backend Namespace"
        API[API Pod]
        CACHE[Cache Pod]
    end
    subgraph "Database Namespace"
        DB[(Database Pod)]
    end
    WEB -->|allowed| API
    API -->|allowed| DB
    API -->|allowed| CACHE
    WEB -.->|denied| DB
    WEB -.->|denied| CACHE
    INTERNET[Internet] -->|allowed| WEB
    INTERNET -.->|denied| API
    INTERNET -.->|denied| DB
    style WEB fill:#4ecdc4
    style API fill:#feca57
    style DB fill:#ff6b6b
```
A complete Network Policy example
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-network-policy
  namespace: backend
spec:
  # Which Pods this policy applies to
  podSelector:
    matchLabels:
      app: api
      tier: backend

  # Policy types
  policyTypes:
  - Ingress   # inbound traffic
  - Egress    # outbound traffic

  # Inbound rules
  ingress:
  # Rule 1: allow traffic from web Pods in the frontend namespace
  # (namespaceSelector and podSelector in the same item combine with AND)
  - from:
    - namespaceSelector:
        matchLabels:
          name: frontend
      podSelector:
        matchLabels:
          app: web
    ports:
    - protocol: TCP
      port: 8080

  # Rule 2: allow a specific IP range
  - from:
    - ipBlock:
        cidr: 10.0.0.0/16
        except:
        - 10.0.1.0/24
    ports:
    - protocol: TCP
      port: 8080

  # Outbound rules
  egress:
  # Rule 1: allow access to the database
  - to:
    - namespaceSelector:
        matchLabels:
          name: database
      podSelector:
        matchLabels:
          app: postgres
    ports:
    - protocol: TCP
      port: 5432

  # Rule 2: allow access to Redis
  - to:
    - podSelector:
        matchLabels:
          app: redis
    ports:
    - protocol: TCP
      port: 6379

  # Rule 3: allow DNS lookups
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53

  # Rule 4: allow calls to external APIs
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 10.0.0.0/8
        - 172.16.0.0/12
        - 192.168.0.0/16
    ports:
    - protocol: TCP
      port: 443
```
Common Network Policy patterns
1. Deny all traffic by default:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
```
2. Allow traffic from a specific namespace:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-namespace
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          environment: production
```
3. Allow DNS:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    # Keep both selectors in one item so they combine with AND:
    # kube-dns Pods inside the kube-system namespace
    - namespaceSelector:
        matchLabels:
          name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
```
Network Policy commands
Note that NetworkPolicy objects are only enforced when the cluster's CNI plugin supports them (e.g. Calico or Cilium); on an unsupported plugin they are silently ignored.
```bash
# Create a Network Policy
kubectl apply -f network-policy.yaml

# Inspect Network Policies
kubectl get networkpolicies
kubectl get netpol   # short form
kubectl describe networkpolicy api-network-policy

# Test connectivity (from Pod A to Pod B)
kubectl exec -it pod-a -- curl http://pod-b-service

# Delete a Network Policy
kubectl delete networkpolicy api-network-policy
```
📦 Helm package management
Helm architecture
```mermaid
graph TB
    H[Helm CLI] --> CHART[Chart<br/>package definition]
    CHART --> TEMPLATE[Templates<br/>YAML templates]
    CHART --> VALUES[values.yaml<br/>configuration values]
    CHART --> CHART_YAML[Chart.yaml<br/>metadata]
    H -->|helm install| K8S[Kubernetes API]
    K8S --> RELEASE[Release<br/>deployed instance]
    REPO[Helm Repository] -.->|helm pull| CHART
    style H fill:#326ce5
    style CHART fill:#4ecdc4
    style K8S fill:#feca57
    style RELEASE fill:#ff6b6b
```
Helm chart layout
```
my-app/
├── Chart.yaml          # chart metadata
├── values.yaml         # default values
├── values-dev.yaml     # development overrides
├── values-prod.yaml    # production overrides
├── templates/          # Kubernetes resource templates
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── configmap.yaml
│   ├── secret.yaml
│   ├── hpa.yaml
│   ├── _helpers.tpl    # helper templates
│   ├── NOTES.txt       # post-install notes
│   └── tests/          # tests
│       └── test-connection.yaml
├── charts/             # dependent charts
├── .helmignore         # files to exclude when packaging
└── README.md
```
Chart.yaml example
```yaml
apiVersion: v2
name: my-app
description: My Application Helm Chart
type: application
version: 1.0.0
appVersion: "1.24.0"
keywords:
  - web
  - application
home: https://example.com
sources:
  - https://github.com/example/my-app
maintainers:
  - name: DevOps Team
    email: devops@example.com
dependencies:
  - name: postgresql
    version: "12.1.0"
    repository: "https://charts.bitnami.com/bitnami"
    condition: postgresql.enabled
  - name: redis
    version: "17.0.0"
    repository: "https://charts.bitnami.com/bitnami"
    condition: redis.enabled
```
values.yaml example
```yaml
# Replica count
replicaCount: 3

# Image configuration
image:
  repository: myapp
  pullPolicy: IfNotPresent
  tag: "1.24.0"

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

# ServiceAccount
serviceAccount:
  create: true
  annotations: {}
  name: ""

# Pod annotations
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "9090"

# Pod security context
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 2000

securityContext:
  capabilities:
    drop:
    - ALL
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false

# Service configuration
service:
  type: ClusterIP
  port: 80
  targetPort: 8080

# Ingress configuration
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: app.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: app-tls
      hosts:
        - app.example.com

# Resource limits
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

# HPA
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

# NodeSelector
nodeSelector: {}

# Tolerations
tolerations: []

# Affinity
affinity: {}

# Environment variables
env:
  - name: ENVIRONMENT
    value: "production"
  - name: LOG_LEVEL
    value: "info"

# ConfigMap
configMap:
  data:
    app.properties: |
      server.port=8080
      server.host=0.0.0.0

# Secret
secret:
  data:
    database-password: ""
    api-key: ""

# PostgreSQL dependency
postgresql:
  enabled: true
  auth:
    username: myapp
    password: ""
    database: myapp
  primary:
    persistence:
      enabled: true
      size: 10Gi

# Redis dependency
redis:
  enabled: true
  auth:
    enabled: true
    password: ""
  master:
    persistence:
      enabled: true
      size: 8Gi
```
templates/deployment.yaml example
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "my-app.fullname" . }}
  labels:
    {{- include "my-app.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "my-app.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
        {{- with .Values.podAnnotations }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
      labels:
        {{- include "my-app.selectorLabels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "my-app.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
        - name: {{ .Chart.Name }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: {{ .Values.service.targetPort }}
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
          env:
            {{- toYaml .Values.env | nindent 12 }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
```
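The template above relies on named helpers such as my-app.fullname and my-app.labels, which live in templates/_helpers.tpl. A minimal sketch of what they typically look like, modeled on the scaffold generated by helm create (your chart's helpers may differ):

```
{{/* templates/_helpers.tpl — minimal helper definitions (sketch) */}}
{{- define "my-app.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{- define "my-app.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name (include "my-app.name" .) | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}

{{- define "my-app.labels" -}}
helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version }}
{{ include "my-app.selectorLabels" . }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{- define "my-app.selectorLabels" -}}
app.kubernetes.io/name: {{ include "my-app.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

{{- define "my-app.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "my-app.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}
```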
Helm commands
```bash
# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Add repositories (the old "stable" repo is archived; bitnami is actively maintained)
helm repo add stable https://charts.helm.sh/stable   # archived, legacy charts only
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# Search for charts
helm search repo nginx
helm search hub wordpress

# Show chart information
helm show chart bitnami/nginx
helm show values bitnami/nginx
helm show readme bitnami/nginx

# Install a chart
helm install my-release bitnami/nginx
helm install my-app ./my-app-chart
helm install my-app ./my-app-chart -f values-prod.yaml
helm install my-app ./my-app-chart --set replicaCount=5
helm install my-app ./my-app-chart --namespace production --create-namespace

# Inspect releases
helm list
helm list -A   # all namespaces
helm status my-app
helm get values my-app
helm get manifest my-app

# Upgrade a release
helm upgrade my-app ./my-app-chart
helm upgrade my-app ./my-app-chart -f values-prod.yaml
helm upgrade my-app ./my-app-chart --set image.tag=1.25.0
helm upgrade --install my-app ./my-app-chart   # install if it doesn't exist yet

# Roll back a release
helm rollback my-app
helm rollback my-app 2   # roll back to revision 2
helm history my-app

# Uninstall a release
helm uninstall my-app
helm uninstall my-app --keep-history

# Validate a chart
helm lint ./my-app-chart
helm template my-app ./my-app-chart
helm install --dry-run --debug my-app ./my-app-chart

# Package a chart
helm package ./my-app-chart
helm package ./my-app-chart --version 1.0.1

# Scaffold a new chart
helm create my-new-chart

# Manage dependencies
helm dependency update ./my-app-chart
helm dependency build ./my-app-chart
helm dependency list ./my-app-chart
```
📊 Monitoring and alerting
Prometheus + Grafana architecture
```mermaid
graph TB
    subgraph "Data collection"
        NE[Node Exporter<br/>node metrics]
        KSM[Kube State Metrics<br/>K8s resource state]
        CA[cAdvisor<br/>container metrics]
        APP[Application<br/>custom metrics]
    end
    subgraph "Prometheus"
        PROM[Prometheus Server<br/>time-series database]
        ALERT[Alertmanager<br/>alert routing]
    end
    subgraph "Visualization"
        GRAF[Grafana<br/>dashboards]
    end
    NE -->|metrics| PROM
    KSM -->|metrics| PROM
    CA -->|metrics| PROM
    APP -->|metrics| PROM
    PROM -->|alerts| ALERT
    ALERT -->|notify| EMAIL[Email]
    ALERT -->|notify| SLACK[Slack]
    ALERT -->|notify| WEBHOOK[Webhook]
    GRAF -->|query| PROM
    style PROM fill:#326ce5
    style GRAF fill:#ff6b6b
    style ALERT fill:#feca57
```
Installing Prometheus (with Helm)
```bash
# Add the Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
  --set grafana.adminPassword=admin123

# List the installed resources
kubectl get all -n monitoring

# Access Prometheus
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090

# Access Grafana
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
```
ServiceMonitor configuration
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-metrics
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
  namespaceSelector:
    matchNames:
    - default
```
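The ServiceMonitor above matches Services labeled app: my-app that expose a port named metrics. A matching Service might look like this, a sketch; the port name must equal the port field in the ServiceMonitor endpoint:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: default
  labels:
    app: my-app          # matched by the ServiceMonitor's selector
spec:
  selector:
    app: my-app
  ports:
  - name: metrics        # matched by the ServiceMonitor's "port: metrics"
    port: 9090
    targetPort: 9090
```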
PrometheusRule alert rules
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: monitoring
  labels:
    release: prometheus
spec:
  groups:
  - name: my-app
    interval: 30s
    rules:
    # Pods restarting too often
    - alert: PodRestarting
      expr: |
        rate(kube_pod_container_status_restarts_total[15m]) > 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} restarting"
        description: "Pod restart rate over the last 15 minutes: {{ $value }}"

    # High CPU usage
    - alert: HighCPUUsage
      expr: |
        sum(rate(container_cpu_usage_seconds_total{namespace="default"}[5m])) by (pod) > 0.8
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High CPU usage detected"
        description: "Pod {{ $labels.pod }} CPU usage is {{ $value | humanizePercentage }}"

    # High memory usage
    - alert: HighMemoryUsage
      expr: |
        sum(container_memory_working_set_bytes{namespace="default"}) by (pod) /
        sum(container_spec_memory_limit_bytes{namespace="default"}) by (pod) > 0.9
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "High memory usage detected"
        description: "Pod {{ $labels.pod }} memory usage is {{ $value | humanizePercentage }}"

    # Service unavailable
    - alert: ServiceDown
      expr: |
        up{job="my-app"} == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "Service is down"
        description: "The service {{ $labels.job }} has been down for more than 1 minute"

    # High HTTP error rate
    - alert: HighErrorRate
      expr: |
        sum(rate(http_requests_total{status=~"5.."}[5m])) by (service) /
        sum(rate(http_requests_total[5m])) by (service) > 0.05
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High HTTP error rate"
        description: "Service {{ $labels.service }} error rate is {{ $value | humanizePercentage }}"
```
📝 Log collection
EFK stack architecture
```mermaid
graph TB
    subgraph "Log sources"
        POD1[Pod 1]
        POD2[Pod 2]
        POD3[Pod 3]
        NODE[Node logs]
    end
    subgraph "Log collection"
        FB[Fluent Bit<br/>DaemonSet]
    end
    subgraph "Log storage"
        ES[Elasticsearch<br/>storage and indexing]
    end
    subgraph "Log visualization"
        KB[Kibana<br/>search and analysis]
    end
    POD1 -->|stdout/stderr| FB
    POD2 -->|stdout/stderr| FB
    POD3 -->|stdout/stderr| FB
    NODE -->|/var/log| FB
    FB -->|forward| ES
    KB -->|query| ES
    style FB fill:#326ce5
    style ES fill:#4ecdc4
    style KB fill:#ff6b6b
```
Fluent Bit configuration
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Daemon Off
        Flush 1
        Log_Level info
        Parsers_File parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port 2020
        Health_Check On

    [INPUT]
        Name tail
        Path /var/log/containers/*.log
        multiline.parser docker, cri
        Tag kube.*
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On

    [FILTER]
        Name kubernetes
        Match kube.*
        Kube_URL https://kubernetes.default.svc:443
        Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix kube.var.log.containers.
        Merge_Log On
        Keep_Log Off
        K8S-Logging.Parser On
        K8S-Logging.Exclude On

    [OUTPUT]
        Name es
        Match *
        Host elasticsearch.logging.svc.cluster.local
        Port 9200
        Logstash_Format On
        Retry_Limit False
        Type _doc

  parsers.conf: |
    [PARSER]
        Name json
        Format json
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name docker
        Format json
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep On

    [PARSER]
        Name syslog
        Format regex
        Regex ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key time
        Time_Format %b %d %H:%M:%S
```
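Fluent Bit runs as a DaemonSet so every node ships its container logs. A minimal sketch of the workload that mounts the ConfigMap above; it assumes a fluent-bit ServiceAccount with RBAC to read Pod metadata, which the kubernetes filter needs:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit   # needs RBAC to get/list/watch pods
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:2.1
        volumeMounts:
        - name: varlog                 # node log files read by the tail input
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers # only needed on Docker-runtime nodes, where
          mountPath: /var/lib/docker/containers  # /var/log/containers symlinks resolve here
          readOnly: true
        - name: config                 # the ConfigMap defined above
          mountPath: /fluent-bit/etc/fluent-bit.conf
          subPath: fluent-bit.conf
        - name: config
          mountPath: /fluent-bit/etc/parsers.conf
          subPath: parsers.conf
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config
        configMap:
          name: fluent-bit-config
```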
🚀 CI/CD integration
GitLab CI/CD pipeline
```yaml
# .gitlab-ci.yml
variables:
  DOCKER_REGISTRY: registry.example.com
  IMAGE_NAME: ${DOCKER_REGISTRY}/myapp
  KUBE_NAMESPACE: production
  KUBECONFIG: /etc/deploy/config

stages:
  - test
  - build
  - deploy

# Test stage
test:
  stage: test
  image: node:18
  script:
    - npm ci
    - npm run lint
    - npm run test
    - npm run test:coverage
  coverage: '/Statements\s+:\s+(\d+\.\d+)%/'
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml
    paths:
      - coverage/
  only:
    - branches

# Build the image
build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  before_script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $DOCKER_REGISTRY
  script:
    - docker build
      --build-arg VERSION=${CI_COMMIT_SHORT_SHA}
      -t ${IMAGE_NAME}:${CI_COMMIT_SHORT_SHA}
      -t ${IMAGE_NAME}:latest
      .
    - docker push ${IMAGE_NAME}:${CI_COMMIT_SHORT_SHA}
    - docker push ${IMAGE_NAME}:latest
  only:
    - main
    - develop

# Deploy to development
deploy:dev:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl config use-context dev-cluster
    - kubectl set image deployment/myapp myapp=${IMAGE_NAME}:${CI_COMMIT_SHORT_SHA} -n development
    - kubectl rollout status deployment/myapp -n development
  environment:
    name: development
    url: https://dev.example.com
  only:
    - develop

# Deploy to production
deploy:prod:
  stage: deploy
  image: dtzar/helm-kubectl:latest   # this job needs both helm and kubectl
  script:
    - kubectl config use-context prod-cluster
    - |
      helm upgrade --install myapp ./helm/myapp \
        --namespace ${KUBE_NAMESPACE} \
        --set image.tag=${CI_COMMIT_SHORT_SHA} \
        --set replicaCount=3 \
        --values ./helm/myapp/values-prod.yaml \
        --wait \
        --timeout 5m
    - kubectl get pods -n ${KUBE_NAMESPACE} -l app=myapp
  environment:
    name: production
    url: https://example.com
  when: manual
  only:
    - main
```
GitHub Actions workflow
```yaml
# .github/workflows/deploy.yml
name: Build and Deploy to Kubernetes

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run tests
        run: |
          npm run lint
          npm run test
          npm run test:coverage

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/lcov.info

  build:
    needs: test
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Log in to Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          # format=long makes the sha tag match the full github.sha used in the deploy job
          tags: |
            type=ref,event=branch
            type=sha,prefix={{branch}}-,format=long

      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v3

      - name: Setup kubectl
        uses: azure/setup-kubectl@v3

      - name: Configure kubectl
        run: |
          echo "${{ secrets.KUBECONFIG }}" | base64 -d > kubeconfig
          # Persist for later steps; a plain `export` would not survive the step boundary
          echo "KUBECONFIG=$PWD/kubeconfig" >> "$GITHUB_ENV"

      - name: Deploy to Kubernetes
        run: |
          kubectl set image deployment/myapp \
            myapp=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:main-${{ github.sha }} \
            -n production

          kubectl rollout status deployment/myapp -n production

      - name: Verify deployment
        run: |
          kubectl get pods -n production -l app=myapp
          kubectl get svc -n production -l app=myapp
```
🎯 Production best practices
Security best-practices checklist
| Category | Best practice | How to implement |
|---|---|---|
| Image security | Use minimal base images | Alpine, Distroless |
| Image security | Scan for vulnerabilities regularly | Trivy, Clair |
| Image security | Use a private registry | Harbor, ECR |
| RBAC | Principle of least privilege | Role, RoleBinding |
| RBAC | Avoid cluster-admin | Custom ClusterRole |
| Network | Use Network Policy | Restrict Pod-to-Pod traffic |
| Network | Use a service mesh | Istio, Linkerd |
| Resources | Set resource limits | requests/limits |
| Resources | Use LimitRange | Default limits |
| Secrets | Encrypt Secrets | KMS, Sealed Secrets |
| Secrets | Rotate secrets | Update regularly |
| Auditing | Enable audit logs | Audit Policy |
| Auditing | Monitor anomalous behavior | Falco |
High-availability configuration
Multi-replica deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-app
spec:
  replicas: 5
  strategy:
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 1
  template:
    spec:
      # Pod anti-affinity: spread replicas across nodes
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - critical-app
            topologyKey: kubernetes.io/hostname
      # Scheduling priority
      priorityClassName: high-priority
      # Graceful shutdown period
      terminationGracePeriodSeconds: 60
```
Pod Disruption Budget:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-app-pdb
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: critical-app
```
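The deployment above references priorityClassName: high-priority, which must exist as a cluster-scoped PriorityClass. A sketch:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000          # higher value = scheduled (and preempting) ahead of lower classes
globalDefault: false
description: "Priority class for business-critical workloads"
```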
Resource quota management
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    persistentvolumeclaims: "20"
    pods: "100"
    services: "50"
    configmaps: "50"
    secrets: "50"

---
apiVersion: v1
kind: LimitRange
metadata:
  name: production-limits
  namespace: production
spec:
  limits:
  # Pod-level limits
  - max:
      cpu: "4"
      memory: 8Gi
    min:
      cpu: 100m
      memory: 128Mi
    type: Pod
  # Container-level limits
  - default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 250m
      memory: 256Mi
    max:
      cpu: "2"
      memory: 4Gi
    min:
      cpu: 50m
      memory: 64Mi
    type: Container
  # PVC limits
  - max:
      storage: 100Gi
    min:
      storage: 1Gi
    type: PersistentVolumeClaim
```
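After applying both objects, it is worth checking consumption against the quota; kubectl prints used versus hard limits:

```bash
kubectl describe resourcequota production-quota -n production
kubectl describe limitrange production-limits -n production
```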
Backup and disaster recovery
Velero backups:
```bash
# Install Velero
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.7.0 \
  --bucket velero-backups \
  --backup-location-config region=us-west-2 \
  --snapshot-location-config region=us-west-2 \
  --secret-file ./credentials-velero

# Create a one-off backup
velero backup create full-backup --include-namespaces production

# Recurring backups use `velero schedule`, not `velero backup`
velero schedule create daily-backup --schedule="0 2 * * *"

# Restore from a backup
velero restore create --from-backup full-backup

# Inspect backups and restores
velero backup get
velero restore get
```
📊 Summary and checklists
Key takeaways
This three-part series covers Kubernetes from first steps to production:
Part 1: Fundamentals
- K8s architecture and components
- Core resource concepts
- Installation and configuration
Part 2: Working with core resources
- kubectl command reference
- Pod, Deployment, Service
- Ingress, Volume, ConfigMap
Part 3: Advanced topics (this article)
- Autoscaling (HPA/VPA/CA)
- RBAC access control
- Network Policy
- Helm package management
- Monitoring and alerting
- Log collection
- CI/CD integration
- Production best practices
Production readiness checklist
📋 Pre-deployment
- Configure RBAC
- Define Network Policies
- Set resource limits (requests/limits)
- Create PodDisruptionBudgets
- Configure health checks (liveness/readiness)
- Enable HPA autoscaling
- Run multiple replicas for high availability
- Use Pod anti-affinity
🔐 Security
- Scan images for vulnerabilities
- Encrypt Secrets
- Restrict privileged containers
- Configure security contexts
- Enable audit logging
- Rotate secrets regularly
- Use a private registry
📊 Monitoring and logging
- Deploy Prometheus + Grafana
- Configure alert rules
- Deploy a log collection pipeline
- Set log retention policies
- Build dashboards
- Wire up alert notifications
🔄 Backup and recovery
- Configure etcd backups (see the sketch below)
- Define resource backup policies
- Test the disaster-recovery process
- Document recovery procedures
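For the etcd backup item, the standard approach on self-managed clusters is an etcdctl snapshot. A sketch, assuming the certificate paths of a typical kubeadm control plane:

```bash
# Take a snapshot of etcd (paths assume a kubeadm-style control plane)
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%Y%m%d).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-$(date +%Y%m%d).db
```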
Recommended learning resources
Certifications:
- CKA (Administrator)
- CKAD (Application Developer)
- CKS (Security Specialist)
Further study:
- Service Mesh (Istio, Linkerd)
- Operator pattern
- GitOps (ArgoCD, Flux)
- Serverless (Knative)
🎉 Closing thoughts
Kubernetes is a powerful but complex platform. Over this three-part series you have learned:
- Fundamentals: the K8s architecture and core concepts
- Hands-on operations: managing resources fluently with kubectl
- Advanced skills: production deployment and best practices
Next steps
- Practice: apply K8s in real projects
- Go deeper: explore Service Mesh and Operators
- Join the community: contribute to open source and share what you learn
- Keep optimizing: watch performance, security, and cost
Learning Kubernetes is an ongoing journey; as hands-on experience accumulates, you will be able to build ever more powerful and reliable cloud-native platforms.
Good luck on your cloud-native journey! 🚀