DevOps Engineering FAQ
Last updated: Apr 3, 2026

Table of Contents
- Estimating Kubernetes Resources from QPS/Traffic
- 1) Start with a Mental Model
- 2) Measure Per-Request Cost (the MOST Important Step)
- 3) Estimate Concurrency from QPS
- 4) Calculate Total Resource Needs
- 5) Translate into Pod Sizing
- 6) Set Kubernetes Requests & Limits
- 7) Example End-to-End
- 8) Real-World Factors
- 9) Use Autoscaling (Critical)
- 10) Practical Workflow (What Good Teams Do)
- 11) Golden Shortcuts (When You Lack Data)
Estimating Kubernetes Resources from QPS/Traffic
Estimating Kubernetes (K8s) resources from QPS/traffic is not an exact formula: it is a structured process of modeling, measurement, and iteration. As a DevOps engineer, you are essentially translating workload characteristics -> resource consumption -> K8s requests/limits.
1) Start with a Mental Model
At a high level:
Total CPU ≈ QPS × CPU cost per request
Total Memory ≈ baseline + (concurrency × memory per request)
So you need to answer 3 key questions:
- How expensive is one request?
- How many requests happen at the same time (concurrency)?
- What’s the baseline overhead of the app?
2) Measure Per-Request Cost (the MOST Important Step)
You cannot guess this accurately—you must benchmark.
Run a load test (e.g. k6, Locust, or JMeter) that simulates realistic traffic, and measure:
- CPU usage at stable QPS
- Memory usage
- Latency (p50, p95, p99)
Derive:
CPU per request:
CPU per request = total CPU used / QPS
Example:
- 2 CPU cores used at 200 QPS
CPU per request = 2 / 200 = 0.01 cores = 10 millicores
Memory per request:
Memory does NOT scale linearly per request; it depends on concurrency.
Memory per request ≈ (peak memory - baseline memory) / concurrent requests
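In code, the derivation is straightforward arithmetic. A minimal sketch in Python, with made-up benchmark numbers standing in for real load-test results:

```python
# Illustrative benchmark results -- replace with your own load-test data.
total_cpu_cores = 2.0      # CPU used by the service at stable QPS
qps = 200                  # sustained requests per second during the test
baseline_mem_mb = 150.0    # memory at idle (no traffic)
peak_mem_mb = 250.0        # memory under load
concurrent_requests = 40   # in-flight requests at peak (from metrics)

cpu_per_request_m = total_cpu_cores / qps * 1000           # millicores
mem_per_request_mb = (peak_mem_mb - baseline_mem_mb) / concurrent_requests

print(f"{cpu_per_request_m:.0f}m CPU, {mem_per_request_mb:.1f}MB per request")
# 10m CPU, 2.5MB per request
```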
3) Estimate Concurrency from QPS
Use Little’s Law:
Concurrency = QPS × Avg Response Time
Example:
- QPS = 500
- Avg latency = 200ms = 0.2s
Concurrency = 500 × 0.2 = 100 requests in-flight
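The same calculation, as a sanity-check snippet:

```python
qps = 500
avg_latency_s = 0.200  # 200ms

# Little's Law: in-flight requests = arrival rate x time in system
concurrency = qps * avg_latency_s
print(concurrency)  # 100.0
```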
4) Calculate Total Resource Needs
CPU
Total CPU = QPS × CPU per request
Add headroom (important):
Final CPU = Total CPU × 1.5-2
Memory
Total Memory = Baseline + (Concurrency × Memory per request)
Add buffer (GC, spikes):
Final Memory = Total Memory × 1.3-1.5
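A small helper that combines both formulas with the buffers suggested above (the buffer factors are the text's rules of thumb, not laws, and the sample numbers are illustrative):

```python
def total_with_headroom(qps, cpu_per_req_m, concurrency,
                        baseline_mem_mb, mem_per_req_mb,
                        cpu_buffer=1.5, mem_buffer=1.3):
    """Total resource estimate with rule-of-thumb buffers applied."""
    cpu_m = qps * cpu_per_req_m * cpu_buffer                         # millicores
    mem_mb = (baseline_mem_mb + concurrency * mem_per_req_mb) * mem_buffer
    return cpu_m, mem_mb

# 500 QPS at 10m/request, 100 in-flight, 150MB baseline, 2.5MB/request
cpu_m, mem_mb = total_with_headroom(500, 10, 100, 150, 2.5)
# cpu_m = 7500.0 (7.5 cores), mem_mb = 520.0
```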
5) Translate into Pod Sizing
Now decide:
- Option A: Scale vertically (bigger pods)
- Option B: Scale horizontally (more pods, preferred)
Per-pod sizing strategy
Pick a target QPS per pod:
Example:
- Total QPS = 1000
- Want ~100 QPS per pod
- 1000 / 100 -> 10 pods
Per-pod CPU:
CPU per pod = QPS per pod × CPU per request
Per-pod Memory:
Memory per pod = baseline + (concurrency per pod × memory per request)
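The per-pod split can be sketched as a function. One assumption worth flagging: here each pod carries its own full memory baseline, which is usually how runtimes behave in practice.

```python
import math

def per_pod_sizing(total_qps, target_qps_per_pod, cpu_per_req_m,
                   avg_latency_s, baseline_mem_mb, mem_per_req_mb):
    """Split total traffic across pods and size each pod (illustrative)."""
    pods = math.ceil(total_qps / target_qps_per_pod)
    pod_qps = total_qps / pods
    pod_cpu_m = pod_qps * cpu_per_req_m
    pod_concurrency = pod_qps * avg_latency_s            # Little's Law per pod
    pod_mem_mb = baseline_mem_mb + pod_concurrency * mem_per_req_mb
    return pods, pod_cpu_m, pod_mem_mb

# 1000 QPS total, ~100 QPS/pod, 5m/request, 100ms latency,
# 200MB baseline per pod, 1MB per in-flight request
print(per_pod_sizing(1000, 100, 5, 0.1, 200, 1))
# (10, 500.0, 210.0)
```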
6) Set Kubernetes Requests & Limits
In Kubernetes:
- requests = guaranteed resources (scheduler uses this)
- limits = max allowed
CPU
- requests = ~70-80% of expected usage
- limits = ~1.5-2x requests
Memory
- requests = close to actual usage
- limits = slightly above (OOM is dangerous)
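These rules of thumb can be encoded as a small helper (the 75% and 2x factors below are one point inside the suggested ranges, not fixed values):

```python
def suggest_requests_limits(expected_cpu_m, expected_mem_mi):
    """Turn expected steady-state usage into request/limit suggestions."""
    cpu_request = int(expected_cpu_m * 0.75)   # ~70-80% of expected usage
    mem_request = int(expected_mem_mi)         # close to actual usage
    return {
        "requests": {"cpu": f"{cpu_request}m", "memory": f"{mem_request}Mi"},
        "limits": {
            "cpu": f"{cpu_request * 2}m",             # ~1.5-2x requests
            "memory": f"{int(mem_request * 1.2)}Mi",  # only slightly above
        },
    }

print(suggest_requests_limits(500, 64))
# {'requests': {'cpu': '375m', 'memory': '64Mi'},
#  'limits': {'cpu': '750m', 'memory': '76Mi'}}
```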
7) Example End-to-End
Given
- QPS = 1000
- Avg latency = 100ms (0.1s)
- CPU per request = 5 millicores
- Baseline memory = 200MB
- Memory per request = 1MB
Step 1: Concurrency
Concurrency = 1000 × 0.1 = 100
Step 2: Total CPU
CPU = 1000 × 5m = 5000m = 5 cores
With buffer (×1.5-2) -> 7.5-10 cores
Step 3: Total Memory
Memory = 200MB + (100 × 1MB) = 300MB
With buffer -> ~400MB
Step 4: Pod split (10 pods)
Per pod:
- QPS = 100
- CPU = 0.5 core
- Memory ≈ 40MB (400MB total / 10 pods); note that in practice each pod carries its own runtime baseline, so real per-pod memory is higher
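The arithmetic above can be double-checked with a few lines of Python:

```python
qps, latency_s = 1000, 0.1
cpu_per_req_m, baseline_mb, mem_per_req_mb = 5, 200, 1
pods = 10

concurrency = qps * latency_s                              # 100 in-flight
total_cpu_m = qps * cpu_per_req_m                          # 5000m = 5 cores
total_mem_mb = baseline_mb + concurrency * mem_per_req_mb  # 300MB
buffered_mem_mb = total_mem_mb * 1.3                       # 390MB, ~400MB
per_pod_cpu_m = total_cpu_m / pods                         # 500m per pod
```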
Step 5: K8s config
```yaml
resources:
  requests:
    cpu: "500m"
    memory: "64Mi"
  limits:
    cpu: "1000m"
    memory: "128Mi"
```
8) Real-World Factors
Traffic Shape
- Burst vs steady
- Peak vs average (always design for peak)
Workload Type
Different apps behave VERY differently:
| Type | CPU | Memory |
|---|---|---|
| CPU-bound (encoding, crypto) | high | low |
| IO-bound (API calls) | low | moderate |
| JVM apps | moderate | high |
| Node.js | low | moderate |
GC & Runtime Overhead
- JVM -> large memory buffer needed
- Go -> more predictable
- Python -> spikes possible
External Dependencies
If your app waits on DB, Redis, or external APIs:
- CPU per request is LOW but concurrency is HIGH
9) Use Autoscaling (Critical)
HPA (Horizontal Pod Autoscaler)
Scale based on:
- CPU (default)
- Or better -> custom metrics:
- QPS
- Latency
- Queue length
Rule of thumb
- CPU target: ~60-70%
- Keep headroom for spikes
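As a sketch, a CPU-based HPA targeting ~65% utilization could look like this (the name and Deployment reference are placeholders, and the replica bounds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-api-hpa          # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api            # placeholder Deployment
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65   # keep headroom for spikes
```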
10) Practical Workflow (What Good Teams Do)
- Benchmark (load test)
- Derive per-request cost
- Estimate peak QPS
- Calculate baseline resources
- Deploy with conservative buffer
- Observe (Prometheus/Grafana)
- Tune requests/limits
- Add HPA
11) Golden Shortcuts (When You Lack Data)
If no benchmark yet:
- Start with:
- CPU: 100-500m per pod
- Memory: 128-512Mi
- Load test ASAP
- Adjust aggressively