Database performance is often the bottleneck in web applications. Over the past year, we’ve transformed our database layer from a source of constant performance issues into a highly optimized system that consistently delivers low, predictable response times.
The Performance Crisis
Our journey began with a crisis. Our main application database was experiencing:
- Query timeouts: 15% of queries taking over 30 seconds
- Connection pool exhaustion: Regular 503 errors during peak traffic
- Cascading failures: Slow queries blocking other operations
- Poor user experience: Page load times exceeding 5 seconds
The root cause analysis revealed multiple issues across our database architecture.
Systematic Performance Analysis
1. Query Analysis and Profiling
The first step was understanding where time was being spent:
```sql
-- Enable query logging for analysis (ALTER SYSTEM changes need a reload to take effect)
ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET log_duration = on;
ALTER SYSTEM SET log_min_duration_statement = 1000; -- Log queries > 1s
SELECT pg_reload_conf();

-- Analyze slow queries (requires the pg_stat_statements extension,
-- which must also be listed in shared_preload_libraries)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT
    query,
    calls,
    total_time,  -- total_exec_time on PostgreSQL 13+
    mean_time,   -- mean_exec_time on PostgreSQL 13+
    rows
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;
```
This revealed our top performance killers:
```sql
-- Problematic query #1: N+1 query pattern
SELECT * FROM users WHERE id = $1; -- Called 1000+ times per request

-- Problematic query #2: Missing index
SELECT * FROM orders
WHERE created_at BETWEEN $1 AND $2
  AND status = 'pending'
ORDER BY created_at DESC; -- 45 second execution time
```
2. Index Strategy Overhaul
Strategic indexing provided immediate improvements:
```sql
-- Composite index for filtering and sorting
CREATE INDEX CONCURRENTLY idx_orders_status_created
ON orders (status, created_at DESC)
WHERE status IN ('pending', 'processing');

-- Partial index for active users
CREATE INDEX CONCURRENTLY idx_users_active_email
ON users (email)
WHERE deleted_at IS NULL;

-- Expression index for case-insensitive searches
CREATE INDEX CONCURRENTLY idx_users_lower_email
ON users (LOWER(email));
```
Results:
- Query time reduced from 45s to 12ms (99.97% improvement)
- Index size kept minimal with partial indexes
- No measurable impact on write performance
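To confirm that the planner is actually using the new composite index (rather than the improvement coming from elsewhere), a quick plan check helps. Here is a sketch with EXPLAIN ANALYZE, using illustrative parameter values:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-02-01'
  AND status = 'pending'
ORDER BY created_at DESC;
-- The plan should show an index scan on idx_orders_status_created
-- instead of a sequential scan followed by an explicit sort.
```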
3. Query Optimization Techniques
Eliminating N+1 Queries
```go
// Before: N+1 queries
func GetUsersWithOrders(userIDs []string) ([]*User, error) {
	users := make([]*User, 0, len(userIDs))

	for _, id := range userIDs {
		user, err := db.GetUser(id) // 1 query per user
		if err != nil {
			return nil, err
		}

		orders, err := db.GetOrdersByUserID(id) // 1 query per user
		if err != nil {
			return nil, err
		}

		user.Orders = orders
		users = append(users, user)
	}

	return users, nil
}

// After: 2 queries total
func GetUsersWithOrders(userIDs []string) ([]*User, error) {
	// Single query to get all users
	users, err := db.GetUsersByIDs(userIDs)
	if err != nil {
		return nil, err
	}

	// Single query to get all orders
	orders, err := db.GetOrdersByUserIDs(userIDs)
	if err != nil {
		return nil, err
	}

	// Group orders by user ID in memory
	orderMap := groupOrdersByUserID(orders)

	for _, user := range users {
		user.Orders = orderMap[user.ID]
	}

	return users, nil
}
```
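The GetUsersByIDs and GetOrdersByUserIDs helpers aren't shown above; conceptually each collapses into a single set-based query. A sketch of the underlying SQL, assuming PostgreSQL array parameters and an orders.user_id foreign key:

```sql
-- All requested users in one round trip
SELECT * FROM users WHERE id = ANY($1);

-- All of their orders in one round trip, grouped in application code afterwards
SELECT * FROM orders WHERE user_id = ANY($1);
```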
Query Restructuring
```sql
-- Before: Inefficient subquery
SELECT * FROM products p
WHERE p.category_id IN (
    SELECT c.id FROM categories c
    WHERE c.name LIKE '%electronics%'
);

-- After: Efficient join
SELECT p.*
FROM products p
INNER JOIN categories c ON p.category_id = c.id
WHERE c.name LIKE '%electronics%';
```
Caching Architecture
Multi-Level Caching Strategy
```go
type CacheHierarchy struct {
	l1 *LocalCache // Application-level cache
	l2 *RedisCache // Distributed cache
	l3 *Database   // Source of truth
}

func (ch *CacheHierarchy) GetUser(id string) (*User, error) {
	// L1: Check local cache
	if user, found := ch.l1.Get("user:" + id); found {
		return user.(*User), nil
	}

	// L2: Check Redis; fall through to the database on a corrupt entry
	if userData, err := ch.l2.Get("user:" + id); err == nil {
		user := &User{}
		if err := json.Unmarshal(userData, user); err == nil {
			// Populate L1 cache
			ch.l1.Set("user:"+id, user, time.Minute*5)
			return user, nil
		}
	}

	// L3: Query database
	user, err := ch.l3.GetUser(id)
	if err != nil {
		return nil, err
	}

	// Populate both cache levels
	if userData, err := json.Marshal(user); err == nil {
		ch.l2.Set("user:"+id, userData, time.Hour)
	}
	ch.l1.Set("user:"+id, user, time.Minute*5)

	return user, nil
}
```
Cache Invalidation Strategy
```go
type CacheInvalidator struct {
	redis    *redis.Client
	local    cache.Cache
	patterns map[string][]string
}

func (ci *CacheInvalidator) InvalidateUser(userID string) error {
	keys := []string{
		"user:" + userID,
		"user_orders:" + userID,
		"user_preferences:" + userID,
	}

	// Invalidate Redis cache
	for _, key := range keys {
		if err := ci.redis.Del(key).Err(); err != nil {
			return err
		}
	}

	// Invalidate local cache
	for _, key := range keys {
		ci.local.Delete(key)
	}

	return nil
}
```
Connection Pool Optimization
Database Connection Management
```go
type DBConfig struct {
	MaxOpenConns    int
	MaxIdleConns    int
	ConnMaxLifetime time.Duration
	ConnMaxIdleTime time.Duration
}

func optimizeConnectionPool(db *sql.DB) *sql.DB {
	config := &DBConfig{
		MaxOpenConns:    50, // Based on server capacity
		MaxIdleConns:    25, // Half of max open
		ConnMaxLifetime: 30 * time.Minute,
		ConnMaxIdleTime: 10 * time.Minute,
	}

	db.SetMaxOpenConns(config.MaxOpenConns)
	db.SetMaxIdleConns(config.MaxIdleConns)
	db.SetConnMaxLifetime(config.ConnMaxLifetime)
	db.SetConnMaxIdleTime(config.ConnMaxIdleTime)

	return db
}
```
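When choosing MaxOpenConns, it's worth checking the server-side limits first: the sum of open connections across every application instance has to stay below what the server allows. On PostgreSQL, a quick check:

```sql
-- Server-wide connection limit and the slots reserved for superusers
SHOW max_connections;
SHOW superuser_reserved_connections;
```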
Connection Pool Monitoring
```go
func monitorConnectionPool(db *sql.DB) {
	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()

	var lastWaitCount int64

	for range ticker.C {
		stats := db.Stats()

		metrics.Gauge("db.connections.open").Set(float64(stats.OpenConnections))
		metrics.Gauge("db.connections.idle").Set(float64(stats.Idle))
		metrics.Gauge("db.connections.in_use").Set(float64(stats.InUse))

		// WaitCount is cumulative, so report only the delta since the last tick
		metrics.Counter("db.connections.wait_count").Add(float64(stats.WaitCount - lastWaitCount))
		lastWaitCount = stats.WaitCount

		if stats.WaitCount > 0 {
			log.Warn("Database connection pool under pressure",
				"wait_count", stats.WaitCount,
				"wait_duration", stats.WaitDuration)
		}
	}
}
```
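The pool statistics above describe the client side; it's also useful to cross-check what the server sees. On PostgreSQL, pg_stat_activity breaks connections down by state, for example:

```sql
-- Server-side view of connections by state (active, idle, idle in transaction, ...)
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state
ORDER BY count(*) DESC;
```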
Read Replica Strategy
Load Balancing Reads and Writes
```go
type DatabaseCluster struct {
	master   *sql.DB
	replicas []*sql.DB
	selector ReplicaSelector
}

type QueryContext struct {
	Type     QueryType
	UserID   string
	Priority Priority
}

func (dc *DatabaseCluster) Query(ctx QueryContext, query string, args ...interface{}) (*sql.Rows, error) {
	switch ctx.Type {
	case QueryTypeWrite, QueryTypeTransactional:
		return dc.master.Query(query, args...)

	case QueryTypeRead:
		replica := dc.selector.SelectReplica(ctx)
		return replica.Query(query, args...)

	default:
		return nil, errors.New("unknown query type")
	}
}

// Weighted round-robin replica selection
type WeightedRoundRobin struct {
	replicas []ReplicaWithWeight
	current  int
	mutex    sync.Mutex
}

func (wrr *WeightedRoundRobin) SelectReplica(ctx QueryContext) *sql.DB {
	wrr.mutex.Lock()
	defer wrr.mutex.Unlock()

	// High priority queries go to the least loaded replica
	if ctx.Priority == PriorityHigh {
		return wrr.selectLeastLoaded()
	}

	// Standard round-robin for normal queries
	// (weight handling is elided here; each replica gets an equal share)
	replica := wrr.replicas[wrr.current%len(wrr.replicas)]
	wrr.current++
	return replica.DB
}
```
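Routing reads to replicas also means accepting some replication lag, so lag is worth monitoring alongside the selector. On a PostgreSQL streaming replica, a rough check looks like this (the value reads high when the primary has been idle):

```sql
-- Run on a replica: how far behind the primary is this node?
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
```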
Query Result Optimization
Pagination Best Practices
```sql
-- Inefficient: OFFSET becomes slow with large offsets
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 20 OFFSET 10000;

-- Efficient: Cursor-based pagination
SELECT * FROM posts
WHERE created_at < $1 -- cursor from previous page
ORDER BY created_at DESC
LIMIT 20;
```
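One caveat with a created_at-only cursor: if several posts share a timestamp, rows can be skipped or duplicated at page boundaries. A common refinement is a composite cursor over (created_at, id), sketched here:

```sql
-- The cursor carries both the timestamp and the id of the last row on the previous page
SELECT * FROM posts
WHERE (created_at, id) < ($1, $2)
ORDER BY created_at DESC, id DESC
LIMIT 20;
```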
```go
type CursorPagination struct {
	Cursor string `json:"cursor"`
	Limit  int    `json:"limit"`
}

func (p *PostService) GetPosts(pagination CursorPagination) (*PostsResponse, error) {
	cursor := time.Now()

	if pagination.Cursor != "" {
		parsed, err := time.Parse(time.RFC3339, pagination.Cursor)
		if err != nil {
			return nil, err
		}
		cursor = parsed
	}

	query := `
        SELECT id, title, content, created_at
        FROM posts
        WHERE created_at < $1
        ORDER BY created_at DESC
        LIMIT $2`

	// Fetch one extra row to detect whether another page exists
	rows, err := p.db.Query(query, cursor, pagination.Limit+1)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	posts := make([]*Post, 0, pagination.Limit)
	hasMore := false

	for rows.Next() {
		post := &Post{}
		if err := rows.Scan(&post.ID, &post.Title, &post.Content, &post.CreatedAt); err != nil {
			return nil, err
		}

		if len(posts) == pagination.Limit {
			hasMore = true
			break
		}

		posts = append(posts, post)
	}
	if err := rows.Err(); err != nil {
		return nil, err
	}

	// The next cursor is the created_at of the last post actually returned,
	// so the following page continues immediately after it
	var nextCursor string
	if hasMore && len(posts) > 0 {
		nextCursor = posts[len(posts)-1].CreatedAt.Format(time.RFC3339)
	}

	return &PostsResponse{
		Posts:      posts,
		NextCursor: nextCursor,
	}, nil
}
```
Selective Field Loading
```go
// Don't load unnecessary data
type UserSummary struct {
	ID    string `json:"id"`
	Name  string `json:"name"`
	Email string `json:"email"`
}

func (us *UserService) GetUserSummaries(ids []string) ([]*UserSummary, error) {
	query := `
        SELECT id, name, email
        FROM users
        WHERE id = ANY($1)
        AND deleted_at IS NULL`

	rows, err := us.db.Query(query, pq.Array(ids))
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	summaries := make([]*UserSummary, 0, len(ids))
	for rows.Next() {
		summary := &UserSummary{}
		if err := rows.Scan(&summary.ID, &summary.Name, &summary.Email); err != nil {
			return nil, err
		}
		summaries = append(summaries, summary)
	}

	return summaries, rows.Err()
}
```
Monitoring and Alerting
Key Metrics Dashboard
```go
type DatabaseMetrics struct {
	QueryLatencyP50       time.Duration
	QueryLatencyP95       time.Duration
	QueryLatencyP99       time.Duration
	QueriesPerSecond      float64
	ErrorRate             float64
	CacheHitRate          float64
	ConnectionUtilization float64
}

func collectMetrics(db *sql.DB, cache cache.Cache) {
	ticker := time.NewTicker(time.Second * 10)
	defer ticker.Stop()

	for range ticker.C {
		stats := db.Stats()

		// Connection metrics
		metrics.Gauge("db.connections.utilization").Set(
			float64(stats.InUse) / float64(stats.MaxOpenConnections))

		// Query performance metrics
		queryStats := getQueryStats()
		metrics.Histogram("db.query.latency").Observe(queryStats.MeanLatency)
		metrics.Counter("db.queries.total").Add(queryStats.Count)

		// Cache metrics (guard against division by zero on an empty cache)
		cacheStats := cache.Stats()
		if total := cacheStats.Hits + cacheStats.Misses; total > 0 {
			metrics.Gauge("cache.hit_rate").Set(float64(cacheStats.Hits) / float64(total))
		}
	}
}
```
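Alongside the application cache hit rate, the database's own buffer cache hit ratio is a useful dashboard metric; on PostgreSQL it can be derived from pg_stat_database, for example:

```sql
-- Share of block reads served from shared buffers for the current database
SELECT round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 4) AS buffer_cache_hit_ratio
FROM pg_stat_database
WHERE datname = current_database();
```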
Automated Performance Alerting
```yaml
# Prometheus alerting rules
groups:
  - name: database_performance
    rules:
      - alert: HighQueryLatency
        expr: histogram_quantile(0.95, sum(rate(db_query_duration_seconds_bucket[5m])) by (le)) > 1
        for: 2m
        annotations:
          summary: "Database query latency is high"

      - alert: LowCacheHitRate
        expr: cache_hit_rate < 0.8
        for: 5m
        annotations:
          summary: "Cache hit rate is below optimal threshold"

      - alert: DatabaseConnectionsHigh
        expr: db_connections_utilization > 0.8
        for: 1m
        annotations:
          summary: "Database connection pool utilization is high"
```
Results and Impact
Our systematic approach to database optimization yielded significant improvements:
Performance Improvements
- Query latency: 95th percentile reduced from 15s to 50ms
- Cache hit rate: Increased from 60% to 95%
- Connection pool efficiency: Eliminated connection timeouts
- Error rate: Reduced database errors by 99.8%
Business Impact
- Page load times: Improved from 5s to under 500ms
- User satisfaction: 40% increase in user engagement metrics
- Infrastructure costs: 30% reduction in database server requirements
- Developer productivity: Faster development cycles with reliable performance
Key Takeaways
- Measure first: Use profiling tools to identify actual bottlenecks
- Index strategically: Create indexes that support your most common query patterns
- Cache intelligently: Implement multi-level caching with proper invalidation
- Optimize connections: Right-size connection pools and monitor utilization
- Read replicas: Distribute read load to improve overall performance
- Monitor continuously: Set up comprehensive metrics and alerting
Database performance optimization is an iterative process. Start with the biggest impact changes (usually indexing and caching), then progressively optimize based on monitoring data and changing usage patterns.
The investment in database performance pays dividends not just in user experience, but also in system reliability, development velocity, and infrastructure costs.