Cache Monitoring
Effective monitoring is essential for maintaining optimal Redis performance and availability. This guide covers monitoring tools, key metrics, alerting, and troubleshooting for DanubeData managed Redis instances.
Overview
DanubeData provides comprehensive monitoring for Redis instances:
- Real-time Metrics: CPU, memory, connections, operations
- Performance Tracking: Hit rates, latency, throughput
- Historical Data: 30 days of retention
- Custom Alerts: Email notifications for critical events
- Slow Query Logging: Identify performance bottlenecks
Key Metrics
Memory Metrics
Used Memory
Total memory used by Redis:
- What it shows: Current RAM consumption
- Healthy range: 60-80% of allocated memory
- Alert at: > 90%
Command:
redis-cli INFO memory | grep used_memory_human
# used_memory_human:2.50G
Hit Rate
Cache hit ratio:
- Formula: hits / (hits + misses) × 100
- Healthy range: > 90%
- Alert at: < 80%
Command:
redis-cli INFO stats | grep keyspace
# keyspace_hits:1000000
# keyspace_misses:50000
# Hit rate: 95.2%
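The hit rate formula can be written as a small helper. This is a minimal sketch: `hit_rate_percent` is a hypothetical name, not part of any DanubeData or Redis API. It returns 0.0 when no lookups have been recorded, which avoids a division by zero on a freshly started instance.

```python
def hit_rate_percent(hits: int, misses: int) -> float:
    """Return the cache hit rate as a percentage (0.0 if there were no lookups)."""
    total = hits + misses
    if total == 0:
        return 0.0
    return hits / total * 100


# Values taken from the sample INFO output in this section.
rate = hit_rate_percent(1_000_000, 50_000)
print(f"Hit rate: {rate:.1f}%")  # prints "Hit rate: 95.2%"
```

Both counters are cumulative since the server started (or since the statistics were last reset), so the result describes the whole uptime rather than the last few minutes.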
Evicted Keys
Keys removed due to memory pressure:
- What it shows: Memory eviction activity
- Healthy range: 0-10 keys/sec
- Alert at: > 100 keys/sec
Command:
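redis-cli INFO stats | grep evicted_keys
# evicted_keys is a cumulative counter; sample it twice and divide the difference by the interval to get keys/sec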
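`evicted_keys` reports a running total, while the healthy range and alert threshold are expressed in keys/sec. One way to bridge the two is to take two readings a known interval apart. The sketch below assumes you collect the readings yourself; `eviction_rate` is a hypothetical helper name.

```python
def eviction_rate(prev_evicted: int, curr_evicted: int, interval_seconds: float) -> float:
    """Convert two readings of the cumulative evicted_keys counter into keys/sec."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_evicted - prev_evicted
    if delta < 0:
        # The counter went backwards, so it was reset (for example by a restart).
        # Count only what has accumulated since the reset.
        delta = curr_evicted
    return delta / interval_seconds


# Two readings taken 60 seconds apart.
print(eviction_rate(1_000, 7_000, 60))  # 100.0 keys/sec
```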
Performance Metrics
Operations Per Second
Total commands processed:
- What it shows: Workload intensity
- Healthy range: Varies by instance size
- Monitor for: Sudden spikes or drops
Command:
redis-cli INFO stats | grep instantaneous_ops_per_sec
# instantaneous_ops_per_sec:5432
Latency
Round-trip response time measured from the client:
- What it shows: Response time, including network transit
- Healthy range: < 1ms
- Alert at: > 10ms
Command:
redis-cli --latency -h redis-123456.danubedata.com -a password
# min: 0, max: 1, avg: 0.08 (96 samples)
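A similar min/max/avg measurement can be taken from application code, which also captures the network path your application actually uses. The sketch below is an illustration, not the method `redis-cli` uses internally: `sample_latency_ms` is a hypothetical helper that times any zero-argument callable, so it runs without a server.

```python
import time
from typing import Callable, Dict


def sample_latency_ms(ping: Callable[[], object], samples: int = 100) -> Dict[str, float]:
    """Time repeated calls to `ping` and return min/max/avg in milliseconds."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        ping()
        timings.append((time.perf_counter() - start) * 1000)
    return {
        "min": min(timings),
        "max": max(timings),
        "avg": sum(timings) / len(timings),
        "samples": float(samples),
    }


# Stand-in callable; with a redis-py client you would pass its `ping` method instead.
result = sample_latency_ms(lambda: None, samples=10)
print(f"min: {result['min']:.2f}, max: {result['max']:.2f}, avg: {result['avg']:.2f}")
```

With a redis-py client `r`, the call would be `sample_latency_ms(r.ping)`.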
Slow Queries
Commands whose execution time exceeds the slow log threshold:
- Default threshold: 10ms (10,000 microseconds)
- Monitor: Count and patterns
- Alert at: > 10 slow queries/minute
Command:
SLOWLOG GET 10
Connection Metrics
Connected Clients
Active client connections:
- Healthy range: < 80% of max_connections
- Alert at: > 80% of max_connections
Command:
redis-cli INFO clients | grep connected_clients
# connected_clients:245
Rejected Connections
Connections refused due to limits:
- Healthy: 0
- Alert at: Any increase (the counter is cumulative)
Command:
redis-cli INFO stats | grep rejected_connections
Replication Metrics
Replication Lag
Time/bytes behind primary:
- Healthy: < 1 second
- Alert at: > 5 seconds
Command:
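# On primary: lists each connected replica with its offset and lag
redis-cli INFO replication
# On replica: seconds since the last interaction with the primary
redis-cli INFO replication | grep master_last_io_seconds_ago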
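Lag in bytes can be derived from the primary's `INFO replication` output by subtracting each replica's `offset` from `master_repl_offset`. The sketch below assumes the output has already been parsed into a dictionary in which each `slaveN` entry is itself a dictionary. That mirrors how redis-py returns `info('replication')`, but treat the exact shape as an assumption. `replica_byte_lag` is a hypothetical helper name.

```python
from typing import Dict


def replica_byte_lag(replication_info: dict) -> Dict[str, int]:
    """Return bytes behind the primary for each replica listed in INFO replication."""
    master_offset = replication_info["master_repl_offset"]
    lags = {}
    for key, value in replication_info.items():
        # Replica entries look like slave0, slave1, ... and carry their own offset.
        if key.startswith("slave") and isinstance(value, dict) and "offset" in value:
            lags[key] = master_offset - value["offset"]
    return lags


# Illustrative data only; the addresses and offsets are made up.
sample = {
    "role": "master",
    "connected_slaves": 1,
    "master_repl_offset": 1050,
    "slave0": {"ip": "10.0.0.2", "port": 6379, "state": "online", "offset": 1000, "lag": 0},
}
print(replica_byte_lag(sample))  # {'slave0': 50}
```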
Dashboard Monitoring
Overview Dashboard
Navigate to your Redis instance dashboard to view:
- Resource Usage: CPU, memory, network
- Performance: Ops/sec, latency, hit rate
- Connections: Active, total, rejected
- Replication: Lag, status, connected replicas
Real-time Graphs
Available graphs (1 hour to 30 days):
- Memory Usage: Used vs allocated
- Operations Per Second: Total command throughput
- Hit Rate: Cache effectiveness
- Connected Clients: Connection count
- Network I/O: Bytes in/out
- CPU Usage: Processor utilization
Customizing View
- Click Metrics tab
- Select time range (1h, 6h, 24h, 7d, 30d)
- Choose metrics to display
- Set refresh interval (10s, 30s, 1m)
Command-Line Monitoring
INFO Command
Get comprehensive server information:
# All info
redis-cli INFO
# Specific section
redis-cli INFO memory
redis-cli INFO stats
redis-cli INFO replication
redis-cli INFO cpu
redis-cli INFO clients
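If you capture the raw `INFO` text, for example from a scheduled `redis-cli` call, it can be turned into a dictionary with a few lines of code. This is a minimal sketch with a hypothetical `parse_info` helper. It keeps every value as a string and skips the `# Section` header lines.

```python
from typing import Dict


def parse_info(raw: str) -> Dict[str, str]:
    """Parse raw INFO output (key:value lines) into a dictionary of strings."""
    fields = {}
    for line in raw.splitlines():  # splitlines() also handles the \r\n endings INFO uses
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Memory"
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


raw_output = "# Memory\r\nused_memory:2684354560\r\nused_memory_human:2.50G\r\n"
info = parse_info(raw_output)
print(info["used_memory_human"])  # 2.50G
```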
Key Statistics
# Database statistics
redis-cli INFO keyspace
# db0:keys=10000,expires=5000,avg_ttl=3600000
# Memory details
redis-cli INFO memory | grep -E 'used_memory_human|used_memory_peak_human|mem_fragmentation_ratio'
# Commands processed
redis-cli INFO stats | grep total_commands_processed
# Hit rate calculation
redis-cli INFO stats | grep -E 'keyspace_hits|keyspace_misses'
Real-time Monitoring
# Monitor all commands in real-time (adds noticeable overhead on busy instances; use briefly)
redis-cli MONITOR
# Continuous stats
redis-cli --stat
# ------- data ------ --------------------- load -------------------- - child -
# keys       mem      clients blocked requests            connections
# 10000      2.50G    245     0       1000000 (+0)        5000
# 10000      2.50G    245     0       1000050 (+50)       5000
# Latency monitoring
redis-cli --latency
# min: 0, max: 2, avg: 0.15 (1234 samples)
# Big keys analysis
redis-cli --bigkeys
Slow Query Log
Monitor slow commands:
# Get last 10 slow queries
SLOWLOG GET 10
# Get slow query count
SLOWLOG LEN
# Reset slow log
SLOWLOG RESET
# Example output:
1) 1) (integer) 123          # Query ID
   2) (integer) 1634567890   # Timestamp
   3) (integer) 15000        # Execution time (microseconds)
   4) 1) "KEYS"              # Command
      2) "*"
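To spot patterns rather than individual entries, slow log records can be grouped by command name. The sketch below assumes each entry is a dictionary with a `duration` in microseconds and a `command` string or bytes value. That is similar to what redis-py's `slowlog_get()` returns, but the exact shape is an assumption. `summarize_slowlog` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable


def summarize_slowlog(entries: Iterable[dict]) -> Dict[str, dict]:
    """Group slow log entries by command name with a count and worst duration in ms."""
    summary = defaultdict(lambda: {"count": 0, "max_ms": 0.0})
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        parts = command.split()
        name = parts[0].upper() if parts else "(unknown)"
        stats = summary[name]
        stats["count"] += 1
        # Durations are reported in microseconds; convert to milliseconds.
        stats["max_ms"] = max(stats["max_ms"], entry["duration"] / 1000)
    return dict(summary)


entries = [
    {"id": 123, "start_time": 1634567890, "duration": 15000, "command": "KEYS *"},
    {"id": 124, "start_time": 1634567900, "duration": 12000, "command": "KEYS user:*"},
]
print(summarize_slowlog(entries))  # {'KEYS': {'count': 2, 'max_ms': 15.0}}
```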
Performance Analysis
Memory Analysis
# Memory breakdown
redis-cli INFO memory
# Key categories:
# used_memory: Total allocated by Redis
# used_memory_rss: Actual RAM used (OS perspective)
# used_memory_peak: Maximum memory used
# mem_fragmentation_ratio: RSS/used (ideal: 1.0-1.5)
# Sample objects
redis-cli --memkeys --memkeys-samples 10000
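The fragmentation ratio is `used_memory_rss` divided by `used_memory`. The sketch below computes it and checks it against the 1.0-1.5 range given above. The helper names and the labels they return are illustrative. Reading a ratio below 1.0 as possible swapping follows general Redis guidance and is worth confirming for your instance.

```python
def fragmentation_ratio(used_memory: int, used_memory_rss: int) -> float:
    """Return used_memory_rss / used_memory."""
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    return used_memory_rss / used_memory


def classify_fragmentation(ratio: float) -> str:
    """Label a ratio against the 1.0-1.5 ideal range."""
    if ratio < 1.0:
        return "possible swapping"  # the OS holds less in RAM than Redis has allocated
    if ratio <= 1.5:
        return "ok"
    return "fragmented"


ratio = fragmentation_ratio(used_memory=1_000_000, used_memory_rss=1_200_000)
print(ratio, classify_fragmentation(ratio))  # 1.2 ok
```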
Command Statistics
# Command stats
redis-cli INFO commandstats
# Output:
# cmdstat_get:calls=1000000,usec=500000,usec_per_call=0.50
# cmdstat_set:calls=500000,usec=300000,usec_per_call=0.60
# Most frequent commands
redis-cli INFO commandstats | sort -t= -k2 -nr | head -10
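The `sort` pipeline ranks commands by call count only. To rank by total time or time per call, the `cmdstat_` lines can be parsed first. This is a minimal sketch with hypothetical helper names. It assumes every field on a line is numeric, as in the sample output.

```python
from typing import Dict, List, Tuple


def parse_commandstats(raw: str) -> Dict[str, Dict[str, float]]:
    """Parse cmdstat_* lines from INFO commandstats into {command: {field: value}}."""
    stats = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("cmdstat_"):
            continue  # skip the section header and blank lines
        name, _, fields = line.partition(":")
        parsed = {}
        for pair in fields.split(","):
            key, _, value = pair.partition("=")
            parsed[key] = float(value)
        stats[name[len("cmdstat_"):]] = parsed
    return stats


def top_by(stats: Dict[str, Dict[str, float]], field: str, limit: int = 10) -> List[Tuple[str, Dict[str, float]]]:
    """Return the commands with the highest value for `field`, largest first."""
    return sorted(stats.items(), key=lambda item: item[1][field], reverse=True)[:limit]


raw = (
    "# Commandstats\r\n"
    "cmdstat_get:calls=1000000,usec=500000,usec_per_call=0.50\r\n"
    "cmdstat_set:calls=500000,usec=300000,usec_per_call=0.60\r\n"
)
stats = parse_commandstats(raw)
print([name for name, _ in top_by(stats, "calls")])          # ['get', 'set']
print([name for name, _ in top_by(stats, "usec_per_call")])  # ['set', 'get']
```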
Key Space Analysis
# Keys by database
redis-cli INFO keyspace
# Sample keys
redis-cli --scan --pattern 'user:*' | head -20
# Key types distribution
# (reads keys line by line and quotes them so keys containing spaces are handled)
redis-cli --scan | head -1000 | while IFS= read -r key; do
    redis-cli TYPE "$key"
done | sort | uniq -c
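The same sampling can be done from Python with one connection instead of one `redis-cli` process per key. The sketch below assumes a client object that offers `scan_iter(match=...)` and `type(key)`, as redis-py clients do. `key_type_distribution` is a hypothetical helper name.

```python
from collections import Counter
from itertools import islice


def key_type_distribution(client, sample_size: int = 1000, pattern: str = "*") -> Counter:
    """Count key types across up to `sample_size` keys returned by SCAN."""
    counts = Counter()
    # islice stops the scan after sample_size keys instead of walking the whole keyspace.
    for key in islice(client.scan_iter(match=pattern), sample_size):
        key_type = client.type(key)
        if isinstance(key_type, bytes):
            key_type = key_type.decode()
        counts[key_type] += 1
    return counts
```

With a redis-py client `r`, `key_type_distribution(r, sample_size=1000)` returns a `Counter` mapping type names such as `string` or `hash` to their counts in the sample.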
Alerting
Available Alerts
Configure alerts for:
- High Memory Usage: > 90%
- Low Hit Rate: < 80%
- High Eviction Rate: > 100 keys/sec
- Connection Limit: > 80% of max
- Replication Lag: > 5 seconds
- Instance Down: Health check failure
- Persistence Failure: RDB/AOF save failed
Setting Up Alerts
- Navigate to Redis instance
- Click Settings > Alerts
- Click Add Alert
- Configure:
  - Metric: Select metric to monitor
  - Condition: Threshold and comparison
  - Duration: How long condition must persist
  - Notification: Email recipients
- Click Save Alert
Alert Examples
High Memory Alert:
Metric: Memory Usage
Condition: > 90%
Duration: 5 minutes
Action: Email admin@example.com
Low Hit Rate Alert:
Metric: Hit Rate
Condition: < 80%
Duration: 15 minutes
Action: Email team@example.com
Replication Lag Alert:
Metric: Replication Lag
Condition: > 5 seconds
Duration: 2 minutes
Action: Email ops@example.com
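The Duration setting means a brief spike past the threshold should not trigger a notification. The sketch below illustrates that idea on a list of timestamped samples. It is not DanubeData's alerting implementation, and `alert_should_fire` is a hypothetical name.

```python
from typing import Iterable, Tuple


def alert_should_fire(
    samples: Iterable[Tuple[float, float]],
    threshold: float,
    duration_seconds: float,
    above: bool = True,
) -> bool:
    """Return True if the condition held continuously for at least duration_seconds.

    `samples` are (timestamp_seconds, value) pairs in chronological order.
    """
    breach_started = None
    for timestamp, value in samples:
        breached = value > threshold if above else value < threshold
        if not breached:
            breach_started = None  # the condition cleared, so the timer restarts
            continue
        if breach_started is None:
            breach_started = timestamp
        if timestamp - breach_started >= duration_seconds:
            return True
    return False


# Memory usage (%) sampled once a minute; rule: > 90% for 5 minutes.
memory = [(0, 91), (60, 92), (120, 93), (180, 95), (240, 94), (300, 96)]
print(alert_should_fire(memory, threshold=90, duration_seconds=300))  # True
```

A single dip back under the threshold resets the timer, which is why a longer duration suits noisy metrics such as hit rate.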
Monitoring Scripts
Health Check Script
import sys

import redis


def check_redis_health(host, password):
    """Run basic checks; return False only if Redis is unreachable or a check errors."""
    warnings = 0
    try:
        r = redis.Redis(host=host, password=password, ssl=True)

        # Basic connectivity
        if not r.ping():
            print("ERROR: Cannot ping Redis")
            return False

        # Memory check (maxmemory is 0 when no limit is configured)
        info = r.info('memory')
        if info.get('maxmemory', 0) > 0:
            memory_used_pct = (info['used_memory'] / info['maxmemory']) * 100
            if memory_used_pct > 90:
                print(f"WARNING: Memory usage at {memory_used_pct:.1f}%")
                warnings += 1

        # Hit rate check (skipped until the cache has served at least one lookup)
        stats = r.info('stats')
        hits = stats['keyspace_hits']
        misses = stats['keyspace_misses']
        if hits + misses > 0:
            hit_rate = (hits / (hits + misses)) * 100
            if hit_rate < 80:
                print(f"WARNING: Hit rate at {hit_rate:.1f}%")
                warnings += 1

        # Connection check
        clients = r.info('clients')
        if clients['connected_clients'] > 8000:  # Assuming max 10000
            print(f"WARNING: High connection count: {clients['connected_clients']}")
            warnings += 1

        # Report healthy only when no warning was raised
        if warnings == 0:
            print("HEALTHY: All checks passed")
        return True
    except Exception as e:
        print(f"ERROR: {e}")
        return False


if __name__ == '__main__':
    result = check_redis_health('redis-123456.danubedata.com', 'password')
    sys.exit(0 if result else 1)
Metrics Collection Script
#!/bin/bash
# collect_redis_metrics.sh
HOST="redis-123456.danubedata.com"
PASS="your_password"
OUTPUT="/var/log/redis/metrics.log"

# Print one INFO field; strip the trailing carriage return from redis-cli output
get_metric() {
    redis-cli -h "$HOST" -a "$PASS" --no-auth-warning INFO "$1" | grep "^$2:" | cut -d: -f2 | tr -d '\r'
}

while true; do
    TIMESTAMP=$(date +%s)
    # Get metrics
    MEMORY=$(get_metric memory used_memory)
    OPS=$(get_metric stats instantaneous_ops_per_sec)
    CLIENTS=$(get_metric clients connected_clients)
    # Log metrics as CSV: timestamp,used_memory_bytes,ops_per_sec,connected_clients
    echo "$TIMESTAMP,$MEMORY,$OPS,$CLIENTS" >> "$OUTPUT"
    sleep 60
done
Best Practices
Monitoring Strategy
- Monitor Key Metrics: Focus on memory, hit rate, latency
- Set Appropriate Alerts: Choose thresholds and durations that catch real problems without firing on brief spikes
- Regular Review: Weekly review of trends
- Baseline Performance: Know your normal patterns
- Proactive Monitoring: Catch issues before they impact users
Performance Baselines
Establish baselines for:
- Peak Traffic: Ops/sec during busy periods
- Average Latency: Typical response times
- Memory Growth: Daily/weekly memory increase
- Hit Rate: Expected cache effectiveness
- Connection Patterns: Normal connection count
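A baseline can be as simple as the mean, 95th percentile, and peak of a metric over a representative period. The sketch below computes these from a list of samples, such as ops/sec readings collected once a minute. The function names and the 50% tolerance are illustrative assumptions, not DanubeData defaults.

```python
import statistics
from typing import Dict, Sequence


def baseline(samples: Sequence[float]) -> Dict[str, float]:
    """Summarize samples as mean, 95th percentile (nearest rank), and peak."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank percentile: ceil(0.95 * n) as a 1-based rank, in integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "mean": statistics.fmean(ordered),
        "p95": ordered[rank - 1],
        "peak": ordered[-1],
    }


def deviates(value: float, base: Dict[str, float], tolerance: float = 0.5) -> bool:
    """Return True if value differs from the baseline mean by more than tolerance (a fraction)."""
    return abs(value - base["mean"]) > tolerance * base["mean"]


ops_per_sec = list(range(1, 101))  # placeholder data: 1, 2, ..., 100
base = baseline(ops_per_sec)
print(base)  # {'mean': 50.5, 'p95': 95, 'peak': 100}
print(deviates(120, base))  # True
```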
Troubleshooting Workflow
- Identify Symptom: What is the observed issue?
- Check Dashboard: Review recent metrics
- Run Commands: Use Redis CLI for details
- Review Logs: Check for errors or warnings
- Analyze Patterns: Look for correlations
- Test Fix: Implement and verify solution
- Document: Record issue and resolution
Common Issues and Solutions
High Memory Usage
Detection:
redis-cli INFO memory | grep used_memory_human
Solutions:
- Set expiration on keys
- Review eviction policy
- Implement data cleanup
- Upgrade to larger instance
Low Hit Rate
Detection:
redis-cli INFO stats | grep keyspace
Solutions:
- Review cache key patterns
- Increase TTL for stable data
- Pre-warm cache
- Review application caching logic
High Latency
Detection:
redis-cli --latency
Solutions:
- Check slow query log
- Review command patterns
- Use pipelining
- Upgrade instance
- Check network latency
Connection Exhaustion
Detection:
redis-cli INFO clients | grep connected_clients
Solutions:
- Implement connection pooling
- Fix connection leaks
- Increase max_connections
- Review client configuration