Cache Monitoring

Effective monitoring is essential for maintaining optimal Redis performance and availability. This guide covers monitoring tools, key metrics, alerting, and troubleshooting for DanubeData managed Redis instances.

Overview

DanubeData provides comprehensive monitoring for Redis instances:

  • Real-time Metrics: CPU, memory, connections, operations
  • Performance Tracking: Hit rates, latency, throughput
  • Historical Data: 30 days of retention
  • Custom Alerts: Email notifications for critical events
  • Slow Query Logging: Identify performance bottlenecks

Key Metrics

Memory Metrics

Used Memory

Total memory used by Redis:

  • What it shows: Current RAM consumption
  • Healthy range: 60-80% of allocated memory
  • Alert at: > 90%

Command:

redis-cli INFO memory | grep used_memory_human
# used_memory_human:2.50G

Hit Rate

Cache hit ratio:

  • Formula: hits / (hits + misses) × 100
  • Healthy range: > 90%
  • Alert at: < 80%

Command:

redis-cli INFO stats | grep keyspace
# keyspace_hits:1000000
# keyspace_misses:50000
# Hit rate: 95.2%
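
The hit rate is not reported directly; it has to be derived from the two counters above. A minimal sketch using the redis-py client (the hostname and password are placeholders for your own instance credentials):

import redis

# Placeholder connection details; substitute your instance hostname and password
r = redis.Redis(host='redis-123456.danubedata.com', password='password', ssl=True)

stats = r.info('stats')
hits = stats['keyspace_hits']
misses = stats['keyspace_misses']

# hits / (hits + misses) × 100, guarding against an empty cache
total = hits + misses
hit_rate = (hits / total) * 100 if total > 0 else 0.0
print(f"Hit rate: {hit_rate:.1f}%")   # 95.2% for the sample numbers above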

Evicted Keys

Keys removed due to memory pressure:

  • What it shows: Memory eviction activity
  • Healthy range: 0-10 keys/sec
  • Alert at: > 100 keys/sec

Command:

redis-cli INFO stats | grep evicted_keys
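
Note that evicted_keys is a cumulative counter, so the keys/sec rate has to be computed from two samples taken some interval apart. A minimal sketch with redis-py (connection details are placeholders):

import time
import redis

r = redis.Redis(host='redis-123456.danubedata.com', password='password', ssl=True)

# Sample the cumulative counter twice and divide by the interval
before = r.info('stats')['evicted_keys']
time.sleep(10)
after = r.info('stats')['evicted_keys']

rate = (after - before) / 10
print(f"Evictions: {rate:.1f} keys/sec")   # alert if this stays above 100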

Performance Metrics

Operations Per Second

Total commands processed:

  • What it shows: Workload intensity
  • Healthy range: Varies by instance size
  • Monitor for: Sudden spikes or drops

Command:

redis-cli INFO stats | grep instantaneous_ops_per_sec
# instantaneous_ops_per_sec:5432

Latency

Average command execution time:

  • What it shows: Response time
  • Healthy range: < 1ms
  • Alert at: > 10ms

Command:

redis-cli --latency -h redis-123456.danubedata.com -a password
# min: 0, max: 1, avg: 0.08 (96 samples)
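
If you prefer to measure latency from application code rather than redis-cli, a rough check is to time a series of PING round trips. This is only a sketch: it measures client-observed round-trip time, including network latency, not server-side command execution time.

import time
import redis

r = redis.Redis(host='redis-123456.danubedata.com', password='password', ssl=True)

# Time 100 PING round trips; includes network latency, not just command execution
samples = []
for _ in range(100):
    start = time.perf_counter()
    r.ping()
    samples.append((time.perf_counter() - start) * 1000)

print(f"min: {min(samples):.2f}ms, max: {max(samples):.2f}ms, "
      f"avg: {sum(samples) / len(samples):.2f}ms")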

Slow Queries

Commands exceeding threshold:

  • Default threshold: 10ms
  • Monitor: Count and patterns
  • Alert at: > 10 slow queries/minute

Command:

SLOWLOG GET 10

Connection Metrics

Connected Clients

Active client connections:

  • Healthy range: < max_connections
  • Alert at: > 80% of max_connections

Command:

redis-cli INFO clients | grep connected_clients
# connected_clients:245

Rejected Connections

Connections refused due to limits:

  • Healthy: 0
  • Alert at: > 0

Command:

redis-cli INFO stats | grep rejected_connections
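
The 80% threshold can be checked against the configured client limit. A minimal sketch with redis-py, assuming CONFIG GET is permitted on your instance (connection details are placeholders):

import redis

r = redis.Redis(host='redis-123456.danubedata.com', password='password', ssl=True)

connected = r.info('clients')['connected_clients']
rejected = r.info('stats')['rejected_connections']
max_clients = int(r.config_get('maxclients')['maxclients'])

usage_pct = (connected / max_clients) * 100
print(f"Connections: {connected}/{max_clients} ({usage_pct:.1f}%), rejected: {rejected}")
if usage_pct > 80 or rejected > 0:
    print("WARNING: approaching or exceeding the connection limit")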

Replication Metrics

Replication Lag

Time/bytes behind primary:

  • Healthy: < 1 second
  • Alert at: > 5 seconds

Command:

# On primary
redis-cli INFO replication

# On replica
redis-cli INFO replication | grep master_last_io_seconds_ago
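
A small check that runs against the replica and reads both the link status and the lag field from INFO replication. This is a sketch; the replica hostname is a placeholder for your instance's replica endpoint:

import redis

# Connect to the replica, not the primary (placeholder hostname)
replica = redis.Redis(host='redis-123456-replica.danubedata.com', password='password', ssl=True)

repl = replica.info('replication')
if repl.get('role') == 'slave':
    lag = repl.get('master_last_io_seconds_ago', -1)
    status = repl.get('master_link_status', 'unknown')
    print(f"Link: {status}, last I/O from primary: {lag}s ago")
    if status != 'up' or lag > 5:
        print("WARNING: replication is lagging or the link is down")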

Dashboard Monitoring

Overview Dashboard

Navigate to your Redis instance dashboard to view:

  • Resource Usage: CPU, memory, network
  • Performance: Ops/sec, latency, hit rate
  • Connections: Active, total, rejected
  • Replication: Lag, status, connected replicas

Real-time Graphs

Available graphs (1 hour to 30 days):

  1. Memory Usage: Used vs allocated
  2. Operations Per Second: Total command throughput
  3. Hit Rate: Cache effectiveness
  4. Connected Clients: Connection count
  5. Network I/O: Bytes in/out
  6. CPU Usage: Processor utilization

Customizing View

  1. Click Metrics tab
  2. Select time range (1h, 6h, 24h, 7d, 30d)
  3. Choose metrics to display
  4. Set refresh interval (10s, 30s, 1m)

Command-Line Monitoring

INFO Command

Get comprehensive server information:

# All info
redis-cli INFO

# Specific section
redis-cli INFO memory
redis-cli INFO stats
redis-cli INFO replication
redis-cli INFO cpu
redis-cli INFO clients

Key Statistics

# Database statistics
redis-cli INFO keyspace
# db0:keys=10000,expires=5000,avg_ttl=3600000

# Memory details
redis-cli INFO memory | grep -E 'used_memory_human|used_memory_peak_human|mem_fragmentation_ratio'

# Commands processed
redis-cli INFO stats | grep total_commands_processed

# Hit rate calculation
redis-cli INFO stats | grep -E 'keyspace_hits|keyspace_misses'

Real-time Monitoring

# Monitor all commands in real-time
redis-cli MONITOR

# Continuous stats
redis-cli --stat
------- data ------ --------------------- load -------------------- - child -
keys       mem      clients blocked requests            connections
10000      2.50G    245     0       1000000 (+0)        5000
10000      2.50G    245     0       1000050 (+50)       5000

# Latency monitoring
redis-cli --latency
min: 0, max: 2, avg: 0.15 (1234 samples)

# Big keys analysis
redis-cli --bigkeys

Slow Query Log

Monitor slow commands:

# Get last 10 slow queries
SLOWLOG GET 10

# Get slow query count
SLOWLOG LEN

# Reset slow log
SLOWLOG RESET

# Example output:
1) 1) (integer) 123      # Query ID
   2) (integer) 1634567890  # Timestamp
   3) (integer) 15000     # Execution time (microseconds)
   4) 1) "KEYS"          # Command
      2) "*"

Performance Analysis

Memory Analysis

# Memory breakdown
redis-cli INFO memory

# Key categories:
# used_memory: Total allocated by Redis
# used_memory_rss: Actual RAM used (OS perspective)
# used_memory_peak: Maximum memory used
# mem_fragmentation_ratio: RSS/used (ideal: 1.0-1.5)

# Sample objects
redis-cli --memkeys --memkeys-samples 10000
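
The fragmentation ratio is worth checking programmatically as well: a ratio well above 1.5 usually means RAM is allocated but poorly utilized, while a ratio below 1.0 suggests Redis memory is being swapped by the OS. A sketch with redis-py (connection details are placeholders):

import redis

r = redis.Redis(host='redis-123456.danubedata.com', password='password', ssl=True)

mem = r.info('memory')
ratio = mem['mem_fragmentation_ratio']
print(f"used_memory: {mem['used_memory_human']}, "
      f"peak: {mem['used_memory_peak_human']}, "
      f"fragmentation ratio: {ratio}")

if ratio > 1.5:
    print("WARNING: high fragmentation - allocated memory is not used efficiently")
elif ratio < 1.0:
    print("WARNING: ratio below 1.0 - Redis memory may be swapped to disk")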

Command Statistics

# Command stats
redis-cli INFO commandstats

# Output:
# cmdstat_get:calls=1000000,usec=500000,usec_per_call=0.50
# cmdstat_set:calls=500000,usec=300000,usec_per_call=0.60

# Most frequent commands
redis-cli INFO commandstats | sort -t= -k2 -nr | head -10

Key Space Analysis

# Keys by database
redis-cli INFO keyspace

# Sample keys
redis-cli --scan --pattern 'user:*' | head -20

# Key types distribution
redis-cli --scan | head -1000 | while read -r key; do
    redis-cli TYPE "$key"
done | sort | uniq -c
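
Spawning a redis-cli process per key gets slow for large samples. The same distribution can be collected over a single connection with redis-py's SCAN iterator; this sketch samples the first 1,000 keys seen:

from collections import Counter
import itertools
import redis

r = redis.Redis(host='redis-123456.danubedata.com', password='password', ssl=True)

# SCAN in batches and count the type of the first 1,000 keys seen
types = Counter()
for key in itertools.islice(r.scan_iter(count=1000), 1000):
    types[r.type(key).decode()] += 1

for key_type, count in types.most_common():
    print(f"{count:6d}  {key_type}")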

Alerting

Available Alerts

Configure alerts for:

  • High Memory Usage: > 90%
  • Low Hit Rate: < 80%
  • High Eviction Rate: > 100 keys/sec
  • Connection Limit: > 80% of max
  • Replication Lag: > 5 seconds
  • Instance Down: Health check failure
  • Persistence Failure: RDB/AOF save failed

Setting Up Alerts

  1. Navigate to Redis instance
  2. Click Settings > Alerts
  3. Click Add Alert
  4. Configure:
    • Metric: Select metric to monitor
    • Condition: Threshold and comparison
    • Duration: How long condition must persist
    • Notification: Email recipients
  5. Click Save Alert

Alert Examples

High Memory Alert:

Metric: Memory Usage
Condition: > 90%
Duration: 5 minutes
Action: Email admin@example.com

Low Hit Rate Alert:

Metric: Hit Rate
Condition: < 80%
Duration: 15 minutes
Action: Email team@example.com

Replication Lag Alert:

Metric: Replication Lag
Condition: > 5 seconds
Duration: 2 minutes
Action: Email ops@example.com

Monitoring Scripts

Health Check Script

import redis
import sys

def check_redis_health(host, password):
    try:
        r = redis.Redis(host=host, password=password, ssl=True)
        
        # Basic connectivity
        if not r.ping():
            print("ERROR: Cannot ping Redis")
            return False
        
        # Memory check (maxmemory may be 0 when no limit is configured)
        info = r.info('memory')
        max_memory = info.get('maxmemory', 0)
        if max_memory > 0:
            memory_used_pct = (info['used_memory'] / max_memory) * 100
            if memory_used_pct > 90:
                print(f"WARNING: Memory usage at {memory_used_pct:.1f}%")
        
        # Hit rate check
        stats = r.info('stats')
        hits = stats['keyspace_hits']
        misses = stats['keyspace_misses']
        hit_rate = (hits / (hits + misses)) * 100 if (hits + misses) > 0 else 0
        if hit_rate < 80:
            print(f"WARNING: Hit rate at {hit_rate:.1f}%")
        
        # Connection check
        clients = r.info('clients')
        if clients['connected_clients'] > 8000:  # Assuming max 10000
            print(f"WARNING: High connection count: {clients['connected_clients']}")
        
        print("HEALTHY: All checks passed")
        return True
        
    except Exception as e:
        print(f"ERROR: {e}")
        return False

if __name__ == '__main__':
    result = check_redis_health('redis-123456.danubedata.com', 'password')
    sys.exit(0 if result else 1)

Metrics Collection Script

#!/bin/bash
# collect_redis_metrics.sh

HOST="redis-123456.danubedata.com"
PASS="your_password"
OUTPUT="/var/log/redis/metrics.log"

while true; do
    TIMESTAMP=$(date +%s)
    
    # Get metrics
    MEMORY=$(redis-cli -h "$HOST" -a "$PASS" --no-auth-warning INFO memory | grep '^used_memory:' | cut -d: -f2 | tr -d '\r')
    OPS=$(redis-cli -h "$HOST" -a "$PASS" --no-auth-warning INFO stats | grep instantaneous_ops_per_sec | cut -d: -f2 | tr -d '\r')
    CLIENTS=$(redis-cli -h "$HOST" -a "$PASS" --no-auth-warning INFO clients | grep connected_clients | cut -d: -f2 | tr -d '\r')
    
    # Log metrics (INFO values carry trailing carriage returns, stripped above)
    echo "$TIMESTAMP,$MEMORY,$OPS,$CLIENTS" >> "$OUTPUT"
    
    sleep 60
done

Best Practices

Monitoring Strategy

  1. Monitor Key Metrics: Focus on memory, hit rate, latency
  2. Set Appropriate Alerts: Not too sensitive, not too lenient
  3. Regular Review: Weekly review of trends
  4. Baseline Performance: Know your normal patterns
  5. Proactive Monitoring: Catch issues before they impact users

Performance Baselines

Establish baselines for:

  • Peak Traffic: Ops/sec during busy periods
  • Average Latency: Typical response times
  • Memory Growth: Daily/weekly memory increase
  • Hit Rate: Expected cache effectiveness
  • Connection Patterns: Normal connection count

Troubleshooting Workflow

  1. Identify Symptom: What is the observed issue?
  2. Check Dashboard: Review recent metrics
  3. Run Commands: Use Redis CLI for details
  4. Review Logs: Check for errors or warnings
  5. Analyze Patterns: Look for correlations
  6. Test Fix: Implement and verify solution
  7. Document: Record issue and resolution

Common Issues and Solutions

High Memory Usage

Detection:

redis-cli INFO memory | grep used_memory_human

Solutions:

  • Set expiration on keys (see the sketch after this list)
  • Review eviction policy
  • Implement data cleanup
  • Upgrade to larger instance
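
As a starting point for cleanup, you can sample keys that never expire and assign them a TTL. This is only a sketch with redis-py; the user:* pattern and 24-hour TTL are illustrative, not recommendations:

import redis

r = redis.Redis(host='redis-123456.danubedata.com', password='password', ssl=True)

# Sample keys matching a pattern and report the ones with no expiration (TTL == -1)
no_ttl = []
for key in r.scan_iter(match='user:*', count=1000):
    if r.ttl(key) == -1:
        no_ttl.append(key)

print(f"{len(no_ttl)} sampled keys have no TTL")
for key in no_ttl[:10]:
    r.expire(key, 86400)   # example only: give the key a 24-hour expiration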

Low Hit Rate

Detection:

redis-cli INFO stats | grep keyspace

Solutions:

  • Review cache key patterns
  • Increase TTL for stable data
  • Pre-warm cache
  • Review application caching logic

High Latency

Detection:

redis-cli --latency

Solutions:

  • Check slow query log
  • Review command patterns
  • Use pipelining (see the sketch after this list)
  • Upgrade instance
  • Check network latency
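
Pipelining reduces per-command round trips by batching many requests over one connection. A minimal sketch with redis-py (the key names are placeholders):

import redis

r = redis.Redis(host='redis-123456.danubedata.com', password='password', ssl=True)

# Without pipelining: one network round trip per GET
# values = [r.get(f'user:{i}') for i in range(100)]

# With pipelining: all 100 GETs travel in a single round trip
pipe = r.pipeline(transaction=False)
for i in range(100):
    pipe.get(f'user:{i}')
values = pipe.execute()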

Connection Exhaustion

Detection:

redis-cli INFO clients | grep connected_clients

Solutions:

  • Implement connection pooling (see the sketch after this list)
  • Fix connection leaks
  • Increase max_connections
  • Review client configuration
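
A shared connection pool keeps the client count bounded instead of opening a new connection per request. A sketch with redis-py; the rediss:// URL, port, and pool size are placeholders:

import redis

# One pool per process, created at startup; the rediss:// scheme enables TLS
pool = redis.ConnectionPool.from_url(
    'rediss://:password@redis-123456.danubedata.com:6379/0',
    max_connections=50,
)

def get_client():
    # Clients created from the same pool reuse its connections
    return redis.Redis(connection_pool=pool)

r = get_client()
r.set('healthcheck', 'ok')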

Related Documentation