Cache Monitoring

Effective monitoring is essential for maintaining optimal Redis performance and availability. This guide covers monitoring tools, key metrics, alerting, and troubleshooting for DanubeData managed Redis instances.

Overview

DanubeData provides comprehensive monitoring for Redis instances:

  • Real-time Metrics: CPU, memory, connections, operations
  • Performance Tracking: Hit rates, latency, throughput
  • Historical Data: 30 days of retention
  • Custom Alerts: Email notifications for critical events
  • Slow Query Logging: Identify performance bottlenecks

Key Metrics

Memory Metrics

Used Memory

Total memory used by Redis:

  • What it shows: Current RAM consumption
  • Healthy range: 60-80% of allocated memory
  • Alert at: > 90%

Command:

redis-cli INFO memory | grep used_memory_human
# used_memory_human:2.50G

Hit Rate

Cache hit ratio:

  • Formula: hits / (hits + misses) × 100
  • Healthy range: > 90%
  • Alert at: < 80%

Command:

redis-cli INFO stats | grep keyspace
# keyspace_hits:1000000
# keyspace_misses:50000
# Hit rate: 95.2%
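
The hit rate is not reported directly; it has to be derived from the two counters above. A minimal sketch using the redis-py client (the hostname and password are placeholders for your own instance credentials):

import redis

# Placeholder connection details; substitute your instance hostname and password
r = redis.Redis(host='redis-123456.danubedata.com', password='password', ssl=True)

stats = r.info('stats')
hits = stats['keyspace_hits']
misses = stats['keyspace_misses']

# hits / (hits + misses) × 100, guarding against an empty cache
total = hits + misses
hit_rate = (hits / total) * 100 if total > 0 else 0.0
print(f"Hit rate: {hit_rate:.1f}%")   # 95.2% for the sample numbers above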

Evicted Keys

Keys removed due to memory pressure:

  • What it shows: Memory eviction activity
  • Healthy range: 0-10 keys/sec
  • Alert at: > 100 keys/sec

Command:

redis-cli INFO stats | grep evicted_keys
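
Note that evicted_keys is a cumulative counter, so the keys/sec rate has to be computed from two samples taken some interval apart. A minimal sketch with redis-py (connection details are placeholders):

import time
import redis

r = redis.Redis(host='redis-123456.danubedata.com', password='password', ssl=True)

# Sample the cumulative counter twice and divide by the interval
before = r.info('stats')['evicted_keys']
time.sleep(10)
after = r.info('stats')['evicted_keys']

rate = (after - before) / 10
print(f"Evictions: {rate:.1f} keys/sec")   # alert if this stays above 100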

Performance Metrics

Operations Per Second

Total commands processed:

  • What it shows: Workload intensity
  • Healthy range: Varies by instance size
  • Monitor for: Sudden spikes or drops

Command:

redis-cli INFO stats | grep instantaneous_ops_per_sec
# instantaneous_ops_per_sec:5432

Latency

Average command execution time:

  • What it shows: Response time
  • Healthy range: < 1ms
  • Alert at: > 10ms

Command:

redis-cli --latency -h redis-123456.danubedata.com -a password
# min: 0, max: 1, avg: 0.08 (96 samples)
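
If you prefer to measure latency from application code rather than redis-cli, a rough check is to time a series of PING round trips. This is only a sketch: it measures client-observed round-trip time, including network latency, not server-side command execution time.

import time
import redis

r = redis.Redis(host='redis-123456.danubedata.com', password='password', ssl=True)

# Time 100 PING round trips; includes network latency, not just command execution
samples = []
for _ in range(100):
    start = time.perf_counter()
    r.ping()
    samples.append((time.perf_counter() - start) * 1000)

print(f"min: {min(samples):.2f}ms, max: {max(samples):.2f}ms, "
      f"avg: {sum(samples) / len(samples):.2f}ms")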

Slow Queries

Commands exceeding threshold:

  • Default threshold: 10ms
  • Monitor: Count and patterns
  • Alert at: > 10 slow queries/minute

Command:

SLOWLOG GET 10

Connection Metrics

Connected Clients

Active client connections:

  • Healthy range: < max_connections
  • Alert at: > 80% of max_connections

Command:

redis-cli INFO clients | grep connected_clients
# connected_clients:245

Rejected Connections

Connections refused due to limits:

  • Healthy: 0
  • Alert at: > 0

Command:

redis-cli INFO stats | grep rejected_connections
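
The 80% threshold can be checked against the configured client limit. A minimal sketch with redis-py, assuming CONFIG GET is permitted on your instance (connection details are placeholders):

import redis

r = redis.Redis(host='redis-123456.danubedata.com', password='password', ssl=True)

connected = r.info('clients')['connected_clients']
rejected = r.info('stats')['rejected_connections']
max_clients = int(r.config_get('maxclients')['maxclients'])

usage_pct = (connected / max_clients) * 100
print(f"Connections: {connected}/{max_clients} ({usage_pct:.1f}%), rejected: {rejected}")
if usage_pct > 80 or rejected > 0:
    print("WARNING: approaching or exceeding the connection limit")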

Replication Metrics

Replication Lag

Time/bytes behind primary:

  • Healthy: < 1 second
  • Alert at: > 5 seconds

Command:

# On primary
redis-cli INFO replication

# On replica
redis-cli INFO replication | grep master_last_io_seconds_ago
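
A small check that runs against the replica and reads both the link status and the lag field from INFO replication. This is a sketch; the replica hostname is a placeholder for your instance's replica endpoint:

import redis

# Connect to the replica, not the primary (placeholder hostname)
replica = redis.Redis(host='redis-123456-replica.danubedata.com', password='password', ssl=True)

repl = replica.info('replication')
if repl.get('role') == 'slave':
    lag = repl.get('master_last_io_seconds_ago', -1)
    status = repl.get('master_link_status', 'unknown')
    print(f"Link: {status}, last I/O from primary: {lag}s ago")
    if status != 'up' or lag > 5:
        print("WARNING: replication is lagging or the link is down")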

Dashboard Monitoring

Overview Dashboard

Navigate to your Redis instance dashboard to view:

  • Resource Usage: CPU, memory, network
  • Performance: Ops/sec, latency, hit rate
  • Connections: Active, total, rejected
  • Replication: Lag, status, connected replicas

Real-time Graphs

Available graphs (1 hour to 30 days):

  1. Memory Usage: Used vs allocated
  2. Operations Per Second: Total command throughput
  3. Hit Rate: Cache effectiveness
  4. Connected Clients: Connection count
  5. Network I/O: Bytes in/out
  6. CPU Usage: Processor utilization

Customizing View

  1. Click Metrics tab
  2. Select time range (1h, 6h, 24h, 7d, 30d)
  3. Choose metrics to display
  4. Set refresh interval (10s, 30s, 1m)

Command-Line Monitoring

INFO Command

Get comprehensive server information:

# All info
redis-cli INFO

# Specific section
redis-cli INFO memory
redis-cli INFO stats
redis-cli INFO replication
redis-cli INFO cpu
redis-cli INFO clients

Key Statistics

# Database statistics
redis-cli INFO keyspace
# db0:keys=10000,expires=5000,avg_ttl=3600000

# Memory details
redis-cli INFO memory | grep -E 'used_memory_human|used_memory_peak_human|mem_fragmentation_ratio'

# Commands processed
redis-cli INFO stats | grep total_commands_processed

# Hit rate calculation
redis-cli INFO stats | grep -E 'keyspace_hits|keyspace_misses'

Real-time Monitoring

# Monitor all commands in real-time
redis-cli MONITOR

# Continuous stats
redis-cli --stat
------- data ------ --------------------- load -------------------- - child -
keys       mem      clients blocked requests            connections
10000      2.50G    245     0       1000000 (+0)        5000
10000      2.50G    245     0       1000050 (+50)       5000

# Latency monitoring
redis-cli --latency
min: 0, max: 2, avg: 0.15 (1234 samples)

# Big keys analysis
redis-cli --bigkeys

Slow Query Log

Monitor slow commands:

# Get last 10 slow queries
SLOWLOG GET 10

# Get slow query count
SLOWLOG LEN

# Reset slow log
SLOWLOG RESET

# Example output:
1) 1) (integer) 123      # Query ID
   2) (integer) 1634567890  # Timestamp
   3) (integer) 15000     # Execution time (microseconds)
   4) 1) "KEYS"          # Command
      2) "*"

Performance Analysis

Memory Analysis

# Memory breakdown
redis-cli INFO memory

# Key categories:
# used_memory: Total allocated by Redis
# used_memory_rss: Actual RAM used (OS perspective)
# used_memory_peak: Maximum memory used
# mem_fragmentation_ratio: RSS/used (ideal: 1.0-1.5)

# Sample objects
redis-cli --memkeys --memkeys-samples 10000
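
The fragmentation ratio is worth checking programmatically as well: a ratio well above 1.5 usually means RAM is allocated but poorly utilized, while a ratio below 1.0 suggests Redis memory is being swapped by the OS. A sketch with redis-py (connection details are placeholders):

import redis

r = redis.Redis(host='redis-123456.danubedata.com', password='password', ssl=True)

mem = r.info('memory')
ratio = mem['mem_fragmentation_ratio']
print(f"used_memory: {mem['used_memory_human']}, "
      f"peak: {mem['used_memory_peak_human']}, "
      f"fragmentation ratio: {ratio}")

if ratio > 1.5:
    print("WARNING: high fragmentation - allocated memory is not used efficiently")
elif ratio < 1.0:
    print("WARNING: ratio below 1.0 - Redis memory may be swapped to disk")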

Command Statistics

# Command stats
redis-cli INFO commandstats

# Output:
# cmdstat_get:calls=1000000,usec=500000,usec_per_call=0.50
# cmdstat_set:calls=500000,usec=300000,usec_per_call=0.60

# Most frequent commands
redis-cli INFO commandstats | sort -t= -k2 -nr | head -10

Key Space Analysis

# Keys by database
redis-cli INFO keyspace

# Sample keys
redis-cli --scan --pattern 'user:*' | head -20

# Key types distribution
redis-cli --scan | head -1000 | while read -r key; do
    redis-cli TYPE "$key"
done | sort | uniq -c
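
Spawning a redis-cli process per key gets slow for large samples. The same distribution can be collected over a single connection with redis-py's SCAN iterator; this sketch samples the first 1,000 keys seen:

from collections import Counter
import itertools
import redis

r = redis.Redis(host='redis-123456.danubedata.com', password='password', ssl=True)

# SCAN in batches and count the type of the first 1,000 keys seen
types = Counter()
for key in itertools.islice(r.scan_iter(count=1000), 1000):
    types[r.type(key).decode()] += 1

for key_type, count in types.most_common():
    print(f"{count:6d}  {key_type}")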

Alerting

Available Alerts

Configure alerts for:

  • High Memory Usage: > 90%
  • Low Hit Rate: < 80%
  • High Eviction Rate: > 100 keys/sec
  • Connection Limit: > 80% of max
  • Replication Lag: > 5 seconds
  • Instance Down: Health check failure
  • Persistence Failure: RDB/AOF save failed

Setting Up Alerts

  1. Navigate to Redis instance
  2. Click Settings > Alerts
  3. Click Add Alert
  4. Configure:
    • Metric: Select metric to monitor
    • Condition: Threshold and comparison
    • Duration: How long condition must persist
    • Notification: Email recipients
  5. Click Save Alert

Alert Examples

High Memory Alert:

Metric: Memory Usage
Condition: > 90%
Duration: 5 minutes
Action: Email admin@example.com

Low Hit Rate Alert:

Metric: Hit Rate
Condition: < 80%
Duration: 15 minutes
Action: Email team@example.com

Replication Lag Alert:

Metric: Replication Lag
Condition: > 5 seconds
Duration: 2 minutes
Action: Email ops@example.com

Monitoring Scripts

Health Check Script

import redis
import sys

def check_redis_health(host, password):
    try:
        r = redis.Redis(host=host, password=password, ssl=True)
        
        # Basic connectivity
        if not r.ping():
            print("ERROR: Cannot ping Redis")
            return False
        
        # Memory check (maxmemory may be 0 when no limit is configured)
        info = r.info('memory')
        max_memory = info.get('maxmemory', 0)
        if max_memory > 0:
            memory_used_pct = (info['used_memory'] / max_memory) * 100
            if memory_used_pct > 90:
                print(f"WARNING: Memory usage at {memory_used_pct:.1f}%")
        
        # Hit rate check
        stats = r.info('stats')
        hits = stats['keyspace_hits']
        misses = stats['keyspace_misses']
        hit_rate = (hits / (hits + misses)) * 100 if (hits + misses) > 0 else 0
        if hit_rate < 80:
            print(f"WARNING: Hit rate at {hit_rate:.1f}%")
        
        # Connection check
        clients = r.info('clients')
        if clients['connected_clients'] > 8000:  # Assuming max 10000
            print(f"WARNING: High connection count: {clients['connected_clients']}")
        
        print("HEALTHY: All checks passed")
        return True
        
    except Exception as e:
        print(f"ERROR: {e}")
        return False

if __name__ == '__main__':
    result = check_redis_health('redis-123456.danubedata.com', 'password')
    sys.exit(0 if result else 1)

Metrics Collection Script

#!/bin/bash
# collect_redis_metrics.sh

HOST="redis-123456.danubedata.com"
PASS="your_password"
OUTPUT="/var/log/redis/metrics.log"

while true; do
    TIMESTAMP=$(date +%s)
    
    # Get metrics
    MEMORY=$(redis-cli -h "$HOST" -a "$PASS" --no-auth-warning INFO memory | grep '^used_memory:' | cut -d: -f2 | tr -d '\r')
    OPS=$(redis-cli -h "$HOST" -a "$PASS" --no-auth-warning INFO stats | grep instantaneous_ops_per_sec | cut -d: -f2 | tr -d '\r')
    CLIENTS=$(redis-cli -h "$HOST" -a "$PASS" --no-auth-warning INFO clients | grep connected_clients | cut -d: -f2 | tr -d '\r')
    
    # Log metrics (INFO values carry trailing carriage returns, stripped above)
    echo "$TIMESTAMP,$MEMORY,$OPS,$CLIENTS" >> "$OUTPUT"
    
    sleep 60
done

Best Practices

Monitoring Strategy

  1. Monitor Key Metrics: Focus on memory, hit rate, latency
  2. Set Appropriate Alerts: Not too sensitive, not too lenient
  3. Regular Review: Weekly review of trends
  4. Baseline Performance: Know your normal patterns
  5. Proactive Monitoring: Catch issues before they impact users

Performance Baselines

Establish baselines for:

  • Peak Traffic: Ops/sec during busy periods
  • Average Latency: Typical response times
  • Memory Growth: Daily/weekly memory increase
  • Hit Rate: Expected cache effectiveness
  • Connection Patterns: Normal connection count

Troubleshooting Workflow

  1. Identify Symptom: What is the observed issue?
  2. Check Dashboard: Review recent metrics
  3. Run Commands: Use Redis CLI for details
  4. Review Logs: Check for errors or warnings
  5. Analyze Patterns: Look for correlations
  6. Test Fix: Implement and verify solution
  7. Document: Record issue and resolution

Common Issues and Solutions

High Memory Usage

Detection:

redis-cli INFO memory | grep used_memory_human

Solutions:

  • Set expiration on keys (see the sketch after this list)
  • Review eviction policy
  • Implement data cleanup
  • Upgrade to larger instance
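
As a starting point for cleanup, you can sample keys that never expire and assign them a TTL. This is only a sketch with redis-py; the user:* pattern and 24-hour TTL are illustrative, not recommendations:

import redis

r = redis.Redis(host='redis-123456.danubedata.com', password='password', ssl=True)

# Sample keys matching a pattern and report the ones with no expiration (TTL == -1)
no_ttl = []
for key in r.scan_iter(match='user:*', count=1000):
    if r.ttl(key) == -1:
        no_ttl.append(key)

print(f"{len(no_ttl)} sampled keys have no TTL")
for key in no_ttl[:10]:
    r.expire(key, 86400)   # example only: give the key a 24-hour expiration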

Low Hit Rate

Detection:

redis-cli INFO stats | grep keyspace

Solutions:

  • Review cache key patterns
  • Increase TTL for stable data
  • Pre-warm cache
  • Review application caching logic

High Latency

Detection:

redis-cli --latency

Solutions:

  • Check slow query log
  • Review command patterns
  • Use pipelining (see the sketch after this list)
  • Upgrade instance
  • Check network latency
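
Pipelining reduces per-command round trips by batching many requests over one connection. A minimal sketch with redis-py (the key names are placeholders):

import redis

r = redis.Redis(host='redis-123456.danubedata.com', password='password', ssl=True)

# Without pipelining: one network round trip per GET
# values = [r.get(f'user:{i}') for i in range(100)]

# With pipelining: all 100 GETs travel in a single round trip
pipe = r.pipeline(transaction=False)
for i in range(100):
    pipe.get(f'user:{i}')
values = pipe.execute()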

Connection Exhaustion

Detection:

redis-cli INFO clients | grep connected_clients

Solutions:

  • Implement connection pooling (see the sketch after this list)
  • Fix connection leaks
  • Increase max_connections
  • Review client configuration
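
A shared connection pool keeps the client count bounded instead of opening a new connection per request. A sketch with redis-py; the rediss:// URL, port, and pool size are placeholders:

import redis

# One pool per process, created at startup; the rediss:// scheme enables TLS
pool = redis.ConnectionPool.from_url(
    'rediss://:password@redis-123456.danubedata.com:6379/0',
    max_connections=50,
)

def get_client():
    # Clients created from the same pool reuse its connections
    return redis.Redis(connection_pool=pool)

r = get_client()
r.set('healthcheck', 'ok')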

Related Documentation