Redis Replicas
Redis replicas provide high availability, improved read performance, and automatic failover for your managed Redis instances. This guide covers replica configuration, management, and best practices.
Overview
Redis replicas create read-only copies of your primary Redis instance:
- High Availability: Automatic failover if primary fails
- Read Scaling: Distribute read operations across replicas
- Disaster Recovery: Maintain standby instance for recovery
- Zero Data Loss: Synchronous replication available
- Automatic Promotion: Replica automatically becomes primary on failure
How Redis Replication Works
Replication Architecture
┌─────────────┐    Async/Sync Replication    ┌─────────────┐
│   Primary   │ ───────────────────────────> │   Replica   │
│ (Read/Write)│                              │ (Read-Only) │
└─────────────┘                              └─────────────┘
       │                                            │
   Write Ops                                    Read Ops
Replication Process
- Initial Sync: Replica requests full copy from primary
- RDB Transfer: Primary sends snapshot to replica
- Command Stream: Primary streams write commands to replica
- Apply Commands: Replica applies commands in order
- Stay Synchronized: Continuous replication of all writes
Replication Lag
- Asynchronous: Typically < 10ms lag
- Synchronous: Zero lag (available for critical data)
- Monitoring: Lag displayed in dashboard
- Automatic Catch-up: Replicas automatically sync after disconnection
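The lag figures above can be derived from the replication offsets that `INFO replication` reports on each node. A minimal sketch of a pure helper (a hypothetical `lag_bytes` function; feed it the dicts returned by redis-py's `client.info('replication')` on the primary and on a replica):

```python
def lag_bytes(primary_info, replica_info):
    """Byte gap between the primary's replication stream and a replica.

    primary_info / replica_info are the dicts returned by
    client.info('replication') on each node (redis-py).
    """
    # master_repl_offset: bytes the primary has written to its stream
    # slave_repl_offset: bytes the replica has applied so far
    return primary_info['master_repl_offset'] - replica_info['slave_repl_offset']

# Example with values shaped like real INFO output
print(lag_bytes({'master_repl_offset': 1234567},
                {'slave_repl_offset': 1234500}))  # 67 bytes behind
```

A gap of zero means the replica is fully caught up; a steadily growing gap is the signal to investigate replica sizing or network bandwidth.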
Creating Replicas
Prerequisites
- Existing Redis instance (primary)
- Instance must be in healthy state
- Sufficient account limits for additional resources
Via Dashboard
- Navigate to your Redis instance
- Click Replicas tab
- Click Create Replica
- Configure replica:
- Name: Descriptive name
- Region: Same or different data center
- Profile: Match or differ from primary
- Replication Mode: Asynchronous or Synchronous
- Click Create
The replica will be ready within 2-5 minutes.
Replica Configuration
Same-Region Replicas
Best for:
- High availability within region
- Read scaling
- Minimal replication lag (< 10ms)
- Lower cost
Cross-Region Replicas
Best for:
- Disaster recovery
- Geographic distribution
- Compliance requirements
- Serving users in different regions
Trade-off: higher replication lag (~50-100ms depending on distance)
Connecting to Replicas
Connection Endpoints
Each replica has its own endpoint:
Primary: redis-123456.danubedata.com:6379
Replica 1: redis-replica-123456-01.danubedata.com:6379
Replica 2: redis-replica-123456-02.danubedata.com:6379
Read-Only Access
Replicas are read-only by default:
import redis
# Primary - read/write
primary = redis.Redis(
    host='redis-123456.danubedata.com',
    port=6379,
    password='password',
    ssl=True
)
# Replica - read-only
replica = redis.Redis(
    host='redis-replica-123456-01.danubedata.com',
    port=6379,
    password='password',
    ssl=True
)
# Writes go to primary
primary.set('key', 'value') # ✓ Works
# Reads can use replica
value = replica.get('key') # ✓ Works
# Writes to replica will fail
replica.set('key', 'value') # ✗ Error: READONLY
Application Configuration
Python with Read/Write Splitting
from redis import Redis
import random
class RedisClient:
    def __init__(self, primary_host, replica_hosts, password='password'):
        # port/password/ssl settings match the connection examples above
        self.primary = Redis(host=primary_host, port=6379,
                             password=password, ssl=True)
        self.replicas = [Redis(host=host, port=6379,
                               password=password, ssl=True)
                         for host in replica_hosts]

    def get_replica(self):
        """Get random replica for load balancing"""
        return random.choice(self.replicas) if self.replicas else self.primary

    def get(self, key):
        """Read from replica"""
        return self.get_replica().get(key)

    def set(self, key, value, **kwargs):
        """Write to primary"""
        return self.primary.set(key, value, **kwargs)

    def delete(self, key):
        """Delete from primary"""
        return self.primary.delete(key)

# Usage
redis_client = RedisClient(
    primary_host='redis-123456.danubedata.com',
    replica_hosts=[
        'redis-replica-123456-01.danubedata.com',
        'redis-replica-123456-02.danubedata.com',
    ]
)
# Writes to primary
redis_client.set('user:1000', 'John')
# Reads from replica
user = redis_client.get('user:1000')
Laravel Configuration
// config/database.php
'redis' => [

    'client' => env('REDIS_CLIENT', 'phpredis'),

    'options' => [
        'cluster' => env('REDIS_CLUSTER', 'redis'),
        'prefix' => env('REDIS_PREFIX', Str::slug(env('APP_NAME', 'laravel'), '_').'_database_'),
    ],

    'default' => [
        'url' => env('REDIS_URL'),
        'host' => env('REDIS_HOST', 'redis-123456.danubedata.com'),
        'password' => env('REDIS_PASSWORD', null),
        'port' => env('REDIS_PORT', '6379'),
        'database' => env('REDIS_DB', '0'),
        'read_write_timeout' => 60,
        'context' => [
            'stream' => [
                'verify_peer' => true,
                'verify_peer_name' => true,
            ],
        ],
    ],

    'replica' => [
        'url' => env('REDIS_REPLICA_URL'),
        'host' => env('REDIS_REPLICA_HOST', 'redis-replica-123456-01.danubedata.com'),
        'password' => env('REDIS_PASSWORD', null),
        'port' => env('REDIS_PORT', '6379'),
        'database' => env('REDIS_DB', '0'),
        'read_write_timeout' => 60,
        'context' => [
            'stream' => [
                'verify_peer' => true,
                'verify_peer_name' => true,
            ],
        ],
    ],

],
// Usage
use Illuminate\Support\Facades\Redis;
// Write to primary
Redis::connection('default')->set('key', 'value');
// Read from replica
$value = Redis::connection('replica')->get('key');
Node.js with Failover
const Redis = require('ioredis');

const primary = new Redis({
  host: 'redis-123456.danubedata.com',
  port: 6379,
  password: 'password',
  tls: {}
});

const replica = new Redis({
  host: 'redis-replica-123456-01.danubedata.com',
  port: 6379,
  password: 'password',
  tls: {},
  retryStrategy(times) {
    // Stop retrying after 3 attempts; reads then fall back to primary
    if (times > 3) {
      return null; // Stop retrying
    }
    return Math.min(times * 50, 2000);
  }
});

class RedisManager {
  async get(key) {
    try {
      return await replica.get(key);
    } catch (error) {
      console.log('Replica failed, using primary');
      return await primary.get(key);
    }
  }

  async set(key, value) {
    return await primary.set(key, value);
  }
}

module.exports = new RedisManager();
High Availability Configuration
Automatic Failover
Enable automatic failover for production instances:
- Navigate to your Redis instance
- Click Settings > High Availability
- Enable Automatic Failover
- Set Failover Timeout (default: 45 seconds)
- Click Save
Failover Process
When primary fails:
- Detection: Health check detects primary failure (45 seconds)
- Verification: Multiple checks confirm failure
- Promotion: Best replica promoted to primary
- DNS Update: Primary endpoint redirected to new primary
- Notification: Email alert sent to account owners
- Reconnection: Applications automatically reconnect
Expected Downtime: 30-60 seconds for automatic failover
Manual Failover
Trigger manual failover for maintenance:
- Navigate to your Redis instance
- Click Replicas tab
- Select replica to promote
- Click Promote to Primary
- Confirm promotion
Manual failover completes within seconds.
Handling Failover in Applications
Connection Retry Logic
import redis
import time
from redis.exceptions import ConnectionError

def redis_operation_with_retry(func, *args, max_retries=3, **kwargs):
    """Execute Redis operation with retry logic"""
    for attempt in range(max_retries):
        try:
            return func(*args, **kwargs)
        except ConnectionError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # Exponential backoff
            # Drop stale connections so the next attempt reconnects cleanly
            func.__self__.connection_pool.disconnect()

# Usage
try:
    redis_operation_with_retry(redis_client.get, 'key')
except ConnectionError:
    # Handle permanent failure
    pass
Circuit Breaker Pattern
class CircuitBreaker {
  constructor(redis, options = {}) {
    this.redis = redis;
    this.failures = 0;
    this.threshold = options.threshold || 5;
    this.timeout = options.timeout || 60000;
    this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
    this.openedAt = 0;
  }

  async execute(operation) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.openedAt > this.timeout) {
        this.state = 'HALF_OPEN';
      } else {
        throw new Error('Circuit breaker is OPEN');
      }
    }
    try {
      const result = await operation();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    this.failures = 0;
    if (this.state === 'HALF_OPEN') {
      this.state = 'CLOSED';
    }
  }

  onFailure() {
    this.failures++;
    if (this.failures >= this.threshold) {
      this.state = 'OPEN';
      this.openedAt = Date.now();
    }
  }
}
Monitoring Replicas
Key Metrics
Monitor these metrics for replicas:
- Replication Lag: Time/bytes behind primary
- Connected Replicas: Number of connected replicas
- Replication Offset: Bytes replicated
- Replica Health: Overall replica status
- Connection Count: Client connections per replica
- Memory Usage: Memory consumption per replica
Redis Commands
Check replication status:
# On primary
INFO replication
# Output:
# role:master
# connected_slaves:2
# slave0:ip=10.0.1.5,port=6379,state=online,offset=1234567,lag=0
# slave1:ip=10.0.2.5,port=6379,state=online,offset=1234567,lag=0
# On replica
INFO replication
# Output:
# role:slave
# master_host:redis-123456.danubedata.com
# master_port:6379
# master_link_status:up
# master_last_io_seconds_ago:0
# master_sync_in_progress:0
Monitoring Script
import redis

def check_replication_health(primary_host, replica_hosts, password='password'):
    """Check replication health for all replicas"""
    # port/password/ssl settings match the connection examples above
    primary = redis.Redis(host=primary_host, port=6379,
                          password=password, ssl=True)

    # Get primary info
    primary_info = primary.info('replication')
    print(f"Primary: {primary_info['role']}")
    print(f"Connected replicas: {primary_info['connected_slaves']}")

    # Check each replica
    for i, replica_host in enumerate(replica_hosts):
        try:
            replica = redis.Redis(host=replica_host, port=6379,
                                  password=password, ssl=True)
            info = replica.info('replication')
            print(f"\nReplica {i+1}:")
            print(f"  Status: {info['master_link_status']}")
            print(f"  Lag: {info['master_last_io_seconds_ago']}s")
            print(f"  Sync in progress: {info['master_sync_in_progress']}")
        except Exception as e:
            print(f"\nReplica {i+1}: ERROR - {e}")

# Run check
check_replication_health(
    'redis-123456.danubedata.com',
    ['redis-replica-123456-01.danubedata.com',
     'redis-replica-123456-02.danubedata.com']
)
Replica Management
Scaling Read Capacity
Add more replicas to scale reads:
- Create additional replicas
- Update application configuration with new endpoints
- Implement load balancing across all replicas
- Monitor distribution of read traffic
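The random-choice client shown earlier can be swapped for round-robin selection so reads are spread evenly across replicas. A sketch of a hypothetical `RoundRobinReader` (the client objects passed in are whatever Redis connections you already hold):

```python
import itertools

class RoundRobinReader:
    """Rotate read traffic across replica clients; writes stay on primary."""
    def __init__(self, primary, replicas):
        self.primary = primary
        # cycle() yields the replicas in order, wrapping around forever
        self._replicas = itertools.cycle(replicas) if replicas else None

    def reader(self):
        """Next replica in the rotation, or the primary if there are none."""
        return next(self._replicas) if self._replicas else self.primary

# Demonstration with placeholder strings standing in for Redis clients
rr = RoundRobinReader('primary', ['replica-1', 'replica-2'])
print([rr.reader() for _ in range(4)])
# ['replica-1', 'replica-2', 'replica-1', 'replica-2']
```

Round-robin keeps per-replica load even under steady traffic; random choice evens out only over many requests.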
Resizing Replicas
Change replica resource profile:
- Navigate to replica in dashboard
- Click Resize
- Select new profile
- Confirm resize
Note: A replica can have a different profile than the primary.
Removing Replicas
Delete unused replicas:
- Navigate to replica in dashboard
- Click Delete
- Confirm deletion
- Update application configuration to remove endpoint
Synchronous Replication
For critical data requiring zero data loss:
Enabling Synchronous Replication
- Navigate to your Redis instance
- Click Settings > Replication
- Enable Synchronous Replication
- Set Minimum Replicas (e.g., 1)
- Click Save
How It Works
With synchronous replication:
- Write operations wait for acknowledgment from replicas
- Guarantees zero data loss on failover
- Higher latency for write operations (typically +5-10ms)
- Write fails if minimum replicas not available
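The managed synchronous mode handles acknowledgment for you, but Redis also exposes the same idea at the protocol level as the `WAIT` command: after a write, `WAIT numreplicas timeout` blocks until that many replicas have acknowledged it (or the timeout expires). A sketch of a hypothetical `set_with_ack` helper, written against redis-py's `set()`/`wait()` methods:

```python
def set_with_ack(client, key, value, min_replicas=1, timeout_ms=1000):
    """Write a key, then block until min_replicas acknowledge the write.

    client.wait() issues the Redis WAIT command and returns the number
    of replicas that acknowledged within timeout_ms.
    """
    client.set(key, value)
    acked = client.wait(min_replicas, timeout_ms)
    if acked < min_replicas:
        # Fewer acks than required: treat the write as not durably replicated
        raise RuntimeError(f'only {acked} replica(s) acknowledged the write')
    return acked
```

Note that `WAIT` adds the acknowledgment round trip to every call, which is the same latency trade-off described below.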
Trade-offs
Pros:
- Zero data loss guarantee
- Strong consistency
- Perfect for financial/critical data
Cons:
- Increased write latency
- Reduced write throughput
- Availability depends on replica health
Best Practices
Application Design
- Separate Connections: Use different connections for primary and replicas
- Read from Replicas: Route all read traffic to replicas when possible
- Write to Primary: Always write to primary
- Handle Failover: Implement retry logic with exponential backoff
- Connection Pooling: Use pools for both primary and replica connections
Scaling Strategy
- Start with One Replica: Provide high availability
- Add Replicas for Reads: Scale horizontally as needed
- Cross-Region Replica: Add for disaster recovery
- Monitor Load: Watch primary and replica utilization
- Load Balance: Distribute reads evenly across replicas
High Availability
- Enable Auto-Failover: Critical for production
- Multiple Replicas: At least 2 for redundancy
- Cross-AZ Deployment: Replicas in different availability zones
- Regular Testing: Test failover procedures monthly
- Monitoring and Alerts: Set up alerts for replication lag
Performance
- Connection Pooling: Reuse connections efficiently
- Pipeline Commands: Batch operations when possible
- Monitor Lag: Keep replication lag under 100ms
- Right-Size Resources: Ensure replicas have adequate resources
- Network Proximity: Place replicas close to application servers
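Pipelining and batching combine naturally: a sketch of a hypothetical `mget_pipelined` helper that fetches many keys in bounded round trips (`pipeline(transaction=False)` is redis-py's plain, non-transactional pipeline):

```python
def mget_pipelined(client, keys, chunk_size=500):
    """Fetch many keys with pipelined GETs, in bounded chunks.

    A pipeline sends a batch of commands in one round trip; chunking
    keeps any single batch from monopolizing the server.
    """
    results = []
    for start in range(0, len(keys), chunk_size):
        pipe = client.pipeline(transaction=False)
        for key in keys[start:start + chunk_size]:
            pipe.get(key)
        # execute() returns one result per queued command, in order
        results.extend(pipe.execute())
    return results
```

Pointing `client` at a replica connection keeps these bulk reads off the primary entirely.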
Troubleshooting
High Replication Lag
Symptoms: Replica falling behind primary
Causes:
- High write load on primary
- Network bandwidth limitations
- Undersized replica resources
- Large bulk operations
Solutions:
- Upgrade replica to larger profile
- Optimize write operations on primary
- Split large operations into smaller batches
- Check network connectivity
- Monitor primary CPU/memory usage
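Splitting a bulk load into bounded batches can be sketched as follows (a hypothetical `set_many_chunked` helper; `mset` is the standard Redis multi-set command). Smaller batches produce smaller bursts on the replication stream, which keeps replicas from falling far behind during bulk writes:

```python
def set_many_chunked(client, items, chunk_size=500):
    """Write a large mapping in bounded chunks instead of one huge MSET.

    Returns the total number of keys written.
    """
    keys = list(items)
    written = 0
    for start in range(0, len(keys), chunk_size):
        # Each MSET carries at most chunk_size keys
        chunk = {k: items[k] for k in keys[start:start + chunk_size]}
        client.mset(chunk)
        written += len(chunk)
    return written
```

Pausing briefly between chunks, or watching the lag metric while loading, gives replicas room to catch up.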
Replica Connection Failures
Symptoms: Cannot connect to replica
Solutions:
- Check replica status in dashboard
- Verify connection details and credentials
- Test with redis-cli
- Check firewall rules
- Review application logs for errors
Replica Out of Sync
Symptoms: Replication status shows "sync_in_progress" or "disconnected"
Solutions:
- Check primary and replica health
- Verify network connectivity
- Review replication logs in dashboard
- Rebuild replica if necessary
- Contact support if issue persists
Failover Not Working
Symptoms: Primary fails but replica not promoted
Causes:
- Auto-failover not enabled
- No healthy replicas available
- Replication lag too high
- Network partitioning
Solutions:
- Verify auto-failover is enabled
- Check replica health status
- Ensure replicas are online and synced
- Manually promote replica if needed
- Review failover logs