Metrics
Detailed guide to understanding and using metrics for monitoring your DanubeData resources.
Overview
Metrics provide quantitative measurements of resource performance and health over time.
VPS Metrics
CPU Metrics
CPU Usage %
- Current CPU utilization
- Range: 0-100%
- Alert: > 80% sustained
CPU Load Average
- System load over 1, 5, 15 minutes
- Interpret relative to the number of CPU cores
- Alert: sustained load average > CPU count
CPU Steal
- CPU time stolen by hypervisor (should be near 0)
- Alert: > 5%
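The load-average rule above compares the load figure against the number of CPUs. A minimal Python sketch of that comparison follows; the function name `load_exceeds_cores` is illustrative and not part of any DanubeData tooling, and it checks a single reading rather than a sustained condition.

```python
import os


def load_exceeds_cores(load_avg: float, cpu_count: int) -> bool:
    """Return True when the load average is above the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg > cpu_count


if __name__ == "__main__":
    # os.getloadavg() is only available on Unix-like systems.
    one_min, five_min, fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1
    for label, value in (("1m", one_min), ("5m", five_min), ("15m", fifteen_min)):
        status = "ALERT" if load_exceeds_cores(value, cores) else "ok"
        print(f"load {label}: {value:.2f} on {cores} CPUs -> {status}")
```

Comparing the 5- and 15-minute values rather than the 1-minute value is one way to avoid alerting on short bursts.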
Memory Metrics
Memory Usage %
- RAM utilization
- Range: 0-100%
- Alert: > 85%
Memory Available
- Free memory plus reclaimable cache/buffers
- Alert: < 10% of total
Swap Usage
- Swap space used
- Alert: > 0 (indicates memory pressure)
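On a Linux VPS, the memory figures above can be derived from `/proc/meminfo`. The sketch below applies the "available < 10%" and "swap > 0" thresholds to text in that format. The helper names and the sample values are made up for illustration, and how the DanubeData dashboard computes these metrics is not specified here.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into a {field: value in kB} mapping."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def memory_alerts(info: dict[str, int]) -> list[str]:
    """Apply the thresholds from this guide to parsed meminfo values."""
    alerts = []
    available_pct = 100.0 * info["MemAvailable"] / info["MemTotal"]
    if available_pct < 10.0:
        alerts.append(f"memory available {available_pct:.1f}% < 10%")
    swap_used = info["SwapTotal"] - info["SwapFree"]
    if swap_used > 0:
        alerts.append(f"swap in use: {swap_used} kB")
    return alerts


# Illustrative sample, not real measurements.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:          200000 kB
MemAvailable:     600000 kB
SwapTotal:       2000000 kB
SwapFree:        1500000 kB
"""

print(memory_alerts(parse_meminfo(SAMPLE)))
```

On a real host, pass the contents of `/proc/meminfo` to `parse_meminfo` instead of `SAMPLE`.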
Disk Metrics
Disk I/O
- Read/write MB/s
- IOPS (operations per second)
Disk Usage %
- Storage consumption
- Alert: > 85%
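The disk usage threshold can be checked locally with Python's standard library. This is a sketch under the assumption that usage is defined as used bytes divided by total bytes; tools such as `df` may report a slightly different percentage because they account for reserved blocks. The function names are illustrative.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return used space as a percentage of total space for the given path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def disk_alert(percent_used: float, threshold: float = 85.0) -> bool:
    """Return True when usage is above the alert threshold (85% in this guide)."""
    return percent_used > threshold


if __name__ == "__main__":
    pct = disk_usage_percent(".")
    print(f"disk usage: {pct:.1f}% -> {'ALERT' if disk_alert(pct) else 'ok'}")
```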
Network Metrics
Network In/Out
- Bandwidth usage (MB/s)
- Track against allocation
Network Packets
- Packets per second
- Useful for diagnosing issues
Database Metrics
Performance Metrics
Query Time
- Average query execution time
- Alert: > 100ms average
Slow Queries
- Queries exceeding threshold
- Alert: > 10/minute
Throughput
- Queries per second
- Monitor for capacity planning
Connection Metrics
Active Connections
- Current client connections
- Alert: > 80% of max_connections
Connection Rate
- New connections per second
- Alert: Sudden spikes
Cache Metrics
Buffer Cache Hit Rate
- Percentage of data page reads served from memory rather than disk
- Target: > 99%
- Alert: < 95%
Cache Size
- Memory used for query cache
- Monitor for sizing
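A hit rate of this kind is usually computed from two counters: reads satisfied from the cache and reads that had to go to storage. The sketch below shows the arithmetic and the target and alert levels listed above. It assumes counters like PostgreSQL's `blks_hit` and `blks_read` in `pg_stat_database`; this guide does not state which engine or counters DanubeData uses, so treat the inputs as placeholders.

```python
from typing import Optional


def buffer_cache_hit_rate(blocks_hit: int, blocks_read: int) -> Optional[float]:
    """Return the hit rate in percent, or None when there were no reads."""
    total = blocks_hit + blocks_read
    if total == 0:
        return None
    return 100.0 * blocks_hit / total


def classify(rate: Optional[float]) -> str:
    """Map a hit rate onto the levels from this guide (target > 99%, alert < 95%)."""
    if rate is None:
        return "no data"
    if rate < 95.0:
        return "alert"
    if rate > 99.0:
        return "on target"
    return "below target"


rate = buffer_cache_hit_rate(blocks_hit=9950, blocks_read=50)
print(rate, classify(rate))
```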
Replication Metrics
Replication Lag
- Delay between primary and replica
- Alert: > 5 seconds
Replication Status
- Connected/Disconnected status
- Alert: Disconnected
Cache (Redis) Metrics
Memory Metrics
Memory Usage
- Current RAM consumption
- Alert: > 90% of allocated
Memory Fragmentation
- Ratio of RSS to used memory
- Alert: > 1.5 (consider restart)
Evicted Keys
- Keys removed due to memory pressure
- Alert: > 100/second
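The fragmentation ratio described above can be reproduced from the output of the Redis `INFO memory` command, which reports `used_memory_rss` and `used_memory` (Redis also reports the ratio directly as `mem_fragmentation_ratio`). A minimal sketch, with an invented sample and illustrative helper names:

```python
def parse_redis_info(text: str) -> dict[str, str]:
    """Parse 'key:value' lines from Redis INFO output, skipping section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def fragmentation_ratio(fields: dict[str, str]) -> float:
    """Ratio of resident set size to memory used by the Redis allocator."""
    return int(fields["used_memory_rss"]) / int(fields["used_memory"])


# Illustrative sample, not real measurements.
SAMPLE_INFO = "# Memory\r\nused_memory:1000000\r\nused_memory_rss:1600000\r\n"

ratio = fragmentation_ratio(parse_redis_info(SAMPLE_INFO))
print(f"{ratio:.2f}", "ALERT" if ratio > 1.5 else "ok")
```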
Performance Metrics
Hit Rate
- Cache hit percentage
- Target: > 90%
- Alert: < 80%
Operations/Sec
- Commands processed per second
- Monitor for capacity
Latency
- Average command execution time
- Alert: > 10ms
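The hit rate above can be derived from the `keyspace_hits` and `keyspace_misses` counters in Redis `INFO stats`. Those counters are cumulative, so a rate for a recent window needs the difference between two samples. The sketch below shows both calculations; the function names and sample numbers are illustrative.

```python
from typing import Optional, Tuple


def redis_hit_rate(keyspace_hits: int, keyspace_misses: int) -> Optional[float]:
    """Return the hit rate in percent, or None when there were no lookups."""
    lookups = keyspace_hits + keyspace_misses
    if lookups == 0:
        return None
    return 100.0 * keyspace_hits / lookups


def windowed_hit_rate(prev: Tuple[int, int], curr: Tuple[int, int]) -> Optional[float]:
    """Hit rate between two (hits, misses) samples of the cumulative counters."""
    return redis_hit_rate(curr[0] - prev[0], curr[1] - prev[1])


lifetime = redis_hit_rate(850, 150)
recent = windowed_hit_rate(prev=(1000, 100), curr=(1900, 200))
print(lifetime, recent)
```

A lifetime rate can hide a recent drop, so the windowed value is usually the better input for the 80% alert.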
Connection Metrics
Connected Clients
- Active connections
- Alert: > 80% of max
Blocked Clients
- Clients waiting on blocking operations
- Alert: Sustained blocked clients
Metric Collection
Data Retention
- 1-minute granularity: retained for 1 hour
- 5-minute granularity: retained for 1 day
- 15-minute granularity: retained for 1 week
- 1-hour granularity: retained for 30 days
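The retention tiers determine the finest resolution available for data of a given age. The sketch below encodes the tiers as a lookup, assuming each tier means "data at this granularity is kept for this long" and that the boundary itself is still included; both are interpretations, not documented behavior.

```python
from datetime import timedelta
from typing import Optional

# (granularity, how long data at that granularity is kept), finest first.
RETENTION = [
    (timedelta(minutes=1), timedelta(hours=1)),
    (timedelta(minutes=5), timedelta(days=1)),
    (timedelta(minutes=15), timedelta(weeks=1)),
    (timedelta(hours=1), timedelta(days=30)),
]


def finest_granularity(age: timedelta) -> Optional[timedelta]:
    """Return the finest granularity still retained for data of this age."""
    for granularity, retained_for in RETENTION:
        if age <= retained_for:
            return granularity
    return None  # older than the longest retention window


print(finest_granularity(timedelta(hours=6)))
```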
API Access
Fetch metrics programmatically:
curl -G \
https://api.danubedata.com/v1/resources/{id}/metrics \
-H 'Authorization: Bearer YOUR_TOKEN' \
-d 'metric=cpu_usage&start=2024-10-01T00:00:00Z&end=2024-10-02T00:00:00Z'
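The same request can be built from Python with the standard library. This sketch reuses the endpoint, header, and parameter names from the curl command; it assumes the parameters are sent as a URL query string, and it does not parse the response because the response format is not described here. `RESOURCE_ID` and `YOUR_TOKEN` are placeholders.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.danubedata.com/v1"


def build_metrics_request(resource_id: str, token: str, metric: str,
                          start: str, end: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one metric over a time range."""
    query = urllib.parse.urlencode({"metric": metric, "start": start, "end": end})
    safe_id = urllib.parse.quote(resource_id, safe="")
    url = f"{API_BASE}/resources/{safe_id}/metrics?{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


request = build_metrics_request(
    "RESOURCE_ID", "YOUR_TOKEN", "cpu_usage",
    "2024-10-01T00:00:00Z", "2024-10-02T00:00:00Z",
)
print(request.full_url)
# To send it: urllib.request.urlopen(request, timeout=10)
```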
Best Practices
- Regular Monitoring: Check metrics daily
- Set Baselines: Know normal values
- Correlate Metrics: Look at multiple metrics together
- Trend Analysis: Watch for gradual changes
- Alert Configuration: Set meaningful thresholds
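One way to turn "know normal values" into something checkable is to summarize recent samples as a mean and standard deviation, then flag readings that fall far outside that range. This is a generic statistical sketch, not a DanubeData feature; the sample values and the factor `k=3.0` are arbitrary choices for illustration.

```python
import statistics
from typing import Sequence, Tuple


def baseline(samples: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of historical samples."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def deviates(value: float, mean: float, stdev: float, k: float = 3.0) -> bool:
    """Return True when value is more than k standard deviations from the mean."""
    return abs(value - mean) > k * stdev


# Illustrative CPU usage samples in percent, not real measurements.
history = [40, 42, 38, 41, 39]
mean, stdev = baseline(history)
print(deviates(47, mean, stdev), deviates(43, mean, stdev))
```

With a perfectly flat history the standard deviation is zero and any change is flagged, so a minimum tolerance may be worth adding in practice.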