Metrics

Detailed guide to understanding and using metrics for monitoring your DanubeData resources.

Overview

Metrics provide quantitative measurements of resource performance and health over time.

VPS Metrics

CPU Metrics

CPU Usage %

  • Current CPU utilization
  • Range: 0-100%
  • Alert: > 80% sustained
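As an illustration, CPU usage on a Linux VPS is commonly derived from two samples of the aggregate cpu line in /proc/stat. It is the share of elapsed CPU time that was neither idle nor waiting on I/O. The sketch below assumes that definition. cpu_usage_pct is a hypothetical helper, and DanubeData may compute its dashboard value differently.

```bash
# Hypothetical helper (sketch): CPU usage % between two /proc/stat samples.
# cpu_usage_pct "<cpu line at t0>" "<cpu line at t1>"
cpu_usage_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, s1); split(b, s2)
    # Fields 2-9: user nice system idle iowait irq softirq steal.
    # guest/guest_nice (fields 10-11) are already counted in user/nice.
    for (i = 2; i <= 9; i++) { t1 += s1[i]; t2 += s2[i] }
    idle = (s2[5] + s2[6]) - (s1[5] + s1[6])   # idle + iowait
    dt = t2 - t1
    if (dt <= 0) { print "0.0"; exit }
    printf "%.1f\n", 100 * (dt - idle) / dt
  }'
}

# Example: two samples one second apart.
#   s1=$(grep '^cpu ' /proc/stat); sleep 1; s2=$(grep '^cpu ' /proc/stat)
#   cpu_usage_pct "$s1" "$s2"
```

To reflect the "sustained" part of the alert, evaluate consecutive intervals and alert only when several in a row exceed 80%.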

CPU Load Average

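  • System load averaged over 1, 5, and 15 minutes
  • Interpret relative to the number of vCPUs (a load of 2.0 on a 2-vCPU server means it is fully busy on average)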
  • Alert: > CPU count
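You can apply the CPU-count rule quickly by dividing each value in /proc/loadavg by the number of vCPUs, which nproc reports. This is a sketch. load_check is a hypothetical helper, and basing the alert on the 5-minute average is an assumption, because the list does not say which average applies.

```bash
# Hypothetical helper (sketch): normalise load averages per CPU and flag overload.
# load_check "<contents of /proc/loadavg>" <cpu_count>
load_check() {
  awk -v line="$1" -v cpus="$2" 'BEGIN {
    split(line, f)   # f[1..3] = 1, 5 and 15-minute load averages
    printf "per-cpu load: %.2f %.2f %.2f\n", f[1] / cpus, f[2] / cpus, f[3] / cpus
    # Assumption: alert when the 5-minute average exceeds the CPU count.
    if (f[2] + 0 > cpus + 0) print "ALERT"; else print "OK"
  }'
}

# Example:
#   load_check "$(cat /proc/loadavg)" "$(nproc)"
```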

CPU Steal

  • Time the vCPU spent waiting while the hypervisor ran other workloads on the physical CPU (should be near 0)
  • Alert: > 5%
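Steal time comes from the same /proc/stat counters. It is the change in the steal field divided by the change in total CPU time. The following is a minimal sketch. cpu_steal_pct is a hypothetical helper that also applies the 5% alert level listed above.

```bash
# Hypothetical helper (sketch): CPU steal % between two /proc/stat samples.
# cpu_steal_pct "<cpu line at t0>" "<cpu line at t1>"
cpu_steal_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, s1); split(b, s2)
    for (i = 2; i <= 9; i++) { t1 += s1[i]; t2 += s2[i] }   # user..steal
    dt = t2 - t1
    if (dt <= 0) { print "0.0"; exit }
    pct = 100 * (s2[9] - s1[9]) / dt   # field 9 is steal time
    printf "%.1f\n", pct
    if (pct > 5) print "ALERT: steal above 5%"
  }'
}
```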

Memory Metrics

Memory Usage %

  • RAM utilization
  • Range: 0-100%
  • Alert: > 85%

Memory Available

  • Free memory plus reclaimable cache and buffers
  • Alert: < 10% of total

Swap Usage

  • Swap space used
  • Alert: > 0 (indicates memory pressure)
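All three memory metrics can be read from /proc/meminfo on the VPS. This sketch assumes usage % means (MemTotal - MemAvailable) / MemTotal. That is a common convention, but it may not match DanubeData's exact formula. mem_report is a hypothetical helper, and the MemAvailable field requires Linux 3.14 or later.

```bash
# Hypothetical helper (sketch): memory and swap checks from /proc/meminfo.
# mem_report < /proc/meminfo
mem_report() {
  awk '
    /^MemTotal:/     { total  = $2 }   # all values are in kB
    /^MemAvailable:/ { avail  = $2 }
    /^SwapTotal:/    { swap_t = $2 }
    /^SwapFree:/     { swap_f = $2 }
    END {
      if (total == 0) { print "MemTotal not found"; exit 1 }
      used_pct  = 100 * (total - avail) / total
      avail_pct = 100 * avail / total
      swap_used = swap_t - swap_f
      printf "used=%.1f%% available=%.1f%% swap_used=%dkB\n", used_pct, avail_pct, swap_used
      if (used_pct > 85)  print "ALERT: memory usage above 85%"
      if (avail_pct < 10) print "ALERT: available memory below 10% of total"
      if (swap_used > 0)  print "ALERT: swap in use (memory pressure)"
    }'
}
```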

Disk Metrics

Disk I/O

  • Read/write throughput (MB/s)
  • IOPS (read and write operations per second)
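Both disk I/O figures are rates. They come from the difference between two readings of /proc/diskstats, divided by the sampling interval. The sketch below assumes one device line per sample and counts 1 MB as 10^6 bytes. The helper disk_io and the device name vda are illustrative assumptions.

```bash
# Hypothetical helper (sketch): IOPS and MB/s from two /proc/diskstats lines.
# disk_io "<diskstats line at t0>" "<diskstats line at t1>" <seconds_between>
disk_io() {
  awk -v a="$1" -v b="$2" -v secs="$3" 'BEGIN {
    split(a, s1); split(b, s2)
    # Fields: 4 reads completed, 6 sectors read, 8 writes completed, 10 sectors written.
    # /proc/diskstats always counts 512-byte sectors; MB here means 10^6 bytes.
    iops  = ((s2[4] - s1[4]) + (s2[8] - s1[8])) / secs
    rd_mb = (s2[6]  - s1[6])  * 512 / 1e6 / secs
    wr_mb = (s2[10] - s1[10]) * 512 / 1e6 / secs
    printf "iops=%.1f read=%.2fMB/s write=%.2fMB/s\n", iops, rd_mb, wr_mb
  }'
}

# Example (vda is an assumed device name; check lsblk on your VPS):
#   d1=$(awk '$3 == "vda"' /proc/diskstats); sleep 5; d2=$(awk '$3 == "vda"' /proc/diskstats)
#   disk_io "$d1" "$d2" 5
```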

Disk Usage %

  • Storage consumption
  • Alert: > 85%
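Disk usage can be checked against the 85% alert level using the POSIX output format of df (df -P), which prints one line per filesystem. In this sketch, disk_usage_check is a hypothetical helper.

```bash
# Hypothetical helper (sketch): flag filesystems above 85% usage.
# df -P | disk_usage_check
disk_usage_check() {
  awk 'NR > 1 {                               # skip the header line
    pct = $5; sub(/%/, "", pct); pct += 0   # "Capacity" column, e.g. "42%"
    printf "%s %d%%\n", $6, pct
    if (pct > 85) printf "ALERT: %s above 85%%\n", $6
  }'
}
```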

Network Metrics

Network In/Out

  • Inbound and outbound bandwidth (MB/s)
  • Compare against your bandwidth allocation

Network Packets

  • Packets per second
  • Useful for diagnosing network problems, e.g. a high packet rate at low bandwidth points to floods of small packets
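Network bandwidth and packet rates are measured the same way as disk I/O. Take two readings of the per-interface counters in /proc/net/dev and divide the differences by the interval. The sketch assumes 1 MB means 10^6 bytes. The helper net_rates and the interface name eth0 are illustrative.

```bash
# Hypothetical helper (sketch): bandwidth and packet rates from /proc/net/dev.
# net_rates "<interface line at t0>" "<interface line at t1>" <seconds_between>
net_rates() {
  awk -v a="$1" -v b="$2" -v secs="$3" 'BEGIN {
    sub(/:/, " ", a); sub(/:/, " ", b)   # "eth0:123" -> "eth0 123"
    split(a, s1); split(b, s2)
    # After the interface name: 2 rx bytes, 3 rx packets, 10 tx bytes, 11 tx packets.
    printf "in=%.2fMB/s out=%.2fMB/s pps_in=%.0f pps_out=%.0f\n",
      (s2[2] - s1[2]) / 1e6 / secs, (s2[10] - s1[10]) / 1e6 / secs,
      (s2[3] - s1[3]) / secs, (s2[11] - s1[11]) / secs
  }'
}

# Example (eth0 is an assumed interface name; check ip link):
#   n1=$(grep 'eth0:' /proc/net/dev); sleep 10; n2=$(grep 'eth0:' /proc/net/dev)
#   net_rates "$n1" "$n2" 10
```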

Database Metrics

Performance Metrics

Query Time

  • Average query execution time
  • Alert: > 100ms average

Slow Queries

  • Queries that exceed the configured slow-query time threshold
  • Alert: > 10/minute

Throughput

  • Queries per second
  • Monitor for capacity planning
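Throughput and slow-query rates are usually derived from cumulative counters sampled twice. The sketch below works with any engine. query_rates is a hypothetical helper that takes the two counter readings and applies the alert level of 10 slow queries per minute. On MySQL or MariaDB, the Questions and Slow_queries status variables are one possible source of the counters.

```bash
# Hypothetical helper (sketch): queries/sec and slow queries/minute from counters.
# query_rates <queries_t0> <queries_t1> <slow_t0> <slow_t1> <seconds_between>
query_rates() {
  awk -v q0="$1" -v q1="$2" -v s0="$3" -v s1="$4" -v secs="$5" 'BEGIN {
    qps = (q1 - q0) / secs                  # throughput
    slow_per_min = (s1 - s0) * 60 / secs    # slow queries per minute
    printf "qps=%.1f slow_per_min=%.1f\n", qps, slow_per_min
    if (slow_per_min > 10) print "ALERT: more than 10 slow queries/minute"
  }'
}

# Example (MySQL/MariaDB): read the counters twice, 60 seconds apart, with
#   mysql -N -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('Questions', 'Slow_queries')"
```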

Connection Metrics

Active Connections

  • Current client connections
  • Alert: > 80% of max_connections

Connection Rate

  • New connections per second
  • Alert: Sudden spikes
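Connection usage is the ratio of current connections to max_connections. In this sketch, conn_usage is a hypothetical helper. The comments show one way to obtain the two numbers on PostgreSQL and on MySQL.

```bash
# Hypothetical helper (sketch): connection usage against max_connections.
# conn_usage <current_connections> <max_connections>
conn_usage() {
  awk -v cur="$1" -v max="$2" 'BEGIN {
    pct = 100 * cur / max
    printf "connections=%d/%d (%.1f%%)\n", cur, max, pct
    if (pct > 80) print "ALERT: above 80% of max_connections"
  }'
}

# Example sources for the two numbers:
#   PostgreSQL: SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'client backend';
#               SHOW max_connections;
#   MySQL:      SHOW GLOBAL STATUS LIKE 'Threads_connected';  SELECT @@max_connections;
```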

Cache Metrics

Buffer Cache Hit Rate

  • Percentage of data page reads served from memory instead of disk
  • Target: > 99%
  • Alert: < 95%
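The hit rate is hits / (hits + disk reads) × 100. The following minimal sketch applies the 99% target and the 95% alert level. cache_hit_rate is hypothetical. The PostgreSQL query in the comment is one possible source of the counters. Those counters are cumulative since statistics were last reset, so compare deltas between two samples to get a current figure.

```bash
# Hypothetical helper (sketch): buffer cache hit rate with target/alert levels.
# cache_hit_rate <hits> <disk_reads>
cache_hit_rate() {
  awk -v h="$1" -v m="$2" 'BEGIN {
    total = h + m
    if (total == 0) { print "n/a"; exit }
    rate = 100 * h / total
    printf "%.2f%%\n", rate
    if (rate < 95) print "ALERT: hit rate below 95%"
    else if (rate < 99) print "WARN: below the 99% target"
  }'
}

# Example (PostgreSQL; output is "hits|reads"):
#   psql -Atc "SELECT sum(blks_hit), sum(blks_read) FROM pg_stat_database"
```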

Cache Size

  • Memory used for query cache
  • Monitor to decide whether the cache needs resizing

Replication Metrics

Replication Lag

  • Delay between primary and replica
  • Alert: > 5 seconds

Replication Status

  • Whether the replica is connected to the primary
  • Alert: Disconnected
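Lag and status can be checked together, because engines typically report no lag value when a replica is not replicating. This sketch treats a missing value as a disconnected replica and applies the 5-second alert. replication_check is a hypothetical helper, and the MySQL command in the comment is one example source.

```bash
# Hypothetical helper (sketch): combined replication lag and status check.
# replication_check <lag_seconds>
# Pass "NULL" or an empty string when the database reports no lag value;
# that case is treated as a disconnected replica.
replication_check() {
  local lag="$1"
  if [ -z "$lag" ] || [ "$lag" = "NULL" ]; then
    echo "ALERT: replica disconnected"
    return 1
  fi
  if awk -v l="$lag" 'BEGIN { exit !(l + 0 > 5) }'; then
    echo "ALERT: replication lag ${lag}s exceeds 5s"
    return 1
  fi
  echo "OK: lag ${lag}s"
}

# Example (MySQL 8.0.22+; older versions use SHOW SLAVE STATUS and Seconds_Behind_Master):
#   lag=$(mysql -e 'SHOW REPLICA STATUS\G' | awk '/Seconds_Behind_Source:/ { print $2 }')
#   replication_check "$lag"
```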

Cache (Redis) Metrics

Memory Metrics

Memory Usage

  • Current RAM consumption
  • Alert: > 90% of allocated memory

Memory Fragmentation

  • Ratio of RSS (memory the OS has assigned to Redis) to memory Redis has allocated (used_memory_rss / used_memory)
  • Alert: > 1.5 (consider restart)

Evicted Keys

  • Keys removed due to memory pressure
  • Alert: > 100/second
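The Redis memory metrics map to fields in the output of redis-cli INFO. used_memory, used_memory_rss, and maxmemory are in the memory section, and the cumulative evicted_keys counter is in the stats section. The sketch recomputes the fragmentation ratio to show its definition; INFO also reports it directly as mem_fragmentation_ratio. It assumes "allocated" means maxmemory. Both functions are hypothetical helpers.

```bash
# Hypothetical helper (sketch): fragmentation and memory checks from INFO memory.
# redis-cli INFO memory | redis_mem_check
redis_mem_check() {
  tr -d '\r' | awk -F: '
    $1 == "used_memory"     { used = $2 }
    $1 == "used_memory_rss" { rss  = $2 }
    $1 == "maxmemory"       { max  = $2 }   # 0 means no limit configured
    END {
      if (used + 0 == 0) { print "used_memory not found"; exit 1 }
      frag = rss / used
      printf "fragmentation=%.2f\n", frag
      if (frag > 1.5) print "ALERT: fragmentation above 1.5"
      if (max + 0 > 0) {
        pct = 100 * used / max
        printf "memory=%.1f%% of maxmemory\n", pct
        if (pct > 90) print "ALERT: memory above 90% of maxmemory"
      }
    }'
}

# Hypothetical helper (sketch): eviction rate from two evicted_keys readings
# (a cumulative counter in the INFO stats section).
# evictions_per_sec <evicted_keys_t0> <evicted_keys_t1> <seconds_between>
evictions_per_sec() {
  awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN {
    rate = (b - a) / s
    printf "%.1f\n", rate
    if (rate > 100) print "ALERT: more than 100 evictions/second"
  }'
}
```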

Performance Metrics

Hit Rate

  • Percentage of key lookups that find the key
  • Target: > 90%
  • Alert: < 80%
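Hit rate is keyspace_hits / (keyspace_hits + keyspace_misses), using values from the stats section of redis-cli INFO. The following is a sketch with redis_hit_rate as a hypothetical helper. The counters are cumulative since the server started or since CONFIG RESETSTAT, so diff two samples if you need the current rate.

```bash
# Hypothetical helper (sketch): Redis hit rate with target/alert levels.
# redis-cli INFO stats | redis_hit_rate
redis_hit_rate() {
  tr -d '\r' | awk -F: '
    $1 == "keyspace_hits"   { h = $2 }
    $1 == "keyspace_misses" { m = $2 }
    END {
      if (h + m == 0) { print "n/a"; exit }
      rate = 100 * h / (h + m)
      printf "hit_rate=%.1f%%\n", rate
      if (rate < 80) print "ALERT: hit rate below 80%"
      else if (rate < 90) print "WARN: hit rate below the 90% target"
    }'
}
```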

Operations/Sec

  • Commands processed per second
  • Monitor for capacity

Latency

  • Average command execution time
  • Alert: > 10ms

Connection Metrics

Connected Clients

  • Active connections
  • Alert: > 80% of maxclients

Blocked Clients

  • Clients waiting on blocking operations
  • Alert: Sustained blocked clients
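Connected and blocked clients appear in the clients section of redis-cli INFO. In this sketch, redis_clients_check is a hypothetical helper that takes maxclients as an argument. You can get that value from redis-cli CONFIG GET maxclients where the command is permitted; the Redis default is 10000.

```bash
# Hypothetical helper (sketch): client counts against maxclients.
# redis-cli INFO clients | redis_clients_check <maxclients>
redis_clients_check() {
  tr -d '\r' | awk -F: -v max="$1" '
    $1 == "connected_clients" { c = $2 }
    $1 == "blocked_clients"   { b = $2 }
    END {
      if (max + 0 == 0) { print "usage: redis_clients_check <maxclients>"; exit 1 }
      pct = 100 * c / max
      printf "connected=%d (%.1f%% of maxclients) blocked=%d\n", c, pct, b
      if (pct > 80) print "ALERT: connected clients above 80% of maxclients"
      if (b + 0 > 0) print "NOTE: clients are blocked; alert if this persists"
    }'
}
```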

Metric Collection

Data Retention

  • 1-minute granularity: 1 hour
  • 5-minute granularity: 1 day
  • 15-minute granularity: 1 week
  • 1-hour granularity: 30 days
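Older data is kept only at coarser granularity, so the start of a query window determines the finest resolution you can expect. The following sketch shows that lookup. It assumes each tier's retention is measured back from the current time. finest_granularity is a hypothetical helper.

```bash
# Hypothetical helper (sketch): finest granularity still retained for data
# starting <seconds_ago> seconds in the past, following the retention list.
# finest_granularity <seconds_ago>
finest_granularity() {
  local age=$1
  if   [ "$age" -le 3600 ];    then echo "1m"    # kept for 1 hour
  elif [ "$age" -le 86400 ];   then echo "5m"    # kept for 1 day
  elif [ "$age" -le 604800 ];  then echo "15m"   # kept for 1 week
  elif [ "$age" -le 2592000 ]; then echo "1h"    # kept for 30 days
  else echo "none"
  fi
}
```

For example, a full 24-hour window at 5-minute granularity corresponds to 24 × 12 = 288 data points per metric.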

API Access

Fetch metrics programmatically:

# -G sends the -d parameters as a URL query string on a GET request
curl -G \
  https://api.danubedata.com/v1/resources/YOUR_RESOURCE_ID/metrics \
  -H 'Authorization: Bearer YOUR_TOKEN' \
  -d 'metric=cpu_usage&start=2024-10-01T00:00:00Z&end=2024-10-02T00:00:00Z'
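You can generate the start and end timestamps instead of typing them. The following sketch assumes GNU date and the ISO 8601 UTC format used in the request. metrics_window is a hypothetical helper. In the example, curl's -G option sends the -d data as a URL query string on a GET request.

```bash
# Hypothetical helper (sketch): prints "start=...&end=..." for the last N hours
# in ISO 8601 UTC. Requires GNU date (on macOS/BSD, use date -r instead of -d @).
# metrics_window <hours> [end_epoch_seconds]
metrics_window() {
  local hours=$1
  local end=${2:-$(date -u +%s)}          # default: now
  local start=$(( end - hours * 3600 ))
  printf 'start=%s&end=%s\n' \
    "$(date -u -d "@$start" +%Y-%m-%dT%H:%M:%SZ)" \
    "$(date -u -d "@$end" +%Y-%m-%dT%H:%M:%SZ)"
}

# Example: CPU usage for the last 24 hours.
#   curl -G https://api.danubedata.com/v1/resources/YOUR_RESOURCE_ID/metrics \
#     -H 'Authorization: Bearer YOUR_TOKEN' \
#     -d "metric=cpu_usage&$(metrics_window 24)"
```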

Best Practices

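  1. Regular Monitoring: Check metrics daily
  2. Set Baselines: Record normal values for each resource so deviations stand out
  3. Correlate Metrics: Look at related metrics together (e.g., CPU usage alongside query time)
  4. Trend Analysis: Watch for gradual changes, such as disk usage creeping upward
  5. Alert Configuration: Base thresholds on your baselines to avoid noisy alerts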

Related Documentation