Alerting

Configure alerts to receive notifications when metrics exceed thresholds or resources experience issues.

Overview

DanubeData alerting features:

  • Email Notifications: Sent to configured addresses
  • Webhook Integration: POST to your endpoint
  • Threshold-Based: Trigger on metric thresholds
  • Status Changes: Alert on state changes
  • Customizable: Choose the metric, condition, duration, and recipients for each alert

Creating Alerts

Via Dashboard

  1. Navigate to the resource you want to monitor
  2. Click Monitoring > Alerts
  3. Click Create Alert
  4. Configure:
    • Metric: Select metric to monitor
    • Condition: Threshold and comparison (>, <, =)
    • Duration: How long the condition must persist before the alert triggers
    • Notification Method: Email or webhook
    • Recipients: Who to notify
  5. Click Create Alert
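
To illustrate how Condition and Duration work together, here is a minimal Python sketch. It is not DanubeData's evaluation logic; `AlertRule`, `should_trigger`, and the sample format are hypothetical names chosen for this example. It assumes an alert triggers only when every sample in the trailing duration window satisfies the condition.

```python
import operator
from dataclasses import dataclass

# The three comparisons offered in the Condition field
COMPARATORS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


@dataclass
class AlertRule:
    # Hypothetical model of an alert rule, for illustration only
    metric: str
    comparator: str        # one of ">", "<", "="
    threshold: float
    duration_seconds: int  # how long the condition must persist


def should_trigger(rule, samples):
    """Decide whether a rule would trigger.

    samples: list of (timestamp_seconds, value) tuples, oldest first.
    Returns True when the samples cover the full duration window and
    every sample inside that window satisfies the condition.
    """
    if not samples:
        return False
    compare = COMPARATORS[rule.comparator]
    window_start = samples[-1][0] - rule.duration_seconds
    # The history must reach back at least to the start of the window
    if samples[0][0] > window_start:
        return False
    window = [value for ts, value in samples if ts >= window_start]
    return all(compare(value, rule.threshold) for value in window)
```

With a rule of `AlertRule("cpu_usage", ">", 90.0, 300)`, a single dip below 90% inside the last five minutes keeps the alert from triggering, which is why a longer duration reduces noise from short spikes.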

Alert Types

Resource Alerts

CPU Alert

Metric: CPU Usage
Condition: > 90%
Duration: 5 minutes
Action: Email team@example.com

Memory Alert

Metric: Memory Usage
Condition: > 85%
Duration: 10 minutes
Action: Email ops@example.com

Disk Alert

Metric: Disk Usage
Condition: > 90%
Duration: 1 minute
Action: Email admin@example.com

Database Alerts

Connection Alert

Metric: Active Connections
Condition: > 400 (80% of 500 max)
Duration: 5 minutes
Action: Email dba@example.com
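
The threshold above is derived from the connection limit. As a small sketch of that arithmetic (the helper name `connection_threshold` is hypothetical, and the 500-connection limit is the example value used above):

```python
def connection_threshold(max_connections, percent=80):
    """Return the alert threshold as a whole-number percentage of the limit.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    return max_connections * percent // 100


# 80% of a 500-connection limit
threshold = connection_threshold(500)
```

If your plan has a different connection limit, recompute the threshold rather than reusing 400.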

Replication Lag

Metric: Replication Lag
Condition: > 5 seconds
Duration: 2 minutes
Action: Email ops@example.com

Slow Queries

Metric: Slow Queries
Condition: > 10 per minute
Duration: 5 minutes
Action: Email dev@example.com

Cache Alerts

Hit Rate Alert

Metric: Cache Hit Rate
Condition: < 80%
Duration: 15 minutes
Action: Email ops@example.com

Memory Alert

Metric: Memory Usage
Condition: > 90%
Duration: 5 minutes
Action: Email ops@example.com

Notification Channels

Email Notifications

Configure email recipients:

  • Multiple email addresses
  • Distribution lists
  • Role-based emails

Each email includes:

  • Alert name and severity
  • Resource affected
  • Metric value and threshold
  • Time alert triggered
  • Direct link to resource

Webhook Notifications

DanubeData sends an HTTP POST request with a JSON body to your endpoint:

{
  "alert_id": "alert-123456",
  "alert_name": "High CPU Usage",
  "resource_id": "vps-789012",
  "resource_name": "web-server-1",
  "metric": "cpu_usage",
  "current_value": 95.5,
  "threshold": 90,
  "condition": "greater_than",
  "triggered_at": "2024-10-12T10:30:00Z",
  "severity": "warning",
  "status": "triggered"
}
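
A receiving endpoint might look like the following sketch, which uses only the Python standard library. The names `parse_alert`, `AlertHandler`, and `serve`, the port, and the choice of required fields are assumptions made for this example. The sketch does not cover authentication or signature checks, which this page does not describe.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Fields this example handler relies on (a subset of the payload above)
REQUIRED_FIELDS = {"alert_id", "alert_name", "resource_id", "metric",
                   "current_value", "threshold", "status"}


def parse_alert(body):
    """Decode a webhook body and check that the expected fields are present."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    missing = REQUIRED_FIELDS - set(payload)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return payload


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            alert = parse_alert(self.rfile.read(length))
        except ValueError as exc:  # also covers malformed JSON
            self.send_error(400, str(exc))
            return
        print(f"{alert['status']}: {alert['alert_name']} on {alert['resource_id']}")
        # Reply with an empty success response
        self.send_response(204)
        self.end_headers()


def serve(port=8080):
    """Run the example receiver until interrupted."""
    HTTPServer(("0.0.0.0", port), AlertHandler).serve_forever()
```

Call `serve()` to start the receiver, then point the alert's webhook URL at the host and port where it runs.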

Integrate with:

  • Slack
  • PagerDuty
  • Microsoft Teams
  • Custom systems
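
Chat tools generally expect their own message format, so a small relay usually sits between the alert webhook and the tool. The sketch below converts the payload shown above into the `{"text": ...}` body that Slack incoming webhooks accept. The function name `to_slack_message` and the message wording are choices made for this example, not part of DanubeData.

```python
def to_slack_message(alert):
    """Build a Slack incoming-webhook body from an alert payload dict."""
    text = (
        f"[{alert['severity'].upper()}] {alert['alert_name']} on "
        f"{alert['resource_name']}: {alert['metric']} is "
        f"{alert['current_value']} (threshold {alert['threshold']}), "
        f"status {alert['status']}"
    )
    return {"text": text}
```

The returned dictionary can be serialized as JSON and sent by HTTP POST to your Slack incoming-webhook URL.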

Alert States

Triggered

Alert condition met:

  • Notification sent
  • State: TRIGGERED
  • Dashboard shows active alert

Resolved

Condition no longer met:

  • Resolution notification sent
  • State: RESOLVED
  • Removed from active alerts

Acknowledged

Alert acknowledged by team:

  • Marked as acknowledged by a team member
  • State: ACKNOWLEDGED
  • Still shows in dashboard
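
If you track alert states in your own tooling, a small state model can help. The sketch below is an assumption-laden illustration: the three state names come from this page, but the allowed transitions are inferred rather than documented, and `AlertState`, `transition`, and `is_active` are hypothetical names.

```python
from enum import Enum


class AlertState(Enum):
    TRIGGERED = "TRIGGERED"
    ACKNOWLEDGED = "ACKNOWLEDGED"
    RESOLVED = "RESOLVED"


# Assumed transitions; this page does not spell them out
ALLOWED_TRANSITIONS = {
    AlertState.TRIGGERED: {AlertState.ACKNOWLEDGED, AlertState.RESOLVED},
    AlertState.ACKNOWLEDGED: {AlertState.RESOLVED},
    AlertState.RESOLVED: {AlertState.TRIGGERED},
}


def transition(current, target):
    """Return the new state, or raise ValueError for an unexpected move."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def is_active(state):
    """Triggered and acknowledged alerts still appear in the dashboard."""
    return state is not AlertState.RESOLVED
```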

Alert Management

Viewing Alerts

  1. Navigate to Monitoring > Alerts
  2. View:
    • Active: Currently triggered alerts
    • History: Alerts from the past 30 days
    • Configured: All alert rules

Modifying Alerts

  1. Select alert rule
  2. Click Edit
  3. Update configuration
  4. Click Save

Disabling Alerts

To temporarily disable an alert:

  1. Select the alert rule
  2. Click Disable

The rule is preserved but not evaluated while it is disabled. Re-enable it when ready.

Deleting Alerts

  1. Select alert rule
  2. Click Delete
  3. Confirm deletion

Best Practices

Alert Configuration

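  1. Start Conservative: Begin with a small set of alerts to avoid alert fatigue
  2. Tune Thresholds: Adjust thresholds based on each resource's baseline
  3. Set Duration: Require the condition to persist to avoid flapping alerts
  4. Severity Levels: Assign severities so critical alerts are handled first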
  5. Test Alerts: Verify notifications work
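
One way to tune a threshold against a baseline is to look at a high percentile of recent metric values and add some headroom. The sketch below is one possible approach, not a DanubeData feature; `percentile`, `suggest_threshold`, and the default values are assumptions for illustration.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers (pct is an integer)."""
    ordered = sorted(samples)
    # Ceiling division in integer arithmetic, with a minimum rank of 1
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def suggest_threshold(samples, pct=95, headroom=5.0, ceiling=100.0):
    """Suggest a threshold slightly above normal peaks, capped at a ceiling.

    samples: historical metric values, for example CPU usage percentages.
    """
    return min(ceiling, percentile(samples, pct) + headroom)
```

For a resource whose recent CPU samples are `[40, 42, 45, 50, 70]`, this suggests a threshold of 75, which you would then refine based on how often the alert fires.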

Alert Response

  1. Document Procedures: Runbooks for common alerts
  2. Escalation Path: Define who handles what
  3. Root Cause Analysis: Document alert causes
  4. Continuous Improvement: Refine alert rules

Common Patterns

Gradual Degradation

Alert: CPU > 80% for 15 minutes
Severity: Warning (early warning of issues)

Immediate Critical

Alert: Disk > 95%
Severity: Critical (immediate action needed)

Informational

Alert: Deployment completed
Severity: Info (status update only)

Troubleshooting

Not Receiving Alerts

Check:

  • Email address correct
  • Not in spam folder
  • Webhook endpoint accessible
  • Alert rule enabled
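
To check that a webhook endpoint is accessible, you can send it a sample payload yourself. The sketch below uses only the Python standard library. The payload mirrors the webhook example on this page, and `build_test_request` and `check_endpoint` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request

# Sample payload copied from the webhook example on this page
SAMPLE_ALERT = {
    "alert_id": "alert-123456",
    "alert_name": "High CPU Usage",
    "resource_id": "vps-789012",
    "resource_name": "web-server-1",
    "metric": "cpu_usage",
    "current_value": 95.5,
    "threshold": 90,
    "condition": "greater_than",
    "triggered_at": "2024-10-12T10:30:00Z",
    "severity": "warning",
    "status": "triggered",
}


def build_test_request(endpoint_url):
    """Build a POST request carrying the sample alert as JSON."""
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(SAMPLE_ALERT).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_endpoint(endpoint_url, timeout=10):
    """Return (ok, detail) describing whether the endpoint accepted the POST."""
    try:
        with urllib.request.urlopen(build_test_request(endpoint_url), timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The endpoint was reachable but rejected the request
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # DNS failure, refused connection, timeout, and similar
        return False, f"unreachable: {exc}"
```

Run `check_endpoint("https://your-endpoint.example/alerts")` from outside your private network, since the endpoint must be reachable from the public internet for webhook delivery to succeed.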

Too Many Alerts

Solutions:

  • Increase threshold
  • Increase duration
  • Combine related alerts
  • Review and delete unnecessary alerts

False Positives

Solutions:

  • Adjust threshold
  • Increase duration requirement
  • Review metric baseline
  • Consider time-based rules
