Alerting
Configure alerts to receive notifications when metrics exceed thresholds or resources experience issues.
Overview
DanubeData alerting features:
- Email Notifications: Sent to configured addresses
- Webhook Integration: POST to your endpoint
- Threshold-Based: Trigger on metric thresholds
- Status Changes: Alert on state changes
- Customizable: Set the metric, condition, duration, and recipients for each alert
Creating Alerts
Via Dashboard
- Navigate to the resource
- Click Monitoring > Alerts
- Click Create Alert
- Configure:
  - Metric: Select the metric to monitor
  - Condition: Threshold and comparison (>, <, =)
  - Duration: How long the condition must persist
  - Notification Method: Email or webhook
  - Recipients: Who to notify
- Click Create Alert
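The interaction between Condition and Duration can be easier to see in code. The following is a minimal sketch of how a rule of this shape might be evaluated, not DanubeData's actual implementation. The `AlertRule` class, the `should_trigger` function, and the sample format are all hypothetical names chosen for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Iterable, Tuple

# The three comparisons offered in the Condition field.
COMPARATORS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


@dataclass
class AlertRule:
    """Hypothetical in-memory form of the dashboard fields."""
    metric: str
    comparison: str        # one of ">", "<", "="
    threshold: float
    duration_seconds: int  # how long the condition must persist


def should_trigger(rule: AlertRule, samples: Iterable[Tuple[int, float]]) -> bool:
    """Return True once the condition has held continuously for the duration.

    `samples` is a time-ordered iterable of (unix_timestamp, value) pairs.
    """
    compare = COMPARATORS[rule.comparison]
    breach_started = None  # timestamp of the first sample in the current breach
    for timestamp, value in samples:
        if compare(value, rule.threshold):
            if breach_started is None:
                breach_started = timestamp
            if timestamp - breach_started >= rule.duration_seconds:
                return True
        else:
            breach_started = None  # condition cleared, so the clock restarts
    return False
```

Under this model, a single sample below the threshold resets the duration window, which is why a longer duration suppresses short spikes.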
Alert Types
Resource Alerts
CPU Alert
Metric: CPU Usage
Condition: > 90%
Duration: 5 minutes
Action: Email team@example.com
Memory Alert
Metric: Memory Usage
Condition: > 85%
Duration: 10 minutes
Action: Email ops@example.com
Disk Alert
Metric: Disk Usage
Condition: > 90%
Duration: 1 minute
Action: Email admin@example.com
Database Alerts
Connection Alert
Metric: Active Connections
Condition: > 400 (80% of 500 max)
Duration: 5 minutes
Action: Email dba@example.com
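The 400-connection threshold above is 80% of a 500-connection limit. If your plan has a different limit, the same arithmetic applies. A small sketch, where `connection_threshold` is a hypothetical helper and the 80% default is only the ratio used in this example:

```python
def connection_threshold(max_connections: int, percent: int = 80) -> int:
    """Return the alert threshold as a whole number of connections."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return max_connections * percent // 100


print(connection_threshold(500))  # 400
```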
Replication Lag
Metric: Replication Lag
Condition: > 5 seconds
Duration: 2 minutes
Action: Email ops@example.com
Slow Queries
Metric: Slow Queries
Condition: > 10 per minute
Duration: 5 minutes
Action: Email dev@example.com
Cache Alerts
Hit Rate Alert
Metric: Cache Hit Rate
Condition: < 80%
Duration: 15 minutes
Action: Email ops@example.com
Memory Alert
Metric: Memory Usage
Condition: > 90%
Duration: 5 minutes
Action: Email ops@example.com
Notification Channels
Email Notifications
Configure email recipients:
- Multiple email addresses
- Distribution lists
- Role-based emails
Email format includes:
- Alert name and severity
- Resource affected
- Metric value and threshold
- Time alert triggered
- Direct link to resource
Webhook Notifications
POST JSON to your endpoint:
{
  "alert_id": "alert-123456",
  "alert_name": "High CPU Usage",
  "resource_id": "vps-789012",
  "resource_name": "web-server-1",
  "metric": "cpu_usage",
  "current_value": 95.5,
  "threshold": 90,
  "condition": "greater_than",
  "triggered_at": "2024-10-12T10:30:00Z",
  "severity": "warning",
  "status": "triggered"
}
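On the receiving side, the payload can be parsed into a typed object before acting on it. This is a sketch that assumes the payload contains exactly the fields shown in the sample above. `AlertEvent` and `parse_alert` are hypothetical names, not part of any DanubeData SDK.

```python
import json
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass
class AlertEvent:
    """Mirrors the fields of the sample webhook payload."""
    alert_id: str
    alert_name: str
    resource_id: str
    resource_name: str
    metric: str
    current_value: float
    threshold: float
    condition: str
    triggered_at: datetime
    severity: str
    status: str


def parse_alert(body: str) -> AlertEvent:
    """Parse a webhook request body; raises KeyError if a field is missing."""
    data = json.loads(body)
    # Keep only the known fields so unexpected extras do not break parsing.
    known = {f.name: data[f.name] for f in fields(AlertEvent)}
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    known["triggered_at"] = datetime.fromisoformat(
        known["triggered_at"].replace("Z", "+00:00")
    )
    return AlertEvent(**known)
```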
Integrate with:
- Slack
- PagerDuty
- Microsoft Teams
- Custom systems
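As one example of such an integration, a small relay can turn the alert payload into a Slack message. The sketch below assumes a Slack incoming webhook, which accepts a JSON body with a `text` field. The function names and the message wording are illustrative, and the Slack webhook URL is something you supply.

```python
import json
import urllib.request


def to_slack_message(alert: dict) -> dict:
    """Build a Slack incoming-webhook body from an alert payload."""
    text = (
        f"[{alert['severity'].upper()}] {alert['alert_name']} on "
        f"{alert['resource_name']}: {alert['metric']} is {alert['current_value']} "
        f"(threshold {alert['threshold']}, status {alert['status']})"
    )
    return {"text": text}


def forward_to_slack(alert: dict, slack_webhook_url: str) -> int:
    """POST the formatted message to Slack and return the HTTP status code."""
    body = json.dumps(to_slack_message(alert)).encode("utf-8")
    request = urllib.request.Request(
        slack_webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping the formatting separate from the HTTP call makes it possible to check the message text without contacting Slack.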
Alert States
Triggered
Alert condition met:
- Notification sent
- State: TRIGGERED
- Dashboard shows active alert
Resolved
Condition no longer met:
- Resolution notification sent
- State: RESOLVED
- Removed from active alerts
Acknowledged
Alert acknowledged by team:
- Mark as acknowledged
- State: ACKNOWLEDGED
- Still shows in dashboard
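The three states can be modeled as a small state machine, which is useful if you track alert status in your own tooling. The state names come from this page. The allowed transitions below are an assumption made for illustration, and the platform's actual rules may differ.

```python
from enum import Enum


class AlertState(Enum):
    TRIGGERED = "TRIGGERED"
    ACKNOWLEDGED = "ACKNOWLEDGED"
    RESOLVED = "RESOLVED"


# Assumed transitions (not confirmed by the documentation):
# an alert can be acknowledged or resolved after triggering,
# and a resolved alert can trigger again later.
ALLOWED_TRANSITIONS = {
    AlertState.TRIGGERED: {AlertState.ACKNOWLEDGED, AlertState.RESOLVED},
    AlertState.ACKNOWLEDGED: {AlertState.RESOLVED},
    AlertState.RESOLVED: {AlertState.TRIGGERED},
}


def transition(current: AlertState, target: AlertState) -> AlertState:
    """Return the new state, or raise ValueError for a disallowed move."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def shown_in_dashboard(state: AlertState) -> bool:
    """Triggered and acknowledged alerts stay visible; resolved ones do not."""
    return state is not AlertState.RESOLVED
```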
Alert Management
Viewing Alerts
- Navigate to Monitoring > Alerts
- View:
  - Active: Currently triggered alerts
  - History: Alerts from the past 30 days
  - Configured: All alert rules
Modifying Alerts
- Select alert rule
- Click Edit
- Update configuration
- Click Save
Disabling Alerts
Temporarily disable alerts:
- Select alert
- Click Disable
The alert rule is preserved but not evaluated while disabled. Re-enable it when ready.
Deleting Alerts
- Select alert rule
- Click Delete
- Confirm deletion
Best Practices
Alert Configuration
- Start Conservative: Avoid alert fatigue
- Tune Thresholds: Adjust based on baselines
- Set Duration: Avoid flapping alerts
- Severity Levels: Prioritize critical alerts
- Test Alerts: Verify notifications work
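One way to tune a threshold against a baseline is to place it a few standard deviations above the metric's normal level. The sketch below is a heuristic, not a DanubeData feature. `suggest_threshold` is a hypothetical helper, and the default of three standard deviations and the 100% ceiling are assumptions suited to percentage metrics.

```python
import statistics
from typing import Sequence


def suggest_threshold(
    samples: Sequence[float], sigmas: float = 3.0, ceiling: float = 100.0
) -> float:
    """Suggest a threshold from historical samples of a metric.

    Returns mean + sigmas * standard deviation, capped at `ceiling`.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a baseline")
    baseline = statistics.mean(samples)
    spread = statistics.pstdev(samples)  # population standard deviation
    return min(baseline + sigmas * spread, ceiling)
```

Feed it samples exported from a period of normal operation, then compare the suggestion with the threshold you currently use.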
Alert Response
- Document Procedures: Runbooks for common alerts
- Escalation Path: Define who handles what
- Root Cause Analysis: Document alert causes
- Continuous Improvement: Refine alert rules
Common Patterns
Gradual Degradation
Alert: CPU > 80% for 15 minutes
Warning: Early warning of issues
Immediate Critical
Alert: Disk > 95%
Critical: Immediate action needed
Informational
Alert: Deployment completed
Info: Status update only
Troubleshooting
Not Receiving Alerts
Check:
- Email address correct
- Not in spam folder
- Webhook endpoint accessible
- Alert rule enabled
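To check that a webhook endpoint is reachable, you can send it a test POST yourself. This sketch uses only the Python standard library. `check_webhook` and the test payload are illustrative, and a successful result only shows that the endpoint accepts a POST from the machine running the script, not from DanubeData's servers.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple

# Test payload shaped like the sample webhook body; the values are made up.
TEST_PAYLOAD = {
    "alert_id": "alert-test",
    "alert_name": "Connectivity Test",
    "metric": "cpu_usage",
    "current_value": 0,
    "threshold": 90,
    "severity": "info",
    "status": "triggered",
}


def check_webhook(url: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """POST a test payload and report whether the endpoint accepted it."""
    body = json.dumps(TEST_PAYLOAD).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as error:
        # The endpoint answered, but with a 4xx or 5xx status.
        return False, f"HTTP {error.code}"
    except OSError as error:
        # DNS failure, refused connection, timeout, and similar problems.
        return False, f"unreachable: {error}"
```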
Too Many Alerts
Solutions:
- Increase threshold
- Increase duration
- Combine related alerts
- Review and delete unnecessary alerts
False Positives
Solutions:
- Adjust threshold
- Increase duration requirement
- Review metric baseline
- Consider time-based rules
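A time-based rule can be as simple as using a higher threshold during a known busy window, such as a nightly backup. The sketch below shows the idea as logic you could run in your own webhook receiver. It does not describe a built-in DanubeData setting, and the window times and threshold values are placeholders.

```python
from datetime import datetime, time


def threshold_for(
    moment: datetime,
    default: float = 90.0,
    window_threshold: float = 98.0,
    window_start: time = time(2, 0),
    window_end: time = time(4, 0),
) -> float:
    """Return the threshold that applies at `moment`.

    Inside the window (start inclusive, end exclusive) the relaxed
    threshold is used. The window must not cross midnight.
    """
    in_window = window_start <= moment.time() < window_end
    return window_threshold if in_window else default


def should_notify(current_value: float, moment: datetime) -> bool:
    """Decide whether an alert at `moment` is worth forwarding."""
    return current_value > threshold_for(moment)
```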