Alerting

Configure alerts to be notified when a metric crosses a threshold or a resource's status changes.

Overview

DanubeData alerting features:

Email Notifications: Sent to configured addresses
Webhook Integration: POST to your endpoint
Threshold-Based: Trigger on metric thresholds
Status Changes: Alert on state changes
Customizable: Choose the metric, condition, duration, and recipients for each rule

Creating Alerts

Via Dashboard

Navigate to the resource
Click Monitoring > Alerts
Click Create Alert
Configure:
- Metric: Select metric to monitor
- Condition: Threshold and comparison (>, <, =)
- Duration: How long condition must persist
- Notification Method: Email or webhook
- Recipients: Who to notify
Click Create Alert
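If you keep alert definitions alongside your code, or want to reason about them before entering them in the dashboard, the form fields above map onto a small record. The sketch below is a hypothetical local model and assumes nothing about a DanubeData API: `AlertRule`, `Comparison`, and the field names are invented for illustration, and `condition_met` checks a single sample without applying the Duration requirement.

```python
from dataclasses import dataclass, field
from enum import Enum


class Comparison(Enum):
    """Comparison operators offered by the Condition field."""
    GREATER_THAN = ">"
    LESS_THAN = "<"
    EQUAL = "="


@dataclass
class AlertRule:
    """Hypothetical local model of the Create Alert form (not a DanubeData API)."""
    metric: str               # e.g. "cpu_usage"
    comparison: Comparison    # >, < or =
    threshold: float          # value the metric is compared against
    duration_minutes: int     # how long the condition must persist
    notification_method: str  # "email" or "webhook"
    recipients: list[str] = field(default_factory=list)

    def condition_met(self, value: float) -> bool:
        """Compare one sample against the threshold; Duration is not applied here."""
        if self.comparison is Comparison.GREATER_THAN:
            return value > self.threshold
        if self.comparison is Comparison.LESS_THAN:
            return value < self.threshold
        return value == self.threshold


# CPU Usage > 90% for 5 minutes, emailed to the team.
cpu_rule = AlertRule(
    metric="cpu_usage",
    comparison=Comparison.GREATER_THAN,
    threshold=90,
    duration_minutes=5,
    notification_method="email",
    recipients=["team@example.com"],
)
```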

Alert Types

Resource Alerts

CPU Alert

Metric: CPU Usage
Condition: > 90%
Duration: 5 minutes
Action: Email team@example.com

Memory Alert

Metric: Memory Usage
Condition: > 85%
Duration: 10 minutes
Action: Email ops@example.com

Disk Alert

Metric: Disk Usage
Condition: > 90%
Duration: 1 minute
Action: Email admin@example.com
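Each rule above pairs a threshold with a Duration. One way to picture that pairing is a check that the condition held on every sample across a window ending at the latest reading. This is a sketch of the idea, not DanubeData's evaluator: the `(timestamp, value)` sample format and the rule that any healthy sample resets the clock are assumptions.

```python
from typing import Callable, Iterable, Optional, Tuple


def condition_persisted(
    samples: Iterable[Tuple[float, float]],
    condition: Callable[[float], bool],
    duration_seconds: float,
) -> bool:
    """Return True if `condition` held on every sample spanning at least
    `duration_seconds`, ending at the most recent sample.

    `samples` are (unix_timestamp, value) pairs in ascending time order.
    """
    streak_start: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, value in samples:
        if condition(value):
            if streak_start is None:
                streak_start = ts  # breach begins
        else:
            streak_start = None    # any healthy sample resets the clock
        last_ts = ts
    if streak_start is None or last_ts is None:
        return False
    return last_ts - streak_start >= duration_seconds
```

With one-minute samples, a CPU reading that dips below 90% partway through the five-minute window never fires, which is the kind of flapping the Duration setting is meant to prevent.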

Database Alerts

Connection Alert

Metric: Active Connections
Condition: > 400 (80% of 500 max)
Duration: 5 minutes
Action: Email dba@example.com
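The connection threshold is plain arithmetic: 80% of a 500-connection limit is 400. If your database allows a different maximum, recompute the threshold the same way (on PostgreSQL, `SHOW max_connections;` reports the limit). The helper names below are illustrative.

```python
def connection_threshold(max_connections: int, percent: int) -> int:
    """Number of active connections equal to `percent` of the limit (integer math)."""
    return max_connections * percent // 100


def connection_utilization(active: int, max_connections: int) -> float:
    """Active connections as a percentage of the limit."""
    return 100 * active / max_connections


threshold = connection_threshold(500, 80)  # 400, as in the Connection Alert
```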

Replication Lag

Metric: Replication Lag
Condition: > 5 seconds
Duration: 2 minutes
Action: Email ops@example.com

Slow Queries

Metric: Slow Queries
Condition: > 10 per minute
Duration: 5 minutes
Action: Email dev@example.com

Cache Alerts

Hit Rate Alert

Metric: Cache Hit Rate
Condition: < 80%
Duration: 15 minutes
Action: Email ops@example.com
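A common definition of hit rate is the share of lookups answered from the cache: hits / (hits + misses) × 100. The sketch below computes it from raw counters and applies the < 80% condition. The counter inputs are an assumption about where your numbers come from (Redis-compatible servers, for example, report `keyspace_hits` and `keyspace_misses` under `INFO stats`). With no traffic the rate is undefined, so the function returns `None` instead of firing on an idle cache.

```python
from typing import Optional


def cache_hit_rate(hits: int, misses: int) -> Optional[float]:
    """Percentage of lookups served from cache, or None if there were no lookups."""
    lookups = hits + misses
    if lookups == 0:
        return None
    return 100 * hits / lookups


def hit_rate_breached(hits: int, misses: int, threshold: float = 80.0) -> bool:
    """True when the hit rate is below the threshold (the < 80% condition)."""
    rate = cache_hit_rate(hits, misses)
    return rate is not None and rate < threshold
```

For example, 7,500 hits and 2,500 misses give a 75% hit rate, which breaches the rule.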

Memory Alert

Metric: Memory Usage
Condition: > 90%
Duration: 5 minutes
Action: Email ops@example.com

Notification Channels

Email Notifications

Email recipients can include:

Multiple email addresses
Distribution lists
Role-based addresses (such as ops@example.com)

Each alert email includes:

Alert name and severity
Resource affected
Metric value and threshold
Time alert triggered
Direct link to resource

Webhook Notifications

Webhook alerts are sent to your endpoint as an HTTP POST with a JSON body:

{
  "alert_id": "alert-123456",
  "alert_name": "High CPU Usage",
  "resource_id": "vps-789012",
  "resource_name": "web-server-1",
  "metric": "cpu_usage",
  "current_value": 95.5,
  "threshold": 90,
  "condition": "greater_than",
  "triggered_at": "2024-10-12T10:30:00Z",
  "severity": "warning",
  "status": "triggered"
}
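A webhook receiver only has to accept the POST, parse the JSON, and answer quickly. Below is a minimal standard-library sketch. The set of required keys is a choice made for illustration from the sample payload above, the handler performs no authentication, and `received_alerts` stands in for whatever processing you do.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Keys this sketch insists on, taken from the sample payload (a local choice).
REQUIRED_KEYS = {"alert_id", "alert_name", "resource_id", "metric",
                 "current_value", "threshold", "status"}

received_alerts: list[dict] = []  # stand-in for your own processing


class AlertWebhookHandler(BaseHTTPRequestHandler):
    """Accepts alert notifications sent as an HTTP POST with a JSON body."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except ValueError:  # invalid JSON or undecodable bytes
            self.send_error(400, "Body is not valid JSON")
            return
        if not isinstance(payload, dict) or not REQUIRED_KEYS.issubset(payload):
            self.send_error(400, "Unexpected payload shape")
            return
        received_alerts.append(payload)
        # Reply immediately; do slow work (paging, chat posts) elsewhere.
        self.send_response(204)
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        """Silence the default per-request logging to stderr."""


def serve(host: str = "0.0.0.0", port: int = 8080) -> None:
    """Run the receiver until interrupted (hypothetical host/port defaults)."""
    HTTPServer((host, port), AlertWebhookHandler).serve_forever()
```

Calling `serve()` blocks and handles requests until interrupted. In production you would typically put it behind HTTPS, and the endpoint must be reachable from DanubeData for deliveries to arrive.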

Use webhooks to integrate with:

Slack
PagerDuty
Microsoft Teams
Custom systems
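Chat tools usually expect their own message format, so a common pattern is a small relay: your endpoint receives the DanubeData payload and re-posts a summary. Slack incoming webhooks accept a JSON body with a `text` field, and the sketch below builds that request. The webhook URL is a placeholder, the message layout is an arbitrary choice, and whether DanubeData can post to Slack directly is not covered here.

```python
import json
import urllib.request

# Hypothetical placeholder: use the incoming-webhook URL Slack gives you.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"


def format_slack_text(alert: dict) -> str:
    """One-line summary of an alert payload for a chat channel."""
    state = alert["status"].upper()
    return (
        f"[{state}] {alert['alert_name']} on {alert['resource_name']}: "
        f"{alert['metric']} = {alert['current_value']} "
        f"(threshold {alert['threshold']})"
    )


def build_slack_request(alert: dict, webhook_url: str = SLACK_WEBHOOK_URL) -> urllib.request.Request:
    """Prepare (but do not send) the POST to a Slack incoming webhook."""
    body = json.dumps({"text": format_slack_text(alert)}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def forward_to_slack(alert: dict, webhook_url: str = SLACK_WEBHOOK_URL) -> int:
    """Send the alert to Slack and return the HTTP status code."""
    request = build_slack_request(alert, webhook_url)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.status
```

For the sample payload, the message reads `[TRIGGERED] High CPU Usage on web-server-1: cpu_usage = 95.5 (threshold 90)`.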

Alert States

Triggered

Alert condition met:

Notification sent
State: TRIGGERED
Dashboard shows active alert

Resolved

Condition no longer met:

Resolution notification sent
State: RESOLVED
Removed from active alerts

Acknowledged

A team member has acknowledged the alert:

Alert marked as acknowledged
State: ACKNOWLEDGED
Still shown in the dashboard
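If you mirror alert states in your own tooling, for example from the webhook `status` field, the three states form a small state machine. The transitions below are inferred from the descriptions above and are assumptions, in particular that a resolved alert can trigger again and that the webhook reports states other than `"triggered"`.

```python
from enum import Enum


class AlertState(Enum):
    TRIGGERED = "TRIGGERED"
    ACKNOWLEDGED = "ACKNOWLEDGED"
    RESOLVED = "RESOLVED"


# Assumed transitions, inferred from the state descriptions.
ALLOWED_TRANSITIONS = {
    AlertState.TRIGGERED: {AlertState.ACKNOWLEDGED, AlertState.RESOLVED},
    AlertState.ACKNOWLEDGED: {AlertState.RESOLVED},
    AlertState.RESOLVED: {AlertState.TRIGGERED},  # condition met again
}


def transition(current: AlertState, new: AlertState) -> AlertState:
    """Move to `new`, rejecting transitions the model does not allow."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


def from_webhook_status(status: str) -> AlertState:
    """Map a payload `status` such as "triggered" onto a state."""
    return AlertState(status.upper())


def is_active(state: AlertState) -> bool:
    """Triggered and acknowledged alerts stay in the dashboard; resolved ones do not."""
    return state is not AlertState.RESOLVED
```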

Alert Management

Viewing Alerts

Navigate to Monitoring > Alerts
View:
- Active: Currently triggered alerts
- History: Past 30 days
- Configured: All alert rules

Modifying Alerts

Select alert rule
Click Edit
Update configuration
Click Save

Disabling Alerts

Temporarily disable alerts:

Select alert
Click Disable
Alert rule preserved but not evaluated

Re-enable when ready.

Deleting Alerts

Select alert rule
Click Delete
Confirm deletion

Best Practices

Alert Configuration

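Start Conservative: Begin with a few high-priority alerts to avoid alert fatigue
Tune Thresholds: Adjust them based on observed baseline values
Set Duration: Require the condition to persist so brief spikes don't cause flapping alerts
Severity Levels: Prioritize critical alerts
Test Alerts: Verify notifications arrive before you rely on them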

Alert Response

Document Procedures: Write runbooks for common alerts
Escalation Path: Define who handles what
Root Cause Analysis: Record what caused each alert
Continuous Improvement: Refine alert rules based on what you learn

Common Patterns

Gradual Degradation

Alert: CPU > 80% for 15 minutes
Warning: Catches sustained load before it becomes critical

Immediate Critical

Alert: Disk > 95%
Critical: Immediate action needed

Informational

Alert: Deployment completed
Info: Status update only
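The three patterns correspond to severity levels that can drive routing, which is one way to implement the Escalation Path practice. The mapping below is hypothetical: only `"warning"` appears in the sample webhook payload, so `info` and `critical` as payload values are assumptions, and the targets are placeholders. Unknown severities fall back to critical so a new level is never silently dropped.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Ordered so that comparisons like severity >= Severity.WARNING work."""
    INFO = 1
    WARNING = 2
    CRITICAL = 3


# Placeholder targets: replace with your own channels and on-call tooling.
ROUTES: dict[Severity, list[str]] = {
    Severity.INFO: ["#deployments"],
    Severity.WARNING: ["ops@example.com"],
    Severity.CRITICAL: ["ops@example.com", "on-call pager"],
}


def route(severity_name: str) -> list[str]:
    """Return notification targets for a payload `severity` string."""
    try:
        severity = Severity[severity_name.upper()]
    except KeyError:
        severity = Severity.CRITICAL  # unknown level: escalate rather than drop
    return ROUTES[severity]
```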

Troubleshooting

Not Receiving Alerts

Check:

Email address is correct
Alerts are not landing in your spam folder
Webhook endpoint is reachable from DanubeData's servers
Alert rule is enabled
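To rule out the endpoint itself, send it a test POST shaped like a real alert and check for a 2xx response. This sketch runs from your own machine, so it confirms the endpoint works but not that DanubeData's servers can reach it (a `localhost` URL can pass here and still fail for real alerts). The test body and function name are illustrative.

```python
import json
import urllib.error
import urllib.request

# Illustrative body shaped like a real alert so strict receivers accept it.
TEST_ALERT = {
    "alert_id": "connectivity-test",
    "alert_name": "Webhook connectivity test",
    "resource_id": "test",
    "metric": "cpu_usage",
    "current_value": 0,
    "threshold": 0,
    "status": "triggered",
}


def check_webhook_endpoint(url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """POST TEST_ALERT to `url`; return (succeeded, detail)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(TEST_ALERT).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with a non-2xx status.
        return False, f"HTTP {err.code}: reachable but rejected the request"
    except OSError as err:
        # DNS failure, refused connection, TLS error or timeout.
        return False, f"unreachable: {err}"
```

Call it with your own endpoint URL; a result like `(False, "HTTP 404: ...")` points at the receiver's routing, while `unreachable` points at DNS, firewalls, or TLS.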

Too Many Alerts

Solutions:

Increase threshold
Increase duration
Combine related alerts
Review and delete unnecessary alerts
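Combining related alerts can also happen on your side, for instance in a webhook receiver: collapse alerts for the same resource that arrive close together into one notification. The sketch below groups by `resource_id` within a fixed window measured from the first alert of each group. The 10-minute window and the grouping key are assumptions to tune, and whether DanubeData groups alerts itself is not covered here.

```python
from datetime import datetime, timedelta


def parse_timestamp(value: str) -> datetime:
    """Parse ISO 8601 timestamps such as "2024-10-12T10:30:00Z"."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def group_alerts(alerts: list[dict], window: timedelta = timedelta(minutes=10)) -> list[list[dict]]:
    """Group alerts on the same resource_id that fire within `window` of the
    first alert in their group; each group can become one notification."""
    groups: list[list[dict]] = []
    open_groups: dict[str, tuple[datetime, list[dict]]] = {}
    for alert in sorted(alerts, key=lambda a: parse_timestamp(a["triggered_at"])):
        ts = parse_timestamp(alert["triggered_at"])
        current = open_groups.get(alert["resource_id"])
        if current is not None and ts - current[0] <= window:
            current[1].append(alert)  # close enough: fold into the open group
        else:
            group = [alert]           # start a new group for this resource
            groups.append(group)
            open_groups[alert["resource_id"]] = (ts, group)
    return groups
```

A CPU alert at 10:30 and a memory alert at 10:33 on the same server become one notification, while a disk alert on that server at 10:50 starts a new group.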

False Positives