Risk Management & Business Continuity
DanubeData's risk management framework and business continuity plan, including recovery objectives, backup strategies, and disaster recovery procedures.
1. Introduction
DanubeData operates a structured risk management framework aligned with ISO 31000 principles and designed to satisfy the requirements of the CISPE Code of Conduct, in particular Section 5.4 (Risk Management) and Section 5.5 (Security Measures / Business Continuity). This document describes how we identify, assess, treat, and monitor risks across our infrastructure and services, and how our business continuity plan ensures service resilience for our customers.
All DanubeData production infrastructure is located in Falkenstein, Germany, in Hetzner data centers that hold ISO/IEC 27001 certification. Our platform runs on dedicated bare-metal servers with a Kubernetes-based orchestration layer and provides managed VPS instances, databases, caches, object storage, serverless containers, static site hosting, and managed applications.
This framework applies to all DanubeData services, internal operations, and the processing of customer data. It is maintained by the DanubeData security and operations team, reviewed annually, and updated in response to material changes in the threat landscape, service portfolio, or regulatory environment.
2. Risk Management Framework
2.1 Risk Identification
DanubeData employs a systematic approach to identifying risks across all layers of the service stack:
- Asset-based identification — We maintain a comprehensive asset inventory covering physical infrastructure (dedicated servers, network equipment), virtualization components (KubeVirt, Kubernetes control plane), data services (databases, caches, object storage), application-layer services (control panel, API, GitOps pipelines), and customer data in transit and at rest.
- Cloud-specific threat landscape assessment — We evaluate threats specific to cloud infrastructure providers, including multi-tenant isolation failures, hypervisor escape vectors, container breakout scenarios, supply chain attacks on container images, and Kubernetes API server exposure.
- Vulnerability management program — Continuous vulnerability scanning is performed across operating system packages, container base images, Kubernetes components, and application dependencies. Critical vulnerabilities are triaged within 24 hours of disclosure. We track CVEs relevant to our stack (Linux kernel, KubeVirt, Kubernetes, PHP, MySQL, PostgreSQL, Redis, Ceph) and apply patches according to severity-based SLAs.
- Threat intelligence monitoring — We subscribe to security advisories from upstream projects (Kubernetes, KubeVirt, Cilium, ArgoCD, Rook-Ceph), CERT-EU bulletins, and Hetzner security notifications. Emerging threats are evaluated against our infrastructure within 48 hours of publication.
2.2 Risk Assessment
Each identified risk is assessed using a structured methodology to determine its severity and prioritization:
Methodology: We apply a likelihood-times-impact scoring matrix. Likelihood is rated on a 5-point scale (Rare, Unlikely, Possible, Likely, Almost Certain) and impact is rated on a 5-point scale (Negligible, Minor, Moderate, Major, Catastrophic). The product yields a risk score used for prioritization.
Risk categories:
- Operational — Hardware failures, software defects, capacity shortages, deployment errors, dependency outages.
- Security — Unauthorized access, data breaches, denial-of-service attacks, malware, insider threats, credential compromise.
- Compliance — GDPR non-compliance, CISPE Code violations, regulatory changes, audit findings, contractual breaches.
- Financial — Revenue loss from outages, unexpected infrastructure costs, billing disputes, fraud.
- Reputational — Public incidents, customer data loss, prolonged service degradation, negative media coverage.
Risk scoring and response timelines:
| Risk Level | Score Range | Required Action | Response Timeline |
|---|---|---|---|
| Critical | 20–25 | Immediate action required; escalation to management | Immediate |
| High | 12–19 | Action plan required; owner assigned | Within 30 days |
| Medium | 6–11 | Mitigation planned; scheduled for implementation | Within 90 days |
| Low | 1–5 | Accept and monitor; review at next cycle | Next quarterly review |
A formal risk register is maintained documenting each identified risk, its score, assigned owner, treatment decision, and current status. The register is reviewed quarterly by the operations team and annually by management.
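The scoring matrix and register described above can be sketched in a few lines of Python. This is a minimal illustration, not DanubeData's actual tooling: the numeric values 1 to 5 are assumed to follow the order in which the labels are listed, and the names `risk_score`, `classify`, and `RegisterEntry` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: each label maps to 1..5 in the order given in the methodology.
LIKELIHOOD = {"Rare": 1, "Unlikely": 2, "Possible": 3, "Likely": 4, "Almost Certain": 5}
IMPACT = {"Negligible": 1, "Minor": 2, "Moderate": 3, "Major": 4, "Catastrophic": 5}

# (lowest score in band, risk level, response timeline), highest band first.
BANDS = [
    (20, "Critical", "Immediate"),
    (12, "High", "Within 30 days"),
    (6, "Medium", "Within 90 days"),
    (1, "Low", "Next quarterly review"),
]


def risk_score(likelihood: str, impact: str) -> int:
    """Return the likelihood-times-impact product (1 to 25)."""
    return LIKELIHOOD[likelihood] * IMPACT[impact]


def classify(score: int) -> tuple[str, str]:
    """Map a score to its risk level and response timeline."""
    if not 1 <= score <= 25:
        raise ValueError(f"score out of range: {score}")
    for floor, level, timeline in BANDS:
        if score >= floor:
            return level, timeline
    raise AssertionError("unreachable: bands cover 1..25")


@dataclass
class RegisterEntry:
    """One row of a risk register (hypothetical field names)."""
    risk: str
    likelihood: str
    impact: str
    owner: str
    treatment: str
    status: str = "open"

    @property
    def score(self) -> int:
        return risk_score(self.likelihood, self.impact)

    @property
    def level(self) -> str:
        return classify(self.score)[0]
```

For example, a risk rated Possible and Major scores 3 × 4 = 12, which falls in the High band and therefore needs an action plan and an owner within 30 days.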
2.3 Risk Treatment
For each assessed risk, one of the following treatment strategies is selected:
| Strategy | Description | Example |
|---|---|---|
| Mitigate | Implement controls to reduce likelihood or impact | Deploy Cilium network policies to enforce tenant isolation; enable encryption at rest via Vault Transit |
| Transfer | Shift risk to a third party through insurance or contract | Professional liability insurance; Hetzner SLA for hardware replacement |
| Accept | Acknowledge the risk when it falls within defined risk appetite | Minor UI availability degradation during planned maintenance windows |
| Avoid | Eliminate the risk by discontinuing the activity | Not offering services in jurisdictions with incompatible data protection regimes |
Control selection is prioritized by cost-effectiveness and the degree of risk reduction achieved. All controls are documented, and residual risk after treatment must fall within the organization's defined risk appetite. Where residual risk remains above the acceptable threshold, additional controls are implemented or the risk is escalated to management for an explicit acceptance decision.
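The residual-risk rule above amounts to a small decision procedure. The sketch below is illustrative only: the appetite threshold is passed in as a parameter because this document does not state its value, and the function name and return strings are hypothetical.

```python
def treatment_outcome(residual_score: int, appetite: int, more_controls_available: bool) -> str:
    """Decide the next step for a risk after its controls have been applied.

    residual_score: likelihood-times-impact score after treatment (1 to 25).
    appetite: highest score the organization accepts (assumed parameter).
    """
    if residual_score <= appetite:
        return "within appetite"
    if more_controls_available:
        return "implement additional controls"
    return "escalate to management for explicit acceptance"
```

With an assumed appetite of 5, a residual score of 8 with no further controls available would be escalated rather than silently accepted.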
2.4 Risk Monitoring & Review
Risk management is a continuous activity, not a point-in-time exercise. DanubeData employs the following monitoring and review practices:
- Continuous monitoring — Prometheus collects infrastructure and application metrics across all production nodes. Alertmanager routes alerts based on severity. Grafana dashboards provide real-time visibility into system health, resource utilization, error rates, and security events.
- Key Risk Indicators (KRIs) — The following indicators are tracked continuously and reported to management:
  - Service availability (uptime percentage per service, per month)
  - Security incident count and severity distribution
  - Patch compliance rate (percentage of nodes at current patch level)
  - Backup success rate (percentage of scheduled backups completed successfully)
  - Mean time to detect (MTTD) and mean time to recover (MTTR) for incidents
  - Vulnerability remediation SLA adherence
- Quarterly risk register review — The operations team reviews all entries in the risk register, updates risk scores based on new information, closes resolved risks, and identifies newly emerging risks.
- Annual comprehensive risk assessment — A full reassessment of all identified risks, including re-evaluation of the threat landscape, review of control effectiveness, and alignment with any changes to services, infrastructure, or regulatory requirements.
- Management reporting — A risk posture summary is prepared quarterly, covering the distribution of risks by level and category, treatment progress, KRI trends, and any material changes since the previous report.
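
Several of the Key Risk Indicators listed above are simple ratios or averages. The following sketch shows one way they could be computed from raw records. It assumes MTTD and MTTR are both measured from the start of the incident (definitions vary between organizations), and the `Incident` type and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime    # when the fault began
    detected: datetime   # when monitoring or staff noticed it
    recovered: datetime  # when service was restored


def percentage(part: int, whole: int) -> float:
    """Generic rate, e.g. successful backups or patched nodes out of the total."""
    return 100.0 * part / whole if whole else 0.0


def availability(downtime: timedelta, period: timedelta) -> float:
    """Uptime percentage for one service over a reporting period."""
    return 100.0 * (1 - downtime / period)


def mean_delta(deltas: list[timedelta]) -> timedelta:
    """Arithmetic mean of a non-empty list of durations."""
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents: list[Incident]) -> timedelta:
    return mean_delta([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    return mean_delta([i.recovered - i.started for i in incidents])
```

As a worked figure, 43 minutes and 12 seconds of downtime in a 30-day month corresponds to roughly 99.9% availability.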
3. Business Continuity Plan
3.1 Scope
This Business Continuity Plan (BCP) covers all DanubeData production services operated from Hetzner data centers in Falkenstein and Nuremberg, Germany. It encompasses the Kubernetes control plane, all customer-facing services (VPS, databases, caches, object storage, serverless containers, static sites, managed applications, Storage Share), the control panel and API, supporting infrastructure (GitOps pipelines, monitoring, secrets management), and all customer data stored within these services.
The objective of this BCP is to ensure that DanubeData can maintain or rapidly restore critical services following a disruptive event, minimizing impact on customers and meeting the recovery objectives defined below.
3.2 Recovery Objectives
Recovery Time Objective (RTO) defines the maximum acceptable duration before a service must be restored. Recovery Point Objective (RPO) defines the maximum acceptable data loss measured in time. The following targets apply:
| Service | RTO | RPO | Recovery Method |
|---|---|---|---|
| VPS Instances | 4 hours | 24 hours | KubeVirt VirtualMachineSnapshot restore |
| Managed Databases (MySQL, PostgreSQL, MariaDB) | 1 hour | 5 minutes | Streaming replication failover + VolumeSnapshot |
| Cache Instances (Redis, Valkey, Dragonfly) | 2 hours | 1 hour | VolumeSnapshot restore + replica promotion |
| Object Storage (S3-compatible) | 4 hours | Near-zero | Redundant erasure-coded storage |
| Serverless Containers | 2 hours | N/A (stateless) | Knative service redeployment from Git/registry |
| Static Sites | 1 hour | N/A (Git source) | Rebuild and redeploy from source repository |
| Managed Applications | 4 hours | 24 hours | Velero backup restore |
| Storage Share (Nextcloud) | 4 hours | 24 hours | Velero backup + S3 data restore |
| Control Panel & API | 2 hours | 5 minutes | Database failover + pod restart |
These objectives are tested regularly as described in Section 3.6 and are subject to revision based on test outcomes and evolving service requirements.
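To illustrate how a drill result can be compared against these targets, the sketch below encodes a subset of the table and reports any breach. It is an assumption-laden example rather than the actual test harness: `TARGETS` and `check_drill` are hypothetical names, and services without a numeric RPO are given `None`.

```python
from datetime import timedelta
from typing import Optional

# Service -> (RTO, RPO); subset of the recovery objectives table.
TARGETS: dict[str, tuple[timedelta, Optional[timedelta]]] = {
    "VPS Instances": (timedelta(hours=4), timedelta(hours=24)),
    "Managed Databases": (timedelta(hours=1), timedelta(minutes=5)),
    "Cache Instances": (timedelta(hours=2), timedelta(hours=1)),
    "Serverless Containers": (timedelta(hours=2), None),  # stateless, no RPO
    "Control Panel & API": (timedelta(hours=2), timedelta(minutes=5)),
}


def check_drill(service: str, time_to_restore: timedelta,
                data_loss_window: timedelta = timedelta(0)) -> list[str]:
    """Return a list of breached objectives (empty when the drill met its targets)."""
    rto, rpo = TARGETS[service]
    breaches = []
    if time_to_restore > rto:
        breaches.append(f"RTO exceeded by {time_to_restore - rto}")
    if rpo is not None and data_loss_window > rpo:
        breaches.append(f"RPO exceeded by {data_loss_window - rpo}")
    return breaches
```

A database restore that takes 45 minutes and loses 3 minutes of writes returns an empty list, while one that takes 90 minutes reports an RTO breach of 30 minutes.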
3.3 Backup Strategy
DanubeData follows the 3-2-1 backup rule: three copies of data, on two different storage media, with one copy stored offsite. This is implemented as follows:
- Primary data — Stored on local NVMe drives on dedicated servers, managed via TopoLVM (LVM thin provisioning) or Rook-Ceph (erasure-coded distributed storage), depending on the service.
- Local snapshots — Automated daily VolumeSnapshots via TopoLVM for all stateful services (databases, caches, VPS disks). KubeVirt VirtualMachineSnapshot for VPS instances captures both VM state and associated volumes. LVM snapshots are near-instantaneous and space-efficient (copy-on-write).
- Offsite backups — Velero performs namespace-level backups to our self-hosted S3-compatible object storage (Ceph RGW), providing redundancy. Backup schedules are configured per service category based on RPO requirements.
- Encryption — All backup media is encrypted using AES-256. Encryption keys are managed through HashiCorp Vault with Shamir seal (3-of-5 threshold). Vault Transit is used for encryption operations where applicable.
- Backup verification — Automated restore testing is performed to validate backup integrity. Restore operations are tested against non-production environments to confirm that backups produce functional, consistent service instances. Backup success rates are tracked as a Key Risk Indicator (see Section 2.4).
- Retention — Default retention is 30 days for automated snapshots. Customers may configure custom retention policies within their plan limits. Offsite Velero backups are retained for 30 days with configurable extensions.
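
The retention and RPO rules above can be expressed as two small checks. This is a sketch under stated assumptions: snapshots are represented as a name-to-timestamp mapping, a snapshot taken exactly at the cutoff is kept, and the function names are hypothetical. Actual pruning is performed by the snapshot and backup tooling, not by code like this.

```python
from datetime import datetime, timedelta

DEFAULT_RETENTION = timedelta(days=30)  # default stated in the retention policy


def expired(snapshots: dict[str, datetime], now: datetime,
            retention: timedelta = DEFAULT_RETENTION) -> list[str]:
    """Names of snapshots older than the retention window, sorted alphabetically."""
    cutoff = now - retention
    return sorted(name for name, taken in snapshots.items() if taken < cutoff)


def within_rpo(snapshots: dict[str, datetime], now: datetime, rpo: timedelta) -> bool:
    """True when the newest snapshot is recent enough to satisfy the RPO."""
    if not snapshots:
        return False
    return now - max(snapshots.values()) <= rpo
```

A check like `within_rpo` is one way to turn the backup success rate indicator into an alert: a service whose newest snapshot is older than its RPO is already out of compliance, even if no restore has been attempted.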
3.4 Disaster Recovery Scenarios
The following scenarios are addressed by our disaster recovery procedures:
- Single node failure — Kubernetes automatically reschedules affected pods to healthy nodes. For stateful services with replicas (databases, caches), the replica is promoted to primary. VPS instances on the failed node are restored from VirtualMachineSnapshot on an alternate VPS pool node. Customer impact is limited to the RTO for the affected service category.
- Storage failure — If a local NVMe drive fails, affected volumes are restored from the most recent VolumeSnapshot (local) or Velero backup (offsite). For Ceph-backed services, erasure coding provides automatic resilience to individual disk failures without data loss or service interruption.
- Network failure — Redundant network paths are maintained at the data center level. Hetzner provides redundant uplinks and network infrastructure. Cilium provides in-cluster network resilience with automatic path selection. DNS failover is configured for critical endpoints.
- Data center incident — In the event of a complete data center loss, services are restored from offsite Velero backups to alternate Hetzner infrastructure. This scenario has the longest recovery time and is tested annually during BCP drills. Customer data integrity is maintained within the RPO targets defined in Section 3.2.
- Security breach — The incident response plan is activated immediately. Affected systems are contained and isolated. Forensic evidence is preserved before remediation. Affected customers are notified per GDPR Article 33/34 timelines. A full post-incident review is conducted, and findings are integrated into the risk register.
3.5 Communication During Incidents
Timely and transparent communication is a cornerstone of our incident management process:
- Status page — Real-time incident updates are published at status.danubedata.ro. The status page reflects current service health, ongoing incidents, and scheduled maintenance windows.
- Email notifications — Affected customers receive email notifications at the start of an incident, at significant status changes, and upon resolution. Notifications include the nature of the issue, affected services, estimated time to resolution, and any customer actions required.
- Support ticket updates — Customers who have open support tickets related to an incident receive direct updates through the ticket system with technical details specific to their case.
- Post-incident reports — A detailed post-incident report is published within 5 business days of incident resolution. Reports include root cause analysis, timeline of events, impact assessment, remediation actions taken, and preventive measures implemented to avoid recurrence.
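
The 5-business-day deadline for post-incident reports can be computed as follows. This sketch assumes that business days are Monday to Friday, that counting starts on the day after resolution, and that public holidays are ignored; `add_business_days` is a hypothetical helper.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date that is `days` business days (Mon-Fri) after `start`."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            remaining -= 1
    return current


def report_due(resolved_on: date) -> date:
    """Latest publication date for the post-incident report."""
    return add_business_days(resolved_on, 5)
```

An incident resolved on a Wednesday is therefore due for a report by the following Wednesday, and one resolved on a Friday by the following Friday.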
3.6 BCP Testing
The business continuity plan is tested regularly to validate its effectiveness and identify areas for improvement:
- Annual full BCP test — A comprehensive disaster recovery drill is conducted annually, simulating a major disruption scenario (e.g., complete node failure, data center evacuation). The drill exercises the full recovery process from detection through communication to service restoration, and measures actual RTO/RPO against targets.
- Quarterly backup restore verification — Each quarter, a sample of backups across all service categories is restored to a non-production environment. Restored services are validated for data integrity, functional correctness, and performance characteristics.
- Tabletop exercises — Scenario-based tabletop exercises are conducted for critical scenarios including security breaches, supply chain disruptions, and cascading failures. These exercises test decision-making processes and communication protocols without impacting production systems.
- Lessons learned — All tests, drills, and actual incidents produce a lessons-learned document. Findings are prioritized, assigned to owners, and tracked to completion. Material findings are reflected in updates to this BCP and the risk register.
4. Supply Chain Risk Management
DanubeData relies on a limited number of carefully selected sub-processors and infrastructure providers. Supply chain risk is managed as follows:
- Pre-engagement evaluation — All sub-processors undergo a security and data protection evaluation before engagement. Assessment criteria include: relevant certifications (ISO 27001, SOC 2), data protection practices and GDPR compliance posture, incident response capability and notification timelines, business continuity and disaster recovery provisions, and jurisdictional considerations (EU data residency preference). The current list of sub-processors is published at /sub-processors.
- Ongoing monitoring — Supplier security posture is monitored continuously through review of security advisories, incident notifications, and certification renewals. Any material change in a supplier's security posture triggers an immediate reassessment.
- Contingency planning — For each critical supplier, a contingency plan is maintained that identifies alternative providers and documents the migration path. Critical dependencies (Hetzner for infrastructure, container registries for base images) are assessed for single-point-of-failure risk, and mitigation strategies are documented.
- Contract and SLA review — Supplier contracts and SLAs are reviewed annually to ensure alignment with DanubeData's security requirements, data protection obligations, and service level commitments to customers.
5. Compliance Risk Management
DanubeData operates within a regulated environment and actively manages compliance-related risks:
- GDPR compliance monitoring — Ongoing monitoring of data processing activities for alignment with GDPR requirements. Records of processing activities are maintained under Article 30. Data subject rights requests are tracked and fulfilled within statutory timelines. Technical and organizational measures are reviewed regularly for adequacy.
- CISPE Code of Conduct adherence — Annual review of compliance with the CISPE Code of Conduct for Cloud Infrastructure Service Providers, covering transparency, security, data protection, and customer rights. This document forms part of that compliance evidence.
- Regulatory change monitoring — EU data protection regulatory developments are monitored, including amendments to GDPR, ePrivacy Regulation progress, EU Data Act implementation, and relevant national transposition measures. Impact assessments are performed for material regulatory changes.
- Data Protection Impact Assessments (DPIAs) — DPIAs are performed when introducing new processing activities or technologies that present a high risk to the rights and freedoms of data subjects, as required by GDPR Article 35. DPIAs are also triggered by significant changes to existing processing operations, adoption of new sub-processors, or entry into new markets or service categories.
6. Framework Review
This risk management framework and business continuity plan is reviewed comprehensively at least once per year. Reviews are also triggered by material changes to the service portfolio, infrastructure architecture, threat landscape, regulatory environment, or following any significant incident.
Material changes to this framework are communicated to customers via email notification and publication on the DanubeData platform. The "Last updated" date at the top of this page reflects the most recent revision.
7. Contact
For questions regarding DanubeData's risk management framework or business continuity plan:
Email: security@danubedata.ro
Support: Contact Form