Danube Data - Data, on the right course

1. Introduction

DanubeData operates a structured risk management framework aligned with ISO 31000 principles and designed to satisfy the requirements of the CISPE Code of Conduct, in particular Section 5.4 (Risk Management) and Section 5.5 (Security Measures / Business Continuity). This document describes how we identify, assess, treat, and monitor risks across our infrastructure and services, and how our business continuity plan ensures service resilience for our customers.

All DanubeData production infrastructure is located in Falkenstein, Germany, within Hetzner data centers that hold ISO/IEC 27001 certification. Our platform runs on dedicated bare-metal servers operating a Kubernetes-based orchestration layer, providing managed VPS instances, databases, caches, object storage, serverless containers, static site hosting, and managed applications.

This framework applies to all DanubeData services, internal operations, and the processing of customer data. It is maintained by the DanubeData security and operations team, reviewed annually, and updated in response to material changes in the threat landscape, service portfolio, or regulatory environment.

2. Risk Management Framework

2.1 Risk Identification

DanubeData employs a systematic approach to identifying risks across all layers of the service stack:

Asset-based identification — We maintain a comprehensive asset inventory covering physical infrastructure (dedicated servers, network equipment), virtualization components (KubeVirt, Kubernetes control plane), data services (databases, caches, object storage), application-layer services (control panel, API, GitOps pipelines), and customer data in transit and at rest.
Cloud-specific threat landscape assessment — We evaluate threats specific to cloud infrastructure providers, including multi-tenant isolation failures, hypervisor escape vectors, container breakout scenarios, supply chain attacks on container images, and Kubernetes API server exposure.
Vulnerability management program — Continuous vulnerability scanning is performed across operating system packages, container base images, Kubernetes components, and application dependencies. Critical vulnerabilities are triaged within 24 hours of disclosure. We track CVEs relevant to our stack (Linux kernel, KubeVirt, Kubernetes, PHP, MySQL, PostgreSQL, Redis, Ceph) and apply patches according to severity-based SLAs.
Threat intelligence monitoring — We subscribe to security advisories from upstream projects (Kubernetes, KubeVirt, Cilium, ArgoCD, Rook-Ceph), CERT-EU bulletins, and Hetzner security notifications. Emerging threats are evaluated against our infrastructure within 48 hours of publication.

2.2 Risk Assessment

Each identified risk is assessed using a structured methodology to determine its severity and prioritization:

Methodology: We apply a likelihood-times-impact scoring matrix. Likelihood is rated on a 5-point scale (Rare, Unlikely, Possible, Likely, Almost Certain) and impact is rated on a 5-point scale (Negligible, Minor, Moderate, Major, Catastrophic). The product yields a risk score used for prioritization.

Risk categories:

Operational — Hardware failures, software defects, capacity shortages, deployment errors, dependency outages.
Security — Unauthorized access, data breaches, denial-of-service attacks, malware, insider threats, credential compromise.
Compliance — GDPR non-compliance, CISPE Code violations, regulatory changes, audit findings, contractual breaches.
Financial — Revenue loss from outages, unexpected infrastructure costs, billing disputes, fraud.
Reputational — Public incidents, customer data loss, prolonged service degradation, negative media coverage.

Risk scoring and response timelines:

Risk Level	Score Range	Required Action	Response Timeline
Critical	20–25	Immediate action required; escalation to management	Immediate
High	12–19	Action plan required; owner assigned	Within 30 days
Medium	6–11	Mitigation planned; scheduled for implementation	Within 90 days
Low	1–5	Accept and monitor; review at next cycle	Next quarterly review
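
As an illustration, the scoring matrix and the level thresholds in the table above can be sketched in Python. The numeric values 1 to 5 assigned to each rating, and the names `Likelihood`, `Impact`, `risk_score`, and `risk_level`, are assumptions made for this example and do not describe DanubeData's internal tooling.

```python
from enum import IntEnum


class Likelihood(IntEnum):
    # Assumed numeric mapping of the 5-point likelihood scale.
    RARE = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    ALMOST_CERTAIN = 5


class Impact(IntEnum):
    # Assumed numeric mapping of the 5-point impact scale.
    NEGLIGIBLE = 1
    MINOR = 2
    MODERATE = 3
    MAJOR = 4
    CATASTROPHIC = 5


def risk_score(likelihood: Likelihood, impact: Impact) -> int:
    """Return the product of likelihood and impact (1 to 25)."""
    return int(likelihood) * int(impact)


def risk_level(score: int) -> str:
    """Map a score to the risk level using the ranges in the table."""
    if not 1 <= score <= 25:
        raise ValueError("score must be between 1 and 25")
    if score >= 20:
        return "Critical"
    if score >= 12:
        return "High"
    if score >= 6:
        return "Medium"
    return "Low"
```

For example, a risk rated Likely with Catastrophic impact scores 4 x 5 = 20 and falls in the Critical range, while a risk rated Rare with Catastrophic impact scores 5 and is classified as Low.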

A formal risk register is maintained documenting each identified risk, its score, assigned owner, treatment decision, and current status. The register is reviewed quarterly by the operations team and annually by management.
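
A register entry of this kind could be modelled roughly as follows. This is a minimal sketch: the field names, the `open`/`closed` status values, and the choice to count response timelines from the date the risk was identified are assumptions for illustration, not a description of the actual register.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Response timelines in days, taken from the scoring table.
# "Low" risks have no fixed deadline (reviewed at the next cycle), hence None.
RESPONSE_DAYS = {"Critical": 0, "High": 30, "Medium": 90, "Low": None}


@dataclass
class RiskEntry:
    risk_id: str          # hypothetical identifier format
    description: str
    score: int
    level: str            # "Critical", "High", "Medium", or "Low"
    owner: str
    treatment: str        # treatment decision recorded for the risk
    status: str           # assumed values: "open" or "closed"
    identified_on: date

    def action_due(self) -> Optional[date]:
        """Date by which action is expected, or None if no fixed deadline."""
        days = RESPONSE_DAYS[self.level]
        if days is None:
            return None
        return self.identified_on + timedelta(days=days)


def overdue(register: List[RiskEntry], today: date) -> List[RiskEntry]:
    """Return open entries whose response deadline has passed."""
    result = []
    for entry in register:
        due = entry.action_due()
        if entry.status == "open" and due is not None and due < today:
            result.append(entry)
    return result
```

A quarterly review could start from the output of `overdue` to see which open risks have missed their response timeline.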

2.3 Risk Treatment

For each assessed risk, one of the following treatment strategies is selected:

Strategy	Description	Example
Mitigate	Implement controls to reduce likelihood or impact	Deploy Cilium network policies to enforce tenant isolation; enable encryption at rest via Vault Transit
Transfer	Shift risk to a third party through insurance or contract	Professional liability insurance; Hetzner SLA for hardware replacement
Accept	Acknowledge the risk when it falls within defined risk appetite	Minor UI availability degradation during planned maintenance windows
Avoid	Eliminate the risk by discontinuing the activity	Not offering services in jurisdictions with incompatible data protection regimes

Control selection is prioritized by cost-effectiveness and the degree of risk reduction achieved. All controls are documented, and residual risk after treatment must fall within the organization's defined risk appetite. Where residual risk remains above the acceptable threshold, additional controls are implemented or the risk is escalated to management for an explicit acceptance decision.

2.4 Risk Monitoring & Review

Risk management is a continuous activity, not a point-in-time exercise. DanubeData employs the following monitoring and review practices:

Continuous monitoring — Prometheus collects infrastructure and application metrics across all production nodes. Alertmanager routes alerts based on severity. Grafana dashboards provide real-time visibility into system health, resource utilization, error rates, and security events.
Key Risk Indicators (KRIs) — The following indicators are tracked continuously and reported to management:
- Service availability (uptime percentage per service, per month)
- Security incident count and severity distribution
- Patch compliance rate (percentage of nodes at current patch level)
- Backup success rate (percentage of scheduled backups completed successfully)
- Mean time to detect (MTTD) and mean time to recover (MTTR) for incidents
- Vulnerability remediation SLA adherence
Quarterly risk register review — The operations team reviews all entries in the risk register, updates risk scores based on new information, closes resolved risks, and identifies newly emerging risks.
Annual comprehensive risk assessment — A full reassessment of all identified risks, including re-evaluation of the threat landscape, review of control effectiveness, and alignment with any changes to services, infrastructure, or regulatory requirements.
Management reporting — A risk posture summary is prepared quarterly, covering the distribution of risks by level and category, treatment progress, KRI trends, and any material changes since the previous report.
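
Several of the Key Risk Indicators listed above reduce to simple ratios and averages. The sketch below shows one possible way to compute them. The function names are hypothetical, and the exact definitions (for example, which start point MTTD and MTTR are measured from) are assumptions that the framework does not specify.

```python
from datetime import timedelta
from typing import List, Optional


def percentage(part: float, whole: float) -> float:
    """Generic rate, usable for patch compliance or backup success rate."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def availability_pct(period: timedelta, downtime: timedelta) -> float:
    """Service availability as the share of the period without downtime."""
    up_seconds = (period - downtime).total_seconds()
    return percentage(up_seconds, period.total_seconds())


def mean_duration(durations: List[timedelta]) -> Optional[timedelta]:
    """Average of a list of durations, e.g. time-to-detect or time-to-recover.

    Returns None when there were no incidents in the reporting period.
    """
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```

With these helpers, 47 successful backups out of 50 scheduled gives `percentage(47, 50)`, a backup success rate of 94 percent, and two incidents recovered in one and three hours give a mean time to recover of two hours.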

3. Business Continuity Plan

3.1 Scope

This Business Continuity Plan (BCP) covers all DanubeData production services operated from Hetzner Germany data centers in Falkenstein and Nuremberg. It encompasses the Kubernetes control plane, all customer-facing services (VPS, databases, caches, object storage, serverless containers, static sites, managed applications, Storage Share), the control panel and API, supporting infrastructure (GitOps pipelines, monitoring, secrets management), and all customer data stored within these services.

The objective of this BCP is to ensure that DanubeData can maintain or rapidly restore critical services following a disruptive event, minimizing impact on customers and meeting the recovery objectives defined below.

3.2 Recovery Objectives

Recovery Time Objective (RTO) defines the maximum acceptable duration before a service must be restored. Recovery Point Objective (RPO) defines the maximum acceptable data loss measured in time. The following targets apply:

Service	RTO	RPO	Recovery Method
VPS Instances	4 hours	24 hours	KubeVirt VirtualMachineSnapshot restore
Managed Databases (MySQL, PostgreSQL, MariaDB)	1 hour	5 minutes	Streaming replication failover + VolumeSnapshot
Cache Instances (Redis, Valkey, Dragonfly)	2 hours	1 hour	VolumeSnapshot restore + replica promotion
Object Storage (S3-compatible)	4 hours	Near-zero	Multi-replica storage (erasure coding)
Serverless Containers	2 hours	N/A (stateless)	Knative service redeployment from Git/registry
Static Sites	1 hour	N/A (Git source)	Rebuild and redeploy from source repository
Managed Applications	4 hours	24 hours	Velero backup restore
Storage Share (Nextcloud)	4 hours	24 hours	Velero backup + S3 data restore
Control Panel & API	2 hours	5 minutes	Database failover + pod restart

These objectives are tested regularly as described in Section 3.6 and are subject to revision based on test outcomes and evolving service requirements.
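
To make the comparison between measured recovery and these targets concrete, the following sketch checks a drill result against the RTO and RPO of a service. It covers only a subset of the table; the short service keys, the `Objective` type, and the `meets_objectives` function are hypothetical. Object Storage is omitted because its RPO is given as "Near-zero" rather than as a fixed duration.

```python
from datetime import timedelta
from typing import NamedTuple, Optional


class Objective(NamedTuple):
    rto: timedelta
    rpo: Optional[timedelta]  # None where the table lists the RPO as N/A


# Subset of the recovery targets from the table above.
OBJECTIVES = {
    "VPS Instances": Objective(timedelta(hours=4), timedelta(hours=24)),
    "Managed Databases": Objective(timedelta(hours=1), timedelta(minutes=5)),
    "Cache Instances": Objective(timedelta(hours=2), timedelta(hours=1)),
    "Serverless Containers": Objective(timedelta(hours=2), None),
    "Static Sites": Objective(timedelta(hours=1), None),
    "Control Panel & API": Objective(timedelta(hours=2), timedelta(minutes=5)),
}


def meets_objectives(
    service: str,
    recovery_time: timedelta,
    data_loss: timedelta = timedelta(0),
) -> bool:
    """Return True if a measured recovery stayed within both RTO and RPO."""
    target = OBJECTIVES[service]
    rto_ok = recovery_time <= target.rto
    # Services without an RPO (stateless or rebuilt from Git) pass by definition.
    rpo_ok = target.rpo is None or data_loss <= target.rpo
    return rto_ok and rpo_ok
```

For instance, a database failover that took 45 minutes and lost 3 minutes of writes would meet its objectives, whereas the same failover with 10 minutes of lost writes would exceed the 5-minute RPO.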

3.3 Backup Strategy

DanubeData follows the 3-2-1 backup rule: three copies of data, on two different storage media, with one copy stored offsite. This is implemented as follows:

Primary data — Stored on local NVMe drives on dedicated servers, managed via TopoLVM (LVM thin provisioning) or Rook-Ceph (erasure-coded distributed storage), depending on the service.
Local snapshots — Automated daily VolumeSnapshots via TopoLVM for all stateful services (databases, caches, VPS disks). KubeVirt VirtualMachineSnapshot for VPS instances captures both VM state and associated volumes. LVM snapshots are near-instantaneous and space-efficient (copy-on-write).
Offsite backups — Velero performs namespace-level backups to our self-hosted S3-compatible object storage (Ceph RGW), providing redundancy. Backup schedules are configured per service category based on RPO requirements.
Encryption — All backup media is encrypted using AES-256. Encryption keys are managed through HashiCorp Vault with Shamir seal (3-of-5 threshold). Vault Transit is used for encryption operations where applicable.
Backup verification — Automated restore testing is performed to validate backup integrity. Restore operations are tested against non-production environments to confirm that backups produce functional, consistent service instances. Backup success rates are tracked as a Key Risk Indicator (see Section 2.4).
Retention — Default retention is 30 days for automated snapshots. Customers may configure custom retention policies within their plan limits. Offsite Velero backups are retained for 30 days with configurable extensions.
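
The 3-2-1 rule and the default retention window can both be expressed as small checks. The sketch below is illustrative only: the `BackupCopy` type and its medium labels are hypothetical, and the code does not reflect how TopoLVM or Velero apply retention internally.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

DEFAULT_RETENTION = timedelta(days=30)  # default retention stated above


@dataclass(frozen=True)
class BackupCopy:
    medium: str    # hypothetical label, e.g. "local-nvme" or "object-storage"
    offsite: bool


def satisfies_3_2_1(copies: Iterable[BackupCopy]) -> bool:
    """Three copies, on at least two media, with at least one offsite."""
    copies = list(copies)
    media = {c.medium for c in copies}
    return len(copies) >= 3 and len(media) >= 2 and any(c.offsite for c in copies)


def expired(
    snapshot_times: Iterable[datetime],
    now: datetime,
    retention: timedelta = DEFAULT_RETENTION,
) -> List[datetime]:
    """Return the snapshot timestamps that are older than the retention window."""
    cutoff = now - retention
    return [t for t in snapshot_times if t < cutoff]
```

A custom customer retention policy would be passed as the `retention` argument in place of the 30-day default.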

3.4 Disaster Recovery Scenarios

The following scenarios are addressed by our disaster recovery procedures:

Single node failure — Kubernetes automatically reschedules affected pods to healthy nodes. For stateful services with replicas (databases, caches), the replica is promoted to primary. VPS instances on the failed node are restored from VirtualMachineSnapshot on an alternate VPS pool node. Customer impact is limited to the RTO for the affected service category.
Storage failure — If a local NVMe drive fails, affected volumes are restored from the most recent VolumeSnapshot (local) or Velero backup (offsite). For Ceph-backed services, erasure coding provides automatic resilience to individual disk failures without data loss or service interruption.
Network failure — Redundant network paths are maintained at the data center level. Hetzner provides redundant uplinks and network infrastructure. Cilium provides in-cluster network resilience with automatic path selection. DNS failover is configured for critical endpoints.
Data center incident — In the event of a complete data center loss, services are restored from offsite Velero backups to alternate Hetzner infrastructure. This scenario has the longest recovery time and is tested annually during BCP drills. Customer data integrity is maintained within the RPO targets defined in Section 3.2.
Security breach — The incident response plan is activated immediately. Affected systems are contained and isolated. Forensic evidence is preserved before remediation. Affected customers are notified per GDPR Article 33/34 timelines. A full post-incident review is conducted, and findings are integrated into the risk register.

3.5 Communication During Incidents

Timely and transparent communication is a cornerstone of our incident management process:

Status page — Real-time incident updates are published at status.danubedata.ro. The status page reflects current service health, ongoing incidents, and scheduled maintenance windows.
Email notifications — Affected customers receive email notifications at the start of an incident, at significant status changes, and upon resolution. Notifications include the nature of the issue, affected services, estimated time to resolution, and any customer actions required.
Support ticket updates — Customers who have open support tickets related to an incident receive direct updates through the ticket system with technical details specific to their case.
Post-incident reports — A detailed post-incident report is published within 5 business days of incident resolution. Reports include root cause analysis, timeline of events, impact assessment, remediation actions taken, and preventive measures implemented to avoid recurrence.
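
The publication deadline for a post-incident report can be derived from the resolution date. The sketch below counts business days as Monday to Friday and ignores public holidays, which is a simplifying assumption; the function name is hypothetical.

```python
from datetime import date, timedelta

REPORT_BUSINESS_DAYS = 5  # publication window stated above


def add_business_days(start: date, days: int) -> date:
    """Return the date that is `days` business days after `start`.

    Weekends are skipped; public holidays are not considered.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def report_due(resolved_on: date) -> date:
    """Latest publication date for the post-incident report."""
    return add_business_days(resolved_on, REPORT_BUSINESS_DAYS)
```

Under these assumptions, an incident resolved on Friday, 1 March 2024 would have its report due by Friday, 8 March 2024.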

3.6 BCP Testing

The business continuity plan is tested regularly to validate its effectiveness and identify areas for improvement:

Annual full BCP test — A comprehensive disaster recovery drill is conducted annually, simulating a major disruption scenario (e.g., complete node failure, data center evacuation). The drill exercises the full recovery process from detection through communication to service restoration, and measures actual RTO/RPO against targets.
Quarterly backup restore verification — Each quarter, a sample of backups across all service categories is restored to a non-production environment. Restored services are validated for data integrity, functional correctness, and performance characteristics.
Tabletop exercises — Scenario-based tabletop exercises are conducted for critical scenarios including security breaches, supply chain disruptions, and cascading failures. These exercises test decision-making processes and communication protocols without impacting production systems.
Lessons learned — All tests, drills, and actual incidents produce a lessons-learned document. Findings are prioritized, assigned to owners, and tracked to completion. Material findings are reflected in updates to this BCP and the risk register.

4. Supply Chain Risk Management

DanubeData relies on a limited number of carefully selected sub-processors and infrastructure providers. Supply chain risk is managed as follows:

Pre-engagement evaluation — All sub-processors undergo a security and data protection evaluation before engagement. Assessment criteria include: relevant certifications (ISO 27001, SOC 2), data protection practices and GDPR compliance posture, incident response capability and notification timelines, business continuity and disaster recovery provisions, and jurisdictional considerations (EU data residency preference). The current list of sub-processors is published at /sub-processors.
Ongoing monitoring — Supplier security posture is monitored continuously through review of security advisories, incident notifications, and certification renewals. Any material change in a supplier's security posture triggers an immediate reassessment.
Contingency planning — For each critical supplier, a contingency plan is maintained that identifies alternative providers and documents the migration path. Critical dependencies (Hetzner for infrastructure, container registries for base images) are assessed for single-point-of-failure risk, and mitigation strategies are documented.
Contract and SLA review — Supplier contracts and SLAs are reviewed annually to ensure alignment with DanubeData's security requirements, data protection obligations, and service level commitments to customers.

5. Compliance Risk Management

DanubeData operates within a regulated environment and actively manages compliance-related risks:

GDPR compliance monitoring — Ongoing monitoring of data processing activities for alignment with GDPR requirements. Records of processing activities are maintained under Article 30. Data subject rights requests are tracked and fulfilled within statutory timelines. Technical and organizational measures are reviewed regularly for adequacy.
CISPE Code of Conduct adherence — Annual review of compliance with the CISPE Code of Conduct for Cloud Infrastructure Service Providers, covering transparency, security, data protection, and customer rights. This document forms part of that compliance evidence.
Regulatory change monitoring — EU data protection regulatory developments are monitored, including amendments to GDPR, ePrivacy Regulation progress, EU Data Act implementation, and relevant national transposition measures. Impact assessments are performed for material regulatory changes.
Data Protection Impact Assessments (DPIAs) — DPIAs are performed when introducing new processing activities or technologies that present a high risk to the rights and freedoms of data subjects, as required by GDPR Article 35. DPIAs are also triggered by significant changes to existing processing operations, adoption of new sub-processors, or entry into new markets or service categories.

6. Framework Review

The risk management framework and business continuity plan are reviewed comprehensively at least once per year. Reviews are also triggered by material changes to the service portfolio, infrastructure architecture, threat landscape, or regulatory environment, and by any significant incident.

Material changes to this framework are communicated to customers via email notification and publication on the DanubeData platform. The "Last updated" date at the top of this page reflects the most recent revision.

7. Contact

For questions regarding DanubeData's risk management framework or business continuity plan:

Email: security@danubedata.ro
Support: Contact Form
