
What Are Backup and Disaster Recovery? Practical Guide to RPO, RTO, 3-2-1 Backup and Ransomware Recovery

A practical guide to backup and disaster recovery covering RPO, RTO, 3-2-1 backup, immutable and offline backups, database and Kubernetes backup, ransomware recovery, DR runbooks, restore testing, and a 90-day rollout roadmap.

Published: Jun 5, 2026 | Updated: Jun 5, 2026 | Reading time: 12 min

Tags: Backup, Disaster Recovery, RPO, RTO, 3-2-1 Backup, Immutable Backup, Ransomware Recovery, Database Backup, Kubernetes Backup, Velero



Image: illustration of Kubernetes backup and disaster recovery.¹

Image: illustration of open-source backup tooling.²

Image: illustration of deduplicating backup.³

Quick summary

Backup is a copy of data, configuration or systems that can be restored after deletion, corruption, hardware failure, ransomware, deployment failure or infrastructure outage. Disaster Recovery, or DR, is the plan, architecture and operating process used to restore services after a major incident, not just restore files.

NIST SP 800-34 Rev. 1 provides contingency planning guidance that helps organizations understand the purpose, process and format of information system contingency plans, and evaluate systems and operations to determine recovery requirements and priorities.⁴ The AWS Well-Architected Reliability Pillar treats RTO and RPO as restoration objectives and advises basing the DR strategy on business needs, workload resources, disruption probability and recovery cost.⁵

Simple version: backup answers “do we still have the data?”; disaster recovery answers “can the service run again, how fast, and how much data did we lose?”

Why backup alone is not enough

A backup is useful only if you can restore the right data, from the right point in time, into the right system, within RTO/RPO limits.

Teams often have backups but still fail recovery because:

restores were never tested;
ransomware encrypted backup repositories;
database transaction logs are missing;
encryption keys are unavailable;
app configuration or IaC is missing;
backups are incompatible with new versions;
restore is too slow;
the clean restore point is unknown;
backups are in the same compromised cloud account;
no runbook defines who does what during an incident.

Backup must be paired with DR planning, restore tests, monitoring, access control and incident runbooks.

Backup, High Availability and Disaster Recovery

Concept	Goal	Example
Backup	preserve recoverable copies	hourly database snapshots, object backups
High Availability	reduce downtime from component failure	load balancers, multi-node clusters, failover
Disaster Recovery	recover after major disruption	region failover, rebuild cluster, restore database
Business Continuity	keep business operations running	manual processes, customer communication

AWS also distinguishes Availability and Disaster Recovery: both rely on practices such as monitoring, multi-location deployment and automatic failover, but DR focuses on copies of the entire workload and recovery time after disaster.⁵

What are RPO and RTO?

The two most important DR metrics are RPO and RTO.

Metric	Meaning	Example
RPO	Recovery Point Objective, maximum acceptable data loss	lose at most 15 minutes of data
RTO	Recovery Time Objective, maximum acceptable restoration time	restore service within 2 hours
MTD	Maximum Tolerable Downtime	maximum 8 hours
WRT	Work Recovery Time after technical restore	reconciliation, validation, reopening operations

Example:

Payment system:
RPO = 5 minutes
RTO = 30 minutes

Internal blog:
RPO = 24 hours
RTO = 2 days

Not every system needs very low RPO/RTO. Lower RPO/RTO usually costs more.
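
To make the trade-off concrete, here is a minimal Python sketch that checks a backup schedule and a measured restore time against the objectives from the example above. It assumes periodic backups, where the worst-case data loss is roughly the interval between backups plus the time one backup takes to complete. The class, function and variable names are hypothetical, and the schedules and restore times are illustrative values, not measurements.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Objectives:
    rpo: timedelta  # maximum acceptable data loss
    rto: timedelta  # maximum acceptable restoration time


def check_objectives(obj: Objectives, backup_interval: timedelta,
                     backup_duration: timedelta, measured_restore: timedelta) -> dict:
    # Assumption: a failure just before the next backup completes loses
    # everything written since the previous backup started.
    worst_case_loss = backup_interval + backup_duration
    return {
        "rpo_ok": worst_case_loss <= obj.rpo,
        "rto_ok": measured_restore <= obj.rto,
    }


payment = Objectives(rpo=timedelta(minutes=5), rto=timedelta(minutes=30))
blog = Objectives(rpo=timedelta(hours=24), rto=timedelta(days=2))

# Hourly snapshots cannot meet a 5-minute RPO, even if the restore itself is fast.
payment_check = check_objectives(
    payment, timedelta(hours=1), timedelta(minutes=5), timedelta(minutes=20))
# Twice-daily backups comfortably cover the internal blog.
blog_check = check_objectives(
    blog, timedelta(hours=12), timedelta(minutes=30), timedelta(hours=4))

print(payment_check)
print(blog_check)
```

A check like this tends to show quickly that a low RPO usually requires continuous log shipping or replication rather than more frequent snapshots.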

What is 3-2-1 backup?

Common rule:

3 copies of data
2 different storage/media types
1 offsite copy

For ransomware resilience, many teams extend this to:

3-2-1-1-0
3 copies
2 storage types
1 offsite copy
1 offline or immutable copy
0 errors after verification/restore testing

Practical meaning:

production data is not the only copy;
backup is not in the same permission boundary as production;
at least one copy resists deletion/modification;
restore is tested;
backup job failures are monitored.
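
The rule can also be expressed as a small inventory check. The sketch below is illustrative only: the `Copy` fields and the example inventory are hypothetical, and it counts the production data as one of the three copies, which is how the rule is usually read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    media: str                  # e.g. "disk", "object-storage", "tape"
    offsite: bool
    immutable_or_offline: bool
    verify_errors: int          # errors from the last verification or restore test


def check_3_2_1_1_0(copies: list[Copy]) -> dict[str, bool]:
    # One boolean per digit of the 3-2-1-1-0 rule.
    return {
        "3_copies": len(copies) >= 3,
        "2_media_types": len({c.media for c in copies}) >= 2,
        "1_offsite": any(c.offsite for c in copies),
        "1_immutable_or_offline": any(c.immutable_or_offline for c in copies),
        "0_errors": all(c.verify_errors == 0 for c in copies),
    }


inventory = [
    Copy("production", "disk", offsite=False, immutable_or_offline=False, verify_errors=0),
    Copy("local-backup", "disk", offsite=False, immutable_or_offline=False, verify_errors=0),
    Copy("cloud-vault", "object-storage", offsite=True, immutable_or_offline=True, verify_errors=0),
]

result = check_3_2_1_1_0(inventory)
print(result)
```

Dropping the cloud vault from the inventory fails the copy count, media, offsite and immutability checks at once, which shows how much a single well-placed copy contributes.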

Immutable and offline backups

Immutable backup means backup data cannot be modified or deleted during a retention period, even with normal credentials. Examples include object storage Object Lock, WORM storage or backup appliances with immutability.

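Amazon S3 Object Lock stores objects using a write-once-read-many (WORM) model and can prevent an object from being deleted or overwritten for a fixed retention period or indefinitely.⁶ In compliance mode no user, including the account root user, can overwrite or delete a protected object version during retention; governance mode lets users with specific permissions bypass the lock. Azure Blob immutable storage supports WORM retention through time-based retention or legal hold policies.⁷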

Offline backup means a backup copy is not continuously connected, such as tape, offline disk, offline vault or air-gapped repository.

Recommendations:

keep at least one immutable/offline copy;
separate backup admin from production admin;
do not let apps or CI/CD delete backups;
enable MFA delete or equivalent where available;
monitor deletion and retention changes;
test restore from immutable/offline copies.
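
The retention behaviour described above can be modelled in a few lines. This is a simplified sketch of the policy logic only, not a call to any storage provider's API; `LockedObject` and `may_delete` are hypothetical names, and real services add details such as bypass permissions and per-version locks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class LockedObject:
    key: str
    retain_until: datetime
    legal_hold: bool = False


def may_delete(obj: LockedObject, now: datetime) -> bool:
    # A WORM-style lock blocks deletion while a legal hold is set
    # or until the retention date has passed.
    return not obj.legal_hold and now >= obj.retain_until


now = datetime(2026, 6, 5, tzinfo=timezone.utc)
dump = LockedObject("db/2026-06-05.dump", retain_until=now + timedelta(days=30))
held = LockedObject("db/2026-01-01.dump", retain_until=now - timedelta(days=1), legal_hold=True)

print(may_delete(dump, now))                       # still inside retention
print(may_delete(dump, now + timedelta(days=31)))  # retention has expired
print(may_delete(held, now))                       # expired, but under legal hold
```

Monitoring retention changes matters because shortening `retain_until` is exactly what an attacker with the right permissions would try first.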

Types of backup

Type	Description	Strength	Weakness
Full backup	backs up everything	simple restore	time and storage heavy
Incremental	backs up changes since previous backup	efficient	restore depends on chain
Differential	backs up changes since last full backup	easier restore than incremental	grows over time
Snapshot	captures volume/storage state	fast	may not be app-consistent
Logical backup	exports data in logical format	portable	slow for large databases
Physical backup	copies data files/blocks	efficient for large DBs	version/config sensitive
Continuous backup	journals changes continuously	low RPO	more complex and costly
Replication	copies changes elsewhere	fast failover	mistakes/deletions may replicate

There is no universal best backup type. Choose based on RPO, RTO, data, cost and operations.
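
The "restore depends on chain" weakness of incremental backups is easier to see in code. The following sketch, using hypothetical `Backup` and `restore_chain` names and day numbers instead of timestamps, works out which backups must be restored, in order, to reach a target day.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int
    kind: str  # "full", "incremental" or "differential"


def restore_chain(backups: list[Backup], target_day: int) -> list[Backup]:
    """Return the backups to restore, in order, to reach target_day."""
    usable = sorted((b for b in backups if b.day <= target_day), key=lambda b: b.day)
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup at or before the target day")
    base = fulls[-1]
    after_base = [b for b in usable if b.day > base.day]
    chain = [base]
    start = base.day
    diffs = [b for b in after_base if b.kind == "differential"]
    if diffs:
        # A differential holds all changes since the full, so only the latest is needed.
        chain.append(diffs[-1])
        start = diffs[-1].day
    # Every incremental taken after that point is needed, in order.
    chain += [b for b in after_base if b.kind == "incremental" and b.day > start]
    return chain


incremental_plan = [Backup(0, "full")] + [Backup(d, "incremental") for d in (1, 2, 3)]
differential_plan = [Backup(0, "full")] + [Backup(d, "differential") for d in (1, 2, 3)]

print([b.day for b in restore_chain(incremental_plan, 3)])
print([b.day for b in restore_chain(differential_plan, 3)])
```

With incrementals, losing any one link breaks every later restore point; with differentials, only the full backup and the latest differential have to be intact.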

Application-consistent vs crash-consistent

Storage snapshots are fast, but not always safe for databases.

Type	Meaning
Crash-consistent	like the system after sudden power loss
Application-consistent	app/database flushed data before backup
Transaction-consistent	database can recover to a consistent transaction point

For databases, understand:

whether snapshots integrate with database freeze/hooks;
whether logs/WAL/binlogs are backed up;
whether restore needs log replay;
whether queries are tested after restore;
whether multi-volume consistency is guaranteed.

Database backup

PostgreSQL

PostgreSQL documentation covers SQL dump, file-system level backup and continuous archiving/point-in-time recovery.⁸

Common strategy:

Daily base backup
+ continuous WAL archiving
= point-in-time recovery
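
As a rough illustration of how the two pieces combine, the sketch below picks the base backup to start from for a given recovery target. It assumes the WAL archive is continuous from every base backup up to `wal_archived_through`, and it relies on the rule that the recovery target must fall after the end of the base backup. The function name and the timestamps are hypothetical.

```python
from datetime import datetime
from typing import Optional


def pick_base_backup(base_backup_end_times: list[datetime],
                     wal_archived_through: datetime,
                     target: datetime) -> Optional[datetime]:
    """Return the end time of the base backup to restore, or None if the target is unreachable."""
    if target > wal_archived_through:
        return None  # the WAL archive does not reach the target yet
    # Only base backups that finished before the target can be replayed forward to it.
    candidates = [t for t in base_backup_end_times if t < target]
    return max(candidates) if candidates else None


base_backups = [datetime(2026, 6, 1, 2, 0), datetime(2026, 6, 2, 2, 0)]
wal_through = datetime(2026, 6, 2, 15, 0)

print(pick_base_backup(base_backups, wal_through, datetime(2026, 6, 2, 14, 30)))
print(pick_base_backup(base_backups, wal_through, datetime(2026, 6, 2, 16, 0)))
```

The second call returns `None` because the archive stops before the requested time, which is the situation a "WAL archiving stopped" alert is meant to catch early.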

Checklist:

use pg_dump for smaller logical backups;
use physical base backup for larger systems;
archive WAL;
test PITR;
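back up roles and tablespace definitions separately (pg_dump covers a single database; pg_dumpall --globals-only captures cluster-wide objects) and record installed extensions;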
verify checksums;
monitor replication lag;
encrypt backups;
record PostgreSQL version.

MySQL

MySQL documentation includes backup and recovery guidance.⁹ Consider:

logical dumps with mysqldump;
physical backups with suitable tools;
binary logs for point-in-time recovery;
replication;
GTID;
InnoDB consistency;
users/grants backup.
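
For the logical dump item, a job script might assemble the command line like this. It is a sketch under assumptions: connection settings are expected to come from an option file, the output path is a placeholder, and the helper name is hypothetical. The flags themselves are standard mysqldump options.

```python
def mysqldump_argv(result_file: str) -> list[str]:
    """Build a mysqldump command line for a consistent logical dump."""
    return [
        "mysqldump",
        "--single-transaction",  # consistent snapshot of InnoDB tables without locking them
        "--routines",            # include stored procedures and functions
        "--events",              # include Event Scheduler events
        "--all-databases",       # also dumps the mysql system schema with accounts and grants
        f"--result-file={result_file}",
    ]


argv = mysqldump_argv("/backups/mysql/all-databases.sql")
print(" ".join(argv))
```

The list can be passed to `subprocess.run(argv, check=True)` so that a failed dump raises an error instead of silently producing a partial file. Point-in-time recovery still needs the binary logs archived alongside the dump.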

MongoDB

MongoDB documentation covers backups.¹⁰ Consider:

snapshots for replica sets or sharded clusters;
mongodump where appropriate;
oplog-based recovery strategies;
version compatibility;
restore tests.

Kubernetes backup

Kubernetes backup is not just backing up container images. Back up:

Kubernetes resources: Deployments, Services, Ingresses, ConfigMaps, Secrets, CRDs;
PersistentVolumes and PVC data;
Helm values;
namespace labels/annotations;
RBAC;
admission policies;
cluster-level configuration;
external dependencies;
etcd for self-managed clusters.

Kubernetes docs discuss backing up clusters, including backing up etcd for cluster state.¹¹ Velero provides tools for backing up and restoring Kubernetes cluster resources and persistent volumes; its docs say Velero can back up clusters, restore after loss, migrate resources to other clusters and replicate production clusters to development/testing clusters.¹²

Example:

# Back up all resources in the app-prod namespace
velero backup create prod-backup --include-namespaces app-prod
# Restore everything captured in that backup
velero restore create --from-backup prod-backup

For managed Kubernetes, control plane etcd may be provider-managed, but application resources and persistent volumes still need a backup strategy.
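
One-off backups are usually followed by a recurring schedule. The sketch below generates a Velero Schedule resource as JSON, which kubectl accepts as well as YAML. The resource name, cron expression, retention and the `velero` namespace are assumptions chosen for illustration; check the field names against the Velero version you run before applying it.

```python
import json


def velero_schedule(name: str, namespaces: list[str], cron: str, ttl: str) -> dict:
    """Build a Velero Schedule manifest (assumed default install namespace: velero)."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "schedule": cron,  # standard cron expression
            "template": {
                "includedNamespaces": namespaces,
                "ttl": ttl,    # how long each backup is kept
            },
        },
    }


manifest = velero_schedule("app-prod-daily", ["app-prod"], "0 2 * * *", "720h0m0s")
print(json.dumps(manifest, indent=2))
```

Keeping the schedule definition in Git alongside other manifests means the backup configuration itself survives a cluster rebuild.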

SaaS backup

Many teams assume SaaS backup is fully handled by the provider. Verify the shared responsibility model.

Check:

Google Workspace/Microsoft 365 retention;
GitHub/GitLab repository backup;
Jira/Confluence/Notion export;
CRM/billing/helpdesk export;
IAM/SSO configuration;
audit logs;
admin account recovery;
deletion retention;
API export rate limits;
legal/compliance retention.

A SaaS provider may protect its infrastructure, but not always protect you from accidental deletion, insider misuse, bad retention policy or account compromise.

Backup encryption and key management

Backups often contain the most sensitive data. Encryption and key management matter.

Checklist:

encryption in transit;
encryption at rest;
keys stored outside the backup repository, never only alongside the backups;
key rotation plan;
tested restore with real keys;
key recovery process for staff turnover;
separate backup read permissions from production permissions;
log backup access;
back up required key/certificate/secret-manager metadata.

An encrypted backup without its decryption key is not recoverable.
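
One cheap pre-flight check before a restore drill is to confirm that the key you can actually retrieve is the key the backup was made with. The sketch below assumes a randomly generated, high-entropy key and a fingerprint recorded in the runbook or backup manifest at backup time. The function names are hypothetical, and this is not a substitute for a full test restore.

```python
import hashlib
import hmac


def key_fingerprint(key: bytes) -> str:
    # SHA-256 of a random key identifies it without revealing it.
    # Assumption: the key is high-entropy, not a human-chosen passphrase.
    return hashlib.sha256(key).hexdigest()


def key_matches(candidate_key: bytes, recorded_fingerprint: str) -> bool:
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(key_fingerprint(candidate_key), recorded_fingerprint)


backup_key = bytes(range(32))        # stand-in for a real 256-bit key
recorded = key_fingerprint(backup_key)

print(key_matches(backup_key, recorded))
print(key_matches(b"\x00" * 32, recorded))
```

Running this check as part of every scheduled drill surfaces rotated, lost or mislabeled keys long before an incident does.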

Ransomware recovery

Ransomware makes backup harder because attackers may:

delete backups before encrypting production;
encrypt online backup repositories;
steal backup data;
wait until clean backups are overwritten;
compromise domain or backup admins;
disable monitoring and logs;
attack the identity provider.

Strategy:

Immutable/offline backup
  + privileged access hardening
  + backup anomaly detection
  + clean restore point identification
  + isolated recovery environment
  + malware scan before reconnect
  + identity rebuild plan

Do not restore directly into a production environment that may still be compromised. Use an isolated recovery environment first.
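
Clean restore point identification can be sketched as a simple filter. The example assumes you have an estimated earliest compromise time from the investigation and a malware scan result for each restore point, produced in the isolated environment. `RestorePoint` and `latest_clean_point` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class RestorePoint:
    taken_at: datetime
    scan_clean: bool  # result of a malware scan run in the isolated environment


def latest_clean_point(points: list[RestorePoint],
                       earliest_compromise: datetime) -> Optional[RestorePoint]:
    # Keep only points taken before the compromise that also passed the scan.
    candidates = [p for p in points if p.taken_at < earliest_compromise and p.scan_clean]
    return max(candidates, key=lambda p: p.taken_at, default=None)


points = [
    RestorePoint(datetime(2026, 6, 1, 2, 0), scan_clean=True),
    RestorePoint(datetime(2026, 6, 2, 2, 0), scan_clean=False),
    RestorePoint(datetime(2026, 6, 3, 12, 0), scan_clean=True),
]
compromise = datetime(2026, 6, 3, 9, 0)

chosen = latest_clean_point(points, compromise)
print(chosen)
```

A point that predates the estimated compromise but fails its scan, like the second one here, is a sign that the compromise estimate should be moved earlier.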

DR strategies

AWS Well-Architected discusses DR strategies with different cost and recovery characteristics.⁵

Strategy	Description	Typical RTO/RPO	Cost
Backup and Restore	restore from backup when needed	higher (typically hours)	lower
Pilot Light	keep minimal core resources in DR site	medium (tens of minutes)	medium
Warm Standby	run smaller live version in DR site	lower (minutes)	higher
Multi-site Active/Active	multiple active sites	very low (near real time)	very high

Do not choose active/active just because it sounds best. It increases complexity, cost and operational risk.

What a DR runbook should include

A recovery runbook should be clear enough for on-call staff to execute.

Include:

DR activation criteria;
system priority list;
system owners;
RPO/RTO;
emergency contacts;
backup locations;
required credentials/keys;
restore order;
restore commands;
data validation steps;
DNS/traffic switch steps;
customer communication;
rollback steps;
incident evidence logging;
post-incident checklist.

Do not store the only copy of the runbook in a wiki that might be down. Keep an independent/offline copy.
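
The restore order in a runbook is essentially a dependency ordering, so it can be derived rather than remembered. The sketch below uses Python's standard `graphlib` module on a hypothetical application stack; the component names are placeholders.

```python
from graphlib import TopologicalSorter

# Map each component to the components it depends on (hypothetical example stack).
depends_on = {
    "identity": set(),
    "network": set(),
    "database": {"network"},
    "object-storage": {"network"},
    "api": {"database", "object-storage", "identity"},
    "frontend": {"api"},
}

# static_order() yields every component after all of its dependencies.
restore_order = list(TopologicalSorter(depends_on).static_order())
print(restore_order)
```

Several valid orders can exist, so the printed sequence may differ between runs of different inputs; what matters is that each component appears after everything it depends on. A circular dependency raises `graphlib.CycleError`, which is worth discovering during planning rather than during an incident.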

Restore testing

A backup that was never restored is not proven.

Test types:

Test	Goal
File restore test	restore a single file
Database restore test	restore DB into test environment
PITR test	restore to a specific point in time
Full system restore	rebuild full application
Region failover drill	move traffic to DR site
Tabletop exercise	test decisions and process
Game day	controlled realistic incident simulation

Measure:

actual restore time;
actual data loss;
runbook errors;
missing credentials/keys;
forgotten dependencies;
slow manual steps.
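
The first two measurements reduce to timestamp arithmetic. Here is a minimal sketch that turns three timestamps recorded during a drill into actual data loss and restore time, then compares them with the targets. The function name and the sample times are hypothetical.

```python
from datetime import datetime, timedelta


def drill_result(incident_at: datetime, last_recoverable_write_at: datetime,
                 service_restored_at: datetime, rpo: timedelta, rto: timedelta) -> dict:
    actual_data_loss = incident_at - last_recoverable_write_at
    actual_restore_time = service_restored_at - incident_at
    return {
        "actual_data_loss": actual_data_loss,
        "actual_restore_time": actual_restore_time,
        "rpo_met": actual_data_loss <= rpo,
        "rto_met": actual_restore_time <= rto,
    }


result = drill_result(
    incident_at=datetime(2026, 6, 5, 10, 0),
    last_recoverable_write_at=datetime(2026, 6, 5, 9, 50),
    service_restored_at=datetime(2026, 6, 5, 12, 30),
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=2),
)
print(result)
```

In this sample drill the RPO is met but the restore overruns the RTO by 30 minutes, the kind of gap between paper targets and real restore time that only a test reveals.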

Backup monitoring

Monitor:

backup job success/failure;
backup duration;
backup size;
restore point count;
age of latest restore point;
replication lag;
immutable retention;
object lock status;
verification errors;
ransomware-like deletion patterns;
storage cost;
restore test status.

Useful alerts:

Backup job failed twice
No new restore point in 6 hours
Backup size dropped by 80%
Retention/object lock changed
WAL/binlog archiving stopped
Velero backup failed
Restore test exceeded RTO
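
Three of these alerts can be expressed as plain threshold checks. The sketch below is illustrative: the thresholds mirror the examples above, while the function name, parameters and alert identifiers are hypothetical and would normally map onto whatever monitoring system is in use.

```python
from datetime import datetime, timedelta

MAX_RESTORE_POINT_AGE = timedelta(hours=6)
MAX_SIZE_DROP_PERCENT = 80
MAX_CONSECUTIVE_FAILURES = 2


def backup_alerts(now: datetime, latest_restore_point_at: datetime,
                  latest_size_bytes: int, previous_size_bytes: int,
                  consecutive_failures: int) -> list[str]:
    alerts = []
    if consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
        alerts.append("repeated_failure")
    if now - latest_restore_point_at > MAX_RESTORE_POINT_AGE:
        alerts.append("stale_restore_point")
    # Integer arithmetic avoids floating-point surprises at the threshold.
    if previous_size_bytes > 0 and (
            latest_size_bytes * 100 <= previous_size_bytes * (100 - MAX_SIZE_DROP_PERCENT)):
        alerts.append("size_drop")
    return alerts


now = datetime(2026, 6, 5, 12, 0)
unhealthy = backup_alerts(now, datetime(2026, 6, 5, 4, 0), 150, 1000, 0)
healthy = backup_alerts(now, datetime(2026, 6, 5, 11, 0), 990, 1000, 0)
print(unhealthy)
print(healthy)
```

A sudden size drop deserves the same urgency as a failed job, since a backup that "succeeds" while capturing an empty or encrypted dataset is not a usable restore point.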

30/60/90-day rollout roadmap

Days 1–30: inventory and risk reduction

Inventory systems, databases, SaaS, storage and Kubernetes clusters.
Define owner, RPO and RTO for critical systems.
Verify whether existing backups can restore.
Enable backups for critical databases.
Create offsite backup copies.
Separate backup admin and production admin roles.
Encrypt backups.
Add backup failure alerts.
Write basic restore runbooks.
Test restore for a small database.

Days 31–60: standardization and ransomware resilience

Implement 3-2-1 or 3-2-1-1-0.
Enable immutable backup/object lock for critical data.
Enable WAL/binlog/PITR where low RPO is required.
Back up IaC, GitOps repos and secrets metadata.
Add Velero or another Kubernetes backup tool if using K8s.
Create isolated recovery environment.
Monitor backup age, size and anomalies.
Review backup deletion permissions.
Schedule restore testing.
Run ransomware tabletop exercise.

Days 61–90: full DR maturity

Select DR strategy: backup/restore, pilot light, warm standby or multi-site.
Test full application restore.
Measure actual RTO/RPO.
Improve runbooks based on test results.
Automate repeatable restore steps.
Test DNS/traffic failover.
Back up critical SaaS data.
Build DR readiness dashboard.
Test recovery when identity provider is unavailable.
Review storage cost and retention.

Quick backup checklist

Data

Databases.
Object storage.
File uploads.
Persistent volumes.
Config files.
Secrets metadata.
IaC/GitOps repos.
SaaS exports.
Audit logs.
Encryption keys or key recovery procedure.

Technical controls

Full + incremental/differential strategy.
PITR for critical databases.
Offsite copy.
Immutable/offline copy.
Encryption.
Access control.
Monitoring.
Restore testing.
Retention policy.
Documentation.

Process

Owner.
RPO/RTO.
Runbook.
Escalation.
Communication plan.
Restore approval.
Evidence collection.
Postmortem.
Periodic DR drills.

Common mistakes

Backups exist but were never restored.
Backups live in the same compromised account.
No immutable/offline copy.
No transaction log backup.
No config/IaC backup.
No RPO/RTO.
Paper RTO does not match real restore time.
Lost backup key.
Unencrypted backups with secrets.
Database backed up but file uploads forgotten.
Kubernetes manifests backed up but PV data missing.
Restoring directly into a still-compromised environment.
No backup failure monitoring.
No backup storage cost review.
No accountable owner.

Reference tooling

Need	Tool/service
File/server backup	Restic, BorgBackup, Duplicity
Kubernetes backup	Velero, Kasten K10, cloud-native backup
Database backup	pgBackRest, WAL-G, Percona XtraBackup, native tools
Object immutability	S3 Object Lock, Azure Immutable Blob, GCS retention policy
Snapshots	cloud snapshots, storage array snapshots
Backup monitoring	Prometheus/Grafana, backup software alerts
DR orchestration	cloud DR services, Terraform/OpenTofu, runbooks
Config backup	Git, artifact registry, IaC repositories
Ransomware protection	immutable backup, EDR, identity hardening, segmentation

Restic is an open-source backup program focused on secure, efficient backup;¹³ BorgBackup is a popular deduplicating backup program for Linux/Unix-like systems.¹⁴

FAQ

How are Backup and Disaster Recovery different?

Backup is a copy of data or systems. Disaster Recovery is the strategy and process for restoring service after a major incident according to defined RPO/RTO.

What are RPO and RTO?

RPO is maximum acceptable data loss. RTO is maximum acceptable time to restore service.

What is 3-2-1 backup?

3-2-1 backup means 3 copies of data, on 2 different storage/media types, with 1 offsite copy. For ransomware, add immutable/offline copies and restore verification.

Does immutable backup stop ransomware?

It reduces the risk of ransomware deleting or encrypting backups, but it is not enough alone. You still need access separation, monitoring, identity security, malware scanning and isolated recovery.

Do Kubernetes clusters need backup?

Yes. Back up resources, CRDs, RBAC, ConfigMaps, Secrets and PersistentVolumes. Velero is a common Kubernetes backup/restore tool.¹²

How often should restore testing happen?

It depends on criticality. Critical systems should be tested monthly or quarterly and after major changes. At minimum, every important backup path needs a real restore test.

Conclusion

Backup and Disaster Recovery are core IT resilience capabilities. A system without reliable backups may lose data permanently; a system with backups but no DR plan may still suffer unacceptable downtime or restore the wrong data. The core work is defining RPO/RTO, keeping offsite and immutable copies, protecting keys and access, testing restores and maintaining clear runbooks.

A practical rollout starts with inventory, checking current backups, testing restore, adding monitoring and creating offsite copies. Then mature into immutable backups, database PITR, Kubernetes backup, isolated recovery, DR drills and recovery automation. A backup has value only when you have proven it can be restored.

References

1. GitHub Open Graph preview image for vmware-tanzu/velero. https://opengraph.githubassets.com/backup-dr-guide/vmware-tanzu/velero
2. GitHub Open Graph preview image for restic/restic. https://opengraph.githubassets.com/backup-dr-guide/restic/restic
3. GitHub Open Graph preview image for borgbackup/borg. https://opengraph.githubassets.com/backup-dr-guide/borgbackup/borg
4. NIST SP 800-34 Rev. 1. “Contingency Planning Guide for Federal Information Systems.” https://csrc.nist.gov/pubs/sp/800/34/r1/final
5. AWS Well-Architected Reliability Pillar. “Plan for Disaster Recovery (DR).” https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/plan-for-disaster-recovery-dr.html
6. AWS S3 User Guide. “Using S3 Object Lock.” https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock.html
7. Microsoft Learn. “Immutable storage for Azure Blob Storage.” https://learn.microsoft.com/en-us/azure/storage/blobs/immutable-storage-overview
8. PostgreSQL Docs. “Backup and Restore.” https://www.postgresql.org/docs/current/backup.html
9. MySQL Docs. “Backup and Recovery.” https://dev.mysql.com/doc/refman/8.4/en/backup-and-recovery.html
10. MongoDB Docs. “Backups.” https://www.mongodb.com/docs/manual/core/backups/
11. Kubernetes Docs. “Backing up a cluster.” https://kubernetes.io/docs/concepts/cluster-administration/backing-up/
12. Velero Docs. “Overview.” https://velero.io/docs/main/
13. Restic documentation. https://restic.readthedocs.io/en/stable/
14. BorgBackup official website. https://www.borgbackup.org/

Written by PixelRouter Editorial Team

We publish deep, authoritative guides on AI infrastructure, API gateway security, cloud financial management, and system optimizations for developers.
