Beyond Backups: Designing a Working IT Resilience Strategy

Jan 22, 2026 10:00:00 AM


How is your organisation handling data protection? Think backups are enough? Backups protect data; they don’t guarantee you can keep operating when identity goes down, networks break, dependencies fail, or recovery turns into a slow, untested scramble. A credible plan starts with business impact, clear RTO/RPO decisions, and deliberate design choices, not whatever your backup tool happens to do.

Close-up of an open hard disk drive with rainbow reflections, illustrating that “having backups” is only one part of a wider resilience and recovery strategy.

In this blog post, we're addressing why an effective disaster recovery strategy requires more than just backups.

Why “We Have Backups” Is Not a Strategy

Ask most organisations how prepared they are and you will hear the same line: “We have backups.”

Backups are important, but they’re only part of the picture. Backups give you a safety net. Resilience is your ability to keep critical services running or restore them quickly enough that the business is not badly damaged.

You can have flawless backups and still:

Spend days restoring systems
Leave customers angry and in the dark
Fail regulatory expectations

All because recovery was slow, untested or only covered part of the estate.

A serious resilience strategy starts with business impact and works backwards. A backup policy usually starts with whatever the existing tools can do.

From “We Have Backups” to “We Can Keep Operating”

Backups answer one question: “Is our data stored somewhere else?”

Resilience answers a tougher one: “Can we continue to deliver our most important services when something fails?”

You should be asking:

How long would it actually take to restore this system from backup in a real incident, at scale?
What would staff and customers do while we are restoring?
Which dependencies such as identity, networking or integrations would still block us even if the data is back?
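
The first question above can be given a rough first answer with simple arithmetic before any real test is run. The sketch below is a minimal estimate, assuming a sustained restore throughput and a single overhead multiplier; the function name, parameters and example figures are illustrative assumptions and do not come from any particular backup product. A measured restore test should always replace this kind of estimate.

```python
def estimated_restore_hours(data_gb: float,
                            throughput_mb_per_s: float,
                            overhead_factor: float = 1.0) -> float:
    """Rough restore time in hours for a given data volume.

    data_gb             -- amount of data to restore, in gigabytes (1 GB = 1000 MB)
    throughput_mb_per_s -- sustained restore speed you expect to achieve (assumption)
    overhead_factor     -- multiplier for verification, rebuilds and manual steps (assumption)
    """
    if data_gb < 0 or throughput_mb_per_s <= 0 or overhead_factor <= 0:
        raise ValueError("inputs must be positive")
    transfer_seconds = (data_gb * 1000) / throughput_mb_per_s
    return transfer_seconds * overhead_factor / 3600


# Illustrative figures only: 20 TB restored at a sustained 200 MB/s.
hours = estimated_restore_hours(data_gb=20_000, throughput_mb_per_s=200)
print(f"Estimated restore time: {hours:.1f} hours")
```

Even with optimistic throughput, the result is often measured in days rather than hours once the whole estate is included, which is why the second and third questions matter as much as the first.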

A credible IT resilience strategy treats backups as one component among many rather than the entire solution.

The Core Building Blocks of Resilience

1. Redundancy and High Availability

Redundancy means not relying on a single component that can bring everything down if it fails. High availability means designing systems so that if one part fails, another takes over and users hardly notice.

Examples:

Multiple servers behind a load balancer
Two data centres rather than one
Cloud services spread across multiple regions

We have seen what happens when this is ignored. One of our customers had services hosted in a single Azure region. When that region went down, so did they. That is the classic “all eggs in one basket” problem.
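
To put rough numbers on why a second region or server matters, the sketch below uses the standard formula for components running in parallel. It assumes failures are fully independent, which is rarely true in practice (shared configuration, shared identity or a bad deployment can take out both copies), so treat the result as an upper bound. The function names and the 99% figure are illustrative assumptions.

```python
HOURS_PER_YEAR = 8760


def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components where any one can serve.

    Assumes independent failures: the service is only down when every copy is down.
    """
    if not 0 <= single <= 1 or copies < 1:
        raise ValueError("single must be in [0, 1] and copies >= 1")
    return 1 - (1 - single) ** copies


def downtime_hours_per_year(availability: float) -> float:
    """Expected yearly downtime implied by an availability figure."""
    return (1 - availability) * HOURS_PER_YEAR


# Illustrative: one region at 99% versus two independent regions at 99% each.
for copies in (1, 2):
    a = combined_availability(0.99, copies)
    print(f"{copies} region(s): {a:.4%} available, "
          f"about {downtime_hours_per_year(a):.1f} hours down per year")
```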

2. Data Protection: Backups, Replicas, Snapshots, Off-Site and Cross-Region

You already understand backups. The detail matters:

Backups: Copies of your data stored separately, often on slower, cheaper storage. Good for recovery from deletion, corruption or ransomware, but usually slow to restore.
Replicas: Live, continuously updated copies of data elsewhere, designed for fast failover. Great for uptime, but if you replicate too aggressively you can also replicate corruption or malicious changes.
Snapshots: Point-in-time images of systems or volumes, useful for quick rollbacks.
Off-site / cross-region: Storage or replicas in other physical locations or cloud regions, to protect against site-level issues such as fires, floods or regional cloud failures.

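A mature strategy combines these deliberately, based on the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) you agreed as a business, not on the default settings in your backup software. The RTO is how long a service can be unavailable before the damage becomes unacceptable; the RPO is how much recent data, measured in time, you can afford to lose.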
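
One way to make that comparison concrete is to check each service's protection against its agreed objectives. The sketch below is a minimal illustration: `ProtectionPlan`, `objective_gaps` and all the figures are hypothetical names and values, not part of any backup tool. It assumes the worst-case data loss equals the interval between copies and ignores the time a backup job takes to complete.

```python
from dataclasses import dataclass


@dataclass
class ProtectionPlan:
    """Hypothetical description of how one service is protected."""
    name: str
    copy_interval_hours: float      # time between backups, snapshots or replica syncs
    estimated_restore_hours: float  # ideally measured in a restore test, not guessed


def objective_gaps(plan: ProtectionPlan, rto_hours: float, rpo_hours: float) -> list[str]:
    """Return a list of gaps between the plan and the agreed RTO/RPO."""
    gaps = []
    # Worst case, the failure happens just before the next copy is taken,
    # so up to one full interval of data is lost.
    if plan.copy_interval_hours > rpo_hours:
        gaps.append(f"{plan.name}: could lose up to {plan.copy_interval_hours}h "
                    f"of data, but the RPO is {rpo_hours}h")
    if plan.estimated_restore_hours > rto_hours:
        gaps.append(f"{plan.name}: restore takes about {plan.estimated_restore_hours}h, "
                    f"but the RTO is {rto_hours}h")
    return gaps


# Illustrative: a nightly backup measured against a 4-hour RTO and 1-hour RPO.
nightly = ProtectionPlan("orders-db", copy_interval_hours=24, estimated_restore_hours=6)
for gap in objective_gaps(nightly, rto_hours=4, rpo_hours=1):
    print(gap)
```

In this example the default nightly backup misses both objectives, which is the signal to add snapshots or a replica rather than to relax the targets quietly.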

3. Network Resilience: Multiple Links, Routes and VPNs

Your systems are useless if people and other systems cannot reach them.

Network resilience is about avoiding a single cable, router or ISP becoming your Achilles heel.

What does network resilience planning involve?

Multiple internet providers
Redundant firewalls and core network devices
Diverse routes between sites
Tested VPNs for remote access

If you’re heavily cloud-based, you also need to consider what happens if a key region or interconnect is degraded.

4. Identity and Access: IAM, AD and SSO Recovery

After a serious incident, one of the most common blockers is depressingly simple: no one can log in.

If your identity provider is down, whether that is Active Directory (AD), your Single Sign-On (SSO) platform or a cloud Identity and Access Management (IAM) service from a vendor such as Microsoft or Google, your recovered applications are effectively bricks.

An IT resilience strategy must treat identity services as tier-one services, with their own DR and high availability design.

If you're an executive, you should be asking yourself: “In an incident, how will administrators and key users authenticate if our primary directory or SSO is offline?”
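
The same dependency thinking can be written down as a simple map, which makes the recovery order and the blast radius of an identity outage explicit. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the service names and their dependencies are hypothetical and only illustrate the idea.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what it needs before it can work.
DEPENDS_ON = {
    "network": set(),
    "identity": {"network"},
    "database": {"network", "identity"},
    "crm_app": {"identity", "database"},
}


def recovery_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Order services so that every dependency is restored before its dependants."""
    return list(TopologicalSorter(depends_on).static_order())


def blocked_by(depends_on: dict[str, set[str]], failed: str) -> set[str]:
    """Services that cannot work, directly or indirectly, while `failed` is down."""
    blocked: set[str] = set()
    changed = True
    while changed:
        changed = False
        for service, needs in depends_on.items():
            if service == failed or service in blocked:
                continue
            if failed in needs or needs & blocked:
                blocked.add(service)
                changed = True
    return blocked


print("Restore in this order:", recovery_order(DEPENDS_ON))
print("Blocked while identity is down:", sorted(blocked_by(DEPENDS_ON, "identity")))
```

In this illustrative map, restoring the database and the application first achieves nothing, because both remain blocked until identity is back.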

Trade-Offs: Cost vs Resilience vs Complexity

Resilience isn’t free and it isn’t linear. Doubling spend does not magically halve risk.

Each extra layer of protection adds cost and usually adds complexity, which can itself create new failure modes.

For example:

Active/active architectures reduce downtime but are harder to operate and test
Aggressive replication improves your Recovery Point Objective (RPO) but increases the risk of replicating corruption or ransomware
Extra vendors or regions reduce concentration risk but increase integration and monitoring effort

Leadership’s job is to decide, explicitly, where high resilience is essential and where slower recovery is acceptable. Labelling everything as “critical” results in bloated, fragile designs and wasted money.

Architecture Standards, Technical Debt and the Drag of Legacy

Resilience goes beyond DR tools. It’s heavily influenced by the quality of your architecture and the amount of technical debt you’re carrying.

Common warning signs:

Legacy systems that can’t be clustered or replicated
Applications only understood by one engineer
Point-to-point integrations and hard-coded dependencies
“Temporary” workarounds that quietly became permanent

These make recovery slow, unpredictable and dependent on individual heroics.

The alternative is to invest in architecture standards, for example:

Common patterns for how critical services are built and protected
Clear “gold / silver / bronze” resilience tiers and what each means
Approved, well-understood technologies for backup, replication, monitoring and identity, rather than a mess of bespoke setups
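
Tier definitions are easier to enforce when they live in one shared place rather than in people's heads. The sketch below is one possible shape for that, assuming a simple service catalogue; the tier values, service names and the `tier_summary` helper are all hypothetical placeholders, and the real numbers should come from your own business impact analysis.

```python
from collections import Counter

# Hypothetical tier definitions; the hour values are placeholders, not recommendations.
TIERS = {
    "gold":   {"rto_hours": 1,  "rpo_hours": 0.25},
    "silver": {"rto_hours": 8,  "rpo_hours": 4},
    "bronze": {"rto_hours": 72, "rpo_hours": 24},
}


def tier_summary(catalogue: dict[str, str]) -> Counter:
    """Count services per tier, rejecting any service with an undefined tier."""
    unknown = {svc: tier for svc, tier in catalogue.items() if tier not in TIERS}
    if unknown:
        raise ValueError(f"Services with undefined tiers: {unknown}")
    return Counter(catalogue.values())


# Hypothetical service catalogue.
catalogue = {
    "identity": "gold",
    "payments": "gold",
    "intranet": "silver",
    "archive": "bronze",
}

summary = tier_summary(catalogue)
gold_share = summary["gold"] / len(catalogue)
print(dict(summary))
print(f"Share of services labelled gold: {gold_share:.0%}")
```

A rising share of gold services is a useful prompt to ask whether everything is once again being labelled critical.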

If you refuse to confront technical debt and inconsistent architecture, you’re effectively betting your continuity on a few exhausted people in a crisis. And if there’s anything you should avoid in an emergency, it’s panic.

A serious IT resilience strategy treats debt reduction and standardisation as first-class resilience activities rather than side projects.

In the next blog post, we'll look beyond your own estate and talk about third-party risk, testing and the human side of recovery.


Jul 3, 2023 8:54:28 AM
