We need to talk about your backups

Photo by Denny Müller / Unsplash

Really - we do.

This is the first in a series of posts on the fundamentals that all businesses, whatever their size, should be addressing.

As this isn't the most exciting topic, I'd urge you to grab a coffee before reading further, because this is likely to be a long post.

Over the years, I've worked with a lot of organisations, and one of the early questions I ask is about their backups. More often than not, I'm greeted with a "yes, our MSP does that for us", but without any further validation of what that actually means or checking that this is actually happening.

In most cases, there's an assumption that the MSP or even the internal IT department will magically understand what the business requirements are in this arena and provide accordingly. Conversely, the team responsible may simply assume that, as they haven't been told otherwise, nightly (or even weekly) backups will suffice.

There are several different types of disaster scenario that you probably want to consider.

  • Accidental deletion of files during day-to-day operations
  • Loss of access to the site housing your data (fire/flood/earthquake/<insert other apocalyptic event here>)
  • Loss of server - total failure of server storage (while a single disk failure can be mitigated, a failed RAID controller or unexpected power event can result in data destruction).
  • Deliberate destruction of data (rogue employee, ransomware incident).
  • Cloud services backups - a lot of organisations assume that just because their data is "in the cloud" it's safe, or that their cloud services provider is backing it up for them.

The backup and recovery strategy for each of these scenarios may be completely different, and the anticipated recovery time for each should also be considered. Meaningful conversations should be had about the business's appetite for risk and the required speed of recovery of critical services. Testing plans should be created and exercised to assess whether the desired outcomes and recovery time objectives (RTOs) are being met.
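As a rough illustration of what recording a restore test might look like, the sketch below compares measured restore times against the agreed recovery time objective for each scenario. The scenario names and all of the figures are made up for the example, and `RestoreTest` and `missed_objectives` are hypothetical names rather than part of any backup product.

```python
from dataclasses import dataclass


@dataclass
class RestoreTest:
    scenario: str          # which disaster scenario was exercised
    rto_hours: float       # recovery time objective agreed with the business
    measured_hours: float  # how long the test restore actually took


def missed_objectives(tests):
    """Return the tests whose measured restore time exceeded the RTO."""
    return [t for t in tests if t.measured_hours > t.rto_hours]


# Illustrative figures only; substitute the results of your own tests.
results = [
    RestoreTest("Accidental file deletion", rto_hours=1, measured_hours=0.5),
    RestoreTest("Total server failure", rto_hours=8, measured_hours=14),
    RestoreTest("Loss of site", rto_hours=48, measured_hours=36),
]

for t in missed_objectives(results):
    print(f"{t.scenario}: took {t.measured_hours}h against an RTO of {t.rto_hours}h")
```

Even a simple record like this makes it obvious which scenarios need a conversation before a real incident forces one.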

Availability of backup media should also be verified. On one of our recent penetration tests, the client's nightly backups were stored on network attached storage in a network share which was available to all domain users. All of the organisation's data was easily downloaded and restored onto our own equipment without triggering any AV or intrusion alerts on any workstations or servers. A malicious actor could have easily compromised significant amounts of client data and subsequently destroyed all backups (as often happens with ransomware groups).

Conversely, connectivity speeds for downloading backups from cloud storage may limit availability and slow the restoration process. If large quantities of data are involved and either the upload capacity at the backup location or the download speed at the DR site is inadequate, hours or even days can be added to the recovery time.
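To put rough numbers on this, the sketch below estimates how long it takes just to move the data, using whichever link is slower. It assumes decimal units (1 GB = 8,000 megabits) and a guessed 80% usable throughput; both function names and the `efficiency` figure are our own assumptions, and real restores will add time for decompression, verification and rebuilding systems.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to move data_gb gigabytes over a link_mbps megabit/s link.

    efficiency is the assumed fraction of the headline speed actually achieved.
    """
    if data_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("sizes and speeds must be positive, efficiency in (0, 1]")
    megabits = data_gb * 8 * 1000            # GB -> megabits (decimal units)
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


def restore_transfer_hours(data_gb: float, upload_mbps: float,
                           download_mbps: float, efficiency: float = 0.8) -> float:
    """Transfer time is governed by the slower end of the connection."""
    bottleneck = min(upload_mbps, download_mbps)
    return transfer_hours(data_gb, bottleneck, efficiency)


# Illustrative: 2 TB of backups, 1 Gbit/s at the backup location,
# but only 100 Mbit/s download at the DR site.
hours = restore_transfer_hours(2000, upload_mbps=1000, download_mbps=100)
print(f"Approximately {hours:.1f} hours before restoration can even finish copying")
```

With these example figures the transfer alone takes more than two days, which is worth knowing before it is compared against a recovery time objective measured in hours.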

Critical line-of-business applications may have different backup requirements. Loss of 24 hours of data, whether order data for an e-commerce organisation or records in time-critical environments such as legal or healthcare businesses, may result in regulatory breaches that are difficult to recover from (without even factoring in the human impact of late diagnosis, missed medication or missed court dates). It may be impractical to back up everything more frequently, but understanding which services require a more regular backup schedule, and setting tighter recovery objectives for those services (both how quickly they must be restored and how much recent data you can afford to lose), is a conversation you should be having.
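One way to start that conversation is to list each service alongside how often it is backed up and how much data loss the business says it can tolerate. The sketch below assumes the worst-case loss is roughly the gap between backups (it ignores how long the backup itself takes); the service names, the figures and the function name are all hypothetical.

```python
def services_needing_more_frequent_backups(schedule):
    """Return services whose backup interval exceeds the tolerable data loss.

    schedule maps a service name to a tuple of
    (backup_interval_hours, tolerable_loss_hours).
    """
    return sorted(
        name
        for name, (interval_hours, tolerable_hours) in schedule.items()
        if interval_hours > tolerable_hours
    )


# Illustrative values only.
schedule = {
    "Order database": (24, 1),      # nightly backup, 1 hour of loss tolerable
    "Patient records": (24, 4),
    "Shared file server": (24, 24),
    "Marketing assets": (168, 168),  # weekly backup is acceptable here
}

print(services_needing_more_frequent_backups(schedule))
```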

Backup retention strategies should also be discussed. How long do you need to retain backups, and for how much data is this practical? We've had the misfortune of running incident response and attempting data recovery for an organisation that only had a 24-hour retention strategy. What would have been an inconvenience turned into catastrophic damage to the organisation.

Longer retention inevitably adds cost, and a commercial decision about what is appropriate must be made before a crisis occurs.
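A back-of-the-envelope estimate can help frame that decision. The sketch below assumes a deliberately simple model: one full backup is kept, plus one incremental per day of retention, with no compression or deduplication. The function name and every figure are assumptions for illustration, and real backup products will behave differently.

```python
def retention_storage_gb(full_gb: float, daily_change_gb: float,
                         retention_days: int, fulls_kept: int = 1) -> float:
    """Estimate storage needed: full backups plus one incremental per retained day."""
    if retention_days < 0 or fulls_kept < 1:
        raise ValueError("retention_days must be >= 0 and fulls_kept >= 1")
    return fulls_kept * full_gb + daily_change_gb * retention_days


# Illustrative: 500 GB of data with around 10 GB changing each day.
for days in (1, 30, 90, 365):
    needed = retention_storage_gb(500, 10, days)
    print(f"{days:>3} days of retention: roughly {needed:,.0f} GB")
```

Multiplying the result by your storage provider's price per GB gives a monthly figure to weigh against the cost of being unable to go back far enough.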

The final consideration is what should be backed up. Often-forgotten items, such as websites and firewall and switch configurations, can result in catastrophe in the event of failure because of the long recovery times spent trying to piece configurations back together. These are often simple to back up and require relatively little storage compared to business data, but aren't seen as crucial.

We'd urge you to review the scenarios that might require urgent recovery and the business impact of losing services, and to find the solution that's most appropriate for your organisation. There's no "one-size-fits-all" approach, and this will largely depend on the nature of your business and your overall risk tolerance. Whatever solution you agree on, above all set a schedule for routine testing to make sure your backups are available and recovery time objectives are met.