We’ve been deploying backup solutions since the beginning of computing and the foundations of backup and recovery have stayed the same: make sure backups run consistently and set recovery objectives. Yet systems in 2022 don’t work or act the same way they did decades ago. Cloud data backups have helped us meet the need for offsite backups, as well as impacted how we budget for them. Ransomware has impacted how we store them. The laws of physics might be more of an issue than when we had tapes stored in a safe down the hall. Cost models have changed, too.
In this session, Karen Lopez covers best practices for modern data recovery…and she will share stories of worst practices just to keep it real.
Expert Cloud Data Backup and Recovery Best Practice.pptx
1. Expert Cloud Data Backup & Recovery Best Practices
March 25, 2022 10 AM PT/1 PM ET
Moderated by: Thank You To Our Sponsors:
2. Expert Cloud Data Backup & Recovery Best Practices
SPEAKER:
Karen Lopez
Data Evangelist, InfoAdvisors
4. Today’s session
Please do ask questions during the session
Best/Worst practices
Best Practices are contextual
Processes matter just as much as tech. Maybe more.
5. Let’s talk about these words, first
Successful
Recovered
Tested
Practiced
Data
Insights in this title are really about experiences I’ve had (most of them), along with some that were publicly shared.
Keep questions coming in from the beginning. Any stories you want to share should already be publicly known, or be about an extended family member you have only met once at a wedding and who doesn’t work at the same company as you. So start your stories with “I heard…”
I’ve worked in a lot of places where executives abhor the concept of “best practices”, as if they were only for elite, super-powered organizations. That’s not what I mean in this presentation. I’m talking major “this is what will keep us up and running” practices. They will keep us out of jail. Remember, ROI doesn’t just stand for Return on Investment. It also means Risk of Incarceration.
Every design decision comes down to cost, benefit, and risk. If we aren’t accounting for all those things, we aren’t being true professionals.
But not every BP fits every environment. For instance, there are plenty of cases where it makes more sense to RELOAD initial data rather than back up the resulting data. This is more common in some analytical/warehouse scenarios, especially in cloud services where you don’t have as much control as in on-prem systems. That goes beyond this talk, but it’s an example of where Cloud DR is slightly different from
Successful: You. Backups. Restores. Business back to business. It does not mean “no errors”.
Recovered: You, from the stress of a system outage. The business, from an outage. Data, because we love data.
Tested: Not just no errors. Business functioning. Hopefully no data loss, but that’s also something to negotiate when talking about DR.
Practiced: Both as in “doing it again and again until it’s easy and less stressful to do”, and in the same mindset as “just being”.
Data: Not just the business data. It also includes all that extra data we generate for software-defined X. As we move to more configurable infrastructure, that generates more data. That data needs to be treated the same way.
A lot of my experience is with databases, so my stories will focus on that sort of data. But all this applies to all kinds of backup and recovery.
There are many best practices and I’m not going to cover them all because I want to talk about some things that I don’t see addressed a lot. However:
Always back up
3 copies. 2 different places. 1 offsite
Test your backups
But better, test your restores.
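The “3 copies, 2 different places, 1 offsite” rule above is easy to state and easy to drift away from, so here is a minimal sketch of checking it against an inventory of backup copies. The BackupCopy record and its fields are hypothetical names made up for illustration, not part of any real backup product, and a real inventory would come from your own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str       # hypothetical label for this copy
    location: str   # where it lives, e.g. "db-server", "nas", "cloud-east"
    offsite: bool   # True if stored away from the primary site


def check_3_2_1(copies):
    """Return a list of problems; an empty list means the rule is met."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need 3")
    if len({c.location for c in copies}) < 2:
        problems.append("all copies are in the same place")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    return problems


if __name__ == "__main__":
    inventory = [
        BackupCopy("nightly-a", "db-server", offsite=False),
        BackupCopy("nightly-b", "db-server", offsite=False),
    ]
    for problem in check_3_2_1(inventory):
        print("3-2-1 violation:", problem)
```

A check like this only tells you the copies exist where you think they do. It says nothing about whether any of them can be restored.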
Not everyone is an expert. By Expert here, I mean one with experience. Tools. Files. Systems. On your systems.
Expert: someone who understands how systems should not be fiddled with when they are down.
Story about backups being on the same drive as the database, so when that HD failed, the backups were inaccessible, too. Story about backups being on a reused desktop under the sysadmin desks, but the system got water/coffee damage.
Story about a non-DBA trying to recover a database backup, but he didn’t know what he was doing. So he accidentally overwrote the last database backup. There were still others, off the DB server. But more data was lost and recovery took longer.
Story about database “log files” being deleted to make more space on the server.
No one is perfect. Non-pros doing restores is often more harmful than the business losses from waiting.
The best mitigator to all this is to have detailed documentation and instructions on how to recover.
This is how open buckets with unsecured backups happen
Last story: Don’t let managers bully you into not flying the plane. Story about manager saying “WE DON’T HAVE TIME FOR THIS NOW”. It took two years to get backups even started. But even this took an evil twist along the way…
I cannot emphasize enough that
Story from this week: Admin restored data by using an import function. Except his data for dates was in the wrong format. Some dates were accepted, but wrong. Some were rejected as invalid dates, so the row was not restored.
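The date story above has two failure modes: rows that load with the wrong date, and rows that silently disappear. A rough sketch of guarding against both is to parse every date with one explicit format and keep the rejects instead of dropping them. The function names, the row layout, and the ISO format default are assumptions for illustration only.

```python
from datetime import datetime


def parse_date_strict(text, fmt="%Y-%m-%d"):
    """Parse text using exactly one format; raises ValueError otherwise."""
    return datetime.strptime(text, fmt).date()


def split_rows(rows, date_field, fmt="%Y-%m-%d"):
    """Return (good_rows, rejected_rows) so nothing is dropped silently."""
    good, rejected = [], []
    for row in rows:
        try:
            parsed = parse_date_strict(row[date_field], fmt)
        except (KeyError, TypeError, ValueError) as exc:
            # Keep the row and the reason so a person can review it.
            rejected.append((row, str(exc)))
        else:
            good.append({**row, date_field: parsed})
    return good, rejected


if __name__ == "__main__":
    # The same text is a valid date under two formats, with two meanings:
    # "03/04/2022" is March 4 with %m/%d/%Y and April 3 with %d/%m/%Y.
    rows = [{"id": 1, "sold": "2022-03-25"}, {"id": 2, "sold": "25/03/2022"}]
    good, rejected = split_rows(rows, "sold")
    print(len(rows), "rows in,", len(good), "restored,", len(rejected), "rejected")
```

Comparing rows in against rows restored plus rows rejected is the cheap part. Catching dates that were accepted but wrong still takes knowing which format the export actually used.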
Story about restores not working at all. All backups worked fine (no errors). But none of the restores did.
Story about admin who misunderstood that the new website was actually an online application, so he did no backups because the site could just be reloaded from the site development environment. 3 years with zero backups for the Point of Sale and CRM system.
Story about backup media being thrown out every week because cleaning staff was not notified to leave those boxes alone.
Story about not having access to encryption keys. Licenses. How to get a new license file. Expired employees.
Gary Williams of Spiceworks highlights the term Schrödinger’s Backup, which states: “The condition of any backup is unknown until a restore is attempted.”
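One way to open the box on a file-level backup is to restore it somewhere other than the target system and compare what came back against the source. The sketch below does that with SHA-256 checksums. It assumes plain files in two directories, the function names are made up for illustration, and it does not apply as-is to database backups, which need their own restore-and-query test.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(original_dir, restored_dir):
    """Return (relative_path, problem) pairs for files that did not restore."""
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for src in sorted(original_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_dir)
        dst = restored_dir / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif sha256_of(src) != sha256_of(dst):
            problems.append((rel.as_posix(), "content differs"))
    return problems
```

An empty list means every source file came back byte for byte. It still does not prove the business can run on the restored system, which is the test that matters.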
Toy Story 2 – Only partially true to the headlines, but a woman on parental leave had the bits to start recovery.
Story about my own employee stopping backups and destroying existing ones.
When a system is experiencing data loss, doing things on the target system or database can result in more data loss. That includes installing recovery scripts and software. Better to use other systems to install/run recovery software, then work from there.
Story about Windows Core
Story about deleting some backup config files to make room for the restore files on the machine. Documentation said to do that, but there was no room on the server to do that.
The less fiddling you do with the target system, the better off you will be.
No keys. No license. No login.
Backups were offsite and the storage location would only release them to the person on the contract. Who no longer worked there.
Backup server was decommissioned. Admins had every alert possible turned on, so they missed the alerts that the backup was failing.
Backup media for a dev environment were actually in an IT person’s home, and he was “away.” Police had captured all media at his home.
Using a personal file backup system instead of an enterprise one to save money – They often are not distributed backups and often don’t keep backups for a long time. They often can’t be monitored by enterprise SIEMs and can’t be customized to exact needs.
A single point of failure is a failure.
Story about admin going on vacation and we had no access to anything because she had not actually granted access to others. And no one had ever tested the supposed access.
Shared resources have to be balanced (as we will see in future slides)
But resources here means software, systems, storage, keys, licenses, logins (separate logins, always), etc.
Putting backups far away from the target systems is a good practice. But it has a tradeoff. You can’t performance tune physics. If you think you can, please tell me in a DM so we can both become the richest person in the world.
Restores take time. Often longer than a backup
Near for physics, far enough away for weather is what I say.
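To put rough numbers on the physics, restore time is at least the backup size divided by the throughput you actually get. The sketch below is back-of-the-envelope arithmetic only. The 0.7 efficiency factor is an assumption standing in for protocol overhead and contention, not a measured value, and the function name is made up for illustration.

```python
def transfer_hours(size_gb, link_mbps, efficiency=0.7):
    """Rough hours to move size_gb over a link rated at link_mbps megabits/s.

    efficiency is an assumed fraction of the rated speed you really get.
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link > 0, efficiency in (0, 1]")
    megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # 2 TB over a 1 Gbps link at 70% efficiency is a bit over 6 hours,
    # before any time spent actually applying the restore.
    print(round(transfer_hours(2000, 1000), 1), "hours")
```

Running this with your own backup sizes and link speeds before an outage is a cheap way to find out whether a recovery time objective is even possible.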
Plain old backup solutions for cloud systems.
A good backup system for on-prem data isn’t going to work well for modern cloud-based systems. There may be connectivity issues. They may not tolerate latency well. They likely will not work at all with services like XaaS.
They may not work with cloud-based authentication, especially for services.
They may not offer an air gap for backups, leaving you open to ransomware.
Shiny. New car smell. And helps.
Compute + HW + SW to make any tampering evident or to make it physically “impossible”
Story about admin downloading a backup script, but not understanding it. Some of the data was sent to the script publisher’s S3 bucket. None of the backup data was password protected or encrypted. It was just some stranger’s script.
Story about tasking an admin to source commercial backup services, but he decided to turn on an archiving feature and move on.
One of the things that the cloud offers us is the ability to scale out or up in an instant to make everything go faster, even while making 3 copies of the backup. If you aren’t getting budget to do this, you need to escalate this. Talk in terms of down time, not scale cost.
And don’t forget to scale back when you are done.
I believe we don’t need backups. We only need restores. Let that sink in.
While it’s a best practice to get very very good at backups, too often that focus takes away from recovery.