Global Data Vault’s February 2022 webinar again takes us “back to the basics” with an important topic: how to verify if your backups are working. Not to spoil the movie ending here, but the common theme in the video below is the need to test, monitor, and test again. Of course, that is easier said than done, so today our team will walk you through the different types of backups, different restore types, Veeam SureBackup, and offer some tips and best practices for healthy backups.
The speakers today are Global Data Vault’s Service Delivery Manager, Kelly Culwell, and Operations Director, Steven New.
Types of Backup
It’s important to understand the different types of backups. We normally see full and incremental backups; differentials are more of a legacy backup type. A full backup is everything, front to back; in Veeam-speak, it’s what we call the VBK. Full backups get all the data the first time through. Incrementals contain only the blocks of data that have changed since the last time we took a backup, which makes them pretty space-efficient. Differentials, which aren’t really used anymore, contain what has changed since the last full backup. That type was really made for tape storage systems, so it’s not as good a fit for the disk-to-disk jobs Veeam uses. We’re primarily going to focus on full and incrementals today, which are the majority of backups that you’ll see.
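To make the difference between the three types concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not how Veeam stores data: the disk is modeled as a hypothetical dictionary of block numbers, the names `changed_blocks` and `restore` are invented for this example, and deleted blocks are ignored.

```python
def changed_blocks(old, new):
    """Return the blocks in `new` that are missing from or different in `old`."""
    return {block: data for block, data in new.items() if old.get(block) != data}


def restore(full_backup, later_backups):
    """Rebuild a disk by replaying later backups on top of a full backup."""
    state = dict(full_backup)
    for backup in later_backups:
        state.update(backup)
    return state


# Hypothetical disk contents (block number -> data) on three consecutive days.
day0 = {0: "a", 1: "b", 2: "c"}
day1 = {0: "a", 1: "B", 2: "c"}
day2 = {0: "a", 1: "B", 2: "C"}

full = dict(day0)                        # full backup: every block
incr_day1 = changed_blocks(day0, day1)   # incremental: changes since the last backup
incr_day2 = changed_blocks(day1, day2)   # incremental: changes since day 1 only
diff_day2 = changed_blocks(day0, day2)   # differential: changes since the last full
```

In this toy model the day 2 incremental holds one block while the day 2 differential holds two, which is why incrementals save space. The trade-off is that a restore from incrementals needs the full backup plus every incremental after it.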
So what can you recover if your backups aren’t working? Obviously, nothing. If you lose data, you’re not going to be able to restore any applications, files, spreadsheets, or anything like that. The bigger thing is you’re not going to be able to recover your environments; your servers and your virtual machines are not going to be available for you in the event of a disaster or loss. And then again, if you lose your primary infrastructure, the primary storage of primary data, you’re not going to have any way to recover it.
Restoring a file completely is different from being able to recover an entire server or an entire virtual machine from a backup. And that’s something that we’re going to reiterate today. So don’t just assume that because you can restore something from a backup, it means your backup is good.
There are different types of restores. Some of them are Veeam-specific and some are more general.
The first thing that you want to do is an instant recovery of your virtual machine (VM). The reason is that sometimes you’ll see Windows updates that cause a machine not to boot. So you do want to instant recover that into your hypervisor to verify that your machine does boot, at least as far as the CTRL-ALT-DEL screen.
Another suggestion is to make sure you do not name the restored virtual machine the same as your production VM. The reason for this is because it will overwrite the data that is in your data store, or at least overwrite the name. So you always want to name it something like machine name underscore restore or something like that. That way you know that it is a restored machine.
Restore Entire VM
You also want to try to restore your entire VM. The difference between this and an instant recovery is that ‘restore entire VM’ is meant for servers that have lower SLAs, because the RTO, or recovery time objective, is longer. The process is slower because it actually restores the whole VM to your data store and/or your hypervisor, so these are for less-critical VMs. Maybe you have a server that is an RDS server, something less critical that does not need an RTO of 15 minutes or less. Be sure to leave the networking disabled anytime you do these instant recoveries or restore entire VMs, because otherwise the restored copy will create an IP conflict on the network.
The last option is to do a simple file-level restore that verifies your chain. If the data comes out, and you can see the directory structure, more than likely that chain is good. It doesn’t take the software into account because you’re not actually booting that machine. Something as simple as a file-level restore will do: maybe restore a document off your desktop, an Excel document or a Word document, back to a location on your computer to verify that the data does restore properly.
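One way to go a step beyond "the file opened" is to compare a checksum of the restored file against a copy you know is good. The sketch below is a generic illustration using Python’s standard library, not a Veeam feature; the function names are invented for this example, and it assumes the original file has not changed since the backup was taken.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original_path, restored_path):
    """True when the restored file is byte-for-byte identical to the original."""
    return sha256_of(original_path) == sha256_of(restored_path)
```

If the original has been edited since the backup ran, a mismatch is expected, so this check is most useful on files that rarely change.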
Let’s talk a little bit about Veeam SureBackup. This is a feature of Veeam Backup & Replication. Basically, it is a lab. You create a lab in your VMware environment for this to run, and it runs just like another job. It takes that instant VM recovery process and mounts a virtual machine, an application group, or a set of virtual machines in a private network that’s created, and allows you to test them. So it does mimic the DR test, but it’ll give you a couple of options. One, you have a lab environment available, so you can test things against real production data. It’s exact copies of what you have running in production, so if you wanted to test patches or software upgrades or anything like that, you can do that using this tool.
It also verifies you can recover that entire server from the backup. Like Steven said, sure, you can restore a file, but it doesn’t mean that your operating system will boot. This will give you the warm fuzzies about that virtual machine being able to boot from the backup file, meaning you can completely recover it. SureBackup can be scheduled to run on a daily basis, after a backup job completes, weekly, whatever you choose, and it’ll give you a ping, a heartbeat. You can run some application scripts against it, or test certain ports to see if they’re open or not. But one other neat thing is that you can schedule malware or anti-virus scans to run in this process. You can run scripts or anything that you need to do to go in and look for malware.
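To give a feel for what a port test does, here is a small standalone sketch in Python. It is not SureBackup’s own test script or API; it simply tries to open a TCP connection, which is the basic idea behind checking that a recovered server’s service is listening. The function name `port_is_open` is invented for this example.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_is_open("10.0.0.5", 1433)` would check whether something answers on the usual SQL Server port at a hypothetical lab address. An open port only shows that the service is listening, not that the application behind it is healthy.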
It’s pretty good to have because you’re actually looking in your backups. We’ve had some customers who have had ransomware that have not used this feature, but we’ve brought them up in a disconnected state where they were able to go in and run a malware scan. Giving you the capability to do that to your backups is pretty cool. This is solely a customer-side thing. This isn’t something that Global Data Vault can run on our side because it’s limited to the certain type of install that we have. And it would be pretty difficult for us to run this against 10,000 virtual machines on any regular schedule, but it is a good option for customers and end-users.
At Global Data Vault, we pride ourselves on the quality of data and our monitoring. But what exactly does that mean?
First of all, we need to stress that the existence of backups is not enough to ensure that you can successfully recover. You need continuous monitoring in place to verify that you don’t have any failed jobs and that backups are running properly. You also need to be sure your servers are patched and up-to-date. Normally, Global Data Vault sends out a status report every morning, and we provide a live quality portal for current data. This is the current status of your local and remote backups. Another thing I would suggest is to configure notification emails inside Veeam Backup & Replication, which will send job status reports and warnings directly from the Veeam console.
It is important to monitor these emails and make sure that they don’t go to spam, or you’re not sending them to a folder, because these provide critical information about the quality of your backups and whether you are able to restore.
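The point about missed emails can be sketched in a few lines of Python. This is a hypothetical illustration, not a Veeam or Global Data Vault tool: the job names, the status strings, and the function `jobs_needing_attention` are all assumptions made for the example. The idea is to flag any job that did not succeed and also any job that never reported at all.

```python
def jobs_needing_attention(expected_jobs, results):
    """Flag jobs that did not succeed or did not report.

    expected_jobs: names of every job that should report each day.
    results: (job_name, status) pairs taken from today's status reports.
    """
    latest = {name: status.strip().lower() for name, status in results}
    flagged = {}
    for job in expected_jobs:
        status = latest.get(job)
        if status is None:
            # A silent job is as worrying as a failed one.
            flagged[job] = "no report received"
        elif status != "success":
            flagged[job] = status
    return flagged
```

Checking against a list of expected jobs is the design choice that matters here: a report that landed in spam looks the same as a report that was never sent, and both need a follow-up.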
We will be having a whole webinar focusing on Veeam ONE and what a cool tool it is, but from a backup perspective, it can’t test your backups. It can help you monitor them more thoroughly than Veeam Backup & Replication by itself. It gives you some additional reports, alerts, and notifications that Veeam Backup & Replication might not give you out of the box, or maybe you want more granularity. We’ll be sure to cover that in our Veeam ONE webinar, but it’s something else that you can use for monitoring.
At Global Data Vault, we do an annual hands-on DR test. What does that look like on our side and for our customers?
We spin up all the virtual machines that are business-critical to the customer’s environment, and we provide a VPN to allow the customer to go in and “kick the tires” during the test. You can ensure all of your servers are communicating, functioning, make sure any applications are working. We want to demonstrate that everything that you would do in production is basically duplicated during this test so that everything you expect to function is functioning. The last thing that anybody wants to do is go into a real DR event and find out that their accounting software doesn’t work. So, during these annual hands-on tests, please test everything that you would use during your normal business day to verify that everything is working properly.
Also, we suggest that you make no changes prior to the test. This is important because if you make changes to the environment, or even patch servers, the week before the test, those copy to Global Data Vault. You might discover that patch actually broke some software in your environment, so we ask that you limit or prevent changes. I would suggest a change restriction a week prior to the test. During the test, document any issues that you have. GDV will assist with issues related to our services.
Maybe a server had eight gigs of RAM, and we requested it to have 16. Sometimes we run into an issue where the CPUs are under-provisioned. So we have to bump the CPUs for this server. That’s important to follow up with because maybe you want to make these changes in the production environment as well to increase performance on your side. The next thing that we would suggest is always following best practices.
One of the things that we will suggest is that you have a BCDR (business continuity/disaster recovery) plan in place, and also any BIAs (business impact analyses) that need to be done on which critical servers need to be brought up. What is critical to your business to function, who is impacted, who would need access? All this stuff needs to be documented prior to the actual DR test.
We also recommend regular testing of your local backups. I’ve mentioned that what you do locally is sent to the Global Data Vault side, so if your local backup is not working, that will affect the copy to GDV. So you need to test your local backups, do those local recoveries, make sure the backup boots into Windows, do the file-level restore test. During our first slide, we put ‘test’ two times. We can’t stress that enough–test your local backups to make sure that they are functioning properly.
Preparing for a DR Test
Yeah, that’s important stuff, and we always want to follow the ITIL framework for people, processes, and technology. Make sure that you have the people available that need to assist with the testing and your processes documented. It’s okay if you stumble during your test. That’s what the test is for. If you find a process that doesn’t work or that needs to be updated, or, you know, something has changed, that’s the appropriate time to find it, not when you’re in the middle of a crisis. We have a webinar about how to create a DR Plan and in it, we talk about the business impact analysis and other things that Steven mentioned. We also have a DR Plan Checklist available for download. But just remember when you have this test, treat it seriously. It could be the only time during that year to actually do a full-blown test and get everybody involved. It’s pretty important.
Tips and Best Practices
All right, let’s cover tips and best practices for testing your backups.
15-20 years ago, testing backups was extremely difficult. You had to have hardware that was similar to, if not an exact match for, what you were restoring. We’d do bare-metal testing. It took a really long time and we were using tapes, so if the tape failed, you’d have to start all over. If the restore failed, you had to start all over. And this wasn’t like a five-minute process. You could be 6, 7, 8 hours into a restore and it would fail and you’d have to start over. Those days are gone now, for the most part, and with virtualization, it is easy to test your backups because you don’t have to rely on any specific type of hardware.
You don’t have any location or geographic-specific things preventing you from doing the tests. It’s virtualization, so you can literally test them wherever you have hypervisors and storage.
Don’t get complacent
Complacency is what will get you thinking that everything’s okay. You haven’t had any errors. You don’t have time to test backups. Nothing’s going to happen to you. In the past two years, we have seen an escalating number of ransomware attacks. We’ve had a lot of natural disasters–a lot more than we normally see–and we don’t think those things are going to quiet down.
It’s really important to not be complacent on backups, much like cybersecurity. Be vigilant with your testing, with your DR testing, and stay on top of it. Steven also mentioned that patching is critical.
The flip side of that is when you get into a DR scenario. If you haven’t patched and rebooted your servers recently, whenever you do a restore or an instant VM recovery the servers will say “oh, I have Windows updates that I still have to apply” and it can take an hour or more for them to sit there and install all of those updates. That’s another reason to make sure that your Windows servers are patched frequently and rebooted. We can’t do anything to speed up the recovery of a server finishing the installation of Windows updates.
Provisioning Backup Jobs
Keep your job size within a reasonable level. Best practice is to keep a job under 10 terabytes, counting both the VBK and the VIB files, so the entire chain. You also want to make sure that you limit your number of VMs per job. We recommend that it be around 30 VMs per job.
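These two guidelines are easy to check with a short script. The sketch below is a hypothetical helper in Python, not part of Veeam: the per-VM sizes would have to come from your own reporting, and the function name `job_warnings` is invented for this example. The thresholds are the ones quoted above.

```python
MAX_JOB_TB = 10        # keep the whole chain (VBK plus VIBs) under this size
MAX_VMS_PER_JOB = 30   # recommended number of VMs per job


def job_warnings(vm_backup_sizes_tb):
    """Check one job against the sizing guidelines.

    vm_backup_sizes_tb: mapping of VM name -> backup size on disk in TB,
    counting the full backup and its incrementals.
    """
    warnings = []
    total_tb = sum(vm_backup_sizes_tb.values())
    if total_tb >= MAX_JOB_TB:
        warnings.append(f"job is {total_tb:.1f} TB; keep it under {MAX_JOB_TB} TB")
    if len(vm_backup_sizes_tb) > MAX_VMS_PER_JOB:
        warnings.append(
            f"job has {len(vm_backup_sizes_tb)} VMs; aim for about {MAX_VMS_PER_JOB}"
        )
    return warnings
```

An empty list means the job is within both guidelines. A job that trips either warning is a candidate for splitting into two jobs.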
Disk size of the local machine is also important. If you have a 30 or 40 TB disk, either as a VMDK or a VHDX, you *might* get 2:1 compression and deduplication, which may produce a 15 TB VBK (full backup file). Keep in mind, when Veeam mounts the data it has to mount the entire chain back to the last VBK, so RTOs increase.
When you do have those larger disks, if you need the 20 or 30 TB volume, I would suggest that you add the volume as smaller disks. That way you keep it under the 10-TB chain limit so that it doesn’t take ten minutes to mount. It is recommended to keep the size of the local machines under 10 TB. Those are best practices from Veeam.
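The arithmetic behind this advice can be sketched as follows. This is a rough planning aid in Python, not a Veeam calculation: the 2:1 ratio is the optimistic figure mentioned above, the function names are invented, and the sketch assumes the 10 TB guideline is applied to the provisioned size of each disk.

```python
def estimated_full_tb(disk_tb, reduction_ratio=2.0):
    """Estimate the size of the full backup file for a disk of the given size."""
    return disk_tb / reduction_ratio


def split_volume(volume_tb, limit_tb=10.0):
    """Return (disk_count, size_of_each_disk_tb) so every disk stays under the limit."""
    # One more disk than the number of whole limits that fit keeps each disk
    # strictly below the limit, even when the volume is an exact multiple.
    count = int(volume_tb // limit_tb) + 1
    return count, volume_tb / count
```

With these assumptions, a 30 TB disk gives an estimated 15 TB full backup, and a 30 TB volume becomes four disks of 7.5 TB each.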
At Global Data Vault, we’re all about backups, but a backup is only as good as your ability to restore from it. We’ve long preached that you can’t rely solely on a successful status email.
So make sure that you’re verifying that you can actually recover anything and everything that you need, in the way that you need it, from your backup files, whether it’s the entire virtual machine, application items, a database, a spreadsheet, et cetera. Make sure you’re monitoring your backups because you still want to know if you’re having errors or if they’ve completed. Obviously, it’s one thing to get a success email for a job that turns out to have issues.
It’s something completely different to get a failure and then to have multiple failures and never do anything about it. Like Steven mentioned, automatically sending alerts to a folder, or not paying attention to monitors of your backups can turn around to bite you. So test, test, and test.
I think that’s what we wanted to cover today–how to test your backups and why.
Global Data Vault is a fully managed backup as a service, DR as a service, and O365 backup shop. We support VMware, Hyper-V, Acropolis, and physical devices, servers, and endpoints. And we really focus on quality.
Will testing SureBackup affect network latency?
No, SureBackup tests should not affect your network latency.
Yeah. It’ll affect your host resources. It does have to use the CPU and memory and things like that on one of your hypervisor hosts. So it can impact that, and just be careful, you know, where you spin up those tests.
How long can we test for?
You can test for roughly a week. We want to give you sufficient time to test everything that you can test while we remain compliant with service provider licensing. We normally leave the DR test up for one week.
Is there a specific set of data that you test from?
That’s a good question. Yes, we test from the customer’s EDP or Enhanced Data Protection. There are two benefits from that. First of all, we test the customer’s Enhanced Data Protection to verify functionality, and then we also get a disaster recovery test.
Does the customer continue to back up daily while DR testing?
If we’re doing it from Enhanced Data Protection, we don’t touch your backup repository. If you don’t have Enhanced Data Protection, we have to build it from the data in that repository, which locks files and means that your backups can’t merge. You can still continue sending your backups, but they’re not going to merge which could create some issues down the road.
How do you get EDP?
The operations team will get that enabled for you, but for anybody who does not have EDP, we would highly suggest that you add it because of the increased amount of ransomware activity we have seen. Reach out to email@example.com to get that added.
How is Veeam ONE licensed?
Veeam ONE is licensed per instance, and an instance is just a virtual machine. So if you have a hundred VMs in your environment, you need a hundred instances of Veeam ONE. Pretty simple.