Recently one of our DBAs mentioned periodic interruptions in network connectivity that occurred while Veeam backup jobs were running. Our SQL Servers are all in High Availability (HA) AlwaysOn Availability Groups (AAG). There are at least two servers in each AAG, and all of them experienced the issue over time. So what’s the connection between a running SQL Server and a Veeam backup job?
Let’s review. The SQL 2012 AAG is really SQL 2008 Database Mirroring on steroids. A set of SQL servers with identical configurations is arranged in a primary/secondary relationship built on mirroring: each secondary has to successfully process the new transactions sent to it by the server acting as the primary. There can be multiple secondary servers and they operate in a synchronous update mode.
The servers have an Operating System vDisk, a vDisk for Application support and binary files, and vDisks for Database, Transaction Log, and TempDB at minimum.
A Veeam job will recognize a SQL server and perform a backup that includes all of the databases. It will flush the transaction log to create a transactionally quiet point. Some DBAs are OK with Veeam doing this; others want total control and rely on SQL jobs to take backups. The size of the vDisks also comes into play: the larger the database, the larger the repository requirement. In this instance, the decision was made to use Veeam to back up the operating system disks (C: and D:), and to use SQL Server database backup functionality to take backups and store them on a file share. A 3rd party backup-to-tape solution sends the SQL database backup files to tape.
Getting back on track, why the outage? Let’s examine how a Veeam backup works.
1. Veeam communicates with vCenter to get an inventory of the running state of the VMs in the job and compares it to the Veeam job settings to determine the operations to carry out. Veeam backs up each of the VMs sequentially.
2. Veeam asks VMware to take a snapshot. A snapshot uses Microsoft VSS Writers to pause the operations on the files in use so VMware can mark the vDisks as READ-ONLY. VMware then creates snapshot files for all vDisks that comprise the VM. In the case of SQL, this includes the vDisks for Data, T-Log and TempDB. You can see what this looks like in VMware when you browse the datastore in vCenter. You will see all of the files that comprise the VM in its folder. Create a snapshot manually and look again in the browse window. You now see a lot more files in the folder. Notice that there is a group of files with a -000001- identifier. These are snapshot files. VMware writes all changes to these files while writes to the base .vmdk files are frozen.
3. The snapshot files are not used by Veeam. Veeam asks VMware to perform a hot add of the VM's disks to the Backup Proxy and uses vPower (proprietary Linux technology) to copy the vDisks specified in the backup job to the Backup Repository. There is no traffic over the network; the copy is done strictly through VMware and the SAN.
4. When the copy is complete, Veeam asks VMware to remove the snapshot. This can take a while, as we’ve seen with some of the highly transactional servers in the environment. It can become a nightmare in which the snapshot files keep growing while VMware is trying to merge the changes into the base files. Regardless, the last part of the process, where the last bits are merged into the vDisk, pauses the VM. This is the “stun” operation. The vDisks are then READ-WRITE again. The more disk space involved and the more active the VM, the larger the snapshot grows and the longer the removal takes.
5. Veeam asks VMware to reconfigure the proxy server to dismount the VM.
6. The backup is complete.
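The snapshot behavior in steps 2 and 4 can be sketched with a toy model. This is only an illustration and not VMware or Veeam code: the class name ToyVirtualDisk and its methods are hypothetical, and a real snapshot works on disk blocks and files rather than on Python dictionaries. It shows writes being redirected to a delta while the base stays frozen, and the merge that happens when the snapshot is removed.

```python
class ToyVirtualDisk:
    """Toy model of one virtual disk with at most one snapshot (hypothetical, not VMware code)."""

    def __init__(self, blocks):
        self.base = dict(blocks)  # stands in for the base .vmdk
        self.delta = None         # stands in for the -000001 snapshot file

    def create_snapshot(self):
        if self.delta is not None:
            raise RuntimeError("snapshot already exists")
        self.delta = {}           # from here on the base is treated as read-only

    def write(self, block_id, data):
        if self.delta is not None:
            self.delta[block_id] = data  # changes go to the snapshot file
        else:
            self.base[block_id] = data

    def read(self, block_id):
        # The newest copy of a block wins: check the delta first, then the base.
        if self.delta is not None and block_id in self.delta:
            return self.delta[block_id]
        return self.base[block_id]

    def remove_snapshot(self):
        """Merge the delta into the base; return how many blocks had to be merged."""
        if self.delta is None:
            raise RuntimeError("no snapshot to remove")
        merged = len(self.delta)
        self.base.update(self.delta)
        self.delta = None         # the base is read-write again
        return merged


disk = ToyVirtualDisk({0: "a", 1: "b"})
disk.create_snapshot()
frozen_copy = dict(disk.base)   # what the backup copies: the frozen base
disk.write(1, "B")              # the guest keeps writing during the backup
disk.write(2, "c")
merged_blocks = disk.remove_snapshot()
print(frozen_copy, disk.base, merged_blocks)
```

The busier the guest is while the snapshot exists, the more blocks pile up in the delta and the more work remove_snapshot has to do, which is the effect described in step 4.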
Keep in mind that a snapshot includes all files that comprise a virtual machine. The larger the file, the more highly transactional the server, the longer the backup takes. The larger the logfiles become, the longer it takes to merge them when the snapshot is removed. In that last step of merging the last bit of changed blocks, the network is interrupted. This causes the problem my DBA was complaining about.
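To put rough numbers on this, here is a back-of-the-envelope sketch. All figures (disk sizes, 200 MB/s backup throughput, guest write rates) are made-up assumptions for illustration, not measurements from our environment, and the model treats every write as landing on a new block, so it gives an upper bound on snapshot growth.

```python
def backup_seconds(disk_gb, throughput_mb_s):
    """Time to copy disk_gb of data at a steady throughput (MB/s)."""
    return disk_gb * 1024 / throughput_mb_s


def delta_growth_gb(write_rate_mb_s, seconds):
    """Upper bound on snapshot growth: assumes every write hits a new block."""
    return write_rate_mb_s * seconds / 1024


# Hypothetical VM: 100 GB of OS/application disks, 2000 GB including SQL data disks.
os_only_time = backup_seconds(100, 200)       # only C: and D: are in the snapshot
all_disks_time = backup_seconds(2000, 200)    # data, log and TempDB included too

# Assumed guest write rates while the snapshot is open:
# 5 MB/s against the OS disks, 20 MB/s against the database disks.
os_only_delta = delta_growth_gb(5, os_only_time)
all_disks_delta = delta_growth_gb(20, all_disks_time)

print(f"OS disks only: {os_only_time:.0f} s backup, up to {os_only_delta:.1f} GB to merge")
print(f"All disks:     {all_disks_time:.0f} s backup, up to {all_disks_delta:.1f} GB to merge")
```

Under these assumptions the snapshot stays open twenty times longer and leaves far more data to merge when the database disks are included.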
If you only snapshot the OS vDisks, then writes to the other data files are not affected. You accomplish this by making the data vDisks Independent disks. Independent disks are not included in a snapshot, and they are not included in a Veeam backup. As a rule, only Disk 1 & 2 (C: & D:) are included in a backup, so it’s a quick process and there is no network interruption in the final merge.
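As a minimal sketch of that rule, the snippet below filters a VM's disks by disk mode. The Disk class, the labels and the helper disks_in_snapshot are hypothetical names used for illustration. The mode strings follow the vSphere disk mode names (persistent, independent_persistent, independent_nonpersistent), but nothing here talks to vCenter.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    label: str
    mode: str  # e.g. "persistent", "independent_persistent", "independent_nonpersistent"


def disks_in_snapshot(disks):
    """Return the labels of disks a snapshot would include (independent disks are skipped)."""
    return [d.label for d in disks if not d.mode.startswith("independent")]


# Hypothetical layout matching the SQL servers described above.
vm_disks = [
    Disk("Disk 1 (C: OS)", "persistent"),
    Disk("Disk 2 (D: Apps)", "persistent"),
    Disk("Disk 3 (Data)", "independent_persistent"),
    Disk("Disk 4 (T-Log)", "independent_persistent"),
    Disk("Disk 5 (TempDB)", "independent_persistent"),
]

snapshotted = disks_in_snapshot(vm_disks)
print(snapshotted)
```

Only the two OS and application disks remain in the list, so only they accumulate snapshot files that have to be merged later.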
This posting is provided “as is” with no warranties, guarantees or rights whatsoever.