
Backup of a Replica VM


This blog post covers the scenarios and motivations that drive the backup of a Replica VM, and offers product guidance for administrators.

Why back up a Replica VM?

Ever since the advent of Hyper-V Replica in Windows Server 2012, customers have been interested in backing up the Replica VM. Traditionally, IT administrators have taken backups of the VM that contains the running workload (the primary VM) and backup products have been built to cater to this need. So when a significant proportion of customers talked about the backup of Replica VMs, we were intrigued. There are a few key scenarios where backup of a Replica VM becomes useful:

  1. Reduce the impact of backup on the running workload:   Taking a backup of a VM involves creating a snapshot/diff-disk to baseline the changes that need to be backed up. For the duration of the backup job, the workload runs on a diff-disk, and this has an impact on the system. By offloading the backup to the Replica site, the running workload is no longer impacted by the backup operation. Of course, this applies only to deployments where the backup copy is stored on the remote site. For example, the daily backup operation might store the data locally for quicker restore times, but monthly or quarterly backups for long-term retention that are stored remotely can be taken from the Replica VM.
  2. Limited bandwidth between sites:   This is typical of Branch Office-Home Office (BO-HO) deployments, where there are multiple smaller remote branch office sites and a larger central Home Office site. The backup data for the branch offices is stored in the home office, and administrators provision an appropriate amount of bandwidth to transfer the backup data between the two sites. The introduction of disaster recovery using Hyper-V Replica creates another stream of network traffic, and administrators have to re-evaluate their network infrastructure. In most cases, administrators either could not or would not increase the bandwidth between sites to accommodate both backup and DR traffic. However, they did come to realize that backup and DR were independently sending copies of the same data over the network, and that this was an area that could be optimized. With Hyper-V Replica creating a VM in the Home Office site, administrators could save on the network transfer by backing up the Replica VM locally rather than backing up the primary VM and sending the data over the network.
  3. Backup of all VMs in the Hoster datacenter:   Some customers use the Hoster datacenter as the Replica site, with the intention of not building a secondary datacenter of their own. Hosters have SLAs around the protection of all customer VMs in their datacenters, typically a once-a-day backup. Thus the backup of Replica VMs becomes a requirement for the success of their business.

Thus various customer segments found that the backup of a Replica VM has value for their specific scenarios.

Data consistency

A key aspect of the backup operation is related to the consistency of the backed-up data. Customers have a clear prioritization and preference when it comes to data consistency of backed up VMs:

  1. Application-consistent backup
  2. Crash-consistent backup

And this prioritization applied to Replica VMs as well. Conversations with customers indicated that they were comfortable with crash-consistency for a Replica VM, if application-consistency was not possible. Of course, anything less than crash-consistency was not acceptable and customers preferred that backups fail rather than have inconsistent data getting backed up.

Attempting application-consistency

Typical backup products try to ensure application-consistency of the data being backed up (using the VSS framework), and this works well when the VM is running. However, the Replica VM is always turned off until a failover is initiated, so VSS is unable to guarantee an application-consistent backup for a Replica VM. Thus getting an application-consistent backup of a Replica VM is not possible.

Guaranteeing crash-consistency

To ensure that customers backing up Replica VMs always get crash-consistent data, a set of changes was introduced in Windows Server 2012 R2 that fails the backup operation if consistency cannot be guaranteed. The virtual disk could be inconsistent when any one of the conditions below is encountered, and in these cases the backup is expected to fail.

  1. HRL logs are being applied to the Replica VM
  2. Previous HRL log apply operation was cancelled or interrupted
  3. Previous HRL log apply operation failed
  4. Replica VM health is Critical
  5. VM is in the Resynchronization Required state or the Resynchronization in progress state
  6. Migration of Replica VM is in progress
  7. Initial replication is in progress (between the primary site and secondary site)
  8. Failover is in progress
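
The eight conditions above amount to a simple gate: if any one of them holds, the backup should fail. The sketch below shows that logic only. `ReplicaState` and its field names are hypothetical and invented for illustration; they do not correspond to any Hyper-V or DPM API.

```python
from dataclasses import dataclass


@dataclass
class ReplicaState:
    """Hypothetical snapshot of a Replica VM's state (field names are illustrative)."""
    log_apply_in_progress: bool = False            # condition 1
    last_log_apply_interrupted: bool = False       # condition 2
    last_log_apply_failed: bool = False            # condition 3
    health_critical: bool = False                  # condition 4
    resync_required_or_in_progress: bool = False   # condition 5
    migration_in_progress: bool = False            # condition 6
    initial_replication_in_progress: bool = False  # condition 7
    failover_in_progress: bool = False             # condition 8


def blocking_conditions(state: ReplicaState) -> list:
    """Return the names of all conditions that currently hold, in list order."""
    return [name for name, is_set in vars(state).items() if is_set]


def backup_allowed(state: ReplicaState) -> bool:
    """A backup may proceed only when none of the eight conditions holds."""
    return not blocking_conditions(state)


if __name__ == "__main__":
    busy = ReplicaState(log_apply_in_progress=True)
    print(backup_allowed(busy), blocking_conditions(busy))
```

Returning the list of blocking conditions, rather than a bare boolean, lets a caller log why a particular backup attempt was refused.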

Dealing with failures

These are largely treated as transient error states, and the backup product is expected to retry the backup operation based on its own retry policies. With 30-second replication and apply being supported in Windows Server 2012 R2, the backup operation is expected to collide with HRL log apply more frequently, resulting in error scenario 1 mentioned above. A robust retry mechanism is needed to ensure a high backup success rate. If the backup product is unable to retry or cope with failures, an option is to explicitly pause the replication before the backup is scheduled to run.
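
The two options above, retrying on transient failure or pausing replication around the backup, can be sketched as follows. This is a minimal illustration, not how DPM or any backup product implements its retry policy. `TransientBackupError`, `run_backup`, `pause`, and `resume` are hypothetical names supplied by the caller, and the attempt count and delays are arbitrary assumptions.

```python
import time


class TransientBackupError(Exception):
    """Hypothetical error raised when a backup hits one of the transient states."""


def backup_with_retry(run_backup, max_attempts=5, initial_delay_s=30.0,
                      backoff=2.0, sleep=time.sleep):
    """Call run_backup(), retrying with exponential backoff on transient errors."""
    delay = initial_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            return run_backup()
        except TransientBackupError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            sleep(delay)
            delay *= backoff


def backup_with_replication_paused(pause, resume, run_backup):
    """Alternative: pause replication, back up, and always resume afterwards."""
    pause()
    try:
        return run_backup()
    finally:
        resume()  # runs even if the backup raises, so replication is not left paused
```

The `sleep` parameter is injectable so that the retry schedule can be exercised in tests without actually waiting.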

 

Backing up Replica VMs using DPM

The backup of Replica VMs on Windows Server 2012 R2 hosts using Data Protection Manager 2012 R2 is now supported: "Backup of primary virtual machines is supported. Backup of replica (secondary) virtual machines is supported on Hyper-V servers running Windows Server 2012 R2". The matrix below gives the clearest picture of which deployment configurations are supported and which are not:

| DPM version | Replica (secondary/tertiary) host: Windows Server 2012 | Replica (secondary/tertiary) host: Windows Server 2012 R2 | Primary host: Windows Server 2012 | Primary host: Windows Server 2012 R2 |
|---|---|---|---|---|
| DPM 2012 | Not supported | Not supported | Supported | Not supported |
| DPM 2012 R2 | Not supported | Supported | Supported | Supported |
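
For scripted pre-flight checks, the support matrix above can be encoded as a lookup table. The sketch below transcribes the matrix as it appears in this post. The dictionary layout and the `is_supported` helper are illustrative assumptions, not part of DPM.

```python
# (DPM version, VM role, host OS) -> supported?  Values transcribed from the matrix above.
SUPPORT_MATRIX = {
    ("DPM 2012", "replica", "Windows Server 2012"): False,
    ("DPM 2012", "replica", "Windows Server 2012 R2"): False,
    ("DPM 2012", "primary", "Windows Server 2012"): True,
    ("DPM 2012", "primary", "Windows Server 2012 R2"): False,
    ("DPM 2012 R2", "replica", "Windows Server 2012"): False,
    ("DPM 2012 R2", "replica", "Windows Server 2012 R2"): True,
    ("DPM 2012 R2", "primary", "Windows Server 2012"): True,
    ("DPM 2012 R2", "primary", "Windows Server 2012 R2"): True,
}


def is_supported(dpm_version: str, vm_role: str, host_os: str) -> bool:
    """Look up a configuration; raises KeyError for combinations not in the matrix."""
    return SUPPORT_MATRIX[(dpm_version, vm_role, host_os)]


if __name__ == "__main__":
    print(is_supported("DPM 2012 R2", "replica", "Windows Server 2012 R2"))
```

Raising `KeyError` for unknown combinations is deliberate: a configuration outside the matrix should not silently be treated as supported or unsupported.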

 

DPM 2012 R2 along with Windows Server 2012 R2 provides the right experience for the backup of Replica VMs. It includes the changes to the platform that ensure crash-consistency of backups, and the appropriate retry mechanism to ensure a high success rate. This has been validated with internal tests involving two servers with different VM mixes that are backed up by a DPM server (below).

| | Server 1 (Total 44 VMs) | Server 2 (Total 36 VMs) | DPM Server |
|---|---|---|---|
| Primary VMs | 10 | 12 | Not applicable |
| Replica VMs | 4 | 20 | Not applicable |
| Non-replicating VMs | 30 | 4 | Not applicable |
| Host OS | Windows Server 2012 R2 | Windows Server 2012 R2 | Windows Server 2012 R2 |
| Host RAM | 144 GB | 144 GB | 72 GB |
| Network bandwidth available for replication | 10 Gbps | 10 Gbps | 1 Gbps |
| Storage subsystem | 6 TB FC SAN storage | 2.5 TB FC SAN storage | 6 TB Direct attached storage |
| Number of VHDs | 2-4 VHDs based on workload | 2-4 VHDs based on workload | Not applicable |
| VM Workload | Mix of SQL, Exchange, and IOMeter | Mix of SQL, Exchange, and IOMeter | Not applicable |

 

| Test parameter | Value |
|---|---|
| Total test duration | 48 hours |
| Backup frequency | Every 3 hours |
| Number of backup points | 16 backup points expected per VM at the end of 48 hours |
| Number of VMs backed up by the DPM server | Total: 80 VMs (Server 1: 44 VMs, Server 2: 36 VMs) |
| Replication frequency | 30s for all virtual machines |
| Recovery History of Replica VM | Variable (0-5 recovery points) |

 

We measured the actual number of backup points created per VM in DPM and compared it with our expectation of 16 backup points per VM for the 48-hour test duration. The results of the test were extremely positive: 1,260 of the 1,280 expected backup points were created (about 98%), and the only shortfall was among the Replica VMs on Server 2.

 

 

| | Server 1 (Total 44 VMs): Number of VMs (A) | Server 1: Expected number of backup points (A × 16) | Server 1: Actual number of backup points | Server 2 (Total 36 VMs): Number of VMs (B) | Server 2: Expected number of backup points (B × 16) | Server 2: Actual number of backup points |
|---|---|---|---|---|---|---|
| Primary VMs | 10 | 160 | 160 (100%) | 12 | 192 | 192 (100%) |
| Replica VMs | 4 | 64 | 64 (100%) | 20 | 320 | 300 (94%) |
| Non-replicating VMs | 30 | 480 | 480 (100%) | 4 | 64 | 64 (100%) |
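
The arithmetic behind the results above is straightforward: one backup point every 3 hours over 48 hours gives 16 expected points per VM, and each success rate is actual points divided by expected points. The short sketch below recomputes the figures from the numbers reported in this post. The variable and function names are illustrative.

```python
TEST_DURATION_H = 48
BACKUP_INTERVAL_H = 3
POINTS_PER_VM = TEST_DURATION_H // BACKUP_INTERVAL_H  # one backup point per interval

# (server, VM type, number of VMs, actual backup points), copied from the results above
RESULTS = [
    ("Server 1", "Primary VMs", 10, 160),
    ("Server 1", "Replica VMs", 4, 64),
    ("Server 1", "Non-replicating VMs", 30, 480),
    ("Server 2", "Primary VMs", 12, 192),
    ("Server 2", "Replica VMs", 20, 300),
    ("Server 2", "Non-replicating VMs", 4, 64),
]


def success_rate(vm_count, actual_points):
    """Fraction of the expected backup points that were actually created."""
    return actual_points / (vm_count * POINTS_PER_VM)


def overall_totals(results):
    """Return (actual, expected) backup points summed over all rows."""
    expected = sum(vms * POINTS_PER_VM for _, _, vms, _ in results)
    actual = sum(points for _, _, _, points in results)
    return actual, expected


if __name__ == "__main__":
    for server, vm_type, vms, actual in RESULTS:
        print(f"{server} {vm_type}: {success_rate(vms, actual):.1%}")
    actual, expected = overall_totals(RESULTS)
    print(f"Overall: {actual}/{expected} = {actual / expected:.1%}")
```

With these figures the Replica VMs on Server 2 come out at 300 of 320 points (93.75%, reported as 94%), and the overall total is 1,260 of 1,280.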

 

Takeaways

  1. Backup of Replica VMs is now a supported scenario.
  2. Only crash-consistent backup of a Replica VM is guaranteed.
  3. A robust retry mechanism needs to be configured in the backup product to deal with failures. Alternatively, ensure that replication is paused when the backup is scheduled to run.
