Live Recovery of VMware Virtual Machines

The Live Recovery feature enables virtual machines (VMs) to be recovered and powered on from a backup without waiting for a full restore of the VM. This feature can be used to recover a VM that has failed and needs to be placed back in production quickly, or to validate that a backup can be used in a disaster recovery scenario.

Prerequisites

  • To verify the version of VMware software that is required to support this feature, see System Requirements. VMware licensing must include a license for vMotion operations.

  • The ESX server used to mount the NFS datastore for the browse and restore operations must be able to resolve the host name of the MediaAgent (which uses 3DFS components to perform the live recovery). To ensure connectivity, create a hosts file entry for the MediaAgent on the ESX server.

  • The Live Recovery feature uses a 3DFS cache on the MediaAgent that performs the live recovery. By default, the 3DFS cache is located in the Job Results folder for the MediaAgent, but you can change the path using the s3dfsRootDir additional setting. The 3DFS cache is circular: unused data is pruned from the cache as needed. By default, 5% free space is maintained on the cache, but you can change the required percentage of free space using the n3dfsCacheMinFree additional setting.

    For each live recovery job, the 3DFS cache requires minimum free space equal to the larger of the following values:

    • 20 GB

    • 15% of the total VM size (the sum of the sizes of all VMDKs for the VM)

      Note

      For faster recovery times, host the 3DFS cache on a solid state drive (SSD).

  • The user performing the live recovery operation must be an owner of the virtualization client and VSA proxy used for the operation. For more information, see Ownership and Capabilities Needed for Virtual Machine Recovery.

  • The vCenter user account must have permissions set as described in Permissions for Custom User Accounts.
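
The cache sizing rule above (the larger of 20 GB or 15% of the total VM size) can be checked with a short calculation. The following is a minimal sketch for planning purposes only, not part of the product; the function name is hypothetical, and it assumes that "GB" means 1024^3 bytes.

```python
GIB = 1024 ** 3  # assumption: the documented "GB" is treated as 2**30 bytes


def min_cache_free_bytes(vmdk_sizes_bytes):
    """Return the minimum free 3DFS cache space for one live recovery job.

    The result is the larger of 20 GB and 15% of the total VM size,
    where the total VM size is the sum of all VMDK sizes for the VM.
    """
    total_vm_size = sum(vmdk_sizes_bytes)
    fifteen_percent = total_vm_size * 15 // 100  # integer math avoids float rounding
    return max(20 * GIB, fifteen_percent)


# A VM with two 50 GB disks: 15% is 15 GB, so the 20 GB floor applies.
small_vm = min_cache_free_bytes([50 * GIB, 50 * GIB])

# A VM with one 200 GB disk: 15% is 30 GB, which exceeds the floor.
large_vm = min_cache_free_bytes([200 * GIB])
```

The requirement applies per job, so a MediaAgent that runs several live recovery jobs at once needs the sum of the per-job values.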

Considerations

  • Live VM recovery is supported for recovery from the following types of backups:

    • Streaming backups and backup copies that use magnetic disk libraries

    • IntelliSnap backups that use either NetApp snapshot engines or one of the non-NetApp engines that are listed in IntelliSnap Support for Live VM Recovery.

  • Live VM recovery is not supported in the following cases:

    • Backups to tape libraries

    • Archived VMs

    • Multiple VM restores in the same job

    • Simultaneous live recovery and live browse operations for the same virtual machine

  • The operating system of the MediaAgent used for live recovery does not need to match the operating system of the guest VM. For example, a Windows MediaAgent can be used to recover a Linux VM.

  • For block-level restores, in addition to the restore job, the Job Controller launches a persistent recovery job that opens a common pipeline, enabling multiple extent recall requests to be submitted as a group. The default timeout for a persistent recovery job is 7 days. For block-level restores using the Virtual Server Agent, the persistent recovery job remains open for that period and can be reused for subsequent block-level restores that use the same proxy.
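
The unsupported cases listed above lend themselves to a pre-flight check before a job is submitted. The sketch below is illustrative only: the class, field names, and the "disk"/"tape" labels are hypothetical and do not correspond to any product API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LiveRecoveryRequest:
    """Hypothetical description of a planned live recovery job."""
    vm_names: List[str]
    library_type: str          # assumed labels: "disk" or "tape"
    vm_is_archived: bool = False
    live_browse_active: bool = False


def unsupported_reasons(req: LiveRecoveryRequest) -> List[str]:
    """Return the documented reasons why the request is not supported."""
    reasons = []
    if req.library_type == "tape":
        reasons.append("backup is on a tape library")
    if req.vm_is_archived:
        reasons.append("VM is archived")
    if len(req.vm_names) > 1:
        reasons.append("multiple VMs in the same job")
    if req.live_browse_active:
        reasons.append("live browse is running for the same VM")
    return reasons


ok = unsupported_reasons(LiveRecoveryRequest(["web01"], "disk"))
bad = unsupported_reasons(LiveRecoveryRequest(["web01", "db01"], "tape"))
```

An empty list means that none of the listed restrictions apply; it does not confirm that the prerequisites are met.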

Recovery Process

Data is restored from the backup as needed to enable the operations requested by the VM, and the full restore completes as resources allow. The backup is not modified by the restore process.

A live recovery proceeds as follows:

  1. When this option is selected for a restore, the restore operation can use the MediaAgent that was used to perform the backup.

  2. Rather than reading the full backup, the restore process exposes the backup to the destination ESX server as a network file system (NFS) export.

  3. The NFS export is mounted to the destination ESX server as an NFS datastore.

  4. When the NFS datastore is visible to the ESX server, the restore process retrieves the .vmx and catalog files for the VM.

    The .vmx file is modified to indicate that writes can be made to the VMDK files on the NFS datastore (or the VM can be modified to redirect writes to an alternate datastore).

  5. When the VM files are available on the NFS datastore, the VM is registered and can be powered on.

  6. Any reads for the virtual machine disks are handled by the MediaAgent, which restores the requested data to the NFS cache and presents it to the ESX server.

  7. After the initial reads needed to make the VM usable, a Storage vMotion is initiated to migrate the virtual machine to the destination datastore specified for the restore.

  8. When the migration is complete, the ESX server unexports the backup and unmounts the datastore (if there are no other paths exported to the ESX server). When the cleanup is done, the restore job is marked as complete.
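
The on-demand behavior in the steps above (reads served from the backup through a cache, writes kept separate so the backup is never modified, then a full migration) can be modeled in a few lines. This is a conceptual sketch under simplifying assumptions, not a description of how 3DFS is implemented; the class and method names are hypothetical, and disk contents are represented as a dictionary of block number to bytes.

```python
class OnDemandDisk:
    """Toy model of a virtual disk that is backed by a read-only backup."""

    def __init__(self, backup_blocks):
        self._backup = backup_blocks   # treated as read-only throughout
        self._cache = {}               # blocks fetched from the backup so far
        self._writes = {}              # blocks written by the running VM
        self.backup_reads = 0          # number of fetches from the backup

    def read(self, block_no):
        # Data written by the VM takes precedence over backup data.
        if block_no in self._writes:
            return self._writes[block_no]
        # Fetch from the backup only on the first access to a block.
        if block_no not in self._cache:
            self._cache[block_no] = self._backup[block_no]
            self.backup_reads += 1
        return self._cache[block_no]

    def write(self, block_no, data):
        # Writes never touch the backup.
        self._writes[block_no] = data

    def migrate(self):
        """Return the full disk contents, as the final migration would copy them."""
        return {n: self.read(n) for n in self._backup}


backup = {0: b"boot", 1: b"data"}
disk = OnDemandDisk(backup)
disk.read(0)                 # first access fetches block 0 from the backup
disk.read(0)                 # second access is served from the cache
disk.write(1, b"new")        # the VM changes block 1 while it is running
migrated = disk.migrate()    # the destination receives the current contents
```

In this model the VM is usable after the first read, and the backup dictionary is unchanged after the migration, which mirrors the statement that the restore process does not modify the backup.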
