Deduplication FAQ

Is deduplication supported for NAS NDMP backups?

Yes, deduplication is supported for NAS NDMP backups.

The following vendors support the best deduplication rates for NAS backups:

  • Dell EMC PowerStore, Dell EMC VNX / Celerra, and Dell EMC Unity, excluding VBB backups

  • Dell EMC Isilon/PowerScale

  • Hitachi NAS

  • Huawei

  • NetApp, excluding SMTape backups (backups from an image backup set)

For all other NAS backups, deduplication is supported but the savings might not be as high. To achieve a better deduplication rate, consider one of the following options:

  • Migrate to one of the vendor configurations that provides a better deduplication rate.

  • Move backups from the NDMP Agent to a Windows File System Agent using a CIFS share for subclient content.

  • Move backups to a UNIX File System Agent using a mounted file system for subclient content.

When moving backups from the NDMP Agent to a Windows File System Agent or UNIX File System Agent, first verify the data transfer rate over CIFS or NFS, because a lower transfer rate might result in slower backups. Also, the File System Agent configuration supports synthetic full backups and DASH copy. See Adding a CIFS or NFS Network Share Backup to a NAS Client.

Can I host the Deduplication Database on Laptops or Workstations?

No. Hosting the deduplication database on a laptop or workstation is not allowed when all of the following conditions are true:

  • The MediaAgent with the Windows File System Agent is installed on a laptop or workstation.

  • During installation, the Configure for Laptop or Desktop Backup check box was selected on the Policy Selection page of the Installer wizard.

How do I deduplicate existing non-deduplicated backup data?

If you have non-deduplicated backup data, or if your clients are associated with a non-deduplicated storage policy, and you want to deduplicate the existing data, create a secondary copy with deduplication enabled and run an Auxiliary Copy job on the secondary copy. See Creating a Storage Policy Copy with Deduplication for instructions.

If necessary, after all the data is copied to the new secondary copy, you can promote the secondary copy to be the primary copy so that subsequent backups are automatically deduplicated. See Setting Up the Storage Policy Copy to be the Primary Copy for instructions.

How do I disable deduplication?

You can enable or disable deduplication only during storage policy creation. After the storage policy is created, deduplication cannot be disabled on it.

However, you can use the following workaround to disable deduplication:

  • Disable deduplication on all the subclients associated with the storage policy copy.

  • Create a new storage policy without enabling Deduplication and re-associate the necessary subclients to that storage policy.

  • Create a secondary copy, run an Auxiliary Copy job on the secondary copy, and then promote the secondary copy to be the primary copy.

  • You can also disable deduplication temporarily from the DDB storage policy copy properties dialog box > Deduplication tab. On the Advanced tab, select the Temporarily disable deduplication check box. For more details, see Temporarily disable deduplication.

What happens if the snapshot creation fails during a DDB backup?

During a DDB backup using the DDB subclient, if the VSS (Windows) or LVM (Linux) snapshot creation fails, the DDB backup fails by default with the following error message:

For Windows:

Error Code: [19:857]

Description: The job has failed because the VSS snapshot could not be created

For Unix:

Error Code: [19:857]

Description: Snap not supported on the volumes holding DDB paths or we failed to take snapshot. Backup will fail, error=[Case specific error message]

To continue the DDB backup using the live volume when snapshot creation fails, contact Commvault Customer Support for assistance.

How should I schedule DDB Backups with respect to Data Aging jobs to minimize the time for reconstruction?

After the data aging job scheduled on the CommServe completes, physical pruning of data blocks on the disk library begins. The best time to run the deduplication database backup is when all of this physical pruning is complete, so that a reconstruction job that uses the deduplication database snapshot from the backup does not have to replay as many prune records. Physical pruning usually takes a few hours after the data aging job completes, so a good rule of thumb is to run the deduplication database backup at the midpoint between two data aging jobs.

For example, if data aging is scheduled to run every 6 hours, the recommended schedules are:

Data Aging: Every 6 hours starting at 3:00 AM.

Deduplication Database Backup: Every 6 hours starting at 6:00 AM.
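
A minimal sketch of how this midpoint offset can be computed (illustrative Python only, not a Commvault scheduling interface):

    from datetime import datetime, timedelta

    def ddb_backup_start(data_aging_start, interval_hours):
        """Offset the DDB backup start time by half the data aging interval (the midpoint)."""
        return data_aging_start + timedelta(hours=interval_hours / 2)

    # Example from above: Data Aging every 6 hours starting at 3:00 AM.
    aging_start = datetime(2024, 1, 1, 3, 0)
    print(ddb_backup_start(aging_start, 6).strftime("%I:%M %p"))  # 06:00 AM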

How should I schedule DDB Backups with respect to File System Backups using the same storage policy?

Schedule the DDB backup so that it runs when as few backups as possible are in progress. This ensures that the DDB backup finishes sooner.

What happens if one of the DDBs is offline?

If you have a single DDB configured and that DDB is offline, you can still continue client backups without deduplication. See How do I continue my client backups during DDB recovery? for more information.

If you have configured a partitioned DDB on the storage policy, then in the unlikely event that one of the partitions goes offline, you can use the available partitions to run backup jobs. For more information on enabling this behavior, see Modifying Deduplication Database Recovery Settings.

If a library or a mount path is offline, then you will not be able to restore the data until that library or the mount path is brought back online.

What do I do when the drive hosting the DDB is lost or the DDB files are missing?

If the drive (for example, E:\) hosting the DDB is lost or the DDB files are missing, then reconstruct the DDB by performing the steps described in Recovering Permanently Offline Deduplication Database Partitions.

How do I continue my client backups during DDB recovery?

When you have multiple client backups scheduled to a storage policy copy with deduplication, the backup jobs might remain in a Waiting state for a long period of time due to the following reason (displayed in the Job Controller):

Description: DeDuplication DB access path [D:\DDB01] on MediaAgent [mediaagent01] is offline for Storage Policy Copy [ SP01 / Primary ]. Offline Reason: The active deduplication store of current storage policy copy is not available to use. Please wait till the DeDuplication DB Reconstruction job reconstructs the partitions. Source: mm4, Process: Job Manager

You can avoid the Waiting state of the backup jobs by enabling the Allow backup jobs to run to deduplication storage policy copy when DDB is an unusable state option. This option allows you to continue client backups to the same deduplicated storage policy without deduplication. Once the DDB is recovered, client backups automatically continue with deduplication.

To configure the client backups to continue during DDB recovery:

  1. On the ribbon in the CommCell Console, click the Storage tab, and then click Media Management.

  2. In the Media Management Configuration dialog box, click the Resource Manager Configuration tab.

  3. In the Allow backup jobs to run to deduplication storage policy copy when DDB is an unusable state box, enter 1 to enable the option.

  4. Click OK.

    Once the DDB is recovered, the new client backups will automatically continue with deduplication.

    Note that during this process the data that was backed up without deduplication will remain as non-deduplicated data.

Why is the deduplication backup or DASH copy not utilizing the network bandwidth?

When jobs with deduplication are run, the data is read from the source in data blocks. A signature is generated for each data block using a hash algorithm. The signature is compared against a deduplication database (DDB) of existing signatures for data blocks already in the destination storage.

  • If the signature does not exist in the DDB, the data block is written to the destination storage and the new signature is recorded in the DDB.

  • If the signature already exists in the DDB, the DDB is updated to record an additional reference to the existing data block on the destination storage, and only the index information is written to the destination storage.

Since redundant data is not sent over the network, the amount of data transferred from the source to the destination computer is lower with deduplication. This results in lower bandwidth utilization for backups with deduplication or DASH copies compared to non-deduplicated backups.
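
The following minimal sketch illustrates this signature lookup logic. It is not Commvault code; the names (backup_with_dedup, ddb, destination), the block size, and the hash algorithm are illustrative assumptions only:

    import hashlib

    BLOCK_SIZE = 128 * 1024  # illustrative block size, not the configured value

    def backup_with_dedup(source_blocks, ddb, destination):
        """Write only the blocks whose signature is not already present in the DDB.

        source_blocks: iterable of byte strings read from the source
        ddb:           dict mapping signature -> reference count (stands in for the DDB)
        destination:   list that stands in for the destination storage
        """
        for block in source_blocks:
            signature = hashlib.sha512(block).hexdigest()  # signature per data block
            if signature in ddb:
                # Duplicate block: record another reference (index information only);
                # the block itself is not sent over the network again.
                ddb[signature] += 1
            else:
                # New block: record the signature and write the block to storage.
                ddb[signature] = 1
                destination.append(block)

    # Three blocks, two of them identical: only two blocks are written.
    ddb, dest = {}, []
    backup_with_dedup([b"A" * BLOCK_SIZE, b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE], ddb, dest)
    print(len(dest), sum(ddb.values()))  # 2 unique blocks written, 3 references recorded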

To test the bandwidth utilization of backups with deduplication or DASH copies, send unique data across the network using the following steps:

  1. To obtain accurate results during this test, make sure that no other jobs run between the client and the MediaAgent.

  2. Create two new subclients on a client with the same content. For example:

    • subclient01 with content D:\content01

    • subclient02 with content D:\content02

  3. Assign subclient01 to a non-deduplicated storage policy and subclient02 to a deduplicated storage policy.

  4. For the non-deduplicated backup on subclient01, do the following:

    1. Run a full backup.

    2. Record the throughput value of the backup from the Average Throughput field in the Backup Job Details dialog box.

      See Viewing Job Details for instructions.

    3. Enable network bandwidth throttling by specifying the following settings:

      1. Select the client and the data path MediaAgent under the Remote Clients or Client Group section.

      2. Specify the throttling value for Throttle Send (Kbps) and Throttle Receive (Kbps) as half of the average throughput value.

        That is, if the average throughput value was 100 GB/hr, set the throttling value to the equivalent of 50 GB/hr converted to Kbps (a conversion sketch follows this procedure).

      See Network Bandwidth Throttling - Getting Started for instructions.

    4. Run another full backup.

    5. View the throughput value of the backup from the Average Throughput field in the Backup Job Details dialog box.

      The throughput should be close to the value set in the throttling option.

  5. For the deduplicated backup on subclient02, do the following:

    1. Disable network bandwidth throttling on the client.

      See Disabling Network Bandwidth Throttling for instructions.

    2. Run a full backup on subclient02.

    3. Record the throughput value of the backup from the Average Throughput field in the Backup Job Details dialog box.

      See Viewing Job Details for instructions.

    4. Enable network bandwidth throttling by specifying the following settings:

      1. Select the client and the data path MediaAgent under the Remote Clients or Client Group section.

      2. Specify the throttling value for Throttle Send (Kbps) and Throttle Receive (Kbps) as half of the average throughput value.

      See Network Bandwidth Throttling - Getting Started for instructions.

    5. Seal the deduplication database to ensure that all the data is sent over the network when throttling is enabled and to utilize the available bandwidth.

      See Sealing the Deduplication Database for instructions.

    6. Run another full backup.

    7. View the throughput value of the backup from the Average Throughput field in the Backup Job Details dialog box.

      The throughput should be close to the value set in the throttling option.

This test verifies that network bandwidth utilization under throttling is similar for both deduplicated and non-deduplicated backups.
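
Because the Average Throughput value is reported in GB/hr while the throttling fields take Kbps, a unit conversion is needed when setting the throttle to half of the measured throughput. A minimal sketch of that arithmetic, assuming decimal units (1 GB = 10^9 bytes, 1 Kbps = 1,000 bits per second):

    def throttle_kbps(avg_throughput_gb_per_hr, fraction=0.5):
        """Convert an Average Throughput value (GB/hr) into a throttle value in Kbps.

        fraction is the portion of the measured throughput to allow; the test above uses half.
        Assumes decimal units: 1 GB = 1e9 bytes, 1 Kbps = 1,000 bits per second.
        """
        bits_per_second = avg_throughput_gb_per_hr * 1e9 * 8 * fraction / 3600
        return bits_per_second / 1000

    # Example: half of a 100 GB/hr backup is roughly 111,111 Kbps.
    print(round(throttle_kbps(100)))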

Can I set up Silo storage with a Global Deduplication Policy?

Silo storage can be enabled with a global deduplication copy. When a Global Deduplication Policy is configured with Silo storage, the deduplication database and silo storage operations are performed from the global deduplication copy, and all of these operations are available for this copy. Silo backups can also be initiated from this copy.

You can create additional copies of silo backups by creating secondary copies of the global deduplication copy. The secondary copies are created from the Silo Storage set data, preserving deduplication on the copies. Basically, this copy creates an auxiliary copy of the silo storage backup without unraveling the deduplication — similar to Tape-to-Tape Auxiliary Copies. When a copy is associated with a global deduplication policy, all data paths are inherited from the global deduplication policy; so silo data paths must be added at the global deduplication policy level and not at the copy level.

A global deduplication policy does not itself contain any data, so auxiliary copy, data verification, and media refresh operations are applicable to a global deduplication policy only if it is enabled with Silo storage.

How does space reclamation work for deduplicated data?

Data pruning with deduplication involves cross-checking the availability of data blocks across different backups. Deduplicated data is pruned from the secondary storage. However, the space occupied by a data block is reclaimed only when the last backup job that references it is deleted. If all data is deleted from a storage policy copy, no data blocks are available for cross-checking, resulting in faster storage space reclamation.

For example:

You have two backups, Job1 and Job2, that contain similar data. Job2 references most of the data blocks from Job1. If Job1 is deleted, the data blocks are not pruned from the secondary storage because Job2 still references them. When Job2 is also deleted, the data blocks are pruned from the secondary storage and the space is reclaimed.
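
A minimal sketch of this reference-counting behavior, using hypothetical names (job_references, prune) rather than Commvault internals:

    # Map each backup job to the signatures of the data blocks it references.
    job_references = {
        "Job1": {"blockA", "blockB", "blockC"},
        "Job2": {"blockA", "blockB", "blockD"},  # Job2 reuses most of Job1's blocks
    }

    def prune(job, references):
        """Delete a job and return the blocks whose space can actually be reclaimed.

        A block is reclaimed only when no remaining job references it.
        """
        released = references.pop(job)
        still_referenced = set().union(*references.values()) if references else set()
        return released - still_referenced

    print(prune("Job1", job_references))  # only {'blockC'}; blockA and blockB are still used by Job2
    print(prune("Job2", job_references))  # blockA, blockB, and blockD are now reclaimed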

Why is the Quick Verification option disabled for an incremental deduplicated data verification job?

The Quick Verification for Deduplicated Database option in the Data Verification dialog box is disabled because it checks only for the presence of the data blocks on the disk, whereas an incremental data verification job verifies the newly added data blocks and the data blocks that were not verified during the last data verification job.

Why is the disk space usage high for deduplicated data?

If the disk space usage for deduplicated data is high, then check the following:

  • Ensure that the data aging is enabled on the storage policy copy. For more information, see Data Aging.

  • If the number of prunable records is high, then perform the steps described in DD0054.

  • Ensure that the Managed Disk Space for Disk Library option is disabled on the Copy Properties dialog box. For more information, see Thresholds for Managed Disk Space.

  • For deduplication databases older than version 5, do not configure extended retention rules on the deduplicated storage policy copy. However, if you want to configure extended retention rules for the deduplicated data, use selective copy.

If all of the above conditions are met and the disk space usage is still high, then set the Do not Deduplicate against objects older than n day(s) option to 4 times the retention days when the retention on the storage policy copy is below 90 days. If the retention is above 90 days or extended retention is set on the storage policy copy, do not use this option. This option prevents subsequent backups from referencing old unique data blocks, thereby reducing the possibility of drill holes. For more information, see Deduplication Database Properties - Settings.

Is DDB used during restore operations?

No, the DDB is not used during the restore process.

How do I view the job IDs of a sealed DDB?

Follow the steps given below to view the job IDs of a sealed DDB:

  1. From the CommCell Browser, expand Storage Resources > Deduplication Engines > storage_policy_copy.

  2. Right-click the Sealed deduplication database, and then click View > Jobs.

    The jobs associated with the sealed DDB and their job IDs are displayed.

Why do I see event messages regarding DDB engine resynchronization without running disaster recovery (DR)?

You may see event messages regarding DDB engine resynchronization even without running a disaster recovery operation, because the system automatically validates the archive files in a deduplication engine and prunes the orphaned archive files. Backups are allowed to a deduplication engine during the archive file validation. This operation runs every 24 hours or every 30 days, depending on the number of orphaned archive files.

The following Event Messages appear in the Event Viewer:

Resync of Deduplication Engine [] failed. Resync will be attempted again. Please check MediaManager service log for detailed information.

Resync of Deduplication Engine [] cannot proceed. Reason [].

Initiating resync on [] Deduplication Engines.

Deduplication Engine [] successfully resync-ed and marked online.

Why is the Deduplication Database (DDB) resynchronization request getting timed out?

If the DDB MediaAgent is unable to respond to the request for DDB resynchronization within 30 seconds, then the resynchronization request times out until another DDB resynchronization request is made.

Also, during the CommServe disaster recovery (DR) restore process, when the DDB is in offline maintenance mode, deduplication does not happen until the DDB resynchronization is complete.

You can use the nDDBResyncNwTimeoutSec additional setting to modify the time set for the network timeout.
