Deduplication FAQ

Table of Contents

Can I use other Hardware or Software Deduplication Applications with Commvault Deduplication?

Why is the DDB size large even though data aging is occurring?

Is deduplication supported for NAS NDMP backups?

Can I host the Deduplication Database on Laptops or Workstations?

How do I release a deduplication license?

How to deduplicate existing non-deduplicated backed up data?

How do I disable deduplication?

What happens if the Snap creation fails during DDB backup?

How should I schedule DDB Backups with respect to Data Aging jobs to minimize the time for reconstruction?

How should I schedule DDB Backups with respect to File System Backups using the same storage policy?

What happens if one of the DDBs is offline?

What do I do when the drive hosting the DDB is lost or the DDB files are missing?

How do I continue my client backups during DDB recovery?

Why is the deduplication backup or DASH copy not utilizing the network bandwidth?

Can I setup Silo storage with a Global Deduplication Policy?

How does space reclamation work for deduplicated data?

Why is Quick Verification option disabled for incremental deduplicated data verification job?

Why is the disk space usage high for deduplicated data?

Is DDB used during restore operations?

How do I view job IDs of a sealed DDB?

Why do I see event messages regarding DDB engine resynchronization without running disaster recovery (DR)?

How does data aging impact the data when new partitions are added to the DDB?

Can I use other Hardware or Software Deduplication Applications with Commvault Deduplication?

We recommend that you do not use other hardware or software deduplication applications with Commvault Deduplication. Layering a second software or hardware deduplication application on top of Commvault deduplication adds unnecessary performance overhead and resource utilization on your systems, and it provides no further benefit because the data is already deduplicated.

Why is the DDB size large even though data aging is occurring?

The DDB size might be large because the storage policy copy associated with the DDB might have extended retention configured.

For a deduplicated storage policy copy, we recommend that you do not configure extended retention rules, because jobs with an extended retention rule can hold on to unique data blocks that would otherwise be aged by the basic retention rule. This happens because, with deduplication, the data is written as unique data blocks that are shared among multiple jobs.

As a result, when basic retention jobs are aged, comparatively little space is reclaimed from the disk library because some of the unique blocks still belong to extended retention jobs. More significant space is reclaimed from the disk library only when the extended retention jobs are aged.

If you still want to configure extended retention rules for deduplicated jobs, we recommend the following:

  • Create a selective copy with deduplication for each selective criterion (for example, weekly, monthly, and so on) and set a higher basic retention period on each selective copy.

    For instructions, see Creating a Selective Copy.

  • To preserve data for the long term, run an auxiliary copy job to copy data from the source to the synchronous copy.

Is deduplication supported for NAS NDMP backups?

Yes, deduplication is supported for NAS NDMP backups.

The following vendors support the best deduplication rates for NAS backups:

  • NetApp, excluding SMTape backups (backups from image backup set)
  • Dell EMC VNX / Celerra and Dell EMC Unity, excluding VBB backups
  • Hitachi NAS
  • Huawei
  • Dell EMC Isilon

For all other NAS backups, deduplication is supported but the savings might not be as high. To achieve a better deduplication rate, consider one of the following options:

  • Migrate to one of the vendor configurations that provides a better deduplication rate.
  • Move backups from the NDMP Agent to a Windows File System Agent using a CIFS share for subclient content.
  • Move backups to a UNIX File System Agent using a mounted file system for subclient content.

When moving backups from the NDMP Agent to a Windows File System Agent or UNIX File System Agent, first verify the data transfer rate over CIFS or NFS as a lower transfer rate might result in slower backups. Also, the File System Agent configuration supports synthetic full backups and DASH copy. See Adding a CIFS or NFS Network Share Backup to a NAS Client.

Can I host the Deduplication Database on Laptops or Workstations?

No. Hosting the deduplication database on laptops or workstations is not allowed when all of the following conditions are true:

  • MediaAgent with the Windows File System Agent is installed on a Laptop or Workstation.
  • During installation, the Configure for Laptop or Desktop Backup check box was selected on the Policy Selection page of the Installer wizard.

How do I release a deduplication license?

If you no longer require a De-Duplication Block Level license on a MediaAgent computer where a DDB is hosted, you can release the license and use it later for another computer, for example, when the host computer is being retired.

Note that because DDBs are associated with a storage policy, once you release the license, you cannot perform any new backups using the storage policy associated with the DDB. The data backed up using the storage policy remains available for restores.

  1. From the CommCell Browser, expand Storage Resources > MediaAgents.
  2. Right-click the appropriate MediaAgent where the DDB is hosted, and then click Release license for MediaAgent.
  3. Select De-Duplication Block Level, and then click Add to move the license to the Licenses to Release list box.
  4. Click OK.
  5. In the Confirm dialog box, click Yes to continue.

    If you have storage policies that are still using this MediaAgent to host the DDB, a message appears.

  6. Click Yes to continue to release license.

    The associated storage policy is now functional for restore only.

How to deduplicate existing non-deduplicated backed up data?

If you have non-deduplicated backed up data, or if your clients point to a non-deduplicated storage policy and you want to deduplicate the existing data, create a secondary copy with deduplication enabled on the copy and run an Auxiliary Copy job on the secondary copy. See Creating a Storage Policy Copy with Deduplication for instructions.

If necessary, after all the data is copied to the new secondary copy, you can promote the secondary copy as the primary copy so that subsequent backups are automatically deduplicated. See Setting Up the Storage Policy Copy to be the Primary Copy for instructions.

How do I disable deduplication?

You can choose whether to use deduplication only during storage policy creation. After the storage policy is created, deduplication cannot be disabled on it.

However, you can use the following workaround to disable deduplication:

  • Disable deduplication on all the subclients associated with the storage policy copy.
  • Create a new storage policy without enabling Deduplication and re-associate the necessary subclients to that storage policy.
  • Create a secondary copy, and run an auxiliary copy in the secondary copy. Then promote the secondary copy as the primary copy.
  • You can also disable deduplication temporarily from the DDB storage policy copy properties dialog box > Deduplication tab. On the Advanced tab, select the Temporarily disable deduplication check box. For more details, see Temporarily disable deduplication.

What happens if the Snap creation fails during DDB backup?

During a DDB backup using the DDB subclient, if the VSS (Windows) or LVM (Linux) snapshot fails, the DDB backup fails by default with the following error message:

For Windows:

Error Code: [19:857]

Description: The job has failed because the VSS snapshot could not be created

For Unix:

Error Code: [19:857]

Description: Snap not supported on the volumes holding DDB paths or we failed to take snapshot. Backup will fail, error=[Case specific error message]

To continue the DDB backup with the live volume when snapshot creation fails, set the bFailDDBJobIfSnapCreationFails additional setting to 0 on the MediaAgent hosting the DDB.

For instructions on creating an additional setting, see Add or Modify an Additional Setting.

During this process:

  • The DDB subclient quiesces the SIDB engine until the backup completes. Backups that are already using the DDB continue to show a running state, but they wait until the SIDB engine is active again.
  • The quiesced DDB is unavailable to new jobs, putting them in a wait state until the SIDB engine is active again.

How should I schedule DDB Backups with respect to Data Aging jobs to minimize the time for reconstruction?

After the data aging job scheduled on the CommServe completes, the physical pruning of data blocks on the disk library begins and usually takes a few hours. The best time to run the deduplication database backup is when all of the physical pruning is complete, so that a reconstruction job using the deduplication database snapshot from this backup does not have to replay as many prune records. Therefore, a good practice is to run the deduplication database backup at the midpoint between two data aging jobs.

For example, if data aging is scheduled to run every 6 hours, the recommended schedules are:

Data Aging: Every 6 hours starting at 3:00 AM.

Deduplication Database Backup: Every 6 hours starting at 6:00 AM.
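
The midpoint rule above can be sketched as a small helper. The 6-hour interval and 3:00 AM start are the example values from this FAQ; the function itself is illustrative, not a Commvault tool:

```python
from datetime import datetime, timedelta

def ddb_backup_start(aging_start: datetime, aging_interval_hours: float) -> datetime:
    """Place the DDB backup at the midpoint between two data aging runs."""
    return aging_start + timedelta(hours=aging_interval_hours / 2)

# Example values from this FAQ: data aging every 6 hours starting at 3:00 AM.
aging_start = datetime(2018, 3, 1, 3, 0)
backup_start = ddb_backup_start(aging_start, 6)
print(backup_start.strftime("%I:%M %p"))  # 06:00 AM
```

With a different aging interval (for example, every 8 hours starting at midnight), the same helper places the DDB backup at 4:00 AM.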

How should I schedule DDB Backups with respect to File System Backups using the same storage policy?

Schedule the DDB backup so that it runs when as few backups as possible are in progress. This ensures that the DDB backup finishes sooner.

What happens if one of the DDBs is offline?

If you have a single DDB configured and that DDB is offline, you can still continue client backups without deduplication. See How do I continue my client backups during DDB recovery? for more information.

If you have configured a partitioned DDB on the storage policy, then in the unlikely event that one of the partitions goes offline, you can use the available partition to run the backup jobs. For more information, see Modifying Deduplication Database Recovery Settings.

What do I do when the drive hosting the DDB is lost or the DDB files are missing?

If the drive (for example, E:\) hosting the DDB is lost or the DDB files are missing, then reconstruct the DDB by performing the steps described in Recovering Permanently Offline Deduplication Database Partitions.

How do I continue my client backups during DDB recovery?

When you have multiple client backups scheduled to a storage policy copy with deduplication, the backup jobs might remain in a Waiting state for a long period of time due to the following reason (displayed in the Job Controller):

Description: DeDuplication DB access path [D:\DDB01] on MediaAgent [mediaagent01] is offline for Storage Policy Copy [ SP01 / Primary ]. Offline Reason: The active deduplication store of current storage policy copy is not available to use. Please wait till the DeDuplication DB Reconstruction job reconstructs the partitions.
Source: mm4, Process: Job Manager

You can avoid the Waiting state of the backup jobs by enabling the Allow backup jobs to run to deduplication storage policy copy when DDB is an unusable state option. This option allows client backups to continue to the same storage policy without deduplication. Once the DDB is recovered, client backups automatically continue with deduplication.

To configure the client backups to continue during DDB recovery:

  1. On the ribbon in the CommCell Console, click the Storage tab, and then click Media Management.
  2. In the Media Management Configuration dialog box, click the Resource Manager Configuration tab.
  3. In the Allow backup jobs to run to deduplication storage policy copy when DDB is an unusable state box, enter 1 to enable.
  4. Click OK.

    Once the DDB is recovered, the new client backups will automatically continue with deduplication.

    Note that during this process the data that was backed up without deduplication will remain as non-deduplicated data.

Why is the deduplication backup or DASH copy not utilizing the network bandwidth?

When jobs with deduplication are run, the data is read from the source in data blocks. A signature is generated for each block of data using a hash algorithm. The signature is compared against a deduplication database (DDB) of existing signatures for data blocks already in the destination storage.

  • If the signature does not exist, the DDB is updated with the new signature. The data block is written to the disk and the signature is logged in the DDB.
  • If the signature already exists in the DDB, the DDB is updated to record another reference to the existing data block on the destination storage. Only the index information is written to the destination storage; the duplicate data block itself is not sent.

Because redundant data is not sent over the network, the amount of data transferred from the source to the destination computer is lower with deduplication. This results in lower bandwidth utilization for backups with deduplication or DASH copies compared to non-deduplicated backups.
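
As a rough illustration of why deduplicated backups move less data over the network, the signature lookup described above can be sketched as follows. The 128 KB block size and the MD5 hash are illustrative assumptions, not Commvault's actual block size or signature algorithm:

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # illustrative block size, not Commvault's default

def deduplicate(stream: bytes, ddb: dict) -> list:
    """Split the stream into blocks; send a block only if its signature is new."""
    sent = []  # blocks that would actually cross the network
    for i in range(0, len(stream), BLOCK_SIZE):
        block = stream[i:i + BLOCK_SIZE]
        sig = hashlib.md5(block).hexdigest()  # signature for the block
        if sig in ddb:
            ddb[sig] += 1   # existing block: only a reference is recorded
        else:
            ddb[sig] = 1    # new block: log the signature and send the data
            sent.append(block)
    return sent

ddb = {}
data = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE
print(len(deduplicate(data, ddb)))  # 2 unique blocks sent, not 3
```

Only two of the three blocks cross the network; the repeated block is recorded as a reference in the DDB, which is why throttled throughput for deduplicated backups looks lower than the configured limit.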

To test the bandwidth utilization of backups with deduplication or DASH copies, send unique data across the network using the following steps:

  1. To obtain accurate results, make sure that no other jobs run between the client and the MediaAgent during this test.
  2. Create two new subclients on a client with the same content. For example:
    • subclient01 with content D:\content01
    • subclient02 with content D:\content02
  3. Assign subclient01 to a non-deduplicated storage policy and subclient02 to a deduplicated storage policy.
  4. For non-deduplicated backup on subclient01, do the following:
    1. Run a full backup.
    2. Record the throughput value of the backup from the Average Throughput option in the Backup Job Details dialog box.

      See Viewing Job Details for instructions.

    3. Enable the network bandwidth throttling by specifying the following settings:
      1. Select the client and the data path MediaAgent under the Remote Clients or Client Group section.
      2. Specify the throttling value for Throttle Send (Kbps) and Throttle Receive (Kbps) as half of the average throughput value.

        That is, if the average throughput value was 100 GB/hr, set the throttling value to the equivalent of 50 GB/hr.

      See Network Bandwidth Throttling - Getting Started for instructions.

    4. Run another full backup.
    5. View the throughput value of the backup from the Average Throughput option in the Backup Job Details dialog box.

      The throughput should be close to the value set in the throttling option.

  5. For deduplicated backup on subclient02, do the following:
    1. Disable network bandwidth throttling on the client.

      See Disabling Network Bandwidth Throttling for instructions.

    2. Run a full backup on subclient02.
    3. Record the throughput value of the backup from the Average Throughput option in the Backup Job Details dialog box.

      See Viewing Job Details for instructions.

    4. Enable the network bandwidth throttling by specifying the following settings:
      1. Select the client and the data path MediaAgent under the Remote Clients or Client Group section.
      2. Specify the throttling value for Throttle Send (Kbps) and Throttle Receive (Kbps) as half of the average throughput value.

      See Network Bandwidth Throttling - Getting Started for instructions.

    5. Seal the deduplication database to ensure that all the data is sent over the network when throttling is enabled and to utilize the available bandwidth.

      See Sealing the Deduplication Database for instructions.

    6. Run another full backup.
    7. View the throughput value of the backup from the Average Throughput option in the Backup Job Details dialog box.

      The throughput should be close to the value set in the throttling option.

This test verifies that network bandwidth utilization under throttling is similar for both deduplicated and non-deduplicated backups.
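
Because the Throttle Send and Throttle Receive fields take values in Kbps, the "half the average throughput" arithmetic from the steps above may need a unit conversion. A sketch, assuming decimal units (1 GB = 10^9 bytes, 1 Kb = 1000 bits):

```python
def throttle_kbps(avg_throughput_gb_per_hr: float, fraction: float = 0.5) -> int:
    """Convert an average throughput in GB/hr to a throttle value in Kbps.

    Assumes decimal units: 1 GB = 10**9 bytes, 1 Kb = 1000 bits.
    """
    bytes_per_sec = avg_throughput_gb_per_hr * 10**9 / 3600
    kbps = bytes_per_sec * 8 / 1000  # bytes/s -> bits/s -> Kb/s
    return round(kbps * fraction)

# Example from this FAQ: 100 GB/hr average throughput, throttled to half.
print(throttle_kbps(100))  # 111111
```

So "half of 100 GB/hr" corresponds to roughly 111,111 Kbps under these unit assumptions; adjust the constants if your environment reports binary (GiB) values.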

Can I setup Silo storage with a Global Deduplication Policy?

Silo storage can be enabled with a Global Deduplication copy. When a Global Deduplication Policy is configured with silo storage, the deduplication database and silo storage operations are performed from the global deduplication copy. All deduplication database and silo storage operations are available for this copy, and silo backups can also be initiated from this copy.

You can create additional copies of silo backups by creating secondary copies of the global deduplication copy. The secondary copies are created from the silo storage set data, preserving deduplication on the copies. Essentially, this creates an auxiliary copy of the silo storage backup without unraveling the deduplication, similar to Tape-to-Tape Auxiliary Copies. When a copy is associated with a global deduplication policy, all data paths are inherited from the global deduplication policy, so silo data paths must be added at the global deduplication policy level and not at the copy level.

A global deduplication policy in itself does not contain any data. Therefore, auxiliary copy, data verification, and media refresh operations apply to a global deduplication policy only if it is enabled with silo storage.

How does space reclamation work for deduplicated data?

Data pruning with deduplication involves cross-checking how data blocks are shared across different backups. Deduplicated data is logically pruned from the secondary storage, but the space is physically reclaimed only when the last backup job that references the data blocks is deleted. If all data is deleted from a storage policy copy, no data blocks remain to be cross-checked, resulting in faster storage space reclamation.

For example:

You have two backups, Job1 and Job2, that contain similar data. Job2 references most of the data blocks from Job1. If Job1 is deleted, its data blocks are not pruned from the secondary storage because Job2 still references them. When Job2 is also deleted, the data blocks are pruned from the secondary storage and the space is reclaimed.
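
The Job1/Job2 example above can be modeled with simple reference counting. This is a sketch only; actual pruning involves more bookkeeping than a per-block counter:

```python
# Sketch of reference-counted space reclamation for deduplicated blocks.
refs = {}   # block signature -> number of jobs referencing it
disk = set()  # block signatures physically present on disk

def backup(job_blocks):
    """Record a job's blocks; write a block to disk only once."""
    for sig in job_blocks:
        refs[sig] = refs.get(sig, 0) + 1
        disk.add(sig)

def delete_job(job_blocks):
    """Drop a job's references; prune blocks whose last reference is gone."""
    for sig in job_blocks:
        refs[sig] -= 1
        if refs[sig] == 0:  # last reference gone: block can be pruned
            del refs[sig]
            disk.discard(sig)

job1 = ["b1", "b2", "b3"]
job2 = ["b1", "b2", "b4"]  # Job2 references most of Job1's blocks
backup(job1)
backup(job2)
delete_job(job1)
print(sorted(disk))  # ['b1', 'b2', 'b4']; only b3 was reclaimed
```

Deleting Job1 reclaims only the block Job2 does not share; deleting Job2 afterwards empties the disk, matching the description above.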

Why is Quick Verification option disabled for incremental deduplicated data verification job?

The Quick Verification for Deduplicated Database option on the Data Verification dialog box is disabled because that option checks only for the presence of the data blocks on the disk, whereas the incremental data verification job verifies the newly added data blocks and the data blocks that were not verified during the last data verification job.

Why is the disk space usage high for deduplicated data?

If the disk space usage for deduplicated data is high, then check the following:

  • Ensure that data aging is enabled on the storage policy copy. For more information, see Data Aging.
  • If the number of prunable records is high, then perform the steps described in DD0054.
  • Ensure that the Managed Disk Space for Disk Library option is disabled on the Copy Properties dialog box. For more information, see Thresholds for Managed Disk Space.
  • For a deduplicated storage policy copy, do not configure extended retention rules. If you do want extended retention for deduplicated data, use a selective copy. For more information, see Why is the DDB size large even though data aging is occurring?.

If all of the above conditions are met and the disk space usage is still high, then:

  • Set the Do not Deduplicate against objects older than n day(s) option so that the number of days is equal to the highest retention days set on the storage policy copy for the data. This option prevents subsequent backups from referencing the old unique data blocks, thereby reducing the possibility of drill holes.

Is DDB used during restore operations?

No, the DDB is not used during the restore process.

How do I view job IDs of a sealed DDB?

Follow the steps given below to view the job IDs of a sealed DDB:

  1. From the CommCell Browser, expand Storage Resources > Deduplication Engines > storage_policy_copy.
  2. Right-click the Sealed deduplication database, and then click View > Jobs.

    The jobs and their job IDs associated with the sealed DDB are displayed.

Why do I see event messages regarding DDB engine resynchronization without running disaster recovery (DR)?

You might see event messages regarding DDB engine resynchronization even without running a disaster recovery operation, because the system automatically validates the archive files in a deduplication engine and prunes orphaned archive files. Backups are allowed to the deduplication engine during archive file validation. This operation runs every 24 hours or every 30 days, depending on the number of orphaned archive files.

The following Event Messages appear in the Event Viewer:

Resync of Deduplication Engine [] failed. Resync will be attempted again. Please check MediaManager service log for detailed information.

Resync of Deduplication Engine [] cannot proceed. Reason [].

Initiating resync on [] Deduplication Engines.

Deduplication Engine [] successfully resync-ed and marked online.

How does data aging impact the data when new partitions are added to the DDB?

After new partitions are added to a DDB, subsequent backups distribute signatures evenly across all the partitions. A new baseline for the DDB store starts each time a partition is added, so some signatures might be written to the new partition even though they already exist in the DDB store.

Data is aged when no more references to its blocks are left; the data is then pruned physically from the DDB as well as from the magnetic disk.

For example: your environment has a DDB store, and with Backup1 you back up 1 MB of data. This backup creates 8 signatures for blocks of 125 KB each. The magnetic disk holds 1 MB of data.

After running Backup1, you add Partition2 and run Backup2 of the same 1 MB of data. After the second backup, 4 signatures for 125 KB blocks are added to Partition2 (even though the same signatures exist in the original store), and for the other 4 signatures only a reference is added in the original store (because those signatures already exist there). The magnetic disk holds 1.5 MB of data (1 MB from Backup1 + 500 KB written through Partition2 by Backup2).

On running data aging, if Backup1 is aged, then the 4 signatures in the first partition that are no longer referenced are aged, and 500 KB of data is pruned from the magnetic disk.
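
The arithmetic in this example can be checked with a short simulation. The even split of signatures across partitions and the per-signature block size are the simplifying assumptions used in the example above:

```python
SIG_SIZE_KB = 125  # each signature covers a 125 KB block

# Backup1: 1 MB of data -> 8 signatures, all tracked in Partition1.
partition1 = {f"s{i}": 1 for i in range(8)}  # signature -> reference count
disk_kb = 8 * SIG_SIZE_KB                    # about 1 MB on the magnetic disk

# Partition2 is added; Backup2 writes the same 1 MB of data. Four signatures
# route to Partition2 (their blocks are rewritten as a new baseline); the
# other four only gain a reference in Partition1.
partition2 = {f"s{i}": 1 for i in range(4)}
for i in range(4, 8):
    partition1[f"s{i}"] += 1
disk_kb += 4 * SIG_SIZE_KB                   # 500 KB more on disk -> 1.5 MB

# Aging Backup1 removes its references; signatures that drop to zero are
# pruned from Partition1 together with their blocks on disk.
for i in range(8):
    partition1[f"s{i}"] -= 1
pruned = [s for s, n in list(partition1.items()) if n == 0]
for s in pruned:
    del partition1[s]
disk_kb -= len(pruned) * SIG_SIZE_KB         # 500 KB pruned -> 1 MB remains
print(disk_kb)  # 1000
```

The simulation ends with 1,000 KB on disk: the 4 blocks still referenced through Partition1 plus the 4 baseline copies written through Partition2, matching the 500 KB reclaimed in the example.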

Last modified: 3/1/2018 7:33:13 PM