Deduplication for Data Protection and Archiving
The deduplication process includes the following key components:
Deduplication settings, such as the location of the deduplication database (DDB) and the disk library that is used to store the data, are set in the storage policy.
When you configure a storage policy with deduplication, we recommend that you follow the practices described in the Deduplication Building Block Guide.
You can configure deduplication to run on the MediaAgent side or on the source side.
In MediaAgent-side deduplication, after the signature is generated for a data block, both the block and the signature are sent to the MediaAgent. The MediaAgent looks up the signature in the DDB:
- If the signature does not exist, the data block is unique. The block is written to the disk storage and the signature is added to the database.
- If the signature already exists in the database, the data block already exists on the disk. The data block is discarded, but the signature and index information are saved.
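The MediaAgent-side flow above can be sketched in a few lines. This is an illustrative model only: the `MediaAgent` class, its `receive` method, and the use of SHA-256 as the signature are assumptions made for the example and do not describe the product's internals.

```python
import hashlib


class MediaAgent:
    """Toy stand-in for a MediaAgent with an in-memory DDB and a disk store."""

    def __init__(self):
        self.ddb = {}     # signature -> number of references to the block
        self.disk = {}    # signature -> stored block bytes
        self.index = []   # ordered signatures, one entry per received block

    def receive(self, signature, block):
        """Both the block and its signature arrive at the MediaAgent."""
        if signature not in self.ddb:
            self.disk[signature] = block   # unique block: write it to disk
            self.ddb[signature] = 0        # and record its signature in the DDB
            written = True
        else:
            written = False                # duplicate: the block is discarded
        self.ddb[signature] += 1
        self.index.append(signature)       # signature and index info are always kept
        return written


def sign(block):
    # SHA-256 is an assumption for this sketch; the source does not name the hash.
    return hashlib.sha256(block).hexdigest()


agent = MediaAgent()
results = [agent.receive(sign(b), b) for b in (b"alpha", b"beta", b"alpha")]
print(results)          # [True, True, False]
print(len(agent.disk))  # 2
```

The third block is a repeat of the first, so it is not written again, yet it still gets an index entry so that the backup can be reconstructed.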
In source-side deduplication, after the signature is generated for a data block, only the signature is sent to the MediaAgent. The MediaAgent looks up the signature in the DDB:
- If the signature does not exist, the data block is unique. The MediaAgent requests the block from the client and then writes the data to the disk.
- If the signature already exists in the database, the block already exists on the disk. The MediaAgent informs the client to discard the block, and the signature and index information are saved.
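The source-side exchange can be modeled the same way. In the sketch below the client sends only the signature first and transfers the block only when asked. The names `lookup`, `store`, and `backup`, and the choice of SHA-256, are hypothetical and exist only to illustrate the message order.

```python
import hashlib


class MediaAgent:
    """Toy MediaAgent for the source-side flow (illustrative only)."""

    def __init__(self):
        self.ddb = set()   # known signatures
        self.disk = {}     # signature -> stored block bytes
        self.index = []    # one entry per block seen, duplicate or not

    def lookup(self, signature):
        """Return True if the client must send the block (signature is new)."""
        self.index.append(signature)
        return signature not in self.ddb

    def store(self, signature, block):
        self.disk[signature] = block
        self.ddb.add(signature)


def backup(blocks, agent):
    """Client side: sign each block, send the signature, send data on request."""
    bytes_sent = 0
    for block in blocks:
        signature = hashlib.sha256(block).hexdigest()  # hash choice is an assumption
        if agent.lookup(signature):
            agent.store(signature, block)   # unique: the block crosses the network
            bytes_sent += len(block)
        # otherwise the client discards the block without sending it
    return bytes_sent


agent = MediaAgent()
sent = backup([b"alpha", b"beta", b"alpha"], agent)
print(sent)              # 9
print(len(agent.index))  # 3
```

Only 9 of the 14 payload bytes are transferred, which is the practical difference from the MediaAgent-side flow: duplicate blocks never leave the client.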
Source-side deduplication can be used with source-side disk cache (local disk cache). For more information, see Advanced Client Properties - Deduplication.
The Deduplication Database (DDB) is the primary component of the deduplication process. The DDB maintains all of the signatures for a deduplicated storage policy. The DDB uses an in-memory database to store the data signatures.
Each storage policy can be configured with one of the following types of databases:
A single, non-partitioned database uses only one DDB.
Restriction: Legacy configuration for backward compatibility with Version 9 only.
A partitioned database uses a DDB with one or more partitions. A DDB with multiple partitions provides the following advantages:
- Enterprise Scalability
Expands the maximum number of deduplication signatures possible across multiple DDB partitions. This facilitates enterprise scalability for large volumes and offers longer retention of deduplicated data.
- Faster Backups
Increases throughput and concurrency by distributing the database processing load across multiple partitions.
- Partition Failover
Automatically redirects the deduplication process if one partition is temporarily unavailable.
Important: You can configure additional partitions on a DDB for a storage policy enabled with deduplication. To add more partitions to a DDB used by a storage policy, see Configuring Additional Partitions for a Deduplication Database.
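One way to picture how partitions spread the load and tolerate a temporary outage is the routing sketch below. The source does not describe the actual routing or failover algorithm, so the modulo-based placement, the `Partition` class, and the `pick_partition` function are all assumptions made for illustration.

```python
import hashlib


class Partition:
    """Hypothetical DDB partition holding its own set of signatures."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.signatures = set()


def pick_partition(signature, partitions):
    """Route a signature to its home partition, or to the next online one."""
    start = int(signature, 16) % len(partitions)   # assumed placement rule
    for offset in range(len(partitions)):
        candidate = partitions[(start + offset) % len(partitions)]
        if candidate.online:
            return candidate
    raise RuntimeError("no DDB partition is available")


partitions = [Partition("p0"), Partition("p1")]
signature = hashlib.sha256(b"alpha").hexdigest()

home = pick_partition(signature, partitions)
home.online = False                       # simulate a temporary outage
fallback = pick_partition(signature, partitions)
print(fallback is not home)               # True
```

Because each signature maps to one partition under normal conditions, lookups are divided among the partitions, and a lookup is redirected only while its home partition is offline.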
The deduplication engine is the process that runs on the DDB MediaAgent. When a DDB has multiple partitions, each partition has its own dedicated deduplication engine. The deduplication engine manages the state and the consistency of the DDB and allocates access to the DDB among operations such as backups, Dash Copy, and data aging.
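The engine's role as gatekeeper can be illustrated with a small model in which every operation must go through the engine to reach the DDB. The `DedupEngine` class and its `run` method are hypothetical, and serializing operations with a single lock is a simplifying assumption; the source does not say how access is actually scheduled.

```python
import threading


class DedupEngine:
    """Hypothetical engine guarding one DDB partition."""

    def __init__(self):
        self._lock = threading.Lock()
        self.signatures = set()   # the DDB contents owned by this engine
        self.log = []             # which operations were granted access, in order

    def run(self, operation, action):
        # In this sketch only one operation touches the DDB at a time.
        with self._lock:
            self.log.append(operation)
            return action(self.signatures)


engine = DedupEngine()
engine.run("backup", lambda ddb: ddb.add("sig-1"))
engine.run("data aging", lambda ddb: ddb.discard("sig-1"))
print(engine.log)         # ['backup', 'data aging']
print(engine.signatures)  # set()
```

Routing all access through one owner per partition is what lets the engine keep the DDB consistent while backups and data aging run against it.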