The deduplication process includes the following key components:
Storage Policy with Deduplication
Deduplication settings, such as the location of the deduplication database (DDB) and the disk library that is used to store the data, are set in the storage policy.
When you configure a storage policy with deduplication, we recommend that you follow the practices described in the Deduplication Building Block Guide.
Deduplication Modes
You can configure deduplication to run on the MediaAgent side or on the source side.
MediaAgent-Side Deduplication
After the signature is generated for the data block, both the block and the signature are sent to the MediaAgent. The MediaAgent compares the signature against the signatures in the DDB:
- If the signature does not exist, the data block is unique. The block is written to the disk storage and the signature is recorded in the database.
- If the signature already exists in the database, the data block already exists on the disk. The data block is discarded, but the signature and index information are saved.
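The MediaAgent-side decision described above can be sketched in a few lines. This is a minimal illustration, not the product's implementation: the `MediaAgent` class, its `receive` method, the in-memory dictionaries standing in for the DDB and the disk library, and the choice of SHA-256 as the signature function are all assumptions made for the example.

```python
import hashlib


def sign(block: bytes) -> str:
    # SHA-256 is an assumption; the source does not name the signature algorithm.
    return hashlib.sha256(block).hexdigest()


class MediaAgent:
    """Toy MediaAgent: dictionaries stand in for the DDB and the disk library."""

    def __init__(self):
        self.ddb = {}    # signature -> reference count (hypothetical layout)
        self.disk = {}   # signature -> stored block
        self.index = []  # signatures in the order the blocks were backed up

    def receive(self, signature: str, block: bytes) -> bool:
        """MediaAgent-side mode: the signature and the block both arrive.

        Returns True if the block was written to disk, False if it was discarded.
        """
        if signature not in self.ddb:
            # Unique block: write it to disk and record the signature.
            self.disk[signature] = block
            self.ddb[signature] = 1
            written = True
        else:
            # Duplicate block: discard the data, keep only the reference.
            self.ddb[signature] += 1
            written = False
        # Signature and index information are saved in both cases.
        self.index.append(signature)
        return written


agent = MediaAgent()
results = [agent.receive(sign(b), b) for b in (b"alpha", b"beta", b"alpha")]
print(results)  # [True, True, False]
```

In this sketch, every block crosses the network even when it turns out to be a duplicate; only the write to disk is avoided.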
Source-Side Deduplication
After the signature is generated for the data block, only the signature is sent to the MediaAgent. The MediaAgent compares the signature against the signatures in the DDB:
- If the signature does not exist, the data block is unique. The MediaAgent requests the block from the client and then writes the data to the disk.
- If the signature already exists in the database, the block already exists on the disk. The MediaAgent tells the client to discard the block, and the signature and index information are saved.
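The source-side exchange can be sketched as a two-step conversation between a client and a MediaAgent. This is an illustrative sketch under stated assumptions: the `Client` and `MediaAgent` classes, the `check_signature` and `store_block` methods, and the use of SHA-256 are hypothetical and do not correspond to a documented product interface.

```python
import hashlib


def sign(block: bytes) -> str:
    # SHA-256 is an assumption; the source does not name the signature algorithm.
    return hashlib.sha256(block).hexdigest()


class MediaAgent:
    """Toy MediaAgent: a set stands in for the DDB, a dict for the disk library."""

    def __init__(self):
        self.ddb = set()
        self.disk = {}
        self.index = []

    def check_signature(self, signature: str) -> bool:
        """Step 1: only the signature arrives. Return True if the block is needed."""
        self.index.append(signature)  # index information is saved either way
        return signature not in self.ddb

    def store_block(self, signature: str, block: bytes) -> None:
        """Step 2 (unique blocks only): write the block and record the signature."""
        self.disk[signature] = block
        self.ddb.add(signature)


class Client:
    def __init__(self, agent: MediaAgent):
        self.agent = agent
        self.bytes_sent = 0  # block payload actually transferred

    def backup(self, blocks):
        for block in blocks:
            signature = sign(block)
            if self.agent.check_signature(signature):
                # The MediaAgent asked for the block, so send it.
                self.agent.store_block(signature, block)
                self.bytes_sent += len(block)
            # Otherwise the client discards the block without sending it.


agent = MediaAgent()
client = Client(agent)
client.backup([b"alpha", b"beta", b"alpha"])
print(client.bytes_sent, len(agent.disk))  # 9 2
```

Compared with the MediaAgent-side mode, the duplicate block never leaves the client in this sketch, which is why only 9 of the 14 payload bytes are counted as sent.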
Source-side deduplication can be used with source-side disk cache (local disk cache). For more information, see Advanced Client Properties - Deduplication.
Deduplication Database
The Deduplication Database (DDB) is the primary component of the deduplication process. The DDB maintains all of the signatures for a deduplicated storage policy. The DDB uses an in-memory database to store the data signatures.
Each storage policy can be configured with one of the following types of databases:
Single DDB
Uses only one DDB.
Restriction: This is a legacy configuration, provided only for backward compatibility with Version 9.
Partitioned Deduplication Database (DDB)
Uses a DDB with one or more partitions. A DDB with multiple partitions provides the following advantages:
- Enterprise Scalability
  Expands the maximum number of deduplication signatures possible across multiple DDB partitions. This facilitates enterprise scalability for large volumes and offers longer retention of deduplicated data.
- Faster Backups
  Increases throughput and concurrency by distributing the database processing load across multiple partitions.
- Resiliency
  Enables partition failover by automatically redirecting the deduplication process if one partition is temporarily unavailable.
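To make the load distribution and failover ideas concrete, the following sketch routes each signature to a partition and redirects to another partition when the preferred one is unavailable. The routing rule (signature value modulo the partition count, then the next available partition) is an assumption chosen for illustration; the source does not describe how the product selects a partition. The `Partition` and `PartitionedDDB` classes are hypothetical.

```python
import hashlib


class Partition:
    def __init__(self, name: str):
        self.name = name
        self.signatures = set()
        self.available = True


class PartitionedDDB:
    """Toy partitioned DDB with modulo routing and simple failover (assumed scheme)."""

    def __init__(self, partition_count: int):
        self.partitions = [Partition(f"P{i}") for i in range(partition_count)]

    def _route(self, signature: str) -> Partition:
        count = len(self.partitions)
        # Preferred partition: derived from the signature so load spreads evenly.
        start = int(signature, 16) % count
        # Failover: walk to the next partition until an available one is found.
        for offset in range(count):
            candidate = self.partitions[(start + offset) % count]
            if candidate.available:
                return candidate
        raise RuntimeError("no DDB partition is available")

    def lookup_or_add(self, signature: str):
        """Return (partition name, True if the signature was new to that partition)."""
        partition = self._route(signature)
        unique = signature not in partition.signatures
        partition.signatures.add(signature)
        return partition.name, unique


ddb = PartitionedDDB(2)
sig = hashlib.sha256(b"alpha").hexdigest()

home, first = ddb.lookup_or_add(sig)   # first sighting: unique
_, second = ddb.lookup_or_add(sig)     # second sighting: duplicate

# Simulate the preferred partition becoming temporarily unavailable.
for partition in ddb.partitions:
    if partition.name == home:
        partition.available = False

fallback, _ = ddb.lookup_or_add(sig)   # redirected to the other partition
print(first, second, fallback != home)  # True False True
```

One consequence of this particular scheme is that a redirected signature looks new to the fallback partition, so a block may be stored again while the preferred partition is down. That trade-off belongs to the sketch and is not a statement about the product's behavior.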
Important: You can add partitions to the DDB of a storage policy that is enabled with deduplication. For instructions, see Configuring Additional Partitions for a Deduplication Database.
Deduplication Engine
The deduplication engine is the process that runs on the DDB MediaAgent. When a DDB has multiple partitions, each partition has its own dedicated deduplication engine. The deduplication engine manages the state and consistency of the DDB and allocates access to the DDB among operations such as backups, Dash Copy, and data aging.
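The idea of a single engine that arbitrates access to one DDB partition can be illustrated with a lock that serializes every operation. This is only a conceptual sketch: the `DedupEngine` class, the reference-count layout, and the `add_reference` and `release_reference` helpers are hypothetical, and the source does not describe how the engine actually schedules operations.

```python
import threading


class DedupEngine:
    """Toy engine that owns one DDB partition and serializes access to it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._refs = {}  # signature -> reference count (assumed layout)

    def run(self, action):
        # Every operation goes through the engine, one at a time,
        # so the partition state stays consistent.
        with self._lock:
            return action(self._refs)


def add_reference(signature):
    """Operation a backup might submit: record one more reference."""
    def action(refs):
        refs[signature] = refs.get(signature, 0) + 1
        return refs[signature]
    return action


def release_reference(signature):
    """Operation data aging might submit: drop a reference, prune at zero."""
    def action(refs):
        refs[signature] -= 1
        if refs[signature] == 0:
            del refs[signature]
        return refs.get(signature, 0)
    return action


engine = DedupEngine()


def backup_job():
    for _ in range(100):
        engine.run(add_reference("sig-1"))


# Four concurrent backup jobs share the same engine.
workers = [threading.Thread(target=backup_job) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

count = engine.run(lambda refs: refs["sig-1"])
remaining = engine.run(release_reference("sig-1"))
print(count, remaining)  # 400 399
```

Because each partition would have its own engine, and therefore its own lock in this sketch, operations on different partitions would not block one another.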