Deduplication Building Block Guide
A Building Block is a combination of a server and storage that provides a modular approach to data management.
This building block guide illustrates how to choose the right number of deduplication nodes based on your environment, storage requirements, and production data size. Choosing the right deduplication model allows you to protect large amounts of data with minimal infrastructure, faster backups, and better scalability.
For a building block, choose servers with fast processors and sufficient memory to deliver good performance and scalability.
Before setting up a building block, plan for sufficient storage space that balances cost, availability, and performance. Sufficient storage space includes:
- Space for the deduplication database (DDB)
- Space for configuring the disk library
The DDB can be hosted on any of the following operating systems:
Important: A partitioned DDB is supported only on the x64 versions of the following operating systems.
| Operating System | Supported Platforms | Supported File Systems |
| --- | --- | --- |
| Windows | All platforms on which the Windows MediaAgent is supported, except 64-bit editions on Intel Itanium (IA-64) and Windows XP. | NTFS and ReFS |
| Linux | All platforms on which the Linux MediaAgent is supported, except PowerPC (includes IBM System p). | ext3, ext4, and XFS |
| Microsoft Cluster Service (MSCS) | Clusters supported by Windows MediaAgents. | NTFS and ReFS |
| Linux Cluster | Clusters supported by Linux MediaAgents. | ext3, ext4, and XFS |
The hardware requirements for the MediaAgent that hosts the DDB are described in Hardware Specifications for MediaAgent.
Tip: For optimal backup performance, host the DDB on a fast, dedicated disk formatted with a 4 KB block size. Before setting up the DDB, validate the storage volumes for high performance using a tool that measures IOPS (input/output operations per second).
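As a rough illustration of what such a validation measures, the sketch below issues random 4 KB reads against a test file and reports reads per second. The file path, sizes, and duration are assumptions chosen for illustration only; a small file like this is served largely from the OS page cache, so the numbers will be optimistic. Use a dedicated benchmark tool for real validation.

```python
# Minimal sketch of a random-read IOPS probe for a candidate DDB volume.
# Sizes and duration are illustrative assumptions, not product guidance.
import os
import random
import tempfile
import time

BLOCK_SIZE = 4 * 1024          # 4 KB, matching the recommended DDB block size
FILE_SIZE = 16 * 1024 * 1024   # small 16 MB test file, for illustration only
DURATION = 0.5                 # seconds to sample

def measure_random_read_iops(path: str) -> float:
    """Issue random 4 KB reads against `path` and return reads per second."""
    blocks = FILE_SIZE // BLOCK_SIZE
    ops = 0
    with open(path, "rb", buffering=0) as f:   # unbuffered to avoid Python-level caching
        deadline = time.monotonic() + DURATION
        while time.monotonic() < deadline:
            f.seek(random.randrange(blocks) * BLOCK_SIZE)
            f.read(BLOCK_SIZE)
            ops += 1
    return ops / DURATION

if __name__ == "__main__":
    # Create a throwaway file filled with random data, probe it, clean up.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(FILE_SIZE))
        test_file = tmp.name
    try:
        print(f"random 4 KB read IOPS: {measure_random_read_iops(test_file):.0f}")
    finally:
        os.remove(test_file)
```

Run the probe against a file on the volume that will host the DDB; compare results across candidate volumes rather than treating the absolute number as authoritative.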
Consider the following aspects before configuring deduplication in your environment.
Deduplication is centrally managed through storage policies. Each storage policy can maintain its own deduplication settings or can be associated with a global deduplication storage policy. Depending on the type of data and the production data size, you can use a dedicated storage policy or a global deduplication policy.
A dedicated deduplication storage policy consists of one library, one deduplication database, and one or more MediaAgents. For scalability, a dedicated deduplication policy allows efficient movement of very large amounts of data.
Dedicated policies are recommended when backing up large amounts of data with data types that do not deduplicate well against each other, such as database and file system data.
For more information, see Data Protection and Archiving Deduplication.
A global deduplication storage policy provides one large global deduplication database that can be shared by multiple deduplication storage policy copies. Each storage policy can manage specific content with its own retention rules. However, all participating storage policy copies share the same data paths (consisting of MediaAgents and disk library mount paths) and the same global deduplication database.
- Client computer subclients cannot be associated with a Global Deduplication Storage Policy. Associate them only with standard storage policies.
- After a storage policy copy is associated with a Global Deduplication Storage Policy, you cannot change the association.
- Multiple copies within a storage policy cannot use the same Global Deduplication Storage Policy.
A global deduplication policy is recommended:
- For data that exists in multiple remote sites and is being consolidated into a centralized data center.
- For small data sizes with different retention requirements.
For more information, see Global Deduplication.
The deduplication database (DDB) maintains all signature hash records for a deduplication storage policy. A DDB partition hosted on a solid-state drive (SSD) can scale up to a Back-End Terabyte (BET) size of 200 TB of data residing on the disk library, which corresponds to 2 PB of application (backup) data assuming a 10:1 deduplication ratio.
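The sizing figures above follow from simple arithmetic; a minimal sketch, assuming the stated 10:1 deduplication ratio:

```python
# Back-end vs. application (front-end) capacity under an assumed
# deduplication ratio, reproducing the 200 TB / 2 PB figures above.
def back_end_tb(application_tb: float, dedupe_ratio: float) -> float:
    """Disk-library space consumed after deduplication."""
    return application_tb / dedupe_ratio

# 2 PB (2000 TB) of backup data at a 10:1 ratio needs ~200 TB on the disk library.
print(back_end_tb(2000, 10))  # 200.0
```

A more conservative ratio (for example 5:1 for data that deduplicates poorly) doubles the back-end requirement, which is why the ratio assumption should be validated against your own data.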
We also recommend locating the DDB locally on the MediaAgent. The faster the disk performance, the more efficient the data protection and deduplication processes.
The Disk Library consists of disk devices that point to the locations of the disk library folders. Each disk device can have a read/write path or a read-only path. The read/write path allows the MediaAgent controlling the mount path to perform backups. The read-only path allows an alternate MediaAgent to read data from the host MediaAgent, so that restores or auxiliary copy operations can run while the local MediaAgent is busy.
For deduplication backups:
- We recommend using dedicated disk libraries formatted with a 64 KB block size for each MediaAgent.
- Run the disk performance tool to test the performance of the read and write operation on a disk.
See Disk Performance Tool for instructions.
- Back up non-deduplicated data to a separate disk library.
- Separating data types into different disk libraries allows easier reporting on overall deduplication savings.
If deduplicated and non-deduplicated data are written to the same library, the overall disk usage information is skewed, making space usage prediction difficult.
Partition the disk storage into 2–8 TB LUNs and configure them as mount points in the operating system. This LUN size is recommended for ease of disk library maintenance. Additionally, a larger array of smaller LUNs reduces the impact of the failure of any given LUN.
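A minimal sketch of this LUN planning, using the 2–8 TB bounds above and an assumed 4 TB target size (the target is an illustrative choice, not a product default):

```python
# Split total disk-library capacity into equally sized LUNs within the
# recommended 2-8 TB range. The 4 TB target is an illustrative assumption.
import math

MIN_LUN_TB, MAX_LUN_TB = 2, 8

def plan_luns(total_tb: float, target_lun_tb: float = 4) -> tuple:
    """Return (lun_count, lun_size_tb) for a given total capacity."""
    if not MIN_LUN_TB <= target_lun_tb <= MAX_LUN_TB:
        raise ValueError("target LUN size must be between 2 and 8 TB")
    count = math.ceil(total_tb / target_lun_tb)
    return count, total_tb / count

# A 100 TB disk library split into 4 TB LUNs.
print(plan_luns(100))  # (25, 4.0)
```

Each resulting LUN would then be mounted as a mount point (rather than a drive letter) and added to the disk library as a mount path.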
When you commission disk storage, plan for and measure its optimal performance before running your data protection operations. For more information, see Disk Library Volume Performance.
Note: If you have configured disk mount paths that do not support sparse files (for example, NAS mount paths), but you want to reclaim unused disk space, you can use the Reclaim idle space on Mount Paths with no drill hole capability option.
For disk storage, the mount paths can be divided into two types:
NAS paths (Disk Library over shared storage)
- This is the preferred method for a mount path configuration.
- In NAS paths the disk storage is on the network and the MediaAgent connects through a network protocol.
- If a MediaAgent goes offline, the disk library is still accessible by other MediaAgents in the library.
Direct Attached Block Storage (Disk Library over Direct Attached Storage)
- In direct-attached block storage (SAN), the mount paths are locally attached to the MediaAgent.
- If a MediaAgent is lost, the disk library goes offline.
- In a direct-attached design, configure the mount paths as mount points instead of drive letters. This allows larger-capacity solutions to configure more mount paths than there are available drive letters.
- Smaller capacity sites can use drive letters as long as they do not exceed the number of available drive letters.
We recommend using the default block size of 128 KB for deduplicated storage policies for all data type backups.
The block size can be configured from the Storage Policy Properties - Advanced tab. When configuring a global deduplication policy, all other storage policy copies associated with it must use the same block size. To modify the block size of a global deduplication policy, see Modifying Global Deduplication Policy Settings for instructions.
The application read size is the size of the data read from clients for data transfer during backup operations. By default, the application read size is 64 KB.
To achieve an optimal rate of data transfer during DDB backups, we recommend increasing the read size to 256 KB.
By default, when a deduplication storage policy is configured, source-side compression is automatically enabled at the storage policy level. This setting overrides the subclient compression settings.
When a global deduplication storage policy is configured, the compression settings on the global deduplication policy override the storage policy compression settings.
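The override order described above (global deduplication policy over storage policy over subclient) can be sketched as follows; the function and parameter names are illustrative, not product APIs:

```python
# Sketch of the compression-setting precedence described above: the global
# deduplication policy overrides the storage policy, which in turn
# overrides the subclient setting. Names here are illustrative only.
from typing import Optional

def effective_compression(subclient: bool,
                          storage_policy: Optional[bool] = None,
                          global_dedupe_policy: Optional[bool] = None) -> bool:
    """Return the compression setting that actually applies to a backup.

    `None` means the level does not define (or does not exist for) the setting.
    """
    if global_dedupe_policy is not None:
        return global_dedupe_policy
    if storage_policy is not None:
        return storage_policy
    return subclient

# A subclient with compression disabled still compresses when the
# deduplication storage policy enables source-side compression.
print(effective_compression(subclient=False, storage_policy=True))  # True
```

The sketch only models the precedence chain; in the product the settings live in the respective policy property dialogs.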
For more information, see Data Compression.
Consider the following when using SAN storage for data path configuration:
- When using SAN storage for the mount path, use Alternate Data Paths > When Resources are offline.
If a data path fails or is marked offline for maintenance, the job fails over to the next data path configured in the Data Path tab.
Although round-robin between data paths works with SAN storage, it is not recommended because of the performance penalty during DASH copies and restores, caused by the multiple hops required to restore or copy the data.
Consider the following when using NAS storage for data path configuration:
- When using NAS storage for the mount path, round-robin between data paths is recommended. This is configured in the Copy Properties > Data Path Configuration tab of the storage policy. If you use a global deduplication policy, configure the data paths in each associated storage policy, not in the Global Deduplication Policy properties.
- NAS mount paths do not incur the same performance penalty, because the servicing MediaAgent communicates with the NAS mount path directly.
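The two selection strategies above (failover for SAN, round-robin for NAS) can be sketched as follows; the class, method, and MediaAgent names are illustrative, not product APIs:

```python
# Sketch of the two data-path selection strategies discussed above:
# failover ("When Resources are offline") for SAN, round-robin for NAS.
from itertools import cycle

class DataPathSelector:
    def __init__(self, paths, strategy="failover"):
        self.paths = list(paths)        # ordered as in the Data Path tab
        self.strategy = strategy
        self._rr = cycle(self.paths)    # rotating pointer for round-robin

    def next_path(self, online):
        """Pick the data path for the next job; `online` is a set of path names."""
        if self.strategy == "round_robin":
            # Spread jobs across all online paths in turn.
            for _ in range(len(self.paths)):
                p = next(self._rr)
                if p in online:
                    return p
        else:
            # Failover: always prefer the first online path in configured order.
            for p in self.paths:
                if p in online:
                    return p
        raise RuntimeError("no online data path available")

sel = DataPathSelector(["MA1", "MA2"], strategy="failover")
print(sel.next_path({"MA2"}))  # MA1 is offline, so the job fails over to MA2
```

With `strategy="round_robin"` and both paths online, successive calls alternate between MA1 and MA2, which models the load spreading recommended for NAS mount paths.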
For best performance and scalability when backing up different data types (such as file system, SQL, and Exchange data) that exceed the suggested capacity referenced in Hardware Specifications for MediaAgent, it is best practice to use a separate global deduplication policy for each data type.
For instructions on how to set up remote office backups, see Global Deduplication.