Optimizing Synthetic Full Backups in Deduplication Appliances
Performance of a deduplication appliance, such as the EMC Data Domain Global Deduplication Array (GDA), is bound by stream or thread performance. The synthetic full operation requires a concurrent read/write operation, which can affect device performance.
These steps can optimize deduplication device performance:
- Stagger your backup schedule to minimize the number of simultaneously active streams targeting the device.
- If the synthetic full operation takes too long to complete, run a full backup followed by daily incremental backup jobs instead.
- Avoid long backup cycles on the appliance. Performance may degrade as a cycle progresses because of the large volume of deduplicated data on the appliance.
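As a rough illustration of the first step, the following sketch compares the peak number of simultaneously active streams for two schedules. The function name, the tuple layout, and the sample job values are hypothetical and are not part of any appliance or backup software API; times are simply hours since the backup window opened.

```python
def peak_concurrent_streams(jobs):
    """Return the highest number of streams active at the same time.

    jobs: iterable of (start, end, streams) tuples. This layout is an
    assumption made for the example, not a product data format.
    """
    events = []
    for start, end, streams in jobs:
        events.append((start, streams))   # streams open when the job starts
        events.append((end, -streams))    # streams close when the job ends
    # Sort by time; at equal times, closes (negative) sort before opens,
    # so a job ending at hour 4 does not overlap one starting at hour 4.
    events.sort(key=lambda e: (e[0], e[1]))

    active = 0
    peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


# Three jobs of four streams each, all started together ...
simultaneous = [(0, 4, 4), (0, 4, 4), (0, 4, 4)]
# ... versus the same jobs started one after another.
staggered = [(0, 4, 4), (4, 8, 4), (8, 12, 4)]

print(peak_concurrent_streams(simultaneous))  # 12
print(peak_concurrent_streams(staggered))     # 4
```

A check like this can help compare candidate schedules before changing them, but it assumes job durations are known in advance, which is often only approximately true.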
Synthetic Full Backups onto Tape Media
When using tape libraries with synthetic full backups:
- Keep at least two media drives available for the same storage policy when the backup job starts. Synthetic full backups do not work for subclients where the storage policy is associated with a single, stand-alone tape drive.
The synthetic full backup process reads and writes data concurrently. It reads data from one tape, then immediately writes the result to the new active tape within the same media group. This flow requires two separate media, and therefore two drives.
- For synthetic full backups with multiple streams, where the number of streams is greater than the number of drives in the tape library, run only one synthetic full job to the library at a time.
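The two tape guidelines above can be summarized as a simple pre-flight check. This is a minimal sketch only: the function name, its parameters, and the returned dictionary are hypothetical and do not correspond to any product API.

```python
def check_tape_synthetic_full(available_drives, streams):
    """Hypothetical pre-flight check based on the guidelines above."""
    if available_drives < 2:
        # One drive must read the source tape while another writes the result.
        return {
            "can_run": False,
            "max_concurrent_jobs": 0,
            "reason": "at least two drives are needed for the same storage policy",
        }
    if streams > available_drives:
        # More streams than drives: keep to a single synthetic full job.
        return {
            "can_run": True,
            "max_concurrent_jobs": 1,
            "reason": "more streams than drives; run one synthetic full job at a time",
        }
    # None means these guidelines impose no specific job limit.
    return {
        "can_run": True,
        "max_concurrent_jobs": None,
        "reason": "drives are sufficient for the requested streams",
    }


print(check_tape_synthetic_full(available_drives=1, streams=1)["can_run"])              # False
print(check_tape_synthetic_full(available_drives=2, streams=4)["max_concurrent_jobs"])  # 1
```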
Recommendations for Scheduling Synthetic Full Backups
You can schedule a synthetic full backup as often as you need to, but we strongly recommend against running synthetic full backup jobs frequently, especially with deduplicated data. For deduplicated data, each block of data retained in protected storage has a record in the Deduplication Database (DDB). Full backup jobs and synthetic full backup jobs create new DDB records for each data block. Because a DDB has a limit on the number of records it can hold, frequent full backup jobs or synthetic full backup jobs fill up the DDB more quickly.
Several factors need to be considered when determining frequency of synthetic full backup jobs:
- Space management: A synthetic full backup affects data retention. If a large number or volume of deleted or archived objects is being retained, you may want to run a synthetic full backup to free up protected disk space. Backups stop if protected storage runs out of free disk space.
- Resource management: Consider whether sufficient source or network resources are available to run a full backup job.
- Compliance: Legal compliance requirements of data retention may affect your decision. You may want to run a synthetic full backup so that you don’t retain objects longer than legally required.
- Scalability: Consider the number of objects in a synthetic full backup job. Frequent synthetic full backup jobs for a large number of objects may fill up the DDB too quickly.
- Restore performance (this pertains to tape media only): Full backup or synthetic full backup jobs consolidate data and can reduce load/seek time.
We recommend scheduling synthetic full backup jobs using the Automatic Schedule option, with the frequency in days set equal to or greater than the retention days of that data. For example, if your retention is set to 30 days, set the automatic scheduling day frequency value to 30 days or more. For data retention greater than 180 days, we recommend using 180 days for automatic scheduling.
Synthetic full backup jobs should be run at intervals of less than 15 days only for compliance reasons or where strict data retention requirements call for it.
Synthetic full backup jobs that are run more frequently than recommended here are considered excessive. Excessive synthetic full jobs are shown in the Health Report, which provides information on the overall wellness of the CommCell components.
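The scheduling rule above amounts to a small calculation, sketched here. The function and constant names are hypothetical; the result is the value you would enter manually as the day frequency in the Automatic Schedule option, not something read from or written to the software.

```python
# Cap taken from the recommendation above for retention greater than 180 days.
MAX_INTERVAL_DAYS = 180


def recommended_interval_days(retention_days):
    """Return the smallest recommended day frequency for a given retention."""
    if retention_days <= 0:
        raise ValueError("retention_days must be a positive number of days")
    # Match the retention period, but use 180 days for longer retention.
    return min(retention_days, MAX_INTERVAL_DAYS)


print(recommended_interval_days(30))   # 30
print(recommended_interval_days(365))  # 180
```

For retention of 180 days or less, the returned value is a lower bound: a longer interval also satisfies the recommendation.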
Last modified: 2/16/2018 7:11:19 PM