Creating a User-Defined Subclient to Archive Hadoop Data

You can create a user-defined subclient to manage and archive specific data.

Before You Begin

You can use wildcards to define the subclient content. For more information, see Wildcards for the UNIX File System Agent.
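Illustrative patterns like the following are typical of UNIX-style subclient content definitions (these paths and patterns are examples only; confirm the exact syntax that is supported in the wildcard reference):

```
/user/hive/warehouse/*.orc     All .orc files directly under the warehouse directory
/data/logs/log_file?.txt       log_file1.txt, log_file2.txt, and so on
/data/**/archive               Any directory named archive at any depth under /data
```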

Procedure

  1. From the CommCell Browser, expand Client Computers > pseudo-client > Big Data Apps.

  2. Right-click the instance that you want to create a subclient for, point to All Tasks, and then click New Subclient.

    The Subclient Properties dialog box appears.

  3. Specify the basic settings for the subclient:

    1. In the Subclient Name box, type a name.

    2. On the Data Access Nodes tab, select the data access nodes that you want to add to the subclient, and then click Add.

    3. On the Content tab, click Browse to select the directory or files that you want to archive, and then click Add.

      Note: The default subclient does not back up the content that you specify in user-defined subclients that are within the same instance.

    4. Select the Only backup files that qualify for archiving check box.

  4. Click Advanced.

    The Advanced Subclient Properties dialog box appears.

  5. On the Retention tab, specify the retention time period:

    1. Select the Extend storage policy retention check box.

      Restrictions:

      • After you select this option, you cannot disable Commvault OnePass for this subclient.

      • After you select this option, you can perform only incremental and synthetic full archive or backup operations for your subclient.

    2. Select the Archiver retention and Backup retention check boxes, and then set the time period for which you want to retain the archived items. For more information on retention options, see Retention Options for Archiving.

  6. To configure multiple streams for archiving, on the Performance tab, specify the number of data streams:

    1. In the Number of Data Readers box, enter the number of data streams.

      Notes:

      • For optimal sharing of the backup load, the number of data readers must be greater than the number of data access nodes.

      • The number of streams configured in the storage policy must be equal to or greater than the value entered in the Number of Data Readers box.

    2. Select the Allow multiple data readers within a drive or mount point check box.

    3. Click OK.
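The two notes in this step amount to a pair of constraints on the planned configuration. A minimal sketch in Python (not a Commvault API; the numbers are hypothetical) that checks a plan before you commit to it:

```python
def validate_stream_config(data_readers: int,
                           data_access_nodes: int,
                           storage_policy_streams: int) -> list:
    """Return a list of problems with a planned stream configuration."""
    problems = []
    # For optimal sharing of the backup load, the number of data readers
    # must exceed the number of data access nodes.
    if data_readers <= data_access_nodes:
        problems.append("Number of Data Readers must be greater than "
                        "the number of data access nodes.")
    # The storage policy must provide at least as many streams as readers.
    if storage_policy_streams < data_readers:
        problems.append("Storage policy streams must be equal to or "
                        "greater than the Number of Data Readers.")
    return problems

# A plan with 8 readers, 4 access nodes, and 8 storage policy streams
# satisfies both constraints.
print(validate_stream_config(8, 4, 8))  # prints []
```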

  7. On the Disk Cleanup tab, select the Enable Archiving with these rules check box to archive your subclient content.

    To archive your files, set the rules based on the file access time, modified time, or file size. For more information about archiving rules, see Archiving Rules for Hadoop Data.

  8. Under After Archiving, click Delete the file.

    The archiving job will fail if you do not select this option.

  9. On the Storage Device tab, select a storage policy from the Storage Policy list.

  10. To create a new storage policy, click Create Storage Policy, and then follow the instructions in the storage policy creation wizard.

  11. To perform LAN-free backups and restores, select a grid storage policy.

    For more information, see GridStor® (Alternate Data Paths) - Overview.

  12. Optional: Configure additional subclient options:

    Checking Redundancy for Files that Qualify for Archiving

    Setting Up Pre-processes and Post-processes

    Important: The pre-process and post-process scripts must be present on the master node.

    • On the Pre/Post Process tab:

      • In the PreBackup Process box, type the full path name for the script.

      • In the PostBackup Process box, type the full path name for the script.

      • To run the post backup process regardless of the job's outcome, select the Run Post Process for all attempts check box.
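A pre-process or post-process can be any executable script that the master node can run; the path that you type in the dialog box points to it. A hypothetical post-backup script in Python (the log location and the script's behavior are illustrative assumptions, not a Commvault requirement):

```python
#!/usr/bin/env python3
"""Hypothetical post-backup script: append a timestamped line to a log.

Place the script on the master node and type its full path in the
PostBackup Process box. An exit status of 0 signals success.
"""
import datetime
import os

# Assumed log location; in practice, use an absolute path that is
# writable on the master node.
LOG_FILE = os.environ.get("POSTPROCESS_LOG", "postprocess.log")

def main() -> int:
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(LOG_FILE, "a") as log:
        log.write(f"{stamp} post-backup process ran\n")
    return 0

if __name__ == "__main__":
    main()
```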

  13. Click OK.

A subclient with the content that you want to archive is created under the instance that you selected.
