The Commvault software provides an integrated approach to backing up and archiving HDFS (Hadoop Distributed File System) data.
You install the Commvault software on a Hadoop DataNode or a Hadoop Client Node. These nodes are referred to as data access nodes.
When you configure Hadoop, you specify one data access node as a master node. The master node must always be available. The master node is a control client that distributes the backup and restore operations among the data access nodes.
During backup and restore operations, the data access nodes communicate with the Hadoop cluster through the Hadoop NameNode for file system namespace operations. The actual data transfer occurs directly between the Hadoop DataNodes and the data access nodes.
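To make the master node's role concrete, the following is a minimal sketch of how a control client might divide a backup among data access nodes. It is not Commvault's actual scheduling logic, which this page does not describe. The function name `distribute_paths`, the node names, and the size-balancing rule are all assumptions chosen for illustration.

```python
import heapq
from typing import Dict, List, Tuple


def distribute_paths(
    paths_with_sizes: List[Tuple[str, int]],
    data_access_nodes: List[str],
) -> Dict[str, List[str]]:
    """Assign each HDFS path to the node with the fewest bytes assigned so far.

    Hypothetical helper: the balancing rule is an assumption, not the
    product's documented behavior.
    """
    if not data_access_nodes:
        raise ValueError("at least one data access node is required")

    # Min-heap of (assigned_bytes, node_name); the lightest node is popped first.
    heap = [(0, node) for node in data_access_nodes]
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {node: [] for node in data_access_nodes}

    # Place the largest items first so big files do not pile up on one node.
    for path, size in sorted(paths_with_sizes, key=lambda item: item[1], reverse=True):
        assigned, node = heapq.heappop(heap)
        plan[node].append(path)
        heapq.heappush(heap, (assigned + size, node))
    return plan


if __name__ == "__main__":
    example = [("/a", 100), ("/b", 60), ("/c", 50), ("/d", 10)]
    print(distribute_paths(example, ["dan1", "dan2"]))
```

In this sketch the master only plans the work. Each data access node would then read its assigned paths from the DataNodes, which matches the split between namespace traffic and data traffic described above.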
For an Azure HDInsight cluster that uses Azure Blob Storage as its backend storage, we recommend that you use Cloud Apps to back up the Azure Blob Storage directly. For more information about Azure Blob Storage, see Overview: Azure Blob Storage.
Simplified Data Management
Management of all the Hadoop data in your environment using the same console and infrastructure.
Distributed Backup and Restores
Distributed backup and restores, which run in parallel on multiple data access nodes, for optimal sharing of the backup load.
Fault-tolerant model that redistributes task loads when a data access node fails.
Archive and delete inactive Hadoop data from the primary disk storage based on user-defined policy.
Diverse Backup Types
Support for full, incremental, and synthetic full backup types for archive subclients.
LAN-Free Backup and Restores
Faster LAN-free backup and restores using a grid storage policy.
Restores to Big Data Application Targets
Support for restoring Hadoop data to a big data application target (any other file system).
Support for multiple file versions that allows selecting a specific version of a file for restore.
Support for recovering data lost due to file deletion or corruption.
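The archiving feature listed above removes inactive data from primary storage based on a user-defined policy. As a rough illustration of what such a policy might evaluate, the sketch below selects files that have not been accessed for a given number of days. The `FileEntry` type, the `select_inactive` function, and the idle-days rule are hypothetical. The actual policy options are not specified on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class FileEntry:
    """Hypothetical record for one HDFS file and its last access time."""
    path: str
    last_accessed: datetime


def select_inactive(files: List[FileEntry], now: datetime, max_idle_days: int) -> List[str]:
    """Return paths whose last access is older than the idle threshold.

    Assumed rule for illustration only: a file is 'inactive' when it has
    not been accessed within max_idle_days of 'now'.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return [entry.path for entry in files if entry.last_accessed < cutoff]


if __name__ == "__main__":
    files = [
        FileEntry("/logs/old", datetime(2018, 1, 1)),
        FileEntry("/data/new", datetime(2019, 4, 10)),
    ]
    print(select_inactive(files, now=datetime(2019, 4, 16), max_idle_days=90))
```

Passing `now` explicitly keeps the selection deterministic, which makes a policy like this easier to test before any data is deleted.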
A variety of reports are automatically provided for managing the Hadoop data. You can access Reports from the Web Console, the Cloud Services site, or the CommCell Console.
The Hadoop documentation uses the following terminology:
The logical entity that represents one or more Hadoop clusters.
The entity that represents one Hadoop cluster.
The logical entity that defines the data to be backed up or archived.
Last modified: 4/16/2019 9:22:59 PM