Hadoop (HDFS)

The Commvault software provides an integrated approach to backing up and archiving HDFS (Hadoop Distributed File System) data.

You install the Commvault software on Hadoop DataNodes or Hadoop client nodes. These nodes are referred to as data access nodes.

When you configure Hadoop, you designate one data access node as the master node. The master node is a control client that distributes backup and restore operations among the data access nodes, and it must always be available.
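The distribution role of the master node can be pictured as assigning backup items to data access nodes. The following Python sketch is purely illustrative: the node names, path names, and the round-robin strategy are assumptions for the example, not Commvault's actual scheduling logic.

```python
from itertools import cycle

def distribute_backup_tasks(paths, data_access_nodes):
    """Assign HDFS paths to data access nodes in round-robin order.

    Illustrative sketch only: a real scheduler may weight nodes by
    load or data locality; round-robin is an assumption here.
    """
    assignments = {node: [] for node in data_access_nodes}
    nodes = cycle(data_access_nodes)
    for path in paths:
        assignments[next(nodes)].append(path)
    return assignments

# Example with hypothetical node and path names.
plan = distribute_backup_tasks(
    ["/data/a", "/data/b", "/data/c", "/data/d"],
    ["node1", "node2"],
)
```

With two nodes and four paths, each node receives two of the paths, so the backup load is shared evenly.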

During backup and restore operations, file system namespace operations between the data access nodes and the Hadoop cluster are routed through the Hadoop NameNode, while the actual data transfer occurs directly between the Hadoop DataNodes and the data access nodes.

For Azure HDInsight clusters that use Azure Blob Storage as backend storage, we recommend using Cloud Apps to back up the Azure Blob Storage directly. For more information about Azure Blob Storage, see Overview: Azure Blob Storage.

Key Features

Simplified Data Management

Management of all the Hadoop data in your environment using the same console and infrastructure.

Distributed Backup and Restores

Distributed backups and restores that run in parallel on multiple data access nodes, for optimal sharing of the backup load.

Fault-Tolerant Model

Fault-tolerant model that redistributes task loads when a data access node fails.
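The fault-tolerant behavior described above can be sketched as moving a failed node's pending work to the surviving nodes. This is a hypothetical illustration; Commvault's actual redistribution logic is not public, and the node and path names are invented for the example.

```python
def redistribute_on_failure(assignments, failed_node):
    """Move a failed node's pending tasks to the surviving nodes.

    assignments: mapping of node name -> list of pending paths.
    Illustrative sketch only: real redistribution may consider
    partial progress and node load, which this ignores.
    """
    pending = assignments.pop(failed_node, [])
    survivors = list(assignments)
    if not survivors:
        raise RuntimeError("no data access nodes remain")
    for i, task in enumerate(pending):
        assignments[survivors[i % len(survivors)]].append(task)
    return assignments

# Hypothetical plan: node3 fails mid-job and its tasks are reassigned.
plan = {"node1": ["/data/a"], "node2": ["/data/b"], "node3": ["/data/c", "/data/d"]}
plan = redistribute_on_failure(plan, "node3")
```

After the call, node3 is removed from the plan and its two paths are spread across node1 and node2, so the job can continue without it.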

Data Archiving

Archive and delete inactive Hadoop data from the primary disk storage based on user-defined policies.

Diverse Backup Types

Support for full, incremental, and synthetic full backup types for archive subclients.
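The difference between the three backup types can be sketched with a simple selection rule: a full backup copies everything, an incremental copies only files modified since the last backup, and a synthetic full is assembled from the last full plus later incrementals without re-reading the source. The function names, the mtime-based change detection, and the dict-merge model below are assumptions for illustration, not Commvault's implementation.

```python
def select_for_backup(files, backup_type, last_backup_time):
    """Pick the files a backup job would copy.

    files: mapping of path -> last-modified time (epoch seconds).
    Illustrative only: change detection via mtime comparison is an
    assumption; real products track changes in other ways too.
    """
    if backup_type == "full":
        return sorted(files)
    if backup_type == "incremental":
        return sorted(p for p, mtime in files.items() if mtime > last_backup_time)
    raise ValueError(f"unknown backup type: {backup_type}")

def synthesize_full(full_contents, *incremental_contents):
    """Merge a full backup with later incrementals into a synthetic full.

    Modeled as a dict merge: later incrementals overwrite earlier
    copies of the same path, so no source data is re-read.
    """
    merged = dict(full_contents)
    for inc in incremental_contents:
        merged.update(inc)
    return merged
```

For example, with files last modified at times 100 and 200 and a last backup at time 150, an incremental selects only the newer file, while a full selects both.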

LAN-Free Backup and Restores

Faster LAN-free backups and restores using a grid storage policy.

Restores to Big Data Application Targets

Support for restoring Hadoop data to a big data application target (any other file system).

Versioning

Support for multiple file versions, allowing you to select a specific version of a file for restore.

Data Recovery

Support for recovering data lost due to file deletion or corruption.

Reports

A variety of reports are automatically provided for managing the Hadoop data. You can access Reports from the Web Console, the Cloud Services site, or the CommCell Console.

Terminology

The Hadoop documentation uses the following terminology:

Pseudo-client

The logical entity that represents one or more Hadoop clusters.

Instance

The entity that represents one Hadoop cluster.

Subclient

The logical entity that defines the data to be backed up or archived.

Last modified: 4/16/2019 9:22:59 PM