Getting Started with Hadoop

Step 1: Review the following requirements and supported features.

Step 2: Install the Hadoop Package on Data Access Nodes

Review each of the following topics to prepare for the installation and to select the best installation method for your environment.

Prepare the Installation on UNIX, Linux, and Macintosh Computers
Preinstallation Checklist for Hadoop on Linux
Installation Methods

Note

When you upgrade the existing data access nodes (and the master node) using push installation, the Hadoop package is installed automatically on all the nodes.

If you also want to use the data access nodes as MediaAgents, install the MediaAgent package on those nodes. For instructions, see Installing the MediaAgent.

Notes:

All participating data access nodes and the master node must be at the same service pack level.
All participating data access nodes and the master node must have synchronized system clocks.

Step 3: Configure All Nodes That You Want to Use for Backup and Restore Operations as HDFS Clients on the Hadoop Cluster, So That the Nodes Have Access to HDFS

Verify that the following commands run without errors on each node:

hdfs dfs -ls /
hadoop classpath --glob

Note: Verify that the Hadoop bin path is set in the root user's environment and that both commands succeed when run as the root user. Start the Commvault services from that same environment.
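The check described above can be scripted so that it is easy to repeat on every node. The following is a minimal sketch, not a Commvault-provided tool: the function names check_cmd and check_hdfs_client are hypothetical, and the script assumes that it is run as the root user in the same environment from which the Commvault services are started.

```shell
#!/bin/sh
# Hypothetical helper: run a command quietly and report whether it succeeded.
check_cmd() {
    if "$@" >/dev/null 2>&1; then
        echo "OK: $*"
    else
        echo "FAILED: $*"
        return 1
    fi
}

# Run the two verification commands from this step.
# Stops at the first failure and returns a non-zero status.
check_hdfs_client() {
    check_cmd hdfs dfs -ls / && check_cmd hadoop classpath --glob
}
```

After defining the functions in a root shell, call check_hdfs_client. A FAILED line usually means that the Hadoop bin directory is missing from the PATH of the root user or that the node cannot reach the cluster.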

Step 4: In Secure Hadoop Environments, Provide the Keytab File Location in the Configuration File on Data Access Nodes

For Kerberos authentication, a keytab file is used to authenticate to the Key Distribution Center (KDC). Add the keytab file location as a property in the hdfs-site.xml configuration file on all data access nodes, including the master node. The hdfs-site.xml file is located under the hadoop_installation_directory/conf/ directory.

Example:

<property>
  <name>hadoop.user.keytab.file</name>
  <value>/etc/krb5.keytab</value>
</property>

To change the default path of the keytab file, see Changing the Default Path of the Keytab File.
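To confirm on each node that the keytab property is present and points to a readable file, a small check such as the following may help. This is a sketch under stated assumptions: the function names keytab_from_conf and check_keytab are hypothetical, and the awk parsing assumes that the name and value elements each sit on their own line, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of hadoop.user.keytab.file
# from the configuration file given as the first argument.
# Assumes <name> and <value> are on separate lines (see the example).
keytab_from_conf() {
    awk '
        /<name>hadoop\.user\.keytab\.file<\/name>/ { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Hypothetical helper: verify that the property is set and that the
# keytab file it names is readable by the current user.
check_keytab() {
    conf=$1
    keytab=$(keytab_from_conf "$conf")
    if [ -z "$keytab" ]; then
        echo "hadoop.user.keytab.file not set in $conf"
        return 1
    fi
    if [ ! -r "$keytab" ]; then
        echo "keytab not readable: $keytab"
        return 1
    fi
    echo "keytab OK: $keytab"
}
```

For example, run check_keytab hadoop_installation_directory/conf/hdfs-site.xml as the root user on every data access node and on the master node, substituting the actual installation directory.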

Step 5: Prepare for Your First Backup and Restore

Open the CommCell Console.
Configure a storage device.
- To configure a disk library, see Disk Libraries - Getting Started.
- To configure a tape library, see Tape Libraries - Getting Started.
Create a pseudo-client for the Hadoop cluster.
Create an appropriate subclient based on your requirement.

Requirement	More Information
Back up Hadoop data	Creating a User-Defined Subclient for Backups
Archive Hadoop data	Creating a User-Defined Subclient for Archiving

Decide whether you want the following additional functionality:
- You want to be notified of events that require attention. For more information, see Alerts and Notifications - Overview.
- You want to report on and analyze critical data about your CommCell operations. For more information, see Reports Overview.
- You want to manage security. For more information, see User Account and Password Management - Getting Started.

Step 6: Run Your First Backup and Restore

Step 7: What to Do Next

Configure data retention and data aging. For more information, see Data Aging - Getting Started.