Getting Started with Hadoop

Step 1: Review Requirements and Supported Features

Review the requirements and supported features for Hadoop before you begin.

Step 2: Install the Hadoop Package on Data Access Nodes

Review each of the following topics to prepare for the installation and to select the best installation method for your environment.

If you also want to use the data access nodes as MediaAgents, install the MediaAgent package on the nodes. For instructions, see Installing the MediaAgent.

Notes:

  • All data access nodes and the master node that you include in backups must share the same Job Results directory. For information about changing the path of the Job Results directory, see Changing the Path of the Job Results Directory.
  • All of the participating data access nodes and the master node must be at the same service pack level.
  • All of the participating data access nodes and the master node must be time-synchronized.
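The time-synchronization requirement can be spot-checked from the master node. A minimal sketch, assuming passwordless SSH from the master node and hypothetical hostnames `datanode1` and `datanode2`:

```shell
# Compare each data access node's clock against the master node's clock.
# Hostnames are placeholders; replace them with your actual node names.
master_epoch=$(date +%s)
for host in datanode1 datanode2; do
  node_epoch=$(ssh "$host" date +%s)
  drift=$(( node_epoch - master_epoch ))
  echo "$host drift: ${drift}s"
done
```

Large or growing drift values indicate that the nodes are not synchronized; configuring NTP on all nodes is the usual remedy.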

Step 3: Configure All Nodes that You Want to Use for Backup and Restore Operations as HDFS Clients on the Hadoop Cluster

Configuring the nodes as HDFS clients gives them access to the HDFS file system.

Verify that you are able to run the following commands correctly without any errors:

hdfs dfs -ls /
hadoop classpath --glob

Note: Verify that the Hadoop bin path is correctly set in the environment for the root user and that you can run the above commands successfully as the root user. (Start the Commvault services from the same environment.)
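One way to make the commands available to the root user is to export the Hadoop bin directory into the root environment. A sketch, assuming a hypothetical installation path of `/usr/local/hadoop`:

```shell
# Adjust HADOOP_HOME to your actual installation directory (the path here is an assumption).
export HADOOP_HOME=/usr/local/hadoop
export PATH="$HADOOP_HOME/bin:$PATH"

# Both verification commands should now succeed as root:
hdfs dfs -ls /
hadoop classpath --glob
```

Adding the same exports to the root user's shell profile keeps the environment consistent for the session that starts the Commvault services.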

Step 4: In Secure Hadoop Environments, Provide the Keytab File Location in the Configuration File on Data Access Nodes

For Kerberos authentication, a keytab file is used to authenticate to the Key Distribution Center (KDC). Add the keytab file location as a property in the hdfs-site.xml configuration file on all data access nodes, including the master node. The hdfs-site.xml file is located under the hadoop_installation_directory/conf/ directory.

Example:

<property>
  <name>hadoop.user.keytab.file</name>
  <value>/etc/krb5.keytab</value>
</property>
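Before running a backup, you can confirm that the property is in place and that the keytab itself is usable. A sketch with assumed paths; substitute your actual hadoop_installation_directory:

```shell
# Paths are assumptions; adjust them to your environment.
conf=/usr/local/hadoop/conf/hdfs-site.xml
grep -A1 'hadoop.user.keytab.file' "$conf"   # should print the <value> line with the keytab path
klist -kt /etc/krb5.keytab                   # lists the principals stored in the keytab
```

If `klist` shows no principals, the keytab was not generated correctly for this host, and Kerberos authentication to the KDC will fail.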

Step 5: Prepare for Your First Backup and Restore

  1. Open the CommCell Console.
  2. Configure a storage device.
  3. Create a pseudo-client for the Hadoop cluster.
  4. Create an appropriate subclient based on your requirement.

    Requirement              More Information
    Back up Hadoop data      Creating a User-Defined Subclient for Backups
    Archive Hadoop data      Creating a User-Defined Subclient for Archiving

  5. Decide whether you want the following additional functionality:

Step 6: Run Your First Backup and Restore

Step 7: What to Do Next

Configure data retention and data aging. For more information, see Data Aging - Getting Started.

Last modified: 3/2/2018 5:47:25 PM