Getting Started with Hadoop
Step 1: Review Requirements and Supported Features
Review the following requirements and supported features for Hadoop:
Step 2: Install the Hadoop Package on Data Access Nodes
Review each of the following topics to prepare for the installation and to select the best installation method for your environment.
- Prepare the Installation on UNIX, Linux, and Macintosh Computers
- Preinstallation Checklist for Hadoop on Linux
- Installation Methods
Note: When you upgrade the existing data access nodes (and the master node) using push installation, the Hadoop package is installed automatically on all the nodes.
If you also want to use the data access nodes as MediaAgents, install the MediaAgent package on the nodes. For instructions, see Installing the MediaAgent.
- All data access nodes and the master node that you include in backups must share the same Job Results directory. For information about changing the path of the Job Results directory, see Changing the Path of the Job Results Directory.
- All of the participating data access nodes and the master node must be at the same service pack level.
- All of the participating data access nodes and the master node must have their system clocks synchronized. A quick way to check this is shown below.
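As a quick check, you can confirm time synchronization on each node with standard Linux tooling. The commands below are a generic illustration (chrony or ntpd may be in use depending on your distribution) and are not Commvault-specific:

# On systemd-based distributions, check whether the system clock is synchronized
timedatectl status | grep -i synchronized
# On nodes running chrony or ntpd, the equivalent checks are
chronyc tracking
ntpstat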
Step 3: Configure All Nodes That You Want to Use for Backup and Restore Operations as HDFS Clients on the Hadoop Cluster
Configuring the nodes as HDFS clients gives them access to the HDFS file system.
Verify that you are able to run the following commands correctly without any errors:
hdfs dfs -ls /
hadoop classpath --glob
Note: Verify that the Hadoop bin path is set correctly in the environment for the root user and that you can run the above commands successfully as that user. (Start the Commvault services from the same environment.)
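For example, a minimal verification session as the root user might look like the following. The commands are the same ones listed above, preceded by a check that the Hadoop binaries are on the PATH:

# Run as the root user, from the same environment used to start the Commvault services
which hdfs hadoop        # confirms that the Hadoop bin path is set in the environment
hdfs dfs -ls /           # lists the HDFS root directory
hadoop classpath --glob  # prints the expanded Hadoop classpath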
Step 4: In Secure Hadoop Environments, Provide the Keytab File Location in the Configuration File on Data Access Nodes
For Kerberos authentication, a keytab file is used to authenticate to the Key Distribution Center (KDC). Add the keytab file location as a property in the hdfs-site.xml configuration file on all data access nodes, including the master node. The hdfs-site.xml file is located under the hadoop_installation_directory/conf/ directory.
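Before you reference the keytab in hdfs-site.xml, you can confirm that the file is usable for Kerberos authentication. The sketch below uses standard Kerberos client commands; the keytab path and principal are placeholders for your environment, and the exact property name to add to hdfs-site.xml depends on your Commvault version:

# Placeholder keytab path and principal -- substitute the values for your cluster
klist -kt /etc/security/keytabs/hdfs.headless.keytab                  # lists the principals stored in the keytab
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM # obtains a Kerberos ticket using the keytab
klist                                                                 # shows the ticket that was acquired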
Step 5: Prepare for Your First Backup and Restore
- Open the CommCell Console.
- Configure a storage device.
- Create a pseudo-client for the Hadoop cluster.
- Create an appropriate subclient based on your requirement:
  - Back up Hadoop data
  - Archive Hadoop data
- Decide whether you want the following additional functionality:
  - To be notified of events that require attention. For more information, see Alerts and Notifications - Overview.
  - To report on and analyze critical data about your CommCell operations. For more information about reports, see Reports Overview.
  - To manage security. For more information, see User Account and Password Management - Getting Started.
Step 6: Run Your First Backup and Restore
Step 7: What to Do Next
Configure data retention and data aging. For more information, see Data Aging - Getting Started.