For each Azure Data Lake Storage account, you must configure a set of Linux computers to access the storage.
For more information about Azure Data Lake Storage support, see the Hadoop Azure Data Lake Support page at https://hadoop.apache.org/docs/current/hadoop-azure-datalake/index.html.
Prerequisites
- Obtain the following parameters for your Azure Data Lake Storage account:
| Parameter | Example |
|---|---|
| Azure ADL URI | adl://sampleuri.azuredatalakestore.net |
| Application ID or Client ID | 80a0aa0a-a000-0aa0-a000-a0000a000000 |
| Application or Authentication Key | aassssdgasdadsasa1aaasdfdssddasdasdasddad |
| OAuth 2.0 token endpoint | https://login.microsoftonline.com/aa111111-9996-9999-abcd-00aa0a00a0a0/oauth2/token |
For more information about service-to-service authentication, see the Service-to-service authentication with Azure Data Lake Storage using Azure Active Directory page at https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-service-to-service-authenticate-using-active-directory#create-an-active-directory-application.
- The Linux computers must have glibc version 2.14 or later.
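You can confirm the installed glibc version with ldd, which on most Linux distributions reports the glibc release it belongs to:
ldd --version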
Procedure
- Download and install Java 8 on the Linux computer. Java is required by Hadoop, not by Commvault, so this step is required.
This procedure assumes that the Java home path is /usr/java/default.
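To confirm that Java 8 is installed at the assumed home path, you can run:
/usr/java/default/bin/java -version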
- Download and extract the Hadoop software on the computer.
Verify that the Hadoop version is later than 3.0.0-alpha2.
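The following is a minimal sketch of this step, assuming Hadoop 3.0.3 and the /hadoop303 directory that the remaining steps use; the download URL points at the Apache release archive and may differ for other versions or mirrors:
mkdir -p /hadoop303 && cd /hadoop303
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.0.3/hadoop-3.0.3.tar.gz
tar -xzf hadoop-3.0.3.tar.gz
# Print the Hadoop version to confirm the extraction
/hadoop303/hadoop-3.0.3/bin/hadoop version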
- Edit the JAVA_HOME variable in the /hadoop303/hadoop-3.0.3/etc/hadoop/hadoop-env.sh file as follows:
export JAVA_HOME=/usr/java/default
- Edit the HADOOP_CLASSPATH variable in the /hadoop303/hadoop-3.0.3/etc/hadoop/hadoop-env.sh file as follows:
export HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/tools/lib/*
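After both hadoop-env.sh edits, you can check that the Azure Data Lake connector JAR from the tools directory is on the effective classpath; a quick check, assuming the Hadoop 3.0.3 layout used in this procedure:
# --glob expands the wildcard entries so that individual JARs are listed
/hadoop303/hadoop-3.0.3/bin/hadoop classpath --glob | tr ':' '\n' | grep -i datalake
# Expect an entry such as .../share/hadoop/tools/lib/hadoop-azure-datalake-3.0.3.jar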
- Add the following properties in the Hadoop core-site.xml file by substituting the values with the Azure Data Lake Storage parameters that you obtained earlier:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>Azure_ADL_URI</value>
  </property>
  <property>
    <name>dfs.adls.oauth2.access.token.provider.type</name>
    <value>ClientCredential</value>
  </property>
  <property>
    <name>dfs.adls.oauth2.refresh.url</name>
    <value>OAuth_2.0_token_endpoint</value>
  </property>
  <property>
    <name>dfs.adls.oauth2.client.id</name>
    <value>Application_ID_or_Client_ID</value>
  </property>
  <property>
    <name>dfs.adls.oauth2.credential</name>
    <value>Application_or_Authentication_Key</value>
  </property>
  <property>
    <name>fs.adl.impl</name>
    <value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.adl.impl</name>
    <value>org.apache.hadoop.fs.adl.Adl</value>
  </property>
</configuration>
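For illustration, this is how the four placeholder values look when filled in with the sample values from the prerequisites table; the provider type stays ClientCredential, and the two fs.adl implementation properties are always used verbatim:
<property>
  <name>fs.default.name</name>
  <value>adl://sampleuri.azuredatalakestore.net</value>
</property>
<property>
  <name>dfs.adls.oauth2.refresh.url</name>
  <value>https://login.microsoftonline.com/aa111111-9996-9999-abcd-00aa0a00a0a0/oauth2/token</value>
</property>
<property>
  <name>dfs.adls.oauth2.client.id</name>
  <value>80a0aa0a-a000-0aa0-a000-a0000a000000</value>
</property>
<property>
  <name>dfs.adls.oauth2.credential</name>
  <value>aassssdgasdadsasa1aaasdfdssddasdasdasddad</value>
</property>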
- Add the Hadoop bin directories to the PATH environment variable in your profile or bashrc file:
export HADOOP_HOME=/hadoop303/hadoop-3.0.3
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
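To apply the change to your current shell and confirm that the Hadoop commands resolve, you can run the following, assuming the variables were added to ~/.bashrc:
source ~/.bashrc
which hdfs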
- Verify that you are able to access the Azure Data Lake Storage by using the following HDFS commands:
hdfs dfs -ls /
Found 3 items
drwxrwx---+  - 4902e6ff-70a5-494a-bd19-2edc6966ae64 4902e6ff-70a5-494a-bd19-2edc6966ae64          0 2017-12-15 07:04 /cluster
drwxrwxr-x+  - 01863d38-90b3-4c6e-aed7-a2049987545b 4902e6ff-70a5-494a-bd19-2edc6966ae64          0 2018-06-13 23:50 /my_data
drwxrwx---+  - 01863d38-90b3-4c6e-aed7-a2049987545b 4902e6ff-70a5-494a-bd19-2edc6966ae64          0 2018-06-13 23:57 /restore
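To also verify write access, you can create and then remove a test directory; /connectivity_test is a hypothetical name, and the service principal must have write permission on the store:
hdfs dfs -mkdir /connectivity_test
hdfs dfs -rm -r /connectivity_test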