For each Azure Data Lake Storage account, you must configure a set of Linux computers to access the storage.
For more information about Azure Data Lake Storage support, see the Hadoop Azure Data Lake Support page at https://hadoop.apache.org/docs/current/hadoop-azure-datalake/index.html.
Prerequisites
- Obtain the following parameters for your Azure Data Lake Storage account:
| Parameter | Example |
|---|---|
| Azure ADL URI | adl://sampleuri.azuredatalakestore.net |
| Application ID or Client ID | 80a0aa0a-a000-0aa0-a000-a0000a000000 |
| Application or Authentication Key | aassssdgasdadsasa1aaasdfdssddasdasdasddad |
| OAuth 2.0 token endpoint | https://login.microsoftonline.com/aa111111-9996-9999-abcd-00aa0a00a0a0/oauth2/token |
For more information about service-to-service authentication, see the Service-to-service authentication with Azure Data Lake Storage using Azure Active Directory page at https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-service-to-service-authenticate-using-active-directory#create-an-active-directory-application.
- The Linux computers must have glibc version 2.14 or later.
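You can confirm the installed glibc version with ldd, which on most Linux distributions reports the glibc release it belongs to:
ldd --version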
Procedure
- Download and install Java 8 on the Linux computer. Java is required by Hadoop, not by Commvault, so this step is required.
This procedure assumes that the Java home path is /usr/java/default.
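To confirm that Java 8 is installed at the assumed home path, you can run:
/usr/java/default/bin/java -version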
- Download and extract the Hadoop software on the computer.
Verify that the Hadoop version is later than 3.0.0-alpha2.
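The following is a minimal sketch of this step, assuming Hadoop 3.0.3 and the /hadoop303 directory that the remaining steps use; the download URL points at the Apache release archive and may differ for other versions or mirrors:
mkdir -p /hadoop303 && cd /hadoop303
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.0.3/hadoop-3.0.3.tar.gz
tar -xzf hadoop-3.0.3.tar.gz
# Print the Hadoop version to confirm the extraction
/hadoop303/hadoop-3.0.3/bin/hadoop version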
- Edit the JAVA_HOME variable in the /hadoop303/hadoop-3.0.3/etc/hadoop/hadoop-env.sh file as follows:
export JAVA_HOME=/usr/java/default
- Edit the HADOOP_CLASSPATH variable in the /hadoop303/hadoop-3.0.3/etc/hadoop/hadoop-env.sh file as follows:
export HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/tools/lib/*
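After both hadoop-env.sh edits, you can check that the Azure Data Lake connector JAR from the tools directory is on the effective classpath; a quick check, assuming the Hadoop 3.0.3 layout used in this procedure:
# --glob expands the wildcard entries so that individual JARs are listed
/hadoop303/hadoop-3.0.3/bin/hadoop classpath --glob | tr ':' '\n' | grep -i datalake
# Expect an entry such as .../share/hadoop/tools/lib/hadoop-azure-datalake-3.0.3.jar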
- Add the following properties in the Hadoop core-site.xml file by substituting the values with the Azure Data Lake Storage parameters that you obtained earlier:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>Azure_ADL_URI</value>
  </property>
  <property>
    <name>dfs.adls.oauth2.access.token.provider.type</name>
    <value>ClientCredential</value>
  </property>
  <property>
    <name>dfs.adls.oauth2.refresh.url</name>
    <value>OAuth_2.0_token_endpoint</value>
  </property>
  <property>
    <name>dfs.adls.oauth2.client.id</name>
    <value>Application_ID_or_Client_ID</value>
  </property>
  <property>
    <name>dfs.adls.oauth2.credential</name>
    <value>Application_or_Authentication_Key</value>
  </property>
  <property>
    <name>fs.adl.impl</name>
    <value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.adl.impl</name>
    <value>org.apache.hadoop.fs.adl.Adl</value>
  </property>
</configuration>
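For illustration, this is how the four placeholder values look when filled in with the sample values from the prerequisites table; the provider type stays ClientCredential, and the two fs.adl implementation properties are always used verbatim:
<property>
  <name>fs.default.name</name>
  <value>adl://sampleuri.azuredatalakestore.net</value>
</property>
<property>
  <name>dfs.adls.oauth2.refresh.url</name>
  <value>https://login.microsoftonline.com/aa111111-9996-9999-abcd-00aa0a00a0a0/oauth2/token</value>
</property>
<property>
  <name>dfs.adls.oauth2.client.id</name>
  <value>80a0aa0a-a000-0aa0-a000-a0000a000000</value>
</property>
<property>
  <name>dfs.adls.oauth2.credential</name>
  <value>aassssdgasdadsasa1aaasdfdssddasdasdasddad</value>
</property>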
- Add the Hadoop bin directories to the PATH environment variable in your profile or bashrc file:
export HADOOP_HOME=/hadoop303/hadoop-3.0.3
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
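To apply the change to your current shell and confirm that the Hadoop commands resolve, you can run the following, assuming the variables were added to ~/.bashrc:
source ~/.bashrc
which hdfs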
- Verify that you are able to access the Azure Data Lake Storage by using the following HDFS commands:
hdfs dfs -ls /
Found 3 items
drwxrwx---+  - 4902e6ff-70a5-494a-bd19-2edc6966ae64 4902e6ff-70a5-494a-bd19-2edc6966ae64          0 2017-12-15 07:04 /cluster
drwxrwxr-x+  - 01863d38-90b3-4c6e-aed7-a2049987545b 4902e6ff-70a5-494a-bd19-2edc6966ae64          0 2018-06-13 23:50 /my_data
drwxrwx---+  - 01863d38-90b3-4c6e-aed7-a2049987545b 4902e6ff-70a5-494a-bd19-2edc6966ae64          0 2018-06-13 23:57 /restore
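To also verify write access, you can create and then remove a test directory; /connectivity_test is a hypothetical name, and the service principal must have write permission on the store:
hdfs dfs -mkdir /connectivity_test
hdfs dfs -rm -r /connectivity_test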