Configuring the Linux Computer to Access the Azure Data Lake Storage Using the Hadoop (HDFS) Agent

For each Azure Data Lake Storage account, you must configure a set of Linux computers to access that storage.

For more information about Azure Data Lake Storage support, see the Hadoop Azure Data Lake Support page at https://hadoop.apache.org/docs/current/hadoop-azure-datalake/index.html.

Prerequisites

  • Obtain the following parameters for your Azure Data Lake Storage:

    Parameter                             Example
    ---------------------------------     --------------------------------------------------------------------------------------
    Azure ADL URI                         adl://sampleuri.azuredatalakestore.net
    Application ID or Client ID           80a0aa0a-a000-0aa0-a000-a0000a000000
    Application or Authentication Key     aassssdgasdadsasa1aaasdfdssddasdasdasddad
    OAuth 2.0 token endpoint              https://login.microsoftonline.com/aa111111-9996-9999-abcd-00aa0a00a0a0/oauth2/token

For more information about service-to-service authentication, see the Service-to-service authentication with Azure Data Lake Storage using Azure Active Directory page at https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-service-to-service-authenticate-using-active-directory#create-an-active-directory-application.

  • The Linux computers must have a minimum glibc version of 2.14.
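
    To verify the glibc version that is installed on the computer, you can run either of the following commands:

    ldd --version | head -n 1
    getconf GNU_LIBC_VERSION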

Procedure

  1. Download and install Java 8 on the Linux computer. Java is required by Hadoop, not by Commvault, so this step is still required.

    This procedure assumes the Java home path is /usr/java/default.
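
    For example, on a RHEL or CentOS computer you might install OpenJDK 8 from the distribution repositories and point /usr/java/default at it. The package name and JDK path below are assumptions that vary by distribution and JDK vendor; some JDK packages (for example, the Oracle JDK RPM) create /usr/java/default automatically.

    # Install OpenJDK 8 (package name assumed for RHEL/CentOS; adjust for your distribution).
    sudo yum install -y java-1.8.0-openjdk-devel

    # Confirm the installed version.
    java -version

    # If /usr/java/default does not exist, create it as a symlink to the actual JDK home
    # (the JDK path below is an assumption; use the path on your computer).
    sudo mkdir -p /usr/java
    sudo ln -s /usr/lib/jvm/java-1.8.0-openjdk /usr/java/default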

  2. Download and extract Hadoop software on the computer.

    Verify that the Hadoop version is higher than 3.0.0-alpha2.
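
    For example, to download Hadoop 3.0.3 from the Apache archive and extract it to the /hadoop303 directory that the remaining steps use (the download mirror and target directory are examples only; use any mirror and location you prefer):

    mkdir -p /hadoop303
    cd /hadoop303
    # Download and extract the Hadoop 3.0.3 binary distribution.
    wget https://archive.apache.org/dist/hadoop/common/hadoop-3.0.3/hadoop-3.0.3.tar.gz
    tar -xzf hadoop-3.0.3.tar.gz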

  3. Edit the JAVA_HOME variable in the /hadoop303/hadoop-3.0.3/etc/hadoop/hadoop-env.sh file as follows:

    export JAVA_HOME=/usr/java/default
    
  4. Edit the HADOOP_CLASSPATH variable in the /hadoop303/hadoop-3.0.3/etc/hadoop/hadoop-env.sh file as follows:

    export HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/tools/lib/*
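
    If you prefer to make both hadoop-env.sh changes from the command line instead of editing the file manually, a minimal sketch (assuming the unmodified file shipped with the Hadoop 3.0.3 tarball, where both variables are commented out) is:

    HADOOP_ENV=/hadoop303/hadoop-3.0.3/etc/hadoop/hadoop-env.sh
    # Append the exports; definitions added at the end override the commented-out defaults.
    echo 'export JAVA_HOME=/usr/java/default' >> "$HADOOP_ENV"
    echo 'export HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/tools/lib/*' >> "$HADOOP_ENV"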
    
  5. Add the following properties in the hadoop core-site.xml file by substituting the values with the Azure Data Lake Storage parameters that you obtained earlier:

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>Azure_ADL_URI</value>
      </property>
      <property>
        <name>dfs.adls.oauth2.access.token.provider.type</name>
        <value>ClientCredential</value>
      </property>
      <property>
        <name>dfs.adls.oauth2.refresh.url</name>
        <value>OAuth_2.0_token_endpoint</value>
      </property>
      <property>
        <name>dfs.adls.oauth2.client.id</name>
        <value>Application_ID_or_Client_ID</value>
      </property>
      <property>
        <name>dfs.adls.oauth2.credential</name>
        <value>Application_or_Authentication_Key</value>
      </property>
      <property>
        <name>fs.adl.impl</name>
        <value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
      </property>
      <property>
        <name>fs.AbstractFileSystem.adl.impl</name>
        <value>org.apache.hadoop.fs.adl.Adl</value>
      </property>
    </configuration>
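
    After you save the file, you can optionally confirm that the XML is well formed, for example with xmllint (part of the libxml2 tools). The path below assumes the same configuration directory that is used in steps 3 and 4:

    xmllint --noout /hadoop303/hadoop-3.0.3/etc/hadoop/core-site.xml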
  6. Add the Hadoop bin directories to the PATH environment variable in your profile or bashrc file.

    export HADOOP_HOME=/hadoop303/hadoop-3.0.3
    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
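
    To confirm that the variables are picked up in your shell (assuming you added the lines to ~/.bashrc), reload the file and check that the Hadoop binaries resolve:

    source ~/.bashrc
    which hdfs
    hadoop version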
  7. Verify that you are able to access the Azure Data Lake Storage by using the following HDFS commands:

    hdfs dfs -ls /
    Found 3 items
    drwxrwx---+  - 4902e6ff-70a5-494a-bd19-2edc6966ae64 4902e6ff-70a5-494a-bd19-2edc6966ae64          0 2017-12-15 07:04 /cluster
    drwxrwxr-x+  - 01863d38-90b3-4c6e-aed7-a2049987545b 4902e6ff-70a5-494a-bd19-2edc6966ae64          0 2018-06-13 23:50 /my_data
    drwxrwx---+  - 01863d38-90b3-4c6e-aed7-a2049987545b 4902e6ff-70a5-494a-bd19-2edc6966ae64          0 2018-06-13 23:57 /restore
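
    If you also want to confirm write access, a quick check is to create and then remove a scratch directory. The directory name /adl_write_test is only an example:

    hdfs dfs -mkdir /adl_write_test
    hdfs dfs -put /etc/hosts /adl_write_test/
    hdfs dfs -ls /adl_write_test
    hdfs dfs -rm -r /adl_write_test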
    
