To back up Azure Data Lake Storage (ADLS) using the Hadoop (HDFS) Agent, you must configure a set of Linux computers to access each Azure Data Lake Storage. See Configuring the Linux Computer to Access the Azure Data Lake Storage using Hadoop (HDFS) Agent.
You can later increase or decrease the number of computers configured for each Azure Data Lake Storage, based on the scale of the data. The Linux computers can be hosted in the cloud or on premises. The Linux computers must have glibc version 2.14 or later.
To avoid extra network hops, you can also configure Linux MediaAgent computers to access the ADLS, provided that the Linux MediaAgent computers have glibc version 2.14 or later.
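The glibc requirement can be checked on a candidate Linux computer before you configure it. The following is a minimal sketch, not part of the product: the function names are hypothetical, and it assumes that the computer runs Python 3 and uses glibc, so that `os.confstr("CS_GNU_LIBC_VERSION")` returns a string such as `glibc 2.31`.

```python
import os
import re

MIN_GLIBC = (2, 14)  # minimum glibc version stated in the requirements


def parse_glibc_version(text):
    """Extract (major, minor) from a string such as 'glibc 2.31'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=MIN_GLIBC):
    """Compare numerically, so that 2.9 is treated as older than 2.14."""
    return tuple(version) >= tuple(minimum)


def local_glibc_version():
    """Return this computer's glibc version, or None if it cannot be determined."""
    try:
        # Assumption: a glibc-based system; other C libraries raise ValueError here.
        reported = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        return None
    return parse_glibc_version(reported) if reported else None


if __name__ == "__main__":
    version = local_glibc_version()
    if version is None:
        print("Could not determine the glibc version on this computer.")
    elif meets_minimum(version):
        print("glibc %d.%d meets the minimum of 2.14." % version)
    else:
        print("glibc %d.%d is older than the required 2.14." % version)
```

The versions are compared as integer tuples rather than as strings or decimals, because a plain numeric or text comparison would wrongly rank 2.9 above 2.14.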
For each Azure Data Lake Storage, you must create a Hadoop pseudo-client from the CommCell Console. For more information about the Hadoop (HDFS) Agent, see Hadoop (HDFS) Overview. The Linux computers that you configured to access the Azure Data Lake Storage will be assigned as the master node and data access nodes for the Hadoop pseudo-client.
Unsupported Components
- ACLs and timestamps of the data are not preserved
- Archive and Delete option
- Extent-based backups are not supported for Azure Data Lake Storage. To disable extent-based backups, see Disabling Extent-Based Backups for Azure Data Lake.