Backup Support for Azure Data Lake Storage Using the Hadoop (HDFS) Agent

To back up Azure Data Lake Storage (ADLS) using the Hadoop (HDFS) Agent, you must configure a set of Linux computers to access each Azure Data Lake Storage. For instructions, see Configuring the Linux Computer to Access the Azure Data Lake Storage using Hadoop (HDFS) Agent.

You can later increase or decrease the number of computers configured for each Azure Data Lake Storage, based on the scale of the data. The Linux computers can be hosted in the cloud or on premises. The Linux computers must have glibc version 2.14 or later.

To avoid extra network hops, you can also configure Linux MediaAgent computers to access the ADLS, provided that the Linux MediaAgent computers have glibc version 2.14 or later.
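The glibc requirement can be checked on a candidate Linux computer before you configure it. The following Python sketch is only an illustration and is not part of the product: the names MIN_GLIBC, parse_version, meets_minimum, and local_glibc_version are hypothetical, and it assumes that Python 3 is available on the computer. It compares version numbers as integers, so that 2.9 is correctly treated as older than 2.14.

```python
import platform
import re

# Minimum glibc version stated in this topic.
MIN_GLIBC = (2, 14)


def parse_version(text):
    """Convert a version string such as '2.17' into a tuple such as (2, 17)."""
    numbers = re.findall(r"\d+", text)
    return tuple(int(n) for n in numbers[:2])


def meets_minimum(version_text, minimum=MIN_GLIBC):
    """Return True if the given glibc version is at least the minimum."""
    return parse_version(version_text) >= minimum


def local_glibc_version():
    """Return the glibc version of the running interpreter, or None if unknown."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" and version else None


if __name__ == "__main__":
    found = local_glibc_version()
    if found is None:
        print("Could not determine the glibc version on this computer.")
    elif meets_minimum(found):
        print(f"glibc {found} meets the minimum of 2.14.")
    else:
        print(f"glibc {found} is older than the required 2.14.")
```

Run the script on each Linux computer or Linux MediaAgent that you plan to use. If the script cannot determine the version, the output of ldd --version reports the same information on most glibc-based distributions.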

For each Azure Data Lake Storage, you must create a Hadoop pseudo-client from the CommCell Console. The Linux computers that you configured to access the Azure Data Lake Storage are assigned as the master node and data access nodes for the Hadoop pseudo-client. For more information about the Hadoop (HDFS) Agent, see Hadoop (HDFS) Overview.

Unsupported Components

ACLs and timestamps of the data are not preserved.
The Archive and Delete option is not supported.
Extent-based backups are not supported for Azure Data Lake Storage. To disable extent-based backups, see Disabling Extent-Based Backups for Azure Data Lake.
