Frequently Asked Questions


How do backup and restore operations work if I have Ranger KMS encryption enabled on Hadoop data?

Commvault software uses the Hadoop native library to connect to Hadoop and to read and write data during backup and restore operations.

For backup operations, the Hadoop native library returns a single, decrypted copy of the data. During restore operations, when Commvault writes data into a secure (encryption) zone through the Hadoop native library, Hadoop encrypts the data automatically before storing it. Key rollovers are also handled automatically by Hadoop. To read or write data in secure zones, ensure that the Hadoop backup user has the DECRYPT_EEK privilege on all of the keys.

Why is there no change in the size of data on my disk after archiving?

When you delete a file from HDFS, the file is moved to the .Trash directory instead of being deleted permanently, so its disk space is not released right away. Therefore, disable the Hadoop trash feature before you run an archive operation so that archived files are deleted permanently from the primary disk storage.
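A minimal sketch of disabling the trash feature, assuming your cluster uses the standard Hadoop fs.trash.interval property in the core-site.xml configuration file (this property is not named in the original text). The value is the number of minutes that deleted files are kept in .Trash; a value of 0 disables the trash feature.

Example

<property>
  <!-- Assumption: standard Hadoop trash setting. 0 = trash disabled; deleted files are removed immediately. -->
  <name>fs.trash.interval</name>
  <value>0</value>
</property>

To check the value that HDFS clients pick up, you can run hdfs getconf -confKey fs.trash.interval.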

Can I preserve HDFS file access timestamps during restores?

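Yes. HDFS must be configured to record file access times. To do this, set the dfs.namenode.accesstime.precision parameter in the hdfs-site.xml configuration file to a value greater than 0. The value is the access-time precision in milliseconds; a value of 0 disables access times in HDFS.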

Example

<property>
  <!-- Access-time precision in milliseconds: 3600000 ms = 1 hour. 0 disables access times. -->
  <name>dfs.namenode.accesstime.precision</name>
  <value>3600000</value>
</property>

Are there any log files that help in troubleshooting?

The following log files, listed by node, provide information for troubleshooting:

  • Master Node

    1. Hadoop.log

      • Logs the tasks, data access nodes, and streams, and how each task is divided. The beginning of each phase is recorded here.

      • The DMC (Distributed Master Controller) process that runs on the client machine writes to this log file.

      • Logs information about collect files, restart strings, and so on.

    2. FileScan.log

      • Logs scan activity.

  • Data Access Node

    1. HadoopCtrl.log

      This is the controller log. It is written by the DMC process (started with the –ctrl command-line option), which communicates with the DMC process on the master node, receives the tasks to be performed, and then distributes the tasks among the workers.

      • Logs information about the workers and the tasks distributed to them.

    2. clBackupChild.log

      This is the worker log.

      • Logs details about the actual backup of the collect files.