Frequently Asked Questions
Why is there no change in the size of data on my disk after archiving?
When you delete a file from HDFS, the file is first moved to the .Trash directory and is deleted permanently only later. Until then, the file continues to occupy disk space, so the data size on disk does not change. To have archived files removed from primary disk storage immediately, disable the Hadoop trash feature before you run an archive operation.
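As a minimal sketch, assuming a stock Apache Hadoop setup where trash is controlled through core-site.xml, the fragment below shows one way to disable the trash feature. The fs.trash.interval property is standard Hadoop and a value of 0 disables trash, but the file location and how the change is rolled out are assumptions; a cluster management console may manage this setting instead.

```xml
<!-- core-site.xml: illustrative fragment, place inside the existing <configuration> element -->
<property>
  <name>fs.trash.interval</name>
  <!-- Minutes that deleted files are retained in trash; 0 disables the trash feature -->
  <value>0</value>
</property>
```

Files that were moved to trash before this change still occupy space until they are purged, for example with the hdfs dfs -expunge command.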
Can I preserve HDFS file access timestamps during restores?
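Yes. Enable HDFS access time tracking by setting the dfs.namenode.accesstime.precision parameter in the hdfs-site.xml configuration file. The value is in milliseconds; a value of 0 disables access times, so it must be nonzero for access timestamps to be preserved.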
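The following fragment is a sketch of that setting, assuming a stock Apache Hadoop hdfs-site.xml. The value shown, 3600000 milliseconds (one hour), is the Apache Hadoop default and is used here only for illustration; the value appropriate for your cluster is not specified by this FAQ.

```xml
<!-- hdfs-site.xml: illustrative fragment, place inside the existing <configuration> element -->
<property>
  <name>dfs.namenode.accesstime.precision</name>
  <!-- Precision, in milliseconds, to which HDFS tracks file access times; 0 disables access times -->
  <value>3600000</value>
</property>
```

Changes to hdfs-site.xml typically take effect only after the NameNode is restarted.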
Are there any log files that help in troubleshooting?
Yes. The following log files, listed by node, provide information for troubleshooting:
- Master Node
- Logs tasks, data access nodes, streams, and how each task is divided. These entries mark the beginning of each phase.
- The DMC (Distributed Master Controller) process that runs on the client machine writes to this log file.
- Logs information about collect files, restart strings, and so on.
- Scan logs.
- Data Access Node
This is the controller log. It is written by the DMC process (started with the -ctrl command-line option), which communicates with the DMC process on the master node, receives the tasks to be performed, and distributes them among the workers.
- Logs information about the workers and the tasks distributed to them.
This is the worker log.
- Logs information about the actual backup of collect files.