GlusterFS is a scalable network file system suitable for data-intensive tasks such as cloud storage and media streaming.
A traditional recursive scan on a scalable file system such as GlusterFS must walk every directory on the volume, which takes a considerable amount of time.
The Commvault software provides the integrated approach that you need to back up GlusterFS volumes by using the GlusterFind API. The GlusterFind API is used to retrieve a full or incremental list of files and directories that are present or modified on a given GlusterFS volume.
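To illustrate what a "list of files and directories that are present or modified" can look like, the following minimal Python sketch parses a change list in the style written by the upstream `glusterfind` tool. It assumes one change per line, a change type (`NEW`, `MODIFY`, `RENAME`, or `DELETE`) followed by one or two percent-encoded paths. The sample lines and the function names `parse_change_list` and `paths_to_back_up` are hypothetical, and this is not a description of how the Commvault software consumes the list internally.

```python
from urllib.parse import unquote_plus

# Illustrative sample only; not output captured from a real volume.
SAMPLE = """NEW file1
NEW dir1%2Ffile2
MODIFY dir3%2Fdir4%2Ftest3
RENAME test1 dir1%2Ftest1new
DELETE test2
"""


def parse_change_list(text):
    """Return a list of (change_type, paths) tuples, one per non-empty line."""
    changes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Assumed layout: TYPE, then one path (or two for RENAME), space-separated.
        change_type, *encoded = line.split(" ")
        paths = tuple(unquote_plus(path) for path in encoded)
        changes.append((change_type, paths))
    return changes


def paths_to_back_up(changes):
    """Collect paths that an incremental job would need to read (hypothetical rule)."""
    selected = []
    for change_type, paths in changes:
        if change_type in ("NEW", "MODIFY"):
            selected.append(paths[0])
        elif change_type == "RENAME":
            # For a rename, the data now lives at the second (new) path.
            selected.append(paths[-1])
    return selected


if __name__ == "__main__":
    for path in paths_to_back_up(parse_change_list(SAMPLE)):
        print(path)
```

Because the list already names each changed item, a backup job that starts from it does not need to visit unchanged directories.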
Install the Commvault software on one or more GlusterFS Client nodes. These nodes are referred to as data access nodes. The distributed backup and restore operations run on the data access nodes.
Key Features
Simplified Data Management
Management of all the GlusterFS data in your environment using the same console and infrastructure.
GlusterFS Scan Optimization using GlusterFind API
Scan optimization that uses the GlusterFind API to retrieve the list of new and modified files and directories, instead of recursively scanning the volume.
Performance Comparison
The following table compares the scan times of a GlusterFind scan and a recursive scan on Linux.
| Job Type | Number of Files | GlusterFind Scan Time | Recursive Scan Time |
|---|---|---|---|
| Incremental backup job | 32,000,039 existing files + 106,695 new files added | 01:43:12 | 29:30:11 |
Note
Commvault runs the GlusterFind full query for full and incremental backup jobs that synchronize the data on the source volume and the index. Therefore, the GlusterFind full query takes longer to complete.
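The figures in the performance comparison can be turned into a speedup ratio with a few lines of arithmetic. The sketch below assumes the scan times are expressed as hh:mm:ss; the helper name `to_seconds` is introduced only for this illustration.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the performance comparison table.
glusterfind_s = to_seconds("01:43:12")  # 6192 seconds
recursive_s = to_seconds("29:30:11")    # 106211 seconds

existing_files = 32_000_039
new_files = 106_695

speedup = recursive_s / glusterfind_s
changed_fraction = new_files / (existing_files + new_files)

if __name__ == "__main__":
    print(f"Recursive scan / GlusterFind scan: {speedup:.1f}x")
    print(f"Share of files that are new: {changed_fraction:.2%}")
```

Under that reading, the GlusterFind scan in this measurement is roughly 17 times faster than the recursive scan, for a job in which about 0.33% of the files are new.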
Distributed Backup and Restores
Distributed backup and restores, which run in parallel on multiple data access nodes, for optimal sharing of the backup load.
Fault-Tolerant Model
Fault-tolerant model that redistributes task loads when a data access node fails.
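The general idea of sharing work across data access nodes and redistributing it after a node failure can be sketched as follows. This is a simplified illustration, not the Commvault scheduling algorithm: the round-robin assignment, the least-loaded redistribution rule, and the names `distribute` and `redistribute` are all assumptions made for the example.

```python
def distribute(tasks, nodes):
    """Assign tasks to nodes in round-robin order."""
    if not nodes:
        raise ValueError("at least one data access node is required")
    assignment = {node: [] for node in nodes}
    for index, task in enumerate(tasks):
        assignment[nodes[index % len(nodes)]].append(task)
    return assignment


def redistribute(assignment, failed_node):
    """Return a new assignment with the failed node's tasks moved to survivors."""
    # Copy the task lists so the caller's assignment is left untouched.
    survivors = {
        node: list(tasks)
        for node, tasks in assignment.items()
        if node != failed_node
    }
    if not survivors:
        raise RuntimeError("no surviving data access nodes")
    for task in assignment.get(failed_node, []):
        # Hand each orphaned task to the node with the fewest tasks so far.
        target = min(survivors, key=lambda node: len(survivors[node]))
        survivors[target].append(task)
    return survivors


if __name__ == "__main__":
    plan = distribute([f"task-{i}" for i in range(6)], ["node-a", "node-b", "node-c"])
    print(redistribute(plan, "node-b"))
```

In this sketch no task is lost when a node fails; the remaining nodes each pick up part of the failed node's share.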
LAN-Free Backup and Restores
Faster LAN-free backup and restores using a grid storage policy.
Reports
A variety of reports are automatically provided for managing the GlusterFS data. You can access Reports from the Web Console, the Cloud Services site, or the CommCell Console.
Terminology
The GlusterFS documentation uses the following terminology:
| Term | Description |
|---|---|
| Pseudo-client | The logical entity that represents one or more GlusterFS clusters. |
| Instance | The entity that represents one GlusterFS cluster. |
| Subclient | The logical entity that defines the data to be backed up. |