GlusterFS

GlusterFS is a scalable network file system suitable for data-intensive tasks such as cloud storage and media streaming.

A traditional recursive scan of a scalable file system such as GlusterFS can take a considerable amount of time because it must traverse every directory on the volume.

The Commvault software backs up GlusterFS volumes by using the GlusterFind API. The GlusterFind API retrieves a full or incremental list of the files and directories that are present or modified on a given GlusterFS volume.
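As an illustration of what a full or incremental list can look like, the following sketch parses a change list in the style written by the upstream glusterfind tool. It assumes one record per line, a leading keyword of NEW, MODIFY, DELETE, or RENAME, and URL-encoded paths. The names Change, parse_change_list, and paths_to_back_up are hypothetical and are not part of the Commvault software, and this page does not describe how Commvault itself processes the list.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import unquote


@dataclass
class Change:
    kind: str                       # NEW, MODIFY, DELETE, or RENAME
    path: str                       # affected path (the old path for RENAME)
    new_path: Optional[str] = None  # destination path, RENAME only


def parse_change_list(text: str) -> List[Change]:
    """Parse a glusterfind-style change list (assumed format, see lead-in)."""
    changes = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        kind = fields[0]
        paths = [unquote(p) for p in fields[1:]]  # decode %2F, %20, ...
        if kind == "RENAME" and len(paths) == 2:
            changes.append(Change(kind, paths[0], paths[1]))
        elif kind in ("NEW", "MODIFY", "DELETE") and len(paths) == 1:
            changes.append(Change(kind, paths[0]))
        else:
            raise ValueError(f"unrecognized record: {line!r}")
    return changes


def paths_to_back_up(changes: List[Change]) -> List[str]:
    """Return the paths whose current contents an incremental job would read."""
    result = []
    for change in changes:
        if change.kind in ("NEW", "MODIFY"):
            result.append(change.path)
        elif change.kind == "RENAME":
            result.append(change.new_path)
        # DELETE records have nothing to read from the volume
    return result


# Hypothetical sample data, not output captured from a real volume.
SAMPLE = (
    "NEW dir1%2Freport.txt\n"
    "MODIFY dir1%2Fdata.bin\n"
    "RENAME old.txt new.txt\n"
    "DELETE tmp%20file\n"
)

if __name__ == "__main__":
    for path in paths_to_back_up(parse_change_list(SAMPLE)):
        print(path)
```

Because only the listed paths are read, the cost of an incremental scan in this model depends on the number of changes rather than on the total number of files on the volume.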

Install the Commvault software on one or more GlusterFS Client nodes. These nodes are referred to as data access nodes. The distributed backup and restore operations run on the data access nodes.

Key Features

Simplified Data Management

Management of all the GlusterFS data in your environment using the same console and infrastructure.

GlusterFS Scan Optimization using GlusterFind API

Support for GlusterFS scan optimization using the GlusterFind API, which avoids a recursive scan of the volume.

Performance Comparison

The following table compares performance metrics between GlusterFind scan and recursive scan on Linux.

Job Type	Number of Files	GlusterFind Scan Time (hh:mm:ss)	Recursive Scan Time (hh:mm:ss)
Incremental backup job	32,000,039 existing files + 106,695 new files added	01:43:12	29:30:11

Note

Commvault runs the GlusterFind full query for full backup jobs and for incremental backup jobs that synchronize the data on the source volume with the index. Because the full query lists every file and directory on the volume, it takes longer to complete than an incremental query.
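To put the scan times in the performance comparison table in perspective, the following sketch converts both values to seconds and computes the ratio. It assumes that the times are in hh:mm:ss format. The helper name to_seconds is only illustrative.

```python
def to_seconds(hhmmss: str) -> int:
    """Convert an hh:mm:ss duration string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Scan times for the incremental backup job in the table.
glusterfind_s = to_seconds("01:43:12")   # 6192 seconds
recursive_s = to_seconds("29:30:11")     # 106211 seconds

speedup = recursive_s / glusterfind_s

if __name__ == "__main__":
    print(f"GlusterFind scan: {glusterfind_s} s")
    print(f"Recursive scan:   {recursive_s} s")
    print(f"Recursive scan took about {speedup:.1f}x as long")  # about 17.2x
```

For this single measurement, the recursive scan took roughly 17 times as long as the GlusterFind scan. Results on other volumes will depend on the file count and the change rate.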

Distributed Backup and Restores

Distributed backup and restore operations that run in parallel on multiple data access nodes to share the backup load.

Fault-Tolerant Model

Fault-tolerant model that redistributes task loads when a data access node fails.
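The idea of sharing work across data access nodes and redistributing it after a node failure can be sketched as follows. This is a simplified model that uses round-robin assignment and a least-loaded reassignment rule. Both rules, and the names distribute and redistribute, are assumptions made for illustration and do not describe the scheduling logic that the Commvault software actually uses.

```python
from typing import Dict, List


def distribute(items: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Assign work items to nodes in round-robin order."""
    if not nodes:
        raise ValueError("at least one data access node is required")
    assignment: Dict[str, List[str]] = {node: [] for node in nodes}
    for index, item in enumerate(items):
        assignment[nodes[index % len(nodes)]].append(item)
    return assignment


def redistribute(assignment: Dict[str, List[str]],
                 failed_node: str) -> Dict[str, List[str]]:
    """Move the failed node's items to the surviving nodes."""
    survivors = [node for node in assignment if node != failed_node]
    if not survivors:
        raise RuntimeError("no data access nodes are left")
    # Copy the surviving assignments so that the input is not modified.
    updated = {node: list(assignment[node]) for node in survivors}
    for item in assignment.get(failed_node, []):
        # Give each orphaned item to the currently least-loaded survivor.
        target = min(survivors, key=lambda node: len(updated[node]))
        updated[target].append(item)
    return updated


if __name__ == "__main__":
    # Hypothetical directory names and node names.
    work = [f"/gluster/vol1/dir{i}" for i in range(6)]
    plan = distribute(work, ["node1", "node2", "node3"])
    plan = redistribute(plan, "node2")
    for node, items in plan.items():
        print(node, items)
```

In this model no work item is lost when a node fails; the remaining nodes absorb its share.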

LAN-Free Backup and Restores

Faster LAN-free backup and restores using a grid storage policy.

Reports

A variety of reports are automatically provided for managing the GlusterFS data. You can access Reports from the Web Console, the Cloud Services site, or the CommCell Console.

Terminology

The GlusterFS documentation uses the following terminology:

Pseudo-client	The logical entity that represents one or more GlusterFS clusters.
Instance	The entity that represents one GlusterFS cluster.
Subclient	The logical entity that defines the data to be backed up.
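The relationship between these terms can be modeled as a small hierarchy. The sketch below assumes that a pseudo-client contains instances and that each instance contains subclients. This nesting, and all class and field names, are illustrative assumptions and are not the Commvault object model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subclient:
    """Defines the data to be backed up."""
    name: str
    content_paths: List[str]


@dataclass
class Instance:
    """Represents one GlusterFS cluster."""
    cluster_name: str
    subclients: List[Subclient] = field(default_factory=list)


@dataclass
class PseudoClient:
    """Represents one or more GlusterFS clusters."""
    name: str
    instances: List[Instance] = field(default_factory=list)

    def all_content_paths(self) -> List[str]:
        """Collect the content paths of every subclient in every instance."""
        return [
            path
            for instance in self.instances
            for subclient in instance.subclients
            for path in subclient.content_paths
        ]


if __name__ == "__main__":
    # Hypothetical names and paths.
    client = PseudoClient(
        name="gluster-prod",
        instances=[
            Instance("cluster-a", [Subclient("default", ["/mnt/vol1"])]),
            Instance("cluster-b", [Subclient("media", ["/mnt/vol2/media"])]),
        ],
    )
    print(client.all_content_paths())
```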