
HyperScale Storage Pool - Resiliency - Best Practices

Best practices cover data integrity, security, resilience, performance, and efficiency. Commvault’s HyperScale software comes standard with data verification and encryption, which protect against bit rot and unauthorized access to data.

Resiliency is typically delivered through redundancy. The reasoning is that even the best component will eventually fail, but the probability that a second component of the same type fails shortly after the first is very low. In hardware, redundancy usually means a replica of the same component; in software, this is not necessarily the case. Commvault’s HyperScale architecture uses efficient methods of providing redundancy. For example, erasure coding can provide equal or greater resiliency against disk and node failures with much less disk overhead than data replication. Network bandwidth requirements are also expected to be lower with erasure coding than with replicated volumes.
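The disk-overhead comparison can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes a (k + m) erasure code stores k data fragments plus m parity fragments and survives the loss of any m fragments, and it uses 3-way replication as the comparison point, which the text above does not specify. The function names are hypothetical.

```python
from fractions import Fraction


def erasure_overhead(data_fragments: int, parity_fragments: int) -> Fraction:
    """Raw capacity consumed per unit of usable data for a (k + m) code."""
    return Fraction(data_fragments + parity_fragments, data_fragments)


def replication_overhead(copies: int) -> Fraction:
    """Raw capacity consumed per unit of usable data with full copies."""
    return Fraction(copies, 1)


# name -> (raw-to-usable ratio, number of lost fragments/copies survived)
# The replication row is an assumed comparison point, not from the text.
schemes = {
    "erasure (4 + 2)": (erasure_overhead(4, 2), 2),
    "erasure (8 + 4)": (erasure_overhead(8, 4), 4),
    "3-way replication": (replication_overhead(3), 2),
}

for name, (overhead, tolerated) in schemes.items():
    print(f"{name}: {float(overhead):.1f}x raw capacity, "
          f"survives {tolerated} lost fragments or copies")
```

Under these assumptions, (4 + 2) tolerates the same two losses as three full copies while consuming half the raw capacity (1.5x versus 3.0x).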

Each HyperScale node has a private cluster VLAN and a public VLAN. It is recommended that both VLANs have a minimum of 10 Gb/s of bandwidth. The private cluster VLAN carries inter-node traffic, while the public VLAN is used for communication with the CommServe and client computers. To ensure resilience, the available network interfaces on each node are bonded. Each node in the scale-out Commvault HyperScale reference architecture is configured with sufficient processing power and memory to support all required functionality while also allowing storage capacity to grow without affecting performance.
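One way to confirm that a bond actually provides redundancy is to check that it has at least two member interfaces. The sketch below is not a Commvault tool. It assumes a Linux node using the kernel bonding driver, which exposes bond members in `/sys/class/net/<bond>/bonding/slaves`; the bond name `bond0` and the function names are hypothetical.

```python
from pathlib import Path
from typing import List

SYSFS_NET = Path("/sys/class/net")  # standard Linux sysfs location


def bond_members(bond: str, sysfs_root: Path = SYSFS_NET) -> List[str]:
    """Return the interfaces enslaved to a bond, as listed by the kernel."""
    slaves_file = sysfs_root / bond / "bonding" / "slaves"
    # The file holds a whitespace-separated list, e.g. "eth0 eth1".
    return slaves_file.read_text().split()


def is_redundant(bond: str, sysfs_root: Path = SYSFS_NET, minimum: int = 2) -> bool:
    """True if the bond has enough members to survive one interface failure."""
    return len(bond_members(bond, sysfs_root)) >= minimum


if __name__ == "__main__":
    name = "bond0"  # hypothetical bond name; substitute the node's own
    try:
        print(name, "members:", bond_members(name), "redundant:", is_redundant(name))
    except FileNotFoundError:
        print(f"{name} is not a bonded interface on this host")
```

The `sysfs_root` parameter exists only so the check can be exercised against a test directory instead of a live system.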

From a setup perspective, Commvault HyperScale does not require a RAID controller because the data disks are treated as JBOD. However, the root disk, when not mirrored, is a single point of failure (SPOF) for the node. Although the HyperScale cluster continues to run if a single root disk fails, the failure takes the node down, which can trigger a re-balancing of data. Therefore, Commvault recommends the use of mirrored root disks. As a general rule, it is best to mirror, RAID, or erasure code at the disk level and to bond network interfaces, so that a single component failure does not escalate into a node-down condition. This is because an unavailable node is expensive from a data management perspective.

HyperScale software has the resiliency necessary to continue operating without impact when disk, node, and network components fail.

The following shutdown/reboot procedure and limits are recommended:

| Erasure Code | Block Size (Nodes / Block) | Resilience Limit | Comments / Description |
|---|---|---|---|
| (4 + 2) | 3 Nodes | 1 node per block | Shut down or reboot only one node at a time. Ensure Commvault services are up on the rebooted node before rebooting the next node in the block. |
| (4 + 2) | 6 Nodes | Up to 2 nodes per block | Can shut down or reboot up to two nodes at a time. Ensure Commvault services are up on the rebooted nodes before rebooting the next node in the block. |
| (8 + 4) | 3 Nodes | 1 node per block | Shut down or reboot only one node at a time. Ensure Commvault services are up on the rebooted node before rebooting the next node in the block. |
| (8 + 4) | 6 Nodes | Up to 2 nodes per block | Can shut down or reboot up to two nodes at a time. Ensure Commvault services are up on the rebooted nodes before rebooting the next node in the block. |
| (8 + 4) | 12 Nodes | Up to 4 nodes per block | Can shut down or reboot up to four nodes at a time. Ensure Commvault services are up on the rebooted nodes before rebooting the next node in the block. |
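The limits in the table above can be turned into a simple rolling-reboot plan. The sketch below is only an illustration: the lookup values are copied from the table, while the function name, the node names, and the batching approach are hypothetical. It does not perform any reboots or check service status.

```python
from typing import Dict, List, Tuple

# (erasure code, nodes per block) -> maximum nodes that may be down together.
# Values are taken from the table above.
RESILIENCE_LIMIT: Dict[Tuple[str, int], int] = {
    ("4+2", 3): 1,
    ("4+2", 6): 2,
    ("8+4", 3): 1,
    ("8+4", 6): 2,
    ("8+4", 12): 4,
}


def plan_reboot_batches(erasure_code: str, block_nodes: List[str]) -> List[List[str]]:
    """Split one block's nodes into batches no larger than its resilience limit."""
    key = (erasure_code, len(block_nodes))
    if key not in RESILIENCE_LIMIT:
        raise ValueError(f"no documented limit for {erasure_code} with {len(block_nodes)} nodes")
    limit = RESILIENCE_LIMIT[key]
    return [block_nodes[i:i + limit] for i in range(0, len(block_nodes), limit)]


# Hypothetical six-node block using (4 + 2) erasure coding.
nodes = [f"node{i}" for i in range(1, 7)]
for batch in plan_reboot_batches("4+2", nodes):
    # Between batches, confirm Commvault services are up on the rebooted nodes.
    print("reboot together:", batch)
```

For the six-node (4 + 2) block this yields three batches of two nodes each. An unlisted combination raises an error rather than guessing a limit.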

Last modified: 7/31/2019 8:39:09 PM