Troubleshooting for Cluster Configuration

This document provides information and resolutions for the following troubleshooting scenarios:

  • Backup fails during index restore due to MediaAgent failover

  • Cluster configuration fails for one or more remote cluster nodes

  • Executable application errors during failover

  • Removal of cluster configuration from Client Completed with Errors

  • Services stop running after a failover on Linux clusters

Backup fails during index restore due to MediaAgent failover

A backup may fail if the MediaAgent enters the failover state while the scan phase is restoring an index from media. The job's failure reason states that the index restore could not complete because of the failover.

In such a situation, terminate the job and perform another backup.

Cluster configuration fails for one or more remote cluster nodes

Symptom

After configuring the cluster group client, a message is displayed indicating that some remote cluster nodes failed to be configured with the cluster group client.

Cause

The cluster group configuration may fail to update a remote cluster node if the node has any of the following issues:

  • Network-related problems (node not reachable)

  • Commvault services not running
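Both causes above can be checked before retrying the configuration. The following is a minimal sketch, not a Commvault tool: it tests whether a TCP connection can be opened to each node. The port number 8400 and the function names are assumptions made for illustration; use the port that the Commvault services listen on in your environment. A refused or timed-out connection points to either a network problem or services that are not running, but does not tell the two apart.

```python
import socket

# Assumption: 8400 stands in for the port the Commvault communication
# service listens on; replace it with the port used in your environment.
DEFAULT_PORT = 8400


def node_reachable(host, port=DEFAULT_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable_nodes(nodes, port=DEFAULT_PORT, timeout=3.0):
    """Return the host names from nodes that fail the connection check."""
    return [n for n in nodes if not node_reachable(n, port, timeout)]
```

For example, calling unreachable_nodes with the host names of the remote cluster nodes returns the nodes to investigate before forcing the configuration.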

Resolution

In the cluster group client properties dialog box, you must associate the remote cluster nodes with the cluster group.

Use the following steps to force the cluster configuration on the remote cluster nodes. Note that this operation creates an update request for the client in the CommServe database.

  1. From the CommCell Browser, go to the Client Computers node, right-click the Cluster_Group_Client and then click Properties.

    The Client Computer Properties dialog box appears.

  2. Click Advanced.

    The Advanced Client Properties dialog box appears.

  3. On the Cluster Group Configuration tab, select the Force Sync configuration on remote nodes check box, and then click OK.

Executable application errors during failover

Executable application errors have been observed on the originating active node during a failover. Once the new node takes over, the jobs continue and complete. On rare occasions, an archiveindex.exe application error may corrupt the index so that the backup cannot recover. In such a situation, kill the job and perform another backup.

Removal of cluster configuration from Client Completed with Errors

When a client computer is removed from the cluster group, all cluster settings are removed from the CommServe and from the client computer. If the CommServe fails to remove the cluster settings from the client, use the following steps to resolve this issue:

  1. Add the client back to the cluster group.

  2. Resolve the issues on the client that caused the failure. To identify them, check the failure reason stated in the error message received during the cluster configuration update. For example, <Connection to remote machine [machine_host_name] refused. Please check that the services are running on the remote machine.>

  3. Remove the client again from the cluster group.
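When several nodes are involved, it can help to pull the host name out of the failure reason in step 2. The following is a small sketch that matches only the wording of the example message quoted above; other failure reasons may be worded differently, and the function name is hypothetical.

```python
import re

# Pattern derived from the example message in step 2 only.
_REFUSED = re.compile(r"Connection to remote machine \[([^\]]+)\] refused")


def refused_host(message):
    """Return the host name from a 'connection refused' message, or None."""
    match = _REFUSED.search(message)
    return match.group(1) if match else None
```

A result of None means the message does not follow the quoted wording and must be read manually.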

Services stop running after a failover on Linux clusters

During a failover on Linux clusters, services on the node that takes over may be killed by the cluster services. To ensure that the services are restarted on the new node, add commands that start the services to the failover scripts.
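As an illustration, a failover script could call a small helper such as the one sketched below. The command "commvault start" is an assumption about how the services are started on the node, and the function name and timeout are hypothetical; replace them with whatever your installation and cluster software require.

```python
import subprocess
import sys

# Assumption: "commvault start" starts the services on this node.
# Replace it with the start command for your installation.
START_COMMAND = ("commvault", "start")


def start_services(command=START_COMMAND, timeout=300):
    """Run the start command; return True only if it exits with status 0."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The command is missing, not executable, or did not finish in time.
        print(f"could not start services: {exc}", file=sys.stderr)
        return False
    if result.returncode != 0:
        print(result.stderr.strip(), file=sys.stderr)
    return result.returncode == 0
```

A failover script would typically call start_services() after the node takes over and exit with a non-zero status when it returns False, so that the cluster software can report the failed start.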