Recovering From Stuck Failovers

Symptoms

Use this procedure when one or more of the following situations occur:

  • The failover progress shows no new activity in the Failover Assistant tab of the Process Manager associated with the SQL Server client.

  • Both the production and standby CommServe hosts are down and the Node Information in the Failover Assistant tab appears red or blank.

  • The production CommServe host is functional, but the Process Manager associated with the SQL Server client does not display any information in the Failover Assistant tab.

  • The failover is interrupted, and the subsequent rollback operation also fails. Services are not started on the production CommServe host.

  • An unplanned failover fails, and you must force a failover to a standby CommServe host.

    Caution

    Use extreme caution when forcing a failover to a standby CommServe host. This process can cause data loss because the SQL Replication job does not run automatically before the failover. Use this option only as a last resort, after all other options have failed.

Resolution - Windows

  1. Navigate to the following folder associated with the SQL client:

    \Program Files\Commvault\ContentStore2\Base

    Note

    ContentStore2 corresponds to the installation folder associated with the SQL client or Instance002. If you have installed the SQL client in a different instance, navigate to the corresponding folder.

  2. Run the following command:

    CvFailover -OpType ResetFailoverConfig -FailoverType Production -TargetNode ProductionSQLClient

  3. If the failover operation has neither succeeded nor failed and is still in progress, run the following command:

    CvFailover -OpType ResetFailoverOperation -FailoverType Production -TargetNode ProductionSQLClient

  4. Re-run the failover operation.

  5. If the failover operation continues to be stuck, run the following command:

    CvFailover -OpType ResetFailoverOperation -FailoverType Production -FailoverSubType forceUnplanned -TargetNode ProductionSQLClient

    Caution

    If there is a failure during a forced unplanned failover, contact Commvault Customer Support. Do not perform any manual or additional steps to recover from the failure.

Resolution - Linux

  1. Navigate to the following folder associated with the SQL client:

    /opt/commvault2/Base

    Note

    commvault2 corresponds to the installation folder associated with the SQL client or Instance002. If you have installed the SQL client in a different instance, navigate to the corresponding folder.

  2. Run the following command:

    ./CvFailover -OpType ResetFailoverConfig -FailoverType Production -TargetNode ProductionSQLClient

  3. If the failover operation has neither succeeded nor failed and is still in progress, run the following command:

    ./CvFailover -OpType ResetFailoverOperation -FailoverType Production -TargetNode ProductionSQLClient

  4. Re-run the failover operation.

  5. If the failover operation continues to be stuck, run the following command:

    ./CvFailover -OpType ResetFailoverOperation -FailoverType Production -TargetNode ProductionSQLClient -forceUnplannedFailover

    Caution

    If there is a failure during a forced unplanned failover, contact Commvault Customer Support. Do not perform any manual or additional steps to recover from the failure.
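
The Linux steps above can be sketched as a small dry-run script. This is an illustration only: it prints the documented commands in order and does not run CvFailover. The helper names cvfailover_cmd and print_recovery_plan and the variables CV_BASE and TARGET_NODE are hypothetical, and the default folder assumes that the path in step 1 is absolute (/opt/commvault2/Base).

```shell
#!/bin/sh
# Dry-run sketch: prints the documented Linux commands instead of running them.
# CV_BASE and TARGET_NODE are placeholders (assumptions); adjust them for your instance.
CV_BASE="${CV_BASE:-/opt/commvault2/Base}"
TARGET_NODE="${TARGET_NODE:-ProductionSQLClient}"

# Print one CvFailover command line for a production failover.
# $1 = OpType; any remaining arguments are appended as extra options.
cvfailover_cmd() {
    op_type="$1"
    shift
    printf './CvFailover -OpType %s -FailoverType Production -TargetNode %s' "$op_type" "$TARGET_NODE"
    for extra in "$@"; do
        printf ' %s' "$extra"
    done
    printf '\n'
}

# Print the recovery sequence in the order given by the numbered steps.
print_recovery_plan() {
    echo "cd $CV_BASE"
    echo "# Step 2: reset the failover configuration"
    cvfailover_cmd ResetFailoverConfig
    echo "# Step 3: only if the failover is still reported as in progress"
    cvfailover_cmd ResetFailoverOperation
    echo "# Step 4: re-run the failover operation"
    echo "# Step 5: last resort, only if the failover is still stuck"
    cvfailover_cmd ResetFailoverOperation -forceUnplannedFailover
}

print_recovery_plan
```

Review the printed commands before running any of them by hand. Step 5 is the forced unplanned failover and is subject to the cautions in this section.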

Additional Information

CvFailover Command Options

  • <OpType> can be one of the following operations:

    Failover - To perform a failover operation.

    GetFailoverConfig - To display the failover configuration status.

    ResetFailoverOperation - To reset a partial failover operation.

    ResetFailoverConfig - To reset an existing failover configuration. (Use in case of XML corruption.)

  • <Failover Type> can be one of the following failover types:

    Production - To perform a production failover.

    ProductionMaintenance - To perform a maintenance failover.

    MaintenanceFailback - To reset a maintenance failover.

    Test - To perform a test failover.

    TestFailback - To reset a test failover.

  • <Target Node Name> is the name of the target SQL client on the CommServe host to which the operation must fail over.

  • -forceUnplannedFailover can be used when the setup is stuck. To avoid major data loss, use it only when the target node is the node with the latest database.
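
As a hedged illustration of the option lists above, the following shell sketch checks the -OpType and -FailoverType values against the documented names before composing a command line. The function names are hypothetical, the script only prints the command, and it assumes that the maintenance failback value is spelled MaintenanceFailback. It does not check whether CvFailover accepts every combination of these values.

```shell
#!/bin/sh
# Sketch: validate option values against the documented lists, then print
# (not run) the resulting command line. Function names are illustrative.

is_valid_op_type() {
    case "$1" in
        Failover|GetFailoverConfig|ResetFailoverOperation|ResetFailoverConfig) return 0 ;;
        *) return 1 ;;
    esac
}

is_valid_failover_type() {
    case "$1" in
        Production|ProductionMaintenance|MaintenanceFailback|Test|TestFailback) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: build_cvfailover <OpType> <FailoverType> <TargetNode>
build_cvfailover() {
    if ! is_valid_op_type "$1"; then
        echo "unknown OpType: $1" >&2
        return 1
    fi
    if ! is_valid_failover_type "$2"; then
        echo "unknown FailoverType: $2" >&2
        return 1
    fi
    if [ -z "$3" ]; then
        echo "missing target node name" >&2
        return 1
    fi
    echo "./CvFailover -OpType $1 -FailoverType $2 -TargetNode $3"
}

build_cvfailover ResetFailoverConfig Production ProductionSQLClient
```

A misspelled value, such as an OpType that is not in the list, makes build_cvfailover return a nonzero status and print an error instead of a command.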
