Ensuring Fault Tolerance For The CommServe VM
The HyperScale 1.5 Appliance is a hyperconverged platform that runs the CommServe as a VM (Virtual Machine). The high availability features provided by RHV keep the VM highly available and allow it to fail over if one of the nodes in the appliance goes down. In addition, ensure that the following prerequisites are met.
- Each node must have an active network connection on the iRMC (Integrated Remote Management Console) port.
- The data protection network should be able to reach the iRMC network.
The data protection and iRMC networks should be routable to each other.
- If there is an external firewall, ensure that the required ports are open. For more information about the required ports, see Firewall Port Requirements.
- Negative affinity is enabled between the RHV VM and the CommServe VM to ensure that they are running on different HyperScale nodes.
In some cases, negative affinity may not be honored, and as a result both VMs become active on the same node. If this happens, migrate the CommServe VM using the Virtualization Manager (an RHV HTML interface) so that the VMs are active on different nodes. This ensures that a VM can fail over automatically if any of the nodes goes down.
For additional information about Affinity Groups, see https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/sect-affinity_groups.
- In some cases, the RHV VM does not become active on a different node when the node on which it is currently running is rebooted.
Check the vdsm.log to see if the following error is reported:
libvirtError: unsupported configuration: maximum vCPU count must not be less than current vCPU count
If the error occurs, follow the steps described in the KB article The Virtual Machine Fails to Start After Upgrading the Hosted Engine OVA Version From v4.2 to v4.3 to correct this error.
- In the HE (Hosted Engine) VM hosted on HS3300, the vCPU (Virtual CPU) count is set to 40, which can cause the HE migration to fail with the following error:
Maximum number of sockets exceeded
If this error occurs, follow the steps described in HE Migration Error to address the issue.
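The log checks described above can be scripted. The following is a minimal sketch, not a supported tool: it searches log lines for the two error strings quoted in this article. The function names and the default log path /var/log/vdsm/vdsm.log are assumptions made for illustration. This article names vdsm.log only for the vCPU count error, so scanning the same file for the sockets error is also an assumption.

```python
from typing import Iterable, List, Tuple

# Error strings quoted from this article.
KNOWN_ERRORS = (
    "maximum vCPU count must not be less than current vCPU count",
    "Maximum number of sockets exceeded",
)


def find_known_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, matched_error) for every line that contains a known error."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for error in KNOWN_ERRORS:
            if error in line:
                hits.append((number, error))
    return hits


def scan_log(path: str = "/var/log/vdsm/vdsm.log") -> List[Tuple[int, str]]:
    """Scan a log file for the known errors. The default path is an assumption."""
    # errors="replace" keeps the scan going if the log contains undecodable bytes.
    with open(path, errors="replace") as handle:
        return find_known_errors(handle)
```

For example, calling scan_log() on a node returns a list of (line number, error string) pairs. An empty list means that neither error string was found in that file.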
Last modified: 2/7/2020 10:22:24 PM