There are restrictions and known issues for protecting Kubernetes with Commvault. Workarounds, if available, are included.
Restrictions
Supportability
-
Protection of etcd is not supported on Amazon EKS Distro (EKS-D).
-
Protection of Robin.io-bundled applications is not supported when protecting Robin.io Kubernetes clusters.
-
Use of the Rancher management endpoint is not supported.
As a workaround, add Rancher clusters using the control plane endpoint for the Kubernetes cluster. For more information, see Adding a Kubernetes Cluster.
Data You Cannot Back Up
-
Deprecated in-tree storage volume plug-ins (azureFile, cinder, fc (fibre channel), flocker, gitRepo, quobyte, storageOS)
-
Deprecated out-of-tree storage plug-ins (flexVolume)
-
In-tree storage volume plug-ins (cephfs, glusterfs, iscsi, nfs, portworxVolume, rbd)
-
Legacy in-tree volumes (awsElasticBlockStore, azureDisk, gcePersistentDisk)
-
Kubernetes PKI certificates stored in /etc/kubernetes/pki
-
KubeVirt.io-managed virtual machines
-
Robin.io-bundled applications and metadata
-
Windows containers in Kubernetes
Data You Cannot Restore
-
System namespaces (kube-system, kube-node-lease, kube-public) when the overwrite option is enabled
-
Namespaces that provide system-level shared services (such as ceph-rook, calico-apiserver, calico-system)
-
Out-of-place application or namespace recovery (another namespace, another cluster) of helm chart-deployed applications
-
Out-of-place application or namespace recovery to another Kubernetes cluster that is running a different major revision than the source cluster
-
Out-of-place application recovery with API resources/objects that have cluster-specific networking configuration (Endpoints, EndpointSlices, Services, Ingresses)
Note
Kubernetes resources that have an owner reference to a parent resource are not restored. The parent is restored, and then the parent creates the child resource. For example, for a Pod that is created by a Deployment, Commvault restores the Deployment (but not the Pod), and then the Deployment creates the Pod.
Known Issues
Guided Setup
-
When you complete the Kubernetes guided setup, add a Kubernetes cluster, or create a Kubernetes application group, you can create a Windows x86 64-bit access node, but you cannot create a Linux access node. To add a Linux access node, see Adding an Access Node for Kubernetes.
-
When you complete the Kubernetes guided setup or add a Kubernetes cluster, on the Add Cluster page, if you click the Previous button to review the server plan or access node, the cluster is not added and the cluster information is not retained.
Access Nodes
-
Installations with access nodes that run Commvault 11.24 and a CommServe server that runs Commvault Platform Release 2022E are not supported.
Adding a Cluster
-
After you add a Kubernetes cluster for protection, the kube-apiserver and control plane endpoint cannot be updated. To change the endpoint URL, you must delete all application groups, then the cluster, and then re-add a new cluster with the updated endpoint URL. Backup data that is under active retention can still be recovered.
-
Commvault does not support creating a volume-based protection application group that includes both file-mode and block-mode volumes. To protect applications that use both file-mode and block-mode volumes, select the entire namespace.
Adding a Cluster: Authentication
-
Commvault supports authentication and authorization with the default cluster-admin role or a custom ClusterRole created by the cvrolescript.sh script. Commvault does not support creating or using restricted ClusterRoles that have limited access to specific namespaces, applications, or API resources.
-
Commvault supports only service account tokens for authentication.
-
Commvault does not support X509 client certificates, static token files, bearer tokens, OpenID Connect tokens, or authenticating proxies for authentication.
Application Groups
-
When an application group uses application-centric protection to protect individual applications or content detected using label selectors, related API resources/objects are inferred from the application manifest or Helm chart. If an API resource/object exists in the application namespace but is not directly referenced, it is not protected.
To protect all API resources/objects in a namespace, use namespace-centric protection. For more information, see Namespace-Centric and Application-Centric Protection for Kubernetes.
-
When you click a failed or partially failed application group and then click Backup failed VMs, no failed Kubernetes applications are found to protect.
As a workaround, re-run the application group backup from the application group properties page.
-
Backups of applications or application groups with full or partial failure do not identify the source host; instead, they indicate that the problem occurred on host[].
-
Application or application group failures indicate that the access node (referred to as proxy) might not be able to communicate with the host. The destination host is not identified in the failure reason.
-
Application or application group failures might ask the user to validate that the selected transport mode is valid. However, there is no transport mode selection for Kubernetes backups.
References to transport mode in Kubernetes protection jobs can be safely ignored.
-
Clicking a failed or partially failed last backup on the Application group tab refers to failed entities as VMs instead of as Kubernetes applications.
As a workaround, re-run the application group backup from the application group properties page.
-
Exclusion filters can be defined only for API resources that exist at the time of application group creation. Label selectors cannot be used to define exclusion filters.
-
Exclusion filters operate at the level of an application, namespace, or volume. Exclusion of specific API resources (for example, excluding specific secrets via wildcard) is not supported.
-
When you specify exclusions, the effect of both including and excluding an object depends on the type of the object. If you both add and exclude the same application, the application is not backed up. If you both add and exclude the same namespace, the namespace is backed up.
-
The preview of selections for an application group does not permit sorting of discovered applications, namespaces, and volumes.
-
When you select objects to be protected by a Kubernetes application group, there is a Preview button to see which objects are auto-discovered by Commvault for protection. Occasionally, clicking the Preview button generates an error indicating that no applications match. You can save the application group and try the Preview button again.
-
When you select objects for an application group, the search function searches only namespaces and application names.
-
PersistentVolumeClaim (PVC) names are not searched or found, even when the Browse setting is set to Volumes.
-
Label names and values are not searched or found, even when the Browse setting is set to Labels.
-
When you select objects for an application group using label selectors, Commvault does not detect PersistentVolumes (PV) that have the label selector applied.
As a workaround, verify that the label selector is applied to the PersistentVolumeClaim (PVC).
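For example, the label must be set on the PersistentVolumeClaim itself, not on the bound PersistentVolume. The following is a minimal sketch; the claim name, namespace, label, and StorageClass are hypothetical:

```yaml
# Hypothetical PVC: the label is applied to the PVC itself (not to the
# bound PersistentVolume), so label-selector-based discovery can find it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data          # hypothetical claim name
  namespace: demo-app     # hypothetical namespace
  labels:
    backup: commvault     # hypothetical label matched by the application group selector
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard   # hypothetical StorageClass
```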
Clusters Page
-
Commvault tags attached to a cluster resource are not displayed on the Clusters page.
As a workaround, click the cluster to view the cluster properties page, and then go to the Configuration tab.
-
The Application groups page might not update immediately after a successful backup and show an old Last backup value.
As a workaround, click the refresh button or press the F5 key to reload your browser window.
-
When you add applications to an application group, the icons for the Kubernetes cluster, etcd pod, and Helm-deployed applications are displayed as a gray circle. This known issue does not affect protection of these resources.
-
When you modify the content of an existing application group, Kubernetes-native icons are not displayed. This known issue does not affect protection of these resources.
-
The Content section of the application group properties page does not show Kubernetes-native icons for included content rules. This known issue does not affect protection of these resources.
-
Commvault tags that are attached to an application group are not displayed on the Application groups page.
As a workaround, click the application group, then go to the Configuration tab to view the tags.
Applications Page
-
Clicking a failed or partially failed last backup on the Application tab opens or redirects to a blank page.
As a workaround, re-run the application group backup from the application group properties page.
-
Applications are displayed with a Linux icon. This icon does not represent the underlying operating system of the object. All Kubernetes applications, namespaces, and volumes show a Linux icon. This known issue does not affect protection operations.
-
The Applications page might not update immediately after a successful backup.
As a workaround, click the refresh button or press the F5 key to reload your browser window.
-
When you select the etcd (system generated) application on the Application page and then click Restore, the Application manifests and Full application options are presented. Only the Application files restore type can be used.
As a workaround, go to the etcd (system generated) application group properties page, and then click the Restore button.
-
Commvault tags attached to an application are not displayed in the Applications page.
As a workaround, click the application, and then go to the Configuration tab to view the tags.
-
When you view the protected Kubernetes applications on the Applications page, the Application type, Namespace, Kubernetes cluster, and Application group columns do not support sorting.
As a workaround, for reporting, you can export the data to a CSV or XLSX file, and then sort the data.
Application Groups Page
-
When you view the protected Kubernetes application groups on the Application groups page, the Kubernetes cluster column does not support sorting.
As a workaround, for reporting, you can export the data to a CSV or XLSX file, and then sort the data.
Cleanup of Temporary Resources
-
If a backup or restore operation is interrupted, Commvault might not have a chance to remove temporary VolumeSnapshots, volumes, and Commvault worker pods. These resources continue to consume CPU, memory, and disk space on your cluster until they are manually removed. Additionally, when multiple access nodes are used for data management, a subsequent job that is scheduled on another access node is not aware of orphaned resources that were created previously, and does not clean them up.
As a workaround, you can identify and manually remove orphaned resources. For more information, see "Verify There Are No Orphan Objects Created by Commvault" in Validating Your Kubernetes Environment.
Backups
-
Backups might fail when the Kubernetes API server is behind either a load balancer or Ingress. The following error appears in logs.
15784 f18 08/04 16:04:39 130730 VSBkpWorker::BackupVMFileCollection() - Failed to backup file collection accessnode-certsandlogs due to 0xFFFFFFE2:{CK8sFSVolume::readNodeContentBlock(263)/ErrNo.-30.(Unknown error)-file collection data read failed. error Truncated tar archive.}
If you encounter this problem, try adding the cluster in the Command Center without a load balancer or Ingress.
-
Commvault protects control plane SSL certificates from the node that runs the etcd pod (at the time of backup). To capture all SSL certificates from all control plane nodes, see Configuration for Kubernetes SSL Certificates.
-
Commvault protects SSL certificates in /etc/kubernetes/pki/etcd on the control plane node. Certificates in the /etc/kubernetes/pki folder are not collected. To capture all SSL certificates from all control plane nodes, see Configuration for Kubernetes SSL Certificates.
-
Commvault does not support on-demand synthetic full backups for Kubernetes. Commvault automatically schedules synthetic full backups according to the server plan.
-
Commvault protects annotations on Pods, DaemonSets, Deployments, and StatefulSets. Annotations on any other API resources/objects are not collected or restorable.
-
Backups of large PersistentVolumeClaims (PVCs) in excess of 2 TB might hang and run indefinitely with no updates in the Job Monitor.
If you experience this issue, contact Commvault Support.
-
Backups might behave incorrectly if the Job Results directory path for a Windows access node is greater than 256 characters.
As a workaround, move the Job Results directory to a shorter path. For instructions, see Changing the Path of the Job Results Directory.
-
Backups of a namespace fail if the namespace contains PersistentVolumeClaims with misconfigured VolumeSnapshotClass bindings. These backups fail with a "Config files download failed for app [app-name]" error, and none of the contents of the namespace are protected.
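A common cause of misconfigured bindings is a VolumeSnapshotClass whose driver does not match the provisioner of the StorageClass that backs the PVC. A correctly paired configuration looks like the following sketch; the class names and CSI driver name are hypothetical:

```yaml
# Hypothetical example: the VolumeSnapshotClass driver must match the
# StorageClass provisioner so that snapshots of PVCs in this class succeed.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                 # hypothetical
provisioner: csi.example.com     # hypothetical CSI driver
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: fast-ssd-snapshots       # hypothetical
driver: csi.example.com          # must match the provisioner above
deletionPolicy: Retain
```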
-
For application-consistent backups, Commvault does not log the return codes of pre- and post-process scripts. Backups continue to run regardless of whether a pre-execution script returns zero (no error occurred) or non-zero (one or more errors occurred). Because of this issue, backups might not be application-consistent and might not be restorable.
As a workaround, implement notification in your pre-execution scripts to ensure that the appropriate application or backup owner is alerted to pre- and post-execution script failure.
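One way to implement such notification is to wrap each pre- or post-execution step so that a non-zero exit code triggers an alert. The following is a hypothetical sketch, not a Commvault-provided interface; replace the echo with your own mail or webhook call, and the quiesce command with your application's:

```shell
#!/bin/sh
# Hypothetical wrapper: runs a quiesce/unquiesce step and alerts on failure,
# because Commvault does not log the script's return code.
run_with_alert() {
  "$@"
  rc=$?
  if [ "$rc" -ne 0 ]; then
    # Replace this echo with a mail or webhook call to the backup owner.
    echo "ALERT: step '$*' failed with exit code $rc" >&2
  fi
  return "$rc"
}

# Example usage inside a pre-execution script (command is hypothetical):
# run_with_alert pg_ctl stop -m fast
```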
-
For Windows access nodes, backups and restores might fail with an "Error mounting snap volumes" error because of TLS handshake errors and connection timeouts. For more information, see Backups or restores of Kubernetes fail with an "Error mounting snap volumes" error.
-
When you view backup job details, some descriptions (such as "VM Admin Job(Backup)" and "Virtual Machine Count") refer to "virtual machine" or "VM" instead of "application".
-
Commvault creates the worker Pods without any restrictions on CPU or memory. If necessary, you can apply CPU and memory restrictions. For information, see Modifying the Resource Limits for Commvault Temporary Pods for Kubernetes.
Restores
-
Helm-based application recovery is supported only for in-place recovery to the original cluster and namespace.
-
Commvault cannot restore etcd snapshots to the original pod or an alternate running etcd pod. Instead, restore the snapshot to a Commvault access node file system, and then transfer the snapshot to the intended control plane node. For instructions, see Restoring a Kubernetes etcd Snapshot to a File System.
-
Commvault cannot restore control plane SSL certificates to the original or an alternate running etcd pod. Instead, restore the certificates to a Commvault access node file system, and then transfer them to the intended control plane node. For information, see Configuration for Kubernetes SSL Certificates.
-
Commvault cannot perform application restores or migration between different releases of Kubernetes. Commvault requires that the same major revision is running on both the source cluster and the destination cluster for application migration to succeed.
-
Commvault does not perform API resource transformation when restoring objects across clusters. If you are restoring networking resources with references to the source cluster (for example, Endpoints, EndpointSlices, or routes), restore the manifest, update the networking information to reference the destination cluster, and then apply the manifest to the cluster by using the kubectl utility.
-
When you perform a restore for a stateless application or namespace, the Restore options page shows a Storage class mapping section. A StorageClass cannot be assigned during a restore that does not include PersistentVolumeClaims.
This input can be safely ignored.
-
When you perform an etcd SSL certificate or etcd snapshot recovery to an access node file system, the Restore options page shows a Volume destination by default. Restoring to an existing PersistentVolumeClaim (PVC) or Volume is not supported.
As a workaround, select the File system destination tab when you perform an etcd SSL certificate or etcd snapshot recovery.
-
When you perform an etcd SSL certificate or etcd snapshot recovery to a file system destination, the Path setting does not allow an existing or new path to be directly typed into the input box.
As a workaround, click the Browse button, and then select an existing folder or create a new folder.
-
When you perform an etcd SSL certificate or etcd snapshot recovery to a file system destination, the software incorrectly shows access node groups in the Destination client list. For the restore to complete successfully, you must select an individual access node, not a group.
-
When you perform a restore, Commvault performs the restore as the root user. There is no ability to change the user account that performs the restore. Do not enter credentials in the Impersonate user settings.
-
Live browse operations are not supported for Kubernetes.
Operations: Job Monitor and Log Files
-
Within Kubernetes protection Job summary and Job details panes, the application group might be identified as a "Subclient". This known issue does not affect protection activity.
-
Within Kubernetes protection log files (vsbkp.log, vsrst.log), containerized applications might be referred to as "Virtual Machines" or "VMs". This known issue does not affect protection activity.
-
Within Kubernetes protection jobs, errors might refer to the Kubernetes access node as a "proxy". The "proxy" and "access node" terms are used interchangeably for servers that run the Virtual Server package. This known issue does not affect protection activity.
-
When Commvault performs a backup of a stateful application or namespace, if a VolumeSnapshotClass is not available, the volume's AccessMode is set to ReadWriteMany (RWX) to perform the backup. There is no visual indication in the Job Monitor that a snapshot-based backup is not possible.
As a workaround, perform the steps in Validating Your Kubernetes Environment to verify that your StorageClass and VolumeSnapshotClass are set up correctly.
-
If your VolumeSnapshotClass has the deletionPolicy set to Delete, then snapshots created by Commvault are not deleted after the backup. This issue might cause full consumption of available storage space on the underlying storage array. There is no visual indication in the Job Monitor or Job log files that this condition was detected.
As a workaround, verify that any VolumeSnapshotClass that will be used by Commvault has its deletionPolicy set to Retain.
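A VolumeSnapshotClass with the recommended policy looks like the following fragment; the class name and CSI driver name are hypothetical:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-snapclass        # hypothetical
driver: csi.example.com      # hypothetical CSI driver
deletionPolicy: Retain       # per the workaround; Delete triggers the known issue
```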
-
There is no visual indication in the Job Monitor or Job log files about the StorageClass and VolumeSnapshotClass that are detected and used for a backup or recovery operation. This known issue does not affect protection activity.
-
Clicking an application name (in the Server column) on the Job history page redirects the user to a non-functional application details page.
As a workaround, do not click the application name in the Job history page. Instead, go to Protect > Kubernetes, and then click the application on the Applications page.
Orphaned Objects
-
If a backup or recovery operation is interrupted on the Commvault access node or if the Commvault worker Pod is interrupted, orphaned Kubernetes instances/objects might be left in your cluster.
As a workaround, run the "Verify There Are No Orphan Objects Created by Commvault" process in Validating Your Kubernetes Environment, and then delete any orphaned objects.