Validating Your Kubernetes Environment

Before you add your cluster to Commvault to begin protecting it, use kubectl commands to validate that your environment is ready for Commvault backups.

Get the API Server URL

To add your cluster to Commvault, you need to know the kube-apiserver or control plane URL.

  • Command to run:

    kubectl cluster-info

  • In the following example output, the URL is https://k8s-123-4.home.arpa:6443:

    Kubernetes control plane is running at https://k8s-123-4.home.arpa:6443
    CoreDNS is running at https://k8s-123-4.home.arpa:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
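
If you script this check, the URL can be extracted from the command output. The following is a minimal sketch, not a Commvault tool: the function name api_server_url is hypothetical, and it assumes that the first line of the output has the "Kubernetes ... is running at URL" form shown in the example.

```shell
# Hypothetical helper: read `kubectl cluster-info` output on stdin and
# print the URL from the "Kubernetes ... is running at <URL>" line.
api_server_url() {
  esc=$(printf '\033')
  # kubectl may colorize this output, so strip ANSI color codes first.
  sed "s/${esc}\[[0-9;]*m//g" |
    sed -n 's/^Kubernetes .* is running at \([^[:space:]]*\).*$/\1/p'
}

# Usage (requires a reachable cluster):
#   kubectl cluster-info | api_server_url
```

The helper only parses text, so it can also be run against saved output.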

Verify the Nodes Are Ready

Verify that all cluster nodes (control nodes and worker nodes) have a condition status of Ready, not DiskPressure, MemoryPressure, PIDPressure, or NetworkUnavailable. For information about node condition statuses, see Conditions. Also verify that the Kubernetes version of the nodes is supported by Commvault.

For on-premises clusters or infrastructure-based clusters, verify that multiple control plane nodes or master nodes are listed.

  • Command to run:

    kubectl get nodes
  • Example output:

    • From an Azure Kubernetes Service (AKS) cluster:

      NAME                                STATUS   ROLES   AGE     VERSION
      aks-agentpool-26889666-vmss000000   Ready    agent   3h41m   v1.23.5
      aks-agentpool-26889666-vmss000001   Ready    agent   3h41m   v1.23.5
      aks-agentpool-26889666-vmss000002   Ready    agent   3h41m   v1.23.5

    • From a Google Kubernetes Engine (GKE) cluster:

      NAME                                       STATUS   ROLES    AGE   VERSION
      gke-cluster-1-default-pool-a709ed39-hcc6   Ready    <none>   9s    v1.23.6-gke.1700
      gke-cluster-1-default-pool-a709ed39-q497   Ready    <none>   9s    v1.23.6-gke.1700
      gke-cluster-1-default-pool-a709ed39-zkg8   Ready    <none>   9s    v1.23.6-gke.1700

    • From an on-premises Vanilla Kubernetes single-node dev/test cluster:

      NAME        STATUS   ROLES                  AGE   VERSION
      k8s-123-4   Ready    control-plane,master   77d   v1.23.4
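
On clusters with many nodes, it can help to list only the nodes that need attention. The following is a minimal sketch under the assumption that the output has the default columns shown above; the function name not_ready_nodes is hypothetical.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and print
# the name and status of every node whose STATUS column is not exactly "Ready".
# A cordoned node reports "Ready,SchedulingDisabled" and is therefore listed too.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1, $2 }'
}

# Usage (requires a reachable cluster):
#   kubectl get nodes | not_ready_nodes
```

Empty output means that every listed node reported a status of Ready.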

Verify the CSI Drivers Are Functioning

Verify that your Container Storage Interface (CSI) driver or drivers are installed and functioning.

  • Command to run:

    kubectl get csidrivers
  • Example output from a cluster that is running the Rook-Ceph CSI driver:

    NAME                                    ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS   REQUIRESREPUBLISH   MODES        AGE
    rook-ceph-restore.cephfs.csi.ceph.com   true             false            false             <unset>         false               Persistent   76d
    rook-ceph-restore.rbd.csi.ceph.com      true             false            false             <unset>         false               Persistent   76d
    rook-ceph.cephfs.csi.ceph.com           true             false            false             <unset>         false               Persistent   76d
    rook-ceph.rbd.csi.ceph.com              true             false            false             <unset>         false               Persistent   76d

Verify the CSI Pods

Each of your CSI drivers has one or more pods that run to respond to provisioning, attach, detach, and mount requests. Verify that the pods for your CSI driver have a status of Running. Also verify that each CSI driver has 2 pods listed, with 1 pod labeled as the provisioner. Commvault uses the provisioner to create new CSI volumes during restores.

  • Command to run:

    kubectl get pods -A | grep -i csi
  • Example output from a Vanilla Kubernetes cluster running the CephFS and CephRBD CSI drivers:

    default     csirbd-demo-pod                                 1/1   Running   0             76d
    rook-ceph   csi-cephfsplugin-provisioner-5dc9cbcc87-9hjvh   6/6   Running   0             76d
    rook-ceph   csi-cephfsplugin-qjpbn                          3/3   Running   0             76d
    rook-ceph   csi-rbdplugin-provisioner-58f584754c-gqw6b      6/6   Running   1 (43d ago)   76d
    rook-ceph   csi-rbdplugin-x5vlg                             3/3   Running   0             76d

Verify the Nodes Have No Active Taints

Verify that there are no taints on your cluster that might prevent backups or restores.

  • Command to run:

    kubectl describe node node_name | grep -i taints

    Where node_name is the name of the node that you want to verify.

  • Example output from a Vanilla Kubernetes server that has no active taints:

    Taints: <none>
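
To check every node in one pass instead of one node at a time, the output of kubectl describe nodes can be filtered. The following is a minimal sketch; the function name tainted_nodes is hypothetical, and it assumes the usual describe layout in which each node section has a "Name:" line followed later by a "Taints:" line.

```shell
# Hypothetical helper: read `kubectl describe nodes` output on stdin and print
# each node whose Taints line is not "<none>", with the first taint listed.
# Additional taints appear on continuation lines and are not printed here.
tainted_nodes() {
  awk '
    /^Name:/                     { node = $2 }
    /^Taints:/ && $2 != "<none>" { print node, $2 }
  '
}

# Usage (requires a reachable cluster):
#   kubectl describe nodes | tainted_nodes
```

Empty output means that no node reported a taint.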

Verify the StorageClasses Are CSI-Enabled

The PersistentVolumeClaims (PVCs) that you want to protect must be presented by a registered, CSI-enabled StorageClass. Verify that the StorageClasses that have PersistentVolumeClaims that you want to protect use the Container Storage Interface (CSI).

Specifically, verify that at least one StorageClass includes "(default)" in its name and that the provisioners for the StorageClasses that you want to protect include ".csi." in their names.

  • Command to run:

    kubectl get storageclasses
  • Example output from a Vanilla Kubernetes cluster running Hostpath and Ceph Raw-Block Device (RBD) CSI drivers. This cluster also runs a non-CSI-based volume plug-in for provisioning object-based Rook-Ceph storage. Note that if the provisioner name does not contain "csi", the volume plug-in or driver is not supported by Commvault for backups and restores.

    NAME                        PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
    csi-hostpath-sc             hostpath.csi.k8s.io             Delete          Immediate           true                   2m58s
    rook-ceph-block (default)   rook-ceph.rbd.csi.ceph.com      Delete          Immediate           true                   76d
    rook-ceph-delete-bucket     rook-ceph.ceph.rook.io/bucket   Delete          Immediate           false                  76d
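
The naming rule described above can be applied mechanically to the command output. The following is a minimal sketch; the function name non_csi_storageclasses is hypothetical, and it checks only whether the provisioner name contains ".csi.", which is a naming heuristic rather than proof that a driver is or is not CSI-based.

```shell
# Hypothetical helper: read `kubectl get storageclasses` output on stdin and
# print each StorageClass whose provisioner name does not contain ".csi.".
non_csi_storageclasses() {
  awk 'NR > 1 {
    # The default StorageClass prints "(default)" as an extra field after its name.
    prov = ($2 == "(default)") ? $3 : $2
    if (prov !~ /\.csi\./) print $1, prov
  }'
}

# Usage (requires a reachable cluster):
#   kubectl get storageclasses | non_csi_storageclasses
```

Run against the example output above, the helper lists only rook-ceph-delete-bucket with its provisioner rook-ceph.ceph.rook.io/bucket.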

Verify You Have a CSI Node That Can Handle Requests

After installing a CSI driver, you can verify that the installation was successful by listing the nodes that have CSI drivers installed on them.

  • Command to run:

    kubectl get csinodes
  • Example output:

    • From an Azure Kubernetes Service (AKS) cluster that has a default CSI driver installation:

      NAME                                DRIVERS   AGE
      aks-agentpool-26889666-vmss000000   2         5h
      aks-agentpool-26889666-vmss000001   2         5h
      aks-agentpool-26889666-vmss000002   2         5h

    • From a Google Kubernetes Engine (GKE) cluster with default CSI driver installation:

      NAME                                       DRIVERS   AGE
      gke-cluster-1-default-pool-a709ed39-hcc6   1         73m
      gke-cluster-1-default-pool-a709ed39-q497   1         73m
      gke-cluster-1-default-pool-a709ed39-zkg8   1         73m

If Necessary, Get Detailed Information About the CSI Drivers Installed on Each Node

You can get detailed information about the CSI drivers installed on each node.

  • Command to run:

    kubectl describe csinodes csinode_name

  • Example commands:

    • For an Azure Kubernetes Service (AKS) cluster:

      kubectl describe csinodes aks-agentpool-26889666-vmss000000

      Example output:

      Name:               aks-agentpool-26889666-vmss000000
      Labels:             <none>
      Annotations:        storage.alpha.kubernetes.io/migrated-plugins: kubernetes.io/aws-ebs,kubernetes.io/azure-disk,kubernetes.io/azure-file,kubernetes.io/cinder,kubernetes.io/gce-pd
      CreationTimestamp:  Mon, 30 May 2022 16:45:27 +0000
      Spec:
        Drivers:
          disk.csi.azure.com:
            Node ID:  aks-agentpool-26889666-vmss000000
            Allocatables:
              Count:        4
            Topology Keys:  [topology.disk.csi.azure.com/zone]
          file.csi.azure.com:
            Node ID:  aks-agentpool-26889666-vmss000000
      Events:               <none>

    • For a Google Kubernetes Engine (GKE) cluster:

      kubectl describe csinodes gke-cluster-1-default-pool-a709ed39-hcc6

      Example output:

      Name:               gke-cluster-1-default-pool-a709ed39-hcc6
      Labels:             <none>
      Annotations:        storage.alpha.kubernetes.io/migrated-plugins: kubernetes.io/aws-ebs,kubernetes.io/azure-disk,kubernetes.io/cinder,kubernetes.io/gce-pd
      CreationTimestamp:  Mon, 30 May 2022 20:36:18 +0000
      Spec:
        Drivers:
          pd.csi.storage.gke.io:
            Node ID:  projects/intense-vault-351721/zones/us-central1-c/instances/gke-cluster-1-default-pool-a709ed39-hcc6
            Allocatables:
              Count:        15
            Topology Keys:  [topology.gke.io/zone]
      Events:               <none>

Verify the CSI Drivers Are Registered

In order for the Container Storage Interface (CSI) driver to perform provisioning, attach/detach, mount, and snapshot activities, the CSI driver must be registered. Verify that all of your installed CSI drivers are listed and support the Persistent mode.

  • Command to run:

    kubectl get csidrivers
  • Example output:

    • From an Azure Kubernetes Service (AKS) cluster with default CSI drivers installed and configured:

      NAME                 ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS   REQUIRESREPUBLISH   MODES                  AGE
      disk.csi.azure.com   true             false            false             <unset>         false               Persistent             4h44m
      file.csi.azure.com   false            true             false             <unset>         false               Persistent,Ephemeral   4h44m

    • From a Google Kubernetes Engine (GKE) cluster with default CSI drivers installed and configured:

      NAME                    ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS   REQUIRESREPUBLISH   MODES        AGE
      pd.csi.storage.gke.io   true             false            false             <unset>         false               Persistent   53m

    • From a Vanilla Kubernetes cluster with Ceph and Hostpath CSI drivers installed and configured:

      NAME                            ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS   REQUIRESREPUBLISH   MODES                  AGE
      hostpath.csi.k8s.io             true             true             false             <unset>         false               Persistent,Ephemeral   26m
      rook-ceph.cephfs.csi.ceph.com   true             false            false             <unset>         false               Persistent             76d
      rook-ceph.rbd.csi.ceph.com      true             false            false             <unset>         false

Verify the Pods Do Not Have an Error Status

Verify that the pods for your cluster and its hosted applications have a status of Running, Completed, or Terminated, not Pending, Failed, CrashLoopBackOff, Evicted, or Unknown. Although Commvault notifies you of failures for backups and restores, you still need to verify the stability of your cluster before beginning backups.

If you identify a pod that does not have a status of Running, Completed, or Terminated, see Troubleshooting Applications. For more information about states, see Container states.

  • Command to run:

    kubectl get pods -A
  • Example output:

    • From a newly created Azure Kubernetes Services (AKS) cluster with all pods in the Running status (as expected):

      NAMESPACE         NAME                                       READY   STATUS    RESTARTS        AGE
      calico-system     calico-kube-controllers-7547b445f6-rgt7l   1/1     Running   1 (5h14m ago)   5h15m
      calico-system     calico-node-6g6hm                          1/1     Running   0               5h15m
      calico-system     calico-node-n5xvk                          1/1     Running   0               5h15m
      calico-system     calico-node-r9kk9                          1/1     Running   0               5h15m
      calico-system     calico-typha-ddbb67d7b-kzk6k               1/1     Running   0               5h15m
      calico-system     calico-typha-ddbb67d7b-mxhhl               1/1     Running   0               5h15m
      kube-system       cloud-node-manager-bdfsj                   1/1     Running   0               5h15m
      kube-system       cloud-node-manager-bhkfp                   1/1     Running   0               5h15m
      kube-system       cloud-node-manager-kqb5l                   1/1     Running   0               5h15m
      kube-system       coredns-87688dc49-wb592                    1/1     Running   0               5h18m
      kube-system       coredns-87688dc49-xbbpg                    1/1     Running   0               5h14m
      kube-system       coredns-autoscaler-6fb889cdfc-vsm7j        1/1     Running   0               5h18m
      kube-system       csi-azuredisk-node-6p4w8                   3/3     Running   0               5h15m
      kube-system       csi-azuredisk-node-g5rct                   3/3     Running   0               5h15m
      kube-system       csi-azuredisk-node-n2v72                   3/3     Running   0               5h15m
      kube-system       csi-azurefile-node-6ss8s                   3/3     Running   0               5h15m
      kube-system       csi-azurefile-node-r5682                   3/3     Running   0               5h15m
      kube-system       csi-azurefile-node-rdrgn                   3/3     Running   0               5h15m
      kube-system       kube-proxy-8f5fm                           1/1     Running   0               5h15m
      kube-system       kube-proxy-sw4hs                           1/1     Running   0               5h15m
      kube-system       kube-proxy-wqzlz                           1/1     Running   0               5h15m
      kube-system       metrics-server-948cff58d-5d8qv             1/1     Running   1 (5h14m ago)   5h18m
      kube-system       metrics-server-948cff58d-q4lsm             1/1     Running   1 (5h14m ago)   5h18m
      kube-system       tunnelfront-5486fcf877-j8gvc               1/1     Running   0               4h56m
      tigera-operator   tigera-operator-5755874764-vxckp           1/1     Running   0               5h18m

    • From a newly created Google Kubernetes Engine (GKE) cluster with all pods in the Running status (as expected):

      NAMESPACE     NAME                                                  READY   STATUS    RESTARTS   AGE
      kube-system   event-exporter-gke-5dc976447f-szrn9                   2/2     Running   0          86m
      kube-system   fluentbit-gke-6896v                                   2/2     Running   0          85m
      kube-system   fluentbit-gke-69ntr                                   2/2     Running   0          85m
      kube-system   fluentbit-gke-wnwkd                                   2/2     Running   0          85m
      kube-system   gke-metrics-agent-5cq66                               1/1     Running   0          85m
      kube-system   gke-metrics-agent-cwnzn                               1/1     Running   0          85m
      kube-system   gke-metrics-agent-tdvx2                               1/1     Running   0          85m
      kube-system   konnectivity-agent-86cbc78d8-8sfzc                    1/1     Running   0          85m
      kube-system   konnectivity-agent-86cbc78d8-gvvnn                    1/1     Running   0          85m
      kube-system   konnectivity-agent-86cbc78d8-tbr2t                    1/1     Running   0          86m
      kube-system   konnectivity-agent-autoscaler-84559799b7-njczq        1/1     Running   0          86m
      kube-system   kube-dns-584f56f967-65758                             4/4     Running   0          86m
      kube-system   kube-dns-584f56f967-jglv5                             4/4     Running   0          85m
      kube-system   kube-dns-autoscaler-9f89698b6-rn54z                   1/1     Running   0          74m
      kube-system   kube-proxy-gke-cluster-1-default-pool-a709ed39-hcc6   1/1     Running   0          84m
      kube-system   kube-proxy-gke-cluster-1-default-pool-a709ed39-q497   1/1     Running   0          85m
      kube-system   kube-proxy-gke-cluster-1-default-pool-a709ed39-zkg8   1/1     Running   0          85m
      kube-system   l7-default-backend-5465dfc4ff-274zw                   1/1     Running   0          86m
      kube-system   metrics-server-v0.5.2-6f6d597469-xn6sx                2/2     Running   0          85m
      kube-system   pdcsi-node-2xr6p

    • From a Vanilla Kubernetes cluster that recently experienced a DiskFull event on its CSI driver:

      NAMESPACE          NAME                                            READY   STATUS      RESTARTS      AGE
      app-nginx          nginx                                           1/1     Running     0             76d
      apps-helm          my-release-redis-master-0                       1/1     Running     0             56d
      apps-helm          my-release-redis-replicas-0                     1/1     Running     0             56d
      apps-helm          my-release-redis-replicas-1                     1/1     Running     0             56d
      apps-helm          my-release-redis-replicas-2                     1/1     Running     0             56d
      calico-apiserver   calico-apiserver-5444dfd6b4-nc9f6               1/1     Running     0             77d
      calico-apiserver   calico-apiserver-5444dfd6b4-pkgwj               1/1     Running     0             77d
      calico-system      calico-kube-controllers-67f85d7449-ctxdz        1/1     Running     1 (77d ago)   77d
      calico-system      calico-node-tlqvh                               1/1     Running     1 (77d ago)   77d
      calico-system      calico-typha-7bc4d5557f-rb7d2                   1/1     Running     2 (77d ago)   77d
      default            csi-hostpath-socat-0                            1/1     Running     0             63m
      default            csi-hostpathplugin-0                            8/8     Running     0             63m
      default            csirbd-demo-pod                                 1/1     Running     0             76d
      kube-system        coredns-64897985d-brsvc                         1/1     Running     1 (77d ago)   77d
      kube-system        coredns-64897985d-s9rsq                         1/1     Running     1 (77d ago)   77d
      kube-system        etcd-k8s-123-4                                  1/1     Running     1 (77d ago)   77d
      kube-system        kube-apiserver-k8s-123-4                        1/1     Running     1 (77d ago)   77d
      kube-system        kube-controller-manager-k8s-123-4               1/1     Running     1 (77d ago)   77d
      kube-system        kube-proxy-8mmzm                                1/1     Running     1 (77d ago)   77d
      kube-system        kube-scheduler-k8s-123-4                        1/1     Running     1 (77d ago)   77d
      kube-system        snapshot-controller-7f5d798964-6vz5m            1/1     Running     0             76d
      kube-system        snapshot-controller-7f5d798964-mc59t            1/1     Running     0             76d
      rook-ceph          csi-cephfsplugin-provisioner-5dc9cbcc87-9hjvh   6/6     Running     0             76d
      rook-ceph          csi-cephfsplugin-qjpbn                          3/3     Running     0             76d
      rook-ceph          csi-rbdplugin-provisioner-58f584754c-gqw6b      6/6     Running     1 (43d ago)   76d
      rook-ceph          csi-rbdplugin-x5vlg                             3/3     Running     0             76d
      rook-ceph          rook-ceph-mgr-a-f74657b66-bs6b2                 0/1     Completed   1             76d
      rook-ceph          rook-ceph-mgr-a-f74657b66-k8vz2                 1/1     Running     0             11d
      rook-ceph          rook-ceph-mon-a-75d9f6df4-446xc                 0/1     Evicted     0             2d15h
      rook-ceph          rook-ceph-mon-a-75d9f6df4-7j7pd                 0/1     Completed   0             25d
      rook-ceph          rook-ceph-mon-a-75d9f6df4-bsbwg                 0/1     Evicted     0             2d15h
      rook-ceph          rook-ceph-mon-a-75d9f6df4-c7rxf                 0/1     Evicted     0             2d15h
      rook-ceph          rook-ceph-mon-a-75d9f6df4-fktm4                 0/1     Completed   0             42d
      rook-ceph          rook-ceph-mon-a-75d9f6df4-fplth                 0/1     Evicted     0             2d15h
      rook-ceph          rook-ceph-mon-a-75d9f6df4-kctct                 0/1     Completed   1             76d
      rook-ceph          rook-ceph-mon-a-75d9f6df4-msx5n                 1/1     Running     0             2d15h
      rook-ceph          rook-ceph-mon-a-75d9f6df4-phmrb                 0/1     Evicted     0             2d15h
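
Long pod listings are easier to review when only the problem pods are shown. The following is a minimal sketch that keeps the three acceptable statuses named above and prints everything else; the function name unhealthy_pods is hypothetical, and it assumes the default column layout of kubectl get pods -A.

```shell
# Hypothetical helper: read `kubectl get pods -A` output on stdin and print
# namespace/name and status for every pod whose STATUS column is not
# Running, Completed, or Terminated.
unhealthy_pods() {
  awk 'NR > 1 && $4 !~ /^(Running|Completed|Terminated)$/ { print $1 "/" $2, $4 }'
}

# Usage (requires a reachable cluster):
#   kubectl get pods -A | unhealthy_pods
```

Run against the Vanilla Kubernetes example above, the helper would list only the rook-ceph-mon-a pods that have a status of Evicted.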

Verify You Have a VolumeSnapshotClass That Has a CSI Driver

A CSI-enabled VolumeSnapshotClass is required for Commvault to orchestrate the creation of storage snapshots. Verify that your environment includes a VolumeSnapshotClass that has a CSI driver. A VolumeSnapshot is a storage-level snapshot of the underlying storage sub-system. Volume snapshots can be application-consistent or infrastructure/storage-consistent (default).

  • Command to run:

    kubectl get volumesnapshotclass
  • Example output from a Vanilla Kubernetes cluster with Ceph Raw Block Device (RBD) VolumeSnapshotClass installed and configured:

    NAME                      DRIVER                       DELETIONPOLICY   AGE
    csi-hostpath-snapclass    hostpath.csi.k8s.io          Delete           74m
    csi-rbdplugin-snapclass   rook-ceph.rbd.csi.ceph.com   Delete           76d

If Necessary, Get Detailed Information About the VolumeSnapshotClass

  • Command to run:

    kubectl describe volumesnapshotclass volumesnapshotclass_name

  • Example command:

    kubectl describe volumesnapshotclass csi-rbdplugin-snapclass
  • Example output:

    Name:             csi-rbdplugin-snapclass
    Namespace:
    Labels:           <none>
    Annotations:      <none>
    API Version:      snapshot.storage.k8s.io/v1
    Deletion Policy:  Delete
    Driver:           rook-ceph.rbd.csi.ceph.com
    Kind:             VolumeSnapshotClass
    Metadata:
      Creation Timestamp:  2022-03-14T23:05:36Z
      Generation:          1
      Managed Fields:
        API Version:  snapshot.storage.k8s.io/v1
        Fields Type:  FieldsV1
        fieldsV1:
          f:deletionPolicy:
          f:driver:
          f:parameters:
            .:
            f:clusterID:
            f:csi.storage.k8s.io/snapshotter-secret-name:
            f:csi.storage.k8s.io/snapshotter-secret-namespace:
        Manager:         kubectl-create
        Operation:       Update
        Time:            2022-03-14T23:05:36Z
      Resource Version:  67371
      UID:               153a1fac-783c-4b71-9d57-f0e161650100
    Parameters:
      Cluster ID:                                       rook-ceph
      csi.storage.k8s.io/snapshotter-secret-name:       rook-csi-rbd-provisioner
      csi.storage.k8s.io/snapshotter-secret-namespace:  rook-ceph
    Events:                                             <none>

The example output shows that the underlying driver is a CSI driver (rook-ceph.rbd.csi.ceph.com) and that a VolumeSnapshotClass is registered.

Volume snapshot support is not installed by default in many cloud-managed Kubernetes services. For example, on a default installation of an Azure Kubernetes Service (AKS) or Google Kubernetes Engine (GKE) cluster, the command returns "No resources found", which indicates that no VolumeSnapshotClass is registered. In that case, snapshot backup is not possible until you install the volume snapshot controller and the associated StorageClass and Custom Resource Definitions. For instructions to install the CSI external-snapshotter, see kubernetes-csi / external-snapshotter.

Verify the API Version Includes snapshot.storage.k8s.io

The CSI external-snapshotter supports both v1 and v1beta1 snapshot APIs. Verify that your installed external-snapshotter supports the snapshot.storage.k8s.io/v1 API.

  • Command to run:

    kubectl describe volumesnapshotclass volumesnapshotclass_name

  • Example command:

    kubectl describe volumesnapshotclass csi-rbdplugin-snapclass
  • Example output:

    Name:             csi-rbdplugin-snapclass
    Namespace:
    Labels:           <none>
    Annotations:      <none>
    API Version:      snapshot.storage.k8s.io/v1
    Deletion Policy:  Delete
    Driver:           rook-ceph.rbd.csi.ceph.com
    Kind:             VolumeSnapshotClass
    Metadata:
      Creation Timestamp:  2022-03-14T23:05:36Z
      Generation:          1
      Managed Fields:
        API Version:  snapshot.storage.k8s.io/v1
        Fields Type:  FieldsV1
        fieldsV1:
          f:deletionPolicy:
          f:driver:
          f:parameters:
            .:
            f:clusterID:
            f:csi.storage.k8s.io/snapshotter-secret-name:
            f:csi.storage.k8s.io/snapshotter-secret-namespace:
        Manager:         kubectl-create
        Operation:       Update
        Time:            2022-03-14T23:05:36Z
      Resource Version:  67371
      UID:               153a1fac-783c-4b71-9d57-f0e161650100
    Parameters:
      Cluster ID:                                       rook-ceph
      csi.storage.k8s.io/snapshotter-secret-name:       rook-csi-rbd-provisioner
      csi.storage.k8s.io/snapshotter-secret-namespace:  rook-ceph
    Events:                                             <none>

Verify You Have a snapshot-controller Pod in the Running Status

Installing and configuring the CSI external-snapshotter installs the Kubernetes Volume Snapshot CRDs, the volume snapshot controller, and the snapshot validation webhook components. Verify that the snapshot-controller is installed and has a status of Running, which is required for the external-snapshotter to handle requests to orchestrate snapshots.

  • Command to run:

    kubectl get pods -A | grep -i snapshot
  • Example output from a Vanilla Kubernetes cluster with the external-snapshotter installed and running:

    kube-system   snapshot-controller-7f5d798964-6vz5m   1/1   Running   0   76d
    kube-system   snapshot-controller-7f5d798964-mc59t   1/1   Running   0   76d

Verify There Are No Orphan Resources/Objects Created by Commvault

Note

If a backup or restore operation is interrupted, Commvault might not have a chance to remove temporary volumesnapshots, volumes, and Commvault worker pods. For simple identification and manual removal, if required, Commvault attaches labels to these resources. For more information, see Restrictions and Known Issues for Kubernetes.

  • Command to run:

    kubectl get pods,pvc,volumesnapshot -l cv-backup-admin= --all-namespaces
  • Example output from a cluster that has no orphaned objects:

    No resources found

If Necessary, Delete Orphaned Resources/Objects

If orphaned resources are listed, delete them.

  • Command to run:

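    kubectl delete resource_type -n namespace resource_name

    Where resource_type is pod, pvc, or volumesnapshot; namespace is the namespace that contains the orphaned resource; and resource_name is the name of the resource to delete.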

Verify the centos:8 Image Can Be Pulled

Commvault spawns a temporary worker pod for each protected application, namespace, and PersistentVolumeClaim. The Commvault worker pod uses the centos:8 Docker Hub image. For Commvault data management operations to function, the centos:8 image must be accessible to all nodes where the operations will occur. For more information, see System Requirements for Kubernetes. Verify that your cluster or nodes can pull the centos:8 image.

In air-gapped environments, verify that the centos:8 image can be pulled from all nodes that will have Commvault worker pods running on them. For more information, see Protecting an Air-Gapped Kubernetes Cluster.

  • Command to run:

    docker pull centos:8
  • Example output from a cluster that is connected to hub.docker.com for initial and updated image downloads:

    8: Pulling from library/centos
    a1d0c7532777: Already exists
    Digest: sha256:a27fd8080b517143cbbbab9dfb7c8571c40d67d534bbdee55bd6c473f432b177
    Status: Downloaded newer image for centos:8
    docker.io/library/centos:8
