
How to Uninstall the NVIDIA GPU Operator Add-on in Crusoe Managed Kubernetes (CMK)

Karan Solanki

Last Updated: March 30, 2026

Introduction

Crusoe Managed Kubernetes (CMK) provides several optional add-ons that extend cluster functionality: the NVIDIA GPU Operator, NVIDIA Network Operator, Crusoe CSI Driver, Cluster Autoscaler, Autoclusters, and NVIDIA GB200 Support. This guide covers how to verify and uninstall the NVIDIA GPU Operator in a CMK cluster.

Why is this operator needed?
The NVIDIA GPU Operator is an automation layer that manages the software stack required to use GPUs in Kubernetes. It installs the NVIDIA drivers, the CUDA runtime, the NVIDIA Container Toolkit, and the NVIDIA Device Plugin. Without this operator, or an equivalent stack that you manage yourself, the nodes do not expose their GPUs to Kubernetes, so GPU workloads cannot be scheduled on them.

Prerequisites

Before starting, ensure you have the following:

  • A running CMK cluster with a nodepool that has the NVIDIA GPU Operator add-on enabled.

  • The kubectl and helm CLIs installed, with kubectl configured to use your CMK cluster's kubeconfig (get Kubeconfig).

  • The latest version of the Crusoe CLI installed and authenticated to perform nodepool updates.
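The prerequisites above can be checked before you start. The following is a minimal sketch, not an official Crusoe tool: `check_tools` is a hypothetical helper name, and it only confirms that each CLI is on your PATH, not that it is authenticated.

```shell
# Report any required CLI that is not on PATH.
# Returns 0 when every tool is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (commented out so that sourcing this snippet has no side effects):
# check_tools kubectl helm crusoe && kubectl cluster-info
```

If all three tools are found, `kubectl cluster-info` then confirms that the kubeconfig points at a reachable cluster.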

Step-by-Step Instructions

Step 1: Identify all the installed operators

Before uninstalling, you should verify the name of the release and the namespace it resides in.

  • List all the Helm Releases

    ~ helm list -A | grep gpu-operator
    
    --------------------------------Expected Output------------------------------------
    
    NAME                	NAMESPACE              	REVISION	UPDATED                                	STATUS  	CHART                        	APP VERSION
    gpu-operator        	nvidia-gpu-operator    	1       	2026-03-26 19:23:32.929250932 +0000 UTC	deployed	gpu-operator-v25.3.4         	v25.3.4
  • Check Pod Status

    ~ kubectl get pods -n nvidia-gpu-operator
    
    --------------------------------Expected Output------------------------------------
    
    NAME                                                          READY   STATUS    RESTARTS   AGE
    gpu-operator-b6d589d4c-fw4fb                                  1/1     Running   0          26h
    gpu-operator-node-feature-discovery-gc-6d6c559f4d-86btq       1/1     Running   0          26h
    gpu-operator-node-feature-discovery-master-6f9ff7c476-wv65t   1/1     Running   0          26h
    gpu-operator-node-feature-discovery-worker-cwgvq              1/1     Running   0          2m17s
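If you prefer not to read the two listings by eye, the checks in this step can be scripted. The sketch below assumes the default table output of `helm list -A` and `kubectl get pods` as shown above; `find_gpu_operator_release` and `list_unhealthy_pods` are hypothetical helper names, not part of either CLI.

```shell
# Print "<release> <namespace>" for every Helm release whose CHART column
# starts with "gpu-operator-". Reads `helm list -A` output on stdin.
# The chart is the second-to-last whitespace-separated field on each row.
find_gpu_operator_release() {
    awk '$(NF-1) ~ /^gpu-operator-/ { print $1, $2 }'
}

# Print "<pod> <status>" for pods that are neither Running nor Completed.
# Reads `kubectl get pods` output on stdin; the first line is the header.
list_unhealthy_pods() {
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (commented out so that sourcing this snippet has no side effects):
# helm list -A | find_gpu_operator_release
# kubectl get pods -n nvidia-gpu-operator | list_unhealthy_pods
```

Matching on the chart name rather than the release name still finds the operator if the release was given a different name. An empty result from `list_unhealthy_pods` means every pod in the namespace is healthy.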
    

Step 2: Uninstall the GPU Operator

If you need to move to a custom driver or a different operator version, follow these steps to ensure a clean removal.

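  • Since CMK deploys this add-on as a Helm chart, uninstall it with helm so that all resources tracked by the release are removed together. Note that Helm does not delete Custom Resource Definitions (CRDs) by default, so some CRDs may remain after this step.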

    ~ helm uninstall gpu-operator -n nvidia-gpu-operator
  • Delete the Namespace

    ~ kubectl delete namespace nvidia-gpu-operator
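After both commands complete, it is worth confirming that nothing was left behind, because Helm does not remove CRDs by default. The sketch below filters the output of `kubectl get crd -o name` for the `nvidia.com` API group; `list_nvidia_crds` is a hypothetical helper name. Review the result before deleting anything, since CRDs from other NVIDIA components may match the same pattern.

```shell
# Print CRDs in the nvidia.com API group that are still registered.
# Reads `kubectl get crd -o name` output on stdin, where each line looks like
# "customresourcedefinition.apiextensions.k8s.io/<crd-name>".
list_nvidia_crds() {
    sed -n 's|^customresourcedefinition\.apiextensions\.k8s\.io/\(.*\.nvidia\.com\)$|\1|p'
}

# Usage (commented out so that sourcing this snippet has no side effects):
# kubectl get crd -o name | list_nvidia_crds
# kubectl get namespace nvidia-gpu-operator   # expect "NotFound" once deletion finishes
```

Any CRD that you confirm belongs to the GPU Operator can then be removed with `kubectl delete crd <CRD_NAME>`.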

Step 3: Resolving "Stuck" Drivers

After uninstallation, NVIDIA kernel modules often remain "in use" by the OS, preventing the new drivers or operator from loading correctly.

  • In CMK, the most reliable fix is to delete the VM. The platform then automatically provisions a replacement VM and adds it to the nodepool.

    ~ crusoe compute vms delete <VM_NAME>
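If the node is still running workloads, you may want to drain it before deleting the VM so that pods are evicted gracefully. The following is a sketch under one loud assumption: that the Kubernetes node name and the Crusoe VM name are identical. Verify this in your cluster before using it. `replace_gpu_node` is a hypothetical helper name.

```shell
# Drain a node, then delete its VM so the nodepool provisions a replacement.
# ASSUMPTION: the Kubernetes node name and the Crusoe VM name are identical.
replace_gpu_node() {
    node_name=$1
    if [ -z "$node_name" ]; then
        echo "usage: replace_gpu_node <NODE_NAME>" >&2
        return 2
    fi
    # Evict workloads first. DaemonSet pods are skipped and emptyDir data is discarded.
    kubectl drain "$node_name" --ignore-daemonsets --delete-emptydir-data || return 1
    # Same delete command as in Step 3 of this guide.
    crusoe compute vms delete "$node_name"
}

# Usage (commented out so that sourcing this snippet has no side effects):
# replace_gpu_node <VM_NAME>
```

The function stops if the drain fails, so the VM is only deleted after its workloads have been evicted. Afterwards, `kubectl get nodes -w` lets you watch for the replacement node to join the cluster.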

Note: To run your own version of the GPU Operator, you must add the NVIDIA Helm repository and install the operator manually. When managing your own version, you are responsible for monitoring upgrades, driver compatibility, and security patches. Crusoe Support can assist with infrastructure issues, but software-level troubleshooting for custom operator versions is handled on a best-effort basis.
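As a rough sketch of the manual install mentioned in the note, the commands below follow NVIDIA's published Helm instructions. The release name `gpu-operator`, the default namespace `gpu-operator`, and the helper name `install_custom_gpu_operator` are assumptions chosen for illustration. Pick a chart version that is compatible with your drivers and Kubernetes version.

```shell
# Install a self-managed GPU Operator release from NVIDIA's Helm repository.
# $1: chart version to pin (required)
# $2: target namespace (ASSUMPTION: defaults to "gpu-operator")
install_custom_gpu_operator() {
    chart_version=$1
    namespace=${2:-gpu-operator}
    if [ -z "$chart_version" ]; then
        echo "usage: install_custom_gpu_operator <CHART_VERSION> [NAMESPACE]" >&2
        return 2
    fi
    helm repo add nvidia https://helm.ngc.nvidia.com/nvidia || return 1
    helm repo update || return 1
    # --wait blocks until the release's resources report ready or the timeout expires.
    helm install gpu-operator nvidia/gpu-operator \
        --namespace "$namespace" --create-namespace \
        --version "$chart_version" --wait
}

# Usage (commented out so that sourcing this snippet has no side effects):
# install_custom_gpu_operator <CHART_VERSION>
```

Pinning `--version` keeps the install reproducible, which matters here because upgrades of a custom version are your responsibility.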

FAQ & Troubleshooting

Q: What happens to the NVIDIA Network Operator if I uninstall the GPU Operator? 
Ans: The Network Operator runs independently. However, if your new GPU Operator includes a different driver version, ensure it is compatible with the existing Network Operator/MOFED configuration to avoid InfiniBand connectivity issues.

Q: If I manually uninstall the add-on, will the CMK service try to "fix" it? 
Ans: No. Currently, CMK add-ons are not managed by the cluster after initial installation. Even if you delete them, the state change will not be detected or automatically reverted by the platform.

Q: What happens during a CMK cluster upgrade if I have manually modified or removed the operator? 
Ans: During a cluster upgrade, the CMK control plane focuses on the Kubernetes version and system components. If you have removed a managed add-on, the upgrade process will generally not attempt to reinstall it. If you have installed a custom version, you must ensure it remains compatible with the newer Kubernetes version of the control plane.

Q: Does scaling down a nodepool automatically delete the VMs? 
Ans: No. In CMK, scaling down a nodepool updates the target count, but you must manually delete the VMs you no longer want.
Note: Running kubectl delete node does not delete the underlying Crusoe VM. You must delete the VM resource itself.

Q: Will deleting an instance release my capacity? 
Ans: No. The capacity is reserved for you. Deleting a VM in a CMK nodepool simply triggers the platform to create a new VM to maintain your target count.
