Last Updated: March 30, 2026
Introduction
Crusoe Managed Kubernetes (CMK) provides several optional add-ons to extend cluster functionality, including the NVIDIA GPU Operator, NVIDIA Network Operator, Crusoe CSI Driver, Cluster Autoscaler, Autoclusters, and NVIDIA GB200 Support. This guide covers the verification and uninstallation procedures for the NVIDIA GPU Operator within a CMK cluster.
Why is this operator needed?
The NVIDIA GPU Operator is a critical automation layer that manages the software stack required to use GPUs. It installs the NVIDIA drivers, the CUDA runtime, the NVIDIA Container Toolkit, and the NVIDIA Device Plugin. Without this operator, Kubernetes cannot schedule GPU workloads onto your nodes.
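One way to see the effect described above is to check whether nodes advertise the nvidia.com/gpu resource, which the device plugin registers. The following is a minimal sketch, assuming kubectl is already configured for your CMK cluster; the function names are illustrative and not part of any Crusoe tooling.

```shell
#!/usr/bin/env bash
# Sketch: find nodes that advertise no allocatable nvidia.com/gpu resource.
# Assumption: kubectl points at the CMK cluster you want to inspect.

# Prints "NODE GPU_COUNT" for every node. kubectl prints <none> when the
# nvidia.com/gpu resource is absent from the node's allocatable resources.
list_node_gpus() {
  kubectl get nodes --no-headers \
    -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}

# Reads that listing on stdin and prints only the nodes without GPUs.
nodes_without_gpus() {
  awk '$2 == "<none>" || $2 == "0" { print $1 }'
}

# Usage: list_node_gpus | nodes_without_gpus
```

CPU-only nodes will legitimately appear in this list, so only treat an entry as a problem when the node is expected to expose GPUs.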
Prerequisites
Before starting, ensure you have the following:
A running CMK cluster with a nodepool that has the NVIDIA GPU Operator add-on enabled.
The kubectl CLI installed and configured with your CMK cluster's kubeconfig (get Kubeconfig).
The latest version of the Crusoe CLI installed and authenticated to perform nodepool updates.
Step-by-Step Instructions
Step 1: Identify all the installed operators
Before uninstalling, you should verify the name of the release and the namespace it resides in.
- List all the Helm releases
~ helm list -A | grep gpu-operator
--------------------------------Expected Output------------------------------------
NAME          NAMESPACE            REVISION  UPDATED                                  STATUS    CHART                 APP VERSION
gpu-operator  nvidia-gpu-operator  1         2026-03-26 19:23:32.929250932 +0000 UTC  deployed  gpu-operator-v25.3.4  v25.3.4
- Check Pod Status
~ kubectl get pods -n nvidia-gpu-operator
--------------------------------Expected Output------------------------------------
NAME                                                          READY  STATUS   RESTARTS  AGE
gpu-operator-b6d589d4c-fw4fb                                  1/1    Running  0         26h
gpu-operator-node-feature-discovery-gc-6d6c559f4d-86btq       1/1    Running  0         26h
gpu-operator-node-feature-discovery-master-6f9ff7c476-wv65t   1/1    Running  0         26h
gpu-operator-node-feature-discovery-worker-cwgvq              1/1    Running  0         2m17s
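The two checks in this step can also be scripted. The sketch below assumes the release name gpu-operator and the namespace nvidia-gpu-operator shown in the output above; the function names are hypothetical helpers, not part of helm or kubectl.

```shell
#!/usr/bin/env bash
# Sketch: scripted version of the Step 1 checks.
# Assumptions: release name "gpu-operator"; namespace passed as $1.

# Succeeds when a release named exactly gpu-operator is in the deployed state.
release_deployed() {
  [ -n "$(helm list -n "$1" --deployed --short --filter '^gpu-operator$')" ]
}

# Prints "POD STATUS" for pods that are neither Running nor Completed.
# Empty output means every pod reported one of those two statuses.
unhealthy_pods() {
  kubectl get pods -n "$1" --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage:
#   release_deployed nvidia-gpu-operator && echo "release found"
#   unhealthy_pods nvidia-gpu-operator
```

This only inspects the STATUS column. A pod can report Running while its READY count is still below the total, so the READY column is worth checking by eye as well.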
Step 2: Uninstall the GPU Operator
If you need to move to a custom driver or a different operator version, follow these steps to ensure a clean removal.
- Since CMK deploys this as a Helm chart, use helm to uninstall the release and its linked resources. Note that Helm does not delete Custom Resource Definitions (CRDs) on uninstall by default, so verify whether any remain afterwards.
~ helm uninstall gpu-operator -n nvidia-gpu-operator
- Delete the Namespace
~ kubectl delete namespace nvidia-gpu-operator
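Because Helm leaves CRDs that a chart installed from its crds/ directory in place on uninstall, it can be useful to list what remains after the steps above. The following is a minimal sketch that only lists CRDs and deletes nothing; matching on the nvidia.com suffix is an assumption about how the operator's CRDs are named, and the function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: list CRDs whose API group ends in nvidia.com after the uninstall.
# Assumption: GPU Operator CRDs live in an API group ending in nvidia.com.
# This only lists CRDs; it never deletes anything.

leftover_nvidia_crds() {
  # "kubectl get crd -o name" prints "<resource-type>/<crd-name>";
  # strip the prefix and keep names ending in .nvidia.com.
  kubectl get crd -o name | sed 's|.*/||' | grep '\.nvidia\.com$' || true
}

# Usage: leftover_nvidia_crds
```

Review the list before removing anything with kubectl delete crd. Other add-ons, such as the NVIDIA Network Operator, may own CRDs in related API groups that you want to keep.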
Step 3: Resolving "Stuck" Drivers
After uninstallation, NVIDIA kernel modules often remain "in use" by the OS, preventing the new drivers or operator from loading correctly.
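If you have shell access to a GPU node, you can confirm whether NVIDIA modules are still loaded by inspecting lsmod. This is a sketch under that assumption; the function name is hypothetical, and it reads lsmod output on stdin so it can be run against saved output as well.

```shell
#!/usr/bin/env bash
# Sketch: show NVIDIA kernel modules that are still loaded and their use counts.
# Assumption: you can run lsmod on the node itself (for example over SSH).

# Reads lsmod output on stdin. Columns are: Module, Size, Used by.
# Skips the header row and prints "MODULE USE_COUNT" for nvidia* modules.
nvidia_modules_in_use() {
  awk 'NR > 1 && $1 ~ /^nvidia/ { print $1, $3 }'
}

# Usage on a GPU node: lsmod | nvidia_modules_in_use
```

Any output at all means the modules are still loaded; a non-zero use count indicates that something on the node is still holding them.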
- In a CMK cluster, deleting the VM is the most reliable fix: the existing VM is removed, and a new VM is automatically provisioned and added to the nodepool.
~ crusoe compute vms delete <VM_NAME>
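If workloads are still running on the node, you may want to cordon and drain it before deleting the VM. The sketch below wraps that sequence; it is an illustration rather than a Crusoe-provided procedure. The node name and VM name are both supplied by you, since this sketch does not assume they are identical, and commands are only printed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: cordon and drain a node, then delete its VM so CMK provisions a new one.
# Assumptions: you supply both the Kubernetes node name and the Crusoe VM name;
# the helper names below are hypothetical.

# Prints the command when DRY_RUN is 1 (the default); executes it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

replace_node() {
  node="$1"
  vm="$2"
  run kubectl cordon "$node" &&
    run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data &&
    run crusoe compute vms delete "$vm"
}

# Usage (prints the commands only):
#   replace_node <NODE_NAME> <VM_NAME>
# Usage (executes them):
#   DRY_RUN=0 replace_node <NODE_NAME> <VM_NAME>
```

Defaulting to a dry run is a deliberate choice here, because deleting a VM is destructive and the drain flags evict pods that use emptyDir volumes.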
Note: To run your own version of the operator, you must add the NVIDIA Helm repository and install the operator manually. When managing your own version, you are responsible for monitoring upgrades, driver compatibility, and security patches. Crusoe Support can assist with infrastructure issues, but software-level troubleshooting for custom operator versions is handled on a best-effort basis.
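A manual installation along these lines might look like the following sketch. The repository URL is NVIDIA's public Helm repository; the release name and namespace (both gpu-operator here) and the function name are illustrative assumptions, and the chart version is something you choose yourself, for example from the output of helm search repo nvidia/gpu-operator --versions.

```shell
#!/usr/bin/env bash
# Sketch: install a self-managed GPU Operator version with Helm.
# Assumptions: release name and namespace "gpu-operator" are illustrative;
# the chart version is supplied by the caller and is not chosen for you.

install_custom_gpu_operator() {
  version="${1:-}"
  if [ -z "$version" ]; then
    echo "usage: install_custom_gpu_operator <CHART_VERSION>" >&2
    return 2
  fi
  helm repo add nvidia https://helm.ngc.nvidia.com/nvidia &&
    helm repo update &&
    helm install gpu-operator nvidia/gpu-operator \
      --namespace gpu-operator --create-namespace \
      --version "$version" --wait
}

# Usage: install_custom_gpu_operator <CHART_VERSION>
```

Any driver or toolkit settings your workloads need would be passed as additional chart values; they are omitted here because they depend on the version you select.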
FAQ & Troubleshooting
Q: What happens to the NVIDIA Network Operator if I uninstall the GPU Operator?
Ans: The Network Operator runs independently. However, if your new GPU Operator includes a different driver version, ensure it is compatible with the existing Network Operator/MOFED configuration to avoid InfiniBand connectivity issues.
Q: If I manually uninstall the add-on, will the CMK service try to "fix" it?
Ans: No. Currently, CMK add-ons are not managed by the cluster after initial installation. Even if you delete them, the state change will not be detected or automatically reverted by the platform.
Q: What happens during a CMK cluster upgrade if I have manually modified or removed the operator?
Ans: During a cluster upgrade, the CMK control plane focuses on the Kubernetes version and system components. If you have removed a managed add-on, the upgrade process will generally not attempt to reinstall it. If you have installed a custom version, you must ensure it remains compatible with the new Kubernetes version of the control plane.
Q: Does scaling down a nodepool automatically delete the VMs?
Ans: No. In CMK, scaling down a nodepool updates the target count, but you must manually delete the VMs you no longer want.
Note: Running kubectl delete node does not delete the underlying Crusoe VM; you must delete the VM resource itself.
Q: Will deleting an instance release my capacity?
Ans: No. The capacity is reserved for you. Deleting a VM in a CMK nodepool simply triggers the platform to create a new VM to maintain your target count.