Last Updated: Mar 16, 2026
Introduction
This article explains why a node pool (for example, an A100 GPU pool) may stop scaling even when VMs appear as Running in Crusoe Cloud.
This situation occurs when a VM exists in the infrastructure layer but the corresponding node is missing from the Kubernetes cluster. As a result, the infrastructure and Kubernetes cluster become out of sync, which may cause the Cluster Autoscaler to skip the affected node pool.
You may also see the following error in the Cluster Autoscaler logs when it attempts to retrieve readiness information for the node pool:
Failed to find readiness information for <node-pool-id>
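To check whether this error is present in a saved log file, a short script can pull out the affected node pool IDs. The following is a minimal sketch, not an official tool: the message text is taken from this article, while the function name `pools_with_readiness_errors` is hypothetical and the rest of the log line (timestamps, prefixes) is assumed to be unknown, so only the message itself is matched.

```python
import re

# The message text comes from this article. Everything around it
# (timestamps, source file prefixes) is unknown, so match only the message.
READINESS_ERROR = re.compile(r"Failed to find readiness information for (\S+)")


def pools_with_readiness_errors(log_lines):
    """Return node pool IDs named in readiness errors, in first-seen order."""
    seen = []
    for line in log_lines:
        match = READINESS_ERROR.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

The function accepts any iterable of strings, so it can be fed an open file containing autoscaler logs saved earlier.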
Question 1: Can a node showing as "Running" in Crusoe but not registered in Kubernetes cause the autoscaler to stop working?
Answer:
Yes. If there is drift between:
VMs reported as Running in Crusoe Cloud, and
Nodes registered in Kubernetes,
the autoscaler will stop treating that node pool as a valid autoscaling target and skip it.
The autoscaler expects infrastructure state and Kubernetes node state to match. If a VM exists but the node object does not, the pool is considered inconsistent and will not scale up to schedule pending pods.
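The consistency check described above amounts to comparing two sets of names. The sketch below illustrates the idea only; it is not the autoscaler's actual implementation. It assumes that a VM and its Kubernetes node share the same name, and the names `DriftReport` and `compare_pool_state` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftReport:
    vms_without_node: frozenset  # Running VMs with no Kubernetes node object
    nodes_without_vm: frozenset  # node objects with no Running VM behind them

    @property
    def in_sync(self):
        # The two views agree only when neither side has extras.
        return not self.vms_without_node and not self.nodes_without_vm


def compare_pool_state(running_vm_names, node_names):
    """Compare the infrastructure view and the Kubernetes view of one pool.

    Assumption: a VM and its Kubernetes node share the same name.
    """
    vms = frozenset(running_vm_names)
    nodes = frozenset(node_names)
    return DriftReport(vms_without_node=vms - nodes, nodes_without_vm=nodes - vms)
```

A pool with three Running VMs but only two registered nodes yields a report whose `vms_without_node` holds the missing name and whose `in_sync` is false, which is the condition this article describes.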
Question 2: Why would a VM be "Running" in Crusoe but not appear in Kubernetes?
Answer:
One common cause is out-of-band node deletion from Kubernetes.
Example: Kubernetes audit logs show that a user manually deleted the node through a client (e.g., k9s). The VM itself was never deleted from Crusoe Cloud. This results in:
VM: Running (Crusoe view)
Node object: Deleted (Kubernetes view)
This creates drift. Because the VM still exists at the infrastructure layer, Crusoe sees the expected number of instances running (n/n), so the node pool state remains RUNNING.
Question 3: Should the node pool state change to UNHEALTHY if nodes fail to join?
Answer:
It depends on the scenario.
1. If a node fails to join the cluster:
In newer CMK versions:
The VM is automatically deleted
The instance group transitions to UNHEALTHY
2. If a node is deleted out-of-band in Kubernetes:
The VM still exists and is Running
The node pool sees the expected VM count (n/n)
The node pool state remains RUNNING
The autoscaler may skip the pool due to drift
Currently, deleting nodes directly in Kubernetes does not affect node pool health state in Crusoe Cloud.
Question 4: How can I recover from this condition?
Answer:
You can resolve drift by:
Identifying the orphaned VM (Running in Crusoe but missing in Kubernetes)
Deleting the VM from Crusoe Cloud
Allowing the autoscaler to recreate a clean node
After removing the drift, autoscaling should resume normally.
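The identification step can be sketched in Python. This is an illustration under stated assumptions rather than a supported procedure: it expects the JSON text produced by `kubectl get nodes -o json`, a list of Running VM names gathered separately from Crusoe Cloud, and VM names that match Kubernetes node names. The function names are hypothetical.

```python
import json


def node_names_from_kubectl_json(raw_json):
    """Extract node names from the output of `kubectl get nodes -o json`."""
    doc = json.loads(raw_json)
    return {item["metadata"]["name"] for item in doc.get("items", [])}


def orphaned_vms(running_vm_names, raw_nodes_json):
    """Return Running VMs that have no Kubernetes node object, sorted by name.

    Assumption: a VM and its Kubernetes node share the same name.
    """
    nodes = node_names_from_kubectl_json(raw_nodes_json)
    return sorted(name for name in set(running_vm_names) if name not in nodes)
```

Each name returned is a candidate for deletion from Crusoe Cloud. Confirm that the VM really belongs to the affected node pool before deleting it.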