How-To Resolve Intermittent NFS CSI Mount Failures During Large Pod Fanouts

Introduction

During high-concurrency pod scaling events (large pod fanouts), Kubernetes workloads using NFS volumes may experience intermittent mount failures. These failures surface as the Linux kernel error Required key not available (ENOKEY) or as API-level 409 Conflict errors during volume attachment.

The condition is triggered by rapid, concurrent DNS resolutions during a large pod scale-up. When many worker nodes simultaneously resolve the multi-homed VAST storage endpoint (nfs.crusoecloudcompute.com), divergent DNS lookups between the primary mount process and its sub-connection handlers (configured via nconnect=16) produce connection-state mismatches.

This article walks you through identifying the condition and upgrading your Crusoe CSI driver to the corrected production version. The fix switches the driver to a userspace Go resolver that resolves the endpoint exactly once and moves lookups to the node's local resolver, bypassing cluster-level CoreDNS entirely.

Prerequisites

- A running CMK cluster that uses NFS as a storage option
- kubectl CLI installed and configured with your CMK cluster's kubeconfig (see Get Kubeconfig)
- Helm CLI installed

Instructions

Step 1: Identify the Error State

Inspect the events in your namespace to look for FailedMount errors matching the ENOKEY error signature:
```
kubectl get events -n <your-namespace> --sort-by='.metadata.creationTimestamp' | grep FailedMount
```

Check whether the event message matches the following pattern:

MountVolume.SetUp failed for volume [...] : rpc error: code = Internal desc = ... Required key not available
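
To see which pods are hit hardest, the matching events can be tallied per object. The following is a minimal sketch, not part of the Crusoe tooling: count_enokey_by_object is a hypothetical helper name, and it assumes the default kubectl get events column layout (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE).

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally FailedMount events that carry the ENOKEY message,
# grouped by the OBJECT column (field 4 in the assumed default column layout).
# Output: "<count> <object>" per line, highest count first.
count_enokey_by_object() {
  awk '$3 == "FailedMount" && /Required key not available/ { n[$4]++ }
       END { for (o in n) print n[o], o }' | sort -k1,1nr -k2,2
}

# Usage (requires cluster access, so it is shown as a comment):
#   kubectl get events -n <your-namespace> --no-headers | count_enokey_by_object
```

A pod that appears with a high count is retrying repeatedly and is the most likely to hit a scheduler timeout.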

Check for concurrent volume attach conflicts on the Kubernetes API side using the controller logs:
```
kubectl logs deployment/crusoe-csi-controller -n crusoe-system | grep "409 Conflict"
```
ℹ️ Note: The CSI driver namespace is typically crusoe-system. Older deployments may use kube-system. Substitute your actual namespace in the commands above and below.

Step 2: Update the Crusoe CSI Driver

The issue is mitigated in Crusoe CSI Driver Helm Chart version 0.5.0 (and later).

Ensure the official Crusoe Helm repository is added and up to date:

helm repo add crusoe-csi-driver https://crusoecloud.github.io/crusoe-csi-driver-helm-charts/charts
helm repo update crusoe-csi-driver

Verify that version 0.5.0 or later is listed in the repository:
```
helm search repo crusoe-csi-driver/crusoe-csi-driver --versions
```
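
If you want to script the "0.5.0 or later" check, a version-aware comparison avoids the pitfalls of plain string comparison (for example, 0.10.0 sorting before 0.5.0). The sketch below assumes GNU sort with the -V option is available; version_ge is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: compare a chart version against the minimum release named in this article.
if version_ge "0.5.1" "0.5.0"; then
  echo "chart version is 0.5.0 or later"
fi
```

The version to test can be read from the helm search output above, or from the CHART column of helm list in your CSI namespace.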
Prepare an override_values.yaml so your API credential paths are mapped correctly during the upgrade:
```
crusoe:
  secrets:
    crusoeApiKeys:
      secretName: "crusoe-api-keys"
      accessKeyPath: "CRUSOE_ACCESS_KEY"
      secretKeyPath: "CRUSOE_SECRET_KEY"
```
- secretName — name of the Kubernetes Secret holding your Crusoe API credentials; must match the Secret in your CSI namespace.
- accessKeyPath — the key within that Secret that stores your Crusoe access key.
- secretKeyPath — the key within that Secret that stores your Crusoe secret key.
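
Before upgrading, it can help to confirm that the Secret really contains both keys named in override_values.yaml. This is a minimal sketch under the assumption that your Secret uses the key names shown above; has_required_keys is a hypothetical helper, not something shipped with the driver.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads Secret key names on stdin (one per line) and
# succeeds only if both keys referenced in override_values.yaml are present.
has_required_keys() {
  local keys
  keys=$(cat)
  printf '%s\n' "$keys" | grep -qx 'CRUSOE_ACCESS_KEY' &&
    printf '%s\n' "$keys" | grep -qx 'CRUSOE_SECRET_KEY'
}

# Usage (requires cluster access, so it is shown as a comment):
#   kubectl get secret crusoe-api-keys -n <your-csi-namespace> \
#     -o go-template='{{range $k, $v := .data}}{{println $k}}{{end}}' | has_required_keys \
#     && echo "secret keys match override_values.yaml"
```

The go-template prints only key names, so credential values never appear in your terminal.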

Execute the upgrade against your CSI namespace:

helm upgrade --install crusoe-csi-driver crusoe-csi-driver/crusoe-csi-driver \
    --namespace <your-csi-namespace> \
    --version 0.5.0 \
    -f override_values.yaml

Confirm that the DaemonSet rollout has finished. The DESIRED, READY, and UP-TO-DATE counts should all match:
```
kubectl get daemonset crusoe-csi-node -n <your-csi-namespace>
```
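
For automation, the same check can be expressed as a pass/fail test. The sketch below assumes the default kubectl get daemonset column order (NAME, DESIRED, CURRENT, READY, UP-TO-DATE, AVAILABLE, ...); daemonset_rolled_out is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one `kubectl get daemonset --no-headers` row on stdin
# and succeeds only when READY (field 4) and UP-TO-DATE (field 5) both equal
# DESIRED (field 2), and DESIRED is non-zero.
daemonset_rolled_out() {
  awk '{ ok = ($2 > 0 && $4 == $2 && $5 == $2) } END { exit (ok ? 0 : 1) }'
}

# Usage (requires cluster access, so it is shown as a comment):
#   kubectl get daemonset crusoe-csi-node -n <your-csi-namespace> --no-headers \
#     | daemonset_rolled_out && echo "rollout complete"
```

Alternatively, kubectl rollout status daemonset/crusoe-csi-node -n <your-csi-namespace> blocks until the rollout completes or fails.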

Frequently Asked Questions

Q: Why did this issue occur primarily during large pod scale-ups ("fanouts")?

A: When hundreds of pods deploy simultaneously, they issue rapid back-to-back mount calls. With NFS mount options such as nconnect=16 and remoteports=dns, each mount opens a sequence of backend network connections. Under heavy parallel load, different stages of the mount sequence can receive different IP addresses from the DNS load-balancing pool. This IP drift breaks the secure handshake, and the Linux kernel returns Required key not available (ENOKEY).
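
The drift described in this answer can be observed from a worker node by resolving the endpoint several times and counting the distinct answers. This is a rough diagnostic sketch, assuming getent is available on the node; count_distinct_answers is a hypothetical helper, and the loop count of 16 simply mirrors nconnect=16. Local DNS caching may hide the effect.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one lookup result (an IP address) per line on stdin
# and prints how many distinct answers were seen.
count_distinct_answers() {
  sort -u | grep -c .
}

# Usage on a node (requires network access, so it is shown as a comment):
#   for i in $(seq 16); do
#     getent hosts nfs.crusoecloudcompute.com | awk '{ print $1; exit }'
#   done | count_distinct_answers
```

A result greater than 1 means successive lookups returned different first addresses, which is the precondition for the mismatch.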

Q: Does this issue cause permanent data loss or disk corruption?

A: No. The failure occurs entirely at the network and mount layer, during connection setup. The data stored on your flash volumes (c2-home, c2-datadisk, etc.) remains intact and uncorrupted.

Q: How does the CSI driver upgrade fix the root cause?

A: The upgraded driver uses a userspace Go resolver that resolves the VAST storage domain exactly once per mount request. It pins that set of IP addresses and passes them directly into the kernel mount command, so no further lookups happen during the mount and its stages can no longer receive divergent answers. It also decouples mounting from CoreDNS, so cluster DNS issues during scale events no longer affect volume availability.

Q: My cluster uses a gang-scheduler (like Kueue) and jobs are still getting evicted before retries can complete. What temporary workaround is available?

A: Because kubelet and the CSI framework retry these mounts automatically, pods will eventually connect if given enough time. If your topology-aware or gang-scheduled jobs are timing out and getting evicted too quickly, increase your Kueue (or scheduler) waitForPodsReady timeout to 10m. That buffer lets the system clear the backlogged mounts until you can apply the permanent driver upgrade.

Example

A training cluster with 200 GPU worker nodes is scaling up after a scheduled maintenance window. As the cluster autoscaler provisions the nodepool and pods begin scheduling, you notice that roughly 10–15% of the pods are stuck in ContainerCreating state with repeated FailedMount events:

MountVolume.SetUp failed for volume "pvc-abc123" : rpc error: code = Internal desc = Required key not available

Some of the affected pods eventually mount and become ready after kubelet retries, but others exceed their scheduler's timeout first and are evicted. After upgrading the CSI driver Helm chart to version 0.5.0 and redeploying the workload, all pods mount cleanly on the first attempt, even during the next large-scale fanout event.
