Last Updated: March 23, 2026
Introduction
On Slurm clusters and certain HPC deployments, local high-speed NVMe drives are configured as a RAID 0 array and mounted at /scratch/local (typically through the virtual device /dev/md127).
If this hardware fails to mount during the node boot sequence, applications may inadvertently write data to the /scratch/local directory on the OS boot partition instead. This "ghost data" can fill the 124GB boot drive, causing "No space left on device" errors and preventing the OS from successfully mounting the actual NVMe RAID array.
Prerequisites
Before starting, ensure you have the following:
A Crusoe VM acting as a node within a Slurm or HPC cluster.
A running GPU or CPU VM that supports NVMe.
SSH access to the VM.
The latest version of the Crusoe CLI installed and authenticated (Installing Crusoe CLI).
Step-by-Step Instructions
Step 1: Verify the Current Mount State
-
Check whether /scratch/local is incorrectly pointing to the root disk instead of the NVMe array:
~ df /scratch/local
If the output shows Mounted on /, you are writing to the boot partition and your NVMe is unmounted. Proceed to Step 2.
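The check above can also be scripted. The following is a minimal sketch rather than part of the documented procedure: the helper name mounted_on is hypothetical, and it assumes the POSIX output format of df -P, where the mount point is the sixth column of the second line.

```shell
# Hypothetical helper: print the "Mounted on" column from `df -P` output
# supplied on stdin. Assumes the mount point contains no spaces.
mounted_on() {
    awk 'NR == 2 { print $6 }'
}

# Example use on a node (commented out so the snippet has no side effects):
# if [ "$(df -P /scratch/local | mounted_on)" = "/" ]; then
#     echo "WARNING: /scratch/local is on the boot disk" >&2
# fi
```

Reading the df output from stdin keeps the helper easy to try against captured output before running it on a live node.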
Step 2: Identify the RAID Device Status
-
Before attempting a recovery, confirm that the system recognizes the RAID array:
~ cat /proc/mdstat | grep nvme
~ sudo blkid /dev/md127    # to identify the UUID and filesystem type
~ lsblk -f /dev/md127      # to verify the block device is present
-----------------------------Expected Output----------------------------------
md127 : active raid0 nvme1n1[3] nvme0n1[2] nvme2n1[1] nvme3n1[0]
NAME  FSTYPE FSVER LABEL UUID          FSAVAIL FSUSE% MOUNTPOINTS
md127 xfs                <UUID-STRING>
Note: The FSTYPE must show xfs. If this is blank, the RAID array may not be assembled.
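As an illustration of the mdstat check, the sketch below extracts the state and RAID level for one device. The helper name md_state is hypothetical, and it assumes the line layout shown in the expected output (device name, a colon, the state, then the RAID level).

```shell
# Hypothetical helper: read /proc/mdstat-style text on stdin and print the
# state and RAID level of the named md device (for example "active raid0").
# Prints nothing if the device is not listed.
md_state() {
    awk -v dev="$1" '$1 == dev && $2 == ":" { print $3, $4 }'
}

# Example use on a node (commented out so the snippet has no side effects):
# state=$(md_state md127 < /proc/mdstat)
# [ "$state" = "active raid0" ] || echo "md127 is not an active raid0 array" >&2
```

An empty result means the device does not appear in /proc/mdstat at all, which is worth distinguishing from a device that is listed but not active.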
Step 3: Rescue Data using a Temporary Mount
Since the boot disk is full, you cannot move data to a backup folder on that same disk. Instead, we mount the NVMe to a temporary location to offload the ghost data.
-
Create a temporary mount point and mount the array on it:
~ sudo mkdir -p /mnt/temp_nvme
~ sudo mount -t xfs /dev/md127 /mnt/temp_nvme
-
Move the ghost data off the boot disk:
~ sudo mv /scratch/local/* /mnt/temp_nvme/
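One caveat with the move command: in a default shell the * glob does not match names that begin with a dot, so hidden files or directories can stay behind on the boot disk. The sketch below counts whatever remains. The helper name count_entries is hypothetical, and it assumes a find that supports -mindepth and -maxdepth (GNU findutils does) and file names without embedded newlines.

```shell
# Hypothetical helper: count every entry directly inside a directory,
# including hidden ones, so leftovers from a `*` glob move are visible.
count_entries() {
    find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Example use on a node (commented out so the snippet has no side effects):
# if [ "$(count_entries /scratch/local)" -ne 0 ]; then
#     sudo ls -A /scratch/local    # list what the glob missed
# fi
```

If the count is not zero, move the listed entries by name before continuing, so that nothing is hidden underneath the mount point once the array is mounted.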
Step 4: Finalize the Correct Mount
Now that the /scratch/local folder on the boot disk is empty, we can link the hardware to its permanent home.
-
Unmount from the temporary path:
~ sudo umount /mnt/temp_nvme
-
Mount to the correct path:
~ sudo mount -t xfs /dev/md127 /scratch/local
Step 5: Verify Space Recovery
-
Confirm that /scratch/local now reflects the full capacity and correct device:
~ df -h /
~ df -h /scratch/local
---------------------------------Expected Output------------------------------------
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       124G   20G  104G  16% /               <-- Confirming Root is clear
/dev/md127       14T  100G   13T   1% /scratch/local  <-- Confirming NVMe is active
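For nodes that are checked repeatedly, the final verification can be reduced to a pass or fail result. This is a sketch under assumptions: the helper name df_matches is hypothetical, and it relies on the df -P column layout (filesystem in column one, mount point in column six).

```shell
# Hypothetical helper: read `df -P` output on stdin and succeed only if the
# reported filesystem and mount point match the expected values.
df_matches() {
    awk -v dev="$1" -v mnt="$2" \
        'NR == 2 { ok = ($1 == dev && $6 == mnt) } END { exit !ok }'
}

# Example use on a node (commented out so the snippet has no side effects):
# if df -P /scratch/local | df_matches /dev/md127 /scratch/local; then
#     echo "NVMe RAID is mounted at /scratch/local"
# else
#     echo "/scratch/local is not backed by /dev/md127" >&2
# fi
```

The helper exits non-zero when the input has no data line, so a failed df call is reported as a mismatch rather than a success.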