
How-to Fix Missing Local NVMe Mounts and Root Partition Full Errors on Slurm Clusters Running on Crusoe Cloud

Karan Solanki

Last Updated: March 23, 2026

Introduction

On Slurm clusters and certain HPC deployments, local high-speed NVMe drives are configured as a RAID 0 array and mounted at /scratch/local (typically through the virtual device /dev/md127).

If the array fails to mount during the node boot sequence, applications may inadvertently write to the /scratch/local directory on the OS boot partition instead. This "ghost data" can fill the 124 GB boot drive, causing "No space left on device" errors and preventing the OS from mounting the actual NVMe RAID array.

Prerequisites

Before starting, ensure you have the following:

  • A Crusoe VM acting as a node within a Slurm or HPC cluster.

  • The VM is running and is a GPU or CPU instance type that supports local NVMe drives.

  • SSH access to the VM.

  • The latest version of the Crusoe CLI installed and authenticated (Installing Crusoe CLI).

Step-by-Step Instructions

Step 1: Verify the Current Mount State

  • Check whether /scratch/local is incorrectly backed by the root disk instead of the NVMe array:

    ~ df /scratch/local

    If the "Mounted on" column shows /, you are writing to the boot partition and the NVMe array is not mounted. Proceed to Step 2.
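
The check above can also be scripted so that it gives an explicit verdict. The following is a minimal sketch, not an official Crusoe tool: the helper names backing_mount and classify_scratch are hypothetical, and it assumes POSIX-format `df -P` output in which the mount point is the sixth column.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Print the mount point that actually backs a path
# (sixth column of the data row in POSIX-format df output).
backing_mount() {
    df -P "$1" 2>/dev/null | awk 'NR == 2 { print $6 }'
}

# Compare the expected mount point ($1) with the mount point df reported ($2).
classify_scratch() {
    if [ "$1" = "$2" ]; then
        echo "OK: $1 is its own mount point"
    else
        echo "WARNING: $1 is backed by '$2', not a dedicated mount"
    fi
}
```

For example, `classify_scratch /scratch/local "$(backing_mount /scratch/local)"` prints the warning line when the directory lives on the boot disk.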

Step 2: Identify the RAID Device Status

  • Before attempting a recovery, confirm that the system recognizes the RAID array:

    ~ grep nvme /proc/mdstat   # list md arrays built from NVMe members
    ~ sudo blkid /dev/md127    # identify the UUID and filesystem type
    ~ lsblk -f /dev/md127      # verify the block device is present
    
    
    -----------------------------Expected Output----------------------------------
    
    md127 : active raid0 nvme1n1[3] nvme0n1[2] nvme2n1[1] nvme3n1[0]
    NAME      FSTYPE  FSVER  LABEL  UUID                                  FSAVAIL FSUSE% MOUNTPOINTS
    md127     xfs                   <UUID-STRING>

    Note: The FSTYPE must show xfs. If this is blank, the RAID array may not be assembled.
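
The /proc/mdstat check can be made a little stricter than a plain grep. The sketch below is an illustration under stated assumptions: md_state and md_member_count are hypothetical helper names, and the parsing assumes the usual mdstat layout shown in the expected output above (device name, a colon, the state word, then member devices with a bracketed index).

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.
# $1: md device name (for example md127)
# $2: mdstat file to read (defaults to /proc/mdstat)

# Print the state word ("active" or "inactive"); fail if the device is not listed.
md_state() {
    awk -v dev="$1" '
        $1 == dev && $2 == ":" { print $3; found = 1 }
        END { exit !found }
    ' "${2:-/proc/mdstat}"
}

# Print how many member devices (fields with a bracketed index) the array lists.
md_member_count() {
    awk -v dev="$1" '
        $1 == dev && $2 == ":" {
            n = 0
            for (i = 3; i <= NF; i++) if ($i ~ /\[[0-9]+\]/) n++
            print n
        }
    ' "${2:-/proc/mdstat}"
}
```

On a node that matches the expected output above, `md_state md127` should report the array as active and `md_member_count md127` should equal the number of local NVMe drives in the array.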

Step 3: Rescue Data using a Temporary Mount

Since the boot disk is full, you cannot move data to a backup folder on that same disk. Instead, we mount the NVMe to a temporary location to offload the ghost data.

  • Create a temporary mount point:

    ~ sudo mkdir -p /mnt/temp_nvme
    ~ sudo mount -t xfs /dev/md127 /mnt/temp_nvme

  • Move the ghost data off the boot disk, including hidden files that a * glob would skip:

    ~ sudo find /scratch/local -mindepth 1 -maxdepth 1 -exec mv -t /mnt/temp_nvme/ {} +
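
If the temporary mount silently failed, moving data into /mnt/temp_nvme would only shuffle it around on the full boot disk. The sketch below adds a guard for that case. It is an assumption-laden illustration, not part of the documented procedure: move_all and rescue_ghost_data are hypothetical names, the functions must run as root on a real node, and the guard relies on the `mountpoint` utility from util-linux.

```shell
#!/bin/sh
# Hypothetical helpers; run as root on a real node.

# Move every entry in $1 (dotfiles included) into $2.
move_all() {
    find "$1" -mindepth 1 -maxdepth 1 -exec mv -- {} "$2"/ \;
}

# Refuse to move anything unless $2 is a real mount point, so a failed
# temporary mount cannot push more data onto the full boot disk.
rescue_ghost_data() {
    if ! mountpoint -q "$2" 2>/dev/null; then
        echo "refusing: $2 is not a mount point" >&2
        return 1
    fi
    move_all "$1" "$2"
}
```

A call such as `rescue_ghost_data /scratch/local /mnt/temp_nvme` would then move the data only after the array is mounted at the temporary path.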

Step 4: Finalize the Correct Mount

Now that the /scratch/local directory on the boot disk is empty, mount the array at its permanent location.

  • Unmount from the temporary path:

    ~ sudo umount /mnt/temp_nvme

  • Mount to the correct path:

    ~ sudo mount -t xfs /dev/md127 /scratch/local
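
Mounting over a directory that still holds files hides them while they keep consuming space on the boot disk. As a hedged sketch, the remount can be wrapped in a check that the target is really empty first. The names dir_is_empty and remount_scratch are hypothetical, and the function must run as root.

```shell
#!/bin/sh
# Hypothetical helpers; run as root on a real node.

# Succeed only if $1 is an existing directory with no entries (dotfiles included).
dir_is_empty() {
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# $1: RAID device, $2: temporary mount point, $3: final mount point
remount_scratch() {
    if ! dir_is_empty "$3"; then
        echo "refusing: $3 is missing or still contains data" >&2
        return 1
    fi
    umount "$2" && mount -t xfs "$1" "$3"
}
```

With the paths used in this article, the call would be `remount_scratch /dev/md127 /mnt/temp_nvme /scratch/local`.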

Step 5: Verify Space Recovery

  • Confirm that /scratch/local now reflects the full capacity and correct device:

    ~ df -h /
    ~ df -h /scratch/local
    
    ---------------------------------Expected Output------------------------------------
    Filesystem     Size  Used Avail Use% Mounted on
    /dev/vda1      124G   20G  104G  16% /                <-- Confirming Root is clear
    /dev/md127      14T  100G   13T   1% /scratch/local   <-- Confirming NVMe is active
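
The final verification can be expressed as a pass/fail check, which is convenient when several nodes need the same fix. This is a minimal sketch under stated assumptions: parse_df and verify_mount are hypothetical names, the parsing relies on POSIX-format `df -P` output, and the 80 percent threshold in the usage note is an arbitrary example value.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Read `df -P` output on stdin; print "<device> <use-percent>" for the data row.
parse_df() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $1, $5 }'
}

# $1: path, $2: expected device, $3: maximum acceptable use percent
verify_mount() {
    vm_row="$(df -P "$1" 2>/dev/null | parse_df)"
    vm_dev="${vm_row% *}"
    vm_pct="${vm_row#* }"
    if [ "$vm_dev" = "$2" ] && [ "$vm_pct" -le "$3" ] 2>/dev/null; then
        echo "OK: $1 is on $vm_dev at ${vm_pct}% used"
    else
        echo "CHECK: $1 is on ${vm_dev:-unknown} at ${vm_pct:-?}% used"
        return 1
    fi
}
```

Using the device names from the expected output above, `verify_mount / /dev/vda1 80` and `verify_mount /scratch/local /dev/md127 80` should both succeed on a recovered node.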

 
