
How-To Diagnose and Fix CMK Node Disk Pressure

Tanaya Atmaram Kambli

Last Updated: Dec 24, 2025

Introduction

Disk pressure on Kubernetes nodes can cause pod evictions, performance issues, or node taints. In clusters with ephemeral storage, container images and layers may reside on high-performance storage (e.g., NVMe), while logs, emptyDir volumes, and kubelet data remain on the boot disk. Understanding the distinction between the node filesystem (nodefs) and the container runtime filesystem (imagefs), and how Kubernetes monitors both, is crucial for accurate troubleshooting.

This guide walks through identifying and resolving disk pressure issues, including hidden "phantom" disk usage from deleted-but-still-open files.

 

Prerequisites

  • SSH access with sudo privileges to the CMK node experiencing disk pressure.
  • Basic familiarity with Linux commands (df, du, lsof, lsblk).
  • Optional: Monitoring tools to track disk usage over time.

 

Step-by-Step Instructions

1: Verify ephemeral storage mount

  • If ephemeral storage is enabled, container images should be stored on a separate high-performance disk (NVMe), while the boot disk holds logs and kubelet data.

    Command:

df -h /var/lib/containerd

  Expected Output:

  • Shows the filesystem that backs the container runtime storage path, along with its usage.
  • On ephemeral-enabled nodes, an NVMe device should appear. Example:
ubuntu@np-xxxxxxxx-x:~$ df -h /var/lib/containerd
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1     12T  3.0G   11T   1% /mnt/nvme
  • On nodes without ephemeral storage, the boot disk appears instead. Example:
ubuntu@np-xxxxxxxx-x:~$ df -h /var/lib/containerd
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       124G  6.7G  118G   6% /

Tip:

  • Distinguish nodefs (boot disk: /var/lib/kubelet, /var/log, emptyDir volumes) from imagefs (/var/lib/containerd).
  • DiskPressure can occur if either filesystem runs low on free space or free inodes, that is, when it crosses the kubelet's eviction thresholds.
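
To make the nodefs/imagefs distinction above concrete, the following is a minimal sketch that reports whether two paths sit on the same backing device. The function names backing_device and compare_fs are hypothetical helpers introduced here, and the paths you pass are assumptions about your node layout.

```shell
# Print the device backing a path, using POSIX-format df output
# (df -P keeps each filesystem on a single line).
backing_device() {
  df -P "$1" | awk 'NR==2 {print $1}'
}

# Report whether the imagefs path shares a filesystem with the nodefs path.
# Both arguments are assumptions; pass the paths used on your node.
compare_fs() {
  node_dev="$(backing_device "$1")"
  image_dev="$(backing_device "$2")"
  if [ "$node_dev" = "$image_dev" ]; then
    echo "shared: $node_dev"
  else
    echo "separate: nodefs=$node_dev imagefs=$image_dev"
  fi
}
```

For example, compare_fs /var/lib/kubelet /var/lib/containerd should print "separate" on a node where container images live on NVMe, and "shared" when both paths are on the boot disk.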

 

2: Inspect disk usage and identify large files

Check both capacity and inode usage, and identify the largest files consuming space on the nodefs (boot disk).

Commands:

  • Overall disk and inode usage:
df -h      # Overall disk space usage
df -i      # Inode usage
  • Largest directories on the root filesystem (excluding virtual filesystems):
sudo du -xh / --exclude=/proc --exclude=/sys | sort -rh | head -20
  • Disk usage of kubelet and container runtime:
sudo du -sh /var/lib/kubelet
sudo du -sh /var/lib/containerd
sudo du -sh /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs
  • Top 20 largest individual files on the root filesystem only:
sudo find / -xdev -type f -exec du -h {} + 2>/dev/null | sort -rh | head -20
  • Logs and temporary files:
sudo du -sh /var/log
sudo du -sh /tmp

Tips:

  • Pay attention to /var/lib/kubelet and /var/log; runaway logs or large emptyDir volumes often cause nodefs pressure.
  • OverlayFS usage in containerd can grow quickly if multiple images/layers exist.
  • High inode usage may also trigger DiskPressure, even if disk space appears sufficient.
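
Because both space and inode exhaustion matter, it can help to flag any filesystem above a chosen limit in one pass. The sketch below reads df output on stdin; flag_high_usage is a hypothetical helper, and the limit of 85 used in the example is an arbitrary illustration, not the kubelet's configured eviction threshold.

```shell
# Read `df -P` or `df -Pi` output on stdin and print the mount points whose
# use percentage (column 5) is at or above the given limit.
# Mount points containing spaces are not handled by this sketch.
flag_high_usage() {
  awk -v limit="$1" '
    NR > 1 {
      pct = $5
      sub(/%/, "", pct)
      # Skip pseudo filesystems that report "-" instead of a percentage.
      if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0) print $6, pct "%"
    }'
}
```

Run it as df -P | flag_high_usage 85 for disk space and df -Pi | flag_high_usage 85 for inodes.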

 

3: Detect deleted-but-still-open files

  • Deleted files held open by processes can consume disk space invisibly.
  • Basic Command:
sudo lsof +L1 | grep deleted
  • Enhanced Command (Recommended):
sudo lsof +L1 | grep deleted | sort -k7 -rn | head -n 10
  • The enhanced command sorts by the SIZE/OFF column (field 7) in descending order, so the largest “phantom” files appear first.

Tip:

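  • Restart the process holding the file to free the space; the space is released only once every open handle to the deleted file is closed.
  • This check is crucial when df reports high nodefs usage but du does not show files large enough to account for it.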
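
To decide which process to restart first, the lsof output can be totalled per process. This is a minimal sketch, assuming the default lsof +L1 column layout in which SIZE/OFF is field 7; phantom_by_process is a hypothetical helper name.

```shell
# Read `lsof +L1` output on stdin and total the size of deleted files per
# process, largest first. Rows whose SIZE/OFF field is an offset (for example
# 0t123) are skipped. A file held through several descriptors of the same
# process is counted once per descriptor, so totals can overstate usage.
phantom_by_process() {
  awk '
    /\(deleted\)/ && $7 ~ /^[0-9]+$/ {
      key = $1 " (pid " $2 ")"
      bytes[key] += $7
    }
    END {
      for (k in bytes) printf "%.1f MiB\t%s\n", bytes[k] / 1048576, k
    }' | sort -rn
}
```

Run it as sudo lsof +L1 | phantom_by_process and start with the process at the top of the list.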

 

4: Monitor nodefs vs imagefs

  • Nodefs: /var/lib/kubelet, /var/log, /tmp, emptyDir volumes.
  • Imagefs: /var/lib/containerd (container images and writable layers).

Observations from support cases:

  • On ephemeral-enabled nodes, df -h /var/lib/containerd points to NVMe.
  • Boot disk can still determine evictions because logs, emptyDir volumes, and kubelet state reside there.

Tip:

  • Monitoring both filesystems is necessary for proactive disk pressure management.
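
One lightweight way to watch both filesystems over time is to log their usage at intervals. The sketch below is an illustration only: use_pct and report_usage are hypothetical helpers, and how you schedule and store the output (cron, a systemd timer, a monitoring agent) is left as an assumption.

```shell
# Print the use percentage (without the % sign) of the filesystem backing a path.
use_pct() {
  df -P "$1" | awk 'NR==2 {sub(/%/, "", $5); print $5}'
}

# Emit one UTC-timestamped line per existing path, suitable for appending to a log.
report_usage() {
  for path in "$@"; do
    [ -e "$path" ] || continue   # skip paths that do not exist on this node
    printf '%s %s %s%%\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$path" "$(use_pct "$path")"
  done
}
```

For example, report_usage /var/lib/kubelet /var/log /var/lib/containerd prints one line per path, which makes it easy to see the boot disk filling up while the NVMe stays nearly empty.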

 

Common Issues and Resolutions

Issue | Cause | Resolution
DiskPressure despite ephemeral storage | Boot disk (nodefs) usage is high | Check /var/lib/kubelet, /var/log, and emptyDir volumes; remove large files; monitor log rotation
Phantom disk usage | Deleted files held open by processes | Use sudo lsof +L1 to find the holding process, then restart it
OverlayFS growth | Many container images/layers stored | Clean up unused images; monitor containerd overlay usage
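
For the unused-image cleanup mentioned above, one option is crictl's prune flag. This is a sketch under the assumption that crictl is installed on the node and configured to talk to containerd, which this article does not confirm. DRY_RUN is a hypothetical switch introduced for this example; it defaults to printing the command instead of running it.

```shell
# Remove images not referenced by any container, or only print the command.
# DRY_RUN is a hypothetical safety switch for this sketch (default: 1, print only).
prune_unused_images() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: sudo crictl rmi --prune"
  else
    sudo crictl rmi --prune
  fi
}
```

Call prune_unused_images first to review the command, then DRY_RUN=0 prune_unused_images to run it. Images that are removed will be pulled again the next time a pod needs them.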

 
