Introduction
This guide outlines how to remediate I/O failures or SSH inaccessibility on virtual machines (VMs) caused by a full root ("/") disk. When the OS disk fills up, operations such as executing commands, connecting over SSH, or accessing directories might be impacted. This guide helps users resolve disk space issues and understand best practices for managing persistent storage.
Prerequisites
- Crusoe CLI
- Serial Console or SSH access
- Permissions to modify or delete files on the root file system
- A mounted additional persistent disk or shared storage (if planning to migrate data)
Step-by-Step Instructions
Step 1: Access the VM
- Access your VM using SSH
- If the VM is not accessible using SSH, use serial console:
crusoe compute vms serial-console --name <VM_NAME> --port-num <PORT>
Step 2: Identify disk usage
- Run the following command to check disk usage:
df -h
If this command hangs or provides no output, proceed to manual inspection using ls -lh / and du -sh /home/*.
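When df does respond, the output can be filtered so that only nearly full filesystems are shown. The following is a minimal sketch, not part of the Crusoe tooling: the function name flag_full_filesystems and the 90% default threshold are assumptions chosen for illustration. It reads df -P style output on standard input and prints each mount point whose usage is at or above the threshold. Mount points containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print "<mount point> <use%>" for every filesystem
# whose usage is at or above a threshold (default 90, an assumed value).
# Input is `df -P` output on stdin, so the parsing can be tried on sample text.
flag_full_filesystems() {
    threshold="${1:-90}"
    awk -v limit="$threshold" 'NR > 1 {
        use = $5              # fifth column of df -P is the capacity, e.g. "95%"
        sub(/%/, "", use)     # strip the percent sign for a numeric comparison
        if (use + 0 >= limit) print $6 " " $5
    }'
}

# Usage on a live VM:
#   df -P | flag_full_filesystems 90
```

If "/" appears in the output, continue with Step 3 to find what is consuming the space.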
Step 3: Locate large files or directories
- Use the following to list directories consuming the most space:
du -sh /* 2>/dev/null | sort -h
- Drill down as necessary:
du -sh /var/* 2>/dev/null | sort -h
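Directory totals do not always reveal a single oversized file. As a complement to the du commands above, the sketch below lists the largest individual files under a directory without crossing into other mounted filesystems. The function name largest_files and its defaults are assumptions for illustration. Sizes are reported in KiB as measured by du -k, and file names containing newlines are not handled.

```shell
#!/bin/sh
# Hypothetical helper: list the N largest regular files under a directory.
# -xdev keeps find on the starting filesystem, so a disk mounted at
# /mnt/data is not counted against the root disk.
largest_files() {
    dir="${1:-/}"
    count="${2:-10}"
    find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null \
        | sort -rn \
        | head -n "$count"
}

# Usage on a live VM (sudo lets find read directories owned by other users):
#   sudo sh -c '. ./largest_files.sh; largest_files / 20'
```

The script file name in the usage comment is also hypothetical. Save the function wherever is convenient, ideally not on the full root disk.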
Step 4: Clean temporary or unused files
- Remove unnecessary temporary files:
sudo rm -rf /tmp/*
- For log files, manually review and remove the oldest compressed logs instead of bulk deletion:
# List log files to review
ls -lht /var/log/*.gz
# Remove specific old compressed logs (example)
sudo rm /var/log/syslog.2.gz /var/log/syslog.3.gz
# Use journalctl to clean up systemd logs
sudo journalctl --vacuum-size=100M
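A more conservative alternative to clearing everything under /tmp is to remove only files that have not been modified for some time, after previewing the list. The sketch below assumes a hypothetical function name, clean_old_files, a default age of 7 days, and a DRY_RUN variable that defaults to preview mode. None of these come from Crusoe documentation.

```shell
#!/bin/sh
# Hypothetical helper: remove regular files under a directory that were last
# modified more than N days ago (default 7, an assumed value).
# DRY_RUN=1 (the default) only prints the candidates; DRY_RUN=0 deletes them.
clean_old_files() {
    dir="$1"
    days="${2:-7}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        find "$dir" -xdev -type f -mtime +"$days" -print
    else
        find "$dir" -xdev -type f -mtime +"$days" -exec rm -f {} +
    fi
}

# Usage on a live VM, run as root:
#   DRY_RUN=1 clean_old_files /tmp 7   # preview
#   DRY_RUN=0 clean_old_files /tmp 7   # delete after reviewing the preview
```

Previewing first follows the same principle as reviewing log files before deletion: files still in use by a running process are less likely to be removed by accident.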
Step 5: Migrate data to another disk or shared storage
- If you have another persistent disk mounted (e.g., /mnt/data), migrate non-critical data from the root disk:
sudo mv /<source-directory> /mnt/data/<destination-folder>/
Note: This method moves data to reclaim OS disk space. Ensure the destination path exists before moving.
Alternatively, you can use a shared storage solution for long-term data storage.
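The checks mentioned in the note can be scripted before the move runs. The following sketch wraps mv in a hypothetical function, safe_move, that verifies the source exists, the destination directory exists, and the destination filesystem has more free space than the source occupies. The name and the checks are assumptions for illustration, not a Crusoe-provided tool.

```shell
#!/bin/sh
# Hypothetical helper: move a file or directory only if the destination
# directory exists and has enough free space for it.
safe_move() {
    src="$1"
    dest="$2"
    [ -e "$src" ] || { echo "source not found: $src" >&2; return 1; }
    [ -d "$dest" ] || { echo "destination directory not found: $dest" >&2; return 1; }
    # Space used by the source and space available at the destination, in KiB.
    need_kb=$(du -sk "$src" | awk '{print $1}')
    free_kb=$(df -Pk "$dest" | awk 'NR == 2 {print $4}')
    if [ "$need_kb" -ge "$free_kb" ]; then
        echo "not enough free space in $dest" >&2
        return 1
    fi
    mv "$src" "$dest"/
}

# Usage on a live VM, run as root (placeholders as in the command above):
#   safe_move /<source-directory> /mnt/data/<destination-folder>
```

When the source and destination are on different filesystems, mv copies the data and then removes the original, so space on the root disk is reclaimed only after the copy completes.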
Common Issues & Resolutions
Issue | Resolution |
---|---|
SSH unavailable | Use serial console to gain shell access |
df -h hangs | Use ls -lh / and du -sh /home/* to manually inspect usage |
/tmp is full | Remove files via serial console: sudo rm -rf /tmp/* |
/var/log is bloated | Manually review and remove the oldest compressed logs, then use journalctl --vacuum-size=100M |
Additional Notes
- Resizing OS disks is currently not supported. Crusoe plans to add this feature in a future release.
- It is best practice to use a separate persistent disk for data storage to avoid filling up the OS disk.
- For large-scale workloads, consider using shared storage for user data.
- Always review files before deletion, especially log files that might be needed for troubleshooting.