XID Errors Observed in Dmesg

Last Updated: Oct 21, 2025

Overview

You are seeing XID errors in the dmesg logs of a VM and want to determine the remediation steps for each one. This guide explains what XID errors indicate, how to interpret them, and the recommended actions to resolve or mitigate them.

Prerequisites

SSH Access to the VM
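Once connected over SSH, the XID events can be pulled out of the kernel log before looking up the matching entry below. The following is a minimal sketch, assuming the log lines follow the "NVRM: Xid (PCI:...): <code>," layout shown in this article; the function name xid_summary is hypothetical.

```shell
# Print each distinct "PCI-address XID-code" pair found in kernel log text on stdin.
# Assumes lines shaped like: NVRM: Xid (PCI:0003:00:04): 13, pid=...
xid_summary() {
  sed -n 's/.*NVRM: Xid (PCI:\([^)]*\)): \([0-9][0-9]*\),.*/\1 \2/p' | sort -u
}

# Typical use on the VM (reading dmesg may require root):
#   sudo dmesg -T | xid_summary
```

Each output line pairs a PCI address with an XID code, which shows at a glance which GPUs are affected and which entries of this guide apply.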

XID Errors and Solution

Note: Please refer to this article to understand the repercussions of VM reset/reboot versus STOP/START.

Each entry below lists the XID error and its message as it appears in dmesg, followed by the solution steps.

XID 13

- dmesg logs Error Message:

NVRM: Xid (PCI:0003:00:04): 13, pid='<unknown>', name=<unknown>, Graphics SM Warp Exception on (GPC 0, TPC 7, SM 1): Out Of Range Address

1. STOP and START the VM to see whether the issue is resolved.
2. Debug the application using cuda-gdb or the Compute Sanitizer memcheck tool.
3. Run the application with CUDA_DEVICE_WAITS_ON_EXCEPTION=1, then attach to it with cuda-gdb.
4. Run the application again. If you still see XID 13 errors, generate an NVIDIA bug report.
5. Reach out to Crusoe Support and provide the bug report.
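The debugging steps for XID 13 might look like the sketch below. The wrapper names, the SANITIZER override variable, and the ./my_app placeholder are hypothetical and exist only so the commands can be dry-run; compute-sanitizer, cuda-gdb, and CUDA_DEVICE_WAITS_ON_EXCEPTION are the tools named in the steps.

```shell
# Run an application under the Compute Sanitizer memcheck tool.
# SANITIZER is a hypothetical override so the wrapper can be tried without a GPU.
run_memcheck() {
  "${SANITIZER:-compute-sanitizer}" --tool memcheck "$@"
}

# Launch an application so that a GPU exception makes it wait for a debugger
# instead of exiting.
run_wait_on_exception() {
  CUDA_DEVICE_WAITS_ON_EXCEPTION=1 "$@"
}

# Example session (./my_app is a placeholder for your application):
#   run_memcheck ./my_app
#   run_wait_on_exception ./my_app &
#   cuda-gdb            # then, at the (cuda-gdb) prompt: attach <pid>
```

Because XID 13 with "Out Of Range Address" usually points at the application's own memory accesses, memcheck output is often the quickest way to confirm or rule out an application bug before escalating.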

XID 48

- dmesg logs Error Message:

NVRM: Xid (PCI:0003:00:03): 48, pid='<unknown>', name=<unknown>, An uncorrectable double bit error (DBE) has been detected on GPU in the L2 cache at cache 0, slice 2.

According to NVIDIA's documentation: "This event is logged when the GPU detects that an uncorrectable error occurs on the GPU. This is also reported back to the user application. A GPU reset or node reboot is needed to clear this error."

1. Perform a GPU reset using the following command: # nvidia-smi -r
2. If the issue persists, perform a VM reset using the following command: # crusoe compute vms reset <vm-name>
3. If the issue persists, STOP and START the VM.
4. If the issue persists after following the above steps, generate an NVIDIA bug report.
5. Reach out to Crusoe Support and provide the bug report.
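To judge whether "the issue persists" after each step, one option is to compare how many times the XID has been logged before and after the action. This is a rough sketch under the assumption that the kernel log has not rotated between the two checks; xid_count is a hypothetical helper, not part of any NVIDIA or Crusoe tooling.

```shell
# Count occurrences of one XID code in kernel log text on stdin.
xid_count() {
  grep -Ec "NVRM: Xid \(PCI:[^)]*\): $1," || true
}

# Illustrative use around a GPU reset (commands taken from the steps above):
#   before=$(sudo dmesg | xid_count 48)
#   sudo nvidia-smi -r
#   ... re-run the workload ...
#   after=$(sudo dmesg | xid_count 48)
#   [ "$after" -gt "$before" ] && echo "XID 48 logged again; try the next step"
```

The same check applies to any XID code by changing the argument.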

XID 79

- dmesg logs Error Message:

NVRM: Xid (PCI:0000:14:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.

1. STOP and START the VM to see whether the issue is resolved.
2. Generate an NVIDIA bug report.
3. Reach out to Crusoe Support and provide the bug report.
4. If you have spare host availability, proceed to STOP and START the VM.
5. If you do not have any spare capacity, shut down the instance for maintenance and let us know in the support ticket.
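Several entries in this guide ask for an NVIDIA bug report. The NVIDIA driver ships a script, nvidia-bug-report.sh, that collects one when run as root. The sketch below pairs it with a hypothetical helper, save_xid_lines, that stores the raw XID lines in a text file so they can be pasted into the support ticket; the default file name is an assumption.

```shell
# Save every XID line from kernel log text on stdin to a file and print
# how many lines were saved. The helper and file name are illustrative only.
save_xid_lines() {
  out_file="${1:-xid-lines.txt}"
  grep -F 'NVRM: Xid' > "$out_file" || true
  wc -l < "$out_file" | tr -d ' '
}

# On the VM:
#   sudo nvidia-bug-report.sh        # writes nvidia-bug-report.log.gz in the current directory
#   sudo dmesg -T | save_xid_lines   # writes xid-lines.txt next to it
```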

XID 95

- dmesg logs Error Message:

NVRM: Xid (PCI:0002:00:04): 95, pid='<unknown>', name=<unknown>, Uncontained: FBHUB. RST: Yes, D-RST: No

1. Perform a GPU reset using the following command: # nvidia-smi -r
2. If the issue persists, perform a VM reset using the following command: # crusoe compute vms reset <vm-name>
3. If the issue persists, STOP and START the VM.
4. If the issue persists after following the above steps, generate an NVIDIA bug report.
5. Reach out to Crusoe Support and provide the bug report.

XID 119

- dmesg logs Error Message:

NVRM: Xid (PCI:0003:00:04): 119, pid=2009566, name=nvidia-smi, Timeout after 6s of waiting for RPC response from GPU7 GSP! Expected function 76 (GSP_RM_CONTROL) (0x2080014b 0x5).

1. Generate an NVIDIA bug report.
2. Create the file /etc/modprobe.d/nvidia.conf.
3. Add the line options nvidia NVreg_EnableGpuFirmware=0 to the file.
4. Update the initramfs for all installed kernels: update-initramfs -u -k all
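The configuration change for XID 119 can be sketched as a small shell function. The function name and its optional path argument are hypothetical; the argument is there so the change can be tried on a scratch file first. The function appends the option only if it is not already present, so an existing nvidia.conf is not overwritten.

```shell
# Add the option that disables GPU (GSP) firmware to a modprobe config file.
# Defaults to the path named in the steps above; idempotent on repeated runs.
write_gsp_disable_conf() {
  conf_path="${1:-/etc/modprobe.d/nvidia.conf}"
  line='options nvidia NVreg_EnableGpuFirmware=0'
  grep -qxF "$line" "$conf_path" 2>/dev/null || printf '%s\n' "$line" >> "$conf_path"
}

# On the VM, as root:
#   write_gsp_disable_conf
#   update-initramfs -u -k all
```

Module options are read when the nvidia kernel module loads, so the setting takes effect the next time the module is loaded.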

