Introduction
Peer-to-Peer (P2P) communication in multi-GPU systems is crucial for achieving optimal performance in high-performance computing and AI workloads. By testing P2P bandwidth and latency, you can evaluate the efficiency of data transfer between GPUs, ensuring your system is configured correctly for demanding applications. This guide walks you through performing a P2P bandwidth and latency test using NVIDIA's tools.
Prerequisites
Before starting, ensure the following:
- Compatible GPUs: Verify that your system has multiple NVIDIA GPUs supporting P2P communication.
- CUDA Toolkit Installed: Download and install the CUDA Toolkit version compatible with your system from NVIDIA's CUDA Toolkit page.
- Linux Environment: A Linux-based operating system with administrative privileges.
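Before proceeding, a quick sanity check can confirm the tools this guide relies on are available. This is a minimal sketch; adjust the tool list to your setup:

```shell
# Check that the tools used in this guide are on PATH.
for tool in nvidia-smi nvcc git make; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found: $tool"
    else
        echo "missing: $tool"
    fi
done
```

Any `missing:` line points to a prerequisite you still need to install.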
Step-by-Step Instructions
Step 1: Install Prerequisites
Install CUDA Toolkit (if not already installed):
Download and install the appropriate version of the CUDA Toolkit for your system from NVIDIA's CUDA Toolkit page.
Install Required Dependencies:
Run the following command to install necessary libraries:
sudo apt-get update
sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev libglfw3-dev libgles2-mesa-dev
Step 2: Clone the CUDA Samples Repository
Clone the Repository:
git clone https://github.com/NVIDIA/cuda-samples.git
Navigate to the Repository Directory:
cd cuda-samples
Checkout the Correct Tag for Your CUDA Version:
Replace v11.1 with your actual CUDA version (e.g., v12.5 for CUDA 12.5):
git checkout tags/v11.1
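If you are unsure which toolkit version you have installed, you can check before choosing a tag (assuming nvcc was installed in the usual location and is on your PATH):

```shell
# Report the installed CUDA toolkit version, or a hint if nvcc is absent.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version | grep -i "release"
else
    echo "nvcc not found on PATH; install the CUDA Toolkit first"
fi
```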
Step 3: Locate and Modify the Makefile
Navigate to the p2pBandwidthLatencyTest Directory:
cd Samples/p2pBandwidthLatencyTest
Note: on tags v11.6 and newer, the samples are grouped into numbered category folders; in that case the test lives under Samples/5_Domain_Specific/p2pBandwidthLatencyTest instead.
Open the Makefile in a Text Editor:
nano Makefile
Modify the SMS Section:
Locate the section that defines SMS and update it to match your GPUs' compute capabilities. For example, L40 and L40S GPUs are compute capability 8.9, so include 89 (sm_89 support requires CUDA 11.8 or newer):
# Gencode arguments
ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64))
SMS ?= 70 75 80 86 89
else
SMS ?= 70 75 80 86 89
endif
Because SMS is assigned with ?=, you can also override it at build time (make SMS="89") instead of editing the Makefile.
Save and exit (in nano, press Ctrl+X, then Y, then Enter).
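If you are unsure of your GPUs' compute capability, reasonably recent drivers let nvidia-smi report it directly via the compute_cap query field (older drivers may not support this field):

```shell
# Print each GPU's name and compute capability, e.g. "8.9" for an L40S.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
else
    echo "nvidia-smi not found; install the NVIDIA driver first"
fi
```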
Step 4: Build the p2pBandwidthLatencyTest
Build the Test:
make
If everything is set up correctly, this will compile the P2P test application without errors.
Step 5: Run the P2P Bandwidth and Latency Test
Execute the Compiled Binary:
./p2pBandwidthLatencyTest
Expected Output:
The output will display:
- GPU connectivity matrix.
- Bandwidth and latency results for all pairs of GPUs, with and without P2P enabled.
Example Output:
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA L40S, pciBusID: 0, pciDeviceID: 1, pciDomainID:2
Device: 1, NVIDIA L40S, pciBusID: 0, pciDeviceID: 2, pciDomainID:2
...
Device: 9, NVIDIA L40S, pciBusID: 0, pciDeviceID: 5, pciDomainID:3
Device=0 CAN Access Peer Device=1
Device=0 CAN Access Peer Device=2
...
Device=9 CAN Access Peer Device=8
Note:
If a device doesn't have P2P access to another, transfers between that pair fall back to normal memory copies staged through the host, resulting in lower bandwidth (GB/s) and higher, less consistent latency (µs).
Optional: Check GPU Topology
To gain additional insights into your system's GPU topology (e.g., PCIe connections between GPUs), run:
nvidia-smi topo -m
This will show how GPUs are connected in your system, helping you interpret P2P performance results. In the matrix, NV# indicates an NVLink connection (with # links), PIX and PXB indicate paths through PCIe bridges, PHB a path through the PCIe host bridge, and SYS a path crossing the inter-socket interconnect (typically the slowest).
Examples
Example 1: Testing P2P Between Two GPUs
Run the binary on a system with two GPUs:
./p2pBandwidthLatencyTest
Expected output:
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0
Example 2: Verifying Results Across Multiple GPUs
For a system with 4 GPUs, output may look like:
Device=0 CAN Access Peer Device=1
Device=0 CAN Access Peer Device=2
Device=0 CAN Access Peer Device=3
...
Device=3 CAN Access Peer Device=2
Common Issues and Resolutions
Issue 1: Compilation Errors
Error: nvcc: command not found
Resolution: Ensure that the CUDA Toolkit is installed and that the nvcc compiler is on your PATH. Add the following to your .bashrc:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Then apply the change with source ~/.bashrc.
Issue 2: GPUs Not Detected
Error: No NVIDIA GPUs found
Resolution: Verify that the NVIDIA driver is installed correctly:
nvidia-smi
Reinstall the driver if necessary. If the issue persists, contact support.
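As an additional diagnostic on Linux, you can check whether the NVIDIA kernel module is actually loaded (this only identifies the problem, it is not a fix):

```shell
# Look for the nvidia kernel module; print a note if it is absent.
if command -v lsmod >/dev/null 2>&1 && lsmod | grep -q "^nvidia"; then
    echo "nvidia kernel module is loaded"
else
    echo "nvidia kernel module not loaded; reinstall or reload the driver"
fi
```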
Issue 3: P2P Not Enabled
Error: Device=X cannot access Peer Device=Y
Resolution: Ensure that:
- GPUs are connected via a link that supports P2P (e.g., NVLink or a common PCIe switch).
- The system BIOS/firmware is configured to allow P2P (on some platforms this means disabling PCIe ACS or enabling Above 4G Decoding).
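You can also ask the driver directly which GPU pairs it considers P2P-capable. Recent nvidia-smi versions support a topo -p2p query (shown here for read capability; older drivers may not accept this option):

```shell
# Show the driver's view of P2P read capability between GPU pairs.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi topo -p2p r
else
    echo "nvidia-smi not found; install the NVIDIA driver first"
fi
```

Pairs marked as not capable here will also fail the peer-access check in the test binary.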
Key Notes
- Ensure that all GPUs are visible to your system by running:
nvidia-smi
- If any issues arise during compilation or execution, double-check that your CUDA version matches the repository tag you checked out.
You have now successfully performed a P2P bandwidth and latency test on your multi-GPU system!