Introduction
Peer-to-Peer (P2P) communication in multi-GPU systems is crucial for achieving optimal performance in high-performance computing and AI workloads. By testing P2P bandwidth and latency, you can evaluate the efficiency of data transfer between GPUs, ensuring your system is configured correctly for demanding applications. This guide walks you through performing a P2P bandwidth and latency test using NVIDIA's tools.
Prerequisites
Before starting, ensure the following:
- Compatible GPUs: Verify that your system has multiple NVIDIA GPUs supporting P2P communication.
- CUDA Toolkit Installed: Download and install the CUDA Toolkit version compatible with your system from NVIDIA's CUDA Toolkit page.
- Linux Environment: A Linux-based operating system with administrative privileges.
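Before proceeding, a quick sanity check can confirm the tools this guide relies on are available. This is a minimal sketch; adjust the tool list to your setup:

```shell
# Check that the tools used in this guide are on PATH.
for tool in nvidia-smi nvcc git make; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found: $tool"
    else
        echo "missing: $tool"
    fi
done
```

Any `missing:` line points to a prerequisite you still need to install.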
Step-by-Step Instructions
Step 1: Install Prerequisites
Install CUDA Toolkit (if not already installed):
Download and install the appropriate version of the CUDA Toolkit for your system from NVIDIA's CUDA Toolkit page.
Install Required Dependencies:
Run the following command to install necessary libraries:
sudo apt-get update
sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev libglfw3-dev libgles2-mesa-dev
Step 2: Clone the CUDA Samples Repository
Clone the Repository:
git clone https://github.com/NVIDIA/cuda-samples.git
Navigate to the Repository Directory:
cd cuda-samples
Checkout the Correct Tag for Your CUDA Version:
Replace v11.1 with your actual CUDA version (e.g., v12.5 for CUDA 12.5):
git checkout tags/v11.1
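If you are unsure which toolkit version you have installed, you can check before choosing a tag (assuming nvcc was installed in the usual location and is on your PATH):

```shell
# Report the installed CUDA toolkit version, or a hint if nvcc is absent.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version | grep -i "release"
else
    echo "nvcc not found on PATH; install the CUDA Toolkit first"
fi
```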
Step 3: Locate and Modify the Makefile
Navigate to the p2pBandwidthLatencyTest Directory:
cd Samples/p2pBandwidthLatencyTest
Note: on tags v11.6 and newer, the samples are grouped into numbered category folders; in that case the test lives under Samples/5_Domain_Specific/p2pBandwidthLatencyTest instead.
Open the Makefile in a Text Editor:
nano Makefile
Modify the SMS Section:
Locate the section that defines SMS and update it to match your GPUs' compute capabilities. For example, L40 and L40S GPUs are compute capability 8.9, so include 89 (sm_89 support requires CUDA 11.8 or newer):
# Gencode arguments
ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64))
SMS ?= 70 75 80 86 89
else
SMS ?= 70 75 80 86 89
endif
Because SMS is assigned with ?=, you can also override it at build time (make SMS="89") instead of editing the Makefile.
Save and exit (in nano, press Ctrl+X, then Y, then Enter).
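If you are unsure of your GPUs' compute capability, reasonably recent drivers let nvidia-smi report it directly via the compute_cap query field (older drivers may not support this field):

```shell
# Print each GPU's name and compute capability, e.g. "8.9" for an L40S.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
else
    echo "nvidia-smi not found; install the NVIDIA driver first"
fi
```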
Step 4: Build the p2pBandwidthLatencyTest
Build the Test:
make
If everything is set up correctly, this will compile the P2P test application without errors.
Step 5: Run the P2P Bandwidth and Latency Test
Execute the Compiled Binary:
./p2pBandwidthLatencyTest
Expected Output:
The output will display:
- GPU connectivity matrix.
- Bandwidth and latency results for all pairs of GPUs, with and without P2P enabled.
Example Output:
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA L40S, pciBusID: 0, pciDeviceID: 1, pciDomainID:2
Device: 1, NVIDIA L40S, pciBusID: 0, pciDeviceID: 2, pciDomainID:2
...
Device: 9, NVIDIA L40S, pciBusID: 0, pciDeviceID: 5, pciDomainID:3
Device=0 CAN Access Peer Device=1
Device=0 CAN Access Peer Device=2
...
Device=9 CAN Access Peer Device=8
Note:
If a device doesn't have P2P access to another, transfers between that pair fall back to normal memory copies staged through the host, resulting in lower bandwidth (GB/s) and higher, less consistent latency (µs).
Optional: Check GPU Topology
To gain additional insights into your system's GPU topology (e.g., PCIe connections between GPUs), run:
nvidia-smi topo -m
This will show how GPUs are connected in your system, helping you interpret P2P performance results. In the matrix, NV# indicates an NVLink connection (with # links), PIX and PXB indicate paths through PCIe bridges, PHB a path through the PCIe host bridge, and SYS a path crossing the inter-socket interconnect (typically the slowest).
Examples
Example 1: Testing P2P Between Two GPUs
Run the binary on a system with two GPUs:
./p2pBandwidthLatencyTest
Expected output:
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0
Example 2: Verifying Results Across Multiple GPUs
For a system with 4 GPUs, output may look like:
Device=0 CAN Access Peer Device=1
Device=0 CAN Access Peer Device=2
Device=0 CAN Access Peer Device=3
...
Device=3 CAN Access Peer Device=2
Common Issues and Resolutions
Issue 1: Compilation Errors
Error: nvcc: command not found
Resolution: Ensure that the CUDA Toolkit is installed and that the nvcc compiler is on your PATH. Add the following to your .bashrc:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Then apply the change with source ~/.bashrc.
Issue 2: GPUs Not Detected
Error: No NVIDIA GPUs found
Resolution: Verify that the NVIDIA driver is installed correctly:
nvidia-smi
Reinstall the driver if necessary. If the issue persists, contact support.
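As an additional diagnostic on Linux, you can check whether the NVIDIA kernel module is actually loaded (this only identifies the problem, it is not a fix):

```shell
# Look for the nvidia kernel module; print a note if it is absent.
if command -v lsmod >/dev/null 2>&1 && lsmod | grep -q "^nvidia"; then
    echo "nvidia kernel module is loaded"
else
    echo "nvidia kernel module not loaded; reinstall or reload the driver"
fi
```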
Issue 3: P2P Not Enabled
Error: Device=X cannot access Peer Device=Y
Resolution: Ensure that:
- GPUs are connected via a link that supports P2P (e.g., NVLink or a common PCIe switch).
- The system BIOS/firmware is configured to allow P2P (on some platforms this means disabling PCIe ACS or enabling Above 4G Decoding).
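You can also ask the driver directly which GPU pairs it considers P2P-capable. Recent nvidia-smi versions support a topo -p2p query (shown here for read capability; older drivers may not accept this option):

```shell
# Show the driver's view of P2P read capability between GPU pairs.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi topo -p2p r
else
    echo "nvidia-smi not found; install the NVIDIA driver first"
fi
```

Pairs marked as not capable here will also fail the peer-access check in the test binary.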
Key Notes
- Ensure that all GPUs are visible to your system by running:
nvidia-smi
- If any issues arise during compilation or execution, double-check that your CUDA version matches the repository tag you checked out.
You have now successfully performed a P2P bandwidth and latency test on your multi-GPU system!