What is Nested Virtualization?
Nested virtualization permits customers to run their own hypervisor within one of Crusoe's VMs, then boot a VM using that hypervisor. This nested VM has the same functionality as a normal Crusoe VM and is granted access to all external devices via VFIO passthrough.
Cloud Hypervisor uses the virtio-iommu specification to attach/probe devices and map/unmap memory between the L2 VMs and the L0 host. virtio-iommu is a paravirtualized device that accepts map requests from the virtualized kernel sitting on top of the hypervisor (the L1 VM in this case) and issues the corresponding map request to the bare-metal device. The internal details of the virtio-iommu implementation are not required to understand nested virtualization; the main concept users need to understand is that virtio-iommu needs to be enabled when booting the VM.
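Once an L1 VM boots with the feature enabled, you can confirm the virtio-iommu driver was probed (the exact log strings vary by kernel version, so treat this as a rough check):
sudo dmesg | grep -i iommu   # look for virtio_iommu probe messages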
Constraints
Supported Hypervisors - Cloud Hypervisor is the only supported L2 hypervisor. Hyper-V and QEMU are not supported.
Full Passthrough - MIG and SR-IOV are not supported in the L2 VM; Cloud Hypervisor currently has no SR-IOV support for nested VMs. Additionally, users must make sure the VFIO driver is bound to any devices they are attempting to pass through to their L2 VM.
Enable on Boot - Nested virtualization must be explicitly enabled when the VM is created (see VM Configuration below).
Internet Connectivity - It is up to the user to provide internet connectivity to their nested VMs (see section further below).
Hugepages - For performance reasons, it is recommended that customers enable hugepages when booting their nested VMs (see section further below).
VM Configuration
To enable nested virtualization in the Crusoe Cloud environment, the nested virtualization flag must be enabled when creating the VM via the CLI:
crusoe compute vms create --type l40s-48gb.10x ... \
--image ubuntu22.04:latest \
--virtualization-features nested_virtualization=true
Currently, this is only supported in CLI version 0.27.0 and later. You will also need a feature flag enabled to access this feature, which you can request by contacting support@crusoecloud.com.
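After the VM boots, you can confirm that hardware virtualization is exposed to the guest before setting anything else up; /dev/kvm should be present once the KVM modules are loaded:
ls -l /dev/kvm                       # character device present when KVM is usable
grep -Ec '(vmx|svm)' /proc/cpuinfo   # non-zero when the vCPUs expose VT-x/AMD-V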
L1 VM Configuration
You will need to do a couple of things to set up your L1 VM to launch an L2 VM.
Hugepages
It is recommended that you enable hugepages on the L1 VM before booting your L2 VM; failing to do so may cause very slow L2 VM boot times.
To enable hugepages:
1. Create the L1 VM from the CLI command above.
2. Find the NVIDIA vendor and device IDs needed for the GRUB change by running lspci -nn | grep -i nvidia.
3. Modify the default GRUB command line to enable 1G hugepages, inserting the IDs into the vfio-pci.ids parameter:
cat /etc/default/grub
GRUB_CMDLINE_LINUX="crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M rd.driver.blacklist=nouveau ipv6.disable=1 iommu=force intel_iommu=on pcie_acs_override=downstream,multifunction vfio-pci.ids=<vendor_id>:<device_id>,<vendor_id>:<device_id>,... default_hugepagesz=1073741824B hugepagesz=1073741824B hugepages=960 video=efifb:off systemd.unified_cgroup_hierarchy=1 nvme_core.multipath=Y trace_event=iommu"
An updated GRUB_CMDLINE_LINUX would look similar to the following:
GRUB_CMDLINE_LINUX="crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M rd.driver.blacklist=nouveau ipv6.disable=1 iommu=force intel_iommu=on pcie_acs_override=downstream,multifunction vfio-pci.ids=10de:26b9 default_hugepagesz=1073741824B hugepagesz=1073741824B hugepages=960 video=efifb:off systemd.unified_cgroup_hierarchy=1 nvme_core.multipath=Y trace_event=iommu"
4. Run update-grub:
sudo update-grub
5. Reboot the L1 VM:
sudo reboot now
6. Update your hugepages system settings if needed (a runtime sketch follows this list).
7. Verify hugepages have been enabled by running grep Huge /proc/meminfo:
HugePages_Total: 960
HugePages_Free: 960
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 268435456 kB
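If you need to adjust the hugepage count at runtime without rebooting (step 6), you can write to the per-size sysfs knob; a sketch, assuming the 1G hugepage size configured above:
echo 960 | sudo tee /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages   # may fall short if memory is fragmented; boot-time allocation is more reliable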
VFIO Passthrough
When the nested virtualization feature is enabled, all PCI devices on the machine are placed behind a virtio-iommu. This configuration is suitable for running KVM guests with VFIO passthrough devices.
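You can confirm the IOMMU is active in the L1 VM by checking that devices have been assigned IOMMU groups:
ls /sys/kernel/iommu_groups/   # one numbered directory per group when the IOMMU is active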
Limitations
The virtio-iommu is only supported on Linux kernel 5.14+, which is included in our ubuntu22.04 and newer image families. It is not supported on ubuntu20.04.
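You can check the running kernel on your L1 VM before proceeding:
uname -r   # should be 5.14 or newer, e.g. 5.15.x on ubuntu22.04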
Interrupt remapping
The virtio-iommu does not support interrupt remapping. In order to use VFIO passthrough devices in a KVM guest, the following driver parameter must be set. Note that a plain redirection such as sudo echo 1 > /sys/... will not work, because the shell performs the redirection as your non-root user; either run the command from a root shell or use sudo tee as shown below.
echo 1 | sudo tee /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts
Verify allow_unsafe_interrupts is Y:
cat /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts
Y
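This parameter does not persist across reboots. To make it permanent, you can set it in a modprobe configuration file (the file name here is just an example):
echo 'options vfio_iommu_type1 allow_unsafe_interrupts=1' | sudo tee /etc/modprobe.d/vfio.conf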
Device passthrough
The default device driver is bound to each device on L1 VM boot (e.g. the nvidia driver is bound to all L40S GPUs). In order to pass a device to the L2 VM, the default driver must be unbound and the VFIO driver bound instead. You can list the devices by running lspci.
From the output of lspci -nn, note down each NVIDIA device's PCI address (e.g. 0002:00:01.0) as well as its vendor and device IDs (e.g. 10de:26b9).
Once you've noted down each device's address and IDs, you can run the script below for each device.
BDF=0002:00:01.0   # PCI address of the device to pass through
VENDOR=10de        # PCI vendor ID (NVIDIA)
DEVICE=26b9        # PCI device ID
# Unbind the device from its current driver
echo $BDF | sudo tee /sys/bus/pci/devices/$BDF/driver/unbind
# Register the vendor/device pair with vfio-pci, then bind the device to it
echo "$VENDOR $DEVICE" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id
echo $BDF | sudo tee /sys/bus/pci/drivers/vfio-pci/bind
Note: The number of devices will vary depending on the instance SKU you are using.
Confirm the kernel driver in use is vfio-pci:
lspci -k -s $BDF
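Because the number of GPUs varies by SKU, you may prefer to script the rebinding. The following is a sketch, not official tooling; it assumes every NVIDIA PCI function on the instance should be handed to vfio-pci:
# Rebind every NVIDIA PCI function to vfio-pci
for BDF in $(lspci -Dnn | grep -i nvidia | awk '{print $1}'); do
  IDS=$(lspci -ns "$BDF" | awk '{print $3}')   # e.g. 10de:26b9
  echo "$BDF" | sudo tee /sys/bus/pci/devices/$BDF/driver/unbind
  echo "${IDS/:/ }" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id   # errors on repeat runs are harmless
  echo "$BDF" | sudo tee /sys/bus/pci/drivers/vfio-pci/bind
done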
L2 Hypervisor Configuration
When booting your L2 VM using your L2 hypervisor, you should enable hugepages on VM boot. This is different from allocating hugepages within the L2 VM itself: it ensures the L2 VM's memory is backed by the hugepages you reserved on the L1 VM.
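With Cloud Hypervisor, hugepage backing is requested on the --memory flag (parameter names per recent Cloud Hypervisor releases), matching the 1G hugepage size configured earlier; the boot command in step 5 below uses this form:
--memory size=4G,hugepages=on,hugepage_size=1G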
1. Download Cloud Hypervisor and build it
sudo apt install -y git build-essential libseccomp-dev pkg-config libclang-dev libssl-dev libzstd-dev rustc cargo
git clone https://github.com/cloud-hypervisor/cloud-hypervisor.git
cd cloud-hypervisor
cargo build --release
sudo cp target/release/cloud-hypervisor /usr/local/bin/
Note: The apt install command above may ask you to restart services; press OK to proceed.
If cargo build --release fails, this may be due to the version of cargo bundled with ubuntu22.04. In that case, install the latest stable Rust toolchain and rebuild:
curl https://sh.rustup.rs -sSf | sh # press 1 when prompted
source $HOME/.cargo/env
rustup update stable # upgrade to the latest stable Rust
cd ~/cloud-hypervisor
cargo build --release
2. Download a minimal Linux Kernel
sudo mkdir -p /var/lib/cloud-hypervisor
sudo wget https://github.com/cloud-hypervisor/linux/releases/download/ch-release-v6.2-20240908/vmlinux -O /var/lib/cloud-hypervisor/vmlinux
3. Download and Convert the Ubuntu Cloud Image
wget https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64.img -O ubuntu.img
qemu-img convert -O raw ubuntu.img ubuntu.raw.img
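qemu-img is provided by the qemu-utils package on Ubuntu; install it if the command is missing, and you can sanity-check the conversion afterwards:
sudo apt install -y qemu-utils
qemu-img info ubuntu.raw.img   # should report 'file format: raw'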
4. Map the Image's Root Partition
sudo kpartx -av ubuntu.raw.img
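kpartx (from the kpartx package) exposes the image's partitions as /dev/mapper entries, which is useful if you want to mount and modify the root filesystem before booting; remove the mappings when you are done:
ls /dev/mapper/                 # e.g. loop0p1 is the image's root partition
sudo kpartx -d ubuntu.raw.img   # remove the mappings once finished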
5. Boot an L2 VM with GPU Passthrough
sudo cloud-hypervisor \
--kernel /var/lib/cloud-hypervisor/vmlinux \
--cpus boot=2 \
--memory size=4G,hugepages=on,hugepage_size=1G \
--disk "path=/home/ubuntu/ubuntu.raw.img" \
--device path=/sys/bus/pci/devices/0002:00:01.0,pci_segment=0 \
--serial tty \
--cmdline "console=ttyS0 root=/dev/vda1 rw" \
--log-file log.txt
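As noted in the constraints, providing internet connectivity to the L2 VM is up to you. One common approach is a tap device on the L1 VM NATed out of its primary interface; in this sketch, tap0, the subnet, the uplink name ens1, and the MAC address are all examples to adapt:
sudo ip tuntap add dev tap0 mode tap
sudo ip addr add 192.168.249.1/24 dev tap0
sudo ip link set tap0 up
sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A POSTROUTING -o ens1 -j MASQUERADE
# then add to the cloud-hypervisor command line:
#   --net tap=tap0,mac=2e:89:a8:00:00:01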
Next Steps
1. Add cloud-init ISO for SSH key injection
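A sketch of this step using cloud-localds from the cloud-image-utils package; the user-data contents and the SSH key are placeholders:
sudo apt install -y cloud-image-utils
cat > user-data <<'EOF'
#cloud-config
ssh_authorized_keys:
  - ssh-ed25519 AAAA...your-public-key...
EOF
cloud-localds seed.img user-data
# attach as an extra disk: --disk path=seed.img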