Proxmox VE Helpers

This repository is a set of scripts to better handle several Proxmox VE functions:

  • allow setting up and using cset
  • allow CPU pinning, with mapping to single cores and a separate mapping for the emulator process
  • allow setting the scheduler for KVM threads/processes
  • allow setting the affinity mask for VFIO devices
  • allow setting the CPU governor

Why do CPU pinning?

  • Usually it is not needed, as long as you don't use SMT and you don't need low latency in your VMs (games, audio interfaces)
  • If you use SMT, not all host threads are equal; CPU pinning ensures that your VMs receive real threads

Installation

  • Clone and compile the repository (an illustrative sequence is sketched after this list)
  • Set up the cset shield at boot:
# crontab -e, add the following:
# m h  dom mon dow   command
@reboot /usr/local/bin/shield-set.sh
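
For reference, cloning and installing might look like the sketch below. This assumes the repository ships a Makefile with an install target and that shield-set.sh creates a cset shield; the core list is only an example, adjust it to your CPU layout.

# Illustrative install sequence (paths and core list are examples)
git clone https://github.com/mattiarossi/pve-helpers.git
cd pve-helpers
make install   # assumption: the repository provides an install target

# shield-set.sh is expected to build a cset shield along the lines of:
# cset shield --cpu 0-20,24-44 --kthread=on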

Usage

1. Enable snippet

You need to configure each machine to enable the hookscript.

The snippet is installed by default in /var/lib/vz, which Proxmox exposes as the local storage.

qm set 204 --hookscript=local:snippets/exec-cmds

2. Configure VM

Edit the VM description and add a new line for each of these configuration options.

2.1. cpu_taskset

This configuration option will assign a core to each of the VM's virtual CPUs. Which cores work best will depend on your CPU's physical layout. The configured CPUs must be in the shielded part of the system (in my case cores 0-20 and 24-44, as I am reserving 6 cores of a 48-core EPYC CPU for the OS and the main KVM host threads).

#cpu_taskset 6,7,8,9,10,11,12,13,14,15,16,17

If you have two VMs running concurrently, you can pin the first to one set of cores and the second to another, like this:

VM 1:
#cpu_taskset 6,7,8,9,10,11,12,13,14,15,16,17

VM 2:
#cpu_taskset 0,1,2,3,4,5,18,19
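
Under the hood this amounts to pinning each vCPU thread of the QEMU process with taskset. The sketch below is only an illustration of the idea (the VMID and core list are examples), not the exact hook logic:

# Illustrative: pin each vCPU thread of VM 204 to one core from the list
VMID=204
CORES=(6 7 8 9 10 11 12 13 14 15 16 17)
PID=$(cat /var/run/qemu-server/${VMID}.pid)
i=0
for TID in $(ls /proc/${PID}/task); do
  # QEMU names its vCPU threads "CPU <n>/KVM"
  if grep -q '^CPU .*/KVM' /proc/${PID}/task/${TID}/comm; then
    taskset -pc "${CORES[$i]}" "$TID"
    i=$((i+1))
  fi
done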

2.2. assign_interrupts

assign_interrupts [cpu cores] [interrupt name] [interrupt name...]

This setting aims to simplify assigning interrupts to the correct CPU cores in order to get the best performance when passing through a GPU, USB controller, or audio controller. The goal is to have the same cores that are assigned to the VM with cpu_taskset also be responsible for the interrupts generated by the devices that are fully passed through to it. This is very important for achieving the lowest possible latency and eliminating random latency spikes inside the VM. Ideally, you would also use something like irqbalance to move all other interrupts away from the VM-assigned CPU cores and onto your other hypervisor-reserved cores. The same CPU mask can be used with irqbalance to ban the VM's CPU cores from receiving any other interrupts.

Usage example:

cat /etc/pve/qemu-server/110.conf
##CPU pinning
#cpu_taskset 4,12,5,13,6,14,7,15,2,10,3,11 
##Assigning vfio interrupts to VM cores
#assign_interrupts 4,12,5,13,6,14,7,15,2,10,3,11 vfio
##Assigning interrupts of a specific vfio device to VM cores (preferred)
#assign_interrupts 4,12,5,13,6,14,7,15,2,10,3,11 0000:01:00
...

In this particular use case, all interrupts with vfio in their name are assigned to cores 4,12,5,13,6,14,7,15,2,10,3,11, which in turn correspond to cores 2-7 and their SMT siblings 10-15. In other words, cores 2,3,4,5,6,7 of an 8-core 3700X are assigned to the VM and to all of the interrupts from the GPU, the onboard USB controller, and the onboard audio controller.
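
The effect is roughly the same as writing the core list into the smp_affinity_list of every matching IRQ. A minimal sketch of that idea, using the values from the example above:

# Illustrative: pin every IRQ whose name matches the pattern to the VM cores
CORES="4,12,5,13,6,14,7,15,2,10,3,11"
PATTERN="vfio"   # or a PCI address such as 0000:01:00
grep "$PATTERN" /proc/interrupts | cut -d: -f1 | while read -r IRQ; do
  echo "$CORES" > "/proc/irq/${IRQ}/smp_affinity_list"
done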

2.3. cpu_governor

This configuration option will assign a specific governor to each of the VM's CPUs and revert to the default schedutil once the VM is shut down.

#cpu_governor schedutil
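
Changing the governor comes down to writing into the cpufreq sysfs files. A minimal illustration (applied to all host CPUs here; the hook may restrict it to the VM's cores):

# Illustrative: switch the scaling governor, revert on VM shutdown
GOVERNOR="performance"
for F in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
  echo "$GOVERNOR" > "$F"
done
# on shutdown, write schedutil back into the same files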

2.4. cpu_emulatorpin

This configuration option will pin the KVM emulator process to a specific CPU thread.

#cpu_emulatorpin 24
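
This is essentially a taskset on the QEMU process itself rather than on its vCPU threads. An illustrative sketch (the VMID and CPU number are examples):

# Illustrative: pin the main QEMU (emulator) process of VM 204 to host CPU 24
VMID=204
PID=$(cat /var/run/qemu-server/${VMID}.pid)
taskset -pc 24 "$PID"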

2.5. qm_conflict and qm_depends

Sometimes VMs conflict with each other because they depend on the same resources, like disks or VGA.

There are helper commands to shut down (qm_conflict) or start (qm_depends) other VMs when the main machine is being started.

cat /etc/pve/qemu-server/204.conf

# qm_conflict 204
# qm_depends 207
...

Here, qm_conflict will shut down the VM with VMID 204 before starting the current one, and qm_depends will also start VMID 207, which might be a sibling VM.
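
In effect, the hook issues the corresponding qm commands before the VM boots, along these lines (illustrative):

# Illustrative: what the two directives above translate into
qm shutdown 204   # qm_conflict 204
qm start 207      # qm_depends 207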

I use qm_conflict or qm_depends to run a Linux VM sometimes with VGA passthrough and sometimes as a sibling VM without a graphics card passed through, running in console mode.

Be careful if you use pci_unbind and pci_rebind: they should come after the qm_* commands.

3. Legacy features

These are features that are no longer really needed to achieve good latency in a VM.

3.1. cpu_chrt (not needed anymore)

Running a virtualized environment always results in fairly random latency due to the amount of other work being done. This is also because the Linux hypervisor balances all threads, which has a bad effect on DPC and ISR execution times. Latency in a Windows VM can be measured with https://www.resplendence.com/latencymon. Ideally, we want a latency of < 300us.

To improve the latency you can switch to the FIFO scheduler. This has catastrophic effects on everything else that is not your VM, but that is likely acceptable for gaming / daily use of passthrough VMs.

Configure VM description with:

cpu_chrt fifo 1
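
This corresponds to moving the VM's threads onto the SCHED_FIFO real-time scheduler, roughly as follows (illustrative; the VMID is an example):

# Illustrative: move all threads of VM 204 to SCHED_FIFO with priority 1
VMID=204
PID=$(cat /var/run/qemu-server/${VMID}.pid)
chrt -f -a -p 1 "$PID"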

Note: It seems that if Hyper-V enlightenments are enabled (they are enabled for ostype: win10), this is no longer needed. I now have amazing performance without using cpu_chrt.

4. My setup

Here's a quick rundown of the environment that I currently use with the above quirks.

4.1. Hardware

  • AMD EPYC 7402P 24-Core Processor
  • 128GB DDR4 ECC
  • Intel iGPU used by Proxmox VE
  • AMD RX5700 8GB used by Windows VM
  • Sapphire GPRO e9260 8 GB used by OSX VM
  • 2x Fresco Logic FL1100 USB 3.0 Host Controller
  • Audio from both VMs is output to a Zoom LiveTrak L-12 and a Zoom U-24 through the USB controllers passed directly to them
  • Each VM has its own dedicated USB controller
  • Each VM has a dedicated amount of memory using 1G hugepages
  • The VMs do not use SMT; instead, each is assigned a dedicated CCX (or two) of the EPYC processor

4.2. Kernel config

GRUB_CMDLINE_LINUX_DEFAULT="video=vesafb:off video=efifb:off amd_iommu=on iommu=pt default_hugepagesz=1G hugepagesz=1G hugepages=64 spec_store_bypass_disable=off nopti mitigations=off rd.driver.pre=vfio_pci apparmor=1 security=apparmor udev.log_priority=3 pcie_aspm=off nohz_full=0,20,24-44 rcu_nocbs=0,20,24-44"

4.3. OSX VM

I use OSX for regular daily development work.

My Proxmox VE config looks like this:

## CPU PIN
#cpu_taskset 0-5
#assign_interrupts 0-5 0000:02:00 0000:04:00
#
## Conflict (207 shares disks, 208 shares VGA)
#qm_conflict 207
#qm_conflict 208
agent: 1
args: -audiodev id=alsa,driver=alsa,out.period-length=100000,out.frequency=48000,out.channels=2,out.try-poll=off,out.dev=swapped -soundhw hda
balloon: 0
bios: ovmf
boot: dcn
bootdisk: scsi0
cores: 5
cpu: host
hookscript: local:snippets/exec-cmds
hostpci0: 02:00,romfile=215895.rom,x-vga=1
hostpci1: 04:00
hugepages: 1024
ide2: none,media=cdrom
memory: 32768
name: ubuntu19-vga
net0: virtio=32:13:40:C7:31:4C,bridge=vmbr0
numa: 1
onboot: 1
ostype: l26
scsi0: nvme-thin:vm-206-disk-1,discard=on,iothread=1,size=200G,ssd=1
scsi1: ssd:vm-206-disk-0,discard=on,iothread=1,size=100G,ssd=1
scsi10: ssd:vm-206-disk-1,iothread=1,replicate=0,size=32G,ssd=1
scsihw: virtio-scsi-pci
serial0: socket
sockets: 1
usb0: host=1050:0406
vga: none

4.4. Windows VM

I use Windows for gaming. It has a dedicated RTX 2080 Super.

## CPU PIN
#cpu_taskset 6-11
#assign_interrupts 6-11 0000:01:00
agent: 1
args: -audiodev id=alsa,driver=alsa,out.period-length=100000,out.frequency=48000,out.channels=2,out.try-poll=off,out.dev=swapped -soundhw hda
balloon: 0
bios: ovmf
boot: dc
bootdisk: scsi0
cores: 5
cpu: host
cpuunits: 10000
efidisk0: nvme-thin:vm-204-disk-1,size=4M
hookscript: local:snippets/exec-cmds
hostpci0: 01:00,pcie=1,x-vga=1,romfile=Gigabyte.RTX2080Super.8192.190820.rom
hugepages: 1024
ide2: none,media=cdrom
machine: pc-q35-3.1
memory: 10240
name: win10-vga
net0: e1000=3E:41:0E:4D:3D:14,bridge=vmbr0
numa: 1
onboot: 1
ostype: win10
runningmachine: pc-q35-3.1
scsi0: ssd:vm-204-disk-2,discard=on,iothread=1,size=64G,ssd=1
scsi1: ssd:vm-204-disk-0,backup=0,discard=on,iothread=1,replicate=0,size=921604M
scsi3: nvme-thin:vm-204-disk-0,backup=0,discard=on,iothread=1,replicate=0,size=100G
scsihw: virtio-scsi-pci
sockets: 1
vga: none

4.5. Switching between VMs

To switch between VMs:

  1. Both VMs always run concurrently.
  2. I just change the monitor input.
  3. Audio is output by both VMs by default, so there is no need to switch it.
  4. I use Barrier (previously Synergy) most of the time.
  5. In other cases I have a Logitech multi-device keyboard and mouse, so I switch on the keyboard.
  6. I also have a physical switch that I use to change lighting and monitor inputs.
  7. The monitor has PBP and PIP, so I can watch Windows updating while doing development work on Linux.

Author, License

Kamil Trzciński, 2019-2021, MIT
