This hasn’t happened to me yet but I was just thinking about it. Let’s say you have a server with an iGPU, and you use GPU passthrough to let VMs use the iGPU. And then one day the host’s ssh server breaks, maybe you did something stupid or there was a bad update. Are you fucked? How could you possibly recover, with no display and no SSH? The only thing I can think of is setting up serial access for emergencies like this, but I rarely hear about serial access nowadays so I wonder if there’s some other solution here.
Serial is still a thing.
Get a cheap video card.
Or a usb to vga adapter.
A server class system with BMC.
Live CD. There are options.
Serial is still a thing.
Good to know 👍
Get a cheap video card.
I’d be tempted to just pass it through as well 😅
Live CD.
Doesn’t work if you have encrypted disk (nevermind I was wrong about this)
Or a usb to vga adapter.
A server class system with BMC.
Interesting ideas, I’ll look into them thanks
Doesn’t work if you have encrypted disk
Is this because you are unable to provide the encryption password?
I was wrong, got confused about how secure boot and disk encryption worked 😅
I just have a boot entry that doesn’t do the passthrough, doesn’t bind to vfio-pci and doesn’t start the VMs on boot so I can inspect and troubleshoot.
That sounds brilliant. Have any resources to learn how to do something like this? I’ve never created custom boot entries before
I use systemd-boot so it was pretty easy, and it should be similar in GRUB:
title My boot entry that starts the VM
linux /vmlinuz-linux-zen
initrd /amd-ucode.img
initrd /initramfs-linux-zen.img
options quiet splash root=ZSystem/linux/archlinux rw pcie_aspm=off iommu=on systemd.unit=qemu-vms.target
The part you want is this one:
systemd.unit=qemu-vms.target
which tells systemd which target to boot to. I launch my VMs with scripts, so I have the qemu-vms.target, and it depends on the VMs I want to autostart. A target is a set of services to run for a desired system state; the default is usually graphical or multi-user, but really it can be anything and use whatever set of services you want: start network, don’t start network, mount drives, don’t mount drives, entirely up to you. https://man.archlinux.org/man/systemd.target.5.en
You can also see if there’s a predefined rescue target that fits your need and just goes to a local console: https://man.archlinux.org/man/systemd.special.7.en
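For reference, a custom target like qemu-vms.target can be a tiny unit file, with each VM service hooked into it via WantedBy. This is just a sketch, assuming a made-up win10-vm service and start script; adapt the names to your own setup:

```ini
# /etc/systemd/system/qemu-vms.target
[Unit]
Description=Boot state that autostarts the passthrough VMs
Requires=multi-user.target
After=multi-user.target
AllowIsolate=yes

# /etc/systemd/system/win10-vm.service  (hypothetical example VM)
[Unit]
Description=Windows 10 passthrough VM

[Service]
Type=simple
ExecStart=/usr/local/bin/start-win10-vm.sh

[Install]
WantedBy=qemu-vms.target
```

After systemctl enable win10-vm.service, booting with systemd.unit=qemu-vms.target on the kernel command line pulls the VM in, while your regular boot entry (which defaults to multi-user/graphical) leaves it alone.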
This looks simple enough, I’ll have a crack at it this weekend. Thank you
I passthrough a GPU (no iGPU on this mobo).
It only hijacks the GPU when I start the VM, for which I haven’t configured autostart.
Before the VM is started it’s showing the host prompt. It doesn’t return to the prompt if the VM is shut down or crashes, but a reboot does, hence not autostarting that VM.
If it got borked too much, putting in a temporary GPU might be easier. Also, don’t break your ssh.
Pretty easy with PKI auth.

It only hijacks the GPU when I start the VM
How did you do this? All the tutorials I read hijack the GPU at startup. Do you have to manually detach the GPU from the host before assigning it to the VM?
Interesting.
I’m not doing anything special that wasn’t in one of the popular tutorials, and I thought that’s how it was supposed to work, although it might very well be a “bug” in how it behaves right now. I don’t know enough about this, but the drivers are blacklisted on the host at boot, yet the console is still displayed through the GPU’s HDMI at that time, which might depend on the specific GPU (a Vega 64 in my case).
The host doesn’t have a graphical desktop environment, just the shell.
the drivers are blacklisted on the host at boot
This is the problem I was alluding to, though I’m surprised you are still able to see the console despite the driver being blacklisted. I have heard of people using scripts to manually detach the GPU and attach it to a VM, but it sounds like you don’t need that, which is interesting
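For what it’s worth, libvirt can do this on its own when the GPU is declared as a hostdev with managed="yes": it detaches the card from the host driver at VM start and reattaches it afterwards, which might explain the behavior described above. Doing the same by hand is mostly a couple of sysfs writes. A rough sketch, assuming root and a placeholder PCI address (find yours with lspci -nn; a GPU’s audio function usually needs the same treatment):

```shell
#!/bin/sh
# Hypothetical detach script -- 0000:01:00.0 is a placeholder address.
GPU=0000:01:00.0

# Unbind from whatever host driver currently owns the card (amdgpu, etc.)
if [ -e "/sys/bus/pci/devices/$GPU/driver" ]; then
    echo "$GPU" > "/sys/bus/pci/devices/$GPU/driver/unbind"
fi

# Force the next driver match to be vfio-pci, then trigger a reprobe
echo vfio-pci > "/sys/bus/pci/devices/$GPU/driver_override"
echo "$GPU" > /sys/bus/pci/drivers_probe
```

Reversing it is the same dance: unbind, clear driver_override, reprobe. Since this touches real hardware it’s strictly an at-your-own-risk sketch.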
For very simple tasks you can usually blindly log in and run commands. I’ve done this for things like rebooting or bringing up a network interface. It’s maybe not the smartest, but basically, just type
root
, the root password, and
dhclient eth0
or whatever magic you need. No display required, unless you make a typo… In your specific case, you could have a shell script that stops the VMs and disables passthrough, so you just log in and invoke that script. Bonus points if you create a dedicated user with that script set as their shell (or just put it in the appropriate dot rc file).
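To make the blind-login trick less error-prone, such a script could look roughly like this. Every name here is made up (qemu-vms.target echoes the target idea from earlier in the thread); adjust to whatever you actually run:

```shell
#!/bin/sh
# Hypothetical /usr/local/bin/rescue -- meant to be run blind after a
# no-display root login. All unit and interface names are placeholders.

# Stop the autostarted VMs so they release the GPU
systemctl stop qemu-vms.target

# Make sure the network is up so ssh can get back in
dhclient eth0

# Audible confirmation that the script actually ran (if you have a beeper)
printf '\a'
```

For the dedicated-user variant, something like useradd -m -s /usr/local/bin/rescue rescue means logging in as that user runs the script immediately, no further typing required.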
I’ll admit I’ve done this too 😅 Not ideal but a good idea nonetheless