Re: PCI passthrough of XHCI on Framework AMD crashes the host
On Wed, Jul 23, 2025 at 12:55:53PM +0000, Tu Dinh wrote:
> On 23/07/2025 14:35, Marek Marczykowski-Górecki wrote:
> > Hi,
> >
> > There is yet another issue affecting Framework AMD... When I start a
> > domU with XHCI controller attached (PCI passthrough), the whole host
> > resets if there was a USB device plugged into it. I don't get any
> > panic message (neither on XHCI console - which is connected to a
> > different XHCI controller, nor on VGA), and the reboot reason
> > register shows 0x08000800 ("an uncorrected error caused a data
> > fabric sync flood event") according to [1].
> >
> > This is Framework AMD with AMD Ryzen 5 7640U.
> >
> > The crash itself happens quite early on domU startup - specifically
> > when SeaBIOS tries to initialize XHCI. I tracked it down to the
> > second readl() in xhci_controller_setup() [2]. Interestingly, it's
> > specifically the second readl(), regardless of which of those comes
> > first. I tried swapping their order, or even repeating the read from
> > the same register - always the second call triggers the crash. The
> > first one succeeds and returns some value (for example 0x1200020 for
> > HCCPARAMS).
> >
> > If I start the domU when no USB devices are connected, it doesn't
> > crash.
> >
> > If I manually unbind the device from the dom0 driver
> > (echo 0000:c3:00.4 > /sys/bus/pci/drivers/xhci_hcd/unbind), it
> > doesn't crash. Note I have seize=1 in domU config, so the
> > `xl pci-assignable-add` call is implicit.
> >
> > If the system doesn't crash (either by not having any USB devices
> > connected initially, or by the manual unbind), the USB controller in
> > domU works fine. I can later connect devices and they appear inside
> > domU.
> >
> > This system has a couple of XHCI controllers, and the same behavior
> > is observed on at least two of them.
> >
> > The controller works just fine when used in dom0.
> >
> > If I pass through another PCI device instead (tried wifi card and
> > audio card), it doesn't crash.
> >
> > The value read from HCCPARAMS (BAR + 0x10) differs between good and
> > bad case:
> > - 0x01200020 when it crashes
> > - 0x0110ffc5 when it works
> >
> > It's weird to have this many differences here, given most bits in
> > this register are about device capabilities[3], not its runtime
> > state...
> >
> > In this system my main debugging tool is the XHCI console. But I
> > tried also without enabling XHCI console, and it still crashes, so
> > it looks like it isn't caused by the XHCI console.
> >
> > I tried also disabling XHCI initialization in SeaBIOS, and then it
> > proceeds to booting domU's kernel. But as soon as Linux gets into
> > initializing that USB controller, it crashes the same way. So, it
> > isn't just SeaBIOS doing something weird (or at least not just
> > that).
> >
> > With PVH dom0, the behavior is a bit different:
> > 1. Initially, the controller works fine in dom0.
> > 2. When starting domU, instead of a clean unbind this happens:
> >
> > [ 11.248760] xhci_hcd 0000:c3:00.4: Controller not ready at resume -19
> > [ 11.248765] xhci_hcd 0000:c3:00.4: PCI post-resume error -19!
> > [ 11.248767] xhci_hcd 0000:c3:00.4: HC died; cleaning up
> > [ 11.249010] xhci_hcd 0000:c3:00.4: remove, state 4
> > [ 11.249013] usb usb8: USB disconnect, device number 1
> > [ 11.249437] xhci_hcd 0000:c3:00.4: USB bus 8 deregistered
> > [ 11.249832] xhci_hcd 0000:c3:00.4: remove, state 4
> > [ 11.249835] usb usb7: USB disconnect, device number 1
> > [ 11.250074] xhci_hcd 0000:c3:00.4: Host halt failed, -19
> > [ 11.250076] xhci_hcd 0000:c3:00.4: Host not accessible, reset failed.
> > [ 11.250389] xhci_hcd 0000:c3:00.4: USB bus 7 deregistered
> > [ 11.251011] pciback 0000:c3:00.4: xen_pciback: seizing device
> > [ 11.335120] pciback 0000:c3:00.4: xen_pciback: vpci: assign to virtual slot 0
> > [ 11.335544] pciback 0000:c3:00.4: registering for 1
> >
> > 3. Reading from BAR in domU (in SeaBIOS, and later Linux) returns
> >    0xffffffff.
> > 4. Does not crash the host.
> >
> > Any ideas?
> >
> > I don't have any other system with Zen4 to try on. The hw11 gitlab
> > runner is Ryzen 7 7735HS, and it doesn't have this issue. It's also
> > possible this is something related to Framework's firmware, but
> > given all the observations above, I find it less likely.
> >
> > [1] https://docs.kernel.org/arch/x86/amd-debugging.html#random-reboot-issues
> > [2] https://github.com/coreboot/seabios/blob/master/src/hw/usb-xhci.c#L553
> > [3] https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/extensible-host-controler-interface-usb-xhci.pdf
> >     (page 385)
>
> I had a similar problem with a Beelink mini PC with the Ryzen 5800U
> after a recent Qubes upgrade.
>
> If the USB controller is passed through to sys-usb then the system
> simply resets without warning.

Do you know if that happens also when no USB devices are connected at
that time? There could be more reasons for a similar issue, and a common
one I've seen is a dom0 kernel panic on the unbind operation (which would
be a different issue than the one I have here).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
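For anyone comparing the two HCCPARAMS values quoted above, here is a minimal Python sketch that splits them into fields and lists the ones that differ. The field names and bit positions are my reading of the HCCPARAMS1 layout in the xHCI specification and are an assumption to verify against the spec; the helper names `decode_hccparams1` and `diff_fields` are made up for this illustration.

```python
# Assumed HCCPARAMS1 layout (capability register at BAR + 0x10), taken from
# my reading of the xHCI specification: (field name, bit shift, bit width).
HCCPARAMS1_FIELDS = [
    ("AC64", 0, 1),         # 64-bit addressing capability
    ("BNC", 1, 1),          # bandwidth negotiation capability
    ("CSZ", 2, 1),          # context size (1 = 64-byte contexts)
    ("PPC", 3, 1),          # port power control
    ("PIND", 4, 1),         # port indicators
    ("LHRC", 5, 1),         # light host controller reset capability
    ("LTC", 6, 1),          # latency tolerance messaging capability
    ("NSS", 7, 1),          # no secondary stream ID support
    ("PAE", 8, 1),          # parse all event data
    ("SPC", 9, 1),          # stopped - short packet capability
    ("SEC", 10, 1),         # stopped EDTLA capability
    ("CFC", 11, 1),         # contiguous frame ID capability
    ("MaxPSASize", 12, 4),  # maximum primary stream array size
    ("xECP", 16, 16),       # extended capabilities pointer, in 32-bit words
]


def decode_hccparams1(value):
    """Split a raw 32-bit HCCPARAMS1 value into named fields."""
    return {
        name: (value >> shift) & ((1 << width) - 1)
        for name, shift, width in HCCPARAMS1_FIELDS
    }


def diff_fields(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    da, db = decode_hccparams1(a), decode_hccparams1(b)
    return {name: (da[name], db[name]) for name in da if da[name] != db[name]}


if __name__ == "__main__":
    crashing, working = 0x01200020, 0x0110FFC5  # values from the report
    for name, (bad, good) in diff_fields(crashing, working).items():
        print(f"{name:<10} crash={bad:#x} ok={good:#x}")
```

Under that assumed layout, even the extended capabilities pointer differs between the two reads (0x0120 vs 0x0110 words), which supports the observation that the crashing-case value does not look like a plausible capability register for the same device.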