Re: PCI passthrough of XHCI on Framework AMD crashes the host



On Wed, Jul 23, 2025 at 12:55:53PM +0000, Tu Dinh wrote:
> On 23/07/2025 14:35, Marek Marczykowski-Górecki wrote:
> > Hi,
> >
> > There is yet another issue affecting Framework AMD... When I start a
> > domU with XHCI controller attached (PCI passthrough), the whole host
> > resets if there was a USB device plugged into it. I don't get any panic
> > message (neither on XHCI console - which is connected to a different
> > XHCI controller, nor on VGA), and the reboot reason register shows
> > 0x08000800 ("an uncorrected error caused a data fabric sync flood
> > event") according to [1].
> >
> > This is Framework AMD with AMD Ryzen 5 7640U.
> >
> > The crash itself happens quite early on domU startup - specifically when
> > SeaBIOS tries to initialize XHCI. I tracked it down to the second
> > readl() in xhci_controller_setup() [2]. Interestingly, it's specifically
> > the second readl(), regardless of which of those comes first. I tried
> > swapping their order, or even repeating the read from the same register -
> > always the second call triggers the crash. The first one succeeds and
> > returns some value (for example 0x1200020 for HCCPARAMS).
> >
> > If I start the domU when no USB devices are connected, it doesn't crash.
> >
> > If I manually unbind the device from the dom0 driver (echo 0000:c3:00.4 >
> > /sys/bus/pci/drivers/xhci_hcd/unbind), it doesn't crash. Note I have
> > seize=1 in domU config, so the `xl pci-assignable-add` call is implicit.
> >
> > If the system doesn't crash (either by not having any USB devices
> > connected initially, or by the manual unbind), the USB controller in
> > domU works fine. I can later connect devices and they appear inside
> > domU.
> >
> > This system has a couple of XHCI controllers, and the same behavior is
> > observed on at least two of them.
> >
> > The controller works just fine when used in dom0.
> >
> > If I pass through another PCI device instead (tried wifi card and audio
> > card), it doesn't crash.
> >
> > The value read from HCCPARAMS (BAR + 0x10) differs between good and
> > bad case:
> > - 0x01200020 when it crashes
> > - 0x0110ffc5 when it works
> >
> > It's weird to have this many differences here, given most bits in this
> > register are about device capabilities[3], not its runtime state...
> >
> > In this system my main debugging tool is the XHCI console. But I tried
> > also without enabling XHCI console, and it still crashes, so it looks
> > like it isn't caused by the XHCI console.
> >
> > I tried also disabling XHCI initialization in SeaBIOS, and then it
> > proceeds to booting domU's kernel. But as soon as Linux gets into
> > initializing that USB controller, it crashes the same way. So, it isn't
> > just SeaBIOS doing something weird (or at least not just that).
> >
> > With PVH dom0, the behavior is a bit different:
> > 1. Initially, the controller works fine in dom0.
> > 2. When starting domU, instead of clean unbind this happens:
> >
> >      [   11.248760] xhci_hcd 0000:c3:00.4: Controller not ready at resume -19
> >      [   11.248765] xhci_hcd 0000:c3:00.4: PCI post-resume error -19!
> >      [   11.248767] xhci_hcd 0000:c3:00.4: HC died; cleaning up
> >      [   11.249010] xhci_hcd 0000:c3:00.4: remove, state 4
> >      [   11.249013] usb usb8: USB disconnect, device number 1
> >      [   11.249437] xhci_hcd 0000:c3:00.4: USB bus 8 deregistered
> >      [   11.249832] xhci_hcd 0000:c3:00.4: remove, state 4
> >      [   11.249835] usb usb7: USB disconnect, device number 1
> >      [   11.250074] xhci_hcd 0000:c3:00.4: Host halt failed, -19
> >      [   11.250076] xhci_hcd 0000:c3:00.4: Host not accessible, reset failed.
> >      [   11.250389] xhci_hcd 0000:c3:00.4: USB bus 7 deregistered
> >      [   11.251011] pciback 0000:c3:00.4: xen_pciback: seizing device
> >      [   11.335120] pciback 0000:c3:00.4: xen_pciback: vpci: assign to virtual slot 0
> >      [   11.335544] pciback 0000:c3:00.4: registering for 1
> >
> > 3. Reading from BAR in domU (in SeaBIOS, and later Linux) returns
> > 0xffffffff.
> > 4. Does not crash the host.
> >
> > Any ideas?
> >
> > I don't have any other system with Zen4 to try on. The hw11 gitlab
> > runner is Ryzen 7 7735HS, and it doesn't have this issue. It's also
> > possible this is something related to Framework's firmware, but given all
> > the observations above, I find it less likely.
> >
> > [1] https://docs.kernel.org/arch/x86/amd-debugging.html#random-reboot-issues
> > [2] https://github.com/coreboot/seabios/blob/master/src/hw/usb-xhci.c#L553
> > [3] https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/extensible-host-controler-interface-usb-xhci.pdf (page 385)
> 
> I had a similar problem with a Beelink mini PC with the Ryzen 5800U
> after a recent Qubes upgrade.
> 
> If the USB controller is passed through to sys-usb then the system
> simply resets without warning.

Do you know if that also happens when no USB devices are connected at
that time? There can be several causes for a similar issue, and a common
one I've seen is a dom0 kernel panic on the unbind operation (which would be
a different issue from the one I have here).
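
As a side note on the two HCCPARAMS values quoted above (0x01200020 in the
crashing case, 0x0110ffc5 in the working one): a small decoder makes it
easier to see which fields differ. The sketch below is only an illustration,
not code from SeaBIOS or Xen. It assumes the HCCPARAMS1 field layout from
the xHCI specification (flag bits 0-11, MaxPSASize in bits 15:12, xECP in
bits 31:16, counted in 32-bit words); the function name decode_hccparams1
and the returned dictionary keys are hypothetical. An all-ones read, as
seen in the PVH dom0 case, is treated as "device not accessible".

```python
# Flag bits of HCCPARAMS1 (capability register at BAR + 0x10),
# assuming the layout given in the xHCI specification.
HCCPARAMS1_FLAGS = {
    0: "AC64",   # 64-bit addressing capability
    1: "BNC",    # bandwidth negotiation capability
    2: "CSZ",    # context size (1 = 64-byte contexts)
    3: "PPC",    # port power control
    4: "PIND",   # port indicators
    5: "LHRC",   # light HC reset capability
    6: "LTC",    # latency tolerance messaging capability
    7: "NSS",    # no secondary stream ID support
    8: "PAE",    # parse all event data
    9: "SPC",    # stopped - short packet capability
    10: "SEC",   # stopped EDTLA capability
    11: "CFC",   # contiguous frame ID capability
}


def decode_hccparams1(value):
    """Split a raw 32-bit HCCPARAMS1 value into its fields.

    Returns None for 0xffffffff, which usually means the MMIO read
    did not reach the device at all.
    """
    if value == 0xFFFFFFFF:
        return None
    flags = [name for bit, name in HCCPARAMS1_FLAGS.items()
             if value & (1 << bit)]
    return {
        "flags": flags,
        "max_psa_size": (value >> 12) & 0xF,
        # xECP is stored in 32-bit words; convert to a byte offset from BAR.
        "xecp_byte_offset": ((value >> 16) & 0xFFFF) * 4,
    }


if __name__ == "__main__":
    for raw in (0x01200020, 0x0110FFC5, 0xFFFFFFFF):
        print(hex(raw), decode_hccparams1(raw))
```

Under that assumed layout, the two values differ not only in the
capability flags and MaxPSASize but also in the extended capabilities
pointer (0x120 versus 0x110 words), which should not change at runtime.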

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
