The following 7 patches add PCIe AER (Advanced Error Reporting) support to Xen.
---------------------------------------------------------------------------
Patches 1~4 are backported from the Linux kernel and enable AER support in the
DOM0 kernel.
These patches give DOM0 the capability to handle PCIe errors. When a device
sends a PCIe error message to the root port, it triggers an interrupt. The
aer-irq handler collects the root error status register and schedules a DPC to
deal with the error based on its type (correctable/non-fatal/fatal).
For correctable errors, it clears the error status register of the device.
For uncorrectable errors (fatal, non-fatal), it calls the callback functions
of the endpoint's driver. For a bridge, it broadcasts the error to the
downstream ports. For dom0, this means the pciback driver will be called
accordingly. (A fatal error needs some additional work, such as resetting the
PCIe link.)
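The dispatch described above can be sketched as a small user-space model. This is plain Python, not kernel code: the names Device, notify and handle_error are made up for illustration, and doing the link reset after the driver callbacks is an assumption of the model, not something the patches guarantee.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Severity(Enum):
    CORRECTABLE = "correctable"
    NONFATAL = "non-fatal"
    FATAL = "fatal"


@dataclass
class Device:
    """Hypothetical stand-in for a PCIe device; a device with children is a bridge."""
    name: str
    children: List["Device"] = field(default_factory=list)

    @property
    def is_bridge(self) -> bool:
        return bool(self.children)


def notify(dev: Device, log: List[str]) -> None:
    """Uncorrectable path: a bridge broadcasts downstream, an endpoint's driver is called."""
    if dev.is_bridge:
        for child in dev.children:
            notify(child, log)
    else:
        log.append(f"driver_callback:{dev.name}")


def handle_error(dev: Device, severity: Severity) -> List[str]:
    """Return the ordered list of actions taken for one reported error."""
    log: List[str] = []
    if severity is Severity.CORRECTABLE:
        # Correctable: only the device's error status register is cleared.
        log.append(f"clear_status:{dev.name}")
        return log
    notify(dev, log)
    if severity is Severity.FATAL:
        # Assumption of this model: the extra fatal-only work happens last.
        log.append(f"reset_link:{dev.name}")
    return log
```

For example, handle_error(Device("bridge", [Device("nic")]), Severity.NONFATAL) records a single driver callback for "nic", which in dom0 would correspond to pciback being called.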
----------------------------------------------------------------------------
Patches 5~7: AER error handler implementation in pciback and pcifront. This is
the main work we have done.
As mentioned above, the pciback PCI error handler will be scheduled by the root
port AER service.
Pciback then asks pcifront to call into the end-device driver's error handling
support, completing the related PCI error handling.
The detailed workflow/policy is described below.
---------------------------------------------------------------------------
The following workflow/policy illustration might be helpful:
1) Assign an AER-capable network device to a PV driver domain
2) Install a network device driver that supports PCI error handling in the PV
guest.
3) If no device driver is installed in the PV guest, or the driver does not
register a PCI error handler, the guest will be killed directly (the devices
will be FLRed).
An HVM guest is currently killed directly.
4) Trigger AER with a test driver; an interrupt will be generated and caught by
the root port.
5) The AER service driver for the root port in DOM0 will carry out the recovery
steps.
For each recovery step (error_detected, mmio_enabled, slot_reset,
error_resume), the AER core will cooperate with each downstream device that
registers pci_error_handlers. For details, please see the related docs in the
kernel (patch1 aer_doc.patch).
6) pciback_error_handler will then be called by the AER core for each of the
four steps above. Pciback will send a service request to pcifront for each step.
Pcifront then tries to call the corresponding device driver callback, if the
device driver has a pci_error_handler.
If every recovery step succeeds, the PCIe error has been successfully
recovered. Otherwise, the impacted guest will be killed and the PCIe device will
be FLRed.
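The policy in steps 3) to 6) can be summarised with a short Python model. It is only a sketch: pcifront_call and pciback_recover are hypothetical names, the real request goes over the pciback/pcifront channel and not through a function call, and the model assumes a driver either provides all four callbacks or none, and that recovery stops at the first failing step.

```python
from typing import Callable, Dict, Optional

# The four recovery steps named in this mail, in the order the AER core runs them.
STEPS = ("error_detected", "mmio_enabled", "slot_reset", "error_resume")

RECOVERED = "recovered"
KILLED = "guest killed, device FLRed"

Handlers = Dict[str, Callable[[], bool]]


def pcifront_call(step: str, handlers: Handlers) -> bool:
    """Guest side: run the driver callback for one step and report success."""
    callback = handlers.get(step)
    if callback is None:
        # Assumption of this model: a missing per-step callback counts as failure.
        return False
    return callback()


def pciback_recover(handlers: Optional[Handlers], is_hvm: bool = False) -> str:
    """DOM0 side: forward each step to the guest and apply the kill/FLR policy."""
    if is_hvm or handlers is None:
        # HVM guests, and PV guests without a registered error handler,
        # are killed directly.
        return KILLED
    for step in STEPS:
        if not pcifront_call(step, handlers):
            return KILLED
    return RECOVERED
```

With handlers = {step: (lambda: True) for step in STEPS}, pciback_recover(handlers) returns "recovered"; passing None, or a slot_reset callback that returns False, yields the kill/FLR outcome.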
---------------------------------------------------------------------------
Test environment:
We have tested the patches on an IPF Hitachi machine, where an Unsupported
Request non-fatal AER error can be triggered by reading or writing a
non-existent function on a PCI device that supports AER. (The whole path, that
is the end device, the bridges and the root port, must support AER too.)
We also tested on x86 to make sure the patches do not break the current code
path.
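One way to check that every device on the path advertises AER is to walk the PCIe extended capability list in its configuration space. The sketch below is a user-space illustration only and is not part of the patches; find_ext_capability and path_supports_aer are hypothetical helper names. It relies on the standard layout: extended capabilities start at offset 0x100, each 32-bit header holds the capability ID in bits 15:0 and the next offset in bits 31:20, and the AER capability ID is 0x0001.

```python
import struct
from typing import Iterable

PCI_EXT_CAP_START = 0x100    # first extended capability header
PCI_EXT_CAP_ID_AER = 0x0001  # Advanced Error Reporting


def find_ext_capability(config: bytes, cap_id: int) -> int:
    """Return the offset of the extended capability `cap_id`, or 0 if absent."""
    offset = PCI_EXT_CAP_START
    seen = set()  # guards against a malformed, looping capability list
    while PCI_EXT_CAP_START <= offset <= len(config) - 4 and offset not in seen:
        seen.add(offset)
        (header,) = struct.unpack_from("<I", config, offset)
        if header in (0, 0xFFFFFFFF):
            break  # no capability here, or the device did not respond
        if header & 0xFFFF == cap_id:
            return offset
        # Next pointer lives in bits 31:20; its low two bits are reserved.
        offset = (header >> 20) & 0xFFC
    return 0


def path_supports_aer(configs: Iterable[bytes]) -> bool:
    """True if every config-space dump on the path contains the AER capability."""
    return all(find_ext_capability(c, PCI_EXT_CAP_ID_AER) for c in configs)
```

The config-space dumps would have to come from somewhere such as the sysfs "config" file of each device on Linux; that source is an assumption here, and reading the extended region usually needs root privileges.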
---------------------------------------------------------------------------
If you have any questions, just let me know.
Thanks a lot for your help!
Regards,
Criping
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel