The following 7 patches add PCIe AER (Advanced Error Reporting) support to Xen.
---------------------------------------------------------------------------
Patches 1~4 are backported from the Linux kernel and enable kernel support for
AER. These patches give DOM0 PCIe error handling capability. When a device
sends a PCIe error message to the root port, the root port raises an interrupt.
The irq handler then reads the root error status register and schedules work to
process the error based on its type (correctable/non-fatal/fatal).
For correctable errors, the error status register of the device is cleared.
For non-fatal errors, the callback functions of the endpoint's driver are
called. For a bridge, the error is broadcast to the downstream ports. In dom0,
this means the pciback driver will be called accordingly.
For fatal errors, the process is the same as for non-fatal errors, except that
the PCIe link is additionally reset.
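
The three cases above can be sketched as a small user-space model. This is only
an illustration of the dispatch logic as described here, not the code from the
patches; the names aer_severity, aer_actions and aer_dispatch are hypothetical,
and the flags stand in for the real register writes and driver callbacks.

```c
#include <stdbool.h>

/* Hypothetical severity classes mirroring the three cases described above. */
enum aer_severity {
    AER_SEV_CORRECTABLE,
    AER_SEV_NONFATAL,
    AER_SEV_FATAL
};

/* Records which actions were taken; stands in for real hardware/driver calls. */
struct aer_actions {
    bool status_cleared;   /* device error status register cleared */
    bool drivers_notified; /* endpoint driver callbacks invoked */
    bool link_reset;       /* PCIe link reset performed */
};

/* Decide what to do for one error, based only on its severity. */
struct aer_actions aer_dispatch(enum aer_severity sev)
{
    struct aer_actions done = { false, false, false };

    switch (sev) {
    case AER_SEV_CORRECTABLE:
        /* Correctable: only clear the status register. */
        done.status_cleared = true;
        break;
    case AER_SEV_NONFATAL:
        /* Non-fatal: let the affected drivers handle the error. */
        done.drivers_notified = true;
        break;
    case AER_SEV_FATAL:
        /* Fatal: same as non-fatal, plus a link reset. */
        done.drivers_notified = true;
        done.link_reset = true;
        break;
    }
    return done;
}
```

The sketch does not model the ordering of the link reset relative to the driver
callbacks; it only shows which actions belong to which severity.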
----------------------------------------------------------------------------
Patches 5~7: AER error handler implementation in pciback and pcifront. This is
the main work we have done.
As mentioned above, the pciback pci error handler is invoked by the root port
AER service. Pciback then asks pcifront to call the end-device driver, which
completes the related pci error handling.
We noticed there might be race conditions between pciback ops (such as the pci
error handling we are working on now, or other configuration ops) and
pci-hotplug. Those issues will be solved before sending the patch.
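
The pciback-to-pcifront hand-off described above can be modelled roughly as
follows. This is a user-space sketch under stated assumptions: all names
(aer_request, guest_driver, frontend_handle, backend_forward) are hypothetical,
and a direct function call stands in for the real inter-domain transport, which
is not shown.

```c
#include <stddef.h>

/* Hypothetical operation codes, one per recovery step. */
enum aer_op {
    AER_OP_DETECTED,
    AER_OP_MMIO,
    AER_OP_SLOTRESET,
    AER_OP_RESUME,
    AER_OP_COUNT
};

/* Hypothetical marker: the guest has no handler for the requested step. */
#define AER_RESULT_NO_HANDLER (-1)

/* Request passed from the backend to the frontend; result is filled in. */
struct aer_request {
    enum aer_op op;
    int result;
};

typedef int (*guest_handler_fn)(void);

/* Handlers a guest device driver may (or may not) provide. */
struct guest_driver {
    guest_handler_fn handlers[AER_OP_COUNT];
};

/* Frontend side: call the guest driver's handler if there is one. */
void frontend_handle(const struct guest_driver *drv, struct aer_request *req)
{
    int op = (int)req->op;

    if (drv == NULL || op < 0 || op >= AER_OP_COUNT ||
        drv->handlers[op] == NULL) {
        req->result = AER_RESULT_NO_HANDLER;
        return;
    }
    req->result = drv->handlers[op]();
}

/*
 * Backend side: forward one step to the frontend.
 * Returns 1 if the guest must be killed (no driver or no handler),
 * 0 otherwise, in which case *result holds the driver's answer.
 */
int backend_forward(const struct guest_driver *drv, enum aer_op op, int *result)
{
    struct aer_request req = { op, AER_RESULT_NO_HANDLER };

    frontend_handle(drv, &req); /* stands in for the real notification */
    if (req.result == AER_RESULT_NO_HANDLER)
        return 1;
    *result = req.result;
    return 0;
}

/* Sample guest handler used for illustration; always reports success (0). */
int sample_handler_ok(void)
{
    return 0;
}
```

A guest whose driver only fills in some of the handlers would be treated as
unable to recover as soon as a missing step is requested.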
---------------------------------------------------------------------------
Test:
We have tested the patches on an IPF Hitachi machine, where an Unsupported
Request non-fatal AER error can be triggered by reading/writing a non-existent
function on a pci device that supports AER. (The end device, the intermediate
bridge, and the root port must all support AER.)
We also tested on x86 to make sure the patches do not break the current code
path.
---------------------------------------------------------------------------
Below is an example workflow which might be helpful:
1) Assign an AER-capable network device to a PV driver domain (VT-d is not
supported on the Hitachi machine).
2) Install a network device driver that supports pci error handling in the PV
guest.
3) If no device driver is installed in the PV guest, or the driver does not
support the pci error recovery functions, the guest will be killed directly
(the devices will be FLRed). An HVM guest will be killed in any case.
4) Trigger AER with the test driver; an interrupt will be generated and caught
by the root port.
5) The AER service driver bound to the root port in DOM0 performs the recovery
steps in the bottom half of the aer interrupt context.
For each recovery step (error_detected, mmio_enabled, slot_reset,
error_resume), the aer core cooperates with each downstream device that has
registered pci_error_handlers to complete the step. For details, please see
the related docs in the kernel (attached aer_doc.patch).
6) pciback_error_handler is then called by the AER core for each of the four
steps above. Pciback sends the notification to pcifront, and pcifront then
tries to call the corresponding device driver if the device driver has a
pci_error_handler.
If every recovery step succeeds, the pcie error has been successfully
recovered. Otherwise, the impacted guest will be killed.
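
The recovery sequence in steps 5) and 6) can be sketched as a small state
machine. This is a simplified user-space model, not the kernel or pciback code:
the result codes are loosely modelled on the kernel's pci_ers_result, and the
names err_handlers, run_recovery and the sample_* handlers are hypothetical.
It assumes a single device and treats a missing handler as a failure.

```c
#include <stddef.h>

/* Hypothetical result codes, loosely modelled on the kernel's pci_ers_result. */
enum ers_result {
    ERS_CAN_RECOVER, /* driver wants the mmio_enabled step */
    ERS_NEED_RESET,  /* driver wants the slot_reset step */
    ERS_RECOVERED,   /* device is usable again */
    ERS_DISCONNECT   /* driver gave up on the device */
};

/* Callbacks a driver may register, one per recovery step. */
struct err_handlers {
    enum ers_result (*error_detected)(void);
    enum ers_result (*mmio_enabled)(void);
    enum ers_result (*slot_reset)(void);
    void (*resume)(void);
};

/*
 * Walk the recovery steps for one device.
 * Returns 1 if the device recovered, 0 if the guest must be killed.
 */
int run_recovery(const struct err_handlers *h)
{
    enum ers_result r;

    /* No driver, or a driver without error handlers: cannot recover. */
    if (h == NULL || h->error_detected == NULL)
        return 0;

    r = h->error_detected();

    if (r == ERS_CAN_RECOVER && h->mmio_enabled != NULL)
        r = h->mmio_enabled();

    if (r == ERS_NEED_RESET && h->slot_reset != NULL)
        r = h->slot_reset();

    /* Anything other than RECOVERED at this point counts as a failure. */
    if (r != ERS_RECOVERED)
        return 0;

    if (h->resume != NULL)
        h->resume();
    return 1;
}

/* Sample driver callbacks used for illustration only. */
int sample_resume_calls = 0;

enum ers_result sample_detect_need_reset(void) { return ERS_NEED_RESET; }
enum ers_result sample_slot_reset_ok(void)     { return ERS_RECOVERED; }
enum ers_result sample_slot_reset_fail(void)   { return ERS_DISCONNECT; }
void sample_resume(void)                       { sample_resume_calls++; }
```

With the sample callbacks, a driver that asks for a reset and then reports
ERS_RECOVERED from slot_reset has its resume callback invoked, while a driver
whose slot_reset reports ERS_DISCONNECT makes run_recovery return 0.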
Thanks & Regards,
Criping