xen-devel

[Xen-devel] [RFC][patch 0/7] Enable PCIE-AER support for XEN

To: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] [RFC][patch 0/7] Enable PCIE-AER support for XEN
From: "Ke, Liping" <liping.ke@xxxxxxxxx>
Date: Fri, 14 Nov 2008 15:33:29 +0800
Accept-language: en-US
Cc: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, "Ke, Liping" <liping.ke@xxxxxxxxx>
Delivery-date: Thu, 13 Nov 2008 23:33:53 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AclGK0m2eSDPjWwdSqGbijhYmTR3Fg==
Thread-topic: [RFC][patch 0/7] Enable PCIE-AER support for XEN
The following 7 patches add PCIE AER (Advanced Error Reporting) support to XEN.
---------------------------------------------------------------------------
Patches 1~4 are backported from the Linux kernel and enable kernel support for AER.

These patches give DOM0 the ability to handle PCIE errors. When a device sends a 
PCIE error message to the root port, an interrupt is triggered. The irq handler 
reads the root error status register and then schedules a work item to process 
the error according to its type (correctable/non-fatal/fatal).

For a correctable error, the error status register of the device is cleared.
For a non-fatal error, the callback functions of the endpoint's driver are 
called. For a bridge, the error is broadcast to the downstream ports. In dom0, 
this means the pciback driver is called accordingly.
For a fatal error, the process is the same as for a non-fatal error, except 
that the pcie link is additionally reset.
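
The per-severity handling described above can be pictured with a small userspace sketch. It is only an illustration, not code from the patches: the enum, the struct and plan_recovery() are hypothetical names, and the real AER core acts on hardware registers and driver callbacks rather than returning flags.

```c
#include <stdbool.h>

/* Hypothetical model of the three error classes named in the text. */
enum aer_severity { AER_CORRECTABLE, AER_NONFATAL, AER_FATAL };

/* Which actions the recovery work would take (illustrative flags only). */
struct recovery_actions {
    bool clear_status;  /* clear the device's error status register */
    bool call_driver;   /* call the driver callbacks (pciback in dom0) */
    bool reset_link;    /* additionally reset the pcie link */
};

static struct recovery_actions plan_recovery(enum aer_severity sev)
{
    struct recovery_actions a = { false, false, false };

    switch (sev) {
    case AER_CORRECTABLE:
        a.clear_status = true;
        break;
    case AER_NONFATAL:
        a.call_driver = true;
        break;
    case AER_FATAL:
        /* Same path as non-fatal, plus the link reset. */
        a.call_driver = true;
        a.reset_link = true;
        break;
    }
    return a;
}
```

In this model a fatal error differs from a non-fatal one only in the extra reset_link flag, which mirrors the description above.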
----------------------------------------------------------------------------
Patches 5~7: AER error handler implementation in pciback and pcifront. This is 
the main work we have done.

As mentioned above, the pciback pci error handler is scheduled by the root port 
AER service. Pciback then asks pcifront to call the end-device driver, which 
completes the related pci error handling.
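
As a rough picture of that hand-off, the sketch below models one notification going from the backend to the frontend and the reply coming back. Everything in it is an assumption made for illustration: the type and function names are hypothetical, the single request struct stands in for whatever channel the real drivers share, and the call is synchronous here although the real notification crosses domains.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the pciback/pcifront interface. */
enum aer_op { OP_ERROR_DETECTED, OP_MMIO_ENABLED, OP_SLOT_RESET, OP_RESUME };
enum aer_reply { REPLY_OK, REPLY_FAILED, REPLY_NO_HANDLER };

/* One request slot standing in for the backend/frontend channel. */
struct aer_request {
    enum aer_op op;
    enum aer_reply reply;
};

/* Guest-side device driver; err_handler is NULL if it has no error handler. */
struct guest_driver {
    bool (*err_handler)(enum aer_op op);
};

/* Frontend side (in the guest): call the device driver if it has a handler. */
static void frontend_handle(struct aer_request *req,
                            const struct guest_driver *drv)
{
    if (drv == NULL || drv->err_handler == NULL)
        req->reply = REPLY_NO_HANDLER;
    else
        req->reply = drv->err_handler(req->op) ? REPLY_OK : REPLY_FAILED;
}

/* Backend side (in dom0): forward the operation and return the reply. */
static enum aer_reply backend_forward(enum aer_op op,
                                      const struct guest_driver *drv)
{
    struct aer_request req = { op, REPLY_FAILED };

    frontend_handle(&req, drv);  /* a cross-domain notification in reality */
    return req.reply;
}

/* Example guest handler: copes with everything except a slot reset. */
static bool demo_handler(enum aer_op op)
{
    return op != OP_SLOT_RESET;
}
```

With demo_handler installed, backend_forward(OP_ERROR_DETECTED, &drv) yields REPLY_OK, while a driver without a handler yields REPLY_NO_HANDLER.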

We noticed there might be some race conditions between pciback ops (such as the 
pci error handling we are working on now, or other configuration ops) and 
pci-hotplug. These issues will be solved before sending the patch.
---------------------------------------------------------------------------
Test: 
We have tested the patches on an IPF Hitachi machine, where an Unsupported 
Request non-fatal AER can be triggered by reading/writing a non-existent 
function on a pci device that supports AER. (The end device, the intermediate 
bridge and the root port must all support AER.)
We also tested them on x86 to make sure they do not break the current code path.
---------------------------------------------------------------------------
Below is an example workflow which might be helpful:
1) Assign an AER-capable network device to a PV driver domain (no VT-d support 
on Hitachi). 
2) Install a network device driver that supports pci error handling in the PV 
guest.
3) If no device driver is installed in the PV guest, or the driver does not 
support the pci error recovery functions, the guest will be killed directly 
(the devices will be FLRed). An HVM guest will, of course, be killed.
4) Trigger AER with the test driver; an interrupt will be generated and caught 
by the root port. 
5) The AER service driver on the root port in DOM0 will perform the recovery 
steps in the bottom half of the aer interrupt context. 
For each recovery step (error_detected, mmio_enabled, slot_reset, 
error_resume), the aer core cooperates with each device below the port that has 
registered pci_error_handlers to finish the step. For details, please see 
the related docs in the kernel (attached aer_doc.patch).
6) pciback_error_handler will then be called by the AER core for each of the 
four steps above. Pciback sends the notification to pcifront, and pcifront then 
tries to call the corresponding device driver if that driver has a 
pci_error_handler. 
If every recovery step succeeds, the pcie error should have been fixed and 
successfully recovered. Otherwise, the impacted guest will be killed.
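
The overall outcome rule of this workflow (every step succeeds, or the guest is killed) can be sketched as a small userspace model. This is a simplification under stated assumptions: all names are hypothetical, each step is reduced to success or failure, and a missing callback is treated as a failure. The real kernel flow branches on the result each callback returns and does not necessarily run every step.

```c
#include <stdbool.h>
#include <stddef.h>

/* The four recovery steps named in the workflow, in order. */
enum recovery_step {
    STEP_ERROR_DETECTED,
    STEP_MMIO_ENABLED,
    STEP_SLOT_RESET,
    STEP_RESUME,
    STEP_COUNT
};

enum outcome { OUTCOME_RECOVERED, OUTCOME_GUEST_KILLED };

/* Hypothetical per-step callback: returns true if the step succeeded. */
typedef bool (*step_fn)(void);

/*
 * Run the steps in order. A NULL table models a guest with no driver or
 * no error handlers; a NULL entry or a failing step aborts the recovery.
 */
static enum outcome run_recovery(const step_fn handlers[STEP_COUNT])
{
    if (handlers == NULL)
        return OUTCOME_GUEST_KILLED;

    for (int i = 0; i < STEP_COUNT; i++) {
        if (handlers[i] == NULL || !handlers[i]())
            return OUTCOME_GUEST_KILLED;
    }
    return OUTCOME_RECOVERED;
}

/* Example callbacks for trying the model out. */
static bool step_ok(void)   { return true; }
static bool step_fail(void) { return false; }
```

A table of four step_ok entries ends in OUTCOME_RECOVERED; replacing any one entry with step_fail, or passing no table at all, ends in OUTCOME_GUEST_KILLED.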

Thanks & Regards,
Criping

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
