xen-devel-bounces@xxxxxxxxxxxxxxxxxxx <> wrote:
> On Wed, 24 Sep 2008 16:57:21 +0800
> "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
>>> By the way, can we recover from an error condition with FLR alone? Resetting
>>> the link from the root port is needed for some errors, isn't it?
>> Yes, a root port link reset is needed on the host side. I mean FLR is just
>> guest specific. What I'm considering is adding error handling to pciback, so
>> that when the host resets the hierarchy, pciback's error handler will be
>> invoked and notify the control panel. But I'm still not sure whether any
>> mechanism exists for that notification (otherwise, we need a
>> Xen-specific mechanism).
> We can use the "error_detected" interface between the AER driver and the
> pciback driver, can't we? Actually, there is no AER driver in
> linux-2.6.18-xen.hg. We have to wait to merge dom0 function into upstream
Yes, we looked at using the PCI error recovery mechanism in our internal
discussion. Merging the AER driver into dom0 is simple, since the AER driver
was merged in 2.6.18 as well. Ke Liping did some experiments before and there
was no conflict at all. But I'm not sure if the backport can be accepted by
upstream Xen.
What I am considering is that, for a PV domain, pciback can act as a
stub/proxy: pass the callback from AER to the guest side and wait for the
guest's return value, such as PCI_ERS_RESULT_NEED_RESET. I didn't find much of
an issue with this method, except that pciback needs some guards to make sure
the wait cannot hang without a timeout and that the feedback is valid. Also,
some mechanism is needed for pciback to notify pcifront (currently requests
only go from pcifront to pciback, per my understanding).
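To make the guard idea concrete, here is a minimal user-space sketch in C, not pciback code. Everything in it is hypothetical: the enum only mirrors the names of Linux's PCI_ERS_RESULT_* values (the numeric values are arbitrary), the reply slot and poll hook stand in for whatever pciback-to-pcifront channel ends up existing, and falling back to "disconnect" on a timeout or an invalid reply is just one possible policy.

```c
#include <stdbool.h>

/* Local stand-ins that mirror the names of Linux's PCI_ERS_RESULT_* values.
 * The numeric values are arbitrary and chosen only for this sketch. */
enum ers_result {
    ERS_NONE = 1,
    ERS_CAN_RECOVER,
    ERS_NEED_RESET,
    ERS_DISCONNECT,
    ERS_RECOVERED,
};

/* Hypothetical reply slot that the guest (pcifront) would fill in. */
struct guest_reply {
    bool answered;   /* set once the guest has written a result */
    int  raw_result; /* untrusted value written by the guest */
};

/* Hypothetical hook: wait one tick, return true once the guest has answered. */
typedef bool (*poll_fn)(struct guest_reply *reply, void *ctx);

/* Guard 2: only accept values that are known result codes. */
static bool reply_is_valid(int raw)
{
    return raw >= ERS_NONE && raw <= ERS_RECOVERED;
}

/* Proxy for an error callback: forward to the guest, wait a bounded number
 * of ticks, and validate the answer before handing it back to the caller. */
enum ers_result proxy_error_detected(struct guest_reply *reply,
                                     poll_fn poll, void *ctx,
                                     unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (poll(reply, ctx))
            break;
    }
    /* Guard 1: a silent guest must not stall host-side recovery
     * (assumed policy: treat the device as lost for this guest). */
    if (!reply->answered)
        return ERS_DISCONNECT;
    if (!reply_is_valid(reply->raw_result))
        return ERS_DISCONNECT;
    return (enum ers_result)reply->raw_result;
}

/* Simulated guest used to exercise the proxy: answers after N polls. */
struct fake_guest {
    unsigned answer_after; /* poll count at which the guest replies */
    int      result;       /* value the guest writes, possibly bogus */
    unsigned polls;        /* polls seen so far */
};

bool fake_guest_poll(struct guest_reply *reply, void *ctx)
{
    struct fake_guest *g = ctx;

    g->polls++;
    if (g->polls >= g->answer_after) {
        reply->raw_result = g->result;
        reply->answered = true;
    }
    return reply->answered;
}
```

In this sketch a guest that answers in time with a known code gets its answer passed through, while a slow guest or a garbage value both collapse to the same conservative result, so the host side never has to interpret untrusted input.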
But for an HVM domain, maybe we can't support it unless we have virtual AER
support in the virtual HVM platform. Even if we have that, it is much more
complex to translate the physical AER to the guest side, and to parse the
guest side's action to decide how to act on the host side. We are still
considering this. Do you have any ideas on it?
Another point: have you considered how to handle a multi-function device
that is assigned to multiple domains when one function has an error? Or devices
under the same switch that are assigned to different domains?
> The interface between pciback and xend is xen's special mechanism.
>> Also not sure if the long latency is
>> acceptable for error handling, especially as it may finish after
>> the link reset.
> I'm not sure either.
Yes, that is one point we need to investigate. According to the documentation,
the error_corrected callback can do anything, including schedule, except access
the device, so it seems OK, but we need to verify that there are no side effects.
>>> I agree with you that implementing the full PCI-E feature set on the guest
>>> side will be complex. I don't think VC/TC on the guest side is needed. But, AER
>> I remember I saw a doc saying that Windows has VC/TC support for HD Audio,
>> although I'm not sure how it is implemented. Is VC/TC needed for communication
> I do NOT think VC/TC on the guest side is needed.
>>> on the guest side is required in the long term, because the guest OS will
>>> be able to handle AER and recover from the error condition.
>> Yes, agree that if the guest can do AER, it will enhance reliability and
>> availability. But a more elegant design is needed. For example, if the
>> guest decides that the AER needs a root port link reset (a switch link
>> reset should be OK unless SR-IOV is involved), what shall the host do? If
>> the host acts according to the guest's suggestion, that may not be safe,
>> I suspect.
> I agree with you. The host should NOT act according to the guest's
> suggestion. I think the host should recover from the error condition with
> dom0 Linux's AER driver. AER emulation for the guest is needed to let the
> guest survive.
Have you considered implementing just a virtual root port in qemu, not the whole
RC? I'm not sure if there is any effort/function difference between these two
methods.
>> BTW, do you know what the recovery action will usually be? I didn't find
>> much documentation on it, and the PCI-E spec didn't give much of a clue
> Linux's AER driver will help us to understand the recovery
> action. The following function is the main logic.
I'm not sure if resetting the slot is guaranteed to always resolve the issue;
I have never hit an AER error on my platform :(
> Yuji Shimada
> Xen-devel mailing list