>>> On 16.08.11 at 17:06, Tim Deegan <tim@xxxxxxx> wrote:
> At 08:03 +0100 on 16 Aug (1313481813), Jan Beulich wrote:
>> >>> On 15.08.11 at 11:26, Tim Deegan <tim@xxxxxxx> wrote:
>> > At 15:48 +0100 on 12 Aug (1313164084), Jan Beulich wrote:
>> >> >>> On 12.08.11 at 16:09, Tim Deegan <tim@xxxxxxx> wrote:
>> >> > At 14:53 +0100 on 12 Aug (1313160824), Jan Beulich wrote:
>> >> >> > This issue is resolved in changeset 23762:537ed3b74b3f of
>> >> >> > xen-unstable.hg, and 23112:84e3706df07a of xen-4.1-testing.hg.
>> >> >>
>> >> >> Do you really think this helps much? Direct control of the device means
>> >> >> it could also (perhaps on a second vCPU) constantly re-enable the bus
>> >> >> mastering bit.
>> >> >
>> >> > That path goes through qemu/pciback, so at least lets Xen schedule the
>> >> > dom0 tools.
>> >>
>> >> Are you sure? If (as said) the guest uses a second vCPU for doing the
>> >> config space accesses, I can't see how this would save the pCPU the
>> >> fault storm is occurring on.
>> >
>> > Hmmm. Yes, I see what you mean.
>>
>> Actually, a second vCPU may not even be needed: Since the "fault"
>> really is an external interrupt, if that one gets handled on a pCPU other
>> than the one the guest's vCPU is running on, it could execute such a
>> loop even in that case.
>>
>> As to yesterdays softirq-based handling thoughts - perhaps the clearing
>> of the bus master bit on the device should still be done in the actual IRQ
>> handler, while the processing of the fault records could be moved out to
>> a softirq.
>
> Hmmm. I like the idea of using a softirq but in fact by the time we've
> figured out which BDF to silence we've pretty much done handling the
> fault.
Ugly, but yes, indeed.
> Reading the VTd docs it looks like we can just ack the IOMMU fault
> interrupt and it won't send any more until we clear the log, so we can
> leave the whole business to a softirq. Delaying that might cause the
> log to overflow, but that's not necessarily the end of the world.
> Looks like we can do the same on AMD by disabling interrupt generation
> in the main handler and reenabling it in the softirq.
>
> Is there any situation where we rally care terribly about the IOfault
> logs overflowing?
As long as older entries don't get overwritten, I don't think that's
going to be problematic. The more that we basically shut off the
offending device(s).
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|