Friday, October 29, 2010, 4:27:41 PM, you wrote:
> On Thu, Oct 28, 2010 at 11:54:40PM +0200, Sander Eikelenboom wrote:
>> Thursday, October 28, 2010, 11:12:23 PM, you wrote:
>> > On Thu, Oct 28, 2010 at 12:01:18AM +0200, Sander Eikelenboom wrote:
>> >> Hi Konrad,
>> >> Now that a 2.6.37-merge-window kernel is able to boot under Xen, I
>> >> was able to test my xhci controller under dom0, which I previously
>> >> couldn't.
>> >> The results:
>> >> A) 2.6.37-merge-window kernel on bare metal: video grabbing while doing 20
>> >> iterations of a kernel make -j6 works.
>> >> B) Xen + 2.6.37-merge-window kernel as Dom0: video grabbing while doing 20
>> >> iterations of a kernel make -j6 works.
>> > Great!
>> Yes, so as dom0 it actually seems quite good :-)
>> I hope Stefano's patches get pulled by Linus, because as far as I have seen he
>> hasn't done so yet ..
>> >> C) Xen + 2.6.32.x pvops dom0 + 2.6.37-merge-window kernel DomU:
>> >> video grabbing while doing 20 iterations of make -j6 still freezes the
>> >> machine without a trace after a short while.
>> > OK, so it points to the 2.6.32.x doing something funky, which
>> > unfortunately does not help that much, as the Xen IRQ patches that are
>> > in the 2.6.37-merge-window are about the same as what 2.6.32.x has.
>> > Though the 2.6.37-merge-window got tons of bells and whistles in the
>> > generic Linux kernel code.
>> Hmm, I see it differently:
>> - Bare metal doesn't freeze.
>> - Xen + dom0 2.6.37 doesn't freeze.
>> - Xen + dom0 2.6.32.x can't be tested, since xhci is pretty new.
>> - Xen + dom0 2.6.32.x + domU 2.6.37 (with the device passed through) does work,
>> but not with high(er) load (in dom0); then it freezes quite fast.
>> - Xen + dom0 2.6.37 + domU 2.6.37 (with the device passed through) can't be
>> tested yet due to the lack of pciback.
>> So it doesn't point to 2.6.32.x as far as I can see. I would rather assume
>> that Xen, dom0 and domU, which are all involved in processing the interrupts,
>> somehow end up in a lock when the high load keeps things from being processed
>> in a timely / normal fashion. I notice that just before the freeze
>> everything slowly seems to grind to a halt.
>> But no errors ... (apart from the occasional RCU CPU stall messages).
>> In all the other cases I don't see any real reduction in
>> responsiveness while torturing the machine with the kernel compiles.
>> And keep in mind that it does work in all situations with USB2.
>> So to be honest I feel I know just as much as before; it's all still
>> inconclusive .. except that the hypervisor + dom0 combination doesn't seem
>> to cause the problem on its own,
>> and the xhci controller doesn't seem to cause it on its own either, since it
>> survives on bare metal and in the Xen + dom0 2.6.37 case.
>> It only freezes in combination with passthrough to domU, but that doesn't
>> rule out that Xen and dom0 play a role ...
>> What struck me was the higher interrupt rate, which could just push a
>> loaded system over the edge.
>> BTW, do MSI interrupts go from domU to dom0 to Xen and vice versa, just like
>> legacy interrupts?
> They should be funneled only to the appropriate domain. As MSI interrupts
> cannot be shared (unlike some of the legacy IRQs), Xen directly notifies
> the domain of an MSI interrupt.
OK, so that's why you don't see the MSI interrupts of guests in dom0's
/proc/interrupts, but do see their legacy interrupts.
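A cruder alternative to the kernel tracer for splitting interrupt counts per source is to diff two snapshots of /proc/interrupts taken a few seconds apart. A rough Python sketch, not part of any Xen tooling; it assumes the usual layout of a CPU header line followed by "IRQ: per-CPU counts, chip name, device" rows, and that layout can vary between kernel versions. Per the explanation above, MSI interrupts of a passed-through device will not appear in dom0's snapshot at all, so this only covers what the kernel it runs in actually sees.

```python
import time


def parse_interrupts(text):
    """Parse the text of /proc/interrupts into {irq: (total_count, description)}.

    Assumes a header line of CPU names, then one row per source of the form
    "IRQ: <count per CPU> <chip name> <device name(s)>".
    """
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header looks like "CPU0 CPU1 ..."
    result = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        irq, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():  # stop at the first non-numeric field
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        result[irq.strip()] = (sum(counts), description)
    return result


def interrupt_rates(before, after, seconds):
    """Interrupts per second for each source between two parsed snapshots."""
    rates = {}
    for irq, (count, description) in after.items():
        previous = before.get(irq, (0, description))[0]
        rates[irq] = ((count - previous) / seconds, description)
    return rates


def sample_rates(interval=5.0, path="/proc/interrupts"):
    """Take two snapshots `interval` seconds apart and return per-source rates."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_interrupts(f.read())
    return interrupt_rates(first, second, interval)
```

Running sample_rates() in dom0 and in the domU during the make -j6 load, then sorting by rate, would show which sources dominate in each domain.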
>> I'm now looking into using the kernel tracer to trace IRQs in dom0/domU to see
>> if that shows something funky when it freezes,
>> and perhaps to see whether I can split the interrupt counts by where they
>> originate. But I'm quite new to all the tracing infrastructure.
>> As far as I know, Xen doesn't have a detector for CPU stalls / lockups, does it?
> Unfortunately not. But it does have IRQ storm code, so if too many
> interrupts are firing it disables them.
Hmm yeah, I fiddled with that ... lowered the max interrupt rate .. but it
didn't seem to do much as far as I could tell; I will try fiddling with it again.
Let's see whether that mechanism works or not :-)
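Conceptually, a storm guard like the one described counts how often a line fires within a short window and masks it once a threshold is exceeded. The following is a toy Python model of that idea only, not Xen's code; the class name, the window length and the threshold are illustrative assumptions.

```python
from collections import deque


class IrqStormDetector:
    """Toy sliding-window model of IRQ storm detection (hypothetical, not Xen's).

    An IRQ is disabled once it fires more than `max_per_window` times within
    `window` seconds; after that, further interrupts on it are dropped.
    """

    def __init__(self, max_per_window, window=0.01):
        self.max_per_window = max_per_window
        self.window = window
        self.events = {}       # irq -> deque of recent timestamps
        self.disabled = set()  # IRQs that tripped the limit

    def fire(self, irq, now):
        """Record one interrupt at time `now`; return False if it is (now) disabled."""
        if irq in self.disabled:
            return False
        recent = self.events.setdefault(irq, deque())
        recent.append(now)
        # Forget timestamps that have slid out of the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) > self.max_per_window:
            self.disabled.add(irq)
            return False
        return True
```

In a model like this, a threshold above the rate the device actually reaches under load never triggers at all.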
>> Because with my limited knowledge .. I would assume that, as the hypervisor,
>> Xen should be able to prevent at least itself from locking up when other
>> domains, including dom0, somehow get locked up,
>> and spit out debug info about CPU state etc.
> But in your case, the serial interface for Xen is completely dead, right?
Yep, unfortunately .. otherwise it would have been possible to dump at least
something about the state it's in when it freezes.
I think if the hypervisor managed to survive, it would make finding the
underlying cause much easier ..
>> > Were these interrupts the MSI-X ones for the xHCI controller, or were they
>> > legacy interrupts?
>> USB3/xhci uses MSI; my USB2 controllers only support legacy interrupts.
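For a quick check of which controllers use MSI and which use legacy (IO-APIC) interrupts on a given box, the chip-name column of /proc/interrupts is usually enough. A small heuristic Python sketch; the substring matching is an assumption about how chip names are spelled (bare-metal kernels of this era print names like PCI-MSI-edge and IO-APIC-fasteoi, while dom0 kernels may use Xen-specific names), so check it against real output first.

```python
def interrupt_kind(line):
    """Classify one /proc/interrupts row as 'msi', 'legacy' or 'other'.

    Heuristic (assumed): the chip-name text contains 'msi' for MSI/MSI-X
    vectors and 'ioapic' or 'io-apic' for legacy lines.
    """
    irq, _, rest = line.partition(":")
    if not irq.strip().isdigit():
        return "other"  # NMI, LOC, ERR and similar summary rows
    rest = rest.lower()
    if "msi" in rest:
        return "msi"
    if "ioapic" in rest or "io-apic" in rest:
        return "legacy"
    return "other"


def split_by_kind(text):
    """Group device names (last column) from /proc/interrupts text by kind."""
    groups = {"msi": [], "legacy": [], "other": []}
    for line in text.strip().splitlines()[1:]:  # skip the CPU header line
        if ":" in line:
            groups[interrupt_kind(line)].append(line.split()[-1])
    return groups
```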
Xen-devel mailing list