Thursday, October 28, 2010, 11:12:23 PM, you wrote:
> On Thu, Oct 28, 2010 at 12:01:18AM +0200, Sander Eikelenboom wrote:
>> Hi Konrad,
>> Due to a 2.6.37-merge-window kernel now being able to boot under Xen i was
>> able to test my xhci controller under dom0 which i previously couldn't.
>> The results:
>> A) 2.6.37-merge-window kernel baremetal: Videograbbing while doing 20
>> iterations of make -j6 of kernel works.
>> B) Xen + 2.6.37-merge-window kernel as Dom0: Videograbbing while doing 20
>> iterations of make -j6 of kernel works.
Yes so as dom0 it seems actually quite good :-)
Hope Stefano's patches get pulled by Linus, because as far as i have seen he
hasn't yet ..
>> C) Xen + 2.6.32.x pvops dom0 + 2.6.37-merge-window kernel DomU:
>> Videograbbing while doing 20 iterations of make -j6 still freezes the
>> machine without a trace after a short while.
> Ok, so it points to the 2.6.32.x doing something funky, which unfortunately
> does not help that much,
> as the xen patches that are in the 2.6.37-merge-window for IRQ are about the
> same as what 2.6.32.x has. Though the 2.6.37-merge-window got tons of bells and
> whistles in the
> generic Linux kernel code part.
Hmm i see it differently:
- Baremetal doesn't freeze
- xen + dom0 2.6.37 doesn't freeze
- xen + dom0 2.6.32.x can't be tested since xhci is pretty new.
- xen + dom0 2.6.32.x + domU 2.6.37 (and device passed through) does work, but
not with high(er) load (in dom0), then it freezes quite fast.
- xen + dom0 2.6.37 + domU 2.6.37 (and device passed through) can't be tested
yet due to lack of pciback.
So it doesn't point to 2.6.32.x as far as i can see. i would rather assume that
xen + dom0 + domU, which are all involved in processing the interrupts,
somehow get into a lockup when things don't get processed in time / in the
normal fashion because of the high load. Just before the freeze, everything
slowly seems to grind to a halt.
But no errors ... (apart from the occasional RCU cpu stall messages)
In all the other cases i don't experience any real reduction in responsiveness
while torturing with the kernel compiles.
And keeping in mind that it does work in all situations with USB2.
So to be honest i feel i just know the same as before, it's all still
inconclusive .. except that the hypervisor + dom0 combination doesn't seem to
cause the problem on its own,
and the xhci controller doesn't seem to cause it on its own either, since it
survives on bare metal and in the xen + dom0 2.6.37 case.
Only in combination with pass through to domU it freezes, but that doesn't rule
out that xen and dom0 play a role ...
The thing that struck me was the higher interrupt rate, which could be just over
the edge on a system under load.
BTW do MSI interrupts go from domU to dom0 to xen and vice versa just like
legacy interrupts ?
I'm now looking into using the kernel tracer to trace irq's on dom0/u to see if
that shows something funky when it freezes.
And perhaps to see if i can split the interrupt count to where they originate
from. But i'm quite new to all the tracing infrastructure.
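
A rough sketch of how the "split the interrupt count by origin" part could be done from a captured trace. It assumes the irq:irq_handler_entry trace event has been enabled and that the trace text contains the usual "irq=N name=..." payload; the function name and the sample handler names are only illustrative, not something i have run against this box.

```python
import re
from collections import Counter

# Payload of the irq:irq_handler_entry trace event, e.g.
#   <idle>-0 [000] 1234.567890: irq_handler_entry: irq=30 name=xhci_hcd
IRQ_ENTRY = re.compile(r"irq_handler_entry: irq=(\d+) name=(\S+)")


def count_irq_entries(trace_lines):
    """Count handler entries per (irq number, handler name) in trace text."""
    counts = Counter()
    for line in trace_lines:
        match = IRQ_ENTRY.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


# Hypothetical usage, assuming debugfs is mounted at /sys/kernel/debug and
# the event was enabled beforehand:
#   with open("/sys/kernel/debug/tracing/trace") as trace:
#       for key, n in count_irq_entries(trace).most_common():
#           print(key, n)
```

Running the same count in dom0 and in domU over the same period would at least show which irq numbers account for the difference.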
As far as i know xen doesn't have a detector for cpu stalls / lockups, does it ?
Because with my limited knowledge .. i would assume that, as hypervisor, xen
should (be able to) at least prevent itself from becoming locked up when other
domains, including dom0, get locked up somehow,
and spit out debug info about cpu state etc.
> These interrupts were for the MSI-X for the XHCI controller or was it legacy
usb3/xhci is MSI, my usb2 controllers only support legacy.
>> Another interesting thing is the interrupt rate i see in /proc/interrupts
>> for the xhci controller, i measured for 5 minutes each time.
>> In situation:
>> A) About 3200 Interrupts/second
>> B) About 3200 Interrupts/second
>> C) About 7800 Interrupts/second, which would be 7.8 interrupts per ms, which
>> seems to work as long as you don't stress the rest to the limit.
>> Which probably causes some sort of deadlock (somewhere in the path from
>> device, xen, dom0/pciback, domU/pcifront, xhci driver, application) when
>> not delivered on time or when it boldly goes on a code path where no one has
>> gone before ..
>> Compared with a measurement of interrupts by a USB2 controller:
>> Around 155 Interrupts/second
>> Probably a silly question without a right answer ... but what interrupt rate
>> would you guess it should be able to take ?
>> And is it logical that when passed through it causes around 2.5 times more
>> interrupts ?
>> Now testing on an Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
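
For anyone wanting to repeat the rate measurement, here is a minimal sketch of how the numbers above can be derived from two snapshots of /proc/interrupts. The device label "xhci_hcd" and the helper names are assumptions for illustration; the row label may differ per system (and per dom0/domU), so check the file first.

```python
import time


def irq_total(interrupts_text, device):
    """Sum the per-CPU counters of every /proc/interrupts row naming `device`."""
    total = 0
    for line in interrupts_text.splitlines():
        if ":" not in line or device not in line:
            continue  # skips the CPU header row and unrelated interrupts
        for field in line.split(":", 1)[1].split():
            if not field.isdigit():
                break  # counters end where the chip/handler description starts
            total += int(field)
    return total


def irq_rate(before, after, seconds):
    """Average interrupts per second between two counter totals."""
    return (after - before) / seconds


def sample_rate(device="xhci_hcd", seconds=300, path="/proc/interrupts"):
    """Take two snapshots `seconds` apart (5 minutes by default) and return the rate."""
    with open(path) as f:
        before = irq_total(f.read(), device)
    time.sleep(seconds)
    with open(path) as f:
        after = irq_total(f.read(), device)
    return irq_rate(before, after, seconds)
```

With the figures quoted above, 7800 / 3200 comes to roughly 2.4, so "around 2.5 times" is about right for the passed-through case.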
>> Xen-devel mailing list