OK, the freeze during a kernel compile with "make -j6" was a cpu0 stall. It
locks up so hard that I can't even use ctrl-a to get into the hypervisor.
Removing noirqbalance makes it possible to compile the kernel in dom0 while
grabbing video in domU.
I can start a fish shop with all my red herrings :(
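
To see whether interrupts pile up on a single CPU, the per-CPU counters in
/proc/interrupts can be summed. A minimal sketch in Python, assuming the usual
layout (a header row of CPU names, then one row per interrupt source); the
function name per_cpu_totals is made up for illustration:

```python
def per_cpu_totals(text):
    """Sum the interrupt counters per CPU from /proc/interrupts-style text."""
    lines = text.splitlines()
    cpus = lines[0].split()              # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in lines[1:]:
        fields = line.split()
        # fields[0] is the IRQ label ("16:", "NMI:", ...); the next
        # len(cpus) fields are the per-CPU counters.
        counts = fields[1:1 + len(cpus)]
        # Skip rows without a full set of per-CPU counters (e.g. "ERR:").
        if len(counts) != len(cpus) or not all(c.isdigit() for c in counts):
            continue
        for i, value in enumerate(counts):
            totals[i] += int(value)
    return dict(zip(cpus, totals))
```

Calling per_cpu_totals(open("/proc/interrupts").read()) once before and once
during the video grab, and comparing the two results, should show which CPU
takes the extra interrupts.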
Saturday, October 2, 2010, 1:33:36 AM, you wrote:
> Hmm, I can get it to freeze with or without mem=4G now.
> Letting the domU grab video while dom0 compiles a kernel with make -j6 makes
> the machine freeze after a very short while.
> With all the debugging enabled the machine seems a bit slow anyhow for a
> six-core, but it seems to choke on the interrupts generated by the xhci
> controller.
> With the host controller now using 32-bit instead of 64-bit DMA, it now shows
> some warnings before freezing, with or without mem=4G:
> Oct 2 00:23:07 security kernel: [ 524.020717] xhci_hcd 0000:07:00.0: Spurious interrupt.
> Oct 2 00:23:10 security kernel: [ 526.926654] xhci_hcd 0000:07:00.0: Spurious interrupt.
> Oct 2 00:23:11 security kernel: [ 527.714567] xhci_hcd 0000:07:00.0: Spurious interrupt.
> Oct 2 00:23:42 security kernel: [ 558.402659] xhci_hcd 0000:07:00.0: Spurious interrupt.
> Oct 2 00:25:00 security kernel: [ 636.278406] xhci_hcd 0000:07:00.0: Spurious interrupt.
> When I do the kernel compile with the domU started but not grabbing video,
> the compile completes without a problem.
> With the domU running cpuburn, it also completes without a problem.
> I do have the feeling the videograbbing causes a lot of interrupts
> (this is still booting xen with noirqbalance, and dom0 and domU with
> pci=nomsi).
> So the 4G is then probably a red herring ...
> --
> Sander
> Friday, October 1, 2010, 10:54:17 PM, you wrote:
>> On Thu, Sep 30, 2010 at 09:24:48PM +0200, Sander Eikelenboom wrote:
>>> Hello Konrad,
>>>
>>> I have done some more tests, the results:
>>>
>>> - boot xen with mem=4G, > 2 days uptime with passthrough and videograbbing
>>> - boot xen without mem=4G, < 1 day freeze with passthrough and videograbbing
>>> - on both no problems as long as you don't grab video (so the controller
>>> doesn't do much)
>>> - on both no problems when grabbing video with usb2, so it's xhci specific
>>>
>>> I haven't changed anything else, same number of VMs running etc.;
>>> videograbbing is working on both (until the freeze or until I ended the
>>> test).
>>> I'm reading some messages about msi(-x) interrupt problems with xen on
>>> xen-devel, and suggestions to try noirqbalance with xen, so on both i use
>>> noirqbalance.
>>>
>>> So it seems to be related to the amount of mem available.
>>> I do see one difference on the domU, with mem=4G i see some occasional
>>> warnings in syslog:
>>> Sep 28 17:55:02 security kernel: [81744.078288] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
>>> Sep 28 17:55:02 security kernel: [81744.092653] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
>>> Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
>>> Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
>>> Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
>>>
>>> I don't see these warnings in the syslog when no mem=4G is used, so a hunch
>>> would be it goes wrong there while the xhci code tries to clean something
>>> up.
>>> It could do something "strange" that seems to work on bare metal and on xen
>>> with mem=4G, but freezes everything with mem > 4G before the warning can be
>>> written to the syslog / disk.
>>>
>>> In the syslog of dom0 I do see some occasional memleaks going by, but one
>>> set could be related:
>>> Sep 28 17:55:19 localhost kernel: [81962.053321] kmemleak: 22 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
>>>
>>> I will add a script that cats the content of /sys/kernel/debug/kmemleak to
>>> syslog when kmemleak reports new suspected leaks.
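
Such a script could look roughly like this. A minimal sketch in Python,
assuming the kernel log lines are fed in one per line (for example from a tail
on the syslog file), that debugfs is mounted, and that it runs as root; the
names new_leak_count, dump_report and watch and the "kmemleak-dump:" prefix
are made up for illustration:

```python
import re
import syslog

# Path taken from the kmemleak message itself.
KMEMLEAK_REPORT = "/sys/kernel/debug/kmemleak"
LEAK_RE = re.compile(r"kmemleak: (\d+) new suspected memory leaks")

def new_leak_count(line):
    """Return the number of new leaks announced in a log line, or None."""
    match = LEAK_RE.search(line)
    return int(match.group(1)) if match else None

def dump_report(path=KMEMLEAK_REPORT, log=None):
    """Copy every line of the kmemleak report to syslog; return the line count."""
    if log is None:
        log = lambda text: syslog.syslog(syslog.LOG_WARNING, text)
    count = 0
    with open(path) as report:
        for raw in report:
            log("kmemleak-dump: " + raw.rstrip("\n"))
            count += 1
    return count

def watch(lines, path=KMEMLEAK_REPORT, log=None):
    """Dump the report each time a line announces new suspected leaks."""
    for line in lines:
        if new_leak_count(line):
            dump_report(path, log)
```

Calling watch(sys.stdin) with the kernel log piped in would then put the leak
details next to the kmemleak message in the syslog. The dumped lines themselves
do not match the trigger pattern, so they should not cause a feedback loop.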
>>>
>>> Any suggestions to try to debug this further ?
>> <shakes his head>
>> Do you have the name of the grabber + USB3 device? If it is not too much I
>> might as well get it and see what happens on my boxes.
--
Best regards,
Sander mailto:linux@xxxxxxxxxxxxxx
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel