Tuesday, October 12, 2010, 6:44:33 PM, you wrote:
> On Tue, Oct 12, 2010 at 06:28:13PM +0200, Sander Eikelenboom wrote:
>> Hi Keir,
>>
>> Do Xen and/or the Xen console depend on physical CPU 0?
> Usually the console for Dom0, and I think all other domains go
> through CPU0. Let me CC Ian here, who has been mucking in this
> area and found some bugs (and produced fixes).
> Ian, that bug you found with not clearing the event channel - that
> wouldn't have an impact here, right?
>>
>> I'm still trying to solve the mystery of my machine freezing when doing:
>>
>> - videograbbing in a domU with a USB 3 PCI Express controller passed through
>>   (seems to cause quite a few interrupts)
>> - compiling a linux kernel with "make -j 6"
>>
>> It's a 6-core AMD Phenom X6.
>>
>> Without CPU pinning:
>> I can freeze the machine easily within a minute after starting the compile;
>> at first the Xen serial console also slows down under the load (slow updates).
>> When the machine freezes I can't do anything with the Xen serial console.
>>
>> With CPU pinning:
>> By not using pcpu 0 at all for any domain, and pinning the domain with
>> the videograbber to its own pcpu (pcpu 5), the machine seems to keep
>> running after 20 "make -j6" iterations of kernel compilation.
>> The Xen serial console stays responsive and doesn't slow down during the
>> kernel compilation. The videograbber shows no problem grabbing video.
>>
> AHA! So finally closer to the mystery.
So I thought... but although it survived 20 iterations of kernel compiling,
it still froze while dom0 was relatively idle and the domU was still grabbing
video.
This time it again reported "RCU detected CPU stalls" for CPU 0; since that is
in dom0, it should be vcpu0 (= pcpu1). My Xen serial console was frozen again,
so I can't dump anything.
But:
- the hypervisor should still have pcpu0 available
- dom0 has pcpu1-4, although shared with some other mostly idle domains
- the domU doing the videograbbing has pcpu5
So the CPU pinning seems to change things a bit, but only in the sense that it
survives somewhat longer...
Another thing I'm wondering about is that xentop reports that dom0 consumes
about 50% CPU, while top in dom0 shows nowhere near 50% when using the
2.6.31 pvops kernel.
With the latest 2.6.32-pvops there is a problem where events/0 consumes a lot
of CPU related to xenconsoled (Jeremy already has a thread running on that).
That's why I have now tested 2.6.31-pvops, which doesn't have that issue.
> Can you provide the /proc/interrupts of the Dom0?
Just after it has been running for some time, or should I try to capture it
under load / just before the freeze?
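To catch it right up to the freeze, something like the following could run in
dom0 and log, every few seconds, how many interrupts each CPU received since
the previous sample, so a CPU that stops getting interrupts (e.g. CPU0) would
show up as a run of zeros. This is only a minimal sketch, assuming Python 3 and
the usual /proc/interrupts layout (a header line with one CPUn column per CPU,
then one row per IRQ); the function names are made up for illustration:

```python
import time
from typing import Dict, List


def parse_interrupts(text: str) -> Dict[str, List[int]]:
    """Parse /proc/interrupts text into {irq_label: [count_cpu0, count_cpu1, ...]}."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header line: "CPU0 CPU1 ..."
    table: Dict[str, List[int]] = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        counts: List[int] = []
        for tok in rest.split()[:ncpus]:
            if not tok.isdigit():
                break  # reached the chip/handler description
            counts.append(int(tok))
        # Rows such as ERR/MIS carry one global count, not one per CPU; skip them.
        if len(counts) == ncpus:
            table[label.strip()] = counts
    return table


def per_cpu_deltas(before: Dict[str, List[int]],
                   after: Dict[str, List[int]]) -> List[int]:
    """Sum, per CPU, the interrupts that arrived between two snapshots."""
    if not after:
        return []
    ncpus = len(next(iter(after.values())))
    totals = [0] * ncpus
    for label, counts in after.items():
        if label not in before:
            continue  # IRQ appeared between samples; ignore it this round
        for cpu, (old, new) in enumerate(zip(before[label], counts)):
            totals[cpu] += new - old
    return totals


def read_snapshot(path: str = "/proc/interrupts") -> Dict[str, List[int]]:
    with open(path) as f:
        return parse_interrupts(f.read())


def watch(interval: float = 5.0) -> None:
    """Print per-CPU interrupt deltas every `interval` seconds until interrupted."""
    prev = read_snapshot()
    while True:
        time.sleep(interval)
        cur = read_snapshot()
        deltas = per_cpu_deltas(prev, cur)
        cols = " ".join(f"CPU{i}={d}" for i, d in enumerate(deltas))
        print(time.strftime("%H:%M:%S"), cols, flush=True)
        prev = cur
```

Calling `watch()` prints one timestamped line per interval; running it over
ssh from another machine would keep the last lines visible after the box
locks up.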
> I wonder if this is related to the issue I had some time ago, and never got
> to look at. The problem was that during heavy compilation (this is a 2-socket
> Nehalem box, just running Dom0 - no guests), the keyboard and USB driver would
> stop getting interrupts. So the drivers would start polling, which is quite
> slow, albeit serviceable, and then at some point it would pick up again.
> The weirdness was that /proc/interrupts showed absolutely _no_ interrupts
> on CPU0 during that time - as if Xen just forgot to update them. Jeremy
> suggested I try to disable Xen IRQ balancing (noirqbalance on the Xen command
> line) in case that is it, and to my embarrassment I haven't tried that yet.
I did try that before and it didn't seem to make a difference, but I will try
again just to be sure.
> Did you try that? I think somebody suggested that, but I can't recall whether
> it was for this issue.
>>
>> Name       ID VCPU  CPU State  Time(s) CPU Affinity
>> Domain-0    0    0    3 r--     2169.7 1-4
>> Domain-0    0    1    1 -b-     2339.3 1-4
>> Domain-0    0    2    2 -b-     2358.9 1-4
>> Domain-0    0    3    3 -b-     2298.2 1-4
>> Domain-0    0    4    1 -b-     2221.9 1-4
>> Domain-0    0    5    4 -b-     2287.7 1-4
>> backup      9    0    4 -b-       10.6 1-4
>> database    1    0    4 -b-       45.3 1-4
>> davical     5    0    3 -b-        8.7 1-4
>> git         8    0    2 -b-        7.9 1-4
>> mail        2    0    4 -b-        8.0 1-4
>> samba       3    0    3 -b-       11.1 1-4
>> security    7    0    5 r--     1433.2 5
>> www         4    0    1 -b-       10.2 1-4
>> zabbix      6    0    3 -b-       21.2 1-4
>>
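To double-check the pinning layout (no vCPU allowed on pcpu 0, one domain
alone on pcpu 5), the vcpu-list output can be checked mechanically. A minimal
sketch in Python 3, assuming the column layout shown above and that xm prints
"any cpu" for a vCPU without restricted affinity; the helper names are
hypothetical:

```python
from typing import List, Set, Tuple


def parse_cpu_list(spec: str, ncpus: int) -> Set[int]:
    """Expand an affinity string such as "1-4", "5" or "0,2-3" into a set of pCPUs."""
    if spec == "any cpu":  # assumed xm wording for an unrestricted vCPU
        return set(range(ncpus))
    cpus: Set[int] = set()
    for part in spec.split(","):
        lo, sep, hi = part.partition("-")
        if sep:
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(lo))
    return cpus


def vcpus_allowed_on(listing: str, pcpu: int, ncpus: int) -> List[Tuple[str, int]]:
    """Return (domain, vcpu) pairs whose CPU affinity includes `pcpu`."""
    hits: List[Tuple[str, int]] = []
    for line in listing.splitlines():
        fields = line.split()
        # Data rows: Name ID VCPU CPU State Time(s) Affinity...; skip the header.
        if len(fields) < 7 or not fields[2].isdigit():
            continue
        name, vcpu = fields[0], int(fields[2])
        affinity = " ".join(fields[6:])  # "any cpu" spans two tokens
        if pcpu in parse_cpu_list(affinity, ncpus):
            hits.append((name, vcpu))
    return hits
```

Feeding it the listing above with pcpu 0 and 6 CPUs should return an empty
list, i.e. no domain is allowed to run on pcpu 0.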
>>
>> Is there a way a deadlock could occur between hypervisor <-> dom0 <-> domU,
>> especially related to passthrough/interrupts in the context of pcpu 0?
> I don't know, but I do know that the IRQ handling in Xen 4.0 changed
> significantly compared to 3.4. I don't remember if you ever ran this setup
> under 3.4?
I tried Xen 3.4-testing as well today (in combination with 2.6.31-pvops as
dom0), but that resulted in the videograbbing domU going berserk: the xhci
driver complains about "spurious interrupts" multiple times a second.
>>
>> --
>> Sander
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel