xen-devel
RE: [Xen-devel] NMI with SMP domain causing machine to reboot
Keir
Thanks for your reply.
I don't think the problem is caused by not properly
resetting CPU1's perf counter. I can see that the number of
NMIs being generated is similar for CPU0 and CPU1,
and both CPUs' perf counters are being programmed in
exactly the same way.
(The command "xenpmc -s" lets me see the number of NMIs
generated.)
Moreover, when we have multiple non-SMP domains running
on both CPUs, this problem does not happen.
Sharing of MSRs between hyperthreads should not be the problem
either, since my machine has 2 physical CPUs and hyperthreading is
disabled in the BIOS (i.e., CPU0 and CPU1 are distinct physical
CPUs).
It seems that there is something wrong, or some race condition
introduced, with SMP domains. Any idea what is different in Xen
(maybe interrupt handling) when you have SMP domains?
Any chance you could try reproducing this behavior on one of
your machines?
Can you think of any situation that would cause the machine to
reboot without printing any error message on the serial console?
Any help is deeply appreciated, since I am losing hope that I will
be able to nail this down by myself.
It is always possible that I am doing something wrong,
but at this point the code that is left is not doing much, and I am
starting to suspect the problem lies somewhere else in Xen.
In that case I would desperately need someone else's help.
Thanks
Renato
>> -----Original Message-----
>> From: Keir Fraser [mailto:Keir.Fraser@xxxxxxxxxxxx]
>> Sent: Friday, September 09, 2005 1:57 AM
>> To: Santos, Jose Renato G
>> Cc: Turner, Yoshio; xen-devel@xxxxxxxxxxxxxxxxxxx; G John Janakiraman
>> Subject: Re: [Xen-devel] NMI with SMP domain causing machine
>> to reboot
>>
>>
>>
>> On 8 Sep 2005, at 20:33, Santos, Jose Renato G wrote:
>>
>> > I have spent most of the last few weeks trying to nail down a nasty bug
>> > that is preventing me from releasing xenoprof for SMP domains.
>> > The bug is non-deterministic, and when it happens the machine just
>> > reboots with no message or warning on the serial console.
>> > This has made the debugging process painful and slow.
>>
>> Hard to say from the code, but maybe it's something to do with
>> hyperthreading? The performance counter MSRs are shared in a weird way
>> between hyperthreads. Maybe you're not properly resetting CPU1's perf
>> counter and ending up with an NMI storm?
>>
>> -- Keir
>>
>>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel