WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

RE: [Xen-devel] NMI with SMP domain causing machine to reboot

To: "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx>
Subject: RE: [Xen-devel] NMI with SMP domain causing machine to reboot
From: "Santos, Jose Renato G" <joserenato.santos@xxxxxx>
Date: Fri, 9 Sep 2005 10:44:38 -0700
Cc: "Turner, Yoshio" <yoshio_turner@xxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, G John Janakiraman <john@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Fri, 09 Sep 2005 17:42:30 +0000
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcW1G9cXSmIrXwH7TUqmG4UaxPyfoQAQ5+RQ
Thread-topic: [Xen-devel] NMI with SMP domain causing machine to reboot
  Keir

  Thanks for your reply.
  I don't think the problem is caused by not properly
  resetting CPU1's perf counter. I can see that the number of
  NMIs being generated is similar for CPU0 and CPU1,
  and both CPUs' perf counters are being programmed in
  exactly the same way.
  (The command "xenpmc -s" lets me see the number of NMIs
  generated.)
  Moreover, when we have multiple non-SMP domains running
  on both CPUs, this problem does not happen.
  Sharing of MSRs between hyperthreads should not be the problem
  either, since my machine has 2 physical CPUs and hyperthreading is
  disabled in the BIOS (i.e., CPU0 and CPU1 are distinct physical
  CPUs).

  It seems that there is something wrong, or some race condition,
  introduced by SMP domains. Any idea what is different in Xen
  (maybe interrupt handling) when you have SMP domains?
  
  Any chance you could try reproducing this behavior on one of
  your machines?
  Can you think of any situation that would cause the machine to
  reboot without printing any error message on the serial console?
  Any help is deeply appreciated, since I am losing hope that I will
  be able to nail this down by myself.
  It is always possible that I am doing something wrong,
  but at this point the code that is left is not doing much, and I am
  starting to suspect the problem lies somewhere else in Xen.
  In that case I would desperately need someone else's help.
  
  Thanks

  Renato  

>> -----Original Message-----
>> From: Keir Fraser [mailto:Keir.Fraser@xxxxxxxxxxxx] 
>> Sent: Friday, September 09, 2005 1:57 AM
>> To: Santos, Jose Renato G
>> Cc: Turner, Yoshio; xen-devel@xxxxxxxxxxxxxxxxxxx; G John Janakiraman
>> Subject: Re: [Xen-devel] NMI with SMP domain causing machine to reboot
>> 
>> 
>> 
>> On 8 Sep 2005, at 20:33, Santos, Jose Renato G wrote:
>> 
>> >   I have spent most of the last few weeks trying to nail down a nasty bug
>> >   that is preventing me from releasing xenoprof for SMP domains.
>> >   The bug is non-deterministic, and when it happens the machine just
>> >   reboots with no message or warning on the serial console.
>> >   This has made the debugging process painful and slow.
>> 
>> Hard to say from the code, but maybe it's something to do with
>> hyperthreading? The performance counter MSRs are shared in a weird way
>> between hyperthreads. Maybe you're not properly resetting CPU1's perf
>> counter and ending up with an NMI storm?
>> 
>>   -- Keir
>> 
>> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
