xen-devel

RE: [Xen-devel] Multi-vcpu HVM Linux domain hanging during boot

To: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-devel] Multi-vcpu HVM Linux domain hanging during boot
From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Date: Fri, 2 Jul 2010 09:33:51 -0700 (PDT)
> I've got an HVM Linux guest, Debian 2.6.18-6-686 kernel, which works
> fine if vcpus=1 but hangs if vcpus=2.
> 
> I'm pretty sure that it worked with vcpus=2 earlier this week, but now
> I seem unable to find a hypervisor/tools/qemu combination within the
> last month that works.
> 
> It hangs just after detecting TSC as a timesource.  It's busy-waiting
> (both cpus pegged).  Vcpu 0 is in a function called
> hrtimer_run_queues, vcpu1 is in a function called do_timer.  Xentrace
> reports that vcpu 1 has an interrupt pending, but that it's not being
> delivered because interrupts are disabled in the vcpu's eflags.
> 
> I've even tried going back to an earlier disk snapshot and booting a
> different kernel (2.6.18-4-686), just to make sure it's not something
> dumb like a corrupt VM image.
> 
> Anyone else had this problem?  Can anyone ATM successfully boot a
> multi-processor HVM guest of any kind?
> 
> I'm going to build and install a kernel that I have the source for, so
> I can see whether the guest thinks interrupts should be enabled or
> not, but I'd appreciate any other ideas / suggestions people have to
> help diagnose the problem...

Hi George --

This may be a case of "when one has a hammer, everything looks
like a nail" but given that it hangs just after detecting TSC
as a timesource and works fine with vcpus=1, I have to suspect
TSC synchronization issues.  Have you hard-rebooted the physical
hardware recently?  If not (DON'T YET), can you record the "s"
and "t" debug keys from a console or from "xm debug-key" in dom0
(while the VM is in a hung state)?  Ideally, try the "s" key
many times to get enough samples to be statistically valid
in case the sync is swinging wildly.
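A minimal sketch of how the repeated sampling could be scripted from dom0, assuming the `xm` toolstack is on the path and that debug-key output lands in the hypervisor console buffer readable with `xm dmesg`. The function names, the sample count, and the delay are hypothetical choices for illustration, not something from this thread.

```python
import subprocess
import time


def sample_debug_key(key, count=10, delay=1.0, run=subprocess.run):
    """Send one Xen debug key `count` times, pausing `delay` seconds between sends."""
    for i in range(count):
        # Equivalent to typing: xm debug-key <key>
        run(["xm", "debug-key", key], check=True)
        if i < count - 1:
            time.sleep(delay)


def read_xen_console(run=subprocess.run):
    """Return the hypervisor console buffer (assumed to hold the debug-key output)."""
    result = run(["xm", "dmesg"], check=True, capture_output=True, text=True)
    return result.stdout
```

While the guest is hung, something like `sample_debug_key("s", count=20)` followed by `sample_debug_key("t", count=1)` and `print(read_xen_console())` would collect the samples in one place. The `run` parameter exists only so the helpers can be exercised without a real Xen host.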

If you are using xen-4.0.0 or later, all of the tsc_mode work was
supposed to resolve this kind of issue, but yours might identify
a corner-case... or perhaps the tools you are using (or the
vm config file) are overriding the default tsc_mode for the VM?

If my hammer is wrong and this has nothing to do with TSC, there
were some known issues in "old" kernels under HVM that were
resolved by a different timer_mode setting (note this is
different from tsc_mode) and/or kernel boot parameters (which
sadly differ sometimes even between update releases in the
2.6.18 stream).  I can try to resurrect that data if necessary.
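For reference, here is a hedged sketch of where these two settings would sit in an xm-style guest config file (such files use Python syntax). The values are placeholders, not recommendations: nothing above says which timer_mode value helped on old kernels, and the tsc_mode meanings in the comments are as I recall them from the Xen 4.0 tsc_mode documentation.

```python
# Hypothetical fragment of an HVM guest config file; values are illustrative only.
vcpus = 2

# tsc_mode: 0 = default, 1 = always emulate, 2 = never emulate.
# Check that nothing in the config or the tools overrides the default.
tsc_mode = 0

# timer_mode is a separate HVM setting from tsc_mode.
# The value below is a placeholder, not a known fix for 2.6.18 guests.
timer_mode = 0
```

If either line is already present in the guest's config with a non-default value, that would be worth reporting along with the Xen version/changeset.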

Thanks,
Dan

P.S. You didn't mention the version/changeset of Xen.
