This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] Strange vcpu count on forced domain crash

To: "Xen-Devel (xen-devel@xxxxxxxxxxxxxxxxxxx)" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] Strange vcpu count on forced domain crash
From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Date: Sat, 27 Mar 2010 09:29:10 -0700 (PDT)
Delivery-date: Sat, 27 Mar 2010 09:31:59 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
This is likely not a bug, but I know there's been some work in
hotplug CPU support lately, so I thought I'd report this on the
chance that it is a bug:

For benchmarking purposes (to stop a guest immediately and
leave it in a crashed state), I use:

# echo c > /proc/sysrq-trigger

in the (PV) guest.

I've noticed that when I do this on a vcpus==2 PV guest,
the "xm list" then shows it as vcpus==1, while xentop
continues to report it as vcpus==2.
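The discrepancy is between the VCPUs column of "xm list" and the VCPUS column of xentop's batch output. A rough sketch of the comparison, using canned sample lines so the field positions are visible (the domain name "guest1", the sample values, and the xentop column order shown are stand-ins, not output from my host; live counts would come from "xm list" and "xentop -b -i 1"):

```shell
# "xm list" columns:  Name ID Mem VCPUs State Time(s)        -> VCPUs is field 4
# "xentop -b" columns (assumed order): NAME STATE CPU(sec) CPU(%) MEM(k)
#   MEM(%) MAXMEM(k) MAXMEM(%) VCPUS ...                     -> VCPUS is field 9
# Sample lines standing in for live tool output:
xm_line='guest1     6  1024     1 ---c-    123.4'
xentop_line='guest1 ---c-    123  0.0 1048576 12.5 1048576 12.5     2    1    0    0    1'
xm_vcpus=$(printf '%s\n' "$xm_line" | awk '{print $4}')
xentop_vcpus=$(printf '%s\n' "$xentop_line" | awk '{print $9}')
echo "xm reports vcpus==$xm_vcpus, xentop reports vcpus==$xentop_vcpus"
```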

When this happens, I often get a flurry of Xen messages from
memory.c and/or page_alloc.c, of the type:

(XEN) page_alloc.c:1083:d6 Over-allocation for domain 6: 131073 > 131072
(XEN) memory.c:132:d6 Could not allocate order=0 extent: id=6 memflags=0 (0 of 5

though at other times I've also seen a bunch of:

(XEN) mm.c:3798:d3 Bad donate 0000000000067a90: ed=ffff830069140000(3), sd=0000000000000000, caf=8000000000000001, taf=0000000000000000

and once a crash with:

(XEN) mm.c:2363:d0 Bad type (saw 7400000001000000 != exp 1000000000000000) for mfn 7af88 (pfn c78)
(XEN) mm.c:867:d0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1329:d0 Failure in alloc_l2_table: entry 64
(XEN) mm.c:2116:d0 Error while validating mfn 70326 (pfn 56d9) for type 2000000000000000: caf=8000000000000003 taf=2000000000000001
(XEN) mm.c:1439:d0 Failure in alloc_l3_table: entry 0
(XEN) mm.c:2116:d0 Error while validating mfn 74f0f (pfn a2f0) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:2731:d0 Error while pinning mfn 74f0f

For a while, since I was doing tmem testing, I thought this was a
tmem problem, but I just reproduced it (the first mm.c error message, anyway)
with tmem disabled, though the PV guest IS doing self-ballooning.

I'm not sure whether this behavior also occurred in previous Xen releases
or is new.  I've only seen any of this after running a test for many
hours... I tried to reproduce it with a shorter, similar test, but
without luck.

