[Xen-devel] bugzilla #197 fast create/destroy BUG_ON()

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] bugzilla #197 fast create/destroy BUG_ON()
From: Ryan Harper <ryanh@xxxxxxxxxx>
Date: Fri, 9 Sep 2005 17:07:19 -0500
I haven't made any further progress on this bug [1], even after adding
some extra tracing.

When we run xm create followed by xm destroy in a loop, Xen eventually
hard-reboots after dumping out a BUG_ON():

(XEN) BUG at domain.c:1054
(XEN) CPU:    0
(XEN) EIP:    e008:[<ff12a6b6>] domain_relinquish_resources+0x43/0x1c8
(XEN) EFLAGS: 00010282   CONTEXT: hypervisor
(XEN) eax: ff187fb8   ebx: ffbf1080   ecx: ffbf4000   edx: 00000000
(XEN) esi: 00000007   edi: ff103fac   ebp: ff103b0c   esp: ff103af4
(XEN) cr0: 8005003b   cr3: db629000
(XEN) ds: e010   es: e010   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from esp=ff103af4:
(XEN)    ff17ba49 ff17bb67 0000041e ff11cfa9 ffbd9094 00000000 ff103b2c ff109027
(XEN)    ffbda080 ffbda308 00000001 ffbda080 ffbda080 00000000 ff103f8c ff107f88
(XEN)    ffbda080 ff103dbc 00000000 ffbda310 00000001 00000001 ff103b5c ff13f087
(XEN)    00000003 00000000 ffbc7080 ffbda080 ff103b74 00000020 ff103b8c ff10c368
(XEN)    00000000 ffbf10d4 ff103b9c ff130211 00000001 00000001 00000001 00000004
(XEN)    ffbf1e00 ff103cc8 ff111a6c ff103c2c 00000001 00000009 ff103bbc 00000004
(XEN)    00000004 ff103fb4 ff103bcc ff1233a3 00000004 80000003 80000003 00000004
(XEN)    80000002 80000003 fd8e671c ff12fb9a ff103be8 80000004 ffbf2080 ffbf2080
(XEN)    fd750ef4 ff103fac ff103c5c 80000003 00ef0000 80000003 80000003 80000003
(XEN)    80000002 ff103c48 ff1352eb fd8e6710 ff103c34 00000020 ff103c4c 00000001
(XEN)    80000003 00000001 ffbf2080 ffbf2080 ffbf2080 80000003 80000004 80000003
(XEN)    ffbf2080 00000000 ff103c8c ff1350ed fd70c770 ffbf2080 ffbf2080 00000540
(XEN)    c9e72063 00000000 00000001 00000001 00000000 000001d1 ff103cbc ff135407
(XEN)    fedd1000 ffbf2080 00000400 00000001 00000004 ffbf1e00 ff103ddc ff111a6c
(XEN)    ff103d40 00000001 00000008 ff135de0 fd6edab0 00000001 ff103cdc ff13b552
(XEN)    20000000 00000000 0000b400 000004c4 c4b40000 00000004 00008000 00009b42
(XEN)    000007dc 00000000 f0000000 04c4b400 00000004 00000001 33ef003c 33ef003c
(XEN)    00000000 00000000 00000000 00000000 000007dc 8000003e 8000003f 8000003e
(XEN)    ffbf2080 00000000 00000000 00009b42 00008000 33ef003c 00000000 33ef003c
(XEN)    00000000 00000000 00000000 0000755a 00000000 23ef0000 ff103d7c ff12315c
(XEN) Xen call trace:
(XEN)    [<ff12a6b6>] domain_relinquish_resources+0x43/0x1c8
(XEN)    [<ff109027>] domain_kill+0x62/0x9e
(XEN)    [<ff107f88>] do_dom0_op+0x54d/0x103b
(XEN)    [<ff155d8f>] hypercall+0x8f/0xaf


The line in question is:

   BUG_ON(!cpus_empty(d->cpumask));
 
This says to me: bail if the domain's cpumask is NOT empty.  AFAICT,
the only places where a domain's cpumask is modified are:

1. startup_cpu_idle_loop() in xen/arch/x86/domain.c

   cpu_set(smp_processor_id(),v->domain->cpumask);
   
2. __context_switch() in xen/arch/x86/domain.c

   if ( p->domain != n->domain )
      cpu_set(cpu, n->domain->cpumask);

   ...

   if ( p->domain != n->domain )
      cpu_clear(cpu, p->domain->cpumask);
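
To make the bookkeeping at those two sites easier to reason about, here
is a toy user-space model of it.  This is only a sketch, not Xen code:
the struct layouts, the single unsigned long mask, NR_CPUS, the loaded[]
array and every model_* name are assumptions made up for illustration.

```c
#include <assert.h>

#define NR_CPUS 4   /* arbitrary size for this toy model */

/* Toy stand-ins for Xen's structures; only the fields needed here. */
struct domain { int id; unsigned long cpumask; };
struct vcpu   { struct domain *domain; };

/* Which vcpu's state each physical CPU currently holds (hypothetical). */
static struct vcpu *loaded[NR_CPUS];

/* Models site 1: at startup the CPU sets its bit in the mask of the
 * domain owning the vcpu it starts on. */
static void model_startup_cpu(int cpu, struct vcpu *v)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    loaded[cpu] = v;
    v->domain->cpumask |= 1UL << cpu;
}

/* Models site 2: set the bit for the incoming domain and clear it for
 * the outgoing one, but only when the switch crosses domains. */
static void model_context_switch(int cpu, struct vcpu *n)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    struct vcpu *p = loaded[cpu];

    if (p->domain != n->domain)
        n->domain->cpumask |= 1UL << cpu;

    loaded[cpu] = n;

    if (p->domain != n->domain)
        p->domain->cpumask &= ~(1UL << cpu);
}

/* Mirrors the condition tested under BUG_ON(): nonzero if mask is empty. */
static int model_cpus_empty(const struct domain *d)
{
    return d->cpumask == 0;
}
```

In this model a domain's mask only drains once every CPU that last
loaded one of its vcpus has switched to a vcpu of a different domain.
Whether the real code can reach domain_relinquish_resources() before
that has happened on every CPU is exactly the open question.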

If we are hitting that assert, does that mean one or more of the vcpus
in the domain are still running?  Pointers to where I've misunderstood
what's happening, or to good places to insert some debugging, would be
of great help.
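
One possible piece of debugging is to report which CPUs are still set in
the mask just before the BUG_ON() fires.  Below is a sketch of the
formatting side only, in plain user-space C.  format_set_cpus is a
hypothetical helper, and treating the mask as a single unsigned long is
an assumption rather than a statement about Xen's cpumask type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the indices of the set bits in `mask`
 * into `buf` as a comma-separated list and returns how many indices
 * were written.  Entries that do not fit in `buf` are dropped whole. */
static int format_set_cpus(unsigned long mask, char *buf, size_t len)
{
    int count = 0;
    size_t used = 0;

    assert(len > 0);
    buf[0] = '\0';

    for (int cpu = 0; mask != 0; cpu++, mask >>= 1) {
        if (!(mask & 1UL))
            continue;

        int n = snprintf(buf + used, len - used, "%s%d",
                         count ? "," : "", cpu);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* discard the partially written entry */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Given a mask with bits 0 and 2 set, the helper fills the buffer with the
indices of those two CPUs, which could then be logged next to the domain
id before the check trips.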


1. http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=197

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@xxxxxxxxxx
