xen-devel

Re: [Xen-devel] xen dom0 2.6.32.15 kernel BUG at drivers/xen/grant-table

To: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] xen dom0 2.6.32.15 kernel BUG at drivers/xen/grant-table.c:583
From: Arnd Hannemann <hannemann@xxxxxxxxxxxxxxxxxxx>
Date: Mon, 14 Jun 2010 14:26:39 +0200
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Mon, 14 Jun 2010 05:29:07 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <alpine.DEB.2.00.1006141156170.3401@kaball-desktop>
References: <4C15E000.7060509@xxxxxxxxxxxxxxxxxxx> <alpine.DEB.2.00.1006141156170.3401@kaball-desktop>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100423 Lightning/1.0b2pre Thunderbird/3.0.4
Hi,

On 14.06.2010 12:57, Stefano Stabellini wrote:
> On Mon, 14 Jun 2010, Arnd Hannemann wrote:
>> Hi,
>>
>> we have regular but hard to reproduce (wait for a day or two starting domUs)
>> kernel panics (see below) with latest "xen/stable-2.6.32.x" git tree.
>>
>> Any idea, anyone?
>>
> 
> this CS from origin/xen/dom0/gntdev should fix your problem:
> 
> sstabellini@kaball-desktop:~/xensource/linux-pvops-latest$ git show ad469f0da31bc16b945f9a06710b9d45434d0091
> commit ad469f0da31bc16b945f9a06710b9d45434d0091
> Author: Stefano Stabellini <Stefano.Stabellini@xxxxxxxxxxxxx>
> Date:   Wed Jun 9 12:34:02 2010 -0700
> 
>     xen/gntdev: use spinlocks rather than rwsem for locking
>     
>     The mmu notifier mechanism calls its callbacks with an rcu lock,
>     which disables preemption.  This means we cannot use any blocking
>     synchronization for locking.
>     
>     Convert all the rwsemas to plain spinlocks.  This requires that
>     the memory allocation and copying to/from userspace be split
>     from the actual datastructure updates since they can't be done
>     under spinlock.
>     
>     Signed-off-by: Stefano Stabellini <Stefano.Stabellini@xxxxxxxxxxxxx>
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> 

Unfortunately, this patch does not seem to help. We get a very similar
backtrace after one hour of stress testing with a script that starts and
stops domUs in a loop.
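
A start/stop loop of this kind could look roughly like the sketch below. It is not the script used for the test above. It assumes the `xm` toolstack (`xm create <config>`, `xm destroy <name>`), and the function name `stress_domu`, its parameters, and the config path in the usage comment are hypothetical.

```python
import subprocess
import time


def stress_domu(config, name, cycles, run=subprocess.call, settle=0.0):
    """Create and destroy one domU `cycles` times.

    `run` executes a command list and returns its exit code; it defaults to
    subprocess.call and can be replaced for a dry run. Returns a list of
    (cycle, command, exit_code) tuples for every command that failed.
    """
    failures = []
    for cycle in range(cycles):
        # Assumption: the xm toolstack is in use on the dom0.
        for cmd in (["xm", "create", config], ["xm", "destroy", name]):
            rc = run(cmd)
            if rc != 0:
                failures.append((cycle, cmd, rc))
            time.sleep(settle)  # give the domU time to come up or go away
    return failures


# Hypothetical usage on a dom0 (path and domain name are placeholders):
#   stress_domu("/etc/xen/test-domu.cfg", "test-domu", cycles=1000, settle=30)
```

Passing a different `run` callable makes it possible to check the loop logic without touching a real host.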

Maybe the problem is the hypervisor itself?
We are currently using 4.0.1-rc2-pre. We updated from 4.0.0 because of what we
believed was the same problem, although we had no working netconsole back then.

Jun 14 14:07:22 vmhost2 [ 2418.542425] ------------[ cut here ]------------
Jun 14 14:07:22 vmhost2 [ 2418.542475] kernel BUG at drivers/xen/grant-table.c:583!
Jun 14 14:07:22 vmhost2 [ 2418.542515] invalid opcode: 0000 [#1] SMP
Jun 14 14:07:22 vmhost2 [ 2418.542574] last sysfs file: /sys/devices/virtual/net/br0/bridge/topology_change_detected
Jun 14 14:07:22 vmhost2 [ 2418.542640] Modules linked in: netconsole raid0 md_mod rtc_cmos rtc_core rtc_lib ipv6 thermal processor thermal_sys hwmon pl2303 button acpi_processor usbserial sr_mod evdev cdrom
Jun 14 14:07:22 vmhost2 [ 2418.542937]
Jun 14 14:07:22 vmhost2 [ 2418.542970] Pid: 0, comm: swapper Not tainted (2.6.32.15-xen4.0.0-dom0-stefano #2) System Product Name
Jun 14 14:07:22 vmhost2 [ 2418.543034] EIP: 0061:[<c120f170>] EFLAGS: 00010282 CPU: 0
Jun 14 14:07:22 vmhost2 [ 2418.543077] EIP is at gnttab_copy_grant_page+0x1f0/0x260
Jun 14 14:07:22 vmhost2 [ 2418.543117] EAX: ffffffea EBX: c153be84 ECX: 00000001 EDX: 00000000
Jun 14 14:07:22 vmhost2 [ 2418.543158] ESI: 00007ff0 EDI: 00000013 EBP: c290e660 ESP: c153be50
Jun 14 14:07:22 vmhost2 [ 2418.543199]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Jun 14 14:07:22 vmhost2 [ 2418.543239] Process swapper (pid: 0, ti=c153a000 task=c1543760 task.ti=c153a000)
Jun 14 14:07:22 vmhost2 [ 2418.543297] Stack:
Jun 14 14:07:22 vmhost2 [ 2418.543329]  00000000 00213784 c2904dc0 0002c233 ec233000 ecf85bec 00000013 ec233000
Jun 14 14:07:22 vmhost2 [ 2418.543461] <0> 00000000 ebd6e000 00000000 00000013 c1350000 13784001 00000000 0002c233
Jun 14 14:07:22 vmhost2 [ 2418.543616] <0> 00000000 c1628284 c155b978 c1628284 00560014 c12200c1 00000001 00000000
Jun 14 14:07:22 vmhost2 [ 2418.543797] Call Trace:
Jun 14 14:07:22 vmhost2 [ 2418.543838]  [<c1350000>] ? sock_release+0x10/0x80
Jun 14 14:07:22 vmhost2 [ 2418.543882]  [<c12200c1>] ? net_tx_action+0x1d1/0x9b0
Jun 14 14:07:22 vmhost2 [ 2418.543925]  [<c103bc2e>] ? tasklet_action+0x9e/0xb0
Jun 14 14:07:22 vmhost2 [ 2418.543967]  [<c103c378>] ? __do_softirq+0x88/0x110
Jun 14 14:07:22 vmhost2 [ 2418.544009]  [<c1210057>] ? __xen_evtchn_do_upcall+0xd7/0x160
Jun 14 14:07:22 vmhost2 [ 2418.544053]  [<c103c43d>] ? do_softirq+0x3d/0x40
Jun 14 14:07:22 vmhost2 [ 2418.544094]  [<c121063a>] ? xen_evtchn_do_upcall+0x2a/0x40
Jun 14 14:07:22 vmhost2 [ 2418.544147]  [<c1009da7>] ? xen_do_upcall+0x7/0xc
Jun 14 14:07:22 vmhost2 [ 2418.544190]  [<c10013a7>] ? hypercall_page+0x3a7/0x1010
Jun 14 14:07:22 vmhost2 [ 2418.544234]  [<c10061ef>] ? xen_safe_halt+0xf/0x20
Jun 14 14:07:22 vmhost2 [ 2418.544275]  [<c100382c>] ? xen_idle+0x1c/0x30
Jun 14 14:07:22 vmhost2 [ 2418.544316]  [<c10081fa>] ? cpu_idle+0x3a/0x60
Jun 14 14:07:22 vmhost2 [ 2418.544359]  [<c15787ef>] ? start_kernel+0x2c6/0x2cb
Jun 14 14:07:22 vmhost2 [ 2418.544401]  [<c1578367>] ? unknown_bootoption+0x0/0x190
Jun 14 14:07:22 vmhost2 [ 2418.544444]  [<c157b0e6>] ? xen_start_kernel+0x624/0x62c
Jun 14 14:07:22 vmhost2 [ 2418.544483] Code: 8d 5c 24 34 c1 e0 0c 83 c8 01 89 44 24 34 8b 44 24 0c c7 44 24 40 00 00 00 00 89 44 24 3c e8 b8 1e df ff 85 c0 0f 84 2c ff ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 8b 54 24 04 8b 44 24 0c e8
Jun 14 14:07:22 vmhost2 [ 2418.545277] EIP: [<c120f170>] gnttab_copy_grant_page+0x1f0/0x260 SS:ESP 0069:c153be50
Jun 14 14:07:22 vmhost2 [ 2418.545597] ---[ end trace f877a40240218318 ]---
Jun 14 14:07:22 vmhost2 [ 2418.545669] Kernel panic - not syncing: Fatal exception in interrupt
Jun 14 14:07:22 vmhost2 [ 2418.545746] Pid: 0, comm: swapper Tainted: G      D    2.6.32.15-xen4.0.0-dom0-stefano #2
Jun 14 14:07:22 vmhost2 [ 2418.545840] Call Trace:
Jun 14 14:07:22 vmhost2 [ 2418.545912]  [<c141d3b5>] ? panic+0x42/0xe1
Jun 14 14:07:22 vmhost2 [ 2418.545986]  [<c100cc56>] ? oops_end+0x96/0xa0
Jun 14 14:07:22 vmhost2 [ 2418.546060]  [<c100a73f>] ? do_invalid_op+0x7f/0x90
Jun 14 14:07:22 vmhost2 [ 2418.546135]  [<c120f170>] ? gnttab_copy_grant_page+0x1f0/0x260
Jun 14 14:07:22 vmhost2 [ 2418.546223]  [<c10741e4>] ? __alloc_pages_nodemask+0xe4/0x5b0
Jun 14 14:07:22 vmhost2 [ 2418.546303]  [<c1006197>] ? xen_force_evtchn_callback+0x17/0x30
Jun 14 14:07:22 vmhost2 [ 2418.546380]  [<c1006a98>] ? check_events+0x8/0xc
Jun 14 14:07:22 vmhost2 [ 2418.546455]  [<c141faa6>] ? error_code+0x66/0x6c
Jun 14 14:07:22 vmhost2 [ 2418.546530]  [<c100a6c0>] ? do_invalid_op+0x0/0x90
Jun 14 14:07:22 vmhost2 [ 2418.546606]  [<c120f170>] ? gnttab_copy_grant_page+0x1f0/0x260
Jun 14 14:07:22 vmhost2 [ 2418.546687]  [<c1350000>] ? sock_release+0x10/0x80
Jun 14 14:07:22 vmhost2 [ 2418.546763]  [<c12200c1>] ? net_tx_action+0x1d1/0x9b0
Jun 14 14:07:22 vmhost2 [ 2418.546839]  [<c103bc2e>] ? tasklet_action+0x9e/0xb0
Jun 14 14:07:22 vmhost2 [ 2418.546915]  [<c103c378>] ? __do_softirq+0x88/0x110
Jun 14 14:07:22 vmhost2 [ 2418.546993]  [<c1210057>] ? __xen_evtchn_do_upcall+0xd7/0x160
Jun 14 14:07:22 vmhost2 [ 2418.547070]  [<c103c43d>] ? do_softirq+0x3d/0x40
Jun 14 14:07:22 vmhost2 [ 2418.547145]  [<c121063a>] ? xen_evtchn_do_upcall+0x2a/0x40
Jun 14 14:07:22 vmhost2 [ 2418.547222]  [<c1009da7>] ? xen_do_upcall+0x7/0xc
Jun 14 14:07:22 vmhost2 [ 2418.547299]  [<c10013a7>] ? hypercall_page+0x3a7/0x1010
Jun 14 14:07:22 vmhost2 [ 2418.547385]  [<c10061ef>] ? xen_safe_halt+0xf/0x20
Jun 14 14:07:22 vmhost2 [ 2418.547463]  [<c100382c>] ? xen_idle+0x1c/0x30
Jun 14 14:07:22 vmhost2 [ 2418.547537]  [<c10081fa>] ? cpu_idle+0x3a/0x60
Jun 14 14:07:22 vmhost2 [ 2418.547615]  [<c15787ef>] ? start_kernel+0x2c6/0x2cb
Jun 14 14:07:22 vmhost2 [ 2418.547690]  [<c1578367>] ? unknown_bootoption+0x0/0x190
Jun 14 14:07:22 vmhost2 [ 2418.547766]  [<c157b0e6>] ? xen_start_kernel+0x624/0x62c
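
One detail in the register dump may be worth decoding. The Code bytes just before the trapping instruction (e8 ..., 85 c0, 0f 84 ..., then <0f> 0b) read as call, test %eax,%eax, je, ud2. If so, EAX should still hold the return value of that call when the BUG fires. The sketch below interprets EAX = ffffffea as a signed 32-bit value, which is -22, i.e. -EINVAL on Linux. Which call actually returned it is an assumption the log alone does not confirm; the helper name `to_signed32` is illustrative only.

```python
import errno


def to_signed32(value):
    """Interpret a 32-bit register value as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


eax = to_signed32(0xFFFFFFEA)  # EAX as printed in the register dump above
# errno.errorcode maps positive errno numbers to their symbolic names.
name = errno.errorcode.get(-eax, "unknown")
print(eax, name)
```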

Best regards,
Arnd
