This soft-lockup problem seems to occur when I run a large MySQL query
that takes several seconds to complete on a DomU. At that point, the soft
lockup message appears and the Xen box stalls for about 5-10 seconds.
After that, everything continues normally.

The box is an Abit-LG81 motherboard (Skt775, ICH7) with an Intel Celeron
2.7GHz processor and 2 GB of RAM. I am running software RAID-5 across the 4
SATA drives in Dom0 and providing the disks to the DomUs using LVM. The
base installation is Kubuntu Dapper Drake 6.06, and I installed the Xen
kernel from the 3.0.2-2 binaries on the Xen site.

A capture of the relevant information from syslog is below. This is what I
get for most of the errors:
Sep 3 15:48:20 hydra kernel: Pid: 0, comm: swapper
Sep 3 15:48:20 hydra kernel: EIP: 0061:[hypercall_page+935/4096] CPU: 0
Sep 3 15:48:20 hydra kernel: EIP is at 0xc01013a7
Sep 3 15:48:20 hydra kernel: EFLAGS: 00000296 Tainted: GF (2.6.16-xen #1)
Sep 3 15:48:20 hydra kernel: EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: 00001f8e
Sep 3 15:48:20 hydra kernel: ESI: 00000000 EDI: 00000001 EBP: c03da000 DS: 007b ES: 007b
Sep 3 15:48:20 hydra kernel: CR0: 8005003b CR2: b7e34000 CR3: 00df6000 CR4: 00000640
Sep 3 15:48:20 hydra kernel: [xen_idle+83/176] xen_idle+0x53/0xb0
Sep 3 15:48:20 hydra kernel: [cpu_idle+122/224] cpu_idle+0x7a/0xe0
Sep 3 15:48:20 hydra kernel: [start_kernel+439/512] start_kernel+0x1b7/0x200
Sep 3 15:48:20 hydra kernel: [unknown_bootoption+0/464] unknown_bootoption+0x0/0x1d0
Sometimes I get a longer trace like the one below. The exact trace varies a
bit but the starting function is always "notify_remote_via_irq":
Sep 3 16:09:30 hydra kernel: BUG: soft lockup detected on CPU#0!
Sep 3 16:09:30 hydra kernel:
Sep 3 16:09:30 hydra kernel: Pid: 0, comm: swapper
Sep 3 16:09:30 hydra kernel: EIP: 0061:[hypercall_page+519/4096] CPU: 0
Sep 3 16:09:30 hydra kernel: EIP is at 0xc0101207
Sep 3 16:09:30 hydra kernel: EFLAGS: 00000202 Tainted: GF (2.6.16-xen #1)
Sep 3 16:09:30 hydra kernel: EAX: 00000000 EBX: c03dbc98 ECX: c114ed40 EDX: c03dbef4
Sep 3 16:09:30 hydra kernel: ESI: 00000000 EDI: 00000112 EBP: c0432fc0 DS: 007b ES: 007b
Sep 3 16:09:30 hydra kernel: CR0: 8005003b CR2: b7e2c4b0 CR3: 00df6000 CR4: 00000640
Sep 3 16:09:30 hydra kernel: [notify_remote_via_irq+41/64] notify_remote_via_irq+0x29/0x40
Sep 3 16:09:30 hydra kernel: [kfree_skbmem+94/144] kfree_skbmem+0x5e/0x90
Sep 3 16:09:30 hydra kernel: [net_rx_action+1123/1280] net_rx_action+0x463/0x500
Sep 3 16:09:30 hydra kernel: [fib_lookup+209/320] fib_lookup+0xd1/0x140
Sep 3 16:09:30 hydra kernel: [ip_route_input_slow+440/2528] ip_route_input_slow+0x1b8/0x9e0
Sep 3 16:09:30 hydra kernel: [try_to_wake_up+768/880] try_to_wake_up+0x300/0x370
Sep 3 16:09:30 hydra kernel: [<e13c5000>] br_forward_finish+0x0/0x70 [bridge]
Sep 3 16:09:30 hydra kernel: [neigh_lookup+136/208] neigh_lookup+0x88/0xd0
Sep 3 16:09:30 hydra kernel: [kfree_skbmem+94/144] kfree_skbmem+0x5e/0x90
Sep 3 16:09:30 hydra kernel: [arp_process+142/1456] arp_process+0x8e/0x5b0
Sep 3 16:09:30 hydra kernel: [ip_local_deliver+280/688] ip_local_deliver+0x118/0x2b0
Sep 3 16:09:30 hydra kernel: [arp_rcv+221/400] arp_rcv+0xdd/0x190
Sep 3 16:09:30 hydra kernel: [packet_rcv_spkt+359/672] packet_rcv_spkt+0x167/0x2a0
Sep 3 16:09:30 hydra kernel: [netif_receive_skb+650/816] netif_receive_skb+0x28a/0x330
Sep 3 16:09:30 hydra kernel: [process_backlog+215/400] process_backlog+0xd7/0x190
Sep 3 16:09:30 hydra kernel: [tasklet_action+157/320] tasklet_action+0x9d/0x140
Sep 3 16:09:30 hydra kernel: [__do_softirq+245/288] __do_softirq+0xf5/0x120
Sep 3 16:09:30 hydra kernel: [do_softirq+149/160] do_softirq+0x95/0xa0
Sep 3 16:09:30 hydra kernel: [do_IRQ+31/48] do_IRQ+0x1f/0x30
Sep 3 16:09:30 hydra kernel: [evtchn_do_upcall+168/240] evtchn_do_upcall+0xa8/0xf0
Sep 3 16:09:30 hydra kernel: [hypervisor_callback+44/52] hypervisor_callback+0x2c/0x34
Sep 3 16:09:30 hydra kernel: [xen_idle+83/176] xen_idle+0x53/0xb0
Sep 3 16:09:30 hydra kernel: [cpu_idle+122/224] cpu_idle+0x7a/0xe0
Sep 3 16:09:30 hydra kernel: [start_kernel+439/512] start_kernel+0x1b7/0x200
Just once, I got the following error:
Sep 3 18:40:35 hydra kernel: Pid: 2268, comm: md0_raid5
Sep 3 18:40:35 hydra kernel: EIP: 0061:[hypercall_page+551/4096] CPU: 0
Sep 3 18:40:35 hydra kernel: EIP is at 0xc0101227
Sep 3 18:40:35 hydra kernel: EFLAGS: 00200246 Tainted: GF (2.6.16-xen #1)
Sep 3 18:40:35 hydra kernel: EAX: 00030000 EBX: 00000000 ECX: 00000000 EDX: c0619c2c
Sep 3 18:40:35 hydra kernel: ESI: c0619b30 EDI: c0619b40 EBP: 00000001 DS: 007b ES: 007b
Sep 3 18:40:35 hydra kernel: CR0: 8005003b CR2: b7e34000 CR3: 003f2000 CR4: 00000640
Sep 3 18:40:35 hydra kernel: [force_evtchn_callback+10/16] force_evtchn_callback+0xa/0x10
Sep 3 18:40:35 hydra kernel: [get_request+727/800] get_request+0x2d7/0x320
Sep 3 18:40:35 hydra kernel: [lock_timer_base+36/80] lock_timer_base+0x24/0x50
Sep 3 18:40:35 hydra kernel: [get_request_wait+44/368] get_request_wait+0x2c/0x170
Sep 3 18:40:35 hydra kernel: [blk_plug_device+99/160] blk_plug_device+0x63/0xa0
Sep 3 18:40:35 hydra kernel: [kobject_put+31/48] kobject_put+0x1f/0x30
Sep 3 18:40:35 hydra kernel: [kobject_release+0/16] kobject_release+0x0/0x10
Sep 3 18:40:35 hydra kernel: [<e105e3f1>] scsi_request_fn+0x261/0x400 [scsi_mod]
Sep 3 18:40:35 hydra kernel: [__make_request+170/1184] __make_request+0xaa/0x4a0
Sep 3 18:40:35 hydra kernel: [schedule+1013/1840] schedule+0x3f5/0x730
Sep 3 18:40:35 hydra kernel: [generic_make_request+240/352] generic_make_request+0xf0/0x160
Sep 3 18:40:35 hydra kernel: [__bio_clone+166/176] __bio_clone+0xa6/0xb0
Sep 3 18:40:35 hydra kernel: [submit_bio+98/256] submit_bio+0x62/0x100
Sep 3 18:40:35 hydra kernel: [<e108f728>] md_super_write+0xa8/0xe0 [md_mod]
Sep 3 18:40:35 hydra kernel: [<e10919a6>] md_update_sb+0x1b6/0x230 [md_mod]
Sep 3 18:40:35 hydra kernel: [<e1097793>] md_check_recovery+0x463/0x4d0 [md_mod]
Sep 3 18:40:35 hydra kernel: [schedule_timeout+169/176] schedule_timeout+0xa9/0xb0
Sep 3 18:40:35 hydra kernel: [<e1085bb6>] raid5d+0x16/0x190 [raid5]
Sep 3 18:40:35 hydra kernel: [prepare_to_wait+32/112] prepare_to_wait+0x20/0x70
Sep 3 18:40:35 hydra kernel: [<e109577f>] md_thread+0x5f/0x130 [md_mod]
Sep 3 18:40:35 hydra kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Sep 3 18:40:35 hydra kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Sep 3 18:40:35 hydra kernel: [<e1095720>] md_thread+0x0/0x130 [md_mod]
Sep 3 18:40:35 hydra kernel: [kthread+186/192] kthread+0xba/0xc0
Sep 3 18:40:35 hydra kernel: [kthread+0/192] kthread+0x0/0xc0
Sep 3 18:40:35 hydra kernel: [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
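
To compare traces like the ones above without reading each one by hand, something along the lines of the sketch below could list the task name and the first stack frame of every trace in a syslog capture. This is only an illustration: the function name first_frames and both regular expressions are my own assumptions based on the log format shown in this message, not part of any Xen or kernel tool, and other kernels may format frames differently.

```python
import re

# A frame looks like "[sym+41/64] sym+0x29/0x40" or "[<e13c5000>] sym+0x0/0x70".
# \s+ also matches a newline, so frames wrapped by a mail client still parse.
FRAME = re.compile(
    r"\[(?:<[0-9a-f]+>|[\w.]+\+\d+/\d+)\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)
# Each trace in this capture begins with a "Pid: N, comm: NAME" line.
HEADER = re.compile(r"Pid: (\d+), comm: (\S+)")


def first_frames(log_text):
    """Return a (comm, first_function) pair for each trace in log_text."""
    results = []
    headers = list(HEADER.finditer(log_text))
    for i, header in enumerate(headers):
        # A trace runs from its header up to the next header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log_text)
        frame = FRAME.search(log_text, header.end(), end)
        results.append((header.group(2), frame.group(1) if frame else None))
    return results


# Shortened excerpt of the traces in this message, used as sample input.
SAMPLE = """\
Sep 3 16:09:30 hydra kernel: Pid: 0, comm: swapper
Sep 3 16:09:30 hydra kernel: EIP: 0061:[hypercall_page+519/4096] CPU: 0
Sep 3 16:09:30 hydra kernel: [notify_remote_via_irq+41/64]
notify_remote_via_irq+0x29/0x40
Sep 3 16:09:30 hydra kernel: [kfree_skbmem+94/144] kfree_skbmem+0x5e/0x90
Sep 3 18:40:35 hydra kernel: Pid: 2268, comm: md0_raid5
Sep 3 18:40:35 hydra kernel: [force_evtchn_callback+10/16]
force_evtchn_callback+0xa/0x10
"""

if __name__ == "__main__":
    for comm, func in first_frames(SAMPLE):
        print(comm, func)
```

Run against a full syslog file, this should make it easy to see whether the first frame really is the same across the longer traces.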
Does anyone on this list know what is going on and why this would occur?
I have a workaround that splits the SQL query into a set of smaller queries,
which do not trigger the problem, but I would like to find the root cause
and fix it properly.
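
For anyone who wants to try the same workaround, here is a minimal sketch of the idea, not my actual query. It assumes a hypothetical table called samples with an integer primary key id, and it uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere. With a MySQL driver the SQL would be much the same, though the placeholder style is typically %s rather than ?.

```python
import sqlite3


def fetch_in_chunks(conn, chunk_size=1000):
    """Yield rows from the hypothetical `samples` table, chunk_size per query.

    Each query resumes after the last id seen (keyset pagination), so no
    single query has to scan or return the whole table.
    """
    last_id = 0  # assumes ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, value FROM samples WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            break
        yield from rows
        last_id = rows[-1][0]


def make_demo_db(n_rows):
    """Build an in-memory database with n_rows rows, for demonstration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, value TEXT)")
    conn.executemany(
        "INSERT INTO samples (id, value) VALUES (?, ?)",
        [(i, f"row-{i}") for i in range(1, n_rows + 1)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db(2500)
    total = sum(1 for _ in fetch_in_chunks(conn, chunk_size=1000))
    print(total)
```

The chunk size is arbitrary here; the point is only that each individual query stays short.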
Thanks in advance for any help anyone can give on this.
> -----Original Message-----
> From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-users-
> bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Roger Lucas
> Sent: 03 September 2006 14:47
> To: xen-users@xxxxxxxxxxxxxxxxxxx
> Subject: RE: soft lockup was (Re: [Xen-users] Kernel error)
>
> I have suddenly got these same errors occurring on my Xen-3.0.2-2 system. I
> have four DomUs with 256MB ram each and 512MB on the Dom0 running on an
> Intel Celeron system.
>
> Is the only solution to upgrade to Unstable, or is there a patch/upgrade
> available for the 3.0.2-2 release?
>
> Thanks, Roger.
>
> > -----Original Message-----
> > From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-users-
> > bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Steve Traugott
> > Sent: 02 August 2006 23:22
> > To: Jones, Chris
> > Cc: rbp@xxxxxxxxxxxxx; xen-users@xxxxxxxxxxxxxxxxxxx
> > Subject: soft lockup was (Re: [Xen-users] Kernel error)
> >
> > Hi Chris,
> >
> > Did you ever reach any sort of conclusion about the current state of
> > the soft lockup bug? Do you have a stable build now? What changeset
> > is it?
> >
> > Thanks,
> >
> > Steve
> >
> > On Fri, Jul 07, 2006 at 07:37:33AM -0500, Jones, Chris wrote:
> > > > I am getting the same errors in the stable 3.0.2 but I am not getting
> > > > the errors on unstable so it looks like you are right. I am downloading
> > > > the testing tree in an attempt to test it there. I will holler when I
> > > > find something out.
> > >
> > > -----Original Message-----
> > > From: Rodrigo Borges Pereira [mailto:rbp@xxxxxxxxxxxxx]
> > > Sent: Friday, July 07, 2006 7:23 AM
> > > To: Jones, Chris; xen-users@xxxxxxxxxxxxxxxxxxx
> > > Subject: RE: [Xen-users] Kernel error
> > >
> > > > I believe that thread states that the fix is already in 3.0.2. And I am
> > > > running 3.0.2.
> > > > Did I get it wrong?
> > >
> > > tks
> > >
> > > > -----Original Message-----
> > > > From: Jones, Chris [mailto:chris.jones@xxxxxxxxxxxxxxx]
> > > > > Sent: Friday, 7 July 2006 13:18
> > > > To: rbp@xxxxxxxxxxxxx; xen-users@xxxxxxxxxxxxxxxxxxx
> > > > Subject: RE: [Xen-users] Kernel error
> > > >
> > > > There is a fix for this issue.
> > > > > http://lists.xensource.com/archives/html/xen-devel/2006-04/msg00193.html
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
> > > > [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
> > > > Rodrigo Borges Pereira
> > > > Sent: Friday, July 07, 2006 7:06 AM
> > > > To: xen-users@xxxxxxxxxxxxxxxxxxx
> > > > Subject: [Xen-users] Kernel error
> > > >
> > > > Hi,
> > > >
> > > > I got this on the console of one DomU:
> > > >
> > > > --> BUG: soft lockup detected on CPU#0!
> > > >
> > > > Pid: 0, comm: swapper
> > > > EIP: 0061:[<c01013a7>] CPU: 0
> > > > EIP is at 0xc01013a7
> > > > EFLAGS: 00000246 Tainted: GF (2.6.16-xen3_86.1_rhel4.1 #1)
> > > > EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: 00004eaf
> > > > ESI: 00000000 EDI: 00000001 EBP: c03e4000 DS: 007b ES: 007b
> > > > CR0: 8005003b CR2: 8005230c CR3: 004ec000 CR4: 00000640
> > > > > [<c0102b53>] xen_idle+0x53/0xb0
> > > > > [<c0102c1f>] cpu_idle+0x6f/0xe0
> > > > > [<c03e69da>] start_kernel+0x1da/0x230
> > > > > [<c03e6320>] unknown_bootoption+0x0/0x1f0
> > > >
> > > >
> > > > It didn't seem to affect the operation of either DomU or Dom0.
> > > >
> > > > > Should I worry?
> > > >
> > > > Best regards,
> > > > r
> > > >
> > > >
> > > > _______________________________________________
> > > > Xen-users mailing list
> > > > Xen-users@xxxxxxxxxxxxxxxxxxx
> > > > http://lists.xensource.com/xen-users
> > > >
> > >
> > >
> >
> > --
> > Stephen G. Traugott (KG6HDQ)
> > UNIX/Linux Infrastructure Architect, TerraLuna LLC
> > stevegt@xxxxxxxxxxxxx
> > http://www.stevegt.com -- http://Infrastructures.Org
> >