WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

RE: [Xen-devel] BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]

To: "Jeremy Fitzhardinge" <jeremy@xxxxxxxx>
Subject: RE: [Xen-devel] BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]
From: "James Harper" <james.harper@xxxxxxxxxxxxxxxx>
Date: Mon, 6 Sep 2010 09:44:23 +1000
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Sun, 05 Sep 2010 16:45:30 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <AEC6C66638C05B468B556EA548C1A77D01A92A2B@trantor>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <AEC6C66638C05B468B556EA548C1A77D01A92A28@trantor><4C8040AE.7060902@xxxxxxxx> <AEC6C66638C05B468B556EA548C1A77D01A92A2B@trantor>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: ActK/q882zvhES+FSZ6IH+PETbgLzAAAAuywAJSVS+A=
Thread-topic: [Xen-devel] BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]
This is still happening with 2.6.32.18-ge6b9b2c but I'm now not so sure
if it's a problem with blkback after all. My script does this:

echo source=$source >>/tmp/attach-snapshot.log
echo dest=$dest >>/tmp/attach-snapshot.log
lvcreate --snapshot --name=$dest --size=1G $source
lvscan >>/tmp/attach-snapshot.log
xm block-attach $domname phy:$dest $destdev w

/tmp/attach-snapshot.log shows "source=" and "dest=" but not a result of
"lvscan", which would indicate that it's the "lvcreate" that's hung, not
the "xm block-attach". Disk writes are still working so I know that it's
not just a missed buffer or anything. That would explain why udevd is
hanging too - lvcreate would involve creating a new device while "xm
block-attach" wouldn't. I think I'm barking up the wrong tree on
xen-devel unless anyone else has any suggestions?

Thanks

James

> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel-
> bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of James Harper
> Sent: Friday, 3 September 2010 10:36
> To: Jeremy Fitzhardinge
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-devel] BUG: soft lockup - CPU#1 stuck for 61s!
[udevd:4865]
> 
> >
> >  On 09/02/2010 05:20 PM, James Harper wrote:
> > > I've had "BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]"
> happen
> > > twice now and the server fairly quickly (over the course of 5-10
> > > minutes) becomes unusable after that happens. The call trace
appears
> to
> > > be:
> > >
> > > Call Trace:
> > >  [<ffffffff8100e8e9>] ? xen_force_evtchn_callback+0x9/0xa
> > >  [<ffffffff8100ef72>] ? check_events+0x12/0x20
> > >  [<ffffffff8100ef19>] ? xen_irq_enable_direct_end+0x0/0x7
> > >  [<ffffffff8102fcc1>] ? do_page_fault+0x9e/0x27b
> > >  [<ffffffff812b39a5>] ? page_fault+0x25/0x30
> > >  [<ffffffff8116f28d>] ? __put_user_4+0x1d/0x30
> > >  [<ffffffff81045265>] ? schedule_tail+0x92/0x96
> > >  [<ffffffff81011983>] ? ret_from_fork+0x13/0x80
> > >
> > > That was under 2.6.32.17-g69a73fa. I've since upgraded to
> > > 2.6.32.18-ge6b9b2c but the changelogs don't show anything
obviously
> > > related to the crash (but as usual there are a lot of them so
maybe
> I've
> > > missed something?)
> > >
> > > Is this a known problem under 2.6.32.17-g69a73fa that has since
been
> > > fixed?
> >
> > Hard to know; nothing springs to mind right now.  What else is going
> on
> > at the time?  What's the full softlockup message?  Is there any more
> > context?
> 
> I just noticed that it coincided pretty much exactly when a backup
runs.
> I take a snapshot in Dom0 then block-attach it to the linux DomU so it
> can be backed up from DomU which makes restores easier etc, rather
than
> doing the backup in Dom0. I hadn't noticed before because it's the
> Windows DomU that people complain about first, not the linux DomU.
> Block-attach/detach would explain why udevd is involved too which
makes
> more sense. The block-attach happens every hour on the hour. The full
> cut&paste of the first hang is:
> 
> Sep  3 09:00:42 keechsvr2 kernel: [235893.079566] blkback: ring-ref
> 1442, event-channel 13, protocol 1 (x86_64-abi)
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] BUG: soft lockup -
> CPU#1 stuck for 61s! [udevd:4865]
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in:
> ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun
> act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc
ipt_ULO
> G xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT
> xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp
> iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc n
> f_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp
> nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4
> nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback
> xen_blkback
> blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd
> tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc
> ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_c
> ore i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log
> dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata
> usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd id
> e_core uhci_hc
> Sep  3 09:02:13 keechsvr2 kernel:  cciss thermal fan [last unloaded:
> scsi_wait_scan]
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] CPU 1:
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in:
> ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun
> act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc
ipt_ULO
> G xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT
> xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp
> iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc n
> f_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp
> nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4
> nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback
> xen_blkback
> blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd
> tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc
> ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_c
> ore i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log
> dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata
> usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd id
> e_core uhci_hc
> Sep  3 09:02:13 keechsvr2 kernel:  cciss thermal fan [last unloaded:
> scsi_wait_scan]
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] Pid: 4865, comm:
udevd
> Not tainted 2.6.32.17-g69a73fa #1 ProLiant ML370 G5
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] RIP:
> e030:[<ffffffff8100922a>]  [<ffffffff8100922a>]
> hypercall_page+0x22a/0x1001
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] RSP:
> e02b:ffff8800114d5dd0  EFLAGS: 00000246
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] RAX:
0000000000040000
> RBX: 0000000000000001 RCX: ffffffff8100922a
> keechsvr2:/usr/local/src/linux-2.6-xen# head -100 /var/log/kern.log
> Sep  3 07:00:44 keechsvr2 kernel: [228694.767691] blkback: ring-ref
311,
> event-channel 13, protocol 1 (x86_64-abi)
> Sep  3 08:00:42 keechsvr2 kernel: [232293.072248] blkback: ring-ref
> 1334, event-channel 13, protocol 1 (x86_64-abi)
> Sep  3 09:00:42 keechsvr2 kernel: [235893.079566] blkback: ring-ref
> 1442, event-channel 13, protocol 1 (x86_64-abi)
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] BUG: soft lockup -
> CPU#1 stuck for 61s! [udevd:4865]
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in:
> ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun
> act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc
ipt_ULOG
> xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT
> xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp
> iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc
> nf_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp
> nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4
> nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback
> xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm
> snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo
> snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core
> i5000_edac edac_core i5k_amb button evdev xfs exportfs dm_mirror
> dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix
> ata_generic libata usb_storage scsi_mod usbhid hid piix
ide_pci_generic
> bnx2 ehci_hcd ide_core uhci_hc
> Sep  3 09:02:13 keechsvr2 kernel:  cciss thermal fan [last unloaded:
> scsi_wait_scan]
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] CPU 1:
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in:
> ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun
> act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc
ipt_ULOG
> xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT
> xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp
> iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc
> nf_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp
> nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4
> nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback
> xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm
> snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo
> snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core
> i5000_edac edac_core i5k_amb button evdev xfs exportfs dm_mirror
> dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix
> ata_generic libata usb_storage scsi_mod usbhid hid piix
ide_pci_generic
> bnx2 ehci_hcd ide_core uhci_hc
> Sep  3 09:02:13 keechsvr2 kernel:  cciss thermal fan [last unloaded:
> scsi_wait_scan]
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] Pid: 4865, comm:
udevd
> Not tainted 2.6.32.17-g69a73fa #1 ProLiant ML370 G5
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] RIP:
> e030:[<ffffffff8100922a>]  [<ffffffff8100922a>]
> hypercall_page+0x22a/0x1001
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] RSP:
> e02b:ffff8800114d5dd0  EFLAGS: 00000246
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] RAX:
0000000000040000
> RBX: 0000000000000001 RCX: ffffffff8100922a
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] RDX:
ffffffff814293d0
> RSI: 0000000000000000 RDI: 0000000000000000
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] RBP:
00007fbc628a9800
> R08: 0000000000000000 R09: 0000000000000000
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] R10:
0000000000000002
> R11: 0000000000000246 R12: 0000000000000002
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] R13:
ffff8800114d5e88
> R14: ffff880017f9cfc0 R15: ffff880017eed500
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] FS:
> 00007fbc628a9770(0000) GS:ffff880002de8000(0000)
knlGS:0000000000000000
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] CS:  e033 DS: 0000
ES:
> 0000 CR0: 000000008005003b
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] CR2:
00007fbc628a9800
> CR3: 000000001e2d7000 CR4: 0000000000002660
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] DR0:
0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] DR3:
0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505] Call Trace:
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505]
[<ffffffff8100e8e9>]
> ? xen_force_evtchn_callback+0x9/0xa
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505]
[<ffffffff8100ef72>]
> ? check_events+0x12/0x20
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505]
[<ffffffff8100ef19>]
> ? xen_irq_enable_direct_end+0x0/0x7
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505]
[<ffffffff8102fcc1>]
> ? do_page_fault+0x9e/0x27b
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505]
[<ffffffff812b39a5>]
> ? page_fault+0x25/0x30
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505]
[<ffffffff8116f28d>]
> ? __put_user_4+0x1d/0x30
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505]
[<ffffffff81045265>]
> ? schedule_tail+0x92/0x96
> Sep  3 09:02:13 keechsvr2 kernel: [235983.760505]
[<ffffffff81011983>]
> ? ret_from_fork+0x13/0x80
> 
> There's another hang at 09:03:09 which I can post if you'd find it
> useful.
> 
> If the block-attach happened at 09:00:42 and the crash happened at
> 09:02:13 then I assume that whatever udevd started doing started at
> 09:01:12 (eg 61 seconds earlier) which means that it's more likely to
be
> the block-detach at the end of the backup causing the hang not the
> block-attach. The backup runs pretty quick.
> 
> I'm inclined to turn off the snapshot backup if this isn't a known and
> resolved problem...
> 
> Thanks
> 
> James
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

<Prev in Thread] Current Thread [Next in Thread>