WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-users

[Xen-users] Re: CPU soft lockup XEN 4.1rc (Solved)

To: mbrown@xxxxxxxxxxxxxxxxxxxxxxxxx
Subject: [Xen-users] Re: CPU soft lockup XEN 4.1rc (Solved)
From: Matthias Bannach <matthias@xxxxxxxxxxx>
Date: Thu, 01 Sep 2011 21:12:19 -0400
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4E5E8089.40801@xxxxxxxxxxxxxxxxxxxxxxxxx>
Organization: Bannach.net
References: <4E5E8089.40801@xxxxxxxxxxxxxxxxxxxxxxxxx>

All,

Ha, finally solved. I guess Google is not the answer; searching the
mailing list is. After much frustration I found the following:

http://wiki.debian.org/Xen#A.27clocksource.2BAC8-0.3ATimewentbackwards.27

which is based on a post by Marco Marongiu:

http://my.opera.com/marcomarongiu/blog/2010/08/18/debugging-ntp-again-part-4-and-last

For me, lockup solution #2 worked:

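# DomU and Dom0
# in /etc/sysctl.conf (the sysctl name carries a "xen." prefix)
xen.independent_wallclock=0
# then run: sysctl -p
# (clocksource is not a sysctl key; it is set with the
#  clocksource=jiffies kernel parameter, as below for DomUs)

# in /etc/xen/*.conf
extra="clocksource=jiffies"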

And voila: no more lockups. It had nothing to do with the motherboards (which I
had not suspected anyway, given that non-Xen configurations ran fine).
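To confirm which clocksource a running Dom0 or DomU actually uses after the settings above, one can read the kernel's clocksource files in sysfs. The helper below is a minimal hypothetical sketch, not taken from the Debian wiki or from this thread; it assumes the standard Linux sysfs directory /sys/devices/system/clocksource/clocksource0 with its current_clocksource and available_clocksource files, and takes the directory as a parameter so it can be pointed elsewhere.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed standard sysfs location of the clocksource controls on Linux.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base: Path = SYSFS_DIR) -> Tuple[str, List[str]]:
    """Return (active clocksource, clocksources the kernel lists as available)."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def check_jiffies(base: Path = SYSFS_DIR) -> str:
    """Describe whether the kernel is actually running on the jiffies clocksource."""
    current, available = read_clocksource(base)
    if current == "jiffies":
        return "OK: jiffies is the active clocksource"
    if "jiffies" in available:
        return f"jiffies is available but {current!r} is active"
    return f"jiffies is not listed as available; {current!r} is active"


if __name__ == "__main__" and SYSFS_DIR.exists():
    # Only query the real sysfs tree when it is present on this machine.
    print(check_jiffies())
```

On a guest booted with extra="clocksource=jiffies", check_jiffies() would be expected to report jiffies as active; if it names another source (for example xen), the kernel parameter was probably not picked up.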

I'm not sure whether this is a kernel or a Xen problem, though.

Hope this helps others.

On 8/31/2011 2:42 PM, Mark Brown wrote:
> Hello,
> 
> Similar to others, I have freeze-ups on the system; they are consistent
> with high IO load. If the system just runs (even with multiple XenU
> guests), it does not happen. But I can consistently force the situation
> to occur.
> 
> Running 4 dd processes dumping 20GB each onto an LVM/mdadm software RAID5
> volume, it consistently crashes in a DomU. Running without Xen, I do not
> see the problem at all: after about 3TB of reads/writes, nothing
> happened.
> 
> Any suggestions would be very welcome.
> 
> Marc
> 
> [ .. more .. ]
> It appears to be very unpredictable when it actually occurs; here are
> a few examples. Kind of odd that on Aug 29th it always happened at the
> same second ;-{.
> 
>> syslog.2:Aug 29 17:35:47 nwsc-xen-Q45 kernel: [ 2698.560009] BUG: soft lockup - CPU#0 stuck for 146s! [events/0:9]
>> syslog.2:Aug 29 17:35:47 nwsc-xen-Q45 kernel: [ 2698.561016] BUG: soft lockup - CPU#1 stuck for 146s! [rsyslogd:2024]
>> syslog.2:Aug 29 22:57:27 nwsc-xen-Q45 kernel: [ 4198.404353] BUG: soft lockup - CPU#0 stuck for 122s! [md1_raid5:1243]
>> syslog.2:Aug 29 23:07:27 nwsc-xen-Q45 kernel: [ 4798.336110] BUG: soft lockup - CPU#0 stuck for 101s! [xend:2583]
>> syslog.2:Aug 29 23:07:27 nwsc-xen-Q45 kernel: [ 4798.337007] BUG: soft lockup - CPU#1 stuck for 101s! [bdi-default:19]
>> syslog.2:Aug 29 23:12:27 nwsc-xen-Q45 kernel: [ 5098.304013] BUG: soft lockup - CPU#0 stuck for 136s! [blkback.5.xvdd1:7226]
>> syslog.2:Aug 29 23:12:27 nwsc-xen-Q45 kernel: [ 5098.305010] BUG: soft lockup - CPU#1 stuck for 136s! [sh:7262]
>> syslog.6:Aug 17 12:07:08 nwsc-xen-Q45 kernel: [ 2998.596016] BUG: soft lockup - CPU#0 stuck for 73s! [xend:2506]
>> syslog.6:Aug 17 12:07:08 nwsc-xen-Q45 kernel: [ 2998.597555] BUG: soft lockup - CPU#1 stuck for 73s! [md0_raid5:598]
>> syslog.6:Aug 17 12:17:08 nwsc-xen-Q45 kernel: [ 3598.534068] BUG: soft lockup - CPU#1 stuck for 150s! [xend:2506]
> 
> It does not appear to relate to a specific process. (Those above are
> from Xen 4.0.1 with Debian 2.6.32-5-xen-amd64).
> 
> This one is with Xen 4.1.2-rc2-pre/Debian 2.6.32-5-xen-amd64. Both are
> on an Intel DQ45CB board with 4GB RAM.
> 
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348062] BUG: soft lockup - CPU#0 stuck for 79s! [xend:2767]
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348073] Modules linked in: xt_tcpudp xt_physdev iptable_filter ip_tables x_tables ext4 jbd2 crc16 sata_sil24 hid_apple sky2 via_velocity crc_ccitt usb_storage raid456 md_mod async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx dm_mod ext3 jbd mbcache firewire_sbp2 loop sr_mod cdrom sg xenfs xen_evtchn bridge stp 3w_9xxx usbhid hid sd_mod crc_t10dif snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device firewire_ohci psmouse i2c_i801 video firewire_core uhci_hcd ata_piix snd crc_itu_t output serio_raw evdev ahci pcspkr ehci_hcd i2c_core usbcore nls_base e1000e button ata_generic soundcore snd_page_alloc libata thermal scsi_mod processor thermal_sys acpi_processor
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348219] CPU 0:
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348222] Modules linked in: xt_tcpudp xt_physdev iptable_filter ip_tables x_tables ext4 jbd2 crc16 sata_sil24 hid_apple sky2 via_velocity crc_ccitt usb_storage raid456 md_mod async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx dm_mod ext3 jbd mbcache firewire_sbp2 loop sr_mod cdrom sg xenfs xen_evtchn bridge stp 3w_9xxx usbhid hid sd_mod crc_t10dif snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device firewire_ohci psmouse i2c_i801 video firewire_core uhci_hcd ata_piix snd crc_itu_t output serio_raw evdev ahci pcspkr ehci_hcd i2c_core usbcore nls_base e1000e button ata_generic soundcore snd_page_alloc libata thermal scsi_mod processor thermal_sys acpi_processor
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348318] Pid: 2767, comm: xend Not tainted 2.6.32-5-xen-amd64 #1
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348322] RIP: e033:[<00007fa4064c0289>]  [<00007fa4064c0289>] 0x7fa4064c0289
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348330] RSP: e02b:00007fa402ee54a0  EFLAGS: 00000206
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348334] RAX: 0000000001c3a320 RBX: 0000000001f8ace0 RCX: 00007fa40650f844
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348338] RDX: ffffffffffffffe0 RSI: 0000000000000000 RDI: 00007fa4067a9e40
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348341] RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000001
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348345] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fa4067a9e40
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348349] R13: 00007fa402ee555c R14: 00007fa402ee5548 R15: 00000000ffffffff
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348356] FS:  00007fa402ee6700(0000) GS:ffff880002995000(0000) knlGS:0000000000000000
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348360] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348363] CR2: 00007fb2ed832e28 CR3: 00000000bba8e000 CR4: 0000000000002660
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348367] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348375] Call Trace:
>>
>> Aug 31 13:07:51 nwsc-xen-Q45 init: Id "T1" respawning too fast: disabled for 5 minutes
> 
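The observation in the quoted report that the lockups do not relate to a specific process can be checked mechanically by tallying the soft-lockup lines per process name and per CPU. A minimal sketch, assuming the kernel message format shown in the logs above ("BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]"); the regular expression and the summarize_lockups name are illustrative and not part of any Xen or Debian tool.

```python
import re
from collections import Counter

# Matches kernel messages such as:
#   ... BUG: soft lockup - CPU#0 stuck for 146s! [events/0:9]
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def summarize_lockups(lines):
    """Return (lockups per process name, lockups per CPU, longest stall in seconds)."""
    by_comm = Counter()
    by_cpu = Counter()
    longest = 0
    for line in lines:
        m = LOCKUP_RE.search(line)
        if not m:
            continue  # skip unrelated lines, e.g. init respawn messages
        by_comm[m["comm"]] += 1
        by_cpu[int(m["cpu"])] += 1
        longest = max(longest, int(m["secs"]))
    return by_comm, by_cpu, longest


if __name__ == "__main__":
    sample = [
        "Aug 29 17:35:47 nwsc-xen-Q45 kernel: [ 2698.560009] BUG: soft lockup - CPU#0 stuck for 146s! [events/0:9]",
        "Aug 29 22:57:27 nwsc-xen-Q45 kernel: [ 4198.404353] BUG: soft lockup - CPU#0 stuck for 122s! [md1_raid5:1243]",
    ]
    by_comm, by_cpu, longest = summarize_lockups(sample)
    print(by_comm.most_common(), dict(by_cpu), longest)
```

Passing the lines of a syslog file to summarize_lockups gives a quick view of whether one process or one CPU dominates; with the logs quoted above, the stalls are spread over many different processes on both CPUs.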


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
