xen-users
Re: [Xen-users] Re: CPU soft lockup XEN 4.1rc (Solved)
Ian,
yes, it does. Previously the DomU would usually crash after about 4-20 GB of
heavy IO. With the changed configuration (see below) I was able to transfer
more than 1 TB of data, and it has yet to crash.
My guess is that the clock time somehow gets pushed to some (marginal?)
value, and that is what causes the lockup.
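For anyone chasing the same symptom, a quick way to see which clocksource a
guest kernel is currently using is the standard Linux sysfs interface (these
paths are not from this thread, just the usual kernel interface):

```shell
# Inspect the kernel's active clocksource via sysfs (standard Linux path;
# falls back to a message if the interface is missing, e.g. on old kernels).
cs_dir=/sys/devices/system/clocksource/clocksource0
if [ -r "$cs_dir/current_clocksource" ]; then
    current=$(cat "$cs_dir/current_clocksource")
else
    current="sysfs clocksource interface not available"
fi
echo "current clocksource: $current"
```

Comparing this value in Dom0 and the DomU before and after the change should
show whether the `clocksource=jiffies` override actually took effect.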
Thanks a lot to Marco Marongiu for the detailed and well written post.
Marc
On 9/2/2011 5:57 AM, Ian Tobin wrote:
> Hi,
>
> Are you saying this one worked?
>
> # in /etc/xen/*.conf
> extra="clocksource=jiffies"
>
> we have the same issue with one of our DomUs (CentOS)
>
> thanks
>
> Ian
>
>
>
> -----Original Message-----
> From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
> [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Matthias
> Bannach
> Sent: 02 September 2011 02:12
> To: mbrown@xxxxxxxxxxxxxxxxxxxxxxxxx
> Cc: xen-users@xxxxxxxxxxxxxxxxxxx
> Subject: [Xen-users] Re: CPU soft lockup XEN 4.1rc (Solved)
>
> All,
>
> Ha - finally - solved! I guess Google is not the answer; searching the
> mailing list is. After much frustration I found the following:
>
> http://wiki.debian.org/Xen#A.27clocksource.2BAC8-0.3ATimewentbackwards.27
>
> based on a post by Marco Marongiu
>
> http://my.opera.com/marcomarongiu/blog/2010/08/18/debugging-ntp-again-part-4-and-last
>
> For me lockup solution #2 worked:
>
> # DomU and Dom0
> # in /etc/sysctl.conf
> clocksource=jiffies
> independent_wallclock=0
> # then sysctl -p
>
> # in /etc/xen/*.conf
> extra="clocksource=jiffies"
>
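A hedged sketch of how one might verify these settings after rebooting the
DomU. The /proc/cmdline check is a standard Linux interface; the sysctl key
name `xen.independent_wallclock` is how Xen-patched kernels of that era
usually exposed it, which is an assumption and may not match every setup:

```shell
# Check whether the guest actually booted with the clocksource override.
cmdline=$(cat /proc/cmdline 2>/dev/null || echo "")
case "$cmdline" in
    *clocksource=jiffies*) verdict="clocksource=jiffies is on the kernel command line" ;;
    *)                     verdict="clocksource=jiffies NOT found on the kernel command line" ;;
esac
echo "$verdict"

# The wallclock sysctl only exists on Xen-patched kernels, so a failure
# here is expected on a plain (non-Xen) kernel.
sysctl xen.independent_wallclock 2>/dev/null \
    || echo "xen.independent_wallclock not available on this kernel"
```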
> And voila - no more lockups. It was nothing to do with the motherboards
> (which I already suspected were not the cause, given that non-Xen
> configurations ran fine).
>
> Not sure if this is a kernel or XEN problem though.
>
> Hope this helps others
>
> On 8/31/2011 2:42 PM, Mark Brown wrote:
>> Hello,
>>
>> Like others, I see freeze-ups on the system, consistently under high
>> IO load. Merely running (even multiple) XenU guests does not trigger
>> it, but I can consistently force the situation to occur.
>>
>> Running 4 dd processes, each dumping 20 GB onto an LVM/mdadm software
>> RAID5 volume, consistently triggers the crash in a DomU. Running
>> without Xen I do not see the problem at all; even after about 3 TB of
>> read/write, nothing happened.
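The reproduction described above can be sketched as a small load-generator
script. TARGET_DIR is a hypothetical placeholder (point it at the LVM/mdadm
RAID5 volume under test), and SIZE_MB is scaled far down from the 20 GB per
process used in the original runs:

```shell
# Parallel-dd load generator: 4 concurrent writers, as in the report.
# TARGET_DIR and SIZE_MB are illustrative defaults, not from the thread;
# the original test used 20 GB (SIZE_MB=20480) per process.
TARGET_DIR=${TARGET_DIR:-/tmp}
SIZE_MB=${SIZE_MB:-1}
for i in 1 2 3 4; do
    dd if=/dev/zero of="$TARGET_DIR/ddtest.$i" bs=1M count="$SIZE_MB" 2>/dev/null &
done
wait   # block until all four writers finish
written=$(ls "$TARGET_DIR"/ddtest.* 2>/dev/null | wc -l)
echo "finished: $written files written"
rm -f "$TARGET_DIR"/ddtest.*
```

On an affected host, watching the console (or `dmesg -w`) while this runs is
where the soft-lockup messages below would appear.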
>>
>> Any suggestion would be very welcome.
>>
>> Marc
>>
>> [ .. more .. ]
>> It is very unpredictable when it actually occurs; here are a few
>> examples. Kind of odd that on Aug 29th it always happened on the
>> same second ;-{.
>>
>>> syslog.2:Aug 29 17:35:47 nwsc-xen-Q45 kernel: [ 2698.560009] BUG: soft lockup - CPU#0 stuck for 146s! [events/0:9]
>>> syslog.2:Aug 29 17:35:47 nwsc-xen-Q45 kernel: [ 2698.561016] BUG: soft lockup - CPU#1 stuck for 146s! [rsyslogd:2024]
>>> syslog.2:Aug 29 22:57:27 nwsc-xen-Q45 kernel: [ 4198.404353] BUG: soft lockup - CPU#0 stuck for 122s! [md1_raid5:1243]
>>> syslog.2:Aug 29 23:07:27 nwsc-xen-Q45 kernel: [ 4798.336110] BUG: soft lockup - CPU#0 stuck for 101s! [xend:2583]
>>> syslog.2:Aug 29 23:07:27 nwsc-xen-Q45 kernel: [ 4798.337007] BUG: soft lockup - CPU#1 stuck for 101s! [bdi-default:19]
>>> syslog.2:Aug 29 23:12:27 nwsc-xen-Q45 kernel: [ 5098.304013] BUG: soft lockup - CPU#0 stuck for 136s! [blkback.5.xvdd1:7226]
>>> syslog.2:Aug 29 23:12:27 nwsc-xen-Q45 kernel: [ 5098.305010] BUG: soft lockup - CPU#1 stuck for 136s! [sh:7262]
>>> syslog.6:Aug 17 12:07:08 nwsc-xen-Q45 kernel: [ 2998.596016] BUG: soft lockup - CPU#0 stuck for 73s! [xend:2506]
>>> syslog.6:Aug 17 12:07:08 nwsc-xen-Q45 kernel: [ 2998.597555] BUG: soft lockup - CPU#1 stuck for 73s! [md0_raid5:598]
>>> syslog.6:Aug 17 12:17:08 nwsc-xen-Q45 kernel: [ 3598.534068] BUG: soft lockup - CPU#1 stuck for 150s! [xend:2506]
>>
>> It does not appear to relate to a specific process. (Those above are
>> from Xen 4.0.1 with Debian 2.6.32-5-xen-amd64.)
>>
>> This one is with Xen 4.1.2-rc2-pre/Debian 2.6.32-5-xen-amd64. Both
>> are on an Intel DQ45CB board with 4 GB RAM.
>>
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348062] BUG: soft lockup - CPU#0 stuck for 79s! [xend:2767]
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348073] Modules linked in: xt_tcpudp xt_physdev iptable_filter ip_tables x_tables ext4 jbd2 crc16 sata_sil24 hid_apple sky2 via_velocity crc_ccitt usb_storage raid456 md_mod async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx dm_mod ext3 jbd mbcache firewire_sbp2 loop sr_mod cdrom sg xenfs xen_evtchn bridge stp 3w_9xxx usbhid hid sd_mod crc_t10dif snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device firewire_ohci psmouse i2c_i801 video firewire_core uhci_hcd ata_piix snd crc_itu_t output serio_raw evdev ahci pcspkr ehci_hcd i2c_core usbcore nls_base e1000e button ata_generic soundcore snd_page_alloc libata thermal scsi_mod processor thermal_sys acpi_processor
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348219] CPU 0:
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348222] Modules linked in: xt_tcpudp xt_physdev iptable_filter ip_tables x_tables ext4 jbd2 crc16 sata_sil24 hid_apple sky2 via_velocity crc_ccitt usb_storage raid456 md_mod async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx dm_mod ext3 jbd mbcache firewire_sbp2 loop sr_mod cdrom sg xenfs xen_evtchn bridge stp 3w_9xxx usbhid hid sd_mod crc_t10dif snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device firewire_ohci psmouse i2c_i801 video firewire_core uhci_hcd ata_piix snd crc_itu_t output serio_raw evdev ahci pcspkr ehci_hcd i2c_core usbcore nls_base e1000e button ata_generic soundcore snd_page_alloc libata thermal scsi_mod processor thermal_sys acpi_processor
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348318] Pid: 2767, comm: xend Not tainted 2.6.32-5-xen-amd64 #1
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348322] RIP: e033:[<00007fa4064c0289>] [<00007fa4064c0289>] 0x7fa4064c0289
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348330] RSP: e02b:00007fa402ee54a0 EFLAGS: 00000206
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348334] RAX: 0000000001c3a320 RBX: 0000000001f8ace0 RCX: 00007fa40650f844
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348338] RDX: ffffffffffffffe0 RSI: 0000000000000000 RDI: 00007fa4067a9e40
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348341] RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000001
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348345] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fa4067a9e40
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348349] R13: 00007fa402ee555c R14: 00007fa402ee5548 R15: 00000000ffffffff
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348356] FS: 00007fa402ee6700(0000) GS:ffff880002995000(0000) knlGS:0000000000000000
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348360] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348363] CR2: 00007fb2ed832e28 CR3: 00000000bba8e000 CR4: 0000000000002660
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348367] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348375] Call Trace:
>>>
>>> Aug 31 13:07:51 nwsc-xen-Q45 init: Id "T1" respawning too fast: disabled for 5 minutes
>>
>
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users