WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Re: [Xen-devel] Prepping a bugfix push

To: Ian.Campbell@xxxxxxxxxxxxx, konrad.wilk@xxxxxxxxxx, pbonzini@xxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] Prepping a bugfix push
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Thu, 03 Dec 2009 17:10:42 -0800
Cc:
Delivery-date: Thu, 03 Dec 2009 17:11:11 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20091204002406.GB5897@xxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4B1810DF.40309@xxxxxxxx> <20091203193540.GB4228@xxxxxxxxxxxxxxxxx> <4B184830.7070107@xxxxxxxx> <20091204002406.GB5897@xxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091125 Fedora/3.0-3.12.rc1.fc12 Lightning/1.0pre Thunderbird/3.0
On 12/03/09 16:24, Brendan Cully wrote:
> On Thursday, 03 December 2009 at 15:22, Jeremy Fitzhardinge wrote:
>> On 12/03/09 11:35, Brendan Cully wrote:
>>> Not a patch, but I've just tried out xm save -c again with the latest
>>> xen changes, and while I no longer see the grant table version panic,
>>> the guest's devices (aside from the console) appear to be wedged on
>>> resume. Is anyone else seeing this?
>>>
>>> After a while on the console I see messages like this:
>>>
>>> INFO: task syslogd:2219 blocked for more than 120 seconds.
>>>
>>> which I assume is trouble with the block device.
>> Is there any backtrace or other info associated with this?
> I take it you couldn't reproduce? Let me know what you want
> captured. In the meantime, here's the kernel log:

Nope, works for me :) Ian has been doing testing with hundreds or
thousands of suspend/resumes, so it *can* be reliable.

That said, I am seeing some odd hangs after suspend/resume with the
bugfix branch merged into Linux mainline...

Can you ping the domain when it's in that state?

    J
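The kind of suspend/resume stress loop mentioned above could be sketched roughly as follows. This is a minimal sketch, not anyone's actual test harness: the domain name `p32` (taken from the hostname in the log below), the iteration count, the checkpoint path, and pinging the guest by its domain name are all assumptions; `xm save -c` and `xm sysrq <dom> w` are the commands already used in this thread.

```shell
#!/bin/sh
# Sketch: repeatedly checkpoint-and-resume a Xen guest from dom0 and
# check that it still responds afterwards. Hypothetical parameters.
stress_test() {
    domain="$1"
    count="$2"

    # Not on a Xen dom0 (no xm toolstack): report and bail out.
    if ! command -v xm >/dev/null 2>&1; then
        echo "xm not found: run this on a Xen dom0"
        return 1
    fi

    i=1
    while [ "$i" -le "$count" ]; do
        # -c (checkpoint) saves the domain and resumes it immediately.
        xm save -c "$domain" "/tmp/$domain.chk" || {
            echo "save failed at iteration $i"
            return 1
        }

        # A wedged guest (devices dead after resume, as reported above)
        # typically stops answering on the network.
        if ! ping -c 1 -W 5 "$domain" >/dev/null 2>&1; then
            echo "domain $domain unreachable after resume (iteration $i)"
            # Ask the guest kernel for blocked-task backtraces.
            xm sysrq "$domain" w
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $count suspend/resume cycles for $domain"
}

stress_test p32 100 || true
```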

Dec  4 00:16:12 p32 kernel: suspending xenstore...
Dec  4 00:16:12 p32 kernel: Grant tables using version 2 layout.

Here's a backtrace after the block message:

INFO: task kjournald:880 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald     D 00000000     0   880      2 0x00000000
  cfb6be70 00000246 cf8fa000 00000000 00000002 00000000 c14972a0 00000001
  cf8fa000 c152f380 c152f380 c152f380 cf8fa1fc cf850b10 cf850c9c d0808380
  c1924584 00000000 cfb6be70 c1006f04 c1920284 c1924588 c1920284 cfb49854
Call Trace:
  [<c1006f04>] ? check_events+0x8/0xc
  [<c137b0a8>] io_schedule+0x1e/0x28
  [<c10c2555>] sync_buffer+0x33/0x37
  [<c137b3a1>] __wait_on_bit+0x45/0x62
  [<c10c2522>] ? sync_buffer+0x0/0x37
  [<c137b41d>] out_of_line_wait_on_bit+0x5f/0x72
  [<c10c2522>] ? sync_buffer+0x0/0x37
  [<c1049b7e>] ? wake_bit_function+0x0/0x47
  [<c10c24e5>] __wait_on_buffer+0x23/0x25
  [<c1102940>] journal_commit_transaction+0x50b/0xdeb
  [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
  [<c103fad4>] ? try_to_del_timer_sync+0x49/0x52
  [<c1105b66>] kjournald+0xab/0x1f0
  [<c1049b46>] ? autoremove_wake_function+0x0/0x38
  [<c1105abb>] ? kjournald+0x0/0x1f0
  [<c1049a85>] kthread+0x74/0x7f
  [<c1049a11>] ? kthread+0x0/0x7f
  [<c1009b37>] kernel_thread_helper+0x7/0x10

and here's the result of xm sysrq <dom> l
SysRq : Show backtrace of all active CPUs

and xm sysrq <dom> w

SysRq : Show Blocked State
   task                PC stack   pid father
pdflush       D cf40297c     0   150      2 0x00000000
  cf95dec8 00000246 00000000 cf40297c c1550280 c1553680 c10067df c1550280
  00000000 c152f380 c152f380 c152f380 c1550280 cf8b3130 cf8b32bc d0808380
  c1553680 ffff995d cf95dec8 c1006f04 c1550280 c1553a6c c1550280 c1006efb
Call Trace:
  [<c10067df>] ? xen_force_evtchn_callback+0xf/0x14
  [<c1006f04>] ? check_events+0x8/0xc
  [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c137b1cb>] schedule_timeout+0xd6/0x13e
  [<c10067df>] ? xen_force_evtchn_callback+0xf/0x14
  [<c103fc07>] ? process_timeout+0x0/0xa
  [<c137a8ea>] io_schedule_timeout+0x1e/0x28
  [<c108e9a8>] congestion_wait+0x51/0x66
  [<c1049b46>] ? autoremove_wake_function+0x0/0x38
  [<c1083ef3>] wb_kupdate+0xb3/0xfa
  [<c10842d5>] ? pdflush+0x0/0x205
  [<c10843d5>] pdflush+0x100/0x205
  [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
  [<c1083e40>] ? wb_kupdate+0x0/0xfa
  [<c1049a85>] kthread+0x74/0x7f
  [<c1049a11>] ? kthread+0x0/0x7f
  [<c1009b37>] kernel_thread_helper+0x7/0x10
kjournald     D 00000000     0   880      2 0x00000000
  cfb6be70 00000246 cf8fa000 00000000 00000002 00000000 c14972a0 00000001
  cf8fa000 c152f380 c152f380 c152f380 cf8fa1fc cf850b10 cf850c9c d0808380
  c1924584 00000000 cfb6be70 c1006f04 c1920284 c1924588 c1920284 cfb49854
Call Trace:
  [<c1006f04>] ? check_events+0x8/0xc
  [<c137b0a8>] io_schedule+0x1e/0x28
  [<c10c2555>] sync_buffer+0x33/0x37
  [<c137b3a1>] __wait_on_bit+0x45/0x62
  [<c10c2522>] ? sync_buffer+0x0/0x37
  [<c137b41d>] out_of_line_wait_on_bit+0x5f/0x72
  [<c10c2522>] ? sync_buffer+0x0/0x37
  [<c1049b7e>] ? wake_bit_function+0x0/0x47
  [<c10c24e5>] __wait_on_buffer+0x23/0x25
  [<c1102940>] journal_commit_transaction+0x50b/0xdeb
  [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
  [<c103fad4>] ? try_to_del_timer_sync+0x49/0x52
  [<c1105b66>] kjournald+0xab/0x1f0
  [<c1049b46>] ? autoremove_wake_function+0x0/0x38
  [<c1105abb>] ? kjournald+0x0/0x1f0
  [<c1049a85>] kthread+0x74/0x7f
  [<c1049a11>] ? kthread+0x0/0x7f
  [<c1009b37>] kernel_thread_helper+0x7/0x10
syslogd       D 00000000     0  2219      1 0x00000000
  cfb33e40 00000286 00000000 00000000 0000aa70 00000000 c10067df cfb7026c
  00000001 c152f380 c152f380 c152f380 00000000 cfa2a130 cfa2a2bc d0808380
  cfb33e64 cfb70254 c1006efb c137c4f4 c10282ff cfb33e58 cfb33e40 c1049d05
Call Trace:
  [<c10067df>] ? xen_force_evtchn_callback+0xf/0x14
  [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
  [<c10282ff>] ? __wake_up+0x3a/0x42
  [<c1049d05>] ? prepare_to_wait+0x41/0x46
  [<c110535f>] log_wait_commit+0x87/0xef
  [<c1049b46>] ? autoremove_wake_function+0x0/0x38
  [<c1100ca8>] journal_stop+0x1dc/0x29f
  [<c110156d>] ? journal_start+0x88/0xb2
  [<c11015b4>] journal_force_commit+0x1d/0x1f
  [<c10f2d8a>] ext3_force_commit+0x1c/0x22
  [<c10ed096>] ext3_write_inode+0x2d/0x3b
  [<c10bd21f>] writeback_single_inode+0x1f5/0x2be
  [<c108326f>] ? do_writepages+0x37/0x39
  [<c10bdaa1>] sync_inode+0x1f/0x37
  [<c10e9409>] ext3_sync_file+0x99/0xb0
  [<c10c06fc>] vfs_fsync+0x73/0xb1
  [<c10c075c>] do_fsync+0x22/0x32
  [<c10c078b>] sys_fsync+0xd/0xf
  [<c1009058>] sysenter_do_call+0x12/0x2c
getty         D cf402840     0  2330      1 0x00000004
  cfb23cc0 00000286 00000000 cf402840 00000000 00000000 00000000 00000000
  00000000 c152f380 c152f380 c152f380 00000000 cf8740f0 cf87427c d0808380
  cfb23d14 c19202c8 c1006efb c137c4f4 00000002 cfb23d08 cfb23cc0 c1049d05
Call Trace:
  [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
  [<c1049d05>] ? prepare_to_wait+0x41/0x46
  [<c1101016>] do_get_write_access+0x2ab/0x3cd
  [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c1049b7e>] ? wake_bit_function+0x0/0x47
  [<c1101153>] journal_get_write_access+0x1b/0x2a
  [<c10f72c2>] __ext3_journal_get_write_access+0x19/0x3f
  [<c10eab31>] ext3_reserve_inode_write+0x34/0x6a
  [<c10eab7e>] ext3_mark_inode_dirty+0x17/0x2e
  [<c10eda05>] ext3_dirty_inode+0x6b/0x6d
  [<c107ce9c>] ? file_remove_suid+0x1b/0x59
  [<c10bd91d>] __mark_inode_dirty+0x23/0xd2
  [<c10b6618>] file_update_time+0x76/0xb3
  [<c107e0b9>] __generic_file_aio_write_nolock+0x235/0x4b3
  [<c107f0e5>] ? filemap_fault+0x76/0x3b1
  [<c107e5a5>] generic_file_aio_write+0x5d/0xd1
  [<c10e9267>] ext3_file_write+0x27/0xa5
  [<c10a527a>] do_sync_write+0xcd/0x103
  [<c1049b46>] ? autoremove_wake_function+0x0/0x38
  [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c10d2c2c>] ? locks_free_lock+0x31/0x40
  [<c10d4103>] ? fcntl_setlk+0x3e/0x185
  [<c10a596e>] vfs_write+0x8b/0x13f
  [<c137e474>] ? do_page_fault+0x1b4/0x3a0
  [<c10a51ad>] ? do_sync_write+0x0/0x103
  [<c10a5f1d>] sys_write+0x3d/0x64
  [<c1009058>] sysenter_do_call+0x12/0x2c
Sched Debug Version: v0.09, 2.6.31.6 #8
now at 194118.700060 msecs
   .jiffies                                 : 4294940775
   .sysctl_sched_latency                    : 20.000000
   .sysctl_sched_min_granularity            : 4.000000
   .sysctl_sched_wakeup_granularity         : 5.000000
   .sysctl_sched_child_runs_first           : 0.000001
   .sysctl_sched_features                   : 113917

cpu#0, 2500.152 MHz
   .nr_running                    : 1
   .load                          : 3121
   .nr_switches                   : 17143
   .nr_load_updates               : 48354
   .nr_uninterruptible            : 4
   .next_balance                  : 4294.940896
   .curr->pid                     : 14
   .clock                         : 193919.954084
   .cpu_load[0]                   : 0
   .cpu_load[1]                   : 0
   .cpu_load[2]                   : 0
   .cpu_load[3]                   : 0
   .cpu_load[4]                   : 0

cfs_rq[0]:
   .exec_clock                    : 0.000000
   .MIN_vruntime                  : 0.000001
   .min_vruntime                  : 13296.631519
   .max_vruntime                  : 0.000001
   .spread                        : 0.000000
   .spread0                       : 0.000000
   .nr_running                    : 1
   .load                          : 3121
   .nr_spread_over                : 0

rt_rq[0]:
   .rt_nr_running                 : 0
   .rt_throttled                  : 0
   .rt_time                       : 0.000000
   .rt_runtime                    : 950.000000

runnable tasks:
             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
R       xenwatch    14     13276.942024       146   115               0               0               0.000000               0.000000               0.000000



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel