xen-devel
Re: [Xen-devel] Prepping a bugfix push
To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: Re: [Xen-devel] Prepping a bugfix push
From: Brendan Cully <brendan@xxxxxxxxx>
Date: Thu, 3 Dec 2009 11:35:40 -0800
Cc: Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx>, Paolo Bonzini <pbonzini@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Delivery-date: Thu, 03 Dec 2009 11:36:05 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4B1810DF.40309@xxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Mail-followup-to: jeremy@xxxxxxxx, Ian.Campbell@xxxxxxxxxxxxx, konrad.wilk@xxxxxxxxxx, pbonzini@xxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx
References: <4B1810DF.40309@xxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.20 (2009-10-28)

Not a patch, but I've just tried out xm save -c again with the latest
xen changes, and while I no longer see the grant table version panic,
the guest's devices (aside from the console) appear to be wedged on
resume. Is anyone else seeing this?
After a while on the console I see messages like this:
INFO: task syslogd:2219 blocked for more than 120 seconds.
which I assume is trouble with the block device.
On Thursday, 03 December 2009 at 11:26, Jeremy Fitzhardinge wrote:
> I'm preparing a general bugfix push for Linus, targeted at both current
> linux-2.6.git and stable. The list of patches I have lined up (in the
> "bugfix" branch) is below. Is there anything I've overlooked? Are
> there any patches I've forgotten to apply altogether?
>
> (Note, this is all domU stuff; dom0 things will need to mature a bit.)
>
> Thanks,
> J
>
> commit b4606f2165153833247823e8c04c5e88cb3d298b
> Author: Ian Campbell <ian.campbell@xxxxxxxxxx>
> Date: Tue Dec 1 11:47:15 2009 +0000
>
> xen: explicitly create/destroy stop_machine workqueues outside
> suspend/resume region.
>
> I have observed cases where the implicit stop_machine_destroy() done by
> stop_machine() hangs while destroying the workqueues, specifically in
> kthread_stop(). This seems to be because timer ticks are not restarted
> until after stop_machine() returns.
>
> Fortunately stop_machine provides a facility to pre-create/post-destroy
> the workqueues so use this to ensure that workqueues are only destroyed
> after everything is really up and running again.
>
> I only actually observed this failure with 2.6.30. It seems that newer
> kernels are somehow more robust against doing kthread_stop() without timer
> interrupts (I tried some backports of some likely looking candidates but
> did not track down the commit which added this robustness). However this
> change seems like a reasonable belt&braces thing to do.
>
> Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Cc: Stable Kernel <stable@xxxxxxxxxx>
>
> commit 65f63384b391bf4d384327d8a7c6de9860290b5c
> Author: Ian Campbell <ian.campbell@xxxxxxxxxx>
> Date: Tue Dec 1 11:47:14 2009 +0000
>
> xen: improve error handling in do_suspend.
>
> The existing error handling has a few issues:
> - If freeze_processes() fails it exits with shutting_down =
> SHUTDOWN_SUSPEND.
> - If dpm_suspend_noirq() fails it exits without resuming xenbus.
> - If stop_machine() fails it exits without resuming xenbus or calling
> dpm_resume_end().
> - xs_suspend()/xs_resume() and dpm_suspend_noirq()/dpm_resume_noirq()
> were not nested in the obvious way.
>
> Fix by ensuring each failure case goto's the correct label. Treat a
> failure of stop_machine() as a cancelled suspend in order to follow the
> correct resume path.
>
> Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Cc: Stable Kernel <stable@xxxxxxxxxx>
>
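A minimal userspace sketch of the nested unwind this commit message describes, written in Python rather than kernel C. The step order and the `thaw_processes` undo step are assumptions for illustration and are not taken from the patch; the point is only that each failure unwinds exactly the steps already completed, in reverse order.

```python
def do_suspend_model(fail_at=None):
    """Return the calls made, in order, when the step named `fail_at` fails.

    Userspace model only; the step order is an assumption for illustration
    and is not copied from the kernel's do_suspend().
    """
    # (suspend-side call, matching resume-side call), outermost pair first.
    steps = [
        ("freeze_processes", "thaw_processes"),
        ("xs_suspend", "xs_resume"),
        ("dpm_suspend_noirq", "dpm_resume_noirq"),
        ("stop_machine", None),  # nothing of its own to undo in this model
    ]
    calls = []
    undo_stack = []
    for name, undo in steps:
        calls.append(name)
        if name == fail_at:
            break  # models "goto the label for everything done so far"
        if undo is not None:
            undo_stack.append(undo)
    # Unwind strictly in reverse, so the suspend/resume pairs stay nested.
    calls.extend(reversed(undo_stack))
    return calls
```

In this model a failing `stop_machine` takes the same resume path as a successful run, which is the spirit of treating that failure as a cancelled suspend.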
> commit fed5ea87e02aaf902ff38c65b4514233db03dc09
> Author: Ian Campbell <ian.campbell@xxxxxxxxxx>
> Date: Tue Dec 1 16:15:30 2009 +0000
>
> xen: don't leak IRQs over suspend/resume.
>
> On resume irq_info[*].evtchn is reset to 0 since event channel mappings
> are not preserved over suspend/resume. The other contents of irq_info
> are preserved to allow rebind_evtchn_irq() to function.
>
> However when a device resumes it will try to unbind from the
> previous IRQ (e.g. blkfront goes blkfront_resume() -> blkif_free() ->
> unbind_from_irqhandler() -> unbind_from_irq()). This will fail due to the
> check for VALID_EVTCHN in unbind_from_irq() and the IRQ is leaked. The
> device will then continue to resume and allocate a new IRQ, eventually
> leading to find_unbound_irq() panic()ing.
>
> Fix this by changing unbind_from_irq() to handle teardown of interrupts
> which have type!=IRQT_UNBOUND but are not currently bound to a specific
> event channel.
>
> Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Cc: Stable Kernel <stable@xxxxxxxxxx>
>
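The leak described above can be reproduced with a small userspace model. This is a sketch in Python, not the kernel code: the four-entry table, the dictionary layout, and the helper signatures are assumptions made for illustration, and only the two validity checks mirror what the commit message says.

```python
IRQT_UNBOUND, IRQT_EVTCHN = 0, 1
NR_IRQS = 4  # tiny table (assumed size) so the leak shows up quickly


def new_table():
    return [{"type": IRQT_UNBOUND, "evtchn": 0} for _ in range(NR_IRQS)]


def find_unbound_irq(table):
    for irq, info in enumerate(table):
        if info["type"] == IRQT_UNBOUND:
            return irq
    raise RuntimeError("no free IRQ")  # the kernel panic()s at this point


def bind_evtchn_to_irq(table, evtchn):
    irq = find_unbound_irq(table)
    table[irq] = {"type": IRQT_EVTCHN, "evtchn": evtchn}
    return irq


def unbind_old(table, irq):
    # Pre-fix check: only tear down if the event channel is still valid.
    if table[irq]["evtchn"] != 0:
        table[irq] = {"type": IRQT_UNBOUND, "evtchn": 0}


def unbind_fixed(table, irq):
    # Post-fix check: tear down anything whose type is not IRQT_UNBOUND.
    if table[irq]["type"] != IRQT_UNBOUND:
        table[irq] = {"type": IRQT_UNBOUND, "evtchn": 0}


def irqs_in_use_after(unbind, cycles):
    table = new_table()
    irq = bind_evtchn_to_irq(table, evtchn=1)
    for _ in range(cycles):
        for info in table:
            info["evtchn"] = 0  # mappings are lost across suspend/resume
        unbind(table, irq)  # device resume drops its old IRQ...
        irq = bind_evtchn_to_irq(table, evtchn=1)  # ...and binds a new one
    return sum(info["type"] != IRQT_UNBOUND for info in table)
```

With `unbind_old` each resume cycle consumes one more table entry until `find_unbound_irq` fails; with `unbind_fixed` the count stays at one however many cycles run.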
> commit f6eafe3665bcc374c66775d58312d1c06c55303f
> Author: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
> Date: Wed Nov 25 14:12:08 2009 +0000
>
> xen: call clock resume notifier on all CPUs
>
> tick_resume() is never called on secondary processors. Presumably this
> is because they are offlined for suspend on native and so this is
> normally taken care of in the CPU onlining path. Under Xen we keep all
> CPUs online over a suspend.
>
> This patch papers over the issue for me but I will investigate a more
> generic, less hacky, way of doing the same.
>
> tick_suspend is also only called on the boot CPU which I presume should
> be fixed too.
>
> Signed-off-by: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Cc: Stable Kernel <stable@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>
> commit 6aaf5d633bb6cead81b396d861d7bae4b9a0ba7e
> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Date: Wed Nov 25 13:15:38 2009 -0800
>
> xen: use iret for return from 64b kernel to 32b usermode
>
> If Xen wants to return to a 32b usermode with sysret it must use the
> right form. When using VGCF_in_syscall to trigger this, it looks at
> the code segment and does a 32b sysret if it is FLAT_USER_CS32.
> However, this is different from __USER32_CS, so it fails to return
> properly if we use the normal Linux segment.
>
> So avoid the whole mess by dropping VGCF_in_syscall and simply use
> plain iret to return to usermode.
>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Acked-by: Jan Beulich <jbeulich@xxxxxxxxxx>
> Cc: Stable Kernel <stable@xxxxxxxxxx>
>
> commit 922cc38ab71d1360978e65207e4a4f4988987127
> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Date: Tue Nov 24 09:58:49 2009 -0800
>
> xen: don't call dpm_resume_noirq() with interrupts disabled.
>
> dpm_resume_noirq() takes a mutex, so it can't be called from a
> no-interrupt context. Don't call it from within the stop-machine
> function, but just afterwards, since we're resuming anyway, regardless
> of what happened.
>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Cc: Stable Kernel <stable@xxxxxxxxxx>
>
> commit 499d19b82b586aef18727b9ae1437f8f37b66e91
> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Date: Tue Nov 24 09:38:25 2009 -0800
>
> xen: register runstate info for boot CPU early
>
> printk timestamping uses sched_clock, which in turn relies on runstate
> info under Xen. So make sure we set it up before any printks can
> be called.
>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Cc: Stable Kernel <stable@xxxxxxxxxx>
>
> commit 028896721ac04f6fa0697f3ecac3f98761746363
> Author: Ian Campbell <ian.campbell@xxxxxxxxxx>
> Date: Tue Nov 24 09:32:48 2009 -0800
>
> xen: register runstate on secondary CPUs
>
> The commit "xen: re-register runstate area earlier on resume" caused us
> to never try to set up the runstate area for secondary CPUs. Ensure that
> we do this...
>
> Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Cc: Stable Kernel <stable@xxxxxxxxxx>
>
> commit f350c7922faad3397c98c81a9e5658f5a1ef0214
> Author: Ian Campbell <ian.campbell@xxxxxxxxxx>
> Date: Tue Nov 24 10:16:23 2009 +0000
>
> xen: register timer interrupt with IRQF_TIMER
>
> Otherwise the timer is disabled by dpm_suspend_noirq(), which in turn
> prevents correct operation of stop_machine on multi-processor systems
> and breaks suspend.
>
> Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Cc: Stable Kernel <stable@xxxxxxxxxx>
>
> commit fa24ba62ea2869308ffc9f0b286ac9650b4ca6cb
> Author: Ian Campbell <ian.campbell@xxxxxxxxxx>
> Date: Sat Nov 21 11:32:49 2009 +0000
>
> xen: correctly restore pfn_to_mfn_list_list after resume
>
> pvops kernels >= 2.6.30 can currently only be saved and restored once. The
> second attempt to save results in:
>
> ERROR Internal error: Frame# in pfn-to-mfn frame list is not in pseudophys
> ERROR Internal error: entry 0: p2m_frame_list[0] is 0xf2c2c2c2, max 0x120000
> ERROR Internal error: Failed to map/save the p2m frame list
>
> I finally narrowed it down to:
>
> commit cdaead6b4e657f960d6d6f9f380e7dfeedc6a09b
> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Date: Fri Feb 27 15:34:59 2009 -0800
>
> xen: split construction of p2m mfn tables from registration
>
> Build the p2m_mfn_list_list early with the rest of the p2m table, but
> register it later when the real shared_info structure is in place.
>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
>
> The unforeseen side-effect of this change was to cause the mfn list list
> to not be rebuilt on resume. Prior to this change it would have been
> rebuilt via xen_post_suspend() -> xen_setup_shared_info() ->
> xen_setup_mfn_list_list().
>
> Fix by explicitly calling xen_build_mfn_list_list() from
> xen_post_suspend().
>
> Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Cc: Stable Kernel <stable@xxxxxxxxxx>
>
> commit 3905bb2aa7bb801b31946b37a4635ebac4009051
> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Date: Sat Nov 21 08:46:29 2009 +0800
>
> xen: restore runstate_info even if !have_vcpu_info_placement
>
> Even if have_vcpu_info_placement is not set, we still need to set up
> the runstate area on each resumed vcpu.
>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Cc: Stable Kernel <stable@xxxxxxxxxx>
>
> commit be012920ecba161ad20303a3f6d9e96c58cf97c7
> Author: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
> Date: Sat Nov 21 08:35:55 2009 +0800
>
> xen: re-register runstate area earlier on resume.
>
> This is necessary to ensure the runstate area is available to
> xen_sched_clock before any calls to printk which will require it in
> order to provide a timestamp.
>
> I chose to pull the xen_setup_runstate_info out of xen_time_init into
> the caller in order to maintain parity with calling
> xen_setup_runstate_info separately from calling xen_time_resume.
>
> Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Cc: Stable Kernel <stable@xxxxxxxxxx>
>
> commit ae7888012969355a548372e99b066d9e31153b62
> Author: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Date: Wed Jul 8 12:27:39 2009 +0200
>
> xen: wait up to 5 minutes for device connetion
>
> Increases the device timeout from 10s to 5 minutes, giving the user a
> visual indication during that time in case there are problems. The patch
> is a backport of changesets 144 and 150 in the Xenbits tree.
>
> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
>
> commit f8dc33088febc63286b7a60e6b678de8e064de8e
> Author: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Date: Wed Jul 8 12:27:38 2009 +0200
>
> xen: improvement to wait_for_devices()
>
> When printing a warning about a timed-out device, print the
> current state of both ends of the device connection (i.e., backend as
> well as frontend). This backports half of changeset 146 from the
> Xenbits tree.
>
> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
>
> commit c6e1971139be1342902873181f3b80a979bfb33b
> Author: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Date: Wed Jul 8 12:27:37 2009 +0200
>
> xen: fix is_disconnected_device/exists_disconnected_device
>
> The logic of is_disconnected_device/exists_disconnected_device is wrong
> in that they are used to test whether a device is trying to connect (i.e.
> connecting). For this reason the patch fixes them to not consider a
> Closing or Closed device to be connecting. At the same time the patch
> also renames the functions according to what they really do; you could
> say a closed device is "disconnected" (the old name), but not "connecting"
> (the new name).
>
> This patch is a backport of changeset 909 from the Xenbits tree.
>
> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
>
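A rough sketch of the difference between the two predicates, in Python rather than kernel C. The numeric values follow the XenbusState enumeration in Xen's public xenbus.h; the function names and the exact shape of both tests are assumptions, since the commit message describes the behaviour but does not quote the code.

```python
from enum import IntEnum


class XenbusState(IntEnum):
    Unknown = 0
    Initialising = 1
    InitWait = 2
    Initialised = 3
    Connected = 4
    Closing = 5
    Closed = 6


def is_disconnected_device(state):
    # Assumed shape of the old test: anything that is not Connected,
    # which wrongly includes Closing and Closed.
    return state != XenbusState.Connected


def is_device_connecting(state):
    # Assumed shape of the renamed test: still on the way up, i.e. not yet
    # Connected and therefore neither Closing nor Closed.
    return state < XenbusState.Connected
```

Under the old test a Closed device would keep the boot-time wait going until the timeout; under the new test it is ignored, while a device in InitWait is still waited for.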
> commit db05fed0ad72f264e39bcb366795f7367384ec92
> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Date: Tue Nov 24 16:41:47 2009 -0800
>
> xen/xenbus: make DEVICE_ATTR()s static
>
> They don't need to be global, and may cause linker clashes.
>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> Cc: Stable Kernel <stable@xxxxxxxxxx>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
>