xen-devel

To: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH] xen: use freeze/restore/thaw PM events for suspend/resume/chkpt
From: Shriram Rajagopalan <rshriram@xxxxxxxxx>
Date: Wed, 16 Feb 2011 10:15:23 -0800
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 16 Feb 2011 10:17:28 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1297856633.21980.6293.camel@xxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <1297839080-17533-1-git-send-email-rshriram@xxxxxxxxx> <1297856633.21980.6293.camel@xxxxxxxxxxxxxxxxxxxxxx>
Reply-to: rshriram@xxxxxxxxx
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
I didn't test the patch against the latest xen_suspend patch series you sent out, since I
couldn't find it in any of the trees. And since you said earlier that the xen_hvm_suspend
fix would be (re)fixed to PM_FREEZE after my patch, I refrained from touching it.
I did test with a 32-bit 2.6.38-rc1 kernel in PVHVM mode, and it seemed to work fine for
save/restore/checkpoint: I could see the PM event messages in dmesg (freeze, thaw and
restore, with the related timing stats).

On Wed, Feb 16, 2011 at 3:43 AM, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> On Wed, 2011-02-16 at 06:51 +0000, Shriram Rajagopalan wrote:
>> Use PM_FREEZE, PM_THAW and PM_RESTORE power events for
>> suspend/resume/checkpoint functionality, instead of PM_SUSPEND
>> and PM_RESUME. Use of these pm events fixes the Xen Guest hangup
>> when taking checkpoints. When a suspend event is cancelled
>> (while taking checkpoints once/continuously), we use PM_THAW
>> instead of PM_RESUME. PM_RESTORE is used when suspend is not
>> cancelled. See Documentation/power/devices.txt and linux/pm.h
>> for more info about freeze, thaw and restore. The sequence of
>> pm events in a suspend-resume scenario is shown below.
>>
>>         dpm_suspend_start(PMSG_FREEZE);
>>
>>                 dpm_suspend_noirq(PMSG_FREEZE);
>>
>>                        sysdev_suspend(PMSG_FREEZE);
>>                        cancelled = suspend_hypercall()
>>                        sysdev_resume();
>>
>>                dpm_resume_noirq(cancelled ? PMSG_THAW : PMSG_RESTORE);
>>
>>        dpm_resume_end(cancelled ? PMSG_THAW : PMSG_RESTORE);
>
> With this patch I get
>
> [   18.902808] PM: Device pcspkr failed to freeze: error -22
> [   18.902835] xen suspend: dpm_suspend_start -22
>
> apparently due to a lack of CONFIG_HIBERNATE which is a prerequisite for
> using the freeze methods (see pm_ops function).
>
> As I mentioned earlier I think some of the CONFIG_PM_SLEEP tests in
> drivers/xen/manage.c need to be adjusted for the new suspend scheme (and
> I suspect they are a little wrong for the old one too).
>
> Since CONFIG_HIBERNATE is a "suspend to disk" option I think this needs
> running past the core pm guys to determine the correct approach, it
> might be to make PMSG_FREEZE support enabled by some less specific
> configuration option.
>
> Enabling CONFIG_HIBERNATE does seem to be sufficient to make this work
> though.
>
> Ian.
>
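
For reference, the event sequence from the patch description quoted above can be
modelled in plain user-space C. This is only a sketch for illustration, not the
code in drivers/xen/manage.c: every model_* name is hypothetical, and the trace
is just a string listing the calls in the order the patch description gives them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for PMSG_FREEZE, PMSG_THAW and PMSG_RESTORE. */
enum model_msg { MODEL_FREEZE, MODEL_THAW, MODEL_RESTORE };

/* Printable name for a model event (illustration only). */
static const char *model_msg_name(enum model_msg m)
{
    switch (m) {
    case MODEL_FREEZE:  return "FREEZE";
    case MODEL_THAW:    return "THAW";
    case MODEL_RESTORE: return "RESTORE";
    }
    return "UNKNOWN";
}

/* THAW when the suspend was cancelled (checkpoint), RESTORE otherwise. */
enum model_msg model_resume_msg(int cancelled)
{
    return cancelled ? MODEL_THAW : MODEL_RESTORE;
}

/* Write the ordered call trace for one suspend cycle into buf and return buf. */
char *model_suspend_trace(int cancelled, char *buf, size_t len)
{
    const char *down = model_msg_name(MODEL_FREEZE);
    const char *up = model_msg_name(model_resume_msg(cancelled));

    assert(buf != NULL && len > 0);
    snprintf(buf, len,
             "dpm_suspend_start(%s) dpm_suspend_noirq(%s) sysdev_suspend(%s) "
             "suspend_hypercall sysdev_resume "
             "dpm_resume_noirq(%s) dpm_resume_end(%s)",
             down, down, down, up, up);
    return buf;
}
```

Note that sysdev_resume is the only step in the trace that carries no event.
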
On a related note, my initial kernel config had somehow enabled CONFIG_MICROCODE.
With a PV kernel (2.6.38-rc1), I got the following WARNING stack traces on
checkpoint and restore (i.e. freeze/thaw and freeze/restore):

Feb 16 06:02:35 rshriram-vm1 kernel: [  147.255561] PM: freeze of devices complete after 0.123 msecs
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.255603] PM: late freeze of devices complete after 0.035 msecs
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614] ------------[ cut here ]------------
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614] WARNING: at ...arch/x86/kernel/microcode_core.c:454 mc_sysdev_resume+0x30/0x5c()
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614] Modules linked in:
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614] Pid: 6, comm: migration/0 Not tainted 2.6.38-rc1-xenu #12
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614] Call Trace:
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff810417db>] ? warn_slowpath_common+0x80/0x98
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff8107c601>] ? cpu_stopper_thread+0x10d/0x172
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff81041808>] ? warn_slowpath_null+0x15/0x17
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff810276c5>] ? mc_sysdev_resume+0x30/0x5c
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff812294f9>] ? __sysdev_resume+0x74/0xc4
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff812295ae>] ? sysdev_resume+0x65/0xa6
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff81204736>] ? xen_suspend+0xc4/0xcb
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff8107c6f1>] ? stop_machine_cpu_stop+0x7d/0xb6
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff8107c674>] ? stop_machine_cpu_stop+0x0/0xb6
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff8107c5d7>] ? cpu_stopper_thread+0xe3/0x172
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff813ab106>] ? schedule+0x4e7/0x516
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff81006cf2>] ? check_events+0x12/0x20
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff81006cdf>] ? xen_restore_fl_direct_end+0x0/0x1
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff8107c4f4>] ? cpu_stopper_thread+0x0/0x172
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff81057438>] ? kthread+0x7d/0x85
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff8100b724>] ? kernel_thread_helper+0x4/0x10
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff8100ab36>] ? int_ret_from_sys_call+0x7/0x1b
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff813ac6a1>] ? retint_restore_args+0x5/0x6
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff8100b720>] ? kernel_thread_helper+0x0/0x10
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614] ---[ end trace 24fdc8979bd6c62e ]---
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256346] PM: early restore of devices complete after 0.047 msecs
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.270496] PM: restore of devices complete after 13.106 msecs
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.279878] Setting capacity to 41943040
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.293516] Setting capacity to 41943040
Feb 16 06:04:29 rshriram-vm1 init: hvc0 main process ended, respawning

Feb 16 06:15:30 rshriram-vm1 kernel: [  906.776082] PM: freeze of devices complete after 0.161 msecs
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.776127] PM: late freeze of devices complete after 0.037 msecs
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141] ------------[ cut here ]------------
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141] WARNING: at ...arch/x86/kernel/microcode_core.c:454 mc_sysdev_resume+0x30/0x5c()
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141] Modules linked in:
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141] Pid: 6, comm: migration/0 Tainted: G        W   2.6.38-rc1-xenu #12
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141] Call Trace:
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff810417db>] ? warn_slowpath_common+0x80/0x98
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff81006cdf>] ? xen_restore_fl_direct_end+0x0/0x1
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff8107c601>] ? cpu_stopper_thread+0x10d/0x172
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff81041808>] ? warn_slowpath_null+0x15/0x17
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff810276c5>] ? mc_sysdev_resume+0x30/0x5c
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff812294f9>] ? __sysdev_resume+0x74/0xc4
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff81006cdf>] ? xen_restore_fl_direct_end+0x0/0x1
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff812295ae>] ? sysdev_resume+0x65/0xa6
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff81204736>] ? xen_suspend+0xc4/0xcb
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff8107c6f1>] ? stop_machine_cpu_stop+0x7d/0xb6
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff8107c674>] ? stop_machine_cpu_stop+0x0/0xb6
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff8107c5d7>] ? cpu_stopper_thread+0xe3/0x172
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff813ab106>] ? schedule+0x4e7/0x516
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff81006cf2>] ? check_events+0x12/0x20
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff81006cdf>] ? xen_restore_fl_direct_end+0x0/0x1
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff8107c4f4>] ? cpu_stopper_thread+0x0/0x172
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff81057438>] ? kthread+0x7d/0x85
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff8100b724>] ? kernel_thread_helper+0x4/0x10
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff8100ab36>] ? int_ret_from_sys_call+0x7/0x1b
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff813ac6a1>] ? retint_restore_args+0x5/0x6
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff8100b720>] ? kernel_thread_helper+0x0/0x10
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141] ---[ end trace 24fdc8979bd6c62f ]---
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777060] PM: early thaw of devices complete after 0.045 msecs
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777060] PM: thaw of devices complete after 0.067 msecs

The sysdev_resume() call we make in drivers/xen/manage.c ends up calling [sysdev_drivers]->(resume)().
Looking at the microcode_core.c driver, the mc_sysdev resume function
raises this warning if more than one CPU is online during system resume.

If sysdev_resume() took an argument, as sysdev_suspend() does, and called the
appropriate [sysdev_drivers]->(thaw)() or (restore)(), we could pass PM_THAW or PM_RESTORE
and avoid this sort of warning.

I am not sure whether this would fit the intended functionality of the sysdev_resume()
function in drivers/base/sys.c.
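
To make the idea concrete, here is a rough user-space sketch of the dispatch I
have in mind. It is not the sysdev API: the model_* types, the thaw/restore
members and the fallback to resume() are all assumptions made for illustration,
and model_mc_resume() merely counts calls to stand in for the path that warns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes, standing in for PM_THAW / PM_RESTORE. */
enum model_pm_event { MODEL_PM_THAW, MODEL_PM_RESTORE };

typedef int (*model_cb_t)(void);

/* Hypothetical driver ops: thaw and restore are the proposed additions. */
struct model_sysdev_driver {
    const char *name;
    model_cb_t resume;   /* existing hook */
    model_cb_t thaw;     /* proposed, may be NULL */
    model_cb_t restore;  /* proposed, may be NULL */
};

/* Counts how often the resume path (the one that warns today) ran. */
int model_warnings;

static int model_mc_resume(void) { model_warnings++; return 0; }
static int model_mc_thaw(void) { return 0; }

/* Example table: a microcode-like driver with thaw but no restore. */
struct model_sysdev_driver model_drivers[] = {
    { "microcode", model_mc_resume, model_mc_thaw, NULL },
};

/* Prefer the event-specific callback, fall back to resume. */
static model_cb_t model_pick(const struct model_sysdev_driver *drv,
                             enum model_pm_event event)
{
    assert(drv != NULL);
    if (event == MODEL_PM_THAW && drv->thaw)
        return drv->thaw;
    if (event == MODEL_PM_RESTORE && drv->restore)
        return drv->restore;
    return drv->resume;
}

/* Resume with an event argument; returns the number of callbacks run. */
int model_sysdev_resume(const struct model_sysdev_driver *drivers,
                        size_t count, enum model_pm_event event)
{
    int called = 0;

    for (size_t i = 0; i < count; i++) {
        model_cb_t cb = model_pick(&drivers[i], event);
        if (cb) {
            cb();
            called++;
        }
    }
    return called;
}
```

With the table above, a thaw would skip the resume path entirely, while a
restore would still fall back to resume() because no restore callback is set.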

Of course, disabling CONFIG_MICROCODE makes the warning go away, but I was
thinking of a stock kernel config that has lots of things enabled.
Correct me if I am wrong about this.

shriram