[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/2] xenbus: Use .freeze/.thaw to handle xenbus devices


  • To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • From: Jason Andryuk <jason.andryuk@xxxxxxx>
  • Date: Mon, 1 Dec 2025 13:20:40 -0500
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=invisiblethingslab.com smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0)
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=I52DyuTMlAwzXRPy6YzI1itobKzOSJLCLxnFKBFnQTQ=; b=EyyWRruGmCYqkOuWKIk4DDtKMpF36hZt6UMh7u55PL7M7OVZBWxP/VitinL0rl6xk36HSQa1nYzTUOsZ/bDXOMYj6x0kMagygD5P51kx6v9JRyUWsKsZVa9fLSD2oT4Lx1Um1ZCGZKSkP+gnV7vIGPrDk7pL/OGkJv87FMUS97koZ+twXepdh+VEQx3J/wg/40yLi9mHqDipIUxBk7nSZzDMjH5n8iRefjEqQWgZaY1OUu3vbHFIgZepQMEEopeRq30r3WnEFHX5rQAvYoxAksEkLhmsdcpu0woHVfUtuVulnbxBP8YL/+i4lmB/EuWMLgp87sOI+ktF+t6DHsvOCg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=FEOF4oz3QZ93rQQqLCdqepRsU6E2mTxvxn4aQDrdxQPxBsVCvKxSzZpii6JTxwm8jM85Bu+kxwJDxBm9frLtDYSCCCQ36qJHe2SseRlGsH+QCo3bt8W6I2JDQlQ3mz1msIfWxSnoudTAldlUoEzsJh3p1lu5rHHe+EdPoWaNAECEDGG/yWtujlyFrRweG/sYLaDpOUZXOeW5D++JNuZA7SZpMFA58IthnFK2QbZDiReewdWhzJfUNxGBMhLj3+TY0vkf4XkpXnC9yFi0UC2kMXYucc/QKT68psVKjtR5prp+vk/4AuLSZ0UgCG2LfSdPNaX3TnP007Duood9Nhn3og==
  • Cc: Juergen Gross <jgross@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>, Yann Sionneau <yann.sionneau@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>, <linux-kernel@xxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 01 Dec 2025 18:21:07 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 2025-11-29 21:03, Marek Marczykowski-Górecki wrote:
On Wed, Nov 19, 2025 at 05:47:29PM -0500, Jason Andryuk wrote:
The goal is to fix s2idle and S3 for Xen PV devices.

Can you give a little more context of this? We do have working S3 in
qubes with no need for such change. We trigger it via the toolstack 
(libxl_domain_suspend_only()).
Are you talking about guest-initiated suspend here?

This is intended to help domU s2idle/S3 and resume. I guess that is what you mean by guest-initiated? The domU can use 'echo mem > /sys/power/state' to enter s2idle/S3. We also have the domU react to the ACPI sleep button from `xl trigger $dom sleep`.

AIUI, libxl_domain_suspend_only() triggers xenstore writes which Linux drivers/xen/manage.c:do_suspend() acts on. `xl save/suspend/migrate` all use this path.

The terminology gets confusing. Xen uses "suspend" for save/suspend/migrate, but the Linux power management codes uses freeze/thaw/restore. AIUI, Linux's PMSG_SUSPEND/.suspend is for runtime power management.

When you call libxl_domain_suspend_only()/libxl_domain_resume(), you pass suspend_cancel==1.
 *  1. (fast=1) Resume the guest without resetting the domain
       environment.
 *     The guests's call to SCHEDOP_shutdown(SHUTDOWN_suspend) will
       return 1.

That ends up in Linux do_suspend() as si.cancelled = 1, which calls PMSG_THAW -> .thaw -> xenbus_dev_cancel() which is a no-op. So it does not change the PV devices.

We needed guest user space to perform actions before entering s2idle. libxl_domain_suspend_only() triggers the Linux kernel path which does not notify user space. The ACPI power buttons let user space perform actions (lock and blank the screen) before entering the idle state.

We also have kinda working (host) s2idle. You may want to take a look at this
work (some/most of it was posted upstream, but not all got
committed/reviewed):
https://github.com/QubesOS/qubes-issues/issues/6411#issuecomment-1538089344
https://github.com/QubesOS/qubes-linux-kernel/pull/910 (some patches
changed since that PR, see the current main too).

This would not affect host s2idle - it changes PV frontend devices.

Do you libxl_domain_suspend_only() all domUs and then put dom0 into s0ix?

A domain resuming
from s3 or s2idle disconnects its PV devices during resume.  The
backends are not expecting this and do not reconnect.

b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for suspend/
resume/chkpt") changed xen_suspend()/do_suspend() from
PMSG_SUSPEND/PMSG_RESUME to PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE, but the
suspend/resume callbacks remained.

.freeze/restore are used with hiberation where Linux restarts in a new
place in the future.  .suspend/resume are useful for runtime power
management for the duration of a boot.

The current behavior of the callbacks works for an xl save/restore or
live migration where the domain is restored/migrated to a new location
and connecting to a not-already-connected backend.

Change xenbus_pm_ops to use .freeze/thaw/restore and drop the
.suspend/resume hook.  This matches the use in drivers/xen/manage.c for
save/restore and live migration.  With .suspend/resume empty, PV devices
are left connected during s2idle and s3, so PV devices are not changed
and work after resume.

Is that intended? While it might work for suspend by a chance(*), I'm
pretty sure not disconnecting + re-reconnecting PV devices across
save/restore/live migration will break them.

save/restore/live migration keep using .freeze/thaw/restore, which disconnects and reconnects today. Nothing changes there as xen_suspend()/do_suspend() call the power management code with PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE.

This patches makes .suspend/resume no-ops for PMSG_SUSPEND/PMSG_RESUME. When a domU goes into s2idle/S3, the backend state remains connected. With this patch, when the domU wakes up, the frontends do nothing and remain connected.

(*) and even that I'm not sure - with driver domains, depending on
suspend order this feels like might result in a deadlock...

I'm not sure. I don't think this patch changes anything with respect to them.

Thanks for testing.

Maybe the commit messages should change to highlight this is for domU PV devices? struct xen_bus_type xenbus_backend does not define dev_pm_ops.

Regards,
Jason

Signed-off-by: Jason Andryuk <jason.andryuk@xxxxxxx>
---
  drivers/xen/xenbus/xenbus_probe_frontend.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c 
b/drivers/xen/xenbus/xenbus_probe_frontend.c
index 6d1819269cbe..199917b6f77c 100644
--- a/drivers/xen/xenbus/xenbus_probe_frontend.c
+++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
@@ -148,11 +148,9 @@ static void xenbus_frontend_dev_shutdown(struct device 
*_dev)
  }
static const struct dev_pm_ops xenbus_pm_ops = {
-       .suspend        = xenbus_dev_suspend,
-       .resume         = xenbus_frontend_dev_resume,
        .freeze         = xenbus_dev_suspend,
        .thaw           = xenbus_dev_cancel,
-       .restore        = xenbus_dev_resume,
+       .restore        = xenbus_frontend_dev_resume,
  };
static struct xen_bus_type xenbus_frontend = {
--
2.34.1







 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.