
Re: domU suspend issue - freeze processes failed - Linux 6.16


  • To: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: "Yann Sionneau" <yann.sionneau@xxxxxxxxxx>
  • Date: Wed, 24 Sep 2025 14:28:27 +0000
  • Delivery-date: Wed, 24 Sep 2025 14:28:37 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 9/24/25 15:30, Marek Marczykowski-Górecki wrote:
> On Wed, Sep 24, 2025 at 01:17:15PM +0300, Grygorii Strashko wrote:
>>
>>
>> On 22.09.25 13:09, Marek Marczykowski-Górecki wrote:
>>> On Fri, Aug 22, 2025 at 08:42:30PM +0200, Marek Marczykowski-Górecki wrote:
>>>> On Fri, Aug 22, 2025 at 05:27:20PM +0200, Jürgen Groß wrote:
>>>>> On 22.08.25 16:42, Marek Marczykowski-Górecki wrote:
>>>>>> On Fri, Aug 22, 2025 at 04:39:33PM +0200, Marek Marczykowski-Górecki 
>>>>>> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> When suspending domU I get the following issue:
>>>>>>>
>>>>>>>        Freezing user space processes
>>>>>>>        Freezing user space processes failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
>>>>>>>        task:xl              state:D stack:0     pid:466   tgid:466   ppid:1      task_flags:0x400040 flags:0x00004006
>>>>>>>        Call Trace:
>>>>>>>         <TASK>
>>>>>>>         __schedule+0x2f3/0x780
>>>>>>>         schedule+0x27/0x80
>>>>>>>         schedule_preempt_disabled+0x15/0x30
>>>>>>>         __mutex_lock.constprop.0+0x49f/0x880
>>>>>>>         unregister_xenbus_watch+0x216/0x230
>>>>>>>         xenbus_write_watch+0xb9/0x220
>>>>>>>         xenbus_file_write+0x131/0x1b0
>>>>>>>         vfs_writev+0x26c/0x3d0
>>>>>>>         ? do_writev+0xeb/0x110
>>>>>>>         do_writev+0xeb/0x110
>>>>>>>         do_syscall_64+0x84/0x2c0
>>>>>>>         ? do_syscall_64+0x200/0x2c0
>>>>>>>         ? generic_handle_irq+0x3f/0x60
>>>>>>>         ? syscall_exit_work+0x108/0x140
>>>>>>>         ? do_syscall_64+0x200/0x2c0
>>>>>>>         ? __irq_exit_rcu+0x4c/0xe0
>>>>>>>         entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>>>>        RIP: 0033:0x79b618138642
>>>>>>>        RSP: 002b:00007fff9a192fc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
>>>>>>>        RAX: ffffffffffffffda RBX: 00000000024fd490 RCX: 000079b618138642
>>>>>>>        RDX: 0000000000000003 RSI: 00007fff9a193120 RDI: 0000000000000014
>>>>>>>        RBP: 00007fff9a193000 R08: 0000000000000000 R09: 0000000000000000
>>>>>>>        R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
>>>>>>>        R13: 00007fff9a193120 R14: 0000000000000003 R15: 0000000000000000
>>>>>>>         </TASK>
>>>>>>>        OOM killer enabled.
>>>>>>>        Restarting tasks: Starting
>>>>>>>        Restarting tasks: Done
>>>>>>>        xen:manage: do_suspend: freeze processes failed -16
>>>>>>>
>>>>>>> The process in question is the `xl devd` daemon. It's a domU serving a
>>>>>>> xenvif backend.
>>>>>>>
>>>>>>> I noticed it on 6.16.1, but looking at earlier test logs I see it with
>>>>>>> 6.16-rc6 already (but interestingly, not with 6.16-rc2 yet? That feels
>>>>>>> weird, given there are seemingly no relevant changes between rc2 and rc6).
>>>>>>
>>>>>> I forgot to include a link for (a little) more details:
>>>>>> https://github.com/QubesOS/qubes-linux-kernel/pull/1157
>>>>>>
>>>>>> In particular, there is another call trace with panic_on_warn enabled -
>>>>>> slightly different, but it looks related.
>>>>>>
>>>>>
>>>>> I'm pretty sure the PV variant for suspending is just wrong: it is calling
>>>>> dpm_suspend_start() from do_suspend() without taking the required
>>>>> system_transition_mutex, resulting in the WARN() in pm_restrict_gfp_mask().
>>>>>
>>>>> It might be as easy as just adding the mutex_lock() call to do_suspend(),
>>>>> but I'm really not sure that will be a proper fix.
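
For reference, a minimal sketch of what "just adding the mutex" might look
like in do_suspend() (drivers/xen/manage.c), assuming
lock_system_sleep()/unlock_system_sleep() - the wrappers around
system_transition_mutex - are the right tool here; the rest of the existing
sequence is elided:

/* Sketch only, not the actual kernel code - just where the lock would sit. */
static void do_suspend(void)
{
        unsigned int sleep_flags;
        int err;

        /* Hold the lock the generic PM core expects, so that
         * dpm_suspend_start() no longer runs without
         * system_transition_mutex held. */
        sleep_flags = lock_system_sleep();

        err = freeze_processes();
        if (err)
                goto out_unlock;

        err = dpm_suspend_start(PMSG_FREEZE);
        if (err)
                goto out_thaw;

        /* ... existing dpm_suspend_end() / stop_machine(xen_suspend, ...) /
         * dpm_resume_*() sequence, unchanged ... */

out_thaw:
        thaw_processes();
out_unlock:
        unlock_system_sleep(sleep_flags);
}
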
>>>>
>>>> Hm, this might explain the second call trace, but not the freeze failure
>>>> quoted above, I think?
>>>
>>> While the patch I sent appears to fix this particular issue, it made me
>>> wonder: is there any fundamental reason why do_suspend() is not using
>>> pm_suspend() and registering Xen-specific actions via platform_suspend_ops
>>> (and maybe syscore_ops)? From a brief look at the code, it should
>>> theoretically be possible, and should avoid issues like this.
>>>
>>> I made a quick & dirty attempt at that[1], and it failed (panic). I
>>> surely made several mistakes there (and also left a ton of TODO
>>> comments). But before spending any more time on that, I'd like to ask
>>> whether this is a viable option at all.
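
The general shape of that approach, with made-up names just for illustration
(the real callbacks would have to carry over what do_suspend()/xen_suspend()
do today), would be something like:

/* Illustrative sketch, not the actual attempt from [1]. */
#include <linux/suspend.h>

static int xen_pm_valid(suspend_state_t state)
{
        return state == PM_SUSPEND_MEM;
}

static int xen_pm_enter(suspend_state_t state)
{
        /* Would issue the suspend hypercall here and only return once the
         * toolstack resumes (or cancels) the suspend. */
        return 0;
}

static const struct platform_suspend_ops xen_suspend_ops = {
        .valid = xen_pm_valid,
        .enter = xen_pm_enter,
};

static int __init xen_setup_pm_ops(void)
{
        suspend_set_ops(&xen_suspend_ops);
        return 0;
}
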
>>
>> I think it might, but be careful with this, because there are two "system
>> low power" paths in Linux:
>> 1) Suspend-to-RAM and co.
>> 2) Hibernation
>>
>> The "Suspend-to-RAM and co." path is relatively straightforward and is
>> expected to always be started through pm_suspend(). In general, it's
>> expected to happen
>>   - from sysfs (user space)
>>   - from autosuspend (wakelocks).
>>
>> The "hibernation" path is more complicated :(
>> - Genuine Linux hibernation: hibernate()/hibernate_quiet_exec()
>
> IIUC hibernation is very different, as it puts Linux in charge of dumping
> all the state to disk. In the case of Xen, the primary use case for
> suspend is preparing the VM for the Xen toolstack to serialize its state to
> disk (or migrate it to another host).
> Additionally, VM suspend may be used as preparation for host suspend
> (this is what I actually do here). This is especially relevant if the VM
> has some PCI passthrough - to properly suspend (and resume) devices
> across host suspend.
>
>> I'm not sure what path Xen originally implemented :( It seems like
>> "suspend-to-RAM", but at the same time hibernation-specific stuff is used,
>> like PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE.
>> As a result, Linux suspend/hibernation code moves forward while Xen stays
>> behind and out of sync.
>
> Yeah, I think it's supposed to be suspend-to-RAM. TBH the
> PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE confuses me too, and Qubes OS has a
> patch[2] to switch it to PMSG_SUSPEND/PMSG_RESUME.
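
Roughly, the message-type swap described there looks like this in
do_suspend() - a sketch based on the description above, not the literal
patch [2]:

/* Hibernation-style messages, as used today (per the discussion above): */
dpm_suspend_start(PMSG_FREEZE);
/* ... suspend, hypercall, resume ... */
dpm_resume_end(PMSG_RESTORE);

/* Suspend-to-RAM-style messages, what the patch switches to: */
dpm_suspend_start(PMSG_SUSPEND);
/* ... suspend, hypercall, resume ... */
dpm_resume_end(PMSG_RESUME);
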
>
>> So it sounds reasonable to avoid a custom implementation, but it may not be
>> easy :(
>>
>> Suspending Xen features can be split between suspend stages, but I'm
>> not sure if platform_suspend_ops can be used.
>>
>> Generic suspend stages list:
>> - freeze
>> - prepare
>> - suspend
>> - suspend_late
>> - suspend_noirq (SPIs disabled, except wakeups)
>>    [most of the Xen-specific stuff has to be suspended at this point]
>> - disable_secondary_cpus
>> - arch disable IRQ (from this point no IRQs allowed, no timers, no 
>> scheduling)
>> - syscore_suspend
>>    [rest here]
>> - platform->enter() (suspended)
>>
>> You can't just override platform_suspend_ops, because ARM64 is expected to
>> enter suspend through the PSCI FW interface:
>> drivers/firmware/psci/psci.c
>>   static const struct platform_suspend_ops psci_suspend_ops = {
>
> Does this apply to a VM on ARM64 too? At least on x86, the VM is
> supposed to make a hypercall to tell Xen it suspended (the hypercall
> will return only on resume).
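
FWIW, on x86 that is the SCHEDOP_shutdown hypercall with the SHUTDOWN_suspend
reason - very roughly (simplified; the extra start-info argument used by PV
guests and the "cancelled" return-value handling are omitted):

struct sched_shutdown r = { .reason = SHUTDOWN_suspend };

/* Blocks until the toolstack resumes (or cancels) the domain. */
HYPERVISOR_sched_op(SCHEDOP_shutdown, &r);
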
>
>> As an option, some Xen components could be converted to use syscore_ops
>> (but not xenstore), and some might need to use the device driver PM
>> callbacks (dev_pm_ops).
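
For completeness, registering a syscore hook is roughly this (hypothetical
example, not an actual proposed conversion; syscore callbacks run late, with
only the boot CPU online and interrupts disabled):

#include <linux/syscore_ops.h>

static int xen_foo_syscore_suspend(void)
{
        /* Quiesce the component; a non-zero return aborts the suspend. */
        return 0;
}

static void xen_foo_syscore_resume(void)
{
        /* Re-establish state after the domain has been resumed. */
}

static struct syscore_ops xen_foo_syscore_ops = {
        .suspend = xen_foo_syscore_suspend,
        .resume  = xen_foo_syscore_resume,
};

static int __init xen_foo_init(void)
{
        register_syscore_ops(&xen_foo_syscore_ops);
        return 0;
}
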
>>
>>>
>>> [1] 
>>> https://github.com/marmarek/linux/commit/47cfdb991c85566c9c333570511e67bf477a5da6
>>
>> --
>> Best regards,
>> -grygorii
>>
>
> [2] 
> https://github.com/QubesOS/qubes-linux-kernel/blob/main/xen-pm-use-suspend.patch
>

On my setup I get a weird behavior when trying to suspend (s2idle) a
Linux guest.
Doing "echo freeze > /sys/power/state" in the guest seems to "freeze" the
guest for good; I could not unfreeze it afterwards.
The vCPU goes to 100% according to Xen Orchestra.
"xl list" shows state "r", but "xl console" blocks forever.
"xl shutdown" blocks for some time and then prints:

    Shutting down domain 721
    libxl: error: libxl_domain.c:848:pvcontrol_cb: guest didn't acknowledge
    control request: -9
    shutdown failed (rc=-9)

Do you think it's related to your current issue?

Regards,

--
Yann Sionneau | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech