This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


Re: [Xen-devel] long latency of domain shutdown

To: Jan Beulich <jbeulich@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] long latency of domain shutdown
From: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
Date: Mon, 28 Apr 2008 14:59:46 +0100
In-reply-to: <4815F127.76E4.0078.0@xxxxxxxxxx>
This was addressed by xen-unstable:15821. The fix is present in releases
since 3.2.0, but it was never backported to the 3.1 branch.

There are a few changesets related to 15821 that you would also want to take
into your tree. For example, 15838 is a bugfix. There is also a change
on the tools side that is required because domain_destroy can now return
-EAGAIN if it gets preempted. Any others will probably become obvious when
you try to backport 15821.
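The split described above (a hypervisor-side teardown that can stop early with -EAGAIN, plus a tools-side caller that reissues the request) can be illustrated with a minimal sketch. It is a model under stated assumptions, not the code from changeset 15821 or from libxc: `struct dying_domain`, `preempt_needed()`, `relinquish_pages()` and `destroy_domain()` are all hypothetical names, and the preemption check simply fires every `every` pages instead of looking for pending softirqs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not Xen or libxc code. A dying domain owns
 * nr_pages pages; `next` is the resume cursor kept across preemptions. */
struct dying_domain {
    size_t nr_pages;
    size_t next;          /* index of the next page to free */
};

/* Hypothetical preemption check. A real hypervisor would look for pending
 * softirqs or timer work; this model asks to yield every `every` pages
 * (never, if every == 0). */
static bool preempt_needed(size_t work_done, size_t every)
{
    return every != 0 && work_done % every == 0;
}

/* Hypervisor side: free pages until done, or return -EAGAIN when
 * preemption is needed and work remains, so that pending work (such as
 * platform time upkeep on CPU0) gets a chance to run in between. */
static int relinquish_pages(struct dying_domain *d, size_t every)
{
    size_t work = 0;

    while (d->next < d->nr_pages) {
        d->next++;                       /* stand-in for freeing one page */
        if (preempt_needed(++work, every) && d->next < d->nr_pages)
            return -EAGAIN;
    }
    return 0;
}

/* Tools side: reissue the destroy request for as long as it reports
 * -EAGAIN. Each call frees at least one page, so the loop terminates. */
static int destroy_domain(struct dying_domain *d, size_t every,
                          unsigned int *calls)
{
    int rc;

    assert(d != NULL && calls != NULL);
    *calls = 0;
    do {
        rc = relinquish_pages(d, every);
        ++*calls;
    } while (rc == -EAGAIN);
    return rc;
}
```

With 10 pages and a yield every 4 pages, destroy_domain() needs three calls: the first two stop after pages 4 and 8, and the third finishes the remaining two. A caller that treats -EAGAIN as a hard failure would leave the domain half torn down, which is why the tools change has to accompany the hypervisor change.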

 -- Keir

On 28/4/08 14:45, "Jan Beulich" <jbeulich@xxxxxxxxxx> wrote:

> In (3.0.4-based) SLE10 SP1 we are currently dealing with a (reproducible)
> report of time getting screwed up during domain shutdown. Debugging
> revealed that the PM timer misses at least one overflow (i.e. platform
> time lost about 4 seconds), which subsequently leads to disastrous
> effects.
> Apart from tracking the time calibration, as the latest step in narrowing
> down the cause I made the first processor that detects severe anomalies
> in time flow send an IPI to CPU0 (which is exclusively responsible for
> managing platform time). This appears to prove that CPU0 is indeed busy
> processing a domain_kill() request, specifically tearing down the guest's
> address spaces.
> Obviously, the hypervisor's behavior should not depend on the amount
> of time needed to free a dead domain's resources. But as the code is
> written, domain shutdown executes synchronously on the CPU requesting
> it, and if that CPU happens to be CPU0, the whole system suffers due to
> the asymmetry of platform time handling. (From some code comparison I
> would conclude that, while the code has changed significantly, this
> basic characteristic has not; of course, history shows that I may
> easily overlook something here.)
> If I'm indeed not overlooking an important fix in that area, what would
> be considered a reasonable solution to this? I can imagine (in order of
> my preference):
> - inserting calls to do_softirq() in the put_page_and_type() call
> hierarchy (e.g. in free_l2_table() or even free_l1_table(), to
> guarantee uniform behavior across sub-architectures; this might also
> help with other issues, since the same scenario can arise when a page
> table hierarchy is destroyed at times other than domain shutdown);
> perhaps the same would then also be needed in the get_page_type()
> hierarchy, e.g. in alloc_l{2,1}_table()
> - rotating responsibility for platform time round-robin among all CPUs
> (this would leave the unlikely UP case still affected by the problem)
> - detecting platform timer overflow (and properly estimating how many
> times it has overflowed) and syncing platform time back from local time
> (as indicated in a comment somewhere)
> - marshalling the whole operation to another CPU
> For reference, this is the CPU0 backtrace I'm getting from the IPI:
> (XEN) *** Dumping CPU0 host state: ***
> (XEN) State at keyhandler.c:109
> (XEN) ----[ Xen-3.0.4_13138-0.63  x86_64  debug=n  Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff83000010e8a2>] dump_execstate+0x62/0xe0
> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 000000000013dd62
> (XEN) rdx: 000000000000000a   rsi: 000000000000000a   rdi: ffff8300002b2142
> (XEN) rbp: 0000000000000000   rsp: ffff8300001d3a30   r8:  0000000000000001
> (XEN) r9:  0000000000000001   r10: 00000000fffffffc   r11: 0000000000000001
> (XEN) r12: 0000000000000001   r13: 0000000000000001   r14: 0000000000000001
> (XEN) r15: cccccccccccccccd   cr0: 0000000080050033   cr4: 00000000000006f0
> (XEN) cr3: 000000000ce02000   cr2: 00002b47f8871ca8
> (XEN) ds: 0000   es: 0000   fs: 0063   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff8300001d3a30:
> (XEN)    0000000000000046 ffff830000f7e280 ffff8300002b0e00 ffff830000f7e280
> (XEN)    ffff83000013b665 0000000000000000 ffff83000012dc8a cccccccccccccccd
> (XEN)    0000000000000001 0000000000000001 0000000000000001 ffff830000f7e280
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    ffff8284008f7aa0 ffff8284008f7ac8 0000000000000000 0000000000000000
> (XEN)    0000000000039644 ffff8284008f7aa0 000000fb00000000 ffff83000011345d
> (XEN)    000000000000e008 0000000000000246 ffff8300001d3b18 000000000000e010
> (XEN)    ffff830000113348 ffff83000013327f 0000000000000000 ffff8284008f7aa0
> (XEN)    ffff8307cc1b7288 ffff8307cc1b8000 ffff830000f7e280 00000000007cc315
> (XEN)    ffff8284137e4498 ffff830000f7e280 ffff830000132c24 0000000020000001
> (XEN)    0000000020000000 ffff8284137e4498 00000000007cc315 ffff8284137e7b48
> (XEN)    ffff830000132ec4 ffff8284137e4498 000000000000015d ffff830000f7e280
> (XEN)    ffff8300001328d2 ffff8307cc315ae8 ffff830000132cbb 0000000040000001
> (XEN)    0000000040000000 ffff8284137e7b48 ffff830000f7e280 ffff8284137f6be8
> (XEN)    ffff830000132ec4 ffff8284137e7b48 00000000007cc919 ffff8307cc91a000
> (XEN)    ffff8300001331a2 ffff8307cc919018 ffff830000132d41 0000000060000001
> (XEN)    0000000060000000 ffff8284137f6be8 0000000000006ea6 ffff8284001149f0
> (XEN)    ffff830000132ec4 ffff8284137f6be8 0000000000000110 ffff830000f7e280
> (XEN)    ffff830000133132 ffff830006ea6880 ffff830000132df0 0000000080000001
> (XEN)    0000000080000000 ffff8284001149f0 ffff8284001149f0 ffff8284001149f0
> (XEN) Xen call trace:
> (XEN)    [<ffff83000010e8a2>] dump_execstate+0x62/0xe0
> (XEN)    [<ffff83000013b665>] smp_call_function_interrupt+0x55/0xa0
> (XEN)    [<ffff83000012dc8a>] call_function_interrupt+0x2a/0x30
> (XEN)    [<ffff83000011345d>] free_domheap_pages+0x2bd/0x3b0
> (XEN)    [<ffff830000113348>] free_domheap_pages+0x1a8/0x3b0
> (XEN)    [<ffff83000013327f>] put_page_from_l1e+0x9f/0x120
> (XEN)    [<ffff830000132c24>] free_page_type+0x314/0x540
> (XEN)    [<ffff830000132ec4>] put_page_type+0x74/0xf0
> (XEN)    [<ffff8300001328d2>] put_page_from_l2e+0x32/0x70
> (XEN)    [<ffff830000132cbb>] free_page_type+0x3ab/0x540
> (XEN)    [<ffff830000132ec4>] put_page_type+0x74/0xf0
> (XEN)    [<ffff8300001331a2>] put_page_from_l3e+0x32/0x70
> (XEN)    [<ffff830000132d41>] free_page_type+0x431/0x540
> (XEN)    [<ffff830000132ec4>] put_page_type+0x74/0xf0
> (XEN)    [<ffff830000133132>] put_page_from_l4e+0x32/0x70
> (XEN)    [<ffff830000132df0>] free_page_type+0x4e0/0x540
> (XEN)    [<ffff830000132ec4>] put_page_type+0x74/0xf0
> (XEN)    [<ffff83000012923a>] relinquish_memory+0x17a/0x290
> (XEN)    [<ffff830000183665>] identify_cpu+0x5/0x1f0
> (XEN)    [<ffff830000117f10>] vcpu_runstate_get+0xb0/0xf0
> (XEN)    [<ffff8300001296aa>] domain_relinquish_resources+0x35a/0x3b0
> (XEN)    [<ffff8300001083e8>] domain_kill+0x28/0x60
> (XEN)    [<ffff830000107560>] do_domctl+0x690/0xe60
> (XEN)    [<ffff830000121def>] __putstr+0x1f/0x70
> (XEN)    [<ffff830000138016>] mod_l1_entry+0x636/0x670
> (XEN)    [<ffff830000118143>] schedule+0x1f3/0x270
> (XEN)    [<ffff830000175ca6>] toggle_guest_mode+0x126/0x140
> (XEN)    [<ffff830000175fa8>] do_iret+0xa8/0x1c0
> (XEN)    [<ffff830000173b32>] syscall_enter+0x62/0x67
> Jan
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
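Two points in the quoted report lend themselves to a short sketch: why one missed PM timer overflow costs "about 4 seconds", and what the third option (estimating the number of missed overflows from local time) could look like. The sketch assumes the ACPI PM timer's 3.579545 MHz frequency and a 24-bit counter width (the ACPI specification also allows a 32-bit counter); `pmtmr_delta()`, `pmtmr_wrap_us()` and `pmtmr_corrected_ticks()` are hypothetical helpers, not Xen's time.c code. With 24 bits, one wrap is 2^24 / 3579545 Hz, roughly 4.69 s.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: ACPI PM timer at 3.579545 MHz, read as a 24-bit counter. */
#define PMTMR_HZ     3579545ULL
#define PMTMR_PERIOD (1ULL << 24)          /* ticks per wrap */
#define PMTMR_MASK   (PMTMR_PERIOD - 1)

/* Ticks between two raw reads as seen through the counter mask. Any whole
 * wraps that occurred in between are silently lost. */
static uint64_t pmtmr_delta(uint32_t prev, uint32_t now)
{
    return ((uint64_t)now - prev) & PMTMR_MASK;
}

/* Wrap period in microseconds: 2^24 / 3579545 Hz, roughly 4.69 s. */
static uint64_t pmtmr_wrap_us(void)
{
    return PMTMR_PERIOD * 1000000ULL / PMTMR_HZ;
}

/* Hypothetical recovery for the "detect overflow" option: use an
 * independent elapsed-time estimate in nanoseconds (e.g. derived from the
 * local TSC) to count the whole wraps hidden by the masked delta.
 * local_ns * PMTMR_HZ stays within 64 bits for gaps up to roughly
 * 85 minutes. */
static uint64_t pmtmr_corrected_ticks(uint64_t masked_delta, uint64_t local_ns)
{
    uint64_t est_ticks = local_ns * PMTMR_HZ / 1000000000ULL;
    uint64_t wraps = 0;

    assert(masked_delta <= PMTMR_MASK);
    if (est_ticks > masked_delta)
        /* Round to the nearest whole wrap so modest drift between the two
         * clocks neither adds nor drops a period. */
        wraps = (est_ticks - masked_delta + PMTMR_PERIOD / 2) / PMTMR_PERIOD;
    return wraps * PMTMR_PERIOD + masked_delta;
}
```

For example, raw reads of 0xFFFFF0 followed by 0x10 give a masked delta of 0x20 ticks whether or not a whole extra wrap also passed in between. If the local clock says about 4.687 s elapsed, pmtmr_corrected_ticks() restores the missing 2^24 ticks. The rounding step only works while the local estimate is accurate to well under half a wrap period.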
