Re: [Xen-devel] long latency of domain shutdown
>>> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> 28.04.08 15:59 >>>
>This was addressed by xen-unstable:15821. The fix is present in releases
>since 3.2.0. It was never backported to 3.1 branch.
>
>There are a few changesets related to 15821 that you would also want to take
>into your tree. For example, 15838 is a bugfix. And there is also a change
>on the tools side that is required because domain_destroy can now return
>-EAGAIN if it gets preempted. Any others will probably become obvious when
>you try to backport 15821.
>
> -- Keir
Okay, thanks - so I indeed missed the call to hypercall_preempt_check()
in relinquish_memory(), which is the key indicator here.
However, that change deals exclusively with domain shutdown, not with the
more general page table pinning/unpinning operations, which I believe are
(as described) open to misuse by a malicious guest. I realize that
well-behaved guests would not normally present a heavily populated address
space here, but that cannot be entirely ruled out: the upper bound on the
number of operations on x86-64 is 512**4, i.e. 2**36 L1 table entries
(ignoring the hypervisor hole, which doesn't need processing).
Jan
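[Editorial note: the preemption pattern discussed above - a periodic
hypercall_preempt_check() inside relinquish_memory(), with the domctl
returning -EAGAIN so the tools retry - can be sketched in miniature as
follows. This is not the actual Xen code; the page count, budget, and the
preempt_pending flag standing in for pending softirqs are all hypothetical.]

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for Xen internals: a pending-softirq flag
 * models hypercall_preempt_check(), and *pages_left models the page
 * list that relinquish_memory() walks. */
static int preempt_pending;
static int hypercall_preempt_check(void) { return preempt_pending; }

/* Tear down up to `budget` pages per invocation; once the budget is
 * exhausted, bail out with -EAGAIN if a preemption is pending, so the
 * CPU (possibly CPU0) can service softirqs and platform time again. */
static int relinquish_memory(size_t *pages_left, size_t budget)
{
    while (*pages_left > 0) {
        if (budget-- == 0 && hypercall_preempt_check())
            return -EAGAIN;      /* caller must reissue the domctl */
        --*pages_left;           /* free one page (simulated) */
    }
    return 0;
}

/* Tool-side counterpart: simply retry the destroy domctl on -EAGAIN. */
static int domain_kill(size_t *pages_left)
{
    int rc;
    do {
        rc = relinquish_memory(pages_left, 1024);
        preempt_pending = 0;     /* softirqs get serviced in between */
    } while (rc == -EAGAIN);
    return rc;
}
```

The point of the sketch is the control flow, not the numbers: teardown
becomes a restartable operation instead of one unbounded synchronous loop.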
On 28/4/08 14:45, "Jan Beulich" <jbeulich@xxxxxxxxxx> wrote:
> In (3.0.4-based) SLE10 SP1 we are currently dealing with a (reproducible)
> report of time getting screwed up during domain shutdown. Debugging
> revealed that the PM timer misses at least one overflow (i.e. platform
> time lost about 4 seconds), which subsequently leads to disastrous
> effects.
>
> Apart from tracking the time calibration, as the (currently) last step of
> narrowing down the cause I made the first processor that detects severe
> anomalies in time flow send an IPI to CPU0 (which is exclusively
> responsible for managing platform time). This appears to prove that
> CPU0 is indeed busy processing a domain_kill() request, namely tearing
> down the address spaces of the guest.
>
> Obviously, the hypervisor's behavior should not depend on the amount
> of time needed to free a dead domain's resources, but that is how it is
> coded (from some code comparison I would conclude that, while the code
> has changed significantly, the basic characteristic of domain shutdown
> being executed synchronously on the requesting CPU has not - though
> history shows I may easily be overlooking something here). If that CPU
> happens to be CPU0, the whole system suffers due to the asymmetry of
> platform time handling.
>
> If I'm indeed not overlooking an important fix in that area, what would
> be considered a reasonable solution to this? I can imagine (in order of
> my preference)
>
> - inserting calls to do_softirq() in the put_page_and_type() call
> hierarchy (e.g. in alloc_l2_table() or even alloc_l1_table(), to
> guarantee uniform behavior across sub-architectures; this might help
> address other issues as the same scenario might happen when a
> page table hierarchy gets destroyed at times other than domain
> shutdown); perhaps the same might then also be needed in the
> get_page_type() hierarchy, e.g. in free_l{2,1}_table()
>
> - simply rotating responsibility for platform time among all CPUs
> round-robin (this would leave the unlikely UP case still affected)
>
> - detecting platform timer overflow (and properly estimating how many
> times it has overflowed) and sync-ing platform time back from local time
> (as indicated in a comment somewhere)
>
> - marshalling the whole operation to another CPU
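[Editorial note: the first option above - polling for softirqs inside the
free_l{2,1}_table() hierarchy - can be modelled as below. This is a toy
model, not Xen code: ENTRIES is shrunk from 512 to 4 so it runs instantly,
and do_softirq() merely counts how often the innermost level would poll.
On real x86-64 hardware the fan-out is 512 per level, giving the 512**4 ==
2**36 L1-entry upper bound mentioned earlier in the thread.]

```c
#include <assert.h>
#include <stdint.h>

/* Counts how many times the teardown would have polled for softirqs. */
static uint64_t softirq_polls;
static void do_softirq(void) { ++softirq_polls; }

#define ENTRIES 4                   /* 512 on real x86-64 hardware */

/* Innermost level: this is where option 1 would place the poll, so a
 * fully populated 4-level tree yields one poll per L1 table rather
 * than none at all for the entire teardown. */
static uint64_t free_l1_table(void)
{
    do_softirq();
    return ENTRIES;                 /* L1 entries released */
}

/* Recursive teardown l4 -> l3 -> l2 -> l1, mirroring the
 * free_page_type()/put_page_from_lNe() chain in the backtrace above. */
static uint64_t free_table(int level)
{
    if (level == 1)
        return free_l1_table();
    uint64_t freed = 0;
    for (int i = 0; i < ENTRIES; ++i)
        freed += free_table(level - 1);
    return freed;
}
```

With the real fan-out of 512, a malicious guest could force 512**3 polls'
worth of work per pin/unpin, which is why bounding the work between polls
matters beyond the shutdown path alone.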
>
> For reference, this is the CPU0 backtrace I'm getting from the IPI:
>
> (XEN) *** Dumping CPU0 host state: ***
> (XEN) State at keyhandler.c:109
> (XEN) ----[ Xen-3.0.4_13138-0.63 x86_64 debug=n Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff83000010e8a2>] dump_execstate+0x62/0xe0
> (XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
> (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 000000000013dd62
> (XEN) rdx: 000000000000000a rsi: 000000000000000a rdi: ffff8300002b2142
> (XEN) rbp: 0000000000000000 rsp: ffff8300001d3a30 r8: 0000000000000001
> (XEN) r9: 0000000000000001 r10: 00000000fffffffc r11: 0000000000000001
> (XEN) r12: 0000000000000001 r13: 0000000000000001 r14: 0000000000000001
> (XEN) r15: cccccccccccccccd cr0: 0000000080050033 cr4: 00000000000006f0
> (XEN) cr3: 000000000ce02000 cr2: 00002b47f8871ca8
> (XEN) ds: 0000 es: 0000 fs: 0063 gs: 0000 ss: e010 cs: e008
> (XEN) Xen stack trace from rsp=ffff8300001d3a30:
> (XEN) 0000000000000046 ffff830000f7e280 ffff8300002b0e00 ffff830000f7e280
> (XEN) ffff83000013b665 0000000000000000 ffff83000012dc8a cccccccccccccccd
> (XEN) 0000000000000001 0000000000000001 0000000000000001 ffff830000f7e280
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) ffff8284008f7aa0 ffff8284008f7ac8 0000000000000000 0000000000000000
> (XEN) 0000000000039644 ffff8284008f7aa0 000000fb00000000 ffff83000011345d
> (XEN) 000000000000e008 0000000000000246 ffff8300001d3b18 000000000000e010
> (XEN) ffff830000113348 ffff83000013327f 0000000000000000 ffff8284008f7aa0
> (XEN) ffff8307cc1b7288 ffff8307cc1b8000 ffff830000f7e280 00000000007cc315
> (XEN) ffff8284137e4498 ffff830000f7e280 ffff830000132c24 0000000020000001
> (XEN) 0000000020000000 ffff8284137e4498 00000000007cc315 ffff8284137e7b48
> (XEN) ffff830000132ec4 ffff8284137e4498 000000000000015d ffff830000f7e280
> (XEN) ffff8300001328d2 ffff8307cc315ae8 ffff830000132cbb 0000000040000001
> (XEN) 0000000040000000 ffff8284137e7b48 ffff830000f7e280 ffff8284137f6be8
> (XEN) ffff830000132ec4 ffff8284137e7b48 00000000007cc919 ffff8307cc91a000
> (XEN) ffff8300001331a2 ffff8307cc919018 ffff830000132d41 0000000060000001
> (XEN) 0000000060000000 ffff8284137f6be8 0000000000006ea6 ffff8284001149f0
> (XEN) ffff830000132ec4 ffff8284137f6be8 0000000000000110 ffff830000f7e280
> (XEN) ffff830000133132 ffff830006ea6880 ffff830000132df0 0000000080000001
> (XEN) 0000000080000000 ffff8284001149f0 ffff8284001149f0 ffff8284001149f0
> (XEN) Xen call trace:
> (XEN) [<ffff83000010e8a2>] dump_execstate+0x62/0xe0
> (XEN) [<ffff83000013b665>] smp_call_function_interrupt+0x55/0xa0
> (XEN) [<ffff83000012dc8a>] call_function_interrupt+0x2a/0x30
> (XEN) [<ffff83000011345d>] free_domheap_pages+0x2bd/0x3b0
> (XEN) [<ffff830000113348>] free_domheap_pages+0x1a8/0x3b0
> (XEN) [<ffff83000013327f>] put_page_from_l1e+0x9f/0x120
> (XEN) [<ffff830000132c24>] free_page_type+0x314/0x540
> (XEN) [<ffff830000132ec4>] put_page_type+0x74/0xf0
> (XEN) [<ffff8300001328d2>] put_page_from_l2e+0x32/0x70
> (XEN) [<ffff830000132cbb>] free_page_type+0x3ab/0x540
> (XEN) [<ffff830000132ec4>] put_page_type+0x74/0xf0
> (XEN) [<ffff8300001331a2>] put_page_from_l3e+0x32/0x70
> (XEN) [<ffff830000132d41>] free_page_type+0x431/0x540
> (XEN) [<ffff830000132ec4>] put_page_type+0x74/0xf0
> (XEN) [<ffff830000133132>] put_page_from_l4e+0x32/0x70
> (XEN) [<ffff830000132df0>] free_page_type+0x4e0/0x540
> (XEN) [<ffff830000132ec4>] put_page_type+0x74/0xf0
> (XEN) [<ffff83000012923a>] relinquish_memory+0x17a/0x290
> (XEN) [<ffff830000183665>] identify_cpu+0x5/0x1f0
> (XEN) [<ffff830000117f10>] vcpu_runstate_get+0xb0/0xf0
> (XEN) [<ffff8300001296aa>] domain_relinquish_resources+0x35a/0x3b0
> (XEN) [<ffff8300001083e8>] domain_kill+0x28/0x60
> (XEN) [<ffff830000107560>] do_domctl+0x690/0xe60
> (XEN) [<ffff830000121def>] __putstr+0x1f/0x70
> (XEN) [<ffff830000138016>] mod_l1_entry+0x636/0x670
> (XEN) [<ffff830000118143>] schedule+0x1f3/0x270
> (XEN) [<ffff830000175ca6>] toggle_guest_mode+0x126/0x140
> (XEN) [<ffff830000175fa8>] do_iret+0xa8/0x1c0
> (XEN) [<ffff830000173b32>] syscall_enter+0x62/0x67
>
> Jan
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel