This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/



To: XiaYubin <xiayubin@xxxxxxxxx>
Subject: Re: [Xen-devel] How can Xen trigger a context switch in an HVM guest domain?
From: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
Date: Tue, 3 Nov 2009 11:51:35 +0000
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, "James (song wei)" <jsong@xxxxxxxxxx>
When I first started doing performance analysis, the sedf scheduler
was using a 500us timeslice, which (in my estimates) caused the
first-gen VMX-capable processors to spend at least 5% of their time
handling vmenters and vmexits.  Obviously performance has increased
somewhat since then, but they're still not free. :-)
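Here is a rough back-of-envelope reading of the 5% figure. It assumes, purely as a simplification, that the whole overhead is a fixed cost paid once per 500us timeslice, and that the same per-tick cost would apply to the 100us timer Yubin proposes in the quoted message:

```latex
c_{\text{tick}} \gtrsim 0.05 \times 500\,\mu\mathrm{s} = 25\,\mu\mathrm{s}
\qquad
\frac{c_{\text{tick}}}{T_{\text{timer}}} \gtrsim \frac{25\,\mu\mathrm{s}}{100\,\mu\mathrm{s}} = 25\%
```

On first-generation VMX hardware, this would put a 100us timer at a quarter of the CPU or more. Newer processors handle vmenters and vmexits faster, so the real figure today would be smaller.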


On Tue, Nov 3, 2009 at 1:43 AM, XiaYubin <xiayubin@xxxxxxxxx> wrote:
> James and George, thank you both! The breakpoint approach is interesting; I
> didn't even think of it :)
> OK, I'm going to use a simpler approach to verify my idea first. Before the
> preempting-state VM runs, I will set a timer so that Xen gets to run
> every 100us (maybe longer for the first iteration). The timer handler
> will check whether the preempting VM is in kernel mode or user mode. If it
> is in user mode with the cpu-hog's CR3, then it will be scheduled out.
> Meanwhile, if the iteration count goes beyond some threshold (say 5),
> the VM will also be scheduled out. This approach seems much simpler than
> the one using breakpoints, and more accurate than the one using a
> 1ms timer. It may bring some overhead, but preemption is not
> supposed to occur frequently, and fairness is more important.
> The thread problem also exists on the Linux platform. Currently I have no
> good way to identify different threads from the hypervisor's
> perspective. I have a dream that one day those OS guys will export
> this information to the VMM, a dream that one day our children will live
> in a world where virtualization rules. I have a dream today :)
> Thanks!
> --
> Yubin
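A minimal sketch of the per-tick decision Yubin describes, written in C. The struct names and the `cpl`/`cr3` fields are hypothetical and are not Xen internals. The sketch also leaves out how the hypervisor would actually read those values. Only the rule itself comes from the message: schedule the VM out once it is in user mode with cpuhog's CR3, or once the tick count goes beyond a fixed threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of guest vCPU state taken in the timer handler.
 * How the hypervisor reads these values is not shown here. */
struct vcpu_snapshot {
    unsigned int cpl;   /* privilege level: 0 = kernel, 3 = user */
    uint64_t cr3;       /* page-table base of the running guest process */
};

/* Hypothetical bookkeeping for one preempting-state episode. */
struct preempt_state {
    uint64_t cpuhog_cr3;     /* CR3 previously identified as the CPU hog */
    unsigned int ticks;      /* timer ticks seen in this episode */
    unsigned int max_ticks;  /* schedule out once ticks exceed this (e.g. 5) */
};

/* Called on every timer tick (e.g. every 100us) while the VM is in
 * preempting-state. Returns true if the VM should be scheduled out. */
static bool preempt_tick(struct preempt_state *ps, const struct vcpu_snapshot *v)
{
    assert(ps->max_ticks > 0);
    ps->ticks++;

    /* cpuhog is running in user mode: kernel I/O work is presumably done. */
    if (v->cpl == 3 && v->cr3 == ps->cpuhog_cr3)
        return true;

    /* Safety valve: don't let one VM stay boosted for too long. */
    return ps->ticks > ps->max_ticks;
}
```

The threshold check comes after the CR3 check, so a VM that reaches cpuhog early is released immediately. The counter only matters when the guest stays in the kernel or in another process.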
> On Tue, Nov 3, 2009 at 12:05 AM, George Dunlap
> <George.Dunlap@xxxxxxxxxxxxx> wrote:
>> OK, so you want to allow a VM to run so that it can do packet
>> processing in the kernel, but once it's done in the kernel you want to
>> preempt the VM again.
>> An idea I was going to try out is that if a VM receives an interrupt
>> (possibly only certain interrupts, like network), let it run for a
>> very short amount of time (say, 1ms or 500us).  That should be enough
>> for it to do its basic packet processing (or audio processing, video
>> processing, whatever).  True, you're going to run the "cpu hog" during
>> that time, but that will be debited against time he'll run later.  (I
>> haven't tested this idea yet. It may work better with some credit
>> algorithms than others.)
>> The problems with inducing a guest to call schedule():
>> * It may not have any other runnable processes, or it may choose the
>> same process to run again; so it may not switch the cr3 anyway.
>> * The only reliable way to do it without some kind of
>> paravirtualization (even if only a kernel driver) would be to give it a
>> timer interrupt, which may mess up other things on the system, such as
>> the system time.
>> If you're really keen to preempt on return to userspace, you could try
>> something like the following.  Before delivering the interrupt, note
>> the EIP the guest is at.  If it's in user space, set a hardware
>> breakpoint at that address.  Then deliver the interrupt.  If the guest
>> calls schedule(), you can catch the CR3 switch; if it returns to the
>> same process, it will hit the breakpoint.
>> Two possible problems:
>> * For reasons of ancient history, the iret instruction may set the RF
>> flag in the EFLAGS register, which will cause the breakpoint not to
>> fire after the guest iret.  You may need to decode the instruction and
>> set the breakpoint at the instruction after, or something like that.
>> * I believe Windows doesn't do a CR3 switch if it does a *thread*
>> switch.  If so, on a thread switch you'll get neither the CR3 switch
>> nor the breakpoint (since the other thread is probably running
>> somewhere else).
>> Peace,
>>  -George
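A minimal sketch of the bookkeeping behind George's "preempt on return to user space" suggestion. All names are hypothetical. The sketch does not program the debug registers (DR0/DR7), and it does not intercept CR3 writes or #DB exceptions. It only models which event ends the short preemption window: a CR3 switch to another process, or hitting the breakpoint at the interrupted user-space address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tracker for "preempt on return to user space". */
struct bp_tracker {
    bool window_open;    /* short preemption window in progress */
    bool bp_set;         /* breakpoint armed at bp_addr (user space only) */
    uint64_t bp_addr;    /* user-space EIP the guest was interrupted at */
    uint64_t saved_cr3;  /* CR3 when the interrupt was delivered */
};

static void close_window(struct bp_tracker *t)
{
    t->window_open = false;
    t->bp_set = false;   /* a real VMM would also disarm the breakpoint */
}

/* Before injecting the interrupt: note where the guest is. If it is in
 * user space, arm a breakpoint at that address (DR0/DR7 programming is
 * omitted). Caveat from the thread: RF may suppress the first hit after
 * iret, so the real address might need to be the next instruction. */
static void before_inject(struct bp_tracker *t, bool user_mode,
                          uint64_t eip, uint64_t cr3)
{
    assert(t != NULL);
    t->window_open = true;
    t->bp_set = user_mode;
    t->bp_addr = user_mode ? eip : 0;
    t->saved_cr3 = cr3;
}

/* Trapped CR3 write: a process switch ends the window. A thread switch
 * inside one process may not write CR3 at all (see the thread). */
static bool on_cr3_write(struct bp_tracker *t, uint64_t new_cr3)
{
    if (t->window_open && new_cr3 != t->saved_cr3) {
        close_window(t);
        return true;
    }
    return false;
}

/* Debug exception at guest address eip: the guest returned to the same
 * user-space code it was interrupted in. */
static bool on_debug_trap(struct bp_tracker *t, uint64_t eip)
{
    if (t->window_open && t->bp_set && eip == t->bp_addr) {
        close_window(t);
        return true;
    }
    return false;
}
```

If the guest was interrupted in kernel mode, the sketch arms no breakpoint and relies only on the CR3 switch. This mirrors George's "If it's in user space, set a hardware breakpoint" condition.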
>> On Sun, Nov 1, 2009 at 5:54 AM, XiaYubin <xiayubin@xxxxxxxxx> wrote:
>>> Hi, George,
>>> Thank you for your reply. Actually, I'm looking for a generic
>>> mechanism for cooperative scheduling. Independence from the guest OS would
>>> make such a mechanism more convincing and practical, just like the
>>> balloon driver does.
>>> Maybe you are wondering why I asked such a weird question, so let me
>>> describe it in more detail. My current work is based on "Task-aware
>>> VM scheduling", which was published at VEE'09. By monitoring CR3
>>> changes at the VMM level, Xen can get information about tasks' CPU
>>> consumption to identify CPU hogs and I/O tasks. Therefore, the
>>> task-aware mechanism offers a more fine-grained scheduler than the
>>> original VCPU-level scheduler, as a VCPU may run CPU hogs and I/O
>>> tasks in a mixed style.
>>> Imagine there are n VMs. One of them, named mix-VM, runs two tasks:
>>> cpuhog and iotask (network). The other VMs, named CPU-VM, run just
>>> cpuhog. All VMs use PV drivers (the GPLPV driver for Windows).
>>> Here's what is supposed to happen when iotask receives a network
>>> packet: the NIC raises an IRQ, which is passed to Xen; then domain-0
>>> sends an inter-domain event to mix-VM, which is likely to be in the
>>> run-queue. Xen then schedules it to run immediately and sets its state
>>> to preempting-state. Right after that, the mix-VM *should* schedule
>>> iotask to process the incoming packet, and then schedule cpuhog after
>>> processing. When CR3 changes to cpuhog's, Xen knows that the
>>> mix-VM has finished I/O processing (here we assume that, in most OSes,
>>> cpuhog usually has a lower priority than iotask), and schedules the
>>> mix-VM out to end its preempting-state. Therefore, the mix-VM can
>>> preempt other VMs to process I/O ASAP, while keeping the preempting
>>> time as short as possible to preserve fairness. The point is: cpuhog
>>> should not run in preempting-state.
>>> However, a problem arises when the mix-VM sends packets. When iotask
>>> sends a large amount of data (using TCP), it will block and wait
>>> to be woken up after the guest kernel has sent all the data, which may
>>> be split into thousands of TCP packets. The mix-VM receives an ACK
>>> packet every time it sends a packet, which makes it enter
>>> preempting-state. Note that at this moment, the CR3 of mix-VM is
>>> cpuhog's (as it is the only running process). After the guest kernel
>>> processes the ACK packet and sends the next packet, it switches to user
>>> mode, which means cpuhog gets to run in preempting-state. The
>>> point is: as there is no CR3 change, Xen gets no chance to run.
>>> One way is to add a hook at user/kernel mode switches, so that Xen can
>>> catch the moment when cpuhog gets to run. However, this costs too
>>> much. Another way is to force a VM to reschedule when it enters
>>> preempting-state. It will then trap to Xen when CR3 is changed,
>>> and Xen can end its preempting-state when it schedules cpuhog to
>>> run. That's why I want to trigger a guest context switch from Xen. I
>>> don't really care *which* process it will switch to; I just want to
>>> give Xen a chance to run. The question is: is there a better/simpler way
>>> to solve this problem?
>>> I hope I have described the problem clearly. Could you please give more
>>> details about the idea of a "reschedule event channel"? Thanks!
>>> --
>>> Yubin
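A minimal sketch of the preempting-state logic Yubin describes. It assumes the VMM already traps guest CR3 writes and has already identified cpuhog's CR3 through CR3-based accounting. All names are hypothetical. This is not code from Xen or from the VEE'09 paper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scheduling state for one VM. */
enum vm_sched_state { VM_NORMAL, VM_PREEMPTING };

struct mix_vm {
    enum vm_sched_state state;
    uint64_t cpuhog_cr3;   /* CR3 the VMM's accounting flagged as a CPU hog */
};

/* An I/O event (e.g. an inter-domain event from domain-0) arrives:
 * let the VM preempt others so it can process the I/O promptly. */
static void on_io_event(struct mix_vm *vm)
{
    assert(vm != NULL);
    vm->state = VM_PREEMPTING;
}

/* The VMM trapped a guest CR3 write. Returns true if the VM should be
 * scheduled out because cpuhog is about to run in preempting-state. */
static bool on_cr3_switch(struct mix_vm *vm, uint64_t new_cr3)
{
    assert(vm != NULL);
    if (vm->state == VM_PREEMPTING && new_cr3 == vm->cpuhog_cr3) {
        vm->state = VM_NORMAL;   /* preempting-state ends here */
        return true;
    }
    return false;                /* e.g. iotask keeps running */
}
```

In the ACK scenario, cpuhog's CR3 is already loaded when the event arrives. The guest therefore never writes CR3 again, and `on_cr3_switch` is never called. The VM stays in `VM_PREEMPTING` while cpuhog runs, which is the gap described in the message.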
>>> On Sat, Oct 31, 2009 at 11:20 PM, George Dunlap
>>> <George.Dunlap@xxxxxxxxxxxxx> wrote:
>>>> Context switching is a choice the guest OS has to make, and how that's
>>>> done will differ based on the operating system.  I think if you're
>>>> thinking about modifying the guest scheduler, you're probably better
>>>> off starting with Linux.  Even if there's a way to convince Windows to
>>>> call schedule() to pick a new process, I'm not sure you'll be able to
>>>> tell it *which* process to choose.
>>>> As far as the mechanism on Xen's side goes, it would be easy enough to allocate
>>>> a "reschedule" event channel for the guest, such that whenever you
>>>> want to trigger a guest reschedule, just raise the event channel.
>>>>  -George
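A toy model of the "reschedule event channel" idea. It assumes a guest driver has bound one dedicated port, and that the driver's handler calls the guest's own scheduler. The pending and mask words are only loosely modeled on the `evtchn_pending`/`evtchn_mask` bitmaps in Xen's shared-info page. `hv_raise`, `guest_upcall`, and `RESCHED_PORT` are hypothetical names, not Xen or guest-kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of event-channel state shared between hypervisor and guest:
 * one 64-bit word of pending bits and one of mask bits. */
struct evtchn_model {
    uint64_t pending;
    uint64_t mask;
};

#define RESCHED_PORT 5u   /* hypothetical port bound by a guest driver */

/* Hypervisor side: mark a port pending (a real VMM would also raise an
 * upcall/interrupt into the guest). */
static void hv_raise(struct evtchn_model *e, unsigned int port)
{
    assert(port < 64);
    e->pending |= (uint64_t)1 << port;
}

/* Guest side: consume unmasked pending ports. Returns true if the guest
 * should invoke its own scheduler; schedule_calls counts those calls. */
static bool guest_upcall(struct evtchn_model *e, unsigned int *schedule_calls)
{
    uint64_t ready = e->pending & ~e->mask;
    e->pending &= ~ready;                        /* acknowledge them */
    if (ready & ((uint64_t)1 << RESCHED_PORT)) {
        (*schedule_calls)++;                     /* stands in for schedule() */
        return true;
    }
    return false;
}
```

A masked port stays pending until it is unmasked. As George points out elsewhere in the thread, even when the guest does call its scheduler, it may pick the same process again, so raising the port does not guarantee a CR3 switch.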
>>>> On Sat, Oct 31, 2009 at 11:02 AM, XiaYubin <xiayubin@xxxxxxxxx> wrote:
>>>>> Hi, all,
>>>>> As I'm doing some research on cooperative scheduling between Xen and
>>>>> guest domains, I want to know how many ways Xen can trigger a context
>>>>> switch inside an HVM guest domain (which runs Windows in my case). Do
>>>>> I have to write a driver (like the balloon driver)? Or is a user process
>>>>> enough? Or is there an even simpler way?
>>>>> All your suggestions are appreciated. Thanks! :)
>>>>> --
>>>>> Yubin
>>>>> _______________________________________________
>>>>> Xen-devel mailing list
>>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>>>> http://lists.xensource.com/xen-devel
