James and George, thank you both! The breakpoint approach is
interesting; I hadn't even thought of it :)
OK, I'm going to use a simpler way to verify my idea first. Before the
preempting-state VM runs, I will set a timer so that Xen gets to run
every 100us (maybe longer for the first iteration). The timer handler
will check whether the preempting VM is in kernel mode or user mode.
If it is in user mode with the cpu-hog's CR3, it will be scheduled
out. Likewise, if the iteration count goes beyond some threshold (say
5), the VM will also be scheduled out. This seems much simpler than
the breakpoint approach, and more accurate than the 1ms-timer one. It
may add some overhead, but preemption is not supposed to occur
frequently, and fairness is more important.
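Roughly, the timer handler I have in mind would look something like
the sketch below (untested; guest_in_user_mode(), guest_cr3(),
hog_cr3, preempt_ticks and preempt_timer are names I'm making up for
illustration, while init_timer()/set_timer()/NOW()/MICROSECS()/
raise_softirq() are the existing Xen primitives I'd build on):

#define PREEMPT_POLL_US    100   /* poll every 100us               */
#define PREEMPT_MAX_ITERS    5   /* give up after ~5 iterations    */

/* Per-vcpu state I would add: a timer, an iteration count, and the
 * CR3 of the identified cpu hog. */
static void preempt_poll_fn(void *data)
{
    struct vcpu *v = data;

    /* Schedule the VM out if the cpu hog is back in user mode, or
     * if we have already polled for too long. */
    if ( (guest_in_user_mode(v) && guest_cr3(v) == v->hog_cr3) ||
         ++v->preempt_ticks > PREEMPT_MAX_ITERS )
    {
        v->preempt_ticks = 0;
        raise_softirq(SCHEDULE_SOFTIRQ);
        return;
    }

    /* Otherwise let it keep running and re-arm the poll timer. */
    set_timer(&v->preempt_timer, NOW() + MICROSECS(PREEMPT_POLL_US));
}

/* Armed once when the vcpu enters preempting-state:
 *   init_timer(&v->preempt_timer, preempt_poll_fn, v, v->processor);
 *   set_timer(&v->preempt_timer, NOW() + MICROSECS(PREEMPT_POLL_US));
 */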
The thread problem also exists on Linux. Currently I have no good
idea how to identify different threads from the hypervisor's
perspective. I have a dream that one day those OS guys will export
this information to the VMM, a dream that one day our children will live
in a world where virtualization rules. I have a dream today :)
On Tue, Nov 3, 2009 at 12:05 AM, George Dunlap
<George.Dunlap@xxxxxxxxxxxxx> wrote:
> OK, so you want to allow a VM to run so that it can do packet
> processing in the kernel, but once it's done in the kernel you want to
> preempt the VM again.
> An idea I was going to try out is that if a VM receives an interrupt
> (possibly only certain interrupts, like network), let it run for a
> very short amount of time (say, 1ms or 500us). That should be enough
> for it to do its basic packet processing (or audio processing, video
> processing, whatever). True, you're going to run the "cpu hog" during
> that time, but that will be debited against the time he'll run later. (I
> haven't tested this idea yet. It may work better with some credit
> algorithms than others.)
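[To check I follow the boost-and-debit idea, I imagine something like
the sketch below. All the names here (BOOST_SLICE, struct boost_info,
boost_begin/boost_end, charge_credit) are made up for illustration
and are not existing credit-scheduler code.]

#define BOOST_SLICE MICROSECS(500)   /* the short slice suggested above */

struct boost_info {
    s_time_t boost_start;            /* when the boost slice began */
};

/* On delivering a (network) interrupt to the vcpu: let it run right
 * away, but remember when the boost started, and arm a one-shot
 * timer for BOOST_SLICE that will call boost_end(). */
static void boost_begin(struct boost_info *b)
{
    b->boost_start = NOW();
    /* ...move the vcpu to the head of the runqueue / tickle a pcpu... */
}

/* When the boost slice expires or the vcpu blocks again: debit the
 * time it actually ran against its normal credit, keeping fairness. */
static void boost_end(struct boost_info *b, struct vcpu *v)
{
    charge_credit(v, NOW() - b->boost_start);   /* hypothetical helper */
}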
> The problem with inducing a guest to call schedule():
> * It may not have any other runnable processes, or it may choose the
> same process to run again; so it may not switch the cr3 anyway.
> * The only reliable way to do it without some kind of
> paravirtualization (even if only a kernel driver) would be to give it a
> timer interrupt, which may mess up other things on the system, such as
> the system time.
> If you're really keen to preempt on return to userspace, you could try
> something like the following. Before delivering the interrupt, note
> the EIP the guest is at. If it's in user space, set a hardware
> breakpoint at that address. Then deliver the interrupt. If the guest
> calls schedule(), you can catch the CR3 switch; if it returns to the
> same process, it will hit the breakpoint.
> Two possible problems:
> * For reasons of ancient history, the iret instruction may set the RF
> flag in the EFLAGS register, which will cause the breakpoint not to
> fire after the guest iret. You may need to decode the instruction and
> set the breakpoint at the instruction after, or something like that.
> * I believe Windows doesn't do a CR3 switch if it does a *thread*
> switch. If so, on a thread switch you'll get neither the CR3 switch
> nor the breakpoint (since the other thread is probably running
> somewhere else).
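[For the record, this is how I read the breakpoint trick, as a
sketch; hvm_set_guest_dr() and guest_in_user_mode() are placeholders
I invented, only the DR7 encoding is real x86.]

/* DR7 bit 1 (G0) globally enables breakpoint 0; with RW0 = 00b and
 * LEN0 = 00b (bits 19:16 left clear) it breaks on execution. */
#define DR7_G0 (1UL << 1)

static void arm_return_breakpoint(struct vcpu *v, unsigned long rip)
{
    /* Only worth doing if the guest was interrupted in user space. */
    if ( !guest_in_user_mode(v) )
        return;

    hvm_set_guest_dr(v, 0, rip);     /* DR0 = the EIP we interrupted at */
    hvm_set_guest_dr(v, 7, DR7_G0);  /* arm the execute breakpoint */

    /* Then deliver the interrupt.  A #DB exit at this rip means the
     * same process resumed; a CR3 write means the guest rescheduled. */
}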
> On Sun, Nov 1, 2009 at 5:54 AM, XiaYubin <xiayubin@xxxxxxxxx> wrote:
>> Hi, George,
>> Thank you for your reply. Actually, I'm looking for a generic
>> mechanism for cooperative scheduling. Independence from the guest
>> OS would make such a mechanism more convincing and practical, just
>> as the balloon driver is.
>> Maybe you are wondering why I asked such a weird question, so let
>> me describe it in more detail. My current work is based on
>> "Task-aware VM scheduling", published at VEE'09. By monitoring CR3
>> changes at the VMM level, Xen can learn each task's CPU consumption
>> and identify CPU hogs and I/O tasks. The task-aware mechanism
>> therefore offers a more fine-grained scheduler than the original
>> VCPU-level one, since a single VCPU may run CPU hogs and I/O tasks
>> mixed together.
>> Imagine there are n VMs. One of them, named mix-VM, runs two tasks:
>> cpuhog and iotask (network). The other VMs, named CPU-VM, run just
>> cpuhog. All VMs use PV drivers (the GPLPV driver for Windows).
>> Here's what is supposed to happen when iotask receives a network
>> packet: the NIC raises an IRQ, which goes to Xen, and domain-0
>> sends an inter-domain event to the mix-VM, which is likely to be in
>> the run-queue. Xen then schedules it to run immediately and sets
>> its state to preempting-state. Right after that, the mix-VM
>> *should* schedule iotask to process the incoming packet, and then
>> schedule cpuhog once processing is done. When CR3 changes to
>> cpuhog's, Xen knows that the mix-VM has finished I/O processing
>> (here we assume that cpuhog's priority is lower than iotask's, as
>> in most OSes), and schedules the mix-VM out, ending its
>> preempting-state. This way the mix-VM can preempt other VMs to
>> process I/O ASAP, while keeping the preempting time as short as
>> possible for fairness. The point is: cpuhog should not run in
>> preempting-state.
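[In pseudo-code, the receive-path check above would look roughly like
this; in_preempting_state and hog_cr3 are fields I would add to the
vcpu, and the hook would live wherever the guest CR3 write is
handled.]

/* Called on a guest CR3 write. */
static void preempt_check_cr3(struct vcpu *v, unsigned long new_cr3)
{
    if ( !v->in_preempting_state )
        return;

    /* The guest kernel is switching to cpuhog, so I/O processing is
     * done: end preempting-state and give the pcpu back. */
    if ( new_cr3 == v->hog_cr3 )
    {
        v->in_preempting_state = 0;
        raise_softirq(SCHEDULE_SOFTIRQ);
    }
}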
>> However, a problem arises when the mix-VM is sending packets. When
>> iotask sends a large amount of data (over TCP), it blocks and waits
>> to be woken up after the guest kernel has sent all the data, which
>> may be split into thousands of TCP packets. The mix-VM receives an
>> ACK packet every time it sends a packet, which makes it enter
>> preempting-state. Note that at this moment the CR3 of the mix-VM is
>> cpuhog's (it is the only running process). After the guest kernel
>> processes the ACK packet and sends the next packet, it switches
>> back to user mode, which means cpuhog gets to run in
>> preempting-state. The point is: since there is no CR3 change, Xen
>> has no chance to run.
>> One way is to add a hook at user/kernel mode switches, so that Xen
>> can catch the moment when cpuhog gets to run. However, this costs
>> too much. Another way is to force the VM to schedule when it enters
>> preempting-state. It will then trap to Xen when CR3 changes, and
>> Xen can end its preempting-state when the guest schedules cpuhog to
>> run. That's why I want to trigger a guest context switch from Xen.
>> I don't really care *which* process it switches to; I just want Xen
>> to get a chance to run. The point is: is there a better/simpler way
>> to solve this problem?
>> Hope I described the problem clearly. Would you please give more
>> details on the "reschedule event channel" idea? Thanks!
>> On Sat, Oct 31, 2009 at 11:20 PM, George Dunlap
>> <George.Dunlap@xxxxxxxxxxxxx> wrote:
>>> Context switching is a choice the guest OS has to make, and how that's
>>> done will differ based on the operating system. I think if you're
>>> thinking about modifying the guest scheduler, you're probably better
>>> off starting with Linux. Even if there's a way to convince Windows to
>>> call schedule() to pick a new process, I'm not sure you'll be able to
>>> tell it *which* process to choose.
>>> As far as mechanism on Xen's side, it would be easy enough to allocate
>>> a "reschedule" event channel for the guest, such that whenever you
>>> want to trigger a guest reschedule, just raise the event channel.
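[If I understand the suggestion, the Xen side would be roughly a call
like send_guest_vcpu_virq(v, VIRQ_GUEST_RESCHED) whenever a
reschedule is wanted (VIRQ_GUEST_RESCHED is a made-up virq), and the
guest side a tiny driver that binds the virq and asks its kernel to
reschedule, e.g. on Linux something like:]

static irqreturn_t resched_virq_handler(int irq, void *dev_id)
{
    /* Mark the current task as needing a reschedule, so schedule()
     * runs on the way out of this interrupt. */
    set_tsk_need_resched(current);
    return IRQ_HANDLED;
}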
>>> On Sat, Oct 31, 2009 at 11:02 AM, XiaYubin <xiayubin@xxxxxxxxx> wrote:
>>>> Hi, all,
>>>> As I'm doing some research on cooperative scheduling between Xen and
>>>> the guest domain, I want to know in what ways Xen can trigger a
>>>> context switch inside an HVM guest domain (which runs Windows in my
>>>> case). Do
>>>> I have to write a driver (like balloon-driver)? Or a user process is
>>>> enough? Or there is an even simpler way?
>>>> All your suggestions are appreciated. Thanks! :)