OK, so you want to allow a VM to run so that it can do packet
processing in the kernel, but once it's done in the kernel you want to
preempt the VM again.
An idea I was going to try out is that if a VM receives an interrupt
(possibly only certain interrupts, like network), let it run for a
very short amount of time (say, 1ms or 500us). That should be enough
for it to do its basic packet processing (or audio processing, video
processing, whatever). True, you're going to run the "cpu hog" during
that time, but that time will be debited against the time it runs
later. (I haven't tested this idea yet. It may work better with some
credit algorithms than others.)
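To make the "debit" idea concrete, here's a toy model of the
accounting I have in mind (all names invented for illustration; this
is not actual scheduler code):

```c
#include <stdbool.h>

/* Toy model of the boost-and-debit idea: on an interrupt, a VCPU is
 * allowed to run immediately for a short slice, and that slice is
 * charged against the credit it would otherwise spend later. */
struct vcpu {
    int credit_us;   /* remaining credit, in microseconds */
    bool boosted;    /* currently running in its interrupt boost? */
};

/* Grant a short boost (say 500us) when an interrupt is delivered. */
static void boost_on_interrupt(struct vcpu *v, int boost_us)
{
    v->boosted = true;
    /* Debit the boost up front, so a cpu hog that rides the boost
     * just spends credit it would have spent anyway. */
    v->credit_us -= boost_us;
}

/* After the boost expires, the VCPU competes with its reduced credit. */
static void boost_expired(struct vcpu *v)
{
    v->boosted = false;
}
```

The point being that the boost changes *when* the VM runs, not *how
much* it runs overall.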
The problem with inducing a guest to call schedule():
* It may not have any other runnable processes, or it may choose the
same process to run again; so it may not switch the CR3 anyway.
* The only reliable way to do it without some kind of
paravirtualization (even if only via a kernel driver) would be to give
it a timer interrupt, which may mess up other things on the system,
such as the system time.
If you're really keen to preempt on return to userspace, you could try
something like the following. Before delivering the interrupt, note
the EIP the guest is at. If it's in user space, set a hardware
breakpoint at that address. Then deliver the interrupt. If the guest
calls schedule(), you can catch the CR3 switch; if it returns to the
same process, it will hit the breakpoint.
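A sketch of the debug-register setup (the DR7 bit layout here is from
the x86 architecture; how you'd load it into the guest's virtualized
debug state is hypervisor-specific and not shown):

```c
#include <stdint.h>

/* Build a DR7 value that arms one breakpoint slot as an instruction
 * (execute) breakpoint. For execute breakpoints the R/W field must be
 * 00 and the LEN field must be 00, so only the enable bit ends up set. */
#define DR7_L(slot)      (1u << ((slot) * 2))   /* local-enable bit  */
#define DR7_RW_SHIFT(s)  (16 + (s) * 4)         /* R/W field         */
#define DR7_LEN_SHIFT(s) (18 + (s) * 4)         /* LEN field         */

static uint32_t dr7_exec_bp(int slot)
{
    uint32_t dr7 = DR7_L(slot);
    dr7 |= 0u << DR7_RW_SHIFT(slot);   /* 00 = break on execution */
    dr7 |= 0u << DR7_LEN_SHIFT(slot);  /* 00 = 1 byte (required)  */
    return dr7;
}

/* The address to break on goes into DR0..DR3: e.g. write the guest EIP
 * you noted before delivering the interrupt into DR0, and dr7_exec_bp(0)
 * into DR7, in the guest's debug state. */
```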
Two possible problems:
* For reasons of ancient history, the iret instruction may set the RF
flag in the EFLAGS register, which will cause the breakpoint not to
fire after the guest's iret. You may need to decode the instruction
and set the breakpoint at the following instruction instead, or
something like that.
* I believe Windows doesn't do a CR3 switch if it does a *thread*
switch. If so, on a thread switch you'll get neither the CR3 switch
nor the breakpoint (since the other thread is probably running at a
different EIP in the same address space).
On Sun, Nov 1, 2009 at 5:54 AM, XiaYubin <xiayubin@xxxxxxxxx> wrote:
> Hi, George,
> Thank you for your reply. Actually, I'm looking for a generic
> mechanism for cooperative scheduling. Independence from the guest OS
> would make such a mechanism more convincing and practical, just like
> the balloon driver.
> Maybe you are wondering why I asked such a weird question, so let me
> describe it in more detail. My current work is based on "Task-aware
> VM scheduling", published at VEE'09. By monitoring CR3 changes at the
> VMM level, Xen can gather information about tasks' CPU consumption to
> identify CPU hogs and I/O tasks. The task-aware mechanism therefore
> offers a more fine-grained scheduler than the original VCPU-level
> scheduler, since a VCPU may run CPU hogs and I/O tasks in a mixed
> fashion.
> Imagine there are n VMs. One of them, named mix-VM, runs two tasks:
> cpuhog and iotask (network). The other VMs, named CPU-VMs, run just
> cpuhog. All VMs use PV drivers (the GPLPV driver for Windows).
> Here's what is supposed to happen when iotask receives a network
> packet: the NIC raises an IRQ and passes it to Xen, then domain-0
> sends an inter-domain event to mix-VM, which is likely to be in the
> run-queue. Xen then schedules it to run immediately and sets its
> state to preempting-state. Right after that, the mix-VM *should*
> schedule iotask to process the incoming packet, and then schedule
> cpuhog after processing. When the CR3 changes to cpuhog's, Xen knows
> that the mix-VM has finished I/O processing (here we assume that the
> priority of cpuhog is lower than iotask's, as in most OSes), and
> schedules the mix-VM out, ending its preempting-state. Therefore, the
> mix-VM can preempt other VMs to process I/O ASAP, while keeping the
> preempting time as short as possible to maintain fairness. The point
> is: cpuhog should not run in preempting-state.
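In other words, the preempting-state logic is a small state machine
driven by CR3 switches; here is a toy model of what I mean (names
invented, not actual Xen code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of task-aware preempting-state: the VM is boosted on an
 * I/O event, and the boost ends as soon as the VMM observes a CR3
 * switch to an address space already classified as a CPU hog. */
struct domain {
    bool preempting;   /* currently boosted for I/O processing?  */
    uint64_t hog_cr3;  /* CR3 previously classified as a CPU hog */
};

static void io_event(struct domain *d)
{
    d->preempting = true;   /* boost the VM so iotask runs ASAP */
}

/* Called by the VMM on every guest CR3 write (a VM exit for HVM). */
static void on_cr3_switch(struct domain *d, uint64_t new_cr3)
{
    if (d->preempting && new_cr3 == d->hog_cr3)
        d->preempting = false;   /* I/O done; deschedule the VM */
}
```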
> However, a problem arises when the mix-VM sends packets. When iotask
> sends a large amount of data (over TCP), it blocks and waits to be
> woken up after the guest kernel has sent all the data, which may be
> split into thousands of TCP packets. The mix-VM receives an ACK
> packet every time it sends a packet, which makes it enter
> preempting-state. Note that at this moment, the CR3 of mix-VM is
> cpuhog's (as it is the only running process). After the guest kernel
> processes the ACK packet and sends the next packet, it switches to
> user mode, which means cpuhog gets to run in preempting-state. The
> point is: as there is no CR3 change, Xen gets no chance to run.
> One way is to add a hook at the user/kernel mode switch, so Xen can
> catch the moment when cpuhog gets to run. However, this costs too
> much. Another way is to force a VM to schedule when it enters
> preempting-state. It will then trap to Xen when CR3 changes, and Xen
> can end its preempting-state when it sees cpuhog scheduled to run.
> That's why I want to trigger a guest context switch from Xen. I don't
> really care *which* process it switches to; I just want to give Xen a
> chance to run. The point is: is there a better/simpler way to solve
> this problem?
> Hope I described the problem clearly. Could you also give more
> details about your "reschedule event channel" idea? Thanks!
> On Sat, Oct 31, 2009 at 11:20 PM, George Dunlap
> <George.Dunlap@xxxxxxxxxxxxx> wrote:
>> Context switching is a choice the guest OS has to make, and how that's
>> done will differ based on the operating system. I think if you're
>> thinking about modifying the guest scheduler, you're probably better
>> off starting with Linux. Even if there's a way to convince Windows to
>> call schedule() to pick a new process, I'm not sure you'll be able to
>> tell it *which* process to choose.
>> As far as mechanism on Xen's side, it would be easy enough to allocate
>> a "reschedule" event channel for the guest, such that whenever you
>> want to trigger a guest reschedule, just raise the event channel.
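As a purely illustrative sketch of how the pieces would fit together
(the real Xen event-channel API differs; all names here are invented):

```c
#include <stdbool.h>

/* Toy model of a "reschedule" event channel: Xen raises the channel,
 * and the guest's handler (installed by an in-guest driver) responds
 * by asking the guest kernel to reschedule. */
static bool resched_pending;   /* the event channel's pending bit    */
static int  schedule_calls;    /* counts guest reschedule requests   */

static void xen_raise_resched(void)    /* hypervisor side */
{
    resched_pending = true;
}

static void guest_event_handler(void)  /* guest driver side */
{
    if (resched_pending) {
        resched_pending = false;
        schedule_calls++;   /* stand-in for calling schedule() */
    }
}
```

Whether Windows lets a driver force a reschedule at that point is the
part I'm unsure about.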
>> On Sat, Oct 31, 2009 at 11:02 AM, XiaYubin <xiayubin@xxxxxxxxx> wrote:
>>> Hi, all,
>>> As I'm doing some research on cooperative scheduling between Xen and
>>> guest domains, I want to know in what ways Xen can trigger a context
>>> switch inside an HVM guest domain (which runs Windows in my case). Do
>>> I have to write a driver (like the balloon driver)? Or is a user
>>> process enough? Or is there an even simpler way?
>>> All your suggestions are appreciated. Thanks! :)
>>> Xen-devel mailing list