[Xen-devel] Fast inter-VM signaling using monitor/mwait

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] Fast inter-VM signaling using monitor/mwait
From: Michael Abd-El-Malek <mabdelmalek@xxxxxxx>
Date: Mon, 20 Apr 2009 13:43:49 -0400
I've implemented a fast inter-VM signaling mechanism using the x86 monitor/mwait instructions. One-way event notification takes ~0.5us, compared to ~8us when using Xen's event channels. If there's interest in this code, I'm willing to clean it up and/or share it with others.

A little bit of background... For my dissertation work, I'm enabling portable file system implementations by running a file system in a VM. Small file system-agnostic modules in the kernel pass all VFS operations from the user OS (running user applications) to the file system VM (running the preferred OS for the file system). In contrast to user-level file systems, my approach leverages unmodified file system implementations and provides better isolation for the FS from the myriad OSs that a user may be running. I've implemented a unified buffer caching mechanism between VMs that requires very few changes to the OSs: fewer than a dozen changed lines. Additionally, we've modified Xen's migration mechanism to support atomic migration of two VMs. We currently have NetBSD and Linux (2.6.18 and 2.6.28) ports. I've implemented an IPC layer that's very similar to the one in the block and network PV drivers (i.e., it uses shared memory for data transfer and event channels for signaling).

Unfortunately, Xen's event channels were too slow for my purposes. For the remainder of this email, assume that each VM has a dedicated core; that is the case whose latency I'm trying to optimize. The culprits are the context switch to the guest OS interrupt handler (~3.5us for x86_64 2.6.28) and a second context switch to a worker thread (~3us). In addition, there's a ~2us cost for making a "send event" hypercall; this includes the cost of the hypercall itself and of sending an x86 inter-processor interrupt (IPI). Thus, a one-way event notification costs ~8us, and an IPC takes ~16us for a request notification plus a response notification. This cost hasn't been problematic for the block and network drivers, primarily because the hardware access cost of the underlying operations is typically in the millisecond range. There, an extra 16us is noise.

Our design goal of preserving file system semantics without modifying the file system necessitates that all VFS operations are sent to the file system VM. In other words, there is no client caching. Thus, there is a high frequency of IPCs among the VMs. For example, we pass all in-cache data and metadata accesses, as well as permission checks and directory entry validation callbacks. These VFS operations can often cost less than 1us. Adding a 16us signaling cost to each of them is thus a big overhead, slowing macrobenchmarks by ~20%.
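
To make the arithmetic concrete, here is a small sketch in C of the per-operation cost model implied by the numbers above. The function names are made up for illustration. The three event-channel components (~3.5us, ~3us, ~2us) add up to about 8.5us, which is the "~8us" quoted for one-way signaling.

```c
/* Round-trip signaling cost in microseconds: one request notification
 * plus one response notification. */
static double round_trip_us(double one_way_us)
{
    return 2.0 * one_way_us;
}

/* Slowdown factor for a single VFS operation that takes op_us on its own
 * once the round-trip signaling cost is added on top of it. */
static double slowdown(double op_us, double one_way_us)
{
    return (op_us + round_trip_us(one_way_us)) / op_us;
}
```

With a 1us operation, slowdown(1.0, 8.0) gives 17 for event channels, while slowdown(1.0, 0.5) gives 2 for a ~0.5us one-way notification. This is a per-operation figure only; the ~20% macrobenchmark slowdown reflects a mix of operations, most of which cost far more than 1us.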

I implemented a polling mechanism that spins on a shared memory location to check for requests/responses. Its performance overhead was minimal (<1us), but it had an adverse effect on power consumption during idle time. Fortunately, since the SSE3-capable Pentium 4, x86 has included two instructions, monitor and mwait, for implementing this type of inter-processor polling in a power-efficient way. A processor executes a monitor instruction with a memory address to be monitored, then executes an mwait instruction. The mwait instruction returns when a write occurs to the monitored address range, or when an interrupt occurs.
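
For readers who want to picture the polling variant, here is a minimal sketch in C11 of spinning on a shared memory location. It is not the code from my tree: the struct layout and the function names are assumptions made for illustration, and a real version would place the mailbox in a page shared between the two VMs.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-slot mailbox; in practice it would sit in a shared page. */
struct mailbox {
    _Atomic uint32_t seq;   /* bumped by the sender for every new request */
    uint32_t payload;       /* request data, written before seq is bumped */
};

/* Sender: store the payload, then publish it by bumping seq (release). */
static void mailbox_post(struct mailbox *mb, uint32_t payload)
{
    mb->payload = payload;
    atomic_fetch_add_explicit(&mb->seq, 1, memory_order_release);
}

/* Receiver: non-blocking check for a request newer than *last_seen. */
static bool mailbox_try_recv(struct mailbox *mb, uint32_t *last_seen,
                             uint32_t *out)
{
    uint32_t now = atomic_load_explicit(&mb->seq, memory_order_acquire);
    if (now == *last_seen)
        return false;       /* nothing new yet */
    *out = mb->payload;
    *last_seen = now;
    return true;
}

/* Receiver: spin until a request arrives. This is the loop that keeps the
 * core busy, and therefore burning power, while the channel is idle. */
static uint32_t mailbox_spin_recv(struct mailbox *mb, uint32_t *last_seen)
{
    uint32_t out;
    while (!mailbox_try_recv(mb, last_seen, &out))
        ;                   /* busy-wait */
    return out;
}
```

The monitor/mwait version keeps the same shared location but replaces the busy-wait with arming a monitor on the address of seq and then executing mwait, so the core can rest until the sender writes.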

The mwait instruction is privileged, so I added a new hypercall that wraps access to it. Thus, my code has a Xen component (the new hypercall) and a guest kernel component (code for executing the hypercall and for turning the timer interrupts off and on around the hypercall). For this code to be merged into Xen, I would need to add security checks and a check for whether the processor supports monitor/mwait.
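
As a sketch of what the processor-support check could look like (this is not part of my current code), CPUID leaf 1 reports MONITOR/MWAIT support in ECX bit 3. The example below assumes a GCC or Clang toolchain, which provides __get_cpuid in <cpuid.h>; the helper names are made up for illustration.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* CPUID leaf 1 reports MONITOR/MWAIT support in ECX bit 3. */
#define CPUID_1_ECX_MONITOR (UINT32_C(1) << 3)

/* Decode a raw ECX value from CPUID leaf 1 (illustrative helper). */
static int ecx_has_monitor(uint32_t ecx)
{
    return (ecx & CPUID_1_ECX_MONITOR) != 0;
}

/* Returns 1 if this CPU advertises MONITOR/MWAIT and 0 otherwise,
 * including on non-x86 builds, where the instructions do not exist. */
static int cpu_has_monitor_mwait(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;           /* leaf 1 is not available */
    return ecx_has_monitor(ecx);
#else
    return 0;
#endif
}
```

Inside Xen the check would presumably go through the hypervisor's own CPU feature flags instead of a user-space helper like this one, and the hypercall would fail cleanly when the feature is absent.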

Are any folks interested in this code? Would it make sense to integrate this into Xen? I've implemented the guest code in Linux 2.6.28, but I can easily port it to 2.6.30 or 2.6.18. I'm also happy to provide my benchmarking code.

