To whom it may concern,
For many months, some of us at Novell working on and testing Xen have contended
with chaotic mouse behavior in HVM Linux guests. This ill-mannered mouse,
however, appears to be sensitive to certain hardware. Although I have seen the
mouse jump around the screen occasionally on various machines, I see it
continuously on the Harwich Twin Castle Paxville (3GHz, 8GB, x86_64, 8-way
dual-core). The mouse is completely unusable in the guest: the slightest
mouse event produces wild results, either erratic mouse movement or spurious
button presses.
Bug 167187, “Erratic mouse behavior with HVM Linux guest and SDL”, was entered
into Novell's Bugzilla on April 17th, 2006, and Intel was informed of the issue.
Because Novell's first release of Xen with SLES provides full support for
para-virtualized guests, this HVM guest issue was set aside until recently,
when I began to explore the cause of the mouse problem.
Here's what I've learned.
First, the mouse behaves erratically because the data coming out of
/dev/input/mice is jumbled; more precisely, it is out of order. This was
perplexing, because I had been able to determine that qemu was delivering the
data in the proper order and that i8042_interrupt() in
linux-2.6.16/drivers/input/serio/i8042.c, executing in the HVM guest, was also
reporting that the data had been read in the proper order, yet the data was
processed out of order.
After exploring a number of possible causes for this behavior I discovered an
assumption in the kernel code that is true when the kernel is running natively
but not necessarily true when hosted by the hypervisor.
I learned that i8042_interrupt() is also polled from the timer interrupt if
HZ/20 jiffies have elapsed since the last 8042 interrupt. Here is what I
believe is happening. Each mouse event generates at least three bytes of data,
and each byte generates an interrupt. When the first interrupt is injected into
the guest (as with every interrupt), the kernel masks that interrupt in the PIC
and then EOIs the PIC before actually handling it. This, of course, allows ANY
other interrupt to occur except the one currently being serviced. When
i8042_interrupt() is called, it first calls mod_timer() to push the timer
callback out another HZ/20, takes a spin_lock_irqsave() disabling interrupts
(interrupts are enabled before i8042_interrupt() is called), reads the 8042 to
obtain the first byte of data from qemu, and then releases the spinlock.
Immediately after the spinlock is released, this ISR is interrupted by a timer
interrupt, which discovers that the 8042's HZ/20 timer has expired; so
i8042_interrupt() is re-entered and runs to completion, since no further timer
interrupt is pending. When the timer interrupt completes, the previously
interrupted ISR resumes and continues to process what was to be the first byte
but now is not. I have been able to confirm that the timer is indeed calling
i8042_interrupt() and causing the mis-ordered data.
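The sequence above can be modeled in userspace. The following is a minimal, single-threaded C sketch, not the kernel code: sim_read_data(), sim_deliver(), sim_interrupt() and sim_packet() are hypothetical names, the spinlock is reduced to comments, the emulated 8042 is assumed to already hold all three bytes of the packet, and the timer poll is modeled as a direct re-entrant call made in the window between the unlock and the hand-off to serio_interrupt().

```c
#include <assert.h>
#include <stddef.h>

#define SIM_MAX 16

/* Hypothetical model: 'port60' stands in for qemu's emulated data port,
 * 'delivered' records the order in which the input layer receives bytes. */
struct sim {
    const unsigned char *port60;      /* bytes the emulated 8042 presents, in order */
    size_t len;                       /* number of bytes in port60 */
    size_t next_in;                   /* index of the next byte to be read */
    unsigned char delivered[SIM_MAX]; /* order seen by the input layer */
    size_t ndelivered;
};

/* Returns 1 and stores a byte if data is pending, else 0 (like OBF clear). */
static int sim_read_data(struct sim *s, unsigned char *out)
{
    if (s->next_in >= s->len)
        return 0;
    *out = s->port60[s->next_in++];
    return 1;
}

/* Stands in for serio_interrupt(): records the delivery order. */
static void sim_deliver(struct sim *s, unsigned char b)
{
    assert(s->ndelivered < SIM_MAX);
    s->delivered[s->ndelivered++] = b;
}

/* Simplified handler: read under the lock, drop the lock, then deliver.
 * If timer_fires_after_unlock is set, the poll timer re-enters the handler
 * in the window between the unlock and the delivery. */
static void sim_interrupt(struct sim *s, int timer_fires_after_unlock)
{
    unsigned char data;

    /* spin_lock_irqsave() ... */
    if (!sim_read_data(s, &data))
        return;                       /* interrupt without any data */
    /* ... spin_unlock_irqrestore(): interrupts are enabled again here */

    if (timer_fires_after_unlock)
        sim_interrupt(s, 0);          /* re-entered handler runs to completion */

    sim_deliver(s, data);
}

/* Replays one 3-byte packet: one IRQ per byte, optionally with the poll
 * timer firing inside the first handler's unlock-to-deliver window. */
static void sim_packet(struct sim *s, int race)
{
    sim_interrupt(s, race);   /* IRQ for byte 0 */
    sim_interrupt(s, 0);      /* IRQ for byte 1 */
    sim_interrupt(s, 0);      /* IRQ for byte 2 (finds no data if already consumed) */
}
```

Replaying the packet 0x09, 0x05, 0xFE without the race delivers the bytes in order; with the race the input layer sees 0x05, 0x09, 0xFE, so the second byte lands where the first byte of the packet is expected.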
For the timer to conclude that HZ/20 jiffies had elapsed, at least that much
time must have passed since i8042_interrupt() last re-armed its timer. One
possibility is a lengthy hypervisor preemption between i8042_interrupt()
releasing the spinlock and calling serio_interrupt() a dozen lines later,
followed by a timer ISR before the guest resumes from the point of preemption.
Another is that more than HZ/20 elapsed while reading the data from qemu's
emulation of port 0x60, followed by a timer ISR right after the
spin_unlock_irqrestore() in i8042_interrupt(). Whichever the case may be,
i8042_interrupt() is _assuming_ that HZ/20 jiffies will not elapse before its
ISR completes. This assumption is probably fair enough when running natively,
but it is not a good assumption when hosted by the current implementation of
the hypervisor.
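To put HZ/20 in wall-clock terms, here is a small C sketch of the arithmetic, assuming the poll timer fires once jiffies reaches the expiry value it was armed with. jiffies_after() re-creates, in userspace, the wrap-safe comparison used by the kernel's time_after() macro; POLL_PERIOD(), poll_timer_expired() and poll_period_ms() are hypothetical helpers for illustration only.

```c
#include <assert.h>

/* Hypothetical helper: the HZ/20 poll interval, in jiffies (integer division). */
#define POLL_PERIOD(hz) ((hz) / 20)

/* Wrap-safe "a is after b", in the style of the kernel's time_after(). */
static int jiffies_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Would a timer re-armed at 'armed' (in jiffies) have expired by 'now'? */
static int poll_timer_expired(unsigned long armed, unsigned long now,
                              unsigned long hz)
{
    unsigned long expires = armed + POLL_PERIOD(hz);
    return !jiffies_after(expires, now);   /* i.e. expires <= now */
}

/* The HZ/20 interval converted to milliseconds. */
static unsigned long poll_period_ms(unsigned long hz)
{
    return POLL_PERIOD(hz) * 1000UL / hz;
}
```

With HZ=250 the interval is 12 jiffies, i.e. 48 ms; with HZ=100 or HZ=1000 it is 50 ms. In other words, the guest must lose roughly 50 ms inside a single 8042 interrupt for the poll to fire, which is plausible under hypervisor preemption but essentially never happens natively.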
The question now is: does the hypervisor change to accommodate the assumption,
is the assumption removed from the kernel, or is there some other fiendish,
time-consuming bug yet to be discovered?
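As one illustration of the second option (removing the assumption on the guest side), here is a hypothetical C11 sketch, not a proposed patch: a re-entry guard makes a timer-driven poll back off while another handler instance is between reading a byte and handing it on, so any pending byte is left for the next real interrupt. The names model_init(), guarded_interrupt() and the busy flag are assumptions; the spinlock is elided, only one CPU is modeled, and a real kernel change would also have to consider SMP and the case where the skipped poll was the only thing that would have picked up the data.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_MAX 8

struct model {
    const unsigned char *in;       /* bytes presented by the emulated port 0x60 */
    size_t len;                    /* number of bytes in 'in' */
    size_t pos;                    /* index of the next byte to read */
    unsigned char out[MODEL_MAX];  /* order in which the input layer sees bytes */
    size_t nout;
    atomic_flag busy;              /* hypothetical re-entry guard */
};

static void model_init(struct model *m, const unsigned char *in, size_t len)
{
    m->in = in;
    m->len = len;
    m->pos = 0;
    m->nout = 0;
    atomic_flag_clear(&m->busy);
}

static void guarded_interrupt(struct model *m, int timer_fires_after_unlock)
{
    unsigned char data;

    /* A nested entry (here, the poll timer) finds the guard set and backs off. */
    if (atomic_flag_test_and_set(&m->busy))
        return;

    if (m->pos < m->len) {
        data = m->in[m->pos++];          /* read done under the (elided) spinlock */
        if (timer_fires_after_unlock)
            guarded_interrupt(m, 0);     /* poll re-enters, returns immediately */
        assert(m->nout < MODEL_MAX);
        m->out[m->nout++] = data;        /* stands in for serio_interrupt() */
    }

    atomic_flag_clear(&m->busy);
}
```

Replaying the packet 0x09, 0x05, 0xFE with the poll firing inside the first handler now delivers the bytes in their original order.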
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel