Re: [Xen-devel] [HVM] Corruption of buffered_io_page

To:	xen-devel@xxxxxxxxxxxxxxxxxxx
Subject:	Re: [Xen-devel] [HVM] Corruption of buffered_io_page
From:	"Trolle Selander" <trolle.selander@xxxxxxxxx>
Date:	Thu, 7 Dec 2006 10:50:48 +0100

I thought I had replied to the list, but apparently Gmail's default reply action goes to the last poster, not the mailing list. Ian got these answers already, but I'll cut & paste to make sure they get to the general list as well:

Distro is FC6, compiler is gcc-4.1.1, guest OS is OS/2, workload is the boot process. :)
No drivers are loaded yet at this stage - it happens fairly early in the boot. However, after I added the corruption-catching "padding" struct member, the boot does in fact progress to the driver loading stage, although with severely corrupted boot-logo graphics.
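
For illustration, the padding trick looks roughly like this. It is a minimal sketch, not the actual change: the struct is a hypothetical stand-in (demo_iopage, not the real Xen definition), and the guard pattern and helper names are made up for the example. The stray write is simulated as a 32-bit store to the start of the page.

```c
#include <stdint.h>
#include <string.h>

/* Arbitrary pattern chosen for this example; any unlikely value works. */
#define GUARD_PATTERN UINT64_C(0xdeadbeefcafef00d)

/* Hypothetical stand-in for the shared page header; NOT the Xen definition. */
struct demo_iopage {
    uint64_t guard;          /* dummy member placed first to absorb stray writes */
    uint32_t read_pointer;
    uint32_t write_pointer;
};

/* Zero the header and arm the guard. */
static void demo_iopage_init(struct demo_iopage *p)
{
    memset(p, 0, sizeof(*p));
    p->guard = GUARD_PATTERN;
}

/* Returns 1 if the guard no longer holds the pattern, i.e. something
 * wrote to the start of the page; returns 0 otherwise. */
static int demo_iopage_guard_tripped(const struct demo_iopage *p)
{
    return p->guard != GUARD_PATTERN;
}

/* Simulates the bug: a 32-bit store through a pointer to the page start. */
static void demo_stray_write(void *page_start, uint32_t value)
{
    memcpy(page_start, &value, sizeof(value));
}
```

After demo_iopage_init(), a call such as demo_stray_write(&page, 0x1df1000) lands in the guard and leaves read_pointer untouched, so demo_iopage_guard_tripped() reports it. The guard only tells you that the start of the page was hit, not who did it.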
Since it currently happens reproducibly after a specific "no op" vmexit (a read from an unused port), Mats's suggestion of marking the iopage read-only sounds doable if I insert code to set the page read-only when this specific vmexit occurs. From what I saw when running qemu in the debugger, there's no "proper" use of the page about to occur, so the only thing that will write to it should be whatever is doing so erroneously. I'll try that tomorrow.
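
A rough sketch of the read-only idea in plain POSIX C, assuming the write comes through an ordinary userspace mapping. Everything named demo_* is hypothetical, and the anonymous page merely stands in for the iopage; this is not qemu-dm code. The signal handler is only there so the example can report the fault; when debugging for real you would let the fault stop the process in the debugger instead.

```c
#define _DEFAULT_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf demo_fault_jmp;

/* Jump back to the probe site instead of re-executing the faulting store. */
static void demo_fault_handler(int sig)
{
    (void)sig;
    siglongjmp(demo_fault_jmp, 1);
}

/* Map one anonymous page to stand in for the shared page; NULL on failure. */
static void *demo_map_page(void)
{
    size_t sz = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Drop write permission on the page; returns 0 on success, -1 on error. */
static int demo_set_page_readonly(void *page)
{
    return mprotect(page, (size_t)sysconf(_SC_PAGESIZE), PROT_READ);
}

/* Attempt a 32-bit store; return 1 if it faulted, 0 if it went through. */
static int demo_write_faults(volatile uint32_t *addr, uint32_t value)
{
    struct sigaction sa, old_segv, old_bus;
    volatile int faulted = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_fault_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);   /* some systems raise SIGBUS here */

    if (sigsetjmp(demo_fault_jmp, 1) == 0)
        *addr = value;                  /* faults if the page is read-only */
    else
        faulted = 1;

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

Note that mprotect() only changes the calling process's own mapping. A write that reaches the page some other way (through another domain's mapping of the same frame, for example) would not fault here, so a lack of faults does not clear every suspect.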

One correction: I managed to confuse myself a bit here. The very last vmexit_ioio, at which the guest stalls, is a read from 0x1f7, but by the time that I/O happens the iopage is already corrupted, and that's why it stalls - qemu-dm is "stuck" and never performs the I/O. The read from port 0x23 is the I/O preceding that one - the last one that "gets through", which is why that was the one I've used to trace things.

I don't know if it's any clue to anyone, but the bad value that gets written into read_pointer is 0x1df1000.

Now to what you said - I thought Keir's patch fixed up all the segment base = 0 assumptions in x86_emulate? At least we're past the problem it was causing, which I posted about before. I must confess I never actually looked at the code, because Keir said the patch would fix all the segment base = 0 assumptions, and once the patch showed up in mercurial and I built from that changeset, I didn't hit the seg.base != 0 problem anymore.

On 12/7/06, Petersson, Mats <Mats.Petersson@xxxxxxx> wrote:

> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> [mailto: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Ian Pratt
> Sent: 06 December 2006 22:05
> To: Trolle Selander; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-devel] [HVM] Corruption of buffered_io_page
>
> > read_pointer is the first member of buffered_ioreq_t, so I went on the
> > hunch that the corruption was caused by something other than a wrong
> > value actually being written into the structure member: either an
> > overflow of a previous structure in memory or a pointer variable
> > mistake. I thus added a 64-bit dummy member to "pad" the
> > buffered_ioreq_t structure at the start, and as I had suspected, the bad
> > value does get written into this dummy member rather than the
> > read_pointer. I haven't (yet) been able to track down what it is that
> > actually writes the bad value, and any help finding it would be welcome.
>
> What compiler are you using? What guest OS? Are you using PV or emulated
> drivers? Any idea if there are particular workloads that provoke the
> problem?

I'll answer for Trolle as best as I can:
Compiler: gcc 4.1 I believe.
Guest OS: OS/2
Drivers would be emulated ones.
I think it's failing during initial boot, as Trolle hasn't told me "It
works" yet... ;-)

By the way, I'm still a bit worried that this is caused by segment base
!= 0 in x86_emulate.c - this can cause all sorts of "interesting"
interaction between the page-table updates and actual memory being
affected.

--
Mats
>
> Best,
> Ian
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
>
>
>

