> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
> Trolle Selander
> Sent: 07 December 2006 09:51
> To: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] [HVM] Corruption of buffered_io_page
>
> I thought I had replied to the list, but apparently gmail's
> default reply action goes to the last poster, not the mailing
> list. Ian got these answers already, but I'll cut & paste to
> make sure it gets to the general list as well:
>
>
>
> Distro is FC6, compiler is gcc-4.1.1, guest os is OS/2,
> workload is the boot process. :)
> No drivers are loaded yet at this stage - it happens
> fairly early in the boot. However, after I added the
> corruption-catching "padding" struct member, the boot does in
> fact progress to the driver loading stage, although with
> severely corrupted boot-logo graphics.
> Since it currently happens reproducibly after a
> specific "no op" vmexit (read from an unused port), Mats's
> suggestion of marking the iopage read-only sounds doable if I
> insert code to set the page readonly when this specific
> vmexit occurs. From what I saw when running qemu in the
> debugger, there's no "proper" use of the page about to occur,
> so the only thing that will write to it should be whatever is
> doing it erroneously. I'll try that tomorrow.
>
> One correction: I managed to confuse myself a bit here.
> The very last vmexit_ioio at which the guest stalls is a read
> from 0x1f7, but when that io happens, the iopage is already
> corrupted, and that's why it stalls - qemu-dm is "stuck" and
> never performs the io. The port 0x23 is the io preceding
> that one - the last one that "gets through", which is why
> that was the one I've used to trace things.
>
> I don't know if it's any clue to anyone, but the bad
> value that gets written into read_pointer is 0x1df1000.
One thing is for sure, it's not a page-table entry. But it could be the
value of a physical page-address. Is this value in any of the registers
around the time of the crash?
>
>
>
> Now to what you said - I thought Keir's patch fixed up all
> the segment base = 0 assumptions in x86_emulate? At least
> we're past the problem that I posted about before. I must
> confess I never actually looked at the code,
> because Keir said the patch would fix all the segment base =
> 0 assumptions, and once the patch showed up in mercurial and
> I built from that changeset, I didn't hit the seg.base != 0
> problem anymore.
There were one or more patches from Jan Beulich to fix the HVM side of
seg.base != 0, but as far as I've seen, Keir hasn't posted his "big patch"
yet - it's possible that Keir could send you a "private patch". That patch
fixes x86_emulate.c, which isn't in the HVM section; it's the part that
fixes up page-table writes (which you may not see many of in the early
part of boot, so it may not be an issue, of course).
Whilst marking the page read-only and trapping on the page-fault is
indeed doable, I'd also add a check on every vmexit and just at the time
of doing vmrun (in the C code just before calling svm_asm_do_resume() or
some such).
--
Mats
>
>
> On 12/7/06, Petersson, Mats <Mats.Petersson@xxxxxxx> wrote:
>
>
>
> > -----Original Message-----
> > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> > [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Ian Pratt
> > Sent: 06 December 2006 22:05
> > To: Trolle Selander; xen-devel@xxxxxxxxxxxxxxxxxxx
> > Subject: RE: [Xen-devel] [HVM] Corruption of buffered_io_page
> >
> > > read_pointer is the first member of buffered_ioreq_t, so on the hunch
> > > that the corruption was occurring by something other than a wrong value
> > > actually being written into the structure member (either overflowing a
> > > previous structure in memory or a pointer var mistake), I added a 64bit
> > > dummy member to "pad" the buffered_ioreq_t structure at the start, and
> > > as I had suspected, the bad value does get written into this dummy
> > > member rather than the read_pointer. I haven't (yet) been able to track
> > > down what it is that actually writes the bad value, and any help
> > > finding it would be welcome.
> >
> > What compiler are you using? What guest OS? Are you using PV or emulated
> > drivers? Any idea if there are particular workloads that provoke the
> > problem?
>
> I'll answer for Trolle as best as I can:
> Compiler: gcc 4.1 I believe.
> Guest OS: OS/2
> Drivers would be emulated ones.
> I think it's failing during initial boot, as Trolle hasn't told me "It
> works" yet... ;-)
>
> By the way, I'm still a bit worried that this is caused by segment base
> != 0 in x86_emulate.c - this can cause all sorts of "interesting"
> interaction between the page-table updates and actual memory being
> affected.
>
> --
> Mats
> >
> > Best,
> > Ian
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > http://lists.xensource.com/xen-devel
> >
> >
> >
>
>
>
>
>
>