RE: [Xen-devel] [HVM] Corruption of buffered_io_page

 

> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of 
> Trolle Selander
> Sent: 07 December 2006 09:51
> To: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] [HVM] Corruption of buffered_io_page
> 
> I thought I had replied to the list, but apparently gmail's 
> default reply action goes to the last poster, not the mailing 
> list. Ian got these answers already, but I'll cut & paste to 
> make sure it gets to the general list as well: 
> 
> 
> 
>       Distro is FC6, compiler is gcc-4.1.1, guest OS is OS/2, 
> workload is the boot process. :) 
>       No drivers are loaded yet at this stage - it happens 
> fairly early in the boot. However, after I added the 
> corruption-catching "padding" struct member, the boot does in 
> fact progress to the driver loading stage, although with 
> severely corrupted boot-logo graphics. 
>       Since it currently happens reproducibly after a 
> specific "no op" vmexit (read from an unused port), Mats's 
> suggestion of marking the iopage read-only sounds doable if I 
> insert code to set the page readonly when this specific 
> vmexit occurs. From what I saw when running qemu in the 
> debugger, there's no "proper" use of the page about to occur, 
> so the only thing that will write to it should be whatever is 
> doing it erroneously. I'll try that tomorrow.
>       
>       One correction: I managed to confuse myself a bit here. 
> The very last vmexit_ioio at which the guest stalls is a read 
> from 0x1f7, but when that io happens, the iopage is already 
> corrupted, and that's why it stalls - qemu-dm is "stuck" and 
> never performs the io. The port 0x23 is the io preceding 
> that one - the last one that "gets through", which is why 
> that was the one I've used to trace things. 
>       
>       I don't know if it's any clue to anyone, but the bad 
> value that gets written into read_pointer is 0x1df1000.

One thing is for sure, it's not a page-table entry. But it could be the
value of a physical page-address. Is this value in any of the registers
around the time of the crash?
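
A rough sketch of why the value looks like a page address rather than a PTE, assuming 4 KiB pages and the usual x86 layout where the low 12 bits of a page-table entry hold flags (bit 0 = Present). The helper names are made up for illustration; they are not Xen functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT   12
#define PAGE_LOW     ((UINT64_C(1) << PAGE_SHIFT) - 1) /* offset / PTE flag bits */
#define PTE_PRESENT  UINT64_C(0x1)

/* True if no bits are set below the 4 KiB page boundary. */
static bool is_page_aligned(uint64_t v)
{
    return (v & PAGE_LOW) == 0;
}

/* True if the value, read as an x86 PTE, would have the Present bit set. */
static bool pte_present(uint64_t v)
{
    return (v & PTE_PRESENT) != 0;
}

/* Page frame number, if the value is taken to be a physical address. */
static uint64_t page_frame_number(uint64_t v)
{
    return v >> PAGE_SHIFT;
}
```

For 0x1df1000 all twelve low bits are zero, so it has no Present bit (or any other flag) set, and read as an address it would be frame 0x1df1.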

>       
> 
> 
> Now to what you said - I thought Keir's patch fixed up all 
> the segment base = 0 assumptions in x86_emulate? At least 
> we're past the problem that I posted about 
> before. I must confess I never actually looked at the code, 
> because Keir said the patch would fix all the segment base = 
> 0 assumptions, and once the patch showed up in mercurial and 
> I built from that changeset, I didn't hit the seg.base != 0 
> problem anymore.

There were patches from Jan Beulich to fix the HVM side of seg.base !=
0, but as far as I've seen, Keir hasn't posted his "big patch" yet -
it's possible that Keir could send you a "private patch". This patch
fixed x86_emulate.c, which isn't in the HVM section; it's the part that
fixes up page-table writes (which you may not see many of in the early
part of boot, so it may not be an issue, of course). 

Whilst marking the page read-only and trapping on the page-fault is
indeed doable, I'd also add a check on every vmexit and just at the time
of doing vmrun (in the C code just before calling svm_asm_do_resume() or
some such). 
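
Something along these lines is what I mean by the check. It is only a sketch: the struct below is a hypothetical stand-in for the padded structure, not the real Xen layout, and the canary value, function names and site labels are all invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary known pattern for the padding member (assumption). */
#define PAD_CANARY 0x5ca1ab1edeadbeefULL

/* Hypothetical stand-in for the padded structure; NOT the real Xen layout. */
struct padded_iopage {
    uint64_t pad;           /* dummy member that absorbs the stray write */
    uint32_t read_pointer;
    uint32_t write_pointer;
};

/* Label of the first check site that observed a clobbered pad, or NULL. */
static const char *first_bad_site = NULL;

/* Set the pad to the known pattern before the guest starts running. */
static void iopage_arm(struct padded_iopage *p)
{
    p->pad = PAD_CANARY;
}

/* Returns true while the pad is intact; remembers the first failing site. */
static bool iopage_check(const struct padded_iopage *p, const char *site)
{
    if (p->pad == PAD_CANARY)
        return true;
    if (first_bad_site == NULL)
        first_bad_site = site;
    return false;
}
```

Calling iopage_check() at the top of the vmexit handler and again just before the resume path narrows things down: if the pad is good at vmexit but bad before vmrun, the write came from the hypervisor's handling of that exit; if it is already bad at vmexit, it happened while the guest (or qemu-dm) was running.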

--
Mats
> 
> 
> On 12/7/06, Petersson, Mats <Mats.Petersson@xxxxxxx> wrote:
> 
> 
> 
>       > -----Original Message-----
>       > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>       > [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
>       > Ian Pratt
>       > Sent: 06 December 2006 22:05
>       > To: Trolle Selander; xen-devel@xxxxxxxxxxxxxxxxxxx
>       > Subject: RE: [Xen-devel] [HVM] Corruption of buffered_io_page 
>       >
>       > > read_pointer is the first member of buffered_ioreq_t, so I had a
>       > > hunch that the corruption was occurring through something other
>       > > than a wrong value actually being written into the structure
>       > > member: either an overflow of a previous structure in memory or a
>       > > pointer variable mistake. I thus added a 64-bit dummy member to
>       > > "pad" the buffered_ioreq_t structure at the start, and as I had
>       > > suspected, the bad value does get written into this dummy member
>       > > rather than the read_pointer. I haven't (yet) been able to track
>       > > down what it is that actually writes the bad value, and any help
>       > > finding it would be welcome.
>       >
>       > What compiler are you using? What guest OS? Are you using PV or
>       > emulated drivers? Any idea if there are particular workloads that
>       > provoke the problem?
>       
>       I'll answer for Trolle as best as I can:
>       Compiler: gcc 4.1 I believe.
>       Guest OS: OS/2
>       Drivers would be emulated ones.
>       I think it's failing during initial boot, as Trolle hasn't told me
>       "It works" yet... ;-)
>       
>       By the way, I'm still a bit worried that this is caused by segment
>       base != 0 in x86_emulate.c - this can cause all sorts of
>       "interesting" interaction between the page-table updates and actual
>       memory being affected.
>       
>       --
>       Mats
>       >
>       > Best,
>       > Ian
>       >
>       >
>       > _______________________________________________
>       > Xen-devel mailing list
>       > Xen-devel@xxxxxxxxxxxxxxxxxxx
>       > http://lists.xensource.com/xen-devel
>       >
>       >
>       >
>       
>       
>       
> 
> 
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel