
[Xen-devel] [HVM] Corruption of buffered_io_page

  • To: xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: "Trolle Selander" <trolle.selander@xxxxxxxxx>
  • Date: Wed, 6 Dec 2006 17:38:35 +0100
  • Delivery-date: Wed, 06 Dec 2006 08:38:35 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

I've been tracking a strange bug that leaves qemu spinning at 100% CPU and the HVM domain stalled. The final vmexit before this happens is a pio read from an unused port (0x23), which fairly quickly led me to believe that this final vmexit is actually unrelated to the bug itself. I've seen very similar symptoms before, but I hadn't previously looked into what qemu-dm was actually doing while it was "spinning". Now I know:

The problem is caused by something corrupting buffered_iopage: a value gets written into buffered_iopage->read_pointer that is bigger than buffered_iopage->write_pointer. Since the loop in __handle_buffered_iopage loops on != rather than <, it never exits (well... it would eventually, but since these are 64-bit values, it'd take a while...). I checked around in the qemu code and stepped through the execution with a debugger, but I could find nothing that appears to write anything "bad" into read_pointer. In fact, when I inserted logging to trace where, exactly, the corrupt value gets written, it seemed to happen at a "random" point, rather than being tied to the execution of any specific function that I could see.

read_pointer is the first member of buffered_ioreq_t, so I had a hunch that the corruption was caused by something other than a wrong value actually being written into that structure member: either something overflowing a preceding structure in memory, or a pointer variable mistake. I therefore added a 64-bit dummy member to "pad" the start of the buffered_ioreq_t structure, and as I had suspected, the bad value gets written into this dummy member rather than into read_pointer. I haven't (yet) been able to track down what actually writes the bad value, and any help finding it would be welcome.