
[Xen-devel] [HVM] Corruption of buffered_io_page

  • To: xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: "Trolle Selander" <trolle.selander@xxxxxxxxx>
  • Date: Wed, 6 Dec 2006 17:38:35 +0100
  • Delivery-date: Wed, 06 Dec 2006 08:38:35 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

I've been tracking a strange bug that leaves qemu spinning at 100% CPU and the HVM domain stalled. The final vmexit before this happens is a pio read from an unused port (0x23), which fairly quickly led me to believe that this final vmexit is actually unrelated to the bug itself. I've seen very similar symptoms before, but I hadn't previously looked into what qemu-dm was actually doing while it was "spinning". Now I know:

The problem is caused by something corrupting buffered_iopage: a value gets written into buffered_iopage->read_pointer that is bigger than buffered_iopage->write_pointer. Since the loop in __handle_buffered_iopage loops on != rather than <, it never exits (well... it would eventually, but since these are 64-bit values, it'd take a while...). I checked around in the qemu code and stepped through the execution with a debugger, but I could find nothing that appears to write anything "bad" into read_pointer. In fact, when I inserted logging to trace where, exactly, the corrupt value gets written, it seemed to happen at a "random" point, rather than being tied to the execution of any specific function that I could see.

read_pointer is the first member of buffered_ioreq_t, so I had a hunch that the corruption was caused by something other than a wrong value actually being written into that structure member: either something overflowing a preceding structure in memory, or a pointer variable mistake. I therefore added a 64-bit dummy member to "pad" the start of the buffered_ioreq_t structure, and as I had suspected, the bad value gets written into this dummy member rather than into read_pointer. I haven't (yet) been able to track down what actually writes the bad value, and any help finding it would be welcome.