xen-devel

[Xen-devel] [HVM] Corruption of buffered_io_page

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] [HVM] Corruption of buffered_io_page
From: "Trolle Selander" <trolle.selander@xxxxxxxxx>
Date: Wed, 6 Dec 2006 17:38:35 +0100
I've been tracking a strange bug that leaves qemu spinning at 100% CPU and the HVM domain stalled. The final vmexit before this happens is a pio read from an unused port (0x23), which fairly quickly led me to believe that the final vmexit is actually unrelated to the bug itself. I've seen very similar symptoms before, but I hadn't previously looked into what qemu-dm was actually doing while it was "spinning". Now I know:

The problem is caused by something corrupting buffered_iopage: a value gets written into buffered_iopage->read_pointer that is bigger than buffered_iopage->write_pointer. Since the loop in __handle_buffered_iopage tests != rather than <, it never exits (well... it would exit eventually, but the pointers are 64-bit values, so read_pointer would have to wrap all the way around first, and that would take a while...). I checked around in the qemu code and stepped through the execution with a debugger, but I could find nothing that appears to write anything "bad" into read_pointer. In fact, when I inserted logging to trace where exactly the corrupted value gets written, it seemed to happen at a "random" point rather than being tied to the execution of any specific function that I could see.
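
For reference, the consuming loop in qemu-dm has roughly this shape (a simplified sketch, not the exact code; the slot arithmetic and the __handle_ioreq/mb names follow the Xen tree, but the surrounding details are abbreviated):

    /* Sketch: the != test is what makes read_pointer > write_pointer
     * fatal - the loop only terminates once read_pointer increments
     * all the way around the 64-bit space and catches up again. */
    while (buffered_io_page->read_pointer !=
           buffered_io_page->write_pointer) {
        ioreq_t *req = &buffered_io_page->ioreq[
            buffered_io_page->read_pointer % IOREQ_BUFFER_SLOT_NUM];
        __handle_ioreq(env, req);
        mb();   /* finish the request before advancing the pointer */
        buffered_io_page->read_pointer++;
    }

A defensive test on < (or an explicit sanity check on the two pointers) would at least turn the hang into dropped requests, though of course it wouldn't fix the underlying corruption.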

read_pointer is the first member of buffered_iopage_t, so I had a hunch that the corruption was occurring through something other than a wrong value actually being written into the structure member itself: either an overflow of a structure that precedes it in memory, or a stray pointer. I therefore added a 64-bit dummy member to "pad" the start of the buffered_iopage_t structure, and as I had suspected, the bad value gets written into this dummy member rather than into read_pointer. I haven't (yet) been able to track down what actually writes the bad value, and any help finding it would be welcome.
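
For anyone who wants to reproduce the experiment, the padding change amounts to something like this (a sketch against the shared structure in xen/include/public/hvm/ioreq.h; the field types and the ioreq array declaration are abbreviated from memory):

    struct buffered_iopage {
        uint64_t      dummy;         /* debug padding: absorbs the stray
                                        write that previously landed on
                                        read_pointer */
        unsigned long read_pointer;
        unsigned long write_pointer;
        ioreq_t       ioreq[IOREQ_BUFFER_SLOT_NUM];
    };

Since the page is shared between the hypervisor and qemu-dm, both sides need to be rebuilt against the padded layout for the offsets to stay in sync.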
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel