
xen-ia64-devel

[Xen-ia64-devel] Important Xen/ia64 domU/vbd fix committed

To: <xen-ia64-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-ia64-devel] Important Xen/ia64 domU/vbd fix committed
From: "Magenheimer, Dan (HP Labs Fort Collins)" <dan.magenheimer@xxxxxx>
Date: Tue, 20 Dec 2005 12:54:50 -0800
Delivery-date: Tue, 20 Dec 2005 20:57:29 +0000
I've just committed a bug fix to xen-ia64-unstable.hg that seems
to make domU much, much more stable.  With it, I have been able
to untar and build Linux on domU for the first time, and also
fsck domU's "disk" (rhel.img file).

Before I explain, let me first apologize for falling behind on
other patches.  I have been very focused on understanding and
exploring various solutions to this bug since last Friday.
I'll try to catch up after I get some sleep and recovery time.
(Note also that HP is closed for the holidays from Dec 23 PM
until Jan 3.  I think I will have some access to email and test
machines during that time, but will also take some vacation days.)

The problem:  Although domU has been booting successfully for
many Xen/ia64 users, everyone has experienced some instability.
While many commands work fine, others fail and some have
caused the system to crash.  In particular, some file-intensive
operations such as fsck and untarring Linux consistently fail,
and in some cases dom0's disk has been trashed, requiring a
full RHEL reinstallation.

Last week, Matt Chapman isolated a serious problem:  When
domU shares pages with dom0 (e.g. for virtual I/O rings), dom0
accesses them by "direct mapping" a domU machine address.
While this works fine, some drivers layered under the dom0
virtual I/O backend (including the loopback driver) sometimes
use a virt_to_page() on the dom0 virtual address.  Since
the virtual address represents a physical address that was
not in dom0's EFI memory map, dom0's memmap may not have
allocated a "struct page" for this address, so virt_to_page()
gives an address of a non-existent "struct page" (e.g. off
the end of the memmap array).  Accessing this non-existent
struct page may read/write "random" memory in dom0, domU,
or even in Xen itself.  Boom!
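
To make the failure mode concrete, here is a minimal sketch of the
lookup arithmetic, assuming a flat memmap indexed by page frame
number.  This is a toy model, not the actual Linux/ia64 code: the
toy_* names, the 16 KiB page size and the bounds check are all
assumptions made for illustration.  The real virt_to_page() performs
no such check, which is why the result can point past the end of the
array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 14  /* assumed 16 KiB pages; illustrative only */

/* Hypothetical stand-in for the kernel's per-page descriptor. */
struct toy_page {
    unsigned long flags;
};

/* Toy flat memmap: one descriptor per page frame dom0 was told about. */
struct toy_memmap {
    struct toy_page *base;  /* descriptor for page frame 0 */
    uint64_t nr_pages;      /* number of descriptors actually allocated */
};

/* Page frame number that a physical (machine) address falls in. */
uint64_t toy_pfn(uint64_t paddr)
{
    return paddr >> TOY_PAGE_SHIFT;
}

/*
 * Checked lookup: returns the descriptor for paddr, or NULL when the
 * address lies beyond the memory the memmap was sized for.  An
 * unchecked lookup would return base + pfn in both cases.
 */
struct toy_page *toy_lookup(const struct toy_memmap *mm, uint64_t paddr)
{
    uint64_t pfn = toy_pfn(paddr);

    assert(mm != NULL && mm->base != NULL);
    if (pfn >= mm->nr_pages)
        return NULL;        /* no "struct page" exists for this frame */
    return mm->base + pfn;
}
```

In this model, a domU page that dom0 has direct-mapped is an address
for which toy_lookup() returns NULL: it is real memory, but dom0 never
allocated a descriptor for it.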

The obvious answer is to ensure that when dom0 boots, a memmap
is built that is sufficiently large to cover accesses to
domU shared pages.  This is easier said than done.  After
several days of poring over Linux code, consulting with HP
Linux experts, and trying out various solutions, I gave up;
without significant changes to Linux (including common code),
I don't think it is possible to coax Linux to both create
a memmap to cover all of physical memory AND ensure that it doesn't
use those pages itself.

I also considered giving all memory (except the Xen heap) to
domain0 and "ballooning" it back for domUs.  The core Xen team
wasn't too keen on that approach, the balloon driver isn't yet
implemented on Xen/ia64, and I think there will be some
challenging security questions (e.g. what if dom0 swaps out
domU's pages to the dom0 disk?).

Finally, I settled on "reserving a chunk" at the end of physical
memory for domain0's exclusive use.  To make this visible via
the EFI memory map, the chunk has to be granule-sized and granule-aligned.
This granule gets reserved early in Xen's boot and gets passed
to dom0 at dom0 launch.  This is an ugly hack, but it is simple,
requires no changes to Linux and, most importantly, it works.
The patch is checked into xen-ia64-unstable.hg as cset 8374.
I would appreciate it if others would give it a try.  And if
someone can implement a better solution, please let me know.
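
For readers who want to see the sizing/alignment constraint spelled
out, the following is a rough sketch of picking the last whole
granule below the end of physical memory.  It is not taken from cset
8374; the toy_* names are hypothetical, and the granule size is left
as a parameter (it must be a power of two).

```c
#include <assert.h>
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct toy_range {
    uint64_t start;
    uint64_t end;
};

/* Round addr down to a multiple of granule (a power of two). */
uint64_t toy_align_down(uint64_t addr, uint64_t granule)
{
    assert(granule != 0 && (granule & (granule - 1)) == 0);
    return addr & ~(granule - 1);
}

/*
 * Choose the last complete granule that ends at or below mem_end.
 * Any partial granule above the aligned end is left out, so the
 * reserved chunk is both granule-sized and granule-aligned.
 */
struct toy_range toy_reserve_last_granule(uint64_t mem_end, uint64_t granule)
{
    struct toy_range r;

    r.end = toy_align_down(mem_end, granule);
    assert(r.end >= granule);   /* need at least one whole granule */
    r.start = r.end - granule;
    return r;
}
```

For example, with a 16 MiB granule (0x1000000) and physical memory
ending at 0x10123456, the sketch reserves 0x0F000000 up to (but not
including) 0x10000000.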

It won't work for NUMA machines, but we can worry about that later.
In the meantime, domU is much, much more stable.  I doubt that
this is the "last bug" we will find affecting domU stability,
but it was a tough one.  Thanks very much to Matt for isolating
the problem!

Dan
