   
 
 

xen-users

[Xen-users] PDFLUSH deadlock

To: Xen-Users <xen-users@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-users] PDFLUSH deadlock
From: Gareth Bult <gareth@xxxxxxxxxxxxx>
Date: Sun, 13 Jan 2008 11:02:47 +0000 (GMT)
Hi,

I did post this before, but as time passes I'm becoming more confident of my observations.

When using XEN the way I'm using it, there is a deadlock condition somewhere in PDFLUSH that can lock up your machine.
The lockup may be very short and barely noticeable, or it may extend to a number of minutes.

Conditions
========

Lots of DomUs (say 10), all running off file-backed storage.
Dom0 with ~1/8th of the system memory allocated to it (say 800M on a 6G machine).
Big machine: 4 CPUs, 6G RAM.

Example cause of lockup
==================

In a DomU, run:
dd if=/dev/zero of=/tmp/big bs=1M count=1000

If you watch /proc/meminfo, "Dirty" will grow VERY rapidly and hit 500,000+ kB within seconds.
Wait for 10 seconds, then try to use "vi" in the Dom0; it will freeze.

"ps ax" in the Dom0 will reveal that a number of processes have gone into "D" state (uninterruptible sleep), and at this point things like "df" will lock your session.
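
A minimal sketch for watching both symptoms described above at once: the "Dirty" figure from /proc/meminfo and the number of processes in "D" state. The function names (dirty_kb, count_d_state) and the --watch switch are hypothetical helpers invented for this illustration, not part of any Xen or kernel tooling.

```shell
#!/bin/bash
# Print the "Dirty" value (in kB) from a meminfo-style file.
# Defaults to /proc/meminfo; the optional file argument is an assumption
# added here only so the function can be exercised on sample data.
dirty_kb() {
    awk '$1 == "Dirty:" { print $2 }' "${1:-/proc/meminfo}"
}

# Count processes whose state starts with "D" (uninterruptible sleep).
# Expects "PID STAT COMMAND" lines on stdin.
count_d_state() {
    awk '$2 ~ /^D/ { n++ } END { print n + 0 }'
}

# Only start the monitoring loop when asked to, so the functions above
# can be sourced without side effects.
if [ "${1:-}" = "--watch" ]; then
    while true; do
        echo "$(date +%T) Dirty=$(dirty_kb) kB D-state=$(ps -eo pid=,stat=,comm= | count_d_state)"
        sleep 1
    done
fi
```

Run with --watch in the Dom0 while the dd command is running in a DomU; stop it with Ctrl-C.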

Cure
====

Type "sync" in the Dom0 - instant release.

Things I've tried
===========

Tweaking /proc/sys/vm/dirty_*

While these can make a difference, there is still a fundamental problem. PDFLUSH reaches a point where it "should" sync but does not, which then causes it to "pause", which in turn leads to a deadlock as the system runs out of free pages.
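
For anyone wanting to compare settings, the tunables mentioned above can be dumped with a short helper. This is only a sketch: show_dirty_tunables is a hypothetical name, and which dirty_* files exist depends on the kernel version, so the function simply lists whatever it finds.

```shell
#!/bin/bash
# List every dirty_* tunable and its current value.
# The directory argument defaults to /proc/sys/vm; being able to override
# it is an assumption added here for testing against sample files.
show_dirty_tunables() {
    local dir="${1:-/proc/sys/vm}" f
    for f in "$dir"/dirty_*; do
        # Skip the unexpanded glob (no matches) and unreadable entries.
        if [ -r "$f" ]; then
            printf '%s = %s\n' "$(basename "$f")" "$(cat "$f")"
        fi
    done
}
```

Calling show_dirty_tunables with no argument in the Dom0 prints the live values; saving the output before and after a change makes it easier to report exactly what was tried.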

Short-term fix
===========

I now have a "live" server running 10 DomUs in a hostile / live Internet environment.
Uptime - 3 days.

It's running very well and very smoothly; however, this is only because the Dom0 is running the following script:

root@nodea:~# cat syncme.sh
#!/bin/bash
while true; do sync ; sleep 5 ; done

If I kill this script, the server is guaranteed to lock up; how soon depends on load, but typically I would expect 10-15 mins.
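
A slightly extended variant of the sync loop above, sketched under a few assumptions: the sync_loop name, the interval and max_runs parameters, and the stderr logging are all additions for illustration and are not part of the original syncme.sh. Logging "Dirty" just before each sync shows how much was pending each time.

```shell
#!/bin/bash
# sync_loop INTERVAL MAX_RUNS
#   INTERVAL: seconds to sleep between syncs (default 5, as in syncme.sh)
#   MAX_RUNS: stop after this many syncs; 0 (default) means run forever
# Prints the number of syncs performed on stdout when it stops.
sync_loop() {
    local interval="${1:-5}" max_runs="${2:-0}" runs=0 dirty
    while true; do
        # Log the pending dirty data to stderr before flushing it.
        dirty=$(awk '$1 == "Dirty:" { print $2 }' /proc/meminfo 2>/dev/null)
        echo "$(date +%T) Dirty before sync: ${dirty:-unknown} kB" >&2
        sync
        runs=$(( runs + 1 ))
        if [ "$max_runs" -gt 0 ] && [ "$runs" -ge "$max_runs" ]; then
            break
        fi
        sleep "$interval"
    done
    echo "$runs"
}
```

Calling sync_loop with no arguments behaves like the original script (sync every 5 seconds, forever), with one log line per pass.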

QUESTION::
=========

I don't know why this only affects XEN Dom0s; however, I notice that PDFLUSH's algorithms do seem to depend on the amount of memory available in the system. Does anyone know whether the balloon memory driver makes adjustments here, or whether, after starting a DomU, the Dom0's PDFLUSH still thinks it has access to 100% of the system RAM?
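
To put rough numbers on the question, here is a back-of-the-envelope sketch. It assumes the writeback threshold is simply a percentage of total memory; the real kernel calculation is based on dirtyable memory and varies between versions, so treat the output as an approximation only. The approx_threshold_kb name and the 10% figure are illustrative assumptions.

```shell
#!/bin/bash
# approx_threshold_kb MEMTOTAL_KB RATIO_PCT
# Rough estimate only: threshold = MemTotal * ratio / 100 (integer kB).
approx_threshold_kb() {
    local memtotal_kb="$1" ratio_pct="$2"
    echo $(( memtotal_kb * ratio_pct / 100 ))
}

# Compare a threshold sized for the Dom0's own allocation (about 800M)
# with one sized for the whole 6G machine, at an assumed ratio of 10%.
echo "Dom0 only : $(approx_threshold_kb 819200 10) kB"
echo "Whole box : $(approx_threshold_kb 6291456 10) kB"
```

If the Dom0 were sizing its thresholds from the whole machine's RAM, the second figure would be a large fraction of the memory the Dom0 actually has, which would fit the symptoms described above.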

tia
Gareth.
