

[Xen-devel] RE: Linux balloon driver stops accepting target_kb for a lon

To: Jan Beulich <JBeulich@xxxxxxxxxx>
Subject: [Xen-devel] RE: Linux balloon driver stops accepting target_kb for a long time
From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Date: Tue, 24 Aug 2010 15:38:01 -0700 (PDT)
Cc: jeremy@xxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx, Keir Fraser <Keir.Fraser@xxxxxxxxxxxxx>
In-reply-to: <4C7394A60200007800011D5D@xxxxxxxxxxxxxxxxxx>
References: <af1cf3f8-2df8-496e-83a0-4d6407ab7e4f@default 4C7394A60200007800011D5D@xxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> From: Jan Beulich [mailto:JBeulich@xxxxxxxxxx]
> Subject: Re: Linux balloon driver stops accepting target_kb for a long
> time
> >>> On 24.08.10 at 00:45, Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
> wrote:
> > Reviewing code, one thing caught my attention.  In balloon_process(),
> > the balloon_mutex is down'ed then, under certain conditions
> > schedule() is called with the balloon_mutex still held and without
> > another timer set.  Any chance this could be a problem, especially
> > if another kernel thread invokes balloon_set_new_target()?
> > If so, what might finally kick the scheduled-out thread after
> > 30 minutes to reset the balloon_timer and up the mutex?
> How could this be a problem? Calling schedule() is a yield, not an
> indefinite sleep, and hence the loop will resume as soon as there's
> no higher priority runnable task anymore for a long enough time
> (obviously very much less than 30 minutes, unless something
> really odd is running on your box).

Hi Jan --

Well, the 1-vcpu system is very busy doing a "make -j64" and there's
a high amount of swap activity.  What priority does balloon_worker
(launched with schedule_work()) have relative to userland
threads and other kernel threads such as kswapd?  I.e., is
it possible that it gets locked out for 30 minutes?  It appears
that the new balloon target is applied only when system activity
goes way down (when the number of cc1 processes run by make starts
going down).

Is there any way to boost the priority of this thread?
Also, if it matters, the "make -j64" is launched from /etc/rc.local,
so might that boost the priority of the "userland" threads?

> Furthermore, besides the obvious option of inserting some debug
> code

Since it's hard to reproduce, I've been avoiding adding debug
code so I don't lose my test case.  I'm about to try some things
now but hoped to narrow down the likely problem sources first.

> I think SysRq-t would also allow you to check whether
> balloon_process() indeed doesn't exit over a period of minutes

This was a good idea, but I haven't yet gotten a full SysRq-t
output because there are so many processes running and I think
the SysRq-t adds to the general chaos... When I use it, the
guest goes into 100% vcpu usage after the "make -j64" is
complete. :-(  However, I can ssh in and top shows the
thread "events/0" using nearly 100% of the cpu.

Assuming I get a good SysRq-t, would I simply be looking for
a process stack dump with balloon_process in the stack?
Would this kind of a yielded kernel thread even show up in
SysRq-t output?
