WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

RE: [Xen-devel] RE: Bug in tmem: refcount leak leaves zombie saved domai

To: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-devel] RE: Bug in tmem: refcount leak leaves zombie saved domains
From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Date: Thu, 10 Jun 2010 10:54:57 -0700 (PDT)
Cc: JBeulich@xxxxxxxxxx
Delivery-date: Thu, 10 Jun 2010 10:58:06 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <C836CD82.174BB%keir.fraser@xxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4e50e2b3-6f87-47cc-88b5-b626a742cd74@default C836CD82.174BB%keir.fraser@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Could you give the attached a try on your test case?  If
it passes and Jan thinks it is OK (as I backed out most of
his patch at cs 20918), then:

Tmem: fix domain refcount leak by returning to simpler model
which claims a ref once when the tmem client is first associated
with the domain, and puts it once when the tmem client is
destroyed.

Signed-off-by: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>

> If you have a handle on a domain already I wonder why you need to
> continually look up by domid...

Nearly all tmem uses are via current->domain.  The remaining
(such as from the tools) are via a specified domid.  I don't
keep a domid->domain lookup table around as the frequency is
very low and the existing mechanism works fine (or it does
if I use it correctly anyway ;-)

> RCU locking
> is fully implemented already. It's highly unlikely to change in future
> and we can work out something else for your case if that happens.

I guess I was confused by the fact that the rcu_lock/unlock macros
are no-ops.

In any case, I think I understand the semantics well enough now
after your second reply pointing me to rcu_unlock_domain, so
I think the attached patch should avoid special cases in the
future.

> > I'd like to do a get_domain_by_id() without doing a get_domain()
> > as the tmem code need only get_domain() once on first use
> > and put_domain() once when destroying, but frequently needs
> > to lookup a domain by id.
> 
> If you have a handle on a domain already I wonder why you need to
> continually look up by domid...
> 
> > It looks like rcu_lock_domain_by_id() does what I need, but
> > I don't need any rcu critical sections (outside of the domain
> > lookup itself) and am fearful that if rcu locking ever is fully
> > implemented, my use of rcu_lock_domain_by_id() would become
> > incorrect and I may have a problem.  Should I create an equivalent
> > get_domain_by_id_no_ref()?  Or am I misunderstanding something?
> 
> If you really know what you're doing, I suggest just have your own
> tmh_lookup_domain() macro mapping onto rcu_lock_domain_by_id(). RCU
> locking
> is fully implemented already. It's highly unlikely to change in future
> and
> we can work out something else for your case if that happens.
> 
> I'm not keen on providing an explicitly synchronisation-free version in
> common code. It just encourages people not to think about
> synchronisation at
> all.
> 
>  -- Keir
> 
> > Semi-related, rcu_lock_domain_by_id() has a "return d" inside
> > the for loop without an rcu_read_unlock().  I see that this
> > is irrelevant now because rcu_read_unlock() is a no-op anyway,
> > but maybe this should be cleaned up for the same reason --
> > in case rcu locking is ever fully implemented.
> >
> > Thanks,
> > Dan
> >
> >> -----Original Message-----
> >> From: Dan Magenheimer
> >> Sent: Thursday, June 10, 2010 7:08 AM
> >> To: Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxx
> >> Subject: [Xen-devel] RE: Bug in tmem: refcount leak leaves zombie
> saved
> >> domains
> >>
> >> OK, will take a look.
> >>
> >> IIRC, Jan's work to fix the domain reference stuff just
> >> before 4.0 shipped was a heavy hammer but since it seemed
> >> to work, I didn't want to mess with it so close to release...
> >> really there's only a need to take a reference once on
> >> first use and release it at shutdown, rather than
> >> take/release frequently.  IIRC, I had used a macro that
> >> took references when they weren't really needed and
> >> Jan placed the matching macros that did the release.
> >>
> >> Dan
> >>
> >>> -----Original Message-----
> >>> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
> >>> Sent: Thursday, June 10, 2010 3:47 AM
> >>> To: xen-devel@xxxxxxxxxxxxxxxxxxx
> >>> Cc: Dan Magenheimer
> >>> Subject: Bug in tmem: refcount leak leaves zombie saved domains
> >>>
> >>> Dan,
> >>>
> >>> Just doing some save/restore testing on xen-unstable tip, I noticed
> >>> that:
> >>> # xm create ./pv_config
> >>> # xm save PV1
> >>>
> >>> Would leave the saved guest as a zombie in the DOMDYING_dead state
> >> with
> >>> no
> >>> pages, yet with refcnt=1. This happens absolutely consistently.
> Just
> >> as
> >>> consistently, it does not happen when I boot Xen with no-tmem. My
> >>> conclusion
> >>> is that tmem is leaking a domain reference count during domain
> save.
> >>> This
> >>> doesn't happen if I merely "xm create ...; xm destroy ...".
> >>>
> >>> My pv_config file contains nothing exciting:
> >>> kernel = "/nfs/keir/xen/xen64.hg/dist/install/boot/vmlinuz-
> 2.6.18.8-
> >>> xenU"
> >>> memory = 750
> >>> name = "PV1"
> >>> vcpus = 2
> >>> vif = [ 'mac=00:1a:00:00:01:01' ]
> >>> disk = [ 'phy:/dev/VG/Suse10.1_64_1,sda1,w' ]
> >>> root = "/dev/sda1 ro xencons=tty"
> >>> extra = ""
> >>> tsc_native = 1
> >>> on_poweroff = 'destroy'
> >>> on_reboot   = 'restart'
> >>> on_crash    = 'preserve'
> >>>
> >>> The dom{0,U} kernels are tip of linux-2.6.18-xen, default -xen{0,U}
> >>> configs.
> >>>
> >>>  -- Keir
> >>>
> >>>
> >>
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >> http://lists.xensource.com/xen-devel
> 
> 

Attachment: tmem-fix-refcount-leak.patch
Description: Binary data

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel