[Xen-devel] Re: [PATCH 4/7] bio-cgroup: Split the cgroup memory subsystem into two parts
To: kamezawa.hiroyu@xxxxxxxxxxxxxx
Subject: [Xen-devel] Re: [PATCH 4/7] bio-cgroup: Split the cgroup memory subsystem into two parts
From: Hirokazu Takahashi <taka@xxxxxxxxxxxxx>
Date: Thu, 07 Aug 2008 16:25:12 +0900 (JST)
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, containers@xxxxxxxxxxxxxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx, dm-devel@xxxxxxxxxx, agk@xxxxxxxxxxxxxx, ryov@xxxxxxxxxxxxx, Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx>
In-reply-to: <16255819.1218030343593.kamezawa.hiroyu@xxxxxxxxxxxxxx>
References: <20080804.175748.189722512.ryov@xxxxxxxxxxxxx> <20080806165421.f76edd47.kamezawa.hiroyu@xxxxxxxxxxxxxx> <16255819.1218030343593.kamezawa.hiroyu@xxxxxxxxxxxxxx>
Hi,
> >> > This patch splits the cgroup memory subsystem into two parts.
> >> > One is for tracking pages to find out the owners. The other is
> >> > for controlling how much memory should be assigned to
> >> > each cgroup.
> >> >
> >> > With this patch, you can use the page tracking mechanism even if
> >> > the memory subsystem is off.
> >> >
> >> > Based on 2.6.27-rc1-mm1
> >> > Signed-off-by: Ryo Tsuruta <ryov@xxxxxxxxxxxxx>
> >> > Signed-off-by: Hirokazu Takahashi <taka@xxxxxxxxxxxxx>
> >> >
> >>
> >> Please CC me or Balbir or Pavel (see the maintainer list) when you try this ;)
> >>
> >> After this patch, the total structure is
> >>
> >> page <-> page_cgroup <-> bio_cgroup.
> >> (multiple bio_cgroup can be attached to page_cgroup)
> >>
> >> Will this pointer chain add
> >>  - significant performance regressions, or
> >>  - new race conditions?
> >
> >I don't think it will cause a significant performance loss, because
> >the link between a page and its page_cgroup already exists; it was
> >prepared by the memory resource controller. Bio-cgroup uses it as it
> >is and adds nothing to it.
> >
> >And the link between a page_cgroup and its bio_cgroup isn't protected
> >by any additional spin-locks, since the associated bio_cgroup is
> >guaranteed to exist as long as it owns pages.
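
To make the above concrete, the chain I rely on looks roughly like this.
This is only a simplified user-space model with made-up member layouts,
not the real kernel definitions:

==
/* Simplified model of the page <-> page_cgroup <-> bio_cgroup chain.
 * The member layouts here are illustrative only. */
struct page;

struct bio_cgroup {
	long page_count;	/* bio_cgroup stays alive while it owns pages */
};

struct page_cgroup {
	struct page *page;		/* link maintained by the memory controller */
	struct bio_cgroup *bio_cgroup;	/* extra link added by bio-cgroup */
};

struct page {
	struct page_cgroup *page_cgroup;	/* already set up by memcg */
};

/* Walk page -> page_cgroup -> bio_cgroup without taking any new lock:
 * as long as the bio_cgroup owns this page, it cannot go away. */
static struct bio_cgroup *page_to_bio_cgroup(struct page *page)
{
	struct page_cgroup *pc = page->page_cgroup;

	return pc ? pc->bio_cgroup : NULL;
}
==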
> >
> Hmm, I think page_cgroup's cost is visible when
> 1. a page is changed to the in-use state (fault or radix-tree insert),
> 2. a page is changed to the out-of-use state (fault or radix-tree removal),
> 3. memcg hits its limit or global LRU reclaim runs.
> "1" and "2" can be caught as a 5% loss of exec throughput.
> "3" is not measured (because the LRU walk itself is heavy).
>
> What new chances to access page_cgroup will you add?
> I'll have to take them into account.
I haven't added any at this moment, but I think some people may want
to move some page-cache pages from one cgroup to another.
When that time comes, I'll try to keep the cost to a minimum:
I will probably only update the link between a page_cgroup and
a bio_cgroup and leave everything else untouched, as sketched below.
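
Something like the following is what I have in mind, continuing the
simplified model above; bio_cgroup_move_page() is a hypothetical name,
not part of the posted patches:

==
/* Hypothetical: move one page to another bio_cgroup by re-pointing only
 * the page_cgroup -> bio_cgroup link; the memcg side of the page_cgroup
 * is left untouched. */
static void bio_cgroup_move_page(struct page_cgroup *pc,
				 struct bio_cgroup *to)
{
	struct bio_cgroup *from = pc->bio_cgroup;

	if (from == to)
		return;

	to->page_count++;		/* pin the new owner first */
	pc->bio_cgroup = to;		/* the single pointer update */
	if (from)
		from->page_count--;	/* old owner may now be released */
}
==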
> >I've just noticed that most of the overhead comes from the spin-locks
> >used when reclaiming pages inside mem_cgroups and from the spin-locks
> >that protect the links between pages and page_cgroups.
> The overhead of the page <-> page_cgroup lock cannot be caught by
> lock_stat now. Do you have numbers?
> But OK, there are too many locks ;(
The problem is that every time the lock is held, the associated
cache line is flushed.
> >The latter overhead comes from the policy your team has chosen,
> >namely that page_cgroup structures are allocated on demand. I still
> >feel this approach doesn't make sense, because the Linux kernel tries
> >to make use of as many pages as it can, so most of them end up being
> >assigned a page_cgroup anyway. It would make us happy if page_cgroups
> >were allocated at boot time.
> >
> Now, multi-sized-page-cache has been discussed for a long time. If that
> is our direction, on-demand page_cgroup allocation makes sense.
I don't think I can agree with this.
When multi-sized-page-cache is introduced, some data structures will be
allocated to manage the multi-sized pages. I think page_cgroups should
be allocated at the same time; this approach will keep things simple.
The on-demand allocation approach seems to lead not only to overhead
but also to complexity and a lot of race conditions.
If you allocate page_cgroups when allocating page structures,
you can get rid of most of the locks, and you no longer have to care
about allocation failures of page_cgroups.
It will also give us the flexibility to refer to or update memcg-related
data inside critical sections.
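
As a rough illustration of what I mean, with boot-time allocation the
lookup path shrinks to an array index. This continues the simplified
model above; alloc_page_cgroup_array() and lookup_page_cgroup() here are
placeholders for the illustration, not existing kernel symbols:

==
#include <stdlib.h>

/* Illustration only: one page_cgroup pre-allocated per page frame, so no
 * allocation (and no allocation-failure path) is needed later, when a
 * page becomes in-use. */
static struct page_cgroup *page_cgroup_array;

static int alloc_page_cgroup_array(size_t nr_pages)
{
	page_cgroup_array = calloc(nr_pages, sizeof(*page_cgroup_array));
	return page_cgroup_array ? 0 : -1;
}

static struct page_cgroup *lookup_page_cgroup(size_t pfn)
{
	return &page_cgroup_array[pfn];	/* no lock, no error path */
}
==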
> >> For example, adding a simple function.
> >> ==
> >> int get_page_io_id(struct page *)
> >>    - returns an I/O cgroup ID for this page. If the ID is not found,
> >>      -1 is returned.
> >>      The ID is not guaranteed to be a valid value. (The ID can be obsolete.)
> >> ==
> >> And just store the cgroup ID in the page_cgroup at page allocation.
> >> Then make bio_cgroup independent from page_cgroup, get the ID if
> >> available, and avoid too much pointer walking.
> >
> >I don't think there is any difference between a pointer and an ID.
> >I think this ID is just an encoded version of the pointer.
> >
> An ID can be obsolete, a pointer cannot. Does the memory cgroup have to
> take care of bio-cgroup's race conditions? (Regarding race conditions,
> it's already complicated enough.)
Bio-cgroup just expects that the callbacks bio-cgroup provides are called
when the status of a page_cgroup gets changed.
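
Concretely, I mean hooks along these lines, continuing the simplified
model above. The names and bodies here are only illustrations; the real
patches may use different names and signatures:

==
/* Illustrative hooks: memcg only has to call something like these at the
 * points where a page_cgroup changes state; bio-cgroup does not need to
 * know how memcg locks or encodes its own data. */
static void bio_cgroup_set_owner(struct page_cgroup *pc,
				 struct bio_cgroup *biog)
{
	biog->page_count++;
	pc->bio_cgroup = biog;
}

static void bio_cgroup_reset_owner(struct page_cgroup *pc)
{
	struct bio_cgroup *biog = pc->bio_cgroup;

	if (biog) {
		pc->bio_cgroup = NULL;
		biog->page_count--;
	}
}
==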
> To be honest, I think adding a new (4- or 8-byte) member to struct page
> and recording bio-control information there is a more straightforward
> approach. But as you might think, "there is no room".
But only if everyone allows me to add some new members to "struct page".
I think the same goes for the memcg you're working on.
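
For reference, the variant you describe would amount to roughly the
following. This is purely hypothetical; the stand-in fields only mark
where the existing members of struct page would be:

==
/* Purely hypothetical: record the I/O cgroup ID directly in struct page
 * instead of reaching it through page_cgroup.  The cost is the extra
 * 4 or 8 bytes per page frame, which is exactly the "no room" objection. */
struct page_with_io_id {
	unsigned long flags;		/* stand-in for existing members */
	void *mapping;			/* stand-in for existing members */
	unsigned long blkio_id;		/* the hypothetical new member */
};
==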
Thank you,
Hirokazu Takahashi.