WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel] page ref/type count overflows

To: "Gianluca Guida" <gianluca.guida@xxxxxxxxxxxxx>, "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx>, "Tim Deegan" <Tim.Deegan@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] page ref/type count overflows
From: "Jan Beulich" <jbeulich@xxxxxxxxxx>
Date: Thu, 29 Jan 2009 10:54:09 +0000
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Thu, 29 Jan 2009 02:53:59 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <C5A71D2C.513%keir.fraser@xxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4981782A.76E4.0078.0@xxxxxxxxxx> <C5A71D2C.513%keir.fraser@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>>> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> 29.01.09 09:45 >>>
>On 29/01/2009 08:34, "Jan Beulich" <jbeulich@xxxxxxxxxx> wrote:
>> Actually, I'd like to go a step further: Is there any reason why struct
>> shadow_page_info must be separate from struct page_info (rather than
>> sharing the definition, requiring some re-ordering of its elements)?
>
>Not really, apart from wanting to keep shadow stuff in one place in a
>private header file, I suppose. Would it risk turning page_info's definition
>into crazy union soup? If it could be done as something like:
> unsigned long count_info
> union {
>  struct { page_info stuff }; // anonymous
>  struct { sh_page_info stuff }; // anonymous
> } // anonymous
>That would be nicer than what we currently have, I'd agree. And the more
>stuff we can pull out of the anonymous union (e.g., perhaps a list_head?)
>then the better, of course.

Below is what I currently have it at. I'm afraid it won't get much simpler,
but I think it reasonably expresses the individual overlays. There are three
more transformations I plan to make:
- _domain -> unsigned int
- next_shadow -> __mfn_t
- split u into two unions (one having type_info, type/pinned/count, and
cpumask, the other having _domain, back, and order).

That last step is to avoid having to re-add __attribute__ ((__packed__)),
so that other (future) changes to the structure won't risk mis-aligning any
fields again.

Does this look acceptable?

Jan

/*
 * This definition is solely for the use in struct page_info (and
 * struct page_list_head), intended to allow easy adjustment once x86-64
 * wants to support more than 16Tb.
 * 'unsigned long' should be used for MFNs everywhere else.
 */
#define __mfn_t unsigned int

#ifndef __i386__
#undef page_list_entry
struct page_list_entry
{
    __mfn_t next, prev;
};
#endif

struct page_info
{
    union {
        /* Each frame can be threaded onto a doubly-linked list.
         *
         * For unused shadow pages, a list of pages of this order; for
         * pinnable shadows, if pinned, a list of other pinned shadows
         * (see sh_type_is_pinnable() below for the definition of
         * "pinnable" shadow types).
         */
        struct page_list_entry list;
        /* For non-pinnable shadows, a higher entry that points at us. */
        paddr_t up;
    };

    /* Reference count and various PGC_xxx flags and fields. */
    unsigned long count_info;

    /* Context-dependent fields follow... */
    union {

        /* Page is in use: ((count_info & PGC_count_mask) != 0). */
        struct {
            /* Owner of this page (NULL if page is anonymous). */
            unsigned long _domain; /* pickled format */
            /* Type reference count and various PGT_xxx flags and fields. */
            unsigned long type_info;
        } inuse;

        /* Page is in use by shadow code: count_info == 0. */
        struct {
            unsigned long type:5;   /* What kind of shadow is this? */
            unsigned long pinned:1; /* Is the shadow pinned? */
            unsigned long count:26; /* Reference count */
            union {
                /* When in use, GMFN of guest page we're a shadow of. */
                __mfn_t back;
                /* When free, order of the freelist we're on. */
                unsigned int order;
            };
        } sh;

        /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
        struct {
            /* Order-size of the free chunk this page is the head of. */
            u32 order;
            /* Mask of possibly-tainted TLBs. */
            cpumask_t cpumask;
        } free;

    } u;

    union {
        /*
         * Timestamp from 'TLB clock', used to avoid extra safety flushes.
         * Only valid for: a) free pages, and b) pages with zero type count
         * (except page table pages when the guest is in shadow mode).
         */
        u32 tlbflush_timestamp;

        /*
         * When PGT_partial is true then this field is valid and indicates
         * that PTEs in the range [0, @nr_validated_ptes) have been validated.
         * An extra page reference must be acquired (or not dropped) whenever
         * PGT_partial gets set, and it must be dropped when the flag gets
         * cleared. This is so that a get() leaving a page in partially
         * validated state (where the caller would drop the reference acquired
         * due to the getting of the type [apparently] failing [-EAGAIN])
         * would not accidentally result in a page left with zero general
         * reference count, but non-zero type reference count (possible when
         * the partial get() is followed immediately by domain destruction).
         * Likewise, the ownership of the single type reference for partially
         * (in-)validated pages is tied to this flag, i.e. the instance
         * setting the flag must not drop that reference, whereas the instance
         * clearing it will have to.
         *
         * If @partial_pte is positive then PTE at @nr_validated_ptes+1 has
         * been partially validated. This implies that the general reference
         * to the page (acquired from get_page_from_lNe()) would be dropped
         * (again due to the apparent failure) and hence must be re-acquired
         * when resuming the validation, but must not be dropped when picking
         * up the page for invalidation.
         *
         * If @partial_pte is negative then PTE at @nr_validated_ptes+1 has
         * been partially invalidated. This is basically the opposite case of
         * above, i.e. the general reference to the page was not dropped in
         * put_page_from_lNe() (due to the apparent failure), and hence it
         * must be dropped when the put operation is resumed (and completes),
         * but it must not be acquired if picking up the page for validation.
         */
        struct {
            u16 nr_validated_ptes;
            s8 partial_pte;
        };

        /*
         * Guest pages with a shadow.  This does not conflict with
         * tlbflush_timestamp since page table pages are explicitly not
         * tracked for TLB-flush avoidance when a guest runs in shadow mode.
         */
        u32 shadow_flags;

        /* When in use, next shadow in this hash chain */
        struct page_info *next_shadow;
    };
};


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel