[Xen-devel] Re: [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version

To: Daniel Kiper <dkiper@xxxxxxxxxxxx>
Subject: [Xen-devel] Re: [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Thu, 12 Aug 2010 17:46:20 -0700
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, stefano.stabellini@xxxxxxxxxxxxx, konrad.wilk@xxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, v.tolstov@xxxxxxxxx, linux-mm@xxxxxxxxx
Delivery-date: Thu, 12 Aug 2010 17:47:03 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20100812012224.GA16479@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <20100812012224.GA16479@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.7) Gecko/20100720 Fedora/3.1.1-1.fc13 Lightning/1.0b2pre Thunderbird/3.1.1
On 08/11/2010 06:22 PM, Daniel Kiper wrote:
> Hi,
>
> Here is the third version of memory hotplug support
> for Xen guests patch. This one cleanly applies to
> git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
> repository, xen/memory-hotplug head.
Thanks. I'll paste in the full diff and comment on that rather than this incremental update.


diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index fad3df2..4f35eaf 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -9,6 +9,16 @@ config XEN_BALLOON
 	  the system to expand the domain's memory allocation, or alternatively
 	  return unneeded memory to the system.
 
+config XEN_BALLOON_MEMORY_HOTPLUG
+	bool "Xen memory balloon driver with memory hotplug support"
+	default n
+	depends on XEN_BALLOON && MEMORY_HOTPLUG
+	help
+	  Xen memory balloon driver with memory hotplug support allows expanding
+	  memory available for the system above limit declared at system startup.
+	  It is very useful on critical systems which require long run without
+	  rebooting.
+
 config XEN_SCRUB_PAGES
 	bool "Scrub pages before returning them to system"
 	depends on XEN_BALLOON
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 1a0d8c2..5120075 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -6,6 +6,7 @@
  * Copyright (c) 2003, B Dragovic
  * Copyright (c) 2003-2004, M Williamson, K Fraser
  * Copyright (c) 2005 Dan M. Smith, IBM Corporation
+ * Copyright (c) 2010 Daniel Kiper
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License version 2
@@ -44,6 +45,8 @@
 #include <linux/list.h>
 #include <linux/sysdev.h>
 #include <linux/gfp.h>
+#include <linux/memory.h>
+#include <linux/suspend.h>
 
 #include <asm/page.h>
 #include <asm/pgalloc.h>
@@ -77,6 +80,11 @@ struct balloon_stats {
 	/* Number of pages in high- and low-memory balloons. */
 	unsigned long balloon_low;
 	unsigned long balloon_high;
+#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
+	unsigned long boot_max_pfn;
+	u64 hotplug_start_paddr;
+	u64 hotplug_size;


So does this mean you only support adding a single hotplug region? What happens if your initial increase wasn't enough and you want to add more? Would you make this a list of hot-added memory or something?

But I'm not even quite sure why you need to keep this as global data.
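
For concreteness, the kind of thing I had in mind -- a sketch only, with made-up names:

struct balloon_hotplug_region {
	struct list_head list;
	u64 start_paddr;	/* physical start of this hot-added region */
	u64 size;		/* bytes populated in it so far */
};

/* protected by balloon_mutex; replaces the single start/size pair */
static LIST_HEAD(hotplug_regions);

A second expansion would then just append another region instead of being impossible.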

+#endif
 };
 
 static DEFINE_MUTEX(balloon_mutex);
@@ -184,17 +192,173 @@ static void balloon_alarm(unsigned long unused)
 	schedule_work(&balloon_worker);
 }
 
-static unsigned long current_target(void)
+#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
+static inline u64 is_memory_resource_reserved(void)
+{
+	return balloon_stats.hotplug_start_paddr;
+}
+
+static int allocate_additional_memory(unsigned long nr_pages)
+{
+	long rc;
+	resource_size_t r_min, r_size;
+	struct resource *r;
+	struct xen_memory_reservation reservation = {
+		.address_bits = 0,
+		.extent_order = 0,
+		.domid        = DOMID_SELF
+	};
+	unsigned long flags, i, pfn;
+
+	if (nr_pages > ARRAY_SIZE(frame_list))
+		nr_pages = ARRAY_SIZE(frame_list);
+
+	if (!is_memory_resource_reserved()) {
+
+		/*
+		 * Look for first unused memory region starting at page
+		 * boundary. Skip last memory section created at boot time
+		 * because it may contain unused memory pages with PG_reserved
+		 * bit not set (online_pages() requires PG_reserved bit set).
+		 */
+
+		r = kzalloc(sizeof(struct resource), GFP_KERNEL);
+
+		if (!r) {
+			rc = -ENOMEM;
+			goto out_0;
+		}
+
+		r->name = "System RAM";
+		r->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+		r_min = PFN_PHYS(section_nr_to_pfn(pfn_to_section_nr(balloon_stats.boot_max_pfn) + 1));
+		r_size = (balloon_stats.target_pages - balloon_stats.current_pages) << PAGE_SHIFT;

So this just reserves enough resource to satisfy the current outstanding requirement? That's OK if we can repeat it, but it looks like it will only do this once?
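
If it is supposed to be repeatable, I'd expect the entry of allocate_additional_memory() to look more like this (sketch only; hotplug_region_exhausted() and reserve_hotplug_region() are invented names):

	/*
	 * Treat the reserved region as consumable: grab a fresh one
	 * once the current one has been fully populated, rather than
	 * reserving exactly once per boot.
	 */
	if (!is_memory_resource_reserved() || hotplug_region_exhausted()) {
		rc = reserve_hotplug_region(nr_pages);
		if (rc < 0)
			goto out_0;
	}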

+
+		rc = allocate_resource(&iomem_resource, r, r_size, r_min,
+				       ULONG_MAX, PAGE_SIZE, NULL, NULL);

Does this need to be section-aligned, with a section size? Or is any old place OK?
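
If it does need to be section-aligned, something along these lines should do it (untested sketch, reusing the existing variables):

	r_min  = ALIGN(r_min, PAGES_PER_SECTION << PAGE_SHIFT);
	r_size = ALIGN(r_size, PAGES_PER_SECTION << PAGE_SHIFT);

	rc = allocate_resource(&iomem_resource, r, r_size, r_min, ULONG_MAX,
			       PAGES_PER_SECTION << PAGE_SHIFT, NULL, NULL);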

+
+		if (rc < 0) {
+			kfree(r);
+			goto out_0;
+		}
+
+		balloon_stats.hotplug_start_paddr = r->start;
+	}
+
+	spin_lock_irqsave(&balloon_lock, flags);
+
+	pfn = PFN_DOWN(balloon_stats.hotplug_start_paddr + balloon_stats.hotplug_size);
+
+	for (i = 0; i < nr_pages; ++i, ++pfn)
+		frame_list[i] = pfn;
+
+	set_xen_guest_handle(reservation.extent_start, frame_list);
+	reservation.nr_extents = nr_pages;
+
+	rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);


Allocating all the memory here seems sub-optimal.

+
+	if (rc < 0)
+		goto out_1;
+
+	pfn = PFN_DOWN(balloon_stats.hotplug_start_paddr + balloon_stats.hotplug_size);
+
+	for (i = 0; i < rc; ++i, ++pfn) {
+		BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
+		       phys_to_machine_mapping_valid(pfn));
+		set_phys_to_machine(pfn, frame_list[i]);
+	}
+
+	balloon_stats.hotplug_size += rc << PAGE_SHIFT;
+	balloon_stats.current_pages += rc;
+
+out_1:
+	spin_unlock_irqrestore(&balloon_lock, flags);
+
+out_0:
+	return rc < 0 ? rc : rc != nr_pages;
+}
+
+static void hotplug_allocated_memory(void)


Why is this done separately from the reservation/allocation from xen?

 {
-	unsigned long target = balloon_stats.target_pages;
+	int nid, ret;
+	struct memory_block *mem;
+	unsigned long pfn, pfn_limit;
+
+	nid = memory_add_physaddr_to_nid(balloon_stats.hotplug_start_paddr);

Is the entire reserved memory range guaranteed to be within one node?

I see that this function has multiple definitions depending on a number of config settings. Do we care which definition it ends up with?

+
+	ret = add_registered_memory(nid, balloon_stats.hotplug_start_paddr,
+				    balloon_stats.hotplug_size);
+
+	if (ret) {
+		pr_err("%s: add_registered_memory: Memory hotplug failed: %i\n",
+		       __func__, ret);
+		goto error;
+	}
+
+	if (xen_pv_domain()) {
+		pfn = PFN_DOWN(balloon_stats.hotplug_start_paddr);
+		pfn_limit = pfn + (balloon_stats.hotplug_size >> PAGE_SHIFT);
 
-	target = min(target,
-		     balloon_stats.current_pages +
-		     balloon_stats.balloon_low +
-		     balloon_stats.balloon_high);
+		for (; pfn < pfn_limit; ++pfn)
+			if (!PageHighMem(pfn_to_page(pfn)))
+				BUG_ON(HYPERVISOR_update_va_mapping(
+					(unsigned long)__va(pfn << PAGE_SHIFT),
+					mfn_pte(pfn_to_mfn(pfn), PAGE_KERNEL), 0));
+	}
+
+	ret = online_pages(PFN_DOWN(balloon_stats.hotplug_start_paddr),
+			   balloon_stats.hotplug_size >> PAGE_SHIFT);

Are the pages available for allocation by the rest of the kernel from this point on?

Which function allocates the actual page structures?

+
+	if (ret) {
+		pr_err("%s: online_pages: Failed: %i\n", __func__, ret);
+		goto error;
+	}
+
+	pfn = PFN_DOWN(balloon_stats.hotplug_start_paddr);
+	pfn_limit = pfn + (balloon_stats.hotplug_size >> PAGE_SHIFT);
 
-	return target;
+	for (; pfn < pfn_limit; pfn += PAGES_PER_SECTION) {
+		mem = find_memory_block(__pfn_to_section(pfn));
+		BUG_ON(!mem);
+		BUG_ON(!present_section_nr(mem->phys_index));
+		mutex_lock(&mem->state_mutex);
+		mem->state = MEM_ONLINE;
+		mutex_unlock(&mem->state_mutex);
+	}

What does this do?  How is it different from what online_pages() does?

+
+	goto out;
+
+error:
+	balloon_stats.current_pages -= balloon_stats.hotplug_size >> PAGE_SHIFT;
+	balloon_stats.target_pages -= balloon_stats.hotplug_size >> PAGE_SHIFT;
+
+out:
+	balloon_stats.hotplug_start_paddr = 0;
+	balloon_stats.hotplug_size = 0;
 }
+#else
+static inline u64 is_memory_resource_reserved(void)
+{
+	return 0;
+}
+
+static inline int allocate_additional_memory(unsigned long nr_pages)
+{
+	/*
+	 * CONFIG_XEN_BALLOON_MEMORY_HOTPLUG is not set.
+	 * balloon_stats.target_pages could not be bigger
+	 * than balloon_stats.current_pages because additional
+	 * memory allocation is not possible.
+	 */
+	balloon_stats.target_pages = balloon_stats.current_pages;
+
+	return 0;
+}
+
+static inline void hotplug_allocated_memory(void)
+{
+}
+#endif /* CONFIG_XEN_BALLOON_MEMORY_HOTPLUG */
 
 static int increase_reservation(unsigned long nr_pages)
 {
@@ -236,7 +400,7 @@ static int increase_reservation(unsigned long nr_pages)
 		set_phys_to_machine(pfn, frame_list[i]);
 
 		/* Link back into the page tables if not highmem. */
-		if (pfn < max_low_pfn) {
+		if (xen_pv_domain() && !PageHighMem(page)) {
 			int ret;
 			ret = HYPERVISOR_update_va_mapping(
 				(unsigned long)__va(pfn << PAGE_SHIFT),
@@ -286,7 +450,7 @@ static int decrease_reservation(unsigned long nr_pages)
 
 		scrub_page(page);
 
-		if (!PageHighMem(page)) {
+		if (xen_pv_domain() && !PageHighMem(page)) {
 			ret = HYPERVISOR_update_va_mapping(
 				(unsigned long)__va(pfn << PAGE_SHIFT),
 				__pte_ma(0), 0);
@@ -334,9 +498,15 @@ static void balloon_process(struct work_struct *work)
 	mutex_lock(&balloon_mutex);
 
 	do {
-		credit = current_target() - balloon_stats.current_pages;
-		if (credit > 0)
-			need_sleep = (increase_reservation(credit) != 0);
+		credit = balloon_stats.target_pages - balloon_stats.current_pages;
+
+		if (credit > 0) {
+			if (balloon_stats.balloon_low || balloon_stats.balloon_high)
+				need_sleep = (increase_reservation(credit) != 0);
+			else
+				need_sleep = (allocate_additional_memory(credit) != 0);
+		}
+
 		if (credit < 0)
 			need_sleep = (decrease_reservation(-credit) != 0);
 
@@ -347,8 +517,10 @@ static void balloon_process(struct work_struct *work)
 	} while ((credit != 0) && !need_sleep);
 
 	/* Schedule more work if there is some still to be done. */
-	if (current_target() != balloon_stats.current_pages)
+	if (balloon_stats.target_pages != balloon_stats.current_pages)
 		mod_timer(&balloon_timer, jiffies + HZ);
+	else if (is_memory_resource_reserved())
+		hotplug_allocated_memory();

Why can't this be done in allocate_additional_memory()?

 
 	mutex_unlock(&balloon_mutex);
 }
@@ -405,17 +577,27 @@ static int __init balloon_init(void)
 	unsigned long pfn;
 	struct page *page;
 
-	if (!xen_pv_domain())
+	if (!xen_domain())
 		return -ENODEV;
 
 	pr_info("xen_balloon: Initialising balloon driver.\n");
 
-	balloon_stats.current_pages = min(xen_start_info->nr_pages, max_pfn);
+	if (xen_pv_domain())
+		balloon_stats.current_pages = min(xen_start_info->nr_pages, max_pfn);
+	else
+		balloon_stats.current_pages = max_pfn;
+
 	balloon_stats.target_pages  = balloon_stats.current_pages;
 	balloon_stats.balloon_low   = 0;
 	balloon_stats.balloon_high  = 0;
 	balloon_stats.driver_pages  = 0UL;
 
+#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
+	balloon_stats.boot_max_pfn = max_pfn;
+	balloon_stats.hotplug_start_paddr = 0;
+	balloon_stats.hotplug_size = 0;
+#endif
+
 	init_timer(&balloon_timer);
 	balloon_timer.data = 0;
 	balloon_timer.function = balloon_alarm;
@@ -423,11 +605,12 @@ static int __init balloon_init(void)
 	register_balloon(&balloon_sysdev);
 
 	/* Initialise the balloon with excess memory space. */
-	for (pfn = xen_start_info->nr_pages; pfn < max_pfn; pfn++) {
-		page = pfn_to_page(pfn);
-		if (!PageReserved(page))
-			balloon_append(page);
-	}
+	if (xen_pv_domain())
+		for (pfn = xen_start_info->nr_pages; pfn < max_pfn; pfn++) {
+			page = pfn_to_page(pfn);
+			if (!PageReserved(page))
+				balloon_append(page);
+		}
 
 	target_watch.callback = watch_target;
 	xenstore_notifier.notifier_call = balloon_init_watcher;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 35b07b7..37f1894 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -202,6 +202,7 @@ static inline int is_mem_section_removable(unsigned long pfn,
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
+extern int add_registered_memory(int nid, u64 start, u64 size);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int arch_add_memory(int nid, u64 start, u64 size);
 extern int remove_memory(u64 start, u64 size);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index be211a5..48a65bb 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -481,22 +481,13 @@ static void rollback_node_hotadd(int nid, pg_data_t *pgdat)
 	return;
 }
 
 /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
-int __ref add_memory(int nid, u64 start, u64 size)
+static int __ref __add_memory(int nid, u64 start, u64 size)
 {
 	pg_data_t *pgdat = NULL;
 	int new_pgdat = 0;
-	struct resource *res;
 	int ret;
 
-	lock_system_sleep();
-
-	res = register_memory_resource(start, size);
-	ret = -EEXIST;
-	if (!res)
-		goto out;
-
 	if (!node_online(nid)) {
 		pgdat = hotadd_new_pgdat(nid, start);
 		ret = -ENOMEM;
@@ -533,11 +524,45 @@ error:
 	/* rollback pgdat allocation and others */
 	if (new_pgdat)
 		rollback_node_hotadd(nid, pgdat);
-	if (res)
-		release_memory_resource(res);
 
 out:
+	return ret;
+}
+
+int __ref add_registered_memory(int nid, u64 start, u64 size)
+{
+	int ret;
+
+	lock_system_sleep();
+	ret = __add_memory(nid, start, size);
 	unlock_system_sleep();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(add_registered_memory);
+
+int __ref add_memory(int nid, u64 start, u64 size)
+{
+	int ret = -EEXIST;
+	struct resource *res;
+
+	lock_system_sleep();
+
+	res = register_memory_resource(start, size);
+
+	if (!res)
+		goto out;
+
+	ret = __add_memory(nid, start, size);
+
+	if (!ret)
+		goto out;
+
+	release_memory_resource(res);

In your earlier patch, I think you made firmware_map_add_hotplug() specific to add_memory(), but now you have it in __add_memory(). Does it make a difference either way?

+
+out:
+	unlock_system_sleep();
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(add_memory);

As before, this all looks reasonably good. I think the next steps should be:

  1. identify how to incrementally allocate the memory from Xen, rather
     than doing it at hotplug time
  2. identify how to disable the sysfs online interface for Xen
     hotplugged memory

For 1., I think the code should be something like:

increase_address_space(unsigned long pages)
{
        - reserve resource for memory section
        - online section
        for each page in section {
                online page
                mark page structure allocated
                add page to ballooned_pages list
                balloon_stats.balloon_(low|high)++;
        }
}
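
Or, a bit more concretely in C (reserve_and_online_section() is an invented helper standing in for the first two steps):

static int increase_address_space(unsigned long nr_pages)
{
	unsigned long pfn, start_pfn;
	int ret;

	/*
	 * Reserve the iomem resource, create the section(s) and their
	 * page structures, and online them -- without populating any
	 * frames from Xen yet.
	 */
	ret = reserve_and_online_section(&start_pfn, nr_pages);
	if (ret)
		return ret;

	for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++)
		/*
		 * Hand each new, empty page to the balloon;
		 * balloon_append() puts it on the ballooned_pages list
		 * and bumps balloon_stats.balloon_(low|high), and the
		 * existing increase_reservation() path then populates
		 * it on demand via XENMEM_populate_physmap.
		 */
		balloon_append(pfn_to_page(pfn));

	return 0;
}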


The tricky part is making sure that the memory for the page structures has been populated so it can be used. Aside from that, there should be no need for another HYPERVISOR_memory_op(XENMEM_populate_physmap, ...) call beyond the existing one.

Or to look at it another way, memory hotplug is the mechanism for increasing the amount of available physical address space, but ballooning is the way to increase the number of allocated pages. They are orthogonal.


For 2., a deeper understanding of the existing hotplug code is needed. It should be refactored so that you can use the core hotplug machinery without enabling the sysfs page-onlining mechanism, while still leaving that mechanism available for physical hotplug. In the short term, a boolean to disable the onlining mechanism is probably the pragmatic solution, so the balloon code can simply disable it.
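
To be concrete about that stopgap, I mean something as simple as this (names are hypothetical):

/* mm/memory_hotplug.c */
bool memhp_disable_user_online;		/* set by the balloon driver at init */
EXPORT_SYMBOL_GPL(memhp_disable_user_online);

/* drivers/base/memory.c, at the top of the sysfs "state" store handler */
if (memhp_disable_user_online)
	return -EPERM;	/* the balloon driver owns onlining of its sections */

The balloon driver would set the flag and online its sections directly, while physical hotplug on non-Xen systems keeps the sysfs path.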

Thanks,
    J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
