WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

[Xen-devel] Re: [PATCH] Fix cpu online/offline bug: mce memory leak.

To: "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] Re: [PATCH] Fix cpu online/offline bug: mce memory leak.
From: Keir Fraser <keir.xen@xxxxxxxxx>
Date: Tue, 01 Mar 2011 10:47:45 +0000
Cc: Haitao Shan <maillists.shan@xxxxxxxxx>
Delivery-date: Tue, 01 Mar 2011 02:48:33 -0800
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:user-agent:date:subject:from:to:cc:message-id :thread-topic:thread-index:in-reply-to:mime-version:content-type :content-transfer-encoding; bh=Itx7FeywMrRpuj0ANKJZcQbvETN/XhgcBCnvVY5G7+M=; b=eL5aiB+otSl67uf8p5Vg675Trb6avpKY5639Z4qP2ALTWhlcLJ65PfavBMsaqyEgtj 3FG5aJI2mpQs43iqJcrwGiM7HPJRKxAjlzc7j7ZgBrkFslslogWGGuxsYL68lnRKWJSs fLZQu5lwOnc1gQUHyjG6dWUV2P8AdA+dtnNn8=
Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=user-agent:date:subject:from:to:cc:message-id:thread-topic :thread-index:in-reply-to:mime-version:content-type :content-transfer-encoding; b=hxWea8FgVmMNsNf53QIBSH1wVvcrT/4ous4gAoy20Qhfrf8Mq9z+dVcs9QhU9LGtGD dO1hQl8kV8ZROz6pTcjBh8eSbeI0WsksCvFLWmyM5YyR62MH9H5W3JQSZMwG+7Z/8s17 gJljZMWLzkSqumi3K7ZgS4zV2KVijfTp9ZaaA=
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <C9926D37.13F59%keir.xen@xxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcvX8FtN6tarrAIBRGq0NWDWPv4e2AAAvOL4AAKyhfk=
Thread-topic: [PATCH] Fix cpu online/offline bug: mce memory leak.
User-agent: Microsoft-Entourage/12.28.0.101117
A further comment: Please remove the presmp_initcall(intel_mce_initcall)
altogether. It will be causing unnecessary memory allocations even on
non-Intel CPUs. Instead, I suggest to register the cpu_notifier from
intel_mcheck_init(), at the same time you allocate the mcabanks for CPU0.
This will ensure that the CMCI/allocation work in the notifier is only
attempted for suitably MCA-capable Intel processors.

Um, also, you should fail CPU_UP_PREPARE if cpu_mcabank_alloc() fails. CPU
bringup will probably fail anyway if you are that low on memory. I don't
think it makes sense to struggle on with hobbled MCA support for the new
processor in this case. If cpu_mcabank_alloc() fails for CPU0, called from
intel_mcheck_init() you can quite reasonably BUG() on that.

 -- Keir

On 01/03/2011 09:30, "Keir Fraser" <keir.xen@xxxxxxxxx> wrote:

> Some comments inline below. Overall a good patch to have, but needs some
> minor adjustments!
> 
>  -- Keir
> 
> On 01/03/2011 09:09, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
> 
>> Fix cpu online/offline bug: mce memory leak.
>> 
>> Current Xen mce logic didn't free mcabanks. This would be a memory leak when
>> cpu offline.
>> When repeatly do cpu online/offline, this memory leak would make xenpool
>> shrink, and at a time point,
>> will call alloc_heap_pages --> flush_area_mask, which
>> ASSERT(local_irq_is_enabled()).
>> However, cpu online is irq disable, so it finally result in Xen crash.
>> 
>> This patch fix the memory leak bug, and tested OK over 110,000 round cpu
>> online/offline.
>> 
>> Signed-off-by: Liu, Jinsong <jinsong.liu@xxxxxxxxx>
>> 
>> diff -r 1a364b17d66a xen/arch/x86/cpu/mcheck/mce_intel.c
>> --- a/xen/arch/x86/cpu/mcheck/mce_intel.c Fri Feb 25 01:26:01 2011 +0800
>> +++ b/xen/arch/x86/cpu/mcheck/mce_intel.c Mon Feb 28 19:19:20 2011 +0800
>> @@ -1227,9 +1227,24 @@ static void intel_init_mce(void)
>>      mce_uhandler_num = sizeof(intel_mce_uhandlers)/sizeof(struct
>> mca_error_handler);
>>  }
>>  
>> -static int intel_init_mca_banks(void)
>> +static void cpu_mcabank_free(unsigned int cpu)
>>  {
>> -    struct mca_banks *mb1, *mb2, * mb3;
>> +    struct mca_banks *mb1, *mb2, *mb3, *mb4;
>> +
>> +    mb1 = per_cpu(mce_clear_banks, cpu);
>> +    mb2 = per_cpu(no_cmci_banks, cpu);
>> +    mb3 = per_cpu(mce_banks_owned, cpu);
>> +    mb4 = per_cpu(poll_bankmask, cpu);
>> +
>> +    mcabanks_free(mb1);
>> +    mcabanks_free(mb2);
>> +    mcabanks_free(mb3);
>> +    mcabanks_free(mb4);
>> +}
>> +
>> +static void cpu_mcabank_alloc(unsigned int cpu)
>> +{
>> +    struct mca_banks *mb1, *mb2, *mb3;
>>  
>>      mb1 = mcabanks_alloc();
>>      mb2 = mcabanks_alloc();
>> @@ -1237,22 +1252,23 @@ static int intel_init_mca_banks(void)
>>      if (!mb1 || !mb2 || !mb3)
>>          goto out;
>>  
>> -    __get_cpu_var(mce_clear_banks) = mb1;
>> -    __get_cpu_var(no_cmci_banks) = mb2;
>> -    __get_cpu_var(mce_banks_owned) = mb3;
>> +    per_cpu(mce_clear_banks, cpu) = mb1;
>> +    per_cpu(no_cmci_banks, cpu) = mb2;
>> +    per_cpu(mce_banks_owned, cpu) = mb3;
>> +    return;
>>  
>> -    return 0;
>>  out:
>>      mcabanks_free(mb1);
>>      mcabanks_free(mb2);
>>      mcabanks_free(mb3);
>> -    return -ENOMEM;
>>  }
>>  
>>  /* p4/p6 family have similar MCA initialization process */
>>  enum mcheck_type intel_mcheck_init(struct cpuinfo_x86 *c)
>>  {
>> -    if (intel_init_mca_banks())
>> +    if ( !this_cpu(mce_clear_banks) ||
>> +         !this_cpu(no_cmci_banks)   ||
>> +         !this_cpu(mce_banks_owned) )
>>           return mcheck_none;
>>  
>>      intel_init_mca(c);
>> @@ -1301,13 +1317,19 @@ static int cpu_callback(
>>  static int cpu_callback(
>>      struct notifier_block *nfb, unsigned long action, void *hcpu)
>>  {
>> +    unsigned int cpu = (unsigned long)hcpu;
>> +
>>      switch ( action )
>>      {
>> +    case CPU_UP_PREPARE:
>> +        cpu_mcabank_alloc(cpu);
>> +        break;
>>      case CPU_DYING:
>>          cpu_mcheck_disable();
>>          break;
>>      case CPU_DEAD:
>>          cpu_mcheck_distribute_cmci();
>> +        cpu_mcabank_free(cpu);
>>          break;
> 
> You also need to cpu_mcabank_free(cpu) on CPU_UP_CANCELED. Else there's a
> memory leak on failed CPU bringup.
> 
>>      default:
>>          break;
>> @@ -1322,6 +1344,8 @@ static struct notifier_block cpu_nfb = {
>>  
>>  static int __init intel_mce_initcall(void)
>>  {
>> +    void *hcpu = (void *)(long)smp_processor_id();
>> +    cpu_callback(&cpu_nfb, CPU_UP_PREPARE, hcpu);
>>      register_cpu_notifier(&cpu_nfb);
>>      return 0;
>>  }
>> diff -r 1a364b17d66a xen/arch/x86/setup.c
>> --- a/xen/arch/x86/setup.c Fri Feb 25 01:26:01 2011 +0800
>> +++ b/xen/arch/x86/setup.c Mon Feb 28 19:19:20 2011 +0800
>> @@ -1203,6 +1203,8 @@ void __init __start_xen(unsigned long mb
>>  
>>      arch_init_memory();
>>  
>> +    do_presmp_initcalls();
>> +
>>      identify_cpu(&boot_cpu_data);
>>      if ( cpu_has_fxsr )
>>          set_in_cr4(X86_CR4_OSFXSR);
>> @@ -1235,8 +1237,6 @@ void __init __start_xen(unsigned long mb
>>      initialize_keytable();
>>  
>>      console_init_postirq();
>> -
>> -    do_presmp_initcalls();
> 
> This looks risky, especially so close to 4.1 release. Instead of moving
> do_presmp_initcalls(), please special-case cpu_mcabank_alloc() for CPU0 in
> intel_mcheck_init(), and remove your direct call to
> cpu_callback(...CPU_UP_PREPARE...) in intel_mce_initcall().
> 
>>      for_each_present_cpu ( i )
>>      {
> 
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel