xen-devel

Re: [Xen-devel] shadow2 corrupting PV guest state

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: Re: [Xen-devel] shadow2 corrupting PV guest state
From: Doi.Tsunehisa@xxxxxxxxxxxxxx
Date: Fri, 20 Oct 2006 22:42:39 +0900
Cc: Chris Wright <chrisw@xxxxxxxxxxxx>, Michael A Fetterman <Michael.Fetterman@xxxxxxxxxxxx>, Tim Deegan <Tim.Deegan@xxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Fri, 20 Oct 2006 09:40:30 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: Your message of Fri, 13 Oct 2006 16:27:42 -0700. <453020EE.4080603@xxxxxxxx>
References: <453020EE.4080603@xxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Hi,

You (jeremy) said:
> I've been fighting random crashes in the paravirt tree for a while.  
> After a fair amount of head-banging, it  looks to me like the shadow2 
> code is trashing the guest stack (and maybe register state) at random 
> points.

  I have a question about shadow2 from a different point of view.

  I have been porting the PV-on-HVM drivers to the ia64 platform. During
that work, I began to suspect that shadow2 might cause memory corruption.

  I first hit the problem as a hypervisor crash during destruction of
an HVM domain with an active VNIF on the ia64 platform. The cause of the
crash was that the hypervisor detected the P2M table being used by
gnttab_copy during HVM domain destruction. So I looked in the x86 code
for a way to avoid the hypervisor crash.

  This is what I found:

  * Before shadow2, x86 and ia64 used the same logic for domain
    destruction:
    - first, release the gnttab references
    - destroy the page tables for each VCPU
    - destroy the P2M table for the domain
    - relinquish the memory for the domain

  * With shadow2, x86 introduced delayed P2M table destruction:
    - release the gnttab references
    - destroy the page tables for each VCPU
    - relinquish the memory for the domain
    - destroy the P2M table for the domain in domain_destroy()
    *** I am not confident in this analysis.
    *** Am I right?
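
The two orderings listed above can be written down as a small toy model in C. This is only an illustration, assuming the two lists are accurate; the enum, the arrays and the helper functions (teardown_step, step_position, p2m_outlives_memory) are hypothetical labels and do not appear in the Xen source.

```c
/* Toy model only: these names label the steps in the lists above.
 * None of them are identifiers from the Xen source tree. */
enum teardown_step {
    STEP_RELEASE_GNTTAB,
    STEP_DESTROY_VCPU_PAGETABLES,
    STEP_DESTROY_P2M,
    STEP_RELINQUISH_MEMORY,
    STEP_COUNT
};

/* Ordering described for x86 and ia64 before shadow2. */
static const enum teardown_step order_before_shadow2[STEP_COUNT] = {
    STEP_RELEASE_GNTTAB,
    STEP_DESTROY_VCPU_PAGETABLES,
    STEP_DESTROY_P2M,
    STEP_RELINQUISH_MEMORY,
};

/* Ordering described for x86 with shadow2 (P2M teardown is delayed). */
static const enum teardown_step order_after_shadow2[STEP_COUNT] = {
    STEP_RELEASE_GNTTAB,
    STEP_DESTROY_VCPU_PAGETABLES,
    STEP_RELINQUISH_MEMORY,
    STEP_DESTROY_P2M,
};

/* Return the position of `step` in `order`, or -1 if it is absent. */
static int step_position(const enum teardown_step *order,
                         enum teardown_step step)
{
    for (int i = 0; i < STEP_COUNT; i++)
        if (order[i] == step)
            return i;
    return -1;
}

/* Nonzero if the P2M table is destroyed after memory is relinquished,
 * i.e. the P2M table still exists while the domain's pages are freed. */
static int p2m_outlives_memory(const enum teardown_step *order)
{
    return step_position(order, STEP_DESTROY_P2M) >
           step_position(order, STEP_RELINQUISH_MEMORY);
}
```

Calling p2m_outlives_memory() on the first array gives 0 and on the second gives 1, which is exactly the difference in question: with the delayed scheme there is a window in which the P2M table still exists after the domain's memory has been given back.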

  Here is the code I am referring to:

[common/domain.c]
    void domain_kill(struct domain *d)
    {
        domain_pause(d);

        if ( test_and_set_bit(_DOMF_dying, &d->domain_flags) )
            return;

        gnttab_release_mappings(d);
        domain_relinquish_resources(d);
        put_domain(d);

        send_guest_global_virq(dom0, VIRQ_DOM_EXC);
    }

[arch/x86/domain.c]
    void domain_relinquish_resources(struct domain *d)
    {
        struct vcpu *v;
        unsigned long pfn;
        ....
        /* Drop the in-use references to page-table bases. */
        for_each_vcpu ( d, v )
        ....
        /* Relinquish every page of memory. */
        relinquish_memory(d, &d->xenpage_list);
        relinquish_memory(d, &d->page_list);
        ....

  This is the code for the domain_kill phase. I think the hypervisor
relinquishes the domain's memory here.

  On the other hand...

[common/domain.c]
    /* Release resources belonging to task @p. */
    void domain_destroy(struct domain *d)
    {
        struct domain **pd;
        atomic_t      old, new;
        ....
        arch_domain_destroy(d);

        free_domain(d);

        send_guest_global_virq(dom0, VIRQ_DOM_EXC);
    }

[arch/x86/domain.c]
    void arch_domain_destroy(struct domain *d)
    {
        shadow_final_teardown(d);
        ....

[arch/x86/mm/shadow/common.c]
    void shadow_final_teardown(struct domain *d)
    /* Called by arch_domain_destroy(), when it's safe to pull down the p2m map. */
    {
        ....
        /* It is now safe to pull down the p2m map. */
        if ( d->arch.shadow.p2m_pages != 0 )
            shadow_p2m_teardown(d);

  In this code, the P2M table is released.
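
One possible reading of why the P2M teardown ends up delayed: domain_kill() finishes with put_domain(d), and, assuming domain_destroy() only runs once the last reference to the domain has been dropped, anything still holding a reference postpones shadow_final_teardown(). The toy model below sketches that assumption in C. It is not Xen code; every name with a toy_ prefix is hypothetical, and the refcount behaviour is an assumption, not something shown in the excerpts above.

```c
/* Toy model, not Xen code: all toy_* names are hypothetical. */
struct toy_domain {
    int refcnt;        /* outstanding references to the domain */
    int memory_freed;  /* set during the kill phase */
    int p2m_freed;     /* set during the final teardown */
};

/* Stands in for domain_destroy() -> arch_domain_destroy()
 * -> shadow_final_teardown() -> shadow_p2m_teardown(). */
static void toy_domain_destroy(struct toy_domain *d)
{
    d->p2m_freed = 1;
}

/* ASSUMPTION: the final teardown runs only when the last reference
 * to the domain is dropped. */
static void toy_put_domain(struct toy_domain *d)
{
    if (--d->refcnt == 0)
        toy_domain_destroy(d);
}

/* Stands in for domain_kill(): memory is relinquished immediately,
 * then one reference is dropped. */
static void toy_domain_kill(struct toy_domain *d)
{
    d->memory_freed = 1;  /* domain_relinquish_resources() in the excerpt */
    toy_put_domain(d);
}
```

With two references outstanding, toy_domain_kill() sets memory_freed but leaves p2m_freed at 0; only a later toy_put_domain() by the other holder frees the P2M. Under this assumption, that gap is the window the question is about.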

  If my reading is correct, shadow2 may cause memory corruption.

  What do you think about this point?

Thanks,
- Tsunehisa Doi
