
Re: [Xen-devel] shadow2 corrupting PV guest state


You (jeremy) said:
> I've been fighting random crashes in the paravirt tree for a while.  
> After a fair amount of head-banging, it  looks to me like the shadow2 
> code is trashing the guest stack (and maybe register state) at random 
> points.

  I have a question about shadow2 from another point of view.

  I've been porting the PV-on-HVM drivers to the ia64 platform. In
the course of that work, I began to suspect that shadow2 might cause
a memory-corruption problem.

  I first hit the problem as a hypervisor crash during destruction of
an HVM domain with an active VNIF on the ia64 platform. The cause of
the crash was that the hypervisor found the P2M table still in use by
gnttab_copy during HVM domain destruction. So I looked at how the x86
code avoids this hypervisor crash.

  This is what I found:

  * Before the shadow2 era, x86 and ia64 used the same teardown logic
    for a domain:
    - first, release grant-table references
    - destroy the page tables for each VCPU
    - destroy the P2M table for the domain
    - relinquish the domain's memory

  * With shadow2, x86 introduces delayed P2M table destruction:
    - release grant-table references
    - destroy the page tables for each VCPU
    - relinquish the domain's memory
    - destroy the P2M table for the domain in domain_destroy()
    *** I don't have full confidence in this investigation.
    *** Am I right?

  Let me show the relevant code...

   203  void domain_kill(struct domain *d)
   204  {
   205      domain_pause(d);
   207      if ( test_and_set_bit(_DOMF_dying, &d->domain_flags) )
   208          return;
   210      gnttab_release_mappings(d);
   211      domain_relinquish_resources(d);
   212      put_domain(d);
   214      send_guest_global_virq(dom0, VIRQ_DOM_EXC);
   215  }

   930  void domain_relinquish_resources(struct domain *d)
   931  {
   932      struct vcpu *v;
   933      unsigned long pfn;
   937      /* Drop the in-use references to page-table bases. */
   938      for_each_vcpu ( d, v )
   979      /* Relinquish every page of memory. */
   980      relinquish_memory(d, &d->xenpage_list);
   981      relinquish_memory(d, &d->page_list);

  This is the code for the domain_kill phase. I believe the hypervisor
relinquishes the domain's memory here.

  On the other hand...

   322  /* Release resources belonging to task @p. */
   323  void domain_destroy(struct domain *d)
   324  {
   325      struct domain **pd;
   326      atomic_t      old, new;
   354      arch_domain_destroy(d);
   356      free_domain(d);
   358      send_guest_global_virq(dom0, VIRQ_DOM_EXC);
   359  }

   237  void arch_domain_destroy(struct domain *d)
   238  {
   239      shadow_final_teardown(d);

  2580  void shadow_final_teardown(struct domain *d)
   2581  /* Called by arch_domain_destroy(), when it's safe to pull down the p2m map. */
  2582  {
  2597      /* It is now safe to pull down the p2m map. */
  2598      if ( d->arch.shadow.p2m_pages != 0 )
  2599          shadow_p2m_teardown(d);

  In this code, the P2M table is released.

  If my speculation is correct, shadow2 may cause a memory-corruption
problem, because the P2M table can still map pages that have already
been relinquished.
  What do you think about this point?

- Tsunehisa Doi

Xen-devel mailing list
