Re: [Xen-devel] RE: [PATCH] [RFC] VT-d: always clean up dpci timers.

At 18:08 -0700 on 20 Jul (1311185311), Kay, Allen M wrote:
> Hi Tim,
> 
> Can you provide the code flow that can cause this failure?
> 
> In pci_release_devices(), pci_clean_dpci_irqs() is called before
> "d->need_iommu = 0" in deassign_device().  If this path is taken, then
> it should not hit the early return at the !need_iommu(d) check in
> pci_clean_dpci_irqs().

The problem is that the xl toolstack has already deassigned the domain's
devices, using a hypercall to invoke deassign_device(), so by the time
the domain is destroyed, pci_release_devices() can't tell that it once
had a PCI device passed through.
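
To make that ordering concrete, here is a minimal standalone sketch
(plain C, compilable on its own; fake_domain and its two flags are
made-up stand-ins for d->need_iommu and the DPCI timer state, not the
real Xen structures):

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins, not the real Xen structures. */
struct fake_domain {
    bool need_iommu;        /* models d->need_iommu */
    bool dpci_timers_live;  /* models DPCI EOI timers that were init_timer()d */
};

/* Toolstack path: xl deassigns the device via hypercall. */
static void fake_deassign_device(struct fake_domain *d)
{
    d->need_iommu = false;  /* flag cleared, timers left alone */
}

/* Destruction path, modelled on pci_clean_dpci_irqs() before the patch. */
static void fake_clean_dpci_irqs(struct fake_domain *d)
{
    if ( !d->need_iommu )
        return;             /* early exit: timers never kill_timer()d */
    d->dpci_timers_live = false;
}

int main(void)
{
    struct fake_domain d = { .need_iommu = true, .dpci_timers_live = true };

    fake_deassign_device(&d);   /* xl has already deassigned the device... */
    fake_clean_dpci_irqs(&d);   /* ...so destruction can't tell it had one */

    printf("timers leaked: %s\n", d.dpci_timers_live ? "yes" : "no");
    return 0;
}

With the need_iommu(d) check dropped, as in the patch quoted below, the
destruction path reaches the timer teardown regardless of whether the
device was deassigned first.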

It seems like the Right Thing[tm] would be for deassign_device() to find
and undo the relevant IRQ plumbing, but I couldn't see how.  Is there a
mapping from bdf to irq in the iommu code, or are they handled entirely
separately?
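
For reference, the crash quoted below is the familiar
freed-while-still-linked pattern.  A minimal standalone sketch (plain C;
the cut-down list_head and fake_timer are illustrative, not Xen's real
list or timer code) of why the entry->next->prev == entry check fires:

#include <stdio.h>
#include <string.h>

/* Minimal doubly-linked list node, in the spirit of a list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Models a DPCI EOI timer: init_timer() leaves it linked on a list. */
struct fake_timer { struct list_head inactive; };

int main(void)
{
    struct list_head inactive_timers = { &inactive_timers, &inactive_timers };
    struct fake_timer t;

    list_add(&t.inactive, &inactive_timers);   /* init_timer() analogue */

    /*
     * The owning structure is torn down without the kill_timer() analogue
     * that would unlink it; scribbling over it models the memory being
     * reused after the domain's dpci state is freed.
     */
    memset(&t, 0, sizeof(t));

    /* A later walk of the list trips the same invariant as the ASSERT. */
    struct list_head *entry = &inactive_timers;
    printf("entry->next->prev == entry?  %s\n",
           entry->next->prev == entry ? "yes" : "no");
    return 0;
}

Once the memory behind a still-linked entry is reused, the next walk of
the inactive-timer list finds the invariant broken, which is exactly what
the assertion in the trace below reports.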

Tim.

> Allen
> 
> -----Original Message-----
> From: Tim Deegan [mailto:Tim.Deegan@xxxxxxxxxx] 
> Sent: Monday, July 18, 2011 9:39 AM
> To: xen-devel@xxxxxxxxxxxxxxxxxxx
> Cc: keir@xxxxxxx; Kay, Allen M
> Subject: [PATCH] [RFC] VT-d: always clean up dpci timers.
> 
> If a VM has all its PCI devices deassigned, need_iommu(d) becomes false,
> but it might still have DPCI EOI timers that were init_timer()d but not
> yet kill_timer()d.  That causes Xen to crash later because the linked
> list of inactive timers gets corrupted, e.g.:
> 
> (XEN) Xen call trace:
> (XEN)    [<ffff82c480126256>] set_timer+0x1c2/0x24f
> (XEN)    [<ffff82c48011fbf8>] schedule+0x129/0x5dd
> (XEN)    [<ffff82c480122c1e>] __do_softirq+0x7e/0x89
> (XEN)    [<ffff82c480122c9d>] do_softirq+0x26/0x28
> (XEN)    [<ffff82c480153c85>] idle_loop+0x5a/0x5c
> (XEN)    
> (XEN) 
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Assertion 'entry->next->prev == entry' failed at 
> /local/scratch/tdeegan/xen-unstable.hg/xen/include:172
> (XEN) ****************************************
> 
> The following patch makes sure that the domain destruction path always
> cleans up the DPCI state even if !need_iommu(d).
> 
> Although it fixes the crash for me, I'm sufficiently confused by this
> code that I don't know whether it's enough.  If the dpci timer state
> gets freed before pci_clean_dpci_irqs() runs, then there's still a race,
> and some other function (reassign_device_ownership() ?) needs to sort
> out the timers when the PCI card is deassigned.
> 
> Allen, can you comment?
> 
> Signed-off-by: Tim Deegan <Tim.Deegan@xxxxxxxxxx>
> 
> diff -r ab6551e30841 xen/drivers/passthrough/pci.c
> --- a/xen/drivers/passthrough/pci.c   Mon Jul 18 10:59:44 2011 +0100
> +++ b/xen/drivers/passthrough/pci.c   Mon Jul 18 17:22:48 2011 +0100
> @@ -269,7 +269,7 @@ static void pci_clean_dpci_irqs(struct d
>      if ( !iommu_enabled )
>          return;
>  
> -    if ( !is_hvm_domain(d) || !need_iommu(d) )
> +    if ( !is_hvm_domain(d) )
>          return;
>  
>      spin_lock(&d->event_lock);
> 

-- 
Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel