xen-devel

Re: [Xen-devel] Problems with MSI interrupts

To: <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Problems with MSI interrupts
From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Date: Wed, 3 Aug 2011 13:05:02 +0100
Delivery-date: Wed, 03 Aug 2011 05:06:15 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4E393632.4020300@xxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <4E393632.4020300@xxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18) Gecko/20110617 Lightning/1.0b2 Thunderbird/3.1.11

On 03/08/11 12:51, Andrew Cooper wrote:
> Hello,
>
> I am currently investigating an issue with MSI allocation/deallocation
> which appears to be an MSI resource leak in Xen.  This is XenServer 6.0
> based on Xen 4.1.1, with no changesets I can see affecting the relevant
> Xen codepaths.
>
> The box in question is a NetScaler SDX box with 24 logical cores (2
> Nehalem sockets, 6 cores each, hyperthreading), 96GB RAM, and 4
> dual-port Intel 10G ixgbe cards (plus two SSL 'Xcelerator' cards, but
> I have disabled these for debugging purposes).  Each of the 8 NIC
> ports exports 40 virtual functions.  There are 40 (identical) VMs,
> each of which has 1 VF from each NIC passed through to it, giving
> each VM 8 VFs.  Each VF itself uses 3 MSI-X interrupts.  Therefore,
> for all VMs to be working correctly, there are 3 IRQs per VF, 8 VFs
> per VM, and 40 VMs = 960 MSI-X interrupts.
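>
> For what it's worth, a back-of-envelope comparison of that figure
> against the x86 vector space (illustrative numbers only -- the exact
> count of dynamically allocatable vectors per CPU in Xen is somewhat
> lower than 256, and all names below are invented for the sketch):
>
>     /* Rough capacity check -- not Xen code. */
>     #include <stdio.h>
>
>     int main(void)
>     {
>         int irqs_per_vf = 3, vfs_per_vm = 8, vms = 40;
>         int total = irqs_per_vf * vfs_per_vm * vms;   /* 960 MSI-X */
>
>         int pcpus = 24;
>         int vectors_per_cpu = 200;    /* approximate usable vectors */
>
>         printf("MSI-X interrupts needed:          %d\n", total);
>         printf("Capacity with per-CPU vectors:    %d\n",
>                vectors_per_cpu * pcpus);              /* ~4800 */
>         printf("Capacity if a vector must be free\n"
>                "on every CPU (full affinity):     %d\n",
>                vectors_per_cpu);                      /* ~200 < 960 */
>         return 0;
>     }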
>
> The symptoms are: reboot the VMs a couple of times and eventually Xen
> says "(XEN) ../physdev.c:140: domXXX: can't create irq for msi!".
> After adding extra debugging, the call to create_irq() was returning
> -ENOSPC.  At the point at which create_irq() was failing, there were
> huge numbers of IRQs listed by the 'i' debugkey with a descriptor
> affinity mask of all CPUs, which I believe is interfering with the
> calculations in __assign_irq_vector().
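>
> (For anyone following along, here is a heavily simplified sketch of
> the kind of search __assign_irq_vector() has to perform.  This is not
> the real Xen 4.1 code; NR_CPUS_DEMO and vector_used[][] are invented
> for the sketch.  The point is that an IRQ whose affinity mask covers
> all CPUs needs a vector that is simultaneously free on every CPU, so
> a large population of full-affinity descriptors exhausts the space
> and create_irq() sees -ENOSPC.)
>
>     #define NR_CPUS_DEMO   24
>     #define NR_VECTORS     256
>     #define FIRST_DYN_VEC  0x20    /* skip the exception vectors */
>
>     static char vector_used[NR_CPUS_DEMO][NR_VECTORS];
>
>     /* Find a vector free on every CPU set in 'mask' (bit i == CPU i). */
>     static int demo_assign_vector(unsigned long mask)
>     {
>         for ( int vec = FIRST_DYN_VEC; vec < NR_VECTORS; vec++ )
>         {
>             int free = 1;
>
>             for ( int cpu = 0; cpu < NR_CPUS_DEMO; cpu++ )
>                 if ( (mask & (1UL << cpu)) && vector_used[cpu][vec] )
>                 {
>                     free = 0;
>                     break;
>                 }
>
>             if ( !free )
>                 continue;
>
>             /* Claim the vector on every CPU in the mask: one
>              * full-affinity IRQ burns a vector on all 24 CPUs. */
>             for ( int cpu = 0; cpu < NR_CPUS_DEMO; cpu++ )
>                 if ( mask & (1UL << cpu) )
>                     vector_used[cpu][vec] = 1;
>
>             return vec;
>         }
>
>         return -28;    /* i.e. -ENOSPC, as seen from create_irq() */
>     }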
>
> I suspected that this might be because scheduling under load was
> moving VCPUs across PCPUs, resulting in the IRQ's vector being
> installed in every PCPU's IDT.  As a result, I pinned each VM to a
> specific PCPU in the hope that the problem would go away.
>
> When starting each VM individually, the problem appears to go away.
> However, when starting all VMs at once, there are still some IRQs
> with an affinity mask of all CPUs.
>
> Specifically, one case is this (I added extra debugging to put
> irq_cfg->cpu_mask into the 'i' debugkey output; a sketch of that
> change follows the dump below):
>
> (XEN)    IRQ: 845 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000010 vec:7e type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 55(----),
> (XEN)    IRQ: 846 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:86 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 54(----),
> (XEN)    IRQ: 847 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:96 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 53(----),
> (XEN)    IRQ: 848 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:be type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 52(----),
> (XEN)    IRQ: 849 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:c6 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 51(----),
> (XEN)    IRQ: 850 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:ce type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 50(----),
> (XEN)    IRQ: 851 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:b7 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 49(----),
> (XEN)    IRQ: 852 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:cf type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 48(----),
> (XEN)    IRQ: 853 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:d7 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 47(----),
> (XEN)    IRQ: 854 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:d9 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 46(----),
> (XEN)    IRQ: 855 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:22 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 45(----),
> (XEN)    IRQ: 856 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:2a type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 44(----),
> (XEN)    IRQ: 857 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000010 vec:3c type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 43(----),
> (XEN)    IRQ: 858 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:4c type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 42(----),
> (XEN)    IRQ: 859 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:54 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 41(----),
> (XEN)    IRQ: 860 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:b5 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 40(----),
> (XEN)    IRQ: 861 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:ae type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 39(----),
> (XEN)    IRQ: 862 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:de type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 38(----),
> (XEN)    IRQ: 863 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000010 vec:55 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 37(----),
> (XEN)    IRQ: 864 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:9d type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 36(----),
> (XEN)    IRQ: 865 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:46 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 35(----),
> (XEN)    IRQ: 866 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:a6 type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 34(----),
> (XEN)    IRQ: 867 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:5f type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 33(----),
> (XEN)    IRQ: 868 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:7f type=PCI-MSI        
> status=00000050 in-flight=0 domain-list=34: 32(----),
>
> This shows all IRQs for dom34.  The descriptors have an affinity mask
> of all CPUs, but each irq_cfg cpu_mask has only a single bit set
> (0x10 or 0x20).
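>
> The cfg_aff column comes from the extra debugging mentioned above;
> the change amounts to something like the following in the 'i'
> keyhandler (dump_irqs() in xen/arch/x86/irq.c).  This is a sketch
> from memory: print_cpumask() stands in for whatever cpumask-formatting
> helper the tree provides (e.g. cpumask_scnprintf() into
> keyhandler_scratch), and the way the irq_cfg is reached may differ.
>
>     struct irq_desc *desc = irq_to_desc(irq);
>     struct irq_cfg  *cfg  = desc->chip_data;    /* per-IRQ vector state */
>
>     print_cpumask("desc_aff", &desc->affinity); /* what 'i' already shows */
>     print_cpumask("cfg_aff",  &cfg->cpu_mask);  /* the extra column */
>     printk(" vec:%02x type=%-15s status=%08x\n",
>            cfg->vector, desc->handler->typename, desc->status);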
>
> The domain dump for dom34 is
> (XEN) General information for domain 34:
> (XEN)     refcnt=3 dying=0 nr_pages=131065 xenheap_pages=8 dirty_cpus={}
> max_pages=133376
> (XEN)     handle=97ef6eef-69c2-024c-1bbb-a150ca668691 vm_assist=00000000
> (XEN)     paging assistance: hap refcounts translate external
> (XEN) Rangesets belonging to domain 34:
> (XEN)     I/O Ports  { }
> (XEN)     Interrupts { 32-55 }
> (XEN)     I/O Memory { f9f00-f9f03, fa001-fa003, fa19c-fa19f,
> fa29d-fa29f, fa39c-fa39f, fa49d-fa49f, fa59c-fa59f, fa69d-fa69f,
> fa79c-fa79f, fa89d-fa89f, fa99c-fa99f, faa9d-faa9f, fab9c-fab9f,
> fac9d-fac9f, fad9c-fad9f, fae9d-fae9f }
> (XEN) Memory pages belonging to domain 34:
> (XEN)     DomPage list too long to display
> (XEN)     P2M entry stats:
> (XEN)      L1:     1590 entries, 6512640 bytes
> (XEN)      L2:      253 entries, 530579456 bytes
> (XEN)     PoD entries=0 cachesize=0 superpages=0
> (XEN)     XenPage 00000000001146e1: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 00000000001146e0: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 00000000001146df: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 00000000001146de: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 00000000000bdc0e: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 0000000000114592: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 000000000011458f: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 000000000011458c: caf=c000000000000001,
> taf=7400000000000001
> (XEN) VCPU information and callbacks for domain 34:
> (XEN)     VCPU0: CPU3 [has=F] flags=1 poll=0 upcall_pend = 00,
> upcall_mask = 00 dirty_cpus={} cpu_affinity={3}
> (XEN)     paging assistance: hap, 4 levels
> (XEN)     No periodic timer
> (XEN)     VCPU1: CPU3 [has=F] flags=1 poll=0 upcall_pend = 00,
> upcall_mask = 00 dirty_cpus={3} cpu_affinity={3}
> (XEN)     paging assistance: hap, 4 levels
> (XEN)     No periodic timer
>
> Showing that this domain is actually pinned to pcpu 3.
>
> Am I mis-interpreting the information, or does this indicate that the
> scheduler (credit) is not obeying the cpu_affinity?  The virtual
> functions seem to be passing network traffic correctly, so I assume
> that interrupts are getting to where they are supposed to go.
>
>
> Another question, which may or may not be related: irq_cfg has a
> vector and a cpu_mask.  From this, I assume that the same interrupt
> must occupy the same IDT entry on every PCPU it might be received on.
> Is there an architectural reason why this should be the case, or is
> it just the way Xen is coded?
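>
> (For context, the x86 irq_cfg of this era looks roughly like the
> following -- paraphrased from memory rather than copied from the
> tree, so field order and any extra members may differ.  There is a
> single vector shared by every CPU in cpu_mask, which is what prompts
> the question.)
>
>     struct irq_cfg {
>         int       vector;          /* one IDT vector number ...         */
>         cpumask_t cpu_mask;        /* ... installed on this set of CPUs */
>         cpumask_t old_cpu_mask;    /* previous mask while a move is in
>                                       progress */
>         unsigned  move_cleanup_count;
>         u8        move_in_progress : 1;
>     };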
>
> (Also, it seems that <asm/irq.h> and <xen/irq.h> both define struct
> irq_cfg, and while one is strictly an extension of the other, there
> appear to be no guards around them, meaning that sizeof(irq_cfg)
> depends on which header file you include.  I don't know whether this
> is relevant, but it strikes me that code getting confused as to which
> definition it is using could be computing on junk if it expects the
> longer irq_cfg and actually gets the shorter one.)
Correction - I wasn't reading the source closely enough.  There are
#ifdef __ia64__ guards around this.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
