RE: [Xen-devel] [PATCH] Re: SMP dom0 with 8 cpus of i386

To:	"Kamble, Nitin A" <nitin.a.kamble@xxxxxxxxx>, "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx>
Subject:	RE: [Xen-devel] [PATCH] Re: SMP dom0 with 8 cpus of i386
From:	"Ian Pratt" <m+Ian.Pratt@xxxxxxxxxxxx>
Date:	Thu, 1 Sep 2005 01:15:22 +0100
Cc:	xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>

 
Is this PAE or non-PAE?

Please can you try forcing emulation mode by toggling the "#if 0" in
ptwr_do_page_fault (arch/x86/mm.c)?

The other thing to try is modifying set_pte_pfn_ma to call xen_l1_update
rather than set_pte. You could try set_pte_at too.

This will help narrow down the issue.

Thanks,
Ian

> Keir, Ian,
>    With PCI mmconfig option on, and with the PCI express 
> enabled BIOS, the dom0 kernel reads the PCI config from 
> fix-mapped PCI mmconfig space.
>    The PCI mmconfig space is 256MB in size, and access to it 
> is implemented differently on i386 & x86_64. On x86_64 the 
> whole 256MB is mapped into the kernel virtual address space. On 
> i386 that would consume too much of the kernel's virtual address 
> space, hence it is implemented using a single fix-mapped 
> page. This page is remapped to the desired physical address 
> whenever a PCI mmconfig access targets a different device, as 
> seen in the following code from mmconfig.c:
> 
> static inline void pci_exp_set_dev_base(int bus, int devfn) {
>     u32 dev_base = pci_mmcfg_base_addr | (bus << 20) | (devfn << 12);
>     if (dev_base != mmcfg_last_accessed_device) {
>         mmcfg_last_accessed_device = dev_base;
>         set_fixmap_nocache(FIX_PCIE_MCFG, dev_base);
>     }
> }
> 
> static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
>               unsigned int devfn, int reg, int len, u32 *value) {
>     unsigned long flags;
> 
>     if (!value || (bus > 255) || (devfn > 255) || (reg > 4095))
>         return -EINVAL;
> 
>     spin_lock_irqsave(&pci_config_lock, flags);
> 
>     pci_exp_set_dev_base(bus, devfn);
> 
>     switch (len) {
> 
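A minimal, self-contained sketch of the address arithmetic and the "last accessed device" cache in the quoted code. MMCFG_BASE is a hypothetical value (the real base comes from firmware), and remap_count is an illustrative counter standing in for set_fixmap_nocache(); neither appears in mmconfig.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical base address; the real one is reported by firmware. */
#define MMCFG_BASE 0xE0000000u

static uint32_t last_dev_base = 0;  /* mirrors mmcfg_last_accessed_device */
static unsigned remap_count = 0;    /* illustrative: counts fixmap rewrites */

/* Physical address of the 4KB config page for (bus, devfn):
 * 8 bits of bus, 8 bits of devfn, 12 bits of register offset = 256MB. */
static uint32_t mmcfg_dev_base(uint32_t base, unsigned bus, unsigned devfn)
{
    assert(bus <= 255 && devfn <= 255);
    return base | ((uint32_t)bus << 20) | ((uint32_t)devfn << 12);
}

/* Simulates pci_exp_set_dev_base(): remap only when the page changes. */
static void access_device(unsigned bus, unsigned devfn)
{
    uint32_t dev_base = mmcfg_dev_base(MMCFG_BASE, bus, devfn);
    if (dev_base != last_dev_base) {
        last_dev_base = dev_base;
        remap_count++;  /* stands in for set_fixmap_nocache() */
    }
}
```

Repeated accesses to one device leave the mapping alone, but a bus scan that walks from device to device rewrites the same pte on nearly every access, which is the pattern described next.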
>    At boot time the PCI mmconfig space is accessed thousands 
> of times in quick succession, so the fixmap entry is mapped and 
> remapped continuously at a very high rate for a long time. 
> Currently the fix-mapped virtual addresses for the dom0 
> shared_info page and the PCI mmconfig page are adjacent in 
> enum fixed_addresses in fixmap.h.
> 
> #ifdef CONFIG_PCI_MMCONFIG
>     FIX_PCIE_MCFG,
> #endif
>     FIX_SHARED_INFO,
>     FIX_GNTTAB_BEGIN,
> 
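To illustrate why adjacency in the enum matters, here is a rough sketch of how fixmap slots turn into virtual addresses. The formula follows Linux's __fix_to_virt(); the FIXADDR_TOP value and the slot numbers used with it are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
/* Hypothetical top of the fixmap area; the real value depends on the build. */
#define FIXADDR_TOP 0xFBFFE000u

/* Same formula as __fix_to_virt(): slots grow downward from the top. */
static uint32_t fix_to_virt(unsigned idx)
{
    return FIXADDR_TOP - ((uint32_t)idx << PAGE_SHIFT);
}

/* Two addresses share an L1 page table when their upper bits match.
 * l1_shift is 22 for non-PAE (4MB per L1 table) and 21 for PAE (2MB). */
static int same_l1_table(unsigned idx_a, unsigned idx_b, unsigned l1_shift)
{
    assert(l1_shift == 21 || l1_shift == 22);
    return (fix_to_virt(idx_a) >> l1_shift) == (fix_to_virt(idx_b) >> l1_shift);
}
```

With these assumed values, neighbouring slots such as 10 and 11 are one page apart and fall in the same L1 table in both the non-PAE and PAE layouts. Adjacent slots end up in different L1 tables only if they happen to straddle a 4MB (or 2MB) boundary.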
>    I suspect that this is causing a race condition 
> because of writable page tables. While accessing the PCI 
> mmconfig space on i386, the dom0 kernel (cpu 0) is continuously 
> rewriting the pte for FIX_PCIE_MCFG at a very fast rate. 
> With writable page tables the updates to ptes are deferred. 
> In the SMP case other CPUs are getting interrupts (timer) 
> at the same time, and the interrupt handlers access the 
> shared_info page to notify dom0 of events such as timer events. 
> The problem possibly is that, because of the writable page 
> tables, the L1 page is getting evicted during the mmconfig 
> access, and the shared_info page translation needed for event 
> notification is in the same L1 page. All the cpus are 
> using the same page tables at this time. While writing the pte, the
> L2 page is getting cut off from the page table. This is 
> somehow causing corruption in the dom0 page tables, and we 
> see the errors.
>    I believe this issue does not occur on x86_64 because 
> mmconfig accesses there do not map/unmap the fixmap, so the 
> race on the L2 page does not exist.
>    The workaround currently working for me is to disable 
> PCI_MMCONFIG for i386 in the xen0 kernel config. Sooner or later 
> other people will also notice this corruption on SMP boxes with 
> an SMP dom0. I can see it once in a while on a 4-way box. 
> 
> Can we disable PCI_MMCONFIG for i386 in the xen0 config till 
> we solve the race condition issue? Attached is the patch for 
> the config.
>    As I have a workaround and I am seeing issues with VMX 
> guests, I am trying to fix those issues now.
> 
> Thanks & Regards,
> Nitin
> --------------------------------------------------------------
> Sr Software Engineer
> Open Source Technology Center, Intel Corp
> 
> -----Original Message-----
> From: Kamble, Nitin A
> Sent: Tuesday, August 30, 2005 10:06 AM
> To: Keir Fraser
> Cc: xen-devel
> Subject: RE: [Xen-devel] Re: SMP dom0 with 8 cpus of i386
> 
> > Default but with smp enabled.
> Same here. I am seeing the issue inconsistently on a 4-way 
> box. The 8-way system does not have any issue with maxcpus=1; 
> with 8 cpus the failure is consistent. A larger number of cpus 
> seems to cause some corruption. It always happens while 
> reading/writing the PCI mmconfig space.
>   I am debugging it here. 
> 
> Thanks & Regards,
> Nitin
> 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
