WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
 
   
 

xen-devel

[Xen-devel] RE: Xen 4.1 rc1 test report

To: "Zheng, Shaohui" <shaohui.zheng@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] RE: Xen 4.1 rc1 test report
From: "Wei, Gang" <gang.wei@xxxxxxxxx>
Date: Tue, 25 Jan 2011 22:05:21 +0800
Accept-language: zh-CN, en-US
Cc: "Wei, Gang" <gang.wei@xxxxxxxxx>
Delivery-date: Tue, 25 Jan 2011 06:06:36 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <A24AE1FFE7AEC5489F83450EE98351BF2BF2EC4CB0@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <A24AE1FFE7AEC5489F83450EE98351BF2BF2EC4CB0@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Acu6GEBstpnTfIH/TdeQZvf0FjUZ0QAOlF+wAI5FPEA=
Thread-topic: Xen 4.1 rc1 test report
Zheng, Shaohui wrote on 2011-01-23:
>2. [VT-d]xen panic on function do_IRQ after many times NIC pass-throu (Intel)
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1706

I may need some help on this bug. Below are my findings.

According to the call trace, the faulting instruction is at the last line of 
the code segment below.
--------------------
__do_IRQ_guest(...)
    for ( i = 0; i < action->nr_guests; i++ )
    {
        d = action->guest[i];
        pirq = domain_irq_to_pirq(d, irq);
===========
A fatal page fault occurs while accessing ((d)->arch.irq_pirq[irq]), because 
(d)->arch.irq_pirq is already NULL.

Further experiments show that during the second-to-last 'xl create', 
pciback could not locate the device to be assigned:
---------------------
[ 4802.773665] pciback pci-26-0: 22 Couldn't locate PCI device (0000:05:00.0)! perhaps already in-use?
============

And during the following 'xl destroy', the device model did not respond:
---------------------
libxl: error: libxl_device.c:477:libxl__wait_for_device_model Device Model not ready
libxl: error: libxl_pci.c:866:do_pci_remove Device Model didn't respond in time
============

In the 'xl debug-keys i' output taken immediately afterwards, we can see that 
the guest pirqs of the assigned device were not unbound from the host irq desc.
---------------------
(XEN)    IRQ:  16 affinity:00000000,00000000,00000000,00000001 vec:a8 type=IO-APIC-level status=00000050 in-flight=0 domain-list=0: 16(-S--),1: 16(----),
(XEN)    IRQ:  31 affinity:00000000,00000000,00000000,00000004 vec:ba type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 55(----),
============

The stale guest binding (the domain itself was already destroyed by 'xl 
destroy') then leads to the NULL pointer access when a spurious interrupt 
arrives for that device.
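
To make the suspected sequence concrete, here is a minimal user-space sketch in C. It is only a model: the struct and function names (domain_model, irq_action_model, bind_guest, destroy_without_unbind, count_stale_bindings) are hypothetical stand-ins, not Xen's real definitions. It assumes the domain's irq_pirq table is freed on destroy while the host IRQ's guest list still points at the domain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the hypervisor structures (assumed layout). */
#define NR_IRQS_MODEL    64
#define MAX_GUESTS_MODEL 7

struct domain_model {
    int domain_id;
    int *irq_pirq;               /* irq -> pirq table, freed on destroy */
};

struct irq_action_model {
    int nr_guests;
    struct domain_model *guest[MAX_GUESTS_MODEL];
};

/* Bind a guest to the host IRQ by appending it to the guest list. */
static void bind_guest(struct irq_action_model *a, struct domain_model *d)
{
    assert(a->nr_guests < MAX_GUESTS_MODEL);
    a->guest[a->nr_guests++] = d;
}

/* Teardown that frees the table but never removes the binding,
 * which is the situation the debug output above suggests. */
static void destroy_without_unbind(struct domain_model *d)
{
    free(d->irq_pirq);
    d->irq_pirq = NULL;
}

/* Count bindings whose domain no longer has an irq_pirq table.
 * Any such entry would fault on d->irq_pirq[irq] in the IRQ path. */
static int count_stale_bindings(const struct irq_action_model *a)
{
    int i, stale = 0;

    for ( i = 0; i < a->nr_guests; i++ )
        if ( a->guest[i]->irq_pirq == NULL )
            stale++;
    return stale;
}
```

In this model, a binding left behind by destroy_without_unbind() is exactly what a later spurious interrupt would trip over.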

There are three things we may need to do: 
1. Figure out the root cause of pciback failing to locate the device.
I suspect the previous 'xl destroy' didn't return the device to pcistub 
successfully.

2. Figure out the root cause why the guest pirq was not forcibly unbound.
What I have found so far:
Sometimes the check if ( !IS_PRIV_FOR(current->domain, d) ) is hit, so the call 
returns -EINVAL.
Sometimes the check if ( !(desc->status & IRQ_GUEST) ) is hit, so the pirq is 
not unbound.

3. Think about how we could prevent such cases from panicking Xen.
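
For points 2 and 3, one possible direction is sketched below as a user-space model in C. This is not a proposed patch: the names (domain_model, irq_action_model, irq_to_pirq_checked, force_unbind) are hypothetical, and the real fix would have to respect the locking and irq desc state in the hypervisor. The idea is to check the table pointer before indexing it, and to drop every binding of a domain when it is torn down.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GUESTS_MODEL 7

/* Simplified stand-ins, not the real Xen structures. */
struct domain_model {
    int *irq_pirq;               /* NULL once the domain is torn down */
};

struct irq_action_model {
    int nr_guests;
    struct domain_model *guest[MAX_GUESTS_MODEL];
};

/* Guarded lookup: returns 1 and stores the pirq on success, or 0 if
 * the domain has no table, instead of dereferencing a NULL pointer. */
static int irq_to_pirq_checked(const struct domain_model *d, int irq,
                               int *pirq)
{
    if ( d == NULL || d->irq_pirq == NULL )
        return 0;
    *pirq = d->irq_pirq[irq];
    return 1;
}

/* Remove every binding of d from the action, compacting the list in
 * place. Returns the number of bindings removed. */
static int force_unbind(struct irq_action_model *a,
                        const struct domain_model *d)
{
    int i, kept = 0, removed;

    assert(a->nr_guests <= MAX_GUESTS_MODEL);
    for ( i = 0; i < a->nr_guests; i++ )
        if ( a->guest[i] != d )
            a->guest[kept++] = a->guest[i];
    removed = a->nr_guests - kept;
    a->nr_guests = kept;
    return removed;
}
```

The guarded lookup only hides the symptom; the forced unbind at teardown is closer to the root cause in point 2.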

Any ideas, hints, comments, suggestions or even fixes on it?

Jimmy


