WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

[Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadca

To: xen-devel@xxxxxxxxxxxxxxxxxxx, JBeulich@xxxxxxxxxx
Subject: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
From: Andreas Kinzler <ml-xen-devel@xxxxxx>
Date: Thu, 09 Sep 2010 11:20:51 +0200
I have been corresponding with Jan by email for a while to track down the following problem, and he suggested that I report it on xen-devel:

Jul  9 01:48:04 virt kernel: aacraid: Host adapter reset request. SCSI hang ?
Jul  9 01:49:05 virt kernel: aacraid: SCSI bus appears hung
Jul  9 01:49:10 virt kernel: Calling adapter init
Jul  9 01:49:49 virt kernel: IRQ 16/aacraid: IRQF_DISABLED is not guaranteed on shared IRQs
Jul  9 01:49:49 virt kernel: Acquiring adapter information
Jul  9 01:49:49 virt kernel: update_interval=30:00 check_interval=86400s
Jul  9 01:53:13 virt kernel: aacraid: aac_fib_send: first asynchronous command timed out.
Jul  9 01:53:13 virt kernel: Usually a result of a PCI interrupt routing problem;
Jul  9 01:53:13 virt kernel: update mother board BIOS or consider utilizing one of
Jul  9 01:53:13 virt kernel: the SAFE mode kernel options (acpi, apic etc)

After the VMs have been running for a while, the aacraid driver reports a non-responding RAID controller. Most of the time the NIC also stops working. I have tried nearly every combination of dom0 kernel (pvops0, xenified SUSE 2.6.31.x, xenified SUSE 2.6.32.x, xenified SUSE 2.6.34.x) with Xen hypervisor 3.4.2, 3.4.4-cs19986, 4.0.1, and unstable, without success in two months. Sooner or later, every combination showed the problem above. I did extensive tests to make sure that the hardware is OK. And it is - I am sure it is a Xen/dom0 problem.

Jan suggested trying the fix in c/s 22051, but it did not help. My answer to him:

> In the meantime I did try xen-unstable c/s 22068 (contains staging c/s 22051) and
> it did not fix the problem at all. I was able to fix a problem with the serial console
> and so I got some debug info that is attached to this email. The following line looks
> suspicious to me (irr=1, delivery_status=1):

> (XEN)     IRQ 16 Vec216:
> (XEN) Apic 0x00, Pin 16: vector=216, delivery_mode=1, dest_mode=logical,
> delivery_status=1, polarity=1, irr=1, trigger=level, mask=0, dest_id:1

> IRQ 16 is the aacraid controller, which after a while seems to be unable to receive
> interrupts. Can you see from the debug info what is going on?
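
For anyone who wants to scan a longer debug-key dump for entries like the one quoted above, here is a minimal Python sketch. It is not part of the original report or of any Xen tooling: the function names parse_ioapic_entry and looks_stuck are made up for illustration, and the "stuck" test is only a heuristic that assumes an unmasked, level-triggered pin with both irr=1 and delivery_status=1 deserves a closer look.

```python
import re


def parse_ioapic_entry(line):
    """Return the key=value (or key:value) fields of one IO-APIC pin line as a dict."""
    # Matches pairs such as "vector=216" or "dest_id:1"; "Pin 16: vector" is
    # skipped because a space follows the colon.
    fields = dict(re.findall(r"(\w+)[=:](\w+)", line))
    pin = re.search(r"Pin\s+(\d+)", line)
    if pin:
        fields["pin"] = pin.group(1)
    return fields


def looks_stuck(fields):
    """Heuristic (an assumption, not a diagnosis): unmasked level-triggered
    pin with irr and delivery_status both set."""
    return (
        fields.get("trigger") == "level"
        and fields.get("mask") == "0"
        and fields.get("irr") == "1"
        and fields.get("delivery_status") == "1"
    )


if __name__ == "__main__":
    # The line quoted in the mail, joined back into a single line.
    sample = (
        "(XEN) Apic 0x00, Pin 16: vector=216, delivery_mode=1, "
        "dest_mode=logical, delivery_status=1, polarity=1, irr=1, "
        "trigger=level, mask=0, dest_id:1"
    )
    entry = parse_ioapic_entry(sample)
    if looks_stuck(entry):
        print("pin %s (vector %s) looks stuck" % (entry["pin"], entry["vector"]))
```

Run against the whole dump line by line, this would only list candidate pins; it cannot say why the interrupt is not being acknowledged.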

I also applied a small patch which disables HPET broadcast. The machine has now been running for 110 hours without a crash, whereas it normally crashes within a few minutes. Is there something wrong (race, deadlock) with HPET broadcasts in relation to the blocked interrupt reception (see above)?

Andreas

Attachment: xen-nohpet-broadcast.patch
Description: Text document

Attachment: jan-debugkeys.txt
Description: Text document

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel