xen-users

Re: [Xen-devel] pci-passthrough in pvops causing offline raid

To: Mark Adams <mark@xxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] pci-passthrough in pvops causing offline raid
From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date: Thu, 11 Nov 2010 12:58:09 -0500
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, xen-users@xxxxxxxxxxxxxxxxxxx
Delivery-date: Thu, 11 Nov 2010 10:00:24 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20101111173850.GA8756@xxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <20101111102416.GA32457@xxxxxxxxxxxxxxxxxx> <20101111165340.GB30006@xxxxxxxxxxxx> <20101111173850.GA8756@xxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.20 (2009-06-14)
On Thu, Nov 11, 2010 at 05:38:50PM +0000, Mark Adams wrote:
> On Thu, Nov 11, 2010 at 11:53:40AM -0500, Konrad Rzeszutek Wilk wrote:
> > On Thu, Nov 11, 2010 at 10:24:17AM +0000, Mark Adams wrote:
> > > Hi All,
> > > 
> > > Running xen 4.0.1-rc6, debian squeeze 2.6.32-21.
> > > 
> > > In a voip setup, where I have forwarded the onboard NIC interfaces
> > > through to domU using the following grub config:
> > > 
> > > module  /vmlinuz-2.6.32-5-xen-amd64 placeholder 
> > > root=UUID=25c3ac79-6850-498d-afcf-ea42970e94fd ro  quiet 
> > > xen-pciback.permissive xen-pciback.hide=(02:00.0)(03:00.0) 
> > > pci=resource_alignment=02:00.0;03:00.0
> > > 
> > > I'm having a serious issue where the raid card goes offline after an
> > > indefinite period of time. Sometimes it runs fine for a week, other times
> > > only a day before I get "offline device" errors. Rebooting the machine
> > > fixes it straight away, and everything is back online.
> > > 
> > > What in the Xen pciback is causing the raid card to go offline? The
> > > only devices hidden are the 2 onboard NICs.
> > 
> > You need to give more details. Is the RAID card a 3Ware? An LSI? Do you
> > run with an IOMMU? When the RAID card goes offline, do you see a stop of
> > IRQs going to the device? Are the IRQs for the RAID card sent to all of your
> > CPUs or just a specific one? Are you pinning your guests to specific CPUs?
> > Does the issue disappear if you don't pass through the NIC interfaces? If so,
> > have you run this setup for "a week" to make sure?
> 
> It is an Areca 1220. I can't see anything when the device goes offline
> apart from 
> 
>     [77324.264270] sd 0:0:0:1: rejecting I/O to offline device
>     [77334.005854] sd 0:0:0:0: rejecting I/O to offline device

Is that it? No other details from the driver? Did you poke at the driver
(modinfo) to see if there are any options to increase its verbosity?

> 
> Unfortunately nothing gets logged because there is nothing to write to
> anymore. I'm not sure how I can see the IRQs otherwise. There is no

cat /proc/interrupts
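
If it helps, here is a rough Python sketch of that check: take two snapshots of /proc/interrupts a few seconds apart and list the IRQs of a given driver whose totals did not move. The function names and the "arcmsr" default are assumptions on my part (I would expect the Areca driver to show up under that name); use whatever your /proc/interrupts actually shows. An idle device also raises no interrupts, so the result only means something while I/O is being issued.

```python
import time


def parse_interrupts(text):
    """Map each IRQ label to (total count across CPUs, description)."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break  # some rows (e.g. ERR) carry fewer counters
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def stalled_irqs(before, after, name):
    """IRQ labels whose description mentions `name` and whose total did not change."""
    old = parse_interrupts(before)
    new = parse_interrupts(after)
    return [
        irq
        for irq, (total, description) in new.items()
        if name in description and irq in old and old[irq][0] == total
    ]


def watch(name="arcmsr", seconds=10, path="/proc/interrupts"):
    """Hypothetical helper: sample the file twice and report stalled IRQs."""
    with open(path) as handle:
        before = handle.read()
    time.sleep(seconds)
    with open(path) as handle:
        after = handle.read()
    return stalled_irqs(before, after, name)
```

Calling watch() from a root shell in dom0 while the RAID is under load should return an empty list when interrupts are still arriving.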

> pinning being done at all, and the machine was running for a few months
> OK before the pciback was added.

OK, what about your NICs? Are they on-board? Are they sharing an IRQ
with the RAID card? You should be able to see this by looking at /proc/interrupts.
Which NICs are they? lspci can help you there. As a matter of fact, run
lspci -vvv and send that.
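
For the shared-IRQ question, a small Python sketch along these lines would list every IRQ line that has more than one handler attached. It assumes the usual /proc/interrupts layout, where devices sharing a line are comma-separated at the end of the row; shared_irqs is just a name I made up for illustration.

```python
def shared_irqs(text):
    """Return {irq_label: [device, ...]} for rows listing more than one device."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Skip the per-CPU counters at the start of the row.
        i = 0
        while i < len(fields) and i < ncpus and fields[i].isdigit():
            i += 1
        description = " ".join(fields[i:])
        if "," not in description:
            continue  # a single handler, nothing shared
        pieces = description.split(",")
        head = pieces[0].split()
        if not head:
            continue
        # The first piece still carries the interrupt chip name; the
        # device is its last word.
        devices = [head[-1]] + [piece.strip() for piece in pieces[1:]]
        shared[label.strip()] = devices
    return shared
```

Feed it the contents of /proc/interrupts from dom0; if the RAID driver and a NIC end up in the same list, that is worth mentioning in your reply.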
> 
> Is my kernel module line correct above? Are the xen-pciback.permissive
> and resource_alignment options required? Also I am passing through the

Not always. The resource_alignment option is needed only if the BARs (look at
the lspci output) are not page-aligned. If you have no idea what I am talking
about, then the answer is yes.
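
To make the BAR check concrete, here is a minimal Python sketch that scans lspci -vvv output for memory regions and reports the ones that look unaligned. It assumes 4 KiB pages and treats a BAR as a problem when its start address or its size is not a multiple of the page size; that definition, and the name unaligned_bars, are my own shorthand rather than anything the kernel exports.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# Matches rows such as:
#   Region 0: Memory at fbde0000 (32-bit, non-prefetchable) [size=128K]
# I/O port regions are ignored on purpose.
REGION = re.compile(
    r"Region (\d+): Memory at ([0-9a-fA-F]+) .*?\[size=(\d+)([KMG]?)\]"
)


def unaligned_bars(lspci_text, page_size=PAGE_SIZE):
    """Return (region index, address, size in bytes) for suspicious memory BARs."""
    result = []
    for match in REGION.finditer(lspci_text):
        index = int(match.group(1))
        address = int(match.group(2), 16)
        size = int(match.group(3)) * UNITS[match.group(4)]
        if address % page_size or size % page_size:
            result.append((index, address, size))
    return result
```

Run it on the output of lspci -vvv -s 02:00.0 (and again for 03:00.0); an empty list suggests the resource_alignment option is not doing anything for that device.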

> onboard NICs - is this something that should be avoided or is it ok to
> do?

It is fine. That is the first thing I test.

> 
> > > 
> > > I know that this issue is with Xen, as I had this running on a different
> > > server (same xen setup) and it had the same issues, which I initially
> > > thought were to do with the raid card.
> > 
> > So you never ran this setup on this kernel (2.6.32-5) without the Xen 
> > hypervisor?
> 
> No, it's always had the hypervisor, but it was running OK before the
> pciback options were added. This week, it has seemed to happen
> approximately every 24 hours.

When this hang occurs, can you run 'xm debug-key Q', 'xm debug-key i', and
'xm debug-key z'? Then run 'xm dmesg' and send me the output.

Is your boot disk on the same disk as the RAID?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel