WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

xen-devel

Re: [Xen-devel] Re: VM hung after running sometime

To: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>
Subject: Re: [Xen-devel] Re: VM hung after running sometime
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Thu, 23 Sep 2010 16:20:09 -0700
Cc: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, keir.fraser@xxxxxxxxxxxxx
Delivery-date: Thu, 23 Sep 2010 16:21:07 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <BAY121-W31394CA05B46D94F7DC8F8DA610@xxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <C8BE230D.239BA%keir.fraser@xxxxxxxxxxxxx>, <4C98EB42.4020808@xxxxxxxx>, <BAY121-W10DFCBC2F3B78D89381527DA600@xxxxxxx>, <4C994B08.7050509@xxxxxxxx> <BAY121-W688EF9F79369127219FB3DA600@xxxxxxx>, <4C9A4B7A.3010308@xxxxxxxx> <BAY121-W31394CA05B46D94F7DC8F8DA610@xxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.9) Gecko/20100907 Fedora/3.1.3-1.fc13 Lightning/1.0b3pre Thunderbird/3.1.3
 On 09/22/2010 05:55 PM, MaoXiaoyun wrote:
> The interrupts file is attached. The server has 24 HVM domains
> that have been running for about 40 hours.
>
> Well, we may upgrade to the new kernel in the future, but for now
> we would prefer the fix that has the least impact on our present server.
> So it would be really kind of you if you could provide the set of patches;
> that would also be our first choice.
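Since the question is how interrupts are spread across CPUs, a small parser for the attached /proc/interrupts file can help. This is a minimal sketch, assuming the usual layout (a header row of CPUn columns, then one row per IRQ with one count per CPU followed by chip/handler names). The function names are hypothetical and the SAMPLE rows are made up, not taken from the attached file.

```python
from typing import Dict, List, Tuple

# Made-up sample in the usual /proc/interrupts layout (not the attached file).
SAMPLE = """\
           CPU0       CPU1
  1:        100         50   IO-APIC-edge      i8042
 20:       3000       1000   xen-dyn-event     blkif
NMI:          0          0   Non-maskable interrupts
"""


def parse_interrupts(text: str) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return the CPU column names and, for each IRQ label, its per-CPU counts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    counts: Dict[str, List[int]] = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        row: List[int] = []
        # Counts come first, one per CPU; stop at the chip/handler name columns.
        for field in rest.split()[: len(cpus)]:
            if not field.isdigit():
                break
            row.append(int(field))
        counts[label.strip()] = row
    return cpus, counts


def per_cpu_share(counts: Dict[str, List[int]], label: str) -> List[float]:
    """Fraction of one IRQ's events delivered to each CPU (all zeros if none)."""
    row = counts[label]
    total = sum(row)
    return [c / total for c in row] if total else [0.0] * len(row)


if __name__ == "__main__":
    cpus, counts = parse_interrupts(SAMPLE)
    for cpu, share in zip(cpus, per_cpu_share(counts, "20")):
        print(f"{cpu}: {share:.0%}")
```

On the made-up sample this prints CPU0: 75% and CPU1: 25% for IRQ 20; pointing it at the real file shows whether an event's IRQ has been moving between CPUs.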

Try cherry-picking:
8401e9b96f80f9c0128e7c8fc5a01abfabbfa021 xen: use percpu interrupts for IPIs and VIRQs
66fd3052fec7e7c21a9d88ba1a03bc062f5fb53d xen: handle events as edge-triggered
29a2e2a7bd19233c62461b104c69233f15ce99ec xen/apic: use handle_edge_irq for pirq events
f61692642a2a2b83a52dd7e64619ba3bb29998af xen/pirq: do EOI properly for pirq events
0672fb44a111dfb6386022071725c5b15c9de584 xen/events: change to using fasteoi
2789ef00cbe2cdb38deb30ee4085b88befadb1b0 xen: make pirq interrupts use fasteoi
d0936845a856816af2af48ddf019366be68e96ba xen/evtchn: rename enable/disable_dynirq -> unmask/mask_irq
c6a16a778f86699b339585ba5b9197035d77c40f xen/evtchn: rename retrigger_dynirq -> irq
f4526f9a78ffb3d3fc9f81636c5b0357fc1beccd xen/evtchn: make pirq enable/disable unmask/mask
43d8a5030a502074f3c4aafed4d6095ebd76067c xen/evtchn: pirq_eoi does unmask
cb23e8d58ca35b6f9e10e1ea5682bd61f2442ebd xen/evtchn: correction, pirq hypercall does not unmask
2390c371ecd32d9f06e22871636185382bf70ab7 xen/events: use PHYSDEVOP_pirq_eoi_gmfn to get pirq need-EOI info
158d6550716687486000a828c601706b55322ad0 xen/pirq: use eoi as enable
d2ea486300ca6e207ba178a425fbd023b8621bb1 xen/pirq: use fasteoi for MSI too
f0d4a0552f03b52027fb2c7958a1cbbe210cf418 xen/apic: fix pirq_eoi_gmfn resume
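The list above can be applied in one pass. This is a minimal sketch, assuming the commits are already reachable from the kernel tree you run it in (for example, after fetching the tree they were published in) and that the listed order is the right order to apply them. The helper names are hypothetical; the script only wraps `git cherry-pick -x`, which records the original commit ID in each new commit message.

```python
import subprocess
from typing import List, Sequence

# The commit IDs listed above, in the same order (assumed to be the order
# they should be applied in).
COMMITS = [
    "8401e9b96f80f9c0128e7c8fc5a01abfabbfa021",
    "66fd3052fec7e7c21a9d88ba1a03bc062f5fb53d",
    "29a2e2a7bd19233c62461b104c69233f15ce99ec",
    "f61692642a2a2b83a52dd7e64619ba3bb29998af",
    "0672fb44a111dfb6386022071725c5b15c9de584",
    "2789ef00cbe2cdb38deb30ee4085b88befadb1b0",
    "d0936845a856816af2af48ddf019366be68e96ba",
    "c6a16a778f86699b339585ba5b9197035d77c40f",
    "f4526f9a78ffb3d3fc9f81636c5b0357fc1beccd",
    "43d8a5030a502074f3c4aafed4d6095ebd76067c",
    "cb23e8d58ca35b6f9e10e1ea5682bd61f2442ebd",
    "2390c371ecd32d9f06e22871636185382bf70ab7",
    "158d6550716687486000a828c601706b55322ad0",
    "d2ea486300ca6e207ba178a425fbd023b8621bb1",
    "f0d4a0552f03b52027fb2c7958a1cbbe210cf418",
]


def cherry_pick_argv(commits: Sequence[str]) -> List[List[str]]:
    """Build one `git cherry-pick -x <id>` command per commit."""
    return [["git", "cherry-pick", "-x", c] for c in commits]


def apply_all(commits: Sequence[str], repo: str = ".") -> None:
    """Apply the commits in order, stopping at the first failure (e.g. a conflict)."""
    for argv in cherry_pick_argv(commits):
        # check=True raises CalledProcessError, leaving the conflict for manual resolution.
        subprocess.run(argv, cwd=repo, check=True)
```

Run `apply_all(COMMITS, repo="/path/to/kernel")` on the branch you want to patch. If a pick conflicts, resolve it, run `git cherry-pick --continue`, and then call `apply_all` again with only the remaining commits (a slice such as `COMMITS[i:]`).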

> Later I will kick off the test with irqbalance disabled on different
> servers, and will keep you posted.

Thanks,
J
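Why would disabling irqbalance confirm a lost-event diagnosis, and why does the list above include "xen: handle events as edge-triggered"? The following is a toy model, not Xen or kernel code, of one plausible interleaving: a handler is still running on one CPU when the IRQ is migrated (as irqbalance does) and a new event is delivered to another CPU. As I understand it, a Xen event is signalled once when it becomes pending rather than being held asserted like a level-triggered line, so a level-style flow handler that finds the IRQ already in progress and simply returns leaves nothing behind to recall the event; an edge-style handler sets a flag that makes the running handler go round again. All names are hypothetical, and the model ignores masking, acking and the event-channel bitmaps.

```python
from dataclasses import dataclass


@dataclass
class ToyIrq:
    """Toy model of one event-channel IRQ's flow-handler state (not kernel code)."""
    edge: bool                 # True: edge-style handling, False: level-style
    in_progress: bool = False  # a handler is currently running on some CPU
    replay: bool = False       # edge-style only: an event arrived while busy
    queued_work: int = 0       # requests posted by the backend, not yet consumed
    completed: int = 0         # requests the driver's handler has consumed

    def deliver(self) -> None:
        """An event for this IRQ arrives on some CPU."""
        if self.in_progress:
            if self.edge:
                self.replay = True  # note it so the running handler goes round again
            # Level-style: just return; nothing records that the event happened.
            return
        self.in_progress = True     # this CPU now runs the handler

    def drain(self) -> None:
        """The driver's handler consumes everything queued so far."""
        self.completed += self.queued_work
        self.queued_work = 0

    def end(self) -> None:
        """The handler is about to return."""
        while self.replay:          # edge-style: re-run for events that came in meanwhile
            self.replay = False
            self.drain()
        self.in_progress = False


def stranded_work(edge: bool) -> int:
    """Replay the interleaving described above; return requests left unserviced."""
    irq = ToyIrq(edge=edge)
    irq.queued_work += 1   # backend posts request 1 and sends an event
    irq.deliver()          # CPU A starts the handler
    irq.drain()            # CPU A consumes request 1
    irq.queued_work += 1   # backend posts request 2 and sends another event,
    irq.deliver()          # which lands on CPU B (IRQ migrated) while A is still busy
    irq.end()              # CPU A's handler returns
    return irq.queued_work
```

Here `stranded_work(edge=False)` leaves one request unserviced while `stranded_work(edge=True)` leaves none. In a real guest the stranded request would wait for some unrelated later event, which from the outside could look like a hung disk.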

>
> Thanks for your kind assistance.
>
> > Date: Wed, 22 Sep 2010 11:31:22 -0700
> > From: jeremy@xxxxxxxx
> > To: tinnycloud@xxxxxxxxxxx
> > CC: keir.fraser@xxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> > Subject: Re: [Xen-devel] Re: VM hung after running sometime
> >
> > On 09/21/2010 06:19 PM, MaoXiaoyun wrote:
> > > Thanks for the details.
> > >
> > > Currently the guest VMs hang in our heavy IO stress test. (In detail, we
> > > have created more than 12 HVMs on our 16-core physical server,
> > > and inside each HVM, iometer and ab run periodically to generate heavy
> > > IO.) The guest hang shows up within 1 or 2 days. The IO is very
> > > heavy, and so are the interrupts, I think.
> >
> > What does /proc/interrupts look like?
> >
> > >
> > > According to the hang log, the domain is blocked in _VPF_blocked_in_xen,
> > > as indicated by "x=1" in the log file below, which corresponds to
> > > ports 1 and 2. All our HVMs have PV drivers installed. One thing I am
> > > not clear on right now is the IO events on these two ports. Do they
> > > only include "mouse, vga" events, or do they also include hard disk
> > > events? (If hard disk events are included, the interrupts would be
> > > very heavy, right? Right now we have 4 physical CPUs allocated to
> > > domain 0; is that appropriate?)
> >
> > I'm not sure of the details of how qemu<->hvm interaction works, but it
> > was hangs in blkfront in PV domains which brought the lost event problem
> > to light. At the basic event channel level, they will both look the
> > same, and suffer from the same problems.
> >
> > >
> > > Anyway, I think I can have irqbalance disabled for a quick test.
> >
> > Thanks; that should confirm the diagnosis.
> >
> > > Meanwhile, I will spend some time on the patch merge.
> >
> > If you're not willing to go to the current kernel, I can help you with
> > the minimal set of patches to backport.
> >
> > J
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel

