I'd strongly suggest trying to upgrade your kernel, or at least the
blktap component. The condition below is new to me, but that wait_queue
file and some related code were known to be buggy and have long since
been removed.
If you choose to upgrade only blktap from tip, let me know what kernel
version you're dealing with; you might need to backport some of the
device queue macros to match your version's needs.
Daniel
On Sat, 2010-10-16 at 01:39 -0400, MaoXiaoyun wrote:
> Well, thanks Keir.
> Fortunately we caught the bug; it turned out to be a tapdisk problem.
> A brief explanation for anyone else who might run into this issue:
>
> Clearing BLKTAP_DEFERRED on line 19 allows concurrent access to
> tap->deferred_queue between line 24 and line 37, which eventually
> corrupts the tap->deferred_queue list pointers and makes the while
> loop at line 22 spin forever.
> Taking deferred_work_lock around line 24 is a simple fix (see the
> sketch after the listing below).
>
> /linux-2.6-pvops.git/drivers/xen/blktap/wait_queue.c
>  9 void
> 10 blktap_run_deferred(void)
> 11 {
> 12         LIST_HEAD(queue);
> 13         struct blktap *tap;
> 14         unsigned long flags;
> 15
> 16         spin_lock_irqsave(&deferred_work_lock, flags);
> 17         list_splice_init(&deferred_work_queue, &queue);
> 18         list_for_each_entry(tap, &queue, deferred_queue)
> 19                 clear_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
> 20         spin_unlock_irqrestore(&deferred_work_lock, flags);
> 21
> 22         while (!list_empty(&queue)) {
> 23                 tap = list_entry(queue.next, struct blktap, deferred_queue);
> 24                 list_del_init(&tap->deferred_queue);
> 25                 blktap_device_restart(tap);
> 26         }
> 27 }
> 28
> 29 void
> 30 blktap_defer(struct blktap *tap)
> 31 {
> 32         unsigned long flags;
> 33
> 34         spin_lock_irqsave(&deferred_work_lock, flags);
> 35         if (!test_bit(BLKTAP_DEFERRED, &tap->dev_inuse)) {
> 36                 set_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
> 37                 list_add_tail(&tap->deferred_queue, &deferred_work_queue);
> 38         }
> 39         spin_unlock_irqrestore(&deferred_work_lock, flags);
> 40 }
>
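> A minimal sketch of the fix described above (not a committed patch;
> only the loop body of blktap_run_deferred() changes): take
> deferred_work_lock around the list_del_init() on line 24, so that the
> unlink there and the list_add_tail() on line 37 are never manipulating
> tap->deferred_queue at the same time:
>
>         while (!list_empty(&queue)) {
>                 tap = list_entry(queue.next, struct blktap, deferred_queue);
>                 /* Serialize with blktap_defer(): the unlink here and
>                  * the list_add_tail() in blktap_defer() both modify
>                  * tap->deferred_queue, so both must hold the lock. */
>                 spin_lock_irqsave(&deferred_work_lock, flags);
>                 list_del_init(&tap->deferred_queue);
>                 spin_unlock_irqrestore(&deferred_work_lock, flags);
>                 blktap_device_restart(tap);
>         }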
>
> > Date: Fri, 15 Oct 2010 13:57:09 +0100
> > Subject: Re: [Xen-devel] Domain 0 stop response on frequently reboot VMS
> > From: keir@xxxxxxx
> > To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> >
> > You'll probably want to see if you can get SysRq output from dom0
> > via the serial line. It's likely you can if it is alive enough to
> > respond to ping. This might tell you things like what all the
> > processes are getting blocked on, and thus indicate what is stopping
> > dom0 from making progress.
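> >
> > (As a concrete example, assuming SysRq is enabled in the dom0
> > kernel, a task dump can be triggered either from a dom0 shell:
> >
> >     echo t > /proc/sysrq-trigger
> >
> > or over the serial line by sending a BREAK followed by 't'; the
> > resulting dump shows every task's state and stack, including what
> > blocked tasks are waiting on.)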
> >
> > -- Keir
> >
> > On 15/10/2010 13:43, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> >
> > >
> > > Hi Keir:
> > >
> > > First, I'd like to express my appreciation for the help you
> > > offered before.
> > > Well, recently we ran into a rather nasty domain 0 no-response
> > > problem.
> > >
> > > We are still running a reboot test on a physical server, with 12
> > > HVMs rebooting almost continuously and concurrently.
> > > A few hours later, the server appears dead. We can only ping the
> > > server and get a correct response; Xen itself still works, since
> > > we can get debug info from the serial port. Attached is the full
> > > debug output.
> > > After decoding the domain 0 CPU stack, I find the CPU is still
> > > doing work for domain 0, since the stack info changed every time
> > > I dumped it.
> > >
> > > Could you help take a look at the attachment to see whether there
> > > are any hints for debugging this problem? Thanks in advance.
> > >
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel