RE: [Xen-devel] Domain 0 stop response on frequently reboot VMS

Hi Daniel:

      Where should I start if I want to keep the current kernel (2.6.31) and update only blktap2?
      Going through the xen/dom0/backend/blktap2 git branch, I found that wait_queue.c has been removed.
      It looks like blktap2 has changed a lot, right?
      So I am curious about the differences between the new version and the old one.
      Could you share a brief explanation? That would be very helpful.
      Thanks in advance.

> Subject: RE: [Xen-devel] Domain 0 stop response on frequently reboot VMS
> From: daniel.stodden@xxxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx
> CC: keir@xxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> Date: Sat, 23 Oct 2010 22:56:51 -0700
>
> On Sun, 2010-10-24 at 01:48 -0400, MaoXiaoyun wrote:
> > Hi Daniel:
> >
> > Sorry for the late response, and many thanks for your kind
> > suggestion.
> > I believe we will upgrade to the latest kernel in the
> > near future, but for now
> > we prefer to keep the current one for stability reasons.
> >
> > Our kernel version is 2.6.31. I am now going through the blktap
> > changesets to get
> > more detailed information.
>
> NP. Let me know if you have questions.
>
> Daniel
>
> > thanks.
> >
> > > Subject: RE: [Xen-devel] Domain 0 stop response on frequently reboot
> > VMS
> > > From: daniel.stodden@xxxxxxxxxx
> > > To: tinnycloud@xxxxxxxxxxx; jeremy@xxxxxxxx
> > > CC: keir@xxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> > > Date: Mon, 18 Oct 2010 14:17:50 -0700
> > >
> > >
> > > I'd strongly suggest to try upgrading your kernel, or at least the
> > > blktap component. The condition below is new to me, but that
> > wait_queue
> > > file and some related code was known to be buggy and has long since
> > been
> > > removed.
> > >
> > > If you choose to only upgrade blktap from tip, let me know what
> > kernel
> > > version you're dealing with, you might need to backport some of the
> > > device queue macros to match your version's needs.
> > >
> > > Daniel
> > >
> > >
> > > On Sat, 2010-10-16 at 01:39 -0400, MaoXiaoyun wrote:
> > > > Well, thanks Keir.
> > > > Fortunately we caught the bug; it turned out to be a tapdisk problem.
> > > > Here is a brief explanation for others who might run into this issue.
> > > >
> > > > Clearing BLKTAP_DEFERRED on line 19 allows concurrent access to
> > > > tap->deferred_queue between lines 24 and 37, which eventually
> > > > corrupts the tap->deferred_queue pointers and causes an infinite
> > > > loop in the while clause on line 22.
> > > > Taking the lock around line 24 would be a simple fix.
> > > >
> > > > /linux-2.6-pvops.git/drivers/xen/blktap/wait_queue.c
> > > > 9 void
> > > > 10 blktap_run_deferred(void)
> > > > 11 {
> > > > 12     LIST_HEAD(queue);
> > > > 13     struct blktap *tap;
> > > > 14     unsigned long flags;
> > > > 15
> > > > 16     spin_lock_irqsave(&deferred_work_lock, flags);
> > > > 17     list_splice_init(&deferred_work_queue, &queue);
> > > > 18     list_for_each_entry(tap, &queue, deferred_queue)
> > > > 19         clear_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
> > > > 20     spin_unlock_irqrestore(&deferred_work_lock, flags);
> > > > 21
> > > > 22     while (!list_empty(&queue)) {
> > > > 23         tap = list_entry(queue.next, struct blktap, deferred_queue);
> > > > 24         list_del_init(&tap->deferred_queue);
> > > > 25         blktap_device_restart(tap);
> > > > 26     }
> > > > 27 }
> > > > 28
> > > > 29 void
> > > > 30 blktap_defer(struct blktap *tap)
> > > > 31 {
> > > > 32     unsigned long flags;
> > > > 33
> > > > 34     spin_lock_irqsave(&deferred_work_lock, flags);
> > > > 35     if (!test_bit(BLKTAP_DEFERRED, &tap->dev_inuse)) {
> > > > 36         set_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
> > > > 37         list_add_tail(&tap->deferred_queue, &deferred_work_queue);
> > > > 38     }
> > > > 39     spin_unlock_irqrestore(&deferred_work_lock, flags);
> > > > 40 }
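
The locking fix described in the message above can be sketched in userspace as follows. This is only a sketch under stated assumptions, not the change that was made to blktap: a pthread mutex stands in for the kernel spinlock, a bool stands in for the BLKTAP_DEFERRED bit, the list helpers are minimal re-implementations of the kernel ones, and the restarts counter is hypothetical. It also goes one step beyond locking line 24 by clearing the flag only at the moment the entry is unlinked, so that blktap_defer() cannot re-queue a tap that is still sitting on the local list.

```c
/* Userspace sketch of a race-free deferred queue; not kernel code. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Move every entry of 'list' to the front of 'head', leaving 'list' empty. */
static void list_splice_init(struct list_head *list, struct list_head *head)
{
    if (!list_empty(list)) {
        struct list_head *first = list->next, *last = list->prev;
        struct list_head *at = head->next;
        first->prev = head;
        head->next = first;
        last->next = at;
        at->prev = last;
        list_init(list);
    }
}

struct blktap {
    bool deferred;                  /* stand-in for the BLKTAP_DEFERRED bit */
    struct list_head deferred_queue;
    int restarts;                   /* hypothetical counter for the demo */
};

static struct list_head deferred_work_queue =
    { &deferred_work_queue, &deferred_work_queue };
static pthread_mutex_t deferred_work_lock = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder for the real restart logic. */
static void blktap_device_restart(struct blktap *tap) { tap->restarts++; }

static void blktap_defer(struct blktap *tap)
{
    pthread_mutex_lock(&deferred_work_lock);
    if (!tap->deferred) {
        tap->deferred = true;
        list_add_tail(&tap->deferred_queue, &deferred_work_queue);
    }
    pthread_mutex_unlock(&deferred_work_lock);
}

static void blktap_run_deferred(void)
{
    struct list_head queue;
    struct blktap *tap;

    list_init(&queue);
    pthread_mutex_lock(&deferred_work_lock);
    list_splice_init(&deferred_work_queue, &queue);
    while (!list_empty(&queue)) {
        tap = container_of(queue.next, struct blktap, deferred_queue);
        /* Unlink and clear the flag together, both under the lock. */
        list_del_init(&tap->deferred_queue);
        tap->deferred = false;
        /* Drop the lock while restarting, as the original code does. */
        pthread_mutex_unlock(&deferred_work_lock);
        blktap_device_restart(tap);
        pthread_mutex_lock(&deferred_work_lock);
    }
    pthread_mutex_unlock(&deferred_work_lock);
}
```

Because the flag stays set for as long as a tap is linked on the local list, a concurrent blktap_defer() call either finds the flag set and does nothing, or finds it clear and adds an already unlinked entry to deferred_work_queue for the next run.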
> > > >
> > > >
> > > > > Date: Fri, 15 Oct 2010 13:57:09 +0100
> > > > > Subject: Re: [Xen-devel] Domain 0 stop response on frequently
> > reboot
> > > > VMS
> > > > > From: keir@xxxxxxx
> > > > > To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> > > > >
> > > > > You'll probably want to see if you can get SysRq output from
> > dom0
> > > > via serial
> > > > > line. It's likely you can if it is alive enough to respond to
> > ping.
> > > > This
> > > > > might tell you things like what all processes are getting
> > blocked
> > > > on, and
> > > > > thus indicate what is stopping dom0 from making progress.
> > > > >
> > > > > -- Keir
> > > > >
> > > > > On 15/10/2010 13:43, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx>
> > wrote:
> > > > >
> > > > > >
> > > > > > Hi Keir:
> > > > > >
> > > > > > > First, I'd like to express my appreciation for the help you
> > > > > > > offered before.
> > > > > > > Recently we have run into a rather nasty problem where domain 0
> > > > > > > stops responding.
> > > > > > >
> > > > > > > We still have 12 HVMs rebooting almost continuously and
> > > > > > > concurrently as a test on one physical server.
> > > > > > > After a few hours the server looks dead. We can only ping
> > > > > > > the server and get a correct response.
> > > > > > > Xen itself works fine, since we can get debug info from the
> > > > > > > serial port. Attached is the full debug output.
> > > > > > > After decoding the domain 0 CPU stack, I find that the CPU is
> > > > > > > still working for domain 0, since the stack info changed
> > > > > > > every time I dumped it.
> > > > > > >
> > > > > > > Could you take a look at the attachment to see whether there
> > > > > > > are any hints for debugging this
> > > > > > > problem? Thanks in advance.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > Xen-devel mailing list
> > > > > > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > > > > > http://lists.xensource.com/xen-devel
> > > > >
> > > > >
> > >
> > >
>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
