RE: [Xen-devel] Domain 0 stop response on frequently reboot VMS
To: <daniel.stodden@xxxxxxxxxx>
Subject: RE: [Xen-devel] Domain 0 stop response on frequently reboot VMS
From: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>
Date: Tue, 26 Oct 2010 16:16:55 +0800
Cc: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
Hi Daniel:
Well, where should I start if I want to keep the current kernel (2.6.31) and only update blktap2?
As I went through the git branch xen/dom0/backend/blktap2, I found that wait_queue.c has been removed.
It looks like blktap2 has changed a lot, right?
So I am curious about the differences between the new one and the old one.
Could you share a brief explanation? That would be very helpful.
Thanks in advance.
> Subject: RE: [Xen-devel] Domain 0 stop response on frequently reboot VMS
> From: daniel.stodden@xxxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx
> CC: keir@xxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> Date: Sat, 23 Oct 2010 22:56:51 -0700
>
> On Sun, 2010-10-24 at 01:48 -0400, MaoXiaoyun wrote:
> > Hi Daniel:
> >
> > Sorry for the late response, and many thanks for your kind
> > suggestion.
> > Well, I believe we will upgrade to the latest kernel in the
> > near future, but for now we prefer to stay on the current one
> > for stability reasons.
> >
> > Our kernel version is 2.6.31. Now I am going through the change
> > set of blktap to get more detailed info.
>
> NP. Let me know if you have questions.
>
> Daniel
>
> > thanks.
> >
> > > Subject: RE: [Xen-devel] Domain 0 stop response on frequently reboot VMS
> > > From: daniel.stodden@xxxxxxxxxx
> > > To: tinnycloud@xxxxxxxxxxx; jeremy@xxxxxxxx
> > > CC: keir@xxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> > > Date: Mon, 18 Oct 2010 14:17:50 -0700
> > >
> > >
> > > I'd strongly suggest to try upgrading your kernel, or at least the
> > > blktap component. The condition below is new to me, but that
> > > wait_queue file and some related code was known to be buggy and
> > > has long since been removed.
> > >
> > > If you choose to only upgrade blktap from tip, let me know what
> > > kernel version you're dealing with, you might need to backport
> > > some of the device queue macros to match your version's needs.
> > >
> > > Daniel
> > >
> > >
> > > On Sat, 2010-10-16 at 01:39 -0400, MaoXiaoyun wrote:
> > > > Well, thanks Keir.
> > > > Fortunately we caught the bug; it turned out to be a tapdisk
> > > > problem.
> > > > A brief explanation for others who might run into this issue.
> > > >
> > > > Clearing BLKTAP_DEFERRED on line 19 allows concurrent access
> > > > to tap->deferred_queue between lines 24 and 37, which
> > > > eventually leaves tap->deferred_queue with bad pointers and
> > > > causes an infinite loop in the while clause on line 22.
> > > > Taking the lock around line 24 would be a simple fix.
> > > >
> > > > /linux-2.6-pvops.git/drivers/xen/blktap/wait_queue.c
> > > >  9 void
> > > > 10 blktap_run_deferred(void)
> > > > 11 {
> > > > 12     LIST_HEAD(queue);
> > > > 13     struct blktap *tap;
> > > > 14     unsigned long flags;
> > > > 15
> > > > 16     spin_lock_irqsave(&deferred_work_lock, flags);
> > > > 17     list_splice_init(&deferred_work_queue, &queue);
> > > > 18     list_for_each_entry(tap, &queue, deferred_queue)
> > > > 19         clear_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
> > > > 20     spin_unlock_irqrestore(&deferred_work_lock, flags);
> > > > 21
> > > > 22     while (!list_empty(&queue)) {
> > > > 23         tap = list_entry(queue.next, struct blktap, deferred_queue);
> > > > 24         list_del_init(&tap->deferred_queue);
> > > > 25         blktap_device_restart(tap);
> > > > 26     }
> > > > 27 }
> > > > 28
> > > > 29 void
> > > > 30 blktap_defer(struct blktap *tap)
> > > > 31 {
> > > > 32     unsigned long flags;
> > > > 33
> > > > 34     spin_lock_irqsave(&deferred_work_lock, flags);
> > > > 35     if (!test_bit(BLKTAP_DEFERRED, &tap->dev_inuse)) {
> > > > 36         set_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
> > > > 37         list_add_tail(&tap->deferred_queue, &deferred_work_queue);
> > > > 38     }
> > > > 39     spin_unlock_irqrestore(&deferred_work_lock, flags);
> > > > 40 }
> > > >
> > > >
> > > > > Date: Fri, 15 Oct 2010 13:57:09 +0100
> > > > > Subject: Re: [Xen-devel] Domain 0 stop response on frequently reboot VMS
> > > > > From: keir@xxxxxxx
> > > > > To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> > > > >
> > > > > You'll probably want to see if you can get SysRq output from
> > > > > dom0 via serial line. It's likely you can if it is alive
> > > > > enough to respond to ping. This might tell you things like
> > > > > what all processes are getting blocked on, and thus indicate
> > > > > what is stopping dom0 from making progress.
> > > > >
> > > > > -- Keir
> > > > >
> > > > > On 15/10/2010 13:43, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
> > > > >
> > > > > > Hi Keir:
> > > > > >
> > > > > > First, I'd like to express my appreciation for the help
> > > > > > you offered before.
> > > > > > Well, recently we ran into a rather nasty problem where
> > > > > > domain 0 stops responding.
> > > > > >
> > > > > > We still have 12 HVMs rebooting almost continuously and
> > > > > > concurrently as a test on one physical server.
> > > > > > A few hours later, the server looks dead. We can only ping
> > > > > > the server and get a correct response;
> > > > > > Xen itself works fine, since we can get debug info from the
> > > > > > serial port. Attached is the full debug output.
> > > > > > After decoding the domain 0 CPU stack, I find the CPU is
> > > > > > still working for domain 0, since the stack info changed
> > > > > > every time I dumped it.
> > > > > >
> > > > > > Could you help take a look at the attachment to see whether
> > > > > > there are some hints for debugging this problem?
> > > > > > Thanks in advance.
> > > > > >
> > > > > > _______________________________________________
> > > > > > Xen-devel mailing list
> > > > > > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > > > > > http://lists.xensource.com/xen-devel
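
For readers following the race described in the quoted analysis of wait_queue.c, here is a minimal user-space sketch of the locking idea. It is not the actual blktap patch. It assumes a pthread mutex in place of the kernel spinlock, a hand-written list in place of the kernel's list.h, and a hypothetical struct tap_dev with a plain bool standing in for the BLKTAP_DEFERRED bit. It also goes a little further than locking line 24 alone: the unlink and the flag clear happen in the same critical section, so a device can never be re-queued while it is still linked.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Tiny stand-in for the kernel's intrusive list (hypothetical helpers). */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical, simplified device: only the fields this sketch needs. */
struct tap_dev {
    struct list_head deferred_queue;
    bool deferred;  /* plays the role of the BLKTAP_DEFERRED bit */
    int restarts;   /* counts calls to the restart stand-in */
};

static pthread_mutex_t deferred_work_lock = PTHREAD_MUTEX_INITIALIZER;
static struct list_head deferred_work_queue =
    { &deferred_work_queue, &deferred_work_queue };

static void tap_init(struct tap_dev *tap)
{
    list_init(&tap->deferred_queue);
    tap->deferred = false;
    tap->restarts = 0;
}

/* Queue a device for a later restart; a device is queued at most once. */
static void tap_defer(struct tap_dev *tap)
{
    pthread_mutex_lock(&deferred_work_lock);
    if (!tap->deferred) {
        /* Invariant: an unflagged device is never still linked. */
        assert(list_is_empty(&tap->deferred_queue));
        tap->deferred = true;
        list_add_tail(&tap->deferred_queue, &deferred_work_queue);
    }
    pthread_mutex_unlock(&deferred_work_lock);
}

/* Drain the queue. Unlink and flag clear share one critical section. */
static void tap_run_deferred(void)
{
    for (;;) {
        struct tap_dev *tap;

        pthread_mutex_lock(&deferred_work_lock);
        if (list_is_empty(&deferred_work_queue)) {
            pthread_mutex_unlock(&deferred_work_lock);
            break;
        }
        tap = (struct tap_dev *)((char *)deferred_work_queue.next -
                                 offsetof(struct tap_dev, deferred_queue));
        list_del_init(&tap->deferred_queue);
        tap->deferred = false;
        pthread_mutex_unlock(&deferred_work_lock);

        /* Stand-in for blktap_device_restart(); runs outside the lock. */
        tap->restarts++;
    }
}
```

The restart stand-in is deliberately called with the lock dropped, as in the original code, so the sketch only protects the queue and the flag, not whatever the restart itself touches.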