[Xen-devel] analyze for the P1 bug 593(xensource bug tracker)

To: "Yu, Ke" <ke.yu@xxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] analyze for the P1 bug 593(xensource bug tracker)
From: "Han, Zhu" <zhu.han@xxxxxxxxx>
Date: Wed, 10 May 2006 14:26:46 +0800
Cc: Helix-vmm <helix-vmm@xxxxxxxxx>
Hi, all!
Our QA team submitted bug 593 to the xensource bug tracker one month ago,
and it was raised to P1 several days ago! So I spent some time tracing
this bug this week. Here is what I have found:
1) This bug is hard to reproduce on most of the platforms we own,
especially UP boxes. The platform on which we hit the bug and can
reproduce it reliably is Paxville, which has 4 physical CPUs, each with
2 cores and 2 hyperthreads.
2) The root cause of this problem is that "losetup -d /dev/loop*" can
fail with a rather low probability. "losetup -d /dev/loop*" is invoked by
/etc/xen/scripts/block when the script processes the remove action. Each
failure leaves a loop device allocated, and once all the loop devices
are exhausted, the VMX cannot be initialized properly. That's why XEND
complains "Error: Device creation failed for domain ****". However, if
we remove the loop device manually, everything goes OK!
3) "losetup -d /dev/loop*" fails because kernel/drivers/block/loop.c
returns EBUSY for the LOOP_CLR_FD ioctl operation. The probable cause is
that someone else has not yet closed the loop device when we try to
delete it!
4) The program that holds the loop device open could be the VBD device
driver. It opens the loop device in vbd_create() through open_by_devnum,
and it closes the handle in vbd_free, which is called from a schedulable
work item, free_blkif. Is that true? If so, the problem could be caused
by a race condition between the work item and the hotplug script! When
the xenbus driver is notified by the xenstore thread that the front end
device has been destroyed, it removes the backend device and related
resources, and then notifies the hotplug subsystem of the remove action.
Because the code that closes the loop device's handle and the script
that deletes the loop device can run concurrently, the script can fail
when it tries to delete the loop device before the handle is closed!
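
To make the ordering in points 3) and 4) concrete, here is a small
user-space sketch in Python. It is only a toy model, not the kernel or
blkback code: the LoopDevice class, losetup_d() and run() are
hypothetical names, and the rule "detach is refused while anyone besides
the caller still has the device open" is my assumption about why
LOOP_CLR_FD returns EBUSY. The two events only force the two possible
orderings so that each outcome can be seen deterministically.

```python
import errno
import threading


class LoopDevice:
    """Toy model of a loop device's open count (not the kernel code)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.open_count = 0
        self.bound = True  # a backing file is attached

    def open(self):
        with self.lock:
            self.open_count += 1

    def close(self):
        with self.lock:
            self.open_count -= 1

    def clr_fd(self):
        # Assumption: detach is refused while anyone besides the caller
        # still has the device open.
        with self.lock:
            if self.open_count > 1:
                return errno.EBUSY
            self.bound = False
            return 0


def losetup_d(dev):
    """Models 'losetup -d': open the device, try to detach, close."""
    dev.open()
    try:
        return dev.clr_fd()
    finally:
        dev.close()


def run(script_waits_for_close):
    """Run one remove sequence and return the script's result code."""
    dev = LoopDevice()
    dev.open()  # handle taken by the backend, as in vbd_create()

    may_run = threading.Event()  # gates the deferred work item
    closed = threading.Event()   # set once the backend handle is released

    def free_blkif_work():
        may_run.wait()
        dev.close()  # handle released, as in vbd_free()
        closed.set()

    worker = threading.Thread(target=free_blkif_work)
    worker.start()
    if script_waits_for_close:
        may_run.set()
        closed.wait()
        result = losetup_d(dev)
    else:
        result = losetup_d(dev)  # the script wins the race
        may_run.set()
    worker.join()
    return result
```

In this model run(False), where the script runs before the work item
has closed the handle, returns errno.EBUSY, while run(True), where the
script runs after the close, returns 0. On the real system the ordering
is decided by scheduling, which would fit the bug showing up mostly on
the box with many logical CPUs.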

My questions are:
1) Does this race condition actually exist?
2) Why is the code that closes the loop device put into a separate work
item instead of finishing all the work directly in blkback_remove()?
Can any operation in free_blkif() block? Which one?

Since I'm really a newbie in this field, any tips and comments will be
appreciated!
Thanks a lot!



Best Regards, 
hanzhu
