WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-bugs

[Xen-bugs] [Bug 1371] New: On heavy IO load in domU, dom0 shows a kernel

To: xen-bugs@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-bugs] [Bug 1371] New: On heavy IO load in domU, dom0 shows a kernel bug and domU hangs on 100% io wait
From: bugzilla-daemon@xxxxxxxxxxxxxxxxxxx
Date: Fri, 17 Oct 2008 00:03:44 -0700
Delivery-date: Fri, 17 Oct 2008 00:03:49 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-bugs-request@lists.xensource.com?subject=help>
List-id: Xen Bugzilla <xen-bugs.lists.xensource.com>
List-post: <mailto:xen-bugs@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-bugs>, <mailto:xen-bugs-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-bugs>, <mailto:xen-bugs-request@lists.xensource.com?subject=unsubscribe>
Reply-to: bugs@xxxxxxxxxxxxxxxxxx
Sender: xen-bugs-bounces@xxxxxxxxxxxxxxxxxxx
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1371

           Summary: On heavy IO load in domU, dom0 shows a kernel bug and
                    domU hangs on 100% io wait
           Product: Xen
           Version: unstable
          Platform: x86-64
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Unspecified
        AssignedTo: xen-bugs@xxxxxxxxxxxxxxxxxxx
        ReportedBy: michael.wyraz@xxxxxxxxxxx


System: Gentoo
Xen: 3.3.0
Kernel: 2.6.21

The virtual disks are LVM on Raid5. I tried both, tap:aio and phy: to access
the disks. The system worked well for a few weeks. but now i have the following
error every nicht. During our nightly backup of domU the dom0 shows errors in
the logfile. "top" in domU shows 100% io wait. I need to kill the domU and
reboot the host to fix it. It happend with different domUs.

>From the log:

Oct 17 00:02:20 office-host ------------[ cut here ]------------
Oct 17 00:02:20 office-host kernel BUG at fs/direct-io.c:1113!
Oct 17 00:02:20 office-host invalid opcode: 0000 [1] SMP 
Oct 17 00:02:20 office-host CPU 0 
Oct 17 00:02:20 office-host Modules linked in: nfs lockd nfs_acl sunrpc bridge
w83627ehf hwmon_vid hwmon i2c_isa autofs4 skge sk98lin i2c_i801 iTCO_wdt
i2c_core iTCO_vendor_support floppy ehci_hcd pcspkr uhci_hcd
Oct 17 00:02:20 office-host Pid: 7887, comm: tapdisk Not tainted 2.6.21-xen #5
Oct 17 00:02:20 office-host RIP: e030:[<ffffffff802b20e7>] 
[<ffffffff802b20e7>] __blockdev_direct_IO+0xa57/0xb70
Oct 17 00:02:20 office-host RSP: e02b:ffff880035475cd8  EFLAGS: 00010202
Oct 17 00:02:20 office-host RAX: 0000000000000001 RBX: ffff8800e64b8000 RCX:
0000000000000000
Oct 17 00:02:20 office-host RDX: 0000000000000001 RSI: 0000000000000000 RDI:
0000000000000000
Oct 17 00:02:20 office-host RBP: ffff8800e64b8324 R08: ffff880000ddc838 R09:
ffff880035f014c0
Oct 17 00:02:20 office-host R10: ffff880001d4b910 R11: ffffffff80432840 R12:
0000000000000001
Oct 17 00:02:20 office-host R13: fffffffffffffdef R14: ffff8800e64b8088 R15:
0000000000000000
Oct 17 00:02:20 office-host FS:  00002b96dda23bb0(0000)
GS:ffffffff8072d000(0000) knlGS:0000000000000000
Oct 17 00:02:20 office-host CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 17 00:02:20 office-host CR2: 0000000000000000 CR3: 0000000035476000 CR4:
0000000000002620
Oct 17 00:02:20 office-host Process tapdisk (pid: 7887, threadinfo
ffff880035474000, task ffff8800cb2e8080)
Oct 17 00:02:20 office-host Stack:  0000000000001000 0000005d5a262000
ffff8801ed00be38 ffff8801ee7c8a48
Oct 17 00:02:20 office-host 0000000c00000000 0000000000000000 ffff8801ed00be38
ffff8800099f6708
Oct 17 00:02:20 office-host ffff880000000001 ffff8801edd4f3c0 0000000cffffffff
0000000000000001
Oct 17 00:02:20 office-host Call Trace:
Oct 17 00:02:20 office-host [<ffffffff802b0b95>] blkdev_direct_IO+0x45/0x50
Oct 17 00:02:20 office-host [<ffffffff802b0a90>] blkdev_get_blocks+0x0/0xc0
Oct 17 00:02:20 office-host [<ffffffff8025fb9f>]
generic_file_direct_IO+0xef/0x150
Oct 17 00:02:20 office-host [<ffffffff802612fb>]
generic_file_aio_read+0xcb/0x1b0
Oct 17 00:02:20 office-host [<ffffffff80261230>]
generic_file_aio_read+0x0/0x1b0
Oct 17 00:02:20 office-host [<ffffffff802a168f>] aio_rw_vect_retry+0x7f/0x170
Oct 17 00:02:20 office-host [<ffffffff802a1610>] aio_rw_vect_retry+0x0/0x170
Oct 17 00:02:20 office-host [<ffffffff802a21d7>] aio_run_iocb+0xc7/0x160
Oct 17 00:02:20 office-host [<ffffffff802a2ef4>] io_submit_one+0x344/0x3e0
Oct 17 00:02:20 office-host [<ffffffff802a3555>] sys_io_submit+0xb5/0x130
Oct 17 00:02:20 office-host [<ffffffff80209ae6>] system_call+0x86/0x8b
Oct 17 00:02:20 office-host [<ffffffff80209a60>] system_call+0x0/0x8b
Oct 17 00:02:20 office-host 
Oct 17 00:02:20 office-host 
Oct 17 00:02:20 office-host Code: 0f 0b eb fe 4d 85 e4 75 1d 48 8b 74 24 08 44
89 ea 48 89 df 
Oct 17 00:02:20 office-host RIP  [<ffffffff802b20e7>]
__blockdev_direct_IO+0xa57/0xb70
Oct 17 00:02:20 office-host RSP <ffff880035475cd8>

...


Oct 17 03:07:36 office-host eth0: port 4(tap5.0) entering disabled state
Oct 17 03:07:36 office-host device tap5.0 left promiscuous mode
Oct 17 03:07:36 office-host audit(1224205656.682:20): dev=tap5.0 prom=0
old_prom=256 auid=4294967295
Oct 17 03:07:36 office-host eth0: port 4(tap5.0) entering disabled state
Oct 17 03:07:36 office-host logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vfb/5/0
Oct 17 03:07:36 office-host logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/console/5/0
Oct 17 03:07:36 office-host eth0: port 5(vif5.0) entering disabled state
Oct 17 03:07:36 office-host logger: /etc/xen/scripts/blktap: remove
XENBUS_PATH=backend/tap/5/768
Oct 17 03:07:36 office-host logger: /etc/xen/scripts/vif-bridge: offline
XENBUS_PATH=backend/vif/5/0
Oct 17 03:07:36 office-host device vif5.0 left promiscuous mode
Oct 17 03:07:36 office-host audit(1224205656.898:21): dev=vif5.0 prom=0
old_prom=256 auid=4294967295
Oct 17 03:07:36 office-host eth0: port 5(vif5.0) entering disabled state
Oct 17 03:07:36 office-host logger: /etc/xen/scripts/blktap: remove
XENBUS_PATH=backend/tap/5/768
Oct 17 03:07:36 office-host logger: /etc/xen/scripts/blktap: remove
XENBUS_PATH=backend/tap/5/832
Oct 17 03:07:36 office-host logger: /etc/xen/scripts/blktap: remove
XENBUS_PATH=backend/tap/5/832
Oct 17 03:07:36 office-host logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/tap/5/768
Oct 17 03:07:36 office-host BLKTAPCTRL[4287]: blktapctrl.c:691: Unmapping vbd
Oct 17 03:07:36 office-host BLKTAPCTRL[4287]: blktapctrl.c:312: Write_msg
called: CTLMSG_CLOSE
Oct 17 03:07:36 office-host BLKTAPCTRL[4287]: blktapctrl.c:240: DEL_DISKTYPE:
Freeing entry
Oct 17 03:07:36 office-host TAPDISK[5249]: tapdisk.c:389: Tapdisk: Received
msg, len 16, type 7, UID 2
Oct 17 03:07:36 office-host logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/tap/5/832
Oct 17 03:07:36 office-host BLKTAPCTRL[4287]: blktapctrl.c:691: Unmapping vbd
Oct 17 03:07:36 office-host BLKTAPCTRL[4287]: blktapctrl.c:312: Write_msg
called: CTLMSG_CLOSE
Oct 17 03:07:36 office-host BLKTAPCTRL[4287]: blktapctrl.c:240: DEL_DISKTYPE:
Freeing entry
Oct 17 03:07:36 office-host TAPDISK[5292]: tapdisk.c:389: Tapdisk: Received
msg, len 16, type 7, UID 3
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:609: Received a poll
for a new vbd
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl_linux.c:56: Sent domid
12 and be_id 768
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl_linux.c:23:
/dev/xen/blktap3 device already exists
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl_linux.c:75: Received
device id 3 and major 251
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:167: Detected
handle: [aio]
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:562: tapdisk process
does not exist:
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:580: Launching
process, CMDLINE [tapdisk /var/run/tap/tapctrlwrite3 /var/run/tap/tapctrlread3]
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:327: Write_msg
called: CTLMSG_PID
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:426: Received
CTLMSG_PID_RSP
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:432:         PID:
[15534]
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:270: Write_msg
called: CTLMSG_PARAMS, sending [aio:/dev/VolData1/TermSrv2,
/dev/VolData1/TermSrv2]
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:284: Generated
cookie, 22
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:400: Received
CTLMSG_IMG: 167772160, 512, 0
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:674: Received a poll
for a new devmap
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:293: Write_msg
called: CTLMSG_NEWDEV
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:411: Received
CTLMSG_NEWDEV_RSP
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:684: Exiting
map_new_blktapctrl
Oct 17 03:07:37 office-host TAPDISK[15534]: tapdisk.c:389: Tapdisk: Received
msg, len 16, type 9, UID 22
Oct 17 03:07:37 office-host TAPDISK[15534]: tapdisk.c:389: Tapdisk: Received
msg, len 39, type 1, UID 22
Oct 17 03:07:37 office-host TAPDISK[15534]: tapdisk.c:398: Received
CTLMSG_PARAMS: [/dev/VolData1/TermSrv2]
Oct 17 03:07:37 office-host TAPDISK[15534]: tapdisk.c:406: Loaded driver: name
[tapdisk_aio], type [0]
Oct 17 03:07:37 office-host TAPDISK[15534]: block-aio.c:127: block-aio
open('/dev/VolData1/TermSrv2')
Oct 17 03:07:37 office-host TAPDISK[15534]: tapaio.c:237: AIO state initialised
Oct 17 03:07:37 office-host TAPDISK[15534]: block-aio.c:147:
open(/dev/VolData1/TermSrv2) with O_DIRECT
Oct 17 03:07:37 office-host TAPDISK[15534]: block-aio.c:84: Image size:        
pre sector_shift  [85899345920]         post sector_shift [167772160]
Oct 17 03:07:37 office-host TAPDISK[15534]: tapdisk.c:159: Adding fd_list_entry
Oct 17 03:07:37 office-host TAPDISK[15534]: tapdisk.c:421: Entered cookie 22
Oct 17 03:07:37 office-host TAPDISK[15534]: tapdisk.c:389: Tapdisk: Received
msg, len 20, type 4, UID 22
Oct 17 03:07:37 office-host TAPDISK[15534]: tapdisk.c:447: Retrieving state,
cookie 22.....[OK]
Oct 17 03:07:37 office-host logger: /etc/xen/scripts/vif-bridge: brctl delif
eth0 vif5.0 failed
Oct 17 03:07:37 office-host logger: /etc/xen/scripts/blktap: add
XENBUS_PATH=backend/tap/12/832
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:609: Received a poll
for a new vbd
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl_linux.c:56: Sent domid
12 and be_id 832
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl_linux.c:23:
/dev/xen/blktap4 device already exists
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl_linux.c:75: Received
device id 4 and major 251
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:167: Detected
handle: [aio]
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:562: tapdisk process
does not exist:
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:580: Launching
process, CMDLINE [tapdisk /var/run/tap/tapctrlwrite4 /var/run/tap/tapctrlread4]
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:327: Write_msg
called: CTLMSG_PID
Oct 17 03:07:37 office-host TAPDISK[15569]: tapdisk.c:389: Tapdisk: Received
msg, len 16, type 9, UID 23
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:426: Received
CTLMSG_PID_RSP
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:432:         PID:
[15569]
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:270: Write_msg
called: CTLMSG_PARAMS, sending [aio:/dev/VolStore1/TermSrv2Swap,
/dev/VolStore1/TermSrv2Swap]
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:284: Generated
cookie, 23
Oct 17 03:07:37 office-host TAPDISK[15569]: tapdisk.c:389: Tapdisk: Received
msg, len 44, type 1, UID 23
Oct 17 03:07:37 office-host TAPDISK[15569]: tapdisk.c:398: Received
CTLMSG_PARAMS: [/dev/VolStore1/TermSrv2Swap]
Oct 17 03:07:37 office-host TAPDISK[15569]: tapdisk.c:406: Loaded driver: name
[tapdisk_aio], type [0]
Oct 17 03:07:37 office-host TAPDISK[15569]: block-aio.c:127: block-aio
open('/dev/VolStore1/TermSrv2Swap')
Oct 17 03:07:37 office-host TAPDISK[15569]: tapaio.c:237: AIO state initialised
Oct 17 03:07:37 office-host TAPDISK[15569]: block-aio.c:147:
open(/dev/VolStore1/TermSrv2Swap) with O_DIRECT
Oct 17 03:07:37 office-host TAPDISK[15569]: block-aio.c:84: Image size:        
pre sector_shift  [2147483648]  post sector_shift [4194304]
Oct 17 03:07:37 office-host TAPDISK[15569]: tapdisk.c:159: Adding fd_list_entry
Oct 17 03:07:37 office-host TAPDISK[15569]: tapdisk.c:421: Entered cookie 23
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:400: Received
CTLMSG_IMG: 4194304, 512, 0
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:674: Received a poll
for a new devmap
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:293: Write_msg
called: CTLMSG_NEWDEV
Oct 17 03:07:37 office-host TAPDISK[15569]: tapdisk.c:389: Tapdisk: Received
msg, len 20, type 4, UID 23
Oct 17 03:07:37 office-host TAPDISK[15569]: tapdisk.c:447: Retrieving state,
cookie 23.....[OK]
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:411: Received
CTLMSG_NEWDEV_RSP
Oct 17 03:07:37 office-host BLKTAPCTRL[4287]: blktapctrl.c:684: Exiting
map_new_blktapctrl
Oct 17 03:07:37 office-host logger: /etc/xen/scripts/vif-bridge: ifconfig
vif5.0 down failed
Oct 17 03:07:37 office-host logger: /etc/xen/scripts/blktap: add
XENBUS_PATH=backend/tap/12/768
Oct 17 03:07:37 office-host logger: /etc/xen/scripts/vif-bridge: Successful
vif-bridge offline for vif5.0, bridge eth0.
Oct 17 03:07:37 office-host logger: /etc/xen/scripts/blktap: add
XENBUS_PATH=backend/tap/12/832
Oct 17 03:07:37 office-host logger: /etc/xen/scripts/blktap: add
XENBUS_PATH=backend/tap/12/768
Oct 17 03:07:37 office-host logger: /etc/xen/scripts/xen-hotplug-cleanup:
XENBUS_PATH=backend/vif/5/0
Oct 17 03:07:37 office-host device tap12.0 entered promiscuous mode
Oct 17 03:07:37 office-host audit(1224205657.330:22): dev=tap12.0 prom=256
old_prom=0 auid=4294967295
Oct 17 03:07:37 office-host eth0: port 4(tap12.0) entering learning state
Oct 17 03:07:37 office-host eth0: topology change detected, propagating
Oct 17 03:07:37 office-host eth0: port 4(tap12.0) entering forwarding state
Oct 17 03:07:37 office-host logger: /etc/xen/scripts/vif-bridge: online
XENBUS_PATH=backend/vif/12/0
Oct 17 03:07:37 office-host device vif12.0 entered promiscuous mode
Oct 17 03:07:37 office-host audit(1224205657.383:23): dev=vif12.0 prom=256
old_prom=0 auid=4294967295
Oct 17 03:07:37 office-host eth0: port 5(vif12.0) entering learning state
Oct 17 03:07:37 office-host eth0: topology change detected, propagating
Oct 17 03:07:37 office-host eth0: port 5(vif12.0) entering forwarding state
Oct 17 03:07:37 office-host logger: /etc/xen/scripts/vif-bridge: Successful
vif-bridge online for vif12.0, bridge eth0.
Oct 17 03:07:37 office-host logger: /etc/xen/scripts/vif-bridge: Writing
backend/vif/12/0/hotplug-status connected to xenstore.
Oct 17 03:07:37 office-host logger: /etc/xen/scripts/blktap: Writing
backend/tap/12/832/hotplug-status connected to xenstore.
Oct 17 03:07:37 office-host logger: /etc/xen/scripts/blktap: Writing
backend/tap/12/768/hotplug-status connected to xenstore.
Oct 17 03:08:14 office-host blktap: ring-ref 2047, event-channel 5, protocol 1
(unspecified, assuming native)
Oct 17 03:08:14 office-host blk_tap: unknown operation [255]
Oct 17 03:08:14 office-host blk_tap: Bad number of segments in request (0)
Oct 17 03:08:14 office-host blktap: ring-ref 1758, event-channel 6, protocol 1
(unspecified, assuming native)
Oct 17 03:08:14 office-host blk_tap: unknown operation [255]
Oct 17 03:08:14 office-host blk_tap: Bad number of segments in request (0)

...


-- 
Configure bugmail: 
http://bugzilla.xensource.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

_______________________________________________
Xen-bugs mailing list
Xen-bugs@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-bugs

<Prev in Thread] Current Thread [Next in Thread>
  • [Xen-bugs] [Bug 1371] New: On heavy IO load in domU, dom0 shows a kernel bug and domU hangs on 100% io wait, bugzilla-daemon <=