RE: [Xen-devel] network hang again

I tracked the glitch back to the 2.4.27 domain-1 (unprivileged; it uses
EVMS block devices from dom0 and serves them out as iSCSI targets via
file-io), with this error message being the trigger point of the collapse.

Sep 15 00:16:55 localhost kernel: fileio_make_request(85) Bad things happened 4096, -5

Lines 76 to 85 of kernel/file-io.c seem to be the error point:

        if (rw == READ)
                ret = generic_file_read(filp, buf, count, &ppos);
        else
                ret = generic_file_write(filp, buf, count, &ppos);

        if (ret != count)
                printk("%s(%d) Bad things happened %lld, %d\n",
                       __FUNCTION__, __LINE__, count, ret);


-5 is -EIO in linux-2.4.27/include/asm-i386/errno.h:8
#define EIO              5      /* I/O error */
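
As an aside, the check above treats a short transfer and a negative errno
the same way. The following is a minimal user-space sketch of how the two
cases could be told apart when reading such a log line. It is not IET or
kernel code: classify_io and describe_io are hypothetical names made up
for illustration, and strerror() stands in for looking the number up in
errno.h by hand.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of IET: classify the return value of a
 * read/write-style call that was asked to transfer `count` bytes. */
enum io_outcome { IO_OK, IO_SHORT, IO_ERROR };

static enum io_outcome classify_io(long long count, long long ret)
{
    if (ret < 0)
        return IO_ERROR;    /* negative errno, e.g. -5 for -EIO */
    if (ret != count)
        return IO_SHORT;    /* fewer bytes moved than requested */
    return IO_OK;
}

/* Format a one-line report into buf and return buf for convenience. */
static char *describe_io(long long count, long long ret,
                         char *buf, size_t len)
{
    switch (classify_io(count, ret)) {
    case IO_ERROR:
        snprintf(buf, len, "error: wanted %lld bytes, got errno %lld (%s)",
                 count, -ret, strerror((int)-ret));
        break;
    case IO_SHORT:
        snprintf(buf, len, "short transfer: wanted %lld bytes, moved %lld",
                 count, ret);
        break;
    default:
        snprintf(buf, len, "ok: moved %lld bytes", ret);
        break;
    }
    return buf;
}
```

With the values from the log line, describe_io(4096, -5, buf, sizeof buf)
takes the error branch rather than the short-transfer branch, which is
consistent with reading the -5 as -EIO.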

I do NOT get any errors from domain0, so I can't trace through to dom0
right now. 8-(

This error coincides perfectly in time with the linux-iscsi initiator
errors I got earlier this week, so I believe that this is what's
triggering the iscsi-initiator error.

Any advice on how to figure out what is causing the I/O error would be
greatly appreciated. Right now it is the ONLY thing that is holding me
back from using the IET iSCSI target.

Thanks!

Brian Wolfe

On Tue, 2004-09-14 at 21:50, James Harper wrote:
> When I explained about the patch on the iet list, I was asked if I was
> getting frequent disconnections :)
> 
> It sounds like the network issues I'm seeing in xen are probably
> triggering the crash in iscsi.
> 
> I'm running iet 0.3.3 + 2.6 patch + my additional 2.6 patch on dom0, and
> linux-iscsi 4.0.1.8 on dom1.
> 
> James
> 
> > -----Original Message-----
> > From: Brian Wolfe [mailto:ahzz@xxxxxxxxxxx]
> > Sent: Wednesday, 15 September 2004 02:22
> > To: James Harper
> > Cc: xen-devel@xxxxxxxxxxxxxxxxxxxxx
> > Subject: Re: [Xen-devel] network hang again
> > 
> > I have been running IET 0.3.3 on 2.4.27 on one machine, and cisco's
> > linux-iscsi on 2.6.8.1 on a second physical machine for a couple days
> > now. So far the only thing that I have run into is a dump message
> > concerning OOM on the linux-iscsi machine.
> > 
> > 
> > Sep 13 00:20:11 vhost1 kernel: iSCSI: 4.0.1 ( 9-Feb-2004) built for
> > Linux 2.6.8-tbc-vhost-Xen0
> > Sep 13 00:20:11 vhost1 kernel: iSCSI: will translate deferred sense to
> > current sense on disk command responses
> > Sep 13 00:20:11 vhost1 kernel: iSCSI: control device major number 254
> > Sep 13 00:20:11 vhost1 kernel: scsi_proc_hostdir_add: proc_mkdir failed for <NULL>
> > Sep 13 00:20:11 vhost1 kernel: scsi17 : Cisco iSCSI driver
> > Sep 13 00:20:11 vhost1 kernel: iSCSI:detected HBA host #17
> > Sep 13 00:20:11 vhost1 kernel: iSCSI: bus 0 target 0 =
> > iqn.2001-04.dmz.iscsi1:wnhttp
> > Sep 13 00:20:11 vhost1 kernel: iSCSI: bus 0 target 0 portal 0 = address 10.11.7.1 port 3260 group 1
> > Sep 13 00:20:11 vhost1 kernel: iSCSI: starting timer thread at 21835751
> > Sep 13 00:20:11 vhost1 kernel: iSCSI: bus 0 target 0 trying to establish session to portal 0, address 10.11.7.1 port 3260 group 1
> > Sep 13 00:20:12 vhost1 kernel: iSCSI: session c1478000 authenticated by target iqn.2001-04.dmz.iscsi1:wnhttp
> > Sep 13 00:20:12 vhost1 kernel: iSCSI: bus 0 target 0 established session #1, portal 0, address 10.11.7.1 port 3260 group 1
> > Sep 13 00:20:12 vhost1 kernel:   Vendor: LINUX     Model:
> > ISCSI             Rev: 0
> > Sep 13 00:20:12 vhost1 kernel:   Type:
> > Direct-Access                      ANSI SCSI revision: 03
> > Sep 13 00:20:12 vhost1 kernel: SCSI device sda: 16777212 512-byte hdwr
> > sectors (8590 MB)
> > Sep 13 00:20:12 vhost1 kernel: SCSI device sda: drive cache: write back
> > Sep 13 00:20:12 vhost1 kernel:  sda: unknown partition table
> > Sep 13 00:20:12 vhost1 kernel: Attached scsi disk sda at scsi17, channel 0, id 0, lun 0
> > Sep 13 00:20:12 vhost1 kernel:   Vendor: LINUX     Model:
> > ISCSI             Rev: 0
> > Sep 13 00:20:12 vhost1 kernel:   Type:
> > Direct-Access                      ANSI SCSI revision: 03
> > Sep 13 00:20:12 vhost1 kernel: SCSI device sdb: 65536 512-byte hdwr
> > sectors (34 MB)
> > Sep 13 00:20:12 vhost1 kernel: SCSI device sdb: drive cache: write back
> > Sep 13 00:20:12 vhost1 kernel:  sdb: unknown partition table
> > Sep 13 00:20:12 vhost1 kernel: Attached scsi disk sdb at scsi17, channel 0, id 0, lun 1
> > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: found reiserfs format
> > "3.6" with standard journal
> > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: using ordered data mode
> > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: journal params: device
> > sda, size 8192, journal first block 18, max trans
> > len 1024, max batch 900, max commit age 30, max trans age 30
> > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: checking transaction log
> > (sda)
> > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: replayed 1 transactions in 0 seconds
> > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: Using r5 hash to sort
> > names
> > Sep 13 00:28:51 vhost1 kernel: iscsi-tx: page allocation failure.
> > order:1, mode:0x20
> > Sep 13 00:28:51 vhost1 kernel:  [__alloc_pages+728/848]
> > __alloc_pages+0x2d8/0x350
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [__get_free_pages+31/64]
> > __get_free_pages+0x1f/0x40
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [kmem_getpages+30/224]
> > kmem_getpages+0x1e/0xe0
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [cache_grow+159/336]
> > cache_grow+0x9f/0x150
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [cache_alloc_refill+318/512]
> > cache_alloc_refill+0x13e/0x200
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [__kmalloc+139/160] __kmalloc+0x8b/0xa0
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [alloc_skb+71/224] alloc_skb+0x47/0xe0
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [pg0+38296326/1002676224]
> > rhine_rx+0x156/0x460 [via_rhine]
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [pg0+38295340/1002676224]
> > rhine_interrupt+0x1ac/0x1d0 [via_rhine]
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [handle_IRQ_event+73/144]
> > handle_IRQ_event+0x49/0x90
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [do_IRQ+109/240] do_IRQ+0x6d/0xf0
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [evtchn_do_upcall+156/256]
> > evtchn_do_upcall+0x9c/0x100
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [hypervisor_callback+51/73]
> > hypervisor_callback+0x33/0x49
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [csum_partial_copy_generic+63/248]
> > csum_partial_copy_generic+0x3f/0xf8
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [tcp_sendmsg+578/4176]
> > tcp_sendmsg+0x242/0x1050
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [inet_sendmsg+77/96]
> > inet_sendmsg+0x4d/0x60
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [sock_sendmsg+165/192]
> > sock_sendmsg+0xa5/0xc0
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [__do_softirq+149/160]
> > __do_softirq+0x95/0xa0
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [do_softirq+69/80] do_softirq+0x45/0x50
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [do_IRQ+194/240] do_IRQ+0xc2/0xf0
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [pg0+39270168/1002676224]
> > iscsi_xmit_queued_cmnds+0x188/0x3c0 [iscsi]
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [pg0+39254271/1002676224]
> > iscsi_sendmsg+0x4f/0x70 [iscsi]
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [pg0+39271874/1002676224]
> > iscsi_xmit_data+0x472/0x8d0 [iscsi]
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [__do_softirq+149/160]
> > __do_softirq+0x95/0xa0
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [pg0+39273273/1002676224]
> > iscsi_xmit_r2t_data+0x119/0x1f0 [iscsi]
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [pg0+39165617/1002676224]
> > iscsi_tx_thread+0x711/0x8d0 [iscsi]
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [autoremove_wake_function+0/96]
> > autoremove_wake_function+0x0/0x60
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [autoremove_wake_function+0/96]
> > autoremove_wake_function+0x0/0x60
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [default_wake_function+0/32]
> > default_wake_function+0x0/0x20
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [pg0+39163808/1002676224]
> > iscsi_tx_thread+0x0/0x8d0 [iscsi]
> > Sep 13 00:28:51 vhost1 kernel:
> > Sep 13 00:28:51 vhost1 kernel:  [kernel_thread_helper+5/16]
> > kernel_thread_helper+0x5/0x10
> > Sep 13 00:28:51 vhost1 kernel:
> > 
> > The only reason I'm posting the "trace" from linux-iscsi is that it
> > contains the hypervisor_callback function and it's in the rx phase
> > of the via_rhine driver.
> > 
> > What iscsi are you running on each machine? (Sorry if I missed it;
> > I've been offline for a few days now. 8-( ) I'd be interested to know
> > if this is in any way similar to your issue.
> > 
> > Brian
> > 
> > 
> > On Tue, 2004-09-14 at 07:38, James Harper wrote:
> > > I'm now seeing this network hang a lot, to the point where it makes
> > > my iscsi testing unusable. I believe this has more to do with the
> > > sort of testing I'm doing now than with a bug that has suddenly
> > > appeared.
> > >
> > > My setup is this:
> > > Dom0:
> > > 2.6.8.1
> > > Iscsitarget 0.3.3 + 2.6 patches + my own 2.6 patches.
> > > No conntrack or other netfilter related modules
> > > Bridged eth0 to Dom1
> > > /usr/src exported via nfs
> > >
> > > Dom1:
> > > 2.6.8.1
> > > Linux-iscsi 4.0.1.8
> > > No conntrack or other netfilter related modules
> > > /usr/src mounted from Dom0
> > >
> > > Iscsi works for a while, normally crashing in Dom0 due to another
> > > non-xen related bug before it hits this bug, but if I try to do a
> > > compile on Dom1 in the nfs mounted /usr/src, the network locks up
> > > almost instantly, but then clears up shortly after if I kill the
> > > compile.
> > >
> > > The logs show absolutely nothing of any use.
> > >
> > > I've just tried a few netperf tests. A quick hammering goes off
> > > without a hitch, but afterwards I see random dropped packets. I'll
> > > keep testing.
> > >
> > > James
> > >
> > >
> > > -------------------------------------------------------
> > > This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
> > > Project Admins to receive an Apple iPod Mini FREE for your judgement
> on
> > > who ports your project to Linux PPC the best. Sponsored by IBM.
> > > Deadline: Sept. 13. Go here: http://sf.net/ppc_contest.php
> > > _______________________________________________
> > > Xen-devel mailing list
> > > Xen-devel@xxxxxxxxxxxxxxxxxxxxx
> > > https://lists.sourceforge.net/lists/listinfo/xen-devel
> > 
> 
> 
> 


