WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

Re: [Xen-users] [XCP] ext3 crashes and slowdowns

To: Christian Fischer <christian.fischer@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-users] [XCP] ext3 crashes and slowdowns
From: Pasi Kärkkäinen <pasik@xxxxxx>
Date: Wed, 5 Jan 2011 12:03:37 +0200
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
Delivery-date: Wed, 05 Jan 2011 02:10:17 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <201101041537.36450.christian.fischer@xxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
References: <201101041537.36450.christian.fischer@xxxxxxxxxxxxxxxxxxx>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.18 (2008-05-17)

On Tue, Jan 04, 2011 at 03:37:36PM +0100, Christian Fischer wrote:
> Hi folks,
> 
> I have two Intel boxes (Intel server S5520UR, 2x E5520, 32 GB RAM, SATA HW RAID
> with BBU) running as an XCP-0.5 pool. Each runs an OpenFiler-2.3 domU, clustered
> active/passive. Data storage is provided to OpenFiler as SCSISR (without an LVM
> layer, like an HBASR). Shared storage is provided by OpenFiler as an iSCSI
> target via clusterIP (storage frontend network), replication is done by DRBD
> (storage backend network), and HA is done by heartbeat (heartbeat network). All
> networks are built on top of redundant HP gigabit switches: 2 pairs of Intel
> gigabit NICs, each pair bonded and plugged into the same switch, with both bonds
> multipathed (active/passive multipathing, patched OpenVSwitch-1.1.2p1) across
> the two switches, which are linked together with 2 ports each.
> 

Hello,

Did you try XCP 1.0 beta? 

-- Pasi

> The XCP pool works, iSCSI works, replication works, HA works.
> 
> If filer 1 (running on server 1) is active, I can install and run domUs on
> server 2 without problems, but I cannot install or run domUs on server 1.
> 
> If I switch to filer 2 (on server 2) as the active one, the running but
> stalled domUs on server 1 come back to life, and the running domUs on
> filer 2 lose theirs.
> 
> # dd if=/dev/zero of=/tmp/test bs=512M count=1 oflag=direct
> shows a rate of 0.8 - 1.2 MB/sec.
> 
> The kernel shows traces like
> 
> INFO: task syslogd:1081 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syslogd       D ffff880001003460     0  1081      1          1084  1073 (NOTLB)
>  ffff8800367edd88  0000000000000286  ffff8800367edd98  ffffffff80262dd3 
>  0000000000000009  ffff88003fb007a0  ffffffff804f4b80  0000000000000d5b 
>  ffff88003fb00988  0000000000006d06 
> Call Trace:
>  [<ffffffff80262dd3>] thread_return+0x6c/0x113
>  [<ffffffff88036d5a>] :jbd:log_wait_commit+0xa3/0xf5
>  [<ffffffff8029c60a>] autoremove_wake_function+0x0/0x2e
>  [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
>  [<ffffffff8023138e>] __writeback_single_inode+0x1e9/0x328
>  [<ffffffff802d2ff1>] do_readv_writev+0x26e/0x291
>  [<ffffffff802e555b>] sync_inode+0x24/0x33
>  [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
>  [<ffffffff80252276>] do_fsync+0x52/0xa4
>  [<ffffffff802d37f5>] __do_fsync+0x23/0x36
>  [<ffffffff802602f9>] tracesys+0xab/0xb6
> 
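A rough cross-check of the two numbers quoted above (the dd rate and the 120-second hung-task threshold) can be sketched in Python. This is only back-of-the-envelope arithmetic; it assumes dd's "MB" means 10^6 bytes and that the rate stays constant, and the function name is made up for illustration.

```python
def transfer_seconds(size_bytes: int, rate_bytes_per_s: float) -> float:
    """Seconds needed to write size_bytes at a constant rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s


SIZE_BYTES = 512 * 1024 ** 2   # bs=512M count=1 from the dd command
HUNG_TASK_TIMEOUT_S = 120      # from the "blocked for more than 120 seconds" message
MB = 1_000_000                 # assumption: dd reports decimal megabytes

for rate_mb in (0.8, 1.2):     # observed range from the report
    secs = transfer_seconds(SIZE_BYTES, rate_mb * MB)
    over = secs > HUNG_TASK_TIMEOUT_S
    print(f"{rate_mb} MB/s: {secs:.0f} s, longer than hung-task timeout: {over}")
```

At either end of the reported range a single 512M write takes several minutes, so a task waiting on a journal commit behind such I/O could plausibly sit blocked for longer than the 120-second warning threshold.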
> 
> iscsiadm shows no errors.
> 
> # iscsiadm -m session -r 1 -s
> Stats for session [sid: 1, target: iqn.2006-01.com.openfiler:tsn.26336ef50fe0:storage1_osimages, portal: 172.16.0.2,3260]
> iSCSI SNMP:
>         txdata_octets: 486181549212
>         rxdata_octets: 2622687792
>         noptx_pdus: 0
>         scsicmd_pdus: 15184105
>         tmfcmd_pdus: 0
>         login_pdus: 0
>         text_pdus: 0
>         dataout_pdus: 195910
>         logout_pdus: 0
>         snack_pdus: 0
>         noprx_pdus: 0
>         scsirsp_pdus: 15184088
>         tmfrsp_pdus: 0
>         textrsp_pdus: 0
>         datain_pdus: 87898
>         logoutrsp_pdus: 0
>         r2t_pdus: 151200
>         async_pdus: 0
>         rjt_pdus: 0
>         digest_err: 0
>         timeout_err: 0
> iSCSI Extended:
>         tx_sendpage_failures: 0
>         rx_discontiguous_hdr: 0
>         eh_abort_cnt: 0
> 
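The counters above can also be checked mechanically rather than by eye. The following Python sketch parses the "name: value" lines of the statistics text, lists any non-zero error counters, and computes the difference between SCSI command and response PDUs. The sample is a subset of the output quoted above; the choice of which counters count as "errors" and the helper names are assumptions made for illustration, not part of iscsiadm.

```python
import re

# Subset of the statistics quoted above, pasted as text.
SAMPLE = """\
iSCSI SNMP:
        scsicmd_pdus: 15184105
        scsirsp_pdus: 15184088
        rjt_pdus: 0
        digest_err: 0
        timeout_err: 0
iSCSI Extended:
        eh_abort_cnt: 0
"""

# Assumption: these counters are the ones worth flagging when non-zero.
ERROR_COUNTERS = ("rjt_pdus", "digest_err", "timeout_err", "eh_abort_cnt")


def parse_counters(text: str) -> dict:
    """Return {counter_name: int} for every 'name: number' line.

    Leading whitespace and mail quote markers ('>') are tolerated.
    """
    pairs = re.findall(r"^[>\s]*(\w+):\s*(\d+)\s*$", text, flags=re.MULTILINE)
    return {name: int(value) for name, value in pairs}


counters = parse_counters(SAMPLE)
nonzero_errors = [n for n in ERROR_COUNTERS if counters.get(n, 0) != 0]
unanswered = counters["scsicmd_pdus"] - counters["scsirsp_pdus"]

print("non-zero error counters:", nonzero_errors)
print("commands without a counted response:", unanswered)
```

For the figures quoted, the error counters are all zero and the command/response difference is 17. That difference may simply be commands in flight when the snapshot was taken, so it is not evidence of a fault on its own.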
> If I reboot a domU after it has come back to life, the ext3 journal is
> corrupt in most cases, and the kernel panics after one more reboot.
> 
> If I try to install a PV domain (CentOS-5.5), the installer asks whether I
> wish to initialize the disk xvda, but when the disk partitioning and layout
> questions appear, the disk is missing from the list. There is nothing but a
> question mark.
> Sometimes the disk is in the list. In that case I can install the OS and all
> seems fine, but after the second reboot the ext3 journal is missing, and the
> kernel panics after the third reboot; the rootfs is gone.
> 
> 
> Does anyone have any ideas? I'm out of them.
> 
> Thanks
> Christian
> 
> Some kernel log output from the domU; there is nothing in the dom0 log.
> 
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743295
> Aborting journal on device dm-0.
> ext3_abort called.
> EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
> Remounting filesystem read-only
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743296
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743297
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743298
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743299
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743300
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743301
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743302
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743303
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743304
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743305
> EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
> EXT3-fs error (device dm-0) in ext3_truncate: Journal has aborted
> EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
> EXT3-fs error (device dm-0) in ext3_orphan_del: Journal has aborted
> EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
> __journal_remove_journal_head: freeing b_committed_data
> __journal_remove_journal_head: freeing b_committed_data
> __journal_remove_journal_head: freeing b_committed_data
> 
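To see whether the "bit already cleared" errors hit scattered blocks or one contiguous run, the block numbers can be pulled out of the log and grouped. The Python sketch below does this for a shortened sample of the log above (three of the eleven lines, one of them wrapped the way a mail client would wrap it); the function names are hypothetical.

```python
import re

# Shortened sample of the domU log quoted above.
LOG = """\
EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743295
Aborting journal on device dm-0.
EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743296
EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for
> block 743297
"""


def find_blocks(text: str) -> list:
    """Block numbers from 'bit already cleared' errors, tolerating line wraps."""
    return [int(b) for b in re.findall(r"bit already cleared for[\s>]*block (\d+)", text)]


def contiguous_ranges(blocks) -> list:
    """Collapse block numbers into sorted (first, last) inclusive ranges."""
    ranges = []
    for b in sorted(set(blocks)):
        if ranges and b == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], b)   # extend the current run
        else:
            ranges.append((b, b))             # start a new run
    return ranges


print(contiguous_ranges(find_blocks(LOG)))
```

In the full log above, the reported blocks run from 743295 to 743305 without gaps, so the same grouping applied to the complete output would yield a single range.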
> 
> 
> _______________________________________________
> Xen-users mailing list
> Xen-users@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-users

