Re: [Xen-users] [XCP] ext3 crashes and slowdowns

On Tue, Jan 04, 2011 at 03:37:36PM +0100, Christian Fischer wrote:
> Hi Folks.
> 
> I have two Intel boxes (Intel server S5520UR, 2x E5520, 32GB RAM, SATA HW-RAID,
> BBU) running as an XCP-0.5 pool, each running an OpenFiler-2.3 domU, clustered
> active/passive. Data storage is provided as SCSISR (without LVM layer, like an
> HBASR) to OpenFiler. Shared storage is provided as an iSCSI target by OpenFiler
> via clusterIP (storage frontend network), replication is done by DRBD (storage
> backend network), and HA is done by heartbeat (heartbeat network). All networks
> are built on top of redundant HP gigabit switches: 2 pairs of Intel gigabit
> NICs, each pair bonded and plugged into the same switch, both bonds multipathed
> (active/passive multipathing, patched OpenVSwitch-1.1.2p1) via the two
> switches, which are linked together with 2 ports each.
> 

Hello,

Did you try XCP 1.0 beta? 

-- Pasi

> XCP pool works, iSCSI works, replication works, HA works.
> 
> If filer 1 (running on server 1) is active, I can install and run domUs on
> server 2 without problems, but I cannot install or run domUs on server 1.
> 
> If I switch to filer 2 (on server 2) as the active one, the running but
> stalled domUs on server 1 come back to life, and the running domUs on
> filer 2 lose theirs.
> # dd if=/dev/zero of=/tmp/test bs=512M count=1 oflag=direct
> shows a rate of 0.8 - 1.2 MB/sec.
> 
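For scale, here is a rough calculation of how long that single 512 MB test write takes at the reported rate. It is only arithmetic on the numbers quoted above, assuming that bs=512M means 512 MiB and that the reported rate is in decimal megabytes per second; the function name is made up for illustration.

```python
# Rough arithmetic on the dd figures quoted above; nothing here touches a disk.
# Assumptions: bs=512M means 512 MiB, and the rate is decimal MB/s (10**6 bytes/s).

BLOCK_BYTES = 512 * 1024 * 1024  # one block of bs=512M, count=1


def seconds_to_write(total_bytes, rate_mb_per_s):
    """Return the seconds needed to move total_bytes at rate_mb_per_s."""
    return total_bytes / (rate_mb_per_s * 1_000_000)


if __name__ == "__main__":
    for rate in (0.8, 1.2):
        secs = seconds_to_write(BLOCK_BYTES, rate)
        print(f"{rate} MB/s: {secs:.0f} s ({secs / 60:.1f} min)")
```

At these rates the one-block write takes several minutes, which is longer than the 120-second limit named in the hung-task message, so a task waiting on a journal commit could plausibly trip that warning.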
> The kernel shows traces like
> 
> INFO: task syslogd:1081 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syslogd       D ffff880001003460     0  1081      1          1084  1073 (NOTLB)
>  ffff8800367edd88  0000000000000286  ffff8800367edd98  ffffffff80262dd3 
>  0000000000000009  ffff88003fb007a0  ffffffff804f4b80  0000000000000d5b 
>  ffff88003fb00988  0000000000006d06 
> Call Trace:
>  [<ffffffff80262dd3>] thread_return+0x6c/0x113
>  [<ffffffff88036d5a>] :jbd:log_wait_commit+0xa3/0xf5
>  [<ffffffff8029c60a>] autoremove_wake_function+0x0/0x2e
>  [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
>  [<ffffffff8023138e>] __writeback_single_inode+0x1e9/0x328
>  [<ffffffff802d2ff1>] do_readv_writev+0x26e/0x291
>  [<ffffffff802e555b>] sync_inode+0x24/0x33
>  [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
>  [<ffffffff80252276>] do_fsync+0x52/0xa4
>  [<ffffffff802d37f5>] __do_fsync+0x23/0x36
>  [<ffffffff802602f9>] tracesys+0xab/0xb6
> 
> 
> iscsiadm shows no errors.
> 
> # iscsiadm -m session -r 1 -s
> Stats for session [sid: 1, target: iqn.2006-01.com.openfiler:tsn.26336ef50fe0:storage1_osimages, portal: 172.16.0.2,3260]
> iSCSI SNMP:
>         txdata_octets: 486181549212
>         rxdata_octets: 2622687792
>         noptx_pdus: 0
>         scsicmd_pdus: 15184105
>         tmfcmd_pdus: 0
>         login_pdus: 0
>         text_pdus: 0
>         dataout_pdus: 195910
>         logout_pdus: 0
>         snack_pdus: 0
>         noprx_pdus: 0
>         scsirsp_pdus: 15184088
>         tmfrsp_pdus: 0
>         textrsp_pdus: 0
>         datain_pdus: 87898
>         logoutrsp_pdus: 0
>         r2t_pdus: 151200
>         async_pdus: 0
>         rjt_pdus: 0
>         digest_err: 0
>         timeout_err: 0
> iSCSI Extended:
>         tx_sendpage_failures: 0
>         rx_discontiguous_hdr: 0
>         eh_abort_cnt: 0
> 
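A minimal sketch of how the counters above could be checked mechanically instead of by eye. The choice of which counters count as "errors" is an assumption made for illustration, not an official open-iscsi list, and all function names are hypothetical.

```python
import re

# Counters treated as error indicators. This selection is an assumption made
# for illustration, not an official list from open-iscsi.
ERROR_COUNTERS = ("digest_err", "timeout_err", "eh_abort_cnt",
                  "tx_sendpage_failures", "rjt_pdus")

# A statistics line looks like "        txdata_octets: 486181549212".
COUNTER_RE = re.compile(r"^(\w+):\s+(\d+)$")


def parse_counters(text):
    """Collect every 'name: integer' line into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        # Drop mail quoting ("> ") and surrounding whitespace before matching.
        match = COUNTER_RE.match(line.strip().lstrip(">").strip())
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def nonzero_errors(counters):
    """Return only the error counters that are present and nonzero."""
    return {name: counters[name] for name in ERROR_COUNTERS
            if counters.get(name, 0) != 0}


def unanswered_commands(counters):
    """SCSI command PDUs sent minus SCSI response PDUs received."""
    return counters["scsicmd_pdus"] - counters["scsirsp_pdus"]
```

Fed the statistics quoted above, nonzero_errors() returns an empty dict and unanswered_commands() returns 15184105 - 15184088 = 17. Whether 17 commands without a response at sampling time is significant cannot be judged from one snapshot; comparing two snapshots taken while a domU is stalled would say more.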
> If I reboot the domU after it has come back to life, in most cases the ext3 
> journal is corrupt, and the kernel panics after one more reboot.
> 
> If I try to install a PV domain (CentOS-5.5), the installer asks if I wish to 
> initialize the disk xvda, but when the disk partitioning and layout questions 
> appear, the disk is missing from the list. There is nothing but a question 
> mark.
> Sometimes the disk is in the list. In that case I can install the OS and all 
> seems fine, but after the second reboot the ext3 journal is missing, and the 
> kernel panics after the third reboot: the rootfs is gone.
> 
> 
> Does anyone have any ideas? I'm out of them.
> 
> Thanks
> Christian
> 
> Here is some kernel logging from the domU; there is nothing in the dom0 log.
> 
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743295
> Aborting journal on device dm-0.
> ext3_abort called.
> EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
> Remounting filesystem read-only
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743296
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743297
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743298
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743299
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743300
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743301
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743302
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743303
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743304
> EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 743305
> EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
> EXT3-fs error (device dm-0) in ext3_truncate: Journal has aborted
> EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
> EXT3-fs error (device dm-0) in ext3_orphan_del: Journal has aborted
> EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
> __journal_remove_journal_head: freeing b_committed_data
> __journal_remove_journal_head: freeing b_committed_data
> __journal_remove_journal_head: freeing b_committed_data
> 
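A small sketch for condensing a log like the one above into counts per failing function and the list of affected blocks. It assumes each kernel message sits on a single line, as in raw dmesg output rather than mail-wrapped text; the function name summarize is hypothetical.

```python
import re
from collections import Counter

# Matches both message shapes seen in the log:
#   "EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared ..."
#   "EXT3-fs error (device dm-0) in ext3_truncate: Journal has aborted"
ERROR_RE = re.compile(r"EXT3-fs error \(device ([\w-]+)\)(?::| in) (\w+):")
BLOCK_RE = re.compile(r"for block (\d+)")


def summarize(log_text):
    """Return (Counter keyed by (device, function), sorted block numbers)."""
    by_function = Counter()
    blocks = []
    for line in log_text.splitlines():
        error = ERROR_RE.search(line)
        if error:
            by_function[(error.group(1), error.group(2))] += 1
        block = BLOCK_RE.search(line)
        if block:
            blocks.append(int(block.group(1)))
    return by_function, sorted(blocks)
```

Applied to the messages quoted above, this would report 11 ext3_free_blocks_sb errors on dm-0 covering the consecutive blocks 743295 through 743305, which makes it easier to see that the damage reported here is one contiguous run rather than scattered blocks.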
> 
> 
> _______________________________________________
> Xen-users mailing list
> Xen-users@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-users
