Hi:
I finally captured overlapping extents in ext4, but I am still wondering how
this happens.
I check for overlap against the last extent in the tree at the very beginning
of ext4_ext_convert_to_initialized. The attached Messages.12 shows the overlap
that was found.
Lines 8-10: 3467:[1]15:57921642 and 3468:[0]14:57921643 overlap.
8 Sep 15 08:27:39 xmao kernel: 3331:[0]7:53750025 3338:[0]8:53750033
3346:[0]1:53848953 3347:[0]7:53848955 3354:[0]1:53848969 3355:[0]7:53848971
3362:[0]1:53848985 3363:[0]7:56996848 3370:[0]1:57606144 3371:[0]7:57795290
3378:[0]1:57814407 3379:[0]7:57858606 3386:[0]8:57858620 3394:[0]1:57858629
3395:[0]8:57858637 3403:[0]7:57858646 3410:[0]1:57858661 3411:[0]8:57858669
3419:[0]7:57858678 3426:[0]8:57858692 3434:[0]1:57858701 3435:[0]7:57858709
3442:[0]1:57858717 3443:[0]7:57858725 3450:[0]1:57858733 3451:[0]7:57858741
3458:[0]1:57858749 3459:[0]7:57858757 3466:[0]1:57921634 3467:[1]15:57921642
9 Sep 15 08:27:39 xmao kernel: Displaying leaf extents for inode 12339004
10 Sep 15 08:27:39 xmao kernel: 3468:[0]14:57921643 3482:[0]1:57921664
3483:[0]7:57921666 3490:[0]1:57921680 3491:[0]8:57921682 3499:[0]7:57921691
3506:[0]8:57921705 3514:[0]1:57921714 3515:[0]7:57921722 3522:[0]41:57916683
3563:[0]7:58159767 3570:[0]1:58159781 3571:[0]7:58238992 3578:[0]1:58288144
3579:[0]7:58327750 3586:[0]1:58579969 3587:[0]7:58954838 3594:[0]1:59006641
3595:[0]7:59006643 3602:[0]1:59006657 3603:[0]7:59006659 3610:[0]8:59006673
3618:[0]8:59006688 3626:[0]470:58982658 4096:[0]3:58987732 4099:[0]1:58992655
4100:[0]7:59143253 4107:[0]1:59171840 4108:[0]7:59183878 4115:[0]1:59192886
4116:[0]8:59593463 4124:[0]8:59669484 4132:[0]7:73086538 4139:[0]1:73352801
4140:[0]7:73339273 4147:[0]1:73526280 4148:[0]8:78229012 4156:[0]1:78229021
4157:[0]7:78818388 4164:[0]1:79069383 4165:[0]7:79428616 4172:[0]1:80490925
4173:[0]7:81439488 4180:[0]1:82854062 4181:[0]7:83462272 4188:[0]1:83656904
4189:[0]7:89127381 4196:[0]1:89584313 4197:[0]8:91592930 4205:[0]7:91592945
4212:[0]1:91592953 4213:[0]7:91592961 422
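The overlap in lines 8-10 can be confirmed mechanically: two neighbouring extents overlap when the first one's logical start plus its length runs past the next one's logical start. Below is a minimal user-space sketch of that check. It is not kernel code; `struct ext_rec` and `first_overlap` are hypothetical names made up for illustration, and the entries are assumed to be sorted by logical block, as they are within a leaf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space record for one "logical:[uninit]len:pblock" log token. */
struct ext_rec {
    uint32_t lblock;   /* first logical block */
    int      uninit;   /* 1 if the extent is uninitialized (unwritten) */
    uint32_t len;      /* number of blocks covered */
    uint64_t pblock;   /* first physical block */
};

/*
 * Return the index i of the first entry whose logical range runs into
 * entry i + 1, or -1 if no neighbouring entries overlap.
 * Assumes the array is sorted by lblock.
 */
int first_overlap(const struct ext_rec *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i + 1 < n; i++) {
        /* One past the last logical block covered by entry i. */
        uint64_t end = (uint64_t)v[i].lblock + v[i].len;
        if (end > v[i + 1].lblock)
            return (int)i;
    }
    return -1;
}
```

Fed the entries around the leaf boundary in the log (3466:[0]1, 3467:[1]15, 3468:[0]14), the function reports the 3467 entry, because 3467 + 15 = 3482 is past 3468.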
I also dumped the file's on-disk extents with filefrag, which shows no overlap
and no extent 3468:[0]14:57921643.
ext  logical  physical  expected  length  flags
....
337     3459  57858757  57858749       7
338     3466  57921634  57858763       1  unwritten
339     3467  57921642  57921634      15  unwritten
340     3482  57921664  57921656       1
341     3483  57921666  57921664       7
.....
My hypothesis: after 3468:[0]14:57921643 has been inserted successfully, some
later error occurs. At the bottom of ext4_ext_convert_to_initialized, the
fix_extent_len label then restores the original ex's ee_len, so the restored
extent overlaps the newly inserted one. (I will add a check for the error
later.)
fix_extent_len:
    ex->ee_block = orig_ex.ee_block;
    ex->ee_len = orig_ex.ee_len;
    ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
    ext4_ext_mark_uninitialized(ex);
    ext4_ext_dirty(handle, inode, path + depth);
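To make the hypothesis concrete, here is a small user-space model of the suspected sequence. It is only a sketch under stated assumptions: `struct model_ext` and `split_then_maybe_restore` are hypothetical names, the "late error" is simulated with a flag, and nothing here shows that the kernel really takes this path. It only shows that restoring the saved extent after the new one is already in place would produce exactly the pair seen in the log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct ext4_extent (host-endian). */
struct model_ext {
    uint32_t block;   /* first logical block, like ee_block */
    uint32_t len;     /* length in blocks, like ee_len without the uninit bit */
    int      uninit;  /* 1 if uninitialized */
    uint64_t pblock;  /* first physical block */
};

/*
 * Model of the suspected sequence: ex is shrunk to the blocks in front of
 * the write, the written tail is recorded in *tail as a new initialized
 * extent, and then, if fail_late is set, ex is restored from the saved copy
 * the way the fix_extent_len label does.
 * Returns 1 if ex and *tail overlap afterwards, 0 otherwise.
 */
int split_then_maybe_restore(struct model_ext *ex, struct model_ext *tail,
                             uint32_t write_start, int fail_late)
{
    struct model_ext orig = *ex;              /* saved copy, like orig_ex */
    uint32_t head = write_start - ex->block;  /* blocks left in front of the write */

    assert(write_start > ex->block && head < ex->len);

    tail->block  = write_start;
    tail->len    = orig.len - head;
    tail->uninit = 0;
    tail->pblock = orig.pblock + head;

    ex->len = head;                           /* ex now covers only the head */

    if (fail_late)                            /* assumed error after the insert */
        *ex = orig;                           /* restore block, len, pblock, uninit */

    return (uint64_t)ex->block + ex->len > tail->block;
}
```

Starting from 3467:[1]15:57921642 with a write at logical block 3468, the model yields a tail of 3468:[0]14:57921643. With the simulated late error the head goes back to length 15 and overlaps the tail; without it the head stays at length 1 and there is no overlap.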
Any comments?
There is also something strange in messages.12.
messages.12 is from another machine; its log is printed right before
BUG_ON(newext->ee_block == nearex->ee_block);
What is strange is that the pblock of 14412:[1]16:9927 is very different from
that of 14411:[0]1:222332613.
if (newext->ee_block == nearex->ee_block) {
    len = (EXT_MAX_EXTENT(eh) - nearex) * sizeof(struct ext4_extent);
    len = len < 0 ? 0 : len;
    printk("old_depth %d depth %d old_path %p path %p "
           "next_has_free %d next %llu\n",
           old_depth, depth, old_path, path, next_has_free,
           (unsigned long long)next);

    printk("insert %d:%llu:[%d]%d before: nearest 0x%p, "
           "move %d from 0x%p to 0x%p\n",
           le32_to_cpu(newext->ee_block),
           ext_pblock(newext),
           ext4_ext_is_uninitialized(newext),
           ext4_ext_get_actual_len(newext),
           nearex, len, nearex + 1, nearex + 2);
    ext4_ext_show_leaf_xmao(inode, old_path);
    ext4_ext_show_leaf_xmao(inode, path);
}
BUG_ON(newext->ee_block == nearex->ee_block);
Sep 13 16:16:35 xmao kernel: 57:[0]31:157254721 12288:[0]54:157503830
12342:[0]10:157503884 12352:[0]5:157534763 12357:[0]1:157534768
12358:[0]58:157534769 12416:[0]64:157567168 12480:[0]13:158051261
12493:[0]73:172263095 12566:[0]24:172265399 12590:[0]71:172521859
12661:[0]71:172627897 12732:[0]71:172733735 12803:[0]69:172722619
12872:[0]9:172764859 12881:[0]42:110500028 12923:[0]86:143030061
13009:[0]86:143119859 13095:[0]48:143173376 13143:[0]16:195333586
13159:[0]32:197526105 13191:[0]40:198875861 13231:[0]39:198872300
13270:[0]5:199663576 13275:[0]26:200964192 13301:[0]36:202015708
13337:[0]47:202221682 13384:[0]9:202221729 13393:[0]58:202624966
13451:[0]12:202606535 13463:[0]35:212117725 13498:[0]35:212135811
13533:[0]34:212115513 13567:[0]32:212108608 13599:[0]29:212144185
13628:[0]50:231280420 13678:[0]38:231645389 13716:[0]13:231645427
13729:[0]51:231650765 13780:[0]50:231647658 13830:[0]54:231985340
13884:[0]24:231981259 13908:[0]64:105098731 13972:[0]87:136696745
14059:[0]45:136700237 14104:[0]61:2
Sep 13 16:16:35 xmao kernel: 3651 14165:[0]69:222042299 14234:[0]68:222044092
14302:[0]34:222091761 14336:[0]68:222172860 14404:[0]7:222332606
14411:[0]1:222332613
Sep 13 16:16:35 xmao kernel: Displaying leaf extents for inode 30685060
Sep 13 16:16:35 xmao kernel: 14412:[1]16:9927 14428:[1]41:13213
14469:[1]1:13254 14470:[0]67:222673085
Again, filefrag shows that the on-disk extents are fine:
ext  logical   physical   expected  length  flags
336    14302  222091761  222044159      34
337    14336  222172860  222091794      68
338    14404  222332606  222172927       7
339    14411  222332613                 59  unwritten
340    14470  222673085  222332671      67
341    14537  222848155  222673151      43
342    14580  165617358  222848197      56
343    14636  165777353  165617413      55
344    14691  165961927  165777407      57
It seems that 14412:[1]16:9927, 14428:[1]41:13213 and 14469:[1]1:13254 are
unexpected.
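The mismatch can be checked directly from the log tokens. The sketch below assumes each token has the form logical:[uninit]len:pblock, as the leaf dumps above suggest; `struct leaf_tok`, `parse_leaf_tok`, `logically_adjacent` and `physically_contiguous` are hypothetical helper names for illustration only.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one "logical:[uninit]len:pblock" token from the log. */
struct leaf_tok {
    unsigned           lblock;  /* first logical block */
    int                uninit;  /* 1 if uninitialized */
    unsigned           len;     /* length in blocks */
    unsigned long long pblock;  /* first physical block */
};

/* Parse one token; returns 1 on success, 0 if the text does not match. */
int parse_leaf_tok(const char *s, struct leaf_tok *out)
{
    assert(s != NULL && out != NULL);
    return sscanf(s, "%u:[%d]%u:%llu",
                  &out->lblock, &out->uninit, &out->len, &out->pblock) == 4;
}

/* 1 if b starts at the logical block right after a ends. */
int logically_adjacent(const struct leaf_tok *a, const struct leaf_tok *b)
{
    return (unsigned long long)a->lblock + a->len == b->lblock;
}

/* 1 if b additionally continues a on disk. */
int physically_contiguous(const struct leaf_tok *a, const struct leaf_tok *b)
{
    return logically_adjacent(a, b) && a->pblock + a->len == b->pblock;
}
```

For 14411:[0]1:222332613 followed by 14412:[1]16:9927 the pair is logically adjacent but not physically contiguous. Physical discontinuity alone is normal for a fragmented file; what stands out here is that filefrag reports a single 59-block unwritten extent starting at 14411, while the log shows four entries covering the same 59 logical blocks (1 + 16 + 41 + 1) with unrelated pblocks.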
Many thanks.
----------------------------------------
> From: tinnycloud@xxxxxxxxxxx
> To: jeremy@xxxxxxxx
> CC: konrad.wilk@xxxxxxxxxx; linux-ext4@xxxxxxxxxxxxxxx;
> xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-devel] RE: ext4 BUG in dom0 Kernel 2.6.32.36
> Date: Wed, 7 Sep 2011 10:35:21 +0800
>
>
>
>
> ----------------------------------------
> > Date: Tue, 6 Sep 2011 11:55:02 -0700
> > From: jeremy@xxxxxxxx
> > To: tinnycloud@xxxxxxxxxxx
> > CC: konrad.wilk@xxxxxxxxxx; linux-ext4@xxxxxxxxxxxxxxx;
> > xen-devel@xxxxxxxxxxxxxxxxxxx
> > Subject: Re: [Xen-devel] RE: ext4 BUG in dom0 Kernel 2.6.32.36
> >
> > On 09/06/2011 08:11 AM, MaoXiaoyun wrote:
> > >
> > > > Date: Tue, 6 Sep 2011 10:53:47 -0400
> > > > From: konrad.wilk@xxxxxxxxxx
> > > > To: tinnycloud@xxxxxxxxxxx
> > > > CC: linux-ext4@xxxxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx;
> > > jeremy@xxxxxxxx
> > > > Subject: Re: ext4 BUG in dom0 Kernel 2.6.32.36
> > > >
> > > > On Tue, Sep 06, 2011 at 03:24:14PM +0800, MaoXiaoyun wrote:
> > > > >
> > > > >
> > > > > Hi:
> > > > >
> > > > > I've met an ext4 Bug in dom0 kernel 2.6.32.36. (See kernel stack
> > > below)
> > > >
> > > > Did you try the 3.0 kernel?
> > > No, I am afraid the change would be too much for our current
> > > environment and may cause other stability issues.
> > > So I want to dig out what is really happening.
> >
> > Another question is whether this is a regression compared to earlier
> > versions of 2.6.32? Do you know if this problem exists in a non-Xen
> > environment?
> >
>
> Others have reported this issue in non-Xen environments.
> http://markmail.org/message/ywr4nfgiuvgdcr7y
> http://www.spinics.net/lists/linux-ext4/msg21066.html
>
> The difficulty is that I haven't found an efficient way to reproduce it.
> (Currently it only shows up in our cluster; redeploying the cluster may cost
> 3 days more.)
>
>
> > Thanks,
> > J
> >
messages.12
Description: Binary data
messages.15
Description: Binary data