[Xen-devel] ADs over dom0 iSCSI = high page_count()

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] ADs over dom0 iSCSI = high page_count()
From: Joshua Nicholas <jnicholas@xxxxxxxxxxxxxxx>
Date: Fri, 05 Dec 2008 11:51:16 -0500

I've come across a disturbing page ref count situation and need some advice.
This only happens very rarely, when writing through ADs to iSCSI storage.
(My guess is that this probably happens during a fragmented TCP retransmit.)

Novell SLES10sp2 kernel :: Xen 3.2, but all of the blkback and netback
code is the same as unstable.

1.  blkback :: maps the foreign page :: page_count() == 1
2.  blkback :: submits a bio with this foreign page
3.  iscsi_tcp :: makes a tcp request with this foreign page
4.  tcp :: gets twice, page_count() == 3
5.  tcp :: puts once, page_count() == 2
6.  tcp :: gets twice, page_count() == 4
7.  __gnttab_dma_map_page(), sets page_mapcount() == 1
8.  tcp :: puts twice, page_count() == 2
9.  tcp :: done, but page_count() == 2, not 1
10. iscsi_tcp :: done, bio completes
11. blkback :: __end_block_io_op() calls fast_flush_area()
       page state:  page_count() == 2, page_mapcount() == 1

BUT:    page_count() should be 1 and page_mapcount() should be 0.
       Perhaps these two counts are related, but I'm wondering if these
       might be two separate issues.  However, in all of my reproductions
       of this issue, whenever __gnttab_dma_map_page() gets called, the
       page_count() ends up high.
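
The numbered steps above can be replayed as a small user-space model to check the arithmetic. This is only a sketch: struct page_model, model_get(), model_put() and replay_steps() are hypothetical names invented for illustration, not kernel code, and the model assumes each "get" adds one reference and each "put" drops one.

```c
/* Hypothetical user-space stand-in for the two counters; this is not
 * the kernel's struct page. */
struct page_model {
    int count;     /* models page_count() */
    int mapcount;  /* models page_mapcount() */
};

static void model_get(struct page_model *p) { p->count++; }
static void model_put(struct page_model *p) { p->count--; }

/* Replays steps 1 to 9 and returns the resulting counter state. */
static struct page_model replay_steps(void)
{
    struct page_model p = { 1, 0 };  /* step 1: blkback maps the page */

    model_get(&p); model_get(&p);    /* step 4: count == 3 */
    model_put(&p);                   /* step 5: count == 2 */
    model_get(&p); model_get(&p);    /* step 6: count == 4 */
    p.mapcount = 1;                  /* step 7: set like a flag */
    model_put(&p); model_put(&p);    /* step 8: count == 2 */
    return p;                        /* step 9: one reference too many */
}
```

In this model there are four gets and only three puts, so the state returned by replay_steps() has count == 2 and mapcount == 1, matching the page state seen in fast_flush_area().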

QUESTION 1:  Is a high page_count() after leaving the tcp layer, when
   the packets are fragmented, a known unsolved problem?

Looking at netback.c I see the comment in the read path:

   net_rx_action()
       /* We can't rely on skb_release_data to release the
          pages used by fragments for us, since it tries to
          touch the pages in the fraglist.  If we're in
          flipping mode, that doesn't work.  In copying mode,
          we still have access to all of the pages, and so
          it's safe to let release_data deal with it. */
       /* (Freeing the fragments is safe since we copy
          non-linear skbs destined for flipping interfaces) */

Also in netback.c in net_tx_action_dealloc() after make_tx_response() I see:

   /* Ready for next use. */
   gnttab_reset_grant_page()

This does reset the page_mapcount() back to 0, but it also sets the
page_count() to 1 regardless of its current value.

QUESTION 2:  Why does the page_count() have to be set to 1?

QUESTION 3:  If the page_count() is known to be high after leaving the
   tcp layer by only 1 (i.e. page_count() == 2 instead of 1), then
   wouldn't an atomic_cmpxchg() be safer, or can the count be even
   higher?
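
To make the idea in QUESTION 3 concrete, here is a user-space sketch using C11 atomics rather than the kernel API. The helper name reset_if_exactly_two() is hypothetical. Note that the kernel's atomic_cmpxchg() returns the previous value, whereas the C11 call used here returns a boolean.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: lower the count from 2 to 1 only when it is
 * exactly 2. Any other value is left untouched, so an unexpected extra
 * reference is not silently discarded. */
static bool reset_if_exactly_two(atomic_int *count)
{
    int expected = 2;

    /* Atomically: if (*count == expected) *count = 1; */
    return atomic_compare_exchange_strong(count, &expected, 1);
}
```

With a count of 2 the helper returns true and leaves the count at 1; with a count of 3 it returns false and the count stays at 3. The sketch cannot tell a leaked reference from a legitimate second holder, so it does not by itself answer whether the reset is safe.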

I can add a call to gnttab_reset_grant_page() in blkback.  However, we
have found legitimate cases where the page_count() is 2, such as when
dhcpd is sniffing for a release_renew while there are IOs in progress.
Thus I'd like more understanding before setting the page_count().

Thank you,

Joshua

PS: Below is a more detailed walk-through of the get_page() and
   put_page() calls that resulted in the page_count() being high.

PPS: The thread that originally discussed the dhcpd SEGV, which occurs
   when dhcpd loses the race against blkback unmapping the page from
   dom0, is:

Problem with PV disk and iSCSI
http://lists.xensource.com/archives/html/xen-devel/2008-02/msg00330.html

================================================================
================================================================
================================================================

blkback maps the foreign page
   page_count() == 1

GetPage_Trace [ffff8800087ba6c0] (1) G 1 0
   | 562 /srcTrees/na_main/nex.bk/linux/net/ipv4/tcp.c
       do_tcp_sendpages()
           !can_coalesce

GetPage_Trace [ffff8800087ba6c0] (2) G 2 0
   | 1576 /srcTrees/na_main/nex.bk/linux/net/core/skbuff.c
       skb_split_no_header()
           pos < len
           /* Split frag.
            * We have two variants in this case:
            * 1. Move all the frag to the second
            *    part, if it is possible. F.e.
            *    this approach is mandatory for TUX,
            *    where splitting is expensive.
            * 2. Split is accurately. We make this.
            */
   | 1134 /srcTrees/na_main/nex.bk/linux/net/ipv4/tcp_output.c
       tcp_write_xmit()
           calls tso_fragment()
           which eventually calls skb_split_no_header()

PutPage_Trace [ffff8800087ba6c0] (3) P 3 0
   | 281 /srcTrees/na_main/nex.bk/linux/net/core/skbuff.c
       skb_release_data()
           for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
   | 462 /srcTrees/na_main/nex.bk/linux/include/net/sock.h
       sk_stream_free_skb()
           calls __kfree_skb()
           which eventually calls skb_release_data()

??? second put_page() seems to be missing ???

================ ??? retransmit maybe ??? ================

GetPage_Trace [ffff8800087ba6c0] (4) G 2 0
   | 562 /srcTrees/na_main/nex.bk/linux/net/ipv4/tcp.c
       do_tcp_sendpages()
           !can_coalesce

GetPage_Trace [ffff8800087ba6c0] (5) G 3 0
   | 1576 /srcTrees/na_main/nex.bk/linux/net/core/skbuff.c
       skb_split_no_header()
           pos < len
           /* Split frag.
            * We have two variants in this case:
            * 1. Move all the frag to the second
            *    part, if it is possible. F.e.
            *    this approach is mandatory for TUX,
            *    where splitting is expensive.
            * 2. Split is accurately. We make this.
            */
   | 1134 /srcTrees/na_main/nex.bk/linux/net/ipv4/tcp_output.c
       tcp_write_xmit()
           calls tso_fragment()
           which eventually calls skb_split_no_header()

dma_map_single()
   swiotlb_map_single()
       gnttab_dma_map_page()
           __gnttab_dma_map_page()
               In: drivers/xen/core/gnttab.c

               page->_mapcount gets set
               (Not an increment, but like a flag)

Sometimes this gets called multiple times for this same page

PutPage_Trace [ffff8800087ba6c0] (6) P 4 1
   | 281 /srcTrees/na_main/nex.bk/linux/net/core/skbuff.c
       skb_release_data()
           for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
   | 462 /srcTrees/na_main/nex.bk/linux/include/net/sock.h
       sk_stream_free_skb()
           calls __kfree_skb()
           which eventually calls skb_release_data()

PutPage_Trace [ffff8800087ba6c0] (7) P 3 1
   | 281 /srcTrees/na_main/nex.bk/linux/net/core/skbuff.c
       skb_release_data()
           for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
   | 462 /srcTrees/na_main/nex.bk/linux/include/net/sock.h
       sk_stream_free_skb()
           calls __kfree_skb()
           which eventually calls skb_release_data()
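
The GetPage_Trace and PutPage_Trace summary lines can be checked mechanically. The sketch below assumes, from the values shown, that the fields after the sequence number are the operation (G or P), the page_count() before the operation, and the page_mapcount(); that reading is an interpretation, not something the trace states. The names trace_rec, parse_trace_line() and replay_trace() are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

struct trace_rec {
    int  seq;           /* number in parentheses */
    char op;            /* 'G' for get_page, 'P' for put_page */
    int  count_before;  /* assumed: page_count() before the operation */
    int  mapcount;      /* assumed: page_mapcount() at that point */
};

/* Parses e.g. "GetPage_Trace [ffff8800087ba6c0] (1) G 1 0".
 * Returns 1 on success, 0 on a malformed line. */
static int parse_trace_line(const char *line, struct trace_rec *r)
{
    return sscanf(line, "%*s [%*[^]]] (%d) %c %d %d",
                  &r->seq, &r->op, &r->count_before, &r->mapcount) == 4;
}

/* Replays the records in order, checking that each "before" count
 * equals the count implied by the previous record. Returns the final
 * count, or -1 if a line is malformed or the sequence is inconsistent. */
static int replay_trace(const char *const *lines, size_t n)
{
    int expected = -1;  /* -1 means no previous record yet */

    for (size_t i = 0; i < n; i++) {
        struct trace_rec r;

        if (!parse_trace_line(lines[i], &r))
            return -1;
        if (r.op != 'G' && r.op != 'P')
            return -1;
        if (expected >= 0 && r.count_before != expected)
            return -1;
        expected = r.count_before + (r.op == 'G' ? 1 : -1);
    }
    return expected;
}
```

Fed the seven summary lines of this trace in order, replay_trace() finds them self-consistent under that reading and returns a final count of 2.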

================================================================

Joshua Nicholas
Virtual Iron Software, Inc.

