Re: [Xen-devel][PV-ops][PATCH] Netback: Fix PV network issue for

On Thu, 2010-07-01 at 16:47 +0100, Ian Campbell wrote:
> On Thu, 2010-07-01 at 16:29 +0100, Jeremy Fitzhardinge wrote:
> > On 07/01/2010 04:48 PM, Ian Campbell wrote:
> > > On Mon, 2010-06-21 at 12:14 +0100, Jeremy Fitzhardinge wrote:
> > >   
> > >> Subject: [PATCH] xen/netback: make sure all the group structures are 
> > >> initialized before starting async code
> > >>
> > >> Split the netbk group array initialization into two phases: one to do
> > >> simple "can't fail" initialization of lists, timers, locks, etc; and a
> > >> second pass to allocate memory and start the async code.
> > >>
> > >> This makes sure that all the basic stuff is in place by the time the 
> > >> async code
> > >> is running.
> > >>     
> > > Paul noticed a crash in netback init because...
> > >
> > >   
> > >> @@ -1764,16 +1766,6 @@ static int __init netback_init(void)
> > >>                  netbk->netbk_tx_pending_timer.function =
> > >>                          netbk_tx_pending_timeout;
> > >>  
> > >> -                netbk->mmap_pages =
> > >> -                        alloc_empty_pages_and_pagevec(MAX_PENDING_REQS);
> > >> -                if (!netbk->mmap_pages) {
> > >> -                        printk(KERN_ALERT "%s: out of memory\n", 
> > >> __func__);
> > >> -                        del_timer(&netbk->netbk_tx_pending_timer);
> > >> -                        del_timer(&netbk->net_timer);
> > >> -                        rc = -ENOMEM;
> > >> -                        goto failed_init;
> > >> -                }
> > >> -
> > >>                  for (i = 0; i < MAX_PENDING_REQS; i++) {
> > >>                          page = netbk->mmap_pages[i];
> > >>                          SetPageForeign(page, netif_page_release);
> > >>     
> > > ...this dereference of netbk->mmap_pages[i]...
> > >
> > >   
> > >> @@ -1786,6 +1778,26 @@ static int __init netback_init(void)
> > >>     
> > > [...]
> > >   
> > >> +                netbk->mmap_pages =
> > >> +                        alloc_empty_pages_and_pagevec(MAX_PENDING_REQS);
> > >>     
> > > ... happens before this initialisation of the array.
> > >   
> > 
> > Hm, I hadn't meant to commit that properly.  I had it locally and
> > accidentally pushed it out.
> > 
> > I only did that patch as an RFC in response to an issue alluded to by
> > Dongxiao (or was it you?) about things not being fully initialized by
> > the time the async code starts.  Is this a real issue, and if so, what's
> > the correct fix?
> 
> I don't think there is an actual current issue, just a potential one
> since we are relying on data structures being zeroed rather than
> properly initialised to keep the async code from running off into the
> weeds, it just seemed a little fragile this way.
> 
> Originally I said:
> >> The crash is in one of the calls to list_move_tail and I think it is
> >> because netbk->pending_inuse_head not being initialised until after
> >> the 
> >> threads and/or tasklets are created (I was running in threaded mode).
> >> Perhaps even though we are now zeroing the netbk struct those fields
> >> should still be initialised before kicking off any potentially
> >> asynchronous tasks?
> 
> this specific issue was fixed by zeroing the netbk array as it is
> allocated, I just thought we could make things more robust by not
> triggering the async code until everything was fully setup.

I suspect simply not letting the kthread off the leash until the end of
the loop may be sufficient. The tasklet case I think is safe until they
are explicitly tickled.

e.g. something like:

diff --git a/drivers/xen/netback/netback.c b/drivers/xen/netback/netback.c
index 29b60c8..36f3cc7 100644
--- a/drivers/xen/netback/netback.c
+++ b/drivers/xen/netback/netback.c
@@ -1794,7 +1794,6 @@ static int __init netback_init(void)
 
                        if (!IS_ERR(netbk->kthread.task)) {
                                kthread_bind(netbk->kthread.task, group);
-                               wake_up_process(netbk->kthread.task);
                        } else {
                                printk(KERN_ALERT
                                        "kthread_run() fails at netback\n");
@@ -1820,6 +1819,9 @@ static int __init netback_init(void)
                spin_lock_init(&netbk->net_schedule_list_lock);
 
                atomic_set(&netbk->netfront_count, 0);
+
+               if (MODPARM_netback_kthread)
+                       wake_up_process(netbk->kthread.task);
        }
 
        netbk_copy_skb_mode = NETBK_DONT_COPY_SKB;





_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
WARNING - OLD ARCHIVES

xen-devel

Re: [Xen-devel][PV-ops][PATCH] Netback: Fix PV network issue for netback