On Thu, 2010-07-01 at 16:47 +0100, Ian Campbell wrote:
> On Thu, 2010-07-01 at 16:29 +0100, Jeremy Fitzhardinge wrote:
> > On 07/01/2010 04:48 PM, Ian Campbell wrote:
> > > On Mon, 2010-06-21 at 12:14 +0100, Jeremy Fitzhardinge wrote:
> > >
> > >> Subject: [PATCH] xen/netback: make sure all the group structures are
> > >> initialized before starting async code
> > >>
> > >> Split the netbk group array initialization into two phases: one to do
> > >> simple "can't fail" initialization of lists, timers, locks, etc; and a
> > >> second pass to allocate memory and start the async code.
> > >>
> > >> This makes sure that all the basic stuff is in place by the time the
> > >> async code
> > >> is running.
> > >>
> > > Paul noticed a crash in netback init because...
> > >
> > >
> > >> @@ -1764,16 +1766,6 @@ static int __init netback_init(void)
> > >> netbk->netbk_tx_pending_timer.function =
> > >> netbk_tx_pending_timeout;
> > >>
> > >> - netbk->mmap_pages =
> > >> - alloc_empty_pages_and_pagevec(MAX_PENDING_REQS);
> > >> - if (!netbk->mmap_pages) {
> > >> - printk(KERN_ALERT "%s: out of memory\n",
> > >> __func__);
> > >> - del_timer(&netbk->netbk_tx_pending_timer);
> > >> - del_timer(&netbk->net_timer);
> > >> - rc = -ENOMEM;
> > >> - goto failed_init;
> > >> - }
> > >> -
> > >> for (i = 0; i < MAX_PENDING_REQS; i++) {
> > >> page = netbk->mmap_pages[i];
> > >> SetPageForeign(page, netif_page_release);
> > >>
> > > ...this dereference of netbk->mmap_pages[i]...
> > >
> > >
> > >> @@ -1786,6 +1778,26 @@ static int __init netback_init(void)
> > >>
> > > [...]
> > >
> > >> + netbk->mmap_pages =
> > >> + alloc_empty_pages_and_pagevec(MAX_PENDING_REQS);
> > >>
> > > ... happens before this initialisation of the array.
> > >
> >
> > Hm, I hadn't meant to commit that properly. I had it locally and
> > accidentally pushed it out.
> >
> > I only did that patch as an RFC in response to an issue alluded to by
> > Dongxiao (or was it you?) about things not being fully initialized by
> > the time the async code starts. Is this a real issue, and if so, what's
> > the correct fix?
>
> I don't think there is an actual current issue, just a potential one
> since we are relying on data structures being zeroed rather than
> properly initialised to keep the async code from running off into the
> weeds, it just seemed a little fragile this way.
>
> Originally I said:
> >> The crash is in one of the calls to list_move_tail and I think it is
> >> because netbk->pending_inuse_head not being initialised until after
> >> the
> >> threads and/or tasklets are created (I was running in threaded mode).
> >> Perhaps even though we are now zeroing the netbk struct those fields
> >> should still be initialised before kicking off any potentially
> >> asynchronous tasks?
>
> this specific issue was fixed by zeroing the netbk array as it is
> allocated, I just thought we could make things more robust by not
> triggering the async code until everything was fully setup.
I suspect simply not letting the kthread off the leash until the end of
the loop may be sufficient. The tasklet case I think is safe until they
are explicitly tickled.
e.g. something like:
diff --git a/drivers/xen/netback/netback.c b/drivers/xen/netback/netback.c
index 29b60c8..36f3cc7 100644
--- a/drivers/xen/netback/netback.c
+++ b/drivers/xen/netback/netback.c
@@ -1794,7 +1794,6 @@ static int __init netback_init(void)
if (!IS_ERR(netbk->kthread.task)) {
kthread_bind(netbk->kthread.task, group);
- wake_up_process(netbk->kthread.task);
} else {
printk(KERN_ALERT
"kthread_run() fails at netback\n");
@@ -1820,6 +1819,9 @@ static int __init netback_init(void)
spin_lock_init(&netbk->net_schedule_list_lock);
atomic_set(&netbk->netfront_count, 0);
+
+ if (MODPARM_netback_kthread)
+ wake_up_process(netbk->kthread.task);
}
netbk_copy_skb_mode = NETBK_DONT_COPY_SKB;
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|