On Thu, 2010-07-01 at 16:29 +0100, Jeremy Fitzhardinge wrote:
> On 07/01/2010 04:48 PM, Ian Campbell wrote:
> > On Mon, 2010-06-21 at 12:14 +0100, Jeremy Fitzhardinge wrote:
> >
> >> Subject: [PATCH] xen/netback: make sure all the group structures are
> >> initialized before starting async code
> >>
> >> Split the netbk group array initialization into two phases: one to do
> >> simple "can't fail" initialization of lists, timers, locks, etc; and a
> >> second pass to allocate memory and start the async code.
> >>
> >> This makes sure that all the basic stuff is in place by the time the async
> >> code
> >> is running.
> >>
> > Paul noticed a crash in netback init because...
> >
> >
> >> @@ -1764,16 +1766,6 @@ static int __init netback_init(void)
> >> netbk->netbk_tx_pending_timer.function =
> >> netbk_tx_pending_timeout;
> >>
> >> - netbk->mmap_pages =
> >> - alloc_empty_pages_and_pagevec(MAX_PENDING_REQS);
> >> - if (!netbk->mmap_pages) {
> >> - printk(KERN_ALERT "%s: out of memory\n", __func__);
> >> - del_timer(&netbk->netbk_tx_pending_timer);
> >> - del_timer(&netbk->net_timer);
> >> - rc = -ENOMEM;
> >> - goto failed_init;
> >> - }
> >> -
> >> for (i = 0; i < MAX_PENDING_REQS; i++) {
> >> page = netbk->mmap_pages[i];
> >> SetPageForeign(page, netif_page_release);
> >>
> > ...this dereference of netbk->mmap_pages[i]...
> >
> >
> >> @@ -1786,6 +1778,26 @@ static int __init netback_init(void)
> >>
> > [...]
> >
> >> + netbk->mmap_pages =
> >> + alloc_empty_pages_and_pagevec(MAX_PENDING_REQS);
> >>
> > ... happens before this initialisation of the array.
> >
>
> Hm, I hadn't meant to commit that properly. I had it locally and
> accidentally pushed it out.
>
> I only did that patch as an RFC in response to an issue alluded to by
> Dongxiao (or was it you?) about things not being fully initialized by
> the time the async code starts. Is this a real issue, and if so, what's
> the correct fix?
I don't think there is an actual current issue, just a potential one
since we are relying on data structures being zeroed rather than
properly initialised to keep the async code from running off into the
weeds, it just seemed a little fragile this way.
Originally I said:
>> The crash is in one of the calls to list_move_tail and I think it is
>> because netbk->pending_inuse_head not being initialised until after
>> the
>> threads and/or tasklets are created (I was running in threaded mode).
>> Perhaps even though we are now zeroing the netbk struct those fields
>> should still be initialised before kicking off any potentially
>> asynchronous tasks?
this specific issue was fixed by zeroing the netbk array as it is
allocated, I just thought we could make things more robust by not
triggering the async code until everything was fully setup.
Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|