
Re: [Xen-devel] [PATCH] Unmatched decrementing of net device reference count

On Wed, Dec 20, 2006 at 10:58:22AM +1100, Herbert Xu wrote:
> Glauber de Oliveira Costa <gcosta@xxxxxxxxxx> wrote:
> > 
> > This bug was found while heavily stressing the netfront 
> > attach/detach mechanism with the following script:
> > 
> >   for i in $(seq 200); 
> >   do 
> >     xm network-attach <domid>;  
> >     xm network-detach <domid> $i;
> >   done
> > 
> > Guest kernel shows the following messages:
> > 
> > unregister_netdevice: waiting for eth1 to become free. Usage count = -1
> > 
> > With this patch applied, it ran fine over multiple iterations.
> Could you please use in-line patches? It's much easier to comment on.
It is. I could swear I inlined it, but maybe I forgot.
> Your patch description doesn't make sense.  unregister_netdev()
> cannot possibly cause the device to be freed.  Otherwise the
> subsequent free_netdev() call which you kept would be wrong.

Indeed. I read it again, and it was confusing (I was confused myself).

Let me rephrase. I dug deeper and cleared things up, so this should be
more precise now:

unregister_netdev() works as a barrier in this case. The call to
netif_disconnect_backend() triggers a new carrier watch event, which takes
a reference on the device (dev_hold()) that is only dropped at some later
time (dev_put()). If we call free_netdev() right after that, the dev_put()
may run after the free. Nothing prevents that memory region from having
been reallocated to another

unregister_netdev() takes the RTNL lock. When the lock is released,
netdev_run_todo() (the device is put on the todo list by
unregister_netdev() itself, via net_set_todo()) calls
netdev_wait_allrefs(), which runs the link watch queue
(linkwatch_run_queue()). Calling unregister_netdev() between the carrier
watch and free_netdev() therefore guarantees that the device is only
freed after the pending watch events have been handled.
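To picture the ordering argument above, here is a rough userspace model. It is only a sketch under loud assumptions: every model_* name is hypothetical and stands in for dev_hold()/dev_put(), the link watch queue and unregister_netdev(); it is not kernel code and ignores locking and real concurrency. It shows that freeing right after the carrier event leaves a queued reference outstanding, while draining the queue first does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_netdev {
    int refcnt;   /* starts at 1: the driver's own reference */
    bool freed;
};

#define MODEL_MAX_EVENTS 8

/* Stand-in for the link watch queue: each pending event pins the device. */
struct model_event_queue {
    struct model_netdev *pending[MODEL_MAX_EVENTS];
    int n;
};

static void model_hold(struct model_netdev *d)
{
    assert(!d->freed);           /* a hold after free would be a bug */
    d->refcnt++;
}

static void model_put(struct model_netdev *d)
{
    assert(!d->freed);           /* a put after free is the race in question */
    d->refcnt--;
}

/* Like a carrier change on disconnect: queue an event holding a reference. */
static void model_fire_carrier_event(struct model_event_queue *q,
                                     struct model_netdev *d)
{
    assert(q->n < MODEL_MAX_EVENTS);
    model_hold(d);
    q->pending[q->n++] = d;
}

/* Like running the link watch queue: each handled event drops its reference. */
static void model_run_queue(struct model_event_queue *q)
{
    for (int i = 0; i < q->n; i++)
        model_put(q->pending[i]);
    q->n = 0;
}

/* The "barrier": drain pending events, then expect only the owner's ref. */
static void model_unregister(struct model_event_queue *q,
                             struct model_netdev *d)
{
    model_run_queue(q);
    assert(d->refcnt == 1);
}

/* Freeing is only safe when no queued event still holds the device. */
static bool model_safe_to_free(const struct model_event_queue *q,
                               const struct model_netdev *d)
{
    return q->n == 0 && d->refcnt == 1;
}

static void model_free(struct model_netdev *d)
{
    d->refcnt = 0;
    d->freed = true;
}

/* Disconnect, then free immediately: the queued event still holds a ref. */
static bool model_free_without_barrier_is_unsafe(void)
{
    struct model_netdev dev = { .refcnt = 1, .freed = false };
    struct model_event_queue q = { .n = 0 };
    model_fire_carrier_event(&q, &dev);
    return !model_safe_to_free(&q, &dev);
}

/* Disconnect, "unregister" (drains the queue), then free. */
static bool model_free_with_barrier_is_safe(void)
{
    struct model_netdev dev = { .refcnt = 1, .freed = false };
    struct model_event_queue q = { .n = 0 };
    model_fire_carrier_event(&q, &dev);
    model_unregister(&q, &dev);
    bool safe = model_safe_to_free(&q, &dev);
    model_free(&dev);
    return safe;
}
```

In the model, the unsafe path still has one queued event (and an extra reference) at free time; the barrier path drains the queue first, so only the owner's reference remains when the device is freed, which mirrors what unregister_netdev() buys via netdev_wait_allrefs().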

There are most probably other ways to guarantee that, such as calling
linkwatch_run_queue() directly. But I think we lose nothing by calling
unregister_netdev() in between, and we gain serialization for

> So most likely what's happening is that free_netdev() is occurring
> without a preceding unregister_netdev(), which implies that there
> is a bug in the frontend state transition.

That is not the case; see above.
Glauber de Oliveira Costa
Red Hat Inc.
"Free as in Freedom"

Xen-devel mailing list