On Wed, 2010-02-24 at 20:47 -0500, Daniel Stodden wrote:
> On Wed, 2010-02-24 at 19:37 -0500, Jeremy Fitzhardinge wrote:
> > On 02/24/2010 04:29 PM, Daniel Stodden wrote:
> > > On Wed, 2010-02-24 at 18:52 -0500, Jeremy Fitzhardinge wrote:
> > >
> > >> On 02/24/2010 03:49 PM, Daniel Stodden wrote:
> > >>
> > >>> On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:
> > >>>
> > >>>
> > >>>> When rebooting the machine, I got this crash from blktap. The RIP
> > >>>> maps to line 262 of request.c:
> > >>>> 0xffffffff812548a1 is in blktap_request_pool_free
> > >>>> (/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).
> > >>>>
> > >>>>
> > >>> Uhm, where did that RIP come from?
> > >>>
> > >>> pool_free is on the module exit path. The stack trace below looks like a
> > >>> crash from the broadcasted SIGTERM before reboot.
> > >>>
> > >>>
> > >> Ignore it; I generated it from a different kernel from the one that
> > >> crashed. But the other oops I posted should be all consistent and
> > >> meaningful.
> > >>
> > > Ignore only the debuginfo quote, right?
> > > Cos this looks like a different issue to me.
> > >
> >
> > Perhaps. I got all the others on normal domain shutdown, but this one
> > was on machine reboot. I'll try to repro (as I boot the test kernel
> > with your patch in it).
>
> (gdb) list *(blktap_device_restart+0x7a)
> 0x2a73 is in blktap_device_restart
> (/local/exp/dns/scratch/xenbits/xen-unstable.hg/linux-2.6-pvops.git/drivers/xen/blktap/device.c:920).
> 915         /* Re-enable calldowns. */
> 916         if (blk_queue_stopped(dev->gd->queue))
> 917                 blk_start_queue(dev->gd->queue);
> 918
> 919         /* Kick things off immediately. */
> 920         blktap_device_do_request(dev->gd->queue);
> 921
> 922         spin_unlock_irq(&dev->lock);
> 923 }
> 924
>
> My assumption is that we've been dereferencing a NULL gendisk, i.e.
> device_destroy racing against device_restart.
>
> That would take:
>
> * Tapdisk getting killed on the other thread, which ends up in a
>   device_restart(). That is what your stacktrace shows.
>
> * A device removal pending, blocking until device->users drops
>   to 0 and then doing the device_destroy(). That might have
>   happened during the bdev .release.
>
> Both running at the same time sounds like what happens if you kill them
> all at once.
>
> That clearly takes another patch then.
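
For illustration only, the race described above can be modelled in
userspace. This is a rough sketch and not the contents of fix2.diff: the
fake_tap and fake_disk types and both functions are hypothetical stand-ins,
and a pthread mutex plays the role of dev->lock. The assumption is that the
restart path checks the disk pointer under the same lock the destroy path
holds while clearing it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not the blktap types. */
struct fake_disk {
    int kicks;              /* how often the request queue was kicked */
};

struct fake_tap {
    pthread_mutex_t lock;   /* plays the role of dev->lock */
    struct fake_disk *gd;   /* NULL once the device has been destroyed */
};

/* Restart path: only touch the disk if it still exists. */
static bool fake_device_restart(struct fake_tap *dev)
{
    bool kicked = false;

    assert(dev != NULL);
    pthread_mutex_lock(&dev->lock);
    if (dev->gd != NULL) {
        dev->gd->kicks++;   /* stands in for kicking the queue */
        kicked = true;
    }
    pthread_mutex_unlock(&dev->lock);
    return kicked;
}

/* Destroy path: detach the disk under the same lock and return it,
 * so the caller can release it once no restart can see it anymore. */
static struct fake_disk *fake_device_destroy(struct fake_tap *dev)
{
    struct fake_disk *gd;

    assert(dev != NULL);
    pthread_mutex_lock(&dev->lock);
    gd = dev->gd;
    dev->gd = NULL;
    pthread_mutex_unlock(&dev->lock);
    return gd;
}
```

With this ordering, a restart that loses the race finds the pointer
already cleared and backs off instead of dereferencing it. Whether the
real driver can take dev->lock on both paths is a separate question.
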
Jeremy,

can you try out the attached patch for me?
This should close the above shutdown race as well. It should be nowhere
near as frequent as the timer_sync crash fixed earlier.

Thanks,
Daniel
fix2.diff
Description: Text Data