On 07/21/2010 03:26 PM, Daniel Stodden wrote:
> On Wed, 2010-07-21 at 17:39 -0400, Jeremy Fitzhardinge wrote:
>
>> On 07/21/2010 02:12 PM, Daniel Stodden wrote:
>>
>>> On Wed, 2010-07-21 at 15:49 -0400, Jeremy Fitzhardinge wrote:
>>>
>>>
>>>> When I was preparing the latest set of blkfront patches to send upstream
>>>> to Jens Axboe, he pointed out there were conflicts with what he
>>>> currently has queued.
>>>>
>>>> It turns out the conflict was from pushing the BKL (lock/unlock_kernel)
>>>> into the open and release functions. I did the merge, keeping them
>>>> wrapped around all the new stuff you added to those functions, but I
>>>> wonder if it's actually necessary. Do we rely on open/release being
>>>> globally serialized in there?
>>>>
>>>> I've pushed what I have into the upstream/blkfront branch in xen.git.
>>>>
>>>>
>>> Whatever it was, the BKL presumably fixed it.
>>>
>>>
>> There's an ongoing project to remove the BKL; part of that is to remove
>> implicit use of the BKL from the core kernel and push uses down to
>> drivers which need it. That basically means mechanically adding
>> lock_kernel/unlock_kernel pairs to driver functions as they're removed
>> from the core kernel. blkfront got hit with that at some point (haven't
>> identified precisely where), so we have the option to remove those
>> lock/unlocks if they're not necessary.
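>>
>> Concretely, the mechanical change looks roughly like this (a sketch of
>> the idea, assuming the usual open/release signatures, not the actual
>> commit):
>>
>> 	static int blkif_open(struct block_device *bdev, fmode_t mode)
>> 	{
>> 		lock_kernel();	/* BKL pushed down from the core open path */
>> 		/* ... whatever the open logic did before, unchanged ... */
>> 		unlock_kernel();
>> 		return 0;
>> 	}
>>
>> 	static int blkif_release(struct gendisk *disk, fmode_t mode)
>> 	{
>> 		lock_kernel();
>> 		/* ... existing release logic ... */
>> 		unlock_kernel();
>> 		return 0;
>> 	}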
>>
> Ah, understood. But for a while I used to dig through the bdev
> open/release path quite regularly, and I was never aware of any big
> kernel lock use on that path to push down.
>
> Do you think it's worth looking into the now-removed lock usage in more
> detail? I'd take a hint regarding which code was affected.
>
>
>>> Anyway, it should not be necessary any more.
>>>
>>> Next I made the common mistake of looking into my code again, and
>>> immediately found I would send an unlucky opener transiently spinning in
>>> blkdev_get. Sometimes I wonder what I'm thinking.
>>>
>>>
>> What's the effect of that?
>>
> Uhm, false alarm. Better just ignore me.
>
> I stuck that ERESTARTSYS stuff into blkif_open. It's not strictly
> needed, but looked like a good idea. I picked it up from the device
> mapper code.
>
> It's telling the block layer to retry if it managed to get a ref on a
> disk which just happened to be del_gendisk'd on a different thread. So
> it will retry and either succeed with a new disk, or fail right on the
> next attempt.
>
> I would have bet there would be a case where blkfront clears
> disk->private_data before removing the disk. But apparently not.
> Otherwise a thread hitting that phase would peg the CPU until the
> removal succeeded elsewhere.
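>
> For reference, the idiom looks roughly like this (a sketch; the NULL
> check stands in for the hypothetical "private_data cleared before
> removal" case, which blkfront as it stands never actually hits):
>
> 	static int blkif_open(struct block_device *bdev, fmode_t mode)
> 	{
> 		struct blkfront_info *info = bdev->bd_disk->private_data;
>
> 		/* Our disk ref raced with removal: returning
> 		 * -ERESTARTSYS makes blkdev_get drop the ref and
> 		 * retry, so the opener either finds a fresh disk or
> 		 * fails cleanly on the next attempt. */
> 		if (!info)
> 			return -ERESTARTSYS;
>
> 		return 0;
> 	}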
>
> Sorry for the noise.
>
>
>>> Hold on for a patch to both.
>>>
> So rather not. As far as I can tell, feel free to just drop the BKL
> code.
>
>
>> Thanks. BTW, did you get a chance to look into those barrier comments?
>>
> Nope, sorry. In fact I dropped that one. Getting more urgent?
>
Well, it would be nice to address it for the next merge window, if
there's something to address.

There's also a mysterious IO-related hang I've been seeing moderately
frequently. Something will end up waiting on IO while holding a lock,
and eventually the system grinds to a halt as more and more things want
to do block IO.

What appears to be happening is that an IO request is getting lost
somewhere, either because blkfront is dropping it or perhaps
blkback/tap is (but this has been observed on 2.6.18-based systems too,
so I suspect it's a frontend issue). Perhaps an IO request really is
getting lost, or perhaps it's just a missed event notification; I'm not
sure. Or maybe there's something screwy with barriers.
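
For context, the spot where an event notification could plausibly go
missing is the notify-after-push step of blkfront's ring protocol,
roughly this shape (a sketch based on the standard Xen ring macros,
not the exact code):

	static void flush_requests(struct blkfront_info *info)
	{
		int notify;

		/* Publish the queued requests and check whether the
		 * backend has asked to be notified at this point. */
		RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&info->ring, notify);

		if (notify)
			notify_remote_via_irq(info->irq);

		/* If this producer/event handshake is ever misordered,
		 * a request can sit unnoticed until the next kick,
		 * which from above looks exactly like lost IO. */
	}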

Tom Kopek and Alexander McKinney seem to be able to reproduce it
reliably, and have a blktrace of a hang showing a bunch of outstanding
IO.
Anyway, I'd been meaning to mention it to you before...
J