>>> On 12/7/2009 at 3:04 AM, in message
Keir Fraser <keir.fraser@xxxxxxxxxxxxx> wrote:
> On 06/12/2009 23:15, "Ky Srinivasan" <ksrinivasan@xxxxxxxxxx> wrote:
>>> This looks like a hack. I think you'd get most of the benefit by simply
>>> always calling net_tx_action_dealloc() at the top of net_tx_action() (i.e.,
>>> remove the dealloc_cons!=dealloc_prod conditional). I think the presence of
>>> that conditional is actually a bug, and removing it should get interface
>>> shutdowns to a reasonable time with no other changes.
>> That was my first impression as well and was the first patch we tested. It
>> does not work. Take the case where we have only one netif that netback is
>> handling that gets into this weird state, then we would never call
>> net_tx_action_dealloc() if the carrier is turned off on the netif that is
>> being cleaned up. This patch, fixes that problem by forcefully scheduling
>> netif whose carrier has been turned off.
> I think that this is simply another bug. There is an early 'return' in
> net_tx_action() which should actually goto the end of the function which
> checks the pending list and conditionally calls mod_timer(). Otherwise the
> pending list can get partially flushed but the timer doesn't get re-mod()ed.
I suspect you are referring to the conditional early return ( if
(mop==tx_map_ops)) . Anna Fischer of HP has a test environment where we can see
this problem most easily. I will give her a netback with these changes and
report back on the cleanup times.
>> In the interest of minimizing the
>> code changes, I went about making these changes one at a time, and the
>> combination of what I have in the patch appears to reduce the cleanup time
>> most. Even with these hacks, I am not particularly happy about the maximum
>> time we have seen the cleanup take, and for this duration the physical host
>> unusable for launching new guests. So, as I noted in my earlier email, we
>> still want to not tie up the xenbus thread for the duration of this cleanup.
> Perhaps we could decouple the whole thing from xenbus thread. In any case,
> the above bug fixes are to me preferable to the original patch, and will get
> the orders-of-magnitude low-hanging fruit (delay xenbus thread for maybe a
> second, rather than hours!).
> -- Keir
Xen-devel mailing list