| 
>>> On 12/6/2009 at  4:44 PM, in message 
>>> <C741D648.3B7A%keir.fraser@xxxxxxxxxxxxx>,
Keir Fraser <keir.fraser@xxxxxxxxxxxxx> wrote: 
> This looks like a hack. I think you'd get most of the benefit by simply
> always calling net_tx_action_dealloc() at the top of net_tx_action() (i.e.,
> remove the dealloc_cons!=dealloc_prod conditional). I think the presence of
> that conditional is actually a bug, and removing it should get interface
> shutdowns to a reasonable time with no other changes.
That was my first impression as well and was the first patch we tested. It does 
not work. Take the case where we have only one netif that netback is handling 
that gets into this weird state, then we would never call 
net_tx_action_dealloc() if the carrier is turned off on the netif that is being 
cleaned up. This patch, fixes that problem by forcefully scheduling the netif 
whose carrier has been turned off. In the interest of minimizing the code 
changes, I went about making these changes one at a time, and the combination 
of what I have in the patch appears to reduce the cleanup time the most. Even 
with these hacks, I am not particularly happy about the maximum time we have 
seen the cleanup take, and for this duration the physical host is unusable for 
launching new guests. So, as I noted in my earlier email, we may still want to 
not tie up the xenbus thread for the duration of this cleanup.
Regards,
K. Y
> 
>  -- Keir
> 
> On 06/12/2009 17:18, "Ky Srinivasan" <ksrinivasan@xxxxxxxxxx> wrote:
> 
>> We have seen cases where,  when a guest is shutdown during heavy network
>> traffic, the backend vif hangs around for considerable amounts of time 
> holding
>> up the xenbus thread for the duration of the cleanup. We have seen cases 
> (HP,
>> Novell) where it has taken hours to cleanup the interface and for that
>> duration we cannot launch new guests. We have traced this problem to the 
> fact
>> that the front-end can shutdown with transmit resources still in transit on
>> the back-end side. On the back-end side, the moment the carrier is turned off
>> on the interface, the interface is not eligible for further scheduling in 
> the
>> backend. Since turning off the carrier happens completely asynchronously 
> with
>> the rest of the packet processing in the back-end (via tasklets today), we 
> can
>> have situations where the vif cannot be cleaned up for a significant amount 
> of
>> time.
>> 
>> I am attaching a patch for addressing this issue. I have tried to fix the
>> problem with minimal changes to the current code. The main idea is to 
> cleanup
>> the guest transmit resources quickly and aggressively. With this patch, the
>> cleanup time has been dramatically reduced. Even in this patch, the xenbus
>> thread is held up for the duration of the cleanup - however we attempt to
>> cleanup aggressively. This is in comparison to the current code where the
>> xenbus thread just waits for the refcnt  on the interface to drop to zero. 
> An
>> obvious improvement would be to not tie up the xenbus thread for the 
> duration
>> of cleaning up the interface. If there is interest, I can submit a patch for
>> doing this. This patch has been tested on 2.6.16 (SLES10 sp2, sp3 kernels) 
> as
>> well as 2.6.27 (sles11) kernel. This  patch is however against the
>> 2.6.18-xen.hg sources.
>> 
>> Signed-off-by: K. Y. Srinivasan <ksrinivasan@xxxxxxxxxx>
>> 
>> 
>> 
>> Regards,
>> 
>> K. Y
>> 
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
 |