Hi Steven,
Thanks for your careful review. Some explanations are inline.
Regarding your point about group and interrupt affinity, I will reply in another
thread.
Thanks,
Dongxiao
Steven Smith wrote:
>> I'd like to make an update on these patches. The main logic is not
>> changed, and I only did a rebase towards the upstream pv-ops kernel.
>> See attached patch. The original patches are checked in Jeremy's
>> netback-tasklet branch.
> I have a couple of (quite minor) comments on the patches:
>
> 0001-Netback-Generilize-static-global-variables-into-stru.txt:
>> diff --git a/drivers/xen/netback/netback.c b/drivers/xen/netback/netback.c
>> index c24debf..a484b0a 100644
>> --- a/drivers/xen/netback/netback.c
>> +++ b/drivers/xen/netback/netback.c
>> @@ -49,18 +49,13 @@
>>
>> /*define NETBE_DEBUG_INTERRUPT*/
>>
>> +struct xen_netbk *xen_netbk;
>
>> +int group_nr = 1;
>> +struct page_foreign_tracker *foreign_page_tracker;
> I think these two would benefit from more descriptive names, given
> that they're not static.
Oops... I thought I had changed this when Jeremy commented on it last
time; somehow it slipped through and stayed until today.
I swear I will change this in the next version of the patchset. ;-)
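For the record, the rename I have in mind is roughly the following (the new
names are tentative):

/* Tentative renaming sketch -- final names may differ. */
struct xen_netbk *xen_netbk;
int xen_netbk_group_nr = 1;
struct page_foreign_tracker *netbk_foreign_page_tracker;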
>
>
>
> If I was feeling pedantic, I'd complain that this includes some bits
> of support for multiple struct xen_netbks, rather than just moving all
> of the fields around, which reduces its obviously-correct-ness quite a
> bit.
Actually I struggled with how to split the first patch into smaller ones,
but I didn't find a good way: the patch changes the function interfaces
and data structures, so the corresponding callers have to change as well,
which results in a ~1k-line patch...
>
> Even more pedantically, it might be better to pass around a struct
> xen_netbk in a few places, rather than an int group, so that you get
> better compiler type checking.
I will change this in the next version of the patch.
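Roughly like this, I think (prototypes only; the exact signatures may differ
in the actual patch):

/* Sketch: pass the per-group structure instead of an int index, so the
 * compiler can type-check the callers. */
static void net_rx_action(struct xen_netbk *netbk);
static void net_tx_action(struct xen_netbk *netbk);
static int rx_work_todo(struct xen_netbk *netbk);
static int tx_work_todo(struct xen_netbk *netbk);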
>
>
>
> 0002-Netback-Multiple-tasklets-support.txt:
> Design question: you do your balancing by making each tasklet serve
> roughly equal numbers of remote domains. Would it not have been
> easier to make them serve equal numbers of netfronts? You could then
> get rid of the domain_entry business completely and just have a count
> of serviced netfronts in each xen_netbk, which might be a bit easier
> to deal with.
According to my understanding, a guest with the VNIF driver is represented
by one netfront. Is that true? If so, I don't see the difference between
"number of domains" and "number of netfronts"; I had thought they were the
same. Please correct me if my understanding is wrong.
Actually, in a very early stage of this patch I used a simple method to
decide which group a netfront belongs to, by calculating
(domid % online_cpu_nr()). The advantage is simplicity, but it may leave
the netfront count unbalanced across groups.
I will try to remove the "domain_entry" related code in the next version
of the patch.
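If a plain per-group counter is enough, the grouping could look roughly like
the sketch below (illustration only, locking omitted; the netfront_count
field and the helper names here are invented for the example):

/* Sketch: pick the group currently serving the fewest netfronts,
 * instead of keeping a per-group domain_entry list. */
static void add_netif_to_group(struct xen_netif *netif)
{
	int i;
	int group = 0;
	int min_count = xen_netbk[0].netfront_count;

	for (i = 1; i < group_nr; i++) {
		if (xen_netbk[i].netfront_count < min_count) {
			group = i;
			min_count = xen_netbk[i].netfront_count;
		}
	}
	netif->group = group;
	xen_netbk[group].netfront_count++;
}

static void remove_netif_from_group(struct xen_netif *netif)
{
	xen_netbk[netif->group].netfront_count--;
}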
>
>> diff --git a/drivers/xen/netback/interface.c b/drivers/xen/netback/interface.c
>> index 21c1f95..bdf3c1d 100644
>> --- a/drivers/xen/netback/interface.c
>> +++ b/drivers/xen/netback/interface.c
>> @@ -54,6 +54,59 @@
>> static unsigned long netbk_queue_length = 32;
>> module_param_named(queue_length, netbk_queue_length, ulong, 0644);
>>
>>
>> +static void remove_domain_from_list(struct xen_netbk *netbk,
>> + struct xen_netif *netif)
>> +{
>> + struct domain_entry *dom_entry = NULL;
>> + int group = netif->group;
>> +
>> + list_for_each_entry(dom_entry,
>> + &netbk[group].group_domain_list, dom) {
>> + if (dom_entry->domid == netif->domid)
>> + break;
>> + }
>> + if (!dom_entry)
>> + return;
> Can this ever happen? If so, you might have issues when several
> netfronts all end in the same frontend domain e.g.:
>
> -- netfront A arrives and is added to list
> -- netfront B arrives and is added to list
> -- netfront B is removed
> -- netfront B is removed again. It's no longer in the list,
> but A is, so A gets removed instead.
>
> The end result being that netback thinks the group is idle, but it
> actually has netfront A in it. I guess the worst that can happen is
> that you don't balance across tasklets properly, but it'd still be
> better avoided.
>
> If it *can't* happen, there should probably be some kind of warning
> when it does.
>
>> +
>> + spin_lock(&netbk[netif->group].group_domain_list_lock);
>> + netbk[netif->group].group_domain_nr--;
>> + list_del(&dom_entry->dom);
>> + spin_unlock(&netbk[netif->group].group_domain_list_lock);
>> + kfree(dom_entry);
>> +}
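One more note on the lookup above: after list_for_each_entry() the cursor is
never NULL, so the !dom_entry check cannot actually fire. If the list walk
stays, it would need a separate "found" pointer plus a warning, roughly like
this (sketch based on the quoted code):

static void remove_domain_from_list(struct xen_netbk *netbk,
				    struct xen_netif *netif)
{
	struct domain_entry *dom_entry, *found = NULL;
	int group = netif->group;

	spin_lock(&netbk[group].group_domain_list_lock);
	list_for_each_entry(dom_entry,
			    &netbk[group].group_domain_list, dom) {
		if (dom_entry->domid == netif->domid) {
			found = dom_entry;
			break;
		}
	}
	if (!found) {
		spin_unlock(&netbk[group].group_domain_list_lock);
		WARN(1, "netif for domain %u not in group %d list\n",
		     netif->domid, group);
		return;
	}
	netbk[group].group_domain_nr--;
	list_del(&found->dom);
	spin_unlock(&netbk[group].group_domain_list_lock);
	kfree(found);
}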
>> +
>> static void __netif_up(struct xen_netif *netif)
>> {
>> enable_irq(netif->irq);
>>
>> @@ -70,6 +123,7 @@ static int net_open(struct net_device *dev)
>> {
>> struct xen_netif *netif = netdev_priv(dev);
>> if (netback_carrier_ok(netif)) {
>> + add_domain_to_list(xen_netbk, group_nr, netif);
>> __netif_up(netif);
>> netif_start_queue(dev);
>> }
>> @@ -79,8 +133,10 @@ static int net_open(struct net_device *dev)
>> static int net_close(struct net_device *dev)
>> {
>> struct xen_netif *netif = netdev_priv(dev);
>> - if (netback_carrier_ok(netif))
>> + if (netback_carrier_ok(netif)) {
>> __netif_down(netif);
>> + remove_domain_from_list(xen_netbk, netif);
>> + }
>> netif_stop_queue(dev);
>> return 0;
>> }
> Okay, so if the interface gets down'd and then up'd it'll potentially
> move to a different group. How much testing has that situation had?
>
> I'd be tempted to add the interface to the list as soon as it's
> created and then leave it there until it's removed, and not rebalance
> during its lifetime at all, and hence avoid the issue.
>
>> @@ -1570,6 +1570,7 @@ static int __init netback_init(void)
>> /* We can increase reservation by this much in net_rx_action(). */
>> // balloon_update_driver_allowance(NET_RX_RING_SIZE);
>>
>> + group_nr = num_online_cpus();
>> xen_netbk = kzalloc(group_nr * sizeof(struct xen_netbk),
>> GFP_KERNEL);
>> if (!xen_netbk) {
>> printk(KERN_ALERT "%s: out of memory\n", __func__);
> What happens if the number of online CPUs changes while netback is
> running? In particular, do we stop trying to send a tasklet/thread to
> a CPU which has been offlined?
group_nr just defines the maximum number of tasklets; it doesn't decide
which tasklet is handled by which CPU. That is determined by where the
interrupt is delivered.
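As a minimal illustration of that (generic kernel code, not netback; the
names are invented):

#include <linux/kernel.h>
#include <linux/interrupt.h>
#include <linux/smp.h>

/* A tasklet runs on whichever CPU scheduled it, i.e. the CPU that
 * received the interrupt. */
static void demo_tasklet_func(unsigned long data)
{
	pr_info("tasklet ran on CPU %d\n", smp_processor_id());
}

static DECLARE_TASKLET(demo_tasklet, demo_tasklet_func, 0);

static irqreturn_t demo_irq_handler(int irq, void *dev_id)
{
	/* tasklet_schedule() queues onto the local CPU's softirq list,
	 * so the tasklet runs where this interrupt was delivered. */
	tasklet_schedule(&demo_tasklet);
	return IRQ_HANDLED;
}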
>
>
> 0003-Use-Kernel-thread-to-replace-the-tasklet.txt:
>> diff --git a/drivers/xen/netback/netback.c b/drivers/xen/netback/netback.c
>> index 773cd4f..8b55efc 100644
>> --- a/drivers/xen/netback/netback.c
>> +++ b/drivers/xen/netback/netback.c
>> +static int netbk_action_thread(void *index)
>> +{
>> + int group = (long)index;
>> + struct xen_netbk *netbk = &xen_netbk[group];
>> + while (1) {
>> + wait_event_interruptible(netbk->netbk_action_wq,
>> + rx_work_todo(group)
>> + || tx_work_todo(group));
>> + cond_resched();
>> +
>> + if (rx_work_todo(group))
>> + net_rx_action(group);
>> +
>> + if (tx_work_todo(group))
>> + net_tx_action(group);
>> + }
> Hmm... You use kthread_stop() on this thread, so you should probably
> test kthread_should_stop() every so often.
OK, I will modify it.
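Something like the following, based on the quoted loop; the extra
kthread_should_stop() checks are the only change:

static int netbk_action_thread(void *index)
{
	int group = (long)index;
	struct xen_netbk *netbk = &xen_netbk[group];

	while (!kthread_should_stop()) {
		wait_event_interruptible(netbk->netbk_action_wq,
				rx_work_todo(group) ||
				tx_work_todo(group) ||
				kthread_should_stop());
		cond_resched();

		if (kthread_should_stop())
			break;

		if (rx_work_todo(group))
			net_rx_action(group);

		if (tx_work_todo(group))
			net_tx_action(group);
	}

	return 0;
}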
>
>> +
>> + return 0;
>> +}
>> +
>> +
>> static int __init netback_init(void)
>> {
>> int i;
>
>
> Apart from that, it all looks fine to me.
>
> Steven.