RE: [Xen-devel][PV-ops][PATCH] Netback: Fix PV network issue for

To:	Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Subject:	RE: [Xen-devel][PV-ops][PATCH] Netback: Fix PV network issue for netback multiple threads patchset
From:	"Xu, Dongxiao" <dongxiao.xu@xxxxxxxxx>
Date:	Thu, 17 Jun 2010 16:16:27 +0800
Accept-language:	en-US
Cc:	Jeremy Fitzhardinge <jeremy@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "djmagee@xxxxxxxxxxxx" <djmagee@xxxxxxxxxxxx>, Fantu <fantonifabio@xxxxxxxxxx>
Delivery-date:	Thu, 17 Jun 2010 01:17:58 -0700
Envelope-to:	www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to:	<1276248930.19091.2870.camel@xxxxxxxxxxxxxxxxxxxxxx>
List-help:	<mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id:	Xen developer discussion <xen-devel.lists.xensource.com>
List-post:	<mailto:xen-devel@lists.xensource.com>
List-subscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References:	<D5AB6E638E5A3E4B8F4406B113A5A19A1F205536@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <1276248930.19091.2870.camel@xxxxxxxxxxxxxxxxxxxxxx>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index:	AcsJSXfaaTruyhj4Sqa0ftXTMGHO4QEjq1LQ
Thread-topic:	[Xen-devel][PV-ops][PATCH] Netback: Fix PV network issue for netback multiple threads patchset

Ian,

Sorry for the late response; I was on vacation for the past few days.

Ian Campbell wrote:
> On Thu, 2010-06-10 at 12:48 +0100, Xu, Dongxiao wrote:
>> Hi Jeremy,
>> 
>> The attached patch should fix the PV network issue after applying
>> the netback multiple threads patchset. 
> 
> Thanks for this, Dongxiao. Do you think this crash was a potential
> symptom of this issue? It does seem to go away if I apply your patch.

Actually, I see the same symptom on my side without the fix applied.

>         BUG: unable to handle kernel paging request at 70000027
>         IP: [<c0294867>] make_tx_response+0x17/0xd0
>         *pdpt = 0000000000000000
>         Oops: 0000 [#2] SMP
>         last sysfs file:
>         Modules linked in:
>         Supported: Yes
> 
>         Pid: 1083, comm: netback/0 Tainted: G      D   (2.6.27.45-0.1.1-x86_32p-xen #222)
>         EIP: 0061:[<c0294867>] EFLAGS: 00010296 CPU: 0
>         EIP is at make_tx_response+0x17/0xd0
>         EAX: 6fffffff EBX: 00000000 ECX: 00000000 EDX: f00610a4
>         ESI: 6fffffff EDI: f00620a4 EBP: ed0c3f18 ESP: ed0c3f0c
>          DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: e021
>         Process netback/0 (pid: 1083, ti=ed0c2000 task=ee9de070 task.ti=ed0c2000)
>         Stack: 00000000 00000000 f00620a4 ed0c3fa8 c029676a c0456000 ee9de070 ed0c3fd0
>                ed0c3f94 00000002 ed0c3fb8 f0062ca4 f0061000 6fffffff 011d9000 f00620a4
>                f006108c ed0c3f5c c04ffb00 c04ffb00 ed0c3fc0 ed0c3fbc ed0c3fb8 ed0c2000
>         Call Trace:
>          [<c029676a>] ? net_tx_action+0x32a/0xa50
>          [<c0296f62>] ? netbk_action_thread+0x62/0x190
>          [<c0296f00>] ? netbk_action_thread+0x0/0x190
>          [<c013f84c>] ? kthread+0x3c/0x70
>          [<c013f810>] ? kthread+0x0/0x70
>          [<c0105633>] ? kernel_thread_helper+0x7/0x10
>          =======================
>         Code: ec 8d 41 01 89 47 2c c7 45 e4 ea ff ff ff eb dd 8d 74
>         26 00 55 66 0f be c9 89 e5 83 ec 0c 89 74 24 04 89 c6 89 1c
>         24 89 7c 24 08 <8b> 78 28 8b 40 30 0f b7 5a 08 83 e8 01 21 f8
>         8d 04 40 c1 e0 02
>         EIP: [<c0294867>] make_tx_response+0x17/0xd0 SS:ESP e021:ed0c3f0c
>         ---[ end trace f7e370bf10f6f981 ]---
> 
> The crash is in one of the calls to list_move_tail and I think it is
> because netbk->pending_inuse_head is not initialised until after the
> threads and/or tasklets are created (I was running in threaded mode).
> Perhaps even though we are now zeroing the netbk struct those fields
> should still be initialised before kicking off any potentially
> asynchronous tasks?

You are right; I will submit another patch to fix it.
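
The ordering point above can be illustrated with a small userspace
sketch. This is not the netback code or the follow-up patch: struct
netbk_sketch, tx_work_todo_sketch() and start_group() are hypothetical
names, the list helpers are minimal stand-ins for the kernel's
struct list_head, INIT_LIST_HEAD() and list_empty(), and a pthread
stands in for the kthread. It assumes the "work to do" check is based
on list emptiness. The sketch shows that a zeroed list head does not
count as empty, so a worker started before the heads are initialised
can see a false positive.

```c
#include <pthread.h>
#include <string.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

/* An empty list is one whose head points back at itself. */
static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

/* Hypothetical, cut-down stand-in for the per-group netback state. */
struct netbk_sketch {
    struct list_head pending_inuse_head;
    struct list_head net_schedule_list;
    int worker_saw_work;
};

/* Hypothetical analogue of a list-based "is there work to do" check. */
static int tx_work_todo_sketch(const struct netbk_sketch *nb)
{
    return !list_empty(&nb->net_schedule_list);
}

/* The worker records what the check reported when it first ran. */
static void *worker(void *arg)
{
    struct netbk_sketch *nb = arg;
    nb->worker_saw_work = tx_work_todo_sketch(nb);
    return NULL;
}

/*
 * Zero the struct, initialise every list head, and only then start the
 * asynchronous worker. Returns what the worker observed, or -1 if the
 * thread could not be created.
 */
static int start_group(struct netbk_sketch *nb)
{
    pthread_t t;

    memset(nb, 0, sizeof(*nb));
    init_list_head(&nb->pending_inuse_head);
    init_list_head(&nb->net_schedule_list);

    if (pthread_create(&t, NULL, worker, nb) != 0)
        return -1;
    pthread_join(t, NULL);
    return nb->worker_saw_work;
}
```

With only the memset, next is NULL rather than a pointer to the head
itself, so the check reports work on a zeroed struct. With the heads
initialised first, the worker sees an empty list.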

> 
> I didn't even start any guests so I think we only got to the reference
> to pending_inuse_head because tx_work_todo can return a false positive
> if netbk is not properly zeroed and therefore we can call
> net_tx_action before we are ready.
> 
> On an unrelated note, do you have any plans to make the number of
> groups react dynamically to CPU hotplug? Not necessarily while there
> are actually active VIFs (might be tricky to get right) but perhaps
> only when netback is idle (i.e. when there are no VIFs configured),
> since often the dynamic adjustment of VCPUs happens at start of day to
> reduce the domain 0 VCPU allocation from the total number of cores in
> the machine to something more manageable.

I'm sorry, I am currently busy with other tasks and may not have time
to work on this.

However, if the goal is to reduce the number of dom0 VCPUs, keeping the
number of groups unchanged will not hurt performance: the group count
determines the number of tasklets/kthreads, and it is not directly tied
to the number of dom0 VCPUs.

Thanks,
Dongxiao


> 
> Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
