WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-devel

Re: [Xen-devel][PV-ops][PATCH] Netback: Fix PV network issue for netback

To: "Xu, Dongxiao" <dongxiao.xu@xxxxxxxxx>
Subject: Re: [Xen-devel][PV-ops][PATCH] Netback: Fix PV network issue for netback multiple threads patchset
From: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Date: Fri, 11 Jun 2010 10:35:30 +0100
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Fantu <fantonifabio@xxxxxxxxxx>, Pasi, "djmagee@xxxxxxxxxxxx" <djmagee@xxxxxxxxxxxx>
Delivery-date: Fri, 11 Jun 2010 02:36:29 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <D5AB6E638E5A3E4B8F4406B113A5A19A1F205536@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Organization: Citrix Systems, Inc.
References: <D5AB6E638E5A3E4B8F4406B113A5A19A1F205536@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
On Thu, 2010-06-10 at 12:48 +0100, Xu, Dongxiao wrote:
> Hi Jeremy,
> 
> The attached patch should fix the PV network issue after applying the netback 
> multiple threads patchset.

Thanks for this, Dongxiao. Do you think this crash was a potential symptom
of this issue? It does seem to go away if I apply your patch.
        BUG: unable to handle kernel paging request at 70000027
        IP: [<c0294867>] make_tx_response+0x17/0xd0
        *pdpt = 0000000000000000
        Oops: 0000 [#2] SMP
        last sysfs file:
        Modules linked in:
        Supported: Yes
        
        Pid: 1083, comm: netback/0 Tainted: G      D   (2.6.27.45-0.1.1-x86_32p-xen #222)
        EIP: 0061:[<c0294867>] EFLAGS: 00010296 CPU: 0
        EIP is at make_tx_response+0x17/0xd0
        EAX: 6fffffff EBX: 00000000 ECX: 00000000 EDX: f00610a4
        ESI: 6fffffff EDI: f00620a4 EBP: ed0c3f18 ESP: ed0c3f0c
         DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: e021
        Process netback/0 (pid: 1083, ti=ed0c2000 task=ee9de070 task.ti=ed0c2000)
        Stack: 00000000 00000000 f00620a4 ed0c3fa8 c029676a c0456000 ee9de070 ed0c3fd0
               ed0c3f94 00000002 ed0c3fb8 f0062ca4 f0061000 6fffffff 011d9000 f00620a4
               f006108c ed0c3f5c c04ffb00 c04ffb00 ed0c3fc0 ed0c3fbc ed0c3fb8 ed0c2000
        Call Trace:
         [<c029676a>] ? net_tx_action+0x32a/0xa50
         [<c0296f62>] ? netbk_action_thread+0x62/0x190
         [<c0296f00>] ? netbk_action_thread+0x0/0x190
         [<c013f84c>] ? kthread+0x3c/0x70
         [<c013f810>] ? kthread+0x0/0x70
         [<c0105633>] ? kernel_thread_helper+0x7/0x10
         =======================
        Code: ec 8d 41 01 89 47 2c c7 45 e4 ea ff ff ff eb dd 8d 74 26 00 55 66 0f be c9 89 e5 83 ec 0c 89 74 24 04 89 c6 89 1c 24 89 7c 24 08 <8b> 78 28 8b 40 30 0f b7 5a 08 83 e8 01 21 f8 8d 04 40 c1 e0 02
        EIP: [<c0294867>] make_tx_response+0x17/0xd0 SS:ESP e021:ed0c3f0c
        ---[ end trace f7e370bf10f6f981 ]---

The crash is in one of the calls to list_move_tail and I think it is
because netbk->pending_inuse_head is not initialised until after the
threads and/or tasklets are created (I was running in threaded mode).
Perhaps even though we are now zeroing the netbk struct those fields
should still be initialised before kicking off any potentially
asynchronous tasks?
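
To illustrate the ordering I mean, here is a rough user-space sketch, not
the actual netback code. It uses pthreads in place of kthreads/tasklets,
and struct worker_group, list_init, worker_group_start and run_one_group
are all names made up for the example. The only point is that every field
the worker reads is set up before the worker is created.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled loosely on list_head. */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

static int list_is_empty(const struct list_node *head)
{
    return head->next == head;
}

/* Hypothetical stand-in for the per-group netback state. */
struct worker_group {
    struct list_node pending_inuse_head;
    pthread_t thread;
    int saw_empty_list;   /* written by the worker, read after join */
};

/* Worker: touches the list as soon as it starts running. */
static void *worker_main(void *arg)
{
    struct worker_group *g = arg;

    g->saw_empty_list = list_is_empty(&g->pending_inuse_head);
    return NULL;
}

/* Initialise everything the worker reads, and only then start it. */
static int worker_group_start(struct worker_group *g)
{
    list_init(&g->pending_inuse_head);
    g->saw_empty_list = 0;
    return pthread_create(&g->thread, NULL, worker_main, g);
}

/* Start one group, wait for its worker, report what the worker saw. */
static int run_one_group(void)
{
    struct worker_group g;
    int rc = worker_group_start(&g);

    assert(rc == 0);
    pthread_join(g.thread, NULL);
    return g.saw_empty_list;
}
```

If list_init() were moved after pthread_create(), the worker could walk
next/prev pointers that have never been set, which is roughly the kind of
failure I suspect above.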

I didn't even start any guests so I think we only got to the reference
to pending_inuse_head because tx_work_todo can return a false positive
if netbk is not properly zeroed and therefore we can call net_tx_action
before we are ready.
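
A small sketch of the false positive I have in mind, under the assumption
that the work check simply compares producer and consumer indices. The
struct, field names and helpers below are simplified and hypothetical, not
copied from netback, and the stale memory is simulated by writing an
arbitrary byte pattern.

```c
#include <string.h>

/* Hypothetical, simplified stand-in for the per-group state. */
struct group_state {
    unsigned int pending_prod;   /* requests queued so far */
    unsigned int pending_cons;   /* requests consumed so far */
    unsigned int dealloc_prod;
    unsigned int dealloc_cons;
};

/* Simplified work check: is there anything to do? */
static int work_todo(const struct group_state *s)
{
    return s->pending_prod != s->pending_cons ||
           s->dealloc_prod != s->dealloc_cons;
}

/* Simulate memory that was allocated but never cleared. */
static void fill_with_stale_bytes(struct group_state *s)
{
    unsigned char *p = (unsigned char *)s;
    size_t i;

    for (i = 0; i < sizeof(*s); i++)
        p[i] = (unsigned char)(i * 37u + 11u);
}

/* Stale contents: the indices differ, so the check claims there is work. */
static int stale_state_reports_work(void)
{
    struct group_state s;

    fill_with_stale_bytes(&s);
    return work_todo(&s);
}

/* Zeroed contents: all indices match, so the check reports no work. */
static int zeroed_state_reports_work(void)
{
    struct group_state s;

    memset(&s, 0, sizeof(s));
    return work_todo(&s);
}
```

With stale contents the check reports work even though nothing was ever
queued, and the caller would go on to process a queue that does not exist.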

On an unrelated note, do you have any plans to make the number of groups
react dynamically to CPU hotplug? Not necessarily while there are
actually active VIFs (might be tricky to get right) but perhaps only
when netback is idle (i.e. when there are no VIFs configured), since
often the dynamic adjustment of VCPUs happens at start of day to reduce
the domain 0 VCPU allocation from the total number of cores in the
machine to something more manageable.

Ian.




