Re: [Xen-devel] veth0 stuff in -unstable?

Nate Carlson <natecars@xxxxxxxxxxxxxxx> writes:

> Has anyone run down what the root of this is yet?

I ran into this as well.  I think there is a second bug too; see the
comments in the log below.  The network setup is the "classic" one, with
the bridge configured as the network device; veth0/vif0.0 is unused.
"eth0" is the bridge, "hw-eth0" the network card.

master-xen login: root
Password: 
Last login: Thu Jul 14 07:34:27 from eskarina.ber.suse.de
Have a lot of fun...
SuSE Linux 9.3 (i586)
SysRq : Changing Loglevel
Loglevel set to 9
master-xen root ~# device vif1.0 entered promiscuous mode
eth0: port 2(vif1.0) entering learning state
(XEN) (file=traps.c, line=872) Non-priv domain attempted RDMSR(c0000080,00000000,20100000).
(XEN) (file=traps.c, line=864) Non-priv domain attempted WRMSR(c0000080,00000800,00000000).
eth0: topology change detected, propagating
eth0: port 2(vif1.0) entering forwarding state

  [ Note #1: That was the initial domU boot.  fsck asked for a manual run
    due to unclean filesystem from the previous crash, so I did that and
    rebooted ]

device vif1.0 left promiscuous mode
eth0: port 2(vif1.0) entering disabled state
eth0: port 2(vif1.0) entering disabled state
device vif1.0 entered promiscuous mode
eth0: port 2(vif1.0) entering learning state
(XEN) (file=traps.c, line=872) Non-priv domain attempted RDMSR(c0000080,00000000,20100000).
(XEN) (file=traps.c, line=864) Non-priv domain attempted WRMSR(c0000080,00000800,00000000).
eth0: port 2(vif1.0) entering disabled state

  [ Note #2: DomU comes up fine now, but without functional network. ]

ip link ls vif1.0
7: vif1.0: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue 
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff

  [ Note #3: Hmm, the virtual bridge port is down.  It shouldn't be,
    should it?

    I brought it up manually.  Shortly afterwards the machine died;
    presumably one of the first network packets from the domU killed
    it.  Full oops log below. ]

master-xen root ~# ip link set vif1.0 up
eth0: port 2(vif1.0) entering learning state
master-xen root ~# eth0: topology change detected, propagating
eth0: port 2(vif1.0) entering forwarding state
general protection fault: 0000 [#1]
Modules linked in:
CPU:    0
EIP:    0061:[<c02f0dad>]    Not tainted VLI
EFLAGS: 00010213   (2.6.12-xen0-hg64f26eed8d473a96beab96162c230f1300539d7c) 
EIP is at skb_release_data+0x54/0xe2
eax: dd0c4080   ebx: 00000000   ecx: 00000002   edx: ffffffff
esi: dbcdf580   edi: 00000012   ebp: 0000003c   esp: c0453c68
ds: 007b   es: 007b   ss: 0069
Process swapper (pid: 0, threadinfo=c0452000 task=c03c4500)
Stack: dd0c4000 00000000 00000000 dd553b80 dbcdf580 dbcdf580 c02f0e4b dbcdf580 
       dd553b80 00000000 c02f0f32 dbcdf580 0081f992 dbcdf580 dc56ee20 dbcdf580 
       dc56ee20 c0274685 dbcdf580 00000002 00000000 38704032 0000003c 00000000 
Call Trace:
 [<c02f0e4b>] kfree_skbmem+0x10/0x26
 [<c02f0f32>] __kfree_skb+0xd1/0xdd
 [<c0274685>] net_rx_action+0x3e3/0x4b3
 [<c0125d5c>] update_process_times+0x130/0x140
 [<c011e3bd>] profile_tick+0x4e/0x5a
 [<c0107b81>] xen_idle+0x45/0x4c
 [<c010b6ea>] __get_time_values_from_xen+0x6a/0x6b
 [<c010bf44>] timer_interrupt+0x39/0x4ca
 [<c013d4a7>] mempool_alloc_slab+0x17/0x1b
 [<c02084a2>] __delay+0x12/0x16
 [<c0208524>] __const_udelay+0x25/0x29
 [<c029a196>] ata_exec_command_pio+0x27/0x2b
 [<c029a1f1>] ata_exec_command+0x2b/0x2f
 [<c013d4c2>] mempool_free_slab+0x17/0x25
 [<c01196ce>] recalc_task_prio+0x141/0x151
 [<c02f0e5c>] kfree_skbmem+0x21/0x26
 [<c02f0e35>] skb_release_data+0xdc/0xe2
 [<c02f0e5c>] kfree_skbmem+0x21/0x26
 [<c02f0f32>] __kfree_skb+0xd1/0xdd
 [<c02f6c95>] dev_queue_xmit+0x291/0x2a7
 [<c033ae64>] packet_rcv_spkt+0x212/0x21f
 [<c02f0f5e>] skb_clone+0x20/0x191
 [<c02f71fd>] netif_receive_skb+0x20c/0x24b
 [<c033dfdf>] br_pass_frame_up_finish+0xf/0x18
 [<c033e00d>] br_pass_frame_up+0x25/0x29
 [<c033e0c7>] br_handle_frame_finish+0xb6/0x120
 [<c033e26a>] br_handle_frame+0x139/0x17f
 [<c01254db>] __mod_timer+0xb1/0xd7
 [<c02f0c02>] alloc_skb_from_cache+0x51/0x141
 [<c0269fb2>] e100_poll+0xe6/0x87e
 [<c01221d4>] tasklet_action+0x8b/0xca
 [<c0121edb>] __do_softirq+0x4b/0x9e
 [<c0121f5a>] do_softirq+0x2c/0x45
 [<c012200a>] irq_exit+0x29/0x2a
 [<c010e002>] do_IRQ+0x22/0x28
 [<c01062e6>] evtchn_do_upcall+0x66/0x8e
 [<c0109dc8>] hypervisor_callback+0x2c/0x34
 [<c0107b81>] xen_idle+0x45/0x4c
 [<c0107bc4>] cpu_idle+0x3c/0x4a
 [<c022bf06>] acpi_enable_subsystem+0x29/0x55
 [<c0105024>] _stext+0x24/0x28
 [<c010505a>] init+0x0/0xfa
 [<c045484a>] start_kernel+0x1ca/0x1d1
 [<c045432f>] unknown_bootoption+0x0/0x23e
Code: 89 c1 0f c1 02 01 c8 85 c0 0f 85 a4 00 00 00 8b 96 94 00 00 00 89 d0 83 7a 04 00 74 74 bb 00 00 00 00 3b 5a 04 73 6a 8b 54 d8 10 <8b> 02 f6 c4 08 75 53 8b 42 04 83 f8 ff 75 35 c7 44 24 0c 99 71 
 <0>Kernel panic - not syncing: Fatal exception in interrupt
 (XEN) Domain 0 shutdown: rebooting machine.

The faulting instruction is this:

c02f0d9f:       bb 00 00 00 00          mov    $0x0,%ebx
c02f0da4:       3b 5a 04                cmp    0x4(%edx),%ebx
c02f0da7:       73 6a                   jae    c02f0e13 <skb_release_data+0xba>
c02f0da9:       8b 54 d8 10             mov    0x10(%eax,%ebx,8),%edx
c02f0dad:       8b 02                   mov    (%edx),%eax   <= HERE

That should be this loop here:

void skb_release_data(struct sk_buff *skb)
[ ... ]
                        for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
                                put_page(skb_shinfo(skb)->frags[i].page);

ebx is the loop counter and is zero, so this is the first iteration of
the loop.  skb_shinfo(skb)->frags[0].page has been loaded into edx, and
it is 0xffffffff (-1?).  Dereferencing edx faults because it points
into Xen's memory area ...
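
To make the address arithmetic explicit, here is a rough sketch (not
kernel code).  The structs are a hypothetical 32-bit mirror of the
layout implied by the disassembly: nr_frags at offset 0x4, frags[] at
offset 0x10, 8 bytes per entry.  The names frag32, shinfo32,
frag_slot_addr and in_assumed_xen_area are made up for illustration, and
XEN_AREA_START_ASSUMED is only my guess at where the hypervisor's
reserved area begins on a non-PAE i386 build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the layout implied by the disassembly.
 * Pointers are modelled as uint32_t so the offsets match i386 on any host. */
struct frag32 {
    uint32_t page;          /* struct page * in the real kernel */
    uint16_t page_offset;
    uint16_t size;
};

struct shinfo32 {
    uint32_t word0;         /* offset 0x0, not needed here */
    uint32_t nr_frags;      /* offset 0x4: cmp 0x4(%edx),%ebx */
    uint32_t words8[2];     /* offsets 0x8..0xf, not needed here */
    struct frag32 frags[4]; /* offset 0x10; array length is arbitrary here */
};

/* ASSUMPTION: start of the hypervisor's reserved area (non-PAE i386 guess). */
#define XEN_AREA_START_ASSUMED 0xfc000000u

/* Confirm the mirror structs really have the offsets seen in the disassembly. */
void check_layout(void)
{
    assert(offsetof(struct shinfo32, nr_frags) == 0x4);
    assert(offsetof(struct shinfo32, frags) == 0x10);
    assert(sizeof(struct frag32) == 8);
}

/* Address read by "mov 0x10(%eax,%ebx,8),%edx": base + 0x10 + i * 8. */
uint32_t frag_slot_addr(uint32_t shinfo_base, uint32_t i)
{
    return shinfo_base + (uint32_t)offsetof(struct shinfo32, frags)
           + i * (uint32_t)sizeof(struct frag32);
}

/* Nonzero if addr lies at or above the assumed hypervisor area start. */
int in_assumed_xen_area(uint32_t addr)
{
    return addr >= XEN_AREA_START_ASSUMED;
}
```

With eax = dd0c4080 from the register dump, frag_slot_addr(0xdd0c4080, 0)
gives 0xdd0c4090, the slot from which edx was loaded.
in_assumed_xen_area(0xffffffff) is true, while it is false for a
normal-looking kernel pointer such as esi = dbcdf580.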

So the question is: why the heck is the struct page pointer -1 at this
point?

  Gerd

