Nate Carlson <natecars@xxxxxxxxxxxxxxx> writes:
> Has anyone run down what the root of this is yet?
I ran into this as well. I think there is a second bug too; see the
comments in the log below. The network setup is the "classic" one,
with the bridge configured as the network device; veth0/vif0.0 is
unused. "eth0" is the bridge, "hw-eth0" the network card.
master-xen login: root
Password:
Last login: Thu Jul 14 07:34:27 from eskarina.ber.suse.de
Have a lot of fun...
SuSE Linux 9.3 (i586)
SysRq : Changing Loglevel
Loglevel set to 9
master-xen root ~# device vif1.0 entered promiscuous mode
eth0: port 2(vif1.0) entering learning state
(XEN) (file=traps.c, line=872) Non-priv domain attempted
RDMSR(c0000080,00000000,20100000).
(XEN) (file=traps.c, line=864) Non-priv domain attempted
WRMSR(c0000080,00000800,00000000).
eth0: topology change detected, propagating
eth0: port 2(vif1.0) entering forwarding state
[ Note #1: That was the initial domU boot. fsck asked for a manual run
due to unclean filesystem from the previous crash, so I did that and
rebooted ]
device vif1.0 left promiscuous mode
eth0: port 2(vif1.0) entering disabled state
eth0: port 2(vif1.0) entering disabled state
device vif1.0 entered promiscuous mode
eth0: port 2(vif1.0) entering learning state
(XEN) (file=traps.c, line=872) Non-priv domain attempted
RDMSR(c0000080,00000000,20100000).
(XEN) (file=traps.c, line=864) Non-priv domain attempted
WRMSR(c0000080,00000800,00000000).
eth0: port 2(vif1.0) entering disabled state
[ Note #2: DomU comes up fine now, but without functional network. ]
ip link ls vif1.0
7: vif1.0: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
[ Note #3: Hmm, the virtual bridge port is down. It shouldn't be,
should it?
Fixed up manually. Shortly thereafter the machine dies; presumably
one of the first network packets from domU kills it. Full oops log
below. ]
master-xen root ~# ip link set vif1.0 up
eth0: port 2(vif1.0) entering learning state
master-xen root ~# eth0: topology change detected, propagating
eth0: port 2(vif1.0) entering forwarding state
general protection fault: 0000 [#1]
Modules linked in:
CPU: 0
EIP: 0061:[<c02f0dad>] Not tainted VLI
EFLAGS: 00010213 (2.6.12-xen0-hg64f26eed8d473a96beab96162c230f1300539d7c)
EIP is at skb_release_data+0x54/0xe2
eax: dd0c4080 ebx: 00000000 ecx: 00000002 edx: ffffffff
esi: dbcdf580 edi: 00000012 ebp: 0000003c esp: c0453c68
ds: 007b es: 007b ss: 0069
Process swapper (pid: 0, threadinfo=c0452000 task=c03c4500)
Stack: dd0c4000 00000000 00000000 dd553b80 dbcdf580 dbcdf580 c02f0e4b dbcdf580
dd553b80 00000000 c02f0f32 dbcdf580 0081f992 dbcdf580 dc56ee20 dbcdf580
dc56ee20 c0274685 dbcdf580 00000002 00000000 38704032 0000003c 00000000
Call Trace:
[<c02f0e4b>] kfree_skbmem+0x10/0x26
[<c02f0f32>] __kfree_skb+0xd1/0xdd
[<c0274685>] net_rx_action+0x3e3/0x4b3
[<c0125d5c>] update_process_times+0x130/0x140
[<c011e3bd>] profile_tick+0x4e/0x5a
[<c0107b81>] xen_idle+0x45/0x4c
[<c010b6ea>] __get_time_values_from_xen+0x6a/0x6b
[<c010bf44>] timer_interrupt+0x39/0x4ca
[<c013d4a7>] mempool_alloc_slab+0x17/0x1b
[<c02084a2>] __delay+0x12/0x16
[<c0208524>] __const_udelay+0x25/0x29
[<c029a196>] ata_exec_command_pio+0x27/0x2b
[<c029a1f1>] ata_exec_command+0x2b/0x2f
[<c013d4c2>] mempool_free_slab+0x17/0x25
[<c01196ce>] recalc_task_prio+0x141/0x151
[<c02f0e5c>] kfree_skbmem+0x21/0x26
[<c02f0e35>] skb_release_data+0xdc/0xe2
[<c02f0e5c>] kfree_skbmem+0x21/0x26
[<c02f0f32>] __kfree_skb+0xd1/0xdd
[<c02f6c95>] dev_queue_xmit+0x291/0x2a7
[<c033ae64>] packet_rcv_spkt+0x212/0x21f
[<c02f0f5e>] skb_clone+0x20/0x191
[<c02f71fd>] netif_receive_skb+0x20c/0x24b
[<c033dfdf>] br_pass_frame_up_finish+0xf/0x18
[<c033e00d>] br_pass_frame_up+0x25/0x29
[<c033e0c7>] br_handle_frame_finish+0xb6/0x120
[<c033e26a>] br_handle_frame+0x139/0x17f
[<c01254db>] __mod_timer+0xb1/0xd7
[<c02f0c02>] alloc_skb_from_cache+0x51/0x141
[<c0269fb2>] e100_poll+0xe6/0x87e
[<c01221d4>] tasklet_action+0x8b/0xca
[<c0121edb>] __do_softirq+0x4b/0x9e
[<c0121f5a>] do_softirq+0x2c/0x45
[<c012200a>] irq_exit+0x29/0x2a
[<c010e002>] do_IRQ+0x22/0x28
[<c01062e6>] evtchn_do_upcall+0x66/0x8e
[<c0109dc8>] hypervisor_callback+0x2c/0x34
[<c0107b81>] xen_idle+0x45/0x4c
[<c0107bc4>] cpu_idle+0x3c/0x4a
[<c022bf06>] acpi_enable_subsystem+0x29/0x55
[<c0105024>] _stext+0x24/0x28
[<c010505a>] init+0x0/0xfa
[<c045484a>] start_kernel+0x1ca/0x1d1
[<c045432f>] unknown_bootoption+0x0/0x23e
Code: 89 c1 0f c1 02 01 c8 85 c0 0f 85 a4 00 00 00 8b 96 94 00 00 00 89 d0 83
7a 04 00 74 74 bb 00 00 00 00 3b 5a 04 73 6a 8b 54 d8 10 <8b> 02 f6 c4 08 75 53
8b 42 04 83 f8 ff 75 35 c7 44 24 0c 99 71
<0>Kernel panic - not syncing: Fatal exception in interrupt
(XEN) Domain 0 shutdown: rebooting machine.
The faulting instruction is this:
c02f0d9f:  bb 00 00 00 00   mov    $0x0,%ebx
c02f0da4:  3b 5a 04         cmp    0x4(%edx),%ebx
c02f0da7:  73 6a            jae    c02f0e13 <skb_release_data+0xba>
c02f0da9:  8b 54 d8 10      mov    0x10(%eax,%ebx,8),%edx
c02f0dad:  8b 02            mov    (%edx),%eax      <= HERE
That should be this loop here:
void skb_release_data(struct sk_buff *skb)
[ ... ]
        for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
                put_page(skb_shinfo(skb)->frags[i].page);
ebx is the loop counter and is zero, so this is the first pass through
the loop. skb_shinfo(skb)->frags[0].page has just been loaded into edx,
and it is 0xffffffff (-1?), which matches edx in the register dump.
Dereferencing edx faults because that address lies in Xen's memory
area ...
So the question is: why the heck is the struct page pointer -1 at this
point?
Gerd
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel