On Monday, 06.06.2005, at 09:23 +0100, Keir Fraser wrote:
> On 5 Jun 2005, at 17:57, Birger Toedtmann wrote:
>
> > Apparently it is happening somewhere here:
> >
> > [...]
> > 0xc028cbe5 <net_rx_action+1135>: test %eax,%eax
> > 0xc028cbe7 <net_rx_action+1137>: je 0xc028ca82 <net_rx_action+780>
> > 0xc028cbed <net_rx_action+1143>: mov %esi,%eax
> > 0xc028cbef <net_rx_action+1145>: shr $0xc,%eax
> > 0xc028cbf2 <net_rx_action+1148>: mov %eax,(%esp)
> > 0xc028cbf5 <net_rx_action+1151>: call 0xc028c4c4 <free_mfn>
> > 0xc028cbfa <net_rx_action+1156>: mov $0xffffffff,%ecx
> > ^^^^^^^^^^
>
> Most likely the driver has tried to send a bogus page to a domU.
> Because it's bogus the transfer fails. The driver then tries to free
> the page back to Xen, but that also fails because the page is bogus.
> This confuses the driver, which then BUG()s out.
I commented out the free_mfn() and status= lines. The kernel now reports
the following after configuring the 10th domU and roughly the 80th vif,
with approx. 20-25 bridges up. Just an idea: the number of vifs + bridges
is somewhere around the magic 128 (the NR_IRQS problem in 2.0.x!) when the
crash happens. Could this hint at something?
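A rough way to test the 128 idea would be to count the vif and bridge interfaces in dom0 shortly before the crash. The sketch below is only an illustration under assumptions: backend interfaces are named vif*, bridges expose a "bridge" directory under /sys/class/net (this depends on the kernel's bridge sysfs support), and the helper names are made up for this example.

```python
import os

LIMIT = 128  # the NR_IRQS figure mentioned above; assumed relevant, not verified


def count_vifs_and_bridges(names, is_bridge):
    """Count interface names starting with 'vif' and names for which is_bridge() is true."""
    vifs = sum(1 for name in names if name.startswith("vif"))
    bridges = sum(1 for name in names if is_bridge(name))
    return vifs, bridges


def sysfs_is_bridge(name, root="/sys/class/net"):
    """Assumption: a bridge device has a 'bridge' subdirectory in sysfs."""
    return os.path.isdir(os.path.join(root, name, "bridge"))


if __name__ == "__main__":
    root = "/sys/class/net"
    names = sorted(os.listdir(root)) if os.path.isdir(root) else []
    vifs, bridges = count_vifs_and_bridges(names, sysfs_is_bridge)
    total = vifs + bridges
    print("vifs=%d bridges=%d total=%d limit=%d" % (vifs, bridges, total, LIMIT))
```

Logging this total each time a domU is started would show whether the crash really lines up with a fixed count or only happens to occur near it.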
[...]
Jun 6 10:12:14 lomin kernel: 10.2.23.8: port 2(vif10.3) entering forwarding state
Jun 6 10:12:14 lomin kernel: 10.2.35.16: topology change detected, propagating
Jun 6 10:12:14 lomin kernel: 10.2.35.16: port 2(vif10.4) entering forwarding state
Jun 6 10:12:14 lomin kernel: 10.2.35.20: topology change detected, propagating
Jun 6 10:12:14 lomin kernel: 10.2.35.20: port 2(vif10.5) entering forwarding state
Jun 6 10:12:20 lomin kernel: c014cea4
Jun 6 10:12:20 lomin kernel: [do_page_fault+643/1665] do_page_fault+0x469/0x738
Jun 6 10:12:20 lomin kernel: [<c0115720>] do_page_fault+0x469/0x738
Jun 6 10:12:20 lomin kernel: [fixup_4gb_segment+2/12] page_fault+0x2e/0x34
Jun 6 10:12:20 lomin kernel: [<c0109a7e>] page_fault+0x2e/0x34
Jun 6 10:12:20 lomin kernel: [do_page_fault+49/1665] do_page_fault+0x217/0x738
Jun 6 10:12:20 lomin kernel: [<c01154ce>] do_page_fault+0x217/0x738
Jun 6 10:12:20 lomin kernel: [fixup_4gb_segment+2/12] page_fault+0x2e/0x34
Jun 6 10:12:20 lomin kernel: [<c0109a7e>] page_fault+0x2e/0x34
Jun 6 10:12:20 lomin kernel: PREEMPT
Jun 6 10:12:20 lomin kernel: Modules linked in: dm_snapshot pcmcia bridge ipt_REJECT ipt_state iptable_filter ipt_MASQUERADE iptable_nat ip_conntrack ip_tables autofs4 snd_seq snd_seq_device evdev usbhid rfcomm l2cap bluetooth dm_mod cryptoloop snd_pcm_oss snd_mixer_oss snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd soundcore snd_page_alloc tun uhci_hcd usb_storage usbcore irtty_sir sir_dev ircomm_tty ircomm irda yenta_socket rsrc_nonstatic pcmcia_core 3c59x
Jun 6 10:12:20 lomin kernel: CPU: 0
Jun 6 10:12:20 lomin kernel: EIP: 0061:[do_wp_page+622/1175] Not tainted VLI
Jun 6 10:12:20 lomin kernel: EIP: 0061:[<c014cea4>] Not tainted VLI
Jun 6 10:12:20 lomin kernel: EFLAGS: 00010206 (2.6.11.11-xen0)
Jun 6 10:12:20 lomin kernel: EIP is at handle_mm_fault+0x5d/0x222
Jun 6 10:12:20 lomin kernel: eax: 15555b18 ebx: d8788000 ecx: 00000b18 edx: 15555b18
Jun 6 10:12:20 lomin kernel: esi: dcfc3b4c edi: dcaf5580 ebp: d8789ee4 esp: d8789ebc
Jun 6 10:12:20 lomin kernel: ds: 0069 es: 0069 ss: 0069
Jun 6 10:12:20 lomin kernel: Process python (pid: 4670, threadinfo=d8788000 task=de1a1520)
Jun 6 10:12:20 lomin kernel: Stack: 00000040 00000001 d40e687c d40e6874 00000006 d40e685c d8789f14 dcaf5580
Jun 6 10:12:20 lomin kernel: dcaf55ac d40e6b1c d8789fbc c01154ce dcaf5580 d40e6b1c b4ec6ff0 00000001
Jun 6 10:12:20 lomin kernel: 00000001 de1a1520 b4ec6ff0 00000006 d8789fc4 d8789fc4 c03405b0 00000006
Jun 6 10:12:20 lomin kernel: Call Trace:
Jun 6 10:12:20 lomin kernel: [dump_stack+16/32] show_stack+0x80/0x96
Jun 6 10:12:20 lomin kernel: [<c0109c51>] show_stack+0x80/0x96
Jun 6 10:12:20 lomin kernel: [show_registers+384/457] show_registers+0x15a/0x1d1
Jun 6 10:12:20 lomin kernel: [<c0109de1>] show_registers+0x15a/0x1d1
Jun 6 10:12:20 lomin kernel: [die+301/458] die+0x106/0x1c4
Jun 6 10:12:20 lomin kernel: [<c010a001>] die+0x106/0x1c4
Jun 6 10:12:20 lomin kernel: [do_page_fault+675/1665] do_page_fault+0x489/0x738
Jun 6 10:12:20 lomin kernel: [<c0115740>] do_page_fault+0x489/0x738
Jun 6 10:12:20 lomin kernel: [fixup_4gb_segment+2/12] page_fault+0x2e/0x34
Jun 6 10:12:20 lomin kernel: [<c0109a7e>] page_fault+0x2e/0x34
Jun 6 10:12:20 lomin kernel: [do_page_fault+49/1665] do_page_fault+0x217/0x738
Jun 6 10:12:20 lomin kernel: [<c01154ce>] do_page_fault+0x217/0x738
Jun 6 10:12:20 lomin kernel: [fixup_4gb_segment+2/12] page_fault+0x2e/0x34
Jun 6 10:12:20 lomin kernel: [<c0109a7e>] page_fault+0x2e/0x34
Jun 6 10:12:20 lomin kernel: Code: 8b 47 1c c1 ea 16 83 43 14 01 8d 34 90 85 f6 0f 84 52 01 00 00 89 f2 8b 4d 10 89 f8 e8 4a d1 ff ff 85 c0 89 c2 0f 84 3c 01 00 00 <8b> 00 a8 81 75 3d 85 c0 0f 84 01 01 00 00 a8 40 0f 84 a4 00 00
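For reading the Code: line above: the byte in angle brackets marks where EIP pointed when the fault occurred. A small sketch (the helper name is made up) that splits the dump at that marker and returns the bytes starting at EIP:

```python
import re

# The "Code:" line from the oops above, joined onto one line.
CODE_LINE = (
    "Code: 8b 47 1c c1 ea 16 83 43 14 01 8d 34 90 85 f6 0f 84 52 01 00 00 "
    "89 f2 8b 4d 10 89 f8 e8 4a d1 ff ff 85 c0 89 c2 0f 84 3c 01 00 00 "
    "<8b> 00 a8 81 75 3d 85 c0 0f 84 01 01 00 00 a8 40 0f 84 a4 00 00"
)


def split_at_eip(code_line):
    """Return (offset, bytes_from_eip) for the byte marked <xx>, or None if unmarked."""
    tokens = code_line.split("Code:", 1)[-1].split()
    for offset, token in enumerate(tokens):
        if re.fullmatch(r"<[0-9a-fA-F]{2}>", token):
            tail = [int(t.strip("<>"), 16) for t in tokens[offset:]]
            return offset, tail
    return None


if __name__ == "__main__":
    offset, tail = split_at_eip(CODE_LINE)
    print("bytes before EIP:", offset)
    print("first bytes at EIP:", " ".join("%02x" % b for b in tail[:4]))
```

The bytes at EIP start with 8b 00, which on i386 decodes as mov (%eax),%eax. If that reading is right, the faulting access is a read through eax, which holds 15555b18 in the register dump.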
>
> It's not at all clear where the bogus address comes from: the driver
> basically just reads the address out of an skbuff, and converts it from
> virtual to physical address. But something is obviously going wrong,
> perhaps under memory pressure. :-(
Where, within the domUs or in dom0? The latter has plenty of memory at
hand, while the domUs are quite short of memory. I'll try to find out...
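For reference, the conversion Keir describes and the shr $0xc in the disassembly boil down to page arithmetic. A minimal sketch, assuming 4 KiB pages (PAGE_SHIFT = 12, which matches the shift by 0xc) and the default i386 PAGE_OFFSET of 0xC0000000. It ignores Xen's additional pseudo-physical to machine translation, and the function names are illustrative only:

```python
PAGE_SHIFT = 12            # 4 KiB pages; matches "shr $0xc" in the disassembly
PAGE_OFFSET = 0xC0000000   # assumption: default 3G/1G split on i386


def virt_to_phys(vaddr):
    """Convert a direct-mapped kernel virtual address to a physical address."""
    if not PAGE_OFFSET <= vaddr <= 0xFFFFFFFF:
        raise ValueError("not a direct-mapped kernel address: %#x" % vaddr)
    return vaddr - PAGE_OFFSET


def frame_number(addr):
    """Drop the in-page offset to get the page frame number."""
    return addr >> PAGE_SHIFT


if __name__ == "__main__":
    vaddr = 0xC1234567  # made-up example address, not taken from the oops
    print(hex(frame_number(virt_to_phys(vaddr))))
```

Under these assumptions, a value such as 15555b18 (eax in the oops) would be rejected by virt_to_phys, since it lies below PAGE_OFFSET.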
Regards,
--
Birger Tödtmann
Technik der Rechnernetze, Institut für Experimentelle Mathematik
Universität Duisburg-Essen, Campus Essen email:btoedtmann@xxxxxxxxxxxxxx
skype:birger.toedtmann pgp:0x6FB166C9