On Tuesday, 07.06.2005, 18:47 +0200, Nils Toedtmann wrote:
> On Monday, 06.06.2005, 14:30 +0200, Birger Tödtmann wrote:
> > On Monday, 06.06.2005, 10:26 +0100, Keir Fraser wrote:
> > [...]
> > > > somewhere around the magic 128 (NR_IRQS problem in 2.0.x!) when the
> > > > crash happens - could this hint to something?
> > >
> > > The crashes you see with free_mfn removed will be impossible to debug
> > > -- things are very screwed by that point. Even the crash within
> > > free_mfn might be far removed from the cause of the crash, if it's due
> > > to memory corruption.
> > >
> > > It's perhaps worth investigating what critical limit you might be
> > > hitting, and what resource it is that's limited. E.g., can you
> > > create a few vifs, but connected together by some very large number of
> > > bridges (daisy chained together)? Or can you create a large number of
> > > vifs if they are connected together by just one bridge?
> > This is getting really weird - as I found out, I'll encounter problems
> > with far fewer vifs/bridges than suspected. I just fired up a network
> > with 7 nodes, all with four interfaces each connected to the same four
> > bridge interfaces. The nodes can ping through the network, however
> > after a short time, the system (dom0) crashes as well. This time, it
> > dies in net_rx_action() at a slightly different place:
> > [...]
> > [<c02b6e15>] kfree_skbmem+0x12/0x29
> > [<c02b6ed1>] __kfree_skb+0xa5/0x13f
> > [<c028c9b3>] net_rx_action+0x23d/0x4df
> > [...]
> > Funnily, I cannot reproduce this with 5 nodes (domUs) running. I'm a
> > bit unsure where to go from here... Maybe I should try a different
> > machine for further testing.
> I can confirm this bug on AMD Athlon using xen-unstable from June 5th
> (latest ChangeSet 1.1677).
errr ... sorry for the dupe.
> Further experiments show that it seems to be the amount of traffic (and
> the number of connected vifs?) which triggers the oops: with all OSPF
> daemons stopped, I could UP all bridges & vifs. But when I did a flood-
> broadcast ping (ping -f -b $broadcastadr) on the 52nd bridge (the one
> with more than two active ports), dom0 OOPSed again.
> I could only reproduce that "too-much-traffic oops" on bridges
> connecting more than 10 vifs.
> It would be interesting to know whether that happens with unicast
> traffic, too. I have no time left; I'll test more tomorrow.
Ok, reproduced the dom0 kernel panic in a simpler situation:
* create some domUs, each having 1 interface in the same subnet
* bridge all the interfaces together (dom0 not having an IP on that
  bridge)
* trigger as much unicast traffic as you want (like unicast flood
  pings): no problem.
* Now trigger some broadcast traffic between the domUs:
  ping -i 0.1 -b 192.168.0.255
Instead, you may down all vifs first, start the flood broadcast ping in
the first domU, and bring up one vif after the other (waiting each time
>15 sec until the bridge puts the added port into forwarding state).
After bringing up 10-15 vifs, dom0 panics.
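For reference, the incremental variant above can be sketched as a small shell script. This is a hypothetical sketch, not taken from the original report: the bridge name, vif naming scheme (vifN.0), vif count, and use of bridge-utils/ifconfig are all assumptions about a typical Xen 2.x dom0 setup.

```shell
#!/bin/sh
# Sketch of the incremental reproduction: one shared bridge, vifs
# enslaved and brought up one at a time. Names/counts are illustrative.
BRIDGE=xen-br0

brctl addbr $BRIDGE          # create the shared bridge
ifconfig $BRIDGE 0.0.0.0 up  # up, without giving dom0 an IP on it

# Assumes the flood broadcast ping is already running in the first domU.
for i in $(seq 1 15); do
    vif="vif${i}.0"          # assumed vif naming scheme
    brctl addif $BRIDGE $vif
    ifconfig $vif 0.0.0.0 up
    sleep 20                 # > default STP forwarding delay (15 s)
done
```

With the default forwarding delay, each added port only starts passing traffic after the listening/learning phases, hence the sleep between iterations.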
I could _not_ reproduce this with massive unicast traffic. The problem
disappears if I set "net.ipv4.icmp_echo_ignore_broadcasts=1" in all
domains. Maybe the problem arises if too many domUs answer broadcasts at
the same time (collisions?).
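The workaround mentioned above amounts to the following, run inside each domU (the sysctl key is standard Linux; the persistent form is an assumption about the guests' configuration):

```shell
# Make the domU ignore ICMP echo requests sent to broadcast addresses,
# so it no longer answers the flood broadcast pings.
sysctl -w net.ipv4.icmp_echo_ignore_broadcasts=1

# Or persistently, via /etc/sysctl.conf:
#   net.ipv4.icmp_echo_ignore_broadcasts = 1
```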
Xen-devel mailing list