I'm building this now, and am just thinking about how to test it... I was using ping as my test mechanism, so with networking disabled I guess I'll do lots of block device copies instead. I guess this lends weight to your thoughts that it probably is a net problem and not a block problem.
Instead of changing the source code to disable the net stuff, would it work if I just specified 'nics=0', or is some part of the net subsystem still activated in that case? I'll test this too anyway.
Testing with only send or only receive disabled might be a bit trickier than you first make out. Send-only should be easy enough: just start another domain and then ping it (a manual arp table entry should remove the need to broadcast). Receive-only will be trickier. How do you get a domain to send to it? This problem of course assumes that corruption is not limited to the domain... if it is limited to the domain, then you should be able to have a send/receive domain and ignore crashes in there, and just focus on the crashes in the receive-only domain.
I'm almost confused, but am about to start testing - firstly with no network.
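For the block device copy test, something along these lines is roughly what I have in mind. This is only a sketch, not my actual 'compare' script: the function names, file names, sizes and round count are all made up for illustration, and it copies ordinary files in a scratch directory rather than raw block devices. Note also that copies served from the page cache may never touch the disk, so larger sizes than shown here would probably be needed to really exercise the blkdev path.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(workdir, size_bytes=4 << 20, rounds=5):
    """Write random data once, copy it `rounds` times, count checksum mismatches."""
    src = os.path.join(workdir, "src.bin")  # hypothetical scratch file name
    with open(src, "wb") as f:
        f.write(os.urandom(size_bytes))
    expected = sha256_of(src)

    mismatches = 0
    for i in range(rounds):
        dst = os.path.join(workdir, "copy-%d.bin" % i)
        shutil.copyfile(src, dst)
        if sha256_of(dst) != expected:
            mismatches += 1
            print("round %d: checksum mismatch" % i)
        os.remove(dst)
    return mismatches


if __name__ == "__main__":
    # Run a short pass in a temporary directory; point workdir at a mount
    # on the device under test for a real run.
    with tempfile.TemporaryDirectory() as d:
        print("mismatches:", copy_and_verify(d))
```

Any non-zero mismatch count with the net frontend disabled would point at the block side rather than the network side.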
Could someone try to isolate this to either the network backend driver
or the blkdev backend driver?
The best way to do this is to disable the frontend drivers so that
they never try to connect to the backend driver...
To disable networking:
Edit arch/xen/drivers/netif/frontend/main.c. Change netif_init() to
always 'return 0;'.
To disable block devices:
Edit arch/xen/drivers/blkif/frontend/main.c. Change xlblk_init() to
always 'return 0;'.
Oh yes -- the 2.4 sparse tree no longer contains the net frontend
driver - you'll find the build tree symlinks to
linux-2.6.7-xen-sparse/drivers/xen/net/network.c. So you might want to
edit that instead...
Obviously, if you disable blkdevs you'll need to boot off a ramdisk
or via a networked mount. :-)
> I downloaded these (from a tgz that Keir had given me a link to as bk was down - I assume it's identical to his latest fixes) and started my tests running and went to bed, but it looks like I got errors within a very short time.
> The tests I was running were my 'compare' script and pinging the two domains I had running with
> ping -q -i 0.01 -s 1400 <ip address>
> Lots of oopses in the logs, most are probably as a result of the corruption and not indicative of the cause. They look similar to Jody's dump so I won't bother sending them unless someone thinks they might be useful.
> btw, can the install be modified to give us a System.map-2.4.26-xen[0U] in /boot? ksymoops would be much happier.