Okay, I have made the following change in dom0:
To disable the transmit path for guest OSes:
Edit net_tx_action in arch/xen/drivers/netif/backend/main.c. After the
call to netif_schedule_work(), add:
make_tx_response(netif, txreq.id, NETIF_RSP_OKAY);
Compiled and rebooted with the new kernel. Booted dom1, removed vif1.0 from the bridge, gave it its own IP address, added a static ARP entry and pinged away. I could see the packet counters for dom0 and dom1 climbing rapidly, indicating that dom0 was sending packets and dom1 was receiving them, but that packets sent by dom1 could not get back to dom0. I got the same sort of crashes after about 10 minutes.
I'm now testing the other half.
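For anyone wanting to repeat the test, the setup described above might look roughly like the sketch below. Only vif1.0 comes from my actual setup; the bridge name, both IP addresses, the MAC address and the helper names (run, setup_link, packet_counts) are assumptions made up for illustration. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch of the isolation test. Everything marked "assumed" is a
# placeholder for illustration, not a value from the real setup.
BRIDGE=${BRIDGE:-xen-br0}                # assumed bridge name
VIF=${VIF:-vif1.0}                       # backend interface in dom0
DOM0_IP=${DOM0_IP:-10.0.0.1}             # assumed address for the vif
DOM1_IP=${DOM1_IP:-10.0.0.2}             # assumed address of dom1's eth0
DOM1_MAC=${DOM1_MAC:-aa:00:00:00:00:11}  # assumed MAC of dom1's eth0
DRY_RUN=${DRY_RUN:-1}                    # 1 = print commands, do not run them

# Run a command, or just print it when DRY_RUN is 1.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Take the vif off the bridge, give it an address, pin dom1's MAC.
setup_link() {
    run brctl delif "$BRIDGE" "$VIF"
    run ifconfig "$VIF" "$DOM0_IP" netmask 255.255.255.0 up
    run arp -i "$VIF" -s "$DOM1_IP" "$DOM1_MAC"
}

# Read /proc/net/dev-style text on stdin and print
# "rx_packets tx_packets" for the interface named in $1.
packet_counts() {
    awk -v ifc="$1" '{
        sub(/^ +/, "")
        n = index($0, ":")
        if (n > 0 && substr($0, 1, n - 1) == ifc) {
            split(substr($0, n + 1), f, " ")
            print f[2], f[10]
        }
    }'
}
```

With DRY_RUN=0 and root privileges, setup_link would apply the changes, after which something like "ping 10.0.0.2" from dom0 and "packet_counts vif1.0 < /proc/net/dev" should show whether the counters are moving.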
> As a first test I have just disabled networking via nics=0 in the config, and am running this script in dom1:
> while [ 1 = 1 ]
> do
> dd if=/dev/sda1 of=/dev/null bs=1024 count=128K &
> dd if=/dev/sda1 of=/dev/null bs=1024 skip=256K count=256K
> done
> It tells me 'ioctl 801c6d02 not supported by XL blkif' but that doesn't seem to matter. Anyway, there are no crashes so far, so I'm thinking at this stage that the block interface stuff is probably fine and I should now concentrate on the network. Disabling the block stuff will be a huge hassle at this stage so I'll have to let it go for the moment.
It does seem more likely that the network backend driver is to blame
-- it's considerably more complicated than the blkdev driver.
> I think I need a crash course in how all this hangs together before I can understand what I'm testing... My understanding is as follows:
> packets sent to dom0.vif1.0 appear at dom1.eth0.
> packets sent to dom1.eth0 appear at dom0.vif1.0.
Yes, it's basically a point-to-point link. The transmit side on each
interface is directly linked to the receive side on the other.
> and that's about it. Are they symmetrical? Is the transmit code for dom0.vif1.0 the same as the transmit code for dom1.eth0? Ditto for receive?
No. dom1.eth0 is implemented by the frontend driver;
dom0.vif* is implemented by arch/xen/drivers/netif/backend/main.c.
So they look symmetric to users, but the implementation is not.