I am trying this now. Within a few seconds of starting the flood ping, dom1 rebooted, with no messages in the logs to give any hint as to why. I tried again and didn't get anything useful that time either, once I started getting noticeable corruption.
Just on the subject of page reassignment, I'm trying to figure out what the code is doing.
In netif_be_start_xmit, there is a check to make sure that the packet is entirely on one page. What happens if the packet is too big for one page, or if there is other data on the same page? (It's all black magic to me at the moment!)
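To make the question concrete, a "packet is entirely on one page" test can be sketched roughly as below. This is only an illustration and not the actual check in netif_be_start_xmit: the 4 KiB page size, the SKETCH_* macros and the helper name fits_in_one_page are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. Names are hypothetical. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/*
 * Return true if the byte range [addr, addr + len) lies within a
 * single page, i.e. its first and last bytes share a page frame.
 */
static bool fits_in_one_page(uintptr_t addr, size_t len)
{
    if (len == 0)
        return true;            /* an empty range crosses nothing */
    if (len > SKETCH_PAGE_SIZE)
        return false;           /* too big for any single page */
    return (addr & SKETCH_PAGE_MASK) ==
           ((addr + len - 1) & SKETCH_PAGE_MASK);
}
```

Note that a test like this only says the packet does not straddle a page boundary. It says nothing about whether other data shares that page, which is the second half of the question.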
James
> Okay, I have made the following change in dom0:
>
> To disable the transmit path for guest OSes:
> Edit net_tx_action in arch/xen/drivers/netif/backend/main.c. After the
> call to netif_schedule_work(), add:
> make_tx_response(netif, txreq.id, NETIF_RSP_OKAY);
> netif_put(netif);
> continue;
>
> Compiled and rebooted with the new kernel. Booted dom1, removed vif1.0 from the bridge, gave it its own IP address, added a static ARP entry and pinged away. I could see the packet counters for dom0 and dom1 climbing rapidly, indicating that dom0 was sending packets and dom1 was receiving them, but that a packet sent by dom1 was unable to reach dom0 again. I got the same sort of crashes after about 10 minutes.
If you do a test with DPRINTK enabled in
linux-2.4.26-xen-sparse/arch/xen/drivers/netif/backend/common.h
and with debugging enabled in Xen 'debug=y make'
then you may get some useful debugging out of the machine when it all
goes horribly wrong. e.g., perhaps something is failing apparently
spuriously... one example would be that a page reassignment (from dom0
to the other guest) is failing for some weird reason.
If we can get some debugging out when things first go wrong, that
would be very useful indeed.
Thanks,
Keir