WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-users

[Xen-users] DNS / SSH / NFS problems with domU

Hello,

 I've been experiencing an odd problem as of late with new domUs that
I've created.

 My setup:

 Intel E6400 Core 2 Duo
 4GB RAM
 250GB LVM volume
 Debian 4.0 "Etch" as dom0
 Xen 3.0.3 w/PAE / 2.6.18 kernel

 I've had this machine up and running since November/December (2006),
and have made a number of paravirt domUs. I made them by hand at first,
and then moved on to the Debian xen-tools package and xen-create-image.
All domUs are Debian 4.0 "Etch" systems-- their filesystems are logical
volumes on the LVM.

 All has been fine and dandy (and even continues to be fine and dandy--
with those original domUs), until at some point newly created domUs
began to have issues with DNS/SSH/NFS. Not sure WHY they have
problems-- but it appears I haven't been able to create a working domU
using xen-create-image / manual debootstrap since February/March (I'd
also attribute it to the release of Debian Etch.. but they've all been
apt-get upgraded to the official Etch release over time). No clue why.

 What happens is often first noticed by SSH attempts. The first attempt
to connect to a troubled VM often results in odd delays not present on
the other "working" VMs, and then I get the infamous: "Disconnecting:
Bad packet length wxxxyyyzzz" message. Subsequent connections tend to
work, only to be booted with a similar "Bad packet length" message a
few minutes later. NFS shares refuse to mount from the server (another
VM on the same machine). "rpcinfo -p nfs_server" works, but after a
curiously long delay. NFS mounts themselves claim "failed. NFS server
is down."

 Also strange, one machine seemed to be having issues pinging another
VM. There would be delays between icmp_seq lines, then it might go for
5-6 seemingly normal replies, only to act up again. I would end up
getting about 11-25% packet loss just pinging another VM on the same
machine. Installing the host utility and running "host problemvm" from
the problemvm yields a mix of results-- sometimes it retrieves its DNS
info just fine.. other times not... and still others, it'll seem to
"half" find it.. returning some malformed output.
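
 A minimal sketch of how the ping loss could be put into numbers rather
than eyeballed, assuming the output of ping has been saved as text. The
function name summarize_ping and the SAMPLE text (including its
addresses) are made up for illustration and are not taken from the
affected machines. It only sees gaps between the first and last reply,
so losses at the very start or end of a run are not counted.

```python
import re


def summarize_ping(output: str) -> dict:
    """Summarize the icmp_seq replies found in saved ping output."""
    # Sequence numbers of every reply line that made it back.
    seqs = [int(m.group(1)) for m in re.finditer(r"icmp_seq=(\d+)", output)]
    if not seqs:
        return {"expected": 0, "received": 0, "missing": [], "loss_pct": 0.0}
    first, last = min(seqs), max(seqs)
    received = set(seqs)  # a set also collapses duplicate replies
    # Any sequence number between first and last with no reply is a loss.
    missing = [s for s in range(first, last + 1) if s not in received]
    expected = last - first + 1
    return {
        "expected": expected,
        "received": len(received),
        "missing": missing,
        "loss_pct": round(100.0 * len(missing) / expected, 1),
    }


# Invented sample in the usual "icmp_seq=N" reply format.
SAMPLE = """\
64 bytes from 10.0.0.12: icmp_seq=1 ttl=64 time=0.21 ms
64 bytes from 10.0.0.12: icmp_seq=2 ttl=64 time=0.19 ms
64 bytes from 10.0.0.12: icmp_seq=5 ttl=64 time=0.23 ms
64 bytes from 10.0.0.12: icmp_seq=6 ttl=64 time=0.20 ms
"""

summary = summarize_ping(SAMPLE)
print(summary["missing"], summary["loss_pct"])
```

 Running the same summary against a working domU and a problem domU
would show whether the missing sequence numbers cluster or are spread
evenly.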

 VMs get IPs via DHCP. resolv.conf is set up properly. In fact, as far
as I can tell.. network and general machine configuration is identical
to that of the working domUs. I've even dropped in copies of config
files from working VMs, no luck.

 Now, I was suspecting possible DNS issues. I've checked... double
checked.. had a couple other people check. Everything seems to be
spelled correctly, IPs match, serial numbers appropriately updated...
services restarted... no malformed line endings. So I really don't know
what could be the issue.
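
 Since the lookups only fail some of the time, a repeated lookup with a
tally may say more than a single "host" run. The following is a rough
sketch under assumptions: the name tally_lookups and its parameters are
hypothetical, and socket.gethostbyname goes through the system resolver
(so /etc/hosts and nsswitch settings apply), which is not the same path
the host utility takes.

```python
import socket
from collections import Counter


def tally_lookups(hostname, attempts=20, resolve=socket.gethostbyname):
    """Resolve hostname repeatedly and count each distinct outcome."""
    outcomes = Counter()
    for _ in range(attempts):
        try:
            # A successful lookup is counted under the address returned.
            outcomes[resolve(hostname)] += 1
        except OSError as exc:
            # socket.gaierror and socket.herror are subclasses of OSError.
            outcomes["error: " + type(exc).__name__] += 1
    return outcomes


# Example call, to be run from inside a problem domU
# ("problemvm" is the placeholder hostname used in this post):
#   print(tally_lookups("problemvm"))
```

 The resolve parameter exists only so the function can be exercised with
a stand-in resolver instead of the real network.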

 Kernel, modules, initrd are the same as the working VMs.

 I've tried scouring the internet for others who may have experienced
the same problems... nothing that seems to come anywhere close. I did
find mention on the xen-devel list of an "nloopbacks" parameter that
had been scaled down to 4 from 8. But if each VM is running generally
only one VIF, is that even an issue?

 Now, I will say I am running, in addition to the dom0, 8-9 domUs on one
system. I don't know if there's some internal limit I've unknowingly
passed. I've tried booting up the "problematic" domUs first, then
booting the known working ones.. no change-- the working domUs still
work, the newer domUs do not.

 It was suggested to me that there might be an issue with the bridge
interface... but what? Is there some kernel limit to the number of
bridges and vifs I can have?
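
 One vif-level check that can be scripted is whether two domU config
files declare the same MAC address. Nothing above shows that this is the
cause; it is only an assumption that could be ruled out. In this sketch
the function name find_duplicate_macs and the sample config snippets are
invented, and vif entries without an explicit mac= are simply not seen
by the check.

```python
import re
from collections import defaultdict

# Matches an explicit mac=aa:bb:cc:dd:ee:ff setting inside a vif line.
MAC_RE = re.compile(r"mac\s*=\s*([0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5})")


def find_duplicate_macs(configs):
    """Return MACs declared by more than one domU.

    configs maps a domU name to the text of its config file.
    """
    owners = defaultdict(list)
    for name, text in configs.items():
        for mac in MAC_RE.findall(text):
            # Compare case-insensitively: 3E and 3e are the same address.
            owners[mac.lower()].append(name)
    return {mac: names for mac, names in owners.items() if len(names) > 1}


# Invented config snippets, not copied from the real system.
example_configs = {
    "web": "vif = [ 'mac=00:16:3e:00:00:01' ]",
    "nfs": "vif = [ 'mac=00:16:3e:00:00:02' ]",
    "new1": "vif = [ 'mac=00:16:3E:00:00:01' ]",
}
print(find_duplicate_macs(example_configs))
```

 In practice the dictionary would be filled by reading each domU config
file from wherever they are kept on the dom0.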

 Any strategies to try and figure this out? It is truly perplexing. No
strange errors are logged by the dom0... I wouldn't think it would be
hardware (i.e. memory)... as I can create a new domU and have it
experience the same problems. In fact I have two such problematic domUs
running right now.. and they're both not working as I'd expect them to
(while the other 7 working domUs are also up and running, seemingly
working just fine).

 Any suggestions would be greatly appreciated.

-Matthew
-- 
 Matthew Haas
 Visiting Instructor
 Corning Community College
 Computer & Information Science
 http://lab46.corning-cc.edu/haas/home/

  "Writing should be like breathing;
   It is one of those important things we do." -- me

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users