WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-users

Re: [Xen-users] domU network has sleeping sickness

To: Steven Timm <timm@xxxxxxxx>
Subject: Re: [Xen-users] domU network has sleeping sickness
From: Marc Teichgraeber <radar@xxxxxxxxxxx>
Date: Mon, 03 Mar 2008 18:19:11 +0100
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
Delivery-date: Mon, 03 Mar 2008 09:19:48 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <Pine.LNX.4.64.0803030959350.7116@xxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
References: <47CC1E09.1080804@xxxxxxxxxxx> <Pine.LNX.4.64.0803030959350.7116@xxxxxxxxxxxxxxxxx>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 1.5.0.14 (X11/20060911)
Steven Timm wrote:
> I've seen the same problem with my xen 3.1.0 setup.  What
> the Xen gurus are telling us is that this is a symptom of Xen dom0
> being busy and not servicing the network interrupts of the domu's
> promptly.  Their advice to us was to shift an application that
> had been running on dom0 to another Xen instance to see if that
> would help.  We are in the process of implementing that solution now.
>

There is nothing running on my dom0s. Their only purpose is managing
the domUs.
One of the problematic Xen hosts does actually have load on its three
domUs, which are serving continuous build systems. But another sleepy
Xen host with five domUs is more or less in a pre-production state and
idling.

> By the way, my system (Dell PowerEdge 2950) has built-in Broadcom
> network cards, not Intel E1000, so it is unlikely that
> this is a network-driver-specific issue.
>
> During these episodes of non-network connectivity, by the way,
> it was not unusual to see the following kernel dump in dom0
>

I don't find anything helpful or suspicious in any log, but maybe I'm
missing it.
In dom0 I'm looking at dmesg, messages, warn, xend-debug.log, xend.log
and xen-hotplug.log, and in the domU at dmesg, messages and warn.
But after the bootup process, more or less nothing important is logged.
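
A minimal sketch of how this log check could be automated, assuming the
logs are plain text and that the signature worth searching for is the
"soft lockup" message from Steven's trace. The pattern list and the file
paths are assumptions for an openSUSE dom0, not taken from any Xen
documentation; adjust both for your system.

```python
import re
from pathlib import Path

# Assumed patterns: the soft-lockup message from the quoted trace plus a
# generic "Call Trace:" marker. Extend as needed.
PATTERNS = [re.compile(p) for p in (r"soft lockup", r"Call Trace:")]

# Assumed log locations on an openSUSE dom0; adjust for your system.
LOG_FILES = [
    "/var/log/messages",
    "/var/log/warn",
    "/var/log/xen/xend.log",
    "/var/log/xen/xend-debug.log",
    "/var/log/xen/xen-hotplug.log",
]


def find_suspicious(lines):
    """Return (line_number, text) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(p.search(line) for p in PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_files(paths=LOG_FILES):
    """Scan each existing log file and print the matching lines."""
    for name in paths:
        path = Path(name)
        if not path.is_file():
            continue  # skip logs that do not exist on this host
        with path.open(errors="replace") as handle:
            for number, text in find_suspicious(handle):
                print(f"{name}:{number}: {text}")
```

Calling scan_files() as root on the dom0 (and again inside a domU with a
shorter path list) would print any matching lines with their file name
and line number; no output means none of the assumed patterns occurred.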

> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: Call Trace:
> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: <IRQ> [<ffffffff80258269>] softlockup_tick+0xcc/0xde
> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8020e84d>] timer_interrupt+0x3a3/0x401
> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: [<ffffffff80258898>] handle_IRQ_event+0x4b/0x93
> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8025897e>] __do_IRQ+0x9e/0x100
> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8020cc97>] do_IRQ+0x63/0x71
> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: <EOI>
>
> or
>
> Feb 25 10:32:39 fermigrid6 kernel: BUG: soft lockup detected on CPU#0!
> Feb 25 10:32:39 fermigrid6 kernel:
> Feb 25 10:32:39 fermigrid6 kernel: Call Trace:
> Feb 25 10:32:39 fermigrid6 kernel:  <IRQ> [<ffffffff80258269>] softlockup_tick+0xcc/0xde
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020e84d>] timer_interrupt+0x3a3/0x401
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff80258898>] handle_IRQ_event+0x4b/0x93
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8025897e>] __do_IRQ+0x9e/0x100
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020cc97>] do_IRQ+0x63/0x71
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
> Feb 25 10:32:39 fermigrid6 kernel:  <EOI> [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8034b258>] force_evtchn_callback+0xa/0xb
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff803f2272>] thread_return+0xdf/0x119
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff80228a25>] __cond_resched+0x1c/0x44
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff803f25df>] cond_resched+0x37/0x42
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff802343c4>] ksoftirqd+0x0/0xbf
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff80234432>] ksoftirqd+0x6e/0xbf
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff802422d7>] kthread+0xc8/0xf1
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020ae1c>] child_rip+0xa/0x12
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8024220f>] kthread+0x0/0xf1
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020ae12>] child_rip+0x0/0x12
>
> ----------------
>
> One of our dom0's was running an LVS server, the other one on
> identical hardware was not.  We moved the LVS server from one to the
> other and
> the network problems and kernel panics followed it.
>
> Steve Timm
>
> On Mon, 3 Mar 2008, Marc Teichgraeber wrote:
>
>> Hi all,
>>
>> I have a strange network problem with some domUs on three Xen hosts.
>> They are losing their network connectivity. I use bridged networking.
>>   * It happens randomly and can happen right after bootup of the domU
>> or anytime later.
>>   * The domU is not reachable from another host on the LAN.
>>   * The domU is always reachable from the dom0 (ssh, ping).
>>   * I can 'repair' the connection by attaching to the console and
>> pinging out from the domU. At first nothing happens, then the machine gets
>> its network back. (That is also my current workaround: pinging all
>> the time from the console.)
>>   * Pinging from another host at the same time helps too.
>>   * It can happen that I ping continuously from one host while another
>> host gets only every 10th packet or so back.
>>   * The interfaces can come back from their sleep by themselves.
>>   * When the network has fallen asleep, ssh to the domU from another
>> host hangs; it does not come back with "no route to host" or similar.
>>
>> I'm suspicious of the network controllers; they are the same on all
>> hosts: "Intel Corporation 80003ES2LAN Gigabit Ethernet Controller
>> (Copper)" (lspci), some kind of "Intel® PRO/1000 EB Network Connection
>> with I/O Acceleration" (Intel website). I've tried the latest e1000
>> driver from Intel, but it didn't help.
>> I've checked all MAC addresses and IP addresses; they are unique.
>>
>> Any ideas are welcome :)
>>
>> -------------------------------------------------------------------------
>>
>> "xm info" from host1,  openSUSE 10.2 (X86-64):
>>
>> release                : 2.6.18.8-0.9-xen
>> version                : #1 SMP Sun Feb 10 22:48:05 UTC 2008
>> machine                : x86_64
>> nr_cpus                : 4
>> nr_nodes               : 1
>> sockets_per_node       : 2
>> cores_per_socket       : 2
>> threads_per_core       : 1
>> cpu_mhz                : 2327
>> hw_caps                : bfebfbff:20100800:00000000:00000140:0004e3bd:00000000:00000001
>> total_memory           : 32766
>> free_memory            : 21607
>> max_free_memory        : 21607
>> max_para_memory        : 21603
>> max_hvm_memory         : 21544
>> xen_major              : 3
>> xen_minor              : 0
>> xen_extra              : .3_11774-23
>> xen_caps               : xen-3.0-x86_64
>> xen_pagesize           : 4096
>> platform_params        : virt_start=0xffff800000000000
>> xen_changeset          : 11774
>> cc_compiler            : gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)
>> cc_compile_by          : abuild
>> cc_compile_domain      : suse.de
>> cc_compile_date        : Thu Jan 10 21:22:54 UTC 2008
>> xend_config_format     : 2
>> -------------------------------------------------------------------------
>>
>> "xm info" output on host2, openSUSE 10.3 (X86-64)
>>
>> release                : 2.6.22.13-0.3-xen
>> version                : #1 SMP 2007/11/19 15:02:58 UTC
>> machine                : x86_64
>> nr_cpus                : 8
>> nr_nodes               : 1
>> sockets_per_node       : 2
>> cores_per_socket       : 4
>> threads_per_core       : 1
>> cpu_mhz                : 3000
>> hw_caps                : bfebfbff:20100800:00000000:00000140:0004e3bd:00000000:00000001
>> total_memory           : 16382
>> free_memory            : 591
>> max_free_memory        : 591
>> max_para_memory        : 587
>> max_hvm_memory         : 577
>> xen_major              : 3
>> xen_minor              : 1
>> xen_extra              : .0_15042-51
>> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
>> xen_scheduler          : credit
>> xen_pagesize           : 4096
>> platform_params        : virt_start=0xffff800000000000
>> xen_changeset          : 15042
>> cc_compiler            : gcc version 4.2.1 (SUSE Linux)
>> cc_compile_by          : abuild
>> cc_compile_domain      : suse.de
>> cc_compile_date        : Tue Sep 25 21:16:06 UTC 2007
>> xend_config_format     : 4
>>
>>
>
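
Since the two "xm info" dumps above come from hosts that both show the
problem, it may help to see at a glance where they differ. The following
is only a sketch, assuming the "key : value" layout seen in the dumps
(whitespace before the colon) and treating any line without that layout
as a continuation wrapped by the mail client. The function names
parse_xm_info and diff_info are made up for illustration.

```python
import re

# A key line looks like "xen_minor              : 0". The whitespace before
# the colon is what separates it from wrapped values such as "bfebfbff:2010...".
KEY_LINE = re.compile(r"^(\w+)\s+:(?:\s+(.*))?$")


def parse_xm_info(text):
    """Parse 'key : value' lines into a dict of strings.

    Leading '>' quote markers are stripped, and lines that do not look
    like a key line are appended to the previous key's value.
    """
    info = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.lstrip("> ").rstrip()
        if not line:
            continue
        match = KEY_LINE.match(line)
        if match:
            last_key = match.group(1)
            info[last_key] = match.group(2) or ""
        elif last_key is not None:
            info[last_key] = (info[last_key] + " " + line).strip()
    return info


def diff_info(a, b):
    """Return {key: (a_value, b_value)} for keys whose values differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

Feeding the two dumps through parse_xm_info and then diff_info lists
only the keys that differ between the hosts; a key missing on one host
shows up with None on that side.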


-- 
--------------------------------
Marc Teichgraeber
Systemadministrator
Systemadministration

neofonie GmbH
Robert-Koch-Platz 4
10115 Berlin
fon: +49.30 24627 185
fax: +49.30 24627 120
marc.teichgraeber@xxxxxxxxxxx
http://www.neofonie.de 

Handelsregister
Berlin-Charlottenburg: HRB 67460

Geschaeftsfuehrung
Helmut Hoffer von Ankershoffen
Nurhan Yildirim
--------------------------------


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users