xen-users
Re: [Xen-users] Xen system hang or freeze
Some thoughts:
0. Do you have the default behavior where the guests' independent
wallclocks are disabled?
1. I have observed visible performance differences from a VM when
%steal goes above 1%.
It sounds like you have 8 cores.
How many VMs do you have?
What are their weights and caps?
2. The system default of collecting sar data every ten minutes is pretty
unhelpful for problem diagnosis. I routinely adjust the interval to
five seconds, which, at the expense of a lot of disk space, gives a
historical dataset that is useful for forensics.
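As a rough illustration of sampling %steal every five seconds, here is a minimal Python sketch. It is not from the thread: it assumes a Linux guest whose /proc/stat aggregate "cpu" line has at least eight counters (user, nice, system, idle, iowait, irq, softirq, steal), and the function names and the 1% threshold default are purely illustrative.

```python
import time

def parse_cpu_line(line):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in fields[1:]]

def steal_percent(before, after):
    """Percentage of elapsed CPU time counted as steal between two samples."""
    if len(before) < 8 or len(after) < 8:
        raise ValueError("kernel does not report a steal counter")
    # Only the first eight counters are summed: guest time, where present,
    # is already included in the user counter.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[7] / total

def read_cpu_counters(path="/proc/stat"):
    """Read the current aggregate counters (assumes Linux procfs)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())

def sample_forever(interval=5, threshold=1.0):
    """Print one line per interval, marking samples above the threshold."""
    prev = read_cpu_counters()
    while True:
        time.sleep(interval)
        cur = read_cpu_counters()
        pct = steal_percent(prev, cur)
        mark = "  <-- above threshold" if pct > threshold else ""
        print(f"{time.strftime('%H:%M:%S')} steal={pct:.2f}%{mark}", flush=True)
        prev = cur
```

Calling sample_forever() inside a guest and redirecting the output to a file would give a five-second history alongside whatever sar already records.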
On Apr 21, 2009, at 10:10 AM, Nick Anderson wrote:
On Tue, Apr 21, 2009 at 08:30:32AM -0400, Peter Booth wrote:
It would be interesting to know whether sar data was captured during
this time. From this you could track whether there was any process
creation or destruction occurring.
I just had another lockup this weekend.
Sar (from the host)
12:35:01 PM  all    0.00   0.00     0.00     0.00    0.01  99.99
12:45:01 PM  all    0.00   0.00     0.00     0.00    0.01  99.99
12:55:01 PM  all    0.00   0.00     0.00     0.00    0.01  99.99
01:05:01 PM  all    0.00   0.00     0.00     0.00    0.01  99.99
01:15:01 PM  all    0.00   0.00     0.00     0.00    0.01  99.99
Average:     all    0.00   0.00     0.00     0.00    0.01  99.98
01:25:53 PM  LINUX RESTART
01:35:02 PM  CPU   %user  %nice  %system  %iowait  %steal  %idle
01:45:01 PM  all    0.00   0.00     0.00     0.00    0.01  99.99
01:55:01 PM  all    0.00   0.00     0.00     0.00    0.01  99.99
02:05:01 PM  all    0.00   0.00     0.00     0.00    0.01  99.99
sar -b
11:55:01 AM 12.22 0.90 11.32 12.90 257.89
12:05:01 PM 13.97 0.49 13.48 7.68 331.48
12:15:01 PM 18.88 7.30 11.59 161.74 260.17
12:25:01 PM 14.34 1.10 13.23 16.53 438.73
12:35:01 PM 9.01 0.43 8.58 6.96 208.50
12:45:01 PM 8.47 0.35 8.12 5.23 186.03
12:55:01 PM 10.00 1.09 8.91 19.22 245.17
01:05:01 PM 11.89 1.82 10.06 27.76 279.90
01:15:01 PM 10.06 0.34 9.72 5.23 214.62
Average: 17.55 6.12 11.43 385.87 369.74
01:25:53 PM LINUX RESTART
01:35:02 PM tps rtps wtps bread/s bwrtn/s
01:45:01 PM 19.01 7.19 11.83 113.49 273.91
01:55:01 PM 12.23 2.44 9.79 37.42 239.82
02:05:01 PM 16.89 2.79 14.10 47.93 422.02
02:15:01 PM 17.09 1.92 15.17 26.93 495.01
02:25:01 PM 13.91 3.42 10.49 164.83 282.82
02:35:01 PM 12.47 2.05 10.42 28.45 256.32
02:45:01 PM 13.67 1.81 11.87 31.78 340.39
sar -c
12:45:01 PM 0.02
12:55:01 PM 0.02
01:05:01 PM 0.02
01:15:01 PM 0.02
Average: 0.03
01:25:53 PM LINUX RESTART
01:35:02 PM proc/s
01:45:01 PM 0.02
01:55:01 PM 0.02
sar -q
12:55:01 PM 0 147 0.00 0.00 0.00
01:05:01 PM 0 147 0.07 0.03 0.01
01:15:01 PM 0 147 0.00 0.00 0.00
Average: 0 147 0.00 0.00 0.00
01:25:53 PM LINUX RESTART
01:35:02 PM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
01:45:01 PM 0 147 0.00 0.00 0.00
01:55:01 PM 0 147 0.00 0.00 0.00
sar -r
01:05:01 PM 7312568 1878856 20.44 175416 66532 1044184 0 0.00 0
01:15:01 PM 7311948 1879476 20.45 175416 66544 1044184 0 0.00 0
Average: 7328126 1863298 20.27 175403 67011 1044184 0 0.00 0
01:25:53 PM LINUX RESTART
01:35:02 PM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
01:45:01 PM 8620940 570484 6.21 64136 36012 1044184 0 0.00 0
01:55:01 PM 8619824 571600 6.22 64972 36028 1044184 0 0.00 0
02:05:01 PM 8618204 573220 6.24 65800 36040 1044184 0 0.00 0
===============================================================
Now perhaps I have missed something, but to me that all looks just
fine. I should set up something to log ps. But in my guests I see steal
pushed through the roof, and it's like that for days ahead of time. I've
noticed the steal during the lockups before, but either I neglected to
look back several days or forgot what I saw. I didn't recall steal
being at 100% as far back as my logs go.
12:55:01 PM  CPU   %user  %nice  %system  %iowait  %steal  %idle
01:05:01 PM  all    0.00   0.00     0.00     0.00  100.00   0.00
01:15:01 PM  all    0.00   0.00     0.00     0.00  100.00   0.00
Average:     all    0.00   0.00     0.00     0.00  100.00   0.00
01:27:49 PM  LINUX RESTART
01:35:01 PM  CPU   %user  %nice  %system  %iowait  %steal  %idle
01:45:01 PM  all    4.04   0.00     1.80     0.64    0.02  93.50
01:55:01 PM  all    4.10   0.00     1.76     0.31    0.02  93.80
02:05:01 PM  all    5.45   0.00     2.47     0.23    0.02  91.83
02:15:01 PM  all    7.03   0.00     3.22     0.22    0.02  89.51
02:25:01 PM  all    4.82   0.00     2.31     0.18    0.01  92.6
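For scanning guest logs like the one above, a small Python sketch along these lines could flag intervals where %steal exceeds a chosen threshold. It is only an illustration: it assumes the column order shown in the header (%user %nice %system %iowait %steal %idle), 12-hour timestamps, and the "all" CPU rows; the function name and the default threshold of 1% are hypothetical. The pattern tolerates rows that have been wrapped across two lines, as in a pasted email.

```python
import re

# A timestamp, AM/PM, the "all" CPU label, then six numeric columns.
# \s+ also matches newlines, so rows wrapped by a mail client still parse.
ROW = re.compile(
    r"(\d{2}:\d{2}:\d{2}\s+[AP]M)\s+all((?:\s+\d+(?:\.\d+)?){6})"
)

def high_steal_rows(sar_text, threshold=1.0):
    """Return (timestamp, steal) pairs whose %steal exceeds the threshold."""
    flagged = []
    for match in ROW.finditer(sar_text):
        stamp = " ".join(match.group(1).split())
        values = [float(v) for v in match.group(2).split()]
        steal = values[4]  # assumed order: user nice system iowait steal idle
        if steal > threshold:
            flagged.append((stamp, steal))
    return flagged

# Rows copied from the guest output in this message.
sample = """\
01:05:01 PM all 0.00 0.00 0.00 0.00
100.00 0.00
01:15:01 PM all 0.00 0.00 0.00 0.00
100.00 0.00
01:45:01 PM all 4.04 0.00 1.80 0.64
0.02 93.50
"""

for stamp, steal in high_steal_rows(sample):
    print(stamp, steal)
```

The "Average:" and "LINUX RESTART" lines are skipped because they do not start with a timestamp followed by "all".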
Might also be worth adding a cron entry to append the output of lsof
to a file every N minutes (perhaps with logrotate enabled) to see if
you can capture what changed in the running system when this "lockup"
occurred?
Also worth collecting ps output every minute.
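One way to collect such snapshots, sketched in Python rather than as a cron entry: the helper below runs a command once and appends its timestamped output to a log file. The function names, log paths, and the particular ps and lsof options are assumptions for illustration, not something specified in this thread.

```python
import subprocess
import time

def append_snapshot(command, log_path):
    """Run command once and append its stdout, with a header line, to log_path."""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write(f"=== {stamp} {' '.join(command)} (exit {result.returncode}) ===\n")
        log.write(result.stdout)
    return result.returncode

def collect_forever(interval=60):
    """Snapshot ps and lsof every interval seconds (hypothetical paths)."""
    while True:
        append_snapshot(["ps", "-eo", "pid,ppid,stat,etime,args"], "/var/log/ps-snapshots.log")
        append_snapshot(["lsof", "-n"], "/var/log/lsof-snapshots.log")
        time.sleep(interval)
```

Because each snapshot is appended under its own timestamp header, the last entry written before a hang shows what was running and which files were open at that point. The logs grow quickly, so rotating them (for example with logrotate, as suggested above) is advisable.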
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
--
Nick Anderson <nick@xxxxxxxxxxxx>
http://www.cmdln.org