On Sat, Jan 24, 2009 at 11:29 PM, gopikrishnan
<gopikrishnan@xxxxxxxxxxxx> wrote:
>
> From the above result, it appears like everything is normal. Can you give any
> suggestions?
A "normal" device should not trigger messages like these:
===========
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
============
In my setup, similar cases have happened several times, each caused by
one of three problems:
(1) the disks were simply busy.
For example, some hosting appliances use a lot of I/O during
startup. Putting several hosting domUs on the same dom0 and starting
them all at the same time makes startup take a loooooong time.
When this happens:
- "iostat -x 3" on dom0 during the boot process will show that the
disk is busy with high throughput
- There are no weird messages in syslog
- all you have to do is wait patiently
(2) problems on the SAN switches/connections or HW RAID controller
For example, when your SAN switch is rebooted, all disk I/O is blocked
for some time, and in some cases this can lead to data corruption.
When this happens:
- "iostat -x 3" on dom0 (at the time the problem occurs) will show
that the disk is busy with very low or no throughput
- depending on your setup, you might get "rejecting I/O to offline
device" messages (check the CONSOLE to be sure, not just
/var/log/messages)
- sometimes the problem seems to "fix itself" without you having to do anything
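To tell (1) from (2) without eyeballing iostat output, the two "iostat -x" numbers that matter here (throughput and %util) can be derived from /proc/diskstats, which is what iostat reads. Below is a minimal sketch, assuming a Linux dom0 that exposes /proc/diskstats; the function names and the thresholds (90% util, 1 MB/s) are my own assumptions, not anything Xen or iostat defines, so tune them for your hardware.

```python
import os
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written, ms_doing_io)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank lines and short-format partition entries
        stats[fields[2]] = (int(fields[5]), int(fields[9]), int(fields[12]))
    return stats


def compare(before, after, interval_s):
    """Per-device (MB/s, %util) between two snapshots, as iostat -x does."""
    result = {}
    for name, (r0, w0, io0) in before.items():
        if name not in after:
            continue  # device disappeared between samples
        r1, w1, io1 = after[name]
        mb_per_s = (r1 - r0 + w1 - w0) * SECTOR_BYTES / interval_s / 1e6
        # %util: share of wall-clock time the device had I/O in flight
        util = min(100.0, (io1 - io0) * 100.0 / (interval_s * 1000.0))
        result[name] = (mb_per_s, util)
    return result


def classify(mb_per_s, util, busy_util=90.0, low_mb_per_s=1.0):
    """Hypothetical thresholds; tune them for your own disks."""
    if util < busy_util:
        return "not saturated"
    if mb_per_s >= low_mb_per_s:
        return "busy with high throughput: probably (1), wait"
    return "busy with little or no throughput: suspect (2) or (3)"


def sample(interval_s=3.0, path="/proc/diskstats"):
    """Take two snapshots interval_s apart, like one 'iostat -x 3' tick."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return compare(before, after, interval_s)


if __name__ == "__main__" and os.path.exists("/proc/diskstats"):
    for name, (mb, util) in sorted(sample().items()):
        print("%-10s %8.2f MB/s %6.1f%% util  %s"
              % (name, mb, util, classify(mb, util)))
```

Run it on dom0 while the domUs are booting (or while they hang). A disk pinned near 100% util that still moves data points to (1); one pinned near 100% util that moves almost nothing points to (2) or (3).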
(3) broken disks or controller
Similar to (2), but this can also happen on local storage. Everything
seemed to work correctly, but accessing certain data would take a
loooong time or fail. This one is the hardest to diagnose, but it
sometimes had symptoms similar to (2).
From your earlier mail I suspect it was (3). Then again, from "After a few
hours (may be 8-10hrs), all these VPS will come up automatically." it
could also be (1).
To be sure, though, you'll need some diagnostics from when the
problem occurred:
- what was the disk throughput at that time (check with "iostat -x 3"
or similar commands)?
- were there any weird messages on the CONSOLE or in /var/log/messages
at that time (depending on the problem, it is possible that error
messages were not written to /var/log/messages)?
- what was the domU load at that time? Were all domUs using 100% CPU?
Note that some diagnostics have to be done at the time the problem
occurs, not AFTER.
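Since the offline-device messages may only show up on the console, it helps to save the console output (for example from a serial console or from dmesg) while the problem is happening and scan it afterwards. A minimal sketch, assuming you have such a capture as a plain text file; the script and file names (e.g. "python scan_console.py console.log") are hypothetical, and the generic "I/O error" pattern is my assumption about what other kernel disk errors look like, not an exhaustive list.

```python
import re
import sys
from collections import Counter

# Message quoted earlier in this thread, e.g.
#   sd 0:0:0:0: rejecting I/O to offline device
OFFLINE_RE = re.compile(r"(sd \d+:\d+:\d+:\d+): rejecting I/O to offline device")
# Assumed catch-all for other kernel disk errors (not an exhaustive list)
IO_ERROR_RE = re.compile(r"I/O error")


def scan(lines):
    """Count offline-device rejections per SCSI device, plus other I/O errors."""
    offline = Counter()
    io_errors = 0
    for line in lines:
        m = OFFLINE_RE.search(line)
        if m:
            offline[m.group(1)] += 1
        elif IO_ERROR_RE.search(line):
            io_errors += 1
    return offline, io_errors


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: %s CONSOLE_LOG" % sys.argv[0])
    else:
        with open(sys.argv[1], errors="replace") as f:
            offline, io_errors = scan(f)
        for dev, count in offline.most_common():
            print("%s: %d offline rejections" % (dev, count))
        print("other I/O error lines: %d" % io_errors)
```

Zero hits in /var/log/messages but many in the console capture is exactly the case where the errors never made it to disk.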
Good luck!
Regards,
Fajar
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users