On Mon, 2011-06-20 at 12:49 -0400, Ian Jackson wrote:
> Daniel Stodden writes ("[Xen-devel] [PATCH 1 of 1] xen-backwatch: Deal with
> broken frontend/backend ring I/O"):
> > Adds tool support to debug backends which expose I/O ring state in
> > sysfs. Currently supports /sys/devices/xen-backend/vbd-*-*/io_ring
> > nodes for block I/O, where implemented.
> > Primary function is to observe ring state make progress over a period
> > of time, then report stuck message queue halves where pending
> > consumer/event are not moving.
> This seems to have only one entry in COMMANDS, "check". Is that
> right ?
The <command> thing should allow alternative ways to run it without
breaking existing deployments. I used to think about a 'daemon', but
then found that cron would likely do the job.
> And it doesn't seem to provide a way to specify a particular
> domain to look for ?
I briefly considered it initially, but after testing it just didn't look
so important anymore. :}
# xen-ringwatch check -v
RingWatch(vbd-1-51760/io_ring)[IDLE]: RingState(size=32, Req(prod=31, cons=31,
event=32), Rsp(prod=31, pvt=31, event=32)): io: complete, req: complete, rsp:
RingWatch(vbd-1-51712/io_ring)[BUSY]: RingState(size=32, Req(prod=143236466,
cons=143236466, event=143236467), Rsp(prod=143236459, pvt=143236459,
event=143236460)): io: pending, req: complete, rsp: complete
will dump the entire set of running backends, independent of state.
I should point out there's not really a significant overhead involved,
except some required wait period to come to a conclusion. It's all
glob/read/write/wait and all VBDs are watched in parallel. But even with
50 VMs, I expected that at some point people would rather grep instead.
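
For anyone who wants more than grep, here is a minimal sketch of picking the verbose output apart in Python. It is not part of the patch. The regular expression is inferred only from the sample lines above (with the mail wrapping undone), and the names LINE_RE, parse_line and filter_state are made up for illustration.

```python
import re

# Assumed layout of one "check -v" line, inferred from the sample output in
# this mail. The real tool may print more or differently shaped fields.
LINE_RE = re.compile(
    r"RingWatch\((?P<node>[^)]+)\)\[(?P<state>\w+)\]:\s*"
    r"RingState\(size=(?P<size>\d+),\s*"
    r"Req\(prod=(?P<req_prod>\d+),\s*cons=(?P<req_cons>\d+),\s*"
    r"event=(?P<req_event>\d+)\),\s*"
    r"Rsp\(prod=(?P<rsp_prod>\d+),\s*pvt=(?P<rsp_pvt>\d+),\s*"
    r"event=(?P<rsp_event>\d+)\)\)"
)


def parse_line(line):
    """Return a dict of fields for one output line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    rec = {}
    for key, value in m.groupdict().items():
        # node and state stay strings, every counter becomes an int
        rec[key] = value if key in ("node", "state") else int(value)
    return rec


def filter_state(lines, state):
    """Keep only the parsed records whose bracketed state equals `state`."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r is not None and r["state"] == state]
```

Feeding it the unwrapped lines and asking for filter_state(lines, "BUSY") would return just the records for the busy backends, with the counters available as integers.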
Here's a sample crontab invocation:
xen-ringwatch check -T 4 --kick | logger -p daemon.crit -t RINGWATCH-ALERT
This remains silent until it actually discovers some watched subset to
.kick(), and then outputs only those.
Jun 20 13:26:59 localhost RINGWATCH-ALERT:
RingWatch(vbd-1-51712/io_ring)[STCK]: RingState(size=32, Req(prod=146141561,
cons=146141561, event=146141562), Rsp(prod=146141561, pvt=146141561,
event=146141530)): io: complete, req: complete, rsp: pending
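
To make the idea concrete, here is a rough sketch of the "pending and not moving" test between two samples of one ring. This is not the code from the patch. The pending rules are only a reading of the three sample lines in this mail, the Ring tuple and both function names are hypothetical, and counter wraparound is ignored.

```python
from collections import namedtuple

# One sample of the counters shown in the RingState(...) output.
Ring = namedtuple(
    "Ring", "req_prod req_cons req_event rsp_prod rsp_pvt rsp_event"
)


def pending(r):
    """Guess which halves have outstanding work.

    Assumption: these three rules merely reproduce the io/req/rsp labels
    of the sample lines; the real tool may decide differently.
    """
    return {
        "req": r.req_cons != r.req_prod,       # requests not yet consumed
        "io": r.rsp_pvt != r.req_cons,         # consumed but not yet answered
        "rsp": r.rsp_event != r.rsp_prod + 1,  # event pointer lags responses
    }


def stuck_halves(before, after):
    """Halves that are pending in both samples while their counter stood still."""
    p0, p1 = pending(before), pending(after)
    moved = {
        "req": before.req_cons != after.req_cons,
        "io": before.rsp_pvt != after.rsp_pvt,
        "rsp": before.rsp_event != after.rsp_event,
    }
    return sorted(h for h in p0 if p0[h] and p1[h] and not moved[h])
```

With the counters from the log line above taken as both samples, stuck_halves reports only the rsp half, which matches the "rsp: pending" in that line.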
> I'm happy to take it as-is as it seems like a better-than-nothing tool
> but I just wanted to check I'd understood it, first.
Found that the patch I sent was missing cleanup in some spots (mainly a
program rename, and the verbose variable in __main__ ended up off by
one). Can I sneak in the update attached before you push it?
Also, I never tried the make install target. Does it look okay to you?