This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

Re: [Xen-devel] [PATCH 1 of 1] xen-backwatch: Deal with broken frontend/

On Mon, 2011-06-20 at 12:49 -0400, Ian Jackson wrote:
> Daniel Stodden writes ("[Xen-devel] [PATCH 1 of 1] xen-backwatch: Deal with 
> broken frontend/backend ring I/O"):
> > Adds tool support to debug backends which expose I/O ring state in
> > sysfs. Currently supports /sys/devices/xen-backend/vbd-*-*/io_ring
> > nodes for block I/O, where implemented.
> Thanks.
> > Primary function is to observe ring state make progress over a period
> > of time, then report stuck message queue halves where pending
> > consumer/event are not moving.
> This seems to have only one entry in COMMANDS, "check".  Is that
> right ?  

The <command> argument is there so that alternative ways of running it
can be added later without breaking existing deployments. I had
considered a 'daemon' command, but then found that cron would likely do
the job.

> And it doesn't seem to provide a way to specify a particular
> domain to look for ?

I briefly considered it initially, but after testing it just didn't look
so important anymore. :}

Presently, a 

# xen-ringwatch check -v 
RingWatch(vbd-1-51760/io_ring)[IDLE]: RingState(size=32, Req(prod=31, cons=31, 
event=32), Rsp(prod=31, pvt=31, event=32)): io: complete, req: complete, rsp: 
RingWatch(vbd-1-51712/io_ring)[BUSY]: RingState(size=32, Req(prod=143236466, 
cons=143236466, event=143236467), Rsp(prod=143236459, pvt=143236459, 
event=143236460)): io: pending, req: complete, rsp: complete

will dump the entire set of running backends, regardless of state.

I should point out that there's no significant overhead involved, apart
from the wait period required to come to a conclusion. It's all
glob/read/write/wait, and all VBDs are watched in parallel. But even with
50 VMs, I anticipated that people would rather grep the output instead.
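
For the grep case, a slightly more structured filter is easy to write as
well. The following is only a sketch: it assumes one record per line in
the layout shown in the sample output above, and the names RECORD and
filter_by_state are made up for illustration. They are not part of
xen-ringwatch.

```python
import re

# Assumed record prefix, copied from the sample output in this mail:
#   RingWatch(<name>)[<STATE>]: RingState(...): io: ..., req: ..., rsp: ...
RECORD = re.compile(r"RingWatch\((?P<name>[^)]+)\)\[(?P<state>[A-Z]+)\]")


def filter_by_state(lines, wanted):
    """Yield (name, state, line) for records whose state tag is in `wanted`."""
    for line in lines:
        match = RECORD.search(line)
        if match and match.group("state") in wanted:
            yield match.group("name"), match.group("state"), line.rstrip("\n")
```

Feeding it the lines of a verbose check with wanted={"STCK"} would keep
only the rings reported as stuck; lines that do not match the assumed
layout are skipped.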

Here's a sample crontab invocation:

xen-ringwatch check -T 4 --kick | logger -p daemon.crit -t RINGWATCH-ALERT

This remains silent until it actually finds a subset of watched rings
to .kick(), and then outputs only those.

Jun 20 13:26:59 localhost RINGWATCH-ALERT: 
RingWatch(vbd-1-51712/io_ring)[STCK]: RingState(size=32, Req(prod=146141561, 
cons=146141561, event=146141562), Rsp(prod=146141561, pvt=146141561, 
event=146141530)): io: complete, req: complete, rsp: pending
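
To illustrate the idea behind the check (a queue half that is pending and
whose consumer/event index does not move over the wait period), here is
a rough sketch that compares two samples of the same ring. It parses the
record text as printed above, not the sysfs node itself, whose format is
not shown here. The choice of which indices to watch per half (WATCHED)
is an assumption made for this example, and the function names are
hypothetical; the actual tool may decide differently.

```python
import re

# Assumed layout of the tail of one record, taken from the samples above.
FIELDS = re.compile(
    r"Req\(prod=(?P<req_prod>\d+),\s*cons=(?P<req_cons>\d+),"
    r"\s*event=(?P<req_event>\d+)\),\s*"
    r"Rsp\(prod=(?P<rsp_prod>\d+),\s*pvt=(?P<rsp_pvt>\d+),"
    r"\s*event=(?P<rsp_event>\d+)\)\)"
    r":\s*io:\s*(?P<io>\w+),\s*req:\s*(?P<req>\w+),\s*rsp:\s*(?P<rsp>\w+)"
)

NUMERIC = ("req_prod", "req_cons", "req_event",
           "rsp_prod", "rsp_pvt", "rsp_event")

# Assumption: the indices that must advance for a half to count as moving.
WATCHED = {"req": ("req_cons", "req_event"), "rsp": ("rsp_event",)}


def parse_record(line):
    """Turn one printed record into a dict of int indices and status words."""
    match = FIELDS.search(line)
    if match is None:
        raise ValueError("not a RingState record: %r" % line)
    rec = match.groupdict()
    for key in NUMERIC:
        rec[key] = int(rec[key])
    return rec


def stuck_halves(before, after):
    """Return halves pending in both samples whose watched indices froze."""
    stuck = []
    for half, keys in sorted(WATCHED.items()):
        pending = before[half] == "pending" and after[half] == "pending"
        frozen = all(before[key] == after[key] for key in keys)
        if pending and frozen:
            stuck.append(half)
    return stuck
```

Two samples taken some seconds apart would be passed through
parse_record() and then to stuck_halves(). An idle ring does not count as
stuck, because neither half is pending.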

> I'm happy to take it as-is as it seems like a better-than-nothing tool
> but I just wanted to check I'd understood it, first.

Found that the patch I sent was missing cleanup in some spots (mainly a
program rename, and the verbose variable in __main__ ended up off by
one). Can I sneak in the update attached before you push it?

Also, I never tried the make install target. Does it look okay to you?


Attachment: xen-ringwatch.diff
Description: Text Data
