When rebinding a user event channel to a cpu in dom0, evtchn/evtchn.c
calls rebind_evtchn_to_cpu, which updates the cpu_evtchn struct
to tie the event to a specific cpu and notifies Xen about this
with an EVTCHNOP_bind_vcpu.
In core/evtchn.c:evtchn_do_upcall, the evtchn_pending array from
the shared info page is checked against this cpu_evtchn mask,
effectively preventing domU from handling events when "upcalled"
on the wrong cpu.
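
To make the filtering concrete, here is a toy model of that check, not
the actual kernel code. It assumes a single 64-bit word of ports and
two cpus; the variable and function names (cpu_evtchn_mask,
active_evtchns, bind_port_to_cpu, upcall_handles) are illustrative
stand-ins for whatever core/evtchn.c really uses.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 2

/* Toy model: one 64-bit word of event channels (ports 0..63). */
static uint64_t evtchn_pending;            /* stands in for the shared info pending bits */
static uint64_t evtchn_mask;               /* stands in for the shared info mask bits */
static uint64_t cpu_evtchn_mask[NR_CPUS];  /* per-cpu filter kept by the guest (assumed layout) */

/* Ports that an upcall running on `cpu` would actually process. */
static uint64_t active_evtchns(unsigned int cpu)
{
    return evtchn_pending & cpu_evtchn_mask[cpu] & ~evtchn_mask;
}

/* Put `port` in the filter of `cpu` and remove it from every other cpu. */
static void bind_port_to_cpu(unsigned int port, unsigned int cpu)
{
    for (unsigned int i = 0; i < NR_CPUS; i++)
        cpu_evtchn_mask[i] &= ~(UINT64_C(1) << port);
    cpu_evtchn_mask[cpu] |= UINT64_C(1) << port;
}

/* True if an upcall on `cpu` would handle `port`. */
static bool upcall_handles(unsigned int cpu, unsigned int port)
{
    return (active_evtchns(cpu) >> port) & 1;
}
```

With a port bound to cpu 1 in the guest's filter, an upcall delivered on
cpu 0 sees the pending bit but skips it, which is the situation
described above.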
On an EVTCHNOP_close, xen rebinds the event channel to vcpu 0.
This seems to mean that anyone calling EVTCHNOP_close on an event channel
that was bound to a vcpu through the cpu_evtchn array must also
rebind the event channel to cpu 0, or we lose the events
(they stay pending, unmasked, and never make it to do_IRQ).
I'm currently seeing such "lost" events, and some debugging shows they're
bound in cpu_evtchn to a vcpu xen does not agree with (seen with
an EVTCHNOP_status call and status->vcpu). I think I tracked it down
to the EVTCHNOP_close calls in evtchn/evtchn.c, though I might be missing
some other cases here?
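
The kind of consistency check I used for debugging can be sketched
roughly as follows. This is a self-contained model only: xen_vcpu[]
stands in for what an EVTCHNOP_status query would report per port, and
the array layout, NR_PORTS and find_desynced_ports are hypothetical
names, not existing kernel symbols.

```c
#include <stddef.h>

#define NR_PORTS 8

/* Guest-side record of the cpu each port is bound to (assumed layout). */
static unsigned char cpu_evtchn[NR_PORTS];

/* Stand-in for the vcpu that an EVTCHNOP_status query would report. */
static unsigned int xen_vcpu[NR_PORTS];

/*
 * Collect ports whose guest-side binding differs from Xen's view.
 * At most `max` ports are stored in `out`; the return value is the
 * total number of mismatches found.
 */
static size_t find_desynced_ports(unsigned int *out, size_t max)
{
    size_t n = 0;

    for (unsigned int port = 0; port < NR_PORTS; port++) {
        if (cpu_evtchn[port] != xen_vcpu[port]) {
            if (n < max)
                out[n] = port;
            n++;
        }
    }
    return n;
}
```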
If this is the right cause, maybe a close() wrapper in core/evtchn.c should
be used to avoid such deadlocks, or some other mechanism should ensure we
don't get out of sync with Xen?
Or maybe let Xen publish this mask in the shared info page and rely
only on hypercalls for masking/unmasking events?
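
To illustrate the close() wrapper idea, here is a small stand-alone
model. It is a sketch, not the attached patch and not real kernel code:
the hypervisor side is simulated by model_xen_* functions, and every
name in it (guest_bind_to_cpu, raw_close, close_evtchn_wrapper, the
array layouts) is hypothetical. The only behaviour taken from the
discussion above is that a close makes Xen rebind the port to vcpu 0.

```c
#include <stdint.h>

#define NR_PORTS 64
#define NR_CPUS  2

static unsigned int  xen_vcpu[NR_PORTS];        /* model of Xen's binding per port */
static unsigned char cpu_evtchn[NR_PORTS];      /* guest's record of the bound cpu */
static uint64_t      cpu_evtchn_mask[NR_CPUS];  /* guest's per-cpu upcall filter */

/* Guest-side bookkeeping: move `port` from its old cpu to `cpu`. */
static void guest_bind_to_cpu(unsigned int port, unsigned int cpu)
{
    cpu_evtchn_mask[cpu_evtchn[port]] &= ~(UINT64_C(1) << port);
    cpu_evtchn_mask[cpu] |= UINT64_C(1) << port;
    cpu_evtchn[port] = (unsigned char)cpu;
}

/* Simulated hypervisor side of EVTCHNOP_bind_vcpu. */
static void model_xen_bind_vcpu(unsigned int port, unsigned int vcpu)
{
    xen_vcpu[port] = vcpu;
}

/* Simulated hypervisor side of EVTCHNOP_close: port goes back to vcpu 0. */
static void model_xen_close(unsigned int port)
{
    xen_vcpu[port] = 0;
}

/* Rebind on both sides, as rebind_evtchn_to_cpu is described as doing. */
static void rebind_model(unsigned int port, unsigned int cpu)
{
    model_xen_bind_vcpu(port, cpu);
    guest_bind_to_cpu(port, cpu);
}

/* Problem case: close without touching the guest's bookkeeping. */
static void raw_close(unsigned int port)
{
    model_xen_close(port);
}

/* Proposed wrapper: close, then reset the guest's binding to cpu 0. */
static void close_evtchn_wrapper(unsigned int port)
{
    model_xen_close(port);
    guest_bind_to_cpu(port, 0);
}
```

After rebind_model(7, 1) followed by raw_close(7), the model has Xen on
vcpu 0 while the guest still filters port 7 on cpu 1. Going through
close_evtchn_wrapper instead leaves both sides on cpu 0.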
The attached patch is an attempt to fix the above.
\o/ Pascal Bouchareine - Gandi
g 0170393757 15, place de la Nation - 75011 Paris