This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [PATCH] Re: [Xen-devel] Xen crash on dom0 shutdown

To: "Shan, Haitao" <haitao.shan@xxxxxxxxx>, 'Keir Fraser' <keir.fraser@xxxxxxxxxxxxx>, 'Jan Beulich' <jbeulich@xxxxxxxxxx>
Subject: RE: [PATCH] Re: [Xen-devel] Xen crash on dom0 shutdown
From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Date: Thu, 25 Sep 2008 21:24:31 +0800
Accept-language: en-US
Cc: "'xen-devel@xxxxxxxxxxxxxxxxxxx'" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Thu, 25 Sep 2008 06:25:45 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <F6473715D25C9E46A5515027E5482F100887BDB5C6@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <C4FFC32C.27684%keir.fraser@xxxxxxxxxxxxx> <C4FFE700.276F3%keir.fraser@xxxxxxxxxxxxx> <F6473715D25C9E46A5515027E5482F100887BDB5C6@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AckeJb2U/BQsZIoYEd2egwAX8io7RQAFVrwGADNJP7AAAgf1gA==
Thread-topic: [PATCH] Re: [Xen-devel] Xen crash on dom0 shutdown
After the pirq is unbound for MSI, the vector is in fact free and the action is
freed, but the entry in the domain's pirq_to_vector mapping is still not cleared.

Yunhong Jiang

Shan, Haitao <> wrote:
> Hi, Keir,
> While testing my patch today, I found that cs18539 breaks my system. I
> have to revert this changeset to test MSI.
> Here is the output from serial port.
> (XEN) ----[ Xen-3.4-unstable  x86_64  debug=n  Not tainted ]----
> (XEN) CPU:    2
> (XEN) RIP:    e008:[<ffff828c8013c2f9>] pirq_guest_unbind+0xb9/0x2f0
> (XEN) RFLAGS: 0000000000010082   CONTEXT: hypervisor
> (XEN) rax: ffff828c80274d80   rbx: 0000000000000000   rcx: 00000000000000d0
> (XEN) rdx: ffff83007c60fe38   rsi: 00000000000000ff   rdi: ffff83007c80e080
> (XEN) rbp: ffff83007c80e080   rsp: ffff83007c60fe28   r8: 0000000000000282
> (XEN) r9:  ffff828c80274da4   r10: ffff828c8026e580   r11: 0000000000000206
> (XEN) r12: ffff828c80274d80   r13: 00000000000000d0   r14: 00000000000000ff
> (XEN) r15: 00000000000000ff   cr0: 000000008005003b   cr4: 00000000000026b0
> (XEN) cr3: 000000005ea9c000   cr2: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff83007c60fe28:
> (XEN)    0000000000000282 ffff83007c80e080 0000000000000282 fffffffffffffffd
> (XEN)    ffff83007c80e080 00000000000000ff 00000000000000d0 ffff8800204408b0
> (XEN)    00000000000000ff ffff828c80149075 00000008000000ff ffffffff803908d6
> (XEN)    828c801a98a0b948 c390ec90d1ffffff ffff83007c80e1f8 ffff83007c60fed8
> (XEN)    ffff828c8025f900 ffff83007c80d080 000000ff804e7ff0 ffffffffffffffff
> (XEN)    0000000000000000 0000000000000000 000000010000001a 00a0fb000000001a
> (XEN)    00000001801fe568 ffff83007d5e6080 00000000000000ff ffff8800204408b0
> (XEN)    0000000000000000 ffff8800204408b0 ffff880020440000 ffff828c801a9169
> The actual call trace is do_physdev_op->pirq_guest_unbind,
> which is definitely in the unmap_pirq hypercall for MSI interrupts.
> This happens when dom0 first unbinds the guest pirq and then
> unmaps that pirq. I think this little patch will fix the
> problem, but I am not quite sure whether it breaks the
> intention of cs 18539.
> diff -r 7a32c2325fdc xen/arch/x86/irq.c
> --- a/xen/arch/x86/irq.c        Thu Sep 25 10:26:08 2008 +0100
> +++ b/xen/arch/x86/irq.c        Thu Sep 25 21:12:03 2008 +0800
> @@ -619,6 +619,8 @@
>      }
>      action = (irq_guest_action_t *)desc->action;
> +    if ( !action )
> +        goto out;
>      vector = desc - irq_desc;
>      for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ )
> Signed-off-by: Shan Haitao <haitao.shan@xxxxxxxxx>
> Best Regards
> Haitao Shan
> Keir Fraser wrote:
>> On 24/9/08 10:13, "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx> wrote:
>>>> How about this one? It doesn't exactly follow the path you
>>>> suggested, i.e. doesn't mess with event channels, but rather marks
>>>> the irq<->vector mapping as invalid, allowing only a subsequent
>>>> event channel unbind (i.e. close) to recover from that state (which
>>>> seems better in terms of requiring proper discipline in the guest,
>>>> as it prevents re-using the irq or vector without properly cleaning
>>>> up).
>>> Yeah, this is the kind of thing I had in mind. I will work on this a
>>> bit more (e.g., need to synchronise on d->evtchn_lock to avoid racing
>>> EVTCHNOP_bind_pirq; also I'm afraid about leaking MSI vectors on
>>> domain destruction). Thanks.
>> Changeset 18539. I made the locking quite a bit stricter, by getting
>> rid of 'irq_lock' and instead using 'evtchn_lock' (which may be
>> better renamed now to 'event_lock' since it isn't protecting just
>> event channels now). Now the pirq-to-vector mapping is protected by
>> both evtchn_lock and irq_desc->lock. A user of the mapping can
>> protect themselves with either lock (whichever is most convenient).
>> Some TODOs:
>>  * There is no management of MSI vectors. They always leak! I didn't
>>    fix that here since it isn't a mere synchronisation race; the code
>>    just isn't there.
>>  * I decided to WARN_ON(!spin_is_locked(&d->evtchn_lock)) in
>> pirq_guest_[un]bind(). The reason is that in any case those functions
>> do not expect to be re-entered -- they really want to be per-domain
>> serialised. Furthermore I am pretty certain that the HVM passthrough
>> code is not synchronising properly with changes to the pirq-to-vector
>> mapping (it uses domain_irq_to_vector() in many places with no care
>> for locking) nor is it synchronised with other users of
>> pirq_guest_bind() --- a reasonable semantics here would be that a
>> domain pirq can be bound to once, either via an event channel, or
>> through a virtual PIC in HVM emulation context. I therefore think
>> that careful locking is required -- it may suffice to get rid of (or
>> at least make less use of) the hvm_domain.irq_lock and replace its
>> use with evtchn_lock (only consideration is that the latter is not
>> IRQ-safe). The WARN_ON() is a nice reminder of work to be done here. ;-)
>> Comments?
>>  -- Keir
