[Xen-ia64-devel] Any hint about a weird behavior about scheduler?

To: "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx>
Subject: [Xen-ia64-devel] Any hint about a weird behavior about scheduler?
From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
Date: Wed, 25 Jan 2006 11:36:36 +0800
Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Wed, 25 Jan 2006 03:45:19 +0000
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-ia64-devel-request@lists.xensource.com?subject=help>
List-id: Discussion of the ia64 port of Xen <xen-ia64-devel.lists.xensource.com>
List-post: <mailto:xen-ia64-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-ia64-devel>, <mailto:xen-ia64-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-ia64-devel>, <mailto:xen-ia64-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcYhYItPyvGm0f+vRga13WlrEENmgA==
Thread-topic: Any hint about a weird behavior about scheduler?

Hi, Keir,
        I'm seeing a strange phenomenon when running a VTI domain on
the latest xen-ia64-unstable.hg. It's definitely an IA64-specific bug,
but I hope you may have a hint about it. ;-)

        Sometimes when the VTI domain accesses MMIO, the flow goes into
do_block and then __enter_scheduler, where the vcpu is deleted from the
runqueue. Right before __enter_scheduler, everything verifies OK.
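
        The blocking path looks roughly like this (a simplified sketch
from memory, not the exact common/schedule.c source, so treat names and
details as approximate):

    /* Hedged sketch of the block path, simplified and from memory. */
    static long do_block(void)
    {
        struct vcpu *v = current;

        set_bit(_VCPUF_blocked, &v->vcpu_flags);

        /* Re-check for pending events to close the wakeup race. */
        if ( event_pending(v) )
            clear_bit(_VCPUF_blocked, &v->vcpu_flags);
        else
            __enter_scheduler();  /* dequeues v from the runqueue and
                                   * context-switches away            */
        return 0;
    }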

        Later, when the ioreq has been serviced and the VTI domain is
woken up, the flow resumes at the point just after __enter_scheduler in
do_block (Xen/IA64 doesn't reset the stack pointer, and the stack is
per-VP). However, a check at that point shows the VTI domain is still
off the runqueue, with its next pointer NULL. The stack, the current
pointer, and the other control registers all hold exactly the VTI
domain's content, except that schedule_data[cpu].curr points to dom0.
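
        The check at the resume point is roughly the following (the
helper name __task_on_runqueue is illustrative of what I test, not
necessarily the exact function):

    /* At the point in do_block() just after __enter_scheduler()
     * returns on the woken-up VTI vcpu: */
    int cpu = smp_processor_id();
    ASSERT(__task_on_runqueue(current));        /* fails: next == NULL */
    ASSERT(schedule_data[cpu].curr == current); /* fails: curr == dom0 */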

        The next schedule then throws an assertion, since the currently
running VTI domain is not on the runqueue. A domU runs fine, maybe
because more context is shared there and no block happens. The error
point is actually random, and I do observe many ioreqs emulated
successfully before it hits.

        The likely cause seems to be your recent change to shrink the
lock critical region in __enter_scheduler, where spin_unlock_irq was
moved up to just before context_switch. That is obviously a good
change, but it leaves a small window with interrupts enabled, which the
current IA64 code may not handle correctly. If I move spin_unlock_irq
back to after context_switch, everything works fine.
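
        In other words, the two orderings are roughly these (heavily
simplified; the real __enter_scheduler differs in detail):

    /* Current ordering -- unlock before the switch: */
    spin_lock_irq(&schedule_data[cpu].schedule_lock);
    /* ... dequeue prev, pick next, update schedule_data[cpu].curr ... */
    spin_unlock_irq(&schedule_data[cpu].schedule_lock);
    /* <-- interrupts enabled here: a wakeup (e.g. the ioreq
     *     completion) can arrive while the IA64 per-VP stack and the
     *     scheduler bookkeeping are still mid-transition */
    context_switch(prev, next);

    /* Ordering I tested -- unlock moved back after the switch: */
    spin_lock_irq(&schedule_data[cpu].schedule_lock);
    /* ... dequeue prev, pick next, update schedule_data[cpu].curr ... */
    context_switch(prev, next);
    spin_unlock_irq(&schedule_data[cpu].schedule_lock);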

        So, Keir, have you seen any similar phenomenon in your
experience? If so, what was the cause? The cause may be different on
IA64, but it would still be an invaluable hint to help track down the
bogus code on IA64.

P.S.
        - BVT is the default scheduler when this issue occurs.
        - The patch I sent out to disable interrupts in context_switch
was for another random issue; it doesn't fix this one.
        - I'll be on vacation for Chinese New Year from today until
Feb. 13, so I won't be checking mail to track this issue. Sorry in
advance if your helpful answer arrives without my follow-up. ;-) BTW,
if anyone else can reproduce it, it would be helpful to track it down.

Thanks,
Kevin

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel
