I've been tracking down a bug where a multi-vcpu VM hangs in the
hvmloader on credit2, but not on credit1. It hangs while trying to
bring up extra cpus.
It turns out that an unintended quirk in credit2 (some might call it a
bug) causes a scheduling order which exposes a race in the vlapic
init_sipi tasklet handling code.
The code, as it stands right now, is meant to do this:
* v0 does an APIC ICR write with APIC_DM_STARTUP, trapping to Xen.
* vlapic code checks to see that v1 is down (vlapic.c:318); finds that
it is down, and schedules the tasklet, returning X86_EMUL_RETRY
(vlapic.c:270)
* Tasklet runs, brings up v1.
* v1 starts running.
* v0 re-executes the instruction, finds that v1 is up, and returns
X86_EMUL_OK, allowing the instruction to move forward.
* v1 does some diagnostics, and takes itself offline.
Unfortunately, the credit2 scheduler almost always preempts v0
immediately, allowing v1 to run to completion and bring itself back
offline again, before v0 can re-try the MMIO. It looks like this:
* v0 does APIC ICR APIC_DM_STARTUP write, trapping to Xen.
* vlapic code checks to see that v1 is down; finds that it is down,
schedules the tasklet, returns X86_EMUL_RETRY
* Tasklet runs, brings up v1
* credit2 preempts v0, allowing v1 to run
* v1 starts running
* v1 does some diagnostics, and takes itself offline.
* v0 re-executes the instruction, finds that v1 is down, and again
schedules the tasklet and returns X86_EMUL_RETRY.
* For some reason the tasklet doesn't actually bring up v1 again
(presumably because it hasn't had an APIC_DM_INIT again); so v0 is
stuck doing X86_EMUL_RETRY forever.
The problem is that VPF_down is used as the test to see if the tasklet
has finished its work; but there's no guarantee that the scheduler
will run v0 before v1 has come up and gone back down again.
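
To make the two orderings concrete, here is a small stand-alone toy
model of the race. It is only a sketch and not the vlapic code: every
name in it (toy_vcpu, toy_startup_write, toy_tasklet, and so on) is
made up for illustration, and the "awaiting_sipi" flag encodes my
guess above that the tasklet only acts on a vcpu that has had an
APIC_DM_INIT since it last ran.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for X86_EMUL_OKAY / X86_EMUL_RETRY. */
enum toy_rc { TOY_OKAY, TOY_RETRY };

struct toy_vcpu {
    bool down;          /* models VPF_down */
    bool awaiting_sipi; /* assumption: set by INIT, consumed by SIPI */
};

/* ICR write with STARTUP: completion is inferred from v1->down alone. */
static enum toy_rc toy_startup_write(const struct toy_vcpu *v1,
                                     bool *schedule_tasklet)
{
    *schedule_tasklet = v1->down;
    return v1->down ? TOY_RETRY : TOY_OKAY;
}

/* Tasklet: brings v1 up only if it is still waiting for a SIPI. */
static void toy_tasklet(struct toy_vcpu *v1)
{
    if (v1->down && v1->awaiting_sipi) {
        v1->down = false;
        v1->awaiting_sipi = false;
    }
}

/* v1 runs its diagnostics and takes itself offline again. */
static void toy_v1_runs_to_completion(struct toy_vcpu *v1)
{
    if (!v1->down)
        v1->down = true;
}

/*
 * Returns how many retries v0 needed before the write completed,
 * or -1 if it was still retrying after max_attempts.
 */
int toy_simulate(bool v1_runs_before_retry, int max_attempts)
{
    struct toy_vcpu v1 = { .down = true, .awaiting_sipi = true };

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        bool schedule_tasklet = false;

        if (toy_startup_write(&v1, &schedule_tasklet) == TOY_OKAY)
            return attempt;
        if (schedule_tasklet)
            toy_tasklet(&v1);
        if (v1_runs_before_retry)          /* the credit2 ordering */
            toy_v1_runs_to_completion(&v1);
    }
    return -1;
}
```

In this model toy_simulate(false, 10) completes after one retry,
while toy_simulate(true, 10) never completes, which matches the hang
described above.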
I discussed this with Tim, and we agreed that we should ask you.
One option would be to simply make vlapic_schedule_sipi_init_ipi()
always return X86_EMUL_OK, but we weren't sure if that might cause
some other problems.
The "right" solution, if synchronization is needed, is to have an
explicit signal sent back that the instruction can be allowed to
complete, rather than relying on reading VPF_down, which may cause
races.
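
As a rough illustration of what I mean by an explicit signal, here is
a stand-alone toy sketch. None of these names exist in Xen; the
ipi_pending / ipi_done fields are hypothetical, and a real version
would also need whatever locking or barriers the tasklet and the
emulation path require. The point is only that the retry tests "has
the tasklet finished?" instead of "is v1 down?".

```c
#include <stdbool.h>

/* Hypothetical stand-ins for X86_EMUL_OKAY / X86_EMUL_RETRY. */
enum sig_rc { SIG_OKAY, SIG_RETRY };

struct sig_state {
    bool target_down;   /* models VPF_down on v1 */
    bool target_armed;  /* assumption: v1 has had an INIT */
    bool ipi_pending;   /* tasklet scheduled but not yet run */
    bool ipi_done;      /* explicit "tasklet finished" signal */
};

/* ICR write: completes only once the tasklet has signalled it is done. */
static enum sig_rc sig_icr_write(struct sig_state *s)
{
    if (s->ipi_done) {
        s->ipi_done = false;   /* consume the signal */
        return SIG_OKAY;
    }
    s->ipi_pending = true;     /* schedule the tasklet (idempotent) */
    return SIG_RETRY;
}

/* Tasklet: does the work, then signals completion regardless of v1. */
static void sig_tasklet(struct sig_state *s)
{
    if (!s->ipi_pending)
        return;
    if (s->target_down && s->target_armed) {
        s->target_down = false;
        s->target_armed = false;
    }
    s->ipi_pending = false;
    s->ipi_done = true;
}

/* v1 runs its diagnostics and takes itself offline again. */
static void sig_target_runs_to_completion(struct sig_state *s)
{
    s->target_down = true;
}

/* Same harness shape as before: retries needed, or -1 on a hang. */
int sig_simulate(bool target_runs_before_retry, int max_attempts)
{
    struct sig_state s = { .target_down = true, .target_armed = true };

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (sig_icr_write(&s) == SIG_OKAY)
            return attempt;
        sig_tasklet(&s);
        if (target_runs_before_retry)
            sig_target_runs_to_completion(&s);
    }
    return -1;
}
```

With this shape, sig_simulate() completes after one retry in both
orderings, because what v1 does after the tasklet has run no longer
affects whether v0's instruction is allowed to complete.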
Thoughts?
-George