This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


[Xen-devel] RE: PV resume failed after self migration failed

> Subject: RE: PV resume failed after self migration failed
> Date: Mon, 20 Jun 2011 09:11:59 +1000
> From: james.harper@xxxxxxxxxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
> > >
> > > Windows will invoke a SCSI reset if a request takes too long to complete
> > > (5 seconds I think). It will also issue a reset when a crash dump
> > > starts, just to make sure all previous requests are flushed etc.
> > >
> > Thanks for the help, and sorry for the late response; I was away for a
> > while last weekend.
> >
> > If the VBD is already suspended, any further I/O the driver tries to issue
> > will find that the VBD state is not SR_STATE_RUNNING, and the driver
> > therefore calls ScsiPortNotification to report RequestComplete, right?
> >
> > If so, I have a hypothesis.
> > At time t the VBD is suspended and an I/O is about to be issued, but before
> > the driver calls ScsiPortNotification the whole VM is paused (VCPUs paused,
> > the last step of the sequence). When the VM resumes 10 or more seconds
> > later, will the driver find that this I/O has already timed out and
> > trigger XenVbd_HwScsiResetBus?
> >
> The xenvbd driver doesn't do any timeout; Windows does the timeout and
> tells xenvbd to reset. I haven't tested the scenario you describe very
> recently, and xenvbd is now two different drivers, one for scsiport
> (<= 2003) and one for storport (>= Vista), so there could be bugs in either.
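The completion path described in the quoted question can be modelled in user space as below. This is a minimal sketch of the poster's description, not the xenvbd source, and James does not confirm it in his reply: it assumes the driver checks the suspend state before queuing a request and, if the device is not running, completes the request at once. Apart from SR_STATE_RUNNING, every name here (the other states, fake_srb_t, fake_start_io, FAKE_SRB_STATUS_BUSY) is hypothetical, and the SRB status the real driver reports is not stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Only SR_STATE_RUNNING appears in the thread; the other states and all
   fake_* names are hypothetical stand-ins for the real driver types. */
typedef enum { SR_STATE_RUNNING, SR_STATE_SUSPENDING, SR_STATE_SUSPENDED } sr_state_t;

enum { FAKE_SRB_STATUS_PENDING = 0, FAKE_SRB_STATUS_BUSY = 1 };

typedef struct {
  int srb_status;   /* status reported back to the port driver */
  bool completed;   /* true once "RequestComplete" has been reported */
} fake_srb_t;

/* Returns true if the request would be put on the ring, false if it was
   completed immediately because the device is not running. */
static bool fake_start_io(sr_state_t state, fake_srb_t *srb)
{
  assert(srb != NULL);
  if (state != SR_STATE_RUNNING) {
    /* The real driver would call ScsiPortNotification(RequestComplete, ...)
       here. A VM pause between the state check above and this point is
       the window the question is about. */
    srb->srb_status = FAKE_SRB_STATUS_BUSY;
    srb->completed = true;
    return false;
  }
  srb->srb_status = FAKE_SRB_STATUS_PENDING;
  srb->completed = false;
  return true;
}
```

In this model a request issued while suspended never touches the ring, so it cannot sit in a shadow entry waiting for Windows to time it out; the only exposure is the short window between the check and the completion call.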
The bug can be reproduced on a 32-bit Windows Server 2003 system, using the scsiport driver.
I added some logging to XenVbd_HwScsiResetBus to report any uncompleted SRBs (shown below),
but the log line never appeared when XenVbd_HwScsiResetBus was called, so no I/O was queued.
 /* in XenVbd_HwScsiResetBus: log every SRB still held in a shadow entry */
 for (i = 0; i < MAX_SHADOW_ENTRIES; i++) {
   if (xvdd->shadows[i].srb)
     KdPrint((__DRIVER_NAME "    in-flight srb %p with status SRB_STATUS_BUS_RESET\n", xvdd->shadows[i].srb));
 }
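The same check can be run outside the driver as a small self-contained sketch that counts, as well as logs, the shadow entries still holding an SRB. The shadow_entry_t type and the MAX_SHADOW_ENTRIES value of 64 are assumptions for illustration; the real shadow structure carries more fields and its size is driver-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_SHADOW_ENTRIES 64  /* assumed size; the real value is driver-specific */

/* Hypothetical, reduced shadow entry: only the SRB pointer matters here. */
typedef struct { void *srb; } shadow_entry_t;

/* Count (and log) shadow entries that still hold an SRB. A non-zero result
   at bus-reset time would mean I/O was outstanding when Windows reset. */
static size_t count_in_flight(const shadow_entry_t *shadows, size_t n)
{
  size_t in_flight = 0;
  assert(shadows != NULL);
  for (size_t i = 0; i < n; i++) {
    if (shadows[i].srb != NULL) {
      printf("in-flight srb %p\n", shadows[i].srb);
      in_flight++;
    }
  }
  return in_flight;
}
```

Returning a count rather than only printing makes the "no I/O was queued" conclusion something a caller can assert on directly.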
Right now I don't think it is related to the bus reset. From the log, it looks like an event is not being acknowledged.
The log shows that the PV resume is waiting for xppdd->device_state.suspend_resume_state_fdo to change, but it never does.
That is, the call chain is:
XenPci_Pdo_Resume -> XenPci_Pdo_ChangeSuspendState(device, SR_STATE_RESUMING)
  -> KeWaitForSingleObject(&xpdd->pdo_suspend_event, Executive, KernelMode, FALSE, NULL);
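The handshake behind this wait can be reduced to a single-threaded model: the XenPCI side writes the requested state and waits, and the waiter is released only if the xenvbd side copies that state into suspend_resume_state_fdo and signals the event. This is a sketch under stated assumptions: suspend_resume_state_fdo and pdo_suspend_event are the only names taken from the thread, the fake_* types and functions and the suspend_resume_state_pdo field are hypothetical, and the interrupt side runs inline here rather than asynchronously as it does in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SR_STATE_RUNNING, SR_STATE_SUSPENDING, SR_STATE_RESUMING } sr_state_t;

/* Hypothetical reduction of the shared device_state; only
   suspend_resume_state_fdo is named in the thread. */
typedef struct {
  sr_state_t suspend_resume_state_pdo;  /* requested by XenPCI */
  sr_state_t suspend_resume_state_fdo;  /* acknowledged by xenvbd */
  bool event_signalled;                 /* stands in for pdo_suspend_event */
} fake_device_state_t;

/* xenvbd interrupt side: copy the requested state across and signal the
   waiter, but only if the event-channel ack succeeded. */
static void fake_fdo_interrupt(fake_device_state_t *ds, bool ack_ok)
{
  if (!ack_ok)
    return;  /* "interrupt was not for us": nothing is updated */
  ds->suspend_resume_state_fdo = ds->suspend_resume_state_pdo;
  ds->event_signalled = true;
}

/* XenPCI side: request new_state and report whether it was acknowledged.
   The real code waits with a NULL (infinite) timeout, so a false result
   here corresponds to a resume that hangs forever. */
static bool fake_change_suspend_state(fake_device_state_t *ds, sr_state_t new_state, bool ack_ok)
{
  assert(ds != NULL);
  ds->suspend_resume_state_pdo = new_state;
  ds->event_signalled = false;
  fake_fdo_interrupt(ds, ack_ok);  /* asynchronous in the driver; inline here */
  return ds->event_signalled && ds->suspend_resume_state_fdo == new_state;
}
```

Because KeWaitForSingleObject is given a NULL timeout, there is no fallback: a single missed acknowledgement on the xenvbd side is enough to stall the whole resume.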
The state change is expected to happen in XenVbd_HwScsiInterrupt, but for some reason the check at
xenvbd_scsiport.c:920 in XenVbd_HwScsiInterrupt makes the function return FALSE:
  /* in dump mode I think we get called on a timer, not by an actual IRQ */
  if (!dump_mode && !xvdd->vectors.EvtChn_AckEvent(xvdd->vectors.context, xvdd->event_channel, &last_interrupt))
    return FALSE; /* interrupt was not for us */
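The early return above depends on what EvtChn_AckEvent reports. Its internals are not shown in the thread, so the following is only an assumed model: a per-port "seen by the ISR but not yet acknowledged" bit that the ack performs a test-and-clear on, returning false when the bit for the requested port is not set. The fake_* names, the 32-bit word size, and the bitmap size are hypothetical, and unlike the driver this sketch is not atomic (real code would need an interlocked operation).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EVT_WORDS 4        /* assumed bitmap size for the sketch */
#define BITS_PER_WORD 32   /* assumes 32-bit words, as on a 32-bit guest */

/* Hypothetical record of events the ISR has seen but no handler has
   acknowledged yet. */
static uint32_t fake_pending[EVT_WORDS];

/* What the ISR would do for an event it routes to a driver. */
static void fake_mark_pending(unsigned port)
{
  assert(port < EVT_WORDS * BITS_PER_WORD);
  fake_pending[port / BITS_PER_WORD] |= UINT32_C(1) << (port % BITS_PER_WORD);
}

/* Test-and-clear: true (and the bit cleared) only if this port had an
   unacknowledged event, i.e. "the interrupt was for us". */
static bool fake_ack_event(unsigned port)
{
  uint32_t mask;
  assert(port < EVT_WORDS * BITS_PER_WORD);
  mask = UINT32_C(1) << (port % BITS_PER_WORD);
  if (!(fake_pending[port / BITS_PER_WORD] & mask))
    return false;
  fake_pending[port / BITS_PER_WORD] &= ~mask;
  return true;
}
```

Under this model, an ack on a port other than the one the ISR marked returns false and leaves the marked bit set, which would then show up as an unacknowledged event.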
Because the event is never acknowledged, EvtChn_EvtInterruptIsr prints a log line such as "Unacknowledged event word = 0, val = 00000200":
12952670574140: XenPCI --> XenPci_BalloonEnableHandler
12952670574140: XenPCI     Unacknowledged event word = 0, val = 00000200 
12952670574140: XenPCI  receive balloon enable = (1308226300.21:0)
12952670574156: XenPCI     Balloon enable change to 0
12952670574156: XenPCI  successfull got BalloonEnableChangedEvent
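Assuming "val" in that log line is the raw 32-bit pending-bit word for "word" (this is an assumption about the log format, not confirmed in the thread), each set bit maps to the event channel port word * 32 + bit. For the line above, 0x200 is bit 9 of word 0, i.e. port 9, which could then be compared with the xvdd->event_channel that XenVbd_HwScsiInterrupt acknowledges. The helper name ports_from_word is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 32  /* assumes 32-bit bitmap words (32-bit guest) */

/* Convert one bitmap word from a log line like
   "Unacknowledged event word = 0, val = 00000200" into port numbers.
   Writes at most max ports into out and returns how many were written. */
static size_t ports_from_word(unsigned word, uint32_t val, unsigned *out, size_t max)
{
  size_t n = 0;
  assert(out != NULL || max == 0);
  for (unsigned bit = 0; bit < BITS_PER_WORD && n < max; bit++) {
    if (val & (UINT32_C(1) << bit))
      out[n++] = word * BITS_PER_WORD + bit;
  }
  return n;
}
```

For example, ports_from_word(0, 0x200, ports, 32) returns 1 and sets ports[0] to 9.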
I will take a closer look at EvtChn_EvtInterruptIsr to understand this better. Thanks.

> James
Xen-devel mailing list