| 
 
Hi: 
  
     AFAIK, there is a KeSwapProcessOrStack thread in the Windows kernel that swaps thread
kernel stacks in and out, and it can cause a BSOD with code 0x77/0x7E, which means a paging I/O
request could not be completed successfully because of a disk failure. This is reproducible by
periodically attaching gdb to the tapdisk process in dom0 ("gdb attach tapdisk") to simulate a long
I/O response time, longer than 10s.
  
      In fact, the I/O stream from tapdisk is written to our own storage cluster, which supports
failover, but failover takes time, so during a failover the I/O hangs from the VM's point of view.
When this happens, we hit some bluescreens.
  
     I've also done some experiments, testing two scenarios:
     1) use the current XenVbd_HwScsiResetBus, which completes the I/O with SRB_STATUS_BUS_RESET
     2) do nothing in XenVbd_HwScsiResetBus
    In both cases I just used gdb on tapdisk to hold I/O periodically. The result is that 1) has a
higher probability of a blue screen than 2) (in fact, we haven't hit a bluescreen in 2) at all).
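
    To make the difference between the two scenarios concrete, here is a minimal sketch in plain C.
It is not the real driver code: the mock_srb type, the MOCK_* status values and both function names
are hypothetical stand-ins invented for illustration, and the real SRB definitions come from the WDK.
It only shows that 1) hands pending requests back with a reset status, while 2) leaves them pending.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the WDK definitions. */
enum mock_srb_status {
    MOCK_SRB_STATUS_PENDING = 0,
    MOCK_SRB_STATUS_BUS_RESET
};

struct mock_srb {
    enum mock_srb_status status;
    int completed; /* 1 once the request has been handed back to the caller */
};

/* Scenario 1: complete every still-pending request with a bus-reset status.
 * Returns the number of requests completed by this call. */
int reset_bus_complete_all(struct mock_srb *srbs, size_t n)
{
    int count = 0;

    assert(srbs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!srbs[i].completed) {
            srbs[i].status = MOCK_SRB_STATUS_BUS_RESET;
            srbs[i].completed = 1;
            count++;
        }
    }
    return count;
}

/* Scenario 2: do nothing, so pending requests stay pending until the
 * backend eventually answers them. Always returns 0. */
int reset_bus_do_nothing(struct mock_srb *srbs, size_t n)
{
    (void)srbs;
    (void)n;
    return 0;
}
```

    In this model, after 1) the caller sees the held requests come back with an error status, whereas
after 2) the same requests are still outstanding and can complete normally once tapdisk resumes.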
  
     From the log, I see XenVbd_HwScsiResetBus called every 14 seconds (10 seconds + 4 seconds of
hold time) in scenario 1), but in 2) I saw only a few calls (fewer than 10). It looks like the
driver's reset-bus routine is called only a few times in that case.
  
     So I have the following assumptions and questions:
     1) Only some of the I/O failures cause a BSOD.
     2) Doing nothing in XenVbd_HwScsiResetBus is relatively good for minimizing the possibility of a bluescreen.
     3) I am still confused about how XenVbd_HwScsiResetBus gets called, and why it is not
called when nothing is done in it.
     4) Is it OK to do nothing in XenVbd_HwScsiResetBus?
  
      Could you help clarify? Many thanks.
  
  
      
      		 	   		   
 |