[Xen-devel] E1000 hanging
I analyzed why the e1000 Ethernet driver hangs the system during
reboot. I found that the system works properly when "ms_delay" is
replaced by "ms_delay_irq". Looking into it, I noticed that the
process goes to sleep with the first function, while it busy-loops
for approximately the requested time with the second.
Further analysis showed that the system schedules "xen_idle", which in
turn determines the "next timer".
I inserted quite a lot of debugging code both in the kernel and within
the Xen hypervisor. I found that the timer which is actually specified
is essentially in the infinite future, so the sleep call never wakes up
on its own. It does wake up immediately at any keypress, and from what
I saw in the scheduling code of the hypervisor (the DF_BLOCKED flag
seems to be cleared on receipt of an interrupt), this seems to be the
case for any interrupt. I managed to "fix" the problem by adding a timer
a tenth of a second in the future within "do_block" in the hypervisor.
The problem seems to be that the timer has already expired between the
time the system decides to schedule xen_idle and the moment it
determines the "next timer".
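
To make the suspected race concrete, here is a minimal user-space sketch in C. It is not Xen or kernel code: the names next_deadline_ms, capped_deadline_ms, NO_TIMEOUT and BLOCK_CAP_MS are made up for illustration, and the assumption that the lookup skips timers that have already expired is only the hypothesis described above, not something confirmed from the sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_TIMEOUT   UINT64_MAX   /* stands for "the infinite future" */
#define BLOCK_CAP_MS 100u         /* a tenth of a second, as in the workaround */

/*
 * Hypothetical lookup: return the earliest timer strictly after `now`.
 * ASSUMPTION: a timer that expired just before the lookup is skipped,
 * so with no other timer pending the result is NO_TIMEOUT.
 */
uint64_t next_deadline_ms(const uint64_t *timers, size_t n, uint64_t now)
{
    uint64_t best = NO_TIMEOUT;

    assert(timers != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (timers[i] > now && timers[i] < best)
            best = timers[i];
    }
    return best;
}

/*
 * Model of the workaround: never block for longer than BLOCK_CAP_MS,
 * even if the lookup found no usable timer.
 */
uint64_t capped_deadline_ms(const uint64_t *timers, size_t n, uint64_t now)
{
    uint64_t deadline = next_deadline_ms(timers, n, now);
    uint64_t cap = now + BLOCK_CAP_MS;

    return deadline < cap ? deadline : cap;
}
```

With a single timer at t = 100 ms, a lookup at t = 99 returns 100, but a lookup at t = 101 returns NO_TIMEOUT, which corresponds to blocking until some unrelated interrupt arrives. The capped variant returns 201 in that case, so the sleeper is woken at most a tenth of a second late.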
Admittedly, I did all of this with Xen 2.0.5. I then used the 2.0.6
hypervisor without recompiling the kernel and it showed the same
behavior. I tried xen-unstable yesterday, but the kernel failed to
initialize; the last message I got said that it was initializing the
SATA disks. It hung and showed no devices.
I will try 2.0.6 next week with a completely recompiled kernel. Looking
into the routines "xen_idle", "set_timeout_timer" and
"next_timer_interrupt", I found no changes at first sight, so I do not
expect the behavior to change.
In the unstable version, the check for local_softirq pending seems to
be a candidate to fix the problem, because the system seems to be woken
up at the next clock tick. And the check for pending events in
do_block AFTER setting the (now-called) _VCPUF_BLOCKED flag, clearing
it again if necessary, seems to do the job.
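
The ordering described above (set the blocked flag first, then re-check for pending events and back out if one is found) can be sketched as follows. This is a simplified model, not the xen-unstable source: struct vcpu_model and do_block_model are hypothetical names, and the two booleans merely stand in for the _VCPUF_BLOCKED flag and for "an event is pending".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-VCPU state; not the real Xen structure. */
struct vcpu_model {
    bool blocked;        /* models the _VCPUF_BLOCKED flag */
    bool event_pending;  /* models "an event arrived for this VCPU" */
};

/*
 * Model of the block path: mark the VCPU as blocked FIRST, then look
 * for pending events. An event that arrived before the flag was set
 * is caught by the re-check, and the flag is cleared again.
 * Returns true if the VCPU really goes to sleep.
 */
bool do_block_model(struct vcpu_model *v)
{
    assert(v != NULL);

    v->blocked = true;

    if (v->event_pending) {
        v->blocked = false;  /* back out: do not sleep on a pending event */
        return false;
    }
    return true;
}
```

If the check came before setting the flag, an event arriving in between would be missed and the VCPU would sleep on it. In real code the two steps would also need atomic operations or memory barriers, which this single-threaded sketch leaves out.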
Did this problem not show up at other "short-term" waits than in the
e1000 driver?
It occurred there within the e1000_hw_reset routine.
Is it good advice to try the "unstable" version?
Thanks in advance
Peter Bier
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel