On 27/09/2011 19:13, Christopher S. Aker wrote:
> On 10/11/10 5:44 PM, Christopher S. Aker wrote:
>> In an effort to fix the problem described in my previous xen-devel post
>> ("New CPUS, now get: NETDEV WATCHDOG: eth0: transmit timed out"), we've
>> come across another problem. 3ware 9690SA cards do not behave under Xen
>> 4.1 (as of cs 22155).
>>
>> We have a simple Xen thrash test suite which fires up domUs that run
>> different workloads (some swap thrash, some kernel build, some spin
>> CPUs, some cycle rebooting, etc.). Almost immediately after launching the
>> suite, we can get the 3ware 9690SA card to fail with something like the
>> following:
>>
>> sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x28) timed out, resetting card.
>> sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x0) timed out, resetting card.
>> sd 0:0:0:0: rejecting I/O to offline device
>> sd 0:0:0:0: rejecting I/O to offline device
>>
>> Under a 2.6.32 dom0, it sometimes also triggers Xenwatch like so:
>>
>> http://theshore.net/~caker/xen/BUGS/9690SA/xenwatch.txt
>>
>> Results matrix:
>>
>> +---------------+---------------------+---------+--------+------+
>> | Xen           | Dom0                | 9550SXU | 9690SA | 9750 |
>> +---------------+---------------------+---------+--------+------+
>> | 3.4.1         | 2.6.18.8-931-2      | OK      | OK     | OK   |
>> | 3.4.4-rc1-pre | 2.6.18.8-931-2      | OK      | OK     | OK   |
>> | 3.4.4-rc1-pre | 2.6.32.23-g41a85de5 | OK      | OK     | OK   |
>> | 4.1 @ 22155   | 2.6.18.8-931-2      | OK      | FAIL   | OK   |
>> | 4.1 @ 22155   | 2.6.32.23-g41a85de5 | OK      | FAIL   | OK   |
>> +---------------+---------------------+---------+--------+------+
>>
>> The failures were verified on at least 2 machines of identical
>> specification.
>>
>> The same dom0 kernels that produce a stable 9690SA under Xen 3.4 bomb
>> under Xen 4.1.
> I'm back at this, and the problem still exists with a 4.1.1/3.0.4 stack.
>
> Konrad, in the "offline raid" thread, you asked for the following debug
> information:
>
> http://www.theshore.net/~caker/xen/BUGS/offline-raid/
>
> The sysrq-t.txt and triple-a-star.txt outputs were captured after I got the
> RAID card to hang up (but before it timed out and started spewing to the
> console).
>
> Oddly, lspci shows three devices assigned IRQ 16; however,
> /proc/interrupts only lists two of them. Is that a side effect of MSI?
>
> Also, the problem still happens even with MSI disabled (pci=nomsi).
>
> Thanks,
> -Chris
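One way to see exactly which handlers the dom0 kernel has registered on a given IRQ, for comparison with the devices lspci lists on IRQ 16, is to parse /proc/interrupts. This is a minimal sketch, assuming the 3.0-era line layout `<irq>: <per-CPU counts> <chip name> <handler>, <handler>, ...`; newer kernels add extra columns, and the function name is hypothetical.

```python
from typing import Dict, List


def handlers_by_irq(text: str) -> Dict[str, List[str]]:
    """Map each numeric IRQ in /proc/interrupts text to its handler names.

    Assumes lines shaped like "<irq>: <count> ... <chip-name> <name>, <name>"
    (3.0-era layout). The CPU header line and non-numeric summary rows
    (NMI, LOC, ERR, ...) are skipped.
    """
    result: Dict[str, List[str]] = {}
    for line in text.splitlines():
        # Split on the first colon only; handler names may contain colons.
        label, sep, rest = line.strip().partition(":")
        if not sep or not label.isdigit():
            continue
        tokens = rest.split()
        i = 0
        while i < len(tokens) and tokens[i].isdigit():
            i += 1  # skip the per-CPU interrupt counts
        i += 1      # skip the interrupt chip name (e.g. IO-APIC-fasteoi)
        names = " ".join(tokens[i:])
        result[label] = [n.strip() for n in names.split(",") if n.strip()]
    return result
```

On the dom0 itself, `handlers_by_irq(open("/proc/interrupts").read()).get("16", [])` lists what is registered on IRQ 16; a device that lspci places on IRQ 16 but that is missing from that list has likely been switched to MSI or has no handler bound.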
This is almost certainly the bug involving not ack'ing a migrating
line-level interrupt, which I fixed in c/s 23145:1092a143ef9d. Try applying
that patch, or just run from the tip of
http://xenbits.xen.org/hg/xen-4.1-testing.hg/
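To illustrate why a missed ack is fatal for a line-level interrupt in particular, here is a deliberately simplified toy model. It is not Xen's code, and the class and function names are hypothetical. For a level-triggered IO-APIC pin, the Remote IRR bit is set when the interrupt is delivered and cleared by the EOI, and no further interrupt is sent for that pin in between. If the path that moves the IRQ to another CPU skips the ack, the pin stays blocked.

```python
class LevelTriggeredLine:
    """Toy model of one level-triggered IO-APIC pin (not Xen's implementation)."""

    def __init__(self) -> None:
        self.remote_irr = False  # set on delivery, cleared by EOI
        self.delivered = 0

    def raise_line(self) -> bool:
        """Device asserts the line; deliver only if the previous one was EOI'd."""
        if self.remote_irr:
            return False  # blocked: previous interrupt was never acknowledged
        self.remote_irr = True
        self.delivered += 1
        return True

    def eoi(self) -> None:
        self.remote_irr = False


def handle(line: LevelTriggeredLine, migrating: bool, ack_on_migrate: bool) -> None:
    """Service one interrupt; the buggy variant skips the EOI while migrating."""
    if migrating and not ack_on_migrate:
        return  # hypothetical bug: affinity moved, EOI forgotten
    line.eoi()


def run(ack_on_migrate: bool, rounds: int = 5) -> int:
    """Raise the line `rounds` times and count how many interrupts got through."""
    line = LevelTriggeredLine()
    for n in range(rounds):
        if line.raise_line():
            # Pretend the IRQ's affinity changes while handling the second one.
            handle(line, migrating=(n == 1), ack_on_migrate=ack_on_migrate)
    return line.delivered
```

In this model `run(ack_on_migrate=True)` delivers all 5 interrupts, while `run(ack_on_migrate=False)` stops at 2: everything after the interrupt that migrated is silently dropped, which from the driver's side looks much like the commands timing out above.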
~Andrew
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel