On 09/27/2010 11:46 AM, Daniel Stodden wrote:
> On Mon, 2010-09-27 at 03:41 -0400, Andrew Jones wrote:
>> On 09/24/2010 08:50 PM, Jeremy Fitzhardinge wrote:
>>> On 09/24/2010 12:14 AM, Andrew Jones wrote:
>>>> On 09/23/2010 08:36 PM, Jeremy Fitzhardinge wrote:
>>>>> On 09/23/2010 09:38 AM, Paolo Bonzini wrote:
>>>>>> On 09/23/2010 06:23 PM, Jeremy Fitzhardinge wrote:
>>>>>>>> Any developments with this? I've got a report of the exact same
>>>>>>>> warnings on a RHEL6 guest. See
>>>>>>>>
>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=632802
>>>>>>>>
>>>>>>>> RHEL6 doesn't have the 'Move blkif_interrupt into a tasklet' patch, so
>>>>>>>> that can be ruled out. Unfortunately I don't have this reproducing on a
>>>>>>>> test machine, so it's difficult to debug. The report I have showed that
>>>>>>>> in at least one case it occurred on boot up, right after initting the
>>>>>>>> block device. I'm trying to get confirmation if that's always the case.
>>>>>>>>
>>>>>>>> Thanks in advance for any pointers you might have.
>>>>>>> Yes, I see it even after reverting that change as well. However I only
>>>>>>> see it on my domain with an XFS filesystem, but I haven't dug any deeper
>>>>>>> to see if that's relevant.
>>>>>>>
>>>>>>> Do you know when this appeared? Is it recent? What changes are in the
>>>>>>> rhel6 kernel in question?
>>>>>> It's got pretty much everything in stable-2.6.32.x, up to the 16-patch
>>>>>> blkfront series you posted last July. There are some RHEL-specific
>>>>>> workarounds for PV-on-HVM, but for PV domains everything matches
>>>>>> upstream.
>>>>> Have you tried bisecting to see when this particular problem appeared?
>>>>> It looks to me like something is accidentally re-enabling interrupts -
>>>>> perhaps a stack overrun is corrupting the "flags" argument between a
>>>>> spin_lock_irqsave()/restore pair.
>>>>>
>>>> Unfortunately I don't have a test machine where I can do a bisection
>>>> (yet). I'm looking for one. I only have this one report so far, and it's
>>>> on a production machine.
>>>
>>> The report says that it's repeatedly killing the machine though? In my
>>> testing, it seems to hit the warning once at boot, but is OK after that
>>> (not that I'm doing anything very stressful on the domain).
>>>
>>
>> It looks like the crash is from failing to read swap due to a bad page
>> map. It's possibly another issue, but I wanted to try and clean this
>> issue up first to see what happens.
>
> Uh oh. Sure this was a frontend crash? If you see it again, a stack
> trace to look at would be great.
>
Hi Daniel,
You can take a look at this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=632802
There are stack traces for the swap issue in the comments, and the dmesg
is attached here:
https://bugzilla.redhat.com/attachment.cgi?id=447789
Thanks,
Drew
> Thanks,
> Daniel
_______________________________________________
Xen-devel mailing list
http://lists.xensource.com/xen-devel