actually attach the logs :)
On Wed, 16 Mar 2011, Stefano Stabellini wrote:
> On Fri, 11 Mar 2011, Konrad Rzeszutek Wilk wrote:
> > On Fri, Mar 11, 2011 at 01:17:23PM +0000, Stefano Stabellini wrote:
> > > Hello,
> > > recently we had a couple of long discussions with Yinghai about boot
> > > crashes on xen, related to pagetable initialization.
> > > As a result we came up with three patches, two of them fix the first [1]
> > > boot crash and provide a nice cleanup on native:
> >
> > I don't know why this is happening now, but it could very well be
> > related to the build config. Smaller builds don't seem to encounter this,
> > while this is a distro-type build. If I use:
> >
> > > Stefano Stabellini (1):
> > > xen: set max_pfn_mapped to the last pfn mapped
> >
> > it hangs during bootup. The machine hangs completely (no keyboard
> > interaction) and I can see this in the bootup.
>
> Konrad sent me a few other logs offline: log1 is the log of the hang and
> log2 is a successful boot (reverting the problematic patch).
> It looks like the SP5100 TCO WatchDog Timer Driver is using ioremap on
> an address (0xb8fe00) that belongs to the memory range used for the
> pagetable (0x9fc000-0xf43fff).
> In the successful case max_pfn_mapped is higher, so the pagetable is
> located at a higher address (0x16dfb000-0x17342fff) and the problem
> doesn't occur.
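
The overlap described above can be checked with a few lines of arithmetic.
This is only a sanity-check sketch using the addresses quoted from the two
logs; the helper names are made up for illustration and PAGE_SHIFT = 12
assumes 4 KiB pages.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def in_range(addr, start, end):
    """Return True if addr lies in the inclusive range [start, end]."""
    return start <= addr <= end


def pfn(addr):
    """Page frame number that contains addr."""
    return addr >> PAGE_SHIFT


# Addresses as quoted in the mail.
IOREMAP_ADDR = 0xb8fe00                   # address ioremap'ed by the watchdog driver
PGT_HANG = (0x9fc000, 0xf43fff)           # pagetable range in log1 (hang)
PGT_OK = (0x16dfb000, 0x17342fff)         # pagetable range in log2 (successful boot)

hang_overlap = in_range(IOREMAP_ADDR, *PGT_HANG)
ok_overlap = in_range(IOREMAP_ADDR, *PGT_OK)

print("overlaps pagetable in log1:", hang_overlap)
print("overlaps pagetable in log2:", ok_overlap)
print("pfn of ioremap address:", hex(pfn(IOREMAP_ADDR)))
```

With these numbers the ioremap address falls inside the log1 pagetable range
and outside the log2 one, which matches the hang/no-hang behaviour.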
>
> I still have a few unanswered questions on this issue. If we assume that
> the ioremap address is the same in the two cases (0xb8fe00), how is it
> possible that in the first case it is RAM (page_is_ram returns true)
> while in the second case it is not (otherwise we would still get a
> warning from ioremap)? page_is_ram shouldn't be affected by the position
> of the kernel pagetable, and the e820 is still the same.
> In any case if 0xb8fe00 is really an MMIO address memblock_find_in_range
> shouldn't have returned the range (0x9fc000-0xf43fff) in
> find_early_table_space.
> I think that lowering the value of max_pfn_mapped is likely to cause
> bugs like this one, where a low memory range is not properly marked as
> reserved and gets mistakenly used for the pagetable.
>
> Considering that Linux 2.6.38 was released with this bug in the meantime,
> I think it is better if we change approach and fix the regression in a
> more straightforward way, for example:
>
> - 2M align _end;
> - do not clean initial mapping between _brk_end to _end;
> - resurrect the patch "respect memblock reserved regions when
> destroying mappings", trying to minimize the number of memblock reserved
> checks.
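
For reference, the "2M align _end" step is a plain round-up to the next
2 MiB boundary. A minimal sketch of the arithmetic follows; the sample
address is hypothetical and not taken from either log, and the names are
illustrative rather than kernel symbols.

```python
ALIGN_2M = 2 << 20  # 2 MiB, the size covered by one large-page mapping on x86-64


def align_up(addr, align=ALIGN_2M):
    """Round addr up to the next multiple of align (align must be a power of two)."""
    return (addr + align - 1) & ~(align - 1)


example_end = 0x1a3f000            # hypothetical value standing in for _end
aligned_end = align_up(example_end)

print(hex(example_end), "->", hex(aligned_end))
```

An address that is already 2 MiB aligned is returned unchanged.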
>
> Opinions?
>
>
>
> Regarding the other commit "x86-64, mm: Put early page table high" that
> causes a reliable crash on Xen: I noticed that Ingo sent a pull request
> to Linus with this commit included.
> At this point, can I send the patch that fixes the Xen issue to Linus
> directly, without rebasing it on tip?
>
log1
Description: Text document
log2
Description: Text document