I need to think about this more, but it looks like you have an L2 page
that has a type count of 1 but hasn't been validated. You then loop
when you try to increment the count to 2, because the code thinks
you're racing someone else who is still validating the page.
Does this happen if you boot with 'nosmp'? I don't really believe it's a
race, but it might be worth checking.
Also, it's worth adding a printk to this loop, just to check that this
is where you're getting caught:
    /* Someone else is updating validation of this page. Wait... */
    while ( (y = page->u.inuse.type_info) == x )
        cpu_relax();
    goto again;
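To make the suspected failure mode concrete, here is a minimal single-threaded sketch of the decision the loop above is making. It is not the code from mm.c: the function name, the enum, and the MODEL_* bit masks are hypothetical, and the bit layout is only an assumption chosen to be consistent with the values in the trace (0x40000000 as the L2 type, low bits as the type count). The real code uses cmpxchg and also checks for type mismatches, which this sketch leaves out.

```c
#include <stdint.h>

/* Assumed bit layout (hypothetical, inferred from the trace values). */
#define MODEL_PGT_L2_PAGE_TABLE 0x40000000u /* page type: L2 page table  */
#define MODEL_PGT_VALIDATED     0x10000000u /* set once validation is done */
#define MODEL_PGT_COUNT_MASK    0x0001ffffu /* low bits hold the type count */

typedef enum {
    TYPE_REF_TAKEN,            /* count incremented on a validated page   */
    TYPE_REF_NEEDS_VALIDATION, /* first reference: caller must validate   */
    TYPE_REF_WOULD_SPIN        /* count != 0 but not validated: wait loop */
} type_ref_result;

/*
 * One pass of a simplified get_page_type-style decision.
 * Returns what the real loop would do next instead of actually spinning.
 */
type_ref_result model_get_page_type_step(uint32_t *type_info, uint32_t type)
{
    uint32_t x = *type_info;

    if ((x & MODEL_PGT_COUNT_MASK) == 0) {
        /* First type reference: record the type with count 1, unvalidated. */
        *type_info = type | 1u;
        return TYPE_REF_NEEDS_VALIDATION;
    }

    if (!(x & MODEL_PGT_VALIDATED)) {
        /*
         * Non-zero count without the validated bit. The real code assumes
         * another CPU is mid-validation and waits for type_info to change.
         * If nobody ever changes it, that wait never ends.
         */
        return TYPE_REF_WOULD_SPIN;
    }

    /* Validated page with existing references: just take another one. */
    *type_info = x + 1u;
    return TYPE_REF_TAKEN;
}
```

Feeding in the value seen in gdb (type_info = 0x40000001) lands in the TYPE_REF_WOULD_SPIN branch under these assumptions, which matches x and y staying equal at 0x40000001 while nx is 0x40000002.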
We need to figure out how the type count managed to get to one without
the page being validated. I presume you're doing a debug=y build of Xen?
Do you get any warnings about illegal mmu_update attempts when you boot
FreeBSD?
Ian
> Without the ability to continue and only a very basic
> understanding of the page typing code there is not a whole
> lot to go on. Let me know if there is some other bit of
> information that I can provide you with.
>
> -Kip
>
> Before attaching:
> (XEN) 'd' pressed -> dumping registers
> (XEN) CPU: 1
> (XEN) EIP: 0808:[<fc52d59f>]
> (XEN) EFLAGS: 00000246 CONTEXT: hypervisor
> (XEN) eax: 40000001 ebx: 00000000 ecx: fcfe3740 edx: fcfe3740
> (XEN) esi: 00007ff0 edi: 00000001 ebp: fcffbda0 esp: fcffbd58
> (XEN) ds: 0810 es: 0810 fs: 0810 gs: 0810 ss: 0810 cs: 0808
> (XEN) Stack trace from ESP=fcffbd58:
> (XEN) 80000003 00000001 fcfe3740 fcfe3740 fcfe3740 80000003
> 80000004 80000003
> (XEN) 00000000 00007ff0 fcffbda0 [fc52bfec] fd494968 fcfe3740
> fcffbdc0 40000001
> (XEN) 40000001 40000002 fcffbdd0 [fc52c07b] fd494968 25fe0000
> 00000000 00000000
> (XEN) 000003d1 00000000 fcffbde0 [fc52bcec] 00000000 fd494968
> fcffbe00 [fc52c52e]
> (XEN) 0000630f 25fe0000 fcfe3740 [fc52d100] fffffffc 00000000
> fcffe000 00000001
> (XEN) 00000001 ff85b000 fcffbe40 [fc52c889] 0630f061 0000630f
> fcfe3740 000002ff
> (XEN) 00000001 f0000000 f0000000 00000004 f0000001 f0000000
> 000002ff ff85b000
> (XEN) 0000630f fcfe3740 fcffbe60 [fc52d0f0] fd494968 000001fa
> fc5b20c0 [fc53185d]
> (XEN) 40000000 00000002 fcffbeb0 [fc52d771] fd494968 40000000
> fcfe3740 fcfe3740
> (XEN) fcfe3740 80000002 80000003 00000004 00000000 f0000000
> f0000000 00000004
> (XEN) 40000001 f0000000 fd49497c f0000000 f0000000 40000001
> fcffbee0 [fc52c07b]
> (XEN) fd494968 40000000 002ed518 00000000 a089075b 00000001
> fcfe3740 00000000
> (XEN) 00007ff0 fd494968 fcffbfb0 [fc52df98] 0000630f 40000000
> fcfe3740 00000292
> (XEN) fc5781c0 00000001 0019b901 00000000 00804e95 00000000
> a089075b 000000a1
> (XEN) a10955f0 000000a1 00000001 fcfea040 00007ff0 00000001
> fcffbf80 00000000
> (XEN) fcfe3740 00000000 fcfe3740 00000000 a10955f0 000000a1
> 00000000 fcffbf98
> (XEN) c0293bac 0000000c 00000003 [fc515bfc] a08902cd 000000a1
> 00000002 fcfe3740
> (XEN) fcfea040 fd494968 00000000 40000000 00000001 00000001
> 00000000 00000000
> (XEN) 00000001 0000630f c018a19b 00000001 fcfea040 00007ff0
> c0293bc8 [fc54e923]
> (XEN) c0293bac 00000001 00000000 00007ff0 00000001 c0293bc8
> 0000001a 00000000
> (XEN) Call Trace from ESP=fcffbd58:
> (XEN) [<fc52bfec>] [<fc52c07b>] [<fc52bcec>] [<fc52c52e>]
> [<fc52d100>] [<fc52c889>]
> (XEN) [<fc52d0f0>] [<fc53185d>] [<fc52d771>] [<fc52c07b>]
> [<fc52df98>] [<fc515bfc>]
> (XEN) [<fc54e923>]
> (XEN) Waiting for GDB to attach to XenDBG
>
>
> (gdb) bt
> #0 0xfc52d59f in get_page_type (page=0xfd494968,
> type=0x25fe0000) at mm.c:1235
> #1 0xfc52c07b in get_page_and_type_from_pagenr
> (page_nr=0x630f, type=0x25fe0000, d=0xfcfe3740) at mm.c:360
> #2 0xfc52c52e in get_page_from_l2e (l2e={l2_lo = 0x630f061},
> pfn=0x630f, d=0xfcfe3740, va_idx=0x2ff) at mm.c:495
> #3 0xfc52c889 in alloc_l2_table (page=0xfd494968) at mm.c:679
> #4 0xfc52d0f0 in alloc_page_type (page=0xfd494968,
> type=0x40000000) at mm.c:1083
> #5 0xfc52d771 in get_page_type (page=0xfd494968,
> type=0x40000000) at mm.c:1269
> #6 0xfc52c07b in get_page_and_type_from_pagenr
> (page_nr=0x630f, type=0x40000000, d=0xfcfe3740) at mm.c:360
> #7 0xfc52df98 in do_mmuext_op (uops=0xc0293bac, count=0x1, pdone=0x0,
> foreigndom=0x7ff0) at mm.c:1499
> #8 0xfc54e923 in test_all_events () at bitops.h:239
> #9 0xc0293bac in ?? ()
>
> (gdb) f 7
> #7 0xfc52df98 in do_mmuext_op (uops=0xc0293bac, count=0x1, pdone=0x0,
> foreigndom=0x7ff0) at mm.c:1499
> 1499 okay = get_page_and_type_from_pagenr(op.mfn, type,
> FOREIGNDOM);
> (gdb) p op
> $9 = {
> cmd = 0x1,
> {
> mfn = 0x630f,
> linear_addr = 0x630f
> },
> {
> nr_ents = 0xc018a19b,
> cpuset = 0xc018a19b
> }
> }
> (gdb) p x
> $1 = 0x40000001
> (gdb) x nx
> 0x40000002: Ignoring packet error, continuing...
> Reply contains invalid hex digit 40
> (gdb) p y
> $2 = 0x40000001
> (gdb) p page->u.inuse.type_info
> $3 = 0x40000001
> (gdb) p x
> $4 = 0x40000001
> (gdb) p nx
> $5 = 0x40000002
> (gdb) p y
> $6 = 0x40000001
> (gdb) p x
> $7 = 0x40000001
> (gdb) p sizeof(page->u.inuse.type_info)
> $8 = 0x4
>
>
>
> On 4/15/05, Ian Pratt <m+Ian.Pratt@xxxxxxxxxxxx> wrote:
> > Wild! It really is looping in get_page_type.
> >
> > Any chance you could use the serial debugger to find out what x, nx
> > and y are in the cmpxchg?
> >
> > I've tried to think of duff inputs that could cause it to loop, but
> > I'm not smart enough.
> >
> > Ian
> >
> > > -----Original Message-----
> > > From: Kip Macy [mailto:kip.macy@xxxxxxxxx]
> > > Sent: 15 April 2005 18:13
> > > To: Ian Pratt
> > > Cc: Keir Fraser; xen-devel; ian.pratt@xxxxxxxxxxxx
> > > Subject: Re: [Xen-devel] xm pause causing lockup
> > >
> > > Great, thanks. I'm now running a completely fresh tree from last
> > > night.
> > >
> > > > Over the course of several minutes I hit 'd' a number of times.
> > > > The addresses I got were:
> > >
> > > 0xfc51c742
> > > 0xfc51c746
> > > 0xfc51c74b
> > > 0xfc51c740
> > >
> > > (gdb) x/i 0xfc51c742
> > > 0xfc51c742 <get_page_type+1218>: mov 0x40(%esp,1),%eax
> > > (gdb) x/i 0xfc51c746
> > > 0xfc51c746 <get_page_type+1222>: mov 0x14(%eax),%ebx
> > > (gdb) x/i 0xfc51c74b
> > > > 0xfc51c74b <get_page_type+1227>: je 0xfc51c740 <get_page_type+1216>
> > > (gdb) x/i 0xfc51c740
> > > 0xfc51c740 <get_page_type+1216>: repz nop
> > >
> > >
> > > -Kip
> > >
> > > On 4/14/05, Ian Pratt <m+Ian.Pratt@xxxxxxxxxxxx> wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> > > > > > [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Kip Macy
> > > > > Sent: 15 April 2005 05:36
> > > > > To: Keir Fraser
> > > > > Cc: xen-devel
> > > > > Subject: Re: [Xen-devel] xm pause causing lockup
> > > > >
> > > > > To further check this I added:
> > > > > printk("%s %d %d %d %d %d\n", __FUNCTION__, op->cmd,
> > > > > op->mfn, count, success_count, domid); to
> > > > > HYPERVISOR_mmuext_op and something similar to mmu_update.
> > > >
> > > > > Is your hypothesis that Xen gets stuck in either the mmuext_op or
> > > > > mmu_update loops?
> > > > Are you running with watchdog enabled?
> > > >
> > > > > It might be good to add a printk at the end so that you can
> > > > > prove this.
> > > >
> > > > Hitting 'd' on the debug console will give us an EIP on CPU 1.
> > > >
> > > > Ian
> > > >
> > >
> >
>