On Wednesday 18 August 2010 16:59:30 Ian Campbell wrote:
> On Wed, 2010-08-18 at 15:02 +0100, Christoph Egger wrote:
> > On Wednesday 18 August 2010 14:14:19 Ian Campbell wrote:
> > > > In unlock_pages, the address and length passed to munlock() are:
> > > >
> > > > laddr 0x7f7ffdfe7000, llen 0x2000
> > > >
> > > > The reason why munlock() fails is that mlock() hasn't been called
> > > > before. hcall_buf_prep() is not called at all before the first
> > > > call to _xc_clean_hcall_buf().
> > >
> > > If hcall_buf_prep() has never been called then
> > > "pthread_getspecific(hcall_buf_pkey)" should return NULL and
> > > _xc_clean_hcall_buf will never be called from xc_clean_hcall_buf.
> > > _xc_clean_hcall_buf also ignores NULL values itself.
> >
> > Who calls hcall_buf_prep() in your case?
> >
> > Only hypercalls call hcall_buf_prep().
> > What if no hypercalls are called during xend startup?
>
> Then I would have expected pthread_getspecific(hcall_buf_pkey) to return
> NULL (because _xc_init_hcall_buf was never called) and therefore for
> xc_clean_hcall_buf to not do any unlocking.
>
> However I think my expectation was wrong. If _xc_init_hcall_buf is never
> called then hcall_buf_pkey is undefined but not necessarily invalid,
> and on your system it seems to turn out to be valid (perhaps an
> uninitialised pthread_key_t happens to be a valid key on NetBSD but not
> on Linux, or something like that), so we try to unlock some random
> address.
To make it even more mysterious, the "random" address is always the same
even across machine reboots.
>
> My updated patch ensured that hcall_buf_pkey is always initialised
> before use.
Yes, but we also need to figure out why hcall_buf_prep is never called.
Who calls hcall_buf_prep() on your machine ?
Can you provide a call trace for the first time hcall_buf_prep() is called,
please?
> > If you call xc_clean_hcall_buf() from xc_interface_close()
> > then you should also call hcall_buf_prep() from xc_interface_open().
> >
> > > However you say that pthread_getspecific(hcall_buf_pkey) is not
> > > NULL, but rather returns a valid hcall_buf containing 0x7f7ffdfe7040.
> >
> > hcall_buf itself has the address 0x7f7ffdfe7000.
> >
> > hcall_buf->buf has the address 0x7f7ffdfe7040.
>
> That's very odd -- hcall_buf->buf is allocated with xc_memalign and
> therefore should be page aligned. Are you sure the addresses aren't the
> other way round?
Yes, I am.
>
> > > The only call to "pthread_setspecific(hcall_buf_pkey, ...)" with a
> > > non-NULL value is in hcall_buf_prep(), so it must have been called at
> > > some point.
> >
> > In that case, I am puzzled why I don't get the trace.
> > Something really fishy is going on.
> >
> > > Please can you confirm if _xc_init_hcall_buf() is ever called and what
> > > the behaviour of "pthread_getspecific(hcall_buf_pkey)" is if
> > > _xc_init_hcall_buf() has never been called. I think it is supposed to
> > > return NULL in this case and we certainly rely on that.
> >
> > _xc_init_hcall_buf() is not called. pthread_getspecific() should return
> > NULL but doesn't.
> >
> > I am starting to ask myself "How did libxc ever work?". It feels like we
> > are hunting down a long-term hidden bug.
>
> Previously _xc_clean_hcall_buf would be called IFF hcall_buf_prep had
> been called. My patch changed this to also be called on close (even if
> hcall_buf_prep was never called) and could therefore access an
> uninitialised hcall_buf_pkey.
Calling _xc_clean_hcall_buf() unconditionally and hcall_buf_prep()
conditionally sounds to me like calling free() unconditionally
and malloc() conditionally.
I will give calling hcall_buf_prep() from xc_interface_open() a try with your
patch tomorrow.
> I am reasonably confident that before my patch libxc was OK.
And is ok again after it has been backed out. :)
> > > pthread_getspecific(hcall_buf_pkey) is supposed to return NULL when no
> > > value has been set, but hcall_buf_pkey is uninitialised until
> > > _xc_init_hcall_buf is called. Perhaps on NetBSD the uninitialised
> > > value somehow looks valid? It's not clear what value a pthread_key_t
> > > should be initialised to so that it appears invalid until it is
> > > properly set up, but I suppose we should be initialising it before
> > > use. Please can you try this patch:
> >
> > I tried the replacement patch from the other mail.
> > With it, xend does not crash, hcall_buf is NULL,
> > pthread_getspecific() returns NULL,
>
> OK, I think that suggests that my updated patch does the right thing
> here.
Is it possible that xend calls xc_interface_close() during startup
and hcall_buf_prep() only later, when it interacts with xm?
> > and I am not able to start a guest with 'xm'
> >
> > Xend has probably crashed! Invalid or missing HTTP status code.
>
> There was another HTTP (XML/RPC) related mail on the list this morning
I saw this mail. No, I don't think it is related to this.
> -- is this related to that? Are you sure it is related to the libxc
> patch?
Yes.
> (did you by any chance update to python2.7 recently?)
No, I am on python 2.5.
> > > If that doesn't work perhaps you can reduce the issue to a simple test
> > > case like the attached? (which doesn't reproduce the issue for me on
> > > Linux) If you can do that then please run it with the attached libxc
> > > patch and post the output.
> >
> > xc_interface is 0x7f7ffdb03800
> > before prep buf is 0x7f7ffdb0b000 / 0x7f7ffdb0b040
> > after prep buf is 0x7f7ffdb0b000 / 0x7f7ffdb20000
> > after release buf is 0x7f7ffdb0b000 / 0x7f7ffdb0b040
> > xc interface close returned 0
> >
> > No crash. Is this the expected output?
>
> It looks correct but didn't reproduce the crash so is of limited
> utility.
>
> Ian.
Christoph
--
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel