Re: [Xen-devel] xend segfaults when starting

On Wed, 2010-08-18 at 15:02 +0100, Christoph Egger wrote:
> On Wednesday 18 August 2010 14:14:19 Ian Campbell wrote:

> > > In unlock_pages, the address and length passed to munlock() is:
> > >
> > >  laddr 0x7f7ffdfe7000, llen 0x2000
> > >
> > > The reason why munlock() fails is that mlock() hasn't been called before.
> > > The hcall_buf_prep() is not called at all before the first call to
> > > _xc_clean_hcall_buf().
> >
> > If hcall_buf_prep() has never been called then
> > "pthread_getspecific(hcall_buf_pkey)" should return NULL and
> > _xc_clean_hcall_buf will never be called from xc_clean_hcall_buf.
> > _xc_clean_hcall_buf also ignores NULL values itself.
> 
> Who calls hcall_buf_prep() in your case ?
> 
> Only hypercalls call hcall_buf_prep().
> What if no hypercalls are called during xend startup ?

Then I would have expected pthread_getspecific(hcall_buf_pkey) to return
NULL (because _xc_init_hcall_buf was never called) and therefore for
xc_clean_hcall_buf to not do any unlocking.

However I think my expectation was wrong. If _xc_init_hcall_buf is never
called then hcall_buf_pkey is uninitialised but not necessarily invalid --
and it seems that on your system its value happens to look like a valid
key (perhaps an uninitialised pthread_key_t looks valid on NetBSD and
invalid on Linux, or something like that) and therefore we try to unlock
some random address.

My updated patch ensures that hcall_buf_pkey is always initialised
before use.
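
For illustration, the general pattern of initialising a key before any
use can be sketched with pthread_once. This is only a sketch under my
own assumptions and not the actual patch: the demo_* names are made up
for the example and are not libxc functions.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not libxc identifiers. */
static pthread_key_t demo_key;
static pthread_once_t demo_key_once = PTHREAD_ONCE_INIT;
static int demo_key_ok;

/* Runs exactly once per process, however many threads race to it. */
static void demo_key_create(void)
{
    demo_key_ok = (pthread_key_create(&demo_key, NULL) == 0);
}

/* Returns the calling thread's buffer, or NULL if none was ever set. */
void *demo_get_buf(void)
{
    pthread_once(&demo_key_once, demo_key_create);
    return demo_key_ok ? pthread_getspecific(demo_key) : NULL;
}

/* Stores buf for the calling thread; returns 0 on success. */
int demo_set_buf(void *buf)
{
    pthread_once(&demo_key_once, demo_key_create);
    if (!demo_key_ok)
        return -1;
    return pthread_setspecific(demo_key, buf);
}
```

With this shape a cleanup path can call demo_get_buf() even when nothing
was ever prepared, and it simply sees NULL instead of reading through a
key that was never created.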

> If you call xc_clean_hcall_buf() from xc_interface_close()
> then you should also call hcall_buf_prep() from xc_interface_open().
> 
> > However you say that hcall_buf_pkey is not NULL, but rather contains a
> > valid hcall_buf containing 0x7f7ffdfe7040.
> 
> hcall_buf itself has the address 0x7f7ffdfe7000.
> 
> hcall_buf->buf has the address 0x7f7ffdfe7040.

That's very odd -- hcall_buf->buf is allocated with xc_memalign and
therefore should be page aligned. Are you sure the addresses aren't the
other way round?
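
A quick way to check which of the two addresses is page aligned,
assuming a 4096-byte page (an assumption on my part; the real value
comes from sysconf(_SC_PAGESIZE) on the machine in question) and using
a made-up helper name:

```c
/* Hypothetical helper for illustration; not part of libxc.
 * Returns 1 if addr is a multiple of align, 0 otherwise. */
int demo_is_aligned(unsigned long long addr, unsigned long long align)
{
    return align != 0 && (addr % align) == 0;
}
```

Under that assumption 0x7f7ffdfe7000 passes the check and 0x7f7ffdfe7040
does not, which is why I would have expected the page-aligned address to
belong to hcall_buf->buf.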

> > The only call to "pthread_setspecific(hcall_buf_pkey, ...)" with a non-NULL
> > value is in hcall_buf_prep(), so it must have been called at some point.
> 
> In that case, I am puzzled why I don't get the trace.
> Something really fishy is going on.
> 
> > Please can you confirm if _xc_init_hcall_buf() is ever called and what
> > the behaviour of "pthread_getspecific(hcall_buf_pkey)" is if
> > _xc_init_hcall_buf() has never been called. I think it is supposed to
> > return NULL in this case and we certainly rely on that.
> 
> _xc_init_hcall_buf() is not called.  pthread_getspecific() should return NULL
> but doesn't.
> 
> I am starting to ask myself "How did libxc ever work?". It feels like we are
> hunting down a long-term hidden bug.

Previously _xc_clean_hcall_buf would be called IFF hcall_buf_prep had
been called. My patch changed this to also be called on close (even if
hcall_buf_prep was never called) and could therefore access an
uninitialised hcall_buf_pkey.

I am reasonably confident that before my patch libxc was OK.

> > pthread_getspecific(hcall_buf_pkey) is supposed to return NULL on error,
> > however hcall_buf_pkey is uninitialised until _xc_init_hcall_buf,
> > perhaps on NetBSD the uninitialised value somehow looks valid? It's not
> > clear what the correct value to initialise a pthread_key_t to in order
> > for it to appear invalid until it is properly setup is, but I suppose we
> > should be initialising it before use. Please can you try this patch:
> 
> I tried the replacement patch from the other mail.
> With it, xend does not crash, hcall_buf is NULL,
> pthread_getspecific() returns NULL,

OK, I think that suggests that my updated patch does the right thing
here.

> and I am not able to start a guest with 'xm'
> 
> Xend has probably crashed!  Invalid or missing HTTP status code.

There was another HTTP (XML/RPC) related mail on the list this morning
-- is this related to that? Are you sure it is related to the libxc
patch?

(did you by any chance update to python2.7 recently?)

> > If that doesn't work perhaps you can reduce the issue to a simple test
> > case like the attached? (which doesn't reproduce the issue for me on
> > Linux) If you can do that then please run it with the attached libxc
> > patch and post the output.
> 
> xc_interface is 0x7f7ffdb03800
> before prep buf is 0x7f7ffdb0b000 / 0x7f7ffdb0b040
> after prep buf is 0x7f7ffdb0b000 / 0x7f7ffdb20000
> after release buf is 0x7f7ffdb0b000 / 0x7f7ffdb0b040
> xc interface close returned 0
> 
> No crash. Is this the expected output ?

It looks correct but didn't reproduce the crash so is of limited
utility.

Ian.



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel