   
 
 
xen-devel

Re: [Xen-devel] xend segfaults when starting

On Wed, 2010-08-18 at 15:02 +0100, Christoph Egger wrote:
> On Wednesday 18 August 2010 14:14:19 Ian Campbell wrote:

> > > In unlock_pages, the address and length passed to munlock() are:
> > >
> > >  laddr 0x7f7ffdfe7000, llen 0x2000
> > >
> > > The reason why munlock() fails is that mlock() hasn't been called before.
> > > The hcall_buf_prep() is not called at all before the first call to
> > > _xc_clean_hcall_buf().
> >
> > If hcall_buf_prep() has never been called then
> > "pthread_getspecific(hcall_buf_pkey)" should return NULL and
> > _xc_clean_hcall_buf will never be called from xc_clean_hcall_buf.
> > _xc_clean_hcall_buf also ignores NULL values itself.
> 
> Who calls hcall_buf_prep() in your case ?
> 
> Only hypercalls call hcall_buf_prep().
> What if no hypercalls are called during xend startup ?

Then I would have expected pthread_getspecific(hcall_buf_pkey) to return
NULL (because _xc_init_hcall_buf was never called) and therefore for
xc_clean_hcall_buf to not do any unlocking.

However I think my expectation was wrong. If _xc_init_hcall_buf is never
called then hcall_buf_pkey is undefined but not necessarily invalid --
and it seems to be the case on your system that it turns out to be valid
(perhaps pthread_key_t is valid on NetBSD and invalid on Linux or
something like that) and therefore we try to unlock some random address.

My updated patch ensures that hcall_buf_pkey is always initialised
before use.
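
To illustrate what "initialised before use" can look like, here is a
minimal sketch using pthread_once(). It is not the actual libxc patch:
buf_pkey, buf_pkey_init, get_buf, set_buf and clean_buf are hypothetical
names made up for this example. The point is only that every access to
the key goes through pthread_once(), so the cleanup path is safe even
if nothing ever stored a buffer.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names for illustration; this is not the libxc code. */
static pthread_key_t buf_pkey;
static pthread_once_t buf_pkey_once = PTHREAD_ONCE_INIT;

/* Runs exactly once, before the first use of buf_pkey. */
static void buf_pkey_init(void)
{
    if (pthread_key_create(&buf_pkey, NULL) != 0)
        abort();
}

/* Returns this thread's buffer, or NULL if none has been stored. */
void *get_buf(void)
{
    pthread_once(&buf_pkey_once, buf_pkey_init);
    return pthread_getspecific(buf_pkey);
}

/* Stores p as this thread's buffer; returns 0 on success. */
int set_buf(void *p)
{
    pthread_once(&buf_pkey_once, buf_pkey_init);
    return pthread_setspecific(buf_pkey, p);
}

/* Safe to call on close even if set_buf() was never called. */
void clean_buf(void)
{
    void *p = get_buf();

    if (p == NULL)
        return;         /* nothing was ever prepared: nothing to undo */
    free(p);
    set_buf(NULL);
}
```

With this shape, calling clean_buf() first thing in a program is a no-op
rather than a lookup on a key that was never created.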

> If you call xc_clean_hcall_buf() from xc_interface_close()
> then you should also call hcall_buf_prep() from xc_interface_open().
> 
> > However you say that hcall_buf_pkey is not NULL, but rather contains a
> > valid hcall_buf containing 0x7f7ffdfe7040.
> 
> hcall_buf itself has the address 0x7f7ffdfe7000.
> 
> hcall_buf->buf has the address 0x7f7ffdfe7040.

That's very odd -- hcall_buf->buf is allocated with xc_memalign and
therefore should be page aligned. Are you sure the addresses aren't the
other way round?
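
For reference, the alignment check I have in mind can be sketched as
below. The helper names are hypothetical, posix_memalign() is only a
stand-in for a page-aligned allocator (I am not claiming that is how
xc_memalign is implemented), and the 4096-byte page size used when
checking the quoted addresses is an assumption.

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 1 if addr is a multiple of page_size (a power of two). */
int is_page_aligned(uint64_t addr, uint64_t page_size)
{
    return (addr & (page_size - 1)) == 0;
}

/* Stand-in page-aligned allocator; returns NULL on failure. */
void *alloc_page_aligned(size_t size)
{
    void *p = NULL;
    long page_size = sysconf(_SC_PAGESIZE);

    if (page_size <= 0)
        return NULL;
    if (posix_memalign(&p, (size_t)page_size, size) != 0)
        return NULL;
    return p;
}
```

Assuming 4096-byte pages, is_page_aligned(0x7f7ffdfe7000, 4096) is true
and is_page_aligned(0x7f7ffdfe7040, 4096) is false, which is why I would
expect the ...000 address to be the buffer and not the containing
structure.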

> > The only call to "pthread_setspecific(hcall_buf_pkey, ...)" with a non-NULL
> > value is in hcall_buf_prep(), so it must have been called at some point.
> 
> In that case, I am puzzled why I don't get the trace.
> Something really fishy is going on.
> 
> > Please can you confirm if _xc_init_hcall_buf() is ever called and what
> > the behaviour of "pthread_getspecific(hcall_buf_pkey)" is if
> > _xc_init_hcall_buf() has never been called. I think it is supposed to
> > return NULL in this case and we certainly rely on that.
> 
> _xc_init_hcall_buf() is not called.  pthread_getspecific() should return NULL
> but doesn't.
> 
> I am starting to ask myself "How did libxc ever work?". It feels like we are
> hunting down a long-term hidden bug.

Previously _xc_clean_hcall_buf would be called IFF hcall_buf_prep had
been called. My patch changed this to also be called on close (even if
hcall_buf_prep was never called) and could therefore access an
uninitialised hcall_buf_pkey.

I am reasonably confident that before my patch libxc was OK.

> > pthread_getspecific(hcall_buf_pkey) is supposed to return NULL on error,
> > however hcall_buf_pkey is uninitialised until _xc_init_hcall_buf,
> > perhaps on NetBSD the uninitialised value somehow looks valid? It's not
> > clear what value a pthread_key_t should be initialised to so that it
> > appears invalid until it is properly set up, but I suppose we
> > should be initialising it before use. Please can you try this patch:
> 
> I tried the replacement patch from the other mail.
> With it, xend does not crash, hcall_buf is NULL,
> pthread_getspecific() returns NULL,

OK, I think that suggests that my updated patch does the right thing
here.

> and I am not able to start a guest with 'xm'
> 
> Xend has probably crashed!  Invalid or missing HTTP status code.

There was another HTTP (XML/RPC) related mail on the list this morning
-- is this related to that? Are you sure it is related to the libxc
patch?

(did you by any chance update to python2.7 recently?)

> > If that doesn't work perhaps you can reduce the issue to a simple test
> > case like the attached? (which doesn't reproduce the issue for me on
> > Linux) If you can do that then please run it with the attached libxc
> > patch and post the output.
> 
> xc_interface is 0x7f7ffdb03800
> before prep buf is 0x7f7ffdb0b000 / 0x7f7ffdb0b040
> after prep buf is 0x7f7ffdb0b000 / 0x7f7ffdb20000
> after release buf is 0x7f7ffdb0b000 / 0x7f7ffdb0b040
> xc interface close returned 0
> 
> No crash. Is this the expected output ?

It looks correct but didn't reproduce the crash so is of limited
utility.

Ian.


