On Wednesday 18 August 2010 14:14:19 Ian Campbell wrote:
> Thanks for the analysis. I'm a bit confused though.
> On Wed, 2010-08-18 at 11:44 +0100, Christoph Egger wrote:
> > I tracked down where the error happens. In safe_munlock(),
> > the munlock() fails.
> > The trace is:
> > xc_interface_close -> _xc_clean_hcall_buf -> unlock_pages -> safe_munlock
> > -> munlock
> > hcall_buf->buf has the address 0x7f7ffdfe7040
> Mustn't this be page aligned, due to
> hcall_buf->buf = xc_memalign(PAGE_SIZE, PAGE_SIZE);
> This appears to turn into valloc on NetBSD, which (at least according to
> the Linux manpages) returns a page-aligned result.
> > In unlock_pages, the address and length passed to munlock() is:
> > laddr 0x7f7ffdfe7000, llen 0x2000
> > The reason why munlock() fails is that mlock() hasn't been called before.
> > hcall_buf_prep() is not called at all before the first call to
> > _xc_clean_hcall_buf().
> If hcall_buf_prep() has never been called then
> "pthread_getspecific(hcall_buf_pkey)" should return NULL and
> _xc_clean_hcall_buf will never be called from xc_clean_hcall_buf.
> _xc_clean_hcall_buf also ignores NULL values itself.
Who calls hcall_buf_prep() in your case?
Only hypercalls call hcall_buf_prep().
What if no hypercalls are made during xend startup?
If you call xc_clean_hcall_buf() from xc_interface_close(),
then you should also call hcall_buf_prep() from xc_interface_open().
> However you say that hcall_buf_pkey is not NULL, but rather contains a
> valid hcall_buf containing 0x7f7ffdfe7040.
hcall_buf itself has the address 0x7f7ffdfe7000.
hcall_buf->buf has the address 0x7f7ffdfe7040.
> The only call to "pthread_setspecific(hcall_buf_pkey, ...)" with a non-NULL
> value is in hcall_buf_prep(), so it must have been called at some point.
In that case, I am puzzled why I don't get the trace.
Something really fishy is going on.
> Please can you confirm if _xc_init_hcall_buf() is ever called and what
> the behaviour of "pthread_getspecific(hcall_buf_pkey)" is if
> _xc_init_hcall_buf() has never been called. I think it is supposed to
> return NULL in this case and we certainly rely on that.
_xc_init_hcall_buf() is not called, so pthread_getspecific() should return NULL.
I am starting to ask myself "How did libxc ever work?" It feels like we are
hunting down a long-hidden bug.
> pthread_getspecific(hcall_buf_pkey) is supposed to return NULL on error,
> however hcall_buf_pkey is uninitialised until _xc_init_hcall_buf,
> perhaps on NetBSD the uninitialised value somehow looks valid? It's not
> clear what value a pthread_key_t should be initialised to so that it
> appears invalid until it is properly set up, but I suppose we should be
> initialising it before use. Please can you try this patch:
I tried the replacement patch from the other mail.
With it, xend does not crash, hcall_buf is NULL and
pthread_getspecific() returns NULL, but I am not able
to start a guest with 'xm', which reports:

  Xend has probably crashed! Invalid or missing HTTP status code.
> If that doesn't work, perhaps you can reduce the issue to a simple test
> case like the attached one (which doesn't reproduce the issue for me on
> Linux)? If you can do that, then please run it with the attached libxc
> patch and post the output.
xc_interface is 0x7f7ffdb03800
before prep buf is 0x7f7ffdb0b000 / 0x7f7ffdb0b040
after prep buf is 0x7f7ffdb0b000 / 0x7f7ffdb20000
after release buf is 0x7f7ffdb0b000 / 0x7f7ffdb0b040
xc interface close returned 0
No crash. Is this the expected output?
Xen-devel mailing list