xen-devel
[Xen-devel] Xen Crashes when releasing gnttab mappings - of a crashed do
Observation:
------------
When two Mini-OS domains are connected through a shared ring, Xen itself
(not a domain) crashes when the Mini-OS domains exit.
Xen crashes and produces the following:
(XEN) Xen call trace:
(XEN) [<ff11d20d>] __bug+0x29/0x45
(XEN) [<ff107cb3>] gnttab_release_mappings+0xcb/0x2e5
(XEN) [<ff1046dd>] domain_kill+0x29/0x62
(XEN) [<ff10349a>] do_domctl+0x6d6/0xfbc
(XEN) [<ff165755>] hypercall+0x95/0xb5
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) BUG at grant_table.c:1122
(XEN) ****************************************
The cause:
----------
Xen tries to release the grant table mappings by accessing the remote
domain's grant table.
But the remote domain no longer seems to exist, and consequently Xen
fails:
find_domain_by_id (called in gnttab_release_mappings) returns NULL.
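To make the failure mode concrete, here is a minimal toy model of the lookup, not Xen's code. Every name prefixed with toy_ is hypothetical; the only thing taken from the report above is that the lookup by domain id returns NULL once the remote domain is gone, while the dying domain still holds a mapping that names it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; these are NOT Xen's real types. */
typedef uint16_t toy_domid_t;

struct toy_domain {
    toy_domid_t id;
    int         alive;   /* 0 once the domain has been torn down */
};

#define TOY_MAX_DOMAINS 4
static struct toy_domain toy_domains[TOY_MAX_DOMAINS] = {
    { 0, 1 }, { 1, 1 }, { 2, 0 }, { 3, 1 },   /* domain 2 is already gone */
};

/* Models a lookup by id: returns NULL when the domain no longer exists. */
static struct toy_domain *toy_find_domain_by_id(toy_domid_t id)
{
    for (size_t i = 0; i < TOY_MAX_DOMAINS; i++)
        if (toy_domains[i].id == id && toy_domains[i].alive)
            return &toy_domains[i];
    return NULL;
}

/* One grant mapping held by the domain that is being killed. */
struct toy_mapping {
    toy_domid_t remote_domid;
    int         in_use;
};

/* Counts the in-use mappings whose remote domain can no longer be found.
 * In the report above, this is the situation in which Xen hits BUG(). */
static int toy_count_orphaned(const struct toy_mapping *maps, size_t n)
{
    int orphaned = 0;
    for (size_t i = 0; i < n; i++) {
        if (!maps[i].in_use)
            continue;
        if (toy_find_domain_by_id(maps[i].remote_domid) == NULL)
            orphaned++;
    }
    return orphaned;
}
```

In this model a non-zero return from toy_count_orphaned() corresponds to the state Xen finds itself in during domain_kill.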
Analysis:
---------
The situation described above should never happen: if I understand
correctly, a domain should not be completely destroyed while there are
still references to it.
See put_domain(d) in sched.h, which is defined as follows:
    if ( atomic_dec_and_test(&(_d)->refcnt) ) domain_destroy(_d)
It does happen, however, when a domain crashes.
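The reference-counting rule can be sketched in isolation. The following is a minimal illustration using C11 atomics, not Xen's implementation: the toy_ names and the destroyed flag are assumptions made for the example. It shows only the intended invariant, namely that destruction happens when the last reference is dropped and not before.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted domain object. */
struct toy_domain_ref {
    atomic_int refcnt;
    bool       destroyed;   /* set where a real system would free the object */
};

/* A new object starts with one reference, held by its creator. */
static void toy_domain_init(struct toy_domain_ref *d)
{
    atomic_init(&d->refcnt, 1);
    d->destroyed = false;
}

/* Take an additional reference, e.g. while holding a mapping to the domain. */
static void toy_get_domain(struct toy_domain_ref *d)
{
    atomic_fetch_add(&d->refcnt, 1);
}

/* Drop a reference. atomic_fetch_sub returns the previous value, so a
 * previous value of 1 means this call released the last reference. */
static void toy_put_domain(struct toy_domain_ref *d)
{
    if (atomic_fetch_sub(&d->refcnt, 1) == 1)
        d->destroyed = true;
}
```

Under this rule, a domain that still has a mapping pointing at it keeps a non-zero count and therefore cannot be destroyed underneath the holder of that mapping.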
Note that there are two ways to "finish" with a domain (domain.c):
1. domain_kill (which calls domain_destroy): releases all resources in
   a graceful manner.
2. __domain_crash (which calls domain_shutdown): seems to kill the
   domain without properly releasing the resources that reference it.
   (This function is called in extreme cases.)
Our scenario:
-------------
We are running two Mini-OS domains with the same profile:
  Open a ring (share a page through a grant ref and map a page from the
  remote domain)
  Write
  Read
  Close the ring (dealloc, unmap*)
  do_exit()
Timeline:
Mini-OS 1: calls do_exit() -> domain_kill() -> gnttab_release_mappings() -> BUG()
Mini-OS 2: crashes**
* When we unmap, we use Xen's hypercall for unmapping a grant reference,
together with the gnttab_unmap_grant_ref structure.
Note that we have a bug: we do NOT set unmap_op.dev_bus_addr to 0 as we
should.
Xen's API (in public/grant_table.h) explicitly states that it should be
0; otherwise the grant reference is treated as a valid device mapping.
** Because of the bug described in *, we cause the domain to crash.
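The fix on the guest side amounts to initialising the whole request before filling it in. The sketch below uses a local mirror of the request structure; the field names host_addr, dev_bus_addr, handle and status follow my reading of public/grant_table.h, but the exact types, the toy_ names, and the helper itself are assumptions for illustration. The hypercall is deliberately not issued here.

```c
#include <stdint.h>
#include <string.h>

/* Assumed type of a grant handle; check public/grant_table.h. */
typedef uint32_t toy_grant_handle_t;

/* Local mirror of the unmap request, for illustration only. */
struct toy_unmap_grant_ref {
    /* IN parameters */
    uint64_t           host_addr;
    uint64_t           dev_bus_addr;
    toy_grant_handle_t handle;
    /* OUT parameter */
    int16_t            status;
};

/* Hypothetical helper: build an unmap request for a host mapping only.
 * Zeroing the structure first guarantees that dev_bus_addr is 0 and that
 * no stack garbage is handed to the hypervisor. */
static void toy_prepare_unmap(struct toy_unmap_grant_ref *op,
                              uint64_t host_addr,
                              toy_grant_handle_t handle)
{
    memset(op, 0, sizeof(*op));
    op->host_addr    = host_addr;
    op->dev_bus_addr = 0;        /* explicit: there is no device mapping */
    op->handle       = handle;
}
```

The explicit assignment is redundant after the memset, but it documents the requirement at the exact place where we originally got it wrong.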
We observe:
(XEN) grant_table.c:394: Bad frame number doesn't match gntref
(XEN) mm.c:760: Attempt to implicitly unmap a granted PTE
(XEN) domain_crash called from mm.c:761
Summary:
--------
1. Setting unmap_op.dev_bus_addr to 0 removes the BUG and all is well.
2. But crashing Xen, even as a result of our error, doesn't seem to be
a healthy choice.
:)
Micha.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel