On Mon, Jan 10, 2011 at 4:10 PM, Jui-Hao Chiang <juihaochiang@xxxxxxxxx> wrote:
>> After this change, unshare() has a potential problem of deadlock for
>> shr_lock and p2m_lock with different locking order.
>> Assume two CPUs do the following
>> CPU1: hvm_hap_nested_page_fault() => unshare() => p2m_change_type()
>> (locking order: shr_lock, p2m_lock)
>> CPU2: p2m_teardown() => unshare() (locking order: p2m_lock, shr_lock)
>> When CPU1 grabs shr_lock and CPU2 grabs p2m_lock, they deadlock later.
>> 1. mem_sharing_unshare_page() is also called from
>> gfn_to_mfn_unshare(), which is in turn called by gnttab_transfer().
>> Since there is no bug report on grant_table right now, I think this
>> path is safe for now.
>> Also, for p2m_teardown() => mem_sharing_unshare_page(), the flag is
>> MEM_SHARING_DESTROY_GFN, so it won't have the chance to
>> call set_shared_p2m_entry().
> Of course, p2m_teardown() won't call set_shared_p2m_entry(). But this does
> not change my argument that p2m_teardown() holds p2m_lock while waiting on
> shr_lock. Actually, after looking at this for a while, I rebut myself: the
> deadlock scenario won't exist.
> When p2m_teardown() is called, the domain is dying in its last few steps
> (devices and IRQs are released), and there is no way for
> hvm_hap_nested_page_fault() to happen on the memory of the dying domain. If
> this case is eliminated, then my patch should not have a deadlock problem. Any
> comments are welcome.
After a discussion with tinnycloud, his test is working after applying
the previous patch (set_shared_p2m_entry() is not executed since it is
inside an ASSERT).
After some more code tracing and testing, my earlier worry about a
deadlock between p2m_lock and shr_lock disappears, per the discussion
above. So I re-attach the patch, which also includes another fix to
recover the type count when nominate fails on a page (from our
previous discussions).
Please see if anything is wrong.
Xen-devel mailing list