Re: [Xen-devel] RE: mem_sharing: summarized problems when domain is dying
I think it would be best if every separate issue you're facing is a
separate thread. This looks like a Linux crash -- please include the
kernel version you're using, and whatever other information might be
2011/1/24 MaoXiaoyun <tinnycloud@xxxxxxxxxxx>:
> Another bug found when testing memory sharing.
> In this test, I start 24 Linux HVMs, each of which reboots through "xm
> reboot" every 30 minutes.
> After several hours, some of the HVMs crash. All of the crashed HVMs
> stopped during boot.
> The bug still exists even when I disable page sharing by making
> xc_memshr_nominate_gref() return failure to tapdisk.
> No special log was found.
> I was able to dump the crash stack.
> What could be happening?
> PID: 2307 TASK: ffff810014166100 CPU: 0 COMMAND: "setfont"
> #0 [ffff8100123cd900] xen_panic_event at ffffffff88001d28
> #1 [ffff8100123cd920] notifier_call_chain at ffffffff80066eaa
> #2 [ffff8100123cd940] panic at ffffffff8009094a
> #3 [ffff8100123cda30] oops_end at ffffffff80064fca
> #4 [ffff8100123cda40] do_page_fault at ffffffff80066dc0
> #5 [ffff8100123cdb30] error_exit at ffffffff8005dde9
> [exception RIP: vgacon_do_font_op+363]
> RIP: ffffffff800515e5 RSP: ffff8100123cdbe8 RFLAGS: 00010203
> RAX: 0000000000000000 RBX: ffffffff804b3740 RCX: ffff8100000a03fc
> RDX: 00000000000003fd RSI: ffff810011cec000 RDI: ffffffff803244c4
> RBP: ffff810011cec000 R8: d0d6999996000000 R9: 0000009090b0b0ff
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
> R13: 0000000000000001 R14: 0000000000000001 R15: 000000000000000e
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #6 [ffff8100123cdc20] vgacon_font_set at ffffffff8016bec5
> #7 [ffff8100123cdc60] con_font_op at ffffffff801aa86b
> #8 [ffff8100123cdcd0] vt_ioctl at ffffffff801a5af4
> #9 [ffff8100123cdd70] tty_ioctl at ffffffff80038a2c
> #10 [ffff8100123cdeb0] do_ioctl at ffffffff800420d9
> #11 [ffff8100123cded0] vfs_ioctl at ffffffff800302ce
> #12 [ffff8100123cdf40] sys_ioctl at ffffffff8004c766
> #13 [ffff8100123cdf80] tracesys at ffffffff8005d28d (via system_call)
> RIP: 00000039294cc557 RSP: 00007fff54c4aec8 RFLAGS: 00000246
> RAX: ffffffffffffffda RBX: ffffffff8005d28d RCX: ffffffffffffffff
> RDX: 00007fff54c4aee0 RSI: 0000000000004b72 RDI: 0000000000000003
> RBP: 000000001d747ab0 R8: 0000000000000010 R9: 0000000000800000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000010
> R13: 0000000000000200 R14: 0000000000000008 R15: 0000000000000008
> ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
>> Date: Fri, 21 Jan 2011 14:45:14 -0500
>> Subject: Re: mem_sharing: summarized problems when domain is dying
>> From: juihaochiang@xxxxxxxxx
>> To: Tim.Deegan@xxxxxxxxxx
>> CC: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
>> On Fri, Jan 21, 2011 at 11:19 AM, Jui-Hao Chiang <juihaochiang@xxxxxxxxx>
>> > Hi, Tim:
>> > From tinnycloud's result, here I summarize the current problem and
>> > findings of mem_sharing due to domain dying.
>> > (1) When domain is dying, alloc_domheap_page() and
>> > set_shared_p2m_entry() would just fail. So the shr_lock is not enough
>> > to ensure that the domain won't die in the middle of mem_sharing code.
>> > As tinnycloud's code shows, would it be better to use
>> > rcu_lock_domain_by_id before calling the above two functions?
>> There seems to be no good locking to protect a domain from changing its
>> is_dying state. So the unshare function could fail partway through at
>> several points, e.g., alloc_domheap_page and set_shared_p2m_entry.
>> If that's the case, we need to add some checks, and probably revert
>> what we have already done when is_dying changes midway.
>> Any comments?
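For illustration, the check-and-revert idea discussed above could look roughly like the sketch below. It is a self-contained mock and not Xen code: struct mock_domain, mock_alloc_page(), mock_set_p2m_entry(), mock_free_page(), unshare_page_sketch() and the die_after_calls test knob are all hypothetical stand-ins for the domain, alloc_domheap_page(), set_shared_p2m_entry() and the unshare path. The only point it tries to show is that each step which can fail because the domain started dying undoes the earlier steps before returning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a domain; NOT the real Xen struct domain. */
struct mock_domain {
    bool  is_dying;        /* may flip at any time in a real hypervisor */
    int   die_after_calls; /* test knob: 0 = die at next helper call, <0 = never */
    int   pages_allocated; /* pages currently held by this mock domain */
    void *private_page;    /* stands in for the p2m entry's private copy */
};

/* Simulate the domain starting to die between two steps. */
static void mock_tick(struct mock_domain *d)
{
    if (d->die_after_calls == 0)
        d->is_dying = true;
    else if (d->die_after_calls > 0)
        d->die_after_calls--;
}

/* Stand-in for the allocation step: fails once the domain is dying. */
static void *mock_alloc_page(struct mock_domain *d)
{
    mock_tick(d);
    if (d->is_dying)
        return NULL;
    void *page = malloc(4096);
    if (page != NULL)
        d->pages_allocated++;
    return page;
}

/* Undo of mock_alloc_page(). */
static void mock_free_page(struct mock_domain *d, void *page)
{
    assert(d->pages_allocated > 0);
    free(page);
    d->pages_allocated--;
}

/* Stand-in for the p2m update step: fails once the domain is dying. */
static bool mock_set_p2m_entry(struct mock_domain *d, void *page)
{
    mock_tick(d);
    if (d->is_dying)
        return false;
    d->private_page = page;
    return true;
}

/* Returns 0 on success, -1 if the domain started dying partway through.
 * On failure, everything done so far has been reverted. */
int unshare_page_sketch(struct mock_domain *d)
{
    void *page = mock_alloc_page(d);
    if (page == NULL)
        return -1;                /* nothing to undo yet */

    if (!mock_set_p2m_entry(d, page)) {
        mock_free_page(d, page);  /* revert the allocation */
        return -1;
    }
    return 0;
}
```

With this shape the caller only has to handle one failure code, and a domain that dies between the two steps is left with no half-applied state. Whether the real unshare path can be structured this way depends on what else has been modified by that point.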
> Xen-devel mailing list