OK, I finally popped off all the interrupts on my stack and got back to this.
The put_domain() that finally destroys the domain (after plugging the
cpu back in) is at page_alloc.c:931, in free_domheap_pages().
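To make sure I have the chain straight, here's a toy model of what I
think is going on (just a sketch of my understanding, not the actual
xen-unstable source; everything except the function names is made up):

    #include <stdio.h>

    struct domain {
        int refcnt;      /* "refcnt" in the 'q' output below */
        int tot_pages;   /* "nr_pages" in the 'q' output below */
    };

    static void domain_destroy(struct domain *d)
    {
        printf("domain finally destroyed\n");
    }

    static void put_domain(struct domain *d)
    {
        if (--d->refcnt == 0)
            domain_destroy(d);
    }

    /* Freeing the domain's last domheap page drops the reference
     * held on behalf of the domain's page allocation. */
    static void free_domheap_pages(struct domain *d, int npages)
    {
        d->tot_pages -= npages;
        if (d->tot_pages == 0)
            put_domain(d);
    }

    int main(void)
    {
        struct domain zombie = { .refcnt = 1, .tot_pages = 2 };
        free_domheap_pages(&zombie, 1);   /* first page freed */
        free_domheap_pages(&zombie, 1);   /* last page -> destroy */
        return 0;
    }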
Here's the callstack from xen:
(XEN) [<ffff828c80112cd6>] free_domheap_pages+0x3a9/0x427
(XEN) [<ffff828c8014f0e3>] put_page+0x4b/0x52
(XEN) [<ffff828c80150236>] put_page_from_l1e+0x137/0x1ae
(XEN) [<ffff828c80155ed0>] ptwr_emulated_update+0x555/0x57c
(XEN) [<ffff828c80155fa3>] ptwr_emulated_cmpxchg+0xac/0xb5
(XEN) [<ffff828c80176511>] x86_emulate+0xf876/0xfb5d
(XEN) [<ffff828c8014f523>] ptwr_do_page_fault+0x15c/0x190
(XEN) [<ffff828c80164d8c>] do_page_fault+0x3b8/0x571
So the thing that finally destroys the domain is unmapping its last
outstanding domheap page from dom0's pagetables. The unmap was done by
vcpu 1 (which had just come back online), from
linux/mm/memory.c:unmap_vmas().
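My reading of how the unmap feeds into that (again a sketch on top of
the toy model above, not the real code -- xen's put_page is actually a
cmpxchg loop on count_info):

    struct page_info {
        unsigned long count_info;   /* shown as "caf" by the 'q' key */
        struct domain *owner;
    };

    /* Clearing dom0's l1e for a foreign page drops the general
     * reference that the mapping held; the last one frees the page
     * back to the domheap, which is what drops the domain ref. */
    static void put_page(struct page_info *pg)
    {
        if (--pg->count_info == 0)
            free_domheap_pages(pg->owner, 1);
    }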
I confirmed with the 'q' debug key that the "zombie domain" still had
two outstanding (not yet unmapped) pages; the caf=00000001 on each
DomPage line below is the single general reference still held by dom0's
mapping:
(XEN) General information for domain 2:
(XEN) refcnt=1 dying=2 nr_pages=2 xenheap_pages=0 dirty_cpus={} max_pages=8192
(XEN) handle=a7c2bcb8-e647-992f-9e15-7313072a36bf vm_assist=00000008
(XEN) Rangesets belonging to domain 2:
(XEN) Interrupts { }
(XEN) I/O Memory { }
(XEN) I/O Ports { }
(XEN) Memory pages belonging to domain 2:
(XEN) DomPage 000000000003d64f: caf=00000001, taf=e800000000000001
(XEN) DomPage 000000000003d64e: caf=00000001, taf=e800000000000001
(XEN) VCPU information and callbacks for domain 2:
(XEN) VCPU0: CPU0 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0-31}
(XEN) 100 Hz periodic timer (period 10 ms)
(XEN) Notifying guest (virq 1, port 0, stat 0/-1/0)
I'm not sure if this is relevant, but it looks like while dom0's vcpu 1
was offline, it had a pending (and masked) event channel upcall:
(XEN) VCPU1: CPU0 [has=F] flags=2 poll=0 upcall_pend = 01, upcall_mask = 01 dirty_cpus={} cpu_affinity={0-31}
(XEN) 100 Hz periodic timer (period 10 ms)
(XEN) Notifying guest (virq 1, port 0, stat 0/-1/-1)
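That upcall_pend=01 / upcall_mask=01 combination looks suspicious to
me. As I understand the event channel code (sketch below -- the field
names are from xen's public vcpu_info, but the helper is made up for
illustration), a pending upcall only gets delivered once the vcpu is
running and has upcalls unmasked:

    #include <stdint.h>

    struct vcpu_info_bits {
        uint8_t evtchn_upcall_pending;   /* "upcall_pend" above */
        uint8_t evtchn_upcall_mask;      /* "upcall_mask" above */
    };

    /* Would this vcpu take the event upcall right now? */
    static int upcall_deliverable(const struct vcpu_info_bits *vi,
                                  int online)
    {
        return online
            && vi->evtchn_upcall_pending
            && !vi->evtchn_upcall_mask;
    }

If that's right, then with vcpu 1 offline both conditions fail, which
would be consistent with whatever work that event was meant to kick off
just sitting there until the vcpu comes back.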
So it appears that while vcpu 1 is offline, dom0 never successfully
removes its mappings of the domU's pages; that only happens once vcpu 1
comes back online.
I don't know enough about the unmapping process... Jeremy, do you know
anything about the process for unmapping domU memory from dom0 when the
domU is being destroyed, in the linux-2.6.18-xen.hg tree? More
specifically: if I take dom0's vcpu 1 offline (via the /sys interface),
why doesn't the unmapping happen until I bring vcpu 1 back online?
-George
On Tue, Feb 17, 2009 at 5:39 PM, Keir Fraser <keir.fraser@xxxxxxxxxxxxx> wrote:
> On 17/02/2009 17:30, "George Dunlap" <dunlapg@xxxxxxxxx> wrote:
>
>> domain_destroy() is only called from put_domain(), so presumably
>> reference counts are being held somewhere which aren't released
>> while the second cpu is offline.
>>
>> I've duplicated this using the standard credit scheduler on
>> xen-unstable tip. I'm using a Debian dom0 filesystem, and a
>> linux-2.6.18-xen0 build from a month ago.
>
> If the domain never runs there will be very few domain refcnt updates. You
> should be able to track it down pretty easily by logging every caller.
>
> -- Keir
>
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel