xen-ia64-devel
[PATCH] Fix libxc and pm_timer (Was: [Xen-ia64-devel] Maybe doman_destro
Tue, 21 Aug 2007 09:27:45 +0900, Masaki Kanno wrote:
>Hi all,
>
>I tested the xm create command with the latest xen-ia64-unstable and
>the attached patch. The attached patch intentionally causes a
>contiguous memory shortage in VHPT allocation for an HVM domain. With
>this test, I wanted to confirm that domain resources are released
>correctly when HVM domain creation fails, but I could not confirm
>this. It seems that domain_destroy() is not being called.
>The following messages are the result of the test. A different RID
>range was allocated each time I created an HVM domain.
>Where do you think the bug is hiding?
>
> (XEN) domain.c:546: arch_domain_create:546 domain 1 pervcpu_vhpt 1
> (XEN) tlb_track.c:69: allocated 256 num_entries 256 num_free 256
> (XEN) tlb_track.c:115: hash 0xf0000002fd350000 hash_size 512
> (XEN) regionreg.c:193: ### domain f0000000040fc080: rid=80000-c0000 mp_rid=2000
> (XEN) domain.c:583: arch_domain_create: domain=f0000000040fc080
> (XEN) vpd base: 0xf000000007be0000, vpd size:65536
> (XEN) No enough contiguous memory(16384KB) for init_domain_vhpt
> (XEN) domain.c:546: arch_domain_create:546 domain 2 pervcpu_vhpt 1
> (XEN) tlb_track.c:69: allocated 256 num_entries 256 num_free 256
> (XEN) tlb_track.c:115: hash 0xf0000002f6f8c000 hash_size 512
> (XEN) regionreg.c:193: ### domain f000000004109380: rid=c0000-100000 mp_rid=3000
> (XEN) domain.c:583: arch_domain_create: domain=f000000004109380
> (XEN) vpd base: 0xf000000007b90000, vpd size:65536
> (XEN) No enough contiguous memory(16384KB) for init_domain_vhpt
> (XEN) domain.c:546: arch_domain_create:546 domain 3 pervcpu_vhpt 1
> (XEN) tlb_track.c:69: allocated 256 num_entries 256 num_free 256
> (XEN) tlb_track.c:115: hash 0xf0000002f676c000 hash_size 512
> (XEN) regionreg.c:193: ### domain f000000007bf1380: rid=100000-140000 mp_rid=4000
> (XEN) domain.c:583: arch_domain_create: domain=f000000007bf1380
> (XEN) vpd base: 0xf000000007b50000, vpd size:65536
> (XEN) No enough contiguous memory(16384KB) for init_domain_vhpt
>
Hi,
I found two bugs behind this problem.
Bug.1:
copy_from_GFW_to_nvram() in libxc does not call munmap() when the
NVRAM data is invalid. It also skips free() and close() on that path.
Bug.1 is fixed by munmap_nvram_page.patch.
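The kind of leak described in Bug.1 is commonly avoided with a single
cleanup path that every exit goes through. The following is only a
minimal sketch of that pattern, not the contents of
munmap_nvram_page.patch. The names copy_nvram_sketch(),
nvram_is_valid(), NVRAM_SIZE and NVRAM_MAGIC are hypothetical, and an
anonymous mapping stands in for the mapped guest NVRAM page.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define NVRAM_SIZE  4096         /* hypothetical size, for illustration */
#define NVRAM_MAGIC 0x4e56524dU  /* hypothetical signature */

/* Hypothetical validity check: the first word must hold the signature. */
static int nvram_is_valid(const void *page)
{
    unsigned int sig;

    memcpy(&sig, page, sizeof(sig));
    return sig == NVRAM_MAGIC;
}

/*
 * Write one "NVRAM" page to the file at path.
 * Returns 0 on success and -1 on any failure, including invalid data.
 * All exits pass through the out label, so the mapping, the buffer and
 * the file descriptor are released on every path.
 */
int copy_nvram_sketch(const char *path, unsigned int signature)
{
    int rc = -1;
    int fd = -1;
    unsigned char *buf = NULL;
    void *page = MAP_FAILED;

    assert(path != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        goto out;

    buf = malloc(NVRAM_SIZE);
    if (buf == NULL)
        goto out;

    /* Assumption: anonymous memory stands in for the guest NVRAM page. */
    page = mmap(NULL, NVRAM_SIZE, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        goto out;
    memcpy(page, &signature, sizeof(signature));

    /* Invalid data is an error, but it still goes through the cleanup. */
    if (!nvram_is_valid(page))
        goto out;

    memcpy(buf, page, NVRAM_SIZE);
    if (write(fd, buf, NVRAM_SIZE) != (ssize_t)NVRAM_SIZE)
        goto out;

    rc = 0;
out:
    if (page != MAP_FAILED)
        munmap(page, NVRAM_SIZE);
    free(buf);               /* free(NULL) is a no-op */
    if (fd >= 0)
        close(fd);
    return rc;
}
```

Calling copy_nvram_sketch() with a wrong signature returns -1 without
leaving a mapping, buffer or descriptor behind, so repeated failures do
not exhaust the process's file descriptors.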
I ran the test again after fixing Bug.1, but this time the
hypervisor panicked. The following messages are the result of the
test.
(XEN) domain.c:546: arch_domain_create:546 domain 2 pervcpu_vhpt 1
(XEN) tlb_track.c:69: allocated 256 num_entries 256 num_free 256
(XEN) tlb_track.c:115: hash 0xf0000002fad00000 hash_size 512
(XEN) regionreg.c:193: ### domain f0000000040fc080: rid=80000-c0000 mp_rid=2000
(XEN) domain.c:583: arch_domain_create: domain=f0000000040fc080
(XEN) *** xen_handle_domain_access: exception table lookup failed, iip=0xf00000000403f530, addr=0x0, spinning...
(XEN) d 0xf000000007c5c080 domid 0
(XEN) vcpu 0xf000000007c40000 vcpu 0
(XEN)
(XEN) CPU 0
(XEN) psr : 0000101008226018 ifs : 800000000000058d ip : [<f00000000403f530>]
(XEN) ip is at timer_softirq_action+0x170/0x2e0
(XEN) unat: 0000000000000000 pfs : 000000000000058d rsc : 0000000000000003
(XEN) rnat: 0000000000004000 bsps: f000000007c47e20 pr : 00000000006a9969
(XEN) ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
(XEN) csd : 0000000000000000 ssd : 0000000000000000
(XEN) b0 : f00000000403f4f0 b6 : f000000004038b80 b7 : a000000100018570
(XEN) f6 : 1003e000001b932157960 f7 : 1003e0000000281bd3682
(XEN) f8 : 000000000000000000000 f9 : 000000000000000000000
(XEN) f10 : 000000000000000000000 f11 : 000000000000000000000
(XEN) r1 : f00000000438ca40 r2 : 0000007da3766757 r3 : f000000007c47fe8
(XEN) r8 : 0000000000000001 r9 : 0000000000000000 r10 : 0000000000000000
(XEN) r11 : 0009804c0270033f r12 : f000000007c47e00 r13 : f000000007c40000
(XEN) r14 : 0000000000000000 r15 : f0000000040fc9b0 r16 : 0000000000000001
(XEN) r17 : f000000007ceaf18 r18 : 0000000000000002 r19 : 0000000000000001
(XEN) r20 : f000000007ceb508 r21 : f0000000040fc9b8 r22 : 0000000000000001
(XEN) r23 : 0000000000000001 r24 : f000000007ceaf18 r25 : f000000007c47e28
(XEN) r26 : 0000000000000000 r27 : 0000000000000000 r28 : 0000000000000000
(XEN) r29 : 0000000000000000 r30 : 0000000000000000 r31 : f000000004400100
(XEN)
(XEN) Call Trace:
(XEN) [<f0000000040af150>] show_stack+0x80/0xa0
(XEN) sp=f000000007c478b0 bsp=f000000007c41668
(XEN) [<f000000004087640>] panic_domain+0x120/0x170
(XEN) sp=f000000007c47a80 bsp=f000000007c41600
(XEN) [<f00000000407ada0>] ia64_do_page_fault+0x6b0/0x6c0
(XEN) sp=f000000007c47bc0 bsp=f000000007c41568
(XEN) [<f0000000040a7f40>] ia64_leave_kernel+0x0/0x300
(XEN) sp=f000000007c47c00 bsp=f000000007c41568
(XEN) [<f00000000403f530>] timer_softirq_action+0x170/0x2e0
(XEN) sp=f000000007c47e00 bsp=f000000007c41500
(XEN) [<f00000000403ca30>] do_softirq+0x170/0x220
(XEN) sp=f000000007c47e00 bsp=f000000007c41480
(XEN) [<f0000000040a7f60>] ia64_leave_kernel+0x20/0x300
(XEN) sp=f000000007c47e00 bsp=f000000007c41480
(XEN) domain_crash_sync called from xenmisc.c:152
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) d 0xf000000007c5c080 domid 0
(XEN) vcpu 0xf000000007c40000 vcpu 0
(XEN)
(XEN) CPU 0
(XEN) psr : 00001011085a6010 ifs : 8000000000000307 ip : [<a0000001000a6540>]
(XEN) ip is at ???
(XEN) unat: 0000000000000000 pfs : 400000000000038a rsc : 0000000000000007
(XEN) rnat: 0000000000000000 bsps: e000000162a90f70 pr : 00000000006a9a59
(XEN) ldrs: 0000000002300000 ccv : 0000000000000000 fpsr: 0009804c0270033f
(XEN) csd : 0000000000000000 ssd : 0000000000000000
(XEN) b0 : a0000001000a6960 b6 : a000000100018610 b7 : a000000100018570
(XEN) f6 : 000000000000000000000 f7 : 000000000000000000000
(XEN) f8 : 000000000000000000000 f9 : 000000000000000000000
(XEN) f10 : 000000000000000000000 f11 : 000000000000000000000
(XEN) r1 : a0000001011225d0 r2 : e0000001781f3154 r3 : e000000164d58198
(XEN) r8 : e000000164cc8198 r9 : e000000164cc8018 r10 : 0000000000000000
(XEN) r11 : 0000000000000000 r12 : e000000162a97df0 r13 : e000000162a90000
(XEN) r14 : 0000000000000000 r15 : e000000164cc8dd0 r16 : 0000000000001000
(XEN) r17 : e000000164d58dd0 r18 : e000000164cc8da8 r19 : 0000000000000000
(XEN) r20 : e0000001781f3138 r21 : 0000000000000018 r22 : e000000162a90f70
(XEN) r23 : 0000000000000001 r24 : 0000000000000000 r25 : 0000000000000000
(XEN) r26 : 0000000000000000 r27 : 0000000000000000 r28 : 0000000000000000
(XEN) r29 : 0000000000000000 r30 : 0000000000000018 r31 : 400000000000038a
(XEN)
(XEN) Call Trace:
(XEN) [<f0000000040af150>] show_stack+0x80/0xa0
(XEN) sp=f000000007c478b0 bsp=f000000007c416b8
(XEN) [<f000000004017300>] __domain_crash+0x100/0x140
(XEN) sp=f000000007c47a80 bsp=f000000007c41690
(XEN) [<f000000004017380>] __domain_crash_synchronous+0x40/0xf0
(XEN) sp=f000000007c47a80 bsp=f000000007c41668
(XEN) [<f000000004087680>] panic_domain+0x160/0x170
(XEN) sp=f000000007c47a80 bsp=f000000007c41600
(XEN) [<f00000000407ada0>] ia64_do_page_fault+0x6b0/0x6c0
(XEN) sp=f000000007c47bc0 bsp=f000000007c41568
(XEN) [<f0000000040a7f40>] ia64_leave_kernel+0x0/0x300
(XEN) sp=f000000007c47c00 bsp=f000000007c41568
(XEN) [<f00000000403f530>] timer_softirq_action+0x170/0x2e0
(XEN) sp=f000000007c47e00 bsp=f000000007c41500
(XEN) [<f00000000403ca30>] do_softirq+0x170/0x220
(XEN) sp=f000000007c47e00 bsp=f000000007c41480
(XEN) [<f0000000040a7f60>] ia64_leave_kernel+0x20/0x300
(XEN) sp=f000000007c47e00 bsp=f000000007c41480
(XEN)
(XEN) Call Trace:
(XEN) [<f0000000040af150>] show_stack+0x80/0xa0
(XEN) sp=f000000007c478b0 bsp=f000000007c416b8
(XEN) [<f000000004017310>] __domain_crash+0x110/0x140
(XEN) sp=f000000007c47a80 bsp=f000000007c41690
(XEN) [<f000000004017380>] __domain_crash_synchronous+0x40/0xf0
(XEN) sp=f000000007c47a80 bsp=f000000007c41668
(XEN) [<f000000004087680>] panic_domain+0x160/0x170
(XEN) sp=f000000007c47a80 bsp=f000000007c41600
(XEN) [<f00000000407ada0>] ia64_do_page_fault+0x6b0/0x6c0
(XEN) sp=f000000007c47bc0 bsp=f000000007c41568
(XEN) [<f0000000040a7f40>] ia64_leave_kernel+0x0/0x300
(XEN) sp=f000000007c47c00 bsp=f000000007c41568
(XEN) [<f00000000403f530>] timer_softirq_action+0x170/0x2e0
(XEN) sp=f000000007c47e00 bsp=f000000007c41500
(XEN) [<f00000000403ca30>] do_softirq+0x170/0x220
(XEN) sp=f000000007c47e00 bsp=f000000007c41480
(XEN) [<f0000000040a7f60>] ia64_leave_kernel+0x20/0x300
(XEN) sp=f000000007c47e00 bsp=f000000007c41480
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
Bug.2:
The domain resource release path freed the domain structure without
stopping (killing) the PM timer.
The VMX flag of VCPU#0 was not set because VHPT allocation for VCPU#0
failed. For this reason, domain_relinquish_resources() did not
call vmx_relinquish_guest_resources(), yet the domain structure
was still freed. As a result, timer_softirq_action() could no longer
find a valid callback function for the PM timer.
Bug.2 is fixed by kill_pm_timer.patch.
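The ordering problem in Bug.2 can be illustrated with a toy model: a
timer embedded in a structure must be removed from the pending list
before that structure is freed, regardless of how far initialization
got. This is only a sketch under that assumption. All toy_* names and
pm_tick() are hypothetical; they are not Xen's timer API and not the
contents of kill_pm_timer.patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy timer: a callback plus a link in a global pending list. */
struct toy_timer {
    void (*fn)(void *data);
    void *data;
    struct toy_timer *next;
};

static struct toy_timer *toy_pending;   /* head of the pending list */

static void toy_set_timer(struct toy_timer *t, void (*fn)(void *), void *data)
{
    t->fn = fn;
    t->data = data;
    t->next = toy_pending;
    toy_pending = t;
}

/* Unlink the timer if it is pending; safe to call when it is not. */
static void toy_kill_timer(struct toy_timer *t)
{
    struct toy_timer **pp;

    for (pp = &toy_pending; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == t) {
            *pp = t->next;
            break;
        }
    }
    t->next = NULL;
}

/* Run every pending callback once; returns how many were run. */
static int toy_run_timers(void)
{
    struct toy_timer *t;
    int count = 0;

    for (t = toy_pending; t != NULL; t = t->next) {
        t->fn(t->data);
        count++;
    }
    return count;
}

/* Toy domain that owns an embedded PM timer. */
struct toy_domain {
    struct toy_timer pm_timer;
    int ticks;
};

static void pm_tick(void *data)
{
    ((struct toy_domain *)data)->ticks++;
}

static struct toy_domain *toy_domain_create(void)
{
    struct toy_domain *d = calloc(1, sizeof(*d));

    if (d != NULL)
        toy_set_timer(&d->pm_timer, pm_tick, d);
    return d;
}

static void toy_domain_destroy(struct toy_domain *d)
{
    /*
     * Kill the timer unconditionally before freeing its owner.
     * Skipping this step leaves a dangling entry in toy_pending,
     * which toy_run_timers() would later dereference.
     */
    toy_kill_timer(&d->pm_timer);
    free(d);
}
```

After toy_domain_destroy() returns, toy_run_timers() finds nothing
pending for the freed domain. Making the kill step independent of any
"fully initialized" flag is the design point: teardown then stays safe
even when creation failed partway through.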
Signed-off-by: Masaki Kanno <kanno.masaki@xxxxxxxxxxxxxx>
Best regards,
Kan
kill_pm_timer.patch
Description: Binary data
munmap_nvram_page.patch
Description: Binary data
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel