WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
 
   
 

xen-devel

Re: [Xen-devel] RE: BUG() w/ HVM win2k3 64b

To: "Woller, Thomas" <thomas.woller@xxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] RE: BUG() w/ HVM win2k3 64b
From: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Date: Thu, 10 Jan 2008 22:02:57 +0000
Delivery-date: Thu, 10 Jan 2008 14:03:41 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <683860AD674C7348A0BF0DE3918482F6069EE398@xxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AchTvZRmzXPzqiDHRt2NZV4oQGTBBwAAgByAAALmgp8AADCpsAACJ0NG
Thread-topic: [Xen-devel] RE: BUG() w/ HVM win2k3 64b
User-agent: Microsoft-Entourage/11.3.6.070618
Oh, the bug is obvious actually. It's introduced by 16491, and is because
dst.type is getting clobbered to OP_NONE before it is tested for OP_REG.
I'll sort out a fix.

 Thanks!
 Keir
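
The ordering problem described above can be illustrated with a minimal, self-contained sketch. It is not the Xen source and not necessarily the fix that was committed: the struct layout, SEG_INVALID, read_mem() and both fetch_target_*() helpers are hypothetical stand-ins, and the failed read here simply returns an error where the real path ended in BUG().

```c
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the emulator's operand types. */
enum operand_type { OP_REG, OP_MEM, OP_NONE };

struct operand {
    enum operand_type type;
    unsigned int bytes;
    unsigned long val;
    unsigned long *reg;  /* meaningful only when type == OP_REG */
    int mem_seg;         /* meaningful only when type == OP_MEM */
};

#define SEG_INVALID (-1)

/* Stand-in for ops->read(): fails when the segment was never set up. */
static int read_mem(int seg, unsigned long *val)
{
    if (seg == SEG_INVALID)
        return -1;
    *val = 0x1000UL;     /* arbitrary placeholder for memory contents */
    return 0;
}

/* Buggy ordering: type is overwritten before it is tested, so a
 * register operand always falls through to the memory read. */
static int fetch_target_buggy(struct operand *dst)
{
    dst->type = OP_NONE;
    dst->bytes = 8;
    if (dst->type == OP_REG)
        dst->val = *dst->reg;
    else
        return read_mem(dst->mem_seg, &dst->val);
    return 0;
}

/* One possible reordering (an assumption): test the type first and
 * only then mark the operand as needing no writeback. */
static int fetch_target_fixed(struct operand *dst)
{
    int rc = 0;

    dst->bytes = 8;
    if (dst->type == OP_REG)
        dst->val = *dst->reg;
    else
        rc = read_mem(dst->mem_seg, &dst->val);
    dst->type = OP_NONE;
    return rc;
}
```

With a register operand whose mem_seg is SEG_INVALID, fetch_target_buggy() attempts the memory read and returns the error, while fetch_target_fixed() copies the value from the register and returns 0.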

On 10/1/08 21:11, "Woller, Thomas" <thomas.woller@xxxxxxx> wrote:

>> 16489 and 16491 are obviously suspects. You might also try current tip
>> (-rc5) as some emulator bugs were fixed in the last day or
>> so. 
> 16491 just failed a few mins ago.  16490 passed at 9 hours, although
> could use more time.
> We are down to 3 1P test systems available for use till next week, and
> will start up:
> 1) 16701 minus 16491
> 2) 16701
> 3) 16701
> 
> And let them run overnight, which *should* be enough time.  If all of
> the above fail, we'll have to go back and work with 16489/16490 more
> closely, with more time in test.
> 
>> Was your successful 16488 test stressful enough to be
>> confident that it's not a false negative (for the bug)?
> Yes, 2 systems confirmed 16488 passed.   Btw 3.1.3 passes also.
> 
> tom
> 
>> -----Original Message-----
>> From: Keir Fraser [mailto:Keir.Fraser@xxxxxxxxxxxx]
>> Sent: Thursday, January 10, 2008 2:56 PM
>> To: Woller, Thomas; xen-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: Re: [Xen-devel] RE: BUG() w/ HVM win2k3 64b
>> 
>> 16489 and 16491 are obviously suspects. You might also try current tip
>> (-rc5) as some emulator bugs were fixed in the last day or
>> so. Was your successful 16488 test stressful enough to be
>> confident that it's not a false negative (for the bug)?
>> 
>>  -- Keir
>> 
>> On 10/1/08 19:36, "Woller, Thomas" <thomas.woller@xxxxxxx> wrote:
>> 
>>>> We have seen failures with changesets >= 16492, latest tested was
>>>> 16676 that fails, and c/s 16488 passes without issue.
>>> clarification to my email, was thinking that c/s 16491 was the problem
>>> (not 16492 as I indicated),
>>> 
>>> 16492 has failed tests, and 16491 c/s is running fine right now - but
>>> need more test time on that c/s to see if it will fail.
>>> 
>>> So, just to be clear, still don't have a handle on which specific c/s
>>> is the problem, but still seems around 1649x-ish
>>> 
>>> Tom
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Woller, Thomas
>>>> Sent: Thursday, January 10, 2008 1:18 PM
>>>> To: xen-devel@xxxxxxxxxxxxxxxxxxx
>>>> Cc: Woller, Thomas
>>>> Subject: BUG() w/ HVM win2k3 64b
>>>> 
>>>> We are observing a BUG() with 3.2/unstable.  This problem takes a
>>>> number of hours to reproduce - anywhere from 4 to 12+ hours, and only
>>>> with a Windows 2003 64b HVM multi-vcpu guest so far, under heavy stress
>>>> load.
>>>> 
>>>> Only reproducible using Shadow Paging; we have not seen the problem
>>>> using nested paging.
>>>> 
>>>> We have seen failures with changesets >= 16492, latest tested was
>>>> 16676 that fails, and c/s 16488 passes without issue.
>>>> 
>>>> We have tried to narrow down the issue to a specific changeset, and
>>>> overnight testing seems to indicate that changeset 16492 might be the
>>>> culprit.  Not quite confirmed until additional testing completes
>>>> tomorrow on c/s 16491 and 16490.  We will know more EOD Thursday if
>>>> these 2 pass testing.
>>>> 
>>>> We will start up some testing using 16701 also to make sure that it is
>>>> not resolved with post-16676 patches.  I'll also try to start up a
>>>> test with c/s 16492 removed from the 16701 base and see if that helps
>>>> this specific problem.  All of this testing though will not finish
>>>> till towards the end of next week due to a large-scale move of
>>>> lab/offices starting tomorrow - and with 3.2 almost out, would like to
>>>> see this figured out before release.
>>>> 
>>>> Reproduced on 1P family11h and family10h systems, but unable to
>>>> reproduce on 2P+ systems so far.  We don't believe we are
>>>> seeing any sort of h/w anomaly at this point.  We have not
>>>> tried reproducing on VT boxes.
>>>> 
>>>> We are able to reproduce using 2 64b Windows guests; currently we are
>>>> using 2 or 4 VCPUs, but have not tried reducing to a single VCPU.
>>>> 
>>>> Any debug thoughts are appreciated.
>>>> 
>>>> Looks like the dst.mem.seg is invalid for the read() in Grp5 case 2/4
>>>> (call/jmp), which results in the BUG() later.
>>>> 
>>>> X86_emulate:
>>>> ...
>>>>     case 0xff: /* Grp5 */
>>>>         switch ( modrm_reg & 7 )
>>>>         {
>>>>         case 0: /* inc */
>>>>             emulate_1op("inc", dst, _regs.eflags);
>>>>             break;
>>>>         case 1: /* dec */
>>>>             emulate_1op("dec", dst, _regs.eflags);
>>>>             break;
>>>>         case 2: /* call (near) */
>>>>         case 4: /* jmp (near) */
>>>>             dst.type = OP_NONE;
>>>>             if ( (dst.bytes != 8) && mode_64bit() )
>>>>             {
>>>>                 dst.bytes = op_bytes = 8;
>>>>                 if ( dst.type == OP_REG )
>>>>                     dst.val = *dst.reg;
>>>>                 else if ( (rc = ops->read(dst.mem.seg, dst.mem.off,
>>>>                                           &dst.val, 8, ctxt)) != 0 )
>>>>                     goto done;
>>>>          
>>>> 
>>>> Guest config:
>>>> HVM Windows 2003 64b
>>>> vcpus=4
>>>> memory=1024
>>>> pae/acpi/apic=1
>>>> 
>>>> BUG() info.
>>>> (XEN) Xen BUG at svm.c:599
>>>> (XEN) ----[ Xen-3.2.0-rc3  x86_64  debug=n  Tainted:    C ]----
>>>> (XEN) CPU:    2
>>>> (XEN) RIP:    e008:[<ffff828c80165205>] svm_get_segment_register+0x145/0x170
>>>> (XEN) RFLAGS: 0000000000010292   CONTEXT: hypervisor
>>>> (XEN) rax: ffff8300a6e0ff28   rbx: ffff8300a7dde000   rcx: 00000000a6e0fa28
>>>> (XEN) rdx: ffff830b14f09f54   rsi: 00000000a6e0fa28   rdi: ffff8300a7ddc080
>>>> (XEN) rbp: ffff830b14f09f54   rsp: ffff8300a6e0f850   r8:  ffff8300a6e0fc98
>>>> (XEN) r9:  ffff8300a6e0f8c8   r10: 0000000000000000   r11: 0000000000000001
>>>> (XEN) r12: ffff8300a6e0f8c8   r13: 0000000000000001   r14: 00000000a6e0fa28
>>>> (XEN) r15: 0000000000000008   cr0: 0000000080050033   cr4: 00000000000006f0
>>>> (XEN) cr3: 000000003b75b000   cr2: 000000000247f000
>>>> (XEN) ds: 0000   es: 0000   fs: 0053   gs: 002b   ss: 0000   cs: e008
>>>> (XEN) Xen stack trace from rsp=ffff8300a6e0f850:
>>>> (XEN)    ffff830b14f09f54 0000000000000000 ffff828c80178eea ffff8300a6e0fc98
>>>> (XEN)    ffff828c80179d0c ffff8300a6e0f8d0 ffff8300a6e0fb20 0000000000000001
>>>> (XEN)    0000000000000008 ffff8300a6e0fc98 ffff8300a6e0fc98 0000000000000004
>>>> (XEN)    ffff828c80179e46 0000000000000000 fffffadff3c54040 fffffadff04cbde0
>>>> (XEN)    0000000000000002 ffff828c801c18e0 0000000000000008 0000000000000000
>>>> (XEN)    ffff828c80146be5 0000000000000001 ffff8300a6e0ff28 000000003a4002e7
>>>> (XEN)    00000002a6e0fb87 ffff8300a6e0fbc8 0000001100000000 0000000080a572b0
>>>> (XEN)    ffff8300a6e0f9d8 ffff828c801c18e0 0000000000000000 0000000000000000
>>>> (XEN)    00000006a6e0fbc8 fffff80000812be8 0000468c8015a2b0 ffff8300a6e0fb03
>>>> (XEN)    0000000000000296 0000000000000002 ffff8300a7dd2080 0000000000000000
>>>> (XEN)    ffff828c8013974a 0000000000000000 00000000ffffffff ffff830000000046
>>>> (XEN)    ffff8300a7dd37e0 fffffadff04cbe00 fffffadff04cbd70 ffff8300a7dcd7e0
>>>> (XEN)    ffff828c80161206 fffff80000341070 fffffadff410d040 0000000000000000
>>>> (XEN)    fffffadff41171f0 0000000000000080 fffffadff35ce040 fffff78000000008
>>>> (XEN)    0000000000000000 0000000000000000 fffffadff35ce040 fffffadff1a73010
>>>> (XEN)    fffffadff3699f90 fffffadff3699f90 fffffadff35ce040 fffffadff3c54040
>>>> (XEN)    0000000000000003 fffff80001272bae 0000000000000000 0000000000000246
>>>> (XEN)    fffffadff04cbd70 0000000000000000 5555555555555555 5555555555555555
>>>> (XEN)    5555555555555555 5555555555555555 00000001801324cd 0000000000000004
>>>> (XEN)    ffffffffffffffff ffff8300a7ddc080 000fffff80001272 ffff8300a6e0fba4
>>>> (XEN) Xen call trace:
>>>> (XEN)    [<ffff828c80165205>] svm_get_segment_register+0x145/0x170
>>>> (XEN)    [<ffff828c80178eea>] hvm_get_seg_reg+0x3a/0x40
>>>> (XEN)    [<ffff828c80179d0c>] hvm_translate_linear_addr+0x3c/0xa0
>>>> (XEN)    [<ffff828c80179e46>] hvm_read+0x36/0xe0
>>>> (XEN)    [<ffff828c80146be5>] x86_emulate+0x3f35/0x9940
>>>> (XEN)    [<ffff828c8013974a>] smp_send_event_check_mask+0x3a/0x40
>>>> (XEN)    [<ffff828c80161206>] vlapic_write+0x546/0x7e0
>>>> (XEN)    [<ffff828c8017f3f5>] sh_gva_to_gfn__shadow_4_guest_4+0xc5/0x150
>>>> (XEN)    [<ffff828c80152d27>] __hvm_copy+0x97/0x280
>>>> (XEN)    [<ffff828c8017f2ba>] guest_walk_tables+0x80a/0x880
>>>> (XEN)    [<ffff828c8017a206>] shadow_init_emulation+0x126/0x160
>>>> (XEN)    [<ffff828c80182bd5>] sh_page_fault__shadow_4_guest_4+0xdb5/0xe80
>>>> (XEN)    [<ffff828c80128259>] context_switch+0xb79/0xbc0
>>>> (XEN)    [<ffff828c8016753c>] svm_vmexit_handler+0x6ac/0x1a70
>>>> (XEN)    [<ffff828c801160bf>] schedule+0x25f/0x290
>>>> (XEN)    [<ffff828c8015fcbd>] vlapic_has_pending_irq+0x2d/0x70
>>>> (XEN)    [<ffff828c80163dc6>] svm_intr_assist+0x46/0x140
>>>> (XEN)    [<ffff828c801692d4>] svm_stgi_label+0x8/0x14
>>>> (XEN)    
>>>> (XEN)
>>>> (XEN) ****************************************
>>>> (XEN) Panic on CPU 2:
>>>> (XEN) Xen BUG at svm.c:599
>>>> (XEN) ****************************************
>>>> (XEN)
>>>> (XEN) Manual reset required ('noreboot' specified)
>>>> 
>>>>   --Tom
>>>> 
>>>> thomas.woller@xxxxxxx  +1-512-602-0059 AMD Corporation - Operating
>>>> Systems Research Center
>>>> 5204 E. Ben White Blvd. UBC1
>>>> Austin, Texas 78741
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>> http://lists.xensource.com/xen-devel