WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

xen-devel

Re: [Xen-devel] RE: BUG() w/ HVM win2k3 64b

To: "Woller, Thomas" <thomas.woller@xxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] RE: BUG() w/ HVM win2k3 64b
From: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Date: Thu, 10 Jan 2008 20:55:51 +0000
Delivery-date: Thu, 10 Jan 2008 12:56:29 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <683860AD674C7348A0BF0DE3918482F6069EE351@xxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AchTvZRmzXPzqiDHRt2NZV4oQGTBBwAAgByAAALmgp8=
Thread-topic: [Xen-devel] RE: BUG() w/ HVM win2k3 64b
User-agent: Microsoft-Entourage/11.3.6.070618

16489 and 16491 are obvious suspects. You might also try current tip
(-rc5), as some emulator bugs were fixed in the last day or so. Was your
successful 16488 test stressful enough to be confident that it's not a false
negative (for the bug)?

 -- Keir

On 10/1/08 19:36, "Woller, Thomas" <thomas.woller@xxxxxxx> wrote:

>> We have seen failures with changesets >= 16492, latest tested
>> was 16676 that fails, and c/s 16488 passes without issue.
> Clarification to my email: I was thinking that c/s 16491 was the problem
> (not 16492 as I indicated).
> 
> 16492 has failed tests, and c/s 16491 is running fine right now, but it
> needs more test time to see if it will fail.
> 
> So, just to be clear, we still don't have a handle on which specific c/s is
> the problem, but it still seems to be around 1649x.
> 
> Tom
> 
> 
>> -----Original Message-----
>> From: Woller, Thomas
>> Sent: Thursday, January 10, 2008 1:18 PM
>> To: xen-devel@xxxxxxxxxxxxxxxxxxx
>> Cc: Woller, Thomas
>> Subject: BUG() w/ HVM win2k3 64b
>> 
>> We are observing a BUG() with 3.2/unstable.  This problem
>> takes a number of hours to reproduce (anywhere from 4 to 12+
>> hours), and so far only with a Windows 2003 64b HVM multi-vcpu
>> guest under heavy stress load.
>> 
>> Only reproducible using shadow paging; we have not seen the
>> problem using nested paging.
>> 
>> We have seen failures with changesets >= 16492, latest tested
>> was 16676 that fails, and c/s 16488 passes without issue.
>> 
>> We have tried to narrow down the issue to a specific
>> changeset, and overnight testing seems to indicate that
>> changeset 16492 might be the culprit.  Not quite confirmed
>> until additional testing completes tomorrow on c/s 16491 and
>> 16490.  We will know more EOD Thursday if these 2 pass testing.
>> 
>> We will also start up some testing using 16701 to make sure
>> that it is not resolved by post-16676 patches.  I'll also
>> try to start a test with c/s 16492 removed from a 16701 base
>> and see if that helps this specific problem.  All of this
>> testing, though, will not finish until towards the end of next
>> week due to a large-scale move of lab/offices starting tomorrow,
>> and with 3.2 almost out, we would like to see this figured out
>> before release.
>> 
>> Reproduced on 1P family11h and family10h systems, but unable
>> to reproduce on 2P+ systems so far.  We don't believe we are
>> seeing any sort of h/w anomaly at this point.  We have not
>> tried reproducing on VT boxes.
>> 
>> We are able to reproduce using two 64b Windows guests.
>> Currently we are using 2 or 4 VCPUs, but have not tried
>> reducing to a single VCPU.
>> 
>> Any debug thoughts are appreciated.
>> 
>> Looks like the dst.mem.seg is invalid for the read() in Grp5
>> case 2/4 (jmp/call), which results in the BUG() later.
>> 
>> X86_emulate:
>> ...
>>     case 0xff: /* Grp5 */
>>         switch ( modrm_reg & 7 )
>>         {
>>         case 0: /* inc */
>>             emulate_1op("inc", dst, _regs.eflags);
>>             break;
>>         case 1: /* dec */
>>             emulate_1op("dec", dst, _regs.eflags);
>>             break;
>>         case 2: /* call (near) */
>>         case 4: /* jmp (near) */
>>             dst.type = OP_NONE;
>>             if ( (dst.bytes != 8) && mode_64bit() )
>>             {
>>                 dst.bytes = op_bytes = 8;
>>                 if ( dst.type == OP_REG )
>>                     dst.val = *dst.reg;
>>                 else if ( (rc = ops->read(dst.mem.seg, dst.mem.off,
>>                                           &dst.val, 8, ctxt)) != 0 )
>>                     goto done;
>>          
>> 
>> Guest config:
>> HVM Windows 2003 64b
>> vcpus=4
>> memory=1024
>> pae/acpi/apic=1
>> 
>> BUG() info.
>> (XEN) Xen BUG at svm.c:599
>> (XEN) ----[ Xen-3.2.0-rc3  x86_64  debug=n  Tainted:    C ]----
>> (XEN) CPU:    2
>> (XEN) RIP:    e008:[<ffff828c80165205>] svm_get_segment_register+0x145/0x170
>> (XEN) RFLAGS: 0000000000010292   CONTEXT: hypervisor
>> (XEN) rax: ffff8300a6e0ff28   rbx: ffff8300a7dde000   rcx: 00000000a6e0fa28
>> (XEN) rdx: ffff830b14f09f54   rsi: 00000000a6e0fa28   rdi: ffff8300a7ddc080
>> (XEN) rbp: ffff830b14f09f54   rsp: ffff8300a6e0f850   r8:  ffff8300a6e0fc98
>> (XEN) r9:  ffff8300a6e0f8c8   r10: 0000000000000000   r11: 0000000000000001
>> (XEN) r12: ffff8300a6e0f8c8   r13: 0000000000000001   r14: 00000000a6e0fa28
>> (XEN) r15: 0000000000000008   cr0: 0000000080050033   cr4: 00000000000006f0
>> (XEN) cr3: 000000003b75b000   cr2: 000000000247f000
>> (XEN) ds: 0000   es: 0000   fs: 0053   gs: 002b   ss: 0000   cs: e008
>> (XEN) Xen stack trace from rsp=ffff8300a6e0f850:
>> (XEN)    ffff830b14f09f54 0000000000000000 ffff828c80178eea ffff8300a6e0fc98
>> (XEN)    ffff828c80179d0c ffff8300a6e0f8d0 ffff8300a6e0fb20 0000000000000001
>> (XEN)    0000000000000008 ffff8300a6e0fc98 ffff8300a6e0fc98 0000000000000004
>> (XEN)    ffff828c80179e46 0000000000000000 fffffadff3c54040 fffffadff04cbde0
>> (XEN)    0000000000000002 ffff828c801c18e0 0000000000000008 0000000000000000
>> (XEN)    ffff828c80146be5 0000000000000001 ffff8300a6e0ff28 000000003a4002e7
>> (XEN)    00000002a6e0fb87 ffff8300a6e0fbc8 0000001100000000 0000000080a572b0
>> (XEN)    ffff8300a6e0f9d8 ffff828c801c18e0 0000000000000000 0000000000000000
>> (XEN)    00000006a6e0fbc8 fffff80000812be8 0000468c8015a2b0 ffff8300a6e0fb03
>> (XEN)    0000000000000296 0000000000000002 ffff8300a7dd2080 0000000000000000
>> (XEN)    ffff828c8013974a 0000000000000000 00000000ffffffff ffff830000000046
>> (XEN)    ffff8300a7dd37e0 fffffadff04cbe00 fffffadff04cbd70 ffff8300a7dcd7e0
>> (XEN)    ffff828c80161206 fffff80000341070 fffffadff410d040 0000000000000000
>> (XEN)    fffffadff41171f0 0000000000000080 fffffadff35ce040 fffff78000000008
>> (XEN)    0000000000000000 0000000000000000 fffffadff35ce040 fffffadff1a73010
>> (XEN)    fffffadff3699f90 fffffadff3699f90 fffffadff35ce040 fffffadff3c54040
>> (XEN)    0000000000000003 fffff80001272bae 0000000000000000 0000000000000246
>> (XEN)    fffffadff04cbd70 0000000000000000 5555555555555555 5555555555555555
>> (XEN)    5555555555555555 5555555555555555 00000001801324cd 0000000000000004
>> (XEN)    ffffffffffffffff ffff8300a7ddc080 000fffff80001272 ffff8300a6e0fba4
>> (XEN) Xen call trace:
>> (XEN)    [<ffff828c80165205>] svm_get_segment_register+0x145/0x170
>> (XEN)    [<ffff828c80178eea>] hvm_get_seg_reg+0x3a/0x40
>> (XEN)    [<ffff828c80179d0c>] hvm_translate_linear_addr+0x3c/0xa0
>> (XEN)    [<ffff828c80179e46>] hvm_read+0x36/0xe0
>> (XEN)    [<ffff828c80146be5>] x86_emulate+0x3f35/0x9940
>> (XEN)    [<ffff828c8013974a>] smp_send_event_check_mask+0x3a/0x40
>> (XEN)    [<ffff828c80161206>] vlapic_write+0x546/0x7e0
>> (XEN)    [<ffff828c8017f3f5>] sh_gva_to_gfn__shadow_4_guest_4+0xc5/0x150
>> (XEN)    [<ffff828c80152d27>] __hvm_copy+0x97/0x280
>> (XEN)    [<ffff828c8017f2ba>] guest_walk_tables+0x80a/0x880
>> (XEN)    [<ffff828c8017a206>] shadow_init_emulation+0x126/0x160
>> (XEN)    [<ffff828c80182bd5>] sh_page_fault__shadow_4_guest_4+0xdb5/0xe80
>> (XEN)    [<ffff828c80128259>] context_switch+0xb79/0xbc0
>> (XEN)    [<ffff828c8016753c>] svm_vmexit_handler+0x6ac/0x1a70
>> (XEN)    [<ffff828c801160bf>] schedule+0x25f/0x290
>> (XEN)    [<ffff828c8015fcbd>] vlapic_has_pending_irq+0x2d/0x70
>> (XEN)    [<ffff828c80163dc6>] svm_intr_assist+0x46/0x140
>> (XEN)    [<ffff828c801692d4>] svm_stgi_label+0x8/0x14
>> (XEN)    
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 2:
>> (XEN) Xen BUG at svm.c:599
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Manual reset required ('noreboot' specified)
>> 
>>   --Tom
>> 
>> thomas.woller@xxxxxxx  +1-512-602-0059
>> AMD Corporation - Operating Systems Research Center
>> 5204 E. Ben White Blvd. UBC1
>> Austin, Texas 78741
>> 
>> 
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
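
The Grp5 snippet quoted above sets dst.type = OP_NONE before testing
dst.type == OP_REG. Taken at face value, that ordering means the register
branch can never be taken, so a register-operand call/jmp would fall through
to ops->read() with whatever happens to be in dst.mem.seg. That would be
consistent with the reported invalid segment, but it is a reading of the
quoted fragment only, not a confirmed diagnosis. The toy model below
illustrates the ordering effect; the struct, toy_read(), SEG_INVALID and both
function names are hypothetical and are not Xen code.

```c
#include <assert.h>

/* Toy stand-ins for the emulator's operand types (not the Xen definitions). */
enum op_type { OP_REG, OP_MEM, OP_NONE };

#define SEG_INVALID (-1)            /* hypothetical marker for a bogus segment */

struct operand {
    enum op_type type;
    unsigned long *reg;             /* meaningful only when type == OP_REG */
    int mem_seg;                    /* meaningful only when type == OP_MEM */
    unsigned long val;
};

/* Toy memory read: fails on an invalid segment (the real path hit BUG()). */
static int toy_read(int seg, unsigned long *val)
{
    if (seg == SEG_INVALID)
        return -1;
    *val = 0x1000;                  /* arbitrary placeholder contents */
    return 0;
}

/* Ordering as in the quoted fragment: type is cleared BEFORE the check. */
int fetch_target_as_quoted(struct operand *dst)
{
    dst->type = OP_NONE;
    if (dst->type == OP_REG) {      /* can never be true at this point */
        dst->val = *dst->reg;
        return 0;
    }
    return toy_read(dst->mem_seg, &dst->val);
}

/* Alternative ordering: fetch the target first, clear the type afterwards. */
int fetch_target_type_cleared_late(struct operand *dst)
{
    int rc = 0;

    if (dst->type == OP_REG)
        dst->val = *dst->reg;
    else
        rc = toy_read(dst->mem_seg, &dst->val);
    dst->type = OP_NONE;            /* suppress writeback, as the original intends */
    return rc;
}
```

In this model, a register operand whose mem_seg was never filled in fails in
fetch_target_as_quoted() but succeeds in fetch_target_type_cleared_late(),
which is the kind of difference worth checking against the real tree.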


