WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

[Xen-devel] [PATCH] fix gdb debugging of hypervisor

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] [PATCH] fix gdb debugging of hypervisor
From: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
Date: Tue, 11 Dec 2007 17:41:52 +0000
Delivery-date: Tue, 11 Dec 2007 09:44:09 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
There are (at least) two things which prevent one from debugging the
hypervisor with gdb by setting crash_debug=y in xen/Rules.mk.


arch/x86/gdbstub.c uses copy_{from,to}_user to access hypervisor
memory.  This is a bit strange, as the comments nearby suggest: it
does this so that it can reuse the fixup facility which allows
pagefaults to be turned into error returns.

However, evidently since this code was last used by anyone an
additional check was added to copy_{from,to}_user: access_ok is called
to prevent copy_{from,to}_user from being tricked (by an errant guest)
into reading or writing hypervisor memory.  This is not appropriate
for the gdbstubs.  The effect is that all attempts to read and write
hypervisor memory from the gdb session fail.

So in the patch below I have changed the calls to use
__copy_{from,to}_user, bypassing access_ok.


gdb attempts to read inaccessible memory at startup, at least in my
tests.  This ought not to be a problem as the aforementioned page
fault fixup magic ought just to turn these into errors, which gdb
copes with.

However __spurious_page_fault is called fairly early on by the page
fault handler - before looking for a fixup.  And __spurious_page_fault
in turn calls map_domain_page.  map_domain_page asserts !in_irq(),
which assertion can fail when the page fault happens while debugging,
for example if the debugger was entered via the `%' hypervisor console
keystroke.

The best way to deal with this seemed to me to make
__spurious_page_fault always return false when the debugger is
attached (ie, when we are running the debugging stub).  The comments
surrounding __spurious_page_fault seem to suggest that these only
occur when page tables are changed.  If that's true then this change
is correct since the debugging stubs don't change page tables.

However, if general speculative memory accesses may generate spurious
page faults then this change is wrong.  I'm sure someone more familiar
with the contents of the CPU architecture manuals can answer this
question with less effort than me :-).  In that case it may be
necessary to put the debugger check in the very-widely-used
map_domain_page.  (Luckily you only bear the cost if you compile with
-DCRASH_DEBUG.)


With these two changes, I can get gdb hypervisor debugging to work -
at least, in my test I was able to trap into the debugging mode with
`%', attach with gdb, and examine some hypervisor static variables.

I haven't spent any effort on getting it to be able to resume normal
operation after such a %-interrupt, and that didn't work for me
even though it's apparently intended to be supported.


The attached patch also changes the error message printed when
debugging is requested with `%' but no gdb=... was provided on the
command line from dbg_printk (generally a noop) to a printk, and
provides a confirmation that `%' was pressed.  This makes debugger
entry a little more positive and the latter may help diagnosis when
debugger entry fails for some other reason.


For the record, to get to a working gdb prompt I did these things:
  * Set debug=y in Config.mk
  * Set crash_debug=y in xen/Rules.mk
  * Make the changes in the attached patch, and build.
  * Pass gdb=com1 as a hypervisor command line argument
    (I already have com1=38400,8n1 console=com1,vga sync_console)
  * Boot the system with minicom connected from my workstation via
    a null modem cable in the usual way
  * In minicom, give the escape character (^A by default) three times
    to talk to Xen (Xen prints `(XEN) *** Serial input -> Xen...').
  * Press % and observe the messages
     (XEN) '%' pressed -> trapping into debugger
     (XEN) GDB connection activated.
     (XEN) Waiting for GDB to attach...
  * Disconnect from minicom without allowing minicom to send any
    modem control sequences.
  * Start gdb with   gdb /path/to/build/tree/xen/xen-syms  and then
      (gdb) set remotebaud 38400
      (gdb) target remote /dev/ttyS0
      0x614d12ff in ?? ()
      (gdb)
As you can see at least the EIP seen by gdb doesn't seem accurate,
but that's not important to me right now.


Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>

Ian.


diff -r 38a45b7c6cb5 xen/common/gdbstub.c
--- a/xen/common/gdbstub.c      Mon Dec 10 11:37:13 2007 +0000
+++ b/xen/common/gdbstub.c      Tue Dec 11 16:00:21 2007 +0000
@@ -512,7 +512,7 @@ __trap_to_gdb(struct cpu_user_regs *regs
 
     if ( gdb_ctx->serhnd < 0 )
     {
-        dbg_printk("Debugger not ready yet.\n");
+        printk("Debugging connection not set up.\n");
         return -EBUSY;
     }
 
@@ -582,6 +582,10 @@ __trap_to_gdb(struct cpu_user_regs *regs
     local_irq_restore(flags);
 
     return rc;
+}
+
+int debugger_currently_attached(void) {
+       return gdb_ctx->connected;
 }
 
 void __init
diff -r 38a45b7c6cb5 xen/common/keyhandler.c
--- a/xen/common/keyhandler.c   Mon Dec 10 11:37:13 2007 +0000
+++ b/xen/common/keyhandler.c   Tue Dec 11 14:54:03 2007 +0000
@@ -275,6 +275,7 @@ extern void perfc_reset(unsigned char ke
 
 static void do_debug_key(unsigned char key, struct cpu_user_regs *regs)
 {
+    printk("'%c' pressed -> trapping into debugger\n", key);
     (void)debugger_trap_fatal(0xf001, regs);
     nop(); /* Prevent the compiler doing tail call
                              optimisation, as that confuses xendbg a
diff -r 38a45b7c6cb5 xen/arch/x86/gdbstub.c
--- a/xen/arch/x86/gdbstub.c    Mon Dec 10 11:37:13 2007 +0000
+++ b/xen/arch/x86/gdbstub.c    Tue Dec 11 17:00:06 2007 +0000
@@ -72,17 +72,21 @@ gdb_arch_read_reg(unsigned long regnum, 
 }
 
 /* Like copy_from_user, but safe to call with interrupts disabled.
-   Trust me, and don't look behind the curtain. */
+   Trust me, and don't look behind the curtain.
+   We use the __ versions to skip the access_ok check, which
+   would otherwise prevent us from accessing hypervisor memory
+   (which is the main point, obviously).
+*/
 unsigned int
 gdb_arch_copy_from_user(void *dest, const void *src, unsigned len)
 {
-    return copy_from_user(dest, src, len);
+    return __copy_from_user(dest, src, len);
 }
 
 unsigned int 
 gdb_arch_copy_to_user(void *dest, const void *src, unsigned len)
 {
-    return copy_to_user(dest, src, len);
+    return __copy_to_user(dest, src, len);
 }
 
 void 
diff -r 38a45b7c6cb5 xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c      Mon Dec 10 11:37:13 2007 +0000
+++ b/xen/arch/x86/traps.c      Tue Dec 11 15:59:01 2007 +0000
@@ -913,6 +913,13 @@ static int __spurious_page_fault(
     l2_pgentry_t l2e, *l2t;
     l1_pgentry_t l1e, *l1t;
     unsigned int required_flags, disallowed_flags;
+
+    /* Debugger doesn't change page tables and shouldn't generate
+     * spurious faults.  The debugger's only faults are supposed to be
+     * from copy_{from,to_user}.
+     */
+    if (debugger_currently_attached())
+        return 0;
 
     /* Reserved bit violations are never spurious faults. */
     if ( regs->error_code & PFEC_reserved_bit )
diff -r 38a45b7c6cb5 xen/include/asm-x86/debugger.h
--- a/xen/include/asm-x86/debugger.h    Mon Dec 10 11:37:13 2007 +0000
+++ b/xen/include/asm-x86/debugger.h    Tue Dec 11 15:59:56 2007 +0000
@@ -52,11 +52,13 @@ static inline int debugger_trap_fatal(
 
 /* Int3 is a trivial way to gather cpu_user_regs context. */
 #define debugger_trap_immediate() __asm__ __volatile__ ( "int3" );
+extern int debugger_currently_attached(void);
 
 #else
 
 #define debugger_trap_fatal(v, r) (0)
 #define debugger_trap_immediate() ((void)0)
+#define debugger_currently_attached (0)
 
 #endif
 
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel