AMD Implementation Specification for Xen platform abstractions
for HW assisted virtualization

Revision 1.6
June 3, 2005

elsie.wahlig@xxxxxxx (Engineering Lead)
thomas.woller@xxxxxxx (Engineer)

Contents

1 Introduction
2 Overview
3 Hypervisor
  3.1 Hypervisor Non-VMX specific files
  3.2 Hypervisor VMX specific files
  3.3 Hypervisor VMX files modified
  3.4 Structure Specific Modifications
4 Domain Builder tools
Appendix A (Specific File listings)
  1.0 Hypervisor
  2.0 Domain Builder tools

1 Introduction

AMD Secure Virtual Machine (SVM) architecture, codenamed Pacifica, is
designed to provide enterprise-class server virtualization technology
that facilitates virtualization development and deployment. Please see
the "Secure Virtual Machine Architecture Reference Manual", PID #33047,
for specific Pacifica/SVM details at http://www.amd.com (search
www.amd.com for 33047).

This project consists of modifying the Xen source code base by merging
redundant functionality and creating a unified code base containing
shared VM logic between the VT/VMX (Vanderpool Technology - Virtual
Machine Extensions) and the Pacifica/SVM sources. The current
configuration options concerning VMX (CONFIG_VMX) will be removed, with
all VM logic compiled by default.

Xen source code containing the generic VMX/SVM hardware assist logic
will be referred to as "hval" (Hardware Virtual Abstraction Layer) in
this document. The term "sanitized" will be used to describe files that
currently contain VMX specific code and that have been reworked to
contain only generalized logic common to VMX and SVM.

2 Overview

The following descriptions cover the generalized design, as well as
specific details of the files and functions that require modification
in order to implement a generic solution for the platforms that
provide enhanced virtualization assistance through the processor (VMX
and SVM).
The modifications will be limited to the "x86" (32bit and 64bit) code
base and will not affect other platform architectures. The approach is
to support the platform virtualization enhancements while leaving the
user unaware of the specific underlying hardware (VMX or SVM).

The two areas of specific interest are the Hypervisor (cpu context
code) and the domain builder tools (python/libxc). The Privileged
(Dom0) and the Unprivileged (DomU) domains do not contain specific
hardware platform virtualization logic (VMX or SVM), but rather can be
launched with specific configuration options depending upon the
underlying VM platform. The device model (ioemu) code does contain
CONFIG_VMX specific code that is generic and will be removed.

The Domain builder tools (libxc/python) will not contain hardware
assisted logic, except for initialization of the hardware environment
identification (init_intel/init_amd).

The hypervisor logic will be divided into several areas that are
accessed via a simple function table interface, instead of the current
VMX specific direct function calls between source files.

The proposed approach is to create a simple function table (a pattern
seen in much of the Linux sources) for each VM hardware assisted domain
(VMX or SVM). This function table will be populated at run-time
(initialization) for the specific platform and will include all of the
shared generic functions currently necessary. The table can be expanded
as required. The table will reside within the exec_domain (see below).

DESIGN NOTES: In reality, further levels of abstraction could be
applied than are described here, specifically to the vmx.c and
vmx_vmcs.c files, which contain the most VMX specific detail. Nearly
all of the current vmx code *could* also be used in the SVM design. The
changes would be fairly invasive to the current code, and may not be
prudent at this point.
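
The function table approach described above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the
entry names follow the interface list in section 3.1, while the
exec_domain layout, the hval_platform enum, hval_init_table(),
hval_start_guest() and every svm_* function are hypothetical stand-ins,
and the platform functions are stubs rather than the real Xen code.

```c
#include <assert.h>
#include <stddef.h>

struct exec_domain;  /* hypothetical, heavily reduced stand-in */

/* Generic interface; entry names follow section 3.1 of this document. */
struct hval_platform_function_table {
    const char *name;
    int  (*final_setup_guest)(struct exec_domain *ed);
    void (*do_launch)(struct exec_domain *ed);
    void (*relinquish_resources)(struct exec_domain *ed);
};

struct exec_domain {
    int launched;            /* set once the guest has been launched */
    int has_control_block;   /* stands in for an allocated VMCS/VMCB */
    const struct hval_platform_function_table *hval_table;
};

enum hval_platform { HVAL_PLATFORM_VMX, HVAL_PLATFORM_SVM };  /* assumed */

/* Stub VMX implementations (the real ones would touch the VMCS). */
static int  vmx_final_setup_guest(struct exec_domain *ed) { ed->has_control_block = 1; return 0; }
static void arch_vmx_do_launch(struct exec_domain *ed) { ed->launched = 1; }
static void vmx_relinquish_resources(struct exec_domain *ed) { ed->has_control_block = 0; }

/* Stub SVM implementations (hypothetical names; would touch the VMCB). */
static int  svm_final_setup_guest(struct exec_domain *ed) { ed->has_control_block = 1; return 0; }
static void svm_do_launch(struct exec_domain *ed) { ed->launched = 1; }
static void svm_relinquish_resources(struct exec_domain *ed) { ed->has_control_block = 0; }

static const struct hval_platform_function_table vmx_table = {
    "vmx", vmx_final_setup_guest, arch_vmx_do_launch, vmx_relinquish_resources
};
static const struct hval_platform_function_table svm_table = {
    "svm", svm_final_setup_guest, svm_do_launch, svm_relinquish_resources
};

/* Populate the table pointer once, at initialization time. */
void hval_init_table(struct exec_domain *ed, enum hval_platform platform)
{
    assert(ed != NULL);
    ed->hval_table = (platform == HVAL_PLATFORM_SVM) ? &svm_table : &vmx_table;
}

/* Generic call site: no VMX or SVM names appear here. */
int hval_start_guest(struct exec_domain *ed)
{
    int rc = ed->hval_table->final_setup_guest(ed);
    if (rc != 0)
        return rc;
    ed->hval_table->do_launch(ed);
    return 0;
}
```

With a table like this, generic code such as domain.c would call only
through ed->hval_table, and adding an entry means adding one pointer
plus one implementation per platform.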

NOTES: The goal is to make the implementation specific differences
between the VT and Pacifica architectures appear abstract. These
differences are confined to the instructions and the control data
structures. They will be hidden from the user, and no configuration
decisions at the user level should be required.

3 Hypervisor

The files that require attention within the Hypervisor code base fall
into three categories:

  (1) Non-VMX files containing added VMX specific functionality that
      can be sanitized for the hval platform environment.
  (2) VMX specifically created files that can be updated to use the
      hval platform environment.
  (3) VMX specific files that will have VMX functions added, due to
      the sanitization of files from the above two categories.

The files in category (1) will be modified to use the platform
abstractions. Any calls to platform dependent code will be made via a
function table (see below). Other VMX specific calls (e.g.
vmx_hooks_assist()) will be renamed with a generic "hval_" name. Files
in category (1) that currently include some VMX logic to be abstracted
include:

    domain.c
    dom0_ops.c
    shadow.c
    setup.c

The current Hypervisor VMX specific files that fall into category (2)
include:

    vmx_platform.c   (new file hval_platform.c)
    vmx_intercept.c  (new file hval_intercept.c)
    vmx_io.c         (new file hval_io.c)

The very specific platform dependent files in category (3) will be
maintained. These are vmx.c and vmx_vmcs.c, with corresponding future
svm.c and svm_vmcb.c files to be added. For performance reasons the
platform specific files need not go through the HVAL layer, although
they will use some hval_ named functions and structures.

Include files will also be modified accordingly, with only vmx.h,
vmx_vmcs.h, svm.h and svm_vmcb.h containing platform specific content.

3.1 Hypervisor Non-VMX specific files

The common interface functionality within the hypervisor domain code
will be modified to include the following interfaces:

    "Final setup" of a guest domain
    "launching/starting" a domain
    "Relinquish resources" for a domain
    "stop vm" for a hval domain
    "save cpu regs" (vmx specific)

The function table will subsequently contain the following generic
interface functions:

    do_launch             (currently maps to arch_vmx_do_launch())
    final_setup_guest     (currently maps to vmx_final_setup_guest())
    relinquish_resources  (currently maps to vmx_relinquish_resources())
    stop_hval             (currently maps to stop_vmx())
    save_hval_cpu_regs    (currently maps to save_vmx_cpu_regs())

3.2 Hypervisor VMX specific files

The current vmx specific files that can be most easily made generic
are vmx_platform.c, vmx_intercept.c and vmx_io.c.

vmx_platform.c (new name hval_platform.c)
    The only externally declared entry point within this file will be
    handle_mmio(). The current function store_cpu_user_regs(), which
    contains only __vmread() macro calls, will be moved to a hval.c
    specific file. There are 8 __vmread() macro calls that must be
    replaced with function table calls in order to provide access to
    similar SVM variables.
    *MORE TO COME SOON*

vmx_intercept.c (new name hval_intercept.c)
    There are approximately 8 locations that currently use the
    vmx_platform.vmx_handler variable. These places will change to use
    hval_struct.platform.handler instead.

vmx_io.c (new name hval_io.c)
    There are __vmread() macro calls that must be replaced with the
    proper generic equivalent in order to provide access to similar
    SVM variables. The vmx_stts() macros from vmx.h must be modified
    (they contain __vmread/__vmwrite/__vm_set_bit).
    *MORE TO COME SOON*

3.3 Hypervisor VMX files modified

Please see the specific file listings below. Generally, changes include
the arch_vmx structure and some of the function calls that are no
longer VMX specific. Modifications to the vmx.c and vmx_vmcs.c files
were kept to a minimum.
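
To illustrate the __vmread() replacement described in section 3.2, the
following is a minimal sketch of a register-read hook, assuming
simplified in-memory stand-ins for the VMCS and VMCB. The names
hval_reg, hval_cpu, read_guest_reg, hval_user_regs and
hval_store_cpu_user_regs(), and both mock structures, are hypothetical;
a real VMX backend would issue VMREAD with the architectural field
encodings, and a real SVM backend would read the VMCB save area.

```c
#include <assert.h>
#include <stddef.h>

/* Generic register identifiers (hypothetical names). */
enum hval_reg { HVAL_REG_RIP, HVAL_REG_RSP, HVAL_REG_RFLAGS, HVAL_REG_COUNT };

/* Mock VMCS: fields are reached through an indexed lookup, standing in
 * for __vmread(). */
struct mock_vmcs { unsigned long field[HVAL_REG_COUNT]; };

/* Mock VMCB: an ordinary memory-resident structure read directly. */
struct mock_vmcb { unsigned long rip, rsp, rflags; };

struct hval_cpu {
    union {
        struct mock_vmcs *vmcs;  /* vmx */
        struct mock_vmcb *vmcb;  /* svm */
    } u;
    /* Function table entry that replaces direct __vmread() calls. */
    unsigned long (*read_guest_reg)(const struct hval_cpu *cpu, enum hval_reg reg);
};

unsigned long vmx_read_guest_reg(const struct hval_cpu *cpu, enum hval_reg reg)
{
    assert(reg < HVAL_REG_COUNT);
    return cpu->u.vmcs->field[reg];
}

unsigned long svm_read_guest_reg(const struct hval_cpu *cpu, enum hval_reg reg)
{
    switch (reg) {
    case HVAL_REG_RIP:    return cpu->u.vmcb->rip;
    case HVAL_REG_RSP:    return cpu->u.vmcb->rsp;
    case HVAL_REG_RFLAGS: return cpu->u.vmcb->rflags;
    default:              assert(0); return 0;
    }
}

/* Reduced subset of the saved user registers, for illustration only. */
struct hval_user_regs { unsigned long eip, esp, eflags; };

/* Generic version of store_cpu_user_regs(): no VMX or SVM specifics. */
void hval_store_cpu_user_regs(const struct hval_cpu *cpu, struct hval_user_regs *regs)
{
    regs->eip    = cpu->read_guest_reg(cpu, HVAL_REG_RIP);
    regs->esp    = cpu->read_guest_reg(cpu, HVAL_REG_RSP);
    regs->eflags = cpu->read_guest_reg(cpu, HVAL_REG_RFLAGS);
}
```

The trade-off is one indirect call per register read. That cost is one
reason the platform specific files may prefer to bypass the table on
hot paths.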
As stated previously, a complete rewrite that also generalizes the
vmx.c and vmx_vmcs.c files is possible, but this work would be fairly
extensive in terms of code rework and is better suited to a follow-up
continuation of the abstractions and code cleanup.

3.4 Structure Specific Modifications

The generic information will contain an "arch_hval" structure, which
will contain the virtual platform definition structure. The platform
structure will contain a pointer to the hval function table, which
will provide the necessary common interface for both the VMX and the
SVM environments. The function table will be a single table populated
at initialization time depending upon the virtual platform (VMX or
SVM).

    struct arch_hval_struct {
        union {
            struct vmcs_struct *vmcs;   // vmx
            struct vmcb_struct *vmcb;   // svm
        };
        unsigned long flags;            /* VMCS/VMCB flags */
        unsigned long cpu_cr2;
        unsigned long cpu_cr3;
        unsigned long cpu_state;
        struct virtual_platform_def hval_platform;
    };

    struct virtual_platform_def {
        unsigned long *real_mod_data;
        unsigned long *shared_page_va;
        struct hval_virtpit_t pit;
        struct hval_handler_t handler;
        struct mi_per_cpu_info mpci;
        struct hval_platform_function_table hval_table;
    };

    //
    // Fill in this table with specific functions for the platform
    // during initialization.
    // The information below indicates the functions
    // that would be associated with that particular entry in the
    // function call table following initialization.
    //
    struct hval_platform_function_table {
        // *MORE TO COME SOON*
        // Exact definition of function table coming
    };

4 Domain Builder tools

The domain builder will begin startup of a generic "hval" domain. The
modifications appear to be fairly straightforward. Because the type of
guest generally does need to be retained within the tools/domain
builder context, the VGCF_VMX_GUEST bit will be kept and two new macro
definitions will be created: VGCF_SVM_GUEST (SVM specific) and
VGCF_HVAL_GUEST (both).
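
One possible reading of the guest flag scheme above is sketched below.
The bit positions are assumptions made for this example only (the
specification does not assign them), and the helper functions
vgcf_is_hval_guest() and vgcf_is_svm_guest() are hypothetical.

```c
#include <assert.h>

/* Assumed bit positions; the real values come from the Xen public headers. */
#define VGCF_VMX_GUEST   (1UL << 1)   /* existing flag, position assumed */
#define VGCF_SVM_GUEST   (1UL << 3)   /* new flag, hypothetical position */
/* "both": set if the guest uses either form of hardware assist. */
#define VGCF_HVAL_GUEST  (VGCF_VMX_GUEST | VGCF_SVM_GUEST)

/* Generic test used where the code previously tested VGCF_VMX_GUEST. */
int vgcf_is_hval_guest(unsigned long flags)
{
    return (flags & VGCF_HVAL_GUEST) != 0;
}

/* Platform specific test, for the few places that must still distinguish. */
int vgcf_is_svm_guest(unsigned long flags)
{
    return (flags & VGCF_SVM_GUEST) != 0;
}
```

Defining VGCF_HVAL_GUEST as a mask means a generic test must check for
a non-zero result rather than compare for equality, since a given
guest sets only one of the two platform bits.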
Currently the VGCF_VMX_GUEST flag is used in the xc trace functions,
which can be modified. The VGCF_VMX_GUEST bit is also passed via
hypercall (do_dom0_op() DOM0_SETDOMAININFO, for example), so matching
changes within the hypervisor will also be necessary. The hypervisor
code (domain.c arch_set_info_guest()) checks the VGCF_VMX_GUEST bit,
and this check is also scheduled for modification.

Appendix A Specific File Modification listings (~Changeset 1.1618)

1.0 Specific File listings (not all files listed)

dom0_ops.c
    arch_getdomaininfo_ctxt()
    Replace check for VMX_DOMAIN with HVAL_DOMAIN.
    Replace call to save_vmx_cpu_user_regs() with
    save_hval_cpu_user_regs() from function table.
    Replace VGCF_VMX_GUEST with VGCF_HVAL_GUEST.

domain.c
    vmx arch specific calls can either be moved to vmx.c/svm.c or to a
    newly created file (example: vmx_domain.c/svm_domain.c).
    Move arch_vmx_do_launch() to vmx.c, with a new call to function
    table do_launch.
    Move vmx_final_setup_guest() to vmx.c.
    Change call from vmx_final_setup_guest() to function table
    final_setup_guest(). (NOTE: final_setup_guest could be generic,
    but would require adding new construct_vmc_struct() and
    alloc_vmc_struct() functions to the function call table; for now
    leave as separate functions.)
    In arch_set_info_guest() replace tests for VGCF_VMX_GUEST with
    VGCF_HVAL_GUEST, and set up the P2M map with the initially provided
    page tables.
    Add code to test for !HVAL_DOMAIN and call the SET_FAST_TRAP and
    switch_kernel_stack() functions.
    In context_switch() add check for !HVAL_DOMAIN and call
    load_LDT()/load_segments().

vmx_io.c (replace filename with hval_io.c)
    Rename load_cpu_user_regs to hval_load_cpu_user_regs.
    Rename vmx_io_assist() to hval_io_assist() (only called internally
    within vmx_io_assist.c).
    Change access to mpci to new generic structure.
    Change access to shared_page_va to new generic structure
    (3 locations).
    Rename vmx_hooks_assist() to hval_hooks_assist() (function in
    vmx_intercept.c).
    Change ARCH_VMX_IO_WAIT to ARCH_HVAL_IO_WAIT.
    Change arch_vmx.flags to generic equivalent access.
    Change arch_vmx.vmx_platform.mpci.mmio_target to generic
    equivalent access.
    Change vmx_pit and vmx_virtpt_t to generic equivalent access.
    Change vmx_intr_assist() to hval_intr_assist().
    Change __vmread(VM_ENTRY_INTR_INFO_FIELD) and the check for the
    intr valid mask to a single function table call, with a new vmx
    function to obtain and then check the interrupt fields; add an svm
    function with similar functionality but with different bit access.
    Change __vmread(GUEST_EFLAGS) to function table call.
    Replace __vmwrite(VM_ENTRY_INTR_INFO_FIELD) and
    __vmwrite(GUEST_INTERRUPTIBILITY_INFO) with a single function
    table call, with a new vmx function to obtain and then check the
    interrupt fields; add an svm function with similar functionality
    but with different bit access.
    Change TRC_VMX_INT to TRC_HVAL_INT.
    Change call to vmx_intr_assist() to hval_intr_assist().

shadow.c
    vmx_shadow_clear_state() is currently labelled vmx specific.
    Change to hval_shadow_clear_state() and change the call from
    vmx.c/svm.c.

setup.c
    Within init_amd() add start_svm() call. Retain the distinct
    start_vmx() call, since the logic is contained within the
    init_intel (and init_amd) functions.

hval_platform.c
    The only externally declared entry point within this file will be
    handle_mmio(). The current function store_cpu_user_regs(), which
    contains only __vmread() macro calls, will be moved to a vmx*.c
    specific file, and the call to store_cpu_user_regs() will be
    replaced with a call into the hval function table. There are 8
    __vmread() macro calls that will be replaced with function table
    calls in order to provide access to similar SVM variables.

svm.c (new file)
svm_vmcb.c (new file)
    Add specific vmcb functionality.

vmx.c
    Change arch_vmx accesses to generic equivalent.
    Change any external calls that currently access vmx_io.c,
    vmx_intercept.c, vmx_platform.c.

vmx_vmcs.c
    Change arch_vmx accesses to generic equivalent.
    Change any external calls that currently access vmx_io.c,
    vmx_intercept.c, vmx_platform.c.

x86_32/entry.S
    Add SVM specific asm wrappers for:
        ENTRY(svm_asm_do_vmrun) - calls VMRUN/VMSAVE/VMLOAD internally.
    Note: The SVM h/w returns control to the instruction following the
    VMRUN instruction. Therefore the svm exit handling logic is located
    immediately after the VMRUN instruction.

x86_64/entry.S
    Add SVM specific asm wrappers for:
        ENTRY(svm_asm_do_vmrun) - calls VMRUN/VMSAVE/VMLOAD internally.
    Note: The SVM h/w returns control to the instruction following the
    VMRUN instruction. Therefore the svm exit handling logic is located
    immediately after the VMRUN instruction.

xen/include/asm-x86/domain.h
    struct arch_exec_domain - rearrange structures as previously
    defined.

X86_64/traps.c
    Add check for HVAL_DOMAIN and show registers.

include/asm-x86/shadow.h
    Prototype for hval_shadow_clear_state().
    update_pagetables() - assign local var paging_enabled based upon
    HVAL_DOMAIN().

include/asm-x86/svm.h
    Lots of prototype information.

include/asm-x86/hval_cpu.h (renamed file)
    Any virtual cpu state structs and macros.

include/asm-x86/hval_intercept.h (renamed file)
    io intercept handler prototype.
    struct including i/o intercept addr/offset and specific action
    code.

include/asm-x86/hval_platform.h (renamed file)
    Structure for specific instruction information (multiple operands
    stored).
    Memory mapped i/o specific structures and prototypes.

include/asm-x86/hval_virpit.h (renamed file)
    Virtual programmable interrupt timer structures, macros and
    prototypes.

include/asm-x86/svm_vmcb.h
    Prototypes for svm do_launch(), svm dump_vmcb(), svm
    construct_vmcb().
    Vmcb encodings definitions.
    Vmcb debug prototypes for logging/printk, and for svm_bug(), which
    displays registers when a domain has effectively hit an unforeseen
    condition.

include/public/trace.h
    Replace TRC_VMX with TRC_HVAL (0x00100000).
    Replace VMX class trace events with generic: TRC_HVAL_VMEXIT,
    TRC_HVAL_VECTOR, TRC_HVAL_INT.

include/xen/perfc_defn.h

2.0 Domain Builder tools (not complete)

tools/python/xen/lowlevel/xc/xc.c
    Replace pyxc_vmx_build() function name with pyxc_hval_build()
    entry into pyxc_methods().
    Replace call to xc_vmx_build() with xc_hval_build().
    In pyxc_methods[] replace "vmx_build" with "hval_build" calling
    new function "pyxc_hval_build".

tools/python/xen/xend/XendDomainInfo.py
    Replace is_vmx with is_hval.
    Replace check for image_name from "vmx" to "hval".
    Setting self.is_vmx: change to self.is_svm.
    Replace check for ostype == "vmx" with "hval".
    Replace check "if self.is_vmx:" with "if self.is_hval:".
    Replace function call to self.create_vmx_model() with
    self.create_hval_model().
    Change name of create_vmx_model(self) to create_hval_model(self).
    Replace error messages containing vmx with hval.
    Within def pgtable_size() replace is_vmx with is_hval.
    Replace vm_image_vmx() with vm_image_hval().
    Parameter within vm.create_domain(): replace "vmx" with "hval".
    Parameter within add_image_handler(): replace "vmx" with "hval",
    and replace vm_image_vmx with vm_image_hval.

tools/python/xen/xend/create.py
    Replace configure_vmx() function name with configure_hval().
    Replace function call to configure_vmx() with configure_hval().

tools/xentrace/formats
    Replace VMX with HVAL for HVAL_VMEXIT, HVAL_VECTOR and HVAL_INT
    definitions for tracing.

tools/examples/Makefile
    Replace xmexample.vmx with xmexample.hval in the XEN_CONFIGS +=
    line.

tools/examples/xmexample.svm
    Change builder=vmx to builder=hval.
    Change name=ExampleVMXDomain to name=ExampleHVALDomain.
    Change xmexample.vmx to xmexample.hval.

tools/libxc/Makefile
    Replace xc_vmx_build.c with xc_hval_build.c.

tools/libxc/xc_ptrace_core.c
    Replace VGCF_VMX_GUEST with VGCF_HVAL_GUEST in 2 locations.

tools/libxc/xc.h
    Add xc_hval_build() prototype to file.

tools/libxc/xc_ptrace.c
    Replace VGCF_VMX_GUEST with VGCF_HVAL_GUEST in 2 locations.

tools/libxc/xc_vmx_build.c (change name to xc_hval_build.c)
    Retain vmx_identify() and add svm_identify() functions to allow
    detection of vmx and svm platforms.
    Replace xc_vmx_build() function name with xc_hval_build().
    Within xc_hval_build() function replace VGCF_VMX_GUEST with
    VGCF_HVAL_GUEST.

tools/ioemu/iodev (i8254.c, exec-all.h, monitor.c, vl.c)
    Leave as VMX at this time. Allow TARGET_VMX to enable the device
    emulation code.
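
As a closing illustration of the vmx_identify()/svm_identify() pair
listed for xc_hval_build.c, the sketch below shows one way platform
detection could be structured. It assumes detection is based on the
CPUID vendor string plus the architectural feature bits (VMX: leaf 1,
ECX bit 5; SVM: leaf 0x80000001, ECX bit 2). The cpuid_snapshot
structure, the hval_hw enum, hval_identify() and the function
signatures are hypothetical, and the functions take pre-read register
values so that the logic can be exercised without executing CPUID.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CPUID1_ECX_VMX      (1u << 5)  /* CPUID leaf 1, ECX bit 5 */
#define CPUID_EXT1_ECX_SVM  (1u << 2)  /* CPUID leaf 0x80000001, ECX bit 2 */

enum hval_hw { HVAL_HW_NONE, HVAL_HW_VMX, HVAL_HW_SVM };  /* hypothetical */

/* Values the caller has already read with CPUID (hypothetical container).
 * ext1_ecx must be 0 if leaf 0x80000001 is not supported. */
struct cpuid_snapshot {
    char     vendor[13];   /* NUL-terminated vendor string from leaf 0 */
    uint32_t leaf1_ecx;    /* ECX from leaf 1 */
    uint32_t ext1_ecx;     /* ECX from leaf 0x80000001 */
};

/* Returns 1 if the snapshot describes an Intel CPU advertising VMX. */
int vmx_identify(const struct cpuid_snapshot *c)
{
    return strcmp(c->vendor, "GenuineIntel") == 0 &&
           (c->leaf1_ecx & CPUID1_ECX_VMX) != 0;
}

/* Returns 1 if the snapshot describes an AMD CPU advertising SVM. */
int svm_identify(const struct cpuid_snapshot *c)
{
    return strcmp(c->vendor, "AuthenticAMD") == 0 &&
           (c->ext1_ecx & CPUID_EXT1_ECX_SVM) != 0;
}

/* Single entry point a generic builder could call. */
enum hval_hw hval_identify(const struct cpuid_snapshot *c)
{
    if (vmx_identify(c))
        return HVAL_HW_VMX;
    if (svm_identify(c))
        return HVAL_HW_SVM;
    return HVAL_HW_NONE;
}
```

A feature bit only reports that the processor implements the
extension; firmware can still leave it disabled, so a real builder
would likely need further checks before starting an hval domain.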