[Xen-devel] Re: [PATCH 00/13] Nested Virtualization: Overview

Thank you very much for finally giving me the valuable information I was
asking for.

> >> Freely (at any time) preloading/post-storing VMCS fields is very hard
> >> because VMX only provides access to the current VMCS (it is a CPU
> >> register), while SVM can access all VMCBs since they reside in memory.
> >> I can't say it is impossible (one could solve it with further
> >> limitations/complexity), but enforcing those conversions simply adds
> >> trickiness to the VMX code to limit when pre-load/post-store may
> >> happen, plus the additional cost of VMCS accesses.
> >
> > When the VCPU switches between host and guest mode, you need to save
> > the L1 guest state and restore the L2 guest state, and vice versa.
>
> No. Switching from the L1 to the L2 guest is a simple restore of the L2
> guest state (VMCLEAR/VMPTRLD). VMX doesn't save the L1 state,

So the prepare4vmentry function hook is pretty cheap and the 'hostsave'
function hook is empty.

> while SVM does require saving the L1 state, per my understanding.

Architecturally, SVM does not. That is the implementation I took over from
KVM. It will be replaced with an implementation that only requires a VMSAVE
to save the L1 state. But that won't change the patch series fundamentally,
as this change touches SVM code only.
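
For illustration only, a hostsave hook along those lines could boil down to a
single VMSAVE on the VMCB that holds the L1 state. This is a sketch, not the
actual patch code; the svm_vmsave_pa() helper and the nh_hostsave_pa field are
assumptions on my side:

/* Hypothetical sketch: save the L1 (host) state with one VMSAVE.
 * VMSAVE (opcode 0f 01 db) stores the segment and MSR state that the CPU
 * does not track in the VMCB automatically; it takes the physical address
 * of a VMCB in rAX. */
static inline void svm_vmsave_pa(uint64_t vmcb_pa)
{
    asm volatile ( ".byte 0x0f,0x01,0xdb" /* VMSAVE */
                   : : "a" (vmcb_pa) : "memory" );
}

static int nsvm_vcpu_hostsave(struct vcpu *v)
{
    /* nh_hostsave_pa is an assumed per-vcpu field holding the physical
     * address of the VMCB that keeps the L1 state. */
    svm_vmsave_pa(vcpu_nestedhvm(v).nh_hostsave_pa);
    return 0;
}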

> Intel processors can hold the VMCS in the processor for performance.
>
> Switching from L2 to L1, we need to convert (or store) some VMCS
> information from the physical to the virtual VMCS.

That is what can be put into the prepare4vmexit function hook.

> But that is limited and only covers the "guest state" and exit information.
> Loading the L1 state may be as simple as a VMPTRLD (of course it may modify
> some VMCS fields depending on the situation).

That is what can be put into the 'hostrestore' function hook.
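
To make sure we mean the same split, here is a rough sketch of the per-vendor
hook table as I picture it. The four hook names come from this discussion;
the struct name and the vmx_nest_* implementations are made up:

/* Hypothetical per-vendor nested HVM hook table (sketch only). */
struct nestedhvm_function_table {
    int (*prepare4vmentry)(struct vcpu *v); /* before entering the L2 guest */
    int (*prepare4vmexit)(struct vcpu *v);  /* convert guest state/exit info
                                             * into the virtual VMCS/VMCB */
    int (*hostsave)(struct vcpu *v);        /* save L1 state; may be empty on VMX */
    int (*hostrestore)(struct vcpu *v);     /* restore L1 state */
};

/* On VMX the hardware keeps the L1 VMCS, so hostsave can be a no-op ... */
static int vmx_nest_hostsave(struct vcpu *v) { return 0; }

/* ... and hostrestore is essentially "make the L1 VMCS current again".
 * vmx_load_l1_vmcs() stands in for whatever VMPTRLD wrapper is used. */
static int vmx_nest_hostrestore(struct vcpu *v)
{
    vmx_load_l1_vmcs(v);
    return 0;
}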

> > This requires a lot of accesses from/to the VMCB/VMCS unless you have
> > a lazy switching technique. Do you?
>
> Yes, Intel hardware already does lazy switching through VMPTRLD.
> And the VMCS accesses depend on what the L1 guest modified; only dirty
> VMCS fields need to be updated.

Sounds like you need a 'shadow VMCS' that holds the L2 guest state.
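
Roughly what I have in mind is a per-vcpu software copy of the L2 guest's VMCS
plus a dirty bitmap, so that only touched fields are flushed to hardware on
the next entry. The following is only a sketch under that assumption; the
nvmx_* names and field_encoding() are made up:

/* Hypothetical "shadow VMCS": the L1 guest edits this via emulated
 * VMREAD/VMWRITE, and only dirty fields hit the hardware VMCS later. */
#define NVMX_SHADOW_FIELDS 256   /* illustrative size */

struct nvmx_shadow_vmcs {
    uint64_t      field[NVMX_SHADOW_FIELDS];
    unsigned long dirty[BITS_TO_LONGS(NVMX_SHADOW_FIELDS)];
};

static void nvmx_flush_dirty_fields(struct nvmx_shadow_vmcs *s)
{
    unsigned int i;

    /* Write back only the fields the L1 guest touched since the last
     * L1->L2 entry; field_encoding() maps the index to the VMCS field
     * encoding and is assumed to exist. */
    for ( i = 0; i < NVMX_SHADOW_FIELDS; i++ )
        if ( test_and_clear_bit(i, s->dirty) )
            __vmwrite(field_encoding(i), s->field[i]);
}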

> In the majority of cases, the VM exit from the L2 guest will be handled by
> the root VMM directly.

Same on SVM. The root VMM handles everything the L1 guest does not intercept,
plus some special intercepts such as interrupts, NMIs and page faults/nested
page faults.

> One example is an external interrupt, which doesn't need to access any
> VMCS fields other than the exit reason, but the wrapper makes the access
> mandatory, which I can't agree with.

Which conversion function do you mean by 'wrapper' here? Why does it require
additional VMCS field accesses?

Can you explain this in detail, please?
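
To make sure we are talking about the same path, my understanding of the
external-interrupt exit is roughly the sketch below: read the exit reason,
and if the root VMM handles the exit, do no further VMCS conversion at all.
All names here are placeholders, not code from either patch series:

/* Sketch: fast path for an L2 exit that the root VMM handles itself. */
static enum nestedhvm_vmexits
nvmx_handle_l2_vmexit(struct vcpu *v)
{
    unsigned long reason = __vmread(VM_EXIT_REASON); /* the only mandatory read */

    if ( (uint16_t)reason == EXIT_REASON_EXTERNAL_INTERRUPT )
        return NESTEDHVM_VMEXIT_HOST;  /* root VMM handles it, no conversion */

    /* Only exits reflected to the L1 guest need prepare4vmexit, i.e. the
     * copy of guest-state and exit-information fields into the virtual VMCS. */
    return nvmx_reflect_to_l1(v);      /* placeholder */
}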


> >> Another portion of the so-called common code is actually SVM-only code.
> >> Here is part of it:
> >>
> >>
> >>
> >> +
> >> +static enum nestedhvm_vmexits
> >> +nestedhvm_vmexit_msr(unsigned long *msr_bitmap, uint32_t msr, bool_t write)
> >> +{
> >> +    bool_t enabled;
> >> +    unsigned long *msr_bit = NULL;
> >> +
> >> +    /*
> >> +     * See AMD64 Programmers Manual, Vol 2, Section 15.10
> >> +     * (MSR-Bitmap Address).
> >> +     */
> >> +    if ( msr <= 0x1fff )
> >> +        msr_bit = msr_bitmap + 0x0000 / BYTES_PER_LONG;
> >> +    else if ( (msr >= 0xc0000000) && (msr <= 0xc0001fff) )
> >> +        msr_bit = msr_bitmap + 0x0800 / BYTES_PER_LONG;
> >> +    else if ( (msr >= 0xc0010000) && (msr <= 0xc0011fff) )
> >> +        msr_bit = msr_bitmap + 0x1000 / BYTES_PER_LONG;
> >
> > Why does the above code snippet not work on Intel CPUs?
>
> So even an "Intel processor" is supposed to follow the AMD64 manual, is it?
> The msr_bitmap on Intel doesn't have these fixed bitmap positions; rather,
> we scan the entire table to decide which MSRs to automatically save/restore
> for performance, given that we only put a handful of MSRs there for that
> purpose.
> BTW, Intel does implement MSRs larger than 0x1fff, such as 0x107D0.
> Presuming their usage model for now is risky.

Ok, I will think about this.
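
One direction I could take is to keep only the decision ("does the L1 guest
intercept this MSR access?") in generic code and push the bitmap lookup into
a per-vendor hook. The sketch below only illustrates that idea; the hook,
helper and enum value names are invented:

/* Hypothetical split: generic code asks, vendor code indexes the bitmap. */
struct nestedhvm_msr_ops {
    /* Return 1 if the L1 guest intercepts this MSR access, 0 otherwise. */
    bool_t (*msr_intercepted)(unsigned long *msr_bitmap, uint32_t msr,
                              bool_t write);
};

static enum nestedhvm_vmexits
nestedhvm_vmexit_msr(struct vcpu *v, uint32_t msr, bool_t write)
{
    const struct nestedhvm_msr_ops *ops = nestedhvm_msr_ops(v); /* invented */

    if ( ops->msr_intercepted(nestedhvm_msr_bitmap(v), msr, write) )
        return NESTEDHVM_VMEXIT_INJECT; /* reflect the exit to the L1 guest */
    return NESTEDHVM_VMEXIT_HOST;       /* root VMM handles it */
}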

> >> +/* Virtual GIF */
> >> +int
> >> +nestedsvm_vcpu_clgi(struct vcpu *v)
> >> +{
> >> +  if (!nestedhvm_enabled(v->domain)) {
> >> +          hvm_inject_exception(TRAP_invalid_op, 0, 0);
> >> +          return -1;
> >> +  }
> >> +
> >> +  if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> >> +          return 0;
> >> +
> >> +  /* clear gif flag */
> >> +  vcpu_nestedhvm(v).nh_gif = 0;
> >> +  local_event_delivery_disable(); /* mask events for PV drivers */
> >> +  return 0;
> >> +}
> >> +
> >> +int
> >> +nestedsvm_vcpu_stgi(struct vcpu *v)
> >> +{
> >> +  if (!nestedhvm_enabled(v->domain)) {
> >> +          hvm_inject_exception(TRAP_invalid_op, 0, 0);
> >> +          return -1;
> >> +  }
> >> +
> >> +  /* Always set the GIF to make hvm_interrupt_blocked work. */
> >> +  vcpu_nestedhvm(v).nh_gif = 1;
> >> +
> >> +  if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> >> +          return 0;
> >> +
> >> +  local_event_delivery_enable(); /* unmask events for PV drivers */
> >> +  return 0;
> >> +}
> >> +
> >
> > The reason to leave this in the generic code is the feedback Keir gave
> > on the second patch series:
> > http://lists.xensource.com/archives/html/xen-devel/2010-06/msg00280.html
> > (What was patch #10 there is patch #8 in the latest patch series.)
>
> Well, I didn't read it that way. Even if the function itself is wrapped, it
> should go into the SVM tree if it doesn't go together with the caller.
>
> Anyway, given that nested SVM & VMX development is still at the very
> beginning, I can only say yes to those wrappers that are clearly understood
> on both sides.

Good to know. See my offer below.

> I would rather leave those uncertain wrappers for the future, once the basic
> shape of nested virtualization is good and stable enough, i.e. keep the
> wrappers lightweight. We have plenty of performance work ahead, such as
> virtual VT-d support, enhanced PV drivers for nested guests, etc. Excessive
> wrappers are simply a burden on nested VMX developers working on those
> future features.
>
> Qing will post his patch today or tomorrow for your reference if you want.

Thanks. May I take code from there and add it to my patch series?

> C++ abstracts better than C, but C++ never replaces C in places like Linux.

It's not necessary. When you truly understand the concept of separating
interface from implementation, you can do that in any language you know.

> BTW, can you add an SR-IOV test to your patch test system to avoid
> regressions?

Yes.

> Thx, Eddie



-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel