[Xen-devel] Re: [PATCH 00/13] Nested Virtualization: Overview

Thank you very much for finally giving me the valuable information I was
asking for.

> >> Freely (at any time) preloading/post-storing VMCS fields is very hard
> >> because VMX only provides access to the current VMCS (it is a CPU
> >> register), while SVM can access all VMCBs since they reside in memory.
> >> I can't say it is impossible (one could solve it with further
> >> limitations/complexity), but enforcing those conversions simply adds
> >> trickiness to the VMX code to limit when pre-load/post-store may
> >> happen, plus the additional cost of VMCS accesses.
> >
> > When the VCPU switches between host and guest mode, you need to save
> > the L1 guest state and restore the L2 guest state, and vice versa.
>
> No. Switching from the L1 to the L2 guest is a simple restore of the L2
> guest state (VMCLEAR/VMPTRLD). VMX doesn't save the L1 state,

So the prepare4vmentry function hook is pretty cheap and the 'hostsave'
function hook is empty.

> while SVM does require saving the L1 state, per my understanding.

Architecturally, SVM does not. That is the implementation I took over from
KVM. It will be replaced with an implementation that only requires a VMSAVE
to save the L1 state. But that won't change the patch series fundamentally,
as this change touches SVM code only.
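
For illustration only, a hostsave hook along those lines could boil down to a
single VMSAVE on the VMCB that holds the L1 state. This is a sketch, not the
actual patch code; the svm_vmsave_pa() helper and the nh_hostsave_pa field are
assumptions on my side:

/* Hypothetical sketch: save the L1 (host) state with one VMSAVE.
 * VMSAVE (opcode 0f 01 db) stores the segment and MSR state that the CPU
 * does not track in the VMCB automatically; it takes the physical address
 * of a VMCB in rAX. */
static inline void svm_vmsave_pa(uint64_t vmcb_pa)
{
    asm volatile ( ".byte 0x0f,0x01,0xdb" /* VMSAVE */
                   : : "a" (vmcb_pa) : "memory" );
}

static int nsvm_vcpu_hostsave(struct vcpu *v)
{
    /* nh_hostsave_pa is an assumed per-vcpu field holding the physical
     * address of the VMCB that keeps the L1 state. */
    svm_vmsave_pa(vcpu_nestedhvm(v).nh_hostsave_pa);
    return 0;
}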

> Intel processors can hold the VMCS in the processor for performance.
>
> Switching from L2 to L1, we need to convert (or store) some VMCS
> information from the physical to the virtual VMCS.

That is what can be put into the prepare4vmexit function hook.

> But that is limited and only covers the "guest state" and exit information.
> Loading the L1 state may be as simple as a VMPTRLD (of course it may modify
> some VMCS fields depending on the situation).

That is what can be put into the 'hostrestore' function hook.
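
To make sure we mean the same split, here is a rough sketch of the per-vendor
hook table as I picture it. The four hook names come from this discussion;
the struct name and the vmx_nest_* implementations are made up:

/* Hypothetical per-vendor nested HVM hook table (sketch only). */
struct nestedhvm_function_table {
    int (*prepare4vmentry)(struct vcpu *v); /* before entering the L2 guest */
    int (*prepare4vmexit)(struct vcpu *v);  /* convert guest state/exit info
                                             * into the virtual VMCS/VMCB */
    int (*hostsave)(struct vcpu *v);        /* save L1 state; may be empty on VMX */
    int (*hostrestore)(struct vcpu *v);     /* restore L1 state */
};

/* On VMX the hardware keeps the L1 VMCS, so hostsave can be a no-op ... */
static int vmx_nest_hostsave(struct vcpu *v) { return 0; }

/* ... and hostrestore is essentially "make the L1 VMCS current again".
 * vmx_load_l1_vmcs() stands in for whatever VMPTRLD wrapper is used. */
static int vmx_nest_hostrestore(struct vcpu *v)
{
    vmx_load_l1_vmcs(v);
    return 0;
}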

> > This requires a lot of accesses from/to the VMCB/VMCS unless you have
> > a lazy switching technique. Do you?
>
> Yes, Intel hardware already does lazy switching through VMPTRLD.
> And the VMCS accesses depend on what the L1 guest modified; only dirty
> VMCS fields need to be updated.

Sounds like you need a 'shadow VMCS' that holds the L2 guest state.
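
Roughly what I have in mind is a per-vcpu software copy of the L2 guest's VMCS
plus a dirty bitmap, so that only touched fields are flushed to hardware on
the next entry. The following is only a sketch under that assumption; the
nvmx_* names and field_encoding() are made up:

/* Hypothetical "shadow VMCS": the L1 guest edits this via emulated
 * VMREAD/VMWRITE, and only dirty fields hit the hardware VMCS later. */
#define NVMX_SHADOW_FIELDS 256   /* illustrative size */

struct nvmx_shadow_vmcs {
    uint64_t      field[NVMX_SHADOW_FIELDS];
    unsigned long dirty[BITS_TO_LONGS(NVMX_SHADOW_FIELDS)];
};

static void nvmx_flush_dirty_fields(struct nvmx_shadow_vmcs *s)
{
    unsigned int i;

    /* Write back only the fields the L1 guest touched since the last
     * L1->L2 entry; field_encoding() maps the index to the VMCS field
     * encoding and is assumed to exist. */
    for ( i = 0; i < NVMX_SHADOW_FIELDS; i++ )
        if ( test_and_clear_bit(i, s->dirty) )
            __vmwrite(field_encoding(i), s->field[i]);
}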

> In the majority of cases, the VM exit from the L2 guest will be handled by
> the root VMM directly.

Same on SVM. The root VMM handles everything the L1 guest does not intercept,
plus some special intercepts such as interrupts, NMIs and page faults/nested
page faults.

> One example is an external interrupt, which doesn't need to access any
> VMCS fields other than the exit reason, but the wrapper makes the access
> mandatory, which I can't agree with.

Which conversion function do you mean by 'wrapper' here? Why does it require
additional VMCS field accesses?

Can you explain this in detail, please?
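
To make sure we are talking about the same path, my understanding of the
external-interrupt exit is roughly the sketch below: read the exit reason,
and if the root VMM handles the exit, do no further VMCS conversion at all.
All names here are placeholders, not code from either patch series:

/* Sketch: fast path for an L2 exit that the root VMM handles itself. */
static enum nestedhvm_vmexits
nvmx_handle_l2_vmexit(struct vcpu *v)
{
    unsigned long reason = __vmread(VM_EXIT_REASON); /* the only mandatory read */

    if ( (uint16_t)reason == EXIT_REASON_EXTERNAL_INTERRUPT )
        return NESTEDHVM_VMEXIT_HOST;  /* root VMM handles it, no conversion */

    /* Only exits reflected to the L1 guest need prepare4vmexit, i.e. the
     * copy of guest-state and exit-information fields into the virtual VMCS. */
    return nvmx_reflect_to_l1(v);      /* placeholder */
}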


> >> Another portion of the so-called common code is actually SVM-only code.
> >> Here is part of it:
> >>
> >>
> >>
> >> +
> >> +static enum nestedhvm_vmexits
> >> +nestedhvm_vmexit_msr(unsigned long *msr_bitmap, uint32_t msr, bool_t write)
> >> +{
> >> +    bool_t enabled;
> >> +    unsigned long *msr_bit = NULL;
> >> +
> >> +    /*
> >> +     * See AMD64 Programmers Manual, Vol 2, Section 15.10
> >> +     * (MSR-Bitmap Address).
> >> +     */
> >> +    if ( msr <= 0x1fff )
> >> +        msr_bit = msr_bitmap + 0x0000 / BYTES_PER_LONG;
> >> +    else if ( (msr >= 0xc0000000) && (msr <= 0xc0001fff) )
> >> +        msr_bit = msr_bitmap + 0x0800 / BYTES_PER_LONG;
> >> +    else if ( (msr >= 0xc0010000) && (msr <= 0xc0011fff) )
> >> +        msr_bit = msr_bitmap + 0x1000 / BYTES_PER_LONG;
> >
> > Why does the above code snippet not work on Intel CPUs?
>
> So even an "Intel processor" is supposed to follow the AMD64 manual, is it?
> The msr_bitmap on Intel doesn't have these fixed bitmap positions; rather,
> we scan the entire table to decide which MSRs to automatically save/restore
> for performance, given that we only put a handful of MSRs there for that
> purpose.
> BTW, Intel does implement MSRs larger than 0x1fff, such as 0x107D0.
> Presuming their usage model for now is risky.

Ok, I will think about this.
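
One direction I could take is to keep only the decision ("does the L1 guest
intercept this MSR access?") in generic code and push the bitmap lookup into
a per-vendor hook. The sketch below only illustrates that idea; the hook,
helper and enum value names are invented:

/* Hypothetical split: generic code asks, vendor code indexes the bitmap. */
struct nestedhvm_msr_ops {
    /* Return 1 if the L1 guest intercepts this MSR access, 0 otherwise. */
    bool_t (*msr_intercepted)(unsigned long *msr_bitmap, uint32_t msr,
                              bool_t write);
};

static enum nestedhvm_vmexits
nestedhvm_vmexit_msr(struct vcpu *v, uint32_t msr, bool_t write)
{
    const struct nestedhvm_msr_ops *ops = nestedhvm_msr_ops(v); /* invented */

    if ( ops->msr_intercepted(nestedhvm_msr_bitmap(v), msr, write) )
        return NESTEDHVM_VMEXIT_INJECT; /* reflect the exit to the L1 guest */
    return NESTEDHVM_VMEXIT_HOST;       /* root VMM handles it */
}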

> >> +/* Virtual GIF */
> >> +int
> >> +nestedsvm_vcpu_clgi(struct vcpu *v)
> >> +{
> >> +  if (!nestedhvm_enabled(v->domain)) {
> >> +          hvm_inject_exception(TRAP_invalid_op, 0, 0);
> >> +          return -1;
> >> +  }
> >> +
> >> +  if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> >> +          return 0;
> >> +
> >> +  /* clear gif flag */
> >> +  vcpu_nestedhvm(v).nh_gif = 0;
> >> +  local_event_delivery_disable(); /* mask events for PV drivers */
> >> +  return 0;
> >> +}
> >> +
> >> +int
> >> +nestedsvm_vcpu_stgi(struct vcpu *v)
> >> +{
> >> +  if (!nestedhvm_enabled(v->domain)) {
> >> +          hvm_inject_exception(TRAP_invalid_op, 0, 0);
> >> +          return -1;
> >> +  }
> >> +
> >> +  /* Always set the GIF to make hvm_interrupt_blocked work. */
> >> +  vcpu_nestedhvm(v).nh_gif = 1;
> >> +
> >> +  if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> >> +          return 0;
> >> +
> >> +  local_event_delivery_enable(); /* unmask events for PV drivers */
> >> +  return 0;
> >> +}
> >> +
> >
> > The reason to leave this in the generic code is the feedback Keir gave
> > on the second patch series:
> > http://lists.xensource.com/archives/html/xen-devel/2010-06/msg00280.html
> > (What was patch #10 there is patch #8 in the latest patch series.)
>
> Well, I didn't read it that way. Even if the function itself is wrapped, it
> should go into the SVM tree if it doesn't go together with the caller.
>
> Anyway, given that nested SVM & VMX development is still at the very
> beginning, I can only say yes to those wrappers that are clearly understood
> on both sides.

Good to know. See my offer below.

> I would rather leave those uncertain wrappers for the future, once the basic
> shape of nested virtualization is good and stable enough, i.e. keep the
> wrappers lightweight. We have plenty of performance work ahead, such as
> virtual VT-d support, enhanced PV drivers for nested guests, etc. Excessive
> wrappers are simply a burden on nested VMX developers working on those
> future features.
>
> Qing will post his patch today or tomorrow for your reference if you want.

Thanks. May I take code from there and add it to my patch series?

> C++ abstracts better than C, but C++ never replaces C in places like Linux.

It's not necessary. When you truly understand the concept of separating
interface from implementation, you can do that in any language you know.

> BTW, can you add an SR-IOV test to your patch test system to avoid
> regressions?

Yes.

> Thx, Eddie



-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel