Thank you very much for finally providing the valuable information I was
asking for.
> >> Freely (at any time) preloading/post-storing VMCS fields is very hard
> >> because VMX only provides access to the current VMCS (since it is a
> >> CPU register), while SVM can access all VMCBs given that they live in
> >> memory. I can't say it is impossible (one could solve it with further
> >> limitations/complexity), but enforcing those conversions simply adds
> >> tricky constraints to the VMX code, limiting when pre-load/post-load
> >> may happen, plus the additional cost of VMCS accesses.
> >
> > When the VCPU switches between host and guest mode, you need to save
> > the L1 guest state, restore the L2 guest state, and vice versa.
>
> No. Switching from the L1 to the L2 guest is a simple restore of the L2
> guest state (VMCLEAR/VMPTRLD). VMX doesn't save the L1 state,
So the prepare4vmentry function hook is pretty cheap and the 'hostsave'
function hook is empty.
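For what it's worth, the hook split we are discussing could look roughly
like this; the hook names (prepare4vmentry, hostsave, hostrestore,
prepare4vmexit) are the ones from this thread, while the struct layout and
the VMX stubs are only an illustrative sketch, not the actual patch code:

```c
#include <assert.h>
#include <stddef.h>

struct vcpu;  /* opaque, stands in for Xen's struct vcpu */

/* Hypothetical per-vendor hook table for nested HVM. */
struct nestedhvm_ops {
    int  (*prepare4vmentry)(struct vcpu *v); /* load L2 state (cheap on VMX) */
    void (*hostsave)(struct vcpu *v);        /* save L1 state */
    void (*hostrestore)(struct vcpu *v);     /* restore L1 state */
    int  (*prepare4vmexit)(struct vcpu *v);  /* sync exit info to L1's view */
};

static int  vmx_prepare4vmentry(struct vcpu *v) { (void)v; return 0; }
static void vmx_hostrestore(struct vcpu *v)     { (void)v; }
static int  vmx_prepare4vmexit(struct vcpu *v)  { (void)v; return 0; }

/* On VMX the hardware keeps the current VMCS, so hostsave can stay empty. */
static const struct nestedhvm_ops vmx_nested_ops = {
    .prepare4vmentry = vmx_prepare4vmentry,
    .hostsave        = NULL,            /* empty hook: nothing to save */
    .hostrestore     = vmx_hostrestore,
    .prepare4vmexit  = vmx_prepare4vmexit,
};
```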
> while SVM does require saving the L1 state, per my understanding.
Architecturally, SVM does not require it either. That is just the
implementation I took over from KVM. It will be replaced with an
implementation that only requires a VMSAVE to save the L1 state. But that
won't change the patch series fundamentally, as this change touches SVM
code only.
> Intel processors can hold the VMCS on-chip for performance.
>
> Switching from L2 to L1, we need to convert (or store) some VMCS
> information from the physical to the virtual VMCS.
That is what can be put into the prepare4vmexit function hook.
> But it is limited and only covers the "guest state" and exit information.
> Loading the L1 state may be as simple as a VMPTRLD (of course it may
> modify some VMCS fields depending on the situation).
That is what can be put into the 'hostrestore' function hook.
> > This requires a lot of accesses from/to the vmcb/vmcs unless you have
> > a lazy switching technique, do you?
>
> Yes, the Intel hardware already does lazy switching through VMPTRLD.
> And how much of the VMCS is accessed depends on the L1 guest's
> modifications; only dirty VMCS fields need to be updated.
Sounds like you need a 'shadow vmcs' that holds the L2 guest state.
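A minimal sketch of such a shadow structure with dirty-field tracking might
look like this (the field count, bitmap width, and all names are
assumptions for illustration, not Xen's actual layout):

```c
#include <assert.h>
#include <stdint.h>

#define NFIELDS 64  /* illustrative number of tracked fields */

/* Hypothetical shadow copy of the L2 guest's VMCS fields. */
struct shadow_vmcs {
    uint64_t field[NFIELDS];
    uint64_t dirty;               /* one bit per field, set on L1 write */
};

static void shadow_write(struct shadow_vmcs *s, unsigned idx, uint64_t val)
{
    s->field[idx] = val;
    s->dirty |= 1ULL << idx;      /* remember for the next lazy flush */
}

/* Flush only the fields L1 touched into the hardware copy. Returns the
 * number of fields written, which is what makes the switch cheap. */
static unsigned shadow_flush(struct shadow_vmcs *s, uint64_t *hw)
{
    unsigned n = 0;
    for (unsigned i = 0; i < NFIELDS; i++) {
        if (s->dirty & (1ULL << i)) {
            hw[i] = s->field[i];
            n++;
        }
    }
    s->dirty = 0;
    return n;
}
```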
> In the majority of cases, a VM exit from the L2 guest will be handled by
> the root VMM directly.
Same on SVM. The root VMM handles everything the L1 guest does not
intercept, plus some special intercepts such as interrupts, NMIs, and page
faults/nested page faults.
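As a rough illustration of that dispatch rule (all intercept bit names
here are made up for the sketch, not the real VMCB/VMCS encodings):

```c
#include <assert.h>
#include <stdint.h>

enum exit_owner { HANDLE_IN_ROOT, REFLECT_TO_L1 };

/* Illustrative intercept bits, not the real hardware encoding. */
#define ICPT_INTR  (1u << 0)   /* external interrupts */
#define ICPT_NMI   (1u << 1)
#define ICPT_NPF   (1u << 2)   /* (nested) page faults */
#define ICPT_CPUID (1u << 3)

/* An L2 exit is reflected to L1 only if L1 intercepts it; everything
 * else, plus the special intercepts, stays in the root VMM. */
static enum exit_owner dispatch_exit(uint32_t l1_intercepts,
                                     uint32_t exit_bit)
{
    /* Special intercepts are always handled by the root VMM first. */
    if (exit_bit & (ICPT_INTR | ICPT_NMI | ICPT_NPF))
        return HANDLE_IN_ROOT;

    /* Otherwise, reflect to L1 only if L1 asked to intercept it. */
    return (l1_intercepts & exit_bit) ? REFLECT_TO_L1 : HANDLE_IN_ROOT;
}
```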
> One example is an external interrupt, which doesn't need to access any
> VMCS fields other than the exit reason, but the wrapper makes the access
> mandatory, which I can't agree with.
Which conversion function do you mean by 'wrapper' here? Why does it
require additional VMCS field accesses?
Can you explain this in detail, please?
> >> Another portion of the so-called common code is actually SVM-only.
> >> Here is part of it:
> >>
> >> +
> >> +static enum nestedhvm_vmexits
> >> +nestedhvm_vmexit_msr(unsigned long *msr_bitmap, uint32_t msr, bool_t write)
> >> +{
> >> +    bool_t enabled;
> >> +    unsigned long *msr_bit = NULL;
> >> +
> >> +    /*
> >> +     * See AMD64 Programmers Manual, Vol 2, Section 15.10
> >> +     * (MSR-Bitmap Address).
> >> +     */
> >> +    if ( msr <= 0x1fff )
> >> +        msr_bit = msr_bitmap + 0x0000 / BYTES_PER_LONG;
> >> +    else if ( (msr >= 0xc0000000) && (msr <= 0xc0001fff) )
> >> +        msr_bit = msr_bitmap + 0x0800 / BYTES_PER_LONG;
> >> +    else if ( (msr >= 0xc0010000) && (msr <= 0xc0011fff) )
> >> +        msr_bit = msr_bitmap + 0x1000 / BYTES_PER_LONG;
> >
> > Why does above code snippet not work on Intel CPUs ?
>
> It is said that even "Intel processors" follow the AMD64 manual, isn't it?
> The msr_bitmap on Intel doesn't have a fixed bitmap position; rather, it
> scans the entire table to decide which MSRs to automatically save/restore
> for performance, given that we only put a single-digit number of MSRs
> there for that purpose.
> BTW, Intel does implement MSRs larger than 0x1fff, such as 0x107D0.
> Presuming their usage model for now is risky.
Ok, I will think about this.
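As an aside, the quoted snippet stops before the actual bit test. If I
read the AMD64 APM correctly, the MSR permission map holds two bits per
MSR (the even bit for read intercepts, the odd bit for writes), so the
lookup could plausibly be completed along these lines (a hedged sketch,
not the patch's actual code):

```c
#include <assert.h>
#include <stdint.h>
#include <limits.h>

#define BYTES_PER_LONG (sizeof(unsigned long))
#define BITS_PER_LONG  (BYTES_PER_LONG * CHAR_BIT)

typedef int bool_t;

static int test_bit(unsigned int nr, const unsigned long *addr)
{
    return (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1;
}

/* Sketch: the region offsets match the quoted snippet; the final
 * two-bits-per-MSR test is the part it elided. */
static bool_t msrpm_intercepted(const unsigned long *msr_bitmap,
                                uint32_t msr, bool_t write)
{
    const unsigned long *msr_bit;

    if ( msr <= 0x1fff )
        msr_bit = msr_bitmap + 0x0000 / BYTES_PER_LONG;
    else if ( (msr >= 0xc0000000) && (msr <= 0xc0001fff) )
        msr_bit = msr_bitmap + 0x0800 / BYTES_PER_LONG;
    else if ( (msr >= 0xc0010000) && (msr <= 0xc0011fff) )
        msr_bit = msr_bitmap + 0x1000 / BYTES_PER_LONG;
    else
        return 1; /* MSRs outside the map are always intercepted */

    /* Even bit = read intercept, odd bit = write intercept. */
    return test_bit((msr & 0x1fff) * 2 + (write ? 1 : 0), msr_bit);
}
```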
> >> +/* Virtual GIF */
> >> +int
> >> +nestedsvm_vcpu_clgi(struct vcpu *v)
> >> +{
> >> +    if (!nestedhvm_enabled(v->domain)) {
> >> +        hvm_inject_exception(TRAP_invalid_op, 0, 0);
> >> +        return -1;
> >> +    }
> >> +
> >> +    if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> >> +        return 0;
> >> +
> >> +    /* clear gif flag */
> >> +    vcpu_nestedhvm(v).nh_gif = 0;
> >> +    local_event_delivery_disable(); /* mask events for PV drivers */
> >> +    return 0;
> >> +}
> >> +
> >> +int
> >> +nestedsvm_vcpu_stgi(struct vcpu *v)
> >> +{
> >> +    if (!nestedhvm_enabled(v->domain)) {
> >> +        hvm_inject_exception(TRAP_invalid_op, 0, 0);
> >> +        return -1;
> >> +    }
> >> +
> >> +    /* Always set the GIF to make hvm_interrupt_blocked work. */
> >> +    vcpu_nestedhvm(v).nh_gif = 1;
> >> +
> >> +    if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> >> +        return 0;
> >> +
> >> +    local_event_delivery_enable(); /* unmask events for PV drivers */
> >> +    return 0;
> >> +}
> >> +
> >
> > The reason to leave this in the generic code is what Keir stated as
> > as feedback to the second patch series:
> > http://lists.xensource.com/archives/html/xen-devel/2010-06/msg00280.html
> > (What was patch #10 there is patch #8 in latest patch series).
>
> Well, I didn't read it that way. Even if the function itself is wrapped,
> it should go into the SVM tree if it doesn't go together with its caller.
>
> Anyway, given that nested SVM & VMX are still at the very beginning of
> development, I can only say yes to those wrappers that are clearly
> understood on both sides.
Good to know. See my offer below.
> I would rather leave the uncertain wrappers for the future, once the
> basic shape of nested virtualization is good and stable enough, i.e.
> keep the wrappers lightweight. We have plenty of performance work ahead,
> such as virtual VT-d support, enhanced PV drivers for nesting, etc.
> Excessive wrappers are simply a burden to nested VMX developers working
> on those future features.
>
> Qing will post his patch today or tomorrow for your reference if you want.
Thanks. May I take code from there and add it to my patch series?
> C++ abstracts better than C, but C++ never replaces C in places like Linux.
That's not necessary. Once you truly understand the concepts of interface
and implementation, you can apply them in any language you know.
> BTW, can you add an SR-IOV test to your patch test system to avoid
> regressions?
Yes.
> Thx, Eddie
--
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel