>>> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> 11.09.08 12:54 >>>
>On 10/9/08 15:35, "Jan Beulich" <jbeulich@xxxxxxxxxx> wrote:
>> The major issue with supporting a significantly larger number of physical
>> CPUs appears to be the use of per-CPU GDT entries: at present, x86-64
>> could support only up to 126 CPUs (with code changes to also use the
>> top-most GDT page, that would be 254). Rather than going in incremental
>> steps here, converting the GDT itself to be per-CPU makes the limitations
>> in that respect go away entirely.
>Firstly, we don't really need the LDT and TSS GDT slots to always be valid.
>In fact, we always initialise the slot immediately before LTR or LLDT, so the
>per-CPU LDT and TSS initialisation could even share a single slot. Then, with
>the extra reserved page, we'd be good for nearly 512 CPUs.
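
The CPU limits quoted above follow from simple slot arithmetic. The sketch below is a back-of-the-envelope model, not Xen code: it assumes 4 KiB GDT pages made of 8-byte slots, 16-byte (two-slot) TSS and LDT descriptors as in x86-64 long mode, and a hypothetical 8 fixed hypervisor slots at the start of the reserved area, a number picked only because it reproduces the figures in the thread.

```c
/* Back-of-the-envelope model; every constant here is an assumption. */
#define GDT_PAGE_BYTES 4096u  /* one GDT page */
#define GDT_SLOT_BYTES    8u  /* one legacy 8-byte descriptor slot */
#define FIXED_SLOTS       8u  /* hypothetical: shared hypervisor entries */
#define SYS_DESC_SLOTS    2u  /* long-mode TSS/LDT descriptors are 16 bytes */

/* Number of CPUs that fit when each CPU needs 'descs_per_cpu' system
 * descriptors in a reserved area 'pages' GDT pages long (pages >= 1). */
static unsigned max_cpus(unsigned pages, unsigned descs_per_cpu)
{
    unsigned slots = pages * (GDT_PAGE_BYTES / GDT_SLOT_BYTES);
    unsigned per_cpu = descs_per_cpu * SYS_DESC_SLOTS;
    return (slots - FIXED_SLOTS) / per_cpu;
}
```

Under these assumptions, max_cpus(1, 2) gives 126 (separate TSS and LDT, one page), max_cpus(2, 2) gives 254 (also using the top-most page), and max_cpus(2, 1) gives 508, the "nearly 512" for a shared TSS/LDT slot across two pages.
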
No, this would break 32-bit at least: the GDT entry for the selector
loaded into TR must remain a valid, busy TSS descriptor for the whole
lifetime of the system, so it can't be shared with the LDT. But even on
64-bit I would be wary of using the same GDT slot for both the LDT and the TSS.
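
The descriptor type field shows why one slot cannot serve both purposes: an LDT descriptor has system type 0x2, an available TSS 0x9 and a busy TSS 0xB, and LTR both requires an available TSS and marks the GDT entry busy. The sketch below models only that bookkeeping on the low 8 bytes of a descriptor; make_sys_desc() and model_ltr() are hypothetical helpers, not Xen or hardware interfaces, and where the model returns -1 real hardware would raise #GP.

```c
#include <stdint.h>

/* System-descriptor type codes (S = 0), per the Intel/AMD manuals. */
enum {
    SYS_TYPE_LDT       = 0x2,
    SYS_TYPE_TSS_AVAIL = 0x9,  /* available TSS */
    SYS_TYPE_TSS_BUSY  = 0xB,  /* busy TSS: 0x9 with bit 1 set */
};

/* Hypothetical helper: build the low 8 bytes of a present, DPL-0,
 * byte-granular system descriptor. */
static uint64_t make_sys_desc(uint32_t base, uint32_t limit, unsigned type)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);             /* limit 15:0  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;      /* base 23:0   */
    d |= (uint64_t)(type & 0xFu) << 40;           /* type, S = 0 */
    d |= (uint64_t)1 << 47;                       /* present     */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;  /* limit 19:16 */
    d |= (uint64_t)(base >> 24) << 56;            /* base 31:24  */
    return d;
}

static unsigned desc_type(uint64_t d)
{
    return (unsigned)(d >> 40) & 0xFu;
}

/* Hypothetical model of LTR's effect on the GDT slot: only an available
 * TSS may be loaded, and loading it flips the slot to "busy". */
static int model_ltr(uint64_t *slot)
{
    if (desc_type(*slot) != SYS_TYPE_TSS_AVAIL)
        return -1;                    /* real hardware: #GP */
    *slot |= (uint64_t)0x2 << 40;     /* type 0x9 -> 0xB */
    return 0;
}
```

In this model, a first model_ltr() on a fresh TSS slot succeeds and leaves it as type 0xB, a second one on the same slot fails, and overwriting the slot with an LDT descriptor for LLDT destroys the busy TSS descriptor that TR was loaded from.
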
>Secondly: actually, your patch doesn't look too bad. But the double LGDT in
>the context switch is nasty, and I do not see why it is necessary.
>Presumably your fear is about using the prev->vcpu_id's mapped GDT in
>next->vcpu_id's page tables? But we should only be relying on GDT entries
>(HYPERVISOR_CS, HYPERVISOR_DS, for example) which are identical in all
>per-CPU GDTs. So why do you need to add that LGDT before the CR3 switch at all?
The goal is for the per-CPU descriptor to be valid at all times (see the
check_cpu() calls I put in there for debugging). The double fault handlers
have no way of deriving the current processor other than from that GDT
entry (actually, I think x86-64 could, but so far it doesn't, and I didn't
change that here), so they would break if a double fault occurred during
the window in which that entry isn't valid. While you may argue that double
faults are rare, my point is that if we ever do see one, analyzing its dump
shouldn't be made any harder than it likely already will be.
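
A toy model may make that window concrete. It is a hypothetical sketch, not Xen's code: it assumes each VCPU's page tables map, at each VCPU's GDT address, some CPU's private GDT page (recorded in gdt_map, with -1 meaning stale), that a per-CPU descriptor in that page identifies the CPU, and that a double fault handler can only learn its CPU by reading that descriptor through whatever GDT is loaded. gdt_map, struct cpu, observed_cpu() and both switch functions are invented for illustration.

```c
#include <stdbool.h>

#define NR_VCPUS 4

/* Hypothetical: gdt_map[pt][va] is the CPU whose private GDT page VCPU
 * pt's page tables map at VCPU va's GDT address (-1: stale/unmapped).
 * Callers must initialise it; static storage starts at 0 (CPU 0). */
static int gdt_map[NR_VCPUS][NR_VCPUS];

struct cpu {
    int id;       /* physical CPU number */
    int pt;       /* VCPU whose page tables are loaded in CR3 */
    int gdtr_va;  /* VCPU whose GDT address is in GDTR; -1 = CPU's own GDT */
};

/* The CPU number a double fault handler would decode from the per-CPU entry. */
static int observed_cpu(const struct cpu *c)
{
    return c->gdtr_va < 0 ? c->id : gdt_map[c->pt][c->gdtr_va];
}

/* Single LGDT after the CR3 switch; false if any step exposed a wrong entry. */
static bool switch_single_lgdt(struct cpu *c, int next)
{
    bool ok = observed_cpu(c) == c->id;
    gdt_map[next][next] = c->id;  /* map this CPU's GDT page for next */
    c->pt = next;                 /* CR3 switch; GDTR still at prev's address */
    ok = ok && observed_cpu(c) == c->id;
    c->gdtr_va = next;            /* LGDT of next's GDT address */
    return ok && observed_cpu(c) == c->id;
}

/* Double LGDT: bracket the CR3 switch with the CPU's own GDT. */
static bool switch_double_lgdt(struct cpu *c, int next)
{
    bool ok = observed_cpu(c) == c->id;
    c->gdtr_va = -1;              /* first LGDT: CPU's own GDT */
    ok = ok && observed_cpu(c) == c->id;
    gdt_map[next][next] = c->id;
    c->pt = next;                 /* CR3 switch */
    ok = ok && observed_cpu(c) == c->id;
    c->gdtr_va = next;            /* second LGDT: next's GDT address */
    return ok && observed_cpu(c) == c->id;
}
```

With gdt_map filled with -1 except gdt_map[0][0] = 1, CPU 1 running VCPU 0 and switching to VCPU 2 via switch_single_lgdt() returns false: right after the CR3 switch, GDTR still points at VCPU 0's GDT address, which VCPU 2's page tables do not map to CPU 1's GDT page. switch_double_lgdt() never exposes a wrong entry, at the cost of the extra LGDT. Entries identical in every per-CPU GDT (such as the hypervisor code and data segments) would be unaffected by the window either way.
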
>You would need to use l1e_write_atomic() in the context-switch code to make
>sure all VCPUs' hypervisor-reserved GDT mappings are always valid. Actually,
>you must at least use l1e_write() in any case: it is not safe to write a live
>pagetable (by which I mean one possibly in use by some CPU) without one of
>those macros, because a direct write of a PAE pte is not atomic and can make
>the pte pass through a bogus intermediate state, which a CPU could prefetch
>into its TLB. Yuk!
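
The torn-write hazard can be made concrete. On 32-bit x86 a PAE entry is 64 bits, so without a 64-bit atomic instruction (for example CMPXCHG8B) the update becomes two 32-bit stores, and a concurrent page walk can observe the entry between them. The sketch below, with hypothetical names throughout and not Xen's l1e_write()/l1e_write_atomic() implementations, enumerates the states an observer could see for two store orderings; real code would also need memory barriers between the stores, which the sequential model ignores.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT 0x1u  /* bit 0 of a PAE entry */

/* Hypothetical model of an entry written as two 32-bit halves. */
struct pae_pte { uint32_t lo, hi; };

static uint64_t pte_val(struct pae_pte p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* True if 'seen' is a present entry equal to neither the old nor the new
 * value: a bogus mapping a concurrent walk could cache in its TLB. */
static bool is_bogus(uint64_t seen, uint64_t old, uint64_t new_val)
{
    return (seen & PTE_PRESENT) && seen != old && seen != new_val;
}

/* Naive update: low half, then high half. True if the intermediate
 * state is a bogus present entry. */
static bool naive_write_tears(uint64_t old, uint64_t new_val)
{
    struct pae_pte p = { (uint32_t)old, (uint32_t)(old >> 32) };
    p.lo = (uint32_t)new_val;                 /* first store */
    bool bogus = is_bogus(pte_val(p), old, new_val);
    p.hi = (uint32_t)(new_val >> 32);         /* second store */
    return bogus;
}

/* Ordered update: clear the low half (dropping the present bit), write
 * the high half, then write the low half last. */
static bool ordered_write_tears(uint64_t old, uint64_t new_val)
{
    struct pae_pte p = { (uint32_t)old, (uint32_t)(old >> 32) };
    bool bogus = false;
    p.lo = 0;                                 /* entry now not present */
    bogus |= is_bogus(pte_val(p), old, new_val);
    p.hi = (uint32_t)(new_val >> 32);
    bogus |= is_bogus(pte_val(p), old, new_val);
    p.lo = (uint32_t)new_val;                 /* final value */
    bogus |= is_bogus(pte_val(p), old, new_val);
    return bogus;
}
```

When the old and new entries differ in their high halves (frames above 4 GiB), the naive order briefly exposes a present entry pointing at a frame that neither value names. With the ordered sequence every intermediate state is either not present or a legitimate value, which is why a non-atomic but carefully ordered write can suffice when nothing legitimately uses the slot during the update.
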
Ah, yes. l1e_write() should be sufficient, though, as the slot(s) being
written cannot legitimately be in use on any CPU (other than speculatively).
Xen-devel mailing list