Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN

On Tuesday 10 March 2009 02:19:04 Jiang, Yunhong wrote:
> Christoph/Frank, do you have any comments?
>
> Thanks
> Yunhong Jiang
>
> Jiang, Yunhong <> wrote:
> > Christoph Egger <mailto:Christoph.Egger@xxxxxxx> wrote:
> >> On Thursday 05 March 2009 16:19:40 Jiang, Yunhong wrote:
> >>> Christoph Egger <mailto:Christoph.Egger@xxxxxxx> wrote:
> >>>> MC_ACT_CACHE_SHIRNK  <-- typo. should be MC_ACT_CACHE_SHRINK
> >>>
> >>> Ahh, yes, I will fix it.
> >>>
> >>>> The L3 cache index disable feature works like this:
> >>>>
> >>>> You read bits 17:6 from the MSR 0xC0000408 (which is MC4_MISC1)
> >>>> and write them into the index field. This MSR does not belong to the
> >>>> standard MC bank data and is therefore provided by mcinfo_extended.
> >>>> The index field is bits 11:0 of the PCI function 3 register "L3
> >>>> Cache Index Disable".
> >>>
> >>> So what's the offset of "L3 Cache Index Disable"? Is it within the
> >>> 256-byte or the 4K config space?
> >>
> >> Sorry, which offset do you mean?
> >
> > I mean the offset of this register in the PCI function's
> > configuration space. A PCI device has a 256-byte configuration
> > space, while a PCI-E device has a 4K configuration space.
> > Currently Xen can already access the 256-byte config space;
> > however, supporting the 4K range requires more work, like
> > sparse mmconfig etc. That's why I'm asking for the offset of
> > this register.

Ah, I see. The registers of our memory controller are in the plain
PCI config space. It's not a PCI-E device.
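
For illustration, the flow described above could look roughly like the
sketch below. This is not from any actual patch: rdmsrl() and
pci_conf_read32()/pci_conf_write32() are stand-ins for whatever accessors
end up being used, and the config-space offset of the "L3 Cache Index
Disable" register is left as a placeholder, since that offset is exactly
what is being asked here.

#define MSR_MC4_MISC1                  0xC0000408
/* Placeholder only: the real offset in function 3 config space is TBD. */
#define L3_CACHE_INDEX_DISABLE_OFFSET  0x0

static void l3_cache_shrink(unsigned int bus, unsigned int dev)
{
    uint64_t misc1;
    uint32_t index, val;

    rdmsrl(MSR_MC4_MISC1, misc1);
    /* Bits 17:6 of MC4_MISC1 hold the 12-bit index of the bad L3 line. */
    index = (uint32_t)((misc1 >> 6) & 0xfff);

    /* Function 3, plain 256-byte PCI config space; read-modify-write the
     * index into bits 11:0 of the target register. */
    val = pci_conf_read32(bus, dev, 3, L3_CACHE_INDEX_DISABLE_OFFSET);
    val = (val & ~0xfffu) | index;
    pci_conf_write32(bus, dev, 3, L3_CACHE_INDEX_DISABLE_OFFSET, val);
}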

> >>> For the PCI access, I'd prefer to have Xen control all of this, i.e.
> >>> even if dom0 wants to disable the L3 cache, it is done through a
> >>> hypercall. The reason is that Xen controls the CPU, so keeping this in
> >>> Xen will make things simpler.
> >>>
> >>> Of course, it is also OK with me if you want to keep the #MC handler
> >>> in Xen and the CE handler in Dom0.
> >>
> >> We still need to define the rules to prevent interference and
> >> clarify how to deal with Dom0/DomU going wild and breaking the rules.
> >
> > As discussed previously, we don't need to worry about DomU:
> > all configuration space accesses from DomU will be intercepted by dom0.
> >
> > For Dom0, since currently all PCI accesses to 0xcf8/0xcfc are
> > intercepted by Xen, Xen can do the checking. We can achieve the
> > same checking for mmconfig if we remove that range from dom0. But
> > I have to say I'm not sure we need to worry too much about what
> > happens when dom0 goes wild (after all, a crash in dom0 loses
> > everything), especially since interference on such accesses
> > will not cause a security issue (please correct me if I'm wrong).

This sounds like an assumption that an IOMMU is always available.
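
For reference, the legacy config mechanism Yunhong mentions works roughly
like the sketch below (a generic illustration, not Xen's actual
interception code). The 8-bit register field in CONFIG_ADDRESS is why only
the first 256 bytes of config space are reachable this way, and why offsets
beyond 0xFF need mmconfig.

static uint32_t pci_cf8_read(uint8_t bus, uint8_t dev, uint8_t func, uint8_t reg)
{
    uint32_t addr = 0x80000000u |           /* enable bit */
                    ((uint32_t)bus  << 16) |
                    ((uint32_t)dev  << 11) |
                    ((uint32_t)func << 8)  |
                    (reg & 0xfc);           /* only an 8-bit offset fits */

    outl(addr, 0xcf8);                      /* CONFIG_ADDRESS */
    return inl(0xcfc);                      /* CONFIG_DATA */
}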


> >>>> Why is the recovery action bound to the bank?
> >>>> I would like to see a struct mcinfo_recover rather than extending
> >>>> struct mcinfo_bank. That gives us flexibility.
> >>>
> >>> I'd like to get input from Frank or Gavin. Placing mcinfo_recover in
> >>> mcinfo_bank has the advantage of keeping the connection between the
> >>> error source and the action, but it does make mcinfo_bank more complex.
> >>> Or we can keep the cpu/bank information in mcinfo_recover as well, so
> >>> that we keep the flexibility and don't lose the connection.
> >>
> >> From your suggestions I prefer the last one, but it is still limited
> >> due to the assumption that each struct mcinfo_bank and each struct
> >> mcinfo_extended stands for exactly one error.
> >>
> >> This assumption doesn't cover follow-up errors which may be needed to
> >> determine the real root cause. Some of them may even be ignored
> >> depending on what is going on.
> >
> > I think the assumption here is that a recovery action will be
> > triggered by only one bank. For example, we offline a page
> > because one MC bank tells us that page is broken.

Only if the bank is the one from the memory controller.
What if the bank is the Data or Instruction Cache?

> > The "follow-up errors" are something interesting to me; do you
> > have an example? It's OK for us not to include the bank
> > information if there is such a requirement.

An error in the Bus Unit can trigger a watchdog timeout
and cause a Load-Store error as a "follow-up error". This in turn
may trigger another "follow-up error" in the memory controller
or in the Data or Instruction Cache depending on what the CPU
tries to do.

I think we should treat 'struct mcinfo_global' as a kind of header for
each error. All following information describes the error (including the
follow-up errors) and all recovery actions. This gives us the flexibility
to collect as much information as possible and allows us to perform
as many recovery actions as necessary instead of just one.
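
A minimal sketch of what that layout would let a consumer do. The
MC_TYPE_RECOVERY value is hypothetical (it is what this thread is
proposing); the mc_info / mcinfo_common field names follow the existing
public interface as far as I recall and should be double-checked.

static void walk_telemetry(const struct mc_info *mi)
{
    const struct mcinfo_common *mic = (const struct mcinfo_common *)mi->mi_data;
    uint32_t i;

    for ( i = 0; i < mi->mi_nentries; i++ )
    {
        switch ( mic->type )
        {
        case MC_TYPE_GLOBAL:   /* header: starts one error's telemetry      */ break;
        case MC_TYPE_BANK:     /* raw bank data, possibly several per error */ break;
        case MC_TYPE_EXTENDED: /* model-specific extra registers            */ break;
        case MC_TYPE_RECOVERY: /* zero or more recovery actions (proposed)  */ break;
        }
        /* Each record starts with mcinfo_common and declares its own size. */
        mic = (const struct mcinfo_common *)((const uint8_t *)mic + mic->size);
    }
}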

Christoph


> >>>> On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote:
> >>>>> Christoph/Frank, Followed is the interface definition, please have a
> >>>>> look.
> >>>>>
> >>>>> Thanks
> >>>>> Yunhong Jiang
> >>>>>
> >>>>> 1) Interface between Xen and dom0 for passing Xen's recovery action
> >>>>> information to dom0. Usage model: after offlining a broken page, Xen
> >>>>> might pass the result of its page-offline recovery action to
> >>>>> dom0. Dom0 will save the information in non-volatile memory for
> >>>>> further proactive actions, such as offlining failure-prone pages
> >>>>> early during the next reboot.
> >>>>>
> >>>>>
> >>>>> struct page_offline_action
> >>>>> {
> >>>>>     /* Params for passing the offlined page number to DOM0 */
> >>>>>     uint64_t mfn;
> >>>>>     uint64_t status; /* Similar to page offline hypercall */
> >>>>> };
> >>>>>
> >>>>> struct cpu_offline_action
> >>>>> {
> >>>>>     /* Params for passing the identity of the offlined CPU to DOM0 */
> >>>>>     uint32_t mc_socketid;
> >>>>>     uint16_t mc_coreid;
> >>>>>     uint16_t mc_core_threadid;
> >>>>> };
> >>>>>
> >>>>> struct cache_shrink_action
> >>>>> {
> >>>>>     /* TBD, Christoph, please fill it */
> >>>>> };
> >>>>>
> >>>>> /* Recovery action flags, giving recovery result information to guest */
> >>>>> /* Recovered successfully after taking certain recovery actions below */
> >>>>> #define REC_ACT_RECOVERED      (0x1 << 0)
> >>>>> /* For Solaris's usage: dom0 will take ownership on crash */
> >>>>> #define REC_ACT_RESET          (0x1 << 2)
> >>>>> /* No action is performed by XEN */
> >>>>> #define REC_ACT_INFO           (0x1 << 3)
> >>>>>
> >>>>> /* Recovery action type definition, valid only when flags & REC_ACT_RECOVERED */
> >>>>> #define MC_ACT_PAGE_OFFLINE  1
> >>>>> #define MC_ACT_CPU_OFFLINE   2
> >>>>> #define MC_ACT_CACHE_SHIRNK  3
> >>>>>
> >>>>> struct recovery_action
> >>>>> {
> >>>>>     uint8_t flags;
> >>>>>     uint8_t action_type;
> >>>>>     union
> >>>>>     {
> >>>>>         struct page_offline_action page_retire;
> >>>>>         struct cpu_offline_action cpu_offline;
> >>>>>         struct cache_shrink_action cache_shrink;
> >>>>>         uint8_t pad[MAX_ACTION_SIZE];
> >>>>>     } action_info;
> >>>>> };
> >>>>>
> >>>>> struct mcinfo_bank {
> >>>>>     struct mcinfo_common common;
> >>>>>
> >>>>>     uint16_t mc_bank;   /* bank nr */
> >>>>>     uint16_t mc_domid;  /* Usecase 5: domain referenced by mc_addr on dom0
> >>>>>                          * and if mc_addr is valid. Never valid on DomU. */
> >>>>>     uint64_t mc_status; /* bank status */
> >>>>>     uint64_t mc_addr;   /* bank address, only valid
> >>>>>                          * if addr bit is set in mc_status */
> >>>>>     uint64_t mc_misc;
> >>>>>     uint64_t mc_ctrl2;
> >>>>>     uint64_t mc_tsc;
> >>>>>     /* Recovery action is performed per bank */
> >>>>>     struct recovery_action action;
> >>>>> };
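
As a rough usage sketch of the structures above (purely illustrative:
save_offlined_page() is a hypothetical dom0 helper, and MC_ACT_CACHE_SHIRNK
is spelled as in the proposal as posted):

static void handle_bank_action(const struct mcinfo_bank *bank)
{
    const struct recovery_action *act = &bank->action;

    if ( !(act->flags & REC_ACT_RECOVERED) )
        return;  /* only REC_ACT_INFO / REC_ACT_RESET: nothing was recovered */

    switch ( act->action_type )
    {
    case MC_ACT_PAGE_OFFLINE:
        /* Remember the bad mfn so it can be offlined early on the next boot. */
        save_offlined_page(act->action_info.page_retire.mfn,
                           act->action_info.page_retire.status);
        break;
    case MC_ACT_CPU_OFFLINE:
    case MC_ACT_CACHE_SHIRNK:
        /* analogous handling for offlined CPUs / shrunk caches */
        break;
    }
}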
> >>>>>
> >>>>> 2) The two interfaces below are for MCA processing internal use.
> >>>>>     a. pre_handler will be called early in MCA ISR context, mainly
> >>>>>        for early need_reset detection to avoid losing logs (flag
> >>>>>        MCA_RESET). Also, pre_handler might be able to find the
> >>>>>        impacted domain if possible.
> >>>>>     b. mca_error_handler is actually an (error_action_index,
> >>>>>        recovery_handler pointer) pair. The defined recovery_handler
> >>>>>        function performs the actual recovery operations in softIRQ
> >>>>>        context after the per-bank MCA error matches the corresponding
> >>>>>        mca_code index. If pre_handler can't judge the impacted domain,
> >>>>>        recovery_handler must figure it out.
> >>>>>
> >>>>> /* Error has been recovered successfully */
> >>>>> #define MCA_RECOVERD    0
> >>>>> /* Error impacts one guest as stated in owner field */
> >>>>> #define MCA_OWNER       1
> >>>>> /* Error can't be recovered and the system needs a reboot */
> >>>>> #define MCA_RESET       2
> >>>>> /* Error should be handled in softIRQ context */
> >>>>> #define MCA_MORE_ACTION 3
> >>>>>
> >>>>> struct mca_handle_result
> >>>>> {
> >>>>>     uint32_t flags;
> >>>>>     /* Valid only when flags & MCA_OWNER */
> >>>>>     domid_t owner;
> >>>>>     /* Valid only when flags & MCA_RECOVERD */
> >>>>>     struct recovery_action *action;
> >>>>> };
> >>>>>
> >>>>> struct mca_error_handler
> >>>>> {
> >>>>>     /*
> >>>>>      * Assume we will need only architecture-defined codes. If the
> >>>>>      * index can't be set up by mca_code, we will add a function to do
> >>>>>      * the (index, recovery_handler) mapping check. This mca_code
> >>>>>      * represents the recovery handler pointer index for identifying
> >>>>>      * this particular error's corresponding recovery action.
> >>>>>      */
> >>>>>     uint16_t mca_code;
> >>>>>
> >>>>>     /* Handler to be called in softIRQ handler context */
> >>>>>     int (*recovery_handler)(struct mcinfo_bank *bank,
> >>>>>                             struct mcinfo_global *global,
> >>>>>                             struct mcinfo_extended *extension,
> >>>>>                             struct mca_handle_result *result);
> >>>>>
> >>>>> };
> >>>>>
> >>>>> struct mca_error_handler intel_mca_handler[] =
> >>>>> {
> >>>>>     ....
> >>>>> };
> >>>>>
> >>>>> struct mca_error_handler amd_mca_handler[] =
> >>>>> {
> >>>>>     ....
> >>>>> };
> >>>>>
> >>>>>
> >>>>> /* Handler to be called from the MCA ISR, in MCA context */
> >>>>> int intel_mca_pre_handler(struct cpu_user_regs *regs,
> >>>>>                                 struct mca_handle_result *result);
> >>>>>
> >>>>> int amd_mca_pre_handler(struct cpu_user_regs *regs,
> >>>>>                             struct mca_handle_result *result);
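
For illustration, the softIRQ-side dispatch implied by these tables might
look roughly like this. It assumes mca_code is matched against the
architectural error code in bits 15:0 of mc_status; the helper itself and
its signature are hypothetical, not part of the proposal.

static int run_recovery(struct mcinfo_bank *bank, struct mcinfo_global *global,
                        struct mcinfo_extended *ext,
                        struct mca_handle_result *result,
                        struct mca_error_handler *handlers, unsigned int nr)
{
    unsigned int i;
    uint16_t code = (uint16_t)(bank->mc_status & 0xffff); /* MCA error code */

    for ( i = 0; i < nr; i++ )
        if ( handlers[i].mca_code == code )
            return handlers[i].recovery_handler(bank, global, ext, result);

    return -1; /* no matching handler; fall back to the default path */
}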
> >>>>>
> >>>>> Frank.Vanderlinden@xxxxxxx <mailto:Frank.Vanderlinden@xxxxxxx> wrote:
> >>>>>> Jiang, Yunhong wrote:
> >>>>>>> Frank/Christopher, can you please give more comments for it, or you
> >>>>>>> are OK with this? For the action reporting mechanism, we will send
> >>>>>>> out a proposal for review soon.
> >>>>>>
> >>>>>> I'm ok with this. We need a little more information on the AMD
> >>>>>> mechanism, but it seems to me that we can fit this in.
> >>>>>>
> >>>>>> Sometime this week, I'll also send out the last of our changes that
> >>>>>> haven't been sent upstream to xen-unstable yet. Maybe we can combine
> >>>>>> some things into one patch, like the telemetry handling changes
> >>>>>> that Gavin did. The other changes are error injection (for
> >>>>>> debugging) and panic crash dump support for our FMA tools, but those
> >>>>>> are probably only interesting to us.
> >>>>>>
> >>>>>> - Frank
> >>>>
> >>>> --


-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel