xen-devel
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
To: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
From: Christoph Egger <Christoph.Egger@xxxxxxx>
Date: Thu, 5 Mar 2009 15:53:33 +0100
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Gavin Maltby <Gavin.Maltby@xxxxxxx>, "Ke, Liping" <liping.ke@xxxxxxxxx>, "Frank.Vanderlinden@xxxxxxx" <Frank.Vanderlinden@xxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, "Kleen, Andi" <andi.kleen@xxxxxxxxx>
MC_ACT_CACHE_SHIRNK <-- typo. should be MC_ACT_CACHE_SHRINK
The L3 cache index disable feature works like this:
You read the bits 17:6 from the MSR 0xC0000408 (which is MC4_MISC1)
and write them into the index field. This MSR does not belong to the standard
mc bank data and is therefore provided by mcinfo_extended.
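
The bit handling can be sketched roughly as below. This is only an illustration: the helper and macro names are made up, the actual MSR read and PCI config access are left out, and it assumes the index lands in bits 11:0 of the "L3 Cache Index Disable" register value with all other bits left untouched.

```c
#include <assert.h>
#include <stdint.h>

/* MSR number as given in this mail (MC4_MISC1). */
#define MSR_MC4_MISC1          0xC0000408u

/* Bits 17:6 of MC4_MISC1: a 12-bit field starting at bit 6. */
#define MC4_MISC1_INDEX_SHIFT  6
#define L3_INDEX_MASK          0xFFFu

/* Hypothetical helper: extract bits 17:6 from a raw MC4_MISC1 value. */
static inline uint16_t l3_index_from_misc1(uint64_t misc1)
{
    return (uint16_t)((misc1 >> MC4_MISC1_INDEX_SHIFT) & L3_INDEX_MASK);
}

/*
 * Hypothetical helper: put the index into bits 11:0 of a register value
 * and keep every other bit as it was. Which of the other bits must be
 * set to actually disable the index is not covered here.
 */
static inline uint32_t l3_index_disable_set_index(uint32_t reg, uint16_t index)
{
    assert(index <= L3_INDEX_MASK);
    return (reg & ~L3_INDEX_MASK) | index;
}
```

The caller would read the MSR value (from mcinfo_extended on the dom0 side), pass it through l3_index_from_misc1() and merge the result into the current register value before writing it back.
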
The index field is bits 11:0 of the PCI function 3 register
"L3 Cache Index Disable".
Why is the recovery action bound to the bank?
I would like to see a struct mcinfo_recover rather than extending
struct mcinfo_bank. That gives us flexibility.
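
Something along these lines, as a rough sketch only. The type value, MAX_ACTION_SIZE, the init helper and the stand-in for struct mcinfo_common are assumptions for illustration, not an agreed interface; the action structs and flag values are copied from the proposal quoted below.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the existing common header (assumed: type + size). */
struct mcinfo_common {
    uint16_t type;
    uint16_t size;
};

#define MC_TYPE_RECOVER_SKETCH 0x100  /* hypothetical type value */
#define MAX_ACTION_SIZE        16     /* hypothetical; not defined in the proposal */

/* Values taken from the quoted proposal. */
#define REC_ACT_RECOVERED   (0x1 << 0)
#define MC_ACT_PAGE_OFFLINE 1

struct page_offline_action {
    uint64_t mfn;
    uint64_t status;
};

struct cpu_offline_action {
    uint32_t mc_socketid;
    uint16_t mc_coreid;
    uint16_t mc_core_threadid;
};

/* Recovery information as its own record instead of a member of mcinfo_bank. */
struct mcinfo_recover {
    struct mcinfo_common common;
    uint8_t flags;
    uint8_t action_type;
    union {
        struct page_offline_action page_retire;
        struct cpu_offline_action cpu_offline;
        uint8_t pad[MAX_ACTION_SIZE];
    } action_info;
};

/* Hypothetical helper: zero the record and fill in the common header. */
static inline void mcinfo_recover_init(struct mcinfo_recover *r,
                                       uint8_t flags, uint8_t action_type)
{
    assert(r != NULL);
    memset(r, 0, sizeof(*r));
    r->common.type = MC_TYPE_RECOVER_SKETCH;
    r->common.size = (uint16_t)sizeof(*r);
    r->flags = flags;
    r->action_type = action_type;
}
```

With a separate record, a consumer that walks the mcinfo entries by type and size can skip it if it does not know it, and struct mcinfo_bank keeps its current layout.
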
Christoph
On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote:
> Christoph/Frank, following is the interface definition; please have a look.
>
> Thanks
> Yunhong Jiang
>
> 1) Interface between Xen and dom0 for passing Xen's recovery action information
> to dom0. Usage model: after offlining a broken page, Xen might pass the
> result of its page-offline recovery action to dom0. Dom0 will save the
> information in non-volatile memory for further proactive actions, such as
> offlining the error-prone page early on the next reboot.
>
>
> struct page_offline_action
> {
> /* Params for passing the offlined page number to DOM0 */
> uint64_t mfn;
> uint64_t status; /* Similar to page offline hypercall */
> };
>
> struct cpu_offline_action
> {
> /* Params for passing the identity of the offlined CPU to DOM0 */
> uint32_t mc_socketid;
> uint16_t mc_coreid;
> uint16_t mc_core_threadid;
> };
>
> struct cache_shrink_action
> {
> /* TBD, Christoph, please fill it */
> };
>
> /* Recover action flags, giving recovery result information to guest */
> /* Recovered successfully after taking one of the recovery actions below */
> #define REC_ACT_RECOVERED (0x1 << 0)
> /* For Solaris usage: dom0 will take ownership on a crash */
> #define REC_ACT_RESET (0x1 << 2)
> /* No action is performed by XEN */
> #define REC_ACT_INFO (0x1 << 3)
>
> /* Recover action type definition, valid only when flags &
> REC_ACT_RECOVERED */
> #define MC_ACT_PAGE_OFFLINE 1
> #define MC_ACT_CPU_OFFLINE 2
> #define MC_ACT_CACHE_SHIRNK 3
>
> struct recovery_action
> {
> uint8_t flags;
> uint8_t action_type;
> union
> {
> struct page_offline_action page_retire;
> struct cpu_offline_action cpu_offline;
> struct cache_shrink_action cache_shrink;
> uint8_t pad[MAX_ACTION_SIZE];
> } action_info;
> };
>
> struct mcinfo_bank {
> struct mcinfo_common common;
>
> uint16_t mc_bank; /* bank nr */
> uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on dom0
> * and if mc_addr is valid. Never valid on DomU. */
> uint64_t mc_status; /* bank status */
> uint64_t mc_addr; /* bank address, only valid
> * if addr bit is set in mc_status */
> uint64_t mc_misc;
> uint64_t mc_ctrl2;
> uint64_t mc_tsc;
> /* Recovery action is performed per bank */
> struct recovery_action action;
> };
>
> 2) The two interfaces below are for internal use by MCA processing.
> a. pre_handler is called early, in MCA ISR context, mainly to detect
> need_reset early so that logs are not lost (flag MCA_RESET).
> pre_handler may also identify the impacted domain, if possible.
> b. mca_error_handler is actually an (error_action_index,
> recovery_handler pointer) pair. The defined recovery_handler function
> performs the actual recovery operations in softIRQ context after the
> per-bank MCA error has been matched to the corresponding mca_code index. If
> pre_handler can't determine the impacted domain, recovery_handler must figure
> it out.
>
> /* Error has been recovered successfully */
> #define MCA_RECOVERD 0
> /* Error impacts one guest, as stated in the owner field */
> #define MCA_OWNER 1
> /* Error can't be recovered; the system needs a reboot */
> #define MCA_RESET 2
> /* Error should be handled in softIRQ context */
> #define MCA_MORE_ACTION 3
>
> struct mca_handle_result
> {
> uint32_t flags;
> /* Valid only when flags & MCA_OWNER */
> domid_t owner;
> /* valid only when flags & MCA_RECOVERD */
> struct recovery_action *action;
> };
>
> struct mca_error_handler
> {
> /*
> * Assume we will need only architecture-defined codes. If the index
> * can't be set up by mca_code, we will add a function to do the
> * (index, recovery_handler) mapping check. This mca_code represents
> * the recovery handler pointer index for identifying this particular
> * error's corresponding recovery action.
> */
> uint16_t mca_code;
>
> /* Handler to be called in softIRQ handler context */
> int (*recovery_handler)(struct mcinfo_bank *bank,
> struct mcinfo_global *global,
> struct mcinfo_extended *extension,
> struct mca_handle_result *result);
>
> };
>
> struct mca_error_handler intel_mca_handler[] =
> {
> ....
> };
>
> struct mca_error_handler amd_mca_handler[] =
> {
> ....
> };
>
>
> /* Handler to be called in MCA ISR in MCA context */
> int intel_mca_pre_handler(struct cpu_user_regs *regs,
> struct mca_handle_result *result);
>
> int amd_mca_pre_handler(struct cpu_user_regs *regs,
> struct mca_handle_result *result);
>
> Frank.Vanderlinden@xxxxxxx wrote:
> > Jiang, Yunhong wrote:
> >> Frank/Christopher, can you please give more comments on it, or are you
> >> OK with this? For the action reporting mechanism, we will send out a
> >> proposal for review soon.
> >
> > I'm ok with this. We need a little more information on the AMD
> > mechanism, but it seems to me that we can fit this in.
> >
> > Sometime this week, I'll also send out the last of our changes that
> > haven't been sent upstream to xen-unstable yet. Maybe we can combine
> > some things in to one patch, like the telemetry handling changes that
> > Gavin did. The other changes are error injection (for debugging) and
> > panic crash dump support for our FMA tools, but those are probably only
> > interesting to us.
> >
> > - Frank
--
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632