RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
To: "Frank.Vanderlinden@xxxxxxx" <Frank.Vanderlinden@xxxxxxx>
From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Date: Thu, 5 Mar 2009 16:31:27 +0800
Cc: Christoph Egger <Christoph.Egger@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Ke, Liping" <liping.ke@xxxxxxxxx>, Gavin Maltby <Gavin.Maltby@xxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, "Kleen, Andi" <andi.kleen@xxxxxxxxx>
Christoph/Frank, below is the interface definition. Please have a look.
Thanks
Yunhong Jiang
1) Interface between Xen and dom0 for passing Xen's recovery action
information to dom0.
Usage model: after offlining a broken page, Xen may pass the result of the
page-offline recovery action to dom0. Dom0 saves this information in
non-volatile memory for later proactive actions, such as offlining
error-prone pages early during the next reboot.
struct page_offline_action
{
/* Params for passing the offlined page number to DOM0 */
uint64_t mfn;
uint64_t status; /* Similar to page offline hypercall */
};
struct cpu_offline_action
{
/* Params for passing the identity of the offlined CPU to DOM0 */
uint32_t mc_socketid;
uint16_t mc_coreid;
uint16_t mc_core_threadid;
};
struct cache_shrink_action
{
/* TBD, Christoph, please fill it in */
};
/* Recovery action flags, giving recovery result information to the guest */
/* Recovered successfully after taking one of the recovery actions below */
#define REC_ACT_RECOVERED (0x1 << 0)
/* For Solaris's use: dom0 takes ownership on a crash */
#define REC_ACT_RESET (0x1 << 2)
/* No action was performed by Xen */
#define REC_ACT_INFO (0x1 << 3)
/* Recovery action type, valid only when flags & REC_ACT_RECOVERED */
#define MC_ACT_PAGE_OFFLINE 1
#define MC_ACT_CPU_OFFLINE 2
#define MC_ACT_CACHE_SHRINK 3
struct recovery_action
{
uint8_t flags;
uint8_t action_type;
union
{
struct page_offline_action page_retire;
struct cpu_offline_action cpu_offline;
struct cache_shrink_action cache_shrink;
/* Fixes the union size; MAX_ACTION_SIZE is still to be defined */
uint8_t pad[MAX_ACTION_SIZE];
} action_info;
};
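For illustration, a dom0 consumer might decode a struct recovery_action roughly as below. This is a minimal sketch, not part of the proposal: describe_action() is a hypothetical helper, MAX_ACTION_SIZE is given an arbitrary value of 24 because the proposal leaves it undefined, and the still-empty cache_shrink_action is left out, so a cache-shrink action falls through to the generic case.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Flag and type values follow the proposal. */
#define REC_ACT_RECOVERED   (0x1 << 0)
#define MC_ACT_PAGE_OFFLINE 1
#define MC_ACT_CPU_OFFLINE  2

/* Assumed value: the proposal does not define MAX_ACTION_SIZE yet. */
#define MAX_ACTION_SIZE 24

struct page_offline_action { uint64_t mfn; uint64_t status; };

struct cpu_offline_action {
    uint32_t mc_socketid;
    uint16_t mc_coreid;
    uint16_t mc_core_threadid;
};

struct recovery_action {
    uint8_t flags;
    uint8_t action_type;
    union {
        struct page_offline_action page_retire;
        struct cpu_offline_action cpu_offline;
        uint8_t pad[MAX_ACTION_SIZE];
    } action_info;
};

/* Hypothetical helper: write a one-line description of ACT into BUF.
 * The union member is only read when REC_ACT_RECOVERED is set, because
 * action_type is only valid in that case. Returns snprintf's result. */
int describe_action(const struct recovery_action *act, char *buf, size_t len)
{
    assert(act != NULL && buf != NULL);
    if (!(act->flags & REC_ACT_RECOVERED))
        return snprintf(buf, len, "not recovered (flags 0x%x)",
                        (unsigned)act->flags);

    switch (act->action_type) {
    case MC_ACT_PAGE_OFFLINE:
        return snprintf(buf, len, "page offline: mfn 0x%llx",
                        (unsigned long long)act->action_info.page_retire.mfn);
    case MC_ACT_CPU_OFFLINE: {
        const struct cpu_offline_action *c = &act->action_info.cpu_offline;
        return snprintf(buf, len, "cpu offline: socket %u core %u thread %u",
                        (unsigned)c->mc_socketid, (unsigned)c->mc_coreid,
                        (unsigned)c->mc_core_threadid);
    }
    default:
        return snprintf(buf, len, "action type %u",
                        (unsigned)act->action_type);
    }
}
```

Checking the flag before looking at action_type mirrors the comment in the proposal that the action type is valid only when flags & REC_ACT_RECOVERED.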
struct mcinfo_bank {
struct mcinfo_common common;
uint16_t mc_bank; /* bank nr */
uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on dom0
* and if mc_addr is valid. Never valid on DomU. */
uint64_t mc_status; /* bank status */
uint64_t mc_addr; /* bank address, only valid
* if addr bit is set in mc_status */
uint64_t mc_misc;
uint64_t mc_ctrl2;
uint64_t mc_tsc;
/* Recovery action is performed per bank */
struct recovery_action action;
};
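The usage model (dom0 persisting offlined pages for the next boot) could start with something like the sketch below, which scans a batch of bank records and collects the MFNs of successfully offlined pages. collect_offlined_mfns() is a hypothetical name, the bank structure is trimmed (the mcinfo_common header is omitted), MAX_ACTION_SIZE is assumed, and the page_offline_action status field is not checked because its encoding is only described as similar to the page offline hypercall.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag and type values follow the proposal; MAX_ACTION_SIZE is assumed. */
#define REC_ACT_RECOVERED   (0x1 << 0)
#define REC_ACT_INFO        (0x1 << 3)
#define MC_ACT_PAGE_OFFLINE 1
#define MAX_ACTION_SIZE     24

struct page_offline_action { uint64_t mfn; uint64_t status; };

struct recovery_action {
    uint8_t flags;
    uint8_t action_type;
    union {
        struct page_offline_action page_retire;
        uint8_t pad[MAX_ACTION_SIZE];
    } action_info;
};

/* Trimmed copy of mcinfo_bank: the mcinfo_common header is omitted. */
struct mcinfo_bank {
    uint16_t mc_bank;
    uint16_t mc_domid;
    uint64_t mc_status;
    uint64_t mc_addr;
    uint64_t mc_misc;
    uint64_t mc_ctrl2;
    uint64_t mc_tsc;
    struct recovery_action action;
};

/* Hypothetical helper: copy the MFN of every bank whose recovery action is
 * a successful page offline into OUT (capacity MAX_OUT). Banks without
 * REC_ACT_RECOVERED (e.g. REC_ACT_INFO only) are skipped.
 * Returns the number of MFNs copied. */
size_t collect_offlined_mfns(const struct mcinfo_bank *banks, size_t nbanks,
                             uint64_t *out, size_t max_out)
{
    size_t n = 0;

    assert(banks != NULL || nbanks == 0);
    for (size_t i = 0; i < nbanks && n < max_out; i++) {
        const struct recovery_action *act = &banks[i].action;

        if ((act->flags & REC_ACT_RECOVERED) &&
            act->action_type == MC_ACT_PAGE_OFFLINE)
            out[n++] = act->action_info.page_retire.mfn;
    }
    return n;
}
```

Writing the collected MFNs to non-volatile memory is platform-specific and not shown here.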
2) The two interfaces below are for internal MCA processing.
a. pre_handler is called early, in MCA ISR context, mainly to detect early
   whether a reset is needed (flag MCA_RESET) so that no error log is lost.
   pre_handler may also be able to identify the impacted domain.
b. mca_error_handler is effectively an (error_action_index, recovery_handler
   pointer) pair. The recovery_handler function performs the actual recovery
   operations in softIRQ context, once a per-bank MCA error matches the
   corresponding mca_code index. If pre_handler can't determine the impacted
   domain, recovery_handler must figure it out.
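A rough sketch of how the MCA ISR could combine the per-bank pre_handler results before deciding whether to reset immediately or defer recovery to softIRQ context. It assumes the MCA_* results are distinct single-bit flags so they can be OR-ed across banks; enum mca_disposition and mca_isr_disposition() are hypothetical names, and the actual reset and softIRQ raising are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed single-bit values so results can be OR-ed and tested with &. */
#define MCA_RECOVERED   (0x1 << 0)
#define MCA_OWNER       (0x1 << 1)
#define MCA_RESET       (0x1 << 2)
#define MCA_MORE_ACTION (0x1 << 3)

typedef uint16_t domid_t;   /* same width as Xen's public domid_t */

struct mca_handle_result {
    uint32_t flags;
    domid_t owner;          /* valid only when flags & MCA_OWNER */
};

/* Hypothetical outcome of the ISR phase. */
enum mca_disposition { MCA_DISP_DONE, MCA_DISP_SOFTIRQ, MCA_DISP_RESET };

/* Combine the pre_handler results of all banks into one decision:
 * any MCA_RESET forces a reset, decided in the ISR so that no log is lost;
 * otherwise any MCA_MORE_ACTION defers recovery to softIRQ context. */
enum mca_disposition mca_isr_disposition(const struct mca_handle_result *res,
                                         size_t nbanks)
{
    uint32_t all = 0;

    assert(res != NULL || nbanks == 0);
    for (size_t i = 0; i < nbanks; i++)
        all |= res[i].flags;

    if (all & MCA_RESET)
        return MCA_DISP_RESET;
    if (all & MCA_MORE_ACTION)
        return MCA_DISP_SOFTIRQ;
    return MCA_DISP_DONE;
}
```

Letting MCA_RESET take priority over every other result keeps the "reset needed" decision in the ISR, which is the point of running pre_handler early.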
/* Error has been recovered successfully */
#define MCA_RECOVERED (0x1 << 0)
/* Error impacts one guest, identified by the owner field */
#define MCA_OWNER (0x1 << 1)
/* Error can't be recovered; the system needs a reboot */
#define MCA_RESET (0x1 << 2)
/* Error should be handled in softIRQ context */
#define MCA_MORE_ACTION (0x1 << 3)
struct mca_handle_result
{
uint32_t flags;
/* Valid only when flags & MCA_OWNER */
domid_t owner;
/* Valid only when flags & MCA_RECOVERED */
struct recovery_action *action;
};
struct mca_error_handler
{
/*
 * Assume we will need only architecture-defined codes. If the index can't
 * be set up from mca_code, we will add a function to do the
 * (index, recovery_handler) mapping check.
 * mca_code is the index used to look up the recovery handler for this
 * particular error's corresponding recovery action.
 */
uint16_t mca_code;
/* Handler to be called in softIRQ context */
int (*recovery_handler)(struct mcinfo_bank *bank,
struct mcinfo_global *global,
struct mcinfo_extended *extension,
struct mca_handle_result *result);
};
struct mca_error_handler intel_mca_handler[] =
{
....
};
struct mca_error_handler amd_mca_handler[] =
{
....
};
/* Handler to be called in the MCA ISR, in MCA context */
int intel_mca_pre_handler(struct cpu_user_regs *regs,
struct mca_handle_result *result);
int amd_mca_pre_handler(struct cpu_user_regs *regs,
struct mca_handle_result *result);
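The (mca_code, recovery_handler) pairing could be dispatched roughly as below. This sketch assumes recovery_handler is a function pointer, keys the lookup on the MCA error code in bits 15:0 of the bank's MCi_STATUS value, and trims both the handler signature and the bank record to what it needs. demo_recover, demo_handlers, mca_dispatch and the code value 0x1234 are purely illustrative and do not correspond to any real error or Xen function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed bank record: only the fields this sketch needs. */
struct mcinfo_bank {
    uint16_t mc_bank;
    uint64_t mc_status;
    uint64_t mc_addr;
};

struct mca_handle_result { uint32_t flags; };

#define MCA_RECOVERED (0x1 << 0)   /* assumed bit value */

/* (mca_code, recovery_handler) pair, with the handler as a function pointer. */
struct mca_error_handler {
    uint16_t mca_code;
    int (*recovery_handler)(const struct mcinfo_bank *bank,
                            struct mca_handle_result *result);
};

/* Hypothetical handler: simply marks the error as recovered. */
static int demo_recover(const struct mcinfo_bank *bank,
                        struct mca_handle_result *result)
{
    (void)bank;
    result->flags |= MCA_RECOVERED;
    return 0;
}

/* Hypothetical table; 0x1234 is an arbitrary code, not a real error. */
static const struct mca_error_handler demo_handlers[] = {
    { 0x1234, demo_recover },
};

/* The MCA error code is held in bits 15:0 of IA32_MCi_STATUS. */
static uint16_t mca_error_code(uint64_t status)
{
    return (uint16_t)(status & 0xffff);
}

/* Find the entry whose mca_code matches the bank's error code and run its
 * handler (in Xen this would happen in softIRQ context). Returns the
 * handler's return value, or -1 if no entry matches. */
int mca_dispatch(const struct mca_error_handler *tbl, size_t n,
                 const struct mcinfo_bank *bank,
                 struct mca_handle_result *result)
{
    assert(tbl != NULL && bank != NULL && result != NULL);
    uint16_t code = mca_error_code(bank->mc_status);

    for (size_t i = 0; i < n; i++)
        if (tbl[i].mca_code == code)
            return tbl[i].recovery_handler(bank, result);
    return -1;
}
```

A linear scan is enough for a short table; if the code cannot be used directly as the index, the mapping function mentioned in the mca_error_handler comment could replace the equality test.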
Frank.Vanderlinden@xxxxxxx <mailto:Frank.Vanderlinden@xxxxxxx> wrote:
> Jiang, Yunhong wrote:
>> Frank/Christopher, can you please give more comments for it, or you are OK
>> with this? For the action reporting mechanism, we will send out a proposal
>> for review soon.
>
> I'm ok with this. We need a little more information on the AMD
> mechanism, but it seems to me that we can fit this in.
>
> Sometime this week, I'll also send out the last of our changes that
> haven't been sent upstream to xen-unstable yet. Maybe we can combine
> some things in to one patch, like the telemetry handling changes that
> Gavin did. The other changes are error injection (for debugging) and
> panic crash dump support for our FMA tools, but those are probably only
> interesting to us.
>
> - Frank
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel