WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel] [PATCH] 2/3: MCA/MCE correctable error handling

To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] [PATCH] 2/3: MCA/MCE correctable error handling
From: "Christoph Egger" <Christoph.Egger@xxxxxxx>
Date: Thu, 23 Aug 2007 09:07:21 +0200
Cc: Gavin.Maltby@xxxxxxx, Keir Fraser <keir@xxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxxxx>
Delivery-date: Thu, 23 Aug 2007 00:09:02 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <C2F21F3A.14771%keir@xxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <C2F21F3A.14771%keir@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: KMail/1.9.6
On Wednesday 22 August 2007 18:13:30 Keir Fraser wrote:
> I'm not sure you should use shared_info at all. This interface should be
> low-rate enough that you can add a sysctl (assuming this info is applicable
> for dom0 only, which it looks to be).

The polling service is dom0 only. Uncorrectable errors may also be reported
to a DomU.

> Actually, probably a platform_op  rather than a sysctl...

What you propose is to copy the MCA info from Xen to guest?
The MCA info structure will be used for both correctable and uncorrectable 
error notification (as I said in my first patch 0/3 mail).

I assumed you agreed on all this as is in the patches, since you did not 
object/comment when this was discussed on this list in the "MCA/MCE concept" 
topic (and also your questions in your patch 3/3 mail were discussed there).
The purpose of this discussion was to find an agreement on how to proceed.

Therefore, all what I expected from you were some questions/comments about 
implementation details and no design/concept questions.


Christoph


>  -- Keir
>
> On 22/8/07 16:47, "Christoph Egger" <Christoph.Egger@xxxxxxx> wrote:
> > I fixed all this in my code and will re-submit the new patch.
> >
> > Christoph
> >
> > On Wednesday 22 August 2007 12:01:26 Jan Beulich wrote:
> >>>>> "Christoph Egger" <Christoph.Egger@xxxxxxx> 22.08.07 10:47 >>>
> >>>
> >>> On Tuesday 21 August 2007 17:53:45 Jan Beulich wrote:
> >>>>> +} __attribute__((packed));
> >>>>
> >>>> I think it was a general agreement that it is not a good idea
> >>>> (non-portable to non-GNU compilers) to put things like this in public
> >>>> headers. I think by properly ordering things you can get away without
> >>>> this (and almost without padding).
> >>>
> >>> Oh, you're right. I should use #pragma pack(1)  instead.
> >>> Is this fine with you?
> >>
> >> I'm afraid that wouldn't work with the compat mode header generation, as
> >> the original use of #pragma pack(push, ...) was banned for (Sun)
> >> compatibility reasons. Consequently, I believe the only way of doing
> >> this properly is to avoid depending on compiler behavior by arranging
> >> things appropriately (including padding members if needed) and avoiding
> >> #pragma pack() altogether.
> >>
> >>>>> +struct mcinfo_global {
> >>>>> ...
> >>>>> +    uint16_t mc_socketid;
> >>>>> +    uint16_t mc_coreid;
> >>>>> +    uint16_t mc_vcpu_id;
> >>>>
> >>>> Unless mc_vcpu_id is intended for that purpose, this neglects
> >>>> hyperthreading (I know, AMD doesn't use this, but the interface should
> >>>> be vendor neutral).
> >>>>
> >>>> If mc_vcpu_id is meant for that purpose, its naming is ambiguous.
> >>>>
> >>>> If mc_vcpu_id is meant as a vcpuid, then ordering things os that
> >>>> mc_domid and mc_vcpu_id are contiguous would seem more natural (making
> >>>> eventual examination in raw memory less confusing).
> >>>
> >>> mc_coreid is the physical core that reported the machine check
> >>> information. mc_socketid is the physical socket the physical core is
> >>> in. This is useful to find all other affected physical cores, when an
> >>> error in the L3-Cache is reported that is shared over all cores in the
> >>> chip.
> >>>
> >>> mc_vcpu_id is the id of the active vcpu for the domain, that reported
> >>> the machine check information. Inside Xen, this is current->vcpu_id.
> >>> mc_vcpu_id is needed to deal properly with the upcoming NUMA support my
> >>> collegue is working on.
> >>
> >> Okay, but then you're really lacking a thread id here, for being filled
> >> by respective Intel code (AMD code would set this to zero).
> >>
> >>>>> +/* sizeof(struct mcinfo_global) + 6 * sizeof(struct mcinfo_bank) ==
> >>>>> 200. + * This is enough space to store mc information of up to six
> >>>>> banks. + */
> >>>>> +#define MCINFO_MAXSIZE (204 - sizeof(uint32_t))
> >>>>
> >>>> Why don't you use the sizeof()-s from the description? If this is
> >>>> really needed for some reason, then having 200 in the description and
> >>>> 204 in the macro is at least confusing...
> >>>
> >>> MCINFO_MAXSIZE is the size for the content of the mi_data array.
> >>> MCINFO_MAXSIZE + sizeof(mi_nentries) == 204. That is where is number
> >>> comes from.
> >>
> >> I concluded that, but pointed out that seeing two different numbers made
> >> me think of why this is so, whereas identical numbers would have avoided
> >> that (and will likely keep others from asking later, too).
> >>
> >> Also, you don't say what was the reason for you to use constants instead
> >> of sizeof() here.
> >>
> >>>>>     /* Frame containing list of mfns containing list of mfns
> >>>>> containing p2m. */ xen_pfn_t     pfn_to_mfn_frame_list_list;
> >>>>>     unsigned long nmi_reason;
> >>>>> +    struct arch_mc_info mc_info; /* machine check information */
> >>>>>     uint64_t pad[32];
> >>>>> };
> >>>>
> >>>> Are you sure it is appropriate to add a member here without reducing
> >>>> the padding member?
> >>>
> >>> You want me to replace "uint64_t pad[32];" with "uint32_t pad[13];" ?
> >>
> >> It would be my understanding that that's the right thing to do (assuming
> >> you calculated the numbers correctly), but I'd rely on Keir to give a
> >> final word on this.
> >>
> >> Jan



-- 
AMD Saxony, Dresden, Germany
Operating System Research Center

Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Sitz (Geschäftsanschrift):
   Wilschdorfer Landstr. 101, 01109 Dresden, Deutschland
Registergericht Dresden: HRA 4896
vertretungsberechtigter Komplementär:
   AMD Saxony LLC (Sitz Wilmington, Delaware, USA)
Geschäftsführer der AMD Saxony LLC:
   Dr. Hans-R. Deppe, Thomas McCoy



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel