This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen

To: Jan Beulich <jbeulich@xxxxxxxxxx>, Christoph Egger <Christoph.Egger@xxxxxxx>, Gavin Maltby <Gavin.Maltby@xxxxxxx>
Subject: RE: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen
From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Date: Wed, 17 Sep 2008 17:20:57 +0800
Accept-language: en-US
Cc: "Tian, Kevin" <kevin.tian@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Shan, Haitao" <haitao.shan@xxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, Haitao Shan <maillists.shan@xxxxxxxxx>
Delivery-date: Wed, 17 Sep 2008 02:21:26 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <48D0C868.76E4.0078.0@xxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <C4EEE682.2707B%keir.fraser@xxxxxxxxxxxxx> <200809111623.11316.Christoph.Egger@xxxxxxx> <48D084BE.5050602@xxxxxxx> <48D0C868.76E4.0078.0@xxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AckYk8stouMBowU1TdmaHxIQDEvV5wAEcw6g
Thread-topic: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen

>-----Original Message-----
>From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Jan Beulich
>Sent: September 17, 2008 15:06
>To: Christoph Egger; Gavin Maltby
>Cc: Haitao Shan; Tian, Kevin; xen-devel@xxxxxxxxxxxxxxxxxxx;
>Shan, Haitao; Keir Fraser
>Subject: Re: [Xen-devel] Re: [PATCH 1/4] CPU online/offline
>support in Xen
>>>> Gavin Maltby <Gavin.Maltby@xxxxxxx> 17.09.08 06:17 >>>
>>I don't see this as a problem for machine check correctness.
>>If dom0 asks to offline a cpu (because it believes the cpu is busted and
>>a threat to uptime), that decision is fundamentally asynchronous
>>to the actual error handling that occurred at machine check exception
>>  - running in whatever context
>>  - MCE occurs
>>  - trap to hypervisor MCE handler
>>       . this decides on hypervisor panic, or other appropriate
>>         immediate (in handler) response
>>       . telemetry forwarded to dom0 for logging and analysis
>>  - assume no hypervisor panic
>>  - eons pass during which any unconstrained bad data remaining
>>    after initial handling may go anywhere
>>  - dom0 gets telemetry and let's say diagnoses a fault and
>>    decides to call back into the hypervisor to offline the
>>    offending cpu
>>Note the "eons pass" bit;  tonnes of instructions may run on the
>>bad cpu in this time, and a few more for some offline delay won't
>Shouldn't this possibly be handled the other way around: If a
>MCE happened, immediately stop scheduling anything on the affected
>CPU(s), until Dom0 tells you otherwise (and of course as long as there
>remains at least one CPU to run on).

The current MCE handling in Xen has no mechanism to achieve this. I agree that
some initial containment in Xen is needed to reduce the possibility of a second
MCE (could program locality cause such a situation?).
What we are thinking is that, when the MCA handler runs, every domain's vcpus
except dom0's vcpu0 need to be brought into Xen's execution context.
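
To make the containment idea quoted above a bit more concrete, here is a minimal sketch in C of "stop scheduling on the affected CPU until Dom0 says otherwise, as long as at least one CPU remains". It is only an illustration under stated assumptions: the bitmask, the 64-CPU limit, and the names containment_init, cpu_schedulable, mce_quarantine_cpu and dom0_release_cpu are all hypothetical and are not existing Xen interfaces, and a real implementation would also need locking and scheduler integration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 64u /* hypothetical limit so one 64-bit word holds the mask */

/* Hypothetical state: bit n set means CPU n may be given work. */
static uint64_t schedulable_mask;

/* Toy initialisation: mark the first ncpus CPUs as schedulable. */
static void containment_init(unsigned int ncpus)
{
    assert(ncpus >= 1 && ncpus <= NR_CPUS);
    schedulable_mask = (ncpus == NR_CPUS) ? UINT64_MAX
                                          : ((UINT64_C(1) << ncpus) - 1);
}

/* Would be consulted by a scheduler before placing work on cpu. */
static bool cpu_schedulable(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    return ((schedulable_mask >> cpu) & 1u) != 0;
}

/*
 * Called from a (hypothetical) MCE path: stop scheduling on cpu,
 * unless that would leave no CPU to run on. Returns true if the
 * CPU is quarantined after the call, false if it had to be kept.
 */
static bool mce_quarantine_cpu(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    uint64_t remaining = schedulable_mask & ~(UINT64_C(1) << cpu);

    if (remaining == 0)
        return false; /* last schedulable CPU: cannot contain it */

    schedulable_mask = remaining;
    return true;
}

/* Dom0 has analysed the telemetry and allows the CPU to be used again. */
static void dom0_release_cpu(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    schedulable_mask |= UINT64_C(1) << cpu;
}
```

With two CPUs, quarantining CPU 1 succeeds, a later attempt to quarantine CPU 0 is refused because it is the last one left, and dom0_release_cpu(1) makes CPU 1 schedulable again. If Dom0 instead decides the CPU is faulty, it would go through the offline path from this patch series rather than release it.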
