This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen

To: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, Christoph Egger <Christoph.Egger@xxxxxxx>
Subject: RE: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen
From: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Date: Thu, 18 Sep 2008 23:17:20 +0800
Accept-language: en-US
Cc: "Tian, Kevin" <kevin.tian@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Shan, Haitao" <haitao.shan@xxxxxxxxx>, Gavin Maltby <Gavin.Maltby@xxxxxxx>, Haitao Shan <maillists.shan@xxxxxxxxx>
Delivery-date: Thu, 18 Sep 2008 08:18:31 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <C4F7D9D0.27284%keir.fraser@xxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <E2263E4A5B2284449EEBD0AAB751098401ABC43AD4@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <C4F7D9D0.27284%keir.fraser@xxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AckYqfCMO5ppaQXoT/qQARCqfu4JZAAlReQAAAgCmh0AAD9SAAADnh1gAAv4C7A=
Thread-topic: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen
Keir Fraser <mailto:keir.fraser@xxxxxxxxxxxxx> wrote:
> On 18/9/08 09:13, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
>>>> Hmm, I think current NMI_MCE_SOFTIRQ can't make sure other guest will
>>>> not be scheduled. Considering there is a schedule softirq already
>>>> pending on the pCPU, other guest may run before the impacted guest. Did
>>>> I missed anything?
>>> There are races here in any case. What if #MC happens halfway through the
>>> scheduler, just before set_current(new)?
>> If MCE handler will not cause schedule and not change current, will any
>> issue happen?
> I'm not sure exactly what you mean. What *I* meant was that there are
> certain points during execution where, if a #MC occurs, it may not be
> possible to determine which single vCPU was running on the

The current implementation in k8_machine_check decides whether Xen is impacted 
by checking whether current is the idle domain, and it identifies the impacted 
domain through current. I don't know AMD's machine check mechanism, but support 
for Intel platforms may need to be a bit different. For example, Xen is impacted 
if an MCE caused by a synchronous event happens in Xen's context, even when 
current is not the idle domain. Also, the impacted domain may not always be 
determined by current; memory ownership may help to decide which domain is 
impacted.

Another difference we are considering is that we assume domU's MCA handler is 
not trusted. So, firstly, we may always need support from dom0's MCE handler; 
secondly, after the domU MCE handler runs, some guard may be needed to make 
sure the error is not triggered again.

> pCPU. I guess
> though that if you ever get unrecoverable errors reported while running
> inside the hypervisor, you probably can't recover anyway.

I think this may depend on the error type. If the error is an asynchronous 
event, it may be OK to continue after some containment. For example, if EIPV=0, 
RIPV=1 and ADDRV=1 and the #MC happens in Xen's execution context, it may be 
caused by some asynchronous event on the memory side. In that situation, we can 
kill the owner of the page (if the page is owned exclusively by one guest) and 
continue running. However, if it is a synchronous event such as EIPV=1, then we 
have to reset the system.
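
As a rough sketch of that decision in C (not Xen code): the bit positions are 
the architectural ones for IA32_MCG_STATUS and IA32_MCi_STATUS, while 
enum mce_action, classify_mce_in_xen() and the page_has_single_owner flag are 
made up for illustration. Falling back to a reset for every case not 
mentioned above is my assumption, chosen to be conservative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bits in IA32_MCG_STATUS and IA32_MCi_STATUS. */
#define MCG_STATUS_RIPV  (1ULL << 0)    /* restart IP valid */
#define MCG_STATUS_EIPV  (1ULL << 1)    /* error IP valid */
#define MCI_STATUS_ADDRV (1ULL << 58)   /* MCi_ADDR holds a valid address */

/* Hypothetical outcomes for a #MC taken in Xen's execution context. */
enum mce_action {
    MCE_KILL_OWNER_AND_CONTINUE,
    MCE_RESET_SYSTEM
};

enum mce_action classify_mce_in_xen(uint64_t mcg_status, uint64_t mci_status,
                                    bool page_has_single_owner)
{
    /* EIPV=1: the error is tied to the interrupted instruction stream. */
    if (mcg_status & MCG_STATUS_EIPV)
        return MCE_RESET_SYSTEM;

    /* EIPV=0, RIPV=1, ADDRV=1: likely an asynchronous memory-side error. */
    if ((mcg_status & MCG_STATUS_RIPV) &&
        (mci_status & MCI_STATUS_ADDRV) &&
        page_has_single_owner)
        return MCE_KILL_OWNER_AND_CONTINUE;

    /* Assumption: anything else is treated as unrecoverable. */
    return MCE_RESET_SYSTEM;
}
```

The caller would still have to look up the page owner from the address in 
MCi_ADDR; that lookup is left out here.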

> -- Keir
