xen-devel
RE: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen
To: Christoph Egger <Christoph.Egger@xxxxxxx>, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>
Subject: RE: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen
From: "Ke, Liping" <liping.ke@xxxxxxxxx>
Date: Wed, 17 Sep 2008 21:14:59 +0800
Accept-language: en-US
Acceptlanguage: en-US
Cc: "Tian, Kevin" <kevin.tian@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Shan, Haitao" <haitao.shan@xxxxxxxxx>, Gavin Maltby <Gavin.Maltby@xxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>, Haitao Shan <maillists.shan@xxxxxxxxx>
Delivery-date: Wed, 17 Sep 2008 06:15:46 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <200809171143.32398.Christoph.Egger@xxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <C4EEE682.2707B%keir.fraser@xxxxxxxxxxxxx> <48D0C868.76E4.0078.0@xxxxxxxxxx> <E2263E4A5B2284449EEBD0AAB751098401ABBE479B@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <200809171143.32398.Christoph.Egger@xxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AckYqfbmTx3tPrAaQmW3RXRU0ywypgAGgJtQ
Thread-topic: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen
Hi, Egger
Thanks a lot for your answer. I just looked through your patch and would like
some help:
1. When an MCE happens, will all of the cores (even those in a different
socket) be brought into the MCE handler on AMD platforms (such as K8) too? That
is, will every core enter and execute the k8_machine_check handler, so that for
a single MCE the handler is entered N (number of cores) times?
2. When doing send_guest_trap, suppose dom0->vcpu0->processor = 0 while in
nmi_mce_softirq cur_cpu = 1. When setting the affinity, should we bind
dom0->vcpu0 to cur_cpu (1) instead of its original binding [cpu_set(cpu,
affinity) vs cpu_set(st->processor, affinity)]? Otherwise the affinity does not
change and there is nothing to restore? I am not sure about this.
3. If several vcpus (belonging to dom0 or other domains) are running when the
MCA happens, and we don't pause all vcpus other than dom0.vcpu0 and put them
into idle, then those vcpus may still be scheduled in an unstable environment?
At the same time, might other pcpus be in the k8_machine_check handler
concurrently?
Thanks a lot for your answer!
Regards,
Criping
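
To make question 2 concrete, here is a toy model of the two candidate pin targets. It is only a sketch and not Xen code: toy_cpumask_t, struct toy_vcpu, toy_mask_of, toy_pin and toy_restore are all hypothetical names invented for illustration, and the real cpu_set/affinity handling in the patch may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a CPU mask: bit n set means "may run on CPU n". */
typedef uint64_t toy_cpumask_t;

/* Hypothetical miniature of a vcpu: where it runs now, where it may run. */
struct toy_vcpu {
    unsigned int processor;
    toy_cpumask_t affinity;
};

/* Build a mask containing only the given CPU (toy limit: 64 CPUs). */
static toy_cpumask_t toy_mask_of(unsigned int cpu)
{
    assert(cpu < 64);
    return (toy_cpumask_t)1 << cpu;
}

/*
 * Pin v to target_cpu, saving the previous mask in *saved.
 * Returns 1 if the vcpu had to move, 0 if it was already on target_cpu.
 */
static int toy_pin(struct toy_vcpu *v, unsigned int target_cpu,
                   toy_cpumask_t *saved)
{
    int moved = (v->processor != target_cpu);

    *saved = v->affinity;
    v->affinity = toy_mask_of(target_cpu);
    v->processor = target_cpu;
    return moved;
}

/* Put back the mask that toy_pin saved. */
static void toy_restore(struct toy_vcpu *v, toy_cpumask_t saved)
{
    v->affinity = saved;
}
```

With processor = 0 and the softirq running on CPU 1, toy_pin(&v, v.processor, &saved) leaves the vcpu where it is, while toy_pin(&v, 1, &saved) moves it to the CPU handling the event. In this toy model the mask is narrowed in both cases, so the saved value is restored either way.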
-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Christoph Egger
Sent: 17 September 2008 17:44
To: Jiang, Yunhong
Cc: Tian, Kevin; xen-devel@xxxxxxxxxxxxxxxxxxx; Shan, Haitao; Gavin Maltby;
Keir Fraser; Haitao Shan
Subject: Re: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen
On Wednesday 17 September 2008 11:20:57 Jiang, Yunhong wrote:
> >-----Original Message-----
> >From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> >[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Jan Beulich
> >Sent: 17 September 2008 15:06
> >To: Christoph Egger; Gavin Maltby
> >Cc: Haitao Shan; Tian, Kevin; xen-devel@xxxxxxxxxxxxxxxxxxx;
> >Shan, Haitao; Keir Fraser
> >Subject: Re: [Xen-devel] Re: [PATCH 1/4] CPU online/offline
> >support in Xen
> >
> >>>> Gavin Maltby <Gavin.Maltby@xxxxxxx> 17.09.08 06:17 >>>
> >>
> >>I don't see this as a problem for machine check correctness.
> >>
> >>If dom0 asks to offline a cpu (because it believes the cpu is busted and
> >>a threat to uptime), that decision is fundamentally asynchronous
> >>to the actual error handling that occurred at machine check exception
> >>time:
> >>
> >> - running in whatever context
> >> - MCE occurs
> >> - trap to hypervisor MCE handler
> >> . this decides on hypervisor panic, or other appropriate
> >> immediate (in handler) response
> >> . telemetry forwarded to dom0 for logging and analysis
> >> - assume no hypervisor panic
> >> - eons pass during which any unconstrained bad data remaining
> >> after initial handling may go anywhere
> >> - dom0 gets telemetry and let's say diagnoses a fault and
> >> decides to call back into the hypervisor to offline the
> >> offending cpu
> >>
> >>Note the "eons pass" bit; tonnes of instructions may run on the
> >>bad cpu in this time, and a few more for some offline delay won't
> >>hurt.
> >
> >Shouldn't this possibly be handled the other way around: If a recoverable
> >MCE happened, immediately stop scheduling anything on the affected
> >CPU(s), until Dom0 tells you otherwise (and of course as long as there
> >remains at least one CPU to run on).
>
> Current MCE handling in Xen has no mechanism to achieve this.
It has since c/s 17968.
Christoph
--
AMD Saxony, Dresden, Germany
Operating System Research Center
Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Registered office (business address):
Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register court Dresden: HRA 4896
General partner authorized to represent the company:
AMD Saxony LLC (registered office Wilmington, Delaware, USA)
Managing directors of AMD Saxony LLC:
Dr. Hans-R. Deppe, Thomas McCoy
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel