WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-ia64-devel

Re: [Xen-ia64-devel] [RFC] MCA handler support for Xen/ia64

To: Alex Williamson <alex.williamson@xxxxxx>
Subject: Re: [Xen-ia64-devel] [RFC] MCA handler support for Xen/ia64
From: Masaki Kanno <kanno.masaki@xxxxxxxxxxxxxx>
Date: Fri, 28 Jul 2006 21:12:15 +0900
Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Fri, 28 Jul 2006 05:13:13 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <1152766770.5675.61.camel@lappy>
List-help: <mailto:xen-ia64-devel-request@lists.xensource.com?subject=help>
List-id: Discussion of the ia64 port of Xen <xen-ia64-devel.lists.xensource.com>
List-post: <mailto:xen-ia64-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-ia64-devel>, <mailto:xen-ia64-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-ia64-devel>, <mailto:xen-ia64-devel-request@lists.xensource.com?subject=unsubscribe>
References: <8DC6A596BD1093kanno.masaki@xxxxxxxxxxxxxx> <1152766770.5675.61.camel@lappy>
Sender: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
Hi Alex,

We are very sorry to have kept you waiting for so long.

>Hi Masaki,
>
>   Thanks for the write-up, generally looks like a good approach to me.
>A few comments and questions:
>
>   How do you plan to handle the mismatch between dom0's vCPUs and the
>pCPUs reporting errors.  For instance, will all pCPU's CMCs be injected
>into dom0 vCPU0?  Will all CPE records be returned from all pCPUs when
>dom0 does a SAL_GET_STATE_INFO from vCPU0?  SAL_GET_STATE_INFO_SIZE may
>need to return the platform state info size * number of pCPUs to allow
>dom0 enough space to save the records.  On big SMP systems we need to
>make sure that's not more than can reasonable be allocated in the kernel
>by dom0.
>

Our design is to inject all CMCs/CPEs into dom0 vCPU0. I think this is 
sufficient because the goal of this initial support is logging of 
hardware errors, not recovery. See the detailed flow below.

  Step1: Xen receives CMC/CPE interrupts(1)(2) from each pCPU, and 
         queues(3)(4) these interrupts.

    +--------------------------------------+
    | +-CMC/CPE handler------------------+ |dom0
    | |                                  | |
    | +----------------------------------+ |
    +--------------------------------------+
    | +-vCPU0-+                            |Xen
    | |       |  +-------+  +-------+      |
    | |      ----> pCPU0 ---> pCPU1 |      |
    | |       |  +-------+  +-------+      |
    | +-------+  |status |  |status |      |
    |            +-------+  +-------+      |
    |                   A          A       |
    | +-CMC/CPE handler-|----------|-----+ |
    | |                 |(3)       |(4)  | |
    | |            queues interrupts     | |
    | |           with a handling state  | |
    | |         A                A       | |
    | +---------|----------------|-------+ |
    +-----------|----------------|---------+
    | +-pCPU0-+ |      +-pCPU1-+ |         |Hardware
    | |   (1)---+      |   (2)---+         |
    | +-+-----+        +-+-----+           |
    |   |                |                 |
    | +-+-----+        +-+-----+           |
    | |record0|        |record1|           |
    | +-------+        +-------+           |
    +--------------------------------------+
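
The queueing in Step1 could be sketched in C roughly as follows. This is 
only an illustration under assumptions: every name here (pending_err, 
err_queue, errq_push and so on) is hypothetical and not a Xen identifier, 
the queue bound is arbitrary, and real hypervisor code would also need 
locking around the queue.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; none of them is a Xen identifier. */
enum err_type  { ERR_CMC, ERR_CPE };
enum err_state { ST_QUEUED, ST_INJECTED, ST_FETCHED };

struct pending_err {
    int            pcpu;   /* physical CPU that raised the interrupt */
    enum err_type  type;   /* CMC or CPE */
    enum err_state state;  /* the "handling state" kept with each entry */
};

#define MAX_PENDING 64     /* assumed bound, chosen only for the sketch */

struct err_queue {
    struct pending_err slot[MAX_PENDING];
    size_t head;           /* index of the oldest entry */
    size_t count;          /* number of queued entries */
};

/* Called from the Xen-side CMC/CPE handler, steps (3)/(4).
 * Returns 0 on success, -1 if the queue is full. */
int errq_push(struct err_queue *q, int pcpu, enum err_type type)
{
    if (q->count == MAX_PENDING)
        return -1;
    size_t tail = (q->head + q->count) % MAX_PENDING;
    q->slot[tail].pcpu  = pcpu;
    q->slot[tail].type  = type;
    q->slot[tail].state = ST_QUEUED;
    q->count++;
    return 0;
}

/* Oldest queued entry, or NULL if the queue is empty. */
struct pending_err *errq_front(struct err_queue *q)
{
    return q->count ? &q->slot[q->head] : NULL;
}

/* Drop the oldest entry once it has been fully handled. */
void errq_pop(struct err_queue *q)
{
    assert(q->count > 0);
    q->head = (q->head + 1) % MAX_PENDING;
    q->count--;
}
```

A FIFO keeps the records in arrival order, which matches injecting them 
into dom0 vCPU0 one after another.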

  Step2: Xen injects(5) the queued CMC/CPEs into dom0 vCPU0 one at a 
         time. Dom0 then issues(6) SAL_GET_STATE_INFO.

    +--------------------------------------+
    | +-CMC/CPE handler------------------+ |dom0
    | |                                  | |
    | | SAL_GET_STATE_INFO               | |
    | | (6)                              | |
    | |  |      A                        | |
    | +--|------|------------------------+ |
    +--(trap)---|--------------------------+
    |    |      |                          |Xen
    |    V      |                          |
    | +-vCPU0-+ |                          |
    | |   (5)---+                          |
    | |       |  +-------+  +-------+      |
    | |      ----> pCPU0 ---> pCPU1 |      |
    | |       |  +-------+  +-------+      |
    | +-------+  |status |  |status |      |
    |            +-------+  +-------+      |
    +--------------------------------------+
    | +-pCPU0-+        +-pCPU1-+           |Hardware
    | |       |        |       |           |
    | +-+-----+        +-+-----+           |
    |   |                |                 |
    | +-+-----+        +-+-----+           |
    | |record0|        |record1|           |
    | +-------+        +-------+           |
    +--------------------------------------+

  Step3: Xen traps this SAL call.
         If the pCPU that holds the SAL record is the pCPU the vCPU 
         is running on, Xen issues(7) a normal SAL call on that pCPU.
         Xen copies(8) the SAL record to dom0.

    +--------------------------------------+
    | +-CMC/CPE handler------------------+ |dom0
    | |                                  | |
    | |        (8) Get SAL record        | |
    | |         A                        | |
    | +---------|------------------------+ |
    +-----------|--------------------------+
    | +-vCPU0-+ |                          |Xen
    | |       | |                          |
    | |       | |  +-------+  +-------+    |
    | |      ---|--> pCPU0 ---> pCPU1 |    |
    | |       | |  +-------+  +-------+    |
    | +-------+ |  |status |  |status |    |
    |           |  +-------+  +-------+    |
    |           |                          |
    | SAL_GET_STATE_INFO                   |
    |   (7)     |                          |
    |    |   [Buffer]                      |
    |    |      A                          |
    +----|------|--------------------------+
    |    V      |                          |Hardware
    | +-pCPU0-+ |      +-pCPU1-+           |
    | |       | |      |       |           |
    | +-+-----+ |      +-+-----+           |
    |   |       |        |                 |
    | +-+-----+ |      +-+-----+           |
    | |record0+-+      |record1|           |
    | +-------+        +-------+           |
    +--------------------------------------+

  Step4: Dom0 issues(9) SAL_CLEAR_STATE_INFO. 

    +--------------------------------------+
    | +-CMC/CPE handler------------------+ |dom0
    | |                                  | |
    | | SAL_CLEAR_STATE_INFO             | |
    | | (9)                              | |
    | |  |                               | |
    | +--|-------------------------------+ |
    +--(trap)------------------------------+
    |    |                                 |Xen
    |    V                                 |
    | +-vCPU0-+                            |
    | |       |  +-------+  +-------+      |
    | |      ----> pCPU0 ---> pCPU1 |      |
    | |       |  +-------+  +-------+      |
    | +-------+  |status |  |status |      |
    |            +-------+  +-------+      |
    +--------------------------------------+
    | +-pCPU0-+        +-pCPU1-+           |Hardware
    | |       |        |       |           |
    | +-+-----+        +-+-----+           |
    |   |                |                 |
    | +-+-----+        +-+-----+           |
    | |record0|        |record1|           |
    | +-------+        +-------+           |
    +--------------------------------------+

  Step5: Xen traps this SAL call.
         If the pCPU whose SAL record is to be cleared is the pCPU 
         the vCPU is running on, Xen issues(10) a normal SAL call on 
         that pCPU.
         Xen frees(11) the pCPU0 information.

    +--------------------------------------+
    | +-CMC/CPE handler------------------+ |dom0
    | |                                  | |
    | +----------------------------------+ |
    +--------------------------------------+
    | +-vCPU0-+                            |Xen
    | |       |                            |
    | |       |               +-------+    |
    | |      -----------------> pCPU1 |    |
    | |       |     (11)      +-------+    |
    | +-------+               |status |    |
    |                         +-------+    |
    | SAL_CLEAR_STATE_INFO                 |
    |   (10)                               |
    +----|---------------------------------+
    |    V                                 |Hardware
    | +-pCPU0-+        +-pCPU1-+           |
    | |       |        |       |           |
    | +-------+        +-+-----+           |
    |                    |                 |
    |                  +-+-----+           |
    |                  |record1|           |
    |                  +-------+           |
    +--------------------------------------+

  Step6: Xen injects(12) the next CMC/CPE into dom0 vCPU0. 
         Dom0 then issues(13) SAL_GET_STATE_INFO.

    +--------------------------------------+
    | +-CMC/CPE handler------------------+ |dom0
    | |                                  | |
    | | SAL_GET_STATE_INFO               | |
    | | (13)                             | |
    | |  |      A                        | |
    | +--|------|------------------------+ |
    +--(trap)---|--------------------------+
    |    |      |                          |Xen
    |    V      |                          |
    | +-vCPU0-+ |                          |
    | |  (12)---+                          |
    | |       |             +-------+      |
    | |      ---------------> pCPU1 |      |
    | |       |             +-------+      |
    | +-------+             |status |      |
    |                       +-------+      |
    +--------------------------------------+
    | +-pCPU0-+        +-pCPU1-+           |Hardware
    | |       |        |       |           |
    | +-------+        +-+-----+           |
    |                    |                 |
    |                  +-+-----+           |
    |                  |record1|           |
    |                  +-------+           |
    +--------------------------------------+

  Step7: Xen traps this SAL call.
         If the pCPU that holds the SAL record is not the pCPU the 
         vCPU is running on, Xen sends(14) an IPI to the other pCPU, 
         and Xen on that pCPU issues(15) the SAL call.
         Xen copies(16) the SAL record to dom0.

    +--------------------------------------+
    | +-CMC/CPE handler------------------+ |dom0
    | |                                  | |
    | |        (16) Get SAL record       | |
    | |         A                        | |
    | +---------|------------------------+ |
    +-----------|--------------------------+
    | +-vCPU0-+ |                          |Xen
    | |       | |                          |
    | |       | |             +-------+    |
    | |      ---|-------------> pCPU1 |    |
    | |       | |             +-------+    |
    | +-------+ |             |status |    |
    |           |             +-------+    |
    |           |                          |
    |           |      SAL_GET_STATE_INFO  |
    | send IPI  |           (15)           |
    |   (14)    |         A  |             |
    |    |   [Buffer]     |  |             |
    |    |      A         |  |             |
    |    |      |         |  |             |
    +----|------|---------|--|-------------+
    |    |      +---------------------+    |Hardware
    |    |                |  |        |    |
    |    V                |  V        |    |
    | +-pCPU0-+        +-pCPU1-+      |    |
    | |       |------->|       |      |    |
    | +-------+        +-+-----+      |    |
    |                    |            |    |
    |                  +-+-----+      |    |
    |                  |record1+------+    |
    |                  +-------+           |
    +--------------------------------------+
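
The choice between the local path of Step3 and the IPI path of Step7 
could be sketched in C as below. This is a simulation under assumptions, 
not Xen code: sal_get_state_info_local and run_on_pcpu are hypothetical 
stand-ins for the real SAL call and for an IPI-based cross call, and the 
records are plain strings.

```c
#include <assert.h>
#include <string.h>

#define NR_PCPUS   2
#define RECORD_MAX 32

/* Simulated per-pCPU SAL records (stand-in for firmware state). */
char sal_record[NR_PCPUS][RECORD_MAX] = { "record0", "record1" };
int  ipi_count;  /* number of simulated IPIs sent so far */

/* Hypothetical stand-in for SAL_GET_STATE_INFO: in this sketch it can
 * only read the record of the pCPU it is executed on. */
size_t sal_get_state_info_local(int this_pcpu, char *buf)
{
    size_t len = strlen(sal_record[this_pcpu]) + 1;
    memcpy(buf, sal_record[this_pcpu], len);
    return len;
}

/* Hypothetical stand-in for "send an IPI, let the target pCPU issue
 * the SAL call, wait for the result", steps (14) and (15). */
size_t run_on_pcpu(int target_pcpu, char *buf)
{
    ipi_count++;
    return sal_get_state_info_local(target_pcpu, buf);
}

/* Emulation of the SAL_GET_STATE_INFO call that dom0 issued.
 * current_pcpu: pCPU that dom0 vCPU0 is running on.
 * owner_pcpu:   pCPU whose record is at the head of the queue.
 * Returns the number of bytes copied into dom0_buf. */
size_t emulate_get_state_info(int current_pcpu, int owner_pcpu,
                              char *dom0_buf)
{
    char xen_buf[RECORD_MAX];   /* the [Buffer] inside Xen */
    size_t len;

    if (owner_pcpu == current_pcpu)
        len = sal_get_state_info_local(current_pcpu, xen_buf); /* (7) */
    else
        len = run_on_pcpu(owner_pcpu, xen_buf);          /* (14)(15) */

    memcpy(dom0_buf, xen_buf, len);                      /* (8)/(16) */
    return len;
}
```

In both paths dom0 sees a single ordinary SAL_GET_STATE_INFO result and 
does not need to know which pCPU produced the record.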

  Step8: Dom0 issues(17) SAL_CLEAR_STATE_INFO. 

    +--------------------------------------+
    | +-CMC/CPE handler------------------+ |dom0
    | |                                  | |
    | | SAL_CLEAR_STATE_INFO             | |
    | | (17)                             | |
    | |  |                               | |
    | +--|-------------------------------+ |
    +--(trap)------------------------------+
    |    |                                 |Xen
    |    V                                 |
    | +-vCPU0-+                            |
    | |       |             +-------+      |
    | |      ---------------> pCPU1 |      |
    | |       |             +-------+      |
    | +-------+             |status |      |
    |                       +-------+      |
    +--------------------------------------+
    | +-pCPU0-+        +-pCPU1-+           |Hardware
    | |       |        |       |           |
    | +-------+        +-+-----+           |
    |                    |                 |
    |                  +-+-----+           |
    |                  |record1|           |
    |                  +-------+           |
    +--------------------------------------+

  Step9: Xen traps this SAL call.
         If the pCPU whose SAL record is to be cleared is not the 
         pCPU the vCPU is running on, Xen sends(18) an IPI to the 
         other pCPU, and Xen on that pCPU issues(19) the SAL call.
         Xen frees(20) the pCPU1 information.

    +--------------------------------------+
    | +-CMC/CPE handler------------------+ |dom0
    | |                                  | |
    | +----------------------------------+ |
    +--------------------------------------+
    | +-vCPU0-+                            |Xen
    | |       |                            |
    | |       |   (20)                     |
    | |       |                            |
    | |       |                            |
    | +-------+                            |
    |                SAL_CLEAR_STATE_INFO  |
    | send IPI              (19)           |
    |   (18)              A  |             |
    +----|----------------|--|-------------+
    |    |                |  |             |Hardware
    |    V                |  V             |
    | +-pCPU0-+        +-pCPU1-+           |
    | |       |------->|       |           |
    | +-------+        +-------+           |
    +--------------------------------------+
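
The ordering of Step4 to Step9 (inject one error, clear the hardware 
record only when dom0 asks for it, then free the entry and move on to 
the next pCPU) could be sketched in C like this. The tracker structure 
and function names are hypothetical, and the firmware record is simulated 
with a boolean flag.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PCPUS 4

/* Hypothetical bookkeeping; none of these names come from Xen. */
struct tracker {
    int  fifo[NR_PCPUS];      /* pCPUs with a pending record, oldest first */
    int  n;                   /* number of entries in fifo */
    bool in_flight;           /* an error was injected and not yet cleared */
    bool hw_record[NR_PCPUS]; /* simulated: firmware still holds a record */
};

/* Xen-side CMC/CPE interrupt: remember the pCPU, leave the record alone. */
void on_error_interrupt(struct tracker *t, int pcpu)
{
    t->hw_record[pcpu] = true;
    for (int i = 0; i < t->n; i++)
        if (t->fifo[i] == pcpu)
            return;                     /* already queued */
    t->fifo[t->n++] = pcpu;
}

/* Returns the pCPU whose error should be injected into dom0 vCPU0 now,
 * or -1 if nothing is pending or an earlier one is still being handled. */
int try_inject(struct tracker *t)
{
    if (t->in_flight || t->n == 0)
        return -1;
    t->in_flight = true;
    return t->fifo[0];
}

/* dom0 issued SAL_CLEAR_STATE_INFO: only now is the hardware record
 * cleared (10)/(19) and the queue entry freed (11)/(20).
 * Returns the pCPU that was cleared, or -1 if nothing was in flight. */
int on_dom0_clear(struct tracker *t)
{
    if (!t->in_flight)
        return -1;
    int pcpu = t->fifo[0];
    t->hw_record[pcpu] = false;
    for (int i = 1; i < t->n; i++)      /* shift the FIFO down by one */
        t->fifo[i - 1] = t->fifo[i];
    t->n--;
    t->in_flight = false;
    return pcpu;
}
```

Because the record stays in firmware until on_dom0_clear() runs, it is 
not lost if dom0 never retrieves it.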

>   What about clearing error records?  We need to be careful that error
>records read by Xen and cleared before being passed to dom0 are volatile
>and could be lost if the system crashes or if dom0 doesn't retrieve
>them.  It's best to only clear the log after the error record has been
>received by dom0 and dom0 issues a SAL_CLEAR_STATE_INFO.  This will get
>complicated if we need to clear error records on all pCPUs in response
>to a SAL_CLEAR_STATE_INFO on dom0 vCPU0.
>

In our new design, Xen issues SAL_CLEAR_STATE_INFO only in response to 
the SAL_CLEAR_STATE_INFO that dom0 issues (Step4-5 and Step8-9 above).

>   Do you plan to support CMC and CPE throttling in Xen (ie. switching
>between interrupt driven and polling handlers under load) and dynamic
>polling intervals?
>

Yes, our design supports CMC and CPE throttling in Xen, as well as 
dynamic polling intervals. We think that Xen must not go down or slow 
down when CMC and CPE interrupts arrive at a high rate.
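
One possible shape for such throttling, as a rough C sketch: switch from 
interrupt mode to polling when several interrupts arrive within a short 
window, and lengthen the polling interval while polls find nothing. The 
thresholds, the structure and the function names are all assumptions 
made for this illustration; they are not values from our design.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative thresholds only (assumptions, not design values). */
#define HISTORY_LEN     5       /* interrupts considered for a storm */
#define STORM_WINDOW_MS 1000UL  /* HISTORY_LEN interrupts within this = storm */
#define POLL_MIN_MS     1000UL  /* shortest polling interval */
#define POLL_MAX_MS     60000UL /* longest polling interval */

struct throttle {
    unsigned long history[HISTORY_LEN]; /* recent interrupt times, in ms */
    unsigned long seen;                 /* interrupts recorded so far */
    bool          polling;              /* true: interrupt masked, polling */
    unsigned long poll_interval_ms;
};

void throttle_init(struct throttle *t)
{
    *t = (struct throttle){0};
    t->poll_interval_ms = POLL_MIN_MS;
}

/* Record one interrupt at time now_ms.
 * Returns true if this interrupt caused a switch to polling mode. */
bool throttle_on_interrupt(struct throttle *t, unsigned long now_ms)
{
    if (t->polling)
        return false;                   /* interrupt is masked */
    t->history[t->seen % HISTORY_LEN] = now_ms;
    t->seen++;
    if (t->seen >= HISTORY_LEN) {
        /* After the increment, this slot holds the oldest timestamp. */
        unsigned long oldest = t->history[t->seen % HISTORY_LEN];
        if (now_ms - oldest < STORM_WINDOW_MS) {
            t->polling = true;
            t->poll_interval_ms = POLL_MIN_MS;
            return true;
        }
    }
    return false;
}

/* Called after each poll; found says whether the poll saw new records.
 * Returns the next polling interval in ms, or 0 when the handler
 * should go back to interrupt mode. */
unsigned long throttle_on_poll(struct throttle *t, bool found)
{
    if (found) {
        t->poll_interval_ms /= 2;       /* errors continue: poll faster */
        if (t->poll_interval_ms < POLL_MIN_MS)
            t->poll_interval_ms = POLL_MIN_MS;
    } else {
        t->poll_interval_ms *= 2;       /* quiet: back off */
        if (t->poll_interval_ms > POLL_MAX_MS) {
            t->polling = false;         /* quiet long enough */
            t->seen = 0;
            t->poll_interval_ms = POLL_MIN_MS;
            return 0;
        }
    }
    return t->poll_interval_ms;
}
```

With these example values, five interrupts within one second switch the 
handler to polling, and a run of empty polls eventually switches it back.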

>   It may be overly complicated to support CPEI on dom0 (fake MADT
>entries, trapping IOSAPIC write, maybe an entirely virtual IOSAPIC in
>order to describe a valid GSI for the CPEI, etc...).  Probably best to
>start out with just letting dom0 poll for CPE records.  Thanks,
>

Thanks for your advice. We are not very familiar with the MADT and 
IOSAPIC, so we would welcome advice from you and everyone else.
Your suggestion means modifying the dom0 Linux kernel (mca.c), doesn't 
it? If so, we will modify the dom0 Linux kernel so that CPE supports 
polling mode only.


BTW, a new member, Kaz, has joined our team.

>       Alex
>
>-- 
>Alex Williamson                             HP Open Source & Linux Org.

Best regards,
 Yutaka Ezaki(You)
 Kazuhiro Suzuki(Kaz)
 Masaki Kanno(Kan)



_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel