WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/

To: xen-devel@xxxxxxxxxxxxxxxxxxx, Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Subject: [Xen-devel] States for correct establishment and tear-down of an inter-domain communication channel.
From: harry <harry@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 24 Oct 2005 15:46:27 +0100
Delivery-date: Mon, 24 Oct 2005 14:43:34 +0000
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1129547697.32584.61.camel@xxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <1129547697.32584.61.camel@xxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
I've attached a diagram of the state machine I came up with. It's fairly
subtle, and this is the fourth attempt, so there's a chance that this
version is also somehow fundamentally flawed.

In the end I went with a 'ready' node and used the event-channel and
ring-reference to represent the connected state.  This is because the
state machine doesn't make use of the info until the other side gets to
the connected state.

The green states are the normal path, the red states are internal
errors, the orange states are when we are disconnecting, and the blue
state is where we disconnect in order to go around and reconnect again
because the remote side has reconnected.

The state machine would be simpler without the internal errors.

Here are some comments about the state machine (cut and pasted from the
implementation) to aid in understanding:

/*****************************************************************************
 * The state machine below makes the following assumptions:
 *
 * 1) The store might contain some stale cruft from the last time our device
 *    driver failed.  This is a handy assumption for testing new versions of
 *    the driver but isn't strictly necessary.
 *
 * 2) If the other side generates a protocol error on the inter-domain
 *    connection then we attempt to disconnect and reconnect.  An alternative
 *    behaviour would be to wait for our interface to be called to disconnect
 *    and then reconnect before retrying.
 *
 * 3) If we experience an internal failure (failing to register a watch, for
 *    example) then we attempt to disconnect and wait for our interface to be
 *    called to disconnect (by module unload, for example) and reconnect
 *    before retrying.
 *
 * 4) Connection and disconnection of the channel are done in two phases: the
 *    first phase makes the local resources available to the remote side; the
 *    second phase uses the resources of the remote side to complete the
 *    connection.
 *
 * The key for the stimuli is as follows:
 *
 * cn: interface called to connect the channel when the interface state is
 *     disconnected (for example on module load).
 *
 * pe: protocol error detected when the channel is in a state between phase
 *     two connected and the completion of the phase two disconnect callback.
 *
 * dn: interface called to disconnect the channel when the interface state
 *     is connected (for example on module unload).
 *
 * ou: a synchronous stimulus from the response test_other_state which
 *     indicates that the other state is still unknown because the watch
 *     callback hasn't happened yet.  Can only happen when making the
 *     test_other_state response.
 *
 * od: other state is disconnected.  This is both a synchronous stimulus
 *     from test_other_state and an asynchronous stimulus from the watch
 *     function.  Disconnected means that we can't read even the other
 *     side's ready node from the store.
 *
 * or: other state is ready.  This is both a synchronous stimulus from
 *     test_other_state and an asynchronous stimulus from the watch function.
 *     Ready means that we can see the other side's ready node but not the
 *     ring-reference and event-channel information.
 *
 * oc: other state is connected.  This is both a synchronous stimulus from
 *     test_other_state and an asynchronous stimulus from the watch function.
 *     Connected means that we found both the ready node and the connected
 *     information in the store.
 *
 * If the values of the connected information change while the other side is
 * connected then we generate the 'oc' stimulus again, which forces a
 * reconnect.
 *
 * rs: an asynchronous response was successful (we only make one response at
 *     a time so all asynchronous responses have the same completion stimuli).
 *
 * rf: an asynchronous response failed.  Only register_watch, clear_store,
 *     write_ready, write_connected and phase_two_connect can fail.  Only
 *     phase_two_connect has a good reason for failure: the other side might
 *     have passed bogus parameters; the other failures are poor API design
 *     and ought to be promoted to domain failures.
 *
 * The state machine responses are as follows:
 *
 * test_other_state: what state we currently think the other side is in,
 *     as reflected by the last watch event.  Synchronous (called with the
 *     lock held); completes with ou/od/or/oc.
 *
 * register_watch: register a watch on the other side.
 *
 * unregister_watch: unregister the watch.
 *
 * clear_store: remove the ready node and connected information.
 *
 * write_ready: write the ready node to the store.
 *
 * write_connected: write the connected information to the store.
 *
 * phase_one_connect: grant the remote side access to the local page etc.
 *
 * phase_two_connect: map the remote page etc.
 *
 * phase_two_disconnect: unmap the remote page.
 *
 * phase_one_disconnect: revoke the access of the remote side.
 *
 * complete_disconnect: when our interface is called to get us to disconnect
 *     the channel we quiesce, disconnect and then call this to indicate we
 *     are done.
 *****************************************************************************/
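As a rough illustration of how the stimuli above might drive a table-driven
machine, here is a minimal sketch in C.  The state names and the handful of
transitions are hypothetical (the full set is in the attached diagram); only
the stimulus names and the forced-reconnect behaviour from assumptions 2 and
3 come from the comments above.

```c
#include <assert.h>

/* Stimuli, named as in the comments above. */
enum stim { CN, PE, DN, OU, OD, OR, OC, RS, RF };

/* A hypothetical, much-reduced set of states; the real machine has many
 * more.  S_FAIL stands in for the internal-error (red) states and S_RECONN
 * for the blue go-around-and-reconnect state. */
enum state { S_DISC, S_CONN, S_DISCING, S_RECONN, S_FAIL, NSTATES };

/* next[state][stimulus]; -1 means "stimulus not expected in this state". */
static const int next[NSTATES][9] = {
    /*             cn      pe        dn         ou  od  or  oc        rs      rf     */
    [S_DISC]    = { S_CONN, -1,       -1,        -1, -1, -1, -1,       -1,     -1     },
    [S_CONN]    = { -1,     S_RECONN, S_DISCING, -1, -1, -1, S_RECONN, -1,     -1     },
    [S_DISCING] = { -1,     -1,       -1,        -1, -1, -1, -1,       S_DISC, S_FAIL },
    [S_RECONN]  = { -1,     -1,       S_DISCING, -1, -1, -1, -1,       S_CONN, S_FAIL },
    [S_FAIL]    = { -1,     -1,       S_DISC,    -1, -1, -1, -1,       -1,     -1     },
};

static int step(int s, enum stim e)
{
    return next[s][e];
}
```

Note how a repeated 'oc' and a 'pe' both route the connected state through
the reconnect path, and how the failure state only leaves on 'dn', matching
assumptions 2 and 3.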

Enjoy,

Harry.

On Mon, 2005-10-17 at 21:14 +1000, Rusty Russell wrote:
> Hi Harry,
> 
>       Did some more thinking about state diagram.  It is far simpler if (1)
> we assume that failures all terminate the device (ie. wait for device
> deletion), and (2) simply treat all changes the same, whether a resume
> (backend change) or tools changing some configuration stuff.  It's not
> complete, but useful for seeing what a nicer xenbus interface would look
> like.  I think something like the following:
> 
>       ->create()
>               // Allocate, read tool-written fields
>               // Xenbus watches backend
> 
>       ->open()
>               // Write fields for backend to read
>               // Xenbus ensures backend not connected anymore
> 
>       ->close()
>               // Abort connection to backend, remove fields for be
>               // Xenbus removes connected field
> 
>       ->change()
>               // Re-read tool-written fields.
> 
>       ->destroy()
>               // Deallocate.
> 
> Grammar:
>       LIFE := create() CONN destroy()
>       CONN := change()* OPENCLOSE change()*
>       OPENCLOSE := open() change()* close()
> 
> Anyway, here's the first cut (haven't sent out yet, since my brain is
> still a little fried and I want to sleep on it).
> 
> State Transition Diagram for Xen Skeleton Front End Device
> 
> Events:
> da: Device appears (tools create directory in store w/ initial fields incl. 
> backend)
> dd: Device destruction (tools remove directory from store)
> dc: Device changes (tools alter fields in store)
> db: Backend changes (restore)
> rm: Module remove
> 
> Unless otherwise referenced, failure puts into "fail" state, which can
> only be resolved by destroying the device.
> 
> i: Initial state
>       Device does not exist, only one event possible.
> 
>       da: goto i_da: read initial fields, watch backend
>       (be: backend info exists, bd: backend info doesn't exist)
> 
> i_da:
>       We need to make sure backend isn't still connected so it
>       notices us coming up.  Check if backend has "connected" node
>       (ce: "connected" exists, cd: "connected" doesn't exist)
> 
>       ce: goto i_da: report error that other end still connected
>       cd: goto i_da_cd: create and write info for backend
>       db: goto i_da: move watch to new backend
>       rm: goto i_da_rm: unwatch backend, free resources
> 
> i_da_cd:
>       Backend is not connected to old frontend, we can set up.
> 
>       dd: goto i: delete info for backend, unwatch backend, free resources
>       dc: goto i_da_cd: update fe info
>       db: goto i_da: move watch to new backend, delete info for backend
>       be: goto i_da_cd_be: read and store backend info,
>               write "connected" field
>       rm: goto i_da_rm: remove info for backend, unwatch backend,
>               free resources
> 
> i_da_cd_be:
>       Fully connected.
> 
>       dd: goto i: unwatch backend, free resources
>       dc: goto i_da_cd_be: update fe info
>       db: goto i_da: abort connection, move watch to new backend, 
>               remove info for backend
>       bd: goto i_da: remove "connected", remove info for backend,
>               abort connection
>       rm: goto i_da_rm: remove "connected", abort connection, remove info for 
> backend, unwatch backend, free resources
> 
> i_da_rm:
>       Remove module
> 

Attachment: xenidc_xbgt_channel_enumeration.ps
Description: PostScript document

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel