IPC Protocols: Difference between revisions

Line 30: Line 30:
== Protocol-definition Language (PDL) ==


A protocol has a ''server'' and a ''client''.  Loosely speaking for IPC, the ''client'' '''MAY BE''' the process that forks the ''server'' process.  Again loosely speaking, the ''server'' provides capabilities not inherent to the ''client''.
A protocol is between a ''parent'' actor and a ''child'' actor.  Child actors are not trusted, and if there is any evidence the child is misbehaving, it is terminated.  Code utilizing the child actor therefore will never see type or protocol errors; if they occur, the child is killed.  The parent actor must, however, carefully handle errors.


'''TODO''': this isn't a complete definition.  Forking a plugin process fits it, but a chrome-local addon to content doesn't.
A protocol is specified with respect to the parent; that is, messages the parent is allowed to receive are exactly the messages a child is allowed to send, and ''vice versa''.


A protocol is specified with respect to the server; that is, messages the server is allowed to receive are exactly the messages a client is allowed to send, and ''vice versa''.
A protocol consists of declarations of ''messages'' and specifications of ''state machine transitions''.  (This ties us to state-machine semantics.)  The message declarations are essentially type declarations for the transport layer, and the state machine transitions capture the semantics of the protocol itself.


A protocol consists of declarations of ''messages'' and specifications of ''state machine transitions''.  (This ties us to state-machine semantics.)  The message declarations are essentially type declarations for the transport layer, and the state machine transitions capture the semantics of the protocol itself.
A protocol is specified by first declaring the protocol


A protocol is specified by first naming the protocol
Protocol :: (\epsilon | 'sync' | 'rpc') 'protocol' ProtocolName '{' Messages Transitions '}'


  protocol Foo {
This implies the creation of <code>FooParent</code> and <code>FooChild</code> actors. (Hereafter, this document speaks from the perspective of the <code>FooParent</code>.)


This implies the creation of a <code>FooServer</code> and <code>FooClient</code>.  (Hereafter, this document speaks from the perspective of the <code>FooServer</code>.)
By default, protocols can only exchange asynchronous messages.  A protocol must explicitly allow synchronous and RPC (see below) messages by using the <code>sync</code> or <code>rpc</code> qualifiers.  This is enforced statically by the PDL type system.


Conceptually (but not necessarily syntactically) next are message definitions.  What underlying types we allow in these, and with what qualifiers, is likely to be a central topic of debate.  Another hot topic will likely be what message semantics we provide; possibilities are
Conceptually (but not necessarily syntactically) next are message definitions.  Message definitions are somewhat analogous to function signatures in C/C++.  Messages can have one of three semantics:
* '''asynchronous''': the sending actor neither expects nor listens for a response to the sent message
* '''synchronous''': the sending actor is completely blocked until it receives a response
* '''RPC++''': the sending actor is partially blocked until it receives a response to message ''m''.  It is only allowed to process RPC++ messages sent by the actor receiving ''m'', directly resulting from the receiving actor receiving ''m''.  (This is intended to model function call semantics.)


(From e-mail discussions, it appears that we may want RPC++ messages, excluding synchronous messages (since they're a special case of RPC++ messages).  However, I'm not convinced RPC++ is necessary, and I'm writing the strawman grammar below assuming we only require synchronous messages.)
Asynchronous messages are the default.  The list above is sorted by decreasing simplicity and efficiency; synchronous and RPC++ messages should not be used without a compelling reason.
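The rule that a protocol must opt in to <code>sync</code> or <code>rpc</code> messages can be sketched as a compile-time check.  This is only an illustration: the names <code>Semantics</code> and <code>messageAllowed</code> are hypothetical, and the sketch ''assumes'' that an <code>rpc</code> protocol may also declare synchronous messages, which the text above does not state.

```cpp
#include <cassert>

// Hypothetical names; PDL itself only defines the keywords 'sync' and 'rpc'.
// Ordered as in the list above: decreasing simplicity and efficiency.
enum class Semantics { Async = 0, Sync = 1, Rpc = 2 };

// Assumption: a qualifier admits every semantics at or below its own level,
// so an 'rpc' protocol may also declare sync and async messages.
constexpr bool messageAllowed(Semantics protocolQualifier, Semantics message) {
    return static_cast<int>(message) <= static_cast<int>(protocolQualifier);
}

// "Static" enforcement is modelled here by evaluating the rule at compile time.
static_assert(messageAllowed(Semantics::Async, Semantics::Async),
              "asynchronous messages are always allowed");
static_assert(!messageAllowed(Semantics::Async, Semantics::Sync),
              "an unqualified protocol rejects sync messages");
static_assert(messageAllowed(Semantics::Rpc, Semantics::Sync),
              "assumed: an rpc protocol admits sync messages");
```

A real PDL compiler would perform the equivalent check while type-checking the protocol specification rather than in C++.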


=== Strawman message grammar ===


   Message :: (SyncMessage | AsyncMessage) ';'
   Message :: (\epsilon | 'sync' | 'rpc') MessageBody ';'
  SyncMessage :: 'sync' ('in' | 'out') Type MessageName '(' MessageArguments ')'
   MessageBody :: Type MessageName '(' MessageArguments ')'
   AsyncMessage :: 'async' ('in' | 'out') MessageName '(' MessageArguments ')'
    
    
   MessageArguments :: (MessageArgument ',' | \epsilon)*
   MessageArguments :: (MessageArgument ',')* MessageArgument?
   MessageArgument :: Type Identifier
    
    
   Type :: SharingQualifier BasicType
   SharingQualifier :: ('share' | 'transfer' | \epsilon)
   SharingQualifier :: (\epsilon | 'transfer' | 'share')
   BasicType :: ('void' | 'int' | ... ???)
   BasicType :: BuiltinType | ImportedType


A few items are worth calling out.


* As mentioned above, will SyncMessage be sufficient for us?
SharingQualifiers define transport semantics for objects sent in a message.  By default, objects are sent "by value" (i.e., marshalled then unmarshalled).  How classes are marshalled is not a concern of the protocol layer, but it is very important nonetheless.  This is likely to be another security concern.  Large objects can also be transported through shared memory.
* SyncMessages have a return type, whereas AsyncMessages don't.
* SharingQualifiers are a discussion unto themselves.


'''share''' means that the object ''o'' named lives in shared memory, and is co-owned by the client and server.  If the receiving actor does not already co-own ''o'', it does after receiving the message.  A lower layer needs to enforce that this is implemented correctly:
The qualifier '''share''' means that the named object ''o'' lives in shared memory, and is co-owned by the parent and child actors.  If the receiving actor does not already co-own ''o'', it does after receiving the message.  A lower layer needs to enforce that this is implemented correctly:
# ''o'' lives in shared memory
# all objects reachable from ''o'' live in shared memory
# all accesses to members of ''o'' are synchronized across the client and server


'''transfer''' means that the sending actor owns ''o'', and when the receiving actor receives ''o'', ownership transfers from the sender to the receiver.  This means that requirement (3) above is removed for '''transfer''' types.
'''transfer''' means that the sending actor owns ''o'', and when the receiving actor receives ''o'', ownership transfers from the sender to the receiver.  This means that requirement (3) above is removed for '''transfer''' types.  This is the preferred sharing semantics; '''share''' probably won't be implemented initially.
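The ownership hand-off that '''transfer''' describes resembles C++ move semantics.  The following is a minimal in-process sketch, not the proposed implementation: <code>Buffer</code>, <code>TransferChannel</code> and <code>transferMovesOwnership</code> are hypothetical names, and a real transport would move the object across a process boundary rather than through a member variable.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical payload standing in for an object 'o' sent with 'transfer'.
struct Buffer {
    std::vector<unsigned char> bytes;
};

// Hypothetical one-slot channel; it only models who owns the object.
class TransferChannel {
public:
    // Taking the unique_ptr by value forces the caller to give up ownership.
    void send(std::unique_ptr<Buffer> o) { slot_ = std::move(o); }

    // The receiver becomes the sole owner; the slot is left empty.
    std::unique_ptr<Buffer> receive() { return std::move(slot_); }

private:
    std::unique_ptr<Buffer> slot_;
};

// Returns true if, after a send/receive pair, the sender no longer owns
// the object and the receiver does.
inline bool transferMovesOwnership() {
    TransferChannel channel;
    std::unique_ptr<Buffer> senderSide(new Buffer());
    senderSide->bytes.assign(3, 0);

    channel.send(std::move(senderSide));
    std::unique_ptr<Buffer> receiverSide = channel.receive();

    return senderSide == nullptr && receiverSide != nullptr &&
           receiverSide->bytes.size() == 3;
}
```

Because only one side can reach the object at any time, no cross-process synchronization of member accesses is needed, which is why requirement (3) is dropped for '''transfer''' types.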
 
No SharingQualifier (\epsilon) means that the object sent is serialized.  How classes are serialized is probably not a concern of the protocol layer, but it is very important nonetheless.  This is likely to be another security concern.


'''NOTE''': '''share''' and '''transfer''' are optimizations.  These don't need to be included in the initial language implementation, but are worth keeping in mind.


'''NOTE''': what <code>BasicType</code> means should be a fruitful topic for discussion
A BasicType is a C++ type that can be transferred in a message.  We will provide a set of BuiltinTypes like void, int, and so on.  Protocol writers can also ''import'' foreign types for which marshall/unmarshall traits are defined, and/or that can be allocated in shared memory.
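One way to picture "marshall/unmarshall traits" is a C++ traits template that is specialized for each type allowed on the wire.  This is a sketch under stated assumptions: <code>MarshalTraits</code>, <code>Wire</code> and <code>Point</code> are hypothetical names, the byte layout is a raw same-architecture copy with no endianness handling, and the bounds checks reflect the point made earlier that data from a child actor is not trusted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

typedef std::vector<unsigned char> Wire;  // hypothetical wire buffer

// Primary template is left undefined: a type without traits cannot be sent.
template <typename T>
struct MarshalTraits;

// A builtin type, copied byte for byte.
template <>
struct MarshalTraits<std::int32_t> {
    static void write(Wire& out, std::int32_t v) {
        unsigned char raw[sizeof v];
        std::memcpy(raw, &v, sizeof v);
        out.insert(out.end(), raw, raw + sizeof v);
    }
    // Returns false on truncated input instead of reading out of bounds.
    static bool read(const Wire& in, std::size_t& pos, std::int32_t& v) {
        if (pos > in.size() || in.size() - pos < sizeof v) return false;
        std::memcpy(&v, in.data() + pos, sizeof v);
        pos += sizeof v;
        return true;
    }
};

// A hypothetical imported type, made sendable by defining its traits.
struct Point {
    std::int32_t x;
    std::int32_t y;
};

template <>
struct MarshalTraits<Point> {
    static void write(Wire& out, const Point& p) {
        MarshalTraits<std::int32_t>::write(out, p.x);
        MarshalTraits<std::int32_t>::write(out, p.y);
    }
    static bool read(const Wire& in, std::size_t& pos, Point& p) {
        return MarshalTraits<std::int32_t>::read(in, pos, p.x) &&
               MarshalTraits<std::int32_t>::read(in, pos, p.y);
    }
};

// Marshals then unmarshals a Point and reports whether it survived intact.
inline bool roundTrips(const Point& p) {
    Wire wire;
    MarshalTraits<Point>::write(wire, p);
    std::size_t pos = 0;
    Point q = {0, 0};
    return MarshalTraits<Point>::read(wire, pos, q) && q.x == p.x &&
           q.y == p.y && pos == wire.size();
}

// Reports whether a three-byte buffer is rejected as too short for an int32.
inline bool rejectsTruncated() {
    Wire wire(3, 0);
    std::size_t pos = 0;
    std::int32_t v = 0;
    return !MarshalTraits<std::int32_t>::read(wire, pos, v);
}
```

Leaving the primary template undefined turns an attempt to send an unsupported type into a compile error, which matches the idea that only builtin or explicitly imported types may appear in messages.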


=== Strawman transition grammar ===


   Transition :: 'state' StateName '{' Actions '}'
   Actions :: (Action ';' | \epsilon)
   Actions :: (Action ';')* Action? 
  Action :: MessageAction | RPCAction
    
    
   Action :: MessageName ('!' | '?') '->' StateName
   MessageAction :: ('send' | 'rcv') MessageName 'goto' StateName
  RPCAction :: ('call' | 'answer') MessageName ('push' StateName)?
 
'''TODO''': the above grammar may lead to unnecessarily verbose specifications, since there's only one "action" permitted per state transition.  We can add additional syntax to reduce verbosity if it becomes a problem.


This is a dirt-simple grammar but should capture all we need in a first pass.  A transition starts from a particular state (the lexically first state is the start state), and then either sends or receives one of a set of allowed messages.  The syntax <code>MessageName !</code> means "send MessageName", and <code>MessageName ?</code> means "receive MessageName".  The actor then transitions into another state.
A transition starts from a particular state (the lexically first state is the start state), and then either sends or receives ("calls" or "answers" for RPC) one of a set of allowed messages.  The syntax <code>send MessageName</code> means "send MessageName", and <code>rcv MessageName</code> means "receive MessageName".  After the action, the actor then transitions into another state.  For RPC, an action causes the current state to be pushed on a stack, then the "push STATE" to be transitioned into.


From a particular state, an actor can either ''only receive'' or ''only send'' messages (we could relax this, but it complicates the implementation).  This is extremely easy to check statically (we could make it part of the grammar, too).
Unfortunately, the syntax for async/sync messages and RPC calls diverges because the semantics are so different.  Sync/async messages only model message passing, whereas RPC models function calls.  After a message-passing action occurs, the actor only makes a state transition (<code>goto STATE</code>).  However, an RPC action pushes a new state onto an "RPC stack" (<code>push STATE</code>).  When the RPC call returns, the "pushed" state is "popped."
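The difference between <code>goto</code> and <code>push</code> may be easier to see in code.  The sketch below is one possible reading of the semantics, not a specification: it keeps a stack whose top is the current state, so <code>goto</code> replaces the top, <code>push</code> adds a state, and an RPC return pops back to the state that was current before the call.  <code>ActorState</code>, <code>demoRpcStack</code> and the state names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one actor's protocol state plus its "RPC stack".
class ActorState {
public:
    explicit ActorState(const std::string& start) : stack_(1, start) {}

    const std::string& current() const { return stack_.back(); }

    // send/rcv ... goto STATE: replace the current state.
    void gotoState(const std::string& next) { stack_.back() = next; }

    // call/answer ... push STATE: keep the current state underneath
    // and make STATE the new current state.
    void push(const std::string& next) { stack_.push_back(next); }

    // The RPC returns: drop the pushed state. Returns false if no RPC
    // is in progress, which would be a protocol error.
    bool pop() {
        if (stack_.size() <= 1) return false;
        stack_.pop_back();
        return true;
    }

    // Number of RPC calls currently nested.
    std::size_t rpcDepth() const { return stack_.size() - 1; }

private:
    std::vector<std::string> stack_;
};

// Walks through a goto, a nested RPC with its own goto, and the return.
inline std::string demoRpcStack() {
    ActorState actor("Idle");
    actor.gotoState("Ready");          // message passing: plain transition
    actor.push("InCall");              // RPC begins
    actor.gotoState("InCallWorking");  // transitions inside the call
    actor.pop();                       // RPC returns
    return actor.current();            // back in "Ready"
}
```

Transitions made while the call is in progress only affect the top of the stack, so the caller's state is restored unchanged when the call returns, mirroring function-call semantics.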


Transitions only happen when the underlying message operation was "completed."  For messages sent asynchronously, this means sent over the wire (resp., received).  For messages sent synchronously, this means sent over the wire ''and'' replied to by the other side (resp., received and reply sent).
'''TODO''': this may be confusing. Any ideas for simplifying it?


We can support sending/receiving multiple messages per transition.  As this complicates the implementation, it's probably best to add that only when necessary.
We can check almost any property of the protocol specification itself statically, since it's a state machine + well-defined stack.  What all of these static invariants should be is not yet known; one invariant is that an asynchronous message can't be nested within a synchronous one.  From this static specification, we can generate a C++ dynamic checker to ensure that message processors (code utilizing actors) adhere to the protocol spec.  We may be able to check some of this statically as well, but it's harder.
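As a rough idea of what a generated dynamic checker could look like, the sketch below drives a transition table built from <code>send</code>/<code>rcv</code> ... <code>goto</code> actions and rejects any message that the current state does not allow.  It is hand-written and illustrative only: <code>ProtocolChecker</code>, <code>Direction</code>, <code>makeDemoChecker</code> and the message and state names are hypothetical, and RPC actions are left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

enum class Direction { Send, Recv };

// Hypothetical runtime checker for message-passing transitions.
class ProtocolChecker {
public:
    explicit ProtocolChecker(const std::string& start) : state_(start) {}

    // Registers "state FROM { send|rcv MSG goto TO }".
    void allow(const std::string& from, Direction dir, const std::string& msg,
               const std::string& to) {
        table_[std::make_tuple(from, dir, msg)] = to;
    }

    // Called when a message operation completes. Returns false, leaving the
    // state untouched, if the spec does not allow the message here.
    bool step(Direction dir, const std::string& msg) {
        auto it = table_.find(std::make_tuple(state_, dir, msg));
        if (it == table_.end()) return false;
        state_ = it->second;
        return true;
    }

    const std::string& state() const { return state_; }

private:
    typedef std::tuple<std::string, Direction, std::string> Key;
    std::map<Key, std::string> table_;
    std::string state_;
};

// Builds a two-state example: send Init, then receive Done.
inline ProtocolChecker makeDemoChecker() {
    ProtocolChecker checker("Start");
    checker.allow("Start", Direction::Send, "Init", "Running");
    checker.allow("Running", Direction::Recv, "Done", "Start");
    return checker;
}
```

On the parent side, a <code>false</code> result for a received message would be the "evidence the child is misbehaving" that leads to the child being terminated; how the parent reacts to its own illegal sends is left open here.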


'''TODO''': there are many things we can integrate here, but concrete use cases are necessary.  This should be a main point of discussion.
'''TODO''': there are more things we can integrate into the transition grammar, but concrete use cases are necessary.


== Implementation ==