Confirmed users
699
edits
Line 30: | Line 30: | ||
== Protocol-definition Language (PDL) == | == Protocol-definition Language (PDL) == | ||
A protocol | A protocol is between a ''parent'' actor and a ''child'' actor. Child actors are not trusted, and if there is any evidence the child is misbehaving, it is terminated. Code utilizing the child actor therefore will never see type or protocol errors; if they occur, the child is killed. The parent actor must, however, carefully handle errors. | ||
''' | A protocol is specified with respect to the parent; that is, messages the parent is allowed to receive are exactly the messages a child is allowed to sent, and ''vice versa''. | ||
A protocol | A protocol consists of declarations of ''messages'' and specifications of ''state machine transitions''. (This ties us to state-machine semantics.) The message declarations are essentially type declarations for the transport layer, and the state machine transitions capture the semantics of the protocol itself. | ||
A protocol | A protocol is specified by first declaring the protocol | ||
Protocol :: (\epsilon | 'sync' | 'rpc') 'protocol' ProtocolName '{' Messages Transitions '}' | |||
This implies the creation of <code>FooParent</code> and <code>FooChild</code> actors. (Hereafter, this document speaks from the perspective of the <code>FooParent</code>.) | |||
By default, protocols can only exchange asynchronous messages. A protocol must explicitly allow synchronous and RPC (see below) messages by using the <code>sync</code> or <code>rpc</code> qualifiers. This is enforced statically by the PDL type system. | |||
Conceptually (but not necessarily syntactically) next are message definitions. | Conceptually (but not necessarily syntactically) next are message definitions. Messages definitions are somewhat analogous to function signatures in C/C++. Messages can have one of three semantics | ||
* '''asynchronous''': the sending actor does not expect nor listen for a response to the sent message | * '''asynchronous''': the sending actor does not expect nor listen for a response to the sent message | ||
* '''synchronous''': the sending actor is completely blocked until it receives a response | * '''synchronous''': the sending actor is completely blocked until it receives a response | ||
* '''RPC++''': the sending actor is partially blocked until it receives a response to message ''m''. It is only allowed to process RPC++ messages sent by the actor receiving ''m'', direct resulting from the receiving actor receiving ''m''. (This is intended to model function call semantics.) | * '''RPC++''': the sending actor is partially blocked until it receives a response to message ''m''. It is only allowed to process RPC++ messages sent by the actor receiving ''m'', direct resulting from the receiving actor receiving ''m''. (This is intended to model function call semantics.) | ||
Asynchronous messages are the default. The list above is sorted by decreasing simplicity and efficiency; synchronous and RPC++ messages should not be used without a compelling reason. | |||
=== Strawman message grammar === | === Strawman message grammar === | ||
Message :: ( | Message :: (\epsilon | 'sync' | 'rpc') MessageBody ';' | ||
MessageBody :: Type MessageName '(' MessageArguments ')' | |||
MessageArguments :: (MessageArgument ',' | MessageArguments :: (MessageArgument ',')* MessageArgument? | ||
MessageArgument :: Type Identifier | MessageArgument :: Type Identifier | ||
Type :: SharingQualifier BasicType | Type :: SharingQualifier BasicType | ||
SharingQualifier :: (' | SharingQualifier :: (\epsilon | 'transfer' | 'share') | ||
BasicType :: | BasicType :: BuiltinType | ImportedType | ||
A few items are worth calling out. | A few items are worth calling out. | ||
SharingQualifiers define transport semantics for objects sent in a message. By default, objects are sent "by value" (i.e., marshalled then unmarshalled). How we classes are marshalled is not a concern of the protocol layer, but very important nonetheless. This is likely to be another security concern. But large objects can also be transported through shared memory. | |||
'''share''' means that the object ''o'' named lives in shared memory, and is co-owned by the | The qualifier '''share''' means that the object ''o'' named lives in shared memory, and is co-owned by the parent and child actors. If the receiving actor does not already co-own ''o'', it does after receiving the message. A lower layer needs to enforce that this is implemented correctly | ||
# ''o'' lives in shared memory | # ''o'' lives in shared memory | ||
# all objects reachable from ''o'' live in shared memory | # all objects reachable from ''o'' live in shared memory | ||
# all accesses to members of ''o'' are synchronized across the client and server | # all accesses to members of ''o'' are synchronized across the client and server | ||
'''transfer''' means that the sending actor owns ''o'', and when the receiving actor receives ''o'', ownership transfers from the sender to the receiver. This means that requirement (3) above is removed for '''transfer''' types | '''transfer''' means that the sending actor owns ''o'', and when the receiving actor receives ''o'', ownership transfers from the sender to the receiver. This means that requirement (3) above is removed for '''transfer''' types. This is the preferred sharing semantics; '''share''' probably won't be implemented initially. | ||
'''NOTE''': '''share''' and '''transfer''' are optimizations. These don't need to be included in the initial language implementation, but are worth keeping in mind. | '''NOTE''': '''share''' and '''transfer''' are optimizations. These don't need to be included in the initial language implementation, but are worth keeping in mind. | ||
''' | A BasicType is a C++ type that can be transferred in a message. We will provide a set of BuiltinTypes like void, int, and so on. Protocol writers can also ''import'' foreign types for which marshall/unmarshall traits are defined, and/or that can be allocated in shared memory. | ||
=== Strawman transition grammar === | === Strawman transition grammar === | ||
Transition :: 'state' StateName '{' Actions '}' | Transition :: 'state' StateName '{' Actions '}' | ||
Actions :: (Action ';' | | Actions :: (Action ';')* Action? | ||
Action :: MessageAction | RPCAction | |||
MessageAction :: ('send' | 'rcv') MessageName 'goto' StateName | |||
RPCAction :: ('call' | 'answer') MessageName ('push' StateName)? | |||
'''TODO''': the above grammar may lead to unnecessarily verbose specifications, since there's only one "action" permitted per state transition. We can add additional syntax to reduce verbosity if it becomes a problem. | |||
A transition starts from a particular state (the lexically first state is the start state), and then either sends or receives ("calls" or "answers" for RPC) one of a set of allowed messages. The syntax <code>send MessageName</code> means "send MessageName", and <code>rcv MessageName</code> means "receive MessageName". After the action, the actor then transitions into another state. For RPC, an action causes the current state to be pushed on a stack, then the "push STATE" to be transitioned into. | |||
Unfortunately, the syntax for async/sync messages and RPC calls diverge because the semantics are so different. Sync/async messages only model message passing, whereas RPC models function calls. After a message-passing action occurs, the actor only makes a state transition (<code>goto STATE</code>). However, an RPC action pushes a new state onto an "RPC stack" (<code>push STATE</code>). When the RPC call returns, the "pushed" state is "popped." | |||
'''TODO''': this may be confusing. Any ideas for simplifying it? | |||
We can | We can check almost any property of the protocol specification itself statically, since it's a state machine + well-defined stack. What all of these static invariants should be is not yet known; one invariant is that an asynchronous message can't be nested within a synchronous one. From this static specification, we can generate a C++ dynamic checker to ensure that message processors (code utilizing actors) adhere to the protocol spec. We may be able to check some of this statically as well, but it's harder. | ||
'''TODO''': there are | '''TODO''': there are more things we can integrate into the transition grammar, but concrete use cases are necessary. | ||
== Implementation == | == Implementation == |