IPC Protocols
There will be many IPC protocols, and they should all be as error-free as possible by design. Having separate protocols for separate uses is a good thing, and they all shouldn't be multiplexed by hand over a shared transport layer. Protocol implementations should only have to check for errors as infrequently as possible; the heavy lifting should be done generically, or by code generated from high-level specifications. Finally, explicit protocols make automatically generating test cases (of "normal" and "failure" conditions) much easier.
Terminology
NB: this section is intended to provide an informal basis for the rest of this document. The remainder of the document should concretely define each of these terms.
An actor is a protocol end-point. This is something that can send and receive messages. We may have multiple actors per thread and per process. (E.g.: when a thread opens a TCP connection to mozilla.org, the socket object created my kernel is an actor (the ownership of this actor is up to the process containing the creating thread). The socket object created by the mozilla.org server is another actor.
A message is the unit of communication between two actors. A message will probably consist of a message type and a message payload. The message type will specify the actor the message is intended to be sent to, the protocol the message belongs to, and the types of each datum in the message payload. The message type may additionally specify the location of data in the payload; e.g., large data may be transferred through shared memory, while small ones may be sent over a pipe. The message type should make it clear which actor owns each potentially-shared datum.
A protocol is a formal definition of the messages that two, asymmetric actors are allowed to send/receive to/from each other. Protocol semantics can be defined in many different ways, but we should be able to get away with "finite-state machine semantics." This means that each protocol actor has a "current state," and in that state, it is only allowed to send a set of messages SM and receive a set of messages RM. Upon sending or receiving a particular message m, the actor may transition into a new "state."
Desiderata
- All messages sent from one actor to another are well-typed. That is, every datum in a message m sent by actor a has a static type, is serializable by an a, is un-serializable by the actor that receives m, and the ownership of the datum is explicit.
- All messages received by an actor are well-typed, in the same sense as above. (NB: these goals are complementary. A misbehaving actor could send a non-well-typed message to another actor, and the receiving actor shouldn't suffer because of the non-well-typed message. "Suffer" might include security concerns, so this is important.)
- All messages sent from one actor to another adhere to the semantics of the actor's protocol. For the TCP example, this means that an actor in the "CLOSED" state can't send a "FIN" message.
- All messages received by an actor adhere to the semantics of the actor's protocol. This means that an actor doesn't have to worry about processing mal-formed messages sent by another, misbehaving actor. Like the message type-safety desideratum, this is likely to be a security concern and is thus important.
- Resource ownership should be clear, and how resources are cleanup on errors/crashes should be well-defined. Need to handle the separate cases: cleanup on shutdown, cleanup on crash, cleanup on GC/refcount. (Whether this is a protocol-layer concern is not yet clear.)
- Objects should be shared carefully so as to avoid creating cycles and/or leaking information from priveleged->unprivileged processes.
Implementation Overview
For the overall design, we will follow the example of network protocol stacks. On top of the stack is the protocol layer, and below it is the transport layer. (Luckily, much of the transport layer for IPC seems to be provided by chromium.)
The transport layer is responsible for platform-specific code for communication and shared memory, connection establishment, identifying and routing messages to actors, and type safety of individual messages. Please refer to [http:://example.com TODO] for its specification.
We will follow the example of Sing# for protocol specifications. Since C++ doesn't support this kind of code, and it's extremely hard to make it do so, and we may want to allow other languages (JavaScript) to implement protocol actors, we will implement protocol specifications as a domain-specific language. (We did the same thing for IDL.) The remainder of this document focuses on this protocol-specification language.
Protocol-definition Language (PDL)
A protocol is between a parent actor and a child actor. Child actors are not trusted, and if there is any evidence the child is misbehaving, it is terminated. Code utilizing the child actor therefore will never see type or protocol errors; if they occur, the child is killed. The parent actor must, however, carefully handle errors.
A protocol is specified with respect to the parent; that is, messages the parent is allowed to receive are exactly the messages a child is allowed to sent, and vice versa.
A protocol consists of declarations of messages and specifications of state machine transitions. (This ties us to state-machine semantics.) The message declarations are essentially type declarations for the transport layer, and the state machine transitions capture the semantics of the protocol itself.
A protocol is specified by first declaring the protocol
Protocol :: (\epsilon | 'sync' | 'rpc') 'protocol' ProtocolName '{' Messages Transitions '}'
This implies the creation of FooParent
and FooChild
actors. (Hereafter, this document speaks from the perspective of the FooParent
.)
By default, protocols can only exchange asynchronous messages. A protocol must explicitly allow synchronous and RPC (see below) messages by using the sync
or rpc
qualifiers. This is enforced statically by the PDL type system.
Conceptually (but not necessarily syntactically) next are message definitions. Messages definitions are somewhat analogous to function signatures in C/C++. Messages can have one of three semantics
- asynchronous: the sending actor does not expect nor listen for a response to the sent message
- synchronous: the sending actor is completely blocked until it receives a response
- RPC++: the sending actor is partially blocked until it receives a response to message m. It is only allowed to process RPC++ messages sent by the actor receiving m, direct resulting from the receiving actor receiving m. (This is intended to model function call semantics.)
Asynchronous messages are the default. The list above is sorted by decreasing simplicity and efficiency; synchronous and RPC++ messages should not be used without a compelling reason.
Strawman message grammar
Message :: (\epsilon | 'sync' | 'rpc') MessageBody ';' MessageBody :: Type MessageName '(' MessageArguments ')' MessageArguments :: (MessageArgument ',')* MessageArgument? MessageArgument :: Type Identifier Type :: SharingQualifier BasicType SharingQualifier :: (\epsilon | 'transfer' | 'share') BasicType :: BuiltinType | ImportedType
A few items are worth calling out.
SharingQualifiers define transport semantics for objects sent in a message. By default, objects are sent "by value" (i.e., marshalled then unmarshalled). How we classes are marshalled is not a concern of the protocol layer, but very important nonetheless. This is likely to be another security concern. But large objects can also be transported through shared memory.
The qualifier share means that the object o named lives in shared memory, and is co-owned by the parent and child actors. If the receiving actor does not already co-own o, it does after receiving the message. A lower layer needs to enforce that this is implemented correctly
- o lives in shared memory
- all objects reachable from o live in shared memory
- all accesses to members of o are synchronized across the client and server
transfer means that the sending actor owns o, and when the receiving actor receives o, ownership transfers from the sender to the receiver. This means that requirement (3) above is removed for transfer types. This is the preferred sharing semantics; share probably won't be implemented initially.
NOTE: share and transfer are optimizations. These don't need to be included in the initial language implementation, but are worth keeping in mind.
A BasicType is a C++ type that can be transferred in a message. We will provide a set of BuiltinTypes like void, int, and so on. Protocol writers can also import foreign types for which marshall/unmarshall traits are defined, and/or that can be allocated in shared memory.
Strawman transition grammar
Transition :: 'state' StateName '{' Actions '}' Actions :: (Action ';')* Action? Action :: MessageAction | RPCAction MessageAction :: ('send' | 'rcv') MessageName 'goto' StateName RPCAction :: ('call' | 'answer') MessageName ('push' StateName)?
TODO: the above grammar may lead to unnecessarily verbose specifications, since there's only one "action" permitted per state transition. We can add additional syntax to reduce verbosity if it becomes a problem.
A transition starts from a particular state (the lexically first state is the start state), and then either sends or receives ("calls" or "answers" for RPC) one of a set of allowed messages. The syntax send MessageName
means "send MessageName", and rcv MessageName
means "receive MessageName". After the action, the actor then transitions into another state. For RPC, an action causes the current state to be pushed on a stack, then the "push STATE" to be transitioned into.
Unfortunately, the syntax for async/sync messages and RPC calls diverge because the semantics are so different. Sync/async messages only model message passing, whereas RPC models function calls. After a message-passing action occurs, the actor only makes a state transition (goto STATE
). However, an RPC action pushes a new state onto an "RPC stack" (push STATE
). When the RPC call returns, the "pushed" state is "popped."
TODO: this may be confusing. Any ideas for simplifying it?
We can check almost any property of the protocol specification itself statically, since it's a state machine + well-defined stack. What all of these static invariants should be is not yet known; one invariant is that an asynchronous message can't be nested within a synchronous one. From this static specification, we can generate a C++ dynamic checker to ensure that message processors (code utilizing actors) adhere to the protocol spec. We may be able to check some of this statically as well, but it's harder.
TODO: there are more things we can integrate into the transition grammar, but concrete use cases are necessary.
Implementation
The transport layer should hopefully be mainly chromium IPC code. We will likely need to extend this to support objects sent through shared memory, if that feature is desired.
The PDL compiler will take as input a FooProtocol.pdl source and generate FooProtocolServer.h and FooProtocolClient.h C++ headers. These headers will declare message types for the transport layer, declare protocol states, and define functions that take care of state transitions and invoke message handlers. The headers will leave undefined the implementations of message handlers; these are up to mozilla code to provide. (Perhaps through subclassing.)
Message handlers will be responsible for processing the received message (obviously) and also deciding which of the allowed states is transitioned to after processing. The generated protocol header will enforce protocol safety here. Lower-level enforcement of transport-layer properties, like not nesting asynchronous calls within synchronous handlers, can be enforced in a shared ProtocolBase class. TODO: should these kinds of properties be global, or defined in protocol specs? What are all these properties? Deadlock avoidance is another, if we provide synchronous messages.
Protocol safety (only sending/receiving allowed messages from particular states) will be enforced by generating a finite-state machine in the protocol header, and checking sent/received messages against this state machine.
Very importantly, there will be an implicitly-defined ERROR state that occurs when an actor violates the protocol (or a type safety constraint). How this is handled is very much open to discussion.
NOTE: I think it's worthwhile leaving error detection on even in optimized builds. It should be quite cheap compared to IPC overhead and will likely allow us to avoid security-related bugs.
An actor will just be an instance of a protocol server or client class. We need some way of addressing particular actors across threads and processes. Hopefully the chromium IPC code already supports this. Also, we can easily support process-to-process and thread-to-thread communication over these protocols. The latter would be very useful for debugging, and also useful for writing threaded code that avoids shared memory as much as possible (even when we have multiple processes we will still have multiple threads).
I do not think that protocol instances should be thread safe (in the usual shared-memory sense). We should use these protocols to avoid this kind of threaded code, whether in-process or out-of-process.