Remote Debugging Protocol
(Note: this page is a draft design of work not yet completed. It is written in the present tense to be easily promoted to documentation when implemented, and also to simplify the grammar.)
The Mozilla debugging protocol allows a debugger to connect to a browser, discover what sorts of things are present to debug or inspect, select JavaScript threads to watch, and observe and modify their execution. The protocol provides a unified view of JavaScript, DOM nodes, CSS rules, and the other technologies used in client-side web applications. The protocol is meant to be sufficiently general to be extended for use with other sorts of clients (profilers, say) and servers (mail readers; random XULrunner applications).
All communication between debugger (client) and browser (server) is in the form of JSON objects. This makes the protocol directly readable by humans, capable of graceful evolution, and easy to implement using stock libraries. In particular, it should be easy to create mock implementations for testing and experimentation.
The protocol operates at the JavaScript level, not at the C++ or machine level, and assumes that the JavaScript implementation itself is healthy and responsive. The JavaScript program being executed may well have gone wrong, but the JavaScript implementation's internal state must not be corrupt. Bugs in the implementation may cause the debugger to fail; bugs in the interpreted program must not.
Actors
An "actor" is something on the server that receives and replies to JSON packets from the client. Every packet from the client specifies the actor to which it is directed, and every packet from the server indicates which actor sent it.
Each server has a root actor, with which the client first interacts. The root actor can explain what sort of thing the server represents (browser; mail reader; etc.), and enumerate things available to debug: tabs, chrome, and so on. Each of these, in turn, is an actor to which requests can be addressed.
For example, a debugger might connect to a browser, ask the root actor to list the browser's tabs, and present this list to the developer. If the developer chooses some tabs to debug, then the debugger sends "attach" requests to the actors representing those tabs, to begin debugging. Both artifacts of the program being debugged, like JavaScript objects and stack frames, and artifacts of the debugging machinery, like breakpoints and watchpoints, are actors to which packets can be addressed.
All actors form a tree, with the root actor as the root. Closing communications with an actor closes communications with all its descendants. The root actor has no owner, and lives as long as the underlying connection to the client does; when the underlying connection is closed, all actor names are closed. These limits on the lifetimes of actor names allow the protocol to mention actors freely, without forcing the client to explicitly free every actor that has ever been mentioned.
Note that the actor hierarchy does not, in general, correspond to any particular hierarchy appearing in the debuggee. For example, although web workers are arranged in a hierarchy, all actors representing web worker threads are direct children of the root actor: one might want to detach from a parent worker while continuing to debug one of its children, so it doesn't make sense to close communications with a child worker when one closes communications with its parent.
(We are stealing the "actor" terminology from Mozilla's IPDL, to mean, roughly, "things participating in the protocol". However, IPDL does much more with the idea than we do: it treats both client and server as collections of actors, and uses that detail to statically verify properties of the protocol. In contrast, the debugging protocol simply wants a consistent way to indicate the entities to which packets are directed.)
Packets
The protocol is carried by a reliable, bi-directional byte stream; data sent in both directions consists of JSON objects, called packets. A packet is a top-level JSON object, not contained inside any other value.
Every packet sent from the client has the form:
{ "to": actor, "type": type, ... }
where actor is the actor to whom the packet is directed—actor names are always integers—and type is a string specifying what sort of packet it is. Additional properties may be present, depending on type.
Every packet sent from the server has the form:
{ "from": actor, ... }
where actor is the name of the actor that sent it. Additional properties may be present, depending on the situation.
We expect that, as the protocol evolves, we will specify new properties that can appear in existing packets, and experimental implementations will do the same. Clients should be written to silently ignore properties they do not recognize.
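The packet shapes above can be sketched in Python. The text does not say how consecutive JSON objects are delimited on the byte stream, so this sketch assumes they are simply concatenated, optionally separated by whitespace. The helper names make_request and split_packets are hypothetical and not part of the protocol.

```python
import json

def make_request(actor, packet_type, **properties):
    """Build a client packet of the form {"to": actor, "type": type, ...}."""
    packet = {"to": actor, "type": packet_type}
    packet.update(properties)
    return json.dumps(packet)

def split_packets(buffer):
    """Decode every complete top-level JSON object found in `buffer`.

    Returns (packets, rest), where `rest` is the undecoded tail, typically
    an incomplete packet that needs more bytes from the stream.
    Assumption: packets are concatenated JSON objects; the framing is not
    specified by the protocol description.
    """
    decoder = json.JSONDecoder()
    packets = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(buffer) and buffer[pos].isspace():
            pos += 1
        if pos == len(buffer):
            break
        try:
            packet, pos = decoder.raw_decode(buffer, pos)
        except json.JSONDecodeError:
            # Treated as "not complete yet"; a real client would also
            # need to detect input that is malformed rather than partial.
            break
        packets.append(packet)
    return packets, buffer[pos:]
```

Because the decoded packets are plain dictionaries, a client that reads only the keys it knows about ignores unrecognized properties without extra work.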
Requests and Replies
In this protocol description, a "request" is a packet sent from the client which always elicits a single packet from the recipient, the "reply". These terms indicate a simple pattern of communication: at any given time, either the client or actor is permitted to send a packet, but never both.
The client's communication with each actor is treated separately: a client may send a request to one actor, and then send a request to a different actor before receiving a reply from the first.
Packets not described as "requests" or "replies" are part of some more complicated interaction, which should be spelled out in more detail.
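A client might enforce the request/reply discipline with a small bookkeeping object like the one below. This is only a sketch: the class name PendingRequests and its methods are hypothetical, and it covers only packets described as requests, tracking each actor separately as the text requires.

```python
class PendingRequests:
    """Track, per actor, whether a request is still awaiting its reply."""

    def __init__(self):
        self._waiting = set()  # names of actors that owe us a reply

    def can_send(self, actor):
        """True if no request to `actor` is outstanding."""
        return actor not in self._waiting

    def sending(self, packet):
        """Record a request just before it is written to the stream."""
        actor = packet["to"]
        if not self.can_send(actor):
            raise RuntimeError(f"actor {actor} has not replied yet")
        self._waiting.add(actor)

    def received(self, packet):
        """Record the reply from an actor, freeing it for the next request."""
        self._waiting.discard(packet["from"])
```

Requests to different actors may overlap, so the set holds one entry per actor rather than a single global flag.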
The Root Actor
When the connection to the server is opened, the root actor opens the conversation with the following packet:
{ "from":0, "application-type":app-type, "traits":traits, ...}
The root actor's name is always zero. app-type is a string indicating what sort of program the server represents. There may be more properties present, depending on app-type.
traits is an object describing protocol variants this server supports that are not convenient for the client to detect otherwise. The property names present indicate what traits the server has; the properties' values depend on their names. This version of the protocol defines no traits, so traits must be an object with no properties, {}.
For web browsers, the introductory packet should have the following form:
{ "from":0, "application-type":"browser", "traits":traits }
Listing Top-Level Browsing Contexts
To get a list of the top-level browsing contexts (tabs) present in a browser, a client should send a request like the following to the root actor:
{ "to":0, "type":"list-contexts" }
The reply should have the form:
{ "from":0, "contexts":[context...], "selected":index }
Contexts is a list with one element for each top-level browsing context present in the browser, and index is the index within that list of the browsing context the user is currently interacting with. Each context has the following form:
{ "actor":actor, "title":title, "url":url }
actor is the actor representing that top-level browsing context; title is the context's document's title, and url is the context's document's URL.
Clients should send "list-contexts" requests only to root actors that have identified themselves as browsers.
Actor names given in a list-contexts reply have the root actor as their parent. They remain valid at least until the next list-contexts request is received. If the client attaches to a context actor, its name is valid until the client detaches from the context and receives a "detached" packet from the context, or until the client sends a "release" packet to the context. See "Interacting with Thread-Like Actors" for details.
For example, upon connection to a web browser visiting two pages at example.com, the root actor's introductory packet might look like this:
{ "from":0, "application-type":"browser", "contexts": [ { "actor":1, "title":"Fruits", "url":"http://www.example.com/fruits/" }, { "actor":2, "title":"Bats", "url":"http://www.example.com/bats/" }]}
(The point here is to give the debugger enough information to select which context it would like to debug without having to do too many round trips. Round trips are bad for UI responsiveness, but large packets are probably not a problem, so whatever would help to add, we should add.)
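The list-contexts exchange described in this section might look as follows on the client side. The function names are hypothetical, and the sample reply used in the usage note (including its "selected" value) is made up for illustration.

```python
ROOT = 0  # the root actor's name is always zero

def list_contexts_request(greeting):
    """Return a list-contexts request, or raise if the server is not a browser."""
    if greeting.get("application-type") != "browser":
        raise ValueError("list-contexts is only meant for browser servers")
    return {"to": ROOT, "type": "list-contexts"}

def format_contexts(reply):
    """Render a list-contexts reply as one line per tab, marking the selected one."""
    lines = []
    for index, context in enumerate(reply["contexts"]):
        marker = "*" if index == reply["selected"] else " "
        lines.append(
            f'{marker} {context["title"]} ({context["url"]}) -> actor {context["actor"]}'
        )
    return lines
```

Given a reply listing the Fruits and Bats pages with "selected" set to 1, format_contexts returns two lines, with the Bats line prefixed by an asterisk.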
Interacting with Thread-Like Actors
Actors representing independent threads of JavaScript execution, like browsing contexts and web workers, are collectively known as "threads". Interactions with actors representing threads follow a more complicated communication pattern.
A thread is always in one of the following states:
- Detached: the thread is running freely, and not presently interacting with the debugger. Detached threads run, encounter errors, and exit without exchanging any sort of messages with the debugger. A debugger can attach to a thread, putting it in the Running state. Or, a detached thread may exit on its own, entering the Exited state.
- Running: the thread is running under the debugger's observation, executing JavaScript code or possibly blocked waiting for input. It will report exceptions, breakpoint hits, watchpoint hits, and other interesting events to the client, and enter the Paused state. The debugger can also interrupt a running thread; this elicits a response and puts the thread in the Paused state. A running thread may also exit, entering the Exited state.
- Paused: the thread has reported a pause to the client and is awaiting further instructions. In this state, a thread can accept requests and send replies. If the client asks the thread to continue or step, it returns to the Running state.
- Exited: the thread has ceased execution, and will disappear. The resources of the underlying thread may have been freed; this state really indicates that the actor's name is not yet available for reuse. When the actor receives a "release" packet, the name may be reused.
These interactions are meant to have certain properties:
- At no point may either client or server send an unbounded number of packets without receiving a packet from its counterpart. This avoids deadlock without requiring either side to buffer an arbitrary number of packets per actor.
- In states where a transition can be initiated by either the debugger or the thread, it is always clear to the debugger which state the thread actually entered, and for what reason.
For example, if the debugger interrupts a running thread, it cannot be sure whether the thread stopped because of the interruption, paused of its own accord (to report a watchpoint hit, say), or exited. However, the next packet the debugger receives will either be "interrupted", "paused", or "exited", resolving the ambiguity.
Similarly, when the debugger attaches to a thread, it cannot be sure whether it has succeeded in attaching to the thread, or whether the thread exited before the "attach" packet arrived. However, in either case the debugger can expect a disambiguating response: if the attach succeeded, it receives an "attached" packet; if the thread had already exited, it receives an "exited" packet.
To support this property, the thread ignores certain debugger packets in some states (the "interrupt" packet in the Paused and Exited states, for example). These cases all handle situations where the ignored packet was preempted by some thread action.
Note that the rules here apply to the client's interactions with each thread actor separately. A client may send an "interrupt" to one thread actor while awaiting a reply to a request sent to a different thread actor.
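The four states and their transitions can be written down as a lookup table, which is one way a client might track each thread. This is a sketch from the client's point of view, using the packet types defined in the sections that follow. The names are hypothetical, and it assumes the Paused to Running transition takes effect when the client sends "resume", since no reply to that packet is described.

```python
from enum import Enum

class ThreadState(Enum):
    DETACHED = "Detached"
    RUNNING = "Running"
    PAUSED = "Paused"
    EXITED = "Exited"

# Packets received from the thread: (state, packet type) -> new state.
THREAD_TRANSITIONS = {
    (ThreadState.DETACHED, "attached"): ThreadState.RUNNING,
    (ThreadState.DETACHED, "exited"): ThreadState.EXITED,
    (ThreadState.RUNNING, "paused"): ThreadState.PAUSED,
    (ThreadState.RUNNING, "interrupted"): ThreadState.PAUSED,
    (ThreadState.RUNNING, "detached"): ThreadState.DETACHED,
    (ThreadState.RUNNING, "exited"): ThreadState.EXITED,
    (ThreadState.PAUSED, "detached"): ThreadState.DETACHED,
}

# Packets sent by the client that change the state on their own (assumption).
CLIENT_TRANSITIONS = {
    (ThreadState.PAUSED, "resume"): ThreadState.RUNNING,
}

def on_thread_packet(state, packet):
    """State after receiving `packet` from the thread; unchanged if not listed."""
    return THREAD_TRANSITIONS.get((state, packet.get("type")), state)

def on_client_packet(state, packet):
    """State after the client sends `packet`; unchanged if not listed."""
    return CLIENT_TRANSITIONS.get((state, packet.get("type")), state)
```

Ordinary replies carry no state-changing "type", so they fall through both tables and leave the state untouched.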
Attaching To a Thread
To attach to a thread, the client sends a packet of the form:
{ "to":thread, "type":"attach", "pause-for":pause-types }
Here, thread is the actor representing the thread, perhaps a browsing context from a "list-contexts" reply. This tells the thread to continue to run, but asks it to pause if any of the events described by pause-types occurs. The form of pause-types is described in Pause Types, below.
The thread responds in one of two ways:
{ "from":thread, "type":"attached" }
This indicates that the thread received the attach packet, and will continue to run, reporting events of interest to the debugger. The thread is now in the Running state. The actor name thread remains valid until the client detaches from the thread or acknowledges a thread exit.
{ "from":thread, "type":"exited" }
This indicates that the thread exited before receiving the attach packet. The thread is now in the Exited state. The client must respond to this with a release packet; see Exiting Threads.
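The two possible outcomes of an attach can be handled in one place. In this sketch the function names are hypothetical; handle_attach_reply returns the follow-up packet the client should send, if any.

```python
def attach_packet(thread, pause_types):
    """Build the attach packet for the thread actor named `thread`."""
    return {"to": thread, "type": "attach", "pause-for": pause_types}

def handle_attach_reply(reply):
    """Interpret the thread's response to an attach packet.

    Returns (attached, follow_up): `attached` says whether the thread is
    now Running under the debugger, and `follow_up` is a packet the client
    should send next, or None.
    """
    kind = reply.get("type")
    if kind == "attached":
        return True, None
    if kind == "exited":
        # The thread was gone before the attach arrived; acknowledge the
        # exit so that the actor name can be reused.
        return False, {"to": reply["from"], "type": "release"}
    raise ValueError(f"unexpected response to attach: {kind!r}")
```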
Detaching From a Thread
To detach from a thread, the client sends a packet of the form:
{ "to":thread, "type":"detach" }
The thread responds in one of three ways:
{ "from":thread, "type":"detached" }
This indicates that the client has detached from the thread. The thread is now in the Detached state: it can run freely, and no longer reports events to the client. The actor name thread is released and available for reuse.
{ "from":thread, "type":"paused", ... }
{ "from":thread, "type":"detached" }
This series of packets indicates that the thread paused of its own accord (for the reason given by the additional properties of the "paused" packet), and only then received the "detach" packet. As above, this indicates that the thread is in the Detached state, and the actor name is available for reuse.
{ "from":thread, "type":"exited" }
This indicates that the thread exited on its own before receiving the "detach" packet. The client should follow by sending a "release" packet; see Exiting Threads, below.
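Since a "paused" packet may arrive before the detach is acknowledged, a client needs to keep reading until the outcome is clear. A minimal sketch, assuming `packets` is any iterable yielding the packets received from this thread after the detach was sent; the function name is hypothetical.

```python
def finish_detach(packets):
    """Consume packets from one thread after the client has sent "detach".

    Returns (outcome, pause): `outcome` is "detached" or "exited", and
    `pause` is the "paused" packet that preceded the detach, or None.
    """
    pause = None
    for packet in packets:
        kind = packet.get("type")
        if kind == "paused":
            # The thread paused of its own accord before it saw the detach.
            pause = packet
        elif kind in ("detached", "exited"):
            return kind, pause
    raise EOFError("connection closed before the detach was resolved")
```

When the outcome is "exited", the client still owes the thread a "release" packet.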
Running Threads
Once the client has attached to a thread, the thread is in the Running state. In this state, four things can happen:
- The thread can hit a breakpoint or watchpoint, or encounter some other condition of interest to the client.
- The thread can exit.
- The client can detach from the thread.
- The client can interrupt the running thread.
Note that a client action can occur simultaneously with a thread action. The protocol is designed to avoid ambiguities when both client and thread act simultaneously.
Thread Pauses
If the thread pauses to report an interesting event to the client, it sends a packet of the form:
{ "from":thread, "type":"paused", "actor":actor, "frame":frame, "why":reason }
This indicates that the thread has entered the Paused state, and explains where and why.
Actor is an actor representing this specific pause of the thread. The pause actor lives until the thread leaves the Paused state. Actors referring to stack frames, values, and other entities uncovered during this pause are all children of this actor; when the thread resumes, those actors are automatically freed. This relieves the client from the responsibility to explicitly close communications with every actor mentioned in a pause. If a client wishes to hold a reference to a JavaScript value across pauses, then it must create its own grip on the value, parented by the thread actor, using the "thread-grip" request; see Grips.
Frame describes the top frame on the JavaScript stack; see Inspecting The Stack, below. The reason value describes why the thread paused. It has one of the following forms:
{ "type":"breakpoint", "actor":actor }
The thread stopped at the breakpoint represented by the actor named actor.
{ "type":"watchpoint", "actor":watchpoint }
The thread stopped at the watchpoint represented by the actor named watchpoint.
{ "type":"stepped" }
The client had asked the thread to step to the next statement, and the thread completed that step.
{ "type":"pre-call" }
The client had asked the thread to pause before making each function call, and the thread is about to call a function. Single-stepping the thread will place it at the head of the function's code, with all arguments, local variables, and local functions bound.
{ "type":"pre-return" }
The client had asked the thread to pause before returning from functions, and the thread is about to return from a function. Single-stepping the thread will return the thread to the calling frame.
{ "type":"pre-throw", "exception":grip }
The client had asked the thread to pause before throwing an exception; grip is the name of an actor representing the exception value being thrown; its parent is the pause actor. Control is still at the point of the throw; it has not yet passed to a catch clause. Single-stepping this thread will report either a "caught" or "uncaught" pause.
{ "type":"caught", "exception":grip }
The client stepped the thread from a "pre-throw" pause, and a catch clause has been found for the exception referred to by grip, whose parent is the pause actor; control is stopped at the head of the catch clause, with catch variable bindings made. If the catch is conditional, control is at the beginning of the condition.
{ "type":"uncaught", "exception":grip }
The client stepped the thread from a "pre-throw" pause, and no catch clause was found for the exception. Grip is as above. (I'm not sure which code the thread is executing at this point; we might as well reveal SpiderMonkey's natural behavior.)
{ "type":"pre-throw-by-guard", "exception":grip }
The thread had been stopped in a conditional guard, and the client asked the thread to continue but pause before throwing an exception. The guard condition evaluated to false, and the thread is about to re-throw the exception value, grip.
Resuming a Thread
If a thread is in the Paused state, the client can resume it by sending a packet of the following form:
{ "to":thread, "type":"resume", "pause-for":pause-types }
This puts the thread in the Running state, but asks it to pause if any of the events described by pause-types occurs. The form of pause-types is described in Pause Types, below.
Interrupting a Thread
If a thread is in the Running state, the client can cause it to pause where it is by sending a packet of the following form:
{ "to":thread, "type":"interrupt" }
The thread responds in one of three ways:
{ "from":thread, "type":"interrupted", "actor":actor, "frame":frame }
This indicates that the thread stopped due to the client's interrupt packet, and is now in the Paused state.
{ "from":thread, "type":"paused", "actor":actor, "frame":frame, "why":reason }
This indicates that the thread stopped of its own accord before receiving the client's interrupt packet, and is now in the Paused state. The meanings of the "paused" packet's properties are as for an ordinary pause. The thread will ignore the client's interrupt packet when it receives it.
{ "from":thread, "type":"exited" }
This indicates that the thread exited before receiving the client's interrupt packet, and is now in the Exited state. See Exiting Threads, below.
Exiting Threads
When a thread in the Running state exits, it sends a packet of the following form:
{ "from":thread, "type":"exited" }
At this point, the thread can no longer be manipulated by the client, and most of the thread's resources may be freed; however, the thread actor name must remain alive, to handle stray interrupt and detach packets. To allow the last trace of the thread to be freed, the client should send a packet of the following form:
{ "to":thread, "type":"release" }
This acknowledges the exit and allows the thread actor name, thread, to be reused for other actors.
Pause Types
The pause-for property of attach and resume packets specifies which kinds of events in the thread the client is interested in hearing about. Its value, pause-types, is an object whose properties are named after pause reason types (pre-call or stepped, for example) and have the value true if such pauses are of interest to the client. The property names are the possible values of the "type" field of the reason in a "paused" packet.
For example, the following resume packet would instruct thread to continue until the current function call is about to return, or the thread is about to throw an exception:
{ "to":thread, "type":"resume", "pause-for": { "pre-return":true, "pre-throw":true } }
(This example is not correct as written: the thread pauses the first time any function returns, not only the current one. We need something equivalent to GDB's frame-id: an inter-pause name for a specific frame, and a way to say "run until this frame is about to be popped".)
Certain pause types cause other pause types to be included automatically, if those pause types are not mentioned explicitly:
| Pause type | Also implies, if not mentioned explicitly |
|---|---|
| pre-throw | pre-throw-by-guard |
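The implication rule in the table can be expressed as a small expansion step, shown here as a server might apply it. This is a sketch: the names are hypothetical, and it assumes that a property explicitly set to false counts as "mentioned explicitly" and is therefore left alone.

```python
# Pause type -> pause types it implies when they are not mentioned explicitly.
IMPLIED = {
    "pre-throw": ("pre-throw-by-guard",),
}

def effective_pause_types(pause_types):
    """Return a copy of `pause_types` with implied pause types filled in."""
    result = dict(pause_types)
    for name, implied in IMPLIED.items():
        if pause_types.get(name) is True:
            for extra in implied:
                # setdefault leaves any explicit setting untouched.
                result.setdefault(extra, True)
    return result
```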
Inspecting Paused Threads
When a thread is in the Paused state, the debugger can make requests to inspect its stack, scope, and values.
Inspecting the Stack
To inspect the thread's JavaScript stack, the client can send the following request:
{ "to":thread, "type":"frames", "limit":limit }
The limit property is optional. If present, the reply contains at most limit frames from the young end of the stack; if absent, the reply describes the entire stack.
The thread replies as follows:
{ "from":thread, "frames":[frame ...] }
where each frame has the form:
{ "actor":actor, "depth":depth, "where":location, "callee":callee, "callee-name":callee-name, "host":host, "this":this, "arguments":arguments, "scope":scope }
where:
- actor is the name of an actor representing this frame;
- depth is the number of this frame, starting with zero for the youngest frame;
- location is a source location (see Source Locations);
- callee is a grip on the function value being called, or null if we are in global code;
- callee-name is the name of the callee, a string;
- host is true if this frame represents a call to a host function (perhaps implemented in C++);
- this is a grip on the value of this for this call;
- arguments is an array of grips on the actual values passed to the function; and
- scope is an actor representing the innermost scope contour at the current point of execution.
The argument list may be incomplete or inaccurate, for various reasons. If the program has assigned to its formal parameters, the original values passed may have been lost, and compiler optimizations may drop some argument values.
Location and scope may be omitted for host functions.
All actors mentioned in the frame (actor, callee, this, the elements of arguments, and scope) are parented by the current pause actor, as given in the "paused" or "interrupted" packet.
A "grip" is a value that represents a specific value in the debuggee; for some types of values, a grip is an actor. See Grips for details.
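A client might turn a "frames" reply into a simple backtrace as sketched below. The function names are hypothetical. The sketch avoids the "where" property, since the form of source locations is not yet specified, and it tolerates missing properties on the assumption that servers may omit some of them.

```python
def frames_request(thread, limit=None):
    """Build a frames request; `limit` is optional, as in the protocol."""
    packet = {"to": thread, "type": "frames"}
    if limit is not None:
        packet["limit"] = limit  # at most `limit` frames from the young end
    return packet

def summarize_frames(reply):
    """Return one line per frame, youngest (depth 0) first."""
    lines = []
    for frame in sorted(reply["frames"], key=lambda f: f["depth"]):
        if frame.get("callee") is None:
            name = "(global code)"  # callee is null in global code
        else:
            name = frame.get("callee-name") or "(anonymous)"
        if frame.get("host"):
            name += " [host]"
        count = len(frame.get("arguments", []))
        lines.append(f'#{frame["depth"]} {name} ({count} args)')
    return lines
```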
eval-in-frame
scopes should provide more information up-front
Source Locations
simple script + line
eval call number + location
Function call number + location
Scope Contours
scopes should include an actor, but provide a lot of information up-front
enumerate
assign
Breakpoints
Watchpoints
Grips
A grip is a value that represents a specific value in the debuggee. A grip has one of the following forms:
value
where value is a string, a number, or a boolean value. For these types of values, the grip is simply the JSON form of the value.
{ "type":"null" }
This represents the JavaScript null value. (This representation allows clients implemented in JavaScript to use typeof(grip) == "object" to decide whether the grip is simple or not, as typeof(null) is "object".)
{ "type":"undefined" }
This represents the JavaScript undefined value, which has no representation in JSON.
{ "type":"long-string", "initial":initial, "length":length, "actor":actor }
This represents a very long string (where "very long" is defined at the server's discretion). Initial is the initial portion of the string, length is its length, and actor can be consulted for the rest of the string, as explained below.
{ "type":"object", "class":class-name, "actor":actor }
This represents a JavaScript object whose class is class-name. Actor can be consulted for its contents, as explained below.
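The grip forms above lend themselves to a simple dispatch on whether the decoded grip is an object. The following sketch shows one way a Python client might render grips for display; the function name describe_grip and the output format are made up for illustration.

```python
import json

def describe_grip(grip):
    """Return a short human-readable description of a grip."""
    if not isinstance(grip, dict):
        # Strings, numbers, and booleans are their own JSON form.
        return json.dumps(grip)
    kind = grip["type"]
    if kind in ("null", "undefined"):
        return kind
    if kind == "long-string":
        return (f'{json.dumps(grip["initial"])}... '
                f'({grip["length"]} characters, actor {grip["actor"]})')
    if kind == "object":
        return f'[object {grip["class"]}] (actor {grip["actor"]})'
    raise ValueError(f"unrecognized grip type: {kind!r}")
```

The isinstance check plays the same role here as the typeof test mentioned above for clients written in JavaScript.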
All actors appearing in grips are children of the pause actor; their names become invalid when the thread is resumed.
Garbage collection operates below the level of this protocol, and will never free objects visible to the client via the protocol. Thus, actors representing JavaScript objects are effectively garbage collection roots.
If the client wishes to hold a reference to an object in the debuggee across pauses, it must send a request to the grip's actor of the form:
{ "to":grip-actor, "type":"thread-grip" }
where grip-actor is the pause-parented actor from the existing grip. The grip actor will reply:
{ "from":grip-actor, "thread-grip":thread-grip }
where thread-grip is a new grip on the same object, but whose actor is parented by the thread actor, not the pause actor. The client can release this grip by sending the thread grip's actor a request of the form:
{ "to":thread-grip-actor, "type":"release-grip", "grip":grip }
The thread grip actor will reply, simply:
{ "from":thread-grip-actor }
The client may only send messages to grip actors while the thread is paused, even if the grip actors are children of the thread, not the pause.
Objects
can only be manipulated while paused
requests modeled after ES5 object inspection API
special stuff for arrays
special stuff for functions
Long Strings
The client can find the full contents of a long string by sending a request to the long string grip actor of the form:
{ "to":grip-actor, "type":"substring", "start":start, "length":length }
where start and length are integers. This requests the substring length characters long, starting at the start'th character. The actor replies as follows:
{ "from":grip-actor, "substring":string }
where string is the requested portion of the string the actor represents.
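Fetching a whole long string then amounts to issuing "substring" requests until the advertised length is covered. In this sketch, send_request is a hypothetical callable that sends one request and returns the matching reply, chunk_size is an arbitrary choice, and start is assumed to be zero-based, which the text does not state.

```python
def read_long_string(grip, send_request, chunk_size=1000):
    """Fetch the full text behind a long-string grip, one chunk at a time.

    `send_request(packet)` is assumed to deliver the packet to the server
    and return the reply packet as a dictionary.
    """
    parts = []
    start = 0  # assumption: character positions are counted from zero
    total = grip["length"]
    while start < total:
        length = min(chunk_size, total - start)
        reply = send_request({
            "to": grip["actor"],
            "type": "substring",
            "start": start,
            "length": length,
        })
        parts.append(reply["substring"])
        start += length
    return "".join(parts)
```

Requesting bounded chunks keeps any single reply small, at the cost of one round trip per chunk.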