Media/WebRTC/WebRTCE10S: Difference between revisions

← Older edit

Media/WebRTC/WebRTCE10S (view source)

Revision as of 09:47, 24 April 2013

7,152 bytes added , 24 April 2013

→‎Network Proxies

Ekr

Confirmed users

214

edits

@@ Line 17: / Line 17: @@
 and between the PeerConnection and video/audio tags. This doesn't mean that
 the media actually flows through the JS, however.
+Below we show a proposed process split with E10S:
+https://raw.github.com/mozilla/webrtc/master/planning/architecture-e10s.png
+== System Resources to be Proxied ==
+The following system resources need to somehow be made accessible to the renderer
+process.
+* Video rendering (accessed via a video tag)   [TODO: Is this actually a system resource? Not clear on what the display model is.]
+* The speaker (accessed via an audio tag)
+* The camera and microphone
+* Hardware video encoders and decoders (if any)
+* The network interfaces
+In addition, we use the Socket Transport Service (STS) to do socket input processing. We create
+UDP sockets via NSPR and then attach them to the STS in order to be informed when data is
+available.
+=== Input Device Access (getUserMedia) ===
+We assume that camera and microphone access will be available only in the
+parent process. However, since most of the WebRTC stack will live in the
+child process, we need some mechanism for making the media available to
+it.
+The basic idea is to create a new backend for MediaManager/GetUserMedia
+that is just a proxy talking to the real media devices over IPDL. The
+incoming media frames would then be passed over the IPDL channel
+to the child process where they are injected into the MediaStreamGraph.
+This shouldn't be too complicated, but there are a few challenges:
+* Making sure that we don't do superfluous copies of the data. I understand that we can move the data via gralloc buffers, so maybe that will be OK for video. [OPEN ISSUE: Will that work for audio?]
+* Latency. We need to make sure that moving the data across the IPDL interface doesn't introduce too much latency. Hopefully this is a solved problem.
+=== Output Access ===
+[TODO: Presumably this works the same as rendering now?]
+=== Hardware Acceleration ===
+In this design, we make no attempt to combine HW acceleration and capture
+or rendering. I.e., if we have a standalone HW encoder, we just insert it
+into the pipeline in place of the the SW encoder and then redirect the
+encoded media out the network interface. The same goes for decoding.
+There's no attempt made to shortcut the rest of the stack. This design
+promotes modularity, since we can just make the HW encoder look
+like another module inside of GIPS. In the longer term, we may want
+to revisit this, but I think it's the best design for now.
+Note that if we have an integrated encoder (e.g., in a camera) then
+we *can* accomodate that by just having gUM return encoded frames
+instead of I420 and then we pass those directly to the network without
+encoding them. (Though this is somewhat complicated by the need
+to render them locally in a video tag.)
+=== Network Access ===
+All networking access in WebRTC is mediated through the ICE stack (media/mtransport/third_party/nICEr and media/mtransport/nr*).
+From a technical perspective, the requirements look like:
+* The ability to send and receive UDP datagrams with any valid local address and any remote address.
+* The ability to enumerate every network interface.
+* The ability to have events happen at specific times.
+Below is a schematic diagram of the interaction of the ICE stack with the rest of the system which shows
+how things actually work.
+https://raw.github.com/mozilla/webrtc/master/planning/network-e10s.png
+As before, the boxes on the left signify the currently protected operations.
+There are two natural designs, discussed below.
+==== Network Proxies ====
+The first design is to do only the primitive networking operations in the parent
+process and have ICE talk to the proxies that remote those operations,
+as shown below. This is approximately the design Google uses.
+https://raw.github.com/mozilla/webrtc/master/planning/network-e10s-socket-proxy.png
+The advantage of this design is that it is relatively straightforward to execute
+and that the APIs that are required are relatively limited. I.e.,
+* List all the interfaces and their addresses
+* Bind a socket to a given interface/address
+* Send a packet to a given remote address from a given socket
+* Receive a packet on a given socket and learn the remote address
+The major disadvantage of this design is that it provides the content process
+with a fair amount of control over the network and thus potentially represents
+a threat if/when the content process is compromised. For instance,
+if the content process is compromised, it could send arbitrary UDP or
+TCP packets to anywhere in the network that is accessible to the phone.
+Of course, this is already a risk in the desktop version.
+We might be able to mitigate this risk somewhat by installing some
+primitive packet filtering on the parent process side. For instance, we
+could enforce the following policy:
+* A socket maintains two tables:
+** An outstanding STUN transaction table
+** A "permissions" table of accepted remote addresses
+* When a content process tries to send a non-STUN formatted packet, the socket rejects it unless the remote address is in the permissions table
+* When a content process sends a STUN-formatted packet, it gets transmitted and added to the outstanding STUN transaction table
+* When packet is received, it is checked against the outstanding STUN transaction table. If a transaction completes, then the address is added to the permissions table.
+This would be relatively easy to implement and would provide a measure of protection
+against misuse of this interface. It would require some STUN-parsing smarts in the
+parent, but those can be kept relatively minimal.
+Detailed api proposal at [[Media/WebRTC/WebRTCE10S/NetworkProxyInterface]]
+==== ICE In Parent ====
+The alternative design is to push the entire ICE stack into the parent process, as shown
+below.
+https://raw.github.com/mozilla/webrtc/master/planning/network-e10s-ice-parent.png
+The advantage of this design from a security perspective is that by pushing the
+connectivity checking into the parent process we completely remove the
+ability of a compromised content process to send arbitrary network
+traffic.
+The two major drawbacks of this design are:
+* The interface to the ICE stack is very complicated, which makes the
+engineering task harder.
+* The ICE stack itself is also complicated, which increases the surface area
+in the "secure" parent process.
+The ICE stack interface is found at:
+* http://hg.mozilla.org/mozilla-central/file/b553e9ca2354/media/mtransport/nricectx.h
+* http://hg.mozilla.org/mozilla-central/file/b553e9ca2354/media/mtransport/nricemediastream.h
+This API has around 20 distinct API calls, each of which will need to be separately
+remoted. A number of them have fairly complicated semantics, which would tend
+to invade the rest of the program.
+==== Recommendation ====
+In my opinion we should go for the "Network Proxies" design. It's going to be a lot simpler
+to implement than the "ICE in the parent" design and can be largely hidden by an
+already replaceable component (nr_socket_prsock.cpp) without impacting the rest
+of the code. It also lets us work in parallel because we can do a simple implementation
+without the packet filter described above and then add the packet filter transparently
+later.