Media/WebRTC/WebRTCE10S

 
=== Input Device Access (getUserMedia) ===


We assume that camera and microphone access will be available only in the
parent process. However, since most of the WebRTC stack will live in the
child process, we need some mechanism for making the media available to
it.


The basic idea is to create a new backend for MediaManager/GetUserMedia
that is just a proxy talking to the real media devices over IPDL. The
incoming media frames would then be passed over the IPDL channel
to the child process where they are injected into the MediaStreamGraph.
This shouldn't be too complicated, but there are a few challenges:
* Making sure that we don't do superfluous copies of the data. I understand that we can move the data via gralloc buffers, so maybe that will be OK for video. [OPEN ISSUE: Will that work for audio?]
* Latency. We need to make sure that moving the data across the IPDL interface doesn't introduce too much latency. Hopefully this is a solved problem.


=== Output Access ===


[TODO: Presumably this works the same as rendering now?]


=== Hardware Acceleration ===
In this design, we make no attempt to combine HW acceleration with capture
or rendering. I.e., if we have a standalone HW encoder, we just insert it
into the pipeline in place of the SW encoder and then redirect the
encoded media out the network interface. The same goes for decoding.
No attempt is made to shortcut the rest of the stack. This design
promotes modularity, since we can make the HW encoder look
like just another module inside of GIPS. In the longer term, we may want
to revisit this, but I think it's the best design for now.

Note that if we have an integrated encoder (e.g., in a camera), then
we *can* accommodate that by having gUM return encoded frames
instead of I420 and passing those directly to the network without
re-encoding them. (Though this is somewhat complicated by the need
to render them locally in a video tag.)
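The "HW encoder as just another module" idea can be sketched as follows (hypothetical names; not the actual GIPS interfaces): both encoders implement one interface, and a factory hides the choice from the rest of the pipeline.

```cpp
// Hypothetical sketch: HW and SW encoders behind one module interface,
// so the rest of the pipeline never knows which one it got.
#include <cassert>
#include <memory>
#include <string>

class VideoEncoder {
 public:
  virtual ~VideoEncoder() = default;
  virtual std::string Name() const = 0;
  // Stub: returns the size of the "encoded" frame.
  virtual size_t Encode(size_t rawBytes) = 0;
};

class SoftwareEncoder : public VideoEncoder {
 public:
  std::string Name() const override { return "sw-encoder"; }
  size_t Encode(size_t rawBytes) override { return rawBytes / 20; }
};

class HardwareEncoder : public VideoEncoder {
 public:
  std::string Name() const override { return "hw-encoder"; }
  size_t Encode(size_t rawBytes) override { return rawBytes / 20; }
};

// The pipeline asks for "an encoder"; whether it is HW- or SW-backed is
// invisible past this point, which is the modularity claim above.
std::unique_ptr<VideoEncoder> MakeEncoder(bool hwAvailable) {
  if (hwAvailable) return std::make_unique<HardwareEncoder>();
  return std::make_unique<SoftwareEncoder>();
}
```

Swapping in the HW encoder then touches only the factory, not the capture, transport, or rendering paths.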


=== Network Access ===


There are two natural designs, discussed below.


==== Network Proxies ====
and that the APIs that are required are relatively limited. I.e.,


* List all the interfaces and their addresses
* Bind a socket to a given interface/address
* Send a packet to a given remote address from a given socket
* Receive a packet on a given socket and learn the remote address
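The four calls above can be sketched as a small C++ interface (hypothetical names, not the actual proposal; see the linked NetworkProxyInterface page for the real one), with a trivial in-memory implementation for illustration:

```cpp
// Hypothetical sketch of the four-call proxy surface listed above.
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct NetAddress {
  std::string ip;
  uint16_t port;
};

using SocketId = uint32_t;

class NetworkProxyParent {
 public:
  virtual ~NetworkProxyParent() = default;
  // List all the interfaces and their addresses.
  virtual std::vector<NetAddress> GetInterfaces() = 0;
  // Bind a socket to a given interface/address; returns a handle.
  virtual SocketId Bind(const NetAddress& local) = 0;
  // Send a packet to a given remote address from a given socket.
  virtual bool SendTo(SocketId s, const NetAddress& remote,
                      const std::vector<uint8_t>& packet) = 0;
  // Receive a packet on a given socket and learn the remote address.
  virtual bool RecvFrom(SocketId s, NetAddress* remote,
                        std::vector<uint8_t>* packet) = 0;
};

// Toy in-memory implementation that loops sent packets back, purely so
// the interface shape can be exercised.
class LoopbackProxy : public NetworkProxyParent {
 public:
  std::vector<NetAddress> GetInterfaces() override {
    return {{"192.0.2.1", 0}};
  }
  SocketId Bind(const NetAddress&) override { return mNextId++; }
  bool SendTo(SocketId, const NetAddress& remote,
              const std::vector<uint8_t>& packet) override {
    mQueue.push_back({remote, packet});
    return true;
  }
  bool RecvFrom(SocketId, NetAddress* remote,
                std::vector<uint8_t>* packet) override {
    if (mQueue.empty()) return false;
    *remote = mQueue.front().first;
    *packet = mQueue.front().second;
    mQueue.erase(mQueue.begin());
    return true;
  }
 private:
  SocketId mNextId = 1;
  std::vector<std::pair<NetAddress, std::vector<uint8_t>>> mQueue;
};
```

The narrowness of this surface is the point: only these four operations need to be remoted over IPDL.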


The major disadvantage of this design is that it provides the content process
* When a content process sends a STUN-formatted packet, it gets transmitted and added to the outstanding STUN transaction table
* When a packet is received, it is checked against the outstanding STUN transaction table. If a transaction completes, then the address is added to the permissions table.
This would be relatively easy to implement and would provide a measure of protection
against misuse of this interface. It would require some STUN-parsing smarts in the
parent, but those can be kept relatively minimal.

A detailed API proposal is at [[Media/WebRTC/WebRTCE10S/NetworkProxyInterface]].
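The filter logic described above can be sketched as follows. This is a hypothetical simplification: real STUN parsing (RFC 5389 header, magic cookie, message integrity) is reduced to a 12-byte transaction ID, and peers are keyed by an address string.

```cpp
// Hypothetical, simplified sketch of the parent-side packet filter:
// outgoing STUN requests open transactions; a response completing a
// transaction adds the remote address to the permissions table.
#include <array>
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

using TransactionId = std::array<uint8_t, 12>;

class StunPacketFilter {
 public:
  // Outgoing STUN-formatted packet: transmit it and record the
  // transaction in the outstanding-transactions table.
  void OnOutgoingStun(const TransactionId& id) { mOutstanding.insert(id); }

  // Incoming packet: allow it if the remote is already permitted, or if
  // it completes an outstanding transaction (which permits the remote).
  // stunId is null for non-STUN traffic.
  bool OnIncoming(const std::string& remote, const TransactionId* stunId) {
    if (mPermitted.count(remote)) return true;
    if (stunId && mOutstanding.erase(*stunId)) {
      mPermitted.insert(remote);  // connectivity check succeeded
      return true;
    }
    return false;  // drop unsolicited traffic from unverified peers
  }

 private:
  std::set<TransactionId> mOutstanding;
  std::set<std::string> mPermitted;
};
```

A compromised content process can thus only exchange traffic with peers that have answered one of its own connectivity checks, which is the "measure of protection" claimed above.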


==== ICE In Parent ====
The alternative design is to push the entire ICE stack into the parent process, as shown
below.


https://raw.github.com/mozilla/webrtc/master/planning/network-e10s-ice-parent.png
The advantage of this design from a security perspective is that by pushing the
connectivity checking into the parent process we completely remove the
ability of a compromised content process to send arbitrary network
traffic.
The two major drawbacks of this design are:
* The interface to the ICE stack is very complicated, which makes the
engineering task harder.
* The ICE stack itself is also complicated, which increases the surface area
in the "secure" parent process.
The ICE stack interface is found at:
* http://hg.mozilla.org/mozilla-central/file/b553e9ca2354/media/mtransport/nricectx.h
* http://hg.mozilla.org/mozilla-central/file/b553e9ca2354/media/mtransport/nricemediastream.h
This API has around 20 distinct calls, each of which would need to be separately
remoted. A number of them have fairly complicated semantics, which would tend
to leak into the rest of the program.
==== Recommendation ====
In my opinion we should go for the "Network Proxies" design. It's going to be a lot simpler
to implement than the "ICE in the parent" design and can be largely hidden by an
already replaceable component (nr_socket_prsock.cpp) without impacting the rest
of the code. It also lets us work in parallel because we can do a simple implementation
without the packet filter described above and then add the packet filter transparently
later.