Media/WebRTC/WebRTCE10S

 
=== Input Device Access (getUserMedia) ===


We assume that camera and microphone access will be available only in the
parent process. However, since most of the WebRTC stack will live in the
child process, we need some mechanism for making the media available to
it.


The basic idea is to create a new backend for MediaManager/GetUserMedia
that is just a proxy talking to the real media devices over IPDL. The
incoming media frames would then be passed over the IPDL channel
to the child process where they are injected into the MediaStreamGraph.
This shouldn't be too complicated, but there are a few challenges:
* Making sure that we don't do superfluous copies of the data. I understand that we can move the data via gralloc buffers, so maybe that will be OK for video. [OPEN ISSUE: Will that work for audio?]
* Latency. We need to make sure that moving the data across the IPDL interface doesn't introduce too much latency. Hopefully this is a solved problem.


=== Output Access ===


[TODO: Presumably this works the same as rendering now?]


=== Hardware Acceleration ===
In this design, we make no attempt to combine HW acceleration with capture
or rendering. I.e., if we have a standalone HW encoder, we just insert it
into the pipeline in place of the SW encoder and then redirect the
encoded media out the network interface. The same goes for decoding.
No attempt is made to shortcut the rest of the stack. This design
promotes modularity, since we can make the HW encoder look
like just another module inside of GIPS. In the longer term, we may want
to revisit this, but I think it's the best design for now.

Note that if we have an integrated encoder (e.g., in a camera), then
we *can* accommodate that by having gUM return encoded frames
instead of I420 and passing those directly to the network without
re-encoding them. (Though this is somewhat complicated by the need
to render them locally in a video tag.)
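The "HW encoder as just another module" idea can be sketched as follows (hypothetical names; not the actual GIPS interfaces): both encoders implement one interface, and a factory hides the choice from the rest of the pipeline.

```cpp
// Hypothetical sketch: HW and SW encoders behind one module interface,
// so the rest of the pipeline never knows which one it got.
#include <cassert>
#include <memory>
#include <string>

class VideoEncoder {
 public:
  virtual ~VideoEncoder() = default;
  virtual std::string Name() const = 0;
  // Stub: returns the size of the "encoded" frame.
  virtual size_t Encode(size_t rawBytes) = 0;
};

class SoftwareEncoder : public VideoEncoder {
 public:
  std::string Name() const override { return "sw-encoder"; }
  size_t Encode(size_t rawBytes) override { return rawBytes / 20; }
};

class HardwareEncoder : public VideoEncoder {
 public:
  std::string Name() const override { return "hw-encoder"; }
  size_t Encode(size_t rawBytes) override { return rawBytes / 20; }
};

// The pipeline asks for "an encoder"; whether it is HW- or SW-backed is
// invisible past this point, which is the modularity claim above.
std::unique_ptr<VideoEncoder> MakeEncoder(bool hwAvailable) {
  if (hwAvailable) return std::make_unique<HardwareEncoder>();
  return std::make_unique<SoftwareEncoder>();
}
```

Swapping in the HW encoder then touches only the factory, not the capture, transport, or rendering paths.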


=== Network Access ===


There are two natural designs, discussed below.


==== Network Proxies ====
and that the APIs that are required are relatively limited. I.e.,


* List all the interfaces and their addresses
* Bind a socket to a given interface/address
* Send a packet to a given remote address from a given socket
* Receive a packet on a given socket and learn the remote address
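The four calls above can be sketched as a small C++ interface (hypothetical names, not the actual proposal; see the linked NetworkProxyInterface page for the real one), with a trivial in-memory implementation for illustration:

```cpp
// Hypothetical sketch of the four-call proxy surface listed above.
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct NetAddress {
  std::string ip;
  uint16_t port;
};

using SocketId = uint32_t;

class NetworkProxyParent {
 public:
  virtual ~NetworkProxyParent() = default;
  // List all the interfaces and their addresses.
  virtual std::vector<NetAddress> GetInterfaces() = 0;
  // Bind a socket to a given interface/address; returns a handle.
  virtual SocketId Bind(const NetAddress& local) = 0;
  // Send a packet to a given remote address from a given socket.
  virtual bool SendTo(SocketId s, const NetAddress& remote,
                      const std::vector<uint8_t>& packet) = 0;
  // Receive a packet on a given socket and learn the remote address.
  virtual bool RecvFrom(SocketId s, NetAddress* remote,
                        std::vector<uint8_t>* packet) = 0;
};

// Toy in-memory implementation that loops sent packets back, purely so
// the interface shape can be exercised.
class LoopbackProxy : public NetworkProxyParent {
 public:
  std::vector<NetAddress> GetInterfaces() override {
    return {{"192.0.2.1", 0}};
  }
  SocketId Bind(const NetAddress&) override { return mNextId++; }
  bool SendTo(SocketId, const NetAddress& remote,
              const std::vector<uint8_t>& packet) override {
    mQueue.push_back({remote, packet});
    return true;
  }
  bool RecvFrom(SocketId, NetAddress* remote,
                std::vector<uint8_t>* packet) override {
    if (mQueue.empty()) return false;
    *remote = mQueue.front().first;
    *packet = mQueue.front().second;
    mQueue.erase(mQueue.begin());
    return true;
  }
 private:
  SocketId mNextId = 1;
  std::vector<std::pair<NetAddress, std::vector<uint8_t>>> mQueue;
};
```

The narrowness of this surface is the point: only these four operations need to be remoted over IPDL.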


The major disadvantage of this design is that it provides the content process
* When a content process sends a STUN-formatted packet, it gets transmitted and added to the outstanding STUN transaction table
* When a packet is received, it is checked against the outstanding STUN transaction table. If a transaction completes, then the address is added to the permissions table.
This would be relatively easy to implement and would provide a measure of protection
against misuse of this interface. It would require some STUN-parsing smarts in the
parent, but those can be kept relatively minimal.

A detailed API proposal is at [[Media/WebRTC/WebRTCE10S/NetworkProxyInterface]].
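The filter logic described above can be sketched as follows. This is a hypothetical simplification: real STUN parsing (RFC 5389 header, magic cookie, message integrity) is reduced to a 12-byte transaction ID, and peers are keyed by an address string.

```cpp
// Hypothetical, simplified sketch of the parent-side packet filter:
// outgoing STUN requests open transactions; a response completing a
// transaction adds the remote address to the permissions table.
#include <array>
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

using TransactionId = std::array<uint8_t, 12>;

class StunPacketFilter {
 public:
  // Outgoing STUN-formatted packet: transmit it and record the
  // transaction in the outstanding-transactions table.
  void OnOutgoingStun(const TransactionId& id) { mOutstanding.insert(id); }

  // Incoming packet: allow it if the remote is already permitted, or if
  // it completes an outstanding transaction (which permits the remote).
  // stunId is null for non-STUN traffic.
  bool OnIncoming(const std::string& remote, const TransactionId* stunId) {
    if (mPermitted.count(remote)) return true;
    if (stunId && mOutstanding.erase(*stunId)) {
      mPermitted.insert(remote);  // connectivity check succeeded
      return true;
    }
    return false;  // drop unsolicited traffic from unverified peers
  }

 private:
  std::set<TransactionId> mOutstanding;
  std::set<std::string> mPermitted;
};
```

A compromised content process can thus only exchange traffic with peers that have answered one of its own connectivity checks, which is the "measure of protection" claimed above.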


==== ICE In Parent ====
The alternative design is to push the entire ICE stack into the parent process, as shown
below.


https://raw.github.com/mozilla/webrtc/master/planning/network-e10s-ice-parent.png
The advantage of this design from a security perspective is that by pushing the
connectivity checking into the parent process we completely remove the
ability of a compromised content process to send arbitrary network
traffic.
The two major drawbacks of this design are:
* The interface to the ICE stack is very complicated, which makes the
engineering task harder.
* The ICE stack itself is also complicated, which increases the surface area
in the "secure" parent process.
The ICE stack interface is found at:
* http://hg.mozilla.org/mozilla-central/file/b553e9ca2354/media/mtransport/nricectx.h
* http://hg.mozilla.org/mozilla-central/file/b553e9ca2354/media/mtransport/nricemediastream.h
This API has around 20 distinct calls, each of which would need to be separately
remoted. A number of them have fairly complicated semantics, which would tend
to leak into the rest of the program.
==== Recommendation ====
In my opinion we should go for the "Network Proxies" design. It's going to be a lot simpler
to implement than the "ICE in the parent" design and can be largely hidden by an
already replaceable component (nr_socket_prsock.cpp) without impacting the rest
of the code. It also lets us work in parallel because we can do a simple implementation
without the packet filter described above and then add the packet filter transparently
later.