Security/Sandbox: Difference between revisions

→‎Linux: Rewrite this section to try to explain why as well as what/how. Might need more hyperlinks.
Line 460: Line 460:
== Linux ==


[http://en.wikipedia.org/wiki/Seccomp Seccomp] stands for secure computing mode. It's a simple sandboxing tool in the Linux kernel, available since Linux version 2.6.12.  When seccomp is enabled, the process enters a "secure mode" where only a very small number of system calls are available (exit(), read(), write(), sigreturn()).  Writing code to work in this environment is difficult; for example, dynamic memory allocation (using brk() or mmap(), either directly or to implement malloc()) is not possible.
Linux sandboxing technologies generally fall into two categories: those that act on the semantics of operations (e.g., what happens when a filesystem path is resolved) and those that affect raw system calls (e.g., what happens when syscall #83 is invoked).  There's a more detailed explanation in [http://blog.cr0.org/2012/09/introducing-chromes-next-generation.html the blog post announcing seccomp-bpf], which is the main syscall-filtering facility.


Seccomp-BPF is a more recent extension to seccomp, which allows filtering system calls with [http://en.wikipedia.org/wiki/Berkeley_Packet_Filter BPF (Berkeley Packet Filter)] programs. Most of our Linux user base have systems that support seccomp-bpf.
We're primarily using seccomp-bpf because it's the only mechanism that's available nearly everywhere (>99% of the Linux Firefox userbase, at last count).  There are some weaknesses to using only seccomp-bpf:


These filters can be used to allow or deny an arbitrary set of system calls, as well as to filter on system call arguments (numeric values only; pointer arguments can't be dereferenced).  Additionally, instead of simply terminating the process, the filter can raise a signal, which allows the signal handler to simulate the effect of a disallowed system call (or simply gather more information on the failure for debugging purposes). Seccomp-bpf has been available since Linux version 3.5 and has been usable on the ARM architecture since Linux version 3.10. Several backports are available for earlier kernel versions.
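The decision logic described above can be illustrated with a toy model. This is not a real BPF program and installs nothing in the kernel; the function name <tt>evaluate</tt>, the <tt>Action</tt> enum, and the particular policy are assumptions made up for illustration, and the syscall numbers are the x86-64 values.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    TRAP = "trap"   # raise a signal so a handler can emulate or log the call
    KILL = "kill"   # terminate the process


# x86-64 syscall numbers, used here only as example inputs.
SYS_READ, SYS_WRITE, SYS_OPEN, SYS_SOCKET = 0, 1, 2, 41
AF_UNIX = 1


def evaluate(nr, args):
    """Toy policy: map a syscall number and raw integer arguments to an action."""
    if nr in (SYS_READ, SYS_WRITE):
        return Action.ALLOW
    if nr == SYS_SOCKET:
        # Numeric arguments can be compared directly by the filter.
        return Action.ALLOW if args[0] == AF_UNIX else Action.KILL
    if nr == SYS_OPEN:
        # args[0] is only the address of the path string; the filter cannot
        # read the string itself, so this sketch defers to a signal handler.
        return Action.TRAP
    return Action.KILL
```

In this model a call like <tt>evaluate(SYS_OPEN, [address, flags])</tt> yields <tt>Action.TRAP</tt> regardless of which path the address points to, which is the pointer-argument limitation in miniature.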
* The possibility of overlooking obscure corner cases, like [https://bugzilla.mozilla.org/show_bug.cgi?id=1066750 unnamed datagram sockets], that could allow privilege escalation.


For limitations that apply to the semantics of system calls (e.g., “can this process access the filesystem”, not “can this process use system call #83”) we require unprivileged user namespaces, which the systems of a large majority of desktop users don't support. Specifically: <tt>chroot()</tt>ing into a deleted directory to revoke FS access, and namespace unsharing for networking, SysV IPC if possible, and process IDs.
* The seccomp-bpf policy can act on argument values, but can't dereference pointer arguments, like the path to <tt>open()</tt>; in such cases it's necessary to intercept the syscall and message an unsandboxed broker to validate and perform the operation, which adds latency and attack surface.
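The broker-side validation mentioned in the list above might look roughly like the following sketch. The function name <tt>broker_may_open</tt> and the directory allowlist are hypothetical, chosen only to show the shape of the check; this is not Firefox's actual broker policy, and a real broker would also have to consider symbolic links, which <tt>os.path.normpath</tt> does not resolve.

```python
import os

# Hypothetical allowlist, for illustration only.
ALLOWED_READ_DIRS = ("/usr/share/fonts", "/etc/fonts")


def broker_may_open(path, flags):
    """Decide, in the unsandboxed broker, whether an open() request is permitted."""
    # This sketch permits read-only opens and nothing else.
    if flags & os.O_ACCMODE != os.O_RDONLY:
        return False
    # Collapse "." and ".." components so that a path cannot climb out of an
    # allowed directory. Symlinks are NOT resolved here.
    normalized = os.path.normpath(path)
    if not os.path.isabs(normalized):
        return False
    return any(
        normalized == d or normalized.startswith(d + "/")
        for d in ALLOWED_READ_DIRS
    )
```

The sandboxed process would send the path and flags over IPC, and the broker would perform the <tt>open()</tt> itself only when a check like this passes. Every request is a round trip, which is where the added latency comes from.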


* [http://mxr.mozilla.org/mozilla-central/source/security/sandbox/linux/SandboxFilter.cpp seccomp-bpf filtering rules for various processes]
Semantic isolation, like changing the filesystem root or creating a separate network stack with no access to the real network (unsharing the network namespace), has traditionally required superuser privileges.  There are two ways to get around this: unprivileged user namespaces and a setuid-root helper executable.


We're using unprivileged user namespaces for additional security where available; they don't require any system-level setup, and 88% of Linux Firefoxes are on a kernel that supports them, according to telemetry.  The reason we don't require it (as, for example, [https://github.com/servo/gaol gaol] does) is the other 12%: some distributions disable the feature because it has its own security risks.  (Briefly: it makes subtle changes to authorization semantics, and it exposes kernel attack surface that's normally restricted to root; both of these have led to local privilege escalation vulnerabilities in the past.)
But shipping a setuid-root executable ''also'' doesn't work for everyone: we support downloading and running Firefox as a regular user, without having it installed as a system package.  It would also require changes to how we create child processes, set up IPC with them, and invoke the <tt>chroot</tt> helper; and it complicates testing.  Chromium used this approach in 2009 because there was no other choice; [https://crbug.com/312380 they would prefer to remove it] but don't seem to have a timeline for doing so.
At the time of this writing (June 2017), namespace sandboxing is used only for media plugins (EME CDMs and OpenH264): content processes can't use any of it at least until audio is remoted.
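Because unprivileged user namespaces are used only "where available", something has to test for them at run time. One way such a probe could work is sketched below (Linux only). The function name <tt>user_namespaces_available</tt> is an assumption for illustration and this is not the check Firefox actually performs; it calls the libc <tt>unshare()</tt> wrapper through <tt>ctypes</tt> in a forked child so that the calling process itself is left unchanged.

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000  # value from <linux/sched.h>


def user_namespaces_available():
    """Return True if an unprivileged process may create a user namespace."""
    pid = os.fork()
    if pid == 0:
        # Child: attempt the unshare and report the outcome via exit status.
        status = 1
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            if libc.unshare(CLONE_NEWUSER) == 0:
                status = 0
        finally:
            # Exit immediately; never return into the parent's code path.
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
```

On a distribution that disables the feature, the <tt>unshare()</tt> call fails in the child and the function returns <tt>False</tt>, in which case only the seccomp-bpf layer would apply.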


= Bug Lists =