Security/Sandbox/Seccomp: Difference between revisions
Gdestuynder (talk | contribs)
Revision as of 00:32, 26 August 2014
What is Seccomp
Intro to seccomp and seccomp-bpf
Seccomp stands for secure computing mode. It's a simple sandboxing tool in the Linux kernel, available since Linux version 2.6.12. When enabling seccomp, the process enters a "secure mode" where a very small number of system calls are available (exit(), read(), write(), sigreturn()). Writing code to work in this environment is difficult; for example, dynamic memory allocation (using brk() or mmap(), either directly or to implement malloc()) is not possible.
Seccomp-BPF is a more recent extension to seccomp, which allows filtering system calls with BPF (Berkeley Packet Filter) programs. These filters can be used to allow or deny an arbitrary set of system calls, as well as filter on system call arguments (numeric values only; pointer arguments can't be dereferenced). Additionally, instead of simply terminating the process, the filter can raise a signal, which allows the signal handler to simulate the effect of a disallowed system call (or simply gather more information on the failure for debugging purposes). Seccomp-bpf is available since Linux version 3.5 and is usable on the ARM architecture since Linux version 3.10. Several backports are available for earlier kernel versions.
We have backports for 3.0.x kernels, 3.4 kernels, and 2.6.29 kernels (see bug 790923 and its children). No backport is necessary for kernels 3.10 and above. These configuration options are required to be present in the kernel's config at compile time:
    CONFIG_SECCOMP=y
    CONFIG_SECCOMP_FILTER=y
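Whether the running kernel was built with these options can also be probed at run time. The following is a minimal sketch, not the check Gecko uses: the function name seccomp_bpf_supported is hypothetical, and it relies on the documented prctl() behaviour that a NULL filter pointer fails with EFAULT when CONFIG_SECCOMP_FILTER is present and with EINVAL when it is not.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>

/* Hypothetical helper: returns 1 if the running kernel accepts
 * SECCOMP_MODE_FILTER, 0 otherwise.
 * Passing a NULL filter never installs anything. A kernel built with
 * CONFIG_SECCOMP_FILTER rejects the call with EFAULT (bad pointer), a
 * kernel without it rejects the call with EINVAL (unknown mode). */
int seccomp_bpf_supported(void)
{
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, NULL, 0, 0) == -1)
        return errno == EFAULT;
    return 0; /* unexpected success: treat as unsupported */
}
```

Because the call always fails, the probe leaves the process unsandboxed and can be run before deciding whether to install a real filter.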
How do I call seccomp-bpf?
Seccomp-bpf is turned on through the prctl() system call (process control).
The call looks like this:
    #include <sys/prctl.h>
    #include <linux/seccomp.h>
    [...]
    prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &bpf_prog);
bpf_prog is a BPF program structure (struct sock_fprog) which contains the rules used by seccomp-bpf, i.e., which system calls are allowed and which are not. Once a filter is installed it can only be tightened, never relaxed: a later prctl() call adds another filter on top of the existing ones, and the most restrictive result wins. This means you could first remove access to one system call and then, later in the process lifetime, remove access to more system calls. An unprivileged process must also make one additional call before installing a filter, no new privileges, which guarantees that neither the process nor its children can gain privileges through execve() (for example via setuid binaries). Here's the same code, with the no new privileges call:
    #include <sys/prctl.h>
    #include <linux/seccomp.h>
    [...]
    prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
    prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &bpf_prog);
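To illustrate tightening the sandbox in several steps, here is a hedged sketch that installs two filters one after the other in a forked child. The helper names deny_syscall and demo_stacked_filters are hypothetical, the filters return EPERM instead of killing so that the child can observe the result, and architecture validation is left out for brevity.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Install a filter that makes one system call fail with EPERM and allows
 * everything else. Illustration only: a real filter should validate the
 * architecture first. */
static int deny_syscall(int nr)
{
    struct sock_filter insns[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
        .filter = insns,
    };
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0);
}

/* Returns the child's exit code: 0 if both calls were denied after the
 * second filter, 1 if stacking did not behave as expected, 2 if the kernel
 * refused a prctl() call. Returns -1 if fork() or waitpid() failed. */
int demo_stacked_filters(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1) _exit(2);
        if (deny_syscall(SYS_getpid) == -1) _exit(2);
        /* First filter: getpid is denied, getppid still works. */
        if (syscall(SYS_getpid) != -1 || errno != EPERM) _exit(1);
        if (syscall(SYS_getppid) == -1) _exit(1);
        if (deny_syscall(SYS_getppid) == -1) _exit(2);
        /* Second filter adds a restriction; the first one still applies. */
        if (syscall(SYS_getppid) != -1 || errno != EPERM) _exit(1);
        if (syscall(SYS_getpid) != -1 || errno != EPERM) _exit(1);
        _exit(0);
    }
    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Running the experiment in a child keeps the calling process unfiltered, which matters because an installed filter cannot be removed again.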
Construct a basic filter
The filter program can be constructed using the BPF_STMT and BPF_JUMP macros from Linux's filter.h. Here are some helper macros commonly built on top of them for seccomp-bpf:
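    #include <stddef.h>         /* offsetof */
    #include <sys/syscall.h>    /* __NR_* system call numbers */
    #include <linux/audit.h>    /* AUDIT_ARCH_* values for ARCH_NR */
    #include <linux/filter.h>
    #include <linux/seccomp.h>  /* struct seccomp_data, SECCOMP_RET_* */
    [...]
    /* Offsets of the fields the filter reads from struct seccomp_data. */
    #define syscall_nr (offsetof(struct seccomp_data, nr))
    #define arch_nr (offsetof(struct seccomp_data, arch))
    /* Kill unless the call was made for the expected architecture.
     * ARCH_NR must be defined as the AUDIT_ARCH_* constant of the build
     * target, e.g. AUDIT_ARCH_X86_64. */
    #define VALIDATE_ARCHITECTURE \
        BPF_STMT(BPF_LD+BPF_W+BPF_ABS, arch_nr), \
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ARCH_NR, 1, 0), \
        BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL)
    /* Load the system call number into the accumulator. */
    #define EXAMINE_SYSCALL \
        BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_nr)
    /* Allow the call if the number matches, otherwise fall through. */
    #define ALLOW_SYSCALL(name) \
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_##name, 0, 1), \
        BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW)
    /* Default action once no rule has matched. */
    #define KILL_PROCESS \
        BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL)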
With these macros, a filter can first validate that the system call was made for the architecture the filter was built for (system call numbers differ between architectures), then allow a list of system calls, and finally kill the process if the call matches none of them.
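A complete filter assembled from the macros above might look like the following sketch. It is not Gecko's filter: the whitelist (write and exit_group only), the ARCH_NR selection, and the function name run_whitelisted_child are assumptions made for illustration. The filter is installed in a forked child, which then makes one allowed and one disallowed system call.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* ARCH_NR is not provided by the kernel headers; this sketch picks the
 * AUDIT_ARCH_* value of the build target for a few architectures. */
#if defined(__x86_64__)
#define ARCH_NR AUDIT_ARCH_X86_64
#elif defined(__i386__)
#define ARCH_NR AUDIT_ARCH_I386
#elif defined(__aarch64__)
#define ARCH_NR AUDIT_ARCH_AARCH64
#elif defined(__arm__)
#define ARCH_NR AUDIT_ARCH_ARM
#else
#error "add the AUDIT_ARCH_* constant for this architecture"
#endif

#define syscall_nr (offsetof(struct seccomp_data, nr))
#define arch_nr (offsetof(struct seccomp_data, arch))
#define VALIDATE_ARCHITECTURE \
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, arch_nr), \
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ARCH_NR, 1, 0), \
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL)
#define EXAMINE_SYSCALL \
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_nr)
#define ALLOW_SYSCALL(name) \
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_##name, 0, 1), \
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW)
#define KILL_PROCESS \
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL)

/* Validate the architecture, allow two system calls, kill on anything else. */
static struct sock_filter filter[] = {
    VALIDATE_ARCHITECTURE,
    EXAMINE_SYSCALL,
    ALLOW_SYSCALL(write),
    ALLOW_SYSCALL(exit_group),
    KILL_PROCESS,
};

/* Returns the signal that terminated the sandboxed child (SIGSYS is
 * expected), 0 if the child exited normally, or -1 if the sandbox could
 * not be set up. */
int run_whitelisted_child(void)
{
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        static const char msg[] = "in sandbox\n";
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0) == -1)
            _exit(42);                     /* sandbox setup failed */
        ssize_t n = write(STDERR_FILENO, msg, sizeof(msg) - 1); /* allowed */
        (void)n;
        syscall(SYS_getpid);               /* not whitelisted: killed here */
        _exit(0);                          /* not reached when filtered */
    }
    int status;
    if (waitpid(child, &status, 0) == -1)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    if (WIFEXITED(status) && WEXITSTATUS(status) == 42)
        return -1;
    return 0;
}
```

A whitelist this small is only usable for a demonstration; a real process needs many more entries, which is why the default action and the reporting of denied calls matter in practice.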
Use in Gecko
Gecko uses seccomp when running on Linux, both on the desktop and in B2G. The code is in mozilla-central at /security/sandbox/linux.
File security/sandbox/linux/seccomp_filter.h
Contains a whitelist of allowed system calls.
File security/sandbox/linux/Sandbox.cpp
Contains the sandbox installation code, called by:
SetCurrentProcessSandbox(void)
Seccomp reporter
The reporter is an option which will log exactly which system call has been denied by seccomp. It is enabled by default in engineering builds ("eng" builds). The option is --content-sandbox-reporter.
When seccomp denies a system call, it sends a signal (SIGSYS), which is caught by the reporter. The reporter then kills its own process (and thus the content process). It does so because the content process may not handle the denied system call properly and would be left in a non-working state anyway.
When the reporter is enabled, the log message looks like this:
seccomp sandbox violation: pid %u, syscall %lu, args %lu %lu %lu %lu %lu. Killing Process.
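The mechanism can be sketched as follows. This is not Gecko's reporter: the names sigsys_reporter and demo_sigsys_reporter, the exit code, and the choice of getppid as the trapped call are assumptions, and the system call arguments are left out of the message because reading them from the signal context is architecture specific. The filter returns SECCOMP_RET_TRAP so that the kernel delivers SIGSYS instead of killing the process outright.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#define REPORTER_EXIT_CODE 70 /* arbitrary value chosen for this sketch */

/* SIGSYS handler: log the denied system call, then end the process. */
static void sigsys_reporter(int sig, siginfo_t *info, void *context)
{
    char msg[128];
    (void)sig;
    (void)context;
    int len = snprintf(msg, sizeof(msg),
                       "seccomp sandbox violation: pid %u, syscall %lu. "
                       "Killing process.\n",
                       (unsigned int)getpid(),
                       (unsigned long)info->si_syscall);
    if (len >= (int)sizeof(msg))
        len = (int)sizeof(msg) - 1;
    if (len > 0) {
        ssize_t written = write(STDERR_FILENO, msg, (size_t)len);
        (void)written;
    }
    _exit(REPORTER_EXIT_CODE);
}

/* Trap getppid, allow everything else (no architecture check, for brevity). */
static struct sock_filter trap_getppid[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Returns the child's exit code: REPORTER_EXIT_CODE if the handler ran,
 * 1 if the call was not trapped, 2 if setup failed, -1 on fork/wait error. */
int demo_sigsys_reporter(void)
{
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(trap_getppid) / sizeof(trap_getppid[0])),
        .filter = trap_getppid,
    };
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = sigsys_reporter;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGSYS, &sa, NULL) == -1 ||
            prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0) == -1)
            _exit(2);
        syscall(SYS_getppid); /* trapped: the handler runs and exits */
        _exit(1);
    }
    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The handler may only use system calls that the filter still allows (here getpid, write and exit_group), otherwise it would trap again while reporting.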
How do I check my processes are sandboxed by seccomp?
The process status contains a Seccomp field. Replace <pid> with the process's PID:
grep Seccomp /proc/<pid>/status
- 0: Seccomp is not enabled (bad!)
- 1: Seccomp is enabled (shouldn't happen)
- 2: Seccomp-bpf is enabled (correct)
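The same check can be done from code. The sketch below reads the status file and extracts the field; the function names parse_seccomp_field and read_seccomp_mode are hypothetical, and -1 is returned when the field is missing (kernels that predate the Seccomp line in /proc/<pid>/status) or the file cannot be read.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Find the "Seccomp:" line in the text of a /proc/<pid>/status file.
 * Returns the mode (0, 1 or 2) or -1 if there is no such line. */
int parse_seccomp_field(const char *status_text)
{
    const char *line = status_text;
    while (line != NULL && *line != '\0') {
        int mode;
        /* Matches only at the start of a line; the space in the format
         * also matches the tab the kernel prints after the colon. */
        if (sscanf(line, "Seccomp: %d", &mode) == 1)
            return mode;
        line = strchr(line, '\n');
        if (line != NULL)
            line++; /* move to the first character of the next line */
    }
    return -1;
}

/* Read /proc/<pid>/status and return its Seccomp mode, or -1 on error. */
int read_seccomp_mode(pid_t pid)
{
    char path[64];
    char buf[8192];
    snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_seccomp_field(buf);
}
```

For example, read_seccomp_mode(getpid()) reports the mode of the calling process, and a value of 2 corresponds to the "correct" case in the list above.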
Alternatively, on recent (1.4+) B2G versions:
- Run b2g-ps and look at the SEC field (same meanings as above; 2 means sandboxed)
On B2G, you can also find out your PIDs by using the command b2g-ps.
How do I disable the sandbox temporarily?
Set the following environment variable:
    export MOZ_DISABLE_CONTENT_SANDBOX=1
and restart Firefox. In B2G's case, this looks like:
    adb shell
    stop b2g
    export MOZ_DISABLE_CONTENT_SANDBOX=1
    /system/bin/b2g.sh
More information
See also the kernel documentation: