Safe Penetration Testing Labs · advanced · ~30 min
Refuse any syscall not on a written allow-list.
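seccomp, in its filter mode (seccomp-BPF), attaches a classic Berkeley Packet Filter program to the syscall path of the calling thread; threads and child processes created afterwards inherit it, and it cannot be removed. On every syscall the filter inspects the syscall number, the architecture, and the raw argument values (it cannot dereference pointers), then returns an action: allow, kill, return an errno, or trap (deliver SIGSYS).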
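To make "a BPF program on the syscall path" concrete, here is a minimal sketch that skips libseccomp and hands the kernel a raw classic-BPF program through prctl(PR_SET_SECCOMP). Assumptions: Linux on x86_64 or aarch64 (for the architecture check), and a deny-style demo that fails only getppid with EPERM, which is deliberately looser than the allow-list this lab recommends. The helpers install_raw_filter and getppid_errno_under_filter are hypothetical names; the filter runs in a forked child so the caller stays unfiltered.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Assumption: x86_64 or aarch64; other ABIs need their own AUDIT_ARCH_* value. */
#if defined(__x86_64__)
#define EXPECTED_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define EXPECTED_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86_64 and aarch64"
#endif

/* Hypothetical helper: a raw classic-BPF program that kills on a foreign ABI,
 * fails getppid with EPERM, and allows everything else. */
static int install_raw_filter(void) {
    struct sock_filter prog[] = {
        /* A = seccomp_data.arch; kill if it is not the expected ABI. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, EXPECTED_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* A = seccomp_data.nr, the syscall number. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog fprog = {
        .len = (unsigned short)(sizeof prog / sizeof prog[0]),
        .filter = prog,
    };
    /* Unprivileged callers must set no_new_privs before installing a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog);
}

/* Hypothetical demo: call getppid in a filtered child and return its errno
 * (0 if the call succeeded), or -1 if the child did not exit normally. */
static int getppid_errno_under_filter(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_raw_filter() != 0)
            _exit(100);
        long r = syscall(SYS_getppid);   /* raw syscall, no libc wrapper logic */
        _exit(r == -1 ? errno : 0);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling getppid_errno_under_filter() should return EPERM, while getppid in the parent keeps working. libseccomp generates programs of this shape for you, including the architecture check.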
Defense-in-depth. Even if your program is compromised, the attacker can invoke only the syscalls on the allow-list. No execve on the list? They can't shell out.
libseccomp. A high-level C wrapper around the raw filter interface: seccomp_init, seccomp_rule_add, seccomp_load. Compile and link with -lseccomp.
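Default action. SCMP_ACT_KILL_PROCESS is the safest: the first unlisted syscall kills the whole process with SIGSYS. SCMP_ACT_ERRNO(EPERM) instead makes the syscall fail with EPERM and lets the program continue.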
The minimum allow-list. Even a Hello-World needs roughly 10 syscalls, for example read, write, exit_group, brk, mmap, mprotect, futex, rt_sigreturn, fstat, ioctl; the exact set depends on the libc version and architecture. Enumerate it with strace.
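Pentester mindset. Auditing a sandbox means reading its filter: every syscall it allows is attack surface, and in a deny-list any dangerous syscall left off the list is an available escape. The runner image in our project ships no seccomp profile of its own and relies on Docker's default profile, itself a broad allow-list; a tight, explicit per-program allow-list is harder to write.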
Defensive coding habit. Develop the allow-list under strace -f -c, which tallies every syscall the program makes. Then test with one syscall removed at a time: the program should die with SIGSYS as soon as it reaches that call.
#include <seccomp.h>
scmp_filter_ctx seccomp_init(uint32_t def_action);
int seccomp_rule_add(scmp_filter_ctx, uint32_t action, int syscall, unsigned arg_cnt, ...);
int seccomp_load(scmp_filter_ctx);
void seccomp_release(scmp_filter_ctx);
seccomp-BPF lets a process attach a Berkeley Packet Filter program to
its own syscall path. Every subsequent syscall is checked against the
filter. The standard pattern: build a filter with libseccomp, allow only
the syscalls your program actually uses (often around 20), and kill the
process on any deviation. Docker's default profile, Chrome's sandbox, and
OpenSSH's privsep child all build on this mechanism.
#include <seccomp.h>   /* link with -lseccomp */
#include <stdlib.h>    /* abort */
/* Inside your program, after setup that needs other syscalls: */
scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL_PROCESS);   /* default: kill */
if (ctx == NULL)
    abort();
if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0) != 0 ||
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0) != 0 ||
    /* without exit_group, even a normal exit() ends in SIGSYS */
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0) != 0 ||
    seccomp_load(ctx) != 0)   /* filter live from here */
    abort();
seccomp_release(ctx);   /* frees the library's copy; the loaded filter stays active */
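With the kill-process default, strace ./prog ends with a "killed by SIGSYS" line right after the refused call; with SCMP_ACT_ERRNO(EPERM), look for EPERM instead (strace ./prog 2>&1 | grep EPERM). dmesg | tail may also show kernel audit records (type=1326) naming the syscall number.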
seccomp doesn't replace bounds checks; it limits the blast radius if memory safety fails.
Where you'll find it: Chrome's sandbox, OpenSSH privsep, Docker's default profile, Firefox content processes, and many other hardened daemons.
seccomp = BPF on the syscall path. Allow-list, kill-process default, refine under strace. Standard defense-in-depth.