Safe Penetration Testing Labs · advanced · ~30 min

seccomp-BPF — allow-list system calls

Refuse any syscall not on a written allow-list.

Overview

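seccomp in filter mode (seccomp-BPF) attaches a classic Berkeley Packet Filter program to the syscall path of the calling process. The filter sees the syscall number, the architecture, and the raw argument values (it cannot dereference pointers) and returns an action: allow, kill, errno, or trap. Once loaded, a filter cannot be removed and is inherited across fork and execve.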

Why it matters

Defense-in-depth. Even if your program is compromised, the attacker can only invoke syscalls on the allow-list. If execve and execveat are not on the list, they cannot shell out.

Core concepts

libseccomp. High-level wrapper. seccomp_init, seccomp_rule_add, seccomp_load. Compile + link with -lseccomp.

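Default action. The default action applies to every syscall that no rule matches. SCMP_ACT_KILL_PROCESS is the safest: the whole process is terminated with SIGSYS. SCMP_ACT_ERRNO(EPERM) makes the refused call fail with EPERM instead, which is easier to debug but lets a compromised program keep running and probe the filter.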

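The minimum allow-list. Even a Hello-World needs ~10 syscalls: read, write, exit_group, brk, mmap, mprotect, futex, rt_sigreturn, fstat, ioctl. The exact set depends on the libc, on static versus dynamic linking, and on where the filter is loaded, so enumerate with strace rather than guessing. Syscalls made before seccomp_load (for example by the dynamic loader) are not filtered, so the strace output is an upper bound.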

Pentester mindset. Reading the allow-list is how you audit a sandbox. Any syscall the filter fails to block is a potential escape route. The runner image in our project ships no custom seccomp filter (we rely on Docker's default profile); writing an explicit allow-list is harder.

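Defensive coding habit. Develop the allow-list under strace -c, which prints a per-syscall summary of what the program actually called. Then test with one syscall removed at a time; the program should be killed with SIGSYS as soon as it makes that call.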

Syntax notes

#include <seccomp.h>
scmp_filter_ctx seccomp_init(uint32_t def_action);
int seccomp_rule_add(scmp_filter_ctx, uint32_t action, int syscall, unsigned arg_cnt, ...);
int seccomp_load(scmp_filter_ctx);

Lesson

seccomp-BPF lets a process attach a Berkeley Packet Filter program to its own syscall path. Every subsequent syscall is checked against the filter. The standard pattern: build a filter with libseccomp, allow only the ~20 syscalls your program actually uses, and let any deviation kill the process. Docker, Chrome's sandbox, and OpenSSH's privsep all rely on the same mechanism.

Code examples

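#include <seccomp.h>
/* Inside main(): any syscall without a rule kills the process. */
scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL_PROCESS);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read),  0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);  /* needed to exit cleanly */
seccomp_load(ctx);
seccomp_release(ctx);   /* frees the context; the loaded filter stays active */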

Line by line

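scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL_PROCESS);      /* default: kill on any unlisted syscall */
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read),  0);      /* allow read; arg_cnt 0 = any arguments */
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);      /* allow write, any arguments */
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0); /* allow the process to exit */
seccomp_load(ctx);                                  /* filter live from here */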

Common mistakes

  • Forgetting to allow exit_group / rt_sigreturn / brk. The process is then killed with SIGSYS at a confusing moment: on exit, on return from a signal handler, or inside malloc.

Debugging tips

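With a kill default, strace ./prog typically ends with a line such as "+++ killed by SIGSYS +++", and the last syscall printed before it is the refused one. With SCMP_ACT_ERRNO(EPERM), strace ./prog 2>&1 | grep EPERM lists the candidates (seccomp does not return ENOSYS unless you configure it to). dmesg | tail may show a kernel audit record (type=1326) that includes the syscall number.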

Memory safety

seccomp doesn't replace bounds checks — it limits the blast radius if memory safety fails.

Real-world uses

Chrome sandbox, OpenSSH privsep, Docker default profile, Firefox content processes, and many hardened daemons (for example through systemd's SystemCallFilter= setting).

Practice tasks

  1. Allow-list 5 syscalls; verify your toy program runs.
  2. Add a 6th, observe success.
  3. Remove read, observe SIGSYS.

Summary

seccomp = BPF on the syscall path. Allow-list, kill-process default, refine under strace. Standard defense-in-depth.
