Container Runtime Security: seccomp, AppArmor, and Falco
Harden containers with seccomp profiles, AppArmor policies, and Falco runtime rules. Block syscall abuse, enforce least privilege, and detect attacks in real time.
Container images define what your application is. Container runtime security defines what it can do. Without runtime controls, a compromised container has full access to the Linux kernel's system call interface — the same access as any process running on the host. seccomp, AppArmor, and Falco are the three layers that restrict and monitor what containers actually do at runtime, independent of what the image contains.
Understanding the Threat Model
When an attacker gains code execution inside a container, they immediately try to expand their access. Common techniques include:
- Using the ptrace syscall to attach to other processes
- Calling mount to access host filesystems
- Using unshare to escape namespace boundaries
- Exploiting kernel vulnerabilities via unrestricted syscall access
- Reading sensitive files from /proc or /sys
Each of these requires specific Linux syscalls. Restricting which syscalls a container can make dramatically reduces the attack surface for privilege escalation.
seccomp: Syscall Filtering
seccomp (Secure Computing Mode) filters the syscalls a process is allowed to make. Docker and Kubernetes both support seccomp profiles that specify an allow list (or block list) of syscalls.
Docker's Default seccomp Profile
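Docker ships with a default seccomp profile that blocks roughly 44 syscalls, including reboot, mount, keyctl, and others commonly used in container escapes (ptrace is blocked only on kernels older than 4.8). You can verify it is applied:
# Confirm the daemon has seccomp enabled (look for name=seccomp in the output)
docker info --format '{{ .SecurityOptions }}'
# List per-container overrides for a running container
docker inspect CONTAINER_ID | jq '.[0].HostConfig.SecurityOpt'
# null or empty means no override, so the default profile applies;
# a value such as ["seccomp=unconfined"] means the default was replaced or disabled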
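Another way to check, from inside the container, is the Seccomp field in /proc/<pid>/status, where the kernel reports 0 (disabled), 1 (strict), or 2 (filter). The following is a minimal sketch of reading that field; the function name is illustrative and not part of any Docker tooling.

```python
# Kernel-defined values for the "Seccomp:" field in /proc/<pid>/status.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> str:
    """Return the seccomp mode named in the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        # "Seccomp_filters:" (newer kernels) is a different field and is skipped.
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1].strip())
            return SECCOMP_MODES.get(value, "unknown")
    return "unknown"  # field absent, e.g. kernel built without seccomp
```

Inside a container you could pass it the contents of /proc/1/status; a result of "filter" indicates that a seccomp profile is active for the main process.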
The default profile is reasonable but not tight. A Node.js web application does not need syscalls like prctl, process_vm_readv, or perf_event_open. A custom profile can be significantly more restrictive.
Writing a Custom seccomp Profile
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": [
        "read", "write", "close", "fstat", "mmap", "mprotect",
        "munmap", "brk", "rt_sigaction", "rt_sigprocmask",
        "ioctl", "access", "pipe", "select", "sched_yield",
        "mremap", "madvise", "poll", "epoll_wait", "epoll_create",
        "epoll_ctl", "clone", "execve", "wait4", "kill",
        "getpid", "socket", "connect", "accept", "sendto",
        "recvfrom", "bind", "listen", "getsockname", "getpeername",
        "socketpair", "setsockopt", "getsockopt", "exit", "futex",
        "getcwd", "openat", "getdents64", "lstat", "stat",
        "open", "exit_group", "set_robust_list", "prlimit64",
        "getrandom", "sendmsg", "recvmsg"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
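This allowlist approach (defaultAction SCMP_ACT_ERRNO) rejects every syscall that is not explicitly listed. Treat the list above as illustrative: a real Node.js process typically needs more (for example rt_sigreturn, epoll_create1, and accept4), so generate a baseline by running your application under strace:
strace -f -e trace=all -o syscalls.log node server.js
# Each log line is an optional PID followed by the syscall name, e.g. "1234 openat(...)"
grep -oP '^(\d+\s+)?\K\w+(?=\()' syscalls.log | sort -u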
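The same extraction can be scripted so that the strace log is turned directly into a profile. This is a minimal sketch under a few assumptions: the log uses the default strace line format, the function names are hypothetical, and the result still needs review, because a single traced run rarely exercises every code path.

```python
import re

# Matches "1234 openat(" (PID prefix written by -f with -o) or a bare "openat(".
# Resumed calls ("<... read resumed>"), signals ("---") and exits ("+++") do not match.
SYSCALL_RE = re.compile(r"^(?:\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscalls_from_strace(log_text: str) -> list[str]:
    """Collect the unique syscall names that appear in an strace log."""
    names = set()
    for line in log_text.splitlines():
        match = SYSCALL_RE.match(line)
        if match:
            names.add(match.group(1))
    return sorted(names)


def build_profile(names: list[str]) -> dict:
    """Build an allowlist profile: deny by default, allow only the observed syscalls."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": names, "action": "SCMP_ACT_ALLOW"}],
    }
```

Serializing the result of build_profile with json.dump produces a file in the same shape as the profile shown earlier.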
Apply the profile to a container:
docker run --security-opt seccomp=./profile.json node:20 node server.js
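Before shipping a custom profile, it can help to audit it for the escape-related syscalls listed in the threat model. The sketch below is one possible check, not an official tool; the RISKY set and the function names are assumptions chosen for illustration.

```python
# Escape-related syscalls named in the threat model; extend as needed.
RISKY = {"ptrace", "mount", "unshare"}


def allowed_syscalls(profile: dict) -> set[str]:
    """Return every syscall name that the profile explicitly allows."""
    allowed = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            allowed.update(rule.get("names", []))
    return allowed


def audit(profile: dict) -> list[str]:
    """List findings: a permissive default action or explicitly allowed risky syscalls."""
    findings = []
    if profile.get("defaultAction") == "SCMP_ACT_ALLOW":
        findings.append("defaultAction allows every syscall that is not listed")
    for name in sorted(RISKY & allowed_syscalls(profile)):
        findings.append(f"risky syscall allowed: {name}")
    return findings
```

An empty result means the profile is deny-by-default and does not explicitly allow any syscall in RISKY. It does not prove that the profile is sufficient for the application.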
Kubernetes seccomp
In Kubernetes 1.19+, seccomp profiles are generally available:
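apiVersion: v1
kind: Pod
metadata:
  name: node-server  # placeholder name
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      # Path is relative to the kubelet's seccomp directory
      localhostProfile: profiles/node-server.json
  containers:
    - name: app
      image: myapp:latest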
The profile file must be present on the node at /var/lib/kubelet/seccomp/profiles/node-server.json.
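The mapping from localhostProfile to the on-node path is a simple join against the kubelet's seccomp directory. The sketch below assumes the default kubelet root of /var/lib/kubelet and uses a hypothetical helper name. It also rejects absolute paths and parent-directory segments, on the assumption that the value is meant to stay inside that directory.

```python
from pathlib import PurePosixPath

# Assumes the default kubelet root directory; adjust if your kubelet uses another one.
KUBELET_SECCOMP_ROOT = PurePosixPath("/var/lib/kubelet/seccomp")


def node_profile_path(localhost_profile: str) -> str:
    """Resolve a localhostProfile value to the file path expected on the node."""
    rel = PurePosixPath(localhost_profile)
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError("localhostProfile must be a relative path inside the seccomp directory")
    return str(KUBELET_SECCOMP_ROOT / rel)
```

A deployment script could use this to decide where to copy the profile on each node before scheduling the pod.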
AppArmor: Mandatory Access Control
AppArmor (Application Armor) enforces mandatory access control (MAC) policies at the kernel level. Where seccomp restricts syscalls, AppArmor restricts what files, network resources, and capabilities a process can access — regardless of file permissions.
Writing an AppArmor Profile for Docker
#include <tunables/global>

profile docker-node-app flags=(attach_disconnected, mediate_deleted) {
  #include <abstractions/base>
  #include <abstractions/nameservice>

  network inet tcp,
  network inet udp,
  network inet6 tcp,

  # Allow the Node.js binary to be mapped and executed
  # (path assumed from the official node image; adjust to your image)
  /usr/local/bin/node mrix,

  # Allow read access to app files
  /app/** r,
  /app/node_modules/** r,

  # Allow write to temp and logs only
  /tmp/** rw,
  /var/log/app/** w,

  # Deny access to sensitive paths
  deny /etc/shadow r,
  deny /proc/sysrq-trigger w,
  deny /sys/** w,

  # Allow necessary capabilities
  capability net_bind_service,
  capability setuid,
  capability setgid,
}
Load and apply the profile:
sudo apparmor_parser -r -W /etc/apparmor.d/docker-node-app
docker run --security-opt apparmor=docker-node-app myapp:latest
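Path rules such as /app/** r rely on AppArmor's globbing, where * matches within a single path component and ** also crosses directory separators. The following sketch approximates those two wildcards (plus ?) to show why /app/** covers nested files. It is a simplification for illustration: the real parser also supports character classes, alternation, and some edge-case handling that is omitted here.

```python
import re


def apparmor_glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified AppArmor path glob into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # ** may cross '/' boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # * stays within one path component
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")    # ? is a single character other than '/'
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def path_matches(pattern: str, path: str) -> bool:
    """Return True when the whole path matches the glob."""
    return apparmor_glob_to_regex(pattern).fullmatch(path) is not None
```

With this approximation, path_matches("/app/**", "/app/node_modules/express/index.js") is true, while a single-star rule such as /var/log/app/* would not cover files in subdirectories.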
Falco: Runtime Threat Detection
seccomp and AppArmor are preventive controls — they block things before they happen. Falco is a detective control: it monitors syscall activity in real time and fires alerts when it detects suspicious behavior patterns.
Installing Falco
# Install Falco on a Kubernetes cluster via Helm
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm install falco falcosecurity/falco \
  --set driver.kind=ebpf \
  --set falcosidekick.enabled=true \
  --set falcosidekick.config.slack.webhookurl=https://hooks.slack.com/...
The eBPF driver is preferred for modern kernels — it does not require a kernel module and is safer to deploy in production.
Writing Falco Rules
Falco rules use a YAML DSL. Each rule defines a condition (evaluated against syscall events) and an output (the alert message):
# shell_binaries, spawned_process, open_read, container, and sensitive_files come
# from Falco's default ruleset. allowed_shell_runners (a list) and
# allowed_root_containers (a macro) are site-specific and must be defined by you.
- rule: Shell Spawned in Container
  desc: Detect shell execution inside a container
  condition: >
    spawned_process and
    container.id != host and
    proc.name in (shell_binaries) and
    not container.image.repository in (allowed_shell_runners)
  output: >
    Shell spawned in container
    (user=%user.name container=%container.name
    image=%container.image.repository
    command=%proc.cmdline)
  priority: WARNING
  tags: [container, shell, mitre_execution]

- rule: Sensitive File Read in Container
  desc: Detect reads of sensitive files inside containers
  condition: >
    container and
    open_read and
    sensitive_files
  output: >
    Sensitive file read in container
    (file=%fd.name user=%user.name
    container=%container.name)
  priority: ERROR
  tags: [container, filesystem]

- rule: Container Running as Root
  desc: Detect container process running as UID 0
  condition: >
    spawned_process and
    container and
    proc.is_container_healthcheck = false and
    user.uid = 0 and
    not allowed_root_containers
  output: >
    Container running as root
    (container=%container.name image=%container.image.repository)
  priority: NOTICE
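To make the condition/output split concrete, here is a toy model of how a rule like "Shell Spawned in Container" could be evaluated against a single event. It is not Falco's engine: the event is a plain dictionary keyed by Falco-style field names, SHELL_BINARIES stands in for the shell_binaries list, and every function name is hypothetical.

```python
import re

# Stand-in for Falco's shell_binaries list (illustrative subset).
SHELL_BINARIES = {"bash", "sh", "zsh", "dash"}


def shell_in_container(event: dict) -> bool:
    """Condition: a process exec of a shell binary, in a container rather than on the host."""
    return (
        event.get("evt.type") in {"execve", "execveat"}
        and event.get("container.id", "host") != "host"
        and event.get("proc.name") in SHELL_BINARIES
    )


RULE = {
    "rule": "Shell Spawned in Container",
    "condition": shell_in_container,
    "output": "Shell spawned in container "
              "(user=%user.name container=%container.name command=%proc.cmdline)",
    "priority": "WARNING",
}


def render(template: str, event: dict) -> str:
    """Replace each %field placeholder with the event's value, or <NA> if missing."""
    return re.sub(r"%([a-z_.]+)", lambda m: str(event.get(m.group(1), "<NA>")), template)


def evaluate(rule: dict, event: dict):
    """Return the alert text if the rule's condition matches the event, else None."""
    if rule["condition"](event):
        return f'{rule["priority"]} {render(rule["output"], event)}'
    return None
```

The condition decides whether an event is interesting, and the output template only formats fields from that same event, which is why every field used in the output should also be meaningful for the event types the condition selects.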
Falco ships with ~100 default rules covering common attack techniques including privilege escalation, data exfiltration, reverse shells, and credential access.
Hardening the Container Itself
Beyond these three controls, container security begins with the Dockerfile and pod spec:
# Pod security context
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 2000
  containers:
    - name: app
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
          add: ["NET_BIND_SERVICE"]  # only if needed
      volumeMounts:
        - mountPath: /tmp
          name: tmp-volume
  volumes:
    - name: tmp-volume
      emptyDir: {}  # writable scratch space backing /tmp
Running as a non-root user with readOnlyRootFilesystem: true and allowPrivilegeEscalation: false removes the most common privilege escalation paths. Mount a writable emptyDir volume for /tmp if your application needs to write temporary files.
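These settings are easy to drop by accident, so a small check in CI can help. The sketch below audits a pod spec that has already been parsed into a dictionary (for example by a YAML loader). The function name and the finding messages are illustrative, and it covers only the four settings discussed here.

```python
def audit_pod_spec(spec: dict) -> list[str]:
    """Return findings for containers that miss the hardening settings discussed above."""
    findings = []
    pod_sc = spec.get("securityContext", {})
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext", {})
        # A container-level runAsNonRoot overrides the pod-level value when set.
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            findings.append(f"{name}: runAsNonRoot is not true")
        # Unset is reported too, because escalation is then not explicitly ruled out.
        if sc.get("allowPrivilegeEscalation", True):
            findings.append(f"{name}: allowPrivilegeEscalation is not false")
        if not sc.get("readOnlyRootFilesystem", False):
            findings.append(f"{name}: readOnlyRootFilesystem is not true")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            findings.append(f"{name}: capabilities.drop does not include ALL")
    return findings
```

An empty list means every container in the spec carries the four settings. The check says nothing about seccomp, AppArmor, or image contents, which need their own verification.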
Together, seccomp restricts syscalls, AppArmor restricts file and network access, Falco detects anomalous behavior, and hardened pod specs enforce least-privilege execution. These controls stack — an attacker who bypasses one still faces the others.