Sandboxing AI Agents in Production: Runtime Isolation Strategies That Actually Work
A deep technical guide to sandboxing AI agent tool execution in production: gVisor, Firecracker MicroVMs, WebAssembly sandboxing, seccomp syscall filtering, network namespace isolation, and container hardening — with real performance overhead data.
AI agents that execute code, run shell commands, query databases, or call external APIs need runtime isolation. Without it, a single successful prompt injection can escalate to OS-level code execution on the host system. This is not a theoretical risk — it has been demonstrated in controlled research environments and exploited in production deployments.
The problem is that most organizations know they need sandboxing but implement it incorrectly. A Docker container is not a sandbox. A virtual machine is not inherently well configured. Seccomp filtering with a permissive profile is not meaningful isolation. The gap between saying "we sandbox our agents" and actually having effective runtime isolation is enormous, and most organizations have not crossed it.
This document provides the definitive technical reference for AI agent sandboxing in production. We cover the technology stack from operating-system primitives up through higher-level isolation architectures, with performance overhead analysis for each approach, failure modes, and the specific conditions under which each technique is appropriate.
TL;DR
- Docker containers are not sandboxes — container escapes are routine because containers share the host kernel. Effective sandboxing requires kernel isolation.
- Three production-grade isolation technologies for AI agent tool execution: gVisor (kernel emulation), Firecracker MicroVMs (lightweight virtualization), and WebAssembly (capability-based sandboxing).
- Seccomp-BPF syscall filtering is a necessary complement to all three — define the minimum syscall surface for each workload and enforce it.
- Network namespace isolation is mandatory for any agent that can make external API calls — default-deny egress, allowlist-based access.
- Filesystem mount restrictions (read-only root, tmpfs for writes, no host mounts) prevent the most common persistence mechanisms.
- Performance overhead: gVisor ~2-10x syscall latency, Firecracker <5ms VM start time (cold), Wasm <1ms startup, comparable to native for compute-intensive workloads.
- Every sandbox has failure modes — understand them, test against them, and layer defenses accordingly.
Why Docker Containers Are Not Sandboxes
The single most common misconception in AI agent security is that deploying agents in Docker containers constitutes sandboxing. It does not. Understanding why requires understanding the Docker isolation model.
Docker containers provide namespace isolation — processes in a container see an isolated view of the filesystem, network, process table, and user/group IDs. They do not provide kernel isolation. All containers on a host share the same Linux kernel.
This means that a process with sufficient privileges inside a container can interact with the shared kernel and potentially escape to the host. Container escape techniques are extensively documented and regularly demonstrated:
Privileged container escapes. A container run with --privileged has access to all Linux capabilities and can mount the host filesystem, manipulate kernel modules, and trivially escape to the host. Privileged containers are inappropriate for any agent workload — ever. Yet they appear in production AI agent deployments because they "make things work" during development.
Kernel CVE exploitation. Because containers share the host kernel, a kernel vulnerability is exploitable from within a container. The pace of kernel CVE publication means that unpatched hosts are routinely exploitable from containerized workloads.
runc and containerd vulnerabilities. The container runtime itself is an attack surface. CVE-2019-5736 (runc) allowed container escape by overwriting the runc binary through the container's init process. CVE-2020-15257 (containerd) allowed privilege escalation via the containerd-shim abstract Unix socket, reachable from containers sharing the host network namespace.
Volume mount exploitation. Containers with host filesystem mounts — even read-only mounts — create paths for information disclosure and, in some configurations, host modification.
Device access exploitation. Containers with access to host devices (/dev/sda, /dev/mem, /dev/kmem) can directly read and write to host storage or memory.
For AI agent deployments where tool execution can be influenced by attacker-controlled inputs, the container model is insufficient. The kernel isolation gap is the critical failure point.
Technology Option 1: gVisor
What it is: gVisor is an application kernel developed by Google that intercepts Linux system calls and implements them in a user-space Go process called the Sentry. Instead of passing system calls to the host kernel directly, container processes pass them to the Sentry, which either handles them entirely in user space or passes a filtered subset to the host kernel.
Why it matters for AI agent security: With gVisor, even if an agent execution environment is fully compromised, the attacker cannot escalate to the host kernel via system call exploitation. The Sentry mediates all kernel interactions, providing a fundamentally different isolation boundary than the standard container model.
Architecture components:
Sentry: The application kernel. Implements approximately 200 Linux syscalls in user space. Written in Go, which eliminates the memory safety vulnerabilities common in kernel C code.
Gofer: Handles filesystem operations. The Sentry communicates with the Gofer over the 9P protocol, so filesystem access goes through an additional mediation layer.
Platform: The mechanism by which the Sentry intercepts system calls. gVisor supports two platforms: KVM (uses hardware virtualization for syscall interception, better performance) and ptrace (uses Linux ptrace for syscall interception, more portable but slower).
Integration with container runtimes: gVisor ships as runsc, an OCI-compatible container runtime that integrates with Docker and Kubernetes via the RuntimeClass API. From an operational perspective, deploying with gVisor requires changing the runtime from runc to runsc — no changes to container images or orchestration manifests.
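As a sketch of that wiring, assuming runsc is already installed on the node and registered with containerd, the Kubernetes side is a RuntimeClass plus one field in the pod spec (names here are illustrative):

# RuntimeClass mapping a name to the runsc handler
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
# Pod opting into gVisor isolation
apiVersion: v1
kind: Pod
metadata:
  name: agent-tool
spec:
  runtimeClassName: gvisor
  containers:
    - name: tool
      image: agent-tool:latest # illustrative image name

For plain Docker, the equivalent is registering runsc as a runtime in /etc/docker/daemon.json and launching with docker run --runtime=runsc.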
Performance overhead analysis:
gVisor's performance overhead is concentrated in syscall-intensive operations. Representative measurements (Intel Xeon E5-2690, Linux 5.15, gVisor 20240101):
| Operation | Native | gVisor (ptrace) | gVisor (KVM) | Overhead (KVM) |
|---|---|---|---|---|
| Syscall round-trip | 0.3 μs | 4.2 μs | 1.1 μs | 3.7x |
| File open() | 2.1 μs | 18.4 μs | 7.3 μs | 3.5x |
| Network connect() | 12 μs | 82 μs | 41 μs | 3.4x |
| malloc (1MB) | 0.8 ms | 1.2 ms | 0.9 ms | 1.1x |
| SHA-256 (1MB) | 0.4 ms | 0.41 ms | 0.40 ms | 1.0x |
For AI agent tool execution — which is typically IO-bound (API calls, database queries) with minimal compute — the overhead is primarily in IO syscall latency. API calls with 100ms+ network round-trip times absorb gVisor's syscall overhead into measurement noise. Compute-intensive tools (code execution, data processing) run at near-native speed.
Failure modes:
Syscall compatibility gaps. gVisor implements approximately 200 of the ~300+ Linux syscalls. Tools that use unsupported syscalls will fail with ENOSYS. Review the gVisor compatibility matrix before deploying workloads.
Sentry process compromise. The Sentry is a complex Go application. A vulnerability in the Sentry itself could allow escape from the gVisor boundary. The Google Vulnerability Rewards Program offers bounties for gVisor escape techniques; the surface is actively monitored.
Platform-specific issues. gVisor with KVM requires nested virtualization in cloud environments (supported by AWS, GCP, Azure for specific instance types but not universally available). Fallback to ptrace significantly increases overhead.
Filesystem performance. The 9P Gofer introduces substantial overhead for filesystem-intensive operations. Applications that perform many small file reads/writes will see significant performance degradation. Mitigation: use tmpfs for in-container temporary storage.
Technology Option 2: Firecracker MicroVMs
What it is: Firecracker is a Virtual Machine Monitor (VMM) developed by AWS for Lambda and Fargate. It creates minimal virtual machines — MicroVMs — using Linux KVM, with a simplified device model and an API designed for programmatic VM lifecycle management.
Why it matters for AI agent security: Unlike gVisor (which shares the host kernel via a mediated interface), Firecracker creates VMs with their own kernel instances. The isolation boundary is hardware-level VM isolation, the same isolation that cloud providers use to separate customer workloads.
Architecture:
Firecracker VMs boot a minimal Linux kernel with a stripped-down device model — network interface, block device, serial port, balloon device. The VM has no emulation of complex hardware (BIOS, PCI bus, ACPI) that has historically been a rich attack surface in traditional hypervisors.
Each Firecracker VM runs in a dedicated firecracker process on the host. The attack path from VM to host requires exploiting the host kernel's KVM interface — a much more restricted surface than shared-kernel container escape paths.
Integration patterns for AI agents:
AI agent tool execution with Firecracker typically uses one of two patterns:
Per-invocation VMs. A Firecracker VM is started for each tool invocation and torn down after the tool completes. This provides maximum isolation — each invocation runs in a clean environment with no state from previous invocations. Cold start time, booting a minimal kernel from scratch, is approximately 125ms; snapshot restoration (covered below) reduces it further.
Pool-based VMs. A pool of pre-started VMs waits for tool invocations. Invocations are routed to pool members, which are cleaned and returned to the pool after use (or torn down and replaced if cleaning is not sufficient). This reduces the per-invocation start time to <5ms for warm VMs.
Performance overhead analysis:
Firecracker's performance advantage over traditional hypervisors (QEMU/KVM) is significant. Its overhead compared to native container execution:
| Metric | Docker (runc) | Firecracker (cold) | Firecracker (warm) |
|---|---|---|---|
| Start time | 50-200ms | 125ms | <5ms |
| Memory overhead per VM | ~50MB (container overhead) | ~5-15MB | ~5-15MB |
| CPU overhead (steady state) | <1% | <2% | <2% |
| Network throughput | Near-native | Near-native | Near-native |
| Storage IOPS | Near-native | Near-native | Near-native |
For AI agent tool execution, the dominant cost is typically the tool's API call latency (50-500ms), not the VM overhead. Warm Firecracker VMs add <5ms to each invocation — acceptable for virtually all tool types.
Snapshot-based acceleration:
Firecracker supports VM snapshots — serialized VM state including memory, CPU registers, and device state. A snapshotted VM can be restored to running state in <5ms. This is the technology that enables pool-based deployment with fast warm start times.
For AI agent tool execution, the typical pattern is (a sketch of the snapshot API calls follows the list):
- Start a base Firecracker VM, install required tools, take a snapshot.
- Pool manager maintains N running VMs restored from the snapshot.
- Tool invocations are served by pool members; after each invocation, the VM is restored from snapshot (clearing all state) and returned to the pool.
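A minimal sketch of steps 1 and 3 against the Firecracker API socket, assuming a base VM has already been configured and booted. Socket paths and snapshot paths are illustrative, and request field names vary slightly across Firecracker releases, so check the API spec for the version you run:

# 1. Pause the configured base VM, then snapshot memory and device state
curl --unix-socket /tmp/fc-base.sock -X PATCH http://localhost/vm \
  -H 'Content-Type: application/json' -d '{"state": "Paused"}'
curl --unix-socket /tmp/fc-base.sock -X PUT http://localhost/snapshot/create \
  -d '{"snapshot_type": "Full",
       "snapshot_path": "/snapshots/base.snap",
       "mem_file_path": "/snapshots/base.mem"}'
# 3. In a fresh firecracker process, restore the snapshot and resume
curl --unix-socket /tmp/fc-pool-1.sock -X PUT http://localhost/snapshot/load \
  -d '{"snapshot_path": "/snapshots/base.snap",
       "mem_backend": {"backend_type": "File",
                       "backend_path": "/snapshots/base.mem"},
       "resume_vm": true}'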
Failure modes:
KVM vulnerability exploitation. The isolation boundary is the host kernel's KVM interface. KVM vulnerabilities do exist — CVE-2021-3653, CVE-2021-3656, CVE-2022-0070 are examples of KVM escape techniques. Host kernels must be patched promptly.
Firecracker VMM process compromise. The firecracker process running on the host is part of the attack surface. A vulnerability in the Firecracker VMM process could allow VM escape. Firecracker's simplified device model significantly reduces this surface compared to QEMU, but it is not zero.
Misconfigured network access. Firecracker VMs with improperly configured network access (too-permissive iptables rules, host-bridged networking) can access internal network services. Network isolation must be configured explicitly.
Technology Option 3: WebAssembly Sandboxing
What it is: WebAssembly (Wasm) is a binary instruction format designed for safe execution of arbitrary code. Its security model is capability-based — Wasm modules can only access resources (files, network, environment variables) that are explicitly granted to them at instantiation time.
Why it matters for AI agent security: Wasm provides a different isolation model than gVisor or Firecracker. Rather than isolating an existing OS process, Wasm executes code within a defined capability envelope. A tool implemented as a Wasm module cannot access any resource it was not explicitly granted — there is no kernel interface to exploit.
WebAssembly System Interface (WASI):
WASI defines a standard interface between Wasm modules and the host system — equivalent to POSIX for traditional applications. WASI capabilities are granted at the file-descriptor level for filesystem access and at the socket level for network access. A Wasm tool can only read files in paths it has been granted access to; it cannot enumerate the filesystem arbitrarily.
Runtimes for production use:
Wasmtime: The reference Wasm runtime from the Bytecode Alliance. Production-grade, actively maintained, used by Fastly's edge compute platform. Supports WASI and WASIp2 (the updated capability model).
Wasmer: An alternative Wasm runtime with additional platform support and plugin capabilities. Used in several production AI agent platforms.
wasm-micro-runtime (WAMR): Intel's ultra-lightweight Wasm runtime designed for resource-constrained environments. Startup time <1ms; memory overhead ~50KB per instance.
Performance characteristics:
| Metric | Native | Wasmtime | WAMR |
|---|---|---|---|
| Startup time | Variable | <1ms | <1ms |
| Memory overhead | Variable | ~2MB | ~50KB |
| Compute throughput | 1.0x | 0.85-0.95x | 0.80-0.90x |
| Syscall overhead | 1.0x | 1.1-1.3x | 1.1-1.3x |
Wasm overhead is consistently low for compute-bound workloads. The limitation is IO — not all WASI implementations have full networking support, and the capability model requires explicit grants for each resource, which adds coordination overhead.
Integration patterns for AI agent tools:
AI agent tools implemented as Wasm modules receive a capability set at invocation time: which files they can read, which files they can write, which network addresses they can connect to, which environment variables they can read. The agent runtime enforces these capabilities at the Wasm host layer — no Wasm syscall escape can exceed the declared capability envelope.
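As a concrete sketch, here is what a capability grant looks like when running a tool module under the Wasmtime CLI. The flag syntax follows recent Wasmtime releases, and the module name and paths are illustrative; the module sees only the mapped workspace directory and the one environment variable granted, and nothing else:

# Grant exactly one directory (host path mapped to guest /workspace)
# and one environment variable; everything else is denied by default.
wasmtime run \
  --dir /srv/agent/workspace::/workspace \
  --env TOOL_CONFIG=/workspace/config.json \
  summarize-tool.wasm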
Failure modes:
Runtime vulnerabilities. Wasm runtimes have had CVEs — Wasmtime has had a handful of memory safety issues in the JIT compiler path. Runtime version pinning and prompt patching are required.
Capability over-grant. If the capability grants are too broad — filesystem access to /, unrestricted network access — the Wasm isolation model provides no benefit. Capability grants must be defined by the specific tool's requirements.
Language support limitations. Not all languages compile efficiently to Wasm. Python is particularly challenging — CPython can be compiled to Wasm but with significant overhead. Tools written in C, C++, Rust, Go, and AssemblyScript compile well.
Seccomp-BPF Syscall Filtering
Seccomp (Secure Computing Mode) is a Linux kernel facility for restricting which system calls a process can make. With the BPF (Berkeley Packet Filter) extension, seccomp allows arbitrary filter programs to be applied to syscall arguments, enabling fine-grained allow/deny decisions for every syscall.
Why it matters for AI agent security:
Even with gVisor or Firecracker, seccomp filtering provides an additional defense layer. gVisor's Sentry itself runs under seccomp restrictions. In container deployments without gVisor, seccomp filtering is one of the few meaningful kernel-level controls available.
Profile development methodology:
Developing an effective seccomp profile requires identifying the exact set of syscalls used by the agent tool workload. The correct approach:
- Run the tool workload under strace -c to collect a syscall frequency count (a sketch follows this list).
- Add a safety margin — some syscalls appear only in error handling paths or infrequent code paths.
- Start with the Docker default seccomp profile (which blocks ~44 high-risk syscalls) and refine from there.
- Test the refined profile in staging before production deployment.
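A sketch of the collection step; the workload script name is illustrative:

# -f follows child processes; -c aggregates a per-syscall count table
strace -f -c -o /tmp/syscall-baseline.txt ./run-tool-workload.sh
# The syscall column of the summary becomes the initial allowlist.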
High-risk syscalls for AI agent workloads:
The following syscalls are disproportionately represented in container escape and privilege escalation techniques. They should be in every AI agent workload's deny list unless the tool has an explicit operational requirement:
- ptrace — process tracing; rarely needed by agent tools, frequently used in escape techniques
- mount — filesystem mounting; never needed by agent tools
- pivot_root — filesystem root change; never needed
- unshare — namespace creation; never needed
- clone with CLONE_NEWUSER — user namespace creation; major privilege escalation vector
- keyctl, add_key, request_key — kernel keyring access; not needed by agent tools
- bpf — eBPF operations; not needed by most agent tools (and a significant attack surface)
- perf_event_open — performance monitoring; not needed; has been used in side-channel attacks
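A minimal sketch of those denials in Docker's JSON profile format, layered on a default-allow base for brevity. A production profile should instead start from Docker's default profile and tighten it, and denying clone only when CLONE_NEWUSER is set additionally requires an argument filter, omitted here:

{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": [
        "ptrace", "mount", "pivot_root", "unshare",
        "keyctl", "add_key", "request_key",
        "bpf", "perf_event_open"
      ],
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}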
Profile formats and tooling:
Docker accepts seccomp profiles as JSON files passed via --security-opt seccomp=/path/to/profile.json. Kubernetes accepts them via the seccompProfile field in the pod or container securityContext; the older annotation-based mechanism is deprecated.
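A sketch of the Kubernetes side, assuming the profile file has been distributed to every node under the kubelet's seccomp root (typically /var/lib/kubelet/seccomp; the filename is illustrative):

# Container securityContext referencing a node-local profile
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: profiles/agent-tool.json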
Tools for seccomp profile generation:
- syscall2seccomp — generates seccomp profiles from strace output
- oci-seccomp-bpf-hook — generates profiles by monitoring actual container syscalls during a recording run
- Falco — can be used to generate baseline profiles from runtime observation
Network Namespace Isolation
AI agents that can make external API calls require network isolation that limits what they can reach. Network namespaces provide process-level isolation of the network stack — processes in separate network namespaces see separate network interfaces, routing tables, and firewall rules.
For container deployments:
The simplest and most effective pattern is to run agent tool execution containers in a dedicated network namespace with:
- No access to the host network stack
- Controlled access to an egress proxy
- The egress proxy enforcing an allowlist of permitted destinations
All outbound traffic from the agent tool container is routed through the egress proxy. The proxy's allowlist defines what the agent can reach. DNS queries are resolved by a DNS resolver under operator control — not a public resolver — to enable DNS-level filtering.
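A sketch of the namespace plumbing with iproute2, where the names and addresses are illustrative: the namespace's only path out is a veth link to the host side, where the proxy and the firewall rules shown below live:

# Create the namespace and a veth pair bridging it to the host
ip netns add agent-tool
ip link add veth-host type veth peer name veth-agent
ip link set veth-agent netns agent-tool
# Address both ends; the agent side routes everything via the host end
ip addr add 10.200.0.1/30 dev veth-host
ip link set veth-host up
ip netns exec agent-tool ip addr add 10.200.0.2/30 dev veth-agent
ip netns exec agent-tool ip link set veth-agent up
ip netns exec agent-tool ip route add default via 10.200.0.1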
Egress allowlist construction:
Build the allowlist from your agent tool's declared external dependencies:
- API endpoints the tool calls (specific hostnames, not IP ranges)
- Package registries if the tool installs packages at runtime (pin specific versions to avoid surprise dependencies)
- Data sources the tool is expected to query
Start with the minimal set and expand based on operational needs. Never use IP-based allowlisting for SaaS APIs — IP addresses change. Use hostname-based allowlisting with DNS resolution controlled by your proxy.
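If the egress proxy is Squid, the allowlist reduces to a few lines of configuration. The domains here are illustrative, and the leading dot matches subdomains:

# squid.conf fragment: hostname-based default-deny egress
acl agent_allowed dstdomain .api.example.com .pypi.org
http_access allow agent_allowed
http_access deny all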
Default-deny firewall configuration:
# iptables rules for agent tool container network namespace
# Default deny all outbound
iptables -P OUTPUT DROP
# Allow established connections (responses to allowed outbound)
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Allow DNS to controlled resolver only
iptables -A OUTPUT -d <resolver-ip> -p udp --dport 53 -j ACCEPT
# Allow outbound to egress proxy only
iptables -A OUTPUT -d <proxy-ip> -p tcp --dport 3128 -j ACCEPT
# Log everything else
iptables -A OUTPUT -j LOG --log-prefix "AGENT-BLOCKED-EGRESS: "
Filesystem Mount Restrictions
Filesystem access control is a critical hardening layer that prevents common persistence and exfiltration techniques.
Read-only root filesystem:
Mount the container's root filesystem as read-only. This prevents:
- Installation of persistent malware
- Modification of system binaries
- Creation of cron jobs or startup scripts
- Log tampering
For tools that require write access to specific paths, mount those paths with explicit tmpfs volumes:
# Kubernetes pod spec fragment: container and volume sections
containers:
  - name: agent-tool
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
      - name: tmp
        mountPath: /tmp
      - name: agent-workspace
        mountPath: /workspace
volumes:
  - name: tmp
    emptyDir: {}
  - name: agent-workspace
    emptyDir:
      medium: Memory # tmpfs — lost on container restart
No host mounts:
Agent tool containers should have no mounts from the host filesystem. Host mounts are the number-one source of container escape via filesystem access — they provide direct access to host files including /etc/passwd, SSH keys, Docker socket, and cloud provider credential files.
Sensitive path blocking:
Even within the container filesystem, block access to paths that agent tools have no business reading:
- /proc/sys — kernel tunables
- /sys/kernel — kernel parameters
- /var/run/secrets — service account tokens in Kubernetes
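In OCI runtime terms, these restrictions are expressed as the maskedPaths and readonlyPaths lists in the container's config.json. Docker and Kubernetes already set conservative defaults; a fragment covering the first two paths above:

{
  "linux": {
    "maskedPaths": ["/proc/sys", "/sys/kernel"],
    "readonlyPaths": ["/proc/bus", "/proc/fs"]
  }
}

The service account token mount is a pod-level concern: set automountServiceAccountToken: false in the Kubernetes pod spec so /var/run/secrets is never mounted in the first place.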
Composing the Isolation Stack
Production AI agent deployments should use a defense-in-depth isolation stack, not a single isolation technology. The recommended composition:
High-security agent tool execution (code execution, shell commands, untrusted code):
- Firecracker MicroVMs for kernel isolation
- Seccomp-BPF with a minimal syscall profile
- Network namespace with default-deny egress
- Read-only root filesystem with tmpfs workspace
- Per-invocation VM teardown with snapshot restoration
Medium-security agent tool execution (external API calls, data processing):
- gVisor for kernel-level syscall mediation
- Seccomp-BPF profile for additional syscall restrictions
- Network namespace with allowlist-based egress
- Read-only root filesystem with tmpfs workspace
Low-risk agent tool execution (read-only data access, deterministic operations):
- Hardened container (distroless base, non-root user, capability drop)
- Seccomp profile based on Docker default plus tool-specific restrictions
- Network namespace with allowlist-based egress
How Armalo Addresses Sandboxing Verification
Sandboxing controls are only as good as their implementation and their ongoing verification. An agent that is supposed to run in a gVisor sandbox but whose deployment configuration has drifted to runc — because someone needed to debug a performance issue and never changed it back — is unprotected.
Armalo's security dimension of the composite trust score (8% weight) incorporates runtime attestation verification — confirming that an agent's declared isolation controls are actually in effect in production. The verification mechanism uses cryptographic attestation of the execution environment: the sandbox provides a signed attestation of its configuration, which is verified against the agent's registered security policy.
When an agent is evaluated through Armalo's adversarial evaluation suite, sandbox escape attempts are among the tested attack vectors. An agent that passes sandbox escape attempts with its declared isolation controls in place earns a higher security score than one that has not been tested against these attacks.
Conclusion: Sandboxing Requires Depth, Not Breadth
Runtime isolation for AI agents is not a single technology choice. It is a defense-in-depth architecture composed from kernel isolation, syscall filtering, network isolation, and filesystem restrictions. The right composition depends on the risk profile of the specific tool workload: code execution demands the strongest isolation (Firecracker + seccomp + network isolation); read-only data access requires less (hardened container + seccomp + network isolation).
The technologies described here — gVisor, Firecracker, WebAssembly, seccomp — are production-grade and proven at scale. They introduce performance overhead, but that overhead is manageable and well-understood. The alternative — unprotected agent tool execution — introduces a risk profile that no security team should be willing to accept.
The path forward is to match isolation technology to tool risk tier, automate the isolation stack deployment, verify it through attestation, and test it through regular red team exercises. Sandboxing that exists only in documentation is not sandboxing — it is a fiction that will fail at exactly the moment it matters most.