How does it work?

Docker implements Enhanced Container Isolation by using the Sysbox container runtime. Sysbox is a fork of the standard OCI runc runtime that was modified to enhance standard container isolation and workloads. For more details see Under the hood.

When Enhanced Container Isolation is enabled, containers created by users through docker run or docker create are automatically launched using Sysbox instead of the standard OCI runc runtime. Users need not do anything else and can continue to use containers as usual. For exceptions, see FAQs.

Even containers that use the insecure --privileged flag can now be run securely with Enhanced Container Isolation, such that they can no longer be used to breach the Docker Desktop Virtual Machine (VM) or other containers.

メモ

When Enhanced Container Isolation is enabled in Docker Desktop, the Docker CLI --runtime flag is ignored. Docker's default runtime continues to be runc, but all user containers are implicitly launched with Sysbox.

Enhanced Container Isolation is not the same as Docker Engine's userns-remap mode or Rootless Docker.

Under the hood

Sysbox enhances container isolation by using techniques such as:

  • Enabling the Linux user-namespace on all containers (root user in the container maps to an unprivileged user in the Linux VM).
  • Restricting the container from mounting sensitive VM directories.
  • Vetting sensitive system-calls between the container and the Linux kernel.
  • Mapping filesystem user/group IDs between the container's user-namespace and the Linux VM.
  • Emulating portions of the /proc and /sys filesystems inside the container.

Some of these are made possible by recent advances in the Linux kernel which Docker Desktop now incorporates. Sysbox applies these techniques with minimal functional or performance impact to containers.

These techniques complement Docker's traditional container security mechanisms such as using other Linux namespaces, cgroups, restricted Linux Capabilities, Seccomp, and AppArmor. They add a strong layer of isolation between the container and the Linux kernel inside the Docker Desktop VM.

For more information, see Key features and benefits.

Enhanced Container Isolation versus user namespace remapping

The Docker Engine includes a feature called userns-remap mode that enables the user namespace in all containers. However it suffers from a few limitations and it's not supported within Docker Desktop.

Userns-remap mode is similar to Enhanced Container Isolation in that both improve container isolation by leveraging the Linux user-namespace.

However, Enhanced Container Isolation is much more advanced since it assigns exclusive user-namespace mappings per container automatically and adds several other container isolation features meant to secure Docker Desktop in organizations with stringent security requirements.

Enhanced Container Isolation versus Rootless Docker

Rootless Docker lets Docker Engine, and by extension the containers, to run without root privileges natively on a Linux host. This lets non-root users to install and run Docker natively on Linux.

Rootless Docker is not supported within Docker Desktop. While it's a valuable feature when running Docker natively on Linux, its value within Docker Desktop is reduced since Docker Desktop runs the Docker Engine within a Linux VM. That is, Docker Desktop already lets non-root host users to run Docker and isolates the Docker Engine from the host using a virtual machine.

Unlike Rootless Docker, Enhanced Container Isolation does not run Docker Engine within a Linux user-namespace. Rather it runs the containers generated by that engine within a user-namespace. This has the advantage of bypassing the limitations of Rootless Docker and creates a stronger boundary between the containers and the Docker Engine.

Enhanced Container Isolation is meant to ensure containers launched with Docker Desktop can't easily breach the Docker Desktop Linux VM and therefore modify security settings within it.