This article discusses the paper "Firecracker: Lightweight Virtualization for Serverless Applications", published at NSDI 2020, which presents Firecracker, a virtual machine monitor that provides lightweight virtualization on the host. Many developers today choose serverless containers and services to reduce system overhead, improve hardware utilization, and scale rapidly, but the serverless scenario places higher demands on isolation, security, and performance.
When serving multiple tenants with the same hardware, we expect the different workloads to be isolated for security and performance with minimal additional overhead. For a long time, however, the majority view was that we could only choose between strong security and low latency.
Virtualization provides strong security but introduces a large additional overhead, whereas container technology provides weaker security guarantees with a small additional overhead. With this in mind, public and private clouds each make their own choice based on their needs:
- A public cloud will use virtual machines for security assurance, which has a higher additional overhead but can pass all the costs on to the user.
- Private clouds will use container technology in order to guarantee performance. Although the isolation between containers is weaker, the tenants are generally internal business teams of the same company, so security is less of a concern and overall performance can be prioritized.
Isolation between different tenants will always be the first issue to consider for public clouds. The existence of some resource competition between tenants is sometimes still acceptable, but no customer can accept that the services they pay for may be attacked by other tenants.
The paper introduces Firecracker, a new Virtual Machine Monitor (VMM) that provides both strong security guarantees and low additional overhead. It currently powers AWS’s function and compute engines, supporting millions of workloads and trillions of requests per month.
Firecracker offers good compatibility and performance, adds very little overhead to running workloads, and can support thousands of functions on a single host. These properties are not the focus of this article, however. Instead, we expand here on several isolation mechanisms mentioned in the paper.
Linux containers, language-specific isolation mechanisms, and virtualization technologies are a few of the more common isolation options available today, and we’ll spend some time here briefly describing the similarities and differences between the three of them.
Linux containers combine multiple kernel features to provide operational and security isolation, including:
- control groups (cgroups): provide CPU, memory, and other resource limits.
- namespaces: provide separate views of kernel resources such as user and process identifiers and network interfaces.
- secure computing (seccomp-bpf): restricts the system calls, and the arguments passed to them, that processes can use.
- changing the root directory (chroot): provides an isolated file system.
Containers often rely on restricting system calls for security, and many container runtimes focus on the system call interface to do so. Google’s gVisor, for example, emulates some system calls in user space, which can significantly reduce the amount of functionality the host kernel has to expose to the workload.
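The idea of restricting system calls and their arguments can be illustrated with a small model. The sketch below is a toy default-deny allow-list written in plain Python. It is not seccomp-bpf and does not intercept real system calls; the `SyscallFilter` class and its methods are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical rule type: a predicate over a syscall's integer arguments.
ArgCheck = Callable[[Tuple[int, ...]], bool]


@dataclass
class SyscallFilter:
    """Toy allow-list mapping a syscall name to an argument predicate."""

    rules: Dict[str, ArgCheck] = field(default_factory=dict)

    def allow(self, name: str, check: ArgCheck = lambda args: True) -> None:
        # Register a syscall; by default any arguments are accepted.
        self.rules[name] = check

    def permits(self, name: str, args: Tuple[int, ...] = ()) -> bool:
        # Default-deny: a syscall that was never registered is rejected.
        check = self.rules.get(name)
        return check is not None and check(args)


policy = SyscallFilter()
policy.allow("read")
# Allow write only when the first argument (the file descriptor) is 1 or 2.
policy.allow("write", lambda args: bool(args) and args[0] in (1, 2))

print(policy.permits("read", (0,)))
print(policy.permits("write", (1,)))
print(policy.permits("write", (5,)))
print(policy.permits("mount"))
```

The smaller the allow-list, the less of the kernel a compromised workload can reach, which is the trade-off container runtimes tune against compatibility.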
Language Specific Isolation
Modern virtualization technologies use hardware-provided features to ensure the isolation of virtual hardware, page tables, and operating system kernels. While virtualization can address security concerns, it comes with trade-offs, and heavyweight virtualization presents the following challenges:
- low deployment density and high additional overhead: both the VM monitor and the independently running guest kernel take up additional CPU and memory, which limits the maximum number of VMs that can be deployed on a single machine.
- long startup time: the startup time of a virtual machine directly affects the user experience, and many developers have experienced the long wait required to boot one.
- complex and error-prone implementations: virtualization technologies are often exceptionally complex to implement. QEMU, a widely used virtual machine monitor, has 1,400,000 lines of code and uses 270 different system calls, which makes it difficult to guarantee the reliability of such a large code base.
Firecracker is written in Rust, a safer language, and implements a minimal virtual machine monitor in about 50,000 lines of code to replace QEMU. The new VMM works with Linux’s Kernel Virtual Machine (KVM) to provide a runtime environment for different workloads.
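To put the two code-size figures quoted above side by side, the following short calculation uses only the numbers given in this article (1,400,000 lines for QEMU and 50,000 lines for Firecracker). Both are approximate, so the result is an order-of-magnitude comparison rather than a precise measurement.

```python
# Line counts as quoted in this article (approximate figures).
QEMU_LOC = 1_400_000
FIRECRACKER_LOC = 50_000

# How many times smaller Firecracker is than QEMU.
ratio = QEMU_LOC / FIRECRACKER_LOC

# Fraction of QEMU's code size that Firecracker avoids.
reduction = 1 - FIRECRACKER_LOC / QEMU_LOC

print(f"{ratio:.0f}x smaller, {reduction:.1%} fewer lines")
```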
As a virtual machine monitor, Firecracker relies on Linux’s Kernel Virtual Machine (KVM) to provide minimal virtual machines called MicroVMs. The authors build on Linux components because they already provide the right functionality, performance, and design. Bypassing them would incur significant implementation costs and would force operations engineers to learn a new system, making operations harder.
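To make the notion of a MicroVM more concrete, the sketch below builds the sequence of HTTP requests a client might send to a Firecracker process over its API socket to configure and start one MicroVM. It only constructs the requests and sends nothing. The endpoint paths and field names follow Firecracker’s public API as we understand it and should be checked against the API specification for the version in use; the helper `microvm_requests`, the file paths, and the boot arguments are hypothetical values chosen for illustration.

```python
import json
from typing import Any, Dict, List, Tuple

# (HTTP method, path, JSON body) for one API call.
Request = Tuple[str, str, Dict[str, Any]]


def microvm_requests(kernel: str, rootfs: str, vcpus: int, mem_mib: int) -> List[Request]:
    """Build (but do not send) the calls that configure and boot one MicroVM."""
    return [
        # Guest kernel image and its command line (boot_args is an example value).
        ("PUT", "/boot-source",
         {"kernel_image_path": kernel, "boot_args": "console=ttyS0 reboot=k panic=1"}),
        # Root block device backed by a file on the host.
        ("PUT", "/drives/rootfs",
         {"drive_id": "rootfs", "path_on_host": rootfs,
          "is_root_device": True, "is_read_only": False}),
        # Virtual CPU count and guest memory size.
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        # Boot the guest once everything above is in place.
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


# Hypothetical host paths, for illustration only.
requests_to_send = microvm_requests("/tmp/vmlinux", "/tmp/rootfs.ext4", vcpus=1, mem_mib=128)
for method, path, body in requests_to_send:
    print(method, path, json.dumps(body))
```

The short list of calls reflects the minimal device model: a kernel, a block device, and a CPU and memory configuration are enough to boot a guest.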
The paper analyzes Firecracker’s performance in detail. For example, the pre-configured version boots in 100 to 150ms, while Firecracker without pre-configuration boots in around 150 to 250ms. In addition to millisecond-scale boot times, Firecracker requires only about 3MB of additional memory per MicroVM, which can significantly increase the deployment density of a single machine. Although Firecracker performs well on startup time and additional overhead, its I/O throughput is considerably worse than that of the other systems it is compared against.
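The link between per-VM memory overhead and deployment density can be shown with a back-of-the-envelope estimate. The sketch below assumes memory is the only limiting resource and ignores the host’s own memory needs. The 3 MiB overhead comes from the figure above (treating MB and MiB as interchangeable); the host size, guest size, and the 128 MiB overhead of the heavier monitor are hypothetical values, not numbers from the paper.

```python
def microvms_per_host(host_mem_mib: int, guest_mem_mib: int, vmm_overhead_mib: int) -> int:
    """Upper bound on VMs per host when memory is the only constraint."""
    return host_mem_mib // (guest_mem_mib + vmm_overhead_mib)


HOST_MEM_MIB = 256 * 1024  # hypothetical host with 256 GiB of memory
GUEST_MEM_MIB = 128        # hypothetical small function-sized guest

# About 3 MiB of VMM overhead per guest, as reported for Firecracker.
light = microvms_per_host(HOST_MEM_MIB, GUEST_MEM_MIB, 3)

# A hypothetical heavier monitor that adds 128 MiB per guest.
heavy = microvms_per_host(HOST_MEM_MIB, GUEST_MEM_MIB, 128)

print(light, heavy)
```

For small guests the monitor’s overhead dominates the result, which is why a few megabytes per VM matter so much for serverless workloads.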
It is worth noting that, according to the paper, Firecracker oversubscribes resources by a factor of twenty in a test environment and by a factor of ten in production without causing any problems. It seems that AWS’s function engine Lambda can be quite profitable: if you want to make money, you sell one resource ten or even twenty times over.
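Why overselling (oversubscription) at these ratios can work is easiest to see with a simple expected-value check: if tenants use, on average, only a small fraction of what they are sold, the total demand still fits on the physical machine. The sketch below is a simplification that ignores bursts and correlated load; the function names, the 64-vCPU host, and the utilization figures are hypothetical.

```python
def sold_capacity(physical_units: float, ratio: float) -> float:
    """Capacity sold to tenants at a given oversubscription ratio."""
    return physical_units * ratio


def expected_demand(sold_units: float, avg_utilization: float) -> float:
    """Average demand if tenants use a fixed fraction of what they bought."""
    return sold_units * avg_utilization


def fits(physical_units: float, ratio: float, avg_utilization: float) -> bool:
    """True if average demand stays within the physical capacity."""
    sold = sold_capacity(physical_units, ratio)
    return expected_demand(sold, avg_utilization) <= physical_units


# Hypothetical host with 64 vCPUs, sold ten times over.
print(sold_capacity(64, 10))
print(fits(64, 10, avg_utilization=0.05))  # tenants mostly idle
print(fits(64, 10, avg_utilization=0.20))  # tenants busier than 1/ratio
```

In this model a ratio of ten is sustainable only while average utilization stays below one tenth, so a real scheduler also has to leave headroom for bursts.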