Aman Goyal


[One Paper Later] On-demand Container Loading in AWS Lambda




Overview

AWS published the paper "On-demand Container Loading in AWS Lambda", which describes how they scaled Lambda to support container images of up to 10 GiB, up from the original 250 MiB limit.

Originally, users had to ZIP their code and upload it to S3 to run on Lambda. For each new invocation (especially after a cold start), AWS would spin up a new lightweight VM, pull the ZIP file, and execute it. This worked well because AWS’s global backbone network is fast and 250 MiB is relatively small, keeping cold-start times low (typically ~50ms).

With the move to supporting container images — and large ones, up to 10 GiB — two core challenges arose:

Despite this, the core requirement remained: keep cold-start times near 50ms. Meeting it took significant engineering effort and innovation. To get there, AWS exploited three core properties of container images:

  1. Cacheability: The majority of workloads come from a small number of unique images.
  2. Commonality: Many popular images are based on common base layers (e.g., Alpine, Ubuntu).
  3. Sparsity: Most files inside container images are not needed at startup.
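To make the commonality property concrete, here is a minimal sketch of content-addressed chunk storage. It assumes images are flattened to bytes and split into fixed-size chunks named by their SHA-256 hash; the chunk size, the hashing scheme, and the `store` helper are illustrative assumptions, not AWS's actual design.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose, so the example is easy to follow


def store(images: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, list[str]]]:
    """Store every unique chunk once and keep a per-image manifest of chunk ids."""
    chunks: dict[str, bytes] = {}
    manifests: dict[str, list[str]] = {}
    for name, data in images.items():
        ids = []
        for i in range(0, len(data), CHUNK_SIZE):
            piece = data[i:i + CHUNK_SIZE]
            cid = hashlib.sha256(piece).hexdigest()  # chunk is named by its content
            chunks.setdefault(cid, piece)            # identical chunks are stored once
            ids.append(cid)
        manifests[name] = ids
    return chunks, manifests


base = b"ubuntu-base-"  # stands in for a shared base layer (3 distinct chunks)
images = {
    "app_a": base + b"AAAA",
    "app_b": base + b"BBBB",
}
chunks, manifests = store(images)
total_chunks = sum(len(m) for m in manifests.values())
unique_chunks = len(chunks)
print(total_chunks, unique_chunks)
```

Two images that share a base end up sharing the base's chunks, so only the application-specific chunks add to storage and cache pressure.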

Architecture

When a user invokes a Lambda function, the request eventually lands on a Lambda worker.


Each Lambda worker can host multiple functions. A local agent process runs on the worker and communicates with the Firecracker VM via virtio, a standard virtual device interface. Inside the VM, this local agent appears as a block device (like a virtual hard disk).
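The block-device view pairs naturally with the sparsity property: if the guest only reads a few regions of the image, only those chunks need to be fetched. The sketch below is a simplified model of that idea. `LazyBlockDevice` and the `fetch_chunk` callable are hypothetical names, and the in-memory dict merely stands in for whatever cache sits on the worker.

```python
class LazyBlockDevice:
    """Serves byte-range reads from a chunked image, fetching chunks on first use."""

    def __init__(self, fetch_chunk, chunk_size: int, num_chunks: int):
        self.fetch_chunk = fetch_chunk      # hypothetical callable: chunk index -> bytes
        self.chunk_size = chunk_size
        self.num_chunks = num_chunks
        self.cache: dict[int, bytes] = {}   # stands in for a local cache on the worker

    def read(self, offset: int, length: int) -> bytes:
        if length <= 0:
            return b""
        first = offset // self.chunk_size
        last = min((offset + length - 1) // self.chunk_size, self.num_chunks - 1)
        parts = []
        for idx in range(first, last + 1):
            if idx not in self.cache:       # only untouched chunks trigger a fetch
                self.cache[idx] = self.fetch_chunk(idx)
            parts.append(self.cache[idx])
        start = offset - first * self.chunk_size
        return b"".join(parts)[start:start + length]


remote = [b"aaaa", b"bbbb", b"cccc", b"dddd"]  # pretend remote chunk store
fetched = []


def fetch(idx: int) -> bytes:
    fetched.append(idx)                     # record which chunks were actually pulled
    return remote[idx]


dev = LazyBlockDevice(fetch, chunk_size=4, num_chunks=len(remote))
result = dev.read(5, 4)   # spans chunks 1 and 2
again = dev.read(4, 2)    # served entirely from the cache
print(result, again, fetched)
```

Chunks 0 and 3 are never requested, which is the whole point: startup cost tracks the data actually read, not the image size.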

Filesystem Preparation

Storage and Caching

Startup and Read/Write Behavior

Garbage Collection (GC)

A major challenge in deduplicated systems is safe and efficient garbage collection. Lambda avoids reference counting or centralized metadata to reduce complexity and risk. Instead, AWS introduced a root-based GC model:

This process ensures:

While expired roots and chunk duplication increase storage costs slightly, the tradeoff is acceptable—most customer data is updated frequently, and not all data is migrated.
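As a rough illustration of how a root-based scheme can avoid reference counting, the sketch below models roots as generations that move through active, retired, and expired states. The class, the method names, and the migrate-on-read rule are assumptions made for this example, not a description of AWS's implementation.

```python
class RootStore:
    """Toy chunk store where each chunk lives under a root (a generation)."""

    def __init__(self):
        self.active = 0
        self.roots = {0: {}}             # root id -> {chunk id: bytes}
        self.state = {0: "active"}

    def put(self, cid: str, data: bytes) -> None:
        self.roots[self.active][cid] = data   # new data always lands in the active root

    def rotate(self) -> None:
        """Retire the current root and start a fresh active one."""
        self.state[self.active] = "retired"
        self.active += 1
        self.roots[self.active] = {}
        self.state[self.active] = "active"

    def get(self, cid: str) -> bytes:
        for rid in sorted(self.roots, reverse=True):   # newest root first
            if self.state[rid] == "expired" or cid not in self.roots[rid]:
                continue
            data = self.roots[rid][cid]
            if self.state[rid] == "retired":
                # Still-used data is copied forward, so it is briefly duplicated.
                self.roots[self.active][cid] = data
            return data
        raise KeyError(cid)

    def expire(self, rid: int) -> None:
        self.state[rid] = "expired"      # no longer readable, not yet deleted

    def delete(self, rid: int) -> None:
        assert self.state[rid] == "expired"
        del self.roots[rid]              # whole root goes at once, no per-chunk counts


store = RootStore()
store.put("hot", b"h")
store.put("cold", b"c")
store.rotate()
store.get("hot")      # read from the retired root migrates "hot" to root 1
store.expire(0)
store.delete(0)
print(sorted(store.roots[1]))
```

Data that is still being read survives by being copied into the new root, while untouched data disappears with its old root. That is where the extra storage cost comes from: migrated chunks exist twice until the old root is deleted.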

Conclusion

AWS Lambda’s evolution from small ZIP files to container images of up to 10 GiB is a masterclass in system design. By leveraging:

AWS was able to maintain low cold start times while scaling Lambda to support modern containerized workflows. This architecture not only optimizes performance and resource usage but also preserves tenant isolation and security, all without disrupting the user experience.


Source
