Reproducible Builds
Iron Bank supports reproducible builds. Reproducible builds in the context of container images mean that building the same Dockerfile with the same inputs always yields an identical container image digest (SHA).
Reproducible builds mean stronger supply chain security, faster and more efficient builds, and fewer wasted downloads.
For more information on why reproducible builds are beneficial, check out the Blog.
Enabling Reproducible Builds
The REPRODUCIBLE_BUILDS CI/CD variable need only be set on your container repository to put the pipeline into reproducible build mode. This causes the following changes:
- The SOURCE_DATE_EPOCH build arg will be added to the Dockerfile and set to the CI_COMMIT_TIMESTAMP of the build pipeline instead of the time the build is running, making this a consistent value across builds. The "created" timestamp for the built image and all content in layers created as a part of this build will also bear this same timestamp.
- The mil.dso.ironbank.ci.id image label will no longer be set. The value for this label would change on every pipeline, changing the digest of the image.
- The org.opencontainers.image.created image label value will also be set to the CI_COMMIT_TIMESTAMP of the build pipeline instead of the time the build is running, making this a consistent value across builds.
- If an existing image is reproduced, new attestations will only be written if the VAT findings have changed in any way or if the Syft version has changed since the last build or rescan.
- Platform is appended to staging tags so that reproducibility checks work with multiarch builds.
- A warning log message will appear in the build logs if an image already exists for the same commit revision, but the digest of the newly built image differs from it.
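To see what a commit-derived timestamp looks like outside the pipeline, the following sketch converts an ISO 8601 commit timestamp into epoch seconds, which is the form SOURCE_DATE_EPOCH takes. It assumes GNU date (for the -d option), and both the commit_epoch helper and the sample timestamp are made up for illustration. The pipeline performs its own conversion, which may differ in detail.

```shell
# Sketch: convert an ISO 8601 commit timestamp into epoch seconds.
# Assumes GNU date (the -d option); commit_epoch is a hypothetical helper name.
commit_epoch() {
  date -u -d "$1" +%s
}

# Hypothetical commit timestamp in ISO 8601 form.
SOURCE_DATE_EPOCH="$(commit_epoch '2024-01-15T10:30:00Z')"
echo "SOURCE_DATE_EPOCH=${SOURCE_DATE_EPOCH}"
# prints SOURCE_DATE_EPOCH=1705314600
```

When reproducing a build locally, the resulting value could be passed with docker build --build-arg SOURCE_DATE_EPOCH=<value>.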
Creating a Reproducible Image
The key to configuring a reproducible image is to avoid commands that generate dynamic content unless they are based on deterministic input.
Picking a good base image
Selecting the right base image is one of the most important steps in creating reproducible container builds. Any change in the base image — even a minor patch or metadata update — will result in a different final image digest.
To maximize reproducibility:
- Avoid images that update frequently or include volatile tools like shells or package managers.
- Prefer minimal, deterministic images that change only when explicitly versioned.
Images that exclude a shell, package manager, and other unnecessary utilities have fewer moving parts, making them significantly less prone to introducing nondeterminism between builds. As an added benefit, these leaner images also reduce the attack surface — improving both reproducibility and security.
Look for minimal / micro / slim / distroless base images, or use scratch.
Crafting Dockerfiles for Reproducibility
A major challenge in building reproducible container images lies in managing timestamps — including those of files, layers, and the final image metadata. Any variation in timestamps between builds will result in different image digests, breaking reproducibility.
To address this, Docker supports the SOURCE_DATE_EPOCH build argument, a convention for pinning timestamps to a fixed value. Iron Bank derives this value from the CI_COMMIT_TIMESTAMP of the pipeline building the image. The "created" timestamp for the built image and all content in layers created as a part of this build will also bear this same timestamp.
Using SOURCE_DATE_EPOCH is only part of the solution. It's the responsibility of the image maintainer to:
- Avoid non-deterministic build steps
- Ensure tools respect SOURCE_DATE_EPOCH
- Eliminate volatile inputs (e.g. date, uuidgen, temporary files with timestamps)
Each image may require specific strategies to achieve reproducibility, but the underlying principle is the same:
Avoid generating dynamic content unless it's based on deterministic inputs.
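As one illustration of making a tool respect SOURCE_DATE_EPOCH, the sketch below creates a tar archive whose bytes do not depend on file modification times, file ordering, or the building user. It assumes GNU tar 1.28 or later (for --sort); the deterministic_tar helper and the fallback epoch value are hypothetical and only there to keep the example self-contained.

```shell
# Sketch: build an archive whose bytes do not depend on when or by whom it was created.
# Assumes GNU tar 1.28+ (for --sort). The fallback epoch below is illustrative only;
# in an Iron Bank build the value is supplied for you.
: "${SOURCE_DATE_EPOCH:=1700000000}"

# deterministic_tar SRC_DIR OUT_FILE (hypothetical helper name)
deterministic_tar() {
  # --sort=name         fixed member order regardless of directory listing order
  # --mtime=@EPOCH      every member gets the same modification time
  # --owner/--group     ownership does not depend on the building user
  tar --sort=name \
      --mtime="@${SOURCE_DATE_EPOCH}" \
      --owner=0 --group=0 --numeric-owner \
      -cf "$2" -C "$1" .
}
```

Two directories with the same file content and permissions but different modification times should then produce byte-identical archives.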
Example 1: Use Deterministic Timestamps
The following example Dockerfile will produce a different image on every build, because we are generating a timestamp dynamically and writing it to a file at build time. The timestamp will be set to whatever the time is when the build is taking place:
FROM public.ecr.aws/ubuntu/ubuntu:22.04
# This breaks reproducibility
RUN echo $(date +%s) >/timestamp.txt
To ensure reproducible builds, use the SOURCE_DATE_EPOCH environment variable instead.
This allows the same timestamp to be used across builds, resulting in consistent image digests (assuming the base image remains unchanged):
FROM public.ecr.aws/ubuntu/ubuntu:22.04
# Reproducibility is maintained
RUN echo "${SOURCE_DATE_EPOCH}" >/timestamp.txt
Note: SOURCE_DATE_EPOCH is automatically supplied by Iron Bank as an ARG to your Dockerfile.
Example 2: Pin PIP Dependencies and Clean Cache
The image built from the following Dockerfile may change on any build because the version of requests being installed is not pinned, and pip creates cache files that may also differ between builds.
FROM kiwigrid/k8s-sidecar:1.30.3 as base
RUN pip install requests
Instead, use a requirements.txt file with pinned dependencies, and the --require-hashes option to ensure every package and its dependencies are pinned by hash. Don't forget to clean up the pip cache.
COPY files/requirements.txt /opt/requirements.txt
RUN pip install --no-cache-dir --require-hashes -r /opt/requirements.txt && \
rm -rf /root/.cache/pip
Example 3: Clean apt upgrade / install
Various logs and caches may need to be cleared to make an image reproducible when performing apt-get upgrades or installs.
Here is an example command that deletes or truncates various files related to apt.
RUN apt-get update -y && \
apt-get upgrade -y && \
apt-get install -y ca-certificates && \
apt-get clean && \
chmod 644 /usr/local/share/ca-certificates/*.pem && \
chmod +x /tmp/update-certs.sh && \
chmod +x /tmp/suid-guid.sh && \
/tmp/update-certs.sh && \
/tmp/suid-guid.sh && \
rm -rf /var/lib/apt/lists/* && \
> /var/log/dpkg.log && \
> /var/log/apt/term.log && \
> /var/log/apt/history.log && \
> /var/cache/ldconfig/aux-cache
Example 4: Clean npm installs
To clean up after your npm installs, there are a few directories to delete.
RUN \
npm install -g prettier@3.6.2 && \
rm -rf /root/.npm /tmp/node-compile-cache
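Example 5: Fix "Days since password change" changes
/etc/shadow records the date of the last password change in column 3, stored as the number of days since January 1, 1970. The value is typically set to the day an account is created or modified, so it may change across builds.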
Example /etc/shadow
cat /etc/shadow
root:*::0:::::
nobody:!::0:::::
clamav:!:20279:0:99999:7:::
squid:!:20279:0:99999:7:::
python:!:20279:0:99999:7:::
A way to fix this is to clear that column:
# reproducible
RUN \
for SHADOW_FILE in shadow shadow-; do sed -i -e 's/[^:]*//3' /etc/$SHADOW_FILE; done
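To confirm that the column was actually cleared, a small check like the following can be run against the shadow files in the built image. This is a minimal sketch: the nonempty_lastchg helper name is hypothetical, and it only inspects the third colon-separated field.

```shell
# Sketch: list shadow entries whose third field (last password change) is still set.
# Prints nothing and returns 0 when every entry has been cleared; returns 1 otherwise.
# nonempty_lastchg FILE (hypothetical helper name)
nonempty_lastchg() {
  awk -F: '$3 != "" { print $1; found = 1 } END { exit found }' "$1"
}
```

For example, nonempty_lastchg /etc/shadow would print the account names that still carry a last-change value.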
How to tell if the image is configured for reproducibility
The Iron Bank pipeline's build job includes automatic reproducibility checks to help ensure consistent image outputs across builds.
When building an image, the pipeline will:
- Inspect the previously published image for the same tag.
- Compare the org.opencontainers.image.revision label from the previous image with the current build.
- If the revisions match, it will check whether the image digests are also identical.
The pipeline will print a warning message indicating that the image may not be reproducible if a newly built image has a different digest from an image built with the same revision.
If the image is based on a parent image that has changed since the last build, a digest mismatch is expected. If the parent image has not changed and the digest differs, this may signal that the image is not configured correctly for reproducible builds.
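The decision described above can be summarized in a few lines of shell. This is only an illustration of the logic as documented here, not the pipeline's actual implementation; the compare_builds helper, its messages, and the sample values are all hypothetical.

```shell
# Sketch of the documented check; not the pipeline's actual code.
# compare_builds PREV_REVISION PREV_DIGEST NEW_REVISION NEW_DIGEST (hypothetical helper)
compare_builds() {
  if [ "$1" != "$3" ]; then
    # A different commit was built, so a different digest is expected.
    echo "different revision: digests are not compared"
  elif [ "$2" = "$4" ]; then
    echo "reproduced: same revision, same digest"
  else
    echo "warning: same revision, different digest; image may not be reproducible"
  fi
}

# Hypothetical values for illustration.
compare_builds abc123 sha256:1111 abc123 sha256:2222
```

A changed parent image would also land in the warning branch, which is why the warning alone does not prove a misconfiguration.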
Debugging a Changing Image
A simple way to debug what is changing in an image is to generate checksums of the files between two images and compare them.
You can use a tool like diffoci to help you with this.
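The checksum comparison can be sketched as follows. It assumes the root filesystems of both images have already been unpacked into two directories (for example by piping docker export into tar -x) and that sha256sum is available; the helper names are hypothetical. It compares file content only, so differences in timestamps or ownership will not show up here.

```shell
# Sketch: checksum every regular file under two extracted image filesystems and diff the lists.
# Helper names are hypothetical; only file content is compared, not metadata.

# checksum_tree DIR: print "<sha256>  <relative path>" for each regular file, sorted by path.
checksum_tree() {
  ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2 )
}

# diff_trees DIR_A DIR_B: show files whose content differs or that exist on one side only.
# Returns 0 when the trees match and non-zero otherwise.
diff_trees() {
  a_list=$(mktemp)
  b_list=$(mktemp)
  checksum_tree "$1" > "$a_list"
  checksum_tree "$2" > "$b_list"
  status=0
  diff "$a_list" "$b_list" || status=$?
  rm -f "$a_list" "$b_list"
  return "$status"
}
```

Running diff_trees build1-rootfs build2-rootfs (hypothetical directory names) lists only the paths that changed, which usually points straight at the log, cache, or generated file responsible.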
Trace a Reproducible Build to an Iron Bank Pipeline
With the removal of the mil.dso.ironbank.ci.id image label when REPRODUCIBLE_BUILDS is enabled, there are a few alternative methods for tracing a reproducible container image back to its originating Iron Bank CI pipeline.
Option 1: Use the OCI Image Labels
You can use image labels to identify the source of an image:
- org.opencontainers.image.source: The repository URL
- org.opencontainers.image.revision: The commit SHA used to build the image
To trace the pipeline, the following URL will take you directly to the GitLab pipelines associated with the image's source revision:
https://<org.opencontainers.image.source>/-/commit/<org.opencontainers.image.revision>/pipelines
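A small helper can assemble that URL from the two label values. This is a sketch under assumptions: the pipelines_url name and the sample values are hypothetical, and because it is not stated here whether the source label includes the https:// scheme, the helper tolerates both forms.

```shell
# Sketch: build the pipelines URL from the two OCI labels.
# The label values could be read with, for example:
#   docker inspect --format '{{ index .Config.Labels "org.opencontainers.image.source" }}' IMAGE
# pipelines_url SOURCE REVISION (hypothetical helper name)
pipelines_url() {
  src="${1#https://}"   # tolerate a source label that already includes the scheme
  src="${src%/}"        # drop a trailing slash, if any
  echo "https://${src}/-/commit/${2}/pipelines"
}

# Hypothetical label values for illustration.
pipelines_url "https://gitlab.example.com/group/project" "0123abc"
# prints https://gitlab.example.com/group/project/-/commit/0123abc/pipelines
```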
Option 2: Use the hardening_manifest.json Attestation
The hardening_manifest.json attestation now includes a pipeline_id field, which provides an explicit reference to the CI pipeline that last modified the image's attestations.
Note: If multiple pipelines produce the same image digest, the pipeline_id will correspond to the most recent pipeline that pushed or updated attestations for the image.
Attestations
In a reproducible build process, a rebuilt image will be bit-for-bit identical to the original, meaning its digest will not change. Because of this, most attestations remain unchanged between builds. We do have a few scenarios where new attestations will be published even though we may have reproduced an existing image:
- VAT Response Change: The VAT response may change if:
  - New findings are discovered due to updated vulnerability definitions or pipeline tooling
  - Justifications for existing findings have been updated in VAT
- Syft Version Change: A Syft version change may result in changed SBOM details, so a changed version will always trigger new attestations.
Since the hardening_manifest.json attestation is only updated for a reproduced image under a few specific scenarios, the pipeline_id in that manifest will point to the latest pipeline to publish new attestations.