Reproducible Builds

Iron Bank supports reproducible builds. Reproducible builds in the context of container images mean that building the same Dockerfile with the same inputs always yields an identical container image digest (SHA).

Reproducible builds mean stronger supply chain security, faster and more efficient builds, and fewer wasted downloads.

For more information on why reproducible builds are beneficial, check out the Blog.

Enabling Reproducible Builds

Setting the REPRODUCIBLE_BUILDS CI/CD variable on your container repository is all that is required to put the pipeline into reproducible build mode. This causes the following changes:

  1. The SOURCE_DATE_EPOCH build arg will be added to the Dockerfile and set to the CI_COMMIT_TIMESTAMP of the build pipeline instead of the time the build is running, making this a consistent value across builds. The "created" timestamp for the built image and all content in layers created as a part of this build will also bear this same timestamp.
  2. The mil.dso.ironbank.ci.id image label will no longer be set. The value for this label would change on every pipeline, changing the digest of the image.
  3. The org.opencontainers.image.created image label value will also be set to the CI_COMMIT_TIMESTAMP of the build pipeline instead of the time the build is running, making this a consistent value across builds.
  4. If an existing image is reproduced, new attestations will only be written if the VAT findings have changed in any way or if the Syft version has changed since the last build or rescan.
  5. The platform is appended to staging tags so that reproducibility checks work with multi-arch builds.
  6. A warning log message will appear in the build logs if an image already exists for the same commit revision, but the digest of the newly built image differs from it.
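
As a rough illustration of item 1: GitLab provides CI_COMMIT_TIMESTAMP as an ISO 8601 string, while SOURCE_DATE_EPOCH is conventionally a count of seconds since the Unix epoch, so a conversion has to happen somewhere. The sketch below shows one way to do it with GNU date. The function name commit_timestamp_to_epoch is hypothetical, and this is not necessarily how the Iron Bank pipeline performs the conversion.

```shell
# Hypothetical helper: convert an ISO 8601 timestamp (the format GitLab uses
# for CI_COMMIT_TIMESTAMP) to seconds since the Unix epoch.
# Assumes GNU date; other date implementations parse input differently.
commit_timestamp_to_epoch() {
    date -u -d "$1" +%s
}

# Possible use in a CI job (not run here):
# SOURCE_DATE_EPOCH="$(commit_timestamp_to_epoch "$CI_COMMIT_TIMESTAMP")"
# export SOURCE_DATE_EPOCH
```

Because the commit timestamp never changes for a given commit, every rebuild of that commit sees the same epoch value.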

Creating a Reproducible Image

The key to configuring a reproducible image is to avoid commands that generate dynamic content unless they are based on deterministic input.

Picking a good base image

Selecting the right base image is one of the most important steps in creating reproducible container builds. Any change in the base image — even a minor patch or metadata update — will result in a different final image digest.

To maximize reproducibility:

  • Avoid images that update frequently or include volatile tools like shells or package managers.
  • Prefer minimal, deterministic images that change only when explicitly versioned.

Images that exclude a shell, package manager, and other unnecessary utilities have fewer moving parts, making them significantly less prone to introducing nondeterminism between builds. As an added benefit, these leaner images also reduce the attack surface — improving both reproducibility and security.

Look for minimal / micro / slim / distroless base images, or use scratch.

Crafting Dockerfiles for Reproducibility

A major challenge in building reproducible container images lies in managing timestamps — including those of files, layers, and the final image metadata. Any variation in timestamps between builds will result in different image digests, breaking reproducibility.

To address this, Docker supports the SOURCE_DATE_EPOCH build argument, a convention for pinning timestamps to a fixed value. Iron Bank derives this value from the CI_COMMIT_TIMESTAMP of the pipeline building the image. The "created" timestamp for the built image and all content in layers created as a part of this build will also bear this same timestamp.

Using SOURCE_DATE_EPOCH is only part of the solution. It's the responsibility of the image maintainer to:

  • Avoid non-deterministic build steps
  • Ensure tools respect SOURCE_DATE_EPOCH
  • Eliminate volatile inputs (e.g. date, uuidgen, temporary files with timestamps)

Each image may require specific strategies to achieve reproducibility, but the underlying principle is the same:

Avoid generating dynamic content unless it's based on deterministic inputs.
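
Archive creation is a typical case where a tool has to be told to use the pinned timestamp. GNU tar does not read SOURCE_DATE_EPOCH on its own, but the value can be passed to it explicitly. The following is a minimal sketch, assuming GNU tar 1.28 or newer and a hypothetical helper name deterministic_tar. It fixes file order, modification times, and ownership so that the archive bytes depend only on file names, modes, and contents.

```shell
# Hypothetical helper: pack a directory into a tarball whose bytes do not
# depend on when, or by which user, the archive was created.
# Assumes GNU tar >= 1.28 (for --sort) and that SOURCE_DATE_EPOCH is set.
deterministic_tar() {
    src_dir="$1"
    out_file="$2"
    tar --sort=name \
        --mtime="@${SOURCE_DATE_EPOCH}" \
        --owner=0 --group=0 --numeric-owner \
        -cf "$out_file" -C "$src_dir" .
}
```

Running the helper twice over the same content, even after file modification times have changed, should produce byte-identical archives.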

Example 1: Use Deterministic Timestamps

The following example Dockerfile will produce a different image on every build, because we are generating a timestamp dynamically and writing it to a file at build time. The timestamp will be set to whatever the time is when the build is taking place:

FROM public.ecr.aws/ubuntu/ubuntu:22.04
# This breaks reproducibility
RUN echo $(date +%s) >/timestamp.txt

To ensure reproducible builds, use the SOURCE_DATE_EPOCH environment variable instead. This allows the same timestamp to be used across builds, resulting in consistent image digests (assuming the base image remains unchanged):

FROM public.ecr.aws/ubuntu/ubuntu:22.04
# Reproducibility is maintained
RUN echo "${SOURCE_DATE_EPOCH}" >/timestamp.txt

Note: SOURCE_DATE_EPOCH is automatically supplied by Iron Bank as an ARG to your Dockerfile.
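
If a build step needs a human-readable date rather than raw epoch seconds, it can be derived from the same value instead of from the wall clock. This is a small sketch assuming GNU date; the helper name build_date is hypothetical.

```shell
# Hypothetical helper: render SOURCE_DATE_EPOCH as an ISO 8601 UTC string.
# Assumes GNU date, where "-d @SECONDS" means seconds since the Unix epoch.
# Fails with a message if SOURCE_DATE_EPOCH is unset or empty.
build_date() {
    date -u -d "@${SOURCE_DATE_EPOCH:?SOURCE_DATE_EPOCH is not set}" +%Y-%m-%dT%H:%M:%SZ
}
```

The output depends only on SOURCE_DATE_EPOCH, so it is identical on every rebuild of the same commit.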

Example 2: Pin PIP Dependencies and Clean Cache

The following Dockerfile may produce a different image on any build because the version of requests being installed is not pinned, and pip creates cache files that may also differ between builds.

FROM kiwigrid/k8s-sidecar:1.30.3 AS base
RUN pip install requests

Instead, use a requirements.txt file with pinned dependencies, and the --require-hashes option to ensure every package and its dependencies are pinned by hash. Don't forget to clean up the pip cache.

COPY files/requirements.txt /opt/requirements.txt
RUN pip install --no-cache-dir --require-hashes -r /opt/requirements.txt && \
    rm -rf /root/.cache/pip
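
Before committing a requirements.txt, it can help to confirm that every entry really is pinned. The script below is a rough pre-check under stated assumptions, not part of pip or the Iron Bank pipeline: the name check_pinned is hypothetical, and it only looks for an "==" pin and at least one "--hash=" per requirement. pip's own --require-hashes option remains the authoritative check at install time.

```shell
# Hypothetical helper: fail if any requirement lacks an exact "==" pin or a
# "--hash=" entry. Backslash-continued lines are joined before checking.
check_pinned() {
    awk '
        /^[ \t]*(#|$)/           { next }  # skip comments and blank lines
        line == "" && /^[ \t]*-/ { next }  # skip standalone pip options
        { line = line $0 }                 # accumulate the current requirement
        /\\$/ { sub(/\\$/, " ", line); next }
        {
            if (line !~ /==/ || line !~ /--hash=/) {
                print "not fully pinned: " line
                bad = 1
            }
            line = ""
        }
        END { exit bad + 0 }
    ' "$1"
}
```

A call such as check_pinned files/requirements.txt returns a non-zero status and lists the offending entries when something is unpinned.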

Example 3: Clean apt upgrade / install

Various logs and caches may need to be cleared to make an image reproducible when performing apt-get upgrades or installs.

Here is an example command that deletes or truncates various files related to apt.

RUN apt-get update -y && \
    apt-get upgrade -y && \
    apt-get install -y ca-certificates && \
    apt-get clean && \
    chmod 644 /usr/local/share/ca-certificates/*.pem && \
    chmod +x /tmp/update-certs.sh && \
    chmod +x /tmp/suid-guid.sh && \
    /tmp/update-certs.sh && \
    /tmp/suid-guid.sh && \
    rm -rf /var/lib/apt/lists/* && \
    > /var/log/dpkg.log && \
    > /var/log/apt/term.log && \
    > /var/log/apt/history.log && \
    > /var/cache/ldconfig/aux-cache

Example 4: Clean npm installs

To clean up after your npm installs, there are a few directories to delete.

RUN \
    npm install -g prettier@3.6.2 && \
    rm -rf /root/.npm /tmp/node-compile-cache

Example 5: Fix "Days since password change" changes

/etc/shadow records the date of the last password change in column 3, expressed as the number of days since January 1, 1970. This value may change across builds.

Example /etc/shadow

cat /etc/shadow
root:*::0:::::
nobody:!::0:::::
clamav:!:20279:0:99999:7:::
squid:!:20279:0:99999:7:::
python:!:20279:0:99999:7:::

A way to fix this is to clear that column:

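# reproducible: clear the third field of each shadow file.
# Use "-i -e", not "-ie": GNU sed reads "-ie" as "-i" with backup suffix "e"
# and would leave /etc/shadowe and /etc/shadow-e behind.
RUN \
    for SHADOW_FILE in shadow shadow-; do sed -i -e 's/[^:]*//3' "/etc/$SHADOW_FILE"; done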
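
To confirm the column was actually cleared, a short check can be run against the file afterwards. This is a sketch with a hypothetical helper name, shadow_dates_cleared. It only inspects column 3 and makes no other claim about the file.

```shell
# Hypothetical helper: succeed only if column 3 (date of last password
# change) is empty on every line of the given shadow-format file.
shadow_dates_cleared() {
    awk -F: '
        $3 != "" { print "last-change date still set for: " $1; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}
```

For example, shadow_dates_cleared /etc/shadow, run as root, returns a non-zero status and names the accounts that still carry a value.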

How to tell if the image is configured for reproducibility

The Iron Bank pipeline's build job includes automatic reproducibility checks to help ensure consistent image outputs across builds.

When building an image, the pipeline will:

  1. Inspect the previously published image for the same tag.
  2. Compare the org.opencontainers.image.revision label from the previous image with the current build.
  3. If the revisions match, it will check whether the image digests are also identical.

The pipeline will print a warning message indicating that the image may not be reproducible if a newly built image has a different digest from an image built with the same revision.

If the image is based on a parent image that has changed since the last build, a digest mismatch is expected. If the parent image has not changed and the digest differs, this may signal that the image is not configured correctly for reproducible builds.
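
The comparison described above can be sketched as a small function. This is only an illustration of the decision logic, not the pipeline's actual implementation. The function name, argument order, and messages are all hypothetical.

```shell
# Sketch of the reproducibility check. Arguments: the revision label and
# digest of the previously published image, then those of the new build.
# Returns 1 (with a warning) only when the same revision gave a new digest.
check_reproduced() {
    prev_revision="$1"; prev_digest="$2"
    new_revision="$3";  new_digest="$4"

    if [ "$prev_revision" != "$new_revision" ]; then
        echo "revision changed; digests are not comparable"
        return 0
    fi
    if [ "$prev_digest" = "$new_digest" ]; then
        echo "image reproduced: $new_digest"
        return 0
    fi
    echo "WARNING: same revision, different digest; image may not be reproducible" >&2
    return 1
}
```

A warning from a check like this still needs the parent-image caveat applied before concluding that the Dockerfile itself is at fault.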

Debugging a Changing Image

A simple way to debug what is changing in an image is to generate checksums for the files in each of two images and compare them.

You can use a tool like diffoci to help you with this.
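
For the manual checksum approach, a minimal sketch follows. It assumes each image's filesystem has already been unpacked into a directory, and that GNU find, sort, xargs, and sha256sum are available. The helper name checksum_tree is hypothetical.

```shell
# Hypothetical helper: print "checksum  path" for every regular file under
# a directory, sorted by path so two listings can be compared with diff.
# Assumes GNU tools (sort -z, xargs -r).
checksum_tree() {
    (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum)
}

# Possible use (not run here), given two unpacked root filesystems:
# checksum_tree ./image-a > a.sums
# checksum_tree ./image-b > b.sums
# diff a.sums b.sums
```

One way to unpack an image is to create a container from it with docker create and pipe docker export for that container into tar -x. Files that appear in the diff output are the ones to investigate.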

Trace a Reproducible Build to an Iron Bank Pipeline

With the removal of the mil.dso.ironbank.ci.id image label when REPRODUCIBLE_BUILDS is enabled, there are a few alternative methods for tracing a reproducible container image back to its originating Iron Bank CI pipeline.

Option 1: Use the OCI Image Labels

You can use image labels to identify the source of an image:

  • org.opencontainers.image.source -- The repository URL
  • org.opencontainers.image.revision -- The commit SHA used to build the image

To trace the pipeline, the following URL will take you directly to the GitLab pipelines associated with the image's source revision:

https://<org.opencontainers.image.source>/-/commit/<org.opencontainers.image.revision>/pipelines
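
Assembling that URL from the two labels can be scripted. The sketch below is illustrative only. The helper name pipelines_url is hypothetical, and because it is an assumption whether the source label already carries an https:// prefix, the function tolerates both forms.

```shell
# Hypothetical helper: build the commit pipelines URL from the values of
# org.opencontainers.image.source ($1) and org.opencontainers.image.revision ($2).
pipelines_url() {
    source_label="${1#https://}"      # drop a scheme if the label has one
    source_label="${source_label%/}"  # drop a trailing slash if present
    printf 'https://%s/-/commit/%s/pipelines\n' "$source_label" "$2"
}

# The label values can be read from a local image with, for example:
# docker inspect --format '{{ index .Config.Labels "org.opencontainers.image.revision" }}' IMAGE
```

The host and project path used when calling the helper come straight from the image's labels, so no repository details need to be hard-coded.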

Option 2: Use the hardening_manifest.json Attestation

The hardening_manifest.json attestation now includes a pipeline_id field, which provides an explicit reference to the CI pipeline that last modified the image's attestations.

Note: If multiple pipelines produce the same image digest, the pipeline_id will correspond to the most recent pipeline that pushed or updated attestations for the image.

Attestations

In a reproducible build process, a rebuilt image will be bit-for-bit identical to the original, meaning its digest will not change. Because of this, most attestations remain unchanged between builds. We do have a few scenarios where new attestations will be published even though we may have reproduced an existing image:

  1. VAT Response Change: The VAT response may change if:

    • New findings are discovered due to updated vulnerability definitions or pipeline tooling
    • Justifications for existing findings have been updated in VAT
  2. Syft Version Change: A Syft version change may result in changed SBOM details, so a changed version will always trigger new attestations.

Since the hardening_manifest.json attestation is only updated for a reproduced image under a few specific scenarios, the pipeline_id in that manifest will point to the latest pipeline to publish new attestations.