
Left of Launch: Questioning the Speed-Quality Tradeoff in Software Engineering
The concept “left of launch” comes from missile defense: in the timeline, the events that happen before the missile launch are to the left; this is where the idea of Shift Left in QA testing comes from as well. You can apply this idea to building code, unit testing, benchmarking and even security scanning: the common thread is that in all these things when there is a mistake of some kind, you want it to become apparent as soon after the mistake is made as possible. Why you might want that, an...

Pre-Building Standard Devcontainers with GitHub CI
To support Left-of-Launch quality checks and ensure a consistent development environment straight out of GitHub, I make extensive use of VS Code’s devcontainers. The one problem is once you have built up a large set of tools, the time to rebuild the container gets to be quite long -- unnecessarily so, because the vast majority of the layers in the container never change. Ideally we want the majority of our core features to just be available as a pre-built image. If you search for how to do th...

Chaos Manor Reloaded
Many years ago Jerry Pournelle wrote a column for the now-defunct Byte magazine called Computing at Chaos Manor. Nowadays a lot of system administrators and developers are homelabbing and writing about it, but he was probably one of the pioneers, so the title here is a nod to that history. As I am getting back to regular hobbyist software development and being our home’s Bastard Operator From Hell, I wanted to write more about that as well. In the past I have written while building both hardw...


Chainguard's Wolfi is a distroless base for Docker containers: the bare minimum needed to run Linux applications. This has two big implications:
- the images are small, which can make them faster to build -- a key consideration if you are aiming for rapid iteration and continuous delivery
- the number of packages is limited, which reduces the attack surface
It goes hand-in-hand with apko, a tool that builds OCI-compliant containers from Alpine's apk package format. The companion rules_apko module for the Bazel build system then lets you automate creating those containers. However, there is a catch if you are building native code: you also need to package up your own binary, and that binary needs to be fully compatible with the target runtime for the container, which could be different from the host operating system. For instance, you might be compiling on macOS on an ARM64-based Apple Silicon chip, but you want to create an image that runs on Linux on an x86-based Intel i9 chip.
In order to do this, you need to cross-compile the binary. LLVM excels at this, and Bazel's toolchains_llvm helpfully takes care of things like downloading the toolchain and the sysroot you need to cross-compile. The latter, though, is its own tricky piece. A sysroot is a filesystem tree containing things like glibc, the Linux kernel headers, and so on. You need this because even if your compiler running on macOS knows how to generate x86 machine code while running on an ARM processor, it still needs headers and libraries to link against in order to create a complete Linux executable. Depending on how you compile and link, your new executable might end up with absolute paths or links to shared libraries that may or may not be available inside your container -- or worse, the versions present might have subtle differences, so what you compiled and tested might not be what you run.
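To make the mechanics concrete, here is a small illustrative sketch -- not part of toolchains_llvm; the helper name and the command shape are assumptions for illustration -- of how a container architecture name maps to the LLVM target triple and sysroot flags that a clang cross-compile ultimately needs:

```python
# Illustrative only: the mapping and helper below are not from toolchains_llvm;
# they sketch what a cross-compiling clang invocation roughly looks like.

# Container-style architecture names mapped to LLVM target triples.
ARCH_TO_TRIPLE = {
    "amd64": "x86_64-unknown-linux-gnu",
    "arm64": "aarch64-unknown-linux-gnu",
}

def clang_cross_command(source: str, output: str, arch: str, sysroot: str) -> list[str]:
    """Assemble an illustrative clang++ invocation for cross-compiling to Linux."""
    triple = ARCH_TO_TRIPLE[arch]
    return [
        "clang++",
        f"--target={triple}",    # which machine code to emit
        f"--sysroot={sysroot}",  # headers and libraries for the *target* OS
        "-fuse-ld=lld",          # lld can link for foreign targets
        "-o", output,
        source,
    ]

print(" ".join(clang_cross_command("main.cc", "main", "amd64", "/path/to/sysroot")))
```

The key point is the pairing: `--target` controls code generation, while `--sysroot` supplies everything the linker needs from the target operating system.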
To get around this hazard, Bazel encourages hermetic builds. Namely, you want full control of the build environment, libraries, etc. and then you want to use that to deploy as well. In our case, since we want to target Wolfi, it would be nice if that sysroot we were compiling against was the same base image as the one we plan to use with apko, right?
That's what we're going to tackle here.
Chainguard's wolfi-base image can be pulled from their Docker registry:
```
docker pull cgr.dev/chainguard/wolfi-base
```
which is great if you want to run it, and not as helpful if your goal is to build a sysroot for cross-compilation. What we really want to do is run apko to create a sysroot, and then expose it to the LLVM toolchain. Thankfully, with a repository_rule in Bazel you can do just that, even downloading the apko binary you need on the fly.
We start with sysroot.yaml -- an apko configuration file that starts with wolfi-base and layers glibc, libstdc++ and the Linux kernel headers on top.
sysroot.yaml
```yaml
contents:
  repositories:
    - https://packages.wolfi.dev/os
  keyring:
    - https://packages.wolfi.dev/os/wolfi-signing.rsa.pub
  packages:
    - build-base
    - glibc-dev
    - libstdc++-dev
    - linux-headers

archs:
  - x86_64
  - aarch64
```
This will create a larger-than-usual image: about 500 MB per architecture, almost entirely due to gcc's inclusion in build-base. However, if we subsequently create an even more minimal setup, we know that the hermetic build's sysroot and the container image share identical foundations -- making the image more reliable and secure.
One key finding in getting this working is that toolchains_llvm requires the sysroot to be a package, which means you have to generate it with a repository_rule. However, this imposes a constraint: repository rules are evaluated during Bazel's loading phase, so you cannot depend on artifacts generated by Bazel itself. That means that if we want to call apko, we unfortunately cannot rely on rules_apko to load the binary for us: we have to download a specific version ourselves. This is why the attributes include details like apko_version and apko_sha256.
repos.bzl
```starlark
sysroot = repository_rule(
    implementation = _sysroot_impl,
    attrs = {
        "apko_config": attr.label(
            doc = "Label pointing to the apko config YAML file.",
            default = "//build-support/sysroot:sysroot.yaml",
        ),
        "architecture": attr.string(
            mandatory = True,
            values = ["amd64", "arm64"],
        ),
        "apko_version": attr.string(
            doc = "Version of apko to use for building the sysroot.",
            default = "0.30.26",
        ),
        "apko_sha256": attr.string_dict(
            doc = "SHA256 checksums of the apko binary, per platform.",
            default = {
                "darwin_arm64": "347bd6c...",
                "linux_amd64": "12c227b...",
                "linux_arm64": "f46bc84...",
            },
        ),
        "strip_components": attr.int(
            doc = "Number of components to strip when extracting (similar to strip_prefix).",
        ),
        "include_patterns": attr.string_list(),
        "exclude_patterns": attr.string_list(),
    },
)
```
With the rule defined, we need an implementation. The first part just uses the repository_ctx object and our input attributes to download the appropriate apko release for the host platform:
repos.bzl
```starlark
load("@aspect_bazel_lib//lib:repo_utils.bzl", "repo_utils")

def _sysroot_impl(rctx):
    apko_version = rctx.attr.apko_version
    host_platform = repo_utils.platform(rctx)
    url = "https://github.com/chainguard-dev/apko/releases/download/v{apko_version}/apko_{apko_version}_{host_platform}.tar.gz".format(
        apko_version = apko_version,
        host_platform = host_platform,
    )
    strip_prefix = "apko_{}_{}".format(
        apko_version,
        host_platform,
    )
    apko_sha256 = rctx.attr.apko_sha256.get(host_platform)
    if apko_sha256 == None:
        fail("No apko SHA256 checksum provided for platform: %s" % host_platform)
    rctx.download_and_extract(
        url = url,
        output = "apko",
        sha256 = apko_sha256,
        strip_prefix = strip_prefix,
    )
```
We can then use the context to run apko build-minirootfs and turn the apko_config YAML into an extract of the Linux usr, lib and other top-level directories:
repos.bzl
```starlark
    archive = rctx.path("sysroot.tar")
    result = rctx.execute([
        rctx.path("apko/apko"),
        "build-minirootfs",
        rctx.path(rctx.attr.apko_config),
        archive,
        "--build-arch",
        rctx.attr.architecture,
    ])
    if result.return_code != 0:
        fail(result.stdout + result.stderr)
```
The next step is critical for toolchains_llvm and ultimately clang to work. We declare a BUILD.bazel inside the new repository that exposes a filegroup containing the top-level directory. The toolchain will later consume this filegroup to treat the extracted repository as a sysroot:
repos.bzl
```starlark
    rctx.file(
        "sysroot/BUILD.bazel",
        """filegroup(
    name = "sysroot",
    srcs = ["."],
    visibility = ["//visibility:public"],
)""",
    )
```
This approach is necessary, but Bazel does not particularly like it: if you run this without any overrides you will get warnings about directories as inputs not being supported. So far the only workaround I have found is to add this to .bazelrc:
```
startup --host_jvm_args=-DBAZEL_TRACK_SOURCE_DIRECTORIES=1
```
This startup flag forces the Bazel daemon to monitor directories for changes, which it does not do by default for performance reasons. Still, we now have an empty sysroot directory in a format that toolchains_llvm can ingest.
The output of the apko execution is a tar file, so we'll follow toolchains_llvm's own custom sysroot.bzl and use the embedded tar toolchain for the host platform:
repos.bzl
```starlark
    host_bsdtar = Label("@bsd_tar_toolchains_%s//:tar" % repo_utils.platform(rctx))
    cmd = [
        rctx.path(host_bsdtar),
        "--extract",
        "--no-same-owner",
        "--no-same-permissions",
        "--file",
        archive,
        "--directory",
        "sysroot",
        "--strip-components",
        str(rctx.attr.strip_components),
    ]
    for include in rctx.attr.include_patterns:
        cmd.extend(["--include", include])
    for exclude in rctx.attr.exclude_patterns:
        cmd.extend(["--exclude", exclude])
    result = rctx.execute(cmd)
    if result.return_code != 0:
        fail(result.stdout + result.stderr)
    rctx.delete(archive)
```
Finally we return the repo_metadata, marking the repository as reproducible so it can be cached:
repos.bzl
```starlark
    if hasattr(rctx, "repo_metadata"):
        return rctx.repo_metadata(reproducible = True)
    else:
        return None
```
We can now use our rule and declare sysroots for each architecture:
MODULE.bazel
```starlark
sysroot = use_repo_rule("//build-support/sysroot:repos.bzl", "sysroot")

sysroot(
    name = "sysroot_amd64",
    architecture = "amd64",
    include_patterns = ["**"],
    exclude_patterns = ["dev/*", "etc/shadow", "etc/gshadow"],
)

sysroot(
    name = "sysroot_arm64",
    architecture = "arm64",
    include_patterns = ["**"],
    exclude_patterns = ["dev/*", "etc/shadow", "etc/gshadow"],
)
```
which can support our toolchain:
MODULE.bazel
```starlark
# Configure LLVM toolchain for cross-compilation with Linux targets
llvm = use_extension("@toolchains_llvm//toolchain/extensions:llvm.bzl", "llvm")
llvm.toolchain(
    name = "llvm_toolchain",
    llvm_version = "21.1.6",
    extra_llvm_distributions = {
        "LLVM-21.1.6-Linux-ARM64.tar.xz": "1d8a9e...",
        "LLVM-21.1.6-Linux-X64.tar.xz": "38bd99...",
        "LLVM-21.1.6-macOS-ARM64.tar.xz": "bdf036...",
        "clang+llvm-21.1.6-x86_64-pc-windows-msvc.tar.xz": "6fd57e...",
    },
    stdlib = {
        "linux-x86_64": "stdc++",
        "linux-aarch64": "stdc++",
    },
)

# Register sysroots for cross-compilation to Linux
llvm.sysroot(
    name = "llvm_toolchain",
    label = "@sysroot_amd64//sysroot",
    targets = ["linux-x86_64"],
)
llvm.sysroot(
    name = "llvm_toolchain",
    label = "@sysroot_arm64//sysroot",
    targets = ["linux-aarch64"],
)
```
These last two blocks bind the labels of our declared sysroot repositories. Note that the label has to point to the sysroot directory inside the repository.
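For the toolchain to actually take effect, the extension's repository still has to be imported and registered. Assuming the standard toolchains_llvm bzlmod pattern (the repository name here matches the toolchain name declared above), that looks something like this in MODULE.bazel:

```starlark
# Standard toolchains_llvm bzlmod wiring; adjust the repository name
# if your toolchain is named differently.
use_repo(llvm, "llvm_toolchain")

register_toolchains("@llvm_toolchain//:all")
```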
With all this in place, you can easily cross-compile Linux x86 and Linux ARM64 binaries from different hosts, including Apple Silicon.
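As a sketch of what an invocation might look like, assuming you define a platform target per Linux architecture (the target names here are hypothetical; the constraint values come from the standard @platforms repository):

```starlark
# BUILD.bazel (hypothetical platform definitions)
platform(
    name = "linux_x86_64",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
)

platform(
    name = "linux_arm64",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:aarch64",
    ],
)
```

A build for a given architecture would then be selected with something like `bazel build --platforms=//:linux_arm64 //your:target`.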
Hermetic builds are an important principle for Bazel, and essential when you are dealing with a compiled language like C++ that is very sensitive to its environment. From a cybersecurity perspective as well, the more you can control the build and runtime environments, the lower the probability that something unexpected sneaks in. A fully realized Shift Left setup for Modern C++ will require extending this to more advanced features like building OCI containers and leveraging remote caching and execution, but Bazel gives you all the pieces needed to get there.