
Left of Launch: Questioning the Speed-Quality Tradeoff in Software Engineering
The concept “left of launch” comes from missile defense: in the timeline, the events that happen before the missile launch are to the left; this is where the idea of Shift Left in QA testing comes from as well. You can apply this idea to building code, unit testing, benchmarking and even security scanning: the common thread is that in all these things when there is a mistake of some kind, you want it to become apparent as soon after the mistake is made as possible. Why you might want that, an...

Pre-Building Standard Devcontainers with GitHub CI
To support Left-of-Launch quality checks and ensure a consistent development environment straight out of GitHub, I make extensive use of VS Code’s devcontainers. The one problem is once you have built up a large set of tools, the time to rebuild the container gets to be quite long -- unnecessarily so, because the vast majority of the layers in the container never change. Ideally we want the majority of our core features to just be available as a pre-built image. If you search for how to do th...

Chaos Manor Reloaded
Many years ago Jerry Pournelle wrote a column for the now-defunct Byte magazine called Computing at Chaos Manor. Nowadays a lot of system administrators and developers are homelabbing and writing about it, but he was probably one of the pioneers, so the title here is a nod to that history. As I am getting back to regular hobbyist software development and being our home’s Bastard Operator From Hell, I wanted to write more about that as well. In the past I have written while building both hardw...


Chainguard's Wolfi is a distroless base for Docker containers: the bare minimum needed to run Linux applications. This has two big implications:
- the images are small, which can make them faster to build -- a key consideration if you are aiming for rapid iteration and continuous delivery
- the number of packages is limited, which reduces the attack surface
It goes hand-in-hand with apko, a tool that builds OCI-compliant containers from Alpine's apk package format. The companion rules_apko module for the Bazel build system then lets you automate creating those containers. However, there is a catch if you are building native code: you also need to package up your own binary, and that binary needs to be fully compatible with the target runtime for the container, which could be different from the host operating system. For instance, you might be compiling on macOS on an ARM64-based Apple Silicon chip, but you want to create an image that runs on Linux on an x86-based Intel i9 chip.
In order to do this, you need to cross-compile the binary. LLVM excels at this, and Bazel's toolchains_llvm helpfully takes care of things like downloading the toolchain and the sysroot you need to cross-compile. The latter, though, is its own tricky piece. A sysroot is a filesystem tree containing things like glibc, the Linux kernel headers, and so on. You need this because even if your compiler running on macOS knows how to generate x86 machine code while running on an ARM processor, it still needs headers and libraries to link against in order to create a complete Linux executable. Depending on how you compile and link, your new executable might end up with absolute paths or links to shared libraries that may or may not be available inside your container -- or worse, the versions present might have subtle differences, so what you compiled and tested might not be what you run.
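To make the mechanics concrete, here is a small illustrative sketch -- not part of toolchains_llvm; the helper name and the command shape are assumptions for illustration -- of how a container architecture name maps to the LLVM target triple and sysroot flags that a clang cross-compile ultimately needs:

```python
# Illustrative only: the mapping and helper below are not from toolchains_llvm;
# they sketch what a cross-compiling clang invocation roughly looks like.

# Container-style architecture names mapped to LLVM target triples.
ARCH_TO_TRIPLE = {
    "amd64": "x86_64-unknown-linux-gnu",
    "arm64": "aarch64-unknown-linux-gnu",
}

def clang_cross_command(source: str, output: str, arch: str, sysroot: str) -> list[str]:
    """Assemble an illustrative clang++ invocation for cross-compiling to Linux."""
    triple = ARCH_TO_TRIPLE[arch]
    return [
        "clang++",
        f"--target={triple}",    # which machine code to emit
        f"--sysroot={sysroot}",  # headers and libraries for the *target* OS
        "-fuse-ld=lld",          # lld can link for foreign targets
        "-o", output,
        source,
    ]

print(" ".join(clang_cross_command("main.cc", "main", "amd64", "/path/to/sysroot")))
```

The key point is the pairing: `--target` controls code generation, while `--sysroot` supplies everything the linker needs from the target operating system.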
To get around this hazard, Bazel encourages hermetic builds. Namely, you want full control of the build environment, libraries, etc. and then you want to use that to deploy as well. In our case, since we want to target Wolfi, it would be nice if that sysroot we were compiling against was the same base image as the one we plan to use with apko, right?
That's what we're going to tackle here.
Chainguard's wolfi-base image can be pulled from their Docker registry:
```
docker pull cgr.dev/chainguard/wolfi-base
```
which is great if you want to run it, and not as helpful if your goal is to build a sysroot for cross-compilation. What we really want to do is run apko to create a sysroot, and then expose it to the LLVM toolchain. Thankfully, with a repository_rule in Bazel you can do just that, even downloading the apko binary you need on the fly.
We start with sysroot.yaml -- an apko configuration file that starts with wolfi-base and layers glibc, libstdc++ and the Linux kernel headers on top.
sysroot.yaml
```yaml
contents:
  repositories:
    - https://packages.wolfi.dev/os
  keyring:
    - https://packages.wolfi.dev/os/wolfi-signing.rsa.pub
  packages:
    - build-base
    - glibc-dev
    - libstdc++-dev
    - linux-headers

archs:
  - x86_64
  - aarch64
```
This will create a larger-than-usual image: about 500 MB per architecture, almost entirely due to gcc's inclusion in build-base. However, if we subsequently create an even more minimal setup, we know that the hermetic build's sysroot and the container image share identical foundations -- making the image more reliable and secure.
One key finding in getting this working is that toolchains_llvm requires the sysroot to be a package, which means you have to generate it with a repository_rule. However, this imposes a constraint: repository rules are evaluated during Bazel's loading phase, so you cannot depend on artifacts generated by Bazel itself. That means that if we want to call apko, we unfortunately cannot rely on rules_apko to load the binary for us: we have to download a specific version ourselves. This is why the attributes include details like apko_version and apko_sha256.
repos.bzl
```starlark
sysroot = repository_rule(
    implementation = _sysroot_impl,
    attrs = {
        "apko_config": attr.label(
            doc = "Label pointing to the apko config YAML file.",
            default = "//build-support/sysroot:sysroot.yaml",
        ),
        "architecture": attr.string(
            mandatory = True,
            values = ["amd64", "arm64"],
        ),
        "apko_version": attr.string(
            doc = "Version of apko to use for building the sysroot.",
            default = "0.30.26",
        ),
        "apko_sha256": attr.string_dict(
            doc = "SHA256 checksums of the apko binary, per platform.",
            default = {
                "darwin_arm64": "347bd6c...",
                "linux_amd64": "12c227b...",
                "linux_arm64": "f46bc84...",
            },
        ),
        "strip_components": attr.int(
            doc = "Number of components to strip when extracting (similar to strip_prefix).",
        ),
        "include_patterns": attr.string_list(),
        "exclude_patterns": attr.string_list(),
    },
)
```
With the rule defined, we need an implementation. The first part just uses the repository_ctx object and our input attributes to download the appropriate apko release for the host platform:
repos.bzl
```starlark
load("@aspect_bazel_lib//lib:repo_utils.bzl", "repo_utils")

def _sysroot_impl(rctx):
    apko_version = rctx.attr.apko_version
    host_platform = repo_utils.platform(rctx)
    url = "https://github.com/chainguard-dev/apko/releases/download/v{apko_version}/apko_{apko_version}_{host_platform}.tar.gz".format(
        apko_version = apko_version,
        host_platform = host_platform,
    )
    strip_prefix = "apko_{}_{}".format(
        apko_version,
        host_platform,
    )
    apko_sha256 = rctx.attr.apko_sha256.get(host_platform)
    if apko_sha256 == None:
        fail("No apko SHA256 checksum provided for platform: %s" % host_platform)
    rctx.download_and_extract(
        url = url,
        output = "apko",
        sha256 = apko_sha256,
        strip_prefix = strip_prefix,
    )
```
We can then use the context to run apko build-minirootfs and turn the apko_config YAML into an extract of the Linux usr, lib and other top-level directories:
repos.bzl
```starlark
    archive = rctx.path("sysroot.tar")
    result = rctx.execute([
        rctx.path("apko/apko"),
        "build-minirootfs",
        rctx.path(rctx.attr.apko_config),
        archive,
        "--build-arch",
        rctx.attr.architecture,
    ])
    if result.return_code != 0:
        fail(result.stdout + result.stderr)
```
The next step is critical for toolchains_llvm and ultimately clang to work. We declare a BUILD.bazel inside the new repository that exposes a filegroup containing the top-level directory. The toolchain will later consume this filegroup to treat the extracted repository as a sysroot:
repos.bzl
```starlark
    rctx.file(
        "sysroot/BUILD.bazel",
        """filegroup(
    name = "sysroot",
    srcs = ["."],
    visibility = ["//visibility:public"],
)""",
    )
```
This approach is necessary, but Bazel does not particularly like it: if you run this without any overrides you will get warnings about directories as inputs not being supported. So far the only workaround I have found is to add this to .bazelrc:
```
startup --host_jvm_args=-DBAZEL_TRACK_SOURCE_DIRECTORIES=1
```
This startup flag forces the Bazel daemon to monitor directories for changes, which it does not do by default for performance reasons. Still, we now have an empty sysroot directory in a format that toolchains_llvm can ingest.
The output of the apko execution is a tar file, so we'll follow toolchains_llvm's own custom sysroot.bzl and use the embedded tar toolchain for the host platform:
repos.bzl
```starlark
    host_bsdtar = Label("@bsd_tar_toolchains_%s//:tar" % repo_utils.platform(rctx))
    cmd = [
        rctx.path(host_bsdtar),
        "--extract",
        "--no-same-owner",
        "--no-same-permissions",
        "--file",
        archive,
        "--directory",
        "sysroot",
        "--strip-components",
        str(rctx.attr.strip_components),
    ]
    for include in rctx.attr.include_patterns:
        cmd.extend(["--include", include])
    for exclude in rctx.attr.exclude_patterns:
        cmd.extend(["--exclude", exclude])
    result = rctx.execute(cmd)
    if result.return_code != 0:
        fail(result.stdout + result.stderr)
    rctx.delete(archive)
```
Finally we return the repo_metadata, marking the repository as reproducible so it can be cached:
repos.bzl
```starlark
    if hasattr(rctx, "repo_metadata"):
        return rctx.repo_metadata(reproducible = True)
    else:
        return None
```
We can now use our rule and declare sysroots for each architecture:
MODULE.bazel
```starlark
sysroot = use_repo_rule("//build-support/sysroot:repos.bzl", "sysroot")

sysroot(
    name = "sysroot_amd64",
    architecture = "amd64",
    include_patterns = ["**"],
    exclude_patterns = ["dev/*", "etc/shadow", "etc/gshadow"],
)

sysroot(
    name = "sysroot_arm64",
    architecture = "arm64",
    include_patterns = ["**"],
    exclude_patterns = ["dev/*", "etc/shadow", "etc/gshadow"],
)
```
which can support our toolchain:
MODULE.bazel
```starlark
# Configure LLVM toolchain for cross-compilation with Linux targets
llvm = use_extension("@toolchains_llvm//toolchain/extensions:llvm.bzl", "llvm")
llvm.toolchain(
    name = "llvm_toolchain",
    llvm_version = "21.1.6",
    extra_llvm_distributions = {
        "LLVM-21.1.6-Linux-ARM64.tar.xz": "1d8a9e...",
        "LLVM-21.1.6-Linux-X64.tar.xz": "38bd99...",
        "LLVM-21.1.6-macOS-ARM64.tar.xz": "bdf036...",
        "clang+llvm-21.1.6-x86_64-pc-windows-msvc.tar.xz": "6fd57e...",
    },
    stdlib = {
        "linux-x86_64": "stdc++",
        "linux-aarch64": "stdc++",
    },
)

# Register sysroots for cross-compilation to Linux
llvm.sysroot(
    name = "llvm_toolchain",
    label = "@sysroot_amd64//sysroot",
    targets = ["linux-x86_64"],
)
llvm.sysroot(
    name = "llvm_toolchain",
    label = "@sysroot_arm64//sysroot",
    targets = ["linux-aarch64"],
)
```
These last two blocks bind the labels of our declared sysroot repositories. Note that the label has to point to the sysroot directory inside the repository.
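For the toolchain to actually take effect, the extension's repository still has to be imported and registered. Assuming the standard toolchains_llvm bzlmod pattern (the repository name here matches the toolchain name declared above), that looks something like this in MODULE.bazel:

```starlark
# Standard toolchains_llvm bzlmod wiring; adjust the repository name
# if your toolchain is named differently.
use_repo(llvm, "llvm_toolchain")

register_toolchains("@llvm_toolchain//:all")
```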
With all this in place, you can easily cross-compile Linux x86 and Linux ARM64 binaries from different hosts, including Apple Silicon.
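As a sketch of what an invocation might look like, assuming you define a platform target per Linux architecture (the target names here are hypothetical; the constraint values come from the standard @platforms repository):

```starlark
# BUILD.bazel (hypothetical platform definitions)
platform(
    name = "linux_x86_64",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
)

platform(
    name = "linux_arm64",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:aarch64",
    ],
)
```

A build for a given architecture would then be selected with something like `bazel build --platforms=//:linux_arm64 //your:target`.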
Hermetic builds are an important principle for Bazel, and essential when you are dealing with a compiled language like C++ that is very sensitive to its environment. From a cybersecurity perspective as well, the more you can control the build and runtime environments, the lower the probability that something unexpected sneaks in. A fully realized Shift Left setup for Modern C++ will require extending this to more advanced features like building OCI containers and leveraging remote caching and execution, but Bazel gives you all the pieces needed to get there.