
Efficiency at its Best: How to Turn 1-Hour Builds into 1-Minute Builds

Many software engineers aim to write code quickly and create an application. However, the more code you have, the slower the Edit -> Build -> Test feedback loop becomes.

On larger projects, this cycle can take hours. In this article, we will explore what we can do about that, with a special focus on the Buck/Bazel build systems.

But before we dive into the build system, let's first discuss from first principles what makes (CI) builds slow and what we can do about it.

What are my levers?

From the time your Continuous Integration system receives your changes, you want the build and test results back as quickly as possible.

Given your change, the build system needs to answer the following:

  • Which parts of the application need to be rebuilt and tested?
  • How can the affected parts be built the quickest?
  • Can any parts be re-used from previous builds (cache)?
  • What parts can be built in parallel?

So optimising a build system means:

Build less

  • Identify the minimum set of what needs to be built
  • Identify what can be reused from a previous run

Build in parallel

  • Utilise more CPU cores
  • Distribute work across multiple machines

Build faster

  • Use more efficient tools (e.g. switch from webpack to SWC)

What exactly are Buck/Bazel, and how do they fit into this?

Bazel and Buck are two similar build systems, both written in Java. If you had to classify them: they are build systems focused on incremental, reproducible and hermetically sealed builds for multi-language projects. This makes them uniquely suitable for large-scale monorepo projects, as you find them at Google and Facebook.

Bazel has grown immensely since it became an integral part of the widely popular TensorFlow machine learning library.

Reproducible and Hermetically Sealed

No matter on which computer and under which conditions you attempt to build the application, it should yield exactly the same binary.

This is done by running each build step in an isolated container in which certain parameters are controlled, e.g. timestamps, process IDs or random number seeds.

This ensures 4 things:

  1. Mitigates the infamous “it works on my machine” problem
  2. Allows effective caching
  3. Allows safe distribution of cache
  4. Allows safe execution of builds on other machines to parallelize

How does it look?

Bazel uses Skylark (now called Starlark), a flavour of Python, as its description language:

# BUILD file
cxx_library(
    name = "myLib",
    srcs = glob(["lib/*.cpp"]),
    hdrs = [":gen_header"],
)

cxx_test(
    name = "myLib-tests",
    srcs = glob(["lib/*.cpp"]),
    deps = [":myLib", "//someOtherLib:myLib2"],
)

genrule(
    name = "gen_header",
    srcs = [],
    outs = ["x.h"],
    cmd = "echo 'int x();' > $@",
)

One of the interesting things here is that these functions only describe what needs to be built; they don't do the work themselves. They are merely a declarative language for stating what you need and what depends on what. Rules are linked by referring to each other by name: the library's generated header, for example, points at the genrule, which needs no source files and whose cmd states how to produce the output.

Everything is built on top of this general rule, genrule: an imperative instruction to do something, e.g. call an application via the command line.

This is primarily built for monorepos, so the paradigm works like this: you drop projects and their build files into the repository, and from anywhere you can refer to any rule defined in the other files. If a folder contains build rules, you address them by specifying the path and the name of the rule. This also makes it easy to vendor libraries, for instance.

monorepo/
  WORKSPACE
  BUILD
  project1/
    WORKSPACE
    BUILD
    srcs/
      …
  project2/
    WORKSPACE
    BUILD
    my-libs/

You can also create a new folder, put a BUILD file inside, and easily refer back to its rules from anywhere else.
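
As a small sketch (the library name some_lib is hypothetical), a BUILD file anywhere in the monorepo could depend on a rule defined under project2/my-libs by giving its path and rule name:

cxx_library(
    name = "app",
    srcs = glob(["srcs/*.cpp"]),
    # "//project2/my-libs:some_lib" = path to the package + name of the rule
    deps = ["//project2/my-libs:some_lib"],
)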

They also added the capability to refer to rules that are not in your project and need to be downloaded from elsewhere. This allows you to use Bazel as a package manager to a certain extent. To keep things reproducible, external dependencies are usually locked down to one specific version. This is how you can download the Terraform rules, for example, so you can integrate your builds with the deployments described in Terraform:

# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "io_bazel_rules_terraform",
    sha256 = "cedf034b1454f3443f531354534245463566546",
    urls = [
        "https://github.com/jmileson/rules_terraform/archive/v0.1.0.tar.gz",
    ],
)

load(
    "@io_bazel_rules_terraform//terraform:terraform.bzl",
    "terraform_register_toolchains",
)

terraform_register_toolchains("0.12.8")

In your BUILD file, you will then need to load the rule you want to use:

load("@io_bazel_rules_terraform//terraform:terraform.bzl", "terraform_plan")

Then you'll need a rule that describes the Terraform files found in the project, and you can run it:

terraform_plan(
    name = "plan",
    srcs = glob(["**/*.tf"]),
)

How do you run this?

You can install Bazel using npm, and GitHub Actions runners have Bazel installed by default. You'll find the familiar syntax to build, test, and run things:

npm i -g @bazel/bazelisk
bazel build //...   # "..." means everything
bazel test //...
bazel run //...
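
You can also address a single target by its label instead of building everything. Assuming the BUILD file from the example above lives in project1/, that would look roughly like this:

bazel build //project1:myLib        # build just the library
bazel test //project1:myLib-tests   # run only this test target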

The interesting thing about this is that everything is cached and built to be incremental and reproducible. This is how it works:

Bazel first evaluates the Starlark files describing the build and hashes them. From that it produces an action graph, which captures what needs to be built and in what order. All the input files in your file system are hashed as well, so by checking which hashes changed, Bazel can determine the minimum set of things that need to be rebuilt.

Before every step, a cache service is consulted to check whether the output for a specific hash is already available; if it is, the step is skipped entirely and the artifact is downloaded if it isn't available locally. This also makes a peer-to-peer distributed cache possible: not just machines in the cloud, but your local machine and your colleagues' machines can all contribute to the cache and exchange the entries they create while building.

The cache is also built on content-addressable storage. This means the name of a cached file is, in effect, the hash of its content. That increases the likelihood of a cache hit, because it doesn't matter where the file lives or in which project or branch it was built: any build that produces the same hash can reuse the entry.

Another capability it offers is offloading builds to the cloud or to any connected worker. That's one of the reasons you must meticulously describe what everything depends on and spell out the dependency graph: a remote worker needs to know exactly which files are required to run a given build step.
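
As a minimal sketch, remote caching and remote execution are typically enabled through flags in a .bazelrc file; the endpoints below are placeholders for whichever cache/executor service you run:

# .bazelrc (sketch) - the example.com URLs are placeholders
build --remote_cache=grpcs://cache.example.com
build --remote_executor=grpcs://remote.example.com
build --remote_timeout=3600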


There are also commercial and open-source dashboard solutions (like BuildBuddy). They track what you build and how it progresses, surface trends such as how often and how quickly things are built, and let you manage your workers with ease.

As you scale, you can decide to opt into these solutions. The API is simple, and there's a useful ecosystem around it, so you don't need to rebuild everything yourself.

What’s the catch?

It promises a lot, but there's a huge catch: you have to meticulously declare exactly which files each step uses, at a granular level, and do it for every build step, because every build step runs in its own container. The common workflow is to write out the rules first and then get the build working.

Many rules are community-maintained and incomplete, and the developer experience is rough in places, so you often end up writing your own rules on top of the general genrule instruction.
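
As a minimal sketch of what that looks like, here is a hypothetical macro (generated_header, not part of any official ruleset) layered on top of the native genrule:

# my_rules.bzl - a hypothetical helper macro built on top of genrule
def generated_header(name, declaration):
    """Generates a header file containing the given declaration."""
    native.genrule(
        name = name,
        srcs = [],
        outs = [name + ".h"],
        cmd = "echo '%s' > $@" % declaration,
    )

A BUILD file could then call generated_header(name = "gen_header", declaration = "int x();") instead of repeating the raw genrule everywhere.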

You can optimise things a lot, and it's easy to end up down rabbit holes, just as with other tools in the JavaScript ecosystem: people see the potential and want to reach it, and sometimes invest effort disproportionate to the value they get out of it.

Opportunities

Bazel makes some workflows easier because the more granular your build description is, the faster things get.

Since execution is sandboxed and parameters are controlled, builds are more deterministic, which leads to more cache hits. For instance, output files may carry a process ID as metadata, which would lead to a different hash on every run, even though the artifact didn't change in any meaningful way. This is the case for many build systems and languages; Bazel mitigates those types of issues.

If you want to take advantage of distributed builds, you'll want to bundle toolchains and system libraries so things run on other platforms more easily.

JavaScript

The rules and the nature of Bazel are explicit: it gives you full control over each step. In contrast, in JavaScript you have a few independent tools (yarn, tsc, webpack, etc.) that handle most of the build process without much configuration and deal with caching and dependency analysis out of the box.

This means if you want to replicate that in Bazel, you’ll need to explicitly describe the (filesystem-)dependencies of each tool so Bazel can correctly containerize each step and orchestrate the build process optimally.

When you install packages from a package.json in a monorepo, some linking across projects is needed. You have to pull this information out explicitly so that Bazel understands how to wire things together, because in the end everything runs in an isolated environment, in a container, and you need to cater to that limitation. As an example, this is how you could set up the dependencies for a Node application that requires node_modules:

yarn_install(
  name = "npm",
  package_json = "//web:package.json",
  yarn_lock = "//web:yarn.lock",
  links = {
       "@scope/target": "//some/scoped/target",
       "target": "//some/target"
  }
)

nodejs_binary(
  name = "bin",
  entry_point = "bin.js",
  deps = [
      "@npm//@scope/target",
      "@npm//target",
      "@npm//other/dep"
  ]
)

Because all the files come from other rules, they can be located anywhere on the file system. If you want to use them in a JS application, you have to explicitly list the dependencies, and in a bigger project that deps list grows quickly.


This gives you a lot of flexibility, but it can mean a lot of work and maintenance as your project grows.

More potential but also more work.

Conclusion

Most small JavaScript projects have decent out-of-the-box build performance and require relatively little configuration. Bazel adds an additional layer of configuration that has to be maintained as the project evolves, an extra burden that needs to be considered before adopting it in a team.

If the benefits of build performance, security, distributed caching and execution, or the capability to deal with native or multi-language projects speak to you, then Bazel might be the right choice.

If you're a mobile developer or a React Native-focused engineer and need to manage native components, you may want to try Bazel (or Buck).

Efficiency at its Best: How to Turn 1-Hour Builds into 1-Minute Builds
was originally published in YLD Blog on Medium.