Having a language that does a lot of checks at compile time is not free: it impacts compilation times. Luckily, there are some things we can do to speed things up: dynamic linking, being careful with code generation, and caching dependencies.
Dynamic linking is somewhat difficult to achieve in Rust, but not impossible. The main reason is that, at the time of writing (Sept-Oct 2024), Rust does not have its own stable ABI and must rely on the C binary representation if we want to interoperate with other languages or other Rust versions. This has some interesting consequences that we will explore in this post.
Code generation is the stage where the high-level representation of the source code is turned into binary code that can be executed by the machine. Given that the Rust compiler uses LLVM, the level of optimizations and the quantity of generated code will affect the compilation speed.
What is Linking?
Programs are usually divided into several modules and they have numerous dependencies. Linking is a compilation stage where all the compiled code of those modules needed by a program (and the code of the program itself) is made available in the final executable. We have two ways of linking a program: static and dynamic.
Static Linking
All the code needed by a program (from external modules and the program itself) is put together in the final executable. This creates fat binaries but it makes the program portable.
Dynamic Linking
This type of linking must be supported by the operating system (most, if not all of the major operating systems support this). In this approach, instead of containing all the code needed, the executable contains undefined symbols and a list of objects that contain the code for those symbols.
These objects, often referred to as libraries, are binary files used to share binary code between several programs. In Microsoft Windows those files are known as DLLs (dynamically linked libraries) and in Unix operating systems (Linux, macOS, etc.) they are known as SOs (shared objects).
When running a dynamically linked executable, the operating system loads the program code along with the libraries into memory and does the final linking.
This approach creates “thin” binaries and saves disk and memory space, since the code from the libraries is shared among several applications.
Different linking modes in Rust
There are several different linking modes that we can use, producing different kinds of shared objects, but in this post we will focus on only one of them: dylib.
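A library crate opts into this mode through the crate-type field in its Cargo.toml (this is the same configuration applied later in this post):

```toml
[lib]
crate-type = ["dylib"]
```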
If we put this configuration in our library crates, the Rust compiler will generate a dynamic library that will be dynamically linked with our executable. Isn’t that what we need? Why do we have other configurations?¹ Given that Rust does not have a stable ABI yet, there are no guarantees that a compiled library will work if we don’t compile the project with the same Rust version used for compiling the library.
This mode is suitable for a project where we have one or more library crates that are used by several binary crates (or other libraries). We will usually compile everything together the first time and then recompile only the things we change. If we change the Rust version, we need to recompile everything.
Compilation stages
To generate binary objects, the Rust compiler must go through several different stages. I am not going to explain how a compiler works in detail, but having a general idea of what happens will help us identify places where we can work to optimize compile times.
To measure compilation times, we will use cargo's built-in --timings option. It generates a detailed HTML report showing how long every compilation unit takes to compile.
From source code to intermediate representation
In the timings report, the stages described below are pictured in light blue in the Gantt chart.
Lexing and parsing
The compiler first performs the lexing and parsing stage: lexing transforms the source code into a stream of tokens, and parsing generates an AST (Abstract Syntax Tree) from those tokens.
Macro expansion (generating valid Rust code from the macros) is also done at this stage.
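To get a feeling for how much code a macro can add at this stage, here is a rough sketch: the hand-written impl below is approximately what #[derive(Debug)] would expand to for this struct, and all of that expanded code still has to go through every later compilation stage.

```rust
// A hand-written equivalent of what `#[derive(Debug)]` expands to
// (simplified sketch). Deriving the trait generates roughly this
// impl, which then flows through type checking and codegen like
// any other code in the crate.
struct Point {
    x: i32,
    y: i32,
}

impl std::fmt::Debug for Point {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("Point")
            .field("x", &self.x)
            .field("y", &self.y)
            .finish()
    }
}

fn main() {
    let p = Point { x: 1, y: 2 };
    println!("{:?}", p); // prints: Point { x: 1, y: 2 }
}
```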
AST Lowering
After the AST is created, it is converted into the High-Level Intermediate Representation (HIR); this stage is called AST lowering. On the HIR, the compiler does type inference, type checking and trait resolution.
MIR Lowering
When the HIR is ready, we enter the MIR lowering stage, which transforms the HIR into the Mid-level Intermediate Representation (MIR). In this stage the famous borrow checking is done, along with code monomorphization and some optimizations that improve code generation and compilation speed in the next stage.
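Monomorphization is worth highlighting because it multiplies the amount of code the later stages must handle: the compiler emits one copy of each generic function per concrete type it is instantiated with. A minimal sketch:

```rust
// One generic function in the source code...
fn double<T: std::ops::Add<Output = T> + Copy>(x: T) -> T {
    x + x
}

fn main() {
    // ...but monomorphization emits two separate copies of `double`:
    // one specialized for i32 and one for f64. In projects with
    // generic-heavy dependencies, this duplication adds up quickly.
    let a = double(21_i32); // 42
    let b = double(1.5_f64); // 3.0
    println!("{a} {b}");
}
```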
Code generation
This stage is pictured in purple in the Gantt chart. By the time we get here, the compiler already has everything represented in MIR. During this phase, the MIR is transformed into LLVM-IR (LLVM Intermediate Representation) and handed to LLVM.
LLVM performs many more optimizations and generates the assembly and binary code that is later linked into the final object.
If you want to learn more about the compilation stages, check this article.
Reducing compile times
The toy project
In order to show how to optimize compile times, we are going to use a toy project that consists of one library crate and 40 separate binaries that use the library. You may ask yourself: what kind of project has that structure?! It could be a serverless project containing several cloud functions (like AWS Lambdas) that share functionality through some library crates, or a project that consists of several binaries (like the GNU core utilities).
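The overall layout can be sketched as a Cargo workspace (the member names here are an assumption based on the description above; check the linked source code for the real structure):

```toml
# Root Cargo.toml of the toy workspace (hypothetical sketch)
[workspace]
members = [
    "lib1",   # the shared library crate
    "bin1",   # ...one entry per binary crate, up to bin40
]
```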
How I get the timings
The timings shown in this post are the last timing returned by cargo after running the same compilation 10 times, always doing cargo clean before executing cargo build [...]. I compile the project 10 times to verify that, on average, the compilation times are more or less the same.
Initial compile times
Here are the individual compile times for the toy project. This and further compilations were done on Debian 12, with an i7-6700K and 16GB of DDR4-2600 RAM²:
The total time was
Finished `release` profile [optimized] target(s) in 25.30s
And the size of the binaries is about 3.5MB each:
ls -l --block-size=KB ./target/release
total 143262kB
-rwxr-xr-x 2 nico nico 3489kB Sep 16 15:03 bin1
-rwxr-xr-x 2 nico nico 3489kB Sep 16 15:03 bin10
-rw-r--r-- 1 nico nico 1kB Sep 16 15:04 bin10.d
-rwxr-xr-x 2 nico nico 3489kB Sep 16 15:03 bin11
-rw-r--r-- 1 nico nico 1kB Sep 16 15:04 bin11.d
-rwxr-xr-x 2 nico nico 3489kB Sep 16 15:03 bin12
-rw-r--r-- 1 nico nico 1kB Sep 16 15:04 bin12.d
-rwxr-xr-x 2 nico nico 3489kB Sep 16 15:03 bin13
The command used was cargo build --release --timings
. You can check the source code of the toy project here.
Removing unnecessary dependencies
In a project, it is common for the people involved to forget to remove old dependencies: when projects are large, it is hard to know whether a dependency is still used. Luckily, we can use the -Wunused-crate-dependencies
flag that tells us which dependencies are not being used by the crates inside the project. If we compile with RUSTFLAGS=-Wunused-crate-dependencies cargo build --release --timings
we get the following output:
...
warning: external crate `actix` unused in `lib1`: remove the dependency or add `use actix as _;`
|
= note: requested on the command line with `-W unused-crate-dependencies`
warning: external crate `serde_json` unused in `lib1`: remove the dependency or add `use serde_json as _;`
warning: external crate `tokio` unused in `lib1`: remove the dependency or add `use tokio as _;`
warning: `lib1` (lib) generated 3 warnings
By removing the unused dependencies reported by the warnings, we reduced the total compilation time a little:
Finished `release` profile [optimized] target(s) in 23.85s
It is not much, but by not compiling those dependencies, we gained around 1.x seconds! To keep our project clean, we can use this flag in our CI/CD pipeline to warn us when we forget to remove an old dependency.
You can find the modifications made in this section here.
Removing unnecessary derives
Macros generate valid Rust code that then has to be parsed, transformed, validated and optimized. It may happen that you need to derive some trait not because it is used by production code, but because it is used by test code. It does not make sense to process that code in release builds.
A nice “trick” to avoid processing that code in release builds is to put the derive behind a cargo feature and only activate that feature in the [dev-dependencies]
section of the Cargo.toml
. The Cargo.toml
from lib1
was changed this way:
[package]
name = "lib1"
version = "0.1.0"
edition = "2021"
[dependencies]
mockall = { workspace = true, optional = true }
reqwest = { workspace = true }
serde = { workspace = true }
[features]
tests = ["dep:mockall"]
And we put all the code we do not need in production behind the tests feature³:
#[cfg_attr(feature = "tests", mockall::automock)]
pub trait Trait1 {
fn fn1();
fn fn2(a: u16) -> String;
fn fn3(a: String) -> u16;
}
...
#[derive(Serialize, Default)]
#[cfg_attr(feature = "tests", derive(Deserialize, PartialEq, Eq, Debug))]
pub struct Struct1 {
pub f1: u8,
pub f2: String,
pub f3: HashMap<String, String>,
pub f4: HashSet<String>,
pub f5: Vec<String>,
}
...
#[derive(Deserialize, Default)]
#[cfg_attr(feature = "tests", derive(Serialize, PartialEq, Eq, Debug))]
pub struct Struct9 {
pub f1: u8,
pub f2: String,
pub f3: HashMap<String, String>,
pub f4: HashSet<String>,
pub f5: Vec<String>,
}
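For the binary crates to use the mocks in their own tests, the feature is enabled only for test builds; a binary's Cargo.toml might look like this (the path is an assumption):

```toml
[dependencies]
lib1 = { path = "../lib1" }

# Enabling the feature here activates it for the unit and integration
# tests of this crate, but not for the release binary itself.
[dev-dependencies]
lib1 = { path = "../lib1", features = ["tests"] }
```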
Activating the feature will work for both unit and integration tests. Here are the compilation times after introducing the feature:
And the total time:
Finished release profile [optimized] target(s) in 21.58s
By reducing the code generated by the derive and automock macros, rustc has less code to translate to the intermediate representations (light blue) and less code to generate (purple) and optimize. The time reduction was huge: from an average of 4.x seconds to an average of 0.3 seconds.
The take-home lesson here is: do not take the auto-generation of code for granted. If you don’t need it in the production build, do not compile it.
You can find the modifications made in this section here.
Dynamic Linking
Until now, the code contained in lib1 has been statically linked into all the binaries in our project. Instead of repeating the code in every binary, we can use dynamic linking to turn lib1 into a shared object, allowing the binaries to use the code without having it embedded.
With dynamic linking we will not only achieve faster compile times; we will also get smaller binaries and, if there’s a bug in the library, we can fix it and deploy the shared object without modifying the binaries (as long as we use the same Rust version used to compile them).
To activate dynamic linking, we need to add to the lib1
’s Cargo.toml
the following lines at the end:
[lib]
crate-type = ["dylib"]
And compile the project with: RUSTFLAGS="-C prefer-dynamic" cargo build --release --timings. Here are the compilation times with dynamic linking:
The compilation times for lib1 increased⁴, but the times for the binaries dropped from an average of 0.8x to an average of 0.3x! The binaries were also reduced in size: from 3489 kB to 12 kB!
ls -l --block-size=KB ./target/release
total 7238kB
-rwxr-xr-x 2 nico nico 12kB Sep 16 14:43 bin1
-rwxr-xr-x 2 nico nico 12kB Sep 16 14:43 bin10
-rw-r--r-- 1 nico nico 1kB Sep 16 14:43 bin10.d
-rwxr-xr-x 2 nico nico 12kB Sep 16 14:43 bin11
-rw-r--r-- 1 nico nico 1kB Sep 16 14:43 bin11.d
-rwxr-xr-x 2 nico nico 12kB Sep 16 14:43 bin12
...
drwxr-xr-x 2 nico nico 5kB Sep 16 14:43 incremental
-rw-r--r-- 1 nico nico 1kB Sep 16 14:43 liblib1.d
-rwxr-xr-x 2 nico nico 6555kB Sep 16 14:43 liblib1.so
The total time was
Finished `release` profile [optimized] target(s) in 19.74s
If we check the libraries needed by any of our binaries, we are going to see a dependency on liblib1.so. ldd outputs “not found” because, at the moment of running the command, the shared object is located in the target directory and not in the usual paths where shared objects are searched for (/lib
, /usr/lib
, /usr/local/lib
) or in any of the paths listed in the LD_LIBRARY_PATH
environment variable.
$ ldd ./target/release/bin1
linux-vdso.so.1 (0x00007ffe573aa000)
liblib1.so => not found
libstd-52417a9a08ba8fb9.so => not found
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc19594a000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc195769000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc195986000)
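To run the binary anyway, we can point the dynamic loader at the directories containing liblib1.so and Rust's shared standard library (a sketch; the sysroot location assumes a rustup-style toolchain installation):

```shell
# Add the cargo target dir and the Rust sysroot libs to the loader's search path
export LD_LIBRARY_PATH=./target/release:$(rustc --print sysroot)/lib
ldd ./target/release/bin1   # liblib1.so should now resolve
./target/release/bin1
```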
You can find the modifications made in this section here.
Cache dependencies
Most of the total compilation time is taken by the project’s dependencies. In this section we are going to explore two ways those dependencies can be cached so we avoid recompiling them every time we build. This is especially useful in a continuous integration/deployment environment, where we are constantly compiling the project but the dependencies rarely change.
sccache
sccache is a tool developed by Mozilla. It can be used with several compilers, not only rustc. It works as a wrapper around the compiler, caching compiled artifacts locally on disk and avoiding recompiling them when possible.
To install it, we can run:
$ cargo install sccache
Then, we can use it by wrapping the rustc
compiler with the RUSTC_WRAPPER
environment variable:
$ RUSTC_WRAPPER=sccache RUSTFLAGS="-C prefer-dynamic" cargo build --release --timings
We compiled the project with dynamic linking activated. The first compilation took around 23 seconds, 3.x seconds more than the previous one, because sccache was populating the cache with the compiled dependencies. After running cargo clean and recompiling the project again, we get:
Finished `release` profile [optimized] target(s) in 6.54s
So, we dropped from an average of 19.x seconds to an average of 6.5x seconds!
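We can confirm that the second build was served from the cache by asking sccache for its statistics (the exact counters printed vary by version):

```shell
# Print the cache hit/miss counters accumulated by the sccache server
sccache --show-stats
# Reset the counters before measuring a new build
sccache --zero-stats
```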
Cargo Chef
cargo-chef is an awesome tool created by Luca Palmieri. It is designed to speed up compilation times when using containers to build the project. Under the hood, it locates all the entry points of our workspace, either libs (lib.rs) or binaries (main.rs), removes all the code from them, and leaves some trivial code like
// main.rs
fn main() {}
and compiles the project. In other words, it avoids compiling the project’s own source code: it just compiles the dependencies to cache them. In future compilations the dependencies will already be cached, so only the project’s business logic needs to be compiled.
As stated in the official documentation (and in a warning if you try to use it locally), this tool is designed to be used with containers because it leverages Docker’s layer cache mechanism. Using it to compile the project locally is not recommended.
For demonstration purposes, I modified the Dockerfile suggested in the official cargo chef documentation:
FROM lukemathwalker/cargo-chef:latest-rust-1 AS chef
WORKDIR /app
FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json
FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
# Build dependencies - this is the caching Docker layer!
RUN CARGO_TARGET_DIR=/app/cache RUSTFLAGS="-C prefer-dynamic" cargo chef cook --release --workspace --recipe-path recipe.json
COPY . .
ENTRYPOINT ["/bin/sh"]
The image produced from this Dockerfile will contain all the project dependencies already cached in the /app/cache directory. It is very important to run cargo chef with exactly the same configuration you are going to use to compile the project. Since we are using the dynamic linking branch for the demonstration, we must include the RUSTFLAGS="-C prefer-dynamic" flag.
Here are the steps I followed:
- Build the image:
docker build --tag chef .
- Enter the container:
docker run -it chef
- Compile the project:
CARGO_TARGET_DIR=/app/cache RUSTFLAGS="-C prefer-dynamic" cargo build --release --workspace
.
The total compilation time is:
Finished `release` profile [optimized] target(s) in 2.64s
We’ve just compiled the whole project in 2.64s! This is a massive time reduction!
Summary
We started our compile-time reduction journey with statically linked binaries of 3489kB each and a total compilation time of 25.x seconds, and we finished it with dynamically linked binaries of 12kB each and a total compilation time of 2.x seconds:
| Modification | Total time | % Time reduction from original |
|---|---|---|
| Original codebase | 25.3s | 0% |
| Remove unused dependencies | 23.85s | 5.73% |
| Remove unnecessary derives | 21.58s | 14.70% |
| Dynamic linking | 19.74s | 21.98% |
| Dynamic linking + sccache | 6.54s | 74.15% |
| Dynamic linking + cargo chef | 2.64s | 89.57% |
It is important to remember that each step includes the modifications from the previous steps, with the exception of the caching steps: both build on the dynamic linking branch, but sccache and cargo chef are applied separately, not combined.
Conclusion
Sometimes when we are working on projects, deadlines are tight, the product team needs to release new features, and we have to choose wisely what we spend our time on. If we are lucky enough to be in a team that reserves time to work on technical debt, we should really use that opportunity to make the structural changes needed to reduce compilation times. This may sound obvious, but not everyone agrees on what is important to solve first.
When projects are small, compilation times are usually short or tolerable, so we don’t pay much attention to them. As the project grows, compilation times may become a real bottleneck for development (imagine that deploying a new version to a dev environment takes an hour).
Taking care of the compilation will save the whole team a lot of time and headaches, enabling everyone to develop, test and deploy faster.
Resources
- Learn how to setup dynamically loadable plugins for your Rust app
- Rust’s official Linkage page
- Linking Rust crates series
- Minimizing Compile Times
- Speeding up incremental Rust compilation with dylibs
- Build cache
1. There’s a mode that we can use to avoid recompiling the library to match the Rust version we are currently using: cdylib. This mode produces a dynamically linked library that can be used by other programming languages (and of course, also by Rust). The code compiled with this configuration follows the C ABI (ordering, size and alignment of fields, etc.), making it possible to link the shared library directly with a C/C++ program or to create bindings for another language. The problem with this configuration in Rust is that using the shared object is not straightforward, precisely because of the C ABI. In another article I will explore this mode and show how you can use a Rust library from other languages. ↩︎
2. Compilations are quite fast because there’s not much code, but it is enough to show the compilation time improvements. ↩︎
3. You may ask yourself why I used a feature flag and #[cfg_attr(feature = "tests", ...)] instead of plain #[cfg(test)]. With #[cfg(test)], only the current crate would be able to see the things under that configuration; in other words, we would not be able to use them in the unit and integration tests of the binaries. ↩︎
4. I am not sure why the codegen section (purple) disappeared from the graph and why the library took almost double the time to compile. I made the modifications described here in some real-world projects and the timings certainly did not double for the library crates. ↩︎