Now it is time to talk about references and borrowing. To understand this topic, first check out this post where I talk about ownership and move semantics. As we saw in that article, the way Rust manages memory is rather unique. The same is true when it comes to referencing a location in memory, something that C achieves with pointers.

GDB

In this post I am going to explore what is happening in memory using the GNU Debugger (gdb) through rust-gdb, a wrapper shipped with the Rust toolchain that loads Rust pretty-printers into gdb.

What is a reference in Rust?

A reference is a value that points to data in memory. Although it is similar to a classic pointer, there is a crucial difference between the two: a reference is guaranteed to always point to a memory address that contains a valid piece of data, whereas a pointer is not¹. The checks that guarantee a reference is always valid are performed at compile time.

Consider the following code:

fn main() {
    // Create a new value, with s1 as owner
    let s1 = String::from("hello world!");
    // Create a reference of s1
    let s2 = &s1;

    // Print the s2 value
    println!("{}", s2);
}

This code compiles and runs correctly. What happens in memory? Let’s check it out with GDB!

Breakpoint 1, references_and_borrowing::main () at src/main.rs:3
3           let s1 = String::from("hello world!");
(gdb) n
5           let s2 = &s1;
(gdb) n
8           println!("{}", s2);

At this point, the String s1 has been initialized with the text "hello world!" and s2, the reference to s1, has been set. Let's check the stack:

0x7fffffffd920: 56      217     255     255     255     127     0       0
0x7fffffffd928: 112     251     90      85      85      85      0       0
0x7fffffffd930: 32      208     255     247     255     127     0       0
0x7fffffffd938: 160     251     90      85      85      85      0       0
0x7fffffffd940: 12      0       0       0       0       0       0       0
0x7fffffffd948: 12      0       0       0       0       0       0       0
0x7fffffffd950: 56      217     255     255     255     127     0       0
0x7fffffffd958: 128     167     218     247     255     127     0       0
0x7fffffffd960: 0       0       0       0       0       0       0       0
0x7fffffffd968: 0       0       0       0       0       0       0       0

The rows starting at 0x7fffffffd938, 0x7fffffffd940 and 0x7fffffffd948 are the representation of s1 on the stack: 0x7fffffffd938 is ptr, 0x7fffffffd940 is len and 0x7fffffffd948 is capacity (both 12, the length of "hello world!"). The reference to s1 is located at 0x7fffffffd950. Let's print the value stored at that address in hexadecimal:

(gdb) x/xg 0x7fffffffd950
0x7fffffffd950: 0x00007fffffffd938

As we can see, the value stored at address 0x7fffffffd950 is 0x00007fffffffd938², the beginning of s1's stack representation!
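The same relationship can also be checked from inside the program. A minimal sketch, assuming only the standard library: it prints the address of s1, the address stored in s2 and the address of s1's heap buffer with the {:p} formatter, and uses std::ptr::eq to check that s2 points at s1's stack representation rather than at the characters on the heap. The printed addresses will differ from run to run and from the GDB output above.

```rust
use std::mem::size_of;

fn main() {
    let s1 = String::from("hello world!");
    let s2 = &s1;

    // Where s1's (ptr, len, capacity) representation lives on the stack.
    println!("address of s1:    {:p}", &s1);
    // The value stored in s2: the same address as the line above.
    println!("value of s2:      {:p}", s2);
    // Where the characters live: the heap buffer owned by s1.
    println!("s1's heap buffer: {:p}", s1.as_ptr());

    // s2 points at s1 itself, not at the heap buffer.
    assert!(std::ptr::eq(s2, &s1));

    // A reference to a sized type is a single pointer-sized value.
    println!("size of &String: {} bytes", size_of::<&String>());
    // Three machine words on current compilers (not a language guarantee).
    println!("size of String:  {} bytes", size_of::<String>());
}
```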

[Figure: Reference in memory]

¹ An invalid memory region is either memory that was never assigned to our process, or memory that was valid at some point during the program's execution but has since been freed.
² Zeroes are trimmed for legibility when printed as a memory address by GDB.

The two rules of references

Like everything in Rust, references have their own set of rules:

  1. References are always valid.
  2. At any given time we either have any number of immutable references or one mutable reference.
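A minimal sketch of the second rule in action (rule_two_demo is a name chosen just for illustration): two immutable references coexist, and a mutable reference is accepted once they are no longer used, because a reference's scope ends at its last use.

```rust
/// Hypothetical demo of rule 2: many shared borrows, then one mutable borrow.
fn rule_two_demo() -> String {
    let mut s = String::from("hello");

    // Any number of immutable references at the same time is fine.
    let r1 = &s;
    let r2 = &s;
    let total_len = r1.len() + r2.len(); // last use of r1 and r2

    // r1 and r2 are not used after this point, so one mutable reference is allowed.
    let r3 = &mut s;
    r3.push_str(" world");

    // Uncommenting the next line keeps r1 alive across the mutable borrow,
    // and the compiler rejects the program with error E0502.
    // println!("{r1}");

    format!("{s} ({total_len})")
}

fn main() {
    println!("{}", rule_two_demo()); // prints "hello world (10)"
}
```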

References are always valid

There's no way of testing this rule at runtime (or at least I don't know one). As I stated earlier in this post, references are guaranteed to always be valid because this is checked at compile time: a program that could end up with an invalid reference is rejected before it ever runs.
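Since the check happens at compile time, the best we can do is look at code the compiler rejects. The sketch below keeps two rejected variants as comments (a reference that outlives its value, and a function returning a reference to a local variable) next to versions that compile; no_dangle is a hypothetical name used for illustration.

```rust
// Rejected at compile time, error[E0597] (`x` does not live long enough):
//
//     let r;
//     {
//         let x = 5;
//         r = &x;
//     } // `x` is dropped here while `r` still points to it
//     println!("{r}");
//
// Also rejected: returning a reference to a local variable, which is
// dropped when the function returns.
//
//     fn dangle() -> &String {
//         let s = String::from("hello");
//         &s
//     }

/// Compiles: the String itself is returned, so ownership moves to the caller.
fn no_dangle() -> String {
    String::from("hello")
}

fn main() {
    // Compiles: `x` outlives every use of `r`.
    let x = 5;
    let r = &x;
    println!("{r}");

    println!("{}", no_dangle());
}
```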

At any given time we either have any number of immutable references or one mutable reference

At first glance this rule feels like an unnecessary limitation, but thanks to it we can catch hidden bugs in our code: for example, data races are ruled out at compile time.

A classic example: n threads each hold a mutable reference to the same numeric counter, and the only thing each thread does is increment it. References by themselves have no synchronization mechanism, so increments from different threads can interleave and get lost. This is the concurrent counter problem; here's the whole explanation with example code in Java. It can't happen in Rust: the code won't compile, because mutating the same piece of data from different threads requires some kind of synchronization mechanism.
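A minimal sketch of what the compiler pushes us toward, assuming scoped threads (std::thread::scope, stable since Rust 1.63) and a Mutex as the synchronization mechanism; concurrent_count is a hypothetical helper. The commented-out lines show an unsynchronized version that the borrow checker rejects.

```rust
use std::sync::Mutex;
use std::thread;

// Rejected by the compiler: two threads would each hold a `&mut` to `counter`.
//
//     let mut counter = 0;
//     thread::scope(|s| {
//         s.spawn(|| counter += 1);
//         s.spawn(|| counter += 1); // error[E0499]: cannot borrow `counter` as mutable more than once at a time
//     });

/// Hypothetical helper: `n_threads` threads each increment a shared counter
/// `per_thread` times. The Mutex serializes the increments, so none are lost.
fn concurrent_count(n_threads: usize, per_thread: usize) -> usize {
    let counter = Mutex::new(0usize);
    // Every thread spawned inside the scope is joined before `scope` returns.
    thread::scope(|s| {
        for _ in 0..n_threads {
            // Each thread only needs a shared reference to the Mutex;
            // mutation happens through the lock guard.
            s.spawn(|| {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            });
        }
    });
    counter.into_inner().unwrap()
}

fn main() {
    println!("{}", concurrent_count(4, 1_000)); // prints 4000
}
```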

This is not the only problem this rule keeps us away from! We don't even need concurrency: it also prevents bugs in simpler situations. Consider the following Python code:

def insert_even_zeros(vec):
    for (i, n) in enumerate(vec):
        if n % 2 == 0:
            vec.insert(i, 0)


vec = list(range(1,7)) # [1, 2, 3, 4, 5, 6]
insert_even_zeros(vec)
print(vec)

What we are trying to do here is to insert a 0 right before every even number, that is, at the index where the even value is found. The expected result for the input [1, 2, 3, 4, 5, 6] is [1, 0, 2, 3, 0, 4, 5, 0, 6], but if we run it, insert_even_zeros never returns and print is never reached. What is happening? The source of the problem is that we are mutating the list while iterating over it:

  1. We start at index 0, where the value 1 is located. Since 1 is not even, we continue to index 1.
  2. At index 1 we find the value 2. It is even, so we insert a 0 at index 1. Now the list is: [1, 0, 2, 3, 4, 5, 6]. We continue to index 2.
  3. At index 2 we find the value 2 again, because the insertion in the previous iteration shifted it one position to the right. It is even, so we insert a 0 at index 2. Now the list is: [1, 0, 0, 2, 3, 4, 5, 6].
  4. This repeats forever: every insertion pushes the 2 to exactly the index the loop visits next, and Python's list iterator checks the current, ever-growing length on each step, so the loop never runs out of elements (until the process runs out of memory).

This is known as the iterator invalidation problem.

What happens in Rust?

fn insert_even_zeros(vec: &mut Vec<u8>) {
    for (i, n) in vec.iter().enumerate() {
        if n % 2 == 0 {
            vec.insert(i, 0);
        }
    }
}

fn main() {
    let mut v: Vec<u8> = (1..=6).collect(); // [1, 2, 3, 4, 5, 6]
    insert_even_zeros(&mut v);
}

We get a compilation error that enforces the rule!

error[E0502]: cannot borrow `*vec` as mutable because it is also borrowed as immutable
 --> src/main.rs:4:13
  |
2 |     for (i, n) in vec.iter().enumerate() {
  |                   ----------------------
  |                   |
  |                   immutable borrow occurs here
  |                   immutable borrow later used here
3 |         if n % 2 == 0 {
4 |             vec.insert(i, 0);
  |             ^^^^^^^^^^^^^^^^ mutable borrow occurs here

For more information about this error, try `rustc --explain E0502`.

Where are the mutable and immutable references? In the signatures of the vector's methods:

  • .iter() takes an immutable reference of the vector (&self):
pub fn iter(&self) -> Iter<'_, T>
  • .insert() takes a mutable reference of the vector (&mut self):
pub fn insert(&mut self, index: usize, element: T)

Does this mean that there’s no way of modifying a vector in Rust while iterating it? No! You can do it:

fn main() {
    let mut v: Vec<u8> = (1..=6).collect();

    let mut i: usize = 0;
    let v_len = v.len();
    while i < v_len {
        if v[i] % 2 == 0 {
            v.insert(i, 0);
        }
        i += 1;
    }
}

Here we also have both kinds of references: immutable ones (len, and the indexing in v[i]) and a mutable one (insert). Why does it work? Because each immutable borrow ends right after it is used: len returns a plain usize, and v[i] is read before insert is called (the scope of a reference begins where it is created and extends until the last time it is used).

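Notice that the error message we got with the for loop says "immutable borrow occurs here" and "immutable borrow later used here", and both labels point at the same place: the iter() call. The iterator it creates holds the immutable borrow and is used again on every iteration of the loop, so that borrow is still alive when insert asks for a mutable one.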

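Does it make sense for a programming language to have these kinds of rules if it is possible to write code to circumvent them? Yes! Note that the while loop compiles but still suffers from the same problem: with the number of iterations fixed at 6, it produces [1, 0, 0, 0, 0, 0, 2, 3, 4, 5, 6]. Code written this way is rather "unnatural", though; most of the time these rules let Rust catch this kind of bug at compile time.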
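For comparison, here is a sketch of two ways to get the expected [1, 0, 2, 3, 0, 4, 5, 0, 6]. These variants are illustrative, not the only options: insert_even_zeros walks the indices in reverse, so each insertion only shifts elements that were already visited, and the hypothetical with_even_zeros builds a new vector from an immutable borrow, so nothing is mutated while it is being iterated.

```rust
/// In place: iterate indices from the end, so inserting at `i` only moves
/// elements at positions > i, which have already been checked.
fn insert_even_zeros(vec: &mut Vec<u8>) {
    // `vec.len()` is evaluated once; its immutable borrow ends immediately.
    for i in (0..vec.len()).rev() {
        if vec[i] % 2 == 0 {
            vec.insert(i, 0);
        }
    }
}

/// Out of place: read through a shared borrow and push into a new vector.
fn with_even_zeros(vec: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(vec.len() * 2);
    for &n in vec {
        if n % 2 == 0 {
            out.push(0);
        }
        out.push(n);
    }
    out
}

fn main() {
    let mut v: Vec<u8> = (1..=6).collect();
    insert_even_zeros(&mut v);
    println!("{v:?}"); // [1, 0, 2, 3, 0, 4, 5, 0, 6]

    let w: Vec<u8> = (1..=6).collect();
    println!("{:?}", with_even_zeros(&w)); // [1, 0, 2, 3, 0, 4, 5, 0, 6]
}
```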

Borrowing

There are times when you don't want a specific scope to lose ownership of a value, for example because you need to use the value again later. Consider the following code:

fn hello(s: String) {
    println!("hello {s}.");
}

fn bye(s: String) {
    println!("bye {s}.");
}

fn main() {
    let s = String::from("fellow blog reader");
    hello(s);
    bye(s);
}

This code won’t compile:

error[E0382]: use of moved value: `s`
  --> src/main.rs:12:9
   |
10 |     let s = String::from("fellow blog reader");
   |         - move occurs because `s` has type `String`, which does not implement the `Copy` trait
11 |     hello(s);
   |           - value moved here
12 |     bye(s);
   |         ^ value used here after move
   |
note: consider changing this parameter type in function `hello` to borrow instead if owning the value isn't necessary
  --> src/main.rs:1:13
   |
1  | fn hello(s: String) {
   |    -----    ^^^^^^ this parameter takes ownership of the value
   |    |
   |    in this function
help: consider cloning the value if the performance cost is acceptable
   |
11 |     hello(s.clone());
   |            ++++++++

Here we have a situation similar to one we saw in a previous post. As the compiler error says, we are moving s into hello, so when we try to use it in bye we get the "use of moved value" error. How can we solve this?

Solution 1: Duplicating the value

We can do as the compiler says, and clone the value. This way, both functions get a separate copy of the value that they can own:

fn hello(s: String) {
    println!("hello {s}.");
}

fn bye(s: String) {
    println!("bye {s}.");
}

fn main() {
    let s = String::from("fellow blog reader");
    hello(s.clone());
    bye(s);
}

This works! The code compiles and runs without a warning. Is this a good solution? No.

We don't really need to duplicate s, since we are only reading it to print it out. This solution does extra work: clone allocates a new heap buffer and copies s's contents into it.
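A quick way to see that extra work, as a minimal sketch using only String::clone and as_ptr: the clone has the same contents but lives in a separate heap buffer.

```rust
fn main() {
    let s = String::from("fellow blog reader");
    // clone() allocates a new heap buffer and copies the 18 bytes into it.
    let copy = s.clone();

    // Same contents...
    assert_eq!(s, copy);
    // ...stored at two different heap addresses.
    assert_ne!(s.as_ptr(), copy.as_ptr());
    println!("original: {:p}, clone: {:p}", s.as_ptr(), copy.as_ptr());
}
```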

Solution 2: Returning the ownership back to the caller

Instead of duplicating s’s value, we can return the ownership to the caller, so it can use it again:

fn hello(s: String) -> String {
    println!("hello {s}.");
    s
}

fn bye(s: String) {
    println!("bye {s}.");
}

fn main() {
    let s = String::from("fellow blog reader");
    let s2 = hello(s);
    bye(s2);
}

This also works! The code compiles and runs without a warning. Is this a good solution? Also no.

Passing ownership back and forth between functions is neither comfortable nor idiomatic. On top of that, the function signatures are not semantically accurate: the signature of hello suggests that it takes a String and returns another String. Just by looking at it, it is hard to tell what the function is meant to do, and returning anything makes no sense when the function's only purpose is to print something.

Solution 3: Borrowing

We want to keep the ownership of s in main's scope, without duplicating the value and without moving it back and forth. What can we do? Use a reference!

fn hello(s: &String) {
    println!("hello {s}.");
}

fn bye(s: &String) {
    println!("bye {s}.");
}

fn main() {
    let s = String::from("fellow blog reader");
    hello(&s);
    bye(&s);
}

The code compiles and executes without a warning. Is this a good solution? Yes.

Since we only need to read the value, and we want neither to move it nor to duplicate it, a reference is the best solution. It is also more idiomatic and semantically accurate: by looking at the functions' signatures we know that they do not need to own the value and that they do not return anything.
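Borrowing works the same way when the callee needs to modify the value: it takes a mutable reference instead. A minimal sketch, where exclaim is a hypothetical function added for illustration; note that main's s must be declared mut, and the immutable borrow for hello is only allowed once the mutable borrow has ended.

```rust
/// Hypothetical example: borrows `s` mutably and modifies it in place.
fn exclaim(s: &mut String) {
    s.push('!');
}

fn hello(s: &String) {
    println!("hello {s}.");
}

fn main() {
    let mut s = String::from("fellow blog reader");
    // Mutable borrow: it ends as soon as `exclaim` returns.
    exclaim(&mut s);
    // So an immutable borrow is allowed again; prints "hello fellow blog reader!."
    hello(&s);
}
```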

Let’s now check what is happening in the memory with GDB:

Breakpoint 1, references_and_borrowing::main () at src/main.rs:10
10          let s = String::from("fellow blog reader");
(gdb) n
11          hello(&s);
(gdb) x/80ub $sp
0x7fffffffd970: 2       0       0       0       0       0       0       0
0x7fffffffd978: 128     217     255     255     255     127     0       0
0x7fffffffd980: 160     251     90      85      85      85      0       0
0x7fffffffd988: 18      0       0       0       0       0       0       0
0x7fffffffd990: 18      0       0       0       0       0       0       0
0x7fffffffd998: 48      251     90      85      85      85      0       0
0x7fffffffd9a0: 1       0       0       0       0       0       0       0
0x7fffffffd9a8: 59      216     85      85      85      85      0       0
0x7fffffffd9b0: 0       240     127     255     255     127     0       0
0x7fffffffd9b8: 48      251     90      85      85      85      0       0

It looks like our String representation on the stack starts at 0x7fffffffd980 (the pointer is followed by two 18s, the length and capacity of "fellow blog reader"). Let's confirm it.

(gdb) x/xg 0x7fffffffd980
0x7fffffffd980: 0x00005555555afba0
(gdb) x/18c 0x00005555555afba0
0x5555555afba0: 102 'f' 101 'e' 108 'l' 108 'l' 111 'o' 119 'w' 32 ' '  98 'b'
0x5555555afba8: 108 'l' 111 'o' 103 'g' 32 ' '  114 'r' 101 'e' 97 'a'  100 'd'
0x5555555afbb0: 101 'e' 114 'r'

Excellent. Now let's continue the program execution and check what's on hello's stack:

Breakpoint 2, references_and_borrowing::hello (s=0x7fffffffd980) at src/main.rs:2
2           println!("hello {s}.");
(gdb) x/80ub $sp
0x7fffffffd900: 128     217     255     255     255     127     0       0
0x7fffffffd908: 140     201     85      85      85      85      0       0
0x7fffffffd910: 32      208     255     247     255     127     0       0
0x7fffffffd918: 128     217     255     255     255     127     0       0
0x7fffffffd920: 128     217     255     255     255     127     0       0
0x7fffffffd928: 154     177     222     247     255     127     0       0
0x7fffffffd930: 160     251     90      85      85      85      0       0
0x7fffffffd938: 18      0       0       0       0       0       0       0
0x7fffffffd940: 18      0       0       0       0       0       0       0
0x7fffffffd948: 213     192     89      85      85      85      0       0
(gdb) x/xg 0x7fffffffd920
0x7fffffffd920: 0x00007fffffffd980

At 0x7fffffffd920 we find a pointer to s in main's stack frame (0x7fffffffd920: 0x00007fffffffd980), the same address gdb shows as hello's argument (s=0x7fffffffd980)! This confirms that the whole String representation still belongs to main's scope: hello only references it, and bye receives a reference in the same way. s's memory will be freed once main finishes.
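The same observation can be made without a debugger. In this sketch, hello is modified for illustration to return the address it received, so main can check with std::ptr::eq that the reference points at main's own s.

```rust
/// Variant of `hello` (modified for illustration) that returns the address it received.
fn hello(s: &String) -> *const String {
    println!("hello {s}.");
    s as *const String
}

fn main() {
    let s = String::from("fellow blog reader");
    let received = hello(&s);
    // The reference passed to `hello` holds the address of main's `s`.
    assert!(std::ptr::eq(received, &s));
    println!("s in main: {:p}, received by hello: {:p}", &s, received);
    // `s` was only borrowed, so main still owns it and can keep using it.
    println!("still usable: {s}");
}
```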

[Figure: Borrowing in memory]

There's no need to change scope to borrow a value: the first example in this post is a slight modification of an example from a previous post that did not compile. We fixed it by having s2 borrow s1's value instead of taking ownership of it.

Conclusion

Sometimes we have a hard time fighting the Rust compiler because it often fails with errors that other programming languages don't have. Those errors feel arbitrary but, as we have seen in this post, they are there to protect us. It can take some time to wrap your head around them.

The more you code in Rust, the less you fight the compiler, and you end up with faster and safer programs. Also, many errors are caught at compile time, saving us a lot of precious debugging time.

This post concludes a series of posts about how Rust handles memory internally:

  1. Stack and Heap
  2. Rust ownership and move semantics from the inside
  3. Rust references and borrowing from the inside