Well-defined behavior means, in the Rust world, two things:
- Memory safety:
  There is no way for code to access memory that it does not own.
- Data race safety:
  In a multi-threaded environment, all mutable non-atomic data shared between threads
  can only be accessed by one thread at a time.
Undefined behavior is the absence of either or both of these.
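As a minimal sketch of the data race rule, the code below shares one counter between several threads. The function name `shared_count` and its parameters are hypothetical, chosen only for illustration. The counter is wrapped in `Arc<Mutex<_>>`, so only one thread at a time can reach the value, and the compiler would reject an attempt to hand a plain `&mut usize` to several threads.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `n_threads` threads that each add 1 to a shared counter
/// `per_thread` times, then returns the final count.
/// (Hypothetical helper, for illustration only.)
fn shared_count(n_threads: usize, per_thread: usize) -> usize {
    // Arc gives shared ownership across threads; Mutex serializes access.
    let counter = Arc::new(Mutex::new(0usize));
    let mut handles = Vec::new();

    for _ in 0..n_threads {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..per_thread {
                // lock() grants exclusive access until the guard is dropped
                // at the end of this statement.
                *counter.lock().unwrap() += 1;
            }
        }));
    }

    // Wait for every thread to finish before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

Calling `shared_count(4, 1000)` from `main` should always yield 4000, because no increment can interleave with another.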
Preventing undefined behavior provides excellent support for preventing attacks and
unreliable behavior. Rust implements that support primarily by enforcing
ownership policies at compile time through static code analysis.
Static code analysis is conservative. The Rust compiler will reject any code for which it cannot
guarantee its ownership policies have been satisfied. Most correct code will pass static analysis,
but there are a few cases, most often in multi-threaded designs,
that static analysis cannot effectively handle, and so correct code may be rejected by the compiler.
We will show examples of this in a subsequent Bite.
Rust provides a mechanism, called interior mutability,
that defers enforcement to run-time. That allows correct code that can't satisfy static analysis
to build, but has an impact on performance due to run-time ownership checks. Should any code fail
its run-time checks, the program will panic, shutting down without allowing access to unowned memory
or corrupting data with data races.
Fortunately, most code does not require this deferred checking. Run-time checking is
not very expensive, but avoiding interior mutability where feasible avoids that cost entirely.
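One standard-library type that provides interior mutability in single-threaded code is `std::cell::RefCell`. The sketch below is illustrative only, and the function names are hypothetical. It shows the borrow rules being checked while the program runs: a second mutable borrow is refused while the first is still alive. `try_borrow_mut` reports the conflict as an `Err`, where a plain `borrow_mut` at the same point would panic.

```rust
use std::cell::RefCell;

/// Returns true if a second mutable borrow is refused while the first
/// one is still alive. (Hypothetical helper, for illustration only.)
fn second_borrow_refused() -> bool {
    let cell = RefCell::new(vec![1, 2, 3]);

    // First mutable borrow: recorded by the RefCell at run-time.
    let first = cell.borrow_mut();

    // A second mutable borrow now violates the ownership rules.
    // try_borrow_mut() returns Err instead of panicking.
    let refused = cell.try_borrow_mut().is_err();

    drop(first); // release the first borrow
    refused
}

/// Mutates the vector through a shared reference to the RefCell and
/// returns the new length.
fn push_through_shared_ref(cell: &RefCell<Vec<i32>>, value: i32) -> usize {
    // The mutable borrow ends when this statement ends.
    cell.borrow_mut().push(value);
    cell.borrow().len()
}
```

The second function is the typical use: mutation through `&RefCell<_>`, which static analysis alone would not allow through a plain shared reference.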
1.0 Examples of undefined behavior:
Here, we will demonstrate undefined behavior with C++ code, then discuss the same code
written in Rust.
It is fairly easy, using C++, to program access to unowned memory. That is done in the dropdown below by:
- Creating a std::vector<int>
- Filling it to capacity
- Making a reference to one of its elements
- Pushing back another element in the vector
That last addition forces the vector to allocate new memory to make room for the latest element
and then copies everything from the original location to the new location. But that leaves the reference
observing memory that is no longer owned by the vector. Code in the dropdown illustrates
this with a fragment of C++ code.
C++ Ref Unowned Memory
Observe that the reference reads memory not owned by any program instance and returns its value.
That means that the program could continue computing with invalid data. Observe further that
the process exits normally, as if nothing unexpected happened.
In the next dropdown you will see another way that C++ code can access unowned memory. It does that
by indexing an array, but failing to stop at the last element.
C++ Index out of Bounds
The code in this example indexes past the end of an array, and returns a value from unowned memory.
As before, program flow continues so invalid data could become part of the processing of the
program. And, the program exits normally, as if nothing unexpected happened.
In fairness to C++, neither of these code fragments is idiomatic. In the first example, accepted
convention is to avoid holding a reference or iterator across an operation that may reallocate the vector.
Note that an iterator would be invalidated just as the reference is, and standard C++ does not throw an
exception in that case, although some standard library debug modes detect the invalid access. In the second example,
convention would have the program use a range-based for loop, avoiding out of bounds indexing.
So C++ is memory safe by convention, and that works very well. However, when building large programs,
perhaps several hundred thousand lines of code, it is possible that a few lapses of good practice happen,
allowing unsafe memory access, and those few may be very hard to find.
In contrast, Rust is memory safe by construction, using data ownership policies to prevent unsafe memory
operations.
We illustrate that by duplicating the same process flow used in the previous two examples of C++ code.
In the first dropdown, below, we set up the same processing used in the first example above and show
that it fails to compile.
Rust Attempt to Ref Unowned Memory
As in the first C++ example, the Rust code above creates a vector, fills it to capacity, and makes a reference
to one of its items. Now, an attempt to mutate the vector by adding another element fails to
compile.
That action violates Rust ownership policy, and the compiler does an excellent job of pointing
out what the problem is and exactly where it occurs. Compiler error messages are very well crafted
in Rust. That makes software development significantly more productive than it would otherwise
be.
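The following is a minimal sketch of that situation, not the original dropdown code. The function name `read_then_push` is hypothetical. The rejected statements are left as comments so the example still builds. Uncommenting them should produce error E0502, which reports that the vector cannot be borrowed as mutable while it is also borrowed as immutable. The accepted variant copies the element out, so no reference is alive when the vector grows.

```rust
/// Reads an element, then grows the vector, in an order the borrow
/// checker accepts. (Hypothetical helper, for illustration only.)
fn read_then_push() -> (i32, usize) {
    let mut v = vec![1, 2, 3];

    // Rejected at compile time (error E0502) if uncommented:
    // let r = &v[1];       // immutable borrow of `v` starts here
    // v.push(4);           // mutable borrow while `r` is still live
    // println!("{}", r);   // `r` is used after the push

    // Accepted: copy the value out, so no reference outlives the push.
    let second = v[1];
    v.push(4);

    (second, v.len())
}
```

Here `read_then_push()` returns the copied element and the new length. The value read before the push stays valid because it is an owned copy, not a view into the vector's buffer.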
Rust code in the last dropdown replicates the index out of bounds processing of the second C++ example.
In this case, we don't generate a compile error, but when an attempt to access unowned memory beyond
the array happens, the program never gets access to that memory. Instead, a panic occurs that starts
an orderly shutdown of the process.
Rust Index out of Bounds
Program output shows that a panic occurs before the code can access unowned memory, causing termination
rather than a normal exit.
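As an illustrative sketch, not the original dropdown code, the functions below show the two outcomes of an out-of-range access in safe Rust. The names `checked_read` and `indexing_panics` are hypothetical. Plain indexing is bounds-checked and panics, which is observed here with `std::panic::catch_unwind`, assuming the default unwinding panic strategy. The slice method `get` returns `None` instead, which lets a program handle the condition without terminating.

```rust
use std::panic;

/// Checked access: returns None instead of panicking when `i` is out
/// of range. (Hypothetical helper, for illustration only.)
fn checked_read(data: &[i32], i: usize) -> Option<i32> {
    data.get(i).copied()
}

/// Returns true if plain indexing at `i` panics.
/// Assumes panics unwind (the default); with panic=abort the process
/// would terminate instead of returning.
fn indexing_panics(data: &[i32], i: usize) -> bool {
    // data[i] is bounds-checked: an out-of-range index panics before
    // any memory outside the slice is read.
    panic::catch_unwind(|| data[i]).is_err()
}
```

With `let a = [1, 2, 3];`, the call `checked_read(&a, 3)` gives `None`, while `indexing_panics(&a, 3)` is true and the default panic hook prints the panic message to standard error.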
2.0 Conclusions:
Rust code is memory safe by construction. Access to unowned memory is not possible due to the ownership
policies enforced by the language. This has the advantage that most attempts to use memory-unsafe operations
result in compile-time failures, and the remainder, such as out of bounds indexing, result in a panic rather
than access to unowned memory. We will see in a later Bite that there are situations where ownership
checking needs to be deferred to run-time. That happens most often in multi-threaded code. We address those
cases in the interior mutability section of the Safety Bite.
C++ is memory safe by convention. Rust is memory safe by construction.
Construction means that violations cannot happen in compilable code.
Convention means that violations won't happen as long as the conventions are followed.