04/13/2022
RustBites - Undefined Behavior
Rust Bites Code

Rust/C++ Bite - Undefined Behavior

Reading from and writing to unowned memory

Well-defined behavior means, in the Rust world, two things:
- Memory safety: There is no way for code to access memory that it does not own.
- Data race safety: In a multi-threaded environment, mutable non-atomic data shared between threads can be accessed by only one thread at a time.
Undefined behavior is the absence of either or both of these.
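As a minimal sketch of the data race safety rule, assuming a simple shared counter, the code below lets several threads mutate the same value only through a lock. The function name `count_with_threads` and the thread and iteration counts are illustrative choices, not taken from the Bite's code.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: each thread adds 1 to a shared counter `increments` times.
fn count_with_threads(threads: usize, increments: usize) -> usize {
    // Arc shares ownership across threads; Mutex admits one thread at a time.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // The lock is held only for the duration of this statement.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", count_with_threads(4, 1000));
}
```

Sharing a plain `usize` mutably between the threads, without the `Mutex`, would be rejected by the compiler rather than producing a data race at run-time.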
Preventing undefined behavior provides excellent support for preventing attacks and unreliable behavior. Rust implements that support primarily by enforcing ownership policies at compile-time through static code analysis. Static code analysis is conservative: the Rust compiler will reject any code for which it cannot guarantee that its ownership policies have been satisfied. Most correct code will pass static analysis, but there are a few cases, most often in multi-threaded designs, that static analysis cannot effectively handle, and so correct code may be rejected by the compiler. We will show examples of this in a subsequent Bite.
Rust provides a mechanism, called interior mutability, that defers enforcement to run-time. That allows correct code that can't satisfy static analysis to build, but has an impact on performance due to run-time ownership checks. Should any code fail its run-time checks, the program will panic, shutting down without allowing access to unowned memory or corrupting data with data races. Fortunately, most code does not require this deferred checking. Run-time checking is clever and not very expensive, but avoiding interior mutability where feasible avoids that cost.
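A minimal single-threaded sketch of deferred checking, assuming the standard library's `std::cell::RefCell` as the interior mutability type. The function names are illustrative. The sketch uses `try_borrow_mut`, which reports a failed check as an error value, where the plain `borrow_mut` would panic.

```rust
use std::cell::RefCell;

/// Hypothetical helper: tries to take a second mutable borrow while the first is alive.
/// Returns true if the run-time check refused the second borrow.
fn second_borrow_rejected() -> bool {
    let cell = RefCell::new(vec![1, 2, 3]);
    let _first = cell.borrow_mut(); // run-time checked mutable borrow, still alive
    // try_borrow_mut returns Err instead of panicking when the check fails.
    let rejected = cell.try_borrow_mut().is_err();
    rejected
}

/// Hypothetical helper: mutates a vector through a binding that is not declared `mut`.
fn push_through_shared() -> usize {
    let cell = RefCell::new(vec![1, 2, 3]);
    cell.borrow_mut().push(4); // the borrow is released at the end of this statement
    let len = cell.borrow().len();
    len
}

fn main() {
    println!("second borrow rejected: {}", second_borrow_rejected());
    println!("length after push: {}", push_through_shared());
}
```

The ownership rule is the same one the compiler enforces statically, one writer or many readers; here it is simply evaluated while the program runs.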

1.0 Examples of undefined behavior:

Here, we will demonstrate undefined behavior with C++ code, then discuss the same code written in Rust. It is fairly easy, using C++, to program access to unowned memory. That is done in the dropdown below by:
  • Creating a std::vector<int>
  • Filling it to capacity
  • Making a reference to one of its elements
  • Pushing back another element in the vector
That last addition forces the vector to allocate new memory to make room for the latest element and then copy everything from the original location to the new location. But that leaves the reference observing memory that is no longer owned by the vector. Code in the dropdown illustrates this with a fragment of C++ code.
C++ Ref Unowned Memory  
Observe that the reference reads memory not owned by any program instance and returns its value. That means that the program could continue computing with invalid data. Observe further that the process exits normally, as if nothing unexpected happened.
In the next dropdown you will see another way that C++ code can access unowned memory. It does that by indexing an array but failing to stop at the last element.
C++ Index out of Bounds  
The code in this example indexes past the end of an array and returns a value from unowned memory. As before, program flow continues, so invalid data could become part of the program's processing, and the program exits normally, as if nothing unexpected happened.
In fairness to C++, neither of these code fragments is idiomatic C++. In the first example, accepted convention would have the program use an iterator rather than a reference. Standard library builds with checked iterators, such as debug builds on some toolchains, can detect use of the invalidated iterator and stop the program before memory is accessed; the C++ standard itself does not require that check, and using an invalidated iterator is otherwise undefined behavior. In the second example, convention would have the program use a range-based for loop, avoiding out-of-bounds indexing.
So C++ is memory safe by convention, and that works very well. However, when building large programs, perhaps several hundred thousand lines of code, a few lapses of good practice may occur, allowing unsafe memory access, and those few may be very hard to find.
In contrast, Rust is memory safe by construction, using data ownership policies to prevent unsafe memory operations. We illustrate that by duplicating the process flow used in the previous two C++ examples. In the first dropdown below, we set up the same processing used in the first example above and show that it fails to compile.
Rust Attempt to Ref Unowned Memory  
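A minimal sketch of this scenario, with illustrative names rather than those of the original listing. The offending `push` is left commented out so the sketch builds; a second `push`, placed after the reference's last use, is accepted.

```rust
/// Hypothetical helper: returns the value read through the reference before
/// the push, and the same element re-read by index after the push.
fn read_then_push() -> (i32, i32) {
    let mut v: Vec<i32> = Vec::with_capacity(3); // capacity is at least 3
    v.extend([1, 2, 3]);

    let r = &v[1]; // shared borrow of `v` begins here

    // Uncommenting the next line fails to compile with
    // error[E0502]: cannot borrow `v` as mutable because it is also borrowed as immutable
    // v.push(4);

    let before = *r; // last use of `r`, so the shared borrow ends here

    v.push(4); // accepted: no reference into `v` is live any more
    let after = v[1]; // re-read by index from the (possibly reallocated) buffer

    (before, after)
}

fn main() {
    let (before, after) = read_then_push();
    println!("before push: {before}, after push: {after}");
}
```

The compiler rejects the commented-out line whether or not the push would actually reallocate, because the check is made on borrows and not on capacity.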
As in the first C++ example, the Rust code above creates a vector, fills it to capacity, and makes a reference to one of its items. Now, an attempt to mutate the vector by adding another element fails to compile. That action violates Rust ownership policy, and the compiler does an excellent job of pointing out what the problem is and exactly where it occurs. Compiler error messages are very well crafted in Rust. That makes software development significantly more productive than it would otherwise be.
Rust code in the last dropdown replicates the index out of bounds processing of the second C++ example. In this case, we don't generate a compile error, but when an attempt to access unowned memory beyond the array happens, the program never gets access to that memory. Instead, a panic occurs that starts an orderly shutdown of the process.
Rust Index out of Bounds  
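A minimal sketch of out-of-bounds indexing, with illustrative names rather than those of the original listing. It assumes the default unwinding panic strategy and uses `std::panic::catch_unwind` only so the panic can be observed; without it the process would terminate at the faulty index. The non-panicking `get` method is shown for comparison.

```rust
use std::panic;

/// Hypothetical helper: sums indices 0..=len, deliberately stepping one past the end.
fn sum_past_end(data: &[i32]) -> i32 {
    let mut sum = 0;
    for i in 0..=data.len() {
        sum += data[i]; // bounds check panics when i == data.len()
    }
    sum
}

/// Hypothetical helper: checked access that reports a bad index as None.
fn checked_read(data: &[i32], i: usize) -> Option<i32> {
    data.get(i).copied()
}

fn main() {
    let data = [1, 2, 3];

    // The panic message goes to stderr; the out-of-bounds read never happens.
    let result = panic::catch_unwind(|| sum_past_end(&data));
    println!("indexing past the end panicked: {}", result.is_err());

    println!("checked read at index 3: {:?}", checked_read(&data, 3));
}
```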
Program output shows that a panic occurs before the code can access unowned memory, causing termination rather than a normal exit.

2.0 Conclusions:

Rust code is memory safe by construction. Access to unowned memory is not possible due to the ownership policies enforced by the language. This has the advantage that attempts to use memory-unsafe operations result in compile-time failures or, as with out-of-bounds indexing, in a panic before the memory is accessed. We will see in a later Bite that there are situations where ownership checking needs to be deferred to run-time. That happens most often in multi-threaded code. We address those cases in the interior mutation section of the Safety Bite.
C++ is memory safe by convention. Rust is memory safe by construction. Construction means that violations cannot happen in compilable code. Convention means that violations won't happen as long as the conventions are followed.

3.0 References:

Code for Examples - in CppUndefinedBehavior
Arguing about Undefined Behavior - video
UDB Examples - Wikipedia
Falsehoods about undefined behavior
Cost of Rust bounds checking