about

3/03/2022

Abstract Types

BasicBites - Abstract Types

Copy, Move, and Reference

"In programming languages, a type system is a logical system comprising a set of rules that assigns a property called a type to the various constructs of a computer program, such as variables, expressions, functions or modules." - Wikipedia

In this page we discuss static data types to illustrate some key differences between native and managed code. A static data type declaration defines, at compile-time, a set of rules for creating, mutating, and disposing of a typed data instance. Each instance has a fixed type throughout its lifetime [We will not be concerned here with type coersions like cast operations].

In the following sections we define type categories: Copy, Move, and Reference that have distinct operations for setting the value of one instance based on the value held by another instance. Common languages like C++, Rust, C#, and Java, implement these operations in different ways, discussed below.

So, there are two parts to this discussion:

Part 1. Defines abstract type behaviors Copy, Move, and Reference.
Part 2. Shows how C++, Rust, and C# implement the abstract type behaviors in their own type systems.
A description for Java would use almost the same text as used for C#, as C# started out with a fork of the Java JVM, the same type structure, with some language syntax changes.

Part 1. - Abstract Type Behaviors

Copy Types:

Fig 1b. Composite Copy

Fig 1a. Primitive Copy

Copy types are modeled after primitive language types like integers and floats. When instances are copy constructed or copy assigned, their values are copied to new locations with new names and can be independently mutated. For types occupying one contiguous block of memory this is implemented with a memcopy.

The C++ language provides copy constructor syntax that supports developer defined composite copy operations, as shown in Figure 1b. Other languages support developer defined clone operations that have the same result.

Copy construction:	T t1 = 2;	T t2 = t1;	t1 and t2 occupy disjoint blocks of memory with the same value
Copy assignment:	T t3 = 0;	t2 = t3;	t2 and t3 occupy disjoint blocks of memory with the same value
Pass by value:	fn f(t:T) { ... }	f(t2)	t2 copied onto f's stackframe

Move Types:

Move types efficiently transfer ownership of instance resources from the original named instance to another. The original becomes invalid and the new instance takes ownership of the resources of the original without moving them in memory. We have moved the ownership of resources from one instance to another efficiently. This is often implemented by giving the new owner the address of the resource.

Move construction:	T t1 = new T(1, "xyz")	T t2 = t1;	t1's resources given to t2, t1 now invalid
Move assignment:	T t3 = new T(2, "abc");	t2 = t3;	t3's resources given to t2, t3 now invalid
Pass by value:	fn f(t:T) { ... }	f(t2)	t2 moved into f, t2 now invalid

Move types often provide a clone() method that gives the destination a copy of the source's resources so the source remains valid.

Move operations usually only copy a pointer and so are fast. Clone operations require copying all of the resources into a new instance and may be slow if the resources are large.

Figure 2a. Move Types

Figure 2b. Clone Operation

Reference Types:

Reference types are handle‑body pairs. The handle resides in a local scope and body in the program's heap. Program code manipulates the handle to effect changes to the body's value.

Handle‑body construction:	T h1 = new T(1, "xyz")	T h2 = h1;	h1's resources - the body - are now attached to both h1 and h2
Handle‑body assignment:	T h3 = new T(2, "abc");	h2 = h3;	Both h2 and h3 point to the same body
Pass by value:	fn f(h:T) { ... }	f(h3)	Handle h3 copied to f's stackframe; copy points to h3's body

Figure 3. Reference Types

Reference types require either reference counting or garbage collection to avoid double deletion or memory leaks.

Reference counting is fast and deterministic, but reference cycles will cause memory leaks unless some form of mark and remove is used.

Garbage collection is non-deterministic and must devote processing resources to track pointers.

Part 2. - Language Specific Details

The Abstract types discussed in Part 1. represent fundamental programming operations:

Copy operation happens when we assign primitive instances in C++, Rust, and C#.
Move operation is an efficient transfer of ownership of instance resources.
Reference operations copy handles that refer to instances in the heap.

In this part we will illustrate how each of the languages: C++, Rust, and C# implement or approximate the abstract types.

C++ Specific Types

C++ programs compile to native code.

Copy Types

In the C++ programming language all primitives and any user defined type that has valid copy constructor and valid copy assignment are Copy types.

The copy operations and destructor are compiler generated if all the user type data members have correct copy semantics. Otherwise, the developer is obligated to provide definitions for them.

Move Types

C++ user types that define move constructor and move assignment behave like Move types when the source is a temporary or is preceded with a move declarator. If copy constructor and copy assignment are correctly defined the type will act like a Copy or Move type depending on the context of its instances.

Unlike Rust, the C++ compiler does not disallow use of the invalid source after it is moved, which may result in undefined behavior.

Reference Types

C++ does not directly support Reference types, but it has the primitives:

raw pointer, T* aptr = &t; and reference T& aref = t;

The C++ reference is not a Reference type, as described in Part 1., but rather a type qualifier that creates a fixed binding to its target with a pointer, but uses object syntax to manipulate the target.

The C++ std library provides std::unique_ptr<T>:

auto r = std::unique_ptr<T>(new T(args));

It provides a method move that transfers ownership of the target to another std::unique_ptr.

The std library also provides std::shared_ptr<T>, a reference counted smart pointer, so lifetime of the instance ends when the last shared pointer instance goes out of scope. Another shared_ptr is added to the share through assignment:

auto shareT1 = std::shared_ptr<new(t)>;
auto shareT2 = shareT1;

User defined reference types

One could create a type with reference behavior by carefully crafting copy operations and destructor, but requires care to avoid memory leaks or double delete exceptions. Not recommended.

Type Aliases

An alias is simply another name for a given type. C++ provides two means of defining aliases:

using ID = unsigned int;
typedef unsigned int ID;

Consequences:

C++ has type abstractions that help build large but maintainable systems. The language enables excellent run-time performance in the hands of experienced developers. And there is a lot of accumulated experience and expertise for C++ development.

The language rules allow possibility of undefined behavior and there are a lot of ways for that to happen. Rules the compiler enforces are not always obvious, especially for type inference, and may change depending on the program's context.

Rust Specific Types

Rust programs compile to native code and run directly in the processes created for them.

User defined types in Rust are structs, enums, or unions. The struct plays essentially the same role in Rust as classes do in C++, C#, and Java.

The Rust compiler disallows mutation of an instance through a mutable reference whenever another reference to the same instance is active, unless the other reference is not used after mutation. This is a sufficient condition for safety, e.g., prevents undefined behavior.

Copy Types

In the Rust programming language all types that implement the Copy trait are copy types. That includes all primitives and any user defined structs that have all copy members and are bounded by the Copy trait.

Move Types

Rust user defined structs that do not implement the Copy trait are Move types.

The Rust compiler disallows use of instances that have been moved.

Reference Types

Rust does not provide Reference types but does support pointing to other instances. It provides the primitives unsafe raw pointer, let aptr:T* = &t; and reference, let aref:T& = t;.

As with C++, the Rust reference is not a Reference type, as defined in Part 1. Instead a Rust reference declaration forms a fixed binding to a target instance. Its use is syntactically similar to the C++ reference, but has much less freedom. Its use is analyzed by the compiler to prevent undefined behavior due to dangling references. If analysis results cannot ensure safety, the build fails.

Standard smart reference-counted pointers Rc<T> and Arc<T> and smart heap pointer Box<T> provide means to point to and manipulate instances stored in the heap.

When any of them are initialized from an instance in the stack, the instance is copied or moved to the heap.

let t = T:new(); let boxed = Box::new(t);

boxed is a smart pointer to a copy or move of t in the heap.

User defined reference types

A user defined reference type could be constructed with a struct using unsafe implementation blocks and lifetime annotations; but users of that type will find it hard to craft compilable syntax.

Type Aliases

An alias is simply another name for a given type. Rust provides two means of defining aliases:

type ID = u32;
use u32 as ID;

Consequences:

Rust has essentially all of the advantages of the C++ language. In addition the compiler applies safety rules that eliminate undefined behavior, at the expense of longer compile times.

Rust has applied some smart ideas for its type system to achieve that. Also, the way Rust defines its Copy and Move types eliminates many of the boiler plate functions required for some C++ types. It also removes most of the implicit type conversions supported by C++, resulting in much simpler type inferencing.

The compiler's safety rules are sufficient, but not necessary - to have both is not feasible. So, for example, new strategies are required to implement components that are linked, like lists and graphs. Essentially, we can use Vector indexing instead of references to establish linked relationships. Experience indicates that anything that can be implemented in C++ can also be implemented in Rust without resorting to unsafe blocks.

It is more difficult to compile successfully in Rust, compared to C++, due to its safety checking. But once programs compile they are very likely to work, needing only fixes to program design logic. There are no more long sessions looking for the sources of pointer bugs. In fairness, C++ now has many library types like smart pointers that help write bug free code.

C# Specific Types:

C# programs compile to managed code, e.g., to IL (byte code) and run in a virtual machine (CLR) hosted by the processes created for them. At load time the CLR compiles IL to native code by function or assembly. The compiled code is cached and not compiled again.

C# types have two categories:

Value Types:
primitive scalar types, struct and enum
All reside in the stack
Reference Types:
class, interface, array, and delegate
All reside on the managed heap

All value types reside in the stack; all reference types reside in the managed heap.

Java types are nearly isomorphic to C# types, e.g., they have the same behaviors.

Copy Types

All of the value types are Copy types with the exception of structs that have aggregated members, i.e. contain references to instances on the managed heap.

Structs have the layout shown in Figure 4. Struct members are composed if, and only if, they are value types. All other members are aggregated. If no member is aggreggated then the struct is Copy.

Move Types

C# has no Move types.

Reference Types

All C# reference types have the same behavior as the Reference Type, described in Part 1. All classes are reference types and their instances reside in the managed heap.

Type Aliases

An alias is simply another name for a given type. C# provides:

using Records = List<Tuple<int, string, int>>

Consequences:

C#, like Java, eliminates undefined behavior by disallowing use of pointers. Instead it creates all user-defined instances in the managed heap, refers to them with managed handles, and returns instance resources with garbage collection.

There are performance consequences for using a VM with garbage collection. Programs suffer initial latencies due to translation, at load time, of byte code into platform native code. Garbage collection uses non-trivial processing to track managed pointers, and destruction is non-deterministic, so a designer cannot rely on immediate release of shared resources like files and sockets for other code to use.

Compiling C# is usually fast and relatively easy. And developers have fewer implementation bugs to chase after compilation, as those most often result from attempting to use null references.

References:

C++ Types	Informal survey of the C++ type system
Rust Types	Thorough semi-formal description of the Rust type system
C# Types	Thorough semi-formal description of the C# type system
Java Types	Informal survey of the Java type system