10/14/2022
Program Execution

BasicBites - Program Execution

Processes, memory, native, managed, resources

All user code executes in the context of a process. For native code, the compiler embeds machine instructions in an executable image, which is loaded at execution time into a newly created process. Managed source code compiles to byte code instead. At execution time a new process is created and loads a virtual machine, which "just-in-time" compiles the byte code, one function or assembly at a time, and runs it in the context of that process. Compiled byte code is cached and not compiled again. To program effectively you need to understand the object model your implementation language uses. Object models for native code are very different from those for managed code. This page, and the pages it links to, discuss both and focus on those differences.

Processes

Figure 1. Windows Process Model
In Windows a process is a container for one or more threads and the resources they share. All process resources are accessed and manipulated using handles. Figure 1 shows a process with handles for two threads, for a dynamic link library loaded at execution time, and for GUI windows and heap memory.
Windows has an object model that provides programs with handles to object instances created by calls to the Windows API. For example, a heap object is created with: HANDLE HeapCreate(DWORD flOptions, SIZE_T dwInitialSize, SIZE_T dwMaximumSize); Most of a program's API calls are made by the programming language's standard libraries, so the program doesn't need to know about these calls.
The Windows scheduler starts, runs, and interrupts threads, not processes. A Windows process is just a container for its threads and resources. The process models for Linux and Unix are similar to that of Windows, except that a Linux thread is essentially a child process that shares its parent's address space. That is hidden by the pthread library, and Linux no longer increments its reported process count when a new thread is created. So, in effect, the Linux scheduler starts, runs, and interrupts processes.
Process memory is divided into:
  • Static memory holds code and global data.
  • Stack memory is allocated when the program's thread of execution enters a new scope, delineated by "{...}", and deallocated when execution leaves the scope. For native code this scratch-pad memory holds function parameters and all local objects declared within the scope. For managed code stack memory holds values for value types and references to heap-based objects for reference types. This is true for both function parameters and instances declared within the current scope.
  • Heap memory is allocated with calls to new. Native code deallocates heap-based instances with calls to delete. Managed code relies on a garbage collector, which defers deallocation until later analysis confirms that no references to the instance remain.
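As a rough illustration of the three regions for native code, here is a minimal C++ sketch. The names call_count and sum_to are hypothetical, chosen only for this example, and the heap block is wrapped in std::unique_ptr so that delete[] runs automatically when the scope exits.

```cpp
#include <cassert>
#include <memory>

// Static memory: exists for the whole run of the program.
static int call_count = 0;

// Sums the values 1..n (n >= 0), touching static, stack, and heap memory.
int sum_to(int n) {
    assert(n >= 0);
    ++call_count;          // static memory, shared by every call
    int total = 0;         // stack memory, released when this scope exits

    // Heap memory: allocated with new[]; unique_ptr calls delete[] for us.
    std::unique_ptr<int[]> values(new int[n]);
    for (int i = 0; i < n; ++i) {
        values[i] = i + 1;
        total += values[i];
    }
    return total;          // the heap block is released as the scope exits
}
```

Calling sum_to(4) from main yields 10 and leaves call_count at 1. The stack variable and the heap block are gone by the time the call returns, while the static counter persists.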
Native code uses these sections of memory very differently from managed code. Instances of managed user-defined types all reside in the managed heap. Instances of native types can reside in the static, stack, and native heap memory segments.
All code, both native and managed, runs in some scope. When the thread of execution enters a scope, for example on function invocation, objects may be created or references to objects in the caller's scope may be passed in. Leaving the scope may lead to destruction of objects created in the scope or construction of objects in the caller's scope. These details are the same for primitive types but vary significantly between native and managed user-defined types.
Figure 2. Native Assignment
Figure 3. Managed Assignments

Native vs. Managed Code

The semantics of construction and assignment are quite different for native and managed types. Native types are by default copy constructed and copy assigned. If a native instance holds a reference to an object on the heap, its copy constructor and copy assignment operator are obligated to copy both parts, the stack part and its heap resource, as shown in Figure 2. This creates a new independent instance that holds the same state as the original but may be separately mutated without affecting the original.
Managed code semantics for construction and assignment are shown in Figure 3. Value types are copied just like native types, but for reference types only the handles are copied, so both wind up pointing to the same instance. Future mutations through either handle are visible through the other, because the two handles always point to the same underlying instance.
The point is that different languages may use platform resources in quite different ways, each with its own advantages and disadvantages. We will explore this in more detail in Types and Object Models. For this discussion we've taken C++ semantics as the prototypical native type and C# as the prototypical managed type. We will see in Types that the semantics of the Rust programming language are an interesting variant that combines some of the strongest features of each.
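The two assignment behaviors can be sketched in C++ alone. In this illustration NativeName, ManagedName, and Handle are hypothetical names, and std::shared_ptr only stands in for a managed handle: it uses reference counting, not a garbage collector, so it mimics the sharing behavior of Figure 3 but not a managed runtime's memory management.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Native-style value type: copying also duplicates the heap-held text,
// because std::string's copy operations copy its buffer.
struct NativeName {
    std::string text;
};

// Hypothetical stand-in for an instance of a managed reference type.
struct ManagedName {
    std::string text;
};
using Handle = std::shared_ptr<ManagedName>;  // plays the role of a handle

// Copies a native value, mutates the copy, and returns the original's text.
std::string original_after_native_copy() {
    NativeName a{"alpha"};
    NativeName b = a;      // copy construction: b is an independent instance
    b.text = "beta";
    return a.text;         // unchanged
}

// Copies a handle, mutates through the copy, and returns the original's text.
std::string original_after_handle_copy() {
    Handle a = std::make_shared<ManagedName>(ManagedName{"alpha"});
    Handle b = a;          // only the handle is copied
    b->text = "beta";
    return a->text;        // changed, since a and b share one instance
}
```

The first function returns "alpha" and the second returns "beta", which is the difference between Figure 2 and Figure 3 in miniature.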

Native Types

Native programs compile to platform instructions running in the process created for their executable. C++ and Rust both compile to native code. Native data types are:
  • types with contiguous memory footprints:
    primitive types, native arrays, and structs without pointer or reference members
  • types with non-contiguous memory footprints:
    most library and user-defined types, like String.
  • pointers and references to either of the above
Instances of these types reside in the stack frame where they are declared. Each type may compose instances of other native types and aggregate native instances on the native heap, as shown in Figure 4.
Figure 4. Native Types
Compositions of native types place the composed instances within the memory footprint of the composing instance. Aggregations place handles to the aggregated instances in the memory footprint of the aggregator and place the instances themselves in the native heap, as shown in Figure 4. Native types provide code that returns owned resources to the platform when they go out of scope. This is an efficient, deterministic destruction process.
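A small C++ sketch may make composition, aggregation, and deterministic destruction concrete. Part, Widget, use_widget, and destruction_log are hypothetical names used only here. The log exists purely to make the destruction order observable.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Records the order in which parts are destroyed.
static std::vector<std::string> destruction_log;

struct Part {
    std::string name;
    explicit Part(std::string n) : name(std::move(n)) {}
    ~Part() { destruction_log.push_back(name); }
};

struct Widget {
    Part composed;                     // stored inside Widget's own footprint
    std::unique_ptr<Part> aggregated;  // handle inside Widget, instance on heap
    Widget()
        : composed("composed"),
          aggregated(std::make_unique<Part>("aggregated")) {}
};

void use_widget() {
    Widget w;   // w lives in this function's stack frame
}               // leaving the scope destroys w and both parts, right here
```

After a call to use_widget(), the log holds "aggregated" followed by "composed", because C++ destroys members in reverse order of declaration. No collector is involved: both parts are released at the closing brace.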

Managed Types

Figure 5. Managed Types
Managed programs compile to byte code and run in a virtual machine (VM) hosted by the process created for them. At load time the VM compiles byte code to native code by function or assembly. The compiled code is cached and not compiled again. C# and Java are managed languages - there are many others based on the Java VM. Managed code has three kinds of data types: value types, managed reference types, and managed handles:
  • Value types
    primitives, native arrays, and structs
    implement Copy type behavior described in Part 1
  • Managed reference types
    all user-defined types
    implement Reference type behavior of Part 1.
  • Managed handles (references)
    support access and resource management
All managed reference types reside in the managed heap. They may compose instances of value types, which reside in the composer's memory footprint. They may also aggregate instances of reference types. In that case the instance holds a composed handle to an aggregated instance in the managed heap, as shown in Figure 5. The resources of managed reference types are not managed by the instance; instead, they are managed by a garbage collector that is part of the heap's management process. When a managed reference instance goes out of scope its resources are queued for disposal by the garbage collector. That is a non-deterministic, tiered process that requires platform processing to track active references.

Resource Management:

Native code handles resource management in an elegant, efficient way, often referred to as "Resource Acquisition Is Initialization" (RAII). However, use of native pointers and references opens opportunities for undefined behavior. The Rust language has an interesting way to avoid this deficit, as discussed in Program Types. Managed code requires placement of all user-defined types on the managed heap, accessed through handles. This avoids problems with undefined behavior at the expense of the processing required to manage instances and the latencies associated with compiling byte code to native code at run time.
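A minimal RAII sketch in C++ follows, assuming a hypothetical resource modeled as a simple counter (open_handles). A real guard would wrap a file, lock, or platform handle. The point is that the destructor releases the resource on both normal and exceptional exit.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical resource: a counter standing in for open platform handles.
static int open_handles = 0;

class HandleGuard {
public:
    HandleGuard() { ++open_handles; }    // acquire in the constructor
    ~HandleGuard() { --open_handles; }   // release in the destructor
    HandleGuard(const HandleGuard&) = delete;             // one owner only
    HandleGuard& operator=(const HandleGuard&) = delete;
};

// Returns the number of handles open while the guard is alive.
int work(bool fail) {
    HandleGuard guard;
    if (fail) throw std::runtime_error("work failed");
    return open_handles;
}

// Returns true when nothing leaked, whether work returned or threw.
bool no_leak_after(bool fail) {
    try {
        work(fail);
    } catch (const std::runtime_error&) {
        // The guard was already destroyed during stack unwinding.
    }
    return open_handles == 0;
}
```

Both no_leak_after(false) and no_leak_after(true) return true. The calling code never releases anything explicitly, which is what makes the pattern hard to misuse.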

Consequences:

C++ - Native Code
  • Source code: compiles to native code executed by its process
  • Objects: stored in the function's stack frame unless explicitly placed in heap or static memory
  • Object management: provided by program code, e.g., creation, deallocation, exception handling
  • Types: the language enables user-defined objects to behave like primitive types, e.g., value behavior, through use of constructors and assignment operators
  • Moves: C++ provides move constructors but does not enforce single ownership. The source may still be used after a move, possibly resulting in undefined behavior, for example when a moved-from pointer is dereferenced.
  • Pros and Cons: excellent performance; requires care to avoid paths to undefined behavior
Rust - Native Code
  • Source code: compiles to native code executed by its process
  • Objects: stored in the function's stack frame unless explicitly placed in the heap with Box
  • Object management: provided by program code and library, e.g., creation, deallocation, error handling
  • Types: Rust has two categories of types, Copy types and Move types. Copy types have value behavior. Move types transfer ownership when assigned or passed by value.
  • Moves: Rust treats all types that do not satisfy the Copy trait as Move types. It enforces single ownership, so the source becomes invalid after a move.
  • Pros and Cons: excellent performance and safety. Programs are initially hard to build due to safety constraints but, once built, are very likely to be correct.
C# - Managed Code
  • Source code: compiles to byte code, jitted and executed by its process's virtual machine
  • Objects: object handles are stored in the function's stack, pointing to instances stored in the heap
  • Object management: provided by the virtual machine using the garbage collector and VM events
  • Types: types are either value or reference types, with quite different behavior
  • Moves: C# does not provide move operations
  • Pros and Cons: promotes safety at the expense of performance and initial latency
C++ is the classic native language. Java and C# (very similar languages) are archetypes of managed languages. Rust generates native code, but has some of the look-and-feel of managed code, in part due to its Copy and Move type dichotomy. All four languages enable building the same kinds of program functionality, but the implementation techniques are occasionally different. My personal opinion is that Rust source code more accurately represents its underlying mechanics than the other three (counting Java). For example, both Rust and C++ have syntax for Move operations, but Rust rejects any use after a move at compile time, while C++ compiles subsequent uses of the source, which can lead to undefined behavior.
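The C++ side of that comparison can be sketched as follows. The function name source_is_null_after_move is hypothetical. The example uses std::unique_ptr because its moved-from state is guaranteed to be null, which makes the outcome predictable. For most other types the moved-from state is only "valid but unspecified".

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Moves ownership out of `source` and reports what is left behind.
bool source_is_null_after_move() {
    auto source = std::make_unique<int>(42);
    std::unique_ptr<int> target = std::move(source);  // ownership transferred

    // The compiler still accepts uses of `source` here. Testing it is safe,
    // because a moved-from unique_ptr is null, but dereferencing it
    // (*source) would be undefined behavior.
    return source == nullptr && *target == 42;
}
```

The function returns true. The roughly equivalent Rust code, reading a Box after moving it, is rejected by the compiler as a use of a moved value, so the mistake never reaches run time.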