C++ Models

Compilation, execution, memory storage

Compilation Model

Figure 1. Compilation Models

C++ compiles each *.cpp file independently and does not carry type information from one compilation to the next. Each *.cpp file, together with all of the *.h files it includes, is called a compilation unit (the C++ standard calls it a translation unit).

The operation of a C++ Compiler is shown in Figure 1. Its first action is to run a preprocessor, which builds an intermediate source file by replacing each #include statement with the entire text of the included file. Included files really are pasted into that source text. The preprocessor also expands any macros and processes other directives, e.g., #pragma once, which asks that a header be included at most once per compilation unit.
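Macro expansion is easier to picture with a small example. The sketch below is illustrative only: the macro names BUFFER_SIZE and SQUARE and both functions are hypothetical, not taken from any particular code base. The comments show what the compiler sees after the preprocessor has done its text substitution.

```cpp
// Object-like macro: every use of BUFFER_SIZE is replaced by 64 before compilation.
#define BUFFER_SIZE 64

// Function-like macro: the extra parentheses keep expansions such as
// SQUARE(a + b) correct, because the substitution is purely textual.
#define SQUARE(x) ((x) * (x))

// After preprocessing, the body reads: return ((side) * (side));
int area_of_square(int side) {
    return SQUARE(side);
}

// After preprocessing, the body reads: return 64 * 2;
int doubled_buffer_size() {
    return BUFFER_SIZE * 2;
}
```

Many compilers can stop after this step and print the intermediate source, for example with the -E option of GCC and Clang, which is a convenient way to inspect what #include and #define actually produced.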

The Compiler then consumes the intermediate source file and, if it finds no errors, emits an object file (*.obj). Depending on the build target, object files are then combined into a static library (*.lib), a dynamic link library (*.dll), or an executable. If there are compilation errors, the compiler simply emits error messages.

These compilation outputs are processed by a Linker. When program code calls or jumps to code in the same compilation unit, the compiler assigns addresses based on the code it has laid out. However, if the code calls into another compilation unit, the compiler doesn't have an address, so it makes an entry in a table of unresolved addresses.

The job of the Linker is to resolve these addresses. It can do that because it does not run until all of the compilation units that target a specific execution image have been compiled, so it has all the addresses it needs and proceeds to resolve the unknowns.
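The split between what the compiler knows and what the linker resolves can be sketched with a declaration and a separate definition. This is a minimal illustration under stated assumptions: the function names and the file names add.h and add.cpp mentioned in the comments are hypothetical, and the definition is kept in the same file only so the sketch builds on its own.

```cpp
// Declaration only: this is what a header such as add.h (hypothetical) would provide.
// It tells the compiler the function's type, but not its address.
int add(int a, int b);

// The compiler can type-check these calls from the declaration alone. If add were
// defined in another compilation unit, the calls would be recorded as unresolved
// references for the linker to fill in.
int sum_three(int a, int b, int c) {
    return add(add(a, b), c);
}

// Definition: in a multi-file build this could live in add.cpp (hypothetical).
int add(int a, int b) {
    return a + b;
}
```

If the definition were missing from every compilation unit, each file would still compile, and the failure would show up only at link time as an unresolved symbol error.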

That results in a runnable execution image. However, that is not the end of the story. The build process may have produced dynamic link libraries that are loaded during execution. It is the job of the Loader to start the execution image and to bind, at run time, any DLLs that the program needs.

Program Execution Model

Figure 2. Program Execution Model

When execution of a C++ program begins, initialization code generated by the compiler runs, then the thread of execution enters main, with any arguments supplied on the command line. Entering main creates a stack frame - a block of allocated stack memory - that holds the input arguments, any local data defined by main, and the return value, which indicates success or failure to the operating system.
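As a rough sketch of how main's arguments and return value are typically used, the helpers below separate the two concerns. The names collect_arguments and run are hypothetical and exist only for this illustration; EXIT_SUCCESS and EXIT_FAILURE are the standard status values from the cstdlib header.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Copies the command-line arguments into strings, skipping argv[0],
// which conventionally holds the program name.
std::vector<std::string> collect_arguments(int argc, char* argv[]) {
    if (argc < 2) {
        return {};                               // no user-supplied arguments
    }
    return std::vector<std::string>(argv + 1, argv + argc);
}

// Hypothetical program body: returns the status that main would pass
// back to the operating system.
int run(const std::vector<std::string>& arguments) {
    if (arguments.empty()) {
        return EXIT_FAILURE;                     // nothing to do: report failure
    }
    return EXIT_SUCCESS;                         // report success
}
```

A program could wire these together as `int main(int argc, char* argv[]) { return run(collect_arguments(argc, argv)); }`, so that the value returned from main is the status the operating system sees.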

Should main call a function, another stack frame is allocated for that function, and if that function calls another, it too allocates a stack frame. Stack frames are allocated, as scratch-pad memory, for every scope entered by the thread of execution. When the thread leaves that scope the allocated memory becomes invalid. The next time a stack frame needs allocation, the invalid memory is likely to be part of that allocation.
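One way to make the separate stack frames visible is to record where a local variable lives on each nested call. The function below is a hypothetical illustration, not part of any library; it stores each address as an integer purely so the values can be compared after the frames are gone.

```cpp
#include <cstdint>
#include <vector>

// Each call gets its own stack frame, so each call has its own copy of `local`.
// The location of every copy is recorded as an integer for later comparison.
int record_frames(int depth, std::vector<std::uintptr_t>& locations) {
    int local = depth;                           // lives in this call's stack frame
    locations.push_back(reinterpret_cast<std::uintptr_t>(&local));
    if (depth > 0) {
        record_frames(depth - 1, locations);     // the nested call allocates another frame
    }
    return local;                                // unchanged by the nested calls
}
```

Calling record_frames(2, locations) records three different locations, one per active call. The recorded values are only meaningful for comparison: once the calls return, that memory is invalid and may be reused by the next frame.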

Heap memory can be allocated by a program's code with a call to new, and deallocated with a call to delete. The functions malloc and free serve the same purpose in a C language program.
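The two allocation styles can be compared side by side. The function names below are hypothetical, and the sketch stores a single int only to keep the pairing of each allocation with its matching deallocation easy to see.

```cpp
#include <cstdlib>

// C++ style: new allocates and initializes, delete releases.
int roundtrip_with_new(int value) {
    int* slot = new int(value);                  // allocate one int on the heap
    int copy = *slot;
    delete slot;                                 // memory from new is released with delete
    return copy;
}

// C style: malloc returns raw memory, free releases it.
// Returns -1 if the allocation fails (a simplification for this sketch).
int roundtrip_with_malloc(int value) {
    int* slot = static_cast<int*>(std::malloc(sizeof(int)));
    if (slot == nullptr) {
        return -1;                               // malloc signals failure with a null pointer
    }
    *slot = value;
    int copy = *slot;
    std::free(slot);                             // memory from malloc is released with free
    return copy;
}
```

The two families must not be mixed: memory obtained from new is released with delete, and memory obtained from malloc is released with free.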

Input and output operations defined by a C++ program are handled with streams - the iostream objects cin and cout for the keyboard and console, and the fstream classes ifstream and ofstream for files. Errors and logging are handled by cerr and clog. The standard stream objects cin, cout, cerr, and clog are constructed as global objects by the initialization code that runs before main is entered, and are available anywhere in the program code. File streams are created by the program as it needs them.
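Because cout, cerr, clog, and ofstream all derive from the same output stream type, one function can write to any of them. The sketch below assumes a hypothetical write_greeting function; the in-memory std::ostringstream is an addition used here only to capture the text so it can be inspected.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Writes to whichever output stream the caller supplies:
// std::cout, std::cerr, std::clog, or an open std::ofstream.
void write_greeting(std::ostream& out, const std::string& name) {
    out << "Hello, " << name << '\n';
}

// Captures the same output in memory instead of sending it to the console.
std::string greeting_as_text(const std::string& name) {
    std::ostringstream buffer;
    write_greeting(buffer, name);
    return buffer.str();
}
```

For example, write_greeting(std::cout, "world") sends the line to the console, while write_greeting(std::cerr, "world") sends it to the error channel without any change to the function.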

The handles stdin, stdout, and stderr are used by C programs (C has no counterpart to clog). They refer to the program's standard input, output, and error channels, which are attached to the keyboard and console by default. The program can open other handles for files that it creates or discovers in the file system.
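The C-style handles are also reachable from C++ through the cstdio header. The helper below is a hypothetical illustration of writing to a channel by handle; it simply forwards to std::fprintf.

```cpp
#include <cstdio>

// Writes one line of text to any C channel and returns what std::fprintf returns:
// the number of characters written, or a negative value on failure.
int write_line(std::FILE* channel, const char* text) {
    return std::fprintf(channel, "%s\n", text);
}
```

Here write_line(stdout, "done") writes to the console and write_line(stderr, "something failed") writes to the error channel. A handle returned by std::fopen can be passed in the same way, and should be closed with std::fclose when the program is finished with it.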

Memory Model

Figure 3. Program Memory Model

Figure 3 shows details of the C++ Memory Model. There are three types of memory: static, stack, and heap, each with its own lifetime model.

Anything in static memory is defined, and has coherent values, for the lifetime of the program. That includes all program code, global data, and static local data. Static local data is defined inside a function and qualified by the keyword static. It is initialized the first time execution reaches its definition, but is not re-initialized on subsequent entries, so static data can save information that persists between function calls.
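The difference between a static local and an ordinary local can be seen by calling two otherwise identical functions more than once. Both function names are hypothetical and chosen only for this sketch.

```cpp
// The counter lives in static memory: it is initialized once and keeps
// its value between calls.
int next_ticket() {
    static int issued = 0;     // initialized the first time execution reaches this line
    ++issued;
    return issued;
}

// The counter lives on the stack: it is created and initialized again on every call.
int next_ticket_without_static() {
    int issued = 0;
    ++issued;
    return issued;
}
```

Successive calls to next_ticket return 1, 2, 3, and so on, while next_ticket_without_static returns 1 every time.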

Stack memory holds information defined by each program scope, e.g., a block of code surrounded by braces "{" and "}". The lifetime of a stack allocation begins when the thread of execution enters the scope, and ends when it leaves the scope. Then the allocation becomes invalid, and the memory becomes available for reuse and may be allocated for the next scope. The call stack you see in a debugger running C++ code is just the set of stack allocations shown in Figure 2 and Figure 3.
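Scope-bound lifetime can be observed by counting how many objects of a type are alive at a given moment. In this sketch the Tracked type, the live_objects counter, and the function are all hypothetical; the constructor and destructor mark the start and end of each object's lifetime.

```cpp
int live_objects = 0;          // global counter, held in static memory

struct Tracked {
    Tracked()  { ++live_objects; }               // lifetime begins
    ~Tracked() { --live_objects; }               // lifetime ends
    Tracked(const Tracked&) = delete;            // copying disabled so the count stays exact
    Tracked& operator=(const Tracked&) = delete;
};

// Returns how many Tracked objects were alive inside the inner block.
int count_inside_block() {
    int inside = 0;
    {
        Tracked local;                           // created when execution enters the block
        inside = live_objects;
    }                                            // destroyed when execution leaves the block
    return inside;
}
```

The function reports one live object inside the braces, and live_objects is back to zero as soon as the block has been left, without any explicit cleanup code.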

Heap memory is provided to a running program by the operating system. A default heap is created when a C++ program begins execution. The program code allocates heap space by using calls to operator new and deallocates with a call to delete. So the lifetime of a heap object starts with its allocation with new, and ends with its deallocation with delete.
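Unlike a stack allocation, a heap object is not tied to the scope that created it. The sketch below uses a hypothetical Reading type and two hypothetical functions to show an object that is created in one function and released in another.

```cpp
struct Reading {               // hypothetical type for this illustration
    int value;
};

// The Reading is created on the heap, so it survives the end of this function's scope.
Reading* make_reading(int value) {
    return new Reading{value};
}

// The object's lifetime ends only here, when it is handed to delete.
int consume_reading(Reading* reading) {
    int value = reading->value;
    delete reading;
    return value;
}
```

Every object returned by make_reading needs exactly one matching delete. Forgetting it leaks the memory, and deleting the same object twice is an error.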