
10/27/2022

C++ Story Models


Chapter #1 - C++ Story Models

code structure, compilation, execution, memory, classes, templates

Like many modern languages, C++ is a large and ambitious language. The purpose of this chapter is to help you develop effective mental models for important features of the language. We do this with diagrams and associated text, and occasionally small code fragments.

You will find details, with a lot more code, in succeeding chapters where we focus on data, operations, classes, and templates.

1 Code Structure

C++ developers structure code with packages, which contain files, which in turn contain classes and functions.

Package boundaries are enforced neither by the C++ language nor by the platform operating system. A package is an abstraction that represents a unit of documentation and is expected to take on a single responsibility, so packages are defined by the C++ developer. It is common practice for each C++ package to contain a single .cpp implementation file and usually a single .h header file of the same name, e.g., MyPkg.h and MyPkg.cpp. To that, a developer may choose to add an interface file, IMyPkg.h.

Packages for components with no parents may only have a single .cpp file, and we sometimes elect to make an interface file, IPkg.h, a separate package with just that one file. We do that when more than one other package has a class that implements the interface.

Files are an operating system construct. The C++ build environment expects code files to follow naming conventions based on extensions like .h, .hpp, and .cpp, which identify their contents. Files may contain zero or more classes and zero or more unbound functions. C++ does not enforce one class per file, but common convention is to place only a few closely related classes in a file, very often only one.

Classes are a C++ language construct (also used by many other languages) that define units of managed data. Functions and class methods are units of computation. Classes are expected to have a single responsibility and each of their methods contributes to one specific part of that responsibility.

Code Structure Details

Figure 1. C++ File Inclusion Model

Each C++ implementation file, one with the extension .cpp, usually includes several header files, ones with the .h extension, and standard library headers, like <iostream>, which have no extension. Figure 1. shows Executive.cpp including an interface header file, IComponent_A.h, and a header file, Component_B.h.

This example, implemented in the CppStory code repository, has no significant functionality. Its purpose is to show how packages are formed and some useful ways they communicate. The Logger Repository has code for a useful facility that uses these same techniques.

If you look at the interface file, IComponent_A.h, you won't see any implementation, just declarations of Component_A's public methods and a single declaration of a creational function. Since the Executive and Component_B include only this file, they have no dependencies on the implementation details of Component_A.

They use the factory function, which creates an instance of Component_A and returns a pointer typed as IComponent_A but bound to that concrete Component_A instance. So they create and use the component without binding to its implementation.
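A minimal single-file sketch of this interface-plus-factory arrangement follows. The method say() and the factory name createComponent_A are hypothetical and are not taken from the CppStory repository. The factory returns a std::unique_ptr rather than a raw pointer so that ownership is explicit. In a real build the three parts would live in IComponent_A.h, Component_A.cpp, and Executive.cpp.

```cpp
#include <memory>
#include <string>

// --- would live in IComponent_A.h: declarations only, no implementation ---
struct IComponent_A {
  virtual ~IComponent_A() = default;     // virtual, so deleting through the interface is safe
  virtual std::string say() const = 0;   // hypothetical service method
};

// Hypothetical factory name; the repository's factory may be named differently.
std::unique_ptr<IComponent_A> createComponent_A();

// --- would live in Component_A.cpp: the only place that sees the concrete class ---
namespace {
class Component_A : public IComponent_A {
public:
  std::string say() const override { return "Component_A at your service"; }
};
}  // namespace

std::unique_ptr<IComponent_A> createComponent_A() {
  return std::make_unique<Component_A>();  // concrete object, returned as the interface type
}

// --- would live in Executive.cpp: needs only the interface and the factory ---
std::string useComponent() {
  std::unique_ptr<IComponent_A> pA = createComponent_A();
  return pA->say();
}
```

Because Executive.cpp would see only the declarations in IComponent_A.h, Component_A can change its private details without forcing the Executive to be edited.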

When a C++ project is built, the preprocessor combines a single .cpp file and all of the header files it includes into a single source file. That file is compiled and, if no errors are encountered, the compiler generates an object file, .obj, which may later be packaged into a library, .lib. If there are more .cpp files in the project, the process is repeated until all .cpp files have been compiled.

When the files of Figure 1. are compiled, the compiler makes three passes, one for each of the three .cpp files. These passes generate three .obj files, which are then linked together to form an execution image, Executive.exe.

Figure 2. Package Diagram

In Figure 2., we show a package structure that contains the files from Figure 1. The Executive package makes calls to functions in both Component_A and Component_B. Also, Component_B makes a call on Component_A.

Because of these calling relationships, the Executive needs type information about both components, and Component_B needs type information about Component_A. That type information comes from including headers in each .cpp file.

Component_A provides an interface, IComponent_A.h, which describes the calling signatures of all its public methods but does not include any implementation detail. It serves as a contract for the services that Component_A exports. It also supplies an object factory, so callers like Executive and Component_B have no dependencies on Component_A's implementation. Changes to that component will not prevent the using code from compiling correctly, as long as the interface and the object factory signature do not change.

Figure 3. C++ Class Diagram

Looking inward, we see in Figure 3. the classes that each package contains. The Executive package has a single class, Executive. Component_A has two classes, the interface IComponent_A and class Component_A. Finally, Component_B contains class Component_B and a subordinate class Helper. The object factory is an unbound function, not a class, and so is not shown in the class diagram.

Executive composes Component_B and aggregates IComponent_A. Component_A inherits from its interface IComponent_A. Component_B composes its helper class and uses the IComponent_A interface. Note that Executive and Component_B both actually bind to an instance of Component_A, but they have to use the contract provided by IComponent_A. That's why we show them linking to the interface.

It is common, though not essential, that one of the classes in a package has the same name as the package. Usually, a package name comes from the name of the project that controls its build process.

Since Component_B was not configured with an interface, Executive depends on its implementation details. It is likely that Executive and Component_B were designed together, to be used as a unit, separated into two classes to make understanding and testing easier.

Interfaces in C++ are often implemented with structs rather than classes. Classes and structs are identical except that struct members and base classes are public by default, while class members and base classes are private by default.
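The sketch below uses hypothetical names to show both defaults. The static_assert lines check at compile time that a struct's base is public by default and a class's base is private by default.

```cpp
#include <type_traits>

struct SPoint { int x = 1; };          // struct: members are public by default

class CPoint {                         // class: members are private by default
  int x = 2;
public:
  int get() const { return x; }
};

struct Base { int id = 42; };

struct FromStruct : Base {};           // struct: Base is a public base by default

class FromClass : Base {               // class: Base is a private base by default
public:
  int id_inside() const { return id; } // still accessible inside the derived class
};

// A FromStruct* converts implicitly to Base*; a FromClass* does not,
// because its Base subobject is private.
static_assert(std::is_convertible_v<FromStruct*, Base*>);
static_assert(!std::is_convertible_v<FromClass*, Base*>);
```

Because struct defaults to public, an interface written as a struct needs no access specifiers at all. That is one reason the struct form is popular for interfaces.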

The next section describes how a collection of packages is built into a library or executable file.

2 Compilation Model

The C++ tool-chain consists of a preprocessor, compiler, and linker. Each .cpp file and its included .h header files form a translation unit. The build system handles one translation unit at a time and does not carry over information from one translation unit to the next. It is the job of the linker to bind the translated units into an executable or library.

Compilation Model Details

Figure 1. C++ Compilation Model

C++ compiles each *.cpp file independently and does not save type information from one compilation to the next. Each *.cpp file, together with all of its included *.h files, is called a translation unit.

The C++ language is designed to support one-pass compilation. That means that an entity, e.g., a function, struct, or class, must be declared before its first use in compiler scan order, and a struct or class must be defined before any instance of it is created. This is called the definition first rule:

Definition First Rule:

Instances of structs or classes can be declared only after the struct or class has been defined. The compiler can't lay out code for an instance until it knows how much stack space the instance will occupy, and that is determined by the struct or class definition. A forward declaration, e.g., class X;, is enough only for declaring pointers and references to X.
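Here is a small sketch of the rule, using hypothetical Car and Engine types. A forward declaration lets Car hold an Engine*, but an Engine instance can only be created after Engine is defined.

```cpp
class Engine;                  // forward declaration: Engine is an incomplete type here

struct Car {
  Engine* engine = nullptr;    // OK: a pointer's size does not depend on Engine's layout
  // Engine spare;             // would not compile: Engine's size is not yet known
};

class Engine {                 // definition: now the compiler knows Engine's size
public:
  int horsepower = 150;
};

int carHorsepower() {
  Engine e;                    // OK: Engine is complete at this point in scan order
  Car car;
  car.engine = &e;
  return car.engine->horsepower;
}
```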

The operation of a C++ compiler is shown in Figure 1. Its first action is to build an intermediate source code file with the preprocessor, replacing each #include statement in the source code with the entire text of the included file. Included files really are included in that source text. The preprocessor also expands macros and processes directives such as #define, #ifndef, and #pragma once, which prevents a header from being included more than once.

The compiler then consumes the intermediate source file and either compiles it to an object file (*.obj) or, if there are compilation errors, simply emits error messages. Object files are later combined into an executable, a static library (*.lib), or a dynamic link library (*.dll).

These compilation output files are processed by a linker. When program code makes calls or transfers to code in the same compilation unit, the compiler assigns addresses based on the code it has laid out. However, if the code makes calls into another compilation unit, the compiler doesn't have an address, so it makes an entry in a table of unresolved addresses.

The job of the linker is to resolve these addresses. It can do that because it does not run until all of the compilation units that target a specific execution image have been compiled, so it has all the addresses it needs to resolve the unknowns.

That results in a runnable execution image. However, that is not the end of the story. The build process may have defined dynamic link libraries, which get loaded during execution. It is the job of the loader to start the execution image and bind, at run-time, any DLLs that the program needs.

When the linker has successfully completed creating an executable, the executable can be started using services of the operating system loader. The loader loads the executable image into memory and loads any dynamic-link libraries on which the executable may depend. In that case it binds the executable's calls to an appropriate entry in the library.

3 Program Execution Model

C++ source code compiles to native code that is loaded into memory, is initialized, and begins to directly execute machine language instructions. You can view the program's assembled code by choosing a compiler option that generates a file containing assembler output, e.g., -S with GCC and Clang or /FA with MSVC.

Program Execution

Figure 2. C++ Program Model

When execution of a C++ program begins, initialization code generated by the compiler runs, then the thread of execution enters main, with any arguments supplied on the command line. Entering main creates a stack frame, a block of allocated stack memory, that holds input arguments, any local data defined by main, and the return value, which indicates success or failure to the operating system.

Should main call a function, another stack frame is allocated for that function, and if that function calls another, it too allocates a stack frame. Stack frames are allocated, as scratch-pad memory, for every scope entered by the thread of execution. When the thread leaves that scope the allocated memory becomes invalid. The next time a stack frame needs allocation, the invalid memory is likely to be part of that allocation.

Heap memory can be allocated by a program's code with a call to new and deallocated with a call to delete. The functions malloc and free serve the same purpose in C programs.

Input and output operations in a C++ program are handled with streams. The iostreams cin and cout serve the keyboard and console, and the fstreams ifstream and ofstream serve files. Errors and logging are handled by cerr and clog. The standard stream objects cin, cout, cerr, and clog are constructed as global objects by initialization code that runs before main is entered, and they are available anywhere in the program code. File streams are objects that the program creates as it needs them.
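The sketch below uses only the standard library. Writing to a std::ostream& lets one function target cout, a file, or a string. The file name used in the round trip is a hypothetical example.

```cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Writing to std::ostream& lets one function target cout, clog, a file, or a string.
void writeGreeting(std::ostream& out, const std::string& who) {
  out << "hello, " << who << '\n';
}

// Writes one line to a file, then reads it back.
std::string fileRoundTrip(const std::string& path) {
  {
    std::ofstream ofs(path);        // file opened here...
    writeGreeting(ofs, "file");
  }                                 // ...and flushed and closed when ofs leaves scope
  std::ifstream ifs(path);
  std::string line;
  std::getline(ifs, line);          // getline drops the trailing '\n'
  return line;
}

// The global stream objects are ready to use; no setup is needed.
void consoleDemo() {
  writeGreeting(std::cout, "console");
  std::clog << "log: greeting written\n";
  std::cerr << "error channel is also available\n";
}
```

Passing a std::ostringstream to writeGreeting is a convenient way to test output code without touching the console.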

The C standard streams stdin, stdout, and stderr are used by C programs. They are FILE* handles for the program's input and output channels, attached to the keyboard and console. The program can open other handles with fopen, for channels to files defined by the program or discovered in the file system.

The C++ programming language gives developers freedom to choose where in memory fundamental data and user defined objects reside. The consequences of those choices determine lifetimes of the data and objects. This is addressed in the next section.

4 Memory Model

A C++ program runs in an environment with tiered memory. Static memory, allocated by the compiler, holds code and global data. Stack memory is allocated by the developer's use of scopes. Heap memory is allocated to the process when it starts, and heap storage of program artifacts is managed by the language's memory manager, which is part of the C++ infrastructure.

Memory Model

Figure 3. C++ Memory Model

Figure 3. shows details of the C++ Memory Model. There are three types of memory: static, stack, and heap, each with their own lifetime models.

Anything in static memory is defined, and has coherent values, for the lifetime of the program. That includes all program code, global data, and static local data. Static local data is defined inside functions and qualified by the keyword static. It is initialized the first time the thread of execution passes through its declaration and is not re-initialized on subsequent calls, so static data can save information that persists between function calls.
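Here is a minimal sketch of the difference between static local data and ordinary stack local data. The function names are hypothetical.

```cpp
// count lives in static memory: it is initialized only the first time
// control passes through its declaration, then keeps its value between calls.
int nextTicket() {
  static int count = 0;
  return ++count;
}

// count lives in the function's stack frame: it is re-created and
// re-initialized on every call.
int localOnly() {
  int count = 0;
  return ++count;
}
```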

Stack memory holds information defined by each program scope, e.g., a block of code surrounded by braces "{" and "}". The lifetime of a stack allocation begins when the thread of execution enters the scope and ends when it leaves the scope. At that point the allocation becomes invalid, and the stack space is reclaimed and may be allocated for the next scope. The call stack you see in a debugger running C++ code is just the set of stack allocations shown in Figure 2. and Figure 3.

Heap memory is provided to a running program by the operating system. A default heap is created when a C++ program begins execution. The program code allocates heap space with calls to operator new and deallocates it with calls to delete. So the lifetime of a heap object starts with its allocation by new and ends with its deallocation by delete.
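This sketch uses a hypothetical Widget type. It shows that a heap object's lifetime is bounded by new and delete, not by the scope that created it. The global log vector exists only to record the order of events.

```cpp
#include <string>
#include <vector>

std::vector<std::string> heapLog;   // records lifetime events, in order

struct Widget {
  Widget()  { heapLog.push_back("construct"); }
  ~Widget() { heapLog.push_back("destruct"); }
};

Widget* makeWidget() {
  Widget* p = new Widget;   // heap object: its lifetime starts here
  return p;                 // the stack variable p dies; the Widget does not
}

void heapDemo() {
  Widget* w = makeWidget();
  heapLog.push_back("in use");
  delete w;                 // lifetime ends here, not at the end of any scope
  heapLog.push_back("after delete");
}
```

In modern C++ the new and delete calls are usually wrapped in a smart pointer such as std::unique_ptr, which calls delete automatically when its own scope ends.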

Classes manage their data, their state, using all three kinds of memory. The static qualifier places data in static memory, scopes defined by brace pairs, { and }, hold data in stack memory, and the keywords new and delete make and release heap allocations.

5 Classes

Classes and class relationships are the building blocks for object-oriented design. The result is a collection of objects, instances of classes, that cooperate to conduct the operations their program requires. We structure designs using inheritance, composition, aggregation, using, and friend class relationships.

Class Structure:

A class is like a "cookie cutter". It stamps out a section of memory, typically in the stack frame of its local scope, and initializes that memory with the data required to create a valid object. Each time it's used to declare an instance, another piece of memory is allocated and initialized.

Each class has a set of methods, functions associated with that specific class, that provide operations on its allocated data when invoked. Each class has code that is stored in static memory, and potentially many instances holding data, usually stored in the stack frame of the function where each is declared.

The example below models a point in some space, perhaps physical space-time, so it would have four coordinates: x, y, z, t.

#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

class Point {
public:
  using iterator = std::vector<double>::iterator;
  using const_iterator = std::vector<double>::const_iterator;
  Point(std::size_t N, const std::string& name = "none");
  Point(std::initializer_list<double> il);
  void name(const std::string& name);
  std::string name() const;
  double& operator[](std::size_t i);
  double operator[](std::size_t i) const;
  std::size_t size() const;
  iterator begin();
  iterator end();
  const_iterator begin() const;
  const_iterator end() const;
private:
  std::string name_ = "unspecified";
  std::vector<double> coordinates_;
};
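The Point declaration leaves its methods undefined. One possible set of definitions follows. This sketch assumes that the size constructor zero-fills N coordinates and that operator[] is unchecked, like std::vector's.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

class Point {
public:
  using iterator = std::vector<double>::iterator;
  using const_iterator = std::vector<double>::const_iterator;
  Point(std::size_t N, const std::string& name = "none");
  Point(std::initializer_list<double> il);
  void name(const std::string& name);
  std::string name() const;
  double& operator[](std::size_t i);
  double operator[](std::size_t i) const;
  std::size_t size() const;
  iterator begin();
  iterator end();
  const_iterator begin() const;
  const_iterator end() const;
private:
  std::string name_ = "unspecified";
  std::vector<double> coordinates_;
};

// N coordinates, all zero (an assumption about the intended behavior).
Point::Point(std::size_t N, const std::string& name) : name_(name), coordinates_(N, 0.0) {}
// One coordinate per initializer value; name_ keeps its default, "unspecified".
Point::Point(std::initializer_list<double> il) : coordinates_(il) {}

void Point::name(const std::string& name) { name_ = name; }
std::string Point::name() const { return name_; }

// Unchecked element access, mirroring std::vector::operator[].
double& Point::operator[](std::size_t i) { return coordinates_[i]; }
double Point::operator[](std::size_t i) const { return coordinates_[i]; }
std::size_t Point::size() const { return coordinates_.size(); }

// Iterators simply forward to the underlying vector, enabling range-for loops.
Point::iterator Point::begin() { return coordinates_.begin(); }
Point::iterator Point::end() { return coordinates_.end(); }
Point::const_iterator Point::begin() const { return coordinates_.begin(); }
Point::const_iterator Point::end() const { return coordinates_.end(); }
```

Brace initialization prefers the initializer_list constructor. So Point{4} holds one coordinate, 4.0, while Point(4) holds four zeros.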

Figure 3. C++ Class/Object Layout

When an instance method is invoked, e.g., p1.name(param), where p1 is an instance of Point, the address of p1 is passed to the code for class Point, which uses it when Point::name modifies p1's data. Inside the method, that address is identified by the reserved word this. Most uses of this are implicit, but occasionally it must be used explicitly, as in assignment operators that return *this.
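This short sketch uses a hypothetical Label class to show implicit and explicit uses of this.

```cpp
#include <string>

// Hypothetical class, used only to show where `this` appears.
class Label {
public:
  // Implicit use: `text` here means this->text.
  // Explicit use: returning *this hands back the invoking object, enabling chaining.
  Label& append(const std::string& s) { text += s; return *this; }

  // Explicit use: the address of the object the method was invoked on.
  const Label* self() const { return this; }

  // Explicit use: this-> distinguishes the member from a same-named parameter.
  void set(const std::string& text) { this->text = text; }

  const std::string& str() const { return text; }
private:
  std::string text;
};
```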

C++ classes define special methods: constructors, assignment operators, and destructors. Constructors all have the same name as the class. Assignment operators use the name operator=, and a destructor's name is the class name prefixed with a ~ character.
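The hypothetical Account class below declares each special method so the names can be seen side by side. The static counter only tracks how many instances are alive.

```cpp
#include <string>

class Account {
public:
  Account() : owner_("nobody") { ++live_; }                                 // default constructor
  explicit Account(const std::string& owner) : owner_(owner) { ++live_; }   // constructor with a parameter
  Account(const Account& other) : owner_(other.owner_) { ++live_; }         // copy constructor
  Account& operator=(const Account& other) {                                // copy assignment operator
    owner_ = other.owner_;
    return *this;
  }
  ~Account() { --live_; }                                                   // destructor

  const std::string& owner() const { return owner_; }
  static int live() { return live_; }
private:
  std::string owner_;
  static inline int live_ = 0;   // number of live instances (C++17 inline variable)
};
```

Assignment does not change the live count, because it updates an object that already exists rather than creating a new one.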

C++ is a statically typed language. Every data artifact is required to have a type, either fundamental, defined by the language, or user-defined. Templates allow a component to define a function or class in terms of one or more unspecified parameters. When a template is compiled, the compiler checks its syntax and checks types for all those parts that don't depend on the unspecified parameters.

So type checking for those unspecified parts is deferred until application code that defines the parameters is compiled. This design is very useful, allowing construction of libraries that can support operations on many different types of data.

6 Object Model

The C++ object model is concerned with the management of object resources and the kinds of operations that are supported for instances of each type. Look closely at the memory layout and value type discussions.

Object Model Details

All C++ objects support construction and destruction semantics. When an object is declared in some scope its constructor ensures that its state is initialized with no support needed from the using code except to provide parameters, if needed, to the constructor.

6.1 Scope-based Resource Management

When the thread of execution leaves the scope where an object has been constructed its destructor will be invoked, releasing object resources deterministically, with no support needed from its using code. This inherent resource management is often referred to as Resource Acquisition Is Initialization (RAII).

Scope-based Resource Management a.k.a. RAII:

The C++ language guarantees that, when the thread of execution leaves a scope, all the objects created within that scope will be destroyed, in reverse order of their construction, by calling their destructors. This releases any resources that have been allocated to each object.
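Here is a minimal RAII sketch. It assumes a hypothetical Resource class that records acquire and release events in a global vector. Note the release order, and that an early return still releases everything.

```cpp
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> events;   // records acquire and release events, in order

// Hypothetical resource holder: acquires in its constructor, releases in its destructor.
class Resource {
public:
  explicit Resource(std::string name) : name_(std::move(name)) {
    events.push_back("acquire " + name_);
  }
  ~Resource() { events.push_back("release " + name_); }
  Resource(const Resource&) = delete;             // one owner per resource
  Resource& operator=(const Resource&) = delete;
private:
  std::string name_;
};

void scopeDemo(bool leaveEarly) {
  Resource a("a");
  {
    Resource b("b");
    if (leaveEarly) return;        // leaving both scopes: b, then a, are released
  }                                // normal path: b is released here
  events.push_back("after inner scope");
}                                  // normal path: a is released here
```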

6.2 Memory Layout

The C++ object model is also concerned with how compound objects are laid out in memory. Structs and classes support five relationships that bind objects together to build composite objects: inheritance, composition, aggregation, using, and friendship.

Figure 4. C++ Composite Object Model

The class diagram shown at the top of Figure 4 illustrates relationships between six entities:

D, a composite object which inherits a base class B and uses an object, U, created by some other entity.
B, the base class, composes an instance of class C.
C, the composed class.
U, the used class.
friend, an entity (struct, class, method, or function) that D grants access to its private data.
Client, an entity that aggregates an instance of D. That means that Client created its instance of D at some point during its own lifetime.

The object diagram, at the bottom of Figure 4., illustrates the layout of each of these entities in memory. That's shown in two dimensions for clarity, but the layout is actually a one-dimensional region of memory.

When B is constructed its composed member C is built within the memory footprint of B. That means that the C instance is constructed as part of the construction of B.

Similarly, an instance of D contains, within its memory footprint, an instance of B. Again, that requires B to be constructed as part of the construction process for D.

Instances of U, friend, and Client are not owned by D, and their memory layout is outside that of the compound instance of D. That means that their construction processes are independent of that of D. Of course, if Client creates an instance of D, Client itself must be constructed before it can create D.
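The sketch below maps the six entities of Figure 4 onto code. It assumes that B's composed C is an ordinary data member and that Client aggregates D through a std::unique_ptr. The order vector records that C is built first, then B, then D, and that destruction runs in the reverse order.

```cpp
#include <memory>
#include <string>
#include <vector>

std::vector<std::string> order;   // records construction and destruction

struct C {                        // the composed class
  C()  { order.push_back("C"); }
  ~C() { order.push_back("~C"); }
};

struct U { int value = 7; };      // the used class, created by some other entity

struct B {                        // B composes a C inside its own footprint
  B()  { order.push_back("B"); }
  ~B() { order.push_back("~B"); }
  C c;
};

class D : public B {              // D inherits B, so a B lives inside every D
public:
  D()  { order.push_back("D"); }
  ~D() { order.push_back("~D"); }
  int useU(const U& u) const { return u.value + secret_; }  // D uses a U it does not own
  friend int peek(const D& d);                               // friend may read private data
private:
  int secret_ = 1;
};

int peek(const D& d) { return d.secret_; }

struct Client {                   // Client aggregates a D that it creates on the heap
  std::unique_ptr<D> d;
  void create() { d = std::make_unique<D>(); }
};
```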

6.3 Value Types

C++ has been designed, from the ground up, to support value types, that is, types whose instances can be assigned and copied.

Value Types:

Instances of value types can be copied and assigned. When a value type is copied, the destination instance is constructed and acquires the same state values as the source of the copy, but remains an independent instance. Should one of the instances have its state modified that does not affect the other.

Assignment is a similar operation. The only difference is that the destination object already exists; assignment gives it the same state values as the source of the assignment.

C++ supports value types by allowing a class or struct designer to provide a copy constructor and a copy assignment operator overload to manage the transfer of state.

If a class's base classes (if any) and data members have correct copy and assignment semantics, the compiler will generate correct copy operations by memberwise copy construction and copy assignment of the object's bases and members.

If that is not the case, then the designer supplies the constructor and operator overload to handle the transfer of state correctly.
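As a sketch of that case, the hypothetical Buffer class owns heap memory through a raw pointer. A compiler-generated memberwise copy would make two objects share one allocation and delete it twice. The user-defined copy constructor and copy assignment operator make deep copies instead.

```cpp
#include <algorithm>
#include <cstddef>

class Buffer {
public:
  explicit Buffer(std::size_t n) : size_(n), data_(new int[n]()) {}  // zero-initialized

  Buffer(const Buffer& other)                           // copy constructor: deep copy
    : size_(other.size_), data_(new int[other.size_]) {
    std::copy(other.data_, other.data_ + size_, data_);
  }

  Buffer& operator=(const Buffer& other) {              // copy assignment: deep copy
    if (this != &other) {
      int* fresh = new int[other.size_];                // allocate first, so a failure leaves *this intact
      std::copy(other.data_, other.data_ + other.size_, fresh);
      delete[] data_;
      data_ = fresh;
      size_ = other.size_;
    }
    return *this;
  }

  ~Buffer() { delete[] data_; }

  int& operator[](std::size_t i) { return data_[i]; }
  int operator[](std::size_t i) const { return data_[i]; }
  std::size_t size() const { return size_; }
private:
  std::size_t size_;   // declared before data_, so it is initialized first
  int* data_;
};
```

Replacing the raw pointer with a std::vector<int> member would give Buffer correct compiler-generated copy operations, which is the situation described in the previous paragraph.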

This is the most important of the C++ models. If you understand this, most of the rest of the language makes sense.

7 Polymorphism

Polymorphism means occurring in several different forms. The term is used in two specific ways for programming languages like C++. One refers to "dynamic" polymorphism, where a base class may represent any one of several classes that derive from it. The other refers to "static" polymorphism, which uses templates. A class template can generate several different forms of code based on the parameter(s) supplied to instantiate the template.

Polymorphism Details:

When a class D derives from some class B it inherits all of the methods and data of B.

class D : public B { ... };

If there are multiple derived classes: D1, D2, ... then a base class pointer or reference can be bound to an instance of any one of them:

D1 d1;
D2 d2;
B* pB = &d1;
B& br = d2;

Functions that accept a base pointer will accept a base pointer bound to any derived class:

void fun(B* ptrB) { ... }
pB = &d1;
fun(pB);
pB = &d2;
fun(pB);

Function fun can only use the interface supplied by the base B. But all of the derived classes inherit that interface, so fun uses each of them in different ways if the derived classes override virtual base methods that fun invokes.

This is a very powerful way to build flexible code. If we need to add a new derived class to satisfy some new requirement none of the functions that accept base pointers will be affected.

Inheritance supports two features:

Inheritance of implementation, e.g., a derived class inherits all of the methods and data members of its base class.
Substitution of base pointers or references bound to derived classes in functions that accept base pointers or references, which allows our code to add derived classes that are guaranteed to work with the functions that use them.

Figure 5. Virtual Function Pointer Table

Substitution is important because calls on a base class pointer bound to a derived class instance are dispatched to the derived instance's code. So if the derived class has overridden the base class method being called, the call is routed to the overridden code.

Every class that includes one or more virtual functions has a virtual function pointer table, as shown in Figure 5. Two objects are shown in the left column, a base instance and a derived instance, both accessed through a base pointer. The virtual function pointer tables for both B and D are shown in the middle column, and the code for those functions appears in the column on the right.

Notice, from the code declarations at the top of the figure, that D has not overridden mf1 but has overridden mf2. Calls through a base pointer pB resolve as follows:

pB = &b; pB->mf1(); invokes B::mf1
pB = &b; pB->mf2(); invokes B::mf2
pB = &d; pB->mf1(); invokes B::mf1 // not overridden by D
pB = &d; pB->mf2(); invokes D::mf2 // overridden by D

This indirect invocation is called dynamic dispatch because the function to call is determined at run-time.

Dynamic dispatch is the mechanism for making substitution work, invoking methods of the bound object, not those determined by the type of the pointer. And this is responsible for making our code very flexible through polymorphism.
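Here is a sketch that reproduces the calls listed for Figure 5. The figure's declarations are not reproduced in the text, so this assumes that B declares both mf1 and mf2 virtual and that D overrides only mf2.

```cpp
#include <string>

struct B {
  virtual ~B() = default;
  virtual std::string mf1() const { return "B::mf1"; }
  virtual std::string mf2() const { return "B::mf2"; }
};

struct D : B {
  std::string mf2() const override { return "D::mf2"; }   // overrides mf2 only
};

// Uses only B's interface; each call goes through the bound object's
// virtual function pointer table, so D's override is found at run-time.
std::string callBoth(const B* pB) {
  return pB->mf1() + " " + pB->mf2();
}
```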

The other form of polymorphism - using Templates - is the topic of the next model.

8 Templates

A template is a code generator that creates type specific artifacts from parameterized patterns. Function templates generate concrete functions and class templates generate concrete classes when supplied, in application code, with types.

Templates

Templates generate functions and classes when an application instantiates them with specific template parameters. The class template shown below generates a point class for each type T specified by the using code. The using code instantiates it with two parameter types, int and double, so the template generates two classes, one for each type.

Figure 3. Template Code Generation

The example, below, models a point in some space, so it might have three coordinates: x, y, z, e.g., coordinates_[0], coordinates_[1], coordinates_[2].

The diagram in Figure 3. assumes that Point<T> has been instantiated for int and for double.

#include <cstddef>
#include <string>
#include <vector>

template<typename T>
class Point {
public:
  Point(const std::string& name = "none");
  std::string& name();
  T& operator[](std::size_t i);
  T operator[](std::size_t i) const;
  std::size_t size() const;
private:
  std::string name_;
  std::vector<T> coordinates_;
};

Template parameterization allows the designer to structure coordinates to fit an application. If the point described a location in a flowing fluid, the coordinates might be the location and flow rate components for each axis.

T --> std::pair<double, double>

So, coordinates_ would have size 3 with entries for { x, flow_x }, { y, flow_y }, { z, flow_z }
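The Point<T> declaration gives no way to set the number of coordinates. This sketch therefore adds a hypothetical size parameter to the constructor, defaulting to 3. Each alias below names a separate class generated by the compiler, including the fluid-flow case with T as std::pair<double, double>.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

template<typename T>
class Point {
public:
  // Assumption: N (default 3) is added so coordinates_ has x, y, z slots.
  // Each slot is value-initialized: 0 for int, 0.0 for double, {0.0, 0.0} for a pair.
  explicit Point(const std::string& name = "none", std::size_t N = 3)
    : name_(name), coordinates_(N) {}
  std::string& name() { return name_; }
  T& operator[](std::size_t i) { return coordinates_[i]; }
  T operator[](std::size_t i) const { return coordinates_[i]; }
  std::size_t size() const { return coordinates_.size(); }
private:
  std::string name_;
  std::vector<T> coordinates_;
};

// Each distinct T makes the compiler generate a distinct class.
using IntPoint    = Point<int>;
using DoublePoint = Point<double>;
using FlowPoint   = Point<std::pair<double, double>>;  // { position, flow rate } per axis
```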

Figure 4. Specializing Generic Template

Should using code instantiate a template for additional parameter types, that would generate another class for each one. Fortunately, that is done by the compiler. Developers only need to write one template class and perhaps one or two special cases.

A class template can be specialized or a function template overloaded to handle special cases. It is not uncommon for a template design to work well for several application types, but fail to operate as desired for special cases.

Specialization is possible because templates generate code when an application that uses the template for specified types is compiled. When we specialize a class template, we define not only the generic class but also a type-specific class for the special case. The language guarantees that, if a specialization matches a specified type, the specialization will be used instead of the generic template. If the type doesn't match any specialization (there can be more than one), then the generic template will be compiled using that type.
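Here is a minimal sketch of class template specialization, using a hypothetical Describer template. The generic version handles types that std::to_string accepts, and full specializations cover bool and std::string.

```cpp
#include <string>

// Generic template: fine for arithmetic types accepted by std::to_string.
template<typename T>
struct Describer {
  static std::string describe(const T& value) { return std::to_string(value); }
};

// Full specialization for bool: the generic version would print "1" or "0".
template<>
struct Describer<bool> {
  static std::string describe(bool value) { return value ? "true" : "false"; }
};

// Full specialization for std::string: std::to_string has no overload for it.
template<>
struct Describer<std::string> {
  static std::string describe(const std::string& value) { return "\"" + value + "\""; }
};
```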

The same process happens for function templates, except that the special cases are overloads of the template function, and overload resolution, rather than specialization matching, selects the function to call.
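The function template counterpart can be sketched with a hypothetical show function. The special case is a non-template overload, and when both candidates match equally well, overload resolution prefers the non-template.

```cpp
#include <string>

// Generic function template.
template<typename T>
std::string show(const T& value) { return std::to_string(value); }

// Non-template overload for the special case. For a std::string argument both
// candidates are exact matches, and the tie goes to the non-template function.
std::string show(const std::string& value) { return "\"" + value + "\""; }
```

A call like show("hi") would still pick the template, with T deduced as a char array, and would fail to compile inside std::to_string. C strings would therefore need an overload of their own.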

We will discuss class template specialization and function template overloading in Chapter #7.

This concludes our discussion of C++ models. You will find details on each of them in succeeding chapters.

9 Epilogue

This chapter has explored important models that help you understand how to develop C++ code. The succeeding chapters turn these models into concrete designs and code.

Chapter #2 surveys the language and provides many code examples to help you get started. Later chapters look in detail at classes, class relationships, and templates.

Finally, there are two chapters that discuss C++ standard libraries. There will be more libraries covered before too long.
