Like many modern languages, C++ is a large and ambitious language. The purpose of this chapter is
to help you develop effective mental models for important features of the language. We do this with
diagrams and associated text, and occasionally small code fragments.
You will find details, with a lot more code, in succeeding chapters where we focus on data,
operations, classes, and templates.
1 Code Structure
C++ developers structure code with packages, which contain files, which in turn contain classes
and functions.
Package boundaries are not enforced by the C++
language nor by the platform operating system. They are an abstraction that represents a unit
of documentation, and are expected to take on a single responsibility. So they are defined by
the C++ developer. It is common practice for each C++ package to contain a single .cpp
implementation file and usually a single .h header file of the same name, e.g., MyPkg.h and
MyPkg.cpp. To that a developer may choose to add an interface file, IMyPkg.h.
Packages for components with no parents may only have a single .cpp file, and we sometimes
elect to make an interface file, IPkg.h, a separate package with just that one file. We do
that when more than one other package has a class that implements the interface.
Files are an operating system construct. Code files are expected, by the C++ build environment,
to obey naming conventions, based on extensions like .h, .hpp, and .cpp that define their
contents. Files may contain zero or more classes and zero or more unbound functions. C++ does
not enforce one class per file, but language convention dictates only a few closely related
classes per file, very often only one.
Classes are a C++ language construct (also used by many other languages) that define units
of managed data. Functions and class methods are units of computation. Classes are expected to
have a single responsibility and each of their methods contributes to one specific part of
that responsibility.
Code Structure Details
Figure 1. C++ File Inclusion Model
Each C++ implementation file, one with the extension .cpp, usually includes several header
files, ones with the .h extension, and C++ library headers, like <iostream>, which have no
extension. Figure 1 shows Executive.cpp including an interface header file, IComponent_A.h,
and the Component_B.h header file.
This example, implemented in CppStory code repository, has no
significant functionality. Its purpose is to show how packages are formed and some useful
ways they communicate. The Logger Repository has code for a useful
facility that uses these same techniques.
If you look at the interface file, IComponent_A.h, you won't see any implementation, just
declarations for the Component_A interface and a single declaration of a creational function.
Since the Executive and Component_B only include this file, they have no dependencies
on the implementation details of Component_A.
They use the factory function to create
an instance of Component_A and return a pointer typed as IComponent_A but bound to a
concrete instance of Component_A. So they create and use the component without binding
to its implementation.
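The interface-plus-factory arrangement described above can be sketched in a single listing. This is a minimal illustration, not the CppStory repository code: the method name say, the factory name createComponent_A, and the use of std::unique_ptr as the returned pointer type are all assumptions made for the example. Comments mark which file each part would live in.

```cpp
#include <memory>
#include <string>

// --- would live in IComponent_A.h: declarations only, no implementation ---
struct IComponent_A {
    virtual ~IComponent_A() = default;
    virtual std::string say() = 0;          // hypothetical service method
};
// Factory declaration (hypothetical name): how clients obtain an instance.
std::unique_ptr<IComponent_A> createComponent_A();

// --- would live in Component_A.cpp: hidden from clients ---
class Component_A : public IComponent_A {
public:
    std::string say() override { return "Component_A here"; }
};
std::unique_ptr<IComponent_A> createComponent_A() {
    return std::make_unique<Component_A>();
}

// --- client code, e.g. in Executive.cpp: sees only the interface ---
std::string useComponent() {
    std::unique_ptr<IComponent_A> pA = createComponent_A();
    return pA->say();                       // dispatched to Component_A::say
}
```

The client function never names Component_A, so changes inside that class do not require changes to the client's source.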
When a C++ project is built, a single compiler source file is created by the compiler preprocessor
from a single .cpp file and all of the header files it includes. That is compiled and if no
errors are encountered, it generates an object file, .obj, or library, .lib. If
there are any more .cpp files in the project, the process is repeated until all .cpp files have
been compiled.
When the files of Figure 1 are compiled, the compiler makes three passes, one for each of
the three .cpp files, generating three .obj or .lib files, which are then linked together to
form an execution image, Executive.exe.
Figure 2. Package Diagram
In Figure 2., we show a package structure that contains the files from Figure 1. The Executive
package makes calls to functions in both Component_A and Component_B. Also, Component_B makes
a call on Component_A.
Because of these calling relationships, the Executive needs type information about both
components, and Component_B needs type information about Component_A. That type information
comes from including headers in each .cpp file.
Component_A provides an interface, IComponent_A.h, which describes the calling signatures
of all its public methods, but does not include any implementation detail. It serves as
a contract for services that Component_A exports. It also supplies an object factory, so
callers, like Executive and Component_B, have no dependencies on Component_A's
implementation. Changes to that component will not prevent the using code from
compiling correctly, as long as the interface and object factory signature do not change.
Figure 3. C++ Class Diagram
Looking inward, we see in Figure 3. the classes that each package contains. The Executive package
has a single class, Executive. Component_A has two classes, the interface IComponent_A
and class Component_A. Finally, Component_B contains class Component_B and a subordinate class
Helper. The object factory is an unbound function, not a class, and so is not shown in the class
diagram.
Executive composes Component_B and aggregates IComponent_A. Component_A inherits from its
interface IComponent_A. Component_B composes its helper class and uses the IComponent_A interface.
Note that Executive and Component_B both actually bind to an instance of Component_A, but they
have to use the contract provided by IComponent_A. That's why we show them linking to the
interface.
It is common, though not essential, that one of the classes in a package has the same name
as the package. Usually, a package name comes from the name of the project that controls its
build process.
Since Component_B was not configured with an interface, Executive depends on its implementation
details. It is likely that Executive and Component_B were designed together, to be used as a unit,
separated into two classes to make understanding and testing easier.
Interfaces in C++ are usually implemented with structs rather than classes. Classes and
structs are identical except that structs by default have public members while classes
by default have private members.
The next section describes how a collection of packages are built into a library or executable
file.
2 Compilation Model
The C++ tool-chain consists of a preprocessor, compiler, and linker. Each .cpp file and its
included .h header files form a translation unit. The build system handles one translation
unit at a time and does not carry over information from one translation unit to the next. It is
the job of the linker to bind the various translation units into an executable or library.
Compilation Model Details
Figure 1. C++ Compilation Model
C++ compiles each *.cpp file independently, and does not save type information when compiling more than one.
Each *.cpp file and all of its included *.h files are called a translation unit.
The C++ language is designed
to support one-pass compilation. That means that an entity: function, struct, or class must be defined before
its first use in compiler scan order. This is called the definition first rule:
Definition First Rule:
Instances of structs or classes can be declared only after the struct or class has been defined.
The compiler can't lay out code for the instance until it knows how much stack space it will
occupy. That is determined by the struct or class definition.
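A small sketch may make the rule concrete. The class name Widget and the helper functions are hypothetical. Before the class body is seen, the compiler accepts pointers to the type, whose size it knows, but not instances.

```cpp
#include <cstddef>

class Widget;                     // declaration only: size is unknown here

// A pointer to an incomplete type is fine, because the compiler
// knows how much space a pointer occupies.
Widget* remember(Widget* p) { return p; }

// Widget w;                      // would not compile here: incomplete type

class Widget {                    // definition: the layout is now known
public:
    int id = 7;
    double weight = 1.5;
};

// After the definition, instances can be created in a stack frame.
std::size_t widgetFootprint() {
    Widget w;
    return sizeof(w);
}

int widgetId() {
    Widget w;
    return remember(&w)->id;
}
```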
The operation of a C++ Compiler is shown in Figure 1. Its first action is to build an intermediate source
code file with a preprocessor by replacing each #include statement in source code with the entire code of
the included file. Included files are really included in that source text.
The preprocessor also expands any macros and processes directives such as #pragma once.
The Compiler then consumes the intermediate source file and compiles it to an object file (*.obj), from which
the build may produce a static library (*.lib) or dynamic link library (*.dll), or, if there are compilation
errors, it simply emits error messages.
These compilation output files are processed by a Linker. When program code makes calls or transfers
to code in the same compilation unit, the compiler assigns addresses based on the code it has laid out. However,
if the code makes calls into another compilation unit, then the compiler doesn't have an address, and so makes
an entry in a table of unresolved addresses.
The job of the Linker is to resolve these addresses. It can do that, because it does not execute until all of the
compilation units that target a specific execution image are compiled, so it has all the addresses it needs and
proceeds to resolve the unknowns.
That results in a runnable execution image. However, that is not the end of the story. The build process may
have defined dynamic link libraries which get loaded during execution. It is the job of the Loader to start
the execution image, and bind, at run-time, any dlls that the program needs.
When the linker has successfully completed creating an executable, the executable can be started
using services of the operating system loader. The loader loads the executable image into memory and
loads any dynamic-link libraries on which the executable may depend. In that case it binds the
executable's calls to an appropriate entry in the library.
3 Program Execution Model
C++ source code compiles to native code that is loaded into memory, is initialized, and
begins to directly execute machine language instructions. You can view the program's assembled
code by choosing an option to generate a file containing assembler output.
Program Execution
Figure 2. C++ Program Model
When execution of a C++ program begins, initialization code generated by the compiler runs, then the thread of
execution enters main, with any arguments defined on the command line. Main entry creates a stack frame
- a block of allocated stack memory - that holds input arguments, any local data defined by main, and the
return value, used to indicate success or failure to the operating system.
Should main call a function, another stack frame is allocated for that function, and if that function calls
another, it too allocates a stack frame. Stack frames are allocated, as scratch-pad memory, for every scope
entered by the thread of execution. When the thread leaves that scope the allocated memory becomes invalid.
The next time a stack frame needs allocation, the invalid memory is likely to be part of that allocation.
Heap memory can be allocated by a program's code with a call to new, and deallocated with a call to delete.
The functions malloc and free serve the same purpose in a C language program.
Input and output operations defined by a C++ program are handled with streams - iostreams cin and cout for the keyboard and console
and fstreams ifstream and ofstream for files. Errors and logging are handled by cerr and clog. The standard stream
objects cin, cout, cerr, and clog are constructed as global objects by the initialization code that runs before main is
entered, and are available anywhere in the program code.
The handles stdin, stdout, and stderr are used by C programs. They are references to the program's
input and output channels, attached to keyboard and console. The program can define other handles for channels to
files defined by the program or discovered in the file system.
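Because console, file, and in-memory streams share the std::ostream interface, a function written against that interface works with all of them. The sketch below illustrates this; the function names greet and greetToString are hypothetical.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Writes a greeting to any output stream: std::cout, std::cerr,
// an std::ofstream, or an in-memory std::ostringstream.
void greet(std::ostream& out, const std::string& who) {
    out << "hello, " << who << '\n';
}

// Captures the output in a string so it can be inspected.
std::string greetToString(const std::string& who) {
    std::ostringstream buffer;
    greet(buffer, who);
    return buffer.str();
}
```

Calling greet(std::cout, "world") sends the same text to the console, and greet(std::clog, "world") sends it to the logging stream.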
The C++ programming language gives developers freedom to choose where in memory fundamental data
and user defined objects reside. The consequences of those choices determine lifetimes of the data
and objects. This is addressed in the next section.
4 Memory Model
A C++ program runs in an environment with tiered memory. Static memory, allocated by the compiler,
holds code and global data. Stack memory is allocated by the developer's use of scopes. Heap
memory is allocated to the process when it starts, and heap storage of program artifacts is
managed by the memory manager supplied with the language implementation, part of the C++ infrastructure.
Memory Model
Figure 3. C++ Memory Model
Figure 3 shows details of the C++ Memory Model. There are three types of memory: static, stack, and heap,
each with its own lifetime model.
Anything in static memory is defined, and has coherent values, for the lifetime of the program. That
includes all program code, global data, and static local data. Static local data is defined inside functions and qualified by the
keyword static. Local static data is initialized on the first entry to the function where defined, but does not
get re-initialized on subsequent entries, so static data can save information that persists between function calls.
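The behavior of static local data can be seen in a short sketch. The function names are hypothetical. The first function keeps its counter in static memory, so the value persists between calls; the second, for contrast, uses an ordinary local that is re-created in each stack frame.

```cpp
// Returns 1, 2, 3, ... on successive calls.
int nextTicket() {
    static int count = 0;   // initialized once, on first entry
    ++count;
    return count;
}

// For contrast: an ordinary local is re-created on every call.
int alwaysOne() {
    int count = 0;          // initialized on every entry
    ++count;
    return count;
}
```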
Stack memory holds information defined by each program scope, e.g., a block of code surrounded by braces
"{" and "}". The lifetime of a stack allocation begins when the thread of
execution enters the scope, and ends when it leaves the scope. Then the allocation becomes invalid, the memory
is returned to the memory manager for reuse, and may be allocated for the next scope.
The call stack you see in a debugger running C++ code is just the set of stack allocations shown in Figure 2 and
Figure 3.
Heap memory is provided to a running program by the operating system. A default heap is created when a C++
program begins execution. The program code allocates heap space by using calls to operator new and deallocates
with a call to delete. So the lifetime of a heap object starts with its allocation with new, and ends
with deallocation with delete.
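The heap lifetime rule can be observed by counting live instances. In this sketch the type Tracked and the function heapLifetimeDemo are hypothetical; the counter lives in static memory, and the object itself lives on the heap.

```cpp
#include <array>

struct Tracked {
    static int alive;             // number of live instances
    Tracked()  { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Returns the live count observed before, during, and after
// the heap object's lifetime.
std::array<int, 3> heapLifetimeDemo() {
    int before = Tracked::alive;
    Tracked* p = new Tracked;     // lifetime starts here
    int during = Tracked::alive;
    delete p;                     // lifetime ends here
    int after = Tracked::alive;
    return {before, during, after};
}
```

Note that the pointer p is itself a stack variable. Leaving its scope without calling delete would end the pointer's lifetime but not the heap object's.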
Most classes manage data - their state - by using the static qualifier to place data in static memory,
creating scopes in stack memory defined by brace pairs, { and }, and
making heap allocations with the keywords new and delete.
5 Classes
Classes and class relationships are the building blocks for object-oriented design. The results
are a collection of objects - instances of classes - that cooperate to conduct operations
required of their program. We structure designs using:
inheritance, composition, aggregation, using, and friend class relationships.
Class Structure:
A class is like a "cookie cutter". It stamps out a section of memory in the stack frame
of its local scope, and initializes that memory with data required to create a valid object.
Each time it's used to declare an instance, another piece of memory is allocated and
initialized.
Each class has a set of methods - functions associated with that specific class - providing
operations on its allocated data when invoked. Each class has code that is stored
in static memory, and potentially many instances holding data usually stored in the stack frame of the function
where it is declared.
The example below models a point in some space, perhaps physical space-time, so it would have
four coordinates: x, y, z, t.
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

class Point {
public:
  using iterator = std::vector<double>::iterator;
  using const_iterator = std::vector<double>::const_iterator;
  Point(size_t N, const std::string& name = "none");
  Point(std::initializer_list<double> il);
  void name(const std::string& name);   // set name
  std::string name() const;             // get name
  double& operator[](size_t i);         // read and write access
  double operator[](size_t i) const;    // read-only access
  size_t size() const;
  iterator begin();
  iterator end();
  const_iterator begin() const;
  const_iterator end() const;
private:
  std::string name_ = "unspecified";
  std::vector<double> coordinates_;
};
Figure 3. C++ Class/Object Layout
When an instance method is invoked, e.g., p1.name(param) where p1 is an instance of Point, the address of p1 is sent to the
code for class Point to use when Point::name modifies p1's data. That address is identified
by the reserved word "this". You may occasionally see references to this in
methods of the class. Most use is implicit, but occasionally it must be used explicitly, as in
assignment operators that return *this.
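Both uses of this, implicit and explicit, appear in the following sketch. The class Counter and the function chainDemo are hypothetical, chosen only to keep the example small.

```cpp
class Counter {
public:
    explicit Counter(int start = 0) : value_(start) {}

    // Implicit use: value_ means this->value_.
    int value() const { return value_; }

    // Explicit use: returning *this lets calls be chained.
    Counter& add(int n) {
        this->value_ += n;
        return *this;
    }

    // Assignment returns *this so that a = b = c works.
    Counter& operator=(const Counter& rhs) {
        if (this != &rhs) {       // guard against self-assignment
            value_ = rhs.value_;
        }
        return *this;
    }

private:
    int value_;
};

int chainDemo() {
    Counter a, b, c(5);
    a = b = c;                        // a and b both hold 5
    return a.add(1).add(2).value();   // 5 + 1 + 2
}
```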
C++ classes define special methods: constructors, assignment operators, and destructors. Constructor
method names are all the name of the class. Assignment operators use the operator= name and
destructor names are the class name prepended with a ~ character.
C++ is a strictly typed language. Every data artifact is required to have a type, either fundamental -
defined by the language - or user-defined. Templates allow a component to define a function or
class in terms of one or more unspecified parameters. Those are compiled, checking syntax for all
those parts that don't depend on the unspecified parameters.
So type checking for those unspecified parts is deferred until application code that defines the parameters
is compiled. This design is very useful, allowing construction of libraries that can support
operations on many different types of data.
6 Object Model
The C++ object model is concerned with management of object resources and the kinds of operations
that are supported for instances of its type. Look closely at the memory layout and value type
discussions.
Object Model Details
All C++ objects support construction and destruction semantics. When an object is declared
in some scope its constructor ensures that its state is initialized with no support needed
from the using code except to provide parameters, if needed, to the constructor.
6.1 Scope-based Resource Management
When the thread of execution leaves the scope where an object has been constructed its
destructor will be invoked, releasing object resources deterministically, with no support
needed from its using code. This inherent resource management is often referred to as
Resource Acquisition Is Initialization (RAII).
Scope-based Resource Management a.k.a. RAII:
The C++ language guarantees that, when the thread of execution leaves a scope, all the
objects created within that scope will be destroyed by calling their destructors, releasing
any resources that have been allocated to each object.
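The guarantee can be observed by logging constructor and destructor calls. This is a minimal sketch; the names Resource, eventLog, and scopeDemo are hypothetical, and the "resource" is only a log entry standing in for something like a file handle or heap allocation.

```cpp
#include <string>
#include <utility>
#include <vector>

// Records acquire and release events so their order is visible.
std::vector<std::string>& eventLog() {
    static std::vector<std::string> log;
    return log;
}

class Resource {
public:
    explicit Resource(std::string name) : name_(std::move(name)) {
        eventLog().push_back("acquire " + name_);
    }
    ~Resource() { eventLog().push_back("release " + name_); }
private:
    std::string name_;
};

void scopeDemo() {
    Resource outer("outer");
    {
        Resource inner("inner");
    }                                   // inner's destructor runs here
    eventLog().push_back("end of function body");
}                                       // outer's destructor runs here
```

The using code never calls a release function. Leaving each scope is what triggers the destructor.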
6.2 Memory Layout
The C++ object model is also concerned with how compound objects are laid out in memory. Structs
and Classes support five relationships that bind objects together to build composite objects:
inheritance, composition, aggregation, using, and friendship.
Figure 4. C++ Composite Object Model
The class diagram shown at the top of Figure 4 illustrates relationships between six entities:
D, a composite object which inherits a base class B and uses an object, U, created by
some other entity.
B, the base class, composes an instance of class C.
C, the composed class.
U, the used class.
friend, an entity: struct or class or method, or function, that is granted, by D,
access to its private data.
Client, an entity that aggregates an instance of D. That means that Client created
its instance of D sometime during its lifetime.
The object diagram, at the bottom of Figure 4., illustrates the layout of each of these
entities in memory. That's shown in two dimensions for clarity, but the layout is
actually a one-dimensional region of memory.
When B is constructed its composed member C is built within the memory footprint of B.
That means that the C instance is constructed as part of the construction of B.
Similarly, an instance of D contains, within its memory footprint, an instance of B. Again,
that requires B to be constructed as part of the construction process for D.
Instances of U, friend, and client are not owned by D and their memory layout is outside
that of the compound instance of D. That means that their construction processes are
independent of that of D. Of course, if client creates an instance of D, the client must be
constructed before D.
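The construction order implied by this layout can be checked with a short sketch. It reuses the entity names B, C, D, and U from Figure 4; the recording function order and the driver constructionOrder are hypothetical, and the friend and Client entities are omitted for brevity.

```cpp
#include <string>

// Records constructor calls in the order they run.
std::string& order() {
    static std::string s;
    return s;
}

struct C { C() { order() += "C "; } };
struct U { U() { order() += "U "; } };

struct B {
    B() { order() += "B "; }
    C c;                      // composed: lives inside B's footprint
};

struct D : public B {         // inherited: a B sub-object lives inside D
    explicit D(U& u) : used(&u) { order() += "D "; }
    U* used;                  // used: D stores only a pointer to U
};

std::string constructionOrder() {
    order().clear();
    U u;                      // created independently of D
    D d(u);                   // builds C, then B, then D
    return order();
}
```

The composed C is built before the body of B's constructor runs, and the B sub-object is complete before the body of D's constructor runs. The U instance is outside D's footprint, so D holds only its address.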
6.3 Value Types
C++ has been designed, from the ground up, to support value types, that is, types whose
instances can be assigned and copied.
Value Types:
Instances of value types can be copied and assigned. When a value type is copied, the
destination instance is constructed and acquires the same state values as the source
of the copy, but remains an independent instance. Should one of the instances have its
state modified, that does not affect the other.
Assignment is a similar operation. The only difference is that the destination object
already exists; the assignment gives it the same state values as the source of the
assignment.
C++ supports value types by allowing a class or struct designer to provide a copy
constructor and a copy assignment operator overload to manage the change of state.
If a class's base classes (if any) and data members have correct copy and assignment
semantics, then the compiler will generate correct copy operations by memberwise copy
construction and copy assignment for both the object's members and its bases.
If that is not the case, then the designer supplies the constructor and operator overload
to handle the transfer of state correctly.
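A class that owns a raw heap array is a typical case where the compiler-generated memberwise copy is not correct, because it would copy only the pointer. The sketch below, using a hypothetical class Buffer, shows one way a designer might supply the two copy operations so that copies are independent.

```cpp
#include <algorithm>
#include <cstddef>

class Buffer {
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new double[n]()) {}
    ~Buffer() { delete[] data_; }

    // Copy constructor: build a new, independent array.
    Buffer(const Buffer& src)
        : size_(src.size_), data_(new double[src.size_]) {
        std::copy(src.data_, src.data_ + src.size_, data_);
    }

    // Copy assignment: the destination exists, so replace its state.
    Buffer& operator=(const Buffer& src) {
        if (this != &src) {
            double* fresh = new double[src.size_];
            std::copy(src.data_, src.data_ + src.size_, fresh);
            delete[] data_;
            data_ = fresh;
            size_ = src.size_;
        }
        return *this;
    }

    double& operator[](std::size_t i) { return data_[i]; }
    double operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    double* data_;
};
```

Had the member been a std::vector<double> instead of a raw pointer, the compiler-generated operations would already be correct and none of this code would be needed.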
This is the most important of the C++ models. If you understand this, most of the rest of
the language makes sense.
7 Polymorphism
Polymorphism means to occur in several different forms. The term is used in two specific ways for
programming languages like C++.
One refers to "dynamic" polymorphism where a base class may represent any one of several
classes that derive from the base. The other way refers to "static"
polymorphism that uses templates. A class template can generate several different forms of code
based on parameter(s) supplied to instantiate the template.
Polymorphism Details:
When a class D derives from some class B it inherits all of the methods and data of B.
class D : public B { ... };
If there are multiple derived classes: D1, D2, ... then a base class pointer or reference
can be bound to an instance of any one of them:
D1 d1; D2 d2;
B* pB = &d1;
B& br = d2;
Functions that accept a base pointer will accept a base pointer bound to any derived class:
void fun(B* ptrB) { ... }
pB = &d1;
fun(pB);
pB = &d2;
fun(pB);
Function fun can only use the interface supplied by the base B.
But all of the derived classes inherit that interface, so fun behaves differently for each
of them if the derived classes override base methods that fun invokes.
This is a very powerful way to build flexible code. If we need to add a new derived class
to satisfy some new requirement none of the functions that accept base pointers will be affected.
Inheritance supports two features:
Inheritance of implementation, e.g., a derived class inherits all of the methods and data
members of its base class.
Substitution: base pointers or references bound to derived class instances can be passed to
functions that accept base pointers or references, so our code can add derived classes
that are guaranteed to work with the functions that use them.
Figure 5. Virtual Function Pointer Table
Substitution is important because calls on a base class pointer bound to a derived class instance
are dispatched to the derived instance's code. So if the derived class has overridden the base class
method being called, the call is routed to the overridden code.
Every class that includes one or more virtual functions has a virtual function pointer table, as
shown in Figure 5. Two objects are shown in the left column, a base instance and a derived
instance, both accessed through a base pointer. The virtual function pointer tables for both
B and D are shown in the middle column, and the code for those functions appears in the column
on the right.
Notice, from the code declarations at the top of the figure, that D has not overridden mf1 but
has overridden mf2. When calls to mf1 and mf2 are made through a base pointer:
pB = &b; pB->mf1(); invokes B::mf1
pB = &b; pB->mf2(); invokes B::mf2 // not overridden
pB = &d; pB->mf1(); invokes B::mf1
pB = &d; pB->mf2(); invokes D::mf2 // overridden
This indirect invocation is called dynamic dispatch because the function to call is determined
at run-time.
Dynamic dispatch is the mechanism for making substitution work, invoking methods of the
bound object, not those determined by the type of the pointer. And this is responsible for
making our code very flexible through polymorphism.
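The dispatch behavior described for Figure 5 can be reproduced in a few lines. The class and method names B, D, mf1, and mf2 follow the figure; the returned strings and the helper callBoth are assumptions added so the result of each call is visible.

```cpp
#include <string>

class B {
public:
    virtual ~B() = default;
    virtual std::string mf1() { return "B::mf1"; }
    virtual std::string mf2() { return "B::mf2"; }
};

class D : public B {
public:
    // mf1 is inherited unchanged; only mf2 is overridden.
    std::string mf2() override { return "D::mf2"; }
};

// Accepts any object through the base interface. Which mf2 runs
// is decided at run-time by the type of the bound object.
std::string callBoth(B* pB) {
    return pB->mf1() + " " + pB->mf2();
}
```

The function callBoth is compiled once, knowing only B, yet it reaches D::mf2 when handed a pointer bound to a D instance.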
The other form of polymorphism - using Templates - is the topic of the next model.
8 Templates
A template is a code generator that creates type specific artifacts from parameterized patterns.
Function templates generate concrete functions and class templates generate concrete classes when
supplied, in application code, with types.
Templates
Templates generate functions and classes when an application instantiates them with
specific template parameters. The class template shown below generates a point class for
each type, T, specified by using code. The using code instantiates it with two parameter
types, int and double, so the template generates two classes, one for each type.
Figure 3. Template Code Generation
The example, below, models a point in some space, so it might have
three coordinates: x, y, z, e.g., coordinates_[0], coordinates_[1], coordinates_[2].
The diagram in Figure 3. assumes that Point<T> has been instantiated
for int and for double.
#include <cstddef>
#include <string>
#include <vector>

template<typename T>
class Point {
public:
  Point(const std::string& name = "none");
  std::string& name();
  T& operator[](size_t i);         // read and write access
  T operator[](size_t i) const;    // read-only access
  size_t size() const;
private:
  std::string name_;
  std::vector<T> coordinates_;
};
Template parameterization allows the designer to structure coordinates to fit an application.
If the point described a location in a flowing fluid, the coordinates might be the location and
flow rate components for each axis:
T --> std::pair<double, double>
So, coordinates_ would have size 3 with entries for { x, flow_x },
{ y, flow_y }, { z, flow_z }
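The following sketch instantiates a condensed version of the Point template for int, double, and the pair type just described. It is an illustration under assumptions: the method bodies are filled in inline, the constructor takes a coordinate count that the declaration above does not have, and the alias Flow and function flowAlongX are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

template<typename T>
class Point {
public:
    // Assumption: the coordinate count n is supplied at construction.
    explicit Point(std::size_t n, const std::string& name = "none")
        : name_(name), coordinates_(n) {}
    std::string& name() { return name_; }
    T& operator[](std::size_t i) { return coordinates_[i]; }
    T operator[](std::size_t i) const { return coordinates_[i]; }
    std::size_t size() const { return coordinates_.size(); }
private:
    std::string name_;
    std::vector<T> coordinates_;
};

using Flow = std::pair<double, double>;   // { location, flow rate }

double flowAlongX() {
    Point<int> pi(3);              // compiler generates Point<int>
    Point<double> pd(3);           // compiler generates Point<double>
    Point<Flow> pf(3, "fluid");    // compiler generates Point<Flow>
    pi[0] = 1;
    pd[0] = 1.5;
    pf[0] = Flow{0.5, 2.0};        // { x, flow_x }
    return pf[0].second;           // flow_x
}
```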
Figure 4. Specializing Generic Template
Should using code instantiate a template for additional
parameter types, that would generate another class for each one.
Fortunately, that is done by the compiler. Developers only need to write one template
class and perhaps one or two special cases.
A class template can be specialized or a function template overloaded to handle special cases.
It is not uncommon for a template design to work well for several application types, but fail
to operate as desired for special cases.
Specialization is possible because templates generate code when an application, using the template
for specified types, is compiled. When we specialize a class template, we define not only the
generic class, but also a type specific class for the special case. The language guarantees that,
if a specialization matches a specified type, the specialization will be compiled. If the type
doesn't match any specialization (there can be more than one) then the generic template
will be compiled using that type.
The same process happens for function templates, except that the special cases are overloads of
the template function and overload resolution is used instead of template type deduction.
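A common special case is a C string, where comparing pointer values gives the wrong answer. The sketch below shows both mechanisms side by side; the names Larger and larger are hypothetical. The class template handles the case with a full specialization, and the function template handles it with an ordinary overload.

```cpp
#include <cstring>

// Generic class template: works for any type with operator<.
template<typename T>
struct Larger {
    static T of(T a, T b) { return (a < b) ? b : a; }
};

// Full specialization for C strings: compare contents, not addresses.
template<>
struct Larger<const char*> {
    static const char* of(const char* a, const char* b) {
        return (std::strcmp(a, b) < 0) ? b : a;
    }
};

// Generic function template.
template<typename T>
T larger(T a, T b) { return (a < b) ? b : a; }

// Ordinary overload: when both candidates match equally well,
// overload resolution prefers the non-template function.
const char* larger(const char* a, const char* b) {
    return (std::strcmp(a, b) < 0) ? b : a;
}
```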
We will discuss class template specialization and function template overloading in Chapter #7.
This concludes our discussion of C++ models. You will find details on each of them in succeeding chapters.
9 Epilogue
This chapter has explored important models used to help you understand how to develop C++ code.
The succeeding chapters turn these models into concrete designs and code.
Chapter #2 surveys the language and provides many code examples to help you get started.
Later chapters look in detail at classes, class relationships, and templates.
Finally, there are two chapters that discuss C++ standard libraries. There will be more libraries
covered before too long.