11/25/2023

Bits: C++ Data Types

types, initialization, construction, assignment

Synopsis:

Most of a language's syntax and semantics derive directly from its type system design. This is true of all of the languages discussed in these Bits: C++, Rust, C#, Python, and JavaScript.
C++ has the most complex type system of the five, followed in order by C#, Rust, JavaScript, and Python. We will spend more time with this Bit than with any of the others because of its complexity and importance.
This page demonstrates simple uses of the most important C++ types. The purpose is to quickly acquire some familiarity with types and their uses.
  • Primitive types and aggregates of primitive types are copyable. Assignment and pass-by-value copy the source's value to the destination.
  • For user-defined types C++ provides special class methods - constructors, operators, and destructors. When properly implemented they provide instance syntax that mimics that of primitives. We won't see that here, but will in the next Bit page.
  • C++ supports move operations with a move constructor and a move assignment operator. Temporaries are moved, not copied, when assigned or passed by value: the destination takes over the resources of the move source. This is a transfer of ownership and leaves the source in a valid but unspecified state. Attempting to use a "moved" variable is, unfortunately, not a compile error.
  • C++ supports making fixed references to either primitive or user-defined types. These may refer to instances in stack memory, static memory, or in the native heap.
  • C++ pointers and references are unconstrained and a frequent source of memory management errors. By convention these errors can be avoided by using range-based for loops and smart pointers. For very large code bases it can be challenging to ensure that the conventions have been followed everywhere - think of projects with 100,000 source lines or more.
  • Here we begin to see significant differences between the languages, especially when comparing statically typed languages like C++, Rust, and C#, with dynamically typed languages like Python and JavaScript.
C++ Types Details 

Table 1.0 C++ Types

| Type | Comments | Example |
|---|---|---|
| -- Integral types -- | | |
| bool | values true and false | bool b = true; |
| int | with signed, unsigned, short, long qualifiers; 1 == sizeof(char) ≤ sizeof(short) ≤ sizeof(int) ≤ sizeof(long) ≤ sizeof(long long) | short int i = 42; |
| size_t | 2 ≤ sizeof(size_t); size_t is used for indexing | size_t i = 0; |
| char | with signed and unsigned qualifiers; signedness of char depends on platform | char c = 'a'; |
| wchar_t, char16_t, char32_t | wchar_t is 16 bits and holds UTF-16 code units on Windows, 32 bits and holds UTF-32 on Linux | wchar_t c = L'\u263A'; |
| -- Floating point types -- | | |
| float, double, long double | values have finite precision, and may have approximate values | double d = 3.14159; |
| -- Literal string types -- | | |
| const char[], const wchar_t[] | literal string, resides in static memory and is always null terminated. "Hello" is a const char[6] containing the chars: 'H', 'e', 'l', 'l', 'o', '\0'. | const char* lst = "a literal string"; |
| -- Aggregate types -- | | |
| T[N] | Native array of N elements all of type T | int arr[] = { 1, 2, 3, 2, 1 }; int first = arr[0]; |
| std::tuple<T1, T2, ...> | collection of heterogeneous types accessed by position | std::tuple<int, float, char> tu { 42, 3.14159f, 'z' }; char third = std::get<2>(tu); |
| std::optional<T> | std::optional holds an optional value t ∈ T or std::nullopt | std::optional<T> doOp(/* args */); |
| -- Std::library types -- | | |
| std::string | Expandable collection of char elements allocated in the heap | std::string strg = "a string"; |
| std::wstring | Expandable collection of wchar_t elements allocated in the heap | std::wstring strg = L"a wstring"; |
| std::array<T, N> | Fixed size generic array of items of type T | std::array<int,3> v { 1, 2, 3 }; |
| std::vector<T> | Expandable generic collection of items of type T | std::vector<double> v { 1.0, 1.5, 2.0 }; v.push_back(2.5); |
| std::deque<T> | Expandable double-ended generic collection of items of type T. Typically implemented as a sequence of fixed-size blocks. | std::deque<int> vd { 1, 2, 3 }; vd.push_front(-1); ... |
| std::unordered_map<K,V> | Unordered associative container of Key-Value pairs, held in a table of bucket lists. | std::unordered_map<std::string, int> m { {"one",1}, {"two",2} }; m.insert({"zero", 0}); ... |
| std::map<K,V> | Ordered associative container of Key-Value pairs, held in a balanced binary tree. | std::map<std::string, int> m { {"one",1}, {"two",2} }; m.insert({"zero", 0}); ... |
| forward_list, list, unordered_map, unordered_set, unordered_multimap, unordered_multiset, set, multiset, and several adapters like stack and queue | The C++ std::library also defines types for threading and synchronization, reading and writing to streams, anonymous functions, and many more. | Containers library |
| -- User-defined types -- | | |
| User-defined types | Based on classes and structs, these will be discussed in the next Bit. | |
C++ Type System Details 

Table 2. C++ Copy and Move Operations

| Operation | Example | Primitive or Aggregate of Primitives | Library or User-defined Type |
|---|---|---|---|
| If u ∈ T is a named variable | | | |
| Construction | T t = u; u ∈ T | u's value is copied byte-by-byte to t | T's copy constructor copies u |
| Assignment | t = u; t ∈ T, u ∈ T | u's value is copied byte-by-byte to t | T's copy assignment operator copies the value of u |
| Pass-by-value | void doOp(T t) | the argument's value is copied byte-by-byte to the doOp stack frame | the argument's value is copied to the doOp stack frame, using T's copy constructor |
| If u ∈ T is a temporary or the result of std::move(v), v ∈ T | | | |
| Construction | T t = u; u ∈ T | u's value is copied byte-by-byte to t | u's value is moved to t using T's move constructor; u is left in a valid but unspecified state |
| Assignment | t = u; t ∈ T, u ∈ T | u's value is copied byte-by-byte to t | u's value is moved to t using T's move assignment operator; u is left in a valid but unspecified state |
| Pass-by-value | void doOp(T t) | the argument's value is copied byte-by-byte to the doOp stack frame | T's move constructor moves the argument to the doOp stack frame; the argument is left in a valid but unspecified state |
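
The rows of Table 2 can be observed directly. The following is a minimal sketch, not code from the Bits repository: Tracker, passByValue, fromNamed, and fromMoved are hypothetical names introduced only for illustration. Tracker records which constructor built each instance, so passing a named variable should report a copy and passing std::move(u) should report a move.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical type for illustration: remembers how it was constructed.
struct Tracker {
  std::string how;
  Tracker() : how("default") {}
  Tracker(const Tracker&) : how("copy") {}
  Tracker(Tracker&&) noexcept : how("move") {}
};

// Pass-by-value: parameter t is constructed from the caller's argument.
std::string passByValue(Tracker t) { return t.how; }

// Named variable as source: construction and pass-by-value both copy.
std::string fromNamed() {
  Tracker u;
  Tracker t = u;                     // copy construction
  assert(t.how == "copy");
  return passByValue(u);             // copy constructor builds the parameter
}

// std::move as source: pass-by-value uses the move constructor.
std::string fromMoved() {
  Tracker u;
  return passByValue(std::move(u));  // u should not be read after this call
}
```

Calling fromNamed() yields "copy" and fromMoved() yields "move", matching the right-hand column of the table.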

Table 3. C++ Type System Attributes

Static typing: All types are known at compile time and are fixed throughout program execution.
Inference: The compiler infers the types of expressions, and of variables declared auto, when they are not explicitly annotated. Occasionally inference fails and explicit annotation is required.
Intermediate strength typing: Types are exhaustively checked but there are many implicit conversions.
  • Numeric and boolean literals have a corresponding type, e.g., 42 is an int.
  • Variables with auto type declarations take the type deduced from their RHS.
  • Values can be coerced using user-defined conversion constructors.
Generics: Generics provide types and functions with unspecified parameters, supporting code reuse and abstraction over types. Generic parameters are specified at the call site or deduced from the arguments, e.g., doOp<T, U>(T t, U u). The function doOp is checked for syntax before instantiating with specific type(s). Any use of t or u in the doOp body is not checked until the types of t and u are known.
Unlike all the other languages examined in these Bits, C++ generics use an internal metalanguage at compile time to implement generic facilities. That can be used to move some processing from run-time to compile-time using "template metaprogramming".
Class Relationships: Class relationships are important tools for modeling both application and implementation domains.
  1. Inheritance: A design process that uses a base type as part of a more specialized derived type, automatically exposing public members of the base as public members of the derived. An instance of the base type becomes part of the memory footprint of instances of the derived type. Some languages allow use of base implementation as part of the derived implementation; some languages allow only declarations of base members to be declarations of the derived. C++ supports multiple inheritance of base implementations and interface [1] declarations.
  2. Composition: A design process that uses an instance of a composed child type as a member of the composing type. Composition results in the child instance embedded in the memory footprint of the composer. Construction, assignment, and pass-by-value result in two independent instances, i.e., the source and destination. C++ types can compose instances of arbitrary types.
  3. Aggregation: A design process that uses a pointer or managed handle referring to an aggregated type as a member of the aggregator. Aggregation results in a child instance placed in a native or managed heap at a location distinct from its aggregator type and referred to with a handle. Construction, assignment, and pass-by-value result in two references to the one source instance. C++ can use composition and aggregation with any type.

  [1] C++ does not have a distinct interface type category. It uses classes with no member data and only public pure virtual method declarations for specifying interfaces.
Concepts: Concepts (C++20) play a role similar to Rust traits and Java and C# interfaces, supporting abstraction over behavior. Unlike interfaces, they are compile-time constraints on template arguments rather than types that a class implements. A concept states the operations a type must support. A template can use a requires clause with concept arguments to bound the types that are valid for a class or function.
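
The difference between composition and aggregation in Table 3 shows up when instances are copied. This is a small sketch under stated assumptions: Composer, Aggregator, and the two functions are hypothetical names, and std::shared_ptr stands in for the "handle" that the aggregation entry describes.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical types for illustration only.
struct Composer {
  std::vector<int> child;                   // composition: child is a member
};

struct Aggregator {
  std::shared_ptr<std::vector<int>> child;  // aggregation: handle to a heap child
};

// Copy a Composer, change the copy, and return the source's first item.
int composedSourceAfterCopy() {
  Composer a{ {1, 2, 3} };
  Composer b = a;            // copies the composed vector
  b.child[0] = 99;
  return a.child[0];         // a and b are independent
}

// Copy an Aggregator, change the copy, and return the source's first item.
int aggregatedSourceAfterCopy() {
  Aggregator a{ std::make_shared<std::vector<int>>(std::vector<int>{1, 2, 3}) };
  Aggregator b = a;          // copies only the handle
  assert(a.child == b.child);
  (*b.child)[0] = 99;
  return (*a.child)[0];      // a and b share one child
}
```

composedSourceAfterCopy() returns 1 because the copy owns its own child, while aggregatedSourceAfterCopy() returns 99 because both aggregators refer to the same child.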

1.0 Initialization

Several of the code blocks shown below have formatting and output code elided. You can find complete code in the Bits Repository: Bits_Data.cpp, Bits_DataAnalysis.h.

1.1 Primitives

Initialization is the process of endowing a newly created type instance with a specified value. Uninitialized local instances of primitive types have indeterminate values, and reading them is undefined behavior. Uninitialized variables at global scope are zero-initialized by the compiler. To ensure well-defined behavior, initialize all variables where they are declared, as shown below.
  /*----------------------------------------------  
    All code used for output has been elided
  */
  /*-- scalars --*/
  bool b = true;
  std::byte byte { 0x0f };
    /*  std::byte => enum class byte : unsigned char {} */
  int i = 42;  // equiv to int i { 42 };
  double d = 3.1415927;
  char ch = 'z';
  const char* lst = "a literal string";

Instances of primitive types each occupy one block of contiguous memory. Scalars can be initialized either by assigning a value or with a braced initializer. The int type can be qualified with keywords short, long, and long long. Type double can be qualified with long. A complete list of types and their qualifiers is given in the "C++ Types Details" dropdown list, above.
Output
  --- bool ---
  b: true
  b: type: bool
  b: size = 1
  --- byte ---
  byte: 0xf
  byte: type: enum std::byte
  byte: size = 1
  --- int ---
  i: 42
  i: type: int
  i: size = 4
  --- double ---
  d: 3.141593
  d: type: double
  d: size = 8
  --- char ---
  ch: z
  ch: type: char
  ch: size = 1
  --- const char* ---
  lst: "a literal string"
  lst: type: char const * __ptr64  
  lst: size = 8
  lst: char count = 16

Each variable in the code block above is characterized by its value, its type as reported by the typeid operator, typeid(t).name(), and its size as reported by the sizeof operator, sizeof(t). sizeof(t) returns the number of bytes allocated by the compiler for the value of t. Note that the literal string, lst, at the bottom, is accessed through a pointer, which, on a 64-bit computer, is 8 bytes, not the count of characters. The literal itself occupies 17 bytes: strlen(lst) reports 16 because it does not count the trailing null terminator, which all C++ string literals have.
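
The three sizes discussed above can be checked with a short sketch. The names arr, pointerSize, charCount, arraySize, and endsWithNull are assumptions made for this illustration; they are not part of the Bits repository code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

const char* lst = "a literal string";    // pointer to the literal
const char  arr[] = "a literal string";  // array holding the same characters

// Size of the pointer itself, not of the characters it points to.
std::size_t pointerSize() { return sizeof(lst); }

// Number of characters before the null terminator.
std::size_t charCount() { return std::strlen(lst); }

// Size of the whole array, including the null terminator.
std::size_t arraySize() { return sizeof(arr); }

// The last byte of the array is the terminator that strlen does not count.
bool endsWithNull() {
  assert(arraySize() == charCount() + 1);
  return arr[arraySize() - 1] == '\0';
}
```

charCount() is 16 and arraySize() is 17, while pointerSize() depends only on the platform's pointer width.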

1.2 Aggregates

An aggregate whose items are all primitives holds its entire value in one block of contiguous memory, so it can be copied byte-by-byte, as std::memcpy would do. If an aggregate has any non-primitive items, those items are copied with their type's copy constructor and copy assignment operator.
  /*---------------------------------------------------
    All code used for output has been elided
  */
  /*-- array --*/
  short int fa[] { 1, 2, 3, 4, 5 };
  short int fa_alt[5] { 1, 2, 3, 4, 5 };
  auto afirst = fa[0];

  /*-- struct --*/
  struct S { int a; char b; double c; };
  S strct { 42, 'a', 3.1415927 };
  auto sfirst = strct.a;

  /*-- tuple --*/
  std::tuple<int, double, char> tup { -1, 3.14, 'z' };
  auto tfirst = std::get<0>(tup);

  /*-- optional --*/
  std::optional<double> opt1 { 3.1415927 };
  std::optional<double> opt2; // { std::nullopt };
  if(opt2 == std::nullopt) {
    std::cout << "empty\n";
  }
  else {
    std::cout << *opt2 << "\n";
  }
All the aggregate types, array, struct, tuple, and optional, use a braced initializer list. std::tuple and std::optional are generic types defined in the std::library, not by the core C++ language. We discuss generic types in Bits_GenericCpp. An aggregate holds its entire value in one contiguous block of memory when all its elements are primitive. Those aggregates are copied bit-wise, as std::memcpy would do. If any element is non-primitive, e.g., a std::string, that element is copied by its copy constructor or copy assignment operator. The struct and tuple types are similar in that each contains a list of elements of heterogeneous types. Struct elements are retrieved by name while tuple elements are retrieved by position.
Output
  --- aggregate types ---

  --- native array ---
  fa[]: [ 1, 2, 3, 4, 5 ]
  fa[]: type: short const * __ptr64
  fa[]: size = 10
  afirst = 1
  --- struct ---
  strct: S { 42, a, 3.14159 }
  strct: type: struct `void __cdecl initialize_primitiv...
  strct: size = 16
  sizeof(strct.a) = 4
  sizeof(strct.b) = 1
  sizeof(strct.c) = 8
  sfirst = 42
  --- tuple ---
  tup: { -1, 3.14, z }
  tup: type: class std::tuple
  tup: size = 24
  tfirst = -1
  --- optional ---
  opt1: 3.14159
  opt2: empty
  opt1: type: class std::optional
  opt1: size = 16

The variable fa is the name of a fixed-size array of short integer elements stored in a contiguous block of memory, usually in the program's stack. When passed to a function its type decays to a pointer to the first element of the caller's array. That avoids potentially large copy operations. strct is a structure holding an integer, character, and double. On this platform the memory layout inserts 3 bytes of padding after the character strct.b so that the double strct.c begins on an 8-byte boundary, improving read and write times; that is why sizeof(strct) is 16 rather than 13. tup is a tuple with integer, double, and character elements, respectively. Its memory layout also contains padding, giving a size of 24 bytes. The std::optional<T> type represents a value that may or may not exist. It is commonly used to return values from a search function that may not find a match.
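
Two points from the paragraph above, struct padding and std::optional as a search result, can be sketched as follows. The helper names paddingInS, offsetOfC, and firstNegative are hypothetical, and the amount of padding is platform dependent, so the code computes it rather than assuming a value.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

struct S { int a; char b; double c; };  // same members as strct above

// Padding bytes the compiler added to S; the count is platform dependent.
constexpr std::size_t paddingInS() {
  return sizeof(S) - (sizeof(int) + sizeof(char) + sizeof(double));
}

// Byte offset of member c from the start of S.
constexpr std::size_t offsetOfC() { return offsetof(S, c); }

// Hypothetical search function: returns the first negative item,
// or std::nullopt when no item is negative.
std::optional<int> firstNegative(const std::vector<int>& items) {
  for (int item : items) {
    if (item < 0) return item;
  }
  return std::nullopt;
}

// c always starts on a boundary that satisfies double's alignment.
void checkAlignment() {
  assert(offsetOfC() % alignof(double) == 0);
}
```

On the platform that produced the output above, paddingInS() would be 3 and offsetOfC() would be 8. A caller of firstNegative tests the result with has_value() before reading it.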

1.3 Std::library Types

  /*-- initialize std::library types --*/

  /*------------------------------------------------- 
    All code for output has been elided
  */
  /*-- generic array --*/
  std::array<double, 7>
  sarr { 1.0, 1.5, 2.0, 2.5, 2.0, 1.5, 1.0 };
  double afirst = sarr[0];

  /*-- expandable string --*/
  std::string sstr = "a string";
  char sfirst = sstr[0];

  /*-- expandable indexable array --*/
  std::vector<int> vec { 1, 2, 3, 4 };
  vec.push_back(5);
  int vthird = vec[2];

  /*-- expandable double-ended queue --*/
  std::deque<int> dq { 1, 2, 3 };
  dq.push_front(0);
  dq.push_back(4);
  int qfirst = dq.front();

  /*-- expandable associative container --*/
  std::unordered_map<std::string, int> umap
  {{"one", 1}, {"two", 2}, {"three", 3}};
  umap.insert({"zero", 0});
  int uZero = umap["zero"];

Of the many std::library types, we look here at a few of the collection types. Collections hold a finite number of items, all of the same type. This code block illustrates how to initialize a collection and how to access its members. Collections all have copy constructors, copy assignment operators, and corresponding move constructors and move assignment operators. The next Bit, Bits_ObjectsCpp, discusses user-defined types. Definition of construction and assignment operations will be illustrated there. std::string, std::vector<T>, and std::unordered_map<K, V> are the most frequently used of the collection types.
Output
  --- std::array<double, 7> ---
  sarr:   [ 1.00, 1.50, 2.00, 2.50, 2.00, 1.50, 1.00 ]
  sarr:   type: class std::array<double,7>
      
  --- std::string => std::basic_string<char> ---
  sstr:   "a string"
  sstr:   type: class std::basic_string<char,struct std:...

  --- std::vector<int> ---
  vec:    [ 1, 2, 3, 4, 5 ]
  vec:    type: class std::vector<int,class std::allocat...
  
  --- std::deque<int> ---
  dq:     [ 0, 1, 2, 3, 4 ]
  dq:     type: class std::deque<int,class std::allocato...
  
  --- std::unordered_map<std::string, int> ---
  umap:   { {one, 1}, {two, 2}, {zero, 0}, {three, 3} }
  umap:   type: class std::unordered_map<class std::basi...

std::array<T, N> is similar to C++'s native array T[N], but also provides methods like size_t size(), constexpr T& front() and constexpr T& back(). When declared as local variables, both are stored in contiguous memory in the program's stack. The std::string, std::vector<T>, std::deque<T> and std::unordered_map<K, V> all have control blocks stored in stack memory and data elements stored in the heap. That is done so that their capacities are expandable and can be as large as available memory. Insertions at the end of a vector or string take amortized constant time, as do insertions at either end of a deque; insertions elsewhere take linear time. Insertions into an unordered_map are nearly constant time: a map insertion takes constant time to compute a table address and linear time to walk short bucket lists. All but the fixed-size array are subject to memory reallocations when an insertion is attempted while their data memory is full. That includes the map, which may have to reallocate its table to avoid long bucket lists.
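
The reallocation behavior described above can be observed by watching a vector's capacity. This is an illustrative sketch: countReallocations and reserveAvoidsReallocation are hypothetical helpers, and the exact number of reallocations depends on the library's growth policy, so the code counts them rather than assuming a value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count how many times capacity changes while appending n items one by one.
std::size_t countReallocations(std::size_t n) {
  std::vector<int> v;
  std::size_t reallocations = 0;
  std::size_t capacity = v.capacity();
  for (std::size_t i = 0; i < n; ++i) {
    v.push_back(static_cast<int>(i));
    if (v.capacity() != capacity) {  // storage was reallocated
      ++reallocations;
      capacity = v.capacity();
    }
  }
  return reallocations;
}

// With reserve(n) the storage is allocated once, so the data pointer
// does not change while appending n items.
bool reserveAvoidsReallocation(std::size_t n) {
  std::vector<int> v;
  v.reserve(n);
  const int* before = v.data();
  for (std::size_t i = 0; i < n; ++i) {
    v.push_back(static_cast<int>(i));
  }
  assert(v.size() == n);
  return v.data() == before;
}
```

Appending 1000 items triggers far fewer than 1000 reallocations because capacity grows geometrically, which is what makes push_back amortized constant time. Calling reserve first removes the reallocations entirely when the final size is known.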

1.4 Heap Storage

  /*-- initialized in heap memory --*/

  /*--------------------------------------------------
    All code for output has been elided
  */
  /*-- raw pointer --*/
  int* tmol = new int{42};
  // using code here
  delete tmol;  // forgetting this causes memory leak

  /*-- unique_ptr to scalar --*/
  std::unique_ptr<int> suni = 
    std::make_unique<int>(-1);
  int utmp = *suni;
  // std::unique_ptr deallocates heap storage when
  // it goes out of scope, so no memory leak.

  /*-- shared_ptr to scalar --*/
  std::shared_ptr<int> stmol 
    = std::make_shared<int>(42);
  int stmp = *stmol;
  // std::shared_ptr deallocates heap storage when
  // all references go out of scope, so no memory leak.

  /*-- shared_ptr to collection --*/
  std::shared_ptr<std::vector<int>> pVec =
    std::make_shared<std::vector<int>>(
      std::vector<int>{ 1, 2, 3 }
  );
  auto vtmp = *pVec;
  // only difference from scalar case is initialization

  /*-- using aliases to simplify --*/
  using VecI = std::vector<int>;
  using SPtr = std::shared_ptr<VecI>;
  SPtr pVec2 = std::make_shared<VecI>(VecI{ 1, 2, 3 });
  // equivalent to pVec, just abbreviates syntax

Heap memory is used for on-demand allocation at run-time, most often for std::library and user-defined collection types. Memory is allocated with the new operator and deallocated with delete. Modern C++ by convention replaces use of raw pointers for allocation and deallocation with the smart pointer std::unique_ptr<T> and its creational function std::make_unique<T>(args), or std::shared_ptr<T> and its creational function std::make_shared<T>(args). The creational functions allocate and construct an instance of type T on the heap. A std::unique_ptr deletes its instance when it goes out of scope; a std::shared_ptr deletes its instance when the last pointer sharing it goes out of scope. That eliminates one source of memory leaks, but depends on convention to advocate for its use. Syntax for the standard pointers and their creational functions is rather verbose. That can be simplified by using alias declarations that provide shorter, but equivalent, syntax. Note that all of these declarations use literals or temporaries to initialize heap memory storage.
Output
  --- initialized in heap memory ---

  --- raw pointer ---
  *tmol: 42
  *tmol: type: int
  *tmol: size = 4
  tmol: 0000016E4A956D80
  tmol: type: int * __ptr64
  tmol: size = 8

  --- std::unique_ptr<int> ---
  *suni: -1
  *suni: type: int
  *suni: size = 4
  suni: 000001D7CFCE6300
  suni: type: class std::unique_ptr<int,struct std::de...
  suni: size=8

  --- std::shared_ptr<int> ---
  *stmol: 42
  *stmol: type: int
  *stmol: size=4
  stmol: 0000016E4A967420
  stmol: type: class std::shared_ptr<int>
  stmol: size=16

  --- std::shared_ptr<std::vector<int>> ---
  *pVec: [ 1, 2, 3 ]
  *pVec: type: class std::vector<int,class std::allocat...
  *pVec: size=32
  pVec: 0000016E4A968260
  pVec: type: class std::shared_ptr<class std::vector<...
  pVec: size=16

  --- using aliases to simplify ---
  *pVec2: [ 1, 2, 3 ]
  *pVec2: type: class std::vector<int,class std::allocat...
  pVec2: 0000016E4A968880
  pVec2: type: class std::shared_ptr<class std::vector<...

Smart pointers std::unique_ptr<T> and std::shared_ptr<T> are dereferenced using the same syntax as raw pointers. Note that the size of the raw pointer is 8 bytes and the size of the unique pointer is also 8 bytes. The shared pointer keeps both a pointer to the heap storage and a pointer to a control block that holds its reference count, so its size is 16 bytes. There is very little size and performance penalty for using these smart pointers.
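
The ownership rules of the two smart pointers can be sketched briefly. The function names below are hypothetical and chosen only for this illustration. A std::unique_ptr cannot be copied, so ownership is handed over with std::move, while copies of a std::shared_ptr share one instance and count their owners.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// unique_ptr is move-only: after the move, b owns the int and a is null.
// (A moved-from unique_ptr is guaranteed to be null.)
bool uniqueOwnershipTransfers() {
  std::unique_ptr<int> a = std::make_unique<int>(42);
  std::unique_ptr<int> b = std::move(a);
  return a == nullptr && *b == 42;
}

// Copying a shared_ptr adds an owner of the same heap instance.
long sharedOwnerCount() {
  std::shared_ptr<int> p = std::make_shared<int>(42);
  std::shared_ptr<int> q = p;   // copy: p and q refer to the same int
  assert(p.get() == q.get());
  return p.use_count();         // number of owners
}

// When a copy goes out of scope the owner count drops back.
long ownerCountAfterScopeExit() {
  std::shared_ptr<int> p = std::make_shared<int>(42);
  {
    std::shared_ptr<int> q = p; // two owners inside this block
  }
  return p.use_count();         // q has been destroyed
}
```

sharedOwnerCount() returns 2 and ownerCountAfterScopeExit() returns 1; the heap instance is deleted only when the count reaches zero.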

2.0 Copy Operations

C++ copy operations occur during construction, assignment, and pass-by-value. Primitive types and aggregates of primitive types occupy a contiguous block of memory and are copied bit-wise, as std::memcpy would do. Non-primitives like the std::library collections have a control block in stack memory and values in heap memory. They are copied with their copy constructor and copy assignment operator.

2.1 Copy Operations for Primitives

  /*-- copy operations for primitives --*/

  /*-------------------------------------------------------
    All code for output has been elided
  */
  /*-- primitive copy construction - bit-wise copy --*/
  int ival = 42;      // initialization
  int jval = ival;    // here's the copy construction

  /*-- copy assignment, output elided --*/
  double dval = 3.1415927;  // create dst instance
  double eval = 1.33333;    // create src instance
  dval = eval;              // copy assignment operation

Copy operations for primitive types are very simple - just copy bytes from source to destination. There are no side-effects and no context dependencies.
Output
  ------------------------------------------
    copy operations for primitives
  --------------------------------------------------

  --- copy construction ---
  ival:   42
  ival:   type: int
  ival:   address: 0000002E4DEFDAF4
  --- int jval = ival ---
  jval:   42
  jval:   type: int
  jval:   address: 0000002E4DEFDC04
  ---------------------------------------------
    Addresses of ival and jval are different,
    demonstrating copy, as expected.
  ---------------------------------------------

  --- copy assignment ---
  dval:   3.141593
  dval:   type: double
  dval:   address: 0000002E4DEFDD18
  eval:   1.333330
  eval:   type: double
  eval:   address: 0000002E4DEFDDD8
  --- dval = eval ---
  dval:   1.333330
  dval:   type: double
  dval:   address: 0000002E4DEFDD18
  ---------------------------------------------
    Addresses of dval and eval are different
    demonstrating copy.
  ---------------------------------------------

Copy construction copies source values to a newly allocated memory location. Comparing addresses, it is clear that the value of ival was copied to the location of newly created jval. Assignment copies values from one existing memory location to another already existing location, i.e., it is the same as copy construction except that there is no new allocation. In this code, eval is created to be the source of an assignment to dval. The addresses demonstrate that assignment results in a copy. This demonstration of the near-obvious sets up the basis for more complex construction and assignment operations for std::library types.

2.2 Copy Operations for Std::library Types

  /*-- copy operations for std::library types --*/

  /*-------------------------------------------------------
    All code for output has been elided
  */
  /*-- vector copy construction --*/
  std::vector<double> vec { 1.0, 1.5, 2.0 };
  auto uec = vec;  // copy construction

  /*-- vector copy assignment --*/
  std::vector<double> tec { 1.0, 1.5, 2.0, 2.5 };
  uec = tec;       // copy assignment

For std::library and user-defined types copy construction is defined as part of the type's class definition. The std::vector<double> uec is created with the copy constructor defined in the std::vector<T> class. The second example illustrates copy assignment in which std::vector<double> uec is assigned the value of tec, using the copy assignment operator of the vector class.
Output
  --------------------------------------------------  
    copy operations for std::library types
  --------------------------------------------------

  --- vector copy construction ---
  vec:    [ 1.00, 1.50, 2.00 ]
  vec:    address: 0000002E4DEFDFF8
  vec[0]: address: 0000016E4A967950
  --- auto uec = vec ---
  uec:    [ 1.00, 1.50, 2.00 ]
  uec:    address: 0000002E4DEFE198
  uec[0]: address: 0000016E4A967A10
  --------------------------------------------------
  Note:
  Both uec and vec and their resources are unique.
  That's because the vector copy constructor
  copies each element of vec to uec.

  Managed languages copy handles to instances,
  not the instances themselves, so construction
  does not create a new instance in those
  languages.  Resources are shared.
  --------------------------------------------------

  --- vector copy assignment ---
  tec:    [ 1.00, 1.50, 2.00, 2.50 ]
  tec:    address: 0000002E4DEFE338
  tec[0]: address: 0000016E4A967770
  --- uec = tec ---
  uec:    [ 1.00, 1.50, 2.00, 2.50 ]
  uec:    address: 0000002E4DEFE198
  uec[0]: address: 0000016E4A9673B0
  --- original source vec has not changed ---
  vec:    [ 1.00, 1.50, 2.00 ]
  vec:    address: 0000002E4DEFDFF8
  vec[0]: address: 0000016E4A967950
  --------------------------------------------------
  Note:
  Both uec and tec and their resources are unique.
  That's because vector copy assignment operator
  copies each element of tec to uec.

  Managed languages copy handles to instances,
  not the instances themselves, so assignment
  causes sharing of resources in those languages.
  --------------------------------------------------

The examples shown here use the addresses of a library type, std::vector, and of its heap resources, vec[0], to show that the source and destination are unique, i.e., the type's control block and heap resources have indeed been copied. That's demonstrated for both copy construction and copy assignment. The next Bits page demonstrates how to implement copy construction and copy assignment for user-defined types.

3.0 Move Operations

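Move operations are used to improve performance over copy operations. Their use is usually context dependent and requires the convention of never reading the value of an instance after it has been "moved". A moved-from std::library instance is left in a valid but unspecified state, and a moved-from user-defined instance is in whatever state its move operations leave it.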

Move std::string

  /*-- move temporary string --*/
  auto first = std::string("first part");
  auto mid = std::string(" and ");
  auto last = std::string("last part");
  auto aggr = first + mid + last;
  /*-----------------------------------------------------
    first + mid + last is a temporary string. It is
    moved to aggr using the move constructor or, since
    C++17, constructed directly in aggr.
  ------------------------------------------------------*/

  /*-- forced string move --*/
  auto src = std::string("src string");
  auto dst = std::move(src);
  /*-----------------------------------------------------
    After the move src is in a valid but unspecified
    state. Don't read its value; assign a new value
    to it or let it go out of scope.
  ------------------------------------------------------*/

Unlike copy, move operations move ownership of a type instance's resources to another instance, without actually copying the instance's resources. Instead, only the instance's control block is copied, so the new instance has all the same properties as the original, including the pointer to its resources, without the expense of copying what may be a large collection of resources. C++ moves occur when a temporary (unnamed) instance is used to construct a new instance. The first example shows a std::string temporary, the sum of three string fragments, used to create a new string, aggr. A move can also be forced using the std::move(t) function. Forced moves are used to efficiently send an instance's value into a pass-by-value function if it won't be needed again. They are also used when the argument is not copyable, as is the case for the std::unique_ptr<int> introduced in section 1.4. Unforced moves may occur for return by value of an instance created within a function. That will only happen when "return value optimization" does not apply. This is another of C++'s famous context dependencies.
Output
  --- move temporary string ---
  first:  "first part"
  mid:    " and "
  last:   "last part"
  --- auto aggr = first + mid + last ---
  aggr:   "first part and last part"
  --------------------------------------------------
   first + mid + last is a temporary string that
   is moved to aggr using move constructor.
  --------------------------------------------------

  --- forced string move ---
  src:    "src string"
  src:    type: class std::basic_string<char,struct std:...
  --- auto dst=std::move(src) ---
  dst: "src string"
  dst: type: class std::basic_string<char,struct std:...
  src:    ""  // DON'T DO THIS
  src:    type: class std::basic_string<char,struct std:...
  --------------------------------------------------
   There is no guarantee that src will have a valid
   state after move, so the src display, above, has
   undefined behavior - just happens to work on MSVC.
  --------------------------------------------------
The first example creates a temporary std::string from three string fragments and moves the result to aggr. The second example shows the forced move of src to the dst string. It also shows the "value" of the source string after it has been moved. THIS IS IMPLEMENTATION DEPENDENT. A moved-from std::string is in a valid but unspecified state, so it might be empty, as here, or might still hold its old characters with another library implementation. For user-defined types whose move operations leave dangling pointers, using a moved-from instance can result in a segmentation fault.
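
The transfer of resources can be made visible by comparing buffer addresses before and after a move. The sketch below uses std::vector rather than std::string, because common library implementations store short strings inside the string object itself, in which case there is no heap buffer to hand over. The function names are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Move construction hands the source's heap buffer to the destination.
bool moveTransfersBuffer() {
  std::vector<int> src{1, 2, 3};
  const int* buffer = src.data();         // address of src's heap storage
  std::vector<int> dst = std::move(src);  // move construction
  assert(dst.size() == 3);
  return dst.data() == buffer;            // dst now owns the same storage
}

// A moved-from object can safely be given a new value and used again.
bool movedFromCanBeReassigned() {
  std::vector<int> src{1, 2, 3};
  std::vector<int> dst = std::move(src);
  src = {7, 8};                           // assignment restores a known state
  return src.size() == 2 && dst.size() == 3;
}
```

No elements are copied in moveTransfersBuffer: only the control block changes hands. The second function shows the safe pattern for a moved-from variable, which is to assign to it before reading it.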

4.0 Pass-by-value & Pass-by-reference

This is the only code block on this page that shows all of the code, including formatting and output. Note that you can see all of the code for the other demos by looking at contents of the Bits Repository.
  /*-------------------------------------------
    All code is shown for this example, e.g.,
    includes formatting code and output.
  -------------------------------------------*/
  template<typename T>
  void pass_seq_by_value(T t) {
    std::cout << formatOutput<T>(
                   t, "   passed", seq_collectionToString<T>
                 );
    std::cout << formatAddress<T>(t, "   passed");
    std::cout << formatAddress<typename T::value_type>(
                   t[0], "passed[0]"
                 );
  }
  template<typename T>
  void pass_seq_by_reference(T& t) {
    std::cout << formatOutput<T>(
                   t, "   passed", seq_collectionToString<T>
                 );
    std::cout << formatAddress<T>(t, "   passed");
    std::cout << formatAddress<typename T::value_type>(
                   t[0], "passed[0]"
                 );
  }

  void pass_by_value_and_ref() {

    showLabel("pass-by-value and pass-by-reference");
    nl();

    showOp("std::vector<int> pass-by-value");
    nl();
    using VECI = std::vector<int>;
    auto v = VECI{ 1, 2, 3 };
    std::cout << formatOutput<VECI>(
                   v, "     call", 
                   seq_collectionToString<VECI>
                 );
    std::cout << formatAddress<VECI>(v, "     call");
    std::cout << formatAddress<int>(v[0], "  call[0]");
    pass_seq_by_value(v);
    showLabel(
      "passed has the same value as call.\n"
      "  call and its resources are different from\n"
      "  passed and its resources.  That demonstrates\n"
      "  passed was copy constructed from call."
    );
    nl();

    showOp("std::vector<int> pass-by-reference");
    nl();
    auto rv = VECI{ 1, 2, 3 };
    std::cout << formatOutput<VECI>(
                   rv, "     call", 
                   seq_collectionToString<VECI>
                 );
    std::cout << formatAddress<VECI>(rv, "     call");
    std::cout << formatAddress<int>(rv[0], "  call[0]");
    pass_seq_by_reference(rv);
    showLabel(
      "call and its resources are the same as\n"
      "  passed and its resources.  That demonstrates\n"
      "  that only reference was copied."
    );
  }
A pass-by-value function declares its parameter as a plain instance with no reference qualifier, e.g., R f(T t). A pass-by-reference function declares its parameter as a reference to an instance, e.g., R g(T& t). Pass-by-value creates a copy of the argument in the function's stack frame. Pass-by-reference binds a reference to the caller's instance, so a possibly large copy is avoided. Unless the reference is qualified by const, as in R h(const T& t), the function can modify the caller's instance, and any such change is seen by the caller.
Output
  --------------------------------------------------
    pass-by-value and pass-by-reference
  --------------------------------------------------

  --- std::vector<int> pass-by-value ---

       call: [ 1, 2, 3 ]
       call: type: class std::vector<int,class std::allocat...
       call: address: 0000004A4276F3C8
    call[0]: address: 00000202AD1B4B80
     passed: [ 1, 2, 3 ]
     passed: type: class std::vector<int,class std::allocat...
     passed: address: 0000004A4276F760
  passed[0]: address: 00000202AD1B4AE0
--------------------------------------------------
  passed has the same value as call.
  call and its resources are different from
  passed and its resources.  That demonstrates
  passed was copy constructed from call.
--------------------------------------------------

  --- std::vector<int> pass-by-reference ---

       call: [ 1, 2, 3 ]
       call: type: class std::vector<int,class std::allocat...
       call: address: 0000004A4276F578
    call[0]: address: 00000202AD1B45E0
     passed: [ 1, 2, 3 ]
     passed: type: class std::vector<int,class std::allocat...
     passed: address: 0000004A4276F578
  passed[0]: address: 00000202AD1B45E0
--------------------------------------------------
  call and its resources are the same as
  passed and its resources.  That demonstrates
  that only reference was copied.
--------------------------------------------------
The addresses tell the whole story. Pass-by-value results in two distinct instances, the caller's object and its copy in the function's stack frame. Each vector owns its own heap buffer, so both the object addresses and the element addresses differ. Pass-by-reference binds a reference in the function to the caller's object, so the addresses are the same.
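The side effects and address behavior described in this section can be checked with a small sketch. The function names below (append_by_value, append_by_reference, sum_by_const_reference, and the two same_object helpers) are hypothetical and are not part of the Bits repository. They only illustrate the three parameter forms R f(T t), R g(T& t), and R h(const T& t).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration; not part of the Bits repository.

// Pass-by-value: t is a copy, so the caller's vector is unchanged.
std::size_t append_by_value(std::vector<int> t) {
  t.push_back(99);
  return t.size();
}

// Pass-by-reference: t refers to the caller's vector,
// so the caller sees the appended element.
std::size_t append_by_reference(std::vector<int>& t) {
  t.push_back(99);
  return t.size();
}

// Pass-by-const-reference: no copy is made and the
// compiler rejects any attempt to modify t.
int sum_by_const_reference(const std::vector<int>& t) {
  int sum = 0;
  for (const auto& item : t) {
    sum += item;
  }
  // t.push_back(99);  // would not compile: t is const
  return sum;
}

// Compare the parameter's address with the caller's object address.
bool same_object_by_value(
  std::vector<int> t, const std::vector<int>* caller
) {
  return &t == caller;   // t is a distinct copy
}
bool same_object_by_reference(
  std::vector<int>& t, const std::vector<int>* caller
) {
  return &t == caller;   // t is the caller's object
}

// Usage: each assert documents the behavior described above.
void check_side_effects() {
  std::vector<int> v{ 1, 2, 3 };
  assert(append_by_value(v) == 4);       // the copy grew
  assert(v.size() == 3);                 // the caller's vector did not
  assert(append_by_reference(v) == 4);   // the caller's vector grew
  assert(v.size() == 4);
  assert(sum_by_const_reference(v) == 105);  // 1 + 2 + 3 + 99
  assert(!same_object_by_value(v, &v));
  assert(same_object_by_reference(v, &v));
}
```

Calling check_side_effects() from main exercises all three forms. The asserts are compiled out when NDEBUG is defined, so a debug build is assumed.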

5.0 Display Functions

When you look at any of the "Output" details, you will see some output with detailed formatting, but you won't see code providing that output in corresponding code sections. Code responsible for formatting and supplying low-level details, like type information, has been elided from the code shown above. The elided code consists of calls to functions shown in the dropdown below. These functions use language features, like generics, that will be covered in later Bits. You can find the complete code, including all the elisions, in the Bits Repository.
Display Functions
  /*-- convert scalar to string ----------------------*/
  template <typename T>
  std::string scalarToString(const T& scalar) {
    /* format in-memory stringstream so formats 
       are temporary */
    std::ostringstream out;
    out.precision(3);
    out << std::showpoint;
    out << std::boolalpha;
    out << scalar;
    return out.str();
  }
  /*-- convert sequential collection to string -------*/
  template <typename C>
  std::string seq_collectionToString(const C& coll) {
    /* format in-memory stringstream so formats 
       are temporary */
    std::ostringstream out;
    out.precision(3);
    out << std::showpoint;
    out << std::boolalpha;
    out << "[ ";
    /*-- show comma only in interior of sequence --*/
    bool first = true;
    for(const auto& item : coll) {
      if(first) {
        out << item;
        first = false;
      }
      else {
        out << ", " << item;
      }
    }
    out << " ]";
    return out.str();
  }

  /*-- several conversion functions elided --*/

  /*-- return formatted output string ----------------
    - third arg is lambda (aka closure) std::function
    - intent is to pass in format converter function
      customized for type T
    - examples of converter functions are given above
  */
  template <typename T>
  std::string formatOutput(
    const T& t,             // format t
    const std::string& nm,  // caller name
    std::function<std::string(const T&)> f, // format
    bool showtype = true    // default to show type
  ){
    std::stringstream out;
    out << "  " << nm + ": "
    << f(t) << "\n";
    if(showtype) {
      out << getType(t, nm);
      out << "  " << nm + ": size = " 
          << sizeof(t) << "\n";
    }
    return out.str();
  }
  /*--------------------------------------------------
    return formatted address as string
  */
  static const size_t WIDTH = 8;

  /*-- return address of t --*/
  template<typename T>
  std::string formatAddress(
    const T& t, const std::string& nm
  ) {
    const T* ptrToArg = &t;
    std::stringstream out;
    out.precision(7);
    out << "  " << std::showpoint;
    out << std::boolalpha;
    out << std::setw(WIDTH) << std::left 
        << nm + ": " << "address: ";
    /* cast so a char* is shown as an address, not a string */
    out << std::showbase << std::hex 
        << static_cast<const void*>(ptrToArg) << "\n";
    return out.str();
  }
  /*-----------------------------------------------
    truncate string to length n
    - does nothing if string length is not more than n
  */
  inline std::string truncate(
    const std::string& str, size_t n = 40
  ) {
    std::string tmp = str;
    if(tmp.size() <= n) {
      return tmp;
    }
    tmp.resize(n);
    tmp += "...";
    return tmp;
  }
  /*-----------------------------------------------
    return type of t
  */
  template<typename T>
  std::string getType(T t, const std::string &nm) {
    std::ostringstream out;
    out << "  " 
        << nm + ": ";    // show name at call site
    out << "type: " 
        << truncate(typeid(t).name());  // show type  
    out << "\n";
    return out.str();
  }
  /*-----------------------------------------------
    display text delimited with "---"
  */
  inline void showOp(
    const std::string& text, 
    const std::string& suffix = ""
  ) {
    std::cout << "  --- " 
              << text 
              << " ---" 
              << std::endl << suffix;
  }
  /*-----------------------------------------------
    display text with lines above and below
  */
  inline void showLabel(
    const std::string& text, size_t n = 50
  ) {
    auto line = std::string(n, '-');
    std::cout << line << std::endl;
    std::cout << "  " << text << std::endl;
    std::cout << line << std::endl;
  }
  /*-----------------------------------------------
    send a text string to std::cout
  */
  inline void print(const std::string& txt) {
    std::cout << "\n  " << txt;
  }
  /*-----------------------------------------------
    send a line of text to std::cout
  */
  inline void println(const std::string& txt) {
    std::cout << "\n  " << txt << "\n";
  }
  /*-----------------------------------------------
    emit newline
  */
  static void nl() {
    std::cout << std::endl;
  }
The first part of the code block shows:
  • functions for converting scalars and collections to strings.
  • a formatOutput function that accepts an object and one of the converter functions, and returns a formatted string representing the first argument.
  • a formatAddress function used to format object addresses for display.
The second part provides two helper functions:
  • template<typename T> std::string getType(T t, ...) returns a string representation of the type the compiler deduces for t.
  • std::string truncate(...) truncates strings returned by getType to fit the display medium.
These functions are helpers for the formatting functions in the first part.
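The pattern of passing a converter function into a formatter can be sketched in isolation. The names seq_to_string and format_named below are hypothetical, simplified stand-ins for seq_collectionToString and formatOutput. They omit the type and size output and are not the repository's code. The sketch also suggests why the demos write formatOutput<T>(...) with an explicit template argument: a function pointer or lambda is not itself a std::function, so T cannot be deduced from the third argument.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the page's display functions.

// Converter: sequence -> "[ a, b, c ]"
template <typename C>
std::string seq_to_string(const C& coll) {
  std::ostringstream out;
  out << "[ ";
  bool first = true;
  for (const auto& item : coll) {
    if (!first) {
      out << ", ";   // comma only between items
    }
    out << item;
    first = false;
  }
  out << " ]";
  return out.str();
}

// Formatter: prefixes the converter's result with a name.
template <typename T>
std::string format_named(
  const T& t,                              // value to format
  const std::string& nm,                   // name shown at call site
  std::function<std::string(const T&)> f   // converter for type T
) {
  return nm + ": " + f(t);
}

// Usage: T is supplied explicitly in both calls.
void check_format_named() {
  using VECI = std::vector<int>;
  VECI v{ 1, 2, 3 };
  // converter given as a function template specialization
  assert(
    format_named<VECI>(v, "v", seq_to_string<VECI>)
      == "v: [ 1, 2, 3 ]"
  );
  // converter given as a lambda
  assert(
    format_named<int>(
      42, "n", [](const int& i) { return std::to_string(i); }
    ) == "n: 42"
  );
}
```

Taking the converter as a std::function keeps the formatter independent of how each type is turned into text, at the cost of spelling out T at every call site.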

6.0 Build

  C:\github\JimFawcett\Bits\Cpp\Cpp_Data\build
cmake --build .
MSBuild version 17.5.1+f6fdcf537 for .NET Framework

  Bits_Data.cpp
  Bits_DataAnalysis.cpp
  Generating Code...
  Cpp_Data.vcxproj -> 
    C:\github\JimFawcett\Bits\Cpp\Cpp_Data\build\Debug\Cpp_Data.exe
C:\github\JimFawcett\Bits\Cpp\Cpp_Data\build
All C++ builds for the Bits demos use CMake invoked from the VS Code terminal. Before the first build, the CMake build infrastructure is generated from the build directory with the terminal command:
cmake ..
Builds are created with the terminal command:
cmake --build .
The built program is run with the command:
debug/CppData.exe
where the executable name, here CppData.exe, is defined in CMakeLists.txt.

6.1 CMakeLists.txt


  cmake_minimum_required(VERSION 3.25)
  project(CppData)
  #---------------------------------------------------
  set(CMAKE_BUILD_TYPE Debug)
  #---------------------------------------------------
  #   CppData dir
  #   -- CMakeLists.txt (this file)
  #   -- src dir
  #      -- Bits_Data.cpp
  #      -- Bits_DataAnalysis.h
  #      -- Bits_DataAnalysis.cpp
  #   -- build directory
  #      -- Debug directory
  #         -- Cpp_Data.exe
  #         -- ...
  #---------------------------------------------------

  # Wasn't able to get std::library modules to work with CMake.
  # - does work in Visual Studio, preview edition, non CMake project
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}
    /experimental:module /std:c++latest /EHsc /MD"
  )
  #-- Things I tried to get CMake to find std module --
  # set(CMAKE_MODULE_PATH "C:/Users/Public/std_modules")
  # set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /std:c++20")
  # set_property(TARGET $CppData PROPERTY CXX_STANDARD 20)
  # target_compile_features(CppData PUBLIC CXX_STANDARD 20)
  # set(CMAKE_CXX_FLAGS "${/experimental:module /std:c++latest}")
  # set(CMake_CXX_STANDARD 20)
  # set(CMAKE_CXX_STANDARD_REQUIRED ON)
  # set(CMAKE_CXX_EXTENSIONS OFF)

  #---------------------------------------------------
  # build Bits_Data.obj, Bits_DataAnalysis.obj
  #   in folder build/Cpp_Data.dir/debug
  #---------------------------------------------------
  set(SRC
    src/Bits_Data.cpp
    src/Bits_DataAnalysis.cpp
  )
  include_directories(src)
  add_executable(CppData.exe ${SRC})
  #---------------------------------------------------

Comments in CMakeLists.txt illustrate a fruitless search for means to compile C++ code that uses C++20 modules. I've been able to build such code with Visual Studio projects, using the preview version of Visual Studio circa September 2023. I suspect that CMake will build them, but doesn't like my path specification. I'll fix that eventually.

7.0 VS Code View

The code for this demo is available in github.com/JimFawcett/Bits. If you click on the Code dropdown you can clone the repository of all code for these demos to your local drive. Then, it is easy to bring up any example, in any of the languages, in VS Code. Here, we do that for Cpp\Cpp_Data.
Figure 1. VS Code IDE
Figure 2. Launch.JSON

8.0 References

Reference    Description
C++ Story    E-book with thirteen chapters covering most of intermediate C++
C++ Bites    Relatively short feature discussions