11/25/2023

Bits: Python Data Types

types, initialization, construction, assignment

Synopsis:

Most of a language's syntax and semantics derive directly from its type system design. This is true of all of the languages discussed in these Bits: C++, Rust, C#, Python, and JavaScript.

Python has a relatively simple type system, composed entirely of dynamic types. Types are checked at run-time. A variable may be bound to an instance of any type and may be rebound to an instance of another type at any time.
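A minimal sketch of that rebinding, using illustrative names that are not part of the Bits code:

```python
# A name is only a handle; it can be rebound to objects of different types.
x = 42
first_type = type(x).__name__     # type of the int object

x = "forty-two"
second_type = type(x).__name__    # same name, now bound to a str

x = [4, 2]
third_type = type(x).__name__     # and now bound to a list

print(first_type, second_type, third_type)   # int str list
```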

This page demonstrates simple uses of the most important Python types. The purpose is to quickly acquire some familiarity with types and their uses.

All instances of Python types live in its managed heap.
Construction, assignment, and argument passing copy references, not the instances stored in the managed heap. Each of these operations results in two handles pointing to the same instance.
Python has a deepcopy operation, defined in the copy module, that implements a clone operation, i.e., the original and clone are independent instances and do not share any data.
Python does not support move operations since its garbage collector owns all type instances. There is no need for moves to avoid the expense of possibly large copies, since construction, assignment, and argument passing never copy instances.
Python is memory safe because all memory management operations are executed by its execution engine, not by user code. That comes with throughput and latency penalties relative to native code languages like C++ and Rust.
Here we begin to see significant differences between the languages, especially when comparing statically typed languages like C++, Rust, and C# with dynamically typed languages like Python and JavaScript.
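The difference between copying a reference and cloning with deepcopy can be sketched as follows. The dictionary contents and variable names are assumptions chosen for illustration.

```python
import copy

original = {"name": "demo", "values": [1, 2, 3]}
alias = original                   # second handle to the same dict
clone = copy.deepcopy(original)    # independent dict with its own inner list

original["values"].append(4)       # in-place change to the shared inner list

print(alias["values"])    # [1, 2, 3, 4]  alias sees the change
print(clone["values"])    # [1, 2, 3]     clone does not
print(alias is original, clone is original)   # True False
```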

Python Types Details

Types are defined by the size of their memory allocations, the encodings used to place values in memory, and rules that define how the values can be accessed, modified, and combined.

The name of an instance of any Python type binds to a unique handle that refers to a heap allocation that may have other handles referring to it.

Table 1.0 Python Types

Type	Comments	Example
-- Scalar reference types ----
bool	values True and False	b = True
int	can expand to limit of available memory	i = 42
float	typically a 64-bit IEEE 754 double; values have finite precision and may be approximate	f = 3.1415927; f = -3.33e5
complex	real and imaginary floating point parts	c = 3.5 + 2.0j
-- Aggregate reference types ----
bytes	immutable sequence of bytes, 1 byte = 8 bits	byts = b"aardvark"; b0 = byts[0]
bytearray	mutable, resizable sequence of bytes	ba = bytearray(4); ba[0] = 1
tuple	immutable collection of heterogeneous types accessed by position	tup = (42, 3.14159, 'z'); third = tup[2]
range	immutable sequence of integers: r = range(start, stop, increment)	r = range(1, 10, 2) # result: 1, 3, 5, 7, 9
str	Immutable sequence of Unicode characters. str methods that transform text return a new string object.	s = "hello python"; s0 = s[0] # value = "h"
list	mutable collection of items of any type. It is common practice to use a single type T for all elements, but that is not required.	l = [1, 2, 3, 2, 1]; l0 = l[0]
dict	An associative collection of key-value pairs. Keys must be hashable but need not share a type; values can be of any type.	d = { "zero": 0, "one": 1, "two":2 }; d["three"] = 3; valz = d["zero"] # inserts "three":3 into d, then valz is 0
Many additional types defined in Python packages	Collections, Random, TKinter, Requests, Numpy, Matplotlib, Flask, ...
-- User-defined Types --
User-defined types	Based on classes, these will be discussed in the next Bit.

Python Type System Attributes

Table 2. Python Type System Attributes

Dynamic typing	All Python types are dynamic. Names are bound at run-time and may be bound to data of any type. All data is held in the Python managed heap.
Type inference	No type declarations are needed; the type of a variable is the type of the object it is currently bound to. Optional type hints can be checked by an external static analysis program. Type hints and linting have no effect on run-time behavior.
Duck typing	An operation is valid for any object that supports it, whatever its class. Syntax is checked at load-time, and syntax failures raise an exception then. An exception is raised at run-time if an expression cannot be evaluated.
Generics	Python code needs no generics at run-time since any type can be passed to a function or added to a collection. The Python interpreter checks the validity of operations on data and raises exceptions if expressions are invalid.
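A small sketch of the duck typing and run-time checking described in Table 2. The function total_length is a hypothetical helper, not part of the Bits repository.

```python
def total_length(items):
    """Sum len(x) for each x; works for any objects that support len()."""
    return sum(len(x) for x in items)

# str, list, tuple, and dict all support len(), so no generics are needed.
ok = total_length(["abc", [1, 2], (1,), {"k": 1}])   # 3 + 2 + 1 + 1

try:
    total_length(["abc", 42])    # int does not support len()
    error_name = None
except TypeError as ex:          # raised at run-time, when len(42) is evaluated
    error_name = type(ex).__name__

print(ok, error_name)   # 7 TypeError
```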

1.0 Initialization

Several of the code blocks shown below have formatting and output code elided. You can find complete code in the Bits Repository: Py_Data.py, Py_DataAnalysis.py.

1.1 Scalar Types

Initialization is the process of endowing a newly created instance with a type and a specified value. Python has no separate declarations; a variable comes into existence when a value is first assigned to it. Referring to a name that has not been assigned raises a NameError.

Note that variable types are defined by the data they refer to. You can use type hints in assignments and function signatures, but the Python execution engine ignores them. Tools like mypy are available to help catch type errors before run-time, but they are not run automatically.

Type hints are illustrated in a later Bit, Generic Python. They are useful for improving code readability in addition to supporting tool-based static analysis. I've used them to enhance user understanding in a few places in several of the Python Bits.
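As a brief illustration of hints being ignored at run-time, here is a sketch with a hypothetical function, double, that is annotated for int but called with a str:

```python
def double(n: int) -> int:
    # The hints say int, but nothing enforces them at run-time.
    return n * 2

print(double(21))      # 42
print(double("ab"))    # abab  (no error, even though a str was passed)

# The hints are only stored as metadata for tools to inspect.
hint_for_n = double.__annotations__["n"]
print(hint_for_n)      # <class 'int'>
```

A static checker such as mypy would be expected to flag the second call; the interpreter does not.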

  #-------------------------------------------
  #   All code used for output has been elided
  #-------------------------------------------

  # NoneType
  n = None

  # boolean
  bl = True
  bl2 = bool()

  # integer
  i = 42
  i2 = -3000000000000000
  i3 = int()

  # float
  f = 3.1415927
  f2 = 3.33333e5
  f3 = 3.33333e55
  f4 = float()

  # complex
  c = 1.5 + 3.0j
  r = c.real     # real part, a float
  i = c.imag     # imaginary part, a float; note that this rebinds i
  c1 = complex()

Instances of scalar types each occupy one block of contiguous heap memory.

These scalars, types with a single value, are initialized by assigning a value.

Each of the scalar types can be initialized with a constructor, T(), where T is one of the scalar types.

Python integers are fundamentally different from integers for C++, Rust, and C#. They can be arbitrarily large, limited only by available memory.
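A quick sketch of that growth. The sizes reported by sys.getsizeof are CPython implementation details, so only their ordering is relied on here.

```python
import sys

small = 42
big = 2 ** 100       # far beyond the range of a 64-bit integer
huge = 2 ** 1000

print(big)           # 1267650600228229401496703205376

# Storage grows with the magnitude of the value.
sizes = [sys.getsizeof(v) for v in (small, big, huge)]
print(sizes[0] < sizes[1] < sizes[2])   # True
```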

A list of Python types is given in the "Python Types Details" section, above.

Output

  --------------------------------------  
    Initialize Scalars
  --------------------------------------  

  -- n = None --
  n <class 'NoneType'>
  value:  None , size:  16

  -- bl = true --
  bl <class 'bool'>
  value:  True , size:  28
  -- bl2 = bool() --
  bl2 <class 'bool'>
  value:  False , size:  24

  -- i = 42 --
  i <class 'int'>
  value:  42 , size:  28
  -- i = -3000000000000000 --
  i2 <class 'int'>
  value:  -3000000000000000 , size:  32
  -- i = int() --
  i3 <class 'int'>
  value: 0 , size:  24

  -- f = 3.1415927 --
  f <class 'float'>
  value:  3.1415927 , size:  24
  -- f2 = 333333.0 --
  f2 <class 'float'>
  value:  333333.0 , size:  24
  -- f3 = 3.33333e+55 --
  f3 <class 'float'>
  value:  3.33333e+55 , size:  24
  -- f4 = float() --
  f4 <class 'float'>
  value:  0.0 , size:  24

  -- c = 1.5 + 3.0j --
  c <class 'complex'>
  value:  1.5+3j , size:  32
  c.real <class 'float'>
  value:  1.5 , size:  24
  c.imag <class 'float'>
  value:  3.0 , size:  24
  -- c1 = complex() --
  c1 <class 'complex'>
  value:  0j , size:  32

Each instance in the code block above is characterized by its value, its type evaluated using reflection with

type(t)

and its size retrieved using reflection with

sys.getsizeof(t)

Both are called in the function showType, defined in Py_DataAnalysis.py and shown in Section 1.5, below.

Scalar types are held in a single block of contiguous heap memory. Copies of all Python types, including scalars, are copies of the handle pointing to a value in heap memory. That results in two handles pointing to the same heap-based object.

The int type has arbitrarily large size, determined by its initialization value.

The float type has finite precision, about 15 to 17 significant decimal digits, but can approximate very large numbers using an exponent. Its contiguous memory allocation holds an IEEE 754 double, partitioned into sign, exponent, and significand parts.
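The finite precision of float shows up in ordinary arithmetic. The following sketch assumes the usual IEEE 754 double representation, which is what CPython uses on common platforms.

```python
import math
import sys

total = 0.1 + 0.2
print(total == 0.3)               # False: neither operand is exactly representable
print(math.isclose(total, 0.3))   # True: compare floats with a tolerance instead

mantissa_bits = sys.float_info.mant_dig   # bits in the significand
print(mantissa_bits)                      # 53 for an IEEE 754 double
print(sys.float_info.max)                 # largest finite float
```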

complex numbers are composed of two floating point numbers for the real and imaginary parts.

1.2 Aggregate Types

  # bytes
  byts = b"python"
  byts1 = byts[0]
  # byts[1] = 2  # error: bytes are immutable
  bytstr = byts.decode('UTF-8')

  # bytearray
  ba = bytearray(4)
  ba[0] = 1
  ba[1] = 0xee
  ba[2] = 0
  ba[3] = 1

  # str
  s = "hello python"

  # range
  r = range(6)
  for n in r :
    print(n, " ", end="")
  print()   # end the line once, after the loop
  # 0  1  2  3  4  5
  r2 = range(-1, 10, 2)
  for n in r2 :
    print(n, " ", end="")
  print(nl)   # nl = "\n", defined in Py_DataAnalysis.py
  # -1  1  3  5  7  9

  # tuple
  t = (42, 3.1415927, 'z')
  telem2 = t[1]

  # list
  l = [1, 2, 3, 2, 1]
  l.append("weird")
  l0 = l[0]

  # dict
  d = { "zero": 0, "one": 1, "two":2 }
  d["three"] = 3    # insert new element
  d2 = d["two"]     # access value at key = "two"

This code block illustrates initialization, access, and modification of selected aggregates.

Aggregates are instances of types with child members, e.g., bytes, str, tuple, list, dict ... The general-purpose containers tuple, list, and dict can hold elements of any type, independent of the types of the other contained elements.

The number of elements of any Python collection instance is returned by the function len(coll).

bytes are an immutable, contiguous, indexable collection of binary data, initialized with a bytes literal, b"...", which may contain hex escapes such as \xee.

A bytearray is a mutable, contiguous, indexable collection of binary data. Each byte is initialized and accessed by indexing.

Instances of the str type are immutable sequences of Unicode characters. Several member functions return new strings, e.g., replace(), format(), lower(), ...

range(start, stop, increment) generates a sequence of integers, often used to step through indexed collections with a for statement.

tuples are immutable finite collections of child elements, accessed by index.

lists are mutable, ordered collections of elements. lists have methods append(x), copy(), insert(n, x), ...

dicts are mutable associative collections of key-value pairs. Element values are accessed by key, using an index syntax. They are based on hash tables and so have nearly constant-time look-up and insertion, on average.
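Of the aggregates described above, bytes and bytearray differ mainly in mutability. A minimal sketch, with values chosen only for illustration:

```python
byts = b"py\xee"          # bytes literal with a hex escape
ba = bytearray(byts)      # mutable copy of the same three bytes
ba[0] = 0x50              # index assignment works on a bytearray
ba.append(0x21)           # a bytearray can also grow

try:
    byts[0] = 0x50        # bytes does not support item assignment
    bytes_error = None
except TypeError as ex:
    bytes_error = type(ex).__name__

print(byts, len(byts))    # b'py\xee' 3
print(ba, len(ba))        # bytearray(b'Py\xee!') 4
print(bytes_error)        # TypeError
```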

Output

  ----------------------------------------
  Initialize Aggregates
  ----------------------------------------

  -- byts = b"python" --
  byts <class 'bytes'>
  value:  b'python' , size:  39
  byts[0] <class 'int'>
  value:  112 , size:  28
  byts.decode(UTF-8) <class 'str'>
  value:  python , size:  55
  byts.decode(UTF-8)[0] <class 'str'>
  value:  p , size:  50

  -- ba = bytearray(4) --
  ba <class 'bytearray'>
  value:  bytearray(b'\x01\xee\x00\x01') , size:  61

  -- s = "hello python" --
  s <class 'str'>
  value:  hello python , size:  61
  empty str <class 'str'>
  value:   , size:  49
  1 char str <class 'str'>
  value:  h , size:  50
  2 char str <class 'str'>
  value:  he , size:  51
  s[0] <class 'str'>
  value:  h , size:  50

  -- r = range(6) --
  range(6) <class 'range'>
  value:  range(0, 6) , size:  48
  0  1  2  3  4  5
  range(-1, 10, 2) <class 'range'>
  value:  range(-1, 10, 2) , size:  48
  -1  1  3  5  7  9

  -- t = (42, 3.1415927, 'z') --
  t <class 'tuple'>
  value:  (42, 3.1415927, 'z') , size:  64
  t[1] <class 'float'>
  value:  3.1415927 , size:  24

  -- l = [1, 2, 3, 2, 1] --
  l <class 'list'>
  value:  [1, 2, 3, 2, 1] , size:  104
  -- l.append("weird") --
  l <class 'list'>
  value:  [1, 2, 3, 2, 1, 'weird'] , size:  104
  -- l0 = l[0] --
  l0 <class 'int'>
  value:  1 , size:  28

  -- d = { "zero": 0, "one": 1, "two":2 } --
  d <class 'dict'>
  value:  {'zero': 0, 'one': 1, 'two': 2}, 
  size:  232
  -- d["three"] = 3 --
  d <class 'dict'>
  value:  
  {'zero': 0, 'one': 1, 'two': 2, 'three': 3},
  size:  232
  -- d2 = d["two"] --
  d2 <class 'int'>
  value:  2 , size:  28

bytes are immutable binary data used to read files, pack messages, ... They can't be changed after initialization.

bytearrays are resizable collections of mutable binary data.

Instances of the str type hold immutable Unicode character data. UTF-8 is the default encoding used when converting between str and bytes with encode() and decode(). An indexed element of a str is a str instance of length 1, i.e., there is no separate char type.
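A short sketch of str indexing and encoding, using a sample string with one non-ASCII character (written with an escape so the source stays ASCII):

```python
s = "h\u00e9llo"             # five characters; the second is e with an acute accent
first = s[0]                 # indexing yields a str of length 1, not a char
encoded = s.encode()         # str -> bytes, UTF-8 by default
round_trip = encoded.decode()    # bytes -> str, UTF-8 by default

print(type(first).__name__, len(first))   # str 1
print(len(s), len(encoded))               # 5 6  (the accented character takes 2 bytes)
print(round_trip == s)                    # True
```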

range(start, stop, increment) is useful for stepping through parts of a collection. Its values are integers, often used as indexes into an indexable collection.
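For example, a range can start below zero, count down, or select every other index of a collection. The names in this sketch are illustrative only.

```python
r2 = range(-1, 10, 2)
values = list(r2)                     # materialize the sequence
countdown = list(range(5, 0, -1))     # a negative increment counts down

letters = "python"
every_other = [letters[i] for i in range(0, len(letters), 2)]

print(values)        # [-1, 1, 3, 5, 7, 9]
print(countdown)     # [5, 4, 3, 2, 1]
print(every_other)   # ['p', 't', 'o']
print(len(r2), 9 in r2, 10 in r2)   # 6 True False
```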

A tuple is a finite group of immutable values, often used for passing data to or from a function.

list is a mutable ordered indexable collection of values. It is used in Python (and in C#) where other languages use vectors, i.e., for processing collections of data built at run-time during program execution.

Instances of the dict type are associative containers that use key-value pairs to store data. A key is a look-up token that is hashed to a table address. In CPython the table uses open addressing: if the slot at that address holds a different key, a sequence of further slots is probed until the specified key or an empty slot is found.
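Because keys are located by hashing, a key must be hashable, but keys in one dict need not share a type. A minimal sketch with made-up keys:

```python
d = {"zero": 0, 1: "one", (2, 3): [2, 3]}   # str, int, and tuple keys together
d[4.5] = None                               # a float key works too

try:
    d[[1, 2]] = "list key"      # a list is mutable, so it is not hashable
    key_error = None
except TypeError as ex:
    key_error = type(ex).__name__

keys_in_order = list(d)         # dicts preserve insertion order (Python 3.7+)
print(keys_in_order)            # ['zero', 1, (2, 3), 4.5]
print(key_error)                # TypeError
```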

1.3 Copy and Modify Scalar

  #-- Copy and Modify scalar --

  t1 = 42  # t1 is handle pointing to value 42 in heap
  t2 = t1  # copy handle to existing heap-based value
  t1 = 0   # reset handle to new heap object

All Python data is stored in the managed heap. Copies, even of scalars, result in two handles pointing to the same managed object.

One of the handles, as shown here, can be programmatically reset to a new object.

Output

  ----------------------------------------
    Copy and Modify scalar
  ----------------------------------------  

  -- t1 = 42 --
  t1 <class 'int'>
  value:  42 , size:  28
  -- t2 = t1 --
  t2 <class 'int'>
  value:  42 , size:  28
  t1: address: 0x1dc78420610
  t2: address: 0x1dc78420610
  ----------------------------------------
  After copy construction t1 and t2 have
  same address, e.g., two handles pointing
  to the same heap integer object.
  ----------------------------------------
  -- t1 = 0  # new object --
  t1 <class 'int'>
  value:  0 , size:  24
  t1: address: 0x1dc784200d0
  t2 <class 'int'>
  value:  42 , size:  28
  t2: address: 0x1dc78420610
  ----------------------------------------
  After setting new value for t1,
  t1 and t2 have unique addresses,
  e.g., two handles pointing to different
  heap integer objects.
  ----------------------------------------

The statement t2 = t1 constructs a new t2 handle from a copy of t1's handle.

Note that t1 and t2 have the same address, i.e., they are two handles pointing to the same underlying object.

The languages C++, Rust, and C# all make unique copies of scalar types. Python and JavaScript do not, as shown here.

Resetting a handle by assigning a new value binds it to a different object. That is, t1 now refers to the object 0, but that does not affect t2, which still refers to 42.
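The same behavior can be checked without printing addresses by using the is operator, which compares object identity. This sketch reuses the names t1 and t2 from the code above; same_before and same_after are illustrative.

```python
t1 = 42
t2 = t1                     # copy the handle, not the object
same_before = t1 is t2      # both handles refer to one object

t1 = 0                      # rebind t1; the object 42 itself is not modified
same_after = t1 is t2       # the handles now refer to different objects

print(same_before, same_after, t2)   # True False 42
```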

1.4 Copy and Modify Aggregates

  #-- copy and modify aggregates --

  t3 = [1, 2, 3, 2, 1]  # handle pointing to list in heap
  t4 = t3  # copy construction
  t3.append(0)  # changes value, doesn't create new object

  t5 = "Hello Python"  # handle pointing to string in heap
  t6 = t5   # copy construction
  t6 = t6.replace("P", "p")  # copy on write creates new object

For all mutable aggregate types, modifying a value in place affects all variables that refer to that value.

Values of str are immutable, so the result resembles copy on write: member functions that appear to modify the value return a new string without affecting the original stored value.

Output

  ----------------------------------------
  copy and modify aggregate
  ----------------------------------------

  -- t3 = [1, 2, 3, 2, 1] --
  t3 <class 'list'>
  value:  [1, 2, 3, 2, 1] , size:  104
  -- t4 = t3 --
  t4 <class 'list'>
  value:  [1, 2, 3, 2, 1] , size:  104
  t3: address: 0x248ce392280
  t4: address: 0x248ce392280
  ----------------------------------------
  After copy construction t3 and t4 have
  same address, e.g., two handles pointing
  to the same heap integer object.
  ----------------------------------------
  -- t3.append(0)  # modify object --
  t3 <class 'list'>
  value:  [1, 2, 3, 2, 1, 0] , size:  104
  t3: address: 0x248ce392280
  t4 <class 'list'>
  value:  [1, 2, 3, 2, 1, 0] , size:  104
  t4: address: 0x248ce392280
  ----------------------------------------
  After appending new value for t3,
  t3 and t4 still have same value and
  address, e.g., two handles pointing
  to same heap integer object. No copy
  on write.
  ----------------------------------------

  -- t5 = "Hello Python" --
  t5 <class 'str'>
  value:  Hello Python , size:  61
  -- t6 = t5  # copy construction --
  t6 <class 'str'>
  value:  Hello Python , size:  61
  t5: address: 0x248ce3ac5f0
  t6: address: 0x248ce3ac5f0
  ----------------------------------------
  After copy construction t5 and t6 have
  same address, e.g., two handles pointing
  to the same heap string object.
  ----------------------------------------
  -- t6 = t6.replace("P", "p")  # copy on write --
  t6 <class 'str'>
  value:  Hello python , size:  61
  t5 <class 'str'>
  value:  Hello Python , size:  61
  t6: address: 0x248ce3ada70
  t5: address: 0x248ce3ac5f0
  ----------------------------------------
  After modifying value for t6,
  t5 and t6 have different values
  and addresses, e.g., string has
  copy on write.
  ----------------------------------------

t3 is an instance of the list type. Assignment creates a new handle, t4, to the same heap value.

Modification of the value referred to by t3, by appending the value 0, is also seen by the handle t4.

The same is true for all mutable aggregate types, e.g., bytearray, list, dict, ...

Immutable data types, e.g., bytes, str, ... support modification via methods that generate a new, modified, instance. That is subsequently assigned to a handle for later use.
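When an independent list is wanted, one option is an explicit copy. The sketch below contrasts an alias with a shallow copy made by list.copy(), and repeats the str case; the name t7 is hypothetical and not in the Bits code.

```python
t3 = [1, 2, 3]
t4 = t3               # alias: second handle to the same list
t7 = t3.copy()        # shallow copy: a new list object with the same elements
t3.append(0)          # in-place change, visible through t4 but not t7

t5 = "Hello Python"
t6 = t5
t6 = t6.replace("P", "p")   # returns a new str; t5 is unchanged

print(t4)             # [1, 2, 3, 0]
print(t7)             # [1, 2, 3]
print(t5, "|", t6)    # Hello Python | Hello python
```

A shallow copy shares any nested child objects with the original; copy.deepcopy is the tool for fully independent clones.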

1.5 Functions

Python functions accept arguments of any type, so one function can be used for more than one type without the generic code scaffolding needed in C++, Rust, and C#.

That makes the code much simpler, but now the developer must ensure valid function arguments without the help of compiler diagnostics.

For quick small prototypes, that works well. For large code bases, getting to valid arguments requires careful implementation with a lot of trial and error.

Python makes a very good prototyping language, but static typing is much more effective when working with large projects.

Function Code

  #-- Analysis and Display Functions --
  # Python/Py_Data::Py_DataAnalysis.py
  #
  import sys

  nl = "\n"

  # displays type, value, and size of apex object
  # - does not account for sizes of descendant objects
  def showType(t, nm, suffix = "") :
    print(nm, type(t))
    print(
      "value: ", t, ', size: ', sys.getsizeof(t), suffix  
    )

  # id is heap address
  def showIdent(t, nm, suffix = "") :
    print(nm, ": ", hex(id(t)), suffix, sep='')

  # evaluates heap address
  def showAddress(t, nm, suffix = "") :
    print(nm, "address: ", hex(id(t)), suffix, sep='')

  # show text encased in upper and lower lines
  def showNote(text, suffix = "") :
    print(
      "----------------------------------------"
    )
    print(" ", text)
    print(
      "----------------------------------------", suffix
    )

  # show text enclosed in -- delimiters on same line
  def showOp(text, suffix = ""):
    print("--", text, "--", suffix)

The function showType displays the caller-supplied name, type, value, and size of the first argument. This uses reflection for type with type(t) and for size with sys.getsizeof(t).

getsizeof evaluates the apex object from t's object graph, but not the size of its children.
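If the size of the whole object graph is wanted, one possible approach is to walk the children and sum their sizes. The function deep_size below is a hypothetical sketch, not part of Py_DataAnalysis.py; it handles only the built-in containers and counts each shared object once.

```python
import sys

def deep_size(obj, seen=None):
    """Approximate size of obj plus the objects it contains."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # already counted, or a reference cycle
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)    # apex object only
    if isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_size(item, seen) for item in obj)
    return size

nested = [[1, 2, 3], "text", {"k": (4, 5)}]
shallow = sys.getsizeof(nested)   # size of the outer list only
deep = deep_size(nested)          # outer list plus all descendants
print(deep > shallow)             # True
```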

The functions showIdent and showAddress are the same except for display of the string "address: " in the latter. CPython uses the address of an object's heap location as its identifier, returned by id(t).

Function showNote displays text with prefix and suffix lines to highlight its message.

Function showOp displays text between two "--" delimiters. It is often used to display an operation that generates data, like construction or assignment.

2.0 VS Code View

The code for this demo is available in github.com/JimFawcett/Bits. If you click on the Code dropdown you can clone the repository of all code for these demos to your local drive. Then, it is easy to bring up any example, in any of the languages, in VS Code.

Here, we do that for Python\Python_Data.

Figure 1. VS Code IDE - Python Data

Figure 2. Python Data Launch.JSON

3.0 References

Reference	Description
Python Data Types - w3schools	Interactive examples
Python Data Types - docs.python.org	Detailed description of Python Types
Character Encodings	Detailed summary of Python strs and encodings.
Python Libraries	Summaries of 14 popular libraries and modules.