Basics Story

Chapter #10 – Data Structures & Algorithms

arrays, lists, stacks, queues, maps, trees, graphs, Big-O

10.0  Prologue

Data structures organise information so it can be efficiently stored and retrieved. Algorithms are the procedures that operate on them. Choosing the right data structure is often the single largest factor in program performance: a hash map lookup is O(1) while a linear scan through an unsorted array is O(n) — a difference of millions of operations at scale.

10.1  Arrays

An array is a contiguous block of same-type elements stored sequentially in memory. It is the foundation of nearly every other data structure.
  • Random access: O(1) — element i is at address base + i × element_size.
  • Append (dynamic array): amortised O(1) — when the backing buffer is full, double it and copy.
  • Insert/delete in middle: O(n) — must shift subsequent elements.
  • Cache-friendly: elements are adjacent; accessing them sequentially maximises cache line utilisation.
Language names: Vec<T> (Rust), list (Python), ArrayList<T> (Java), std::vector<T> (C++), Array<T> (C#).

10.2  Linked Lists

A linked list is a chain of nodes where each node holds a value and a pointer to the next node. A doubly-linked list also holds a pointer to the previous node.
  • Insert/delete at known node: O(1) — just rewire pointers.
  • Search / access by index: O(n) — must traverse from the head.
  • Cache-unfriendly: nodes are scattered in the heap; accessing consecutive nodes often causes cache misses.
Use linked lists when you need frequent insertion or deletion in the middle and never need random access by index. In practice, arrays are almost always faster for real workloads due to cache effects, even when a linked list would have better theoretical complexity.

10.3  Stacks and Queues

Stack (LIFO — Last In, First Out):
  • Operations: push (add to top), pop (remove from top), peek (view top) — all O(1).
  • Uses: function call frames, undo/redo history, expression evaluation, DFS graph traversal.
Queue (FIFO — First In, First Out):
  • Operations: enqueue (add to back), dequeue (remove from front) — O(1) with a circular buffer or deque.
  • Uses: task scheduling, BFS graph traversal, message passing between threads.
A deque (double-ended queue) supports O(1) insert and remove at both ends. Python’s collections.deque, Rust’s VecDeque<T>.

10.4  Hash Maps

A hash map (dictionary, associative array) maps arbitrary keys to values using a hash function. The hash function converts a key to an integer bucket index.
  • Lookup, insert, delete: average O(1); worst case O(n) if all keys hash to the same bucket (collisions).
  • Unordered: iteration order is not guaranteed to match insertion order (though Python dicts and Java LinkedHashMap preserve it).
  • Load factor: the ratio of stored entries to buckets. When it exceeds a threshold (typically 0.75), the map is rehashed into a larger array, an O(n) operation that happens infrequently.
Hash maps are used for: symbol tables, caches, frequency counting, de-duplication, and any mapping from one type to another. They are the most widely used non-array data structure in practice.

10.5  Trees

A tree is a hierarchical data structure where each node has zero or more children and exactly one parent (except the root). Important variants:
  • Binary Search Tree (BST): each node’s left subtree has smaller keys, right has larger. O(log n) search, insert, and delete if balanced; O(n) worst case for a degenerate (sorted-input) tree.
  • Self-balancing BST (Red-Black, AVL): guarantees O(log n) by rebalancing after inserts and deletes. Used in most standard library sorted maps (std::map in C++, TreeMap in Java, BTreeMap in Rust).
  • B-Tree: wide branching factor; each node holds many keys. Minimises disk reads, so used in databases (PostgreSQL, SQLite) and file systems (NTFS, HFS+).
  • Binary Heap: a complete binary tree where each parent is ≥ its children (max-heap). O(1) maximum access, O(log n) insert and remove-max. Used to implement priority queues and heap sort.
  • Trie (prefix tree): each edge represents one character of a string. O(m) search (m = string length) regardless of the number of stored strings. Used in autocomplete and IP routing tables.

10.6  Graphs

A graph is a set of nodes (vertices) connected by edges. Directed or undirected; weighted or unweighted.
  • Adjacency list: each node stores a list of its neighbours. O(V + E) space (V = vertices, E = edges). Efficient for sparse graphs.
  • Adjacency matrix: a V×V boolean (or weight) matrix. O(V²) space; O(1) edge lookup. Efficient for dense graphs.
Essential graph algorithms:
  • BFS (Breadth-First Search): visits all nodes at distance k before nodes at distance k+1. Finds shortest paths in unweighted graphs. O(V + E).
  • DFS (Depth-First Search): explores as deep as possible before backtracking. Used for cycle detection, topological sort, connected components. O(V + E).
  • Dijkstra’s algorithm: shortest path in a weighted graph (non-negative weights). O((V + E) log V) with a priority queue.
  • Topological sort: linear ordering of vertices in a DAG such that all edges go forward. Used by build systems, package managers, and task schedulers.

10.7  Algorithmic Complexity (Big-O)

Big-O notation describes how an algorithm’s resource use (time or space) grows as the input size n grows, ignoring constant factors.
Class Name Example n=1 000 000
O(1) Constant Array index, hash map lookup 1 op
O(log n) Logarithmic Binary search, balanced BST ~20 ops
O(n) Linear Linear scan, BFS/DFS 1 M ops
O(n log n) Linearithmic Merge sort, heap sort ~20 M ops
O(n²) Quadratic Bubble sort, naive string match 10¹² ops
O(2⊃n) Exponential Naive subset enumeration vastly infeasible
Rule of thumb: O(n²) is fine for n < 1 000 in a non-critical path; O(n log n) is the typical target for sorting; O(n) or O(n log n) for everything in a hot loop.

10.8  Epilogue

Data structures and algorithms are the vocabulary of efficient software. A working knowledge of the structures above — when to use each and what their complexity guarantees are — is the foundation for reading and writing performant code in any language. The next chapter addresses concurrency: running multiple tasks at once.

10.9  References

VisuAlgo — interactive data structure animations
GeeksforGeeks — Data Structures
Big O Notation — Wikipedia
LeetCode — practice problems