Dec Code Story

Chapter #10 – Immutability & Persistent Data Structures

persistent lists/trees/maps, structural sharing, transient optimization, copy-on-write

10.0  Prologue

Declarative programming favors immutable data: once created, a value never changes. But programs must represent change - adding an item to a set, updating a field in a record. Persistent data structures solve this by returning a new version of the structure on update while sharing as much of the old structure as possible. The old version remains accessible and unchanged.

10.1  Persistent Lists, Trees, and Maps

A persistent data structure guarantees that all previous versions are preserved after an “update.” The update returns a new root pointing to the modified path; all unmodified nodes are shared. -- Haskell: singly-linked list is naturally persistent list1 = [3, 4, 5] list2 = 1 : 2 : list1 -- [1,2,3,4,5] -- list1 and list2 share the [3,4,5] tail; no copying -- Persistent balanced tree (Haskell Data.Map is an AVL tree) import qualified Data.Map.Strict as Map m1 = Map.fromList [("a", 1), ("b", 2)] m2 = Map.insert "c" 3 m1 -- new map; m1 unchanged -- m2 shares most of m1's internal tree structure -- Clojure: persistent vector (wide branching factor trie) (def v1 [1 2 3 4]) (def v2 (conj v1 5)) ; v2 = [1 2 3 4 5]; v1 unchanged Haskell’s standard library uses persistent data structures throughout: Data.Map, Data.Set, and Data.Sequence are all persistent. Clojure builds all of its core collections on persistent tries.

10.2  Structural Sharing

Structural sharing is the mechanism that makes persistence efficient. An update allocates only the nodes along the modified path; all other nodes are shared (referenced) by both the old and new versions. -- Updating a balanced tree of depth d requires O(log n) new nodes; -- O(n) nodes from the original tree are shared -- Singly-linked list: prepend is O(1), sharing the entire original original = [3, 4, 5] -- 3 nodes extended = 1 : 2 : original -- 2 new nodes + shares the 3 original nodes -- Persistent vector (Clojure-style trie with branching factor 32): -- update of one element at index i requires ~log_32(n) new nodes -- a 1-million element vector requires only ~4 new nodes per update Because values are immutable, sharing is safe: no thread can modify a shared node, so two data structures referencing the same node always see the same data. This makes persistent structures naturally thread-safe.

10.3  Transient Optimization

Building a persistent data structure by repeated single-element insertions requires O(n log n) allocations. A transient mode allows in-place mutation during construction, then converts to an immutable persistent structure when done. -- Clojure: transient for bulk construction (defn build-map [pairs] (persistent! (reduce (fn [m [k v]] (assoc! m k v)) (transient {}) pairs))) -- Haskell: Data.Map.fromList uses an efficient bulk-build algorithm -- internally (not transient, but avoids O(n log n) individual inserts) m = Map.fromList [(k, v) | ...] -- Rust: HashMap::with_capacity pre-allocates to avoid rehashing let mut map = HashMap::with_capacity(1000); for (k, v) in pairs { map.insert(k, v); } let frozen = map; // owned; no further mutation needed Transient construction does not violate the persistence guarantee: the transient version is not accessible outside the construction scope, so no code observes the intermediate mutable state.

10.4  Copy-on-Write

Copy-on-write (CoW) is a simpler sharing strategy: multiple owners share a single copy until any of them tries to mutate it; at that point, the mutating owner receives a private copy. -- Rust: Cow<'a, B> (Clone-on-write smart pointer) use std::borrow::Cow; fn process(input: &str) -> Cow<str> { if input.contains(" ") { // Only allocate a new String if we actually need to modify Cow::Owned(input.replace(" ", " ")) } else { // Zero-copy path: borrow the original Cow::Borrowed(input) } } -- Python: copy-on-write semantics for integers and strings (interning) a = "hello" b = a # b and a reference the same object b = b + "!" # new string created; a is unchanged CoW is used in operating systems (fork copies page tables, not memory pages), in Rust’s Cow type for avoiding allocations on read-only paths, and in string interning. It is simpler than full structural sharing but provides no O(log n) update guarantee.

10.5  Epilogue

Persistent data structures reconcile immutability with efficiency. Structural sharing bounds the cost of “updates” to the changed path; transient modes handle bulk construction; CoW handles the read-mostly case. Together they make immutable-by-default programming practical for production workloads. The next chapter examines how declarative approaches tame concurrent computation.

10.6  References

Persistent Data Structure - Wikipedia
Understanding Clojure’s Persistent Vector - Hypirion
Haskell Data.Map.Strict - Hackage
Rust Cow - std