BasicsDecCodeStory_Persistence.html
copyright © James Fawcett
Revised: 05/11/2026
10.0 Prologue
Declarative programming favors immutable data: once created, a value never changes.
But programs must represent change - adding an item to a set, updating a field
in a record. Persistent data structures solve this by returning a
new version of the structure on update while sharing as much of the old structure
as possible. The old version remains accessible and unchanged.
10.1 Persistent Lists, Trees, and Maps
A persistent data structure guarantees that all previous versions
are preserved after an “update.” The update returns a new root pointing
to the modified path; all unmodified nodes are shared.
-- Haskell: singly-linked list is naturally persistent
list1 = [3, 4, 5]
list2 = 1 : 2 : list1 -- [1,2,3,4,5]
-- list1 and list2 share the [3,4,5] tail; no copying
-- Persistent balanced tree (Haskell Data.Map is an AVL tree)
import qualified Data.Map.Strict as Map
m1 = Map.fromList [("a", 1), ("b", 2)]
m2 = Map.insert "c" 3 m1 -- new map; m1 unchanged
-- m2 shares most of m1's internal tree structure
-- Clojure: persistent vector (wide branching factor trie)
(def v1 [1 2 3 4])
(def v2 (conj v1 5)) ; v2 = [1 2 3 4 5]; v1 unchanged
Haskell’s standard library uses persistent data structures throughout:
Data.Map, Data.Set, and Data.Sequence are
all persistent. Clojure builds all of its core collections on persistent tries.
10.2 Structural Sharing
Structural sharing is the mechanism that makes persistence efficient.
An update allocates only the nodes along the modified path; all other nodes are
shared (referenced) by both the old and new versions.
-- Updating a balanced tree of depth d requires O(log n) new nodes;
-- O(n) nodes from the original tree are shared
-- Singly-linked list: prepend is O(1), sharing the entire original
original = [3, 4, 5] -- 3 nodes
extended = 1 : 2 : original -- 2 new nodes + shares the 3 original nodes
-- Persistent vector (Clojure-style trie with branching factor 32):
-- update of one element at index i requires ~log_32(n) new nodes
-- a 1-million element vector requires only ~4 new nodes per update
Because values are immutable, sharing is safe: no thread can modify a shared node,
so two data structures referencing the same node always see the same data. This makes
persistent structures naturally thread-safe.
10.3 Transient Optimization
Building a persistent data structure by repeated single-element insertions requires
O(n log n) allocations. A transient mode allows in-place mutation
during construction, then converts to an immutable persistent structure when done.
-- Clojure: transient for bulk construction
(defn build-map [pairs]
(persistent!
(reduce (fn [m [k v]] (assoc! m k v))
(transient {})
pairs)))
-- Haskell: Data.Map.fromList uses an efficient bulk-build algorithm
-- internally (not transient, but avoids O(n log n) individual inserts)
m = Map.fromList [(k, v) | ...]
-- Rust: HashMap::with_capacity pre-allocates to avoid rehashing
let mut map = HashMap::with_capacity(1000);
for (k, v) in pairs { map.insert(k, v); }
let frozen = map; // owned; no further mutation needed
Transient construction does not violate the persistence guarantee: the transient
version is not accessible outside the construction scope, so no code observes
the intermediate mutable state.
10.4 Copy-on-Write
Copy-on-write (CoW) is a simpler sharing strategy: multiple owners
share a single copy until any of them tries to mutate it; at that point, the mutating
owner receives a private copy.
-- Rust: Cow<'a, B> (Clone-on-write smart pointer)
use std::borrow::Cow;
fn process(input: &str) -> Cow<str> {
if input.contains(" ") {
// Only allocate a new String if we actually need to modify
Cow::Owned(input.replace(" ", " "))
} else {
// Zero-copy path: borrow the original
Cow::Borrowed(input)
}
}
-- Python: copy-on-write semantics for integers and strings (interning)
a = "hello"
b = a # b and a reference the same object
b = b + "!" # new string created; a is unchanged
CoW is used in operating systems (fork copies page tables, not memory pages), in
Rust’s Cow type for avoiding allocations on read-only paths, and
in string interning. It is simpler than full structural sharing but provides no
O(log n) update guarantee.
10.5 Epilogue
Persistent data structures reconcile immutability with efficiency. Structural sharing
bounds the cost of “updates” to the changed path; transient modes handle
bulk construction; CoW handles the read-mostly case. Together they make
immutable-by-default programming practical for production workloads. The next chapter
examines how declarative approaches tame concurrent computation.
10.6 References
Persistent Data Structure - Wikipedia
Understanding Clojure’s Persistent Vector - Hypirion
Haskell Data.Map.Strict - Hackage
Rust Cow - std