BasicsStory_Concurrency.html
copyright © James Fawcett
Revised: 04/29/2026
11.0 Prologue
Modern hardware has many cores; modern software handles many simultaneous requests.
Concurrency is the practice of managing multiple tasks that can overlap in time.
It is also one of the hardest areas of software to reason about correctly, because
the interactions between concurrent tasks depend on timing that varies at runtime.
11.1 Parallelism vs. Concurrency
These terms are related but distinct:
-
Concurrency means tasks can start, run, and complete in
overlapping time periods. On a single core this is achieved by interleaving:
the OS rapidly switches between tasks, giving the appearance of simultaneous
progress.
-
Parallelism means tasks literally run at the same instant
on different CPU cores or different machines. Parallelism requires multiple
execution units.
A web server is concurrent (handles many requests, each waiting for I/O) even on
one core. Matrix multiplication is parallel (uses all cores for pure computation).
Many real systems are both: concurrent for managing I/O, parallel for compute.
11.2 OS Threads
OS threads are the OS-managed unit of execution. They share the process address
space and are scheduled by the OS kernel onto available CPU cores. Creating a thread
costs ~1–50 µs; switching between threads costs ~1–5 µs (a
context switch). A typical server can run hundreds to a few thousand threads before
overhead becomes significant.
Language threading APIs:
- C++: std::thread
- Java: Thread, ExecutorService
- Python: threading.Thread (limited by the GIL for CPU-bound work)
- Rust: std::thread::spawn
- Go: goroutines (lightweight; runtime-scheduled, not OS threads)
Green threads / goroutines / fibers: lightweight user-space threads
managed by the language runtime rather than the OS. Go goroutines start at ~2 KB of
stack and can number in the millions on one machine.
11.3 Shared State and Data Races
A data race occurs when:
- Two or more threads access the same memory location concurrently, and
- at least one access is a write, and
- there is no synchronisation between the threads.
Data races produce undefined behaviour: wrong results, crashes, security
vulnerabilities, or intermittent failures that are nearly impossible to reproduce.
They are the number-one source of bugs in concurrent code.
Consider a shared integer counter incremented by two threads:
// Thread A Thread B
read counter (= 5) read counter (= 5)
add 1 (= 6) add 1 (= 6)
write counter (= 6) write counter (= 6)
// Expected final value: 7. Actual: 6. Lost update.
11.4 Synchronisation Primitives
Synchronisation ensures that concurrent accesses to shared state are ordered correctly:
-
Mutex (mutual exclusion lock): only one thread may hold the lock
at a time. Other threads block until it is released. Eliminates data races on the
protected data at the cost of contention overhead and potential deadlock.
-
Read-write lock (RWLock): allows multiple concurrent readers
or one exclusive writer. Better throughput for read-heavy workloads.
-
Semaphore: a counter; threads may proceed only if the counter
is positive, decrementing it on entry and incrementing on exit. Limits concurrent
access to a resource pool (e.g., a connection pool of N database connections).
-
Condition variable: paired with a mutex; lets a thread sleep
until another thread signals that a condition holds (e.g., the queue is non-empty).
-
Atomic operations: single indivisible read-modify-write (e.g.,
compare-and-swap, fetch-add). No locking overhead; the CPU guarantees atomicity.
Suitable for simple shared counters and lock-free data structures.
11.5 Message Passing
Instead of sharing memory with locks, threads or processes can communicate by
passing messages through channels. The sender writes a message; the receiver reads
it. Data ownership transfers with the message — no shared mutable state,
no data races.
Go’s motto captures the intent:
“Do not communicate by sharing memory; instead, share memory by
communicating.”
Message-passing APIs:
- Go: built-in chan type; ch <- value
to send, value := <-ch to receive.
- Rust: std::sync::mpsc::channel() (multi-producer,
single-consumer); async channels in Tokio.
- Erlang / Elixir: actor model; every actor is a process;
send and receive are the only communication primitives.
- Python: multiprocessing.Queue,
asyncio.Queue.
11.6 Async/Await and Event Loops
For I/O-bound work, spawning an OS thread per request is wasteful: each blocked
thread still holds ~1–8 MB of stack and a kernel-side control block while
waiting for the network. An event loop solves this with one
thread that services many pending I/O operations concurrently.
The pattern:
- An async function runs until it hits an await point (e.g.,
waiting for a database response).
- The runtime suspends the function (saving its state in a tiny heap object)
and switches to another ready task.
- When the I/O completes, the runtime marks the suspended function ready and
resumes it.
This cooperative multitasking allows thousands of concurrent I/O operations on
one thread with very low memory overhead. CPU-bound work must be offloaded to a
thread pool to avoid blocking the event loop.
Async runtimes: Node.js (JavaScript), asyncio (Python), Tokio (Rust),
async/await in C# (TPL), Project Loom virtual threads (Java 21+).
11.7 Common Concurrency Bugs
-
Deadlock: thread A holds lock X and waits for lock Y;
thread B holds lock Y and waits for lock X. Both block forever. Prevention:
always acquire locks in the same order, or use a lock hierarchy.
-
Livelock: threads keep retrying and yielding but make no
progress, like two people stepping aside for each other in a corridor.
-
Starvation: a thread is perpetually denied access to a
resource because higher-priority threads always win.
-
Priority inversion: a high-priority thread waits on a lock
held by a low-priority thread. The low-priority thread may itself be preempted
by medium-priority threads, delaying the high-priority one indefinitely.
Mitigated by priority inheritance in real-time systems.
-
False sharing: two threads modify different variables that
happen to occupy the same CPU cache line. Each write invalidates the other
core’s cached copy, causing unnecessary cache-coherence traffic.
Mitigated by padding or aligning data to cache-line boundaries.
11.8 Epilogue
Concurrency is unavoidable in modern software. The key insight is that most problems
stem from mutable shared state. Minimise sharing, prefer message passing or immutable
data, use atomic operations for simple counters, and reach for mutexes only when
necessary. The final chapter covers deployment: how to take working software and
make it available to users reliably.
11.9 References
OSTEP — Concurrency Introduction (PDF chapter)
Effective Go — Concurrency
Tokio — async Rust runtime
Deadlock — Wikipedia