Basics Story

Chapter #11 – Concurrency

threads, shared state, synchronisation, message passing, async/await

11.0  Prologue

Modern hardware has many cores; modern software handles many simultaneous requests. Concurrency is the practice of managing multiple tasks that can overlap in time. It is also one of the hardest areas of software to reason about correctly, because the interactions between concurrent tasks depend on timing that varies at runtime.

11.1  Parallelism vs. Concurrency

These terms are related but distinct:
  • Concurrency means tasks can start, run, and complete in overlapping time periods. On a single core this is achieved by interleaving: the OS rapidly switches between tasks, giving the appearance of simultaneous progress.
  • Parallelism means tasks literally run at the same instant on different CPU cores or different machines. Parallelism requires multiple execution units.
A web server is concurrent (handles many requests, each waiting for I/O) even on one core. Matrix multiplication is parallel (uses all cores for pure computation). Many real systems are both: concurrent for managing I/O, parallel for compute.

11.2  OS Threads

OS threads are the OS-managed unit of execution. They share the process address space and are scheduled by the OS kernel onto available CPU cores. Creating a thread costs ~1–50 µs; switching between threads costs ~1–5 µs (a context switch). A typical server can run hundreds to a few thousand threads before overhead becomes significant. Language threading APIs:
  • C++: std::thread
  • Java: Thread, ExecutorService
  • Python: threading.Thread (limited by the GIL for CPU-bound work)
  • Rust: std::thread::spawn
  • Go: goroutines (lightweight; runtime-scheduled, not OS threads)
Green threads / goroutines / fibers: lightweight user-space threads managed by the language runtime rather than the OS. Go goroutines start at ~2 KB of stack and can number in the millions on one machine.

11.3  Shared State and Data Races

A data race occurs when:
  1. Two or more threads access the same memory location concurrently, and
  2. at least one access is a write, and
  3. there is no synchronisation between the threads.
Data races produce undefined behaviour: wrong results, crashes, security vulnerabilities, or intermittent failures that are nearly impossible to reproduce. They are the number-one source of bugs in concurrent code. Consider a shared integer counter incremented by two threads: // Thread A Thread B read counter (= 5) read counter (= 5) add 1 (= 6) add 1 (= 6) write counter (= 6) write counter (= 6) // Expected final value: 7. Actual: 6. Lost update.

11.4  Synchronisation Primitives

Synchronisation ensures that concurrent accesses to shared state are ordered correctly:
  • Mutex (mutual exclusion lock): only one thread may hold the lock at a time. Other threads block until it is released. Eliminates data races on the protected data at the cost of contention overhead and potential deadlock.
  • Read-write lock (RWLock): allows multiple concurrent readers or one exclusive writer. Better throughput for read-heavy workloads.
  • Semaphore: a counter; threads may proceed only if the counter is positive, decrementing it on entry and incrementing on exit. Limits concurrent access to a resource pool (e.g., a connection pool of N database connections).
  • Condition variable: paired with a mutex; lets a thread sleep until another thread signals that a condition holds (e.g., the queue is non-empty).
  • Atomic operations: single indivisible read-modify-write (e.g., compare-and-swap, fetch-add). No locking overhead; the CPU guarantees atomicity. Suitable for simple shared counters and lock-free data structures.

11.5  Message Passing

Instead of sharing memory with locks, threads or processes can communicate by passing messages through channels. The sender writes a message; the receiver reads it. Data ownership transfers with the message — no shared mutable state, no data races. Go’s motto captures the intent: “Do not communicate by sharing memory; instead, share memory by communicating.” Message-passing APIs:
  • Go: built-in chan type; ch <- value to send, value := <-ch to receive.
  • Rust: std::sync::mpsc::channel() (multi-producer, single-consumer); async channels in Tokio.
  • Erlang / Elixir: actor model; every actor is a process; send and receive are the only communication primitives.
  • Python: multiprocessing.Queue, asyncio.Queue.

11.6  Async/Await and Event Loops

For I/O-bound work, spawning an OS thread per request is wasteful: each blocked thread still holds ~1–8 MB of stack and a kernel-side control block while waiting for the network. An event loop solves this with one thread that services many pending I/O operations concurrently. The pattern:
  1. An async function runs until it hits an await point (e.g., waiting for a database response).
  2. The runtime suspends the function (saving its state in a tiny heap object) and switches to another ready task.
  3. When the I/O completes, the runtime marks the suspended function ready and resumes it.
This cooperative multitasking allows thousands of concurrent I/O operations on one thread with very low memory overhead. CPU-bound work must be offloaded to a thread pool to avoid blocking the event loop. Async runtimes: Node.js (JavaScript), asyncio (Python), Tokio (Rust), async/await in C# (TPL), Project Loom virtual threads (Java 21+).

11.7  Common Concurrency Bugs

  • Deadlock: thread A holds lock X and waits for lock Y; thread B holds lock Y and waits for lock X. Both block forever. Prevention: always acquire locks in the same order, or use a lock hierarchy.
  • Livelock: threads keep retrying and yielding but make no progress, like two people stepping aside for each other in a corridor.
  • Starvation: a thread is perpetually denied access to a resource because higher-priority threads always win.
  • Priority inversion: a high-priority thread waits on a lock held by a low-priority thread. The low-priority thread may itself be preempted by medium-priority threads, delaying the high-priority one indefinitely. Mitigated by priority inheritance in real-time systems.
  • False sharing: two threads modify different variables that happen to occupy the same CPU cache line. Each write invalidates the other core’s cached copy, causing unnecessary cache-coherence traffic. Mitigated by padding or aligning data to cache-line boundaries.

11.8  Epilogue

Concurrency is unavoidable in modern software. The key insight is that most problems stem from mutable shared state. Minimise sharing, prefer message passing or immutable data, use atomic operations for simple counters, and reach for mutexes only when necessary. The final chapter covers deployment: how to take working software and make it available to users reliably.

11.9  References

OSTEP — Concurrency Introduction (PDF chapter)
Effective Go — Concurrency
Tokio — async Rust runtime
Deadlock — Wikipedia