Basic Bites: Threads

Creation, lifecycle, synchronization, thread pools

A thread is the unit of execution that the operating system schedules on a processor core. Every process starts with one thread — the primary thread — and may create additional threads to perform work concurrently. All threads within a process share its address space, open file handles, and other resources, but each thread has its own stack, register set, and program counter. This shared-memory model enables efficient communication between threads but introduces the possibility of data races and requires explicit synchronization.

1.  Thread Basics

Threads vs. processes. Creating a new process is expensive: the operating system must set up an address space, a file descriptor (handle) table, and kernel bookkeeping, and on Unix-like systems fork() duplicates the parent's page tables. Creating a thread within an existing process is much cheaper: it allocates a new stack (typically 1–8 MB reserved, a few pages committed) and a kernel thread object, but shares everything else. This is why servers favor thread pools or async I/O over spawning a new process per request.

The thread stack. Each thread gets its own call stack. Local variables, function parameters, and return addresses for that thread's call chain all live there. The stack's maximum size is fixed at thread creation; overflowing it causes a stack overflow exception or a segmentation fault. Heap memory, global variables, and static data are shared among all threads in the process.

Hardware threads and logical processors. A physical core may expose two hardware threads via simultaneous multithreading (SMT, marketed as Hyper-Threading on Intel). The OS scheduler sees each hardware thread as a logical processor. Software threads are multiplexed onto logical processors by the scheduler; a machine with 8 logical processors can run at most 8 threads simultaneously, although two hardware threads on the same core share that core's execution resources.
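The stack and logical-processor points above can be sketched in Rust. This is a minimal illustration, not a recommended configuration: the helper names `logical_processors` and `sum_on_small_stack` and the 256 KiB stack size are assumptions chosen for the example. Note that `available_parallelism` reports an estimate that may be lower than the hardware count when the process is restricted by affinity masks or container limits.

```rust
use std::thread;

/// Estimated number of threads that can run in parallel (falls back to 1).
fn logical_processors() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// Runs a small computation on a thread whose stack size the caller chooses.
fn sum_on_small_stack(stack_bytes: usize) -> u64 {
    let handle = thread::Builder::new()
        .name("small-stack-worker".into())
        .stack_size(stack_bytes) // requested size; the platform may round it up
        .spawn(|| {
            // This array lives on the new thread's own stack (8 KiB).
            let local = [1u64; 1024];
            local.iter().sum::<u64>()
        })
        .expect("failed to spawn thread");
    handle.join().expect("worker panicked")
}

fn main() {
    println!("logical processors: {}", logical_processors());
    println!("sum computed on worker: {}", sum_on_small_stack(256 * 1024));
}
```

Shrinking the stack is mainly useful when a program creates many threads that each need little stack space; a thread that recurses deeply needs the opposite adjustment.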

2.  Thread Lifecycle

A thread moves through a sequence of states from creation to termination. The usual model is: created (new), ready (runnable), running, blocked (waiting on a lock, I/O, or a timer), and terminated.

Join and detach. In C++, a std::thread must be either joined or detached before the thread object is destroyed. Joining blocks the calling thread until the target terminates and reclaims its resources. Detaching lets the thread run independently; its resources are reclaimed automatically on termination, but the caller cannot observe its result through the handle. Destroying a joinable std::thread without joining or detaching calls std::terminate, which aborts the program. In Rust, dropping a JoinHandle does not panic; it implicitly detaches the thread.
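Join and detach can be sketched in Rust as follows. The function names are hypothetical, and the channel in the second function is an assumption added only so the caller can still observe that the detached thread ran.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Spawns a thread, joins it, and returns the value it produced.
fn join_for_result() -> u32 {
    let handle = thread::spawn(|| (1..=10).sum::<u32>());
    // join() blocks until the thread finishes and yields its return value.
    handle.join().expect("worker panicked")
}

/// Spawns a thread and drops its handle, which detaches the thread in Rust.
fn detach_and_report() -> &'static str {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        tx.send("detached worker finished").unwrap();
    });
    drop(handle); // the thread keeps running but can no longer be joined
    rx.recv_timeout(Duration::from_secs(5))
        .expect("no message from worker")
}

fn main() {
    println!("joined result: {}", join_for_result());
    println!("{}", detach_and_report());
}
```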

3.  Synchronization

Shared mutable state is the root cause of most threading bugs. Two threads reading the same memory location concurrently is safe; unsynchronized concurrent access in which at least one thread writes is a data race. A data race is undefined behavior in C++, and safe Rust rejects code that could produce one at compile time. Synchronization primitives coordinate access to shared data, most often by letting only one thread at a time operate on it. Common primitives:

| Primitive | Purpose |
|---|---|
| Mutex | Mutual exclusion lock: only one thread may hold it at a time. Other threads block on lock() until the holder calls unlock(). |
| Recursive mutex | Like a mutex but the same thread may acquire it multiple times without deadlocking; must release it the same number of times. |
| Read/write mutex | Allows many concurrent readers or exactly one writer. Improves throughput when reads heavily outnumber writes. |
| Semaphore | A counter that allows up to N threads to proceed simultaneously. Used to limit concurrency (e.g., a pool of N database connections). |
| Condition variable | Lets a thread sleep until a predicate becomes true. Always used with a mutex: the thread atomically releases the mutex and sleeps; it reacquires the mutex before returning from wait(). |
| Spinlock | Busy-waits in a loop rather than yielding to the scheduler. Efficient only when the wait is expected to be very short (microseconds); wastes CPU on longer waits. |
| Atomic operation | Hardware-guaranteed indivisible read-modify-write on a single word. Lock-free; used for flags, counters, and reference counts. |
| Barrier / latch | Blocks a group of threads until all have reached the barrier, then releases them together. Useful for phased parallel algorithms. |

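Two of the primitives listed above, the mutex and the atomic operation, can be compared with a shared counter. This is a minimal sketch in Rust; the function names and the thread and iteration counts are assumptions for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments a shared counter under a mutex from several threads.
fn count_with_mutex(threads: usize, increments: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // lock() blocks until the mutex is free; the guard unlocks on drop.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

/// Same counter, using an atomic read-modify-write instead of a lock.
fn count_with_atomic(threads: usize, increments: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // join() orders the workers' increments before this load.
    counter.load(Ordering::Relaxed)
}

fn main() {
    println!("mutex:  {}", count_with_mutex(4, 1_000));
    println!("atomic: {}", count_with_atomic(4, 1_000));
}
```

The atomic version suits a single counter or flag; the mutex version generalizes to invariants that span several fields.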
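Deadlock. Deadlock occurs when two or more threads each hold a resource that another needs, so none can proceed. The four necessary conditions are:

  1. Mutual exclusion: a resource can be held by only one thread at a time.
  2. Hold and wait: a thread holds one resource while waiting for another.
  3. No preemption: a resource cannot be taken from the thread that holds it.
  4. Circular wait: a cycle of threads exists in which each waits for a resource held by the next.

Breaking any one condition prevents deadlock. The most practical strategies are consistent lock ordering (always acquire mutexes in the same global order) and timed lock attempts with backoff.

Memory ordering. Modern CPUs and compilers reorder instructions for performance. Atomic operations carry a memory order parameter (such as seq_cst, acquire, release, or relaxed) that constrains reordering. Incorrect memory ordering produces subtle, hardware-dependent bugs that often appear only on multi-core systems under specific timing conditions.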
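Consistent lock ordering and release/acquire memory ordering can both be sketched in Rust. The `Account` type, its `id` field used as the global lock order, and all function names are hypothetical and exist only for this example.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical account type; `id` defines the global lock order.
struct Account {
    id: u32,
    balance: Mutex<i64>,
}

/// Moves `amount` between accounts, always locking the lower id first.
fn transfer(from: &Account, to: &Account, amount: i64) {
    if from.id == to.id {
        return; // same account: locking it twice would self-deadlock
    }
    let (first, second) = if from.id < to.id { (from, to) } else { (to, from) };
    let mut first_guard = first.balance.lock().unwrap();
    let mut second_guard = second.balance.lock().unwrap();
    if from.id < to.id {
        *first_guard -= amount;
        *second_guard += amount;
    } else {
        *second_guard -= amount;
        *first_guard += amount;
    }
}

/// Runs opposing transfers on two threads and returns the final balances.
fn opposing_transfers(rounds: usize) -> (i64, i64) {
    let a = Arc::new(Account { id: 1, balance: Mutex::new(1_000) });
    let b = Arc::new(Account { id: 2, balance: Mutex::new(1_000) });
    let t1 = {
        let (a, b) = (Arc::clone(&a), Arc::clone(&b));
        thread::spawn(move || for _ in 0..rounds { transfer(&a, &b, 1); })
    };
    let t2 = {
        let (a, b) = (Arc::clone(&a), Arc::clone(&b));
        thread::spawn(move || for _ in 0..rounds { transfer(&b, &a, 1); })
    };
    t1.join().unwrap();
    t2.join().unwrap();
    let balances = (*a.balance.lock().unwrap(), *b.balance.lock().unwrap());
    balances
}

/// Publishes a value with a Release store and reads it after an Acquire load.
fn publish_and_read() -> u64 {
    let data = Arc::new(AtomicU64::new(0));
    let ready = Arc::new(AtomicBool::new(false));
    let producer = {
        let (data, ready) = (Arc::clone(&data), Arc::clone(&ready));
        thread::spawn(move || {
            data.store(42, Ordering::Relaxed);
            // Release: the store to `data` cannot be reordered after this store.
            ready.store(true, Ordering::Release);
        })
    };
    // Acquire: once `true` is observed, the producer's earlier write is visible.
    while !ready.load(Ordering::Acquire) {
        thread::yield_now();
    }
    let value = data.load(Ordering::Relaxed);
    producer.join().unwrap();
    value
}

fn main() {
    println!("balances: {:?}", opposing_transfers(1_000));
    println!("published value: {}", publish_and_read());
}
```

Without the ordering rule in `transfer`, the two threads could each hold one account and wait forever for the other, which is the circular-wait condition.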

4.  Thread Pools

Creating and destroying OS threads for every unit of work is expensive: each thread requires stack allocation, a kernel object, and scheduler registration. A thread pool pre-creates a fixed number of worker threads that pull tasks from a shared work queue, amortizing creation cost over many tasks.

Work queue. Tasks (closures, function pointers, or futures) are enqueued by producers. Worker threads dequeue and execute them. The queue is protected by a mutex or is a lock-free structure; a condition variable wakes idle workers when work arrives.

Work stealing. Each worker maintains its own local deque of tasks and normally pushes and pops at one end. When a worker's deque is empty, it steals tasks from another worker's deque, typically from the opposite end. This reduces contention on a central queue and improves cache locality. Rust's Tokio and the .NET ThreadPool both use work-stealing schedulers.

Sizing the pool. CPU-bound work typically uses one thread per logical processor. I/O-bound work can use more threads because most are blocked waiting at any given time; the optimal count depends on the I/O latency and throughput requirements. Oversizing the pool wastes memory (each thread has a stack) and increases scheduler overhead.
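The central work-queue design described above can be sketched in Rust. This is a minimal illustration under stated assumptions, not a production pool: `SimplePool`, `execute`, `shutdown`, and `sum_of_squares` are hypothetical names, and a standard-library channel behind a mutex stands in for the mutex-plus-condition-variable queue. It does not implement work stealing.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

struct SimplePool {
    sender: Option<mpsc::Sender<Job>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl SimplePool {
    /// Pre-creates `size` workers that all pull from one shared queue.
    fn new(size: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..size)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock is held only while dequeuing, not while running the job.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // channel closed: no more work
                    }
                })
            })
            .collect();
        SimplePool { sender: Some(sender), workers }
    }

    /// Enqueues a task for some idle worker to run.
    fn execute<F: FnOnce() + Send + 'static>(&self, f: F) {
        self.sender.as_ref().unwrap().send(Box::new(f)).unwrap();
    }

    /// Closes the queue and waits for queued tasks to finish.
    fn shutdown(mut self) {
        drop(self.sender.take()); // closing the channel wakes blocked workers
        for w in self.workers.drain(..) {
            w.join().unwrap();
        }
    }
}

/// Computes 1^2 + ... + n^2 by posting one task per term to the pool.
fn sum_of_squares(pool_size: usize, n: u64) -> u64 {
    let pool = SimplePool::new(pool_size);
    let (tx, rx) = mpsc::channel();
    for i in 1..=n {
        let tx = tx.clone();
        pool.execute(move || tx.send(i * i).unwrap());
    }
    drop(tx);
    pool.shutdown();
    rx.iter().sum()
}

fn main() {
    println!("sum of squares 1..=10: {}", sum_of_squares(4, 10));
}
```

One task per integer is far too fine-grained for real work; the per-task queueing cost would dominate. Tasks in practice should be coarse enough to amortize it.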

5.  Language Support

| Language | Thread type | Key synchronization and safety |
|---|---|---|
| Rust | std::thread::spawn; returns a JoinHandle<T> | Mutex<T>, RwLock<T>, Condvar, Arc<T> for shared ownership; the Send and Sync traits enforce data-race freedom at compile time: moving a non-Send value to another thread, or sharing a non-Sync value between threads, is a compile error |
| C++ | std::thread (C++11), std::jthread (C++20, auto-joins) | std::mutex, std::shared_mutex, std::condition_variable, std::atomic<T>; no compile-time data-race prevention, so correctness is the programmer's responsibility |
| C# | System.Threading.Thread; ThreadPool; Task (preferred) | lock statement (Monitor), Mutex, SemaphoreSlim, ReaderWriterLockSlim, Interlocked for atomics; no compile-time race detection |
| Python | threading.Thread | threading.Lock, RLock, Condition, Semaphore; the GIL serializes bytecode execution in standard CPython builds, preventing true parallel CPU work on multiple threads; use multiprocessing or concurrent.futures.ProcessPoolExecutor for CPU-bound parallelism |

For Rust, you will find more details with examples in ../Rust/RustBites_Threads.html. Eventually details with examples will arrive for C++, C#, and Python.

6.  Repository Support

| Language | Repository | Interface |
|---|---|---|
| Rust | RustThreadPool | ThreadPool<M>: new(nt, f), post_message(), get_message(), wait(), shut_down() |
| Rust | RustBlockingQueue | BlockingQueue<T>: new(), en_q(), de_q(), len() |
| C++ | ThreadPool | ThreadPool<W,N>: N threads dequeue and execute callable workitems W; Task wraps a static pool instance for fire-and-forget use |
| C++ | CppBlockingQueue | BlockingQueue<T>: enQ(), deQ() (blocks on empty); std::mutex + std::condition_variable internals |
| C# | CsBlockingQueue | BlockingQueue<T>: blocks dequeuer when empty; Monitor (condition variable + lock) internals; moveable, not copyable |
| C# | | ThreadPool: System.Threading.ThreadPool (built-in) |
| Python | none yet | |
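
To show how a blocking queue of this kind can work, here is a minimal Rust sketch built from a mutex and a condition variable. The method names mirror the BlockingQueue<T> interface listed above, but the internals are an assumption made for illustration and are not taken from the RustBlockingQueue repository; `produce_and_consume` is a hypothetical helper.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

struct BlockingQueue<T> {
    items: Mutex<VecDeque<T>>,
    not_empty: Condvar,
}

impl<T> BlockingQueue<T> {
    fn new() -> Self {
        BlockingQueue {
            items: Mutex::new(VecDeque::new()),
            not_empty: Condvar::new(),
        }
    }

    /// Appends an item and wakes one waiting dequeuer.
    fn en_q(&self, item: T) {
        self.items.lock().unwrap().push_back(item);
        self.not_empty.notify_one();
    }

    /// Removes the oldest item, blocking while the queue is empty.
    fn de_q(&self) -> T {
        let mut items = self.items.lock().unwrap();
        // Re-check in a loop: condition variables can wake spuriously.
        while items.is_empty() {
            // wait() releases the mutex while asleep and reacquires it on wakeup.
            items = self.not_empty.wait(items).unwrap();
        }
        items.pop_front().unwrap()
    }

    fn len(&self) -> usize {
        self.items.lock().unwrap().len()
    }
}

/// One producer enqueues 0..n while the caller dequeues n items.
fn produce_and_consume(n: u32) -> Vec<u32> {
    let q = Arc::new(BlockingQueue::new());
    let producer = {
        let q = Arc::clone(&q);
        thread::spawn(move || for i in 0..n { q.en_q(i); })
    };
    let received: Vec<u32> = (0..n).map(|_| q.de_q()).collect();
    producer.join().unwrap();
    assert_eq!(q.len(), 0);
    received
}

fn main() {
    println!("received: {:?}", produce_and_consume(5));
}
```

With a single producer and a single consumer the items arrive in FIFO order; with several producers only the per-producer order is preserved.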

7.  Consequences

Threads enable a program to use multiple processor cores and to overlap I/O latency with computation, but they introduce failure modes that do not exist in single-threaded code, including the data races, deadlocks, and memory-ordering bugs described in Section 3.

Where possible, prefer designs that minimize shared mutable state: immutable data needs no synchronization, and message passing (channels, queues) confines mutation to one owner at a time. Rust's ownership rules and its Send and Sync traits enforce much of this at compile time; in other languages it is a discipline.
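Confining mutation to a single owner through message passing can be sketched with a standard-library channel in Rust. The function name `total_via_owner` and the choice of a running total as the owned state are assumptions for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Worker threads send amounts; one owner thread is the only code that mutates the total.
fn total_via_owner(workers: u64, per_worker: u64) -> u64 {
    let (tx, rx) = mpsc::channel::<u64>();

    // The owner holds the mutable state; no lock is needed because nothing else can reach it.
    let owner = thread::spawn(move || {
        let mut total = 0u64;
        for amount in rx {
            total += amount;
        }
        total // the loop ends once every sender has been dropped
    });

    let producers: Vec<_> = (0..workers)
        .map(|_| {
            let tx = tx.clone();
            thread::spawn(move || for _ in 0..per_worker { tx.send(1).unwrap(); })
        })
        .collect();
    drop(tx); // drop the original sender so the owner's loop can finish

    for p in producers {
        p.join().unwrap();
    }
    owner.join().unwrap()
}

fn main() {
    println!("total: {}", total_via_owner(4, 250));
}
```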

8.  References

  1. C++ thread support library – cppreference
  2. Fearless Concurrency – The Rust Programming Language
  3. Managed Threading Basics – Microsoft Docs
  4. threading – Python 3 Docs
  5. Understanding the Windows Threading Model