Basic Bites: I/O

Synchronous, asynchronous, buffered, unbuffered

I/O operations send or receive data using terminals, files, in-memory strings, sockets, Graphical User Interfaces (GUIs), and devices like keyboards, mice, trackpads, and speakers. This involves one of two rather distinct types of processing: synchronous or asynchronous.
  • Synchronous I/O uses functions that don't return until the operation completes. Programming language libraries provide read and write functions for that purpose.
  • Asynchronous I/O uses functions that accept a callback function and return immediately. The callback is invoked when the operation completes. Programming language libraries provide async read and async write functions for that purpose.
Synchronous operations interact with the platform API to translate to and from an external environment. Asynchronous operations use either I/O completion ports or Windows messages to send and receive data, using the platform API. API calls are routed to device drivers appropriate for a specified device.
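The two calling styles described above can be sketched in a few lines of Python. This is only an illustration: read_source is a hypothetical stand-in for a slow device read, and a standard-library thread pool plays the role of the platform's asynchronous machinery.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def read_source():
    """Hypothetical stand-in for a slow read from a device or socket."""
    return b"payload"

# Synchronous style: the call does not return until the data is available.
data = read_source()

# Asynchronous style: submit the request, register a callback, return at once.
received = []
done = threading.Event()

def on_complete(future):
    # Invoked when the operation has finished.
    received.append(future.result())
    done.set()

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(read_source)        # returns immediately
    future.add_done_callback(on_complete)
    done.wait(timeout=5)                     # only so the demo can inspect the result
```

In the asynchronous case the requesting thread is free to do other work between submit and the callback; the final wait exists only to keep the demonstration deterministic.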
Figure 1. Windows I/O completion ports

1.  I/O Completion Ports

I/O completion ports provide a mechanism for processing multiple asynchronous I/O requests. When sending a request, the requesting thread returns without waiting for the I/O operation to complete. Asynchronous read or write requests made through library methods may use I/O completion ports internally. At a high level, when an asynchronous I/O request is made, an I/O Request (IOR) object is created and sent to the completion port. The IOR holds a file handle for the specified destination and may also hold a callback reference. Each IOR enqueues an I/O Completion Package (IOCPkg) that is serviced by a thread-pool thread, which is responsible for sending the request to an appropriate device driver and awaiting its completion. If a callback is registered, that thread invokes it on completion.
IOCP as a kernel object. An I/O completion port is a Windows kernel object created with CreateIoCompletionPort. The same function, called once per handle, associates file or socket handles with the port. Any overlapped I/O operation initiated on an associated handle will, on completion, post a completion packet to the port's internal queue.
Overlapped I/O. Every asynchronous request is initiated by calling ReadFile or WriteFile (or socket equivalents) with an OVERLAPPED structure pointer and a handle already associated with the completion port. If the operation cannot complete at once, the call returns FALSE immediately and GetLastError reports ERROR_IO_PENDING; the kernel continues the operation independently. When the device driver signals completion, the kernel places a completion packet on the port queue containing the number of bytes transferred, the original OVERLAPPED pointer, and a user-supplied completion key.
Worker threads and the completion queue. A pool of worker threads each call GetQueuedCompletionStatus, which blocks until a completion packet arrives. The call returns the transferred byte count, the completion key, and the OVERLAPPED pointer, giving the worker everything it needs to continue processing the result. PostQueuedCompletionStatus lets application code inject synthetic packets into the queue, a common technique for signaling workers to shut down.
Concurrency limit. CreateIoCompletionPort accepts a NumberOfConcurrentThreads parameter. When it is set to the number of logical processors (or 0, which defaults to that), the kernel lets only that many threads returned from GetQueuedCompletionStatus run at a time. This prevents the thundering herd problem, where a burst of completions wakes every waiting thread simultaneously, saturating the scheduler and causing context-switch thrashing.
Scalability. IOCP enables a small, fixed-size thread pool to multiplex over a very large number of concurrent connections. A server with 10,000 active sockets needs only a handful of threads, often 2–4 per logical core. Threads never block waiting for a specific connection; they dequeue whichever completion arrives next. This is the model used internally by ASP.NET, Node.js on Windows (through libuv), and Rust's tokio runtime on Windows.
Key Windows API calls:
Function                       Purpose
CreateIoCompletionPort         Create a new port or associate an existing handle with a port
ReadFile / WriteFile           Initiate overlapped I/O on an associated handle
GetQueuedCompletionStatus      Block until a completion packet is available, then dequeue it
GetQueuedCompletionStatusEx    Dequeue multiple completion packets in a single call
PostQueuedCompletionStatus     Inject a synthetic completion packet (e.g., shutdown signal)
CloseHandle                    Destroy the port; wakes all blocked workers with an error
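The worker-thread pattern described above can be modelled in portable Python. This is a sketch of the idea, not the Windows API: the CompletionPort class, its method names, and the SHUTDOWN sentinel are all hypothetical, a queue.Queue stands in for the kernel's completion queue, and the concurrency limit is not modelled.

```python
import queue
import threading

SHUTDOWN = object()   # hypothetical synthetic packet, like one posted for shutdown

class CompletionPort:
    """Toy model of a completion queue (not the real kernel object)."""
    def __init__(self):
        self._packets = queue.Queue()

    def post(self, key, nbytes, context=None):
        # The real kernel posts a packet when an overlapped operation completes.
        self._packets.put((key, nbytes, context))

    def post_shutdown(self):
        self._packets.put(SHUTDOWN)

    def get(self):
        # Blocks until a packet is available, then dequeues it.
        return self._packets.get()

def worker(port, results, lock):
    while True:
        packet = port.get()
        if packet is SHUTDOWN:
            return
        key, nbytes, _context = packet
        with lock:
            results.append((key, nbytes))

port = CompletionPort()
results, lock = [], threading.Lock()
workers = [threading.Thread(target=worker, args=(port, results, lock))
           for _ in range(2)]
for t in workers:
    t.start()

# Five "connections" complete; whichever worker is free handles each one.
for conn_id in range(5):
    port.post(key=conn_id, nbytes=100 + conn_id)

# One shutdown packet per worker, queued behind the real completions.
for _ in workers:
    port.post_shutdown()
for t in workers:
    t.join()
```

Two workers service five completions here; the same shape scales to thousands of handles because no thread is tied to a particular connection.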
Cross-platform equivalents. Other operating systems provide analogous async I/O notification mechanisms:
  • epoll (Linux): level-triggered or edge-triggered readiness notification on file descriptors. Used by libuv (Node.js), tokio, and most Linux async runtimes.
  • kqueue (BSD / macOS): a unified event queue for file descriptors, signals, timers, and process events. Like epoll, it is primarily a readiness-notification interface, although it covers a wider range of event sources.
  • io_uring (Linux 5.1+): a pair of ring buffers shared between kernel and user space for submitting and collecting I/O operations with near-zero system-call overhead. Supports true async I/O (not just readiness notification) and is increasingly the preferred mechanism on modern Linux kernels.
All three serve the same purpose as IOCP: they allow a small thread pool to drive very high concurrency without dedicating one thread per connection.
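The readiness model used by epoll and kqueue can be tried from Python's selectors module, which picks the best mechanism the platform offers. The sketch below is illustrative only: it uses a local socket pair in place of a network connection, and the "conn-1" label is made up.

```python
import selectors
import socket

sel = selectors.DefaultSelector()       # epoll on Linux, kqueue on BSD / macOS
reader, writer = socket.socketpair()
reader.setblocking(False)
sel.register(reader, selectors.EVENT_READ, data="conn-1")

# Nothing has been written yet, so no descriptor is reported as ready.
idle = sel.select(timeout=0)

writer.sendall(b"ping")

received = []
for key, _mask in sel.select(timeout=1):
    # Readiness notification only says that a read will not block;
    # the application still performs the read itself.
    received.append((key.data, key.fileobj.recv(1024)))

sel.unregister(reader)
sel.close()
reader.close()
writer.close()
```

This is the main contrast with IOCP and io_uring, where the notification reports that the transfer has already been performed.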

2.  Async / Await

What it is. async / await is a syntactic feature that lets a developer write asynchronous code in a straight-line, sequential style without explicit callbacks. The compiler transforms an async function into a state machine whose execution can be suspended at each await point and resumed later, potentially on a different thread, without blocking the thread it was running on.
Compiler transformation. When the compiler encounters await expr, it splits the function at that point. Everything before the await becomes one state; everything after becomes a continuation registered as the callback that fires when expr completes. The resulting state machine is an object (a future in Rust, a Task in .NET, a coroutine object in Python) driven by a runtime scheduler rather than a call stack. In most languages it is heap-allocated; Rust futures are plain values that are usually moved to the heap only when spawned as a top-level task.
The runtime scheduler (executor). In Rust and Python, calling an async function does nothing by itself: it returns a future or coroutine object immediately, and progress only happens when a scheduler polls it. In C# and JavaScript, the body runs synchronously up to its first suspending await and then returns a pending Task or Promise. The scheduler runs on one or more OS threads and maintains a ready queue of futures that have received a completion signal (wakeup) from the underlying I/O subsystem (IOCP on Windows, epoll / io_uring on Linux). When a future yields at an await, the scheduler dequeues another ready future and resumes it on the same thread: cooperative multitasking within the thread pool.
Language implementations:
Language       Future / task type                   Runtime / executor
C#             Task<T>                              ThreadPool + SynchronizationContext; IOCP-backed on Windows
Rust           Future<Output = T>                   tokio or async-std; zero-cost, no heap allocation required for leaf futures
Python         Coroutine / Task                     asyncio event loop; single-threaded by default
JavaScript     Promise<T>                           Host event loop (browser or Node.js via libuv); single-threaded; microtask queue
C++ (C++20)    Any awaitable used with co_await     No standard executor in C++20; Asio (asio::awaitable<T>), cppcoro, or folly supplies one; std::execution targets C++26
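A small asyncio sketch makes the suspension behaviour concrete. It assumes nothing beyond the Python standard library; the fetch function, its names, and the delays are invented for illustration.

```python
import asyncio

log = []

async def fetch(name, delay):
    log.append(f"{name} start")
    await asyncio.sleep(delay)      # suspension point: control returns to the loop
    log.append(f"{name} done")
    return name

async def main():
    coro = fetch("a", 0.05)         # creates a coroutine object; no code runs yet
    ran_before_scheduling = list(log)
    # gather schedules both coroutines on the event loop and awaits both results.
    results = await asyncio.gather(coro, fetch("b", 0.01))
    return ran_before_scheduling, results

ran_before_scheduling, results = asyncio.run(main())
```

Both coroutines start before either finishes, and the one with the shorter delay completes first, all on a single thread. The results keep the order in which the awaitables were passed to gather.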
Key behaviors and caveats:
  • Non-blocking, not free. await yields the thread so other work can proceed, but each suspension that actually waits registers a wakeup and involves scheduler bookkeeping. Very high-frequency tiny I/O operations can incur more overhead from the async machinery than from the I/O itself.
  • Cancellation. C# uses CancellationToken passed through the call chain. Rust futures are cancelled by dropping them (the destructor runs immediately). Python uses Task.cancel() which injects a CancelledError at the next await point.
  • Colored functions. An async function can only be awaited from another async function or from the executor entry point. This “colors” the call graph: synchronous and asynchronous code cannot freely intermix without bridging (e.g., block_on in Rust, asyncio.run() in Python).
  • CPU-bound work blocks the executor. Long-running synchronous computation on an async thread starves other futures. The remedy is to offload CPU-bound work to a separate thread pool (Task.Run in C#, spawn_blocking in tokio, loop.run_in_executor in asyncio).
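Two of the caveats above, cancellation and offloading CPU-bound work, can be sketched with asyncio. The function names slow and cpu_bound are hypothetical, and the default executor is used only to keep the example short.

```python
import asyncio

events = []

def cpu_bound(n):
    # Plain synchronous computation; it would stall the event loop if run inline.
    return sum(i * i for i in range(n))

async def slow():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        events.append("cancelled inside task")   # clean up, then re-raise
        raise

async def main():
    task = asyncio.create_task(slow())
    await asyncio.sleep(0)          # let slow() run up to its await
    task.cancel()                   # CancelledError is raised at that await
    try:
        await task
    except asyncio.CancelledError:
        events.append("cancellation observed by caller")

    loop = asyncio.get_running_loop()
    # Run the computation on a worker thread so the loop stays responsive.
    return await loop.run_in_executor(None, cpu_bound, 1000)

total = asyncio.run(main())
```

Note that a thread-based executor keeps the event loop free but does not give pure-Python computation extra cores; a process pool is the usual choice when that matters.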
Figure 2. Windows Event Processing

3.  Windows Messaging

Windows messaging is the mechanism the operating system uses to notify applications of user input, system events, and inter-window communication. Every GUI element (a button, text box, or top-level window) is identified by a window handle (HWND) and receives events as messages posted to its thread's message queue. The application's message loop dequeues and dispatches those messages to the appropriate window procedure.
Message structure. A Windows message is a small structure (MSG) with the following fields:
  • hwnd — handle of the target window
  • message — numeric message identifier (e.g., WM_LBUTTONDOWN, WM_KEYDOWN)
  • wParam / lParam — message-specific parameters (key code, mouse coordinates, etc.)
  • time — timestamp of when the message was posted
  • pt — cursor position at posting time
When external devices generate key presses, mouse movements, button clicks, and similar input, each device event results in a device driver creating a message and enqueueing it on a raw input queue. The window manager dequeues each message and routes it to an appropriate window. For example, a mouse button click is routed to the first window that contains the mouse coordinates in its active area. Keyboard events are routed by default to the window that has focus. Routing means that the window manager enqueues a filtered version of the message on the appropriate window's message queue. Each window has an event dispatcher that sends the message to an event handler for processing. Note that an application may send messages to a hidden window as a means of internal communication.
Message loop. Every GUI thread runs a message loop that drives event processing: GetMessage blocks until a message arrives, TranslateMessage generates character messages (WM_CHAR) from raw key messages, and DispatchMessage calls the window procedure (WndProc) registered for the target HWND. The loop runs on a single thread, so all window procedure calls for windows on that thread are serialized. No synchronization is needed within the handler, but a long-running handler will freeze the UI.
SendMessage vs. PostMessage.
  • PostMessage places the message on the target thread's queue and returns immediately. The sender does not wait for the handler to run.
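  • SendMessage blocks the caller until the window procedure has handled the message. For a window on the calling thread, the window procedure is called directly; for a window on another thread, the message is handed to that thread and processed when it next retrieves messages. Cross-thread SendMessage can therefore deadlock if the target thread is itself blocked waiting on the caller and is not retrieving messages.
  • SendMessageTimeout provides a time-limited variant, and SendNotifyMessage returns without waiting when the target window belongs to another thread.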
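The difference between posting and sending can be modelled with a toy message loop. This is a single-threaded sketch, not the Win32 API: post_message, send_message, run_message_loop, and button_proc are hypothetical stand-ins, and only the three message constants use their real Windows values.

```python
import queue
from collections import namedtuple

Msg = namedtuple("Msg", "hwnd message wparam lparam")
WM_QUIT, WM_KEYDOWN, WM_LBUTTONDOWN = 0x0012, 0x0100, 0x0201

message_queue = queue.Queue()   # stands in for the thread's message queue
window_procs = {}               # hwnd -> window procedure
log = []

def post_message(hwnd, message, wparam=0, lparam=0):
    # Like PostMessage: enqueue and return immediately.
    message_queue.put(Msg(hwnd, message, wparam, lparam))

def send_message(hwnd, message, wparam=0, lparam=0):
    # Like a same-thread SendMessage: call the window procedure directly.
    return window_procs[hwnd](hwnd, message, wparam, lparam)

def button_proc(hwnd, message, wparam, lparam):
    log.append((hwnd, message, wparam))
    return 0

def run_message_loop():
    while True:
        msg = message_queue.get()           # blocks, like GetMessage
        if msg.message == WM_QUIT:
            break
        # Dispatch to the procedure registered for the target window.
        window_procs[msg.hwnd](msg.hwnd, msg.message, msg.wparam, msg.lparam)

window_procs[1] = button_proc
post_message(1, WM_LBUTTONDOWN)             # queued; handled later by the loop
send_message(1, WM_KEYDOWN, wparam=65)      # handled before this call returns
post_message(0, WM_QUIT)
run_message_loop()
```

The sent message is handled first even though it was issued second, because the posted message waits in the queue until the loop runs.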
System-wide broadcast and registered messages. PostMessage(HWND_BROADCAST, ...) delivers a message to all top-level windows and is used sparingly for system notifications. RegisterWindowMessage allocates a globally unique message identifier at runtime, allowing unrelated processes to coordinate without hardcoding numeric identifiers.
Relationship to IOCP. Windows messaging and I/O completion ports are complementary, not competing, mechanisms. IOCP handles high-volume, low-latency I/O (network, file, pipe) on background thread-pool threads. Windows messaging handles GUI events on a dedicated UI thread. A common pattern is to process I/O on IOCP worker threads and marshal results back to the UI thread with PostMessage, keeping the two subsystems cleanly separated.
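The marshaling pattern just described can be sketched with ordinary threads. Everything here is an assumption made for illustration: io_worker pretends that an I/O request has completed, and a queue.Queue takes the place of the UI thread's message queue.

```python
import queue
import threading

ui_queue = queue.Queue()                       # stand-in for the UI message queue
ui_thread_name = threading.current_thread().name
handled = []                                   # (request_id, payload, thread name)

def io_worker(request_id):
    payload = f"response-{request_id}"         # pretend the I/O has completed
    ui_queue.put((request_id, payload))        # analogous to PostMessage

def run_ui_loop(expected):
    # Runs on the "UI" thread; every result is handled here, one at a time.
    while len(handled) < expected:
        request_id, payload = ui_queue.get()
        handled.append((request_id, payload, threading.current_thread().name))

workers = [threading.Thread(target=io_worker, args=(i,)) for i in range(3)]
for t in workers:
    t.start()
run_ui_loop(expected=3)
for t in workers:
    t.join()
```

Because only the loop thread touches handled, the handlers need no locking, which mirrors the serialization a real message loop provides.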

4.  Streams

Streams provide buffered I/O operations that collect data from possibly several requests and send the collected data to a device driver as one operation. Each basic read or write request must enter the platform kernel to reach an appropriate device driver, which takes a significant amount of time compared to user-mode processing. Buffering reduces the number of calls into the kernel and so improves program throughput. Essentially, a stream accumulates a continuing sequence of data and, when an internal threshold is reached, passes on everything received since the last operation as a single new operation. This applies to both input streams (data arriving from outside the program) and output streams (data generated by the program).
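The effect of buffering on the number of low-level writes can be observed directly. In this sketch, CountingRaw is a hypothetical in-memory device that counts how often its write method is called; a real program would open a file with buffering=0 to get the unbuffered behaviour.

```python
import io

class CountingRaw(io.RawIOBase):
    """Hypothetical raw device that records each low-level write call."""
    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += bytes(b)
        return len(b)

# Unbuffered: every request goes straight to the device.
unbuffered = CountingRaw()
for _ in range(100):
    unbuffered.write(b"x" * 10)

# Buffered: requests accumulate in memory and reach the device in bulk.
raw = CountingRaw()
buffered = io.BufferedWriter(raw, buffer_size=4096)
for _ in range(100):
    buffered.write(b"x" * 10)
buffered.flush()                 # push whatever is still held in the buffer
```

The same 1,000 bytes reach both devices, but the buffered path needs far fewer low-level calls because the total fits inside the 4,096-byte buffer.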

5.  Consequences

Each application has the option of sending and receiving either synchronous or asynchronous I/O:
  • Synchronous I/O requires the handling thread to block until completion, so operations that may block for a long time, like network communications, will adversely affect program performance if handled on the main thread of execution.
  • Asynchronous I/O allows the requesting thread to return immediately, but increases the overall processing load on the machine's cores because I/O completion objects must be created and dispatched to thread-pool threads.
Each application has the option of using buffered or unbuffered I/O. This is independent of the choice to use synchronous or asynchronous operations.
  • Unbuffered I/O sends each request to the kernel for processing by a device driver. For infrequent requests this requires no additional processing, at the expense of more (infrequent) system calls.
  • Buffered I/O collects a series of I/O requests before entering the kernel, resulting in fewer expensive operations, but incurs the buffering overhead.
For most programs, buffering and asynchrony are not guaranteed to improve performance, so their effect should be measured before committing to production code.
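A measurement of this kind can be very small. The sketch below times many small writes to a temporary file with and without buffering; the chunk size, chunk count, and buffer size are arbitrary choices, and the timings will vary by machine and file system.

```python
import os
import tempfile
import time

def timed_write(path, buffering, chunks=2000, chunk=b"x" * 64):
    """Write many small chunks and return the elapsed time in seconds."""
    start = time.perf_counter()
    with open(path, "wb", buffering=buffering) as f:
        for _ in range(chunks):
            f.write(chunk)
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as tmp:
    unbuffered_path = os.path.join(tmp, "unbuffered.bin")
    buffered_path = os.path.join(tmp, "buffered.bin")
    unbuffered_s = timed_write(unbuffered_path, buffering=0)          # no buffer
    buffered_s = timed_write(buffered_path, buffering=64 * 1024)      # 64 KiB buffer
    sizes = (os.path.getsize(unbuffered_path), os.path.getsize(buffered_path))

print(f"unbuffered: {unbuffered_s:.4f}s  buffered: {buffered_s:.4f}s")
```

Running the comparison with the program's real request sizes and access pattern gives a far better basis for the decision than a general rule.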

6.  References

  1. Understanding the Windows IO System
  2. Asynchronous I/O: I/O Completion Ports
  3. Part 4 - I/O Completion Ports
  4. async await - Stack Overflow (see answer 4, at the bottom)
  5. .NET async/await in a single picture
  6. Linux and I/O completion ports?