First, note that you will find a few more details than are provided here in
Software Structure.pdf.
There are many different aspects of Software Design. An important one is structure - what are the system parts,
what are their responsibilities, and how do the parts relate to each other.
Software Parts Catalog:
Here's a set of software parts, listed with increasing levels of abstraction:
Functions and methods:
Basic building blocks for all the other parts.
Classes, structs, and interfaces:
A class encapsulates methods and data by controlling access, ensuring consistency of operation. Interfaces provide
contracts for service. They allow the text of client code to bind to the interface abstraction rather than to a concrete
class. That makes the code far less subject to breakage when the concrete classes have to change.
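To make the binding idea concrete, here is a minimal sketch; the names ITokenSource and FixedSource are hypothetical, not part of any package described here:

```cpp
#include <string>
#include <utility>

// Hypothetical interface: client code binds to this abstraction.
struct ITokenSource {
    virtual ~ITokenSource() = default;
    virtual std::string next() = 0;
};

// One concrete implementation; clients never name this type directly.
class FixedSource : public ITokenSource {
    std::string tok_;
public:
    explicit FixedSource(std::string t) : tok_(std::move(t)) {}
    std::string next() override { return tok_; }
};

// Client code depends only on ITokenSource, so the concrete class
// can change or be replaced without breaking this function's text.
std::string firstToken(ITokenSource& src) { return src.next(); }
```

Because firstToken names only the interface, swapping FixedSource for a file-backed or network-backed source changes no client text.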
Packages:
Packages are files each of which focuses on a single responsibility. Each package has:
prologue comments describing the operations, build process, and history of the package
declarations and definitions of classes, structs, and interfaces
test stub - a main and helper functions that provide construction testing for the package
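A package laid out this way might look like the following skeleton. All names here are hypothetical, and the test stub compiles only when the package is built stand-alone:

```cpp
// Demo.h - one short sentence stating the package's single responsibility
// ver 1.0
// Build: standard C++, no external dependencies (hypothetical example)
// History: ver 1.0 - initial release

#ifndef DEMO_H
#define DEMO_H

// declarations and definitions of classes, structs, and interfaces
class Demo {
public:
    int answer() const { return 42; }
};

#ifdef TEST_DEMO
// test stub: a main providing construction tests, compiled only
// when this package is built by itself with TEST_DEMO defined
int main() {
    Demo d;
    return d.answer() == 42 ? 0 : 1;
}
#endif

#endif
```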
Modules:
Modules are collections of packages that are focused on a single responsibility, e.g., the parser module
has:
Parser package with parser class and IRule and IAction interfaces
ActionsAndRules package with derived rules and actions and a Repository class
ScopeStack package
AbstrSynTree package
Programs:
Programs are collections of modules. The CodeAnalyzer program has modules for:
Scanning and tokenizing files into token sequences called SemiExpressions
parsing, using token sequences as inputs
building an Abstract Syntax Tree using ScopeStack and AbstrSynTree packages
managing program execution
Systems:
Systems are collections of cooperating programs. They often have structure determined by
their communication style, e.g.:
client/server
peer-to-peer
federation of servers and clients
Simple Programs:
Simple programs often take the form shown in the accompanying figure. The Executive package doesn't do any computation or formatting, but
simply creates the program's top-level components and controls execution by passing them requests.
Inputs might be a list of files to process, stored in a collection structure of some kind. The Transform package requests the list, or
possibly the Input package sends it to the Transformer. Either way, this communication implies that the Executive passed the caller a reference
to the callee.
When transformation is complete the Display package requests the results, or some part of the results, and displays them to a user, based on
filtering information supplied by the Executive.
A surprisingly large fraction of all programs have this structure, where the Input, Transform, and Display packages may depend on lower
level packages to accomplish their tasks.
This figure shows the program's packages and how they communicate using function calls.
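The structure above can be sketched in a few lines. The class shapes below are hypothetical stand-ins for the real packages, but they show the Executive creating the parts and passing references between them:

```cpp
#include <string>
#include <vector>

// Hypothetical minimal shapes for the three worker packages.
class Input {
    std::vector<std::string> files_{ "a.cpp", "b.cpp" };
public:
    const std::vector<std::string>& fileList() const { return files_; }
};

class Transform {
public:
    // Transform requests the list via the reference the Executive supplied.
    std::vector<std::string> run(const Input& in) {
        std::vector<std::string> out;
        for (const auto& f : in.fileList())
            out.push_back("processed:" + f);
        return out;
    }
};

class Display {
public:
    std::string show(const std::vector<std::string>& results) {
        std::string view;
        for (const auto& r : results) view += r + "\n";
        return view;
    }
};

// The Executive does no computation or formatting: it only creates the
// top-level components and controls execution by passing them requests.
class Executive {
    Input input_;
    Transform transform_;
    Display display_;
public:
    std::string run() { return display_.show(transform_.run(input_)); }
};
```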
Data-Driven Programs:
The Client/Server model was one of the earliest software structures designed specifically to share data.
Under this model the server is reactive, only responding to client requests and taking no action on its own initiative [1].
Clients initiate transactions using HTTP messages and wait for reply messages from the server.
We show, in the figure below, a modern incarnation of that structure.
The purpose of the server is to share content among many clients. In this case the browser
and web server conspire to present a "concurrent access" model: the server downloads content to the browser, which
may then let a user spend significant time on that content while other browser clients are accessing the web server.
Should a client's browser upload new content to the web server, all other clients may be able to see the new content immediately, providing
a very useful collaboration model.
We see from the figure that the server may be dynamically generating views of its content, determined by the specific web application
and by the accesses and browser settings the client may have established.
This figure presents a block diagram (not one of the UML diagrams) showing the principal processing activities for the exchange of web
application information, in the form of web pages, between web server and browser.
A variation on the Client/Server model is the Model-View-Controller (MVC) structure, shown in the next figure.
The controller, on startup, sends a default view to the client. Subsequent actions by the user on a view result in calls to the
controller, which may access its data model to build a response. It sends that response back to the current view or to a new view of some
different form. In order to factor view-handling details out of the controller we often build a View Model, which has objects that provide
simple interfaces for the controller but may be managing views in complex ways.
Similarly, we may factor out of the controller data management operations which we encapsulate in an Application Model. This model
contains objects that manage processes and information inherent in the application, and take care of all data management operations
so that the controller doesn't have to deal with that.
Note that the Views and Data may be local to a single running process, or may be distributed to several different remote hosts.
Views may be requested by a browser client through HTTP requests and returned via an HTTP response, or may simply be direct references
within a local process. Similarly, the Data Model may be supported by a remote web service, like Google Maps, or again, may be a local
facility.
Analysis Driven Programs:
Analyzers may nominally take the form of a simple system. For example, the code parser, shown in the next figure,
fits into that model.
The Executive and Display are obvious. The Scanner module replaces the Inputs package with two packages:
Tokenizer and SemiExp. The Transformer morphs into packages for managing the parsing process and for defining
detailed rules and actions needed for code analysis.
Often those packages are accompanied by additional packages like the ScopeStack and AbstrSynTree
packages. The Abstract Syntax Tree (AST) holds results of parsing to be used later by the Executive and Display packages. ScopeStack plays an
important role in building the AST.
You can find here an example Code Analyzer application, written in C++, that uses exactly this structure.
Division of a program into packages is often a matter of binding together things that are related and separating those things that are not.
We do this to satisfy the Single Responsibility Principle (SRP) - make each package
focus on a single activity. If you can't describe what a package does in a single short sentence, then it probably violates the SRP.
However, there are other reasons for making program divisions. Here's a list:
Keep things together that are logically related (SRP)
Separate things that change at different rates. The ActionsAndRules package changes with
every parsing application, but the Parser and
its interfaces, IRule and IAction, do not. We separate ActionsAndRules
from the Parser package, to which it is intimately related, because it changes frequently and the rest of the Parser does not.
Keep things that do a lot of communication back and forth together for performance reasons. This is especially true where the processing
is distributed across a network or the internet.
Partition to reuse or salvage existing software components.
Reuse is the incorporation of a component without making any changes. Salvage is a weaker form of reuse where we need to change parts of the
component's code, but its design remains intact.
If we have existing FileManager, Tokenizer, ... packages, the program that uses those
parts must be structured to accommodate them and their processing needs and abilities.
Concurrent processing, either parallel or pipelined or both, usually has a major impact on the way
we need to lay out a system structure - see below.
Concurrency Driven Programs:
In the accompanying figure, we show analysis flow for a concurrent dependency analyzer that establishes dependency relationships
between two or more source code files.
Processing occurs in two phases: the first analyzes all the types and global functions and data (for C++) found in the set of input
files, storing the type information in a type table that also holds namespace and source file information.
The second pass consists
of tokenizing files and marking a dependency relation for them if any of their tokens are keys in the type table.
File A depends on File B if it uses, in any way, the name of a type, global function, or global data, defined in B.
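That dependency rule is easy to sketch. Here is a minimal version, assuming a hypothetical type table that maps each type (or global function or data) name to the file defining it:

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical type table: name of a type, global function, or
// global data item -> file that defines it.
using TypeTable = std::unordered_map<std::string, std::string>;

// File A depends on file B if any of A's tokens is a key in the
// type table whose defining file is B (and B is not A itself).
std::vector<std::string> dependenciesOf(
    const std::vector<std::string>& tokens,  // tokens of file A
    const TypeTable& table,
    const std::string& self)                 // A's own file name
{
    std::vector<std::string> deps;
    for (const auto& tok : tokens) {
        auto it = table.find(tok);
        if (it != table.end() && it->second != self)
            deps.push_back(it->second);
    }
    return deps;
}
```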
The first pass starts a File Search process running on its own thread. As files are discovered, their fully qualified names are
enqueued for type analysis. Multiple type analysis threads are started from a thread
pool. Each thread attempts to dequeue the name of a file to analyze. If the queue is empty it simply blocks until a name becomes
available [2].
Once a thread has completed analysis of a file it has built a table of types found in the file, which it
deposits into a merge queue.
A TypeTable merge thread dequeues partial type tables as they become available and merges them
into a table it is building that, at the end of its processing, contains the entire type information for all the files analyzed.
When the merge process completes it publishes the type table and enqueues the file set for Dependency Analysis in the second phase.
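The queues that connect these pipeline stages can be sketched as a minimal thread-safe blocking queue. This is a simplified sketch, not production code:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Minimal thread-safe blocking queue: deQ blocks while the queue is
// empty, which is exactly the behavior the analysis threads rely on.
template <typename T>
class BlockingQueue {
    std::queue<T> q_;
    std::mutex mtx_;
    std::condition_variable cv_;
public:
    void enQ(T t) {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            q_.push(std::move(t));
        }
        cv_.notify_one();  // wake one waiting consumer
    }
    T deQ() {
        std::unique_lock<std::mutex> lk(mtx_);
        cv_.wait(lk, [this] { return !q_.empty(); });  // block until an item arrives
        T t = std::move(q_.front());
        q_.pop();
        return t;
    }
};
```

The File Search thread calls enQ with each discovered file name; each analysis thread loops on deQ, and the merge thread uses a second instance of the same queue for partial type tables.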
The second phase flow is very similar to the first, and so we will skip over those details. The structure of this Analyzer is nominally
like the simple program, but its Transform part blossoms into a number of packages that manage the parallel/pipelined concurrency used
in this program. That middle part will have packages for a thread-safe blocking queue, a thread pool, a parser for type analysis (the rules
are fairly simple), a TypeTable, and a Dependency package which records all of the dependency relationships found in the second pass.
It's likely that the Simple System Display package will also blossom in this Analyzer program. It has to show, perhaps with an indented list,
the dependency relationships. Note that these relationships form a graph, not a tree, so it is harder to visualize the dependency structure
with text output. Therefore we may be tempted to render the dependency graph visually, which presents all kinds of interesting graph routing
and exploration processing to be developed.
You will find here a Parallel Text Search application that works just like the first phase
of the Concurrent Type Analyzer, except that instead of type analysis the program searches files for multiple text strings.
Event Driven Programs:
Windows Graphical User Applications have a special structure, sometimes referred to as a Single Threaded Apartment (STA), illustrated in the
diagram to the right. Single Threaded Apartments are software components that create Windows which accept all inputs through a
thread-safe message queue.
Messages are dequeued by the primary thread, i.e., the thread that created the window. Usually many clients of the STA
may concurrently send or post messages to the queue. However, the STA assumes that the only thread accessing the window functions and data
is the primary thread, responding to messages from the queue, and so no locking or other synchronization is performed.
Graphical windows processing tends to be rather complex, so the STA structure was created to avoid the additional complexity of dealing with
concurrent access to the window internals.
Messages arrive at the queue from device events through a "raw input" queue. The screen management software has a window manager
with a stack of window references, one for each open window on the desktop. It dequeues messages from devices and drops them down its window
stack. The first window that can handle the message does so and stops message propagation to other windows. This may happen for a mouse button
message, for example, when the mouse coordinates are within the Window's client area or chrome.
Messages also arrive from code within the application, perhaps triggered by a control event - button click or scroll bar movement - or from other
programs or the Windows operating system. All inputs from these agents occur as messages, never as direct access to a window's functions or data.
This message-passing process allows a window to react to incoming messages in a way that seems concurrent to a user.
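The single-threaded dispatch idea can be sketched without any real windowing code. Msg and Window below are hypothetical stand-ins, and the real queue would be thread-safe like the blocking queue discussed earlier:

```cpp
#include <queue>
#include <string>
#include <utility>

// Hypothetical window message.
struct Msg { std::string name; };

class Window {
    std::queue<Msg> queue_;  // stand-in for the thread-safe message queue
    std::string log_;        // window state: touched only by the primary thread
public:
    // Any client may post a message (the real queue makes this thread-safe).
    void post(Msg m) { queue_.push(std::move(m)); }

    // Only the primary thread runs the pump, so the window's functions
    // and data need no locking or other synchronization.
    void pump() {
        while (!queue_.empty()) {
            Msg m = queue_.front();
            queue_.pop();
            log_ += "handled:" + m.name + ";";
        }
    }
    const std::string& log() const { return log_; }
};
```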
All of the Graphical User Interface frameworks like Qt, Gnome, WinForms, and Windows Presentation Foundation, are implemented with STAs, as is
browser processing for all of the major implementations - Internet Explorer, Google Chrome, Firefox, etc. The Node.js message-dispatching
JavaScript framework also uses an STA model.
Communication-Driven Systems:
Distributed systems need some form of communication channel to communicate between remote machines. This is done now, most frequently, with
TCP-based channels, often with an overlay of HTTP message handling. The classic form of HTTP is used to implement Client/Server communication
where a client makes an HTTP request to a reactive server that forms an HTTP reply synchronously.
In our courses we will favor message-passing systems using HTTP style messages, but expect to use one-way transmissions. That is, a sender
forms a request or notification message and sends it to a designated receiver over a connected socket channel, perhaps wrapped in middleware
infrastructure like Windows Communication Foundation (WCF). If the request implies some form of response, the receiver forms the reply message,
perhaps after a significant amount of processing, then sends the reply back to the original sender over a separate channel connection.
This one-way communication over dedicated channels uses more system resources than a single bilateral channel, e.g., more sockets and more
threads. However, it yields a very flexible style of communication where no one waits for a reply. Each end of the communication is active.
That is, it can initiate messages as well as sink them. We never have to worry about talk protocols of the request/response form. We just
send messages when appropriate, and check our receive queues in a timely fashion, probably with a dedicated thread.
The implications of this structure are that we can expect each peer to host both a sender and receiver. Receivers listen for connection requests.
When one arrives the receiver uses a thread pool thread to handle the incoming message, mostly to frame messages arriving as bytes in the
receive socket and deposit them into a thread-safe blocking queue, to await processing by the receiver's host.
Senders dequeue their host's outgoing messages, then attempt to open a channel to the destination with a socket connection request. When that succeeds
the sender thread pushes the dequeued message into the channel and goes back to get another message. Note that this implies the send thread
will do message packet inspection to see where the message needs to go. If it goes the same place as the previous message the send thread
just pushes it into the socket. If the message has a new destination the send thread closes the previous connection and attempts to open
a new connection to the new destination.
It is relatively simple to cache opened channels for a while to minimize the overhead of frequent opening/closing cycles. One easy way to do
that is to use a hash table that uses the destination url as its key and holds the connected socket as the corresponding value. Now, when
the destination changes the send thread checks the hash table to see if there already exists a connection to the destination. If so, it uses
the connection. If not, it creates a new one and inserts into the hash table before sending the message. This idea also allows us
to hold onto messages that could not be sent in a stalled-message queue, also made part of the hash table's value. When a connection is
finally established we may push stalled messages before passing on new messages.
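A sketch of that hash-table cache, with a hypothetical Channel struct standing in for a connected socket plus its stalled-message queue:

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical cache entry: a connected "socket" plus any messages
// that stalled while no connection to this destination existed.
struct Channel {
    int socket;                        // stand-in for a connected socket
    std::vector<std::string> stalled;  // messages awaiting connection
};

class ChannelCache {
    std::unordered_map<std::string, Channel> cache_;  // key: destination url
    int nextSocket_ = 1;
public:
    // Reuse an open channel if one exists; otherwise "connect" a new
    // one and insert it before sending.
    Channel& channelFor(const std::string& url) {
        auto it = cache_.find(url);
        if (it != cache_.end()) return it->second;
        return cache_.emplace(url, Channel{ nextSocket_++, {} }).first->second;
    }
    bool isCached(const std::string& url) const {
        return cache_.count(url) != 0;
    }
};
```

When the destination changes, the send thread asks channelFor for the destination's channel; on a successful connection it can push any stalled messages before passing on new ones.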
Distributed systems also need careful division of processing to make information transfer as efficient as possible. Note that
there is approximately an order of magnitude difference in the time taken to:
call within a process (T)
call between processes on the same machine (10 T)
call into another machine on a local network (100 T)
call to another machine across the internet (1000 T)
We try to make communication as local as possible. Obviously distributed processing has to communicate to remote processes to do any
cooperation. However, we try to do that with as few messages as possible and cache information and data so that it doesn't need to be
sent repeatedly between the same endpoints. For example, if we are sending files from a code Repository to a Test Harness, we will want
the Test Harness to keep a cache of files it has recently requested. If it needs to perform testing on a particular module from the code
baseline, it should ask for a list of all of the dependencies of the root module, and then request just those files it needs but does not already
have to carry out its tests.
Web Services:
Programs can consume services provided externally, perhaps by an Operating System Service running on the same machine, or
a Web Service running on the same machine, on another machine on the same network, or on a web host somewhere on the
internet.
Operating System Services communicate with local programs through interprocess communication mechanisms based on pipes or TCP
messaging.
Web Services have a lot of similarities with web applications that are consumed in browsers. However, the client of a web service is usually
not a person using a browser [3] but rather a program that accesses data from the service host, and that data is usually shared by
many other programs as well.
Traditional Web services use TCP communication exclusively, usually wrapped in HTTP messages. These messages contain Simple Object Access
Protocol (SOAP) XML messages carried in the HTTP message body. These help the service support a variety of operations, including creating
a Remote Procedure Call fiction. That uses a proxy to translate method calls into the required HTTP messages, and a stub at the other end
to parse the messages and build stack frames to invoke remote objects managed by the service host.
SOAP processing is expensive, so a newer service protocol, called REST for REpresentational State Transfer, is often used today.
That big name identifies a simple
idea: use the standard HTTP verbs GET, POST, PUT, and DELETE to manage shared information on the service host.
People claim that REST is
stateless, and the messaging is, in the sense that the response to a message does not depend on prior messages, i.e., REST services are not
state machines. They do not support sessions or application state. If the functionality of a service requires managing state
between messages, that state is sent to the client with every reply and the client has to send that back with any request that
needs it.
Most REST services do manage data state in the service host, e.g., state stored in a database or in files.
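The verb-to-operation mapping can be sketched with a toy in-memory resource store. This is a hypothetical illustration, not a real HTTP server; note that each response depends only on the request and the host's data state, never on session state:

```cpp
#include <map>
#include <string>

// Toy REST-style resource store: the HTTP verbs map directly to
// create/read/update/delete operations on data held by the host.
class ResourceStore {
    std::map<std::string, std::string> data_;  // data state lives on the host
public:
    std::string handle(const std::string& verb,
                       const std::string& path,
                       const std::string& body = "") {
        if (verb == "PUT" || verb == "POST") {
            data_[path] = body;                // create or update
            return "200 OK";
        }
        if (verb == "GET") {                   // read
            auto it = data_.find(path);
            return it == data_.end() ? "404 Not Found" : it->second;
        }
        if (verb == "DELETE") {                // delete
            data_.erase(path);
            return "200 OK";
        }
        return "405 Method Not Allowed";
    }
};
```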
Partitioning your program's functionality into local processing augmented by remote shared services means that development time
may decrease and program quality improve. You may get access to processing and data that you would not otherwise have. However,
if one or more of the services your program uses is provided by a third party, the functionality of the program can be compromised
by failures that you cannot easily correct.
For systems like the Federations, discussed below, many of the clients and federated servers may share common functionality for
collaboration, security, data management, and other similar operations. Using web services to supply some of those operations
for all the servers in a Federation, where performance is not critical, could be a very effective design strategy.
Program Federations:
Many large systems take the form of Federations of Servers and Clients that cooperate to carry out a common set of tasks by specialized behaviors.
In the figure to the right we show a Software Development Collaboration system composed of:
A Collaboration Server that holds management information and hosts collaboration tools.
A Repository Server that manages the software baseline of a large project, supporting ownership policies, checkin and checkout, version
management, and dependency analysis (see the end of the previous item on Communication).
A TestHarness Server that supports continuous integration of new code into an existing baseline by
running tests of new parts in the context of a large part of the current baseline.
Clients that provide users views into, and support using, the various facilities provided by the Federation.
The large-scale structure of the Federation is dominated by its communication network, probably based on Peer-To-Peer processing,
as discussed above.
Each of the Client and Server programs may be structured using techniques described in the previous items. The Collaboration Server,
for example, is largely Data-Driven, but also provides a set of collaboration services, probably exposed as web applications hosted in
client web browsers and web services that JavaScript libraries consume, also in client web browsers.
Summary:
Software structure is determined by:
The packages, modules, and programs that are physical containers of code
How data is managed and viewed
How code is factored into flexible pieces that can be substituted into an enclosing structure - Rules and Actions in the Parser
The way processing and data flows through a system - analysis products like filenames and typetables in the CodeAnalyzer
How events are processed
How remote parts are linked together through communication channels
The use of auxiliary services
The division of labor between major parts of a federation
You won't learn enough about these variations to do useful things simply by reading this blog entry. However, it will give you some
common sense about how things are built and you can look at concrete implementation examples provided by the
Code Analyzer and Parallel Text Search application.
Notes:
[1] Some modern web applications are becoming proactive, sending notifications and pushing data to clients. Features in and accompanying HTML5
and JavaScript (ECMAScript 6) provide some support for proactive operations. WebSockets, Web Workers, and Local Storage are examples.
[2] We will discuss blocking queues when we cover threading later this semester.
[3] Browsers can consume web services using JavaScript libraries.