First, note that you will find a few more details than are provided here in
Software Structure.pdf.
There are many different aspects of Software Design. An important one is structure - what are the system parts,
what are their responsibilities, and how do the parts relate to each other.
Software Parts Catalog:
Here's a set of software parts, listed with increasing levels of abstraction:
Functions and methods:
Basic building blocks for all the other parts.
Classes, structs, and interfaces:
A class encapsulates methods and data by controlling access, ensuring consistency of operation. Interfaces provide
contracts for service. They allow the text of client code to bind to the interface abstraction rather than to a concrete
class. That makes the code far less subject to breakage when the concrete classes have to change.
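To make the binding idea concrete, here is a minimal sketch; the names ITokenSource and FixedSource are hypothetical, not part of any package described here:

```cpp
#include <string>
#include <utility>

// Hypothetical interface: client code binds to this abstraction.
struct ITokenSource {
    virtual ~ITokenSource() = default;
    virtual std::string next() = 0;
};

// One concrete implementation; clients never name this type directly.
class FixedSource : public ITokenSource {
    std::string tok_;
public:
    explicit FixedSource(std::string t) : tok_(std::move(t)) {}
    std::string next() override { return tok_; }
};

// Client code depends only on ITokenSource, so the concrete class
// can change or be replaced without breaking this function's text.
std::string firstToken(ITokenSource& src) { return src.next(); }
```

Because firstToken names only the interface, swapping FixedSource for a file-backed or network-backed source changes no client text.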
Packages:
Packages are files each of which focuses on a single responsibility. Each package has:
prologue comments describing the operations, build process, and history of the package
declarations and definitions of classes, structs, and interfaces
test stub - a main and helper functions that provide construction testing for the package
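A package laid out this way might look like the following skeleton. All names here are hypothetical, and the test stub compiles only when the package is built stand-alone:

```cpp
// Demo.h - one short sentence stating the package's single responsibility
// ver 1.0
// Build: standard C++, no external dependencies (hypothetical example)
// History: ver 1.0 - initial release

#ifndef DEMO_H
#define DEMO_H

// declarations and definitions of classes, structs, and interfaces
class Demo {
public:
    int answer() const { return 42; }
};

#ifdef TEST_DEMO
// test stub: a main providing construction tests, compiled only
// when this package is built by itself with TEST_DEMO defined
int main() {
    Demo d;
    return d.answer() == 42 ? 0 : 1;
}
#endif

#endif
```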
Modules:
Modules are collections of packages that are focused on a single responsibility, e.g., the parser module
has:
Parser package with parser class and IRule and IAction interfaces
ActionsAndRules package with derived rules and actions and a Repository class
ScopeStack package
AbstrSynTree package
Programs:
Programs are collections of modules. The CodeAnalyzer program has modules for:
Scanning and tokenizing files into token sequences called SemiExpressions
parsing, using token sequences as inputs
building an Abstract Syntax Tree using ScopeStack and AbstrSynTree packages
managing program execution
Systems:
Systems are collections of cooperating programs. They often have structure determined by
their communication style, e.g.:
client/server
peer-to-peer
federation of servers and clients
Simple Programs:
Simple programs often take the form shown in the accompanying figure. The Executive package doesn't do any computation or formatting, but
simply creates the program's top-level components and controls execution by passing them requests.
Inputs might be a list of files to process, stored in a collection structure of some kind. The Transform package requests the list, or
possibly the Input package sends it to the Transformer. Either way, this communication implies that the Executive passed the caller a reference
to the callee.
When transformation is complete the Display package requests the results, or some part of the results, and displays them to a user, based on
filtering information supplied by the Executive.
A surprisingly large fraction of all programs have this structure, where the Input, Transform, and Display packages may depend on lower
level packages to accomplish their tasks.
This figure shows the program's packages and how they communicate using function calls.
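The structure above can be sketched in a few lines. The class shapes below are hypothetical stand-ins for the real packages, but they show the Executive creating the parts and passing references between them:

```cpp
#include <string>
#include <vector>

// Hypothetical minimal shapes for the three worker packages.
class Input {
    std::vector<std::string> files_{ "a.cpp", "b.cpp" };
public:
    const std::vector<std::string>& fileList() const { return files_; }
};

class Transform {
public:
    // Transform requests the list via the reference the Executive supplied.
    std::vector<std::string> run(const Input& in) {
        std::vector<std::string> out;
        for (const auto& f : in.fileList())
            out.push_back("processed:" + f);
        return out;
    }
};

class Display {
public:
    std::string show(const std::vector<std::string>& results) {
        std::string view;
        for (const auto& r : results) view += r + "\n";
        return view;
    }
};

// The Executive does no computation or formatting: it only creates the
// top-level components and controls execution by passing them requests.
class Executive {
    Input input_;
    Transform transform_;
    Display display_;
public:
    std::string run() { return display_.show(transform_.run(input_)); }
};
```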
Data-Driven Programs:
The Client/Server model was one of the earliest software structures designed specifically to share data.
Under this model the server is reactive, only responding to client requests and taking no action on its own initiative [1].
Clients initiate transactions using HTTP messages and wait for reply messages from the server.
We show, in the figure below, a modern incarnation of that structure.
The purpose of the server is to share content among many clients. In this case the browser
and web server conspire to present a "concurrent access" model: the server downloads content to the browser, which
may then let a user spend significant time on that content while other browser clients are accessing the web server.
Should a client's browser upload new content to the web server, all other clients may be able to see the new content immediately, providing
a very useful collaboration model.
We see from the figure that the server may be dynamically generating views of its content, determined by the specific web application
and by the accesses and browser settings the client may have established.
This figure presents a block diagram (not one of the UML diagrams) showing the principal processing activities for the exchange of web
application information, in the form of web pages, between web server and browser.
A variation on the Client/Server model is the Model-View-Controller (MVC) structure, shown in the next figure.
The controller, on startup, sends a default view to the client. Subsequent actions by the user on a view result in calls to the
controller, which may access its data model to build a response. It sends that response back to the current view or to a new view of some
different form. In order to factor view-handling details out of the controller we often build a View Model, which has objects that provide
simple interfaces for the controller but may be managing views in complex ways.
Similarly, we may factor out of the controller data management operations which we encapsulate in an Application Model. This model
contains objects that manage processes and information inherent in the application, and take care of all data management operations
so that the controller doesn't have to deal with that.
Note that the Views and Data may be local to a single running process, or may be distributed to several different remote hosts.
Views may be requested by a browser client through HTTP requests and returned via an HTTP response, or may simply be direct references
within a local process. Similarly, the Data Model may be supported by a remote web service, like Google Maps, or again, may be a local
facility.
Analysis Driven Programs:
Analyzers may nominally take the form of a simple system. For example, the code parser, shown in the next figure,
fits into that model.
The Executive and Display are obvious. The Scanner module replaces the Inputs package with two packages:
Tokenizer and SemiExp. The Transformer morphs into packages for managing the parsing process and for defining
detailed rules and actions needed for code analysis.
Often those packages are accompanied by additional packages like the ScopeStack and AbstrSynTree
packages. The Abstract Syntax Tree (AST) holds results of parsing to be used later by the Executive and Display packages. ScopeStack plays an
important role in building the AST.
You can find here an example Code Analyzer application, written in C++, that uses exactly this structure.
Division of a program into packages is often a matter of binding together things that are related and separating those things that are not.
We do this to satisfy the Single Responsibility Principle (SRP) - make each package
focus on a single activity. If you can't describe what a package does in a single short sentence, then it probably violates the SRP.
However, there are other reasons for making program divisions. Here's a list:
Keep things together that are logically related (SRP)
Separate things that change at different rates. The ActionsAndRules package changes with
every parsing application, but the Parser and
its interfaces, IRule and IAction, do not. We separate ActionsAndRules
from the Parser package, to which it is intimately related, because it changes frequently and the rest of the Parser does not.
Keep things that do a lot of communication back and forth together for performance reasons. This is especially true where the processing
is distributed across a network or the internet.
Partition to reuse or salvage existing software components.
Reuse is the incorporation of a component without making any changes. Salvage is a weaker form of reuse where we need to change parts of the
component's code, but its design remains intact.
If we have existing FileManager, Tokenizer, ... packages, the program that uses those
parts must be structured to accommodate them and their processing needs and abilities.
Concurrent processing, either parallel or pipelined or both, usually has a major impact on the way
we need to lay out a system structure - see below.
Concurrency Driven Programs:
In the accompanying figure, we show analysis flow for a concurrent dependency analyzer that establishes dependency relationships
between two or more source code files.
Processing occurs in two phases: the first analyzes all the types and global functions and data (for C++) found in the set of input
files, storing the type information in a type table that also holds namespace and source file information.
The second pass consists
of tokenizing files and marking a dependency relation for them if any of their tokens are keys in the type table.
File A depends on File B if it uses, in any way, the name of a type, global function, or global data, defined in B.
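That dependency rule is easy to sketch. Here is a minimal version, assuming a hypothetical type table that maps each type (or global function or data) name to the file defining it:

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical type table: name of a type, global function, or
// global data item -> file that defines it.
using TypeTable = std::unordered_map<std::string, std::string>;

// File A depends on file B if any of A's tokens is a key in the
// type table whose defining file is B (and B is not A itself).
std::vector<std::string> dependenciesOf(
    const std::vector<std::string>& tokens,  // tokens of file A
    const TypeTable& table,
    const std::string& self)                 // A's own file name
{
    std::vector<std::string> deps;
    for (const auto& tok : tokens) {
        auto it = table.find(tok);
        if (it != table.end() && it->second != self)
            deps.push_back(it->second);
    }
    return deps;
}
```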
The first pass starts a File Search process running on its own thread. As files are discovered, their fully qualified names are
enqueued for type analysis. Multiple type analysis threads are started from a thread
pool. Each thread attempts to dequeue the name of a file to analyze. If the queue is empty it simply blocks until a name becomes
available [2].
Once a thread has completed analysis of a file it has built a table of types found in the file, which it
deposits into a merge queue.
A TypeTable merge thread dequeues partial type tables as they become available and merges them
into a table it is building that, at the end of its processing, contains the entire type information for all the files analyzed.
When the merge process completes it publishes the type table and enqueues the file set for Dependency Analysis in the second phase.
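The queues that connect these pipeline stages can be sketched as a minimal thread-safe blocking queue. This is a simplified sketch, not production code:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Minimal thread-safe blocking queue: deQ blocks while the queue is
// empty, which is exactly the behavior the analysis threads rely on.
template <typename T>
class BlockingQueue {
    std::queue<T> q_;
    std::mutex mtx_;
    std::condition_variable cv_;
public:
    void enQ(T t) {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            q_.push(std::move(t));
        }
        cv_.notify_one();  // wake one waiting consumer
    }
    T deQ() {
        std::unique_lock<std::mutex> lk(mtx_);
        cv_.wait(lk, [this] { return !q_.empty(); });  // block until an item arrives
        T t = std::move(q_.front());
        q_.pop();
        return t;
    }
};
```

The File Search thread calls enQ with each discovered file name; each analysis thread loops on deQ, and the merge thread uses a second instance of the same queue for partial type tables.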
The second phase flow is very similar to the first, and so we will skip over those details. The structure of this Analyzer is nominally
like the simple program, but its Transform part blossoms into a number of packages that manage the parallel/pipelined concurrency used
in this program. That middle part will have packages for a thread-safe blocking queue, a thread pool, a parser for type analysis (the rules
are fairly simple), a TypeTable, and a Dependency package which records all of the dependency relationships found in the second pass.
It's likely that the Simple System Display package will also blossom in this Analyzer program. It has to show, perhaps with an indented list,
the dependency relationships. Note that these relationships form a graph, not a tree, so it is harder to visualize the dependency structure
with text output. Therefore we may be tempted to render the dependency graph visually, which presents all kinds of interesting graph routing
and exploration processing to be developed.
You will find here a Parallel Text Search application that works just like the first phase
of the Concurrent Type Analyzer, except that instead of type analysis the program searches files for multiple text strings.
Event Driven Programs:
Windows Graphical User Applications have a special structure, sometimes referred to as a Single Threaded Apartment (STA), illustrated in the
diagram to the right. Single Threaded Apartments are software components that create Windows which accept all inputs through a
thread-safe message queue.
Messages are dequeued by the primary thread, i.e., the thread that created the window. Usually many clients of the STA
may concurrently send or post messages to the queue. However, the STA assumes that the only thread accessing the window functions and data
is the primary thread, responding to messages from the queue, and so no locking or other synchronization is performed.
Graphical windows processing tends to be rather complex, so the STA structure was created to avoid the additional complexity of dealing with
concurrent access to the window internals.
Messages arrive at the queue from device events through a "raw input" queue. The screen management software has a window manager
with a stack of window references, one for each open window on the desktop. It dequeues messages from devices and drops them down its window
stack. The first window that can handle the message does so and stops message propagation to other windows. This may happen for a mouse button
message, for example, when the mouse coordinates are within the Window's client area or chrome.
Messages also arrive from code within the application, perhaps triggered by a control event - button click or scroll bar movement - or from other
programs or the Windows operating system. All inputs from these agents occur as messages, never as direct access to a window's functions or data.
This message-passing process allows a window to react to incoming messages in a way that seems concurrent to a user.
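The single-threaded dispatch idea can be sketched without any real windowing code. Msg and Window below are hypothetical stand-ins, and the real queue would be thread-safe like the blocking queue discussed earlier:

```cpp
#include <queue>
#include <string>
#include <utility>

// Hypothetical window message.
struct Msg { std::string name; };

class Window {
    std::queue<Msg> queue_;  // stand-in for the thread-safe message queue
    std::string log_;        // window state: touched only by the primary thread
public:
    // Any client may post a message (the real queue makes this thread-safe).
    void post(Msg m) { queue_.push(std::move(m)); }

    // Only the primary thread runs the pump, so the window's functions
    // and data need no locking or other synchronization.
    void pump() {
        while (!queue_.empty()) {
            Msg m = queue_.front();
            queue_.pop();
            log_ += "handled:" + m.name + ";";
        }
    }
    const std::string& log() const { return log_; }
};
```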
All of the Graphical User Interface frameworks like Qt, Gnome, WinForms, and Windows Presentation Foundation, are implemented with STAs, as is
browser processing for all of the major implementations - Internet Explorer, Google Chrome, Firefox, etc. The Node.js message-dispatching
JavaScript framework also uses an STA model.
Communication-Driven Systems:
Distributed systems need some form of communication channel to communicate between remote machines. This is done now, most frequently, with
TCP-based channels, often with an overlay of HTTP message handling. The classic form of HTTP is used to implement Client/Server communication
where a client makes an HTTP request to a reactive server that forms an HTTP reply synchronously.
In our courses we will favor message-passing systems using HTTP style messages, but expect to use one-way transmissions. That is, a sender
forms a request or notification message and sends it to a designated receiver over a connected socket channel, perhaps wrapped in middleware
infrastructure like Windows Communication Foundation (WCF). If the request implies some form of response, the receiver forms the reply message,
perhaps after a significant amount of processing, then sends the reply back to the original sender over a separate channel connection.
This one-way communication over dedicated channels uses more system resources than a single bilateral channel, e.g., more sockets and more
threads. However, it yields a very flexible style of communication where no one waits for a reply. Each end of the communication is active.
That is, it can initiate messages as well as sink them. We never have to worry about talk protocols of the request/response form. We just
send messages when appropriate, and check our receive queues in a timely fashion, probably with a dedicated thread.
The implications of this structure are that we can expect each peer to host both a sender and receiver. Receivers listen for connection requests.
When one arrives the receiver uses a thread pool thread to handle the incoming message, mostly to frame messages arriving as bytes in the
receive socket and deposit them into a thread-safe blocking queue, to await processing by the receiver's host.
Senders dequeue their host's outgoing messages, then attempt to open a channel to the destination with a socket connection request. When that succeeds
the sender thread pushes the dequeued message into the channel and goes back to get another message. Note that this implies the send thread
will do message packet inspection to see where the message needs to go. If it goes the same place as the previous message the send thread
just pushes it into the socket. If the message has a new destination the send thread closes the previous connection and attempts to open
a new connection to the new destination.
It is relatively simple to cache opened channels for a while to minimize the overhead of frequent opening/closing cycles. One easy way to do
that is to use a hash table that uses the destination url as its key and holds the connected socket as the corresponding value. Now, when
the destination changes the send thread checks the hash table to see if there already exists a connection to the destination. If so, it uses
the connection. If not, it creates a new one and inserts into the hash table before sending the message. This idea also allows us
to hold onto messages that could not be sent in a stalled-message queue, also made part of the hash table's value. When a connection is
finally established we may push stalled messages before passing on new messages.
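A sketch of that hash-table cache, with a hypothetical Channel struct standing in for a connected socket plus its stalled-message queue:

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical cache entry: a connected "socket" plus any messages
// that stalled while no connection to this destination existed.
struct Channel {
    int socket;                        // stand-in for a connected socket
    std::vector<std::string> stalled;  // messages awaiting connection
};

class ChannelCache {
    std::unordered_map<std::string, Channel> cache_;  // key: destination url
    int nextSocket_ = 1;
public:
    // Reuse an open channel if one exists; otherwise "connect" a new
    // one and insert it before sending.
    Channel& channelFor(const std::string& url) {
        auto it = cache_.find(url);
        if (it != cache_.end()) return it->second;
        return cache_.emplace(url, Channel{ nextSocket_++, {} }).first->second;
    }
    bool isCached(const std::string& url) const {
        return cache_.count(url) != 0;
    }
};
```

When the destination changes, the send thread asks channelFor for the destination's channel; on a successful connection it can push any stalled messages before passing on new ones.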
Distributed systems also need careful division of processing to make information transfer as efficient as possible. Note that
there is approximately an order of magnitude difference in the time taken to:
call within a process (T)
call between processes on the same machine (10 T)
call into another machine on a local network (100 T)
call to another machine across the internet (1000 T)
We try to make communication as local as possible. Obviously distributed processing has to communicate to remote processes to do any
cooperation. However, we try to do that with as few messages as possible and cache information and data so that it doesn't need to be
sent repeatedly between the same endpoints. For example, if we are sending files from a code Repository to a Test Harness, we will want
the Test Harness to keep a cache of files it has recently requested. If it needs to perform testing on a particular module from the code
baseline, it should ask for a list of all of the dependencies of the root module, and then request just those files it needs but does not already
have to carry out its tests.
Web Services:
Programs can consume services provided externally, perhaps by an Operating System Service running on the same machine, or
a Web Service running on the same machine, on another machine on the same network, or on a web host somewhere on the
internet.
Operating System Services communicate with local programs through interprocess communication mechanisms based on pipes or TCP
messaging.
Web Services have a lot of similarities with web applications that are consumed in browsers. However, the client of a web service is usually
not a person using a browser [3] but rather a program that accesses data from the service host, and that data is usually shared by
many other programs as well.
Traditional Web services use TCP communication exclusively, usually wrapped in HTTP messages. These messages contain Simple Object Access
Protocol (SOAP) XML messages carried in the HTTP message body. These help the service support a variety of operations, including creating
a Remote Procedure Call fiction. That uses a proxy to translate method calls into the required HTTP messages, and a stub at the other end
to parse the messages and build stack frames to invoke remote objects managed by the service host.
SOAP processing is expensive, so a newer service protocol, called REST for REpresentational State Transfer, is often used today.
That big name identifies a simple
idea: use the standard HTTP verbs GET, POST, PUT, and DELETE to manage shared information on the service host.
People claim that REST is
stateless, and the messaging is, in the sense that the response to a message does not depend on prior messages, i.e., REST services are not
state machines. They do not support sessions or application state. If the functionality of a service requires managing state
between messages, that state is sent to the client with every reply and the client has to send that back with any request that
needs it.
Most REST services do manage data state in the service host, e.g., state stored in a database or in files.
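The verb-to-operation mapping can be sketched with a toy in-memory resource store. This is a hypothetical illustration, not a real HTTP server; note that each response depends only on the request and the host's data state, never on session state:

```cpp
#include <map>
#include <string>

// Toy REST-style resource store: the HTTP verbs map directly to
// create/read/update/delete operations on data held by the host.
class ResourceStore {
    std::map<std::string, std::string> data_;  // data state lives on the host
public:
    std::string handle(const std::string& verb,
                       const std::string& path,
                       const std::string& body = "") {
        if (verb == "PUT" || verb == "POST") {
            data_[path] = body;                // create or update
            return "200 OK";
        }
        if (verb == "GET") {                   // read
            auto it = data_.find(path);
            return it == data_.end() ? "404 Not Found" : it->second;
        }
        if (verb == "DELETE") {                // delete
            data_.erase(path);
            return "200 OK";
        }
        return "405 Method Not Allowed";
    }
};
```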
Partitioning your program's functionality into local processing augmented by remote shared services means that development time
may decrease and program quality improve. You may get access to processing and data that you would not otherwise have. However,
if one or more of the services your program uses is provided by a third party, the functionality of the program can be compromised
by failures that you cannot easily correct.
For systems like the Federations, discussed below, many of the clients and federated servers may share common functionality for
collaboration, security, data management, and other similar operations. Using web services to supply some of those operations
for all the servers in a Federation, where performance is not critical, could be a very effective design strategy.
Program Federations:
Many large systems take the form of Federations of Servers and Clients that cooperate to carry out a common set of tasks by specialized behaviors.
In the figure to the right we show a Software Development Collaboration system composed of:
A Collaboration Server that holds management information and hosts collaboration tools.
A Repository Server that manages the software baseline of a large project, supporting ownership policies, checkin and checkout, version
management, and dependency analysis (see the end of the previous item on Communication).
A TestHarness Server that supports continuous integration of new code into an existing baseline by
running tests of new parts in the context of a large part of the current baseline.
Clients that provide users views into, and support using, the various facilities provided by the Federation.
The large-scale structure of the Federation is dominated by its communication network, probably based on Peer-To-Peer processing,
as discussed above.
Each of the Client and Server programs may be structured using techniques described in the previous items. The Collaboration Server,
for example, is largely Data-Driven, but also provides a set of collaboration services, probably exposed as web applications hosted in
client web browsers and web services that JavaScript libraries consume, also in client web browsers.
Summary:
Software structure is determined by:
The packages, modules, and programs that are physical containers of code
How data is managed and viewed
How code is factored into flexible pieces that can be substituted into an enclosing structure - Rules and Actions in the Parser
The way processing and data flows through a system - analysis products like filenames and typetables in the CodeAnalyzer
How events are processed
How remote parts are linked together through communication channels
The use of auxiliary services
The division of labor between major parts of a federation
You won't learn enough about these variations to do useful things simply by reading this blog entry. However, it will give you some
common sense about how things are built and you can look at concrete implementation examples provided by the
Code Analyzer and Parallel Text Search application.
Notes:
[1] Some modern web applications are becoming proactive, sending notifications and pushing data to clients. Features in and accompanying HTML5
and JavaScript (ECMAScript 6) provide some support for proactive operations. WebSockets, Web Workers, and Local Storage are examples.
[2] We will discuss blocking queues when we cover threading later this semester.
[3] Browsers can consume web services using JavaScript libraries.