Design Bite - Introduction


"We will either find a way, or make one."
- Hannibal

1.0 Prologue

Design Bites are a short sequence of pages, each focused on a specific aspect of software design. They are brief, pragmatic, and relevant to things you and I do professionally.

This page is an introduction to the sequence. We use the TextFinder project, introduced in BuildOn presentations, to make discussion concrete and specific.

Software development has several phases:

designing the concept, architecture, and specification, and documenting the design
implementing and testing code, and documenting the results
deploying code and documentation

You can view examples of documentation and code from this site here: Rust TextFinder and CommCompare.

For small, brief projects like TextFinder, this is likely to be a linear sequence. For larger projects, development may be staged: first a core release, then a series of releases with progressively more functionality, until the specification has been satisfied.

It is important to note that design need not be a huge process. Its size, effort, and products should be scaled to match the project. For projects like TextFinder, it might be a day or two of work with a few pages of documentation, perhaps split between web page disclosure and comments in individual code packages.

2.0 Concept Development

Figure 1. Software Design

In concept development we think about who the users will be and what their goals and expected activities are. We also think about issues that may become apparent as the design proceeds.

Issues may be things like performance, ease of use, scope of activities, complexity, and ability of the development team to complete the project at reasonable cost with a practical schedule.

2.1 TextFinder Concept

Design concept deals with what?, why?, and so what?

What do we want to do? Find files that contain specified text.

Why do we want to do that? To locate files that contain things we want to look at and modify.

Here are some practical uses:

Find links in html files, write them to a test page, then click on each link to check for 404s.
Find all html pages that mention a specific Rust feature, e.g., Arc<T>, to build a glossary of links.
Answer the question: where did I write about xyz?

Note that generation of output differs markedly for each of these!

So what are the issues?

We may need to look at thousands of files, so performance is an issue.
Some uses seem to require regular expressions to specify the text to find.
How do we build in the flexibility to handle a variety of uses, e.g., those cited above?
Are there any existing parts we can reuse?

When the concept is complete, we begin developing an architecture.

3.0 Architecture

Architecture is an abstraction that leaves out all of the details of language and platform, letting ideas take precedence. Its purpose is to think about how the project will function, what its parts will be, their responsibilities, how they will communicate, and how data flows through the system.

3.1 TextFinder Architecture

Figure 2. TextFinder Packages

One useful way to start is to summarize the tasks the system will need to execute in order to find files with specified text:

Accept information from the command line that specifies directories to search, kinds of files to analyze, text to find, and any options that seem appropriate.
Navigate through the directory tree rooted at an input path, find names of all files that match specified patterns, e.g., extensions like ".html", ".rs", ".h", ".cpp", ..., and the paths where they are found, and send on for text search.
Open each such file and search for specified text. Send results on for display.
Extract useful information from the data stream, perform any required post-processing, and display the results.
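The core of tasks 2 through 4 can be sketched with a few plain functions. This is only an illustration, not TextFinder's code: the function names are hypothetical, patterns are assumed to be bare extensions such as "rs", and in-memory (name, contents) pairs stand in for real files.

```rust
use std::path::Path;

/// Task 2 helper: does the file name end with one of the requested extensions?
/// Assumption: patterns are bare extensions such as "rs" or "html".
fn matches_pattern(file_name: &str, patterns: &[&str]) -> bool {
    match Path::new(file_name).extension().and_then(|e| e.to_str()) {
        Some(ext) => patterns.iter().any(|p| p.eq_ignore_ascii_case(ext)),
        None => false,
    }
}

/// Task 3 helper: plain substring search, no regular expressions.
fn contains_text(contents: &str, text: &str) -> bool {
    contents.contains(text)
}

/// Tasks 2-4 composed: keep the names of files that match a pattern
/// and contain the text. The returned list is what a display would show.
fn find_matches<'a>(files: &[(&'a str, &str)], patterns: &[&str], text: &str) -> Vec<&'a str> {
    files
        .iter()
        .filter(|(name, _)| matches_pattern(name, patterns))
        .filter(|(_, contents)| contains_text(contents, text))
        .map(|(name, _)| *name)
        .collect()
}
```

Each function corresponds to one responsibility in the task list, which is what makes the tasks natural candidates for separate packages.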

Here's an example description: TextFinder architecture, with more detail in subsequent BuildOn steps.

Each task is a candidate to become a package. Note that we've described a data flow process. That's not the only way to configure TextFinder, but it will be effective, because information can be supplied to the user as processing proceeds, which matters when a search may visit hundreds or even thousands of files and directories. We will see later that data flow lends itself to concurrent processing for text search in files.
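To illustrate why a data flow suits concurrency, here is a minimal sketch using standard library channels. It is an assumption about how such a pipeline could be wired, not TextFinder's design: the function name run_pipeline is hypothetical, file contents are supplied in memory, and a single search thread is used for simplicity.

```rust
use std::sync::mpsc;
use std::thread;

/// Runs a three-stage data flow over in-memory (name, contents) pairs.
/// Stage 1 stands in for navigation, stage 2 searches on its own thread,
/// and stage 3 stands in for display by collecting matching names.
fn run_pipeline(files: Vec<(String, String)>, text: &str) -> Vec<String> {
    let (file_tx, file_rx) = mpsc::channel::<(String, String)>();
    let (hit_tx, hit_rx) = mpsc::channel::<String>();
    let text = text.to_string();

    // Search stage: receives files until the sending side is dropped.
    let searcher = thread::spawn(move || {
        for (name, contents) in file_rx {
            if contents.contains(&text) {
                // A send error only means the receiver is gone; ignore it.
                let _ = hit_tx.send(name);
            }
        }
    });

    // Navigation stage: feed files into the pipeline, then close the channel.
    for file in files {
        file_tx.send(file).expect("search stage stopped unexpectedly");
    }
    drop(file_tx);

    // Display stage: drain results. The iterator ends when the search
    // thread finishes and drops hit_tx.
    let hits: Vec<String> = hit_rx.iter().collect();
    searcher.join().expect("search thread panicked");
    hits
}
```

Because the stages only communicate through messages, the search stage could later be replaced by several worker threads without changing the navigation or display stages.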

Now the parts - source code packages - are emerging from that thought process, e.g.:

TextFinder executive creates instances of the implementation's types and starts the processing flow. It creates CmdlnParser and accepts its parsed output, creates an instance of DirNav and configures it with the TextSearch type. It creates an instance of the Display type, and configures TextSearch with text and a reference to the Display type.
CmdlnParser accepts the input command line, parses it into an associative list of attributes - path, patterns, text, ... - and returns that information to TextFinder.
DirNav accepts a starting path and the set of file patterns to find, and recursively visits all directories in the directory tree rooted at the specified path. Each time it enters a directory and finds file names that match one of its patterns it passes that information on for text search, using an eventing interface.
TextSearch opens each file it's given and searches for the specified text. It then passes that information on for display.
Display is interesting. What it needs to do is very application specific, as indicated in the TextFinder Concept. Since TextFinder executive configures TextSearch with a reference to its display, the application can provide several display types, one of which is instantiated based on user input.
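The idea of configuring TextSearch with a reference to one of several display types might look like the following sketch. The trait ShowResult, its show method, and the two display structs are hypothetical names chosen for illustration. Contents are passed in directly so the example does not touch the file system.

```rust
/// Hypothetical display interface; trait and method names are assumptions.
trait ShowResult {
    fn show(&mut self, dir: &str, file: &str);
}

/// Collects one plain line per matching file.
#[derive(Default)]
struct ListDisplay {
    lines: Vec<String>,
}

impl ShowResult for ListDisplay {
    fn show(&mut self, dir: &str, file: &str) {
        self.lines.push(format!("{}/{}", dir, file));
    }
}

/// Collects HTML anchors, e.g. for a test page of links.
#[derive(Default)]
struct LinkDisplay {
    anchors: Vec<String>,
}

impl ShowResult for LinkDisplay {
    fn show(&mut self, dir: &str, file: &str) {
        self.anchors.push(format!("<a href=\"{0}/{1}\">{1}</a>", dir, file));
    }
}

/// Search stage configured with the text to find and a reference to a display.
struct TextSearch<'a> {
    text: String,
    display: &'a mut dyn ShowResult,
}

impl<'a> TextSearch<'a> {
    fn new(text: &str, display: &'a mut dyn ShowResult) -> Self {
        TextSearch { text: text.to_string(), display }
    }

    /// Searches already-loaded contents and forwards a match to the display.
    fn search(&mut self, dir: &str, file: &str, contents: &str) {
        if contents.contains(&self.text) {
            self.display.show(dir, file);
        }
    }
}
```

TextSearch never learns which display it is talking to, so an executive could pick ListDisplay or LinkDisplay from user input without any change to the search code.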

We now have a clear picture of what needs to be implemented and what structure to use. Next, we need to describe what we are going to deliver.

4.0 TextFinder Specification

A specification should be as brief as practical while still being complete and unambiguous. We are electing to make display quite simple, but the architecture leaves open the possibility of easily replacing that with alternate display processing in a later version.

Identify all files in a directory subtree that match one or more patterns and contain a specified text.
Specify root path, one or more file patterns (.h, .cpp, .cs, .rs, ...), and search text on command line.
Specify options /s [true|false], /v [true|false], /H [true|false], and /h [true|false] for recursive directory walk, verbose output header, hiding directories with no match, and help message, respectively.
Display file name and path, without duplication of path name, e.g., organized by directory, for files containing the search text.
Interesting extensions, not required for this implementation:
- Replace text by regular expressions for both search text and file patterns.
- Replace sequential file searches with parallel searches to improve performance and usability.
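As a rough sketch of how the command line options in this specification could be parsed into an associative list, consider the following. Several things here are assumptions: the function names, the rule that an option key is '/' followed by a single character, the treatment of a key without a value as "true", and the /P, /p, and /T keys used in the doc comment for path, patterns, and text. The specification itself names only /s, /v, /H, and /h and does not state default values.

```rust
use std::collections::HashMap;

/// Parses tokens such as ["/P", ".", "/p", "rs,html", "/T", "Arc", "/s"].
/// Assumption: a key is '/' plus one character, so a value like "/x"
/// would be misread as a key. A key with no value is stored as "true".
fn parse_cmdln(args: &[&str]) -> HashMap<char, String> {
    let is_key = |s: &str| s.len() == 2 && s.starts_with('/');
    let mut opts = HashMap::new();
    let mut i = 0;
    while i < args.len() {
        if is_key(args[i]) {
            // is_key guarantees a second character, so unwrap cannot fail.
            let key = args[i].chars().nth(1).unwrap();
            if i + 1 < args.len() && !is_key(args[i + 1]) {
                opts.insert(key, args[i + 1].to_string());
                i += 2;
                continue;
            }
            opts.insert(key, "true".to_string());
        }
        i += 1;
    }
    opts
}

/// Reads a boolean option, using the caller's default when the option
/// is absent or its value is not "true" or "false".
fn flag(opts: &HashMap<char, String>, key: char, default: bool) -> bool {
    opts.get(&key).and_then(|v| v.parse().ok()).unwrap_or(default)
}
```

Keeping the parser ignorant of what each key means lets the executive decide how to interpret the attributes and which defaults to apply.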

At this point, we can begin developing code, focusing on generating specified processing with user-defined types and their methods. We know how data will flow, which makes it relatively simple to build test mocks for functionality not yet in place.
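A test mock for a part that is not yet in place can be very small once the data flow is known. The sketch below uses hypothetical names (FileSink, MockNav, RecordingSink): a mock navigator replays a fixed list of directory and file names, so downstream code can be exercised before real directory navigation exists.

```rust
/// Hypothetical event sink: anything that can receive "file found" events.
trait FileSink {
    fn on_file(&mut self, dir: &str, file: &str);
}

/// Mock navigator: replays a fixed list instead of walking a directory tree.
struct MockNav {
    entries: Vec<(String, String)>,
}

impl MockNav {
    fn visit(&self, sink: &mut dyn FileSink) {
        for (dir, file) in &self.entries {
            sink.on_file(dir, file);
        }
    }
}

/// Mock sink: records events so a test can inspect what flowed downstream.
#[derive(Default)]
struct RecordingSink {
    seen: Vec<String>,
}

impl FileSink for RecordingSink {
    fn on_file(&mut self, dir: &str, file: &str) {
        self.seen.push(format!("{}/{}", dir, file));
    }
}
```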

5.0 Implementation

Implementation is all about details. Every single line of code for every package matters. Once an architecture is defined, each package can be built in near isolation from the others until we tie everything together near the end. That makes it much easier to handle this level of detail.

Figure 3. TextFinder Types

The implementation process often starts with a single package, populating it with types and functions. Since we will be using Rust, which builds user-defined types with structs, we won't use the term class.

Each package will have a primary type, as shown in Figure 3. The types DirEvent and SearchEvent are traits, used to support communication, as described below.

Executive type is defined in the TextFinder package. It creates an instance of DirNav<Finder> and configures its internal Finder instance with a reference to an instance of GenOut. This generates a data flow pipe-line which dominates TextFinder processing.
DirNav<Finder> and the DirEvent interface are defined in the DirNav package. In its constructor DirNav creates an instance of Finder using a factory function specified in the DirEvent trait and implemented by Finder.
It exposes a public method get_app(&mut self) -> &mut App. The App type is a generic parameter which Executive supplies by constructing DirNav<Finder>. get_app() is called by Executive to configure the Finder app with an instance of GenOut and search text.
DirNav provides a function visit(&mut self, dir: &Path) -> io::Result<()> that Executive calls to start TextFinder processing flow.
Finder and SearchEvent are defined in the TextSearch package. Finder creates a file path using the file name and current directory, passed by DirNav<Finder>. It attempts to open the file and, if successful, searches the file for text supplied by Executive. In either case it reports the result to GenOut using the SearchEvent interface.
GenOut is defined in the Display package. It receives search events from Finder and builds information for TextFinder users. For simple applications - see uses in the Concept section - it simply formats results information and writes to the console. For more complex applications it may build an internal data structure and do some post processing for each search event, perhaps sending out the entire results data at the end of the program.
If so, it will likely indicate search activities, perhaps by displaying directories as they become available, so users know that the application is running as expected.
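The relationships among these types can be sketched as follows. This is a simplified illustration under stated assumptions, not the TextFinder source: the trait method names (do_dir, do_file, set_dir, set_file), add_pattern, set_text, and get_out are hypothetical, Finder owns its output rather than holding a reference, patterns are bare extensions, and GenOut collects matching paths instead of writing to the console.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Events DirNav sends to its application. Method names are assumptions.
trait DirEvent {
    fn new() -> Self; // factory function called by DirNav's constructor
    fn do_dir(&mut self, dir: &Path);
    fn do_file(&mut self, file_name: &str);
}

/// Events Finder sends to its output. Method names are assumptions.
trait SearchEvent {
    fn set_dir(&mut self, dir: &Path);
    fn set_file(&mut self, file_name: &str, found: bool);
}

struct DirNav<App: DirEvent> {
    app: App,
    patterns: Vec<String>, // assumed to be bare extensions, e.g. "rs"
}

impl<App: DirEvent> DirNav<App> {
    fn new() -> Self {
        DirNav { app: App::new(), patterns: Vec::new() }
    }
    fn add_pattern(&mut self, ext: &str) {
        self.patterns.push(ext.to_string());
    }
    fn get_app(&mut self) -> &mut App {
        &mut self.app
    }
    /// Reports files in `dir` first, then recurses into subdirectories,
    /// so the application always knows which directory a file belongs to.
    fn visit(&mut self, dir: &Path) -> io::Result<()> {
        self.app.do_dir(dir);
        let mut sub_dirs = Vec::new();
        for entry in fs::read_dir(dir)? {
            let path = entry?.path();
            if path.is_dir() {
                sub_dirs.push(path);
            } else if self.matches(&path) {
                if let Some(name) = path.file_name().and_then(|n| n.to_str()) {
                    self.app.do_file(name);
                }
            }
        }
        for sub in sub_dirs {
            self.visit(&sub)?;
        }
        Ok(())
    }
    fn matches(&self, path: &Path) -> bool {
        match path.extension().and_then(|e| e.to_str()) {
            Some(ext) => self.patterns.iter().any(|p| p.as_str() == ext),
            None => false,
        }
    }
}

struct Finder<Out: SearchEvent + Default> {
    dir: PathBuf,
    text: String,
    out: Out,
}

impl<Out: SearchEvent + Default> Finder<Out> {
    fn set_text(&mut self, text: &str) {
        self.text = text.to_string();
    }
    fn get_out(&mut self) -> &mut Out {
        &mut self.out
    }
}

impl<Out: SearchEvent + Default> DirEvent for Finder<Out> {
    fn new() -> Self {
        Finder { dir: PathBuf::new(), text: String::new(), out: Out::default() }
    }
    fn do_dir(&mut self, dir: &Path) {
        self.dir = dir.to_path_buf();
        self.out.set_dir(dir);
    }
    fn do_file(&mut self, file_name: &str) {
        // A file that cannot be read as UTF-8 text is reported as not found.
        let found = fs::read_to_string(self.dir.join(file_name))
            .map(|contents| contents.contains(&self.text))
            .unwrap_or(false);
        self.out.set_file(file_name, found);
    }
}

/// Collects full paths of matching files for later display.
#[derive(Default)]
struct GenOut {
    dir: PathBuf,
    hits: Vec<PathBuf>,
}

impl SearchEvent for GenOut {
    fn set_dir(&mut self, dir: &Path) {
        self.dir = dir.to_path_buf();
    }
    fn set_file(&mut self, file_name: &str, found: bool) {
        if found {
            self.hits.push(self.dir.join(file_name));
        }
    }
}
```

An executive would construct DirNav::<Finder<GenOut>>::new(), add patterns, call get_app() to set the search text, and then call visit() on the root path. The traits are the only coupling between packages, so each stage can be replaced without touching the others.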

6.0 Data Flow Patterns

In the next Bite, Structure, we look at the ideas that lead up to the structure shown in Figure 3.

7.0 Epilogue

Before diving into options for design structure, you may find that a relatively pragmatic discussion of design philosophy helps to explain some of the motivations for these Bites.

We will consider five design alternatives for TextFinder:

These are progressively more flexible, eventually resulting in reusable components, but also increasingly complex. Where you settle in these alternatives is determined by design context. Is this a one-of-a-kind project that you want to finish quickly or is it heading for production code that will be maintained by more than one developer?