BuildOn - Step #6: Parallel Text Search

 

BuildOn Project - Parallel Execution of Text Search

Figure 1. TextFinder Types
Figure 2. Parallel Text Search
This step adds parallel text search, with each file searched on a separate thread. It introduces a thread pool into TextSearch and a blocking queue into Display, and uses one thread for directory navigation, multiple threads for text search, and another thread for sending display contents to the output, as shown in Figure 2. Figure 1 is a type diagram for TextFinder, showing how these parts fit together.
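The three-stage flow in Figure 2 can be sketched with standard-library channels. This is a minimal sketch, not the project's code: the function name `parallel_search` is hypothetical, files are modeled as in-memory `(name, contents)` pairs so it runs without touching the disk, and `std::sync::mpsc` channels stand in for the thread pool queue and Display's blocking queue. It uses a fixed number of worker threads, each taking one file at a time, which is how a thread pool delivers "one thread per file" without spawning a thread for every file.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical pipeline mirroring Figure 2: one navigation thread,
/// `n_workers` search threads, and one display thread.
/// Files are (name, contents) pairs, an assumption made for the sketch.
pub fn parallel_search(files: Vec<(String, String)>, text: &str, n_workers: usize) -> Vec<String> {
    let (file_tx, file_rx) = mpsc::channel::<(String, String)>();
    let (hit_tx, hit_rx) = mpsc::channel::<String>();
    // An mpsc Receiver cannot be cloned, so the workers share it behind a Mutex.
    let file_rx = Arc::new(Mutex::new(file_rx));

    // Stage 1: the navigation thread posts each file to the search queue.
    let nav = thread::spawn(move || {
        for f in files {
            file_tx.send(f).unwrap();
        }
        // file_tx is dropped here, so workers' recv() fails once the queue drains.
    });

    // Stage 2: search workers pull files until the queue is closed and empty.
    let mut workers = Vec::new();
    for _ in 0..n_workers {
        let rx = Arc::clone(&file_rx);
        let tx = hit_tx.clone();
        let needle = text.to_string();
        workers.push(thread::spawn(move || loop {
            // The lock guard is a temporary released at the end of this
            // statement, so other workers can receive while this one searches.
            let item = rx.lock().unwrap().recv();
            match item {
                Ok((name, contents)) => {
                    if contents.contains(needle.as_str()) {
                        tx.send(name).unwrap();
                    }
                }
                Err(_) => break,
            }
        }));
    }
    // Drop the original sender so the display stage ends after the last worker.
    drop(hit_tx);

    // Stage 3: the display thread drains results (a real GenOut would print them).
    let display = thread::spawn(move || hit_rx.iter().collect::<Vec<String>>());

    nav.join().unwrap();
    for w in workers {
        w.join().unwrap();
    }
    let mut hits = display.join().unwrap();
    hits.sort(); // completion order varies from run to run
    hits
}
```

Sorting at the end is needed because the order in which workers finish is not deterministic, which is the same timing issue Step #6 raises for GenOut.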
TextFinder Specification

  1. Identify all files in a directory subtree that match a pattern and contain a specified text.
  2. Specify root path, one or more file patterns (.h, .cpp, .cs, .rs, ...), and search text on command line.
  3. Specify options /s [true|false], /v [true|false], /H [true|false], and /h [true|false] for recursive directory walk, verbose output header, hiding directories with no match, and help message, respectively.
  4. Display file name and path, without duplication of path name, e.g., organized by directory, for files containing the search text.
  5. Interesting extensions:
    • Replace text by regular expressions for both search text and file patterns.
    • Replace sequential file searches with parallel searches to improve performance and usability.
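The switch handling in item 3 can be sketched as a small parser. This is a hypothetical illustration, not CLParser's API: it assumes each switch is a slash followed by one character, that a switch may be followed by `true` or `false`, that a bare switch means `true`, and that every other token (root path, patterns, search text) is positional. The `flag` method takes a caller-supplied default, mirroring how Executive fills in attributes not given on the command line.

```rust
use std::collections::HashMap;

/// Hypothetical result of parsing switches such as "/s false" or "/h".
#[derive(Debug, Default)]
pub struct Options {
    pub flags: HashMap<char, bool>,
    pub positional: Vec<String>,
}

/// Parses `args`; assumption: a switch optionally takes "true" or "false".
pub fn parse_options(args: &[&str]) -> Options {
    let mut opts = Options::default();
    let mut i = 0;
    while i < args.len() {
        let arg = args[i];
        let mut chars = arg.chars();
        // A switch is "/" followed by exactly one character, e.g. "/s".
        if let (Some('/'), Some(c), None) = (chars.next(), chars.next(), chars.next()) {
            let value = match args.get(i + 1) {
                Some(&"true") => {
                    i += 1; // consume the explicit value
                    true
                }
                Some(&"false") => {
                    i += 1;
                    false
                }
                _ => true, // bare switch
            };
            opts.flags.insert(c, value);
        } else {
            opts.positional.push(arg.to_string());
        }
        i += 1;
    }
    opts
}

impl Options {
    /// Switch value, or `default` when the switch was not given.
    pub fn flag(&self, c: char, default: bool) -> bool {
        *self.flags.get(&c).unwrap_or(&default)
    }
}
```

Treating a bare switch as `true` is a design choice made for the sketch; requiring an explicit value would be stricter but less convenient for flags like /h.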
Executive uses CLParser to access the program's starting path, file patterns, search text, and options. Executive configures the CLParser instance for TextFinder's Finder operations by defining default values for program attributes not already defined on the command line. It then creates a member instance of DirNav, providing it access to TextSearch::Finder as a generic parameter. DirNav creates an instance of TextSearch's Finder and provides access to it via a member function, DirNav<App>::getApp(). This may be used to configure Finder before starting a search. It may also be used to collect results that are specific to DirNav, e.g., the number of files and directories processed. Executive uses getApp() to configure Finder with the search text.
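The relationship between DirNav and its generic application can be sketched as follows. The trait name `DirEvent`, its method signatures, and the snake_case `get_app` (standing in for getApp()) are assumptions modeled on Finder::do_dir and Finder::do_file, not the project's actual definitions. DirNav owns the application, counts directories and files itself, and hands out mutable access so the caller can configure the application before walking.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical callbacks DirNav invokes while walking a directory tree.
pub trait DirEvent {
    fn do_dir(&mut self, dir: &Path);
    fn do_file(&mut self, file: &Path);
}

/// Hypothetical generic navigator that owns its application object.
pub struct DirNav<App: DirEvent + Default> {
    app: App,
    recurse: bool,
    num_files: usize,
    num_dirs: usize,
}

impl<App: DirEvent + Default> DirNav<App> {
    pub fn new(recurse: bool) -> Self {
        Self { app: App::default(), recurse, num_files: 0, num_dirs: 0 }
    }

    /// Mutable access to the application, e.g. to set the search text.
    pub fn get_app(&mut self) -> &mut App {
        &mut self.app
    }

    /// Results specific to DirNav: (directories, files) processed.
    pub fn counts(&self) -> (usize, usize) {
        (self.num_dirs, self.num_files)
    }

    pub fn visit(&mut self, dir: &Path) -> io::Result<()> {
        self.num_dirs += 1;
        self.app.do_dir(dir);
        let mut subdirs = Vec::new();
        for entry in fs::read_dir(dir)? {
            let path = entry?.path();
            if path.is_dir() {
                subdirs.push(path);
            } else {
                self.num_files += 1;
                self.app.do_file(&path);
            }
        }
        // Report a directory's files before descending, so they arrive together.
        if self.recurse {
            for sub in subdirs {
                self.visit(&sub)?;
            }
        }
        Ok(())
    }
}
```

Making the application a generic parameter rather than a trait object lets the compiler generate a DirNav specialized for Finder, in the spirit of the Generics entry in the references table.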
 

Step #6 - Add Parallel Search

In this step we modify the code to search files for text in parallel, with each file searched on a separate thread. The changes to TextSearch and GenOut are:
  1. In Finder, add a thread pool that receives file and directory names from Finder::do_dir and Finder::do_file. Each pool thread runs a processing function that pulls a file from the pool's queue, searches it for the text, and sends its results to GenOut.
  2. Add to GenOut a blocking queue that receives Finder's results. GenOut can no longer write each result out immediately, because the timing of individual searches is uncertain: a Finder thread may still be searching a file when DirNav enters a new directory.
  3. This issue can be resolved by storing Finder results in a data structure and simply displaying directory names as they are encountered, saving the file results for display at the end of processing. This lets the user monitor progress and allows GenOut to apply a display policy, perhaps sorting by directory name and by file name within each directory.
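Items 2 and 3 can be sketched together: a blocking queue built from a Mutex and a Condvar, and a display thread that echoes directory names as they arrive while storing matches in sorted maps for a final report. The names `BlockingQueue`, `en_q`, `de_q`, `Msg`, and `start_display` are hypothetical, and the `Msg::Done` sentinel is an assumption, since the step does not say how GenOut learns that all searches have finished.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};
use std::path::PathBuf;
use std::sync::{Arc, Condvar, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical messages GenOut receives.
pub enum Msg {
    Dir(PathBuf),   // DirNav entered a directory
    Match(PathBuf), // a Finder thread found the text in this file
    Done,           // assumed sentinel: all searches have finished
}

/// Minimal blocking queue: a Mutex-protected VecDeque plus a Condvar.
pub struct BlockingQueue<T> {
    items: Mutex<VecDeque<T>>,
    ready: Condvar,
}

impl<T> BlockingQueue<T> {
    pub fn new() -> Self {
        Self { items: Mutex::new(VecDeque::new()), ready: Condvar::new() }
    }
    pub fn en_q(&self, item: T) {
        self.items.lock().unwrap().push_back(item);
        self.ready.notify_one();
    }
    /// Blocks while the queue is empty; the loop tolerates spurious wakeups.
    pub fn de_q(&self) -> T {
        let mut q = self.items.lock().unwrap();
        while q.is_empty() {
            q = self.ready.wait(q).unwrap();
        }
        q.pop_front().unwrap()
    }
}

/// Display thread: records directories in arrival order (progress) and
/// returns a report sorted by directory, then by file name.
pub fn start_display(queue: Arc<BlockingQueue<Msg>>) -> JoinHandle<(Vec<String>, String)> {
    thread::spawn(move || {
        let mut progress = Vec::new();
        let mut results: BTreeMap<PathBuf, BTreeSet<String>> = BTreeMap::new();
        loop {
            match queue.de_q() {
                // A real GenOut would print the directory name immediately.
                Msg::Dir(d) => progress.push(d.display().to_string()),
                Msg::Match(f) => {
                    let dir = f.parent().map(|p| p.to_path_buf()).unwrap_or_default();
                    let name = f
                        .file_name()
                        .map(|n| n.to_string_lossy().into_owned())
                        .unwrap_or_default();
                    results.entry(dir).or_default().insert(name);
                }
                Msg::Done => break,
            }
        }
        // Each directory appears once, followed by its matching files.
        let mut report = String::new();
        for (dir, files) in &results {
            report.push_str(&format!("{}\n", dir.display()));
            for f in files {
                report.push_str(&format!("    {}\n", f));
            }
        }
        (progress, report)
    })
}
```

Grouping matches by parent directory in a BTreeMap gives the "no duplication of path name" display from the specification, regardless of the order in which search threads report.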
 

Step #2 References

The table below provides references relevant to Step #2: DirNav. The first links refer to specific regions of the Rust Story on this site; other links go to Rust documentation. You can view the Rust Story by selecting the Rust Story link in the menu in the left panel.
 

Table 2. - Step #2 References

Topic | Description | Links
Threads | Threads are similar to those in the C++ thread library, but subject to Rust ownership policies. Data shared with a thread processing function uses "interior mutability" to provide memory and data-race safety, checked at run time. Rust's synchronizing constructs Mutex and RwLock provide it through an internal UnsafeCell, the same primitive RefCell is built on, while channels (std::sync::mpsc) transfer ownership of data between threads. | RustBites Threads, RustBites Sync, Fearless Concurrency, RustStory threads, RustStory Sync, RustStory MPSC
Generics | Generics in Rust are very similar to those in C# and Java, and simpler than C++ templates. They are code generators that often do little more than substitute a specific type for a generic parameter. Rust generics are often constrained with traits. | Rust Story Generics, Rust Bites Generics and Traits, The Rust Book
Ownership | Rust's ownership rules: There is only one owner for any resource. Owners deallocate their resources when they go out of scope. Ownership can be transferred with a move or borrowed with a reference. References don't own resources, they just borrow them, and so never deallocate. Rust ownership does not allow simultaneous aliasing and mutation. | Rust Bites Ownership, Rust Story Ownership, By Example, Rust Book, Rust Nomicon
Strings | Rust std strings come in two flavors: String and str, representing owned string objects and borrowed string slices (string literals are &str). Each holds UTF-8 characters. The std::path module also provides PathBuf, similar to String, and Path, similar to &str, but they use the path encoding of the current platform, e.g., Windows, Linux, or macOS. | std::String, std::str, std::path, std::path::PathBuf, std::path::Path, Rust by Example
struct | Rust structs serve the same role as classes do in C++ and C#. Struct methods are defined inside impl StructName {} blocks. | Rust Story structs, std::struct, keyword impl
File System | Rust has a well-engineered facility for accessing files and directories. Some key types in std::fs are: DirEntry, File, OpenOptions, ReadDir, ... | Rust story File System, std::fs
 
You don't need to use all of the references in the right-most column. Just look at each quickly and use the one(s) that work(s) best for you.
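The Strings and Ownership entries can be illustrated together. This short sketch uses hypothetical helper names (`describe`, `join_owned`, `shout_in_thread`) and shows borrowed views (&str, &Path) versus owned values (String, PathBuf), plus moving an owned String into a thread, the same kind of transfer a search thread performs when it takes a file name from the queue.

```rust
use std::path::{Path, PathBuf};
use std::thread;

/// Takes borrowed views, so callers keep ownership of their String/PathBuf.
pub fn describe(label: &str, path: &Path) -> String {
    format!("{}: {}", label, path.display())
}

/// Builds an owned PathBuf from a borrowed root, like joining a
/// directory with an entry name during navigation.
pub fn join_owned(root: &Path, name: &str) -> PathBuf {
    let mut p = root.to_path_buf(); // owned copy, analogous to String
    p.push(name);
    p
}

/// `move` transfers ownership of `s` into the thread; the caller can no
/// longer use it, and the thread returns a new String built from it.
pub fn shout_in_thread(s: String) -> String {
    let handle = thread::spawn(move || s.to_uppercase());
    handle.join().unwrap()
}
```

Because &String coerces to &str and &PathBuf coerces to &Path, functions that accept the borrowed forms work with both owned and literal arguments.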