BuildOn - Step #3:   CmdlnParser Pkg

 

BuildOn Project - CmdlnParser Package

Figure 1. TextFinder Packages
CmdlnParser is a package in the first BuildOn project, TextFinder. Figure 1 is a package diagram for TextFinder, discussed in Step #0.
TextFinder Specification


  1. Identify all files in a directory subtree that match a pattern and contain a specified text.
  2. Specify root path, one or more file patterns (.h, .cpp, .cs, .rs, ...), and search text on command line.
  3. Specify options /s [true|false], /v [true|false], /H [true|false], and /h [true|false] for recursive directory walk, verbose output header, hiding directories with no match, and help message, respectively.
  4. Display file name and path, without duplication of path name, e.g., organized by directory, for files containing the search text.
  5. Interesting extensions:
    • Replace text by regular expressions for both search text and file patterns.
    • Replace sequential file searches with parallel searches to improve performance and usability.
CmdlnParser builds a HashMap<P, V> from command line arguments. P represents program properties like path, file patterns, search text, and other options. V is a Vec<String> that holds property values. Some of the properties have more than one value, e.g., .rs, .h, .cpp, ...

The property P will be either a character, perhaps 'T', for the search text property, or a string like "Text". It is your choice which you use. Characters are faster to process, but less descriptive. So the command line, using character property identifiers, will look something like this:

  /P .  /p .rs  /T find_me  /H  /p .h .cpp

or, using String identifiers, will look like this:

  /Path .  /patt .rs  /Text find_me  /Hide  /patt .h .cpp

Command line arguments that start with the '/' character are property identifiers. All others are values associated with the preceding property. A property identifier may have no specified value, like the /H property above. Properties appearing on the command line that have no specified values are given the value "true", meaning they are present, so /H means that directories that have no results are hidden - not part of the program's output.

Note that property identifiers can appear more than once. That simply adds additional values to the property's Vec<String>. Any given property identifier may appear anywhere on the command line - they are not position dependent.
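To make the target data structure concrete, here is a minimal sketch that builds, by hand, the map the character-identifier command line above should produce. It assumes String keys holding the single-character identifiers; the helper names add_values and expected_map are hypothetical and used only for illustration.

```rust
use std::collections::HashMap;

/// Append `values` to the Vec stored under `key`, creating the Vec if needed.
/// (Hypothetical helper, not part of any required interface.)
fn add_values(map: &mut HashMap<String, Vec<String>>, key: &str, values: &[&str]) {
    map.entry(key.to_string())
        .or_insert_with(Vec::new)
        .extend(values.iter().map(|v| v.to_string()));
}

/// Hand-built result for: /P .  /p .rs  /T find_me  /H  /p .h .cpp
fn expected_map() -> HashMap<String, Vec<String>> {
    let mut map = HashMap::new();
    add_values(&mut map, "P", &["."]);
    add_values(&mut map, "p", &[".rs"]);
    add_values(&mut map, "T", &["find_me"]);
    add_values(&mut map, "H", &["true"]); // no value on the command line, so "true"
    add_values(&mut map, "p", &[".h", ".cpp"]); // second /p appends to the first
    map
}

fn main() {
    let map = expected_map();
    println!("{:?}", map.get("p")); // prints Some([".rs", ".h", ".cpp"])
}
```

The repeated /p identifier ends up as one key with three values, in command line order, and /H holds the single value "true".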
 

Step #3 - Build CmdlnParser Package

Before laying out requirements for this package, let's focus for a moment on a sensible goal. Command line parsing is application specific in that property identifiers and their expected values depend on the operations a program is expected to execute. However, most of the parsing is program agnostic, e.g., building a HashMap<P, Vec<String>> with property keys where each key may have more than one value. So our goal is to separate out the program agnostic part for CmdlnParser and leave the application specific parts for each program to provide.

Note that there are several well-known Rust packages for processing program command lines on crates.io. You are specifically asked to use only your own code and std library code for this package.

In this step we will create the CmdlnParser package, which will integrate with Executive in Step #4. It:
  1. Implements a struct CLParser that is responsible for building an association of property identifiers with a collection of values for each identifier.
  2. The association is captured in a member HashMap<P, C>, where P may be either a UTF-8 character or a String (your choice - don't enable both, that's too complicated). The value collection, C, is a Vec<String> holding the values associated with each instance of P, i.e., each HashMap key.
  3. The constructor function new is passed the program's command line arguments, which CLParser parses and stores in its HashMap.
  4. Parsing consists of:
    • Recognizing property identifiers by their leading '/' character.
    • Collecting all of the succeeding arguments that are not property identifiers.
    • Inserting those values associated with the identifier into the HashMap. How that is done depends on whether the key already exists. You will find the HashMap Entry API makes this easy.
    • If no values follow a property identifier, then a value of "true" is inserted as the single entry in the identifier's value collection.
    • This process is repeated until there are no more arguments on the command line.
    • Any command line arguments that occur before the first property identifier is detected are ignored.
    • Due to the simplicity of this parsing model, there are no parsing errors, assuming the parser implementation is correct.
    • The CLParser constructor function new returns a Result: CLParser::new(args: &Vec<String>) -> std::result::Result<CLParser, Error>. Here, std::result::Result is an enum that will contain either Ok(cparser) or Err(error), where Ok and Err are the variants of the Result enumeration. On success, the instance cparser holds all of the command line associations correctly parsed; otherwise error, a value of type Error, identifies the failure. Here, the Error type is a custom error type that implements std::error::Error.
  5. When an instance of CLParser has finished parsing its specified command line, it serves as a container for property associations that can be used by any application processing that needs them. This means that CLParser will need to provide an interface for querying its contents in a simple way.
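The requirements above can be sketched as follows. This is one possible shape under stated assumptions, not the reference implementation: it uses String keys, takes &[String] (which a &Vec<String> coerces to), and invents a ParseError type with a single EmptyIdentifier variant purely to show how the Result return could be used. The query methods values and contains are likewise hypothetical names.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical custom error; the variant is an assumption made for this sketch.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// An argument that is just "/" with no identifier after it.
    EmptyIdentifier,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::EmptyIdentifier => write!(f, "'/' with no property identifier"),
        }
    }
}

impl std::error::Error for ParseError {}

pub struct CLParser {
    props: HashMap<String, Vec<String>>,
}

impl CLParser {
    /// Parse `args` into property identifier -> values associations.
    pub fn new(args: &[String]) -> Result<CLParser, ParseError> {
        let mut props: HashMap<String, Vec<String>> = HashMap::new();
        let mut current: Option<String> = None; // identifier being collected
        let mut values: Vec<String> = Vec::new(); // its values so far
        for arg in args {
            if let Some(id) = arg.strip_prefix('/') {
                if id.is_empty() {
                    return Err(ParseError::EmptyIdentifier);
                }
                // A new identifier closes out the previous one.
                if let Some(key) = current.take() {
                    Self::store(&mut props, key, std::mem::take(&mut values));
                }
                current = Some(id.to_string());
            } else if current.is_some() {
                values.push(arg.clone());
            }
            // Arguments before the first identifier fall through and are ignored.
        }
        if let Some(key) = current.take() {
            Self::store(&mut props, key, values);
        }
        Ok(CLParser { props })
    }

    /// Insert via the Entry API; an identifier with no values gets "true".
    fn store(props: &mut HashMap<String, Vec<String>>, key: String, mut values: Vec<String>) {
        if values.is_empty() {
            values.push("true".to_string());
        }
        props.entry(key).or_default().extend(values);
    }

    /// Query interface: all values recorded for `key`, if any.
    pub fn values(&self, key: &str) -> Option<&Vec<String>> {
        self.props.get(key)
    }

    /// Query interface: was `key` seen (or added) at all?
    pub fn contains(&self, key: &str) -> bool {
        self.props.contains_key(key)
    }
}

fn main() {
    let args: Vec<String> = "ignored /P . /p .rs /T find_me /H /p .h .cpp"
        .split_whitespace()
        .map(String::from)
        .collect();
    match CLParser::new(&args) {
        Ok(parser) => println!("{:?}", parser.values("p")), // Some([".rs", ".h", ".cpp"])
        Err(e) => eprintln!("parse error: {}", e),
    }
}
```

Keeping the Entry API call in one small helper means the "key exists" and "key is new" cases share a single code path.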
Application specific processing consists of specifying all of the properties required for program processing and giving them default values if those properties have not been extracted from the command line. That means that:
  • CLParser will need to provide an interface for an application to supply new property identifiers and give them value(s).
  • Since program functions will need access to the parser instance, it will be good design to create an Executive type that contains the parser as a member, and has non-static member functions for each of the program's activities.
  • For this package you should define an Executive type in your demonstration program for the CmdlnParser library, placed in the package's examples directory. Use it for final testing to ensure that CmdlnParser is ready for the next step.
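A minimal sketch of the application specific side, assuming a pared-down stand-in for CLParser so the example is self-contained. The method add_default, the default values chosen, and the Executive methods are all hypothetical; the text leaves those choices to each program.

```rust
use std::collections::HashMap;

/// Stand-in for the parser, holding only what this sketch needs.
#[derive(Default)]
struct CLParser {
    props: HashMap<String, Vec<String>>,
}

impl CLParser {
    /// Hypothetical: supply `defaults` only if `key` was not on the command line.
    fn add_default(&mut self, key: &str, defaults: &[&str]) {
        self.props
            .entry(key.to_string())
            .or_insert_with(|| defaults.iter().map(|s| s.to_string()).collect());
    }

    /// Values for `key`, or an empty slice if the key is absent.
    fn values(&self, key: &str) -> &[String] {
        self.props.get(key).map(|v| v.as_slice()).unwrap_or(&[])
    }
}

/// Owns the parser and exposes one method per program activity.
struct Executive {
    parser: CLParser,
}

impl Executive {
    fn new(mut parser: CLParser) -> Executive {
        // Assumed defaults, chosen only for illustration.
        parser.add_default("P", &["."]);
        parser.add_default("p", &[".rs"]);
        parser.add_default("s", &["true"]);
        Executive { parser }
    }

    /// True when the first /s value is "true".
    fn recursive(&self) -> bool {
        self.parser.values("s").first().map(|v| v == "true").unwrap_or(false)
    }

    fn show_config(&self) -> String {
        format!(
            "path {:?}, patterns {:?}, recursive {}",
            self.parser.values("P"),
            self.parser.values("p"),
            self.recursive()
        )
    }
}

fn main() {
    // Simulate a command line that supplied only "/p .h".
    let mut parser = CLParser::default();
    parser.props.insert("p".to_string(), vec![".h".to_string()]);
    let exec = Executive::new(parser);
    println!("{}", exec.show_config());
}
```

Because add_default uses or_insert_with, values given on the command line always win over the defaults.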
 

Step #3 Starter Code

Due to experience gained with the TextSearch and DirNav components, no starter code will be provided. The parser structure is essentially a wrapper around a HashMap, and the command line arguments are available from std::env::args(). The most difficult part is to partition the design so that CmdlnParser is easy to use. I will demonstrate my implementation to illustrate a few design alternatives.
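As a small illustration of the std::env::args() entry point, the sketch below collects the arguments into a Vec<String> suitable for handing to a parser. The helper name user_args is hypothetical.

```rust
use std::env;

/// Drop the first item (the program name) and keep the remaining arguments.
/// Generic over the iterator so it can be exercised without a real command line.
fn user_args<I: Iterator<Item = String>>(raw: I) -> Vec<String> {
    raw.skip(1).collect()
}

fn main() {
    // Note: env::args() panics if an argument is not valid Unicode;
    // env::args_os() is the non-panicking alternative.
    let args = user_args(env::args());
    println!("{} argument(s): {:?}", args.len(), args);
}
```

Skipping the program name is a design choice rather than a requirement. On Unix-like systems that name can be an absolute path beginning with '/', which the '/' rule would otherwise treat as a property identifier.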
 

Step #3 References

The table below provides references relevant for Step #3: CmdlnParser.
 

Table 2. - Step #3 References

Topic | Description | Links
Iterators | Rust iterators are used to sequence through collections. They have a large collection of adapters that allow code using collections to be written in a style much like that used by functional programming. | RustBite_Iterators; std::iter::Iterator; std::iter::IntoIterator
Error Handling | Rust error handling is based on use of the enumeration: enum Result<T,E> { Ok(T), Err(E), } where T is the type of the returned value and E is the type of the expected error. Rust enums are unique in that each of the enumeration items may be a wrapper for a specified type, like Ok and Err. | RustStory Enums; RustStory Error Handling; Gentle Introduction to Rust; std::Result
Rust env | Command line arguments for any Rust program are available through the std::env module. | std::env; Accepting CL args
Ownership | Rust's ownership rules: There is only one owner for any resource. Owners deallocate their resources when they go out of scope. Ownership can be transferred with a move or borrowed with a reference. References don't own resources, they just borrow them, and so never deallocate. Rust ownership does not support simultaneous aliasing and mutation. | Rust Bites Safety; Rust Bites Ownership; Rust Story Ownership; By Example; Rust Book; Rust Nomicon
Strings | Rust std strings come in two flavors: String and str, representing string objects and literal strings. Each contains UTF-8 characters. The std::path module also provides PathBuf, similar to String, and Path, similar to &str, but using the encoding for paths provided by the current platform, e.g., Windows, Linux, or macOS. | std::path; std::path::PathBuf; std::path::Path; Rust by Example
struct | Rust structs serve the same role as classes do in C++ and C#. Struct methods are defined inside impl StructName {} blocks. | Rust Story structs; std::Struct; keyword impl
You don't need to use all of the references in the right-most column. Just look at each quickly and use the one(s) that work(s) best for you.
 