Code Projects

multi-language implementations compared side-by-side

1.0 Introduction

The Code/Projects/ folder holds multi-language implementations of software tools, each built to the same design and command-line interface so the languages can be compared side-by-side. Currently two project families are included.
Family          Languages                              Description
TextFinder      C++23, C#, Python, Rust, Rust (opt)    Walk a directory tree and report files whose content matches a regular expression
PageValidator   Rust, C++23, C#, Python                Validate HTML files for structural correctness against eight rules
Each family README contains architecture diagrams, shared CLI documentation, code metrics, and performance timing. Every project was built using spec-driven development - see CodeBites: Spec-Driven Development for details.

2.0 TextFinder

TextFinder walks a directory tree and reports files whose content matches a regular expression supplied on the command line. All five variants share the same three-component architecture and CLI.
Architecture
  CommandLine   DirNav   Output
        \          |       /
             EntryPoint
  • CommandLine - parses /Key [Value] tokens from argv
  • DirNav - depth-first directory walk; fires callbacks on each directory and file
  • Output / TextFinder - performs the regex match and writes results to the console
  • EntryPoint - wires the three components together; no direct cross-dependencies between them
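A minimal Python sketch of the callback wiring described above, assuming DirNav takes one callback for directories and one for files. The class names mirror the component names, but the constructor signatures, the `entry_point` function, and the alphabetical visiting order are illustrative assumptions, not the interfaces of any actual variant. The regex match is left out so the wiring stays visible.

```python
import os
from typing import Callable

# Hypothetical callback type: receives the path of a directory or file.
PathCallback = Callable[[str], None]


class DirNav:
    """Depth-first directory walk that fires callbacks; knows nothing about regexes."""

    def __init__(self, on_dir: PathCallback, on_file: PathCallback, recurse: bool = True):
        self.on_dir = on_dir
        self.on_file = on_file
        self.recurse = recurse

    def walk(self, root: str) -> None:
        self.on_dir(root)
        subdirs = []
        with os.scandir(root) as it:
            for entry in sorted(it, key=lambda e: e.name):
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
                elif entry.is_file():
                    self.on_file(entry.path)
        # Descend only after the scandir handle is closed, one subtree at a time.
        if self.recurse:
            for sub in subdirs:
                self.walk(sub)


class Output:
    """Stand-in for the Output component; here it only records what it is told about."""

    def __init__(self) -> None:
        self.dirs: list[str] = []
        self.files: list[str] = []

    def on_dir(self, path: str) -> None:
        self.dirs.append(path)

    def on_file(self, path: str) -> None:
        self.files.append(path)


def entry_point(root: str, recurse: bool = True) -> Output:
    # EntryPoint is the only place that knows about both components.
    out = Output()
    DirNav(out.on_dir, out.on_file, recurse).walk(root)
    return out
```

Because DirNav only sees plain callables, it never imports Output, which matches the "no direct cross-dependencies" rule.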
Shared Command-Line Interface
Option          Meaning                                            Default
/P <path>       Root directory to search                           (current directory)
/p <ext,...>    File extensions to include                         all files
/r <regex>      Regular expression matched against file content    (any)
/s              Recurse into subdirectories                        true
/H              Show only directories with matches                 true
/h              Print help and exit
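The CommandLine component's /Key [Value] parsing can be sketched as follows. The sketch assumes that only /P, /p and /r consume the next token as a value, as the option table suggests; that rule also lets a Unix path such as /home/site follow /P without being mistaken for a key. The `parse_args` name, the "." stand-in for the current directory, and the comma-splitting of /p are assumptions. How the real variants switch off the default-true /s or /H is not described here, so the sketch only records that a flag was given.

```python
# Keys that consume the next token as their value (per the option table).
VALUE_KEYS = {"P", "p", "r"}
# Keys that are bare flags.
FLAG_KEYS = {"s", "H", "h"}


def parse_args(argv: list[str]) -> dict:
    """Parse hypothetical /Key [Value] tokens; keys are case-sensitive (/P differs from /p)."""
    # Defaults from the table; "." stands in for the current directory (assumption).
    opts = {"P": ".", "p": None, "r": None, "s": True, "H": True, "h": False}
    i = 0
    while i < len(argv):
        tok = argv[i]
        key = tok[1:]
        if not tok.startswith("/") or not key:
            raise ValueError(f"expected /Key, got {tok!r}")
        if key in VALUE_KEYS:
            if i + 1 >= len(argv):
                raise ValueError(f"/{key} requires a value")
            opts[key] = argv[i + 1]  # taken verbatim, so "/home/site" is a value here
            i += 2
        elif key in FLAG_KEYS:
            opts[key] = True  # presence only; turning a default-true flag off is not modeled
            i += 1
        else:
            raise ValueError(f"unknown option {tok!r}")
    if opts["p"] is not None:
        # Accept "html,css" or ".html,.css" (assumed format) and normalize to bare names.
        opts["p"] = [e.strip().lstrip(".") for e in opts["p"].split(",") if e.strip()]
    return opts
```

For example, `parse_args(["/P", "/home/site", "/r", "class"])` yields `P="/home/site"` and `r="class"` with the other defaults untouched.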
Performance (20 warm-cache runs, search root: NewSite, regex: class)
Variant              Min (s)   Median (s)   Max (s)
PyTextFinder         0.222     0.281        0.715
RustTextFinderOpt    0.536     0.610        1.034
CppTextFinder        0.568     0.647        0.706
CsTextFinder         0.827     1.053        1.456
RustTextFinder       0.873     0.905        1.402
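All five variants agree on 656 matched files out of 1196 visited. Python leads because the workload is I/O-bound and Python's hot path (os.scandir, re.search) runs in C. C++ trails Python and RustTextFinderOpt partly because common std::regex implementations are slow backtracking engines. Python's re module is also a backtracking engine, but a heavily optimized one written in C, while Rust's regex crate matches with finite automata (including a lazy DFA) in linear time.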
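To illustrate the Python hot path mentioned above, here is a sketch of a search loop in which nearly all the work happens inside os.scandir, file reads, and re.search, all implemented in C. It is not PyTextFinder's code: the `count_matches` name, matching on raw bytes, the explicit-stack traversal, and skipping extension-filtered files from the visited count are all assumptions.

```python
import os
import re


def count_matches(root: str, pattern: str, exts=None, recurse: bool = True) -> tuple[int, int]:
    """Return (matched, visited) file counts under root; exts is an optional set of bare extensions."""
    # Compile once and match bytes, so files are never decoded (an assumption).
    regex = re.compile(pattern.encode())
    matched = visited = 0
    stack = [root]  # explicit stack gives a depth-first walk without recursion
    while stack:
        current = stack.pop()
        with os.scandir(current) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    if recurse:
                        stack.append(entry.path)
                elif entry.is_file():
                    if exts and os.path.splitext(entry.name)[1].lstrip(".") not in exts:
                        continue
                    visited += 1
                    try:
                        with open(entry.path, "rb") as f:
                            data = f.read()
                    except OSError:
                        continue  # unreadable files count as visited but not matched
                    if regex.search(data):
                        matched += 1
    return matched, visited
```

The Python-level loop does little more than dispatch, which is consistent with an I/O-bound workload narrowing the gap to compiled languages.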

3.0 PageValidator

PageValidator examines HTML files for valid structural composition and reports all errors found before returning. It never stops at the first failure.
Architecture - a strictly linear pipeline:
  Tokenizer ← Lexer ← Validator ← EntryPoint
  • Tokenizer - reads raw HTML and emits a flat token stream; no HTML grammar knowledge
  • Lexer - groups tokens into structured lexemes with source positions
  • Validator - drives the lexer, maintains an open-tag stack, applies eight rules, returns a full error report
  • EntryPoint - parses CLI flags, iterates HTML files, prints pass/fail report
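A minimal sketch of the first two pipeline stages, assuming one Python generator per stage so the Validator can pull lexemes lazily. The Token and Lexeme shapes, the regular expressions, and the line/column scheme are hypothetical. A production tokenizer would also need to handle `>` inside quoted attribute values and raw-text elements such as <script>, which this sketch ignores.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Token:
    """Flat token: a markup chunk "<...>" or a run of text (hypothetical shape)."""
    kind: str  # "markup" or "text"
    text: str
    line: int
    col: int


@dataclass
class Lexeme:
    """Structured unit handed to the Validator (hypothetical shape)."""
    kind: str  # "doctype", "comment", "start", "end" or "text"
    name: str
    attrs: dict  # attribute name -> raw value text, quotes included
    line: int
    col: int
    self_closing: bool = False


# Comment, markup chunk, text run, or a stray "<" with no closing ">".
_RAW = re.compile(r"<!--.*?-->|<[^>]*>|[^<]+|<", re.S)
_NAME = re.compile(r"</?\s*([A-Za-z][A-Za-z0-9-]*)")
_ATTR = re.compile(r"""([^\s=/>]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s>]+))?""")


def tokenize(html: str) -> Iterator[Token]:
    """Stage 1: split raw HTML into tokens with 1-based positions; no grammar knowledge."""
    line, col = 1, 1
    for m in _RAW.finditer(html):
        text = m.group()
        kind = "markup" if text.startswith("<") and text.endswith(">") else "text"
        yield Token(kind, text, line, col)
        newlines = text.count("\n")
        if newlines:
            line += newlines
            col = len(text) - text.rfind("\n")
        else:
            col += len(text)


def lex(tokens: Iterable[Token]) -> Iterator[Lexeme]:
    """Stage 2: turn markup tokens into doctype/comment/start/end lexemes."""
    for tok in tokens:
        t, pos = tok.text, (tok.line, tok.col)
        if tok.kind == "text":
            yield Lexeme("text", "", {}, *pos)
        elif t.startswith("<!--"):
            yield Lexeme("comment", "", {}, *pos)
        elif t.startswith("<!"):
            yield Lexeme("doctype", t[2:-1].strip(), {}, *pos)
        elif (m := _NAME.match(t)) is None:
            yield Lexeme("text", "", {}, *pos)  # e.g. "< 3 >": not a tag, treat as text
        elif t.startswith("</"):
            yield Lexeme("end", m.group(1).lower(), {}, *pos)
        else:
            body = t[m.end():-1]
            attrs = {k.lower(): v for k, v in _ATTR.findall(body)}
            yield Lexeme("start", m.group(1).lower(), attrs, *pos, body.rstrip().endswith("/"))
```

A Validator would then iterate `lex(tokenize(source))`, so each stage depends only on the one to its left.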
Validation Rules
Rule ID          Description
doctype          Document begins with <!DOCTYPE html>
root-element     Exactly one <html> element wraps the entire document
head-required    <head> is present and contains at least one <title>
body-required    <body> is present
tag-nesting      Every open tag has a matching close tag in correct stack order
void-elements    Void elements (br, hr, img, input, link, meta, ...) carry no close tag
attr-quotes      All attribute values are enclosed in quotes
duplicate-id     The id attribute value is unique within the document
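Four of the rules (tag-nesting, void-elements, attr-quotes, duplicate-id) can be sketched as a single pass over an open-tag stack that collects every error instead of stopping at the first. The simplified `Lex` tuple, the `check` function, the error-tuple format, and the recovery strategy (popping intervening open tags when the matching open tag sits deeper in the stack) are assumptions. The document-level rules doctype, root-element, head-required and body-required are left out. The void-element set is the one defined by the WHATWG HTML standard.

```python
from typing import NamedTuple


class Lex(NamedTuple):
    """Hypothetical simplified lexeme."""
    kind: str    # "start" or "end"
    name: str
    attrs: dict  # attribute name -> raw value text, including quotes if any
    line: int


VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def check(lexemes) -> list[tuple[int, str, str]]:
    """Return every (line, rule_id, message) found; never stops early."""
    errors = []
    stack = []     # (name, line) of currently open elements
    seen_ids = {}  # id value -> line of first use
    for lx in lexemes:
        if lx.kind == "start":
            for attr, raw in lx.attrs.items():
                if raw and raw[0] not in "\"'":  # empty raw = boolean attribute
                    errors.append((lx.line, "attr-quotes", f"{attr} on <{lx.name}> is unquoted"))
            if "id" in lx.attrs:
                ident = lx.attrs["id"].strip("\"'")
                if ident in seen_ids:
                    errors.append((lx.line, "duplicate-id", f"id {ident!r} first used on line {seen_ids[ident]}"))
                else:
                    seen_ids[ident] = lx.line
            if lx.name not in VOID:
                stack.append((lx.name, lx.line))
        elif lx.kind == "end":
            if lx.name in VOID:
                errors.append((lx.line, "void-elements", f"</{lx.name}> is not allowed"))
            elif stack and stack[-1][0] == lx.name:
                stack.pop()
            elif any(name == lx.name for name, _ in stack):
                # Report and discard the open tags that were skipped over.
                while stack[-1][0] != lx.name:
                    name, line = stack.pop()
                    errors.append((line, "tag-nesting", f"<{name}> not closed before </{lx.name}>"))
                stack.pop()
            else:
                errors.append((lx.line, "tag-nesting", f"</{lx.name}> has no matching open tag"))
    for name, line in stack:
        errors.append((line, "tag-nesting", f"<{name}> is never closed"))
    return errors
```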
Performance (20 warm-cache runs, scanning the full NewSite HTML tree, 664 files)
Validator         Min (s)   Median (s)   Max (s)
C++ (Release)     0.521     0.645        2.729
Rust (Release)    0.901     0.936        1.972
C# (Release)      1.127     1.290        1.538
Python            2.766     2.846        3.029
All four agree on 425 files with errors out of 664 visited.

4.0 Utility Scripts

Three Python scripts in Code/Projects/ support timing and metrics collection.
tf_timer.py - time a single TextFinder variant:
  python tf_timer.py <program> [--runs N] [TextFinder options ...]
Program names: PyTextFinder, CsTextFinder, CppTextFinder, RustTextFinder, RustTextFinderOpt.
pa_timer.py - time all four PageValidator implementations and print a comparison table:
  python pa_timer.py [--site PATH] [--runs N]
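A sketch of the kind of measurement the two timer scripts report: run a command N times, record wall-clock durations, and print min, median and max rows ordered by minimum, as in the tables above. The function names, the single untimed warm-up run (one way to obtain warm-cache numbers), discarding the program's output, and the column layout are assumptions, not the scripts' actual behavior.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd: list[str], runs: int = 20, warmup: int = 1) -> tuple[float, float, float]:
    """Run cmd `runs` times and return (min, median, max) wall-clock seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Untimed warm-up runs (assumption) let the OS file cache fill before measuring.
    for _ in range(warmup):
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded (assumption) so console speed does not skew the timing.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True)
        samples.append(time.perf_counter() - start)
    return min(samples), statistics.median(samples), max(samples)


def print_table(results: dict[str, tuple[float, float, float]]) -> None:
    """Print one row per program, fastest minimum first."""
    print(f"{'Variant':<20}{'Min (s)':>10}{'Median (s)':>12}{'Max (s)':>10}")
    for name, (lo, med, hi) in sorted(results.items(), key=lambda kv: kv[1][0]):
        print(f"{name:<20}{lo:>10.3f}{med:>12.3f}{hi:>10.3f}")
```

For example, `print_table({"python": time_command([sys.executable, "-c", "pass"], runs=5)})` times interpreter start-up as a stand-in for a real variant. Reporting the median alongside min and max keeps a single slow outlier, such as the 2.729 s C++ maximum above, from distorting the comparison.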
code_metrics.py - report line counts and scope counts for every source file under a project root:
  python code_metrics.py [path] [--html] [--html-only] [--no-recurse]
Recognized extensions: .py .cs .cpp .c .h .hpp .ixx .rs .js .ts .jsx .tsx .java .go
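A sketch of the line-counting half of code_metrics.py, restricted to the recognized extensions listed above and honoring a no-recurse switch. The total/blank/nonblank breakdown and the `line_counts` name are assumptions. Scope counting is not described in enough detail here to sketch, and the --html options are not modeled.

```python
import os
from collections import Counter

# Extension list copied from the description above.
RECOGNIZED = {".py", ".cs", ".cpp", ".c", ".h", ".hpp", ".ixx",
              ".rs", ".js", ".ts", ".jsx", ".tsx", ".java", ".go"}


def line_counts(root: str, recurse: bool = True) -> dict[str, Counter]:
    """Map each recognized source file under root to a Counter of total/blank/nonblank lines."""
    results = {}
    for dirpath, dirnames, filenames in os.walk(root):
        if not recurse:
            dirnames.clear()  # emptying dirnames in place stops os.walk from descending
        for name in sorted(filenames):
            if os.path.splitext(name)[1] not in RECOGNIZED:
                continue
            path = os.path.join(dirpath, name)
            counts = Counter(total=0, blank=0, nonblank=0)
            with open(path, encoding="utf-8", errors="replace") as f:
                for line in f:
                    counts["total"] += 1
                    counts["blank" if not line.strip() else "nonblank"] += 1
            results[path] = counts
    return results
```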