| Package | Kind | Role |
|---|---|---|
| library | Raw lexical scanner; produces a |
|
| library | Groups tokens into structured |
|
| library | Drives the Lexer, applies the eight structural rules, and returns a |
|
| executable ( |
CLI orchestration — parses arguments, walks directories, calls Validator, prints the report |
# Validate a single file python EntryPoint/page_validator.py index.html # Validate a directory tree, quiet mode (errors only) python EntryPoint/page_validator.py -r -q ./site # Validate with pass/fail summary python EntryPoint/page_validator.py -r -s ./site
| Option | Argument | Default | Meaning |
|---|---|---|---|
| file or directory | (required) | One or more HTML files or directories to validate | |
| (flag) | off | Descend into subdirectories | |
| (flag) | off | Print only files with errors; suppress PASS lines | |
| (flag) | off | Print pass/fail count after all files are processed | |
| (flag) | off | Print help and exit |
PASS site/index.html
FAIL site/about.html
[tag-nesting] 14:3 — </div> does not match open <p>
[duplicate-id] 22:10 — duplicate id 'header'
PASS site/contact.html
2 passed, 1 failed
| Rule | What is checked |
|---|---|
| Document begins with |
|
| Exactly one |
|
| Every open tag has a matching close tag in the correct order | |
| Void elements ( |
|
| All attribute values are quoted | |
| No two elements share the same |
class Token: pass @dataclass class TagOpen(Token): name: str @dataclass class TagClose(Token): name: str @dataclass class AttrName(Token): name: str @dataclass class AttrValue(Token): value: str # quoted @dataclass class AttrUnquoted(Token): value: str # unquoted class SelfClose(Token): pass class TagEnd(Token): pass @dataclass class Text(Token): content: str @dataclass class Comment(Token): content: str @dataclass class Doctype(Token): content: str class Eof(Token): passThe tokenizer holds no HTML grammar knowledge — it only recognises
class Lexeme: pass @dataclass class OpenTag(Lexeme): name: str; attrs: list[Attr]; pos: tuple[int,int] @dataclass class SelfClosingTag(Lexeme): name: str; attrs: list[Attr]; pos: tuple[int,int] @dataclass class CloseTag(Lexeme): name: str; pos: tuple[int,int] @dataclass class TextNode(Lexeme): content: str @dataclass class CommentNode(Lexeme): content: str @dataclass class DoctypeDecl(Lexeme): content: str
# Verify Python version python --version # should show 3.10 or later # From the PyPageValidator/ root: python EntryPoint/page_validator.py index.html python EntryPoint/page_validator.py -r -q -s ./site
| Test module | Coverage |
|---|---|
| tags, attributes, doctypes, comments, self-closing elements | |
| tag grouping, attribute collection, case normalization | |
| valid documents, missing elements, nesting errors, void elements, duplicate IDs, unquoted attributes |
# Run one component's tests python -m unittest Tokenizer/test_tokenizer.py python -m unittest Lexer/test_lexer.py python -m unittest Validator/test_validator.py # Run all tests via discovery (from PyPageValidator/) python -m unittest discover -s . -p "test_*.py"