0.0 Prologue
Code Track is a place to experiment with language
features, AI tools, multi-language project comparisons, and code-generation utilities.
This story is a narrative guide through those experiments, ordered from the most
hands-on (building a scratch arena with real measurement tools) through progressively
more capable AI workflows: chat bots → CLI agents → API calls → tool-using
agents → agentic pipelines → reusable skill libraries → spec-driven
development.
Why study AI-assisted code development?
- AI tools shift the bottleneck from writing code to reviewing and guiding it.
Understanding that shift makes you a more effective user of the tools.
- Knowing how LLMs work (message roles, context windows, tool use,
prompt caching) lets you use them reliably instead of hoping they
produce what you want.
- Agents with tool use can read your files, run your tests, and iterate
without manual steps between each action.
- Spec-driven development inverts the usual workflow: you write what the
code must do before the code exists, and the spec becomes the AI’s
contract.
- Comparing the same project in Rust, C++, C#, and Python builds intuition
about what language features actually cost in code size, complexity, and
runtime performance.
The story uses two recurring example projects — TextFinder (search a directory
tree for regex matches) and PageValidator (check HTML structural correctness) —
alongside the AI/ folder demos to keep discussion grounded in real, runnable code.
0.1 Getting Started
Install these tools in order — each chapter builds on what came before.
- VS Code with language extensions:
rust-analyzer (Rust), clangd (C++), C# Dev Kit (C#), Pylance (Python),
Error Lens and GitLens (all languages).
- At least one language toolchain:
Rust, a C++ compiler (MSVC, GCC, or Clang), .NET SDK for C#, or Python 3.
- Git and a GitHub account.
- An Anthropic API key (required for chapters 4–8):
create one at console.anthropic.com
and set it as the environment variable ANTHROPIC_API_KEY.
- Python package anthropic:
pip install anthropic
0.2 Story Content
The story is ordered so each chapter provides vocabulary and tools used in the next.
Start at the beginning or jump to any chapter — each is written to be
readable on its own.
- 0. Prologue: Motivation, getting started, chapter index, and references.
- 1. Experimenting with Code: Building a scratch arena with build chains,
metrics tools, performance timers, and visualizers.
- 2. Chat Bots: Using Claude, ChatGPT, and Gemini through the browser to
analyze, generate, and document code.
- 3. Code AI CLI: Claude Code and Gemini CLI as terminal-based coding partners:
reading files, making changes, multi-step tasks, hooks.
- 4. LLM API: Calling Claude directly from Python: the messages API,
structured output, streaming, and prompt caching.
- 5. Agent AI: Tool use, the agentic loop, a working file-reading agent,
error handling, and safety constraints.
- 6. Agentic AI: Multi-step autonomous workflows: analyze → plan → generate → test,
chaining agent calls, human-in-the-loop checkpoints.
- 7. Skills AI: Extending agents with a reusable tool library: anatomy of a skill,
a code_metrics example, composing multiple tools in one session.
- 8. Spec-Driven Development: Using Constitution.md, Structure.md, and Spec.md
to drive AI implementation; the full workflow from spec to code to validation.
0.3 References
| Resource | Description |
| Anthropic Docs | Full API reference, model guides, and prompt engineering tips. |
| Claude Code | Anthropic’s CLI tool: install, keyboard reference, and docs. |
| CodeBites Introduction | Track page that maps the full CodeBites page sequence. |
| AI Links | Curated links to AI tools, documentation, and research. |
1. Experimenting with Code
Before using AI to help write code, you need a place to run and measure code quickly.
This chapter sets up a scratch arena: a lightweight, disposable
workspace where experiments are cheap, failure is expected, and anything worth keeping
gets promoted to a real project.
Why isolate experiments?
- Real projects carry pressure: tests must pass, CI must be green, code
review is watching. A scratch arena has none of that friction.
- Separating “trying something” from “building something”
keeps both cleaner: the real project stays stable while experiments
fail fast.
- A version-controlled scratch space lets you revisit a dead end without losing
it, and compare two approaches side by side.
- Measuring code from the start (lines, complexity, timing) builds
the habit of treating metrics as feedback, not bureaucracy.
1.1 Arena Layout
A flat directory at the root of your workspace gives each language its own folder.
The metrics/ and notes/ folders accumulate output and
observations across sessions.
sandbox/
  rust/      ← cargo workspace, or cargo new per experiment
  cpp/       ← single-file programs; no CMake needed for scratch
  csharp/    ← dotnet new console -o scratch; reuse between runs
  python/    ← flat scripts; one .venv at this level
  metrics/   ← tokei and code_metrics output, saved per session
  notes/     ← scratchpad.md, one entry per session
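The layout above can be created by hand, but a small script makes it repeatable. The following is a minimal sketch, assuming the folder names shown above; the function name `create_arena` and the seeded scratchpad heading are illustrative choices, not part of the track.

```python
from pathlib import Path

# Folder names taken from the arena layout; adjust to taste.
ARENA_DIRS = ["rust", "cpp", "csharp", "python", "metrics", "notes"]


def create_arena(root: Path) -> list[Path]:
    """Create root/sandbox/<dir> for each arena folder; safe to re-run."""
    created = []
    for name in ARENA_DIRS:
        folder = root / "sandbox" / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    # Seed the notes file only if it does not exist yet (hypothetical heading).
    scratchpad = root / "sandbox" / "notes" / "scratchpad.md"
    if not scratchpad.exists():
        scratchpad.write_text("# Scratchpad\n", encoding="utf-8")
    return created
```

Calling `create_arena(Path.cwd())` twice is harmless: existing folders and an existing scratchpad are left untouched.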
Quick project init per language:
## Rust
cargo new scratch && cd scratch
cargo run
## C++ (no project file needed for single-file experiments)
g++ -std=c++23 -Wall -o out main.cpp && ./out
## C#
dotnet new console -o scratch && cd scratch
dotnet run
## Python (Windows shown; on macOS/Linux activate with: source .venv/bin/activate)
python -m venv .venv
.venv\Scripts\activate
python script.py
python -i script.py # run then drop into REPL with all names live
1.2 Code Metrics
Metrics tools answer “how big?” and “how complex?” before
and after a change. Run them at the start of a session to establish a baseline, and
after each significant change to see what moved.
code_metrics.py (in track) —
Reports line counts, function counts, and blank/comment ratios per file.
Good for tracking growth of a scratch project over sessions.
tokei —
Cross-language line counter; breaks down by language and file type.
Run at the sandbox root to see the whole arena at a glance.
cargo install tokei
scc (Sloc Cloc and Code) —
Like tokei but adds estimated complexity and cost columns.
Useful when comparing equivalent programs across languages.
go install github.com/boyter/scc/v3@latest # scc is a Go program, so it is not installed through cargo
radon (Python) —
Cyclomatic complexity and maintainability index per function.
pip install radon — run: radon cc -s script.py
cargo clippy (Rust) —
Catches non-obvious mistakes and style issues that the compiler misses.
Treat warnings as a quality signal, not a failure condition.
cppcheck (C++) —
Static analysis for undefined behavior, memory issues, and style.
cppcheck --enable=all main.cpp
1.3 Performance and Timing
hyperfine —
Cross-language wall-clock benchmarker. Runs a command repeatedly, with optional
warmup runs (--warmup N) to prime caches, and reports mean ± stddev with outlier
detection. Use it to compare the same algorithm across languages directly.
cargo install hyperfine
hyperfine './out input.txt' 'python script.py input.txt'
tf_timer.py / pa_timer.py (in track) —
Purpose-built timing wrappers for TextFinder and PageValidator.
Use them as a model for any experiment that needs repeated-run averaging.
Python cProfile (built-in) —
Function-level profiler; no install needed.
Pairs with snakeviz (see section 1.4) for visual output.
python -m cProfile -s cumtime script.py
cargo flamegraph (Rust) —
Generates a flame graph SVG showing where CPU time actually goes.
Requires perf (Linux) or DTrace (macOS); works in WSL on Windows.
cargo install flamegraph
BenchmarkDotNet (C#) —
Add as a NuGet package; annotate methods with [Benchmark].
Produces statistically rigorous tables with warmup and GC stats.
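The repeated-run averaging mentioned for tf_timer.py and pa_timer.py can be sketched in a few lines of standard-library Python. This is not the track's implementation; the function name `time_command` and its parameters are assumptions made for illustration, and it measures whole-process wall-clock time including interpreter or process startup.

```python
import statistics
import subprocess
import time


def time_command(cmd: list[str], runs: int = 5, warmup: int = 1) -> dict:
    """Run cmd repeatedly and return wall-clock mean and stddev in seconds."""
    # Warmup runs are executed but not measured.
    for _ in range(warmup):
        subprocess.run(cmd, check=True, capture_output=True)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "mean": statistics.mean(samples),
        # stdev needs at least two samples
        "stdev": statistics.stdev(samples) if runs > 1 else 0.0,
    }
```

For example, `time_command(["python", "script.py", "input.txt"], runs=10)` returns a dict whose `mean` and `stdev` can be logged to the metrics/ folder. For serious comparisons hyperfine remains the better tool; this sketch is only for quick checks inside a script.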
1.4 Visualizers
CodeWebifier (in track) —
Converts source files to syntax-highlighted HTML for display in the site.
Run it on any experiment worth keeping to publish it.
snakeviz (Python) —
Browser-based viewer (icicle and sunburst charts) for cProfile output.
pip install snakeviz
python -m cProfile -o out.prof script.py
snakeviz out.prof # opens browser with an interactive chart of the profile
graphviz / dot —
Renders dependency graphs, state machines, and call graphs as SVG.
Doxygen emits .dot directly; cargo metadata emits JSON, so Rust projects need a converter such as the cargo-depgraph plugin.
winget install graphviz
cargo doc --open (Rust) —
Not just for publishing — generated doc pages are the fastest way to browse
the public API of any crate you add to an experiment.
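For the graphviz entry above, the .dot format is simple enough to generate by hand when no tool emits it for you. The sketch below turns a dependency mapping into DOT text; the function name `to_dot` and the example module names are hypothetical.

```python
def to_dot(deps: dict[str, list[str]], name: str = "deps") -> str:
    """Render {module: [modules it depends on]} as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    for src in sorted(deps):
        if not deps[src]:
            # A module with no dependencies still appears as a node.
            lines.append(f'    "{src}";')
        for dst in sorted(deps[src]):
            lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical module graph, for illustration only.
    print(to_dot({"main": ["scanner", "reporter"], "reporter": ["scanner"], "scanner": []}))
```

Save the output as deps.dot and render it with `dot -Tsvg deps.dot -o deps.svg`.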
VS Code extensions worth installing for the arena:
- Error Lens — inline error messages next to the offending line
- GitLens — last-edit blame inline; useful for tracking what changed in scratch
- CodeMetrics (kisstkondoros) — per-function complexity score in the editor gutter
- Rust Analyzer — essential for Rust; includes inlay type hints
- clangd — C++ language server with inline diagnostics and completions
1.5 A Typical Session Flow
A short ritual at the start and end of each session keeps the arena useful:
- Create or reset the scratch project for today’s language.
- Write the simplest version that compiles and runs (15–20 lines max).
- Run metrics: tokei and code_metrics.py for a baseline snapshot.
- Iterate: change one thing, re-run, compare metrics and timing.
- If the result is worth keeping, run CodeWebifier and move it to a named
folder; otherwise delete and move on.
- Add one line to notes/scratchpad.md: what you tried, what you learned.
1.6 References
| Resource | Description |
| hyperfine | Cross-platform benchmarking tool for command-line programs. |
| tokei | Fast, accurate code line counter with language breakdown. |
| scc | Line counter with complexity and estimated cost columns. |
| radon | Python complexity metrics: cyclomatic complexity, maintainability index. |
| cargo flamegraph | Flame graph profiler for Rust programs. |
| BenchmarkDotNet | Rigorous microbenchmark framework for .NET. |
| snakeviz | Browser-based viewer for Python cProfile output. |
2. Chat Bots
A chat bot session is the simplest form of AI-assisted development: open a browser,
describe a problem in plain language, and get a response. No API key, no install,
no code required. This chapter covers what chat bots are good for, how to prompt
them effectively for code tasks, and when to move to a more capable tool.
Why start with chat bots?
- Zero setup — the fastest path from a question to an answer.
- Good for understanding an unfamiliar API, pattern, or error message
before writing any code.
- Prompting for a chat bot and prompting for the API use the same vocabulary
— roles, context, constraints — so skills transfer directly.
- The limitations of a chat session (no file access, fixed context window,
no tool use) make the step up to a CLI agent feel motivated rather than
arbitrary.
2.1 Prompting for Code Analysis
Give the bot the code, then ask a specific question. Vague prompts get vague answers.
Effective patterns:
- “What does this function do? Focus on the return value.”
- “What would happen if I passed an empty slice to this function?”
- “Explain the ownership rules that apply to this block.”
- “What is the time complexity of this algorithm and why?”
2.2 Prompting for Code Generation
State the function signature, the inputs, the expected output, and any constraints.
The more precise the spec in the prompt, the less revision the output needs.
- Always specify the language and version (e.g., “Python 3.12”, “C++23”, “Rust 2021 edition”).
- Include one concrete example: input → expected output.
- State what the code must NOT do (allocate, panic, use unsafe, etc.).
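The checklist above (language and version, signature, one concrete example, prohibitions) can be captured in a small template so that no element is forgotten. This is a sketch of one possible layout; the function name `build_codegen_prompt` and the wording of each line are illustrative assumptions, not a required format.

```python
def build_codegen_prompt(
    language: str,
    signature: str,
    example_input: str,
    example_output: str,
    must_not: tuple[str, ...] = (),
) -> str:
    """Assemble a code-generation prompt from the four elements of the checklist."""
    parts = [
        f"Language: {language}",
        f"Implement this function: {signature}",
        f"Example: {example_input} -> {example_output}",
    ]
    if must_not:
        parts.append("The code must NOT: " + "; ".join(must_not))
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_codegen_prompt(
        "Rust 2021 edition",
        "fn count_words(text: &str) -> usize",
        '"a b  c"',
        "3",
        must_not=("panic", "use unsafe"),
    ))
```

The same string works in a chat window or as the user message in an API call.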
2.3 Prompting for Documentation
Chat bots produce good first-draft documentation when given the code and the audience.
- “Write a one-paragraph description of this module for a README.”
- “Write a doc comment for each public function in this file.”
- “Summarize this diff in three sentences for a commit message.”
2.4 Limitations and When to Move On
Know when to switch tools:
- Context window fills up — the bot forgets earlier parts of a long conversation.
- No file access — you must paste code manually; large codebases are impractical.
- No tool use — it can suggest a command but cannot run it or read the output.
- Hallucination risk — always run generated code before trusting it.
When you hit these limits, move to a CLI agent (Chapter 3) or the API (Chapter 4).
2.5 References
| Resource | Description |
| Claude | Anthropic’s chat interface. Best for long-context code tasks. |
| ChatGPT | OpenAI’s chat interface. Large model family with code interpreter. |
| Gemini | Google’s chat interface. Strong at multi-modal and search-grounded tasks. |
| CodeBites: Chat Bots | Track page with example sessions and prompt templates. |
3. Code AI CLI
A Code AI CLI runs in your terminal alongside your code. It can read your files,
write changes, run commands, and navigate your repository — all without leaving
the shell. This chapter covers Claude Code and Gemini CLI as terminal-based
coding partners.
Why the CLI over the browser?
- The CLI has direct access to your filesystem — no pasting required.
- It can run shell commands and read their output, closing the loop between
code and behavior.
- Multi-step tasks (read file, analyze, edit, run tests) happen in a single
session without context loss.
- Hooks let you automate recurring actions: run linters before each edit,
show a summary when the session ends.
3.1 Starting a Session
Run claude in the root of your project. On the first run in a new
repository, use /init to generate a CLAUDE.md file
describing the project structure. Subsequent sessions load that file automatically,
giving the model context without you re-explaining it each time.
3.2 Making Changes
Describe what you want in plain language. The CLI reads relevant files, proposes
a diff, and waits for your approval. You see every change before it lands.
Use /diff to review pending edits at any point in the session.
3.3 Multi-Step Tasks
Compound tasks work well because the CLI maintains context across steps.
Example session sequence:
- Read src/main.rs and summarize the data flow.
- Identify the three functions with the highest cyclomatic complexity.
- Refactor the most complex function into two smaller ones.
- Run cargo test and fix any failures.
Each step uses output from the previous one. The context window is the limit —
for very large refactors, split into smaller tasks.
3.4 Hooks and Automation
Claude Code hooks run shell commands automatically on events such as tool calls,
session start, and session end. Configure them in
.claude/settings.json under the hooks key. Useful hooks:
- Run cargo clippy after every file edit
- Display a summary of changed files when the session closes
- Block writes outside an allowed directory whitelist
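The last hook in the list, blocking writes outside an allowed set of directories, could be implemented as a small script that the hook command invokes. The sketch below rests on assumptions that should be checked against the Claude Code hooks documentation: that the hook receives a JSON payload on stdin, that the target path sits at `tool_input.file_path`, and that exit code 2 blocks the tool call. The function names are hypothetical.

```python
import sys
from pathlib import Path


def is_inside(path: str, allowed_roots: list[Path]) -> bool:
    """True if path resolves to a location under one of allowed_roots."""
    target = Path(path).resolve()
    return any(target.is_relative_to(root.resolve()) for root in allowed_roots)


def decide(payload: dict, allowed_roots: list[Path]) -> int:
    """Return 0 to allow the tool call, 2 to block it."""
    # Assumed payload shape: {"tool_input": {"file_path": "..."}}
    file_path = payload.get("tool_input", {}).get("file_path")
    if file_path is None:
        return 0  # no file path in the payload; nothing to check
    if is_inside(file_path, allowed_roots):
        return 0
    print(f"Blocked write outside allowed directories: {file_path}", file=sys.stderr)
    return 2


# To use as a hook script (assumption: payload arrives as JSON on stdin):
#   import json
#   sys.exit(decide(json.load(sys.stdin), [Path.cwd() / "sandbox"]))
```

Resolving the path before comparing it means that `sandbox/../secrets.txt` is rejected even though it starts with an allowed prefix.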
3.5 References
| Resource | Description |
| Claude Code | Anthropic’s CLI tool: install, docs, and keyboard reference. |
| Gemini CLI | Google’s terminal AI tool, open source. |
| CodeBites: Code AI | Track page with recorded CLI sessions and technique notes. |
4. LLM API
Calling Claude directly from code gives you full programmatic control: you choose
the model, the system prompt, the context, the output format, and whether to stream.
This chapter covers the Anthropic Python SDK from a minimal completion through
structured output, streaming, and prompt caching.
Why write code that calls the API?
- You can embed AI calls inside larger programs: scripts, batch processors,
analysis pipelines.
- You control the system prompt precisely — the AI’s persona,
constraints, and output format are not left to the chat interface defaults.
- Structured output (JSON) lets you parse and act on AI responses
programmatically.
- Prompt caching reduces cost and latency when the same large context is
reused across many calls.
4.1 A Minimal Completion
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Explain Rust ownership in three sentences."
    }]
)
print(response.content[0].text)
4.2 Structured Output
Put the JSON schema in the system prompt; the model then usually returns JSON you can
parse directly. Validate with pydantic for robustness.
import json

# code_text holds the source code to analyze (defined elsewhere)
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    system='Respond only with valid JSON matching {"summary": str, "complexity": int}.',
    messages=[{"role": "user", "content": code_text}]
)
result = json.loads(response.content[0].text)
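A bare `json.loads` fails if the reply is wrapped in a Markdown fence or has the wrong field types. The sketch below shows the kind of checking the text recommends, using only the standard library rather than pydantic (a deliberate substitution to keep the example dependency-free). The function name `parse_summary` is hypothetical; the expected shape is the one requested in the system prompt.

```python
import json

FENCE = "`" * 3  # a Markdown code fence marker


def parse_summary(raw: str) -> dict:
    """Parse and validate a {"summary": str, "complexity": int} reply."""
    text = raw.strip()
    # Models sometimes wrap JSON in a Markdown fence; strip it if present.
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)  # raises json.JSONDecodeError on malformed input
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("summary"), str):
        raise ValueError("'summary' must be a string")
    complexity = data.get("complexity")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(complexity, int) or isinstance(complexity, bool):
        raise ValueError("'complexity' must be an integer")
    return data
```

Since `json.JSONDecodeError` is a subclass of `ValueError`, a caller can catch `ValueError` once and retry the API call or log the raw reply.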
4.3 Streaming
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
4.4 Prompt Caching
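Mark large, reused context blocks with cache_control to avoid
re-processing them on every call. Cache reads are billed at roughly a tenth of the
normal input-token price and return faster; the first call, which writes the cache,
costs somewhat more than an uncached call. Useful when every call in a session
loads the same large file.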
messages=[{
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": large_context,
            "cache_control": {"type": "ephemeral"}
        },
        {"type": "text", "text": "Summarize the above."}
    ]
}]
5. Agent AI
An agent is a program that calls an LLM in a loop, giving the model access to
tools (functions it can invoke), and continuing until the model decides
it has finished. This chapter covers the tool-use pattern, the agentic loop,
a working file-reading agent, and safety constraints.
Why agents instead of single calls?
- A single call cannot react to its own output. An agent can read a file,
see what’s in it, and decide what to read next.
- Tool use turns the model into an orchestrator: it plans, delegates to
tools, and synthesizes results — all without your intervention.
- Agents can retry failed steps, ask clarifying questions, and handle
unexpected inputs gracefully.
5.1 Defining a Tool
tools = [{
    "name": "read_file",
    "description": "Read a source file and return its contents as a string.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Relative path to the file"}
        },
        "required": ["path"]
    }
}]
5.2 The Tool Loop
import anthropic, pathlib

client = anthropic.Anthropic()
messages = [{"role": "user", "content": "Summarize the file main.py."}]

while True:
    resp = client.messages.create(
        model="claude-sonnet-4-6", max_tokens=2048,
        tools=tools, messages=messages
    )
    if resp.stop_reason != "tool_use":
        # end_turn, max_tokens, etc.: print any text and stop looping
        print("".join(b.text for b in resp.content if b.type == "text"))
        break
    # Answer every tool_use block from this turn in a single user message
    results = []
    for block in resp.content:
        if block.type == "tool_use":
            text = pathlib.Path(block.input["path"]).read_text()
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": text
            })
    messages += [
        {"role": "assistant", "content": resp.content},
        {"role": "user", "content": results}
    ]
5.3 Safety Constraints
An unconstrained agent can do more than intended. Useful guards:
- Cap iterations: for _ in range(MAX_STEPS), raising an error if the cap is exceeded
- Whitelist allowed paths: reject tool calls outside the sandbox directory
- Separate read tools from write tools and require explicit user confirmation
before any write tool runs
- Log every tool call and its result to a file for post-session review
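Two of the guards above, the iteration cap and the path restriction, fit in a few lines. The following is a minimal sketch; `MAX_STEPS`, `safe_read`, and `run_capped` are illustrative names, and the `step` callable stands in for one pass through the tool loop.

```python
from pathlib import Path

MAX_STEPS = 10  # hypothetical cap; tune per task


def safe_read(path: str, sandbox: Path) -> str:
    """Read a file only if it resolves to a location inside sandbox."""
    target = (sandbox / path).resolve()
    if not target.is_relative_to(sandbox.resolve()):
        raise PermissionError(f"path escapes sandbox: {path}")
    return target.read_text(encoding="utf-8")


def run_capped(step, max_steps: int = MAX_STEPS):
    """Call step() until it returns a non-None result, at most max_steps times."""
    for _ in range(max_steps):
        result = step()
        if result is not None:
            return result
    raise RuntimeError(f"agent exceeded {max_steps} steps")
```

Replacing the direct `read_text()` call in a tool handler with `safe_read(block.input["path"], sandbox)` rejects both `../` traversal and absolute paths, because the check runs on the resolved path.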
5.4 References
| Resource | Description |
| Tool Use Docs | Anthropic’s guide to defining and using tools with the Claude API. |
| CodeBites: Agent AI | Track page with agent demos and design notes. |
6. Agentic AI
An agentic workflow chains multiple agent calls together, with the output of one
becoming the input of the next. Each call can have a different focus: analyze,
plan, generate, test. This chapter covers how to structure multi-step workflows,
how to pass state between calls, and where to add human-in-the-loop checkpoints.
When does autonomy help?
- When the task has multiple clearly-ordered steps and each step’s
output is the next step’s input.
- When you want repeatability — the same workflow produces the same
kind of output regardless of who runs it.
- When the individual steps are too small to justify a full CLI session
but too many to do manually each time.
6.1 Analyze → Plan → Generate → Test
A four-call workflow for producing a new module:
- Analyze — Read existing code; produce a JSON summary
of types, functions, and dependencies.
- Plan — Feed the summary to a second call; ask for a
numbered implementation plan as JSON.
- Generate — Feed the plan to a third call; ask for
source code, one file at a time.
- Test — Run the generated code, capture stdout/stderr,
feed errors back into a fourth call for a fix.
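The four steps above can be sketched as a single function in which every model call goes through an injected `ask` callable. That keeps the skeleton testable without network access; in practice `ask` would wrap `client.messages.create`. The prompts, the function name `run_pipeline`, and the choice of Python as the generated language are all assumptions made for illustration. Because it executes generated code, run it only inside a sandbox.

```python
import json
import subprocess
import sys
from typing import Callable


def run_pipeline(source: str, ask: Callable[[str], str], max_fixes: int = 1) -> dict:
    """Analyze -> plan -> generate -> test; ask() stands in for one LLM call."""
    # Analyze: JSON summary of the existing code.
    summary = json.loads(ask(f"Summarize types, functions, dependencies as JSON:\n{source}"))
    # Plan: numbered implementation plan as JSON.
    plan = json.loads(ask(f"Produce a numbered implementation plan as JSON:\n{json.dumps(summary)}"))
    # Generate: source code for the plan.
    code = ask(f"Write Python source for this plan:\n{json.dumps(plan)}")
    # Test: run the code; on failure, feed stderr back for a fix.
    for attempt in range(max_fixes + 1):
        proc = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)
        if proc.returncode == 0:
            return {"code": code, "stdout": proc.stdout, "attempts": attempt + 1}
        if attempt < max_fixes:
            code = ask(f"Fix this error:\n{proc.stderr}\n\nCode:\n{code}")
    raise RuntimeError("generated code still fails:\n" + proc.stderr)
```

Bounding the fix loop with `max_fixes` prevents a failing generation from consuming calls indefinitely.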
6.2 Passing State Between Calls
Use plain Python data structures to carry results forward. A dict
or dataclass per step is enough for most workflows. For long-running
pipelines, serialize to a JSON file so a failed step can be retried without
re-running the earlier ones.
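The retry-without-rerunning idea can be sketched with one JSON file that accumulates each step's result. The function name `run_step` and the file layout (one top-level key per step) are assumptions for illustration.

```python
import json
from pathlib import Path
from typing import Callable


def run_step(name: str, fn: Callable[[dict], dict], state_path: Path) -> dict:
    """Run fn(state) unless a result for this step is already saved."""
    if state_path.exists():
        state = json.loads(state_path.read_text(encoding="utf-8"))
    else:
        state = {}
    if name not in state:
        # fn receives all earlier results and returns this step's result.
        state[name] = fn(state)
        state_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return state
```

If the generate step fails, rerunning the script skips analyze and plan because their keys are already in the file. Deleting a key (or the whole file) forces that step to run again.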
6.3 Human-in-the-Loop Checkpoints
Pause after high-risk steps for user confirmation. A simple
input("Continue? [y/N] ") is enough for a personal script.
For production workflows, write the plan to a file and require an explicit
approval file before the generate step runs.
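One way to implement the approval-file checkpoint is to require that the approval file contain a digest of the exact plan being approved, so that a plan edited after approval is no longer considered approved. This is a design assumption layered on top of the text, which only asks for an explicit approval file; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def plan_digest(plan_path: Path) -> str:
    """SHA-256 hex digest of the plan file's bytes."""
    return hashlib.sha256(plan_path.read_bytes()).hexdigest()


def is_approved(plan_path: Path, approval_path: Path) -> bool:
    """True only if the approval file holds the digest of the current plan."""
    if not (plan_path.exists() and approval_path.exists()):
        return False
    recorded = approval_path.read_text(encoding="utf-8").strip()
    return recorded == plan_digest(plan_path)
```

A reviewer approves by writing the digest into the approval file after reading the plan; the generate step calls `is_approved` and exits if it returns False.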
7. Skills AI
A skill is a reusable, named tool definition. Building a skill library
means you write a tool once and use it across many agents and sessions.
This chapter covers the anatomy of a skill, a code_metrics example,
and composing multiple skills in a single agent call.
Why build a skill library?
- Copy-pasting tool definitions into every script creates maintenance debt.
A shared library means a fix reaches every agent at once.
- Named, well-described skills are self-documenting — the model reads
the description and knows what the tool does without explanation in the prompt.
- A library of composable skills lets you assemble new agents quickly from
existing parts.
7.1 Anatomy of a Skill
def skill_code_metrics(path: str) -> dict:
    """Count lines, functions, and blank lines in a Python source file.

    Returns {"lines": int, "functions": int, "blanks": int}.
    """
    import ast, pathlib
    src = pathlib.Path(path).read_text()
    tree = ast.parse(src)
    # Count both regular and async function definitions
    functions = sum(
        1 for n in ast.walk(tree)
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
    )
    source_lines = src.splitlines()  # correct even without a trailing newline
    blanks = sum(1 for ln in source_lines if not ln.strip())
    return {"lines": len(source_lines), "functions": functions, "blanks": blanks}

SKILL_CODE_METRICS = {
    "name": "code_metrics",
    "description": skill_code_metrics.__doc__,
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"]
    }
}
7.2 Composing Skills
Pass a list of skill definitions to a single agent call. The model chooses which
tools to invoke and in what order. Keep each skill focused on one action —
composition happens at the agent level, not inside a skill.
# SKILL_READ_FILE and SKILL_LIST_DIR are assumed to be defined like SKILL_CODE_METRICS
all_skills = [SKILL_READ_FILE, SKILL_CODE_METRICS, SKILL_LIST_DIR]

resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    tools=all_skills,
    messages=[{
        "role": "user",
        "content": "Report metrics for every .py file in src/."
    }]
)
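When the model replies with tool_use blocks, something has to map each tool name back to the Python function that implements it. A registry keyed by skill name is one way to do that; the sketch below is an assumption about how a skill library might be organized, not the track's code. The `list_dir` skill is a hypothetical stand-in, and the returned dict mirrors the `content` and `is_error` fields of a tool_result block.

```python
import os
from typing import Any, Callable

# Skill name -> Python function; names must match the "name" in each tool definition.
REGISTRY: dict[str, Callable[..., Any]] = {}


def register(name: str):
    """Decorator that adds a function to the skill registry under name."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap


@register("list_dir")
def list_dir(path: str) -> list[str]:
    """Hypothetical skill: sorted directory listing."""
    return sorted(os.listdir(path))


def dispatch(name: str, tool_input: dict) -> dict:
    """Run the named skill and return fields for a tool_result block."""
    fn = REGISTRY.get(name)
    if fn is None:
        return {"is_error": True, "content": f"unknown tool: {name}"}
    try:
        return {"is_error": False, "content": str(fn(**tool_input))}
    except Exception as exc:  # report failures to the model instead of crashing
        return {"is_error": True, "content": f"{type(exc).__name__}: {exc}"}
```

Returning errors as data lets the model see what went wrong and choose a different path or tool on the next turn.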
8. Spec-Driven Development
Spec-driven development reverses the usual order: you write what the software must do
before you write any code, then hand those documents to an AI to drive implementation.
This chapter covers three spec files — Constitution.md, Structure.md, and Spec.md
— and the workflow that uses them.
Why specs before prompts?
- A prompt without a spec is ambiguous. A spec makes the constraints
explicit and auditable before the AI writes a line of code.
- The spec becomes the acceptance criterion: generated code that violates
the spec is wrong by definition, regardless of whether it compiles.
- Separating “what must be true” (Constitution) from
“how it is organized” (Structure) from “what each piece
does” (Spec) keeps each document small and focused.
- Specs are reusable — the same Constitution can govern implementations
in multiple languages.
8.1 Constitution.md
States values and hard constraints that apply to all generated code. Written in
plain imperative sentences; one page maximum. Examples:
- Never use unwrap() in library code.
- All public functions must have a doc comment.
- No external dependencies beyond the standard library.
- Functions must not exceed 30 lines.
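Some Constitution rules are mechanical enough to check with a script rather than by asking the AI. The sketch below checks two of the example rules (doc comments on public functions, the 30-line limit) against Python source using the standard `ast` module. It is an illustration for Python only; the Rust-specific `unwrap()` rule would need a Rust-aware tool. The function name and the convention that a leading underscore means "private" are assumptions.

```python
import ast


def constitution_violations(source: str, max_lines: int = 30) -> list[str]:
    """Return human-readable rule violations found in Python source text."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Length counts from the def line through the last body line.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                problems.append(f"{node.name}: {length} lines (limit {max_lines})")
            # Assumption: names without a leading underscore are public.
            if not node.name.startswith("_") and ast.get_docstring(node) is None:
                problems.append(f"{node.name}: missing doc comment")
    return problems
```

An empty list means no violations of these two rules were found; it says nothing about rules the script does not cover.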
8.2 Structure.md
Defines the package layout, file names, module boundaries, and dependency rules.
The AI reads this before generating any file so it knows where each piece belongs.
## Package Layout
src/
  lib.rs       -- public API re-exports only
  config.rs    -- configuration types; no I/O
  scanner.rs   -- directory traversal; depends on config only
  matcher.rs   -- regex matching; depends on config only
  reporter.rs  -- output formatting; depends on scanner and matcher
  main.rs      -- entry point; depends on all others
## Rules
- scanner.rs must not import from matcher.rs
- reporter.rs must not perform I/O beyond writing to a provided Writer
8.3 Spec.md
Documents each public function or type: signature, preconditions, postconditions,
and one example. Written before implementation; updated when requirements change.
## fn scan(root: &Path, config: &Config) -> Result<Vec<Match>>
- root must exist and be a directory; returns Err otherwise
- traverses root recursively, following symlinks if config.follow_symlinks
- returns all Match records where config.pattern matches file content
- excludes files whose extension is not in config.extensions
- example: scan(Path::new("src"), &cfg) -> Ok(vec![Match{path: ..., line: 3}])
8.4 The Workflow
- Write Constitution.md — values and constraints, no code yet.
- Write Structure.md — package layout and dependency rules.
- Write Spec.md — signatures and contracts for each public item.
- Start a CLI session or API call; load all three files as context.
- Ask the AI to implement one file at a time, citing the spec for each function.
- Run tests after each file; feed failures back into the session.
- After all files pass, ask the AI to verify the implementation against
Constitution.md and report any violations.
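For the API route in the workflow above, "load all three files as context" can be as simple as concatenating them into one string for the system prompt. The sketch below is one possible approach; the tag-style delimiters and the function name `build_spec_context` are illustrative choices, not a required format.

```python
from pathlib import Path

# File names as used in this chapter.
SPEC_FILES = ["Constitution.md", "Structure.md", "Spec.md"]


def build_spec_context(spec_dir: Path) -> str:
    """Concatenate the three spec files, each wrapped in a labeled section."""
    sections = []
    for name in SPEC_FILES:
        path = spec_dir / name
        if not path.exists():
            # Fail early: generating code against a partial spec defeats the purpose.
            raise FileNotFoundError(f"missing spec file: {path}")
        body = path.read_text(encoding="utf-8").strip()
        sections.append(f"<{name}>\n{body}\n</{name}>")
    return "\n\n".join(sections)
```

The result can be passed as the `system` argument of `client.messages.create`, and because it is identical across calls it is a natural candidate for prompt caching.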
8.5 Epilogue — Connecting to the SWDev Track
Spec-driven development is the Code Track’s answer to the SWDev track’s
design chapter: Constitution maps to architectural constraints, Structure maps to
package design, and Spec maps to specification. The difference is the AI is now
the implementer. For a deeper treatment of the design concepts behind these
documents, see the
SWDev Story: Software Design
chapter.