AI Story: Models

4. Models

Each AI provider maintains a family of models that trade off capability, speed, and cost. Choosing the right model matters: the most capable model is not always the right one, and the cheapest is not always adequate. This chapter surveys the major families and gives a framework for selection.

4.1 Claude (Anthropic)

Anthropic’s Claude 4.x series (as of mid-2026) has three tiers:

Claude Haiku 4.5: the fastest and cheapest tier. Good for classification, summarisation, extraction, and high-volume tasks where latency matters. 200K-token context window.
Claude Sonnet 4.6: the daily workhorse. Strong code generation, analysis, and multi-step reasoning at moderate cost and latency. 200K-token context window, 64K-token maximum output.
Claude Opus 4.7: the highest-capability tier. Best for complex reasoning, research synthesis, and agentic tasks that require sustained multi-step planning, at higher cost and latency.

Claude models are particularly strong at precise adherence to system-prompt instructions, tool use, and long-context tasks. They are the primary models used in the code examples throughout this story.

4.2 GPT (OpenAI)

OpenAI’s GPT-4o family offers strong general-purpose capabilities with a 128K-token context window. The o-series models (o1, o3) add extended “thinking” steps for problems that require systematic reasoning. The GPT family has the broadest ecosystem of third-party integrations and is the most heavily documented in the public material that models are trained on, which makes it a common default for many web developers.

4.3 Gemini (Google)

Google’s Gemini 1.5/2.0 series is notable for very large context windows (up to 1M tokens), strong multimodal support (text, image, audio, video), and deep integration with Google Workspace and Search. Gemini Flash models are among the fastest and cheapest available for high-throughput tasks.

4.4 Selection Criteria

Pick a model by matching its profile to your task:

Task complexity: multi-step reasoning and planning benefit from the most capable tier; classification and extraction work fine on the smallest tier.
Context size: if your prompt regularly exceeds 100K tokens (large codebases, many documents), check that the model’s window is large enough for the prompt plus the expected output.
Latency: user-facing applications need fast responses. Use a small model and aggressively cache the parts of the prompt that stay the same between requests.
Cost: in batch processing at scale, per-token model cost dominates total spend. Profile token usage before committing to a tier.
Tool use quality: not all models handle the request/tool-use/tool-result loop equally well. Test with your actual tool schemas rather than relying on general benchmarks.
Consistency: pin to a specific model version (e.g., claude-sonnet-4-6) in production. “Latest” aliases can change output characteristics on update.
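The first four criteria above can be sketched as a small helper. This is only an illustration: the tier names, context windows, and prices below are hypothetical placeholders rather than any provider's real figures, and the rule that a latency-sensitive task steps down one tier is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    rank: int                # 0 = smallest and fastest; higher = more capable
    context_tokens: int      # HYPOTHETICAL context window
    usd_per_mtok_in: float   # HYPOTHETICAL price per million input tokens
    usd_per_mtok_out: float  # HYPOTHETICAL price per million output tokens


# Placeholder catalogue, ordered by rank. Replace every number with the
# figures from your provider's model reference before relying on it.
TIERS = (
    Tier("small", 0, 100_000, 1.0, 5.0),
    Tier("medium", 1, 200_000, 3.0, 15.0),
    Tier("large", 2, 1_000_000, 15.0, 75.0),
)

COMPLEXITY_RANK = {"simple": 0, "moderate": 1, "complex": 2}


@dataclass(frozen=True)
class Task:
    complexity: str              # "simple", "moderate", or "complex"
    prompt_tokens: int
    output_tokens: int
    latency_sensitive: bool = False


def pick_tier(task, tiers=TIERS):
    """Return the lowest-ranked tier that fits the task's context and complexity."""
    needed = task.prompt_tokens + task.output_tokens
    # Context size: drop tiers whose window cannot hold prompt plus output.
    fitting = [t for t in tiers if needed <= t.context_tokens]
    if not fitting:
        raise ValueError(f"no tier has a window of at least {needed} tokens")
    # Task complexity: map the task to a minimum rank.
    wanted = COMPLEXITY_RANK[task.complexity]
    # Latency: assumed rule, trade one step of capability for speed.
    if task.latency_sensitive:
        wanted = max(0, wanted - 1)
    for tier in fitting:
        if tier.rank >= wanted:
            return tier
    return fitting[-1]  # nothing reaches the wanted rank; use the most capable fit


def estimate_cost_usd(tier, task, requests):
    """Cost: rough spend for `requests` calls shaped like `task`."""
    per_request = (
        task.prompt_tokens * tier.usd_per_mtok_in
        + task.output_tokens * tier.usd_per_mtok_out
    ) / 1_000_000
    return per_request * requests


if __name__ == "__main__":
    task = Task("complex", prompt_tokens=2_000, output_tokens=500, latency_sensitive=True)
    tier = pick_tier(task)
    print(tier.name, estimate_cost_usd(tier, task, requests=10_000))
```

Tool use quality and consistency are deliberately left out of the sketch: the first needs testing against your own tool schemas, and the second is a deployment setting rather than something to compute.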

4.5 References

Resource	Description
Claude Models	Current Claude model IDs, context windows, and pricing.
OpenAI Models	GPT and o-series model reference.
Gemini Models	Gemini model reference including context limits and pricing.
Next: Messages API	The request/response structure for calling a model from code.