11. Reliability
A working prototype that calls the API once is not a reliable application.
Production AI systems must handle transient failures, respect rate limits,
stay within token budgets, validate unpredictable model output, and keep
costs observable. This chapter covers the operational layer that makes
an AI application trustworthy enough to ship.
11.1 Error Types
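Anthropic’s Python SDK raises a typed exception for each class of failure:
- anthropic.RateLimitError (429): too many requests per minute or too many tokens per minute; back off, then retry.
- anthropic.APIStatusError: base class for every HTTP error response, including the other status-code errors in this list. When status_code is 500 or above (for example anthropic.InternalServerError), the failure is a transient server-side error and is safe to retry.
- anthropic.APIConnectionError (including its subclass anthropic.APITimeoutError): network failure, no response arrived; retry with backoff.
- anthropic.BadRequestError (400): invalid request, such as malformed messages or invalid parameters; do not retry without fixing the request.
- anthropic.AuthenticationError (401): invalid or missing API key; do not retry.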
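The list reduces to one rule: retry when no response arrived, on 429, or on any 5xx; otherwise fix the request first. A minimal, SDK-independent sketch of that rule follows. The helper name is hypothetical, and it takes the HTTP status code, or None for a connection failure; with the SDK you would pass e.status_code from an anthropic.APIStatusError and None for an anthropic.APIConnectionError.

```python
def is_retryable(status_code):
    """Classify a failed API call using the rules listed above.

    status_code is the HTTP status of the error response, or None when
    no response arrived at all (a connection failure)."""
    if status_code is None:
        return True   # network failure: retry with backoff
    if status_code == 429:
        return True   # rate limited: back off, then retry
    if status_code >= 500:
        return True   # server-side transient error
    return False      # 400, 401 and other 4xx: fix the request first


for code in (None, 429, 500, 529, 400, 401):
    print(code, is_retryable(code))
```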
11.2 Retry with Exponential Backoff
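The SDK already retries some transient failures on its own (two retries by
default, set with the max_retries argument when you construct the client).
For agentic loops you often need more control over how long to keep trying
and which errors count as transient. A simple backoff wrapper; construct the
client with max_retries=0 so the SDK's retries don't stack on top of these: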
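import random
import time

import anthropic

def call_with_retry(client, max_retries=5, **kwargs):
    """Call client.messages.create(**kwargs), retrying transient
    failures with exponential backoff; other errors raise at once."""
    delay = 1.0
    for attempt in range(max_retries):
        last_attempt = attempt == max_retries - 1
        try:
            return client.messages.create(**kwargs)
        except (anthropic.RateLimitError, anthropic.APIConnectionError):
            # 429 or network failure: transient, so retry.
            if last_attempt:
                raise
        except anthropic.APIStatusError as e:
            # Retry only server-side (5xx) errors; a 4xx means the
            # request itself must be fixed first.
            if e.status_code < 500 or last_attempt:
                raise
        # Wait, then double the delay. Random jitter keeps many
        # clients from retrying in lockstep.
        time.sleep(delay + random.uniform(0, delay))
        delay *= 2
    raise ValueError("max_retries must be at least 1")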
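The wrapper above doubles its delay without limit. A common refinement, named plainly here as a swapped-in technique rather than anything the SDK is claimed to do, is capped exponential backoff with "full jitter": each wait is drawn uniformly between zero and min(cap, base * 2**n). The function name backoff_delays and the default base and cap values are hypothetical choices for illustration.

```python
import random

def backoff_delays(max_retries, base=1.0, cap=30.0, rng=random.random):
    """Yield one wait time (in seconds) per retry, using capped
    exponential backoff with full jitter.

    base, cap and the rng hook are illustrative assumptions; rng should
    return a value between 0 and 1."""
    for n in range(max_retries):
        # The ceiling grows 1, 2, 4, 8, ... seconds until it reaches cap.
        ceiling = min(cap, base * 2 ** n)
        # Full jitter: wait anywhere between 0 and the ceiling.
        yield ceiling * rng()


# Fixing rng to 1.0 shows the ceilings themselves.
print(list(backoff_delays(7, rng=lambda: 1.0)))
# → [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

To use it in a retry loop, create delays = backoff_delays(max_retries - 1) once and call time.sleep(next(delays)) in place of the fixed doubling.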
11.3 Token Budget Management
Track cumulative token usage across an agent session to avoid surprise costs
and to detect runaway loops early:
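class TokenBudget:
    """Cumulative token counter for one agent session."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def record(self, usage):
        # usage is response.usage from client.messages.create.
        # input_tokens excludes prompt-cache tokens, so add those too
        # (the fields may be missing or None when caching is unused).
        self.used += (
            usage.input_tokens
            + usage.output_tokens
            + (getattr(usage, "cache_creation_input_tokens", None) or 0)
            + (getattr(usage, "cache_read_input_tokens", None) or 0)
        )

    def check(self):
        # Call after every record() so a runaway loop stops early.
        if self.used >= self.max_tokens:
            raise RuntimeError(
                f"Token budget exhausted: {self.used}/{self.max_tokens}"
            )

budget = TokenBudget(max_tokens=200_000)
response = client.messages.create(...)
budget.record(response.usage)
budget.check()  # raises if over budget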
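TokenBudget.check() fires only after a call has already been paid for, so a session can overshoot by one response. A hypothetical variant, PreflightBudget (the class and its reserve() method are illustrative names, not part of any SDK), refuses a call up front when its worst case, the estimated input plus a full max_tokens of output, would not fit. The input estimate is assumed to come from a token-counting call or a rough heuristic, and usage objects are faked with SimpleNamespace so the sketch runs without the SDK.

```python
from types import SimpleNamespace

class PreflightBudget:
    """Token budget that checks the worst case *before* each call."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def remaining(self):
        return self.max_tokens - self.used

    def reserve(self, input_estimate, max_output):
        # Worst case for one call: all of the input plus an output that
        # runs to the full max_tokens length.
        worst_case = input_estimate + max_output
        if worst_case > self.remaining():
            raise RuntimeError(
                f"call needs up to {worst_case} tokens, "
                f"only {self.remaining()} left"
            )

    def record(self, usage):
        # Charge what the call actually consumed.
        self.used += usage.input_tokens + usage.output_tokens


budget = PreflightBudget(max_tokens=1_000)
budget.reserve(input_estimate=200, max_output=300)  # fits, no error
budget.record(SimpleNamespace(input_tokens=200, output_tokens=120))
print(budget.remaining())  # → 680
```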
11.4 Output Validation
Never pass model output directly to security-sensitive code paths. The model
can produce plausible-looking but wrong, incomplete, or malicious content.
- Parse structured output with Pydantic (Chapter 6) and reject invalid shapes.
- Check stop_reason before trusting a response as complete: "end_turn"
  means the model finished its turn, while "max_tokens" means the output
  was cut off.
- For generated code: run it in a subprocess with a timeout and resource
  limits; never exec() it directly in your own process.
- For file path arguments from model output: validate them against an
  allowlist of directories before passing them to filesystem APIs.
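The last two rules can be sketched with the standard library alone. safe_path and run_generated_code are hypothetical helper names, the 5-second CPU cap is an arbitrary example, the RLIMIT_CPU limit applies only on POSIX systems, and Path.is_relative_to needs Python 3.9 or later.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

def safe_path(candidate, allowed_dirs):
    """Resolve a model-supplied path; return it only if it lies inside
    one of allowed_dirs, otherwise raise PermissionError."""
    for root in allowed_dirs:
        root = Path(root).resolve()
        # An absolute candidate replaces root when joined, and resolve()
        # collapses ".." and follows symlinks, so the containment check
        # below rejects both ways of escaping the directory.
        resolved = (root / candidate).resolve()
        if resolved.is_relative_to(root):
            return resolved
    raise PermissionError(f"path outside allowlist: {candidate!r}")

def _limit_cpu():
    # Runs in the child process just before the new program starts
    # (POSIX only): cap CPU time at 5 seconds.
    import resource
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))

def run_generated_code(code, timeout=5.0):
    """Run untrusted Python source in a separate interpreter instead of
    exec()-ing it. Returns (returncode, stdout, stderr) and raises
    subprocess.TimeoutExpired if it runs past `timeout` seconds."""
    with tempfile.TemporaryDirectory() as workdir:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            cwd=workdir,  # scratch directory, deleted afterwards
            capture_output=True,
            text=True,
            timeout=timeout,
            preexec_fn=_limit_cpu if os.name == "posix" else None,
        )
    return result.returncode, result.stdout, result.stderr


print(run_generated_code("print(6 * 7)"))  # → (0, '42\n', '')
```

Neither helper is a full sandbox: the child process can still read files outside its scratch directory and open network connections, so production systems typically add OS-level isolation such as a container or VM.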
11.5 Cost Monitoring
Build token accounting into your logging from day one:
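import logging

logger = logging.getLogger("ai_app")

def log_usage(model, usage, task_name=""):
    # Cache fields may be absent (older SDKs) or None (caching unused);
    # coerce both cases to 0 so the %d placeholders never receive None.
    cache_write = getattr(usage, "cache_creation_input_tokens", None) or 0
    cache_read = getattr(usage, "cache_read_input_tokens", None) or 0
    logger.info(
        "api_call model=%s task=%s "
        "in=%d out=%d cache_write=%d cache_read=%d",
        model, task_name,
        usage.input_tokens,
        usage.output_tokens,
        cache_write,
        cache_read,
    )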
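Lines in the log_usage() format can be rolled back up into per-task totals before they go to a dashboard or spreadsheet. A minimal sketch: the helper name totals_by_task is hypothetical, it assumes task names contain no spaces, and the model name and token counts in the sample lines are made up.

```python
import re
from collections import defaultdict

# Matches the message written by log_usage(); assumes task names
# contain no spaces (an empty task name is fine).
USAGE_RE = re.compile(
    r"api_call model=(?P<model>\S+) task=(?P<task>\S*) "
    r"in=(?P<input>\d+) out=(?P<output>\d+) "
    r"cache_write=(?P<cache_write>\d+) cache_read=(?P<cache_read>\d+)"
)
FIELDS = ("input", "output", "cache_write", "cache_read")

def totals_by_task(lines):
    """Sum token counts per (model, task) across log lines.
    Lines that are not api_call records are skipped."""
    totals = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    for line in lines:
        m = USAGE_RE.search(line)
        if m is None:
            continue
        bucket = totals[(m["model"], m["task"])]
        for field in FIELDS:
            bucket[field] += int(m[field])
    return dict(totals)


log_lines = [
    "INFO ai_app api_call model=m task=summarize in=1200 out=300 cache_write=0 cache_read=800",
    "INFO ai_app api_call model=m task=summarize in=900 out=250 cache_write=0 cache_read=800",
    "DEBUG something unrelated",
]
print(totals_by_task(log_lines))
# → {('m', 'summarize'): {'input': 2100, 'output': 550, 'cache_write': 0, 'cache_read': 1600}}
```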
Aggregate logs in a dashboard or spreadsheet. Unexpected spikes in input
tokens usually mean a conversation history is growing unchecked. Unexpected
spikes in output tokens usually mean max_tokens is too low: responses are
cut off (stop_reason == "max_tokens"), your code retries, and each retry
spends another full max_tokens of output without ever finishing.
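Spotting those spikes can be automated. A minimal sketch, assuming per-call token counts have already been extracted from the logs: flag any call whose count exceeds a multiple of the median of the calls before it. The name find_spikes, the 20-call window and the factor of 3 are hypothetical choices to tune, not established thresholds.

```python
from statistics import median

def find_spikes(values, window=20, factor=3.0):
    """Return the indices of values that exceed `factor` times the
    median of up to `window` preceding values."""
    spikes = []
    for i in range(1, len(values)):
        # The median of recent history is not dragged up by an
        # earlier one-off spike the way a mean would be.
        baseline = median(values[max(0, i - window):i])
        if baseline > 0 and values[i] > factor * baseline:
            spikes.append(i)
    return spikes


input_tokens_per_call = [1200, 1350, 1280, 1400, 9800, 1330]
print(find_spikes(input_tokens_per_call))  # → [4]
```

Run it separately on input and output counts, since the two kinds of spike point at different causes. Slow, steady history growth may never trip a ratio threshold, so a simple upward-trend check on input tokens is a useful complement.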
11.6 References