AI Story: Messages API

5. Messages API

The Messages API is the primary way to call an LLM from code. Every interaction — single-turn completions, multi-turn conversations, tool-using agents — goes through the same endpoint. Knowing the exact shape of the request and response makes debugging straightforward.

5.1 Minimal Request

The three required fields are model, max_tokens, and messages. The messages list must contain at least one message, and each message needs a role and content.

import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from env

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is a transformer architecture?"}
    ]
)
print(response.content[0].text)

5.2 Request Parameters

model — exact model ID string (pin to a version in production).
max_tokens — maximum output tokens. The model may produce fewer; it will never produce more.
messages — list of {role, content} objects, alternating user/assistant.
system — system prompt string (optional). Sent separately from messages.
temperature — 0.0–1.0. Lower values produce more deterministic output; higher values more varied. Default is 1.0.
top_p — nucleus sampling threshold. Rarely needed; adjust temperature instead.
stop_sequences — list of strings that cause the model to stop generating when encountered.
tools — tool definitions for function calling (Chapter 9).
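
As a rough illustration of how the optional parameters above sit alongside the required ones, the sketch below assembles the keyword arguments for a single call. The helper name build_request and the example values are hypothetical; only the parameter names come from the list above. It makes no network call, so it runs without an API key.

```python
def build_request(user_text, system=None, temperature=None, stop_sequences=None):
    """Assemble keyword arguments for client.messages.create (hypothetical helper)."""
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    # Required fields.
    params = {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": user_text}],
    }
    # Include optional fields only when the caller set them, so the API
    # defaults apply to everything else.
    if system is not None:
        params["system"] = system
    if temperature is not None:
        params["temperature"] = temperature
    if stop_sequences:
        params["stop_sequences"] = list(stop_sequences)
    return params


params = build_request(
    "List three uses of attention.",
    system="You are a concise technical assistant.",
    temperature=0.2,
    stop_sequences=["\n\n"],
)
# With the client from section 5.1: response = client.messages.create(**params)
```

Note that system is a top-level key here, not an entry in the messages list.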

5.3 Response Structure

The response object has these key fields:

response.id              # unique message ID
response.model           # model that generated the response
response.stop_reason     # e.g. "end_turn" | "max_tokens" | "stop_sequence" | "tool_use"
response.content         # list of content blocks
response.usage.input_tokens
response.usage.output_tokens

# Extract text from the first content block:
text = response.content[0].text

Always check stop_reason. If it is "max_tokens" the response was truncated — the answer is incomplete. Raise max_tokens or break the task into smaller calls.
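
One way to make that check hard to forget is to route every response through a small helper. The sketch below is a minimal version under stated assumptions: extract_text and TruncatedResponseError are hypothetical names, blocks without a text attribute are assumed to be non-text content, and a stand-in object replaces a real API response so the example runs offline.

```python
from types import SimpleNamespace


class TruncatedResponseError(RuntimeError):
    """Raised when the model stopped because it hit max_tokens."""


def extract_text(response):
    """Return the concatenated text blocks, or raise if the reply was cut off."""
    if response.stop_reason == "max_tokens":
        raise TruncatedResponseError(
            f"truncated after {response.usage.output_tokens} output tokens"
        )
    # Join every block that carries text; blocks without a .text attribute
    # (assumed here to be non-text content such as tool calls) are skipped.
    return "".join(getattr(block, "text", "") for block in response.content)


# Stand-in object mimicking the fields listed above (not a real SDK response).
fake = SimpleNamespace(
    stop_reason="end_turn",
    content=[SimpleNamespace(type="text", text="Hello")],
    usage=SimpleNamespace(input_tokens=5, output_tokens=1),
)
print(extract_text(fake))
```

A caller can catch TruncatedResponseError and retry with a larger max_tokens or a smaller task.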

5.4 Multi-Turn Conversations

There is no session state on the server. You build a conversation by appending previous turns to the messages list on each call.

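import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from env
messages = []                    # full conversation history, resent on every call

def chat(user_text):
    # Record the new user turn, then send the whole history.
    messages.append({"role": "user", "content": user_text})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=messages
    )
    # Store the reply so the next call includes it as context.
    assistant_text = response.content[0].text
    messages.append({"role": "assistant", "content": assistant_text})
    return assistant_text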

print(chat("What is a monad?"))
print(chat("Give me a concrete example in Python."))

The messages list grows with every turn. Long conversations eventually approach the context limit. Production applications summarise or truncate old turns to stay within budget.
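
Truncation can be sketched as a sliding window over the history. The helper below is a simple illustration, not a recommended production strategy: trim_history and max_messages are hypothetical names, and it assumes the trimmed conversation should still open with a user turn.

```python
def trim_history(messages, max_messages=20):
    """Return the most recent messages, starting on a user turn."""
    if max_messages <= 0:
        raise ValueError("max_messages must be positive")
    recent = messages[-max_messages:]
    # Drop leading assistant turns so the window opens with a user message.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


history = [
    {"role": "user", "content": "q1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "q3"},
]
trimmed = trim_history(history, max_messages=4)
print([m["content"] for m in trimmed])   # ['q2', 'a2', 'q3']
```

Message count is only a crude proxy for size. The usage.input_tokens value from the previous response reports how many tokens the history actually consumed, which makes it a better trigger for trimming or summarising.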

5.5 References

Resource	Description
Messages API Reference	Complete request and response schema.
anthropic-sdk-python	Python SDK source code and examples.
Next: Structured Output	Getting reliable JSON and typed data from the model.