Claude Architecture Explained: A Deep Dive into Server Components, Client Interfaces, Harnesses, and Agent Systems

Q: 5. If I use the Raw Messages API with tools, isn't that already an agent?

Not by itself. The API lets the model request a tool by returning a structured tool_use block, but it never executes the tool. You must run the tool, return the result, and loop. Building that loop and execution layer is what turns raw API access into an agent.

Claude Architecture showing server components, client interfaces, harness orchestration, and AI agent systems. — A deep dive into Claude Architecture covering server components, client interfaces, harnesses, and agent systems.

Table of Contents

Claude Architecture Introduction

If you have used Claude through the chat website, automated a coding task in your terminal, or called the API directly from a Python script, you have touched three very different pieces of software — but only one of them was actually Claude the model. The rest was scaffolding.

That distinction is the single most useful thing a developer can internalize before building anything serious with Claude. The model is the intelligence. Everything around it — the loop that lets it call tools, the system that feeds it files, the gate that asks you before it deletes something — is orchestration. Conflating the two leads to predictable mistakes: expecting the raw API to “use tools by itself,” assuming an agent’s reliability comes from the model alone, or picking the wrong product for the job and then fighting it for weeks.

This article maps the whole ecosystem. We will separate the one server-side component (the model) from the client-side components you actually build and interact with (Claude.ai, Claude Code, the Claude Agent SDK, and the Raw Messages API). We will define the term harness precisely, because it is the concept that makes everything else click. And we will work through a concrete principle that you can carry into every design decision:

Agent = Model + Harness.

A note on terminology. “Server component” and “client component” are not official Anthropic labels — they are a lens this article uses to separate intelligence that only runs on Anthropic’s infrastructure (the model) from the access and orchestration layers a developer works with (the harnesses and APIs). “Client component” here means “the layer you build against,” not literally browser-side code. Claude.ai, for example, is hosted by Anthropic even though it sits on the “client” side of this mental model. Keep that in mind whenever the words server and client appear below.

What Is Claude?

The Difference Between a Model and a Product

“Claude” is overloaded. It refers to both a family of large language models and the products built on top of them. When someone says “I asked Claude,” they usually mean a product — the chat app or a coding tool. When an engineer says “the model returned a tool-use block,” they mean the model.

The model (for example, Claude Opus 4.8, Claude Sonnet 4.6, or Claude Haiku 4.5) is a neural network hosted on Anthropic’s servers. You never run it on your own hardware. You send it text (and sometimes images or documents), and it sends text back. It has no memory of previous requests, no ability to open a file on its own, and no way to run a command. It is, in the most literal sense, a function: text in, text out.

A product is software wrapped around that function to make it useful for a specific job. Claude.ai is a product. Claude Code is a product. Your own application calling the API is, effectively, a product you built.

Why Developers Often Confuse the Two

The confusion is natural, because the products are good enough that the seams disappear. When Claude Code reads ten files, edits three, runs your test suite, and reports back, it feels like the model did all of that. It did not. The model decided what to do at each step; a surrounding program actually read the files, ran the tests, and fed the results back into the model’s next request.

Holding this line — model decides, harness executes — prevents most architectural mistakes. The rest of this article is, in a sense, just elaborations of that one idea.

The Server Components of Claude Architecture

The Claude Model

There is really only one server-side component that matters here: the model itself. It lives on Anthropic’s infrastructure, behind an API. You access it; you do not host it.

Role of the Model

The model’s job is reasoning and generation. Given a context window full of text — instructions, conversation history, file contents, tool definitions, tool results — it predicts the most appropriate next output. That output might be a paragraph of prose, a block of code, or a structured request to use a tool (more on that shortly).

Crucially, the model is stateless. Each request is independent. If you want it to remember what was said three messages ago, you must include those three messages in the next request. The model does not retain anything between calls. This single property is the reason harnesses exist.

What the Model Does and Does Not Do

In the Claude architecture, what the model does: reason over the text it is given, write and analyze code, follow instructions, and emit structured tool-use requests when it is told tools are available.

The model does not: remember past conversations on its own, read your filesystem, execute code, call an external API, or take any action in the world. When the model “decides to run a test,” what actually happens is that it outputs a structured message saying “call the tool named run_tests“ — and some other piece of software has to notice that, actually run the tests, and hand back the results. If no such software exists, nothing happens. The model’s request just sits there as text.

That gap between deciding and doing is exactly the space a harness fills.

The Client Components of Claude Architecture

Overview of Client-Side Interfaces

Everything you build against or interact with to use the model is, in our framing, a client component. There are four that matter:

Claude.ai — the managed chat product, hosted by Anthropic.
Claude Code — the agentic coding tool that runs in your terminal or IDE.
The Claude Agent SDK (formerly the Claude Code SDK) — a library that exposes Claude Code’s machinery to your own programs.
The Raw Messages API + Client SDKs — the lowest-level, direct access to the model.

They form a spectrum from fully managed (Claude.ai does everything for you) to fully manual (the Raw API gives you the model and nothing else).

Overview diagram of Claude client-side interfaces including Claude.ai, Claude Code, Claude Agent SDK, and MCP connections. — Overview of the four major client-side interfaces in the Claude architecture ecosystem.

How Client Components Interact with the Model

All four ultimately talk to the same model over the same kind of API call. The difference is how much they do around that call. Claude.ai wraps the model in a complete, polished application. Claude Code wraps it in an agent loop tuned for software work. The Agent SDK hands you that loop as code. The Raw API hands you the bare model and lets you build whatever you want — or nothing.

Claude.ai (Managed Consumer Harness)

What Claude.ai Is

Claude.ai is the web and mobile chat application. For most people, it is Claude. From an architecture standpoint, it is a fully managed harness: Anthropic runs every part of it, you supply only your messages, and you customize almost nothing about its internal behavior.

Features Provided by the Harness

Even though it presents as a simple chat box, Claude.ai is doing substantial orchestration on your behalf.

Context management. The app decides what goes into each request to the model — your latest message, relevant earlier turns, system instructions, and any uploaded content. As conversations grow long, it manages the limited context window so the most relevant information stays in view. You never see this; it just works.

File handling. When you drop in a PDF, spreadsheet, or image, Claude.ai parses it, extracts the relevant content, and formats it into something the model can consume. The model cannot open files; the harness does the opening.

Tool usage. Features like web search, code execution, and document creation are tools the harness wires up. The model requests them; Claude.ai runs them and returns results. Again, the model only ever asks — the app acts.

Conversation management. Saving threads, letting you switch models mid-conversation, applying your saved preferences — all of this is harness behavior layered on top of the stateless model.

Claude.ai is the right reference point for understanding what a harness is, precisely because it hides the harness so completely.

Understanding the Concept of a Harness

Definition of a Harness

A harness is the software layer that surrounds a model and turns it from a text predictor into something that can carry out multi-step work. It supplies everything the model lacks on its own: a loop, the ability to execute the model’s requested actions, management of what the model sees, controls on what it is allowed to do, persistence across steps, and a way for a human to interact.

Why Models Need Harnesses

Because the model is stateless and inert, a single API call can only ever do one thing: produce one response. Real tasks — “fix this failing test,” “research these five companies,” “refactor this module” — require many steps, where each step depends on the result of the last. Something has to run the model repeatedly, carry results forward, and execute the actions the model asks for. That something is the harness.

Intelligence vs. Execution in Claude Architecture

This is the cleanest way to draw the line:

Intelligence = the model. It decides.
Execution = the harness. It acts.

A brilliant model with no harness can think but cannot do. A sophisticated harness with no model can do but cannot think. You need both, and they are genuinely separate components.

Real-World Analogy

Think of a master chef and a kitchen. The chef (the model) has the skill and judgment to produce an extraordinary meal. But the chef cannot cook in an empty room. The kitchen (the harness) supplies the stove, the knives, the pantry, the order tickets coming in, and the rule that nobody touches the deep fryer without checking first. The chef decides what to cook and how; the kitchen makes it possible to actually cook it, repeatedly, safely, for a full dinner service.

Swap the chef for a less skilled one and the kitchen still works — the food is just worse. Swap the kitchen for a campfire and even a great chef is limited. Intelligence and execution are independent, and you can upgrade either one separately.

Agent Systems Explained

What Is an AI Agent?

An AI agent is a system that pursues a goal over multiple steps, deciding its own next action at each step and acting on the world, until the goal is met or it stops. The defining feature is the loop: it does not just answer once; it observes, decides, acts, observes the result, and repeats.

The Formula: Agent = Model + Harness

Strip away the buzzwords and an agent is exactly this:

Agent = Model (decides) + Harness (loops, executes, manages, gates)

There is no third magic ingredient. “Agentic” capability is not something baked into the model alone — it emerges when a harness wraps the model in a loop and gives it tools. This is why the same underlying model can power a simple chatbot and a sophisticated autonomous agent: the difference is the harness around it.

Components of an Agent System

Tool use. Tools are functions the model can request — read a file, run a command, query a database, call an API. The model emits a structured request naming the tool and its arguments; the harness executes it and feeds the result back. Without a harness to execute them, tool definitions are just descriptions the model can talk about but never invoke.

Memory. Because the model is stateless, the harness maintains memory — conversation history, intermediate results, longer-term notes written to disk. The agent loop’s feedback (gather context → act → verify → repeat) only works because the harness carries state from one turn to the next.

Context management. The context window is finite. As an agent works, it accumulates far more information than will fit. The harness decides what to keep in view, what to summarize, what to offload to files, and what to discard.

Permission systems. An agent that can run shell commands and edit files can also do damage. The harness enforces what the agent may do automatically, what requires human approval, and what is forbidden outright.

Claude Code

Overview

Claude Code is Anthropic’s agentic coding tool in Claude architecture. It runs in your terminal, IDE, or desktop app, on your machine, with access to your files and shell. It is a complete, opinionated harness purpose-built for software engineering, with a human (you) steering.

Built-In Agent Loop

Claude Code’s core is a tuned agent loop: it gathers context from your codebase, decides on an action, executes it, verifies the result, and repeats until the task is done. You do not implement this loop — it ships inside the tool.

Tool Integration

It comes with the tools a programmer uses every day already wired up: reading files, searching the codebase, writing and editing files, running shell commands, fetching web pages, and more. The design principle is straightforward — give Claude the same tools a human developer uses, and it can work the way a developer works. It also speaks the Model Context Protocol (MCP), so you can plug in additional external tools.

Context Management

Claude Code reads relevant files into context, supports project-level configuration (such as a CLAUDE.md file loaded into every session to give the agent standing instructions about your codebase), and manages the context window as work accumulates.

Permission Gates

By default, Claude Code asks before doing anything consequential — before editing files, before running commands. These permission gates are a harness feature, not a model feature. The model would happily request a destructive command; the harness is what pauses and asks you first.

Human-in-the-Loop Operation

Claude Code is built around a human actively steering the loop. You watch it work, approve risky steps, redirect it when it goes sideways, and stop it when needed. This makes it ideal for interactive development, where judgment and oversight matter.

Example Workflow

You type: “The login endpoint returns a 500 when the email field is empty. Find and fix it.” Claude Code searches the codebase for the endpoint, reads the handler, identifies the missing validation, proposes an edit, asks your permission to apply it, runs the test suite to confirm the fix, and reports back. At each side-effectful step, it paused for you. That pausing is the harness; the diagnosis was the model.

The Claude Agent SDK (Formerly the Claude Code SDK)

What It Is

In Claude architecture, the Claude Agent SDK is a Python and TypeScript library for building your own agents. It was originally released as the “Claude Code SDK” and renamed to “Claude Agent SDK” to reflect that the same machinery powering a coding tool can power agents of any kind — finance assistants, customer-support bots, research agents, and more. If you find older tutorials referencing the “Claude Code SDK,” they are describing this same library under its previous name; the package is now published as @anthropic-ai/claude-agent-sdk (TypeScript) and claude-agent-sdk (Python).

Relationship to Claude Code

This is the key fact, and it is literally true rather than a metaphor: the Agent SDK is the same harness that powers Claude Code, exposed as a library. It is not a thin wrapper around the Messages API, and it is not a separate re-implementation. Under the hood it runs the Claude Code engine in your process. The same agent loop, the same built-in tools (Read, Write, Edit, Bash, Glob, Grep, WebSearch, WebFetch), the same context management — but driven by your code instead of a human at a terminal.

Using the Harness as a Library

The primary entry point is the query() function, an async generator. You give it a prompt and an options object; it runs the agent loop and streams back typed messages — tool calls, tool results, and a final result — until the task completes. You configure behavior through options: which tools are auto-approved (allowed_tools / allowedTools), which are always denied, the permission mode, the model, a hard cap on turns (max_turns) to prevent runaway loops, and the working directory.

A minimal Python example:

import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions

async def main():
    async for message in query(
        prompt="What files are in this directory?",
        options=ClaudeAgentOptions(allowed_tools=["Bash", "Glob"]),
    ):
        if hasattr(message, "result"):
            print(message.result)

asyncio.run(main())

Application-Controlled Agent Workflows

The reason to reach for the SDK rather than Claude Code is that your application becomes the driver. You can run the agent unattended in a CI pipeline, embed it in a backend service, define subagents (specialized child agents with their own restricted tools and context), install hooks that fire on every tool call for logging or approval, and resume sessions across multiple calls for long-running workflows. You get programmatic, code-level control while inheriting all of Claude Code’s hard-won engineering.

Example Claude Architecture

Imagine a security-review service. An HTTP request arrives with a repository URL. Your service calls query() with a prompt to review the code, an allowedTools list limited to read-only tools plus the Task tool, and a security-reviewer subagent definition. The SDK runs the loop, the subagent reads the code in its own isolated context, returns findings, and your service formats them into a JSON response. You wrote the service; the SDK was the agent engine inside it.

The Raw Messages API and Client SDK

What the Raw API Provides

The Messages API is the most direct way to use the model. The official client SDKs (anthropic for Python, @anthropic-ai/sdk for TypeScript) are thin, convenient wrappers around HTTP calls to that API. You send a list of messages; you get one response back. That is the entire contract.

Why It Is Not a Harness

The Raw API gives you the model and nothing else. There is no agent loop, no built-in tool execution, no context management, no permission system, no memory. It is a single request/response. If you want any of those capabilities, you build them.

This is the most misunderstood layer. The API supports tool use — you can describe tools in your request, and the model will respond with a structured tool_use block when it wants to call one. But the API will not execute anything. It hands you a request like “call get_weather with city=Paris“ and stops. Running that function, capturing the result, and sending it back is entirely your responsibility.

Manual Tool Loop Implementation

A tool-use loop on the Raw API looks like this, conceptually:

Send the conversation plus your tool definitions to the model.
The model responds. If it returns a tool_use block, the loop continues; if it returns a final text answer, you are done.
You execute the requested tool in your own code.
You append a tool_result message containing the output.
You send the whole updated conversation back to the model.
Repeat from step 2 until the model stops requesting tools.

Every one of those numbered steps that says “you” is work the higher-level products did for you automatically.

Building Your Own Agent Infrastructure

To build an agent on the Raw API, you must implement: the loop, the tool registry and execution, error handling and retries, context-window management as the conversation grows, any permission or safety gates, and any memory or persistence. This is real engineering, and it is easy to get subtly wrong.

Responsibilities Shifted to the Developer

The trade is total control for total responsibility. Nothing is hidden, nothing is opinionated, nothing constrains you — and nothing helps you. You choose this layer when you genuinely need that control, and you avoid it when you would just be rebuilding the Agent SDK badly.

Claude Agent SDK vs. Raw Messages API

Feature Comparison Table

Feature	Claude.ai	Claude Code	Claude Agent SDK (formerly Claude Code SDK)	Raw Messages API + Client SDK
Who runs it	Anthropic (hosted web/mobile app)	You (terminal / IDE / desktop)	You (your application process)	You (your application process)
Agent loop	Built in, fully managed	Built in, tuned for coding	Built in, exposed via `query()`	None — you build it
Tool execution	Handled by the app	Handled by the tool	Handled by the SDK (built-in tools included)	You implement every tool
Context management	Automatic, invisible	Automatic, configurable	Automatic, configurable	You manage it manually
Permission controls	N/A (consumer app)	Built-in approval gates	Configurable: allow/deny lists, modes, hooks	You build them
Memory / persistence	Conversation history saved	Session + project config	Sessions, resumable, subagent context	You build it
Customization / control	Minimal	Moderate (config + MCP)	High (code-level control)	Total
Target user	Anyone	Developers, interactive	Developers, programmatic/automated	Developers needing full control
Code required	None	None (you type prompts)	Yes (Python / TypeScript)	Yes, and much more of it
Maintenance burden	None (Anthropic)	Low	Low–moderate (SDK upgrades)	High (you own everything)

Development Complexity

Claude.ai requires zero code. Claude Code requires zero code but assumes developer fluency. The Agent SDK requires writing an application but hands you the hard parts. The Raw API requires writing the application and the agent infrastructure.

Flexibility, Control, and Maintenance

Flexibility rises as you move right across the table; so does the amount you must build and maintain. The Agent SDK is the sweet spot for most production agents: you get code-level control and automation without re-implementing the loop. Reach past it to the Raw API only when you need something the SDK’s design will not allow — an unusual control flow, a non-Claude model in the same loop, or a deliberately minimal dependency footprint.

How Requests Flow Through the Claude Architecture Ecosystem

End-to-End Architecture Walkthrough

No matter which layer you use, the bottom of the stack is identical: an HTTPS request to Anthropic’s Messages API, carrying a context window, returning one model response. What differs is everything built on top.

Claude Architecture Request Lifecycle (Described)

Picture the flow as a vertical stack with the model at the bottom.

In a Raw API call, your code builds the request, sends it, and receives one response. The lifecycle is a straight line: your code → API → model → API → your code. If a tool is involved, your code loops back and sends a new request. You are the loop.

In an Agent SDK call, you call query() once. Inside, the SDK builds the request, sends it to the model, receives a tool_use, executes the tool itself, appends the result, and sends an updated request — repeating until the model returns a final answer. The loop runs inside the SDK, in your process. You see typed messages stream by, but you did not write the loop.

In Claude Code, the same loop runs, but a human sits in the middle, approving steps and watching output.

In Claude.ai, the same loop runs entirely on Anthropic’s servers, and you see only the polished result.

Same engine at the bottom; progressively more orchestration as you go up.

Common Misconceptions in Claude Architecture

Claude Is Not the Harness

The most common error is treating “Claude” as a single thing that does everything. The model decides; the harness acts. When you praise or blame “Claude” for an agent’s behavior, you are usually talking about the harness.

APIs Are Not Agents

Calling the Messages API does not give you an agent. It gives you one model response. An agent requires a loop and tools around that API. The API is a component of an agent, not an agent itself.

Models Do Not Automatically Use Tools

Describing tools in an API request does not mean the model will run them. The model can only request a tool. Something else must execute it and return the result. If you forget to build that something, your “tool-using” agent will simply emit tool requests into the void.

Why Agent Loops Matter

The loop is the heart of agentic behavior. Single requests answer questions; loops accomplish tasks. Everything interesting about agents — multi-step reasoning, self-correction, working until done — comes from wrapping the model in a loop that carries results forward. Understanding that the loop is separate, buildable software is what lets you choose how much of it to build yourself.

When to Use Each Option

Best Use Cases for Claude.ai

Everyday tasks with no engineering required: writing, analysis, brainstorming, document review, quick coding help, and research. Choose it when you want results, not infrastructure.

Best Use Cases for Claude Code

Interactive software development where you steer: debugging, refactoring, implementing features, and exploring an unfamiliar codebase, with a human approving consequential actions. Choose it when you are the one in the loop.

Best Use Cases for the Claude Agent SDK

Production agents and automation: backend services, CI pipelines, scheduled jobs, and domain-specific agents (support, research, finance). Choose it when you need programmatic control and want the loop, tools, and context management handled for you.

Best Use Cases for the Raw Messages API

Maximum control and minimal abstraction: custom orchestration the SDK cannot express, lightweight single-shot calls (classification, extraction, simple generation) where no loop is needed at all, or research into agent architectures. Choose it when you genuinely need to own every layer — or when you need no harness because one request is the whole job.

The Future of Agent Development

Increasing Abstraction Layers

The trajectory is clear: more capability is moving into reusable harnesses. The renaming of the Claude Code SDK to the Claude Agent SDK is itself a signal — the industry is converging on the idea that a well-built agent harness is a general-purpose runtime, not something each team should rebuild from scratch.

Managed vs. Custom Agent Infrastructure

Expect the same split that played out in cloud computing: most teams will use managed harnesses (the SDK, or higher-level tools built on it) and treat the agent loop as solved infrastructure, while a minority with specialized needs will continue building custom orchestration on the Raw API. Knowing which camp you are in — and why — is the practical payoff of understanding Claude architecture.

Conclusion

Claude’s ecosystem is best understood through one distinction repeated at every level: the model is intelligence; the harness is execution. The model — Anthropic-hosted, stateless, inert until prompted — decides what to do. The harness loops, executes tools, manages context, enforces permissions, and remembers. Put them together and you get an agent: Agent = Model + Harness.

From there, the four access layers fall into place. Claude.ai is a fully managed harness you cannot see. Claude Code is a complete coding harness you steer by hand. The Claude Agent SDK (formerly the Claude Code SDK) is that same harness exposed as a library for your own programs. And the Raw Messages API is the model alone — total control, total responsibility, no harness at all. Choose by asking a single question: how much of the orchestration do I want to own? Your answer points directly at the right layer.

Frequently Asked Questions

1. What is the difference between Claude the model and Claude the product?

The model is the neural network hosted by Anthropic that takes text in and produces text out. A product (Claude.ai, Claude Code, your own app) is software wrapped around the model to make it useful for a specific task. The model decides; the product executes.

2. What exactly is a harness?

A harness is the software layer around a model that turns it from a text predictor into something that can complete multi-step tasks. It provides the agent loop, tool execution, context management, permission controls, and memory — everything the stateless model cannot do on its own.

3. Is the Claude Agent SDK the same as the Claude Code SDK?

Yes. The Claude Code SDK was renamed to the Claude Agent SDK to reflect that it is useful for building agents of any kind, not just coding agents. The package is now @anthropic-ai/claude-agent-sdk (TypeScript) and claude-agent-sdk (Python). Older “Claude Code SDK” references describe the same library.

4. Does the Claude Agent SDK just wrap the Messages API?

No. It is not a thin API wrapper , it is the same harness that powers Claude Code, running inside your process and exposed as a library. It includes the agent loop, built-in tools, context management, and permission controls. The Raw Messages API has none of those.

5. If I use the Raw Messages API with tools, isn’t that already an agent?

Not by itself. The API lets the model request a tool by returning a structured tool_use block, but it never executes the tool. You must run the tool, return the result, and loop. Building that loop and execution layer is what turns raw API access into an agent.

6. Why is the model described as “stateless”?

Each API request is independent; the model retains nothing between calls. Any “memory” like conversation history, prior results must be re-sent in each request. This is precisely why harnesses exist: to carry state forward across the steps of a task.

7. When should I use Claude Code instead of the Agent SDK?

Use Claude Code when a human is actively steering the work in a terminal or IDE like interactive debugging, refactoring, feature work with approval at each step. Use the Agent SDK when your application drives the agent programmatically, such as in a backend service or CI pipeline.

8. Can I switch between layers as my project grows?

Yes, and many teams do. A common path is to prototype interactively in Claude Code, move to the Agent SDK to automate and embed the workflow, and drop to the Raw API only for the specific pieces that need custom orchestration or that are simple enough to need no harness at all.

9. Which Claude model should my agent use?

It depends on the workload.

More capable models (such as Opus-tier models) are better suited for:
– Complex reasoning
– Long-running workflows
– Agent orchestration

Faster and lighter models (such as Haiku-tier models) are better suited for:
– High-volume workloads
– Latency-sensitive tasks
– Cost-sensitive applications

Because intelligence and harness are separate concerns, you can change the model without rewriting your harness. Simply update the model specified in your SDK configuration or API request.

Suggested External Authoritative References

Written by Dhanushri Devi Kannan, Senior at University of Illinois Chicago