How to Build a Multi-Agent Chatbot with Claude API (Step-by-Step)

What You'll Build

If you've searched for a real, working example of how to build a multi-agent chatbot with the Claude API, you're in the right place. Most tutorials hand you theory and pseudocode — this one gives you a fully functional Python system you can run today.

By the end of this tutorial, you'll have a multi-agent chatbot where a coordinator agent routes user requests to specialized sub-agents: one for research tasks, one for data analysis, and one for writing. Each agent has its own tools, role definition, and context — and they hand off work to each other through a central dispatch loop.

This is the same architectural pattern we use at Naples AI when building production chatbots for local businesses in Southwest Florida. It scales, it's maintainable, and it works.

📦 Full Source Code
The complete, GitHub-ready implementation is built step-by-step throughout this tutorial. Every snippet connects — by Step 4 you'll have one working file you can run end-to-end. Copy each section in order, or skip to the bottom of Step 4 for the assembled version.

Prerequisites

Python 3.10 or higher installed
An Anthropic API key (get one at console.anthropic.com)
anthropic Python SDK installed (pip install anthropic)
Basic familiarity with Python classes and functions
A terminal or IDE you're comfortable running scripts from

Step 1: Initialize Your Claude API Client and Define Agent Roles

The foundation of any multi-agent system is knowing what each agent is responsible for. I define this upfront with a simple Agent class that holds a role name, a system prompt, and a list of tools it's allowed to use.

Each agent gets its own identity. The coordinator doesn't write — it routes. The researcher doesn't analyze — it retrieves. Keeping these boundaries clear is what makes the whole system predictable.

agents.py

import os
import anthropic
from dataclasses import dataclass, field
from typing import Optional

# Initialize the Anthropic client once — all agents share it
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
MODEL = "claude-sonnet-4-6"

@dataclass
class Agent:
    """Represents a specialized agent with a fixed role and toolset."""
    name: str
    role: str                          # Short label used in logs and routing
    system_prompt: str                 # Defines the agent's personality and constraints
    tools: list = field(default_factory=list)
    conversation_history: list = field(default_factory=list)

    def chat(self, user_message: str) -> str:
        """Send a message to this agent and return its text response."""
        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })

        kwargs = {
            "model": MODEL,
            "max_tokens": 1024,
            "system": self.system_prompt,
            "messages": self.conversation_history,
        }

        # Only attach tools if this agent has any defined
        if self.tools:
            kwargs["tools"] = self.tools

        response = client.messages.create(**kwargs)

        # Extract the text content from the response
        assistant_text = ""
        for block in response.content:
            if hasattr(block, "text"):
                assistant_text += block.text

        self.conversation_history.append({
            "role": "assistant",
            "content": response.content
        })

        return assistant_text, response


# --- Define the three specialized agents ---

coordinator_agent = Agent(
    name="Coordinator",
    role="coordinator",
    system_prompt="""You are a coordinator agent. Your only job is to read the user's 
request and decide which specialist should handle it. Reply with EXACTLY one of these 
routing labels and nothing else:
  ROUTE:researcher  — for questions requiring information lookup or web searches
  ROUTE:analyst     — for data interpretation, calculations, or comparisons
  ROUTE:writer      — for drafting, editing, summarizing, or content creation
If the request is ambiguous, pick the most likely fit. Never answer the question yourself."""
)

researcher_agent = Agent(
    name="Researcher",
    role="researcher",
    system_prompt="""You are a research specialist. You retrieve and summarize factual 
information clearly and concisely. Always cite what you know and be upfront when 
something is outside your training data. Keep responses focused and scannable."""
)

analyst_agent = Agent(
    name="Analyst",
    role="analyst",
    system_prompt="""You are a data analyst. You interpret numbers, spot trends, run 
calculations, and explain findings in plain English. Show your reasoning step by step. 
Format any numerical results clearly."""
)

writer_agent = Agent(
    name="Writer",
    role="writer",
    system_prompt="""You are a professional writer and editor. You draft, rewrite, and 
polish content for clarity and impact. Match the tone the user asks for. If no tone is 
specified, default to clear and professional."""
)

Notice I'm using a dataclass here — it keeps the agent definition clean and makes it easy to add new agents later without rewriting your dispatch logic. The conversation_history list is what gives each agent memory within a session.

Step 2: Create Tool Definitions and Validation

Claude's tool-use feature lets your agents call functions in your own code. You define what the tool does, what parameters it accepts, and what it returns. Claude decides when to call it based on context.

For this tutorial, I'm giving the analyst agent a calculate tool and the researcher agent a search_knowledge_base tool. These are realistic stand-ins you'd replace with actual APIs in production.

tools.py

import json
import math
from typing import Any

# --- Tool Definitions (passed to Claude as schema) ---

CALCULATE_TOOL = {
    "name": "calculate",
    "description": "Evaluates a safe mathematical expression and returns the numeric result. Use for arithmetic, percentages, and basic statistics.",
    "input_schema": {
        "type": "object",
        "properties": {
            "expression": {
                "type": "string",
                "description": "A valid Python math expression, e.g. '(150 * 0.08) + 12.5'"
            }
        },
        "required": ["expression"]
    }
}

SEARCH_KNOWLEDGE_BASE_TOOL = {
    "name": "search_knowledge_base",
    "description": "Searches a local knowledge base for relevant documents or facts. Use when the user asks about stored information or company-specific data.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query string"
            },
            "max_results": {
                "type": "integer",
                "description": "Maximum number of results to return (default 3)",
                "default": 3
            }
        },
        "required": ["query"]
    }
}

# Attach tools to the agents that should use them
# (Import agents here after defining tools to avoid circular imports)


# --- Tool Execution Functions ---

def execute_calculate(expression: str) -> dict:
    """
    Safely evaluates a math expression using only allowed operations.
    Returns a result dict or an error message.
    """
    # Whitelist of safe names — never use raw eval() on user input
    allowed_names = {
        "abs": abs, "round": round, "min": min, "max": max,
        "sum": sum, "pow": pow, "sqrt": math.sqrt,
        "log": math.log, "pi": math.pi, "e": math.e
    }
    try:
        result = eval(expression, {"__builtins__": {}}, allowed_names)
        return {"result": result, "expression": expression}
    except Exception as ex:
        return {"error": str(ex), "expression": expression}


def execute_search_knowledge_base(query: str, max_results: int = 3) -> dict:
    """
    Simulates a knowledge base search. In production, replace this with
    a vector DB call (Pinecone, Weaviate, etc.) or a SQL lookup.
    """
    # Simulated document store — replace with your real retrieval logic
    mock_documents = [
        {"id": 1, "title": "Q4 Revenue Report", "snippet": "Total Q4 revenue reached $2.4M, up 18% YoY. Top performing segment: enterprise licenses."},
        {"id": 2, "title": "Product Roadmap 2026", "snippet": "Three major features planned: AI scheduling assistant, bulk export, and SSO integration."},
        {"id": 3, "title": "Customer Churn Analysis", "snippet": "Monthly churn rate stabilized at 2.1%. Primary exit reason: pricing (44% of churned accounts)."},
        {"id": 4, "title": "Support Ticket Trends", "snippet": "Average resolution time dropped to 4.2 hours. Top ticket category: onboarding questions (31%)."},
    ]

    # Simple keyword match — swap for semantic search in production
    query_lower = query.lower()
    matches = [
        doc for doc in mock_documents
        if any(word in doc["title"].lower() or word in doc["snippet"].lower()
               for word in query_lower.split())
    ]

    return {
        "query": query,
        "results": matches[:max_results],
        "total_found": len(matches)
    }


def dispatch_tool_call(tool_name: str, tool_input: dict) -> Any:
    """Routes a tool call from Claude to the correct Python function."""
    if tool_name == "calculate":
        return execute_calculate(tool_input["expression"])
    elif tool_name == "search_knowledge_base":
        return execute_search_knowledge_base(
            tool_input["query"],
            tool_input.get("max_results", 3)
        )
    else:
        return {"error": f"Unknown tool: {tool_name}"}

⚠️ Security Note
Never pass raw user input directly to Python's eval(). The execute_calculate function above uses an explicit allowlist of safe names. If you expand the tool, keep that whitelist tight.

Step 3: Build the Coordination Loop and Agent Communication

This is the core of the system. The coordination loop takes a user message, sends it to the coordinator agent, parses the routing decision, then hands the message to the right specialist. If that specialist needs to call a tool, we handle the tool loop here too.

The tool loop is the part most tutorials skip over. When Claude responds with a tool_use block, you have to execute the tool yourself and feed the result back before Claude gives you a final answer. I'll show you exactly how that works.

coordinator.py

import json
import anthropic
from agents import (
    client, MODEL,
    coordinator_agent, researcher_agent, analyst_agent, writer_agent,
    Agent
)
from tools import (
    CALCULATE_TOOL, SEARCH_KNOWLEDGE_BASE_TOOL, dispatch_tool_call
)

# Attach tools to the agents that need them
researcher_agent.tools = [SEARCH_KNOWLEDGE_BASE_TOOL]
analyst_agent.tools = [CALCULATE_TOOL]
# Writer agent has no tools — it just generates text

AGENT_REGISTRY = {
    "researcher": researcher_agent,
    "analyst": analyst_agent,
    "writer": writer_agent,
}


def parse_routing_decision(coordinator_response: str) -> str:
    """
    Extracts the agent key from the coordinator's routing label.
    Returns 'writer' as a safe fallback if parsing fails.
    """
    response_clean = coordinator_response.strip()
    for key in AGENT_REGISTRY:
        if f"ROUTE:{key}" in response_clean:
            return key
    print(f"[Coordinator] Could not parse routing from: '{response_clean}' — defaulting to writer")
    return "writer"


def run_agent_with_tools(agent: Agent, user_message: str) -> str:
    """
    Sends a message to a specialist agent and handles the full tool-use loop.
    Claude may call tools zero, one, or multiple times before giving a final answer.
    """
    # Add the user message to this agent's history
    agent.conversation_history.append({
        "role": "user",
        "content": user_message
    })

    while True:
        kwargs = {
            "model": MODEL,
            "max_tokens": 1024,
            "system": agent.system_prompt,
            "messages": agent.conversation_history,
        }
        if agent.tools:
            kwargs["tools"] = agent.tools

        response = client.messages.create(**kwargs)

        # Check the stop reason to decide what to do next
        if response.stop_reason == "tool_use":
            # Claude wants to call one or more tools
            tool_results = []

            for block in response.content:
                if block.type == "tool_use":
                    print(f"  [Tool Call] {block.name}({json.dumps(block.input)})")
                    result = dispatch_tool_call(block.name, block.input)
                    print(f"  [Tool Result] {json.dumps(result)}")

                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result)
                    })

            # Add Claude's tool-call message and our results to history
            agent.conversation_history.append({
                "role": "assistant",
                "content": response.content
            })
            agent.conversation_history.append({
                "role": "user",
                "content": tool_results
            })
            # Loop back — Claude will now generate a final answer using the tool results

        else:
            # stop_reason is "end_turn" — we have the final response
            final_text = ""
            for block in response.content:
                if hasattr(block, "text"):
                    final_text += block.text

            agent.conversation_history.append({
                "role": "assistant",
                "content": response.content
            })
            return final_text


def process_user_message(user_message: str) -> dict:
    """
    Main entry point. Routes a user message through the coordinator
    and returns the specialist's final response.
    """
    print(f"\n{'='*60}")
    print(f"User: {user_message}")
    print(f"{'='*60}")

    # Step 1: Ask the coordinator who should handle this
    routing_response, _ = coordinator_agent.chat(user_message)
    agent_key = parse_routing_decision(routing_response)
    selected_agent = AGENT_REGISTRY[agent_key]

    print(f"[Coordinator] Routing to: {selected_agent.name}")

    # Step 2: Send the message to the selected specialist
    specialist_response = run_agent_with_tools(selected_agent, user_message)

    print(f"\n[{selected_agent.name}] {specialist_response}")

    return {
        "routed_to": agent_key,
        "agent_name": selected_agent.name,
        "response": specialist_response
    }

Step 4: Implement Context Switching Between Specialized Agents

Context switching is what separates a toy demo from something you'd actually deploy. Each agent maintains its own conversation history, so a follow-up question to the researcher doesn't bleed into the analyst's context. But the user gets a seamless experience.

In the complete example below, I add a simple CLI loop so you can actually talk to the system. I also show how to reset individual agent histories when you want to start a fresh context without restarting the whole program.

main.py

import sys
from coordinator import process_user_message, AGENT_REGISTRY


def reset_agent_context(agent_key: str = None):
    """
    Clears conversation history for one agent or all agents.
    Useful for starting a new topic without restarting the session.
    """
    if agent_key and agent_key in AGENT_REGISTRY:
        AGENT_REGISTRY[agent_key].conversation_history = []
        print(f"[System] Cleared context for {AGENT_REGISTRY[agent_key].name}")
    else:
        for agent in AGENT_REGISTRY.values():
            agent.conversation_history = []
        print("[System] Cleared context for all agents")


def run_demo():
    """Runs a preset demo showing all three agent types in action."""
    demo_messages = [
        "What were the key findings from the Q4 revenue report?",
        "If our Q4 revenue was $2.4M and grew 18% YoY, what was Q3 of the previous year?",
        "Write a two-sentence executive summary of our Q4 performance for a board update.",
    ]

    print("\n🤖 Naples AI — Multi-Agent Chatbot Demo")
    print("=" * 60)

    for message in demo_messages:
        result = process_user_message(message)
        print(f"\n✅ Handled by: {result['agent_name']}\n")


def run_interactive():
    """Starts an interactive chat session with the multi-agent system."""
    print("\n🤖 Naples AI Multi-Agent Chatbot")
    print("Commands: 'reset' = clear all contexts | 'reset:researcher' = clear one agent | 'quit' = exit")
    print("-" * 60)

    while True:
        try:
            user_input = input("\nYou: ").strip()
        except (KeyboardInterrupt, EOFError):
            print("\nGoodbye!")
            break

        if not user_input:
            continue

        if user_input.lower() == "quit":
            print("Goodbye!")
            break

        if user_input.lower().startswith("reset"):
            parts = user_input.split(":")
            agent_key = parts[1].strip() if len(parts) > 1 else None
            reset_agent_context(agent_key)
            continue

        result = process_user_message(user_input)
        print(f"\n✅ Handled by: {result['agent_name']}")


if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--interactive":
        run_interactive()
    else:
        run_demo()

Run the demo with python main.py or start an interactive session with python main.py --interactive. Here's what the demo output looks like:

Sample Output

🤖 Naples AI — Multi-Agent Chatbot Demo
============================================================

============================================================
User: What were the key findings from the Q4 revenue report?
============================================================
[Coordinator] Routing to: Researcher
  [Tool Call] search_knowledge_base({"query": "Q4 revenue report key findings"})
  [Tool Result] {"query": "Q4 revenue report key findings", "results": [{"id": 1, "title": "Q4 Revenue Report", "snippet": "Total Q4 revenue reached $2.4M, up 18% YoY. Top performing segment: enterprise licenses."}], "total_found": 1}

[Researcher] Based on the Q4 Revenue Report in the knowledge base, here are the key findings:
- Total Q4 revenue: $2.4M
- Year-over-year growth: 18%
- Top performing segment: Enterprise licenses

✅ Handled by: Researcher

============================================================
User: If our Q4 revenue was $2.4M and grew 18% YoY, what was Q3 of the previous year?
============================================================
[Coordinator] Routing to: Analyst
  [Tool Call] calculate({"expression": "2400000 / 1.18"})
  [Tool Result] {"result": 2033898.3050847457, "expression": "2400000 / 1.18"}

[Analyst] Here's the breakdown:
- Q4 current year: $2,400,000
- Growth rate: 18%
- Formula: $2,400,000 ÷ 1.18 = $2,033,898

The Q4 figure from the prior year was approximately $2,033,898 (roughly $2.03M).

✅ Handled by: Analyst

============================================================
User: Write a two-sentence executive summary of our Q4 performance for a board update.
============================================================
[Coordinator] Routing to: Writer

[Writer] Q4 delivered strong results with total revenue reaching $2.4M, representing 18% 
year-over-year growth driven by our enterprise segment. Performance exceeded prior-year 
levels across key metrics, positioning the company well for continued momentum heading 
into the new fiscal year.

✅ Handled by: Writer

How It Works: Agent Architecture and Decision Flow Explained

Let me walk through the full flow in plain English, because understanding this makes it much easier to extend the system later.

When a user message comes in, the coordinator agent reads it and replies with a routing label — nothing else. That label maps to one of three specialists in the registry. The coordinator never touches tools and never answers questions directly. Its only job is intent classification.

The selected specialist then receives the original user message with its own system prompt and tool access. If Claude decides a tool would help, it returns a tool_use stop reason instead of text. We execute the tool locally, append the result to the conversation history, and send everything back to Claude. Claude then generates its final answer using the tool output as context. This loop repeats until Claude returns end_turn.

Each agent's conversation_history list is separate. So the analyst's tool calls and math results don