This article was written with the assistance of AI
Prelude: Unveiling the Secret of LLM Tool Calls
Modern AI tools like Cursor have shown us how LLMs can go beyond passive text generation. With tool calls, LLMs can actively interact with the real world: running command-line scripts, editing files, reading directories, or even sending messages. These interactions are triggered by the LLM itself, and the results are fed back to it. This cycle can repeat continuously, allowing the model to refine its understanding and generate follow-up tool calls based on the evolving context (a pattern known as ReAct).
Some LLM providers formalize this feature in their APIs. Here's a typical API call structure that demonstrates how a tool-enabled request might look:
{
  "model": "gpt-4-0613",
  "messages": [
    {
      "role": "user",
      "content": "What's the weather like in New York in Fahrenheit?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" },
            "units": { "type": "string", "enum": ["metric", "imperial"] }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
Here’s the catch: both the input to the LLM and its output are just plain text. So how does it manage to use external tools?
Under the hood, the model sees these tool definitions purely as plain text embedded in the prompt — it has no access to code, execution context, or real-time APIs. Everything the model knows about the tool is contained in the text prompt it receives.
Here’s what a model might actually see in its prompt:
System:
You are a helpful coding assistant.
Available tools:
- get_current_weather: Get the current weather in a given location.
  - location (string): The city to get weather for
  - units (optional, string, default = "metric"): The units for temperature. Must be one of: metric, imperial
- get_news_headlines: Fetch the latest news headlines for a topic or region.
  - topic (optional, string): The subject or keyword for filtering news
  - region (optional, string): The geographical area for the news
- search_files: Search for files matching a query string in a given directory.
  - query (string): The search keyword
  - path (optional, string): The directory to search in
To call a tool, respond with a JSON object in the following format, and wrap it in a special tag so the host system can recognize it's a tool call:
<tool_call>
{
  "name": "<tool_name>",
  "arguments": {
    "<param1>": <value1>,
    "<param2>": <value2>
  }
}
</tool_call>
User:
What's the weather like in New York in Fahrenheit?
From this plain-text input, the model is expected to reason over the query, recognize that it lacks the necessary information, and decide to invoke a tool. It will then generate a structured tool call like:
<tool_call>
{
  "name": "get_current_weather",
  "arguments": {
    "location": "New York",
    "units": "imperial"
  }
}
</tool_call>
The host application monitors model responses and looks for any text wrapped in <tool_call> tags. When it finds one, it parses the JSON, executes the corresponding function, and feeds the result back to the model as a new message.
The key insight: LLMs don’t truly “use” tools. They read and write plain text. Tools work because we teach the model to describe what it wants to do, and the host bridges the gap.
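To make this concrete, here is a rough sketch of the host-side plumbing in Go (the <tool_call> wrapper format follows the prompt above; the struct and function names are made up for illustration):

package main

import (
    "encoding/json"
    "fmt"
    "regexp"
)

// ToolCall mirrors the JSON the model is instructed to emit.
type ToolCall struct {
    Name      string                 `json:"name"`
    Arguments map[string]interface{} `json:"arguments"`
}

// toolCallPattern matches any text wrapped in <tool_call>...</tool_call> tags.
var toolCallPattern = regexp.MustCompile(`(?s)<tool_call>\s*(\{.*?\})\s*</tool_call>`)

// extractToolCalls scans a model response and parses every tool call it contains.
func extractToolCalls(response string) ([]ToolCall, error) {
    var calls []ToolCall
    for _, m := range toolCallPattern.FindAllStringSubmatch(response, -1) {
        var call ToolCall
        if err := json.Unmarshal([]byte(m[1]), &call); err != nil {
            return nil, fmt.Errorf("invalid tool call JSON: %w", err)
        }
        calls = append(calls, call)
    }
    return calls, nil
}

func main() {
    response := `<tool_call>{"name":"get_current_weather","arguments":{"location":"New York","units":"imperial"}}</tool_call>`
    calls, _ := extractToolCalls(response)
    for _, c := range calls {
        // A real host would dispatch to the registered function here
        // and append the result to the conversation as a new message.
        fmt.Printf("would call %s with %v\n", c.Name, c.Arguments)
    }
}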
It’s powerful. But in most implementations, this system is rigid: tools are manually registered, context is hardcoded, there’s no way for a model to dynamically discover new tools, and there’s no unified structure for sharing data or prompts.
That’s where MCP comes in.
Act I: MCP is Easy
MCP (Model Context Protocol) defines a minimal but powerful architecture that standardizes how language models interact with tools and data in a dynamic, modular way.
Architecture
At a high level, MCP follows a client-server model. The MCP client (typically an LLM runtime like Claude or Cursor, or more precisely a component within them) connects to an MCP server, which exposes available tools and resources.
This diagram from the official documentation clearly illustrates the relationship between components:
Note that the MCP server acts as the plugin side, implementing the actual actions and providing capabilities. The MCP client is integrated into the user-facing application (e.g., Cursor) and is responsible for invoking those capabilities.
How does the data flow?
The client asks: “What tools do you have?” (via tools/list), and the server responds with a list. The model then calls a tool (via tools/call), passing JSON arguments, and receives the output.
The core APIs are:
tools/list: tells the client what tools exist and what arguments they accept (a structured JSON schema, very similar to OpenAI function calling). For example:
{
  "jsonrpc": "2.0",
  "method": "tools/list",
  "id": "1"
}
Returns:
{
  "jsonrpc": "2.0",
  "id": "1",
  "result": {
    "tools": [
      {
        "name": "get_weather",
        "description": "Get the weather for a city",
        "inputSchema": {
          "type": "object",
          "properties": {
            "city": { "type": "string" }
          },
          "required": ["city"]
        }
      },
      {
        "name": "hello",
        "description": "Return a simple greeting message"
      }
    ]
  }
}
tools/call: executes a tool with arguments and returns a result. For example:
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "id": "2",
  "params": {
    "name": "get_weather",
    "arguments": {
      "city": "Tokyo"
    }
  }
}
Returns:
{
  "jsonrpc": "2.0",
  "id": "2",
  "result": {
    "content": [
      { "type": "text", "text": "Sunny, 26°C" }
    ]
  }
}
You may have already noticed that MCP uses JSON-RPC. If you don’t know what JSON-RPC is, it’s simply a specification that defines how to structure JSON-based remote procedure calls. It standardizes the shape of requests and responses, including fields like jsonrpc, method, params, id, and result. There’s no complicated framework here: it’s just a lightweight envelope that makes APIs consistent and interoperable.
Implementing Server & Client with Golang
Official SDKs are available in TypeScript and Python. For Golang developers, there’s a third-party library, mcp-go, which implements the MCP protocol and allows you to create a server with just a few lines of code:
// import (
//     "github.com/mark3labs/mcp-go/mcp"
//     "github.com/mark3labs/mcp-go/server"
// )

// Create MCP server
s := server.NewMCPServer("Demo", "1.0.0")

// Add tool
tool := mcp.NewTool("hello_world",
    mcp.WithDescription("Say hello to someone"),
    mcp.WithString("name",
        mcp.Required(),
        mcp.Description("Name of the person to greet"),
    ),
)

// Add tool handler
s.AddTool(tool, helloHandler)

// Start the stdio server
if err := server.ServeStdio(s); err != nil {
    fmt.Printf("Server error: %v\n", err)
}
This is not far from writing a typical HTTP server.
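The helloHandler referenced above is an ordinary function you provide yourself. A minimal sketch, assuming the handler signature used in mcp-go’s examples (how arguments are accessed may differ between library versions):

// helloHandler receives the parsed tool call and returns a text result.
// (Requires "context" and "fmt" in addition to the mcp package.)
func helloHandler(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {
    // Assumption: arguments arrive as a generic map keyed by parameter name.
    name, ok := request.Params.Arguments["name"].(string)
    if !ok {
        return mcp.NewToolResultError("name must be a string"), nil
    }
    return mcp.NewToolResultText(fmt.Sprintf("Hello, %s!", name)), nil
}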
Calling the tool from a client (simplified pseudocode; the real mcp-go client API is slightly more verbose, but the idea is the same):
client := mcp.NewClient()
result, err := client.CallTool("get_weather", mcp.Args{"city": "Tokyo"})
For LLM agent developers, you simply wrap the client as a native tool, and all the dots are connected.
Resources
Besides tools (which are actions), MCP also has the concept of resources. Examples include:
- resources/list: show available files, logs, datasets
- resources/read: fetch content of a specific resource
This structure allows the model to discover what contextual data is available, and then read the specific content it needs to reason or generate responses.
From a technical perspective, resources are essentially read-only tools.
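For instance, reading a resource might look like this (the URI is hypothetical; the result shape follows the spec’s contents array):

{
  "jsonrpc": "2.0",
  "method": "resources/read",
  "id": "3",
  "params": { "uri": "file:///project/README.md" }
}

Returns:

{
  "jsonrpc": "2.0",
  "id": "3",
  "result": {
    "contents": [
      {
        "uri": "file:///project/README.md",
        "mimeType": "text/markdown",
        "text": "# My Project ..."
      }
    ]
  }
}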
How to Use MCP in Practice
To use MCP in the real world, end users need an MCP host — typically a chat-based LLM client like Claude Desktop, ChatWise or Cursor. Here’s how to get started with MCP in Cursor:
To configure Cursor to work with an MCP server, add the following to your MCP config file (typically located at ~/.cursor/mcp.json):
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
Here, playwright is an arbitrary name for the MCP server; you can choose any identifier. The command and args fields specify how the MCP server is launched. In this case, npx @playwright/mcp@latest will be executed as a subprocess by Cursor and managed throughout the session.
Make sure that Cursor’s agent mode is enabled. Once it’s running, you can simply type natural language commands like:
- “open website apple.com”
- “click the login button”
The model will automatically determine which tools or resources to call via MCP, based on the tool metadata provided by the server. No special syntax, no manual invocation — just type and go.
Wrapping Up the Basics
MCP is just a simple protocol that defines how a language model can discover, read, and interact with external tools and resources. No more magic.
Act II: MCP is Not That Easy
The core idea of MCP is straightforward. But when you start developing an MCP server or client, you’ll discover that there are more parts to care about.
Not Just Request-Response
When a session is established, regardless of the actual transport layer, MCP follows a simple handshake pattern:
- The client sends an initialize request to negotiate the supported protocol version and advertise its capabilities.
- After the initialize exchange, the client and server communicate using request/response messages.
- Notifications can be sent in either direction at any time after initialization. They are one-way messages that do not expect a response.
Request, Response, Notification: these concepts might look familiar, because they follow the exact structure of JSON-RPC. In fact, it is JSON-RPC. When you use MCP, you’re really just using JSON-RPC with a specific schema tailored for LLM interaction.
Notification Types
Here are all the notification types currently defined in the protocol.
- notifications/tools/list_changed: Sent by the server to inform the client that the list of available tools has changed (e.g. a new tool was added, or a tool was removed). Similar notifications are available for resources and prompts (notifications/resources/list_changed, notifications/prompts/list_changed).
- notifications/resources/updated: Indicates that the content of a specific resource has been updated. This is often used when the client has subscribed to changes.
- notifications/progress: Used for long-running operations, allowing the server to send progress updates (e.g. indexing large datasets).
- notifications/message: Servers send structured log or debug messages to the client.
Each notification name corresponds to the method field in a JSON-RPC message. The list above is manually gathered from the official specifications.
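In other words, a notification is an ordinary JSON-RPC message that carries no id, since no response is expected:

{
  "jsonrpc": "2.0",
  "method": "notifications/tools/list_changed"
}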
However, real-world clients have not yet fully implemented the spec.
From my testing on Cursor, message and progress are not yet well supported. message does not appear in either the chat interface or the console. progress does show up in the console, but it requires a progressToken field in the notification parameters, a token that should be provided by the client when sending the request. However, Cursor does not supply this token, making progress notifications effectively non-functional.
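For completeness, a progress notification would look roughly like this; note the progressToken, which is supposed to echo a token the client attached to its original request:

{
  "jsonrpc": "2.0",
  "method": "notifications/progress",
  "params": {
    "progressToken": "abc-123",
    "progress": 40,
    "total": 100
  }
}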
The Transport Layer
There are two officially supported transport mechanisms today, plus one that has already been deprecated:
- Stdio (Standard I/O): Typically used for local servers. The client spawns the server process and communicates via standard input and output streams. This is simple and fast, ideal for local integrations.
- HTTP + Server-Sent Events (SSE) (2024-11-05): Common for remote or hosted servers. The client sends requests via HTTP POST and receives responses and notifications over a persistent SSE stream. This provides a lightweight way to maintain a long-lived connection over HTTP.
- Streamable HTTP (2025-03-26): A newer transport for remote connections, replacing HTTP+SSE.
Stdio was the first transport introduced by MCP and remains the most widely used today. It’s a practical choice for open-source servers, as it requires no additional deployment effort from developers. It behaves much like a plugin installed into the host application.
At the protocol level, listening for notifications is available to both client and server. But the HTTP-based transports limit this capability to one direction: only the client listens to the server. This limitation is intentional, keeping the transport layer simple for remote servers.
Although HTTP+SSE has already been deprecated — replaced just five months after its introduction — it’s worth mentioning to illustrate how rough the early protocol was. It worked by establishing a long-lived, one-way (downward) SSE connection for receiving messages, while simulating upward message sending via a separate HTTP POST endpoint. This setup forced servers to maintain high-availability persistent connections, yet still only supported one-way communication — and reduced what could have been a simple, stateless request-response interaction to a rigid and stateful implementation. Its limitations were obvious, and its brief lifespan reflects the immaturity of the MCP protocol at that stage.
(SSE is designed to allow the server to push a continuous stream of events to the client. It is an HTTP connection that is kept open, with each message sent as a line prefixed with data:.)
The Streamable HTTP method is fundamentally just HTTP. To put it simply:
- There is only one endpoint: /message.
- In simple stateless request-response scenarios, the client sends a JSON-RPC request in the body and receives the response directly in the body.
- For more advanced scenarios, the server can upgrade the response to an SSE stream, enabling multiple notifications plus the final response over the same connection.
- The initialize message is still required (as it is part of the protocol layer, not the transport); it should be sent independently before any other request.
For more details, see the RFC and Streamable HTTP spec.
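As a rough sketch of the wire format (using the /message endpoint from the list above; the Accept header requirement comes from the 2025-03-26 spec):

POST /message HTTP/1.1
Content-Type: application/json
Accept: application/json, text/event-stream

{"jsonrpc":"2.0","id":"2","method":"tools/call","params":{"name":"get_weather","arguments":{"city":"Tokyo"}}}

For a simple call, the server replies with Content-Type: application/json and the JSON-RPC response in the body; in the advanced case, it replies with Content-Type: text/event-stream and delivers notifications plus the final response as data: events on the same connection.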
Additionally, the spec allows for custom transport implementations.
Multi-Agent
When the number of tools grows too large, they can’t all be presented to the model at once due to context window limitations. A common solution is to introduce multiple specialized agents, each responsible for a specific domain or topic, and each equipped with only the relevant subset of tools.
This setup reduces tool overload for each agent and makes the model’s decision-making more focused and efficient — for instance, one agent handles database queries, another manages file operations, and a third is responsible for deployment actions. These agents can then be orchestrated by a central planner or router model.
Another interesting discussion around multi-agent setups is using MCP as a bridge for communication between agents. Although MCP wasn’t originally designed for inter-agent messaging, it’s conceptually natural to wrap an agent as a tool and allow another agent to delegate tasks via MCP tool calls. In simple cases, this can work. But for more complex workflows, there’s a critical limitation.
LLMs are still slow today, so practical systems often rely on streaming to show intermediate progress and improve the user experience. In typical web-based LLM UIs, SSE is widely used to stream output incrementally. Note that this is not the same application-level use of SSE as in the MCP HTTP+SSE transport: here, each user prompt is an HTTP request, and the server responds via Server-Sent Events, closing the connection when finished.
Now consider a scenario where an agent delegates a long-running task to another agent via MCP. Ideally, the user should see intermediate updates streamed back as the task progresses. From the name “Streamable HTTP”, we know MCP supports streams. But what is actually streamed? Remember, MCP is built on JSON-RPC, which is not inherently stream-friendly: each request expects a single, atomic response. As a result, MCP returns just one final, non-streamed JSON response, which cannot convey intermediate progress. Streaming could be simulated with multiple Notifications, but the protocol doesn’t define this, and public clients don’t support it. The only standard streaming-like notification available, notifications/progress, can only report a numeric progress percentage, which is far too limited to support meaningful interactive feedback, ending up feeling more like a toy.
Finale: Why Has MCP Drawn So Much Attention?
By connecting LLMs to countless pluggable external tools and data sources, numerous powerful use cases have been created, with results that often surpass our expectations. For example, controlling Blender to create 3D models step by step automatically, or Manus, a phenomenon-level experience that showcases the power of such integrations, even though it doesn’t use MCP internally.
Previously, we treated the LLM as a monolithic system meant to solve everything, but we quickly ran into its limits. Now, with MCP, the LLM acts as the brain: it delegates, using tools to interact with the world and solve more complex tasks. It’s not about sophisticated technical design; rather, the protocol’s value lies in the idea, backed by a real implementation, and it will obviously continue to evolve.