A2A (Agent2Agent) protocol explained

A2A protocol: Opening a new chapter of interoperability between AI agents.
Core content:
1. Basic concepts of A2A protocol and its design principles
2. Three core participants in A2A protocol and their roles
3. The role of AgentCard and agent discovery mechanism
What is the A2A Protocol
The A2A (Agent2Agent) protocol is an open protocol launched by Google Cloud to promote interoperability between different AI agents. Its main goal is to allow these agents to communicate and collaborate effectively in a dynamic, multi-agent ecosystem, regardless of whether they are built by different vendors or use different technical frameworks.
A2A Design Principles Summary
The A2A (Agent2Agent) protocol is designed to enhance collaboration between agents while ensuring flexibility, security, and compatibility with existing systems. The principles are summarized below:
1. Embrace agentic capabilities
• Agents collaborate in their natural, unstructured modalities, even without sharing memory, tools, or context, enabling realistic multi-agent scenarios.
2. Build on existing standards
• The protocol is built on widely accepted technical standards such as HTTP, SSE, and JSON-RPC, making it easy to integrate with an enterprise's existing IT stack.
3. Secure by default
• Designed to support enterprise-level authentication and authorization, ensuring that only authorized users and systems can access the agent, enhancing system security.
4. Support long-running tasks
• Flexibly supports a variety of scenarios from quick tasks to complex research, and can provide real-time feedback, notifications and status updates during task execution.
5. Modality agnostic
• Supports many forms of interaction, including text, audio and video streams, forms, iframes, and more, which makes agents more interactive and adaptable.
Overall, the protocol puts real thought into openness, security, and flexibility, all points where MCP falls short. We will leave the comparison with MCP for the end. First, let's get into the details of A2A.
A2A Participants
There are three participants in the A2A protocol:
• User: the end user (a human or a service) who uses the agentic system to accomplish tasks.
• Client: the entity that requests an action from an opaque agent (a service, agent, or application) on behalf of the user.
• Server: an opaque (black-box) remote agent, i.e. an A2A server.
Refer to the following figure
From the diagram we can clearly see where the three participants sit. Compared with the MCP participants, the Host is missing. This reflects a difference in design philosophy: whether to leave the implementation open or to standardize a mechanism. In A2A, security and similar concerns are handled in other ways, but how a User finds the Agent it needs remains an open question.
A2A Core Concepts
AgentCard
AgentCard is a JSON file that describes what capabilities an Agent provides. The official recommendation is to host it at https://{base url}/.well-known/agent.json.
This way, a client can fetch the AgentCard with a plain HTTP GET and obtain a description of the Agent.
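As a quick illustration (a minimal sketch, not part of the spec: it assumes a standard fetch call, the well-known path above, and the AgentCard interface shown later in this section), discovery can be as simple as:
// Minimal sketch: fetch an AgentCard from its well-known path.
async function fetchAgentCard(baseUrl: string): Promise<AgentCard> {
  const res = await fetch(new URL("/.well-known/agent.json", baseUrl));
  if (!res.ok) {
    throw new Error(`Failed to fetch agent card: ${res.status}`);
  }
  return (await res.json()) as AgentCard;
}
// Usage (hypothetical agent URL):
// const card = await fetchAgentCard("https://example.com");
// console.log(card.name, card.skills.map((s) => s.name));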
A natural extension is that we need a registry, public or private, so that Agents can be discovered easily.
On the other hand, the registry could also be decentralized. Imagine a scenario where every website exposes https://{base url}/.well-known/agent.json describing what it can do and continuously broadcasts its AgentCard over a P2P network; these AgentCards could even live on IPFS or Ethereum. The collaborative relationships between Agents would then form a self-organizing Agent network.
Back to A2A, the definition of an AgentCard is as follows:
// An AgentCard conveys key information:
// - Overall details (version, name, description, uses)
// - Skills: a set of capabilities the agent can perform
// - Default modalities/content types supported by the agent
// - Authentication requirements
interface AgentCard {
  // Human-readable name of the agent (e.g. "Recipe Agent").
  name: string;
  // A human-readable description of the agent. Used to assist users and
  // other agents in understanding what the agent can do.
  // (e.g. "Agent that helps users with recipes and cooking.")
  description: string;
  // A URL to the address the agent is hosted at.
  url: string;
  // The service provider of the agent.
  provider?: {
    organization: string;
    url: string;
  };
  // The version of the agent - format is up to the provider (e.g. "1.0.0").
  version: string;
  // A URL to documentation for the agent.
  documentationUrl?: string;
  // Optional capabilities supported by the agent.
  capabilities: {
    streaming?: boolean; // true if the agent supports SSE
    pushNotifications?: boolean; // true if the agent can push updates to the client
    stateTransitionHistory?: boolean; // true if the agent exposes status change history for tasks
  };
  // Authentication requirements for the agent.
  // Intended to match the OpenAPI authentication structure.
  authentication: {
    schemes: string[]; // e.g. Basic, Bearer
    credentials?: string; // credentials a client should use for private cards
  };
  // The set of interaction modes that the agent supports across all skills.
  // This can be overridden per skill.
  defaultInputModes: string[]; // supported MIME types for input
  defaultOutputModes: string[]; // supported MIME types for output
  // Skills are a unit of capability that an agent can perform.
  skills: {
    id: string; // unique identifier for the agent's skill
    name: string; // human-readable name of the skill
    // Description of the skill - used by the client or a human
    // as a hint to understand what the skill does.
    description: string;
    // Set of tag words describing classes of capabilities for this specific
    // skill (e.g. "cooking", "customer support", "billing").
    tags: string[];
    // The set of example scenarios that the skill can perform.
    // Used by the client as a hint to understand how the skill can be used.
    // (e.g. "I need a recipe for bread")
    examples?: string[]; // example prompts for tasks
    // The set of interaction modes that the skill supports
    // (if different from the default).
    inputModes?: string[]; // supported MIME types for input
    outputModes?: string[]; // supported MIME types for output
  }[];
}
The content is long, but relatively simple. We use the following figure to represent it:
The complete definition can be found here: https://github.com/sing1ee/a2a-agent-coder/blob/main/src/schema.ts
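To make this concrete, here is a hypothetical AgentCard instance for a coding agent (the field values are illustrative, loosely modeled on the Coder Agent used in the hands-on section below, not taken from any official example):
// Hypothetical AgentCard instance (illustrative values only).
const coderCard: AgentCard = {
  name: "Coder Agent",
  description: "Generates code from natural language instructions and streams file outputs.",
  url: "http://localhost:41241",
  version: "0.0.1",
  capabilities: { streaming: true, pushNotifications: false },
  authentication: { schemes: ["Bearer"] },
  defaultInputModes: ["text/plain"],
  defaultOutputModes: ["text/plain", "application/json"],
  skills: [
    {
      id: "generate-code",
      name: "Generate code",
      description: "Writes source files based on a natural language request.",
      tags: ["coding"],
      examples: ["implement binary search"],
    },
  ],
};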
Task
A task is a stateful entity that allows a client to collaborate with a remote agent to achieve a specific result and produce the corresponding output. Within a task, messages are exchanged between the client and the remote agent, and the remote agent generates artifacts as results.
Tasks are always created by the client, and their state is determined by the remote agent. If the client needs it, multiple tasks can belong to the same session, indicated by an optional sessionId that the client sets when creating a task.
After receiving a request, the agent can:
• Fulfill the request immediately
• Schedule work to be performed later
• Reject the request
• Negotiate a different form of delivery
• Ask the client for more information
• Delegate to other agents or systems
Even after completing a goal, the client can still request more information or make changes within the context of the same task. For example, a client can ask: “Draw a picture of a bunny”, the agent responds: “<picture>”, and the client can then ask: “Draw it red”.
Tasks are not only used to deliver artifacts (results) and messages (thoughts, instructions, etc.), but also maintain the state of the task and its optional history, including state changes and message records.
These features matter, especially keeping context within the same task, which enables multi-turn conversations. Because state and history are preserved, the model fits well with today's chat-centric AI interactions.
The task is defined as follows:
interface Task {
  id: string; // unique identifier for the task
  sessionId: string; // client-generated id for the session holding the task
  status: TaskStatus; // current status of the task
  history?: Message[];
  artifacts?: Artifact[]; // collection of artifacts created by the agent
  metadata?: Record<string, any>; // extension metadata
}
// TaskState and accompanying message.
interface TaskStatus {
  state: TaskState;
  message?: Message; // additional status update for the client
  timestamp?: string; // ISO datetime value
}
// Sent by the server during sendSubscribe or subscribe requests.
interface TaskStatusUpdateEvent {
  id: string;
  status: TaskStatus;
  final: boolean; // indicates the end of the event stream
  metadata?: Record<string, any>;
}
// Sent by the server during sendSubscribe or subscribe requests.
interface TaskArtifactUpdateEvent {
  id: string;
  artifact: Artifact;
  metadata?: Record<string, any>;
}
// Sent by the client to the agent to create, continue, or restart a task.
interface TaskSendParams {
  id: string;
  sessionId?: string; // the server creates a new sessionId for new tasks if not set
  message: Message;
  historyLength?: number; // number of recent messages to be retrieved
  // Where the server should send notifications when disconnected.
  pushNotification?: PushNotificationConfig;
  metadata?: Record<string, any>; // extension metadata
}
type TaskState =
  | "submitted"
  | "working"
  | "input-required"
  | "completed"
  | "canceled"
  | "failed"
  | "unknown";
Artifact
An artifact is an output generated by an agent as the end result of a task. Artifacts are immutable, can be named, and can contain multiple parts. New parts can be appended to existing artifacts through streaming responses.
A task can generate multiple artifacts. For example, when executing "Create a Web Page", separate HTML artifacts and image artifacts may be generated.
It has to be said that A2A arrived at just the right moment: several of today's major AI application patterns are built into the protocol definition, and artifacts are a very popular one.
Specific definition:
interface Artifact {
  name?: string;
  description?: string;
  parts: Part[];
  metadata?: Record<string, any>;
  index: number;
  append?: boolean;
  lastChunk?: boolean;
}
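For example, a streamed artifact might arrive in two chunks: index identifies which artifact the chunk belongs to, append tells the client to extend it rather than replace it, and lastChunk marks the end. A hypothetical pair of chunks (illustrative values; Part is defined in the next section):
// Hypothetical streamed artifact chunks (illustrative).
const chunk1: Artifact = {
  name: "index.html",
  parts: [{ type: "text", text: "<html><body>Hello", metadata: {} }],
  index: 0, // first artifact produced by this task
  lastChunk: false, // more content will follow for this artifact
};
const chunk2: Artifact = {
  name: "index.html",
  parts: [{ type: "text", text: "</body></html>", metadata: {} }],
  index: 0,
  append: true, // append to the artifact at index 0
  lastChunk: true, // this artifact is now complete
};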
Message
A message is an entity that contains any non-artifact content. This content can include agent thoughts, user context, instructions, error messages, status updates, or metadata.
All content from the client is sent in the form of messages. The agent communicates status or provides instructions through messages, and the generated results are sent in the form of artifacts.
Messages can contain multiple Parts to represent different types of content. For example, a user request may include a textual description plus multiple files as context.
The definition is as follows:
interface Message {
  role: "user" | "agent";
  parts: Part[];
  metadata?: Record<string, any>;
}
Part
A Part is a fully formed piece of content exchanged between a client and a remote agent as part of a message or artifact. Each Part has its own content type and metadata.
Following are the interface definitions for different types of parts:
TextPart
interface TextPart {
  type: "text";
  text: string;
}
FilePart
interface FilePart {
  type: "file";
  file: {
    name?: string;
    mimeType?: string;
    // Possible content:
    // oneof {
    bytes?: string; // base64-encoded content
    uri?: string;
    // }
  };
}
DataPart
interface DataPart {
  type: "data";
  data: Record<string, any>;
}
Combined Part type
type Part = (TextPart | FilePart | DataPart) & {
  metadata: Record<string, any>;
};
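For example, a user message that combines a text instruction with an attached file could look like this (illustrative values only):
// Hypothetical user message with a text part and a file part.
const userMessage: Message = {
  role: "user",
  parts: [
    { type: "text", text: "Summarize this report.", metadata: {} },
    {
      type: "file",
      file: {
        name: "report.pdf",
        mimeType: "application/pdf",
        uri: "https://example.com/report.pdf",
      },
      metadata: {},
    },
  ],
};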
For more message details, refer to the link: https://a2aprotocol.ai/blog/a2a-sample-methods-and-json-responses
Communication mechanisms and asynchronous support
A2A supports the following communication mechanisms:
• A2A supports a secure push notification mechanism that allows the agent to send updates to the client even when the client is not connected.
• Clients and servers can use the standard request/response pattern or stream updates via SSE.
When pushing notifications, the agent needs to verify the identity of the notification service and use trusted credentials for authentication to ensure the security of the notification.
Based on the above communication mechanism, A2A supports client polling when processing long-running tasks, and the agent can also push status updates to the client through SSE.
The most important thing here is asynchronous support. The client can register a webhook to receive the results of long-running tasks asynchronously; this is what PushNotification implements. Anyone using LLM APIs today runs into the same problem: output is slow, and you cannot do anything else while it streams. With asynchronous callbacks, polling, and resubscription, client development becomes much more flexible and can offer users a better experience.
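A minimal polling loop might look like the sketch below. It assumes the agent exposes a JSON-RPC endpoint at its url and a tasks/get method that returns the Task in result; adjust to whatever the server you run actually provides.
// Minimal sketch: poll a task until it reaches a terminal state.
async function waitForTask(agentUrl: string, taskId: string): Promise<Task> {
  while (true) {
    const res = await fetch(agentUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tasks/get", params: { id: taskId } }),
    });
    const { result } = (await res.json()) as { result: Task };
    if (["completed", "failed", "canceled"].includes(result.status.state)) {
      return result; // terminal state reached
    }
    await new Promise((resolve) => setTimeout(resolve, 2000)); // wait before polling again
  }
}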
Here is the definition of push:
interface PushNotificationConfig {
  url: string;
  token?: string; // token unique to this task/session
  authentication?: {
    schemes: string[];
    credentials?: string;
  };
}
interface TaskPushNotificationConfig {
  id: string; // task id
  pushNotificationConfig: PushNotificationConfig;
}
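On the client side, a push-notification receiver is just a small HTTP endpoint. Here is a rough sketch using Bun (assumptions: the agent POSTs the updated Task as JSON to the registered url and sends the token in a Bearer Authorization header; neither detail is dictated by the config above, so verify against the server you actually run):
// Hypothetical webhook receiver for A2A push notifications (Bun).
const EXPECTED_TOKEN = process.env.A2A_PUSH_TOKEN ?? "";
Bun.serve({
  port: 4000,
  async fetch(req) {
    if (req.method !== "POST") {
      return new Response("Method not allowed", { status: 405 });
    }
    // Reject notifications that do not carry the token we registered.
    const token = req.headers.get("authorization")?.replace("Bearer ", "");
    if (EXPECTED_TOKEN && token !== EXPECTED_TOKEN) {
      return new Response("Unauthorized", { status: 401 });
    }
    const task = (await req.json()) as Task;
    console.log(`Task ${task.id} is now ${task.status.state}`);
    return new Response("ok");
  },
});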
Error Handling
Error message format
When the server hits an error while processing a client request, it responds with an ErrorMessage in the following format:
interface ErrorMessage {
  code: number;
  message: string;
  data?: any;
}
Standard JSON-RPC error codes
The following are the standard JSON-RPC error codes that the server may respond with in error scenarios:
Error Code | Message | Description |
-32700 | JSON parse error | Invalid JSON was sent |
-32600 | Invalid Request | Request payload validation error |
-32601 | Method not found | The method does not exist |
-32602 | Invalid params | Invalid method parameters |
-32603 | Internal error | Internal JSON-RPC error |
-32000 to -32099 | Server error | Reserved for implementation-specific error codes |
-32001 | Task not found | The task with the provided id could not be found |
-32002 | Task cannot be canceled | The task cannot be canceled by the remote agent |
-32003 | Push notifications not supported | Push notifications are not supported by the agent |
-32004 | Unsupported operation | The operation is not supported |
-32005 | Incompatible content types | Incompatible content types between client and agent |
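For instance, asking about an unknown task id would come back as a standard JSON-RPC error object like this (illustrative):
// Hypothetical JSON-RPC error response for an unknown task id.
const taskNotFoundResponse = {
  jsonrpc: "2.0",
  id: 1,
  error: {
    code: -32001,
    message: "Task not found",
    data: { taskId: "b7c2a3d4-0000-4000-8000-000000000000" },
  },
};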
Hands-on experience
I modified the official ts example to support OpenRouter, mainly changing the API format to be compatible with OpenAI. The code is here: https://github.com/sing1ee/a2a-agent-coder
I'm doing this on a Mac; open your favorite terminal:
1. Install Bun
brew install oven-sh/bun/bun # For macOS and Linux
2. Clone the repository
git clone git@github.com:sing1ee/a2a-agent-coder.git
3. Install dependencies
cd a2a-agent-coder
bun i
4. Configure environment variables
Refer to .env.example and create a .env file with the following content:
OPENAI_API_KEY=sk-or-v1-xxxxxxx
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_MODEL=anthropic/claude-3.5-haiku
I use OpenRouter because payment is convenient and it offers many models. If you want to try it, register an OpenRouter account; even without topping up you can make 50 free requests per day to free models such as deepseek/deepseek-chat-v3-0324:free.
To make the environment variables take effect:
export $(cat .env | xargs)
5. Run A2A Server
bun run agents:coder
6. Open a new terminal and run A2A Client. You don’t need to configure env here.
bun run a2a:cli
Here are the results of my previous runs:
bun run a2a:cli
# result
$ bun x tsx src/cli.ts
A2A Terminal Client
Agent URL: http://localhost:41241
Attempting to fetch agent card from: http://localhost:41241/.well-known/agent.json
✓ Agent Card Found:
Name: Coder Agent
Description: An agent that generates code based on natural language instructions and streams file outputs.
Version: 0.0.1
Starting Task ID: a1a608b3-3015-4404-a83f-6ccc05083761
Enter messages, or use '/new' to start a new task.
Coder Agent > You: implement binary search
Sending...
Coder Agent [4:28:00 PM]: ⏳ Status: working
Part 1: ? Text: Generating code...
Coder Agent [4:28:02 PM]: ⏳ Status: working
Part 1: ? File: Name: src/algorithms/binary_search.py, Source: """
Implementation of the binary search algorithm in Python.
"""
def binary_search(arr, target):
    """
    Performs a binary search on a sorted array to find the index of a target value.
    Args:
        arr (list): A sorted list of elements.
        target: The value to search for in the array.
    Returns:
        int: The index of the target value if found, otherwise -1.
    """
    low = 0
    high = len(arr) - 1
    while low <= high:
        mid = (low + high) // 2  # Integer division to find the middle index
        if arr[mid] == target:
            return mid  # Target found at index mid
        elif arr[mid] < target:
            low = mid + 1  # Target is in the right half
        else:
            high = mid - 1  # Target is in the left half
    return -1  # Target not found in the array
Coder Agent [4:28:02 PM]: ✅ Status: completed
SSE stream finished for method tasks/sendSubscribe.
--- End of response for this input ---
Coder Agent > You:
Exiting terminal client. Goodbye!
The flow chart of the operation process is as follows:
For now, non-programmer users will need some patience to try this out; Cursor can also help.
A2A vs. MCP
Many people are concerned about this issue, so I roughly summarized it as follows:
Feature | A2A | MCP |
Main Application | Inter-agent communication and collaboration | Providing tools and context to models and connecting them to external resources |
Core Architecture | Client-Server (Agent-Agent) | Client-Host-Server (Application-LLM-External Resources) |
Standard Interface | JSON specification, Agent Cards, Tasks, Messages, Artifacts | JSON-RPC 2.0, Resources, Tools, Memory, Prompts |
Key Features | Multimodality, dynamic collaboration, security, task management, capability discovery | Modularity, security boundaries, reusable connectors, SDKs, tool discovery |
Communication Protocol | HTTP, JSON-RPC, SSE | JSON-RPC 2.0 over stdio, or HTTP with SSE |
Performance Focus | Asynchronous communication, load handling | Efficient context management, parallel processing, and caching to improve throughput |
Adoption and Community | Good initial industry support, emerging ecosystem | Industry-wide adoption and rapid community growth |
At the same time, here are a few things I am still thinking about:
• How do we distinguish between Agents and Tools? Is there really an absolute boundary?
• From a technical point of view, A2A is applicable to more scenarios, including MCP scenarios.
• If there are many Agents and many MCP servers in the future, what kind of network will they form? The former leans toward decentralized autonomy, while the latter leans toward centralized management.
I'm still thinking it through; more practice is needed.