A2A (Agent2Agent) protocol explained

Written by
Caleb Hayes
Updated on: July 1, 2025

A2A protocol: Opening a new chapter of interoperability between AI agents.

Core content:
1. Basic concepts of A2A protocol and its design principles
2. Three core participants in A2A protocol and their roles
3. The role of AgentCard and agent discovery mechanism

Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)


What is the A2A Protocol

The A2A (Agent2Agent) protocol is an open protocol launched by Google Cloud to promote interoperability between different AI agents. Its main goal is to allow these agents to communicate and collaborate effectively in a dynamic, multi-agent ecosystem, regardless of whether they are built by different vendors or use different technical frameworks.

A2A Design Principles Summary

The design principles of the A2A (Agent2Agent) protocol aim to enhance collaboration between agents while ensuring flexibility, security, and compatibility with existing systems. The following is a summary of these principles:

  1. Embrace agentic capabilities

  • Allow agents to collaborate in their natural, unstructured patterns, without sharing memory, tools, or context, thus enabling realistic multi-agent scenarios.

  2. Build on existing standards

  • The protocol is built on widely accepted technical standards such as HTTP, SSE, and JSON-RPC, making it easy to integrate with an enterprise's existing IT stack.

  3. Secure by default

  • Designed to support enterprise-grade authentication and authorization, ensuring that only authorized users and systems can access an agent.

  4. Support long-running tasks

  • Flexibly supports scenarios ranging from quick tasks to complex research, with real-time feedback, notifications, and status updates during task execution.

  5. Modality agnostic

  • Supports multiple forms of interaction, including text, audio and video streams, forms, iframes, and more, which enhances an agent's interactivity and adaptability.

Overall, the protocol puts real thought into openness, security, and flexibility, all points where MCP falls short. We will leave the comparison with MCP to the end. First, let's look at A2A in detail.

A2A Participants

There are three participants in the A2A protocol:

  • User: A person or service that uses the agent system to complete tasks.

  • Client: An entity that requests operations from an opaque agent (service, agent, or application) on behalf of a user.

  • Server: An opaque (black-box) remote agent, i.e., an A2A server.

Refer to the following figure.

From the diagram, we can clearly see the positions of the three participants. Compared with MCP's participants, the Host role is missing. This reflects a difference in design philosophy: whether to leave the implementation open or to standardize a mechanism. In A2A, security and related concerns are handled in other ways, but how a User discovers the Agent they need remains an open issue.

A2A Core Concepts

AgentCard

An AgentCard is a JSON file that describes what functions an Agent provides. The official recommendation is to host it at https://<base url>/.well-known/agent.json.
This way, a client can fetch the AgentCard with a plain HTTP GET and obtain a description of the Agent.

A natural extension is that we need a registry, public or private, so that Agents can be found easily.

On the other hand, the registry could also be decentralized. Imagine a scenario where every website hosts a https://<base url>/.well-known/agent.json describing what it can do, and continuously broadcasts its AgentCard on a P2P network. These AgentCards could even live on IPFS or Ethereum, so that the collaborative relationships between Agents form a self-organizing Agent network.

Back to A2A, the definition of an AgentCard is as follows:

// An AgentCard conveys key information:
// - Overall details (version, name, description, uses)
// - Skills: A set of capabilities the agent can perform
// - Default modalities/content types supported by the agent
// - Authentication requirements
interface AgentCard {
  // Human-readable name of the agent.
  // (e.g. "Recipe Agent")
  name: string;
  // A human-readable description of the agent. Used to assist users and
  // other agents in understanding what the agent can do.
  // (e.g. "Agent that helps users with recipes and cooking.")
  description: string;
  // A URL to the address the agent is hosted at.
  url: string;
  // The service provider of the agent.
  provider?: {
    organization: string;
    url: string;
  };
  // The version of the agent - format is up to the provider. (e.g. "1.0.0")
  version: string;
  // A URL to documentation for the agent.
  documentationUrl?: string;
  // Optional capabilities supported by the agent.
  capabilities: {
    streaming?: boolean; // true if the agent supports SSE
    pushNotifications?: boolean; // true if the agent can notify updates to client
    stateTransitionHistory?: boolean; // true if the agent exposes status change history for tasks
  };
  // Authentication requirements for the agent.
  // Intended to match OpenAPI authentication structure.
  authentication: {
    schemes: string[]; // e.g. Basic, Bearer
    credentials?: string; // credentials a client should use for private cards
  };
  // The set of interaction modes that the agent
  // supports across all skills. This can be overridden per-skill.
  defaultInputModes: string[]; // supported mime types for input
  defaultOutputModes: string[]; // supported mime types for output
  // Skills are a unit of capability that an agent can perform.
  skills: {
    id: string; // unique identifier for the agent's skill
    name: string; // human-readable name of the skill
    // Description of the skill - will be used by the client or a human
    // as a hint to understand what the skill does.
    description: string;
    // Set of tag words describing classes of capabilities for this specific
    // skill (e.g. "cooking", "customer support", "billing")
    tags: string[];
    // The set of example scenarios that the skill can perform.
    // Will be used by the client as a hint to understand how the skill can be
    // used. (e.g. "I need a recipe for bread")
    examples?: string[]; // example prompts for tasks
    // The set of interaction modes that the skill supports
    // (if different than the default)
    inputModes?: string[]; // supported mime types for input
    outputModes?: string[]; // supported mime types for output
  }[];
}

The content is long, but relatively simple. We use the following figure to represent it:

The complete definition can be found here: https://github.com/sing1ee/a2a-agent-coder/blob/main/src/schema.ts
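As a concrete sketch of the discovery flow, a client could fetch the card from the well-known path with a plain HTTP GET and pick a skill by tag. This is a minimal illustration, not spec code: fetchAgentCard and findSkillByTag are hypothetical helper names, and the interfaces are trimmed to a few fields.

```typescript
// Trimmed AgentCard types for illustration; see the full definition above.
interface AgentSkill {
  id: string;
  name: string;
  description: string;
  tags: string[];
}

interface AgentCard {
  name: string;
  description: string;
  url: string;
  version: string;
  skills: AgentSkill[];
}

// Fetch the card from the well-known path via HTTP GET.
// (Assumed helper, not part of the A2A spec.)
async function fetchAgentCard(baseUrl: string): Promise<AgentCard> {
  const res = await fetch(`${baseUrl}/.well-known/agent.json`);
  if (!res.ok) throw new Error(`Failed to fetch AgentCard: ${res.status}`);
  return (await res.json()) as AgentCard;
}

// Pick the first skill that advertises a given capability tag.
function findSkillByTag(card: AgentCard, tag: string): AgentSkill | undefined {
  return card.skills.find((s) => s.tags.includes(tag));
}
```

A registry, centralized or not, would essentially run findSkillByTag over many such cards.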

Task

A task is a stateful entity that allows a client to collaborate with a remote agent to achieve a specific result and generate corresponding output. Within a task, the client and the remote agent exchange messages, and the remote agent produces artifacts as results.

Tasks are always created by the client, and their state is determined by the remote agent. If the client requires, multiple tasks can belong to the same session (indicated by an optional sessionId). The client can set this optional sessionId when creating a task.

After receiving a request, the agent can take any of the following actions:

  • Fulfill the request immediately

  • Schedule work to be performed later

  • Reject the request

  • Negotiate a different implementation option

  • Ask the client for more information

  • Delegate to other agents or systems

Even after completing a goal, the client can still request more information or make changes within the context of the same task. For example, a client can ask: “Draw a picture of a bunny”, the agent responds: “<picture>”, and the client can then ask: “Draw it red”.

Tasks are not only used to deliver artifacts (results) and messages (thoughts, instructions, etc.), but also maintain the state of the task and its optional history, including state changes and message records.

These features matter, especially the shared context within a task, which enables multi-turn conversations. Because both state and history are persisted, this fits naturally with today's chat-centric AI interactions.

The task is defined as follows:

interface Task {
  id: string; // unique identifier for the task
  sessionId: string; // client-generated id for the session holding the task
  status: TaskStatus; // current status of the task
  history?: Message[];
  artifacts?: Artifact[]; // collection of artifacts created by the agent
  metadata?: Record<string, any>; // extension metadata
}
// TaskState and accompanying message.
interface TaskStatus {
  state: TaskState;
  message?: Message; // additional status updates for client
  timestamp?: string; // ISO datetime value
}
// Sent by server during sendSubscribe or subscribe requests.
interface TaskStatusUpdateEvent {
  id: string;
  status: TaskStatus;
  final: boolean; // indicates the end of the event stream
  metadata?: Record<string, any>;
}
// Sent by server during sendSubscribe or subscribe requests.
interface TaskArtifactUpdateEvent {
  id: string;
  artifact: Artifact;
  metadata?: Record<string, any>;
}
// Sent by the client to the agent to create, continue, or restart a task.
interface TaskSendParams {
  id: string;
  sessionId?: string; // server creates a new sessionId for new tasks if not set
  message: Message;
  historyLength?: number; // number of recent messages to be retrieved
  // Where the server should send notifications when disconnected.
  pushNotification?: PushNotificationConfig;
  metadata?: Record<string, any>; // extension metadata
}
type TaskState =
  | "submitted"
  | "working"
  | "input-required"
  | "completed"
  | "canceled"
  | "failed"
  | "unknown";
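The multi-turn flow described earlier ("Draw a picture of a bunny", then "Draw it red") can be sketched as two sends sharing one task id. This is illustrative only: buildSendParams is a made-up helper, and the interfaces are trimmed to the fields used here.

```typescript
// Trimmed Message/TaskSendParams for illustration; buildSendParams is an
// assumed helper, not part of the A2A spec.
interface Message {
  role: "user" | "agent";
  parts: { type: "text"; text: string }[];
}

interface TaskSendParams {
  id: string;
  sessionId?: string;
  message: Message;
}

// Build the params for one client turn of an existing or new task.
function buildSendParams(taskId: string, text: string, sessionId?: string): TaskSendParams {
  return {
    id: taskId,
    sessionId,
    message: { role: "user", parts: [{ type: "text", text }] },
  };
}

const taskId = "task-123"; // normally a generated UUID
const first = buildSendParams(taskId, "Draw a picture of a bunny");
const followUp = buildSendParams(taskId, "Draw it red", "session-1");
```

Because both turns carry the same task id, the agent can apply the context of the first request when handling the second.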

Artifact

An artifact is an output generated by an agent as the end result of a task. Artifacts are immutable, can be named, and can contain multiple parts. New parts can be appended to existing artifacts through streaming responses.

A task can generate multiple artifacts. For example, when executing "Create a Web Page", separate HTML artifacts and image artifacts may be generated.

It has to be said that A2A arrived at just the right time: the major application forms of AI today are covered by the protocol definition, and Artifact is a particularly popular one.

Specific definition:

interface Artifact {
  name?: string;
  description?: string;
  parts: Part[];
  metadata?: Record<string, any>;
  index: number;
  append?: boolean;
  lastChunk?: boolean;
}

Message

A message is an entity that contains any non-artifact content. This content can include agent thoughts, user context, instructions, error messages, status updates, or metadata.

All content from the client is sent in the form of messages. The agent communicates status or provides instructions through messages, and the generated results are sent in the form of artifacts.

Messages can contain multiple Parts to represent different types of content. For example, a user request may include a text description of the user and multiple files for context.

The definition is as follows:

interface Message {
  role: "user" | "agent";
  parts: Part[];
  metadata?: Record<string, any>;
}

Part

A Part is the complete content exchanged between a client and a remote agent as part of a message or artifact. Each Part has its own unique content type and metadata.

Following are the interface definitions for different types of parts:

TextPart

interface TextPart {
  type: "text";
  text: string;
}

FilePart

interface FilePart {
  type: "file";
  file: {
    name?: string;
    mimeType?: string;
    // Possible content
    // oneof {
    bytes?: string; // base64-encoded content
    uri?: string;
    // }
  };
}

DataPart

interface DataPart {
  type: "data";
  data: Record<string, any>;
}

Comprehensive Type

type Part = (TextPart | FilePart | DataPart) & {
  metadata: Record<string, any>;
};
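Since the `type` field discriminates the union, a client can dispatch on it directly. The following sketch renders each kind of Part as a display string; renderPart is an illustrative helper, not spec code, and the types are trimmed to the fields used.

```typescript
// Trimmed Part union for illustration.
interface TextPart {
  type: "text";
  text: string;
}
interface FilePart {
  type: "file";
  file: { name?: string; mimeType?: string; bytes?: string; uri?: string };
}
interface DataPart {
  type: "data";
  data: Record<string, any>;
}
type Part = TextPart | FilePart | DataPart;

// Render a Part by switching on its discriminating `type` field.
// (Assumed client-side helper.)
function renderPart(part: Part): string {
  switch (part.type) {
    case "text":
      return part.text; // show text parts verbatim
    case "file":
      // Prefer a name, then a uri, then note that the bytes are inline.
      return `[file: ${part.file.name ?? part.file.uri ?? "inline bytes"}]`;
    case "data":
      return JSON.stringify(part.data); // structured data as JSON
  }
}
```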

For more message details, refer to the link: https://a2aprotocol.ai/blog/a2a-sample-methods-and-json-responses

Communication mechanisms and asynchronous support

A2A supports the following communication mechanisms:

  • A2A supports a secure push-notification mechanism, allowing an agent to send updates to a client even when no connection is open.

  • Clients and servers can use the standard request/response pattern or stream updates via SSE.

When pushing notifications, the agent must verify the identity of the notification service and authenticate with trusted credentials to keep notifications secure.
On top of these mechanisms, A2A supports client polling for long-running tasks, and the agent can also push status updates to the client via SSE.

The most important piece here is asynchronous support. A client can register a webhook to receive the results of long-running tasks asynchronously; this is what PushNotification implements. Anyone using LLM APIs today runs into the same problem: output is slow, and you cannot do anything else while it streams. With asynchronous callbacks, polling, and re-subscription available, client development becomes more flexible and can deliver a better user experience.

Here is the definition of push:

interface PushNotificationConfig {
  url: string;
  token?: string; // token unique to this task/session
  authentication?: {
    schemes: string[];
    credentials?: string;
  };
}
interface TaskPushNotificationConfig {
  id: string; // task id
  pushNotificationConfig: PushNotificationConfig;
}

Error Handling

Error message format

When the server encounters an error while processing a client request, it responds with the following ErrorMessage format:

interface ErrorMessage {
  code: number;
  message: string;
  data?: any;
}

Standard JSON-RPC error codes

The following are standard JSON-RPC error codes that the server can respond to in error scenarios:

Error Code          Message                            Description
-32700              JSON parse error                   Invalid JSON was sent
-32600              Invalid Request                    Request payload validation error
-32601              Method not found                   The method does not exist
-32602              Invalid params                     Invalid method parameters
-32603              Internal error                     Internal JSON-RPC error
-32000 to -32099    Server error                       Reserved for implementation-specific error codes
-32001              Task not found                     The task with the provided id could not be found
-32002              Task cannot be canceled            The task cannot be canceled by the remote agent
-32003              Push notifications not supported   Push notifications are not supported by the agent
-32004              Unsupported operation              The operation is not supported
-32005              Incompatible content types         Incompatible content types between client and agent
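A client can turn these codes into readable messages with a simple lookup, falling back to the reserved server-error range. The table above is the source of truth; a2aErrorMessage is an illustrative helper, not part of the spec.

```typescript
// Standard messages for the error codes in the table above.
const A2A_ERRORS: Record<number, string> = {
  [-32700]: "JSON parse error",
  [-32600]: "Invalid Request",
  [-32601]: "Method not found",
  [-32602]: "Invalid params",
  [-32603]: "Internal error",
  [-32001]: "Task not found",
  [-32002]: "Task cannot be canceled",
  [-32003]: "Push notifications not supported",
  [-32004]: "Unsupported operation",
  [-32005]: "Incompatible content types",
};

// Resolve a code to its message, handling the reserved -32000..-32099
// range for implementation-specific server errors. (Assumed helper.)
function a2aErrorMessage(code: number): string {
  if (code in A2A_ERRORS) return A2A_ERRORS[code];
  if (code >= -32099 && code <= -32000) return "Server error";
  return "Unknown error";
}
```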

Hands-on experience

I modified the official ts example to support OpenRouter, mainly changing the API format to be compatible with OpenAI. The code is here: https://github.com/sing1ee/a2a-agent-coder

I am doing this on a Mac, open your favorite terminal:

  1. Install Bun

brew install oven-sh/bun/bun # For macOS and Linux

  2. Clone the repository

git clone git@github.com:sing1ee/a2a-agent-coder.git

  3. Install dependencies

cd a2a-agent-coder
bun i

  4. Configure environment variables. Referring to .env.example, create a .env file with the following content:

OPENAI_API_KEY=sk-or-v1-xxxxxxx
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_MODEL=anthropic/claude-3.5-haiku

I use OpenRouter because payment is convenient and it offers many models. If you want to try it, register an OpenRouter account; even without topping up, you get a daily allowance of free requests to free models such as deepseek/deepseek-chat-v3-0324:free.

To load the environment variables:

export $(cat .env | xargs)

  5. Run the A2A Server

bun run agents:coder

  6. Open a new terminal and run the A2A Client (no env configuration is needed here):

bun run a2a:cli

Here are the results of my previous runs:

bun run a2a:cli

bun x tsx src/cli.ts
A2A Terminal Client
Agent URL: http://localhost:41241
Attempting to fetch agent card from: http://localhost:41241/.well-known/agent.json
✓ Agent Card Found:
  Name: Coder Agent
  Description: An agent that generates code based on natural language instructions and streams file outputs.
  Version: 0.0.1
Starting Task ID: a1a608b3-3015-4404-a83f-6ccc05083761
Enter messages, or use '/new' to start a new task.
Coder Agent > You: implement binary search
Sending...

Coder Agent [4:28:00 PM]: ⏳ Status: working
  Part 1: Text: Generating code...

Coder Agent [4:28:02 PM]: ⏳ Status: working
  Part 1: File: Name: src/algorithms/binary_search.py, Source: """
Implementation of the binary search algorithm in Python.
"""

def binary_search(arr, target):
    """
    Performs a binary search on a sorted array to find the index of a target value.

    Args:
        arr (list): A sorted list of elements.
        target: The value to search for in the array.

    Returns:
        int: The index of the target value if found, otherwise -1.
    """
    low = 0
    high = len(arr) - 1

    while low <= high:
        mid = (low + high) // 2 # Integer division to find the middle index

        if arr[mid] == target:
            return mid # Target found at index mid
        elif arr[mid] < target:
            low = mid + 1 # Target is in the right half
        else:
            high = mid - 1 # Target is in the left half

    return -1 # Target not found in the array


Coder Agent [4:28:02 PM]: ✅ Status: completed
SSE stream finished for method tasks/sendSubscribe.
--- End of response for this input ---
Coder Agent > You:
Exiting terminal client. Goodbye!

The flow chart of the operation process is as follows:

For now, non-programmer users will need some patience to try this out; tools like Cursor can also help.

A2A vs. MCP

Many people care about this comparison, so here is a rough summary:

Feature                   A2A                                                         MCP
Main application          Inter-agent communication and collaboration                 Providing tools and context for models; connecting to external resources
Core architecture         Client-Server (agent-to-agent)                              Client-Host-Server (application-LLM-external resources)
Standard interface        JSON spec, Agent Cards, tasks, messages, artifacts          JSON-RPC 2.0, resources, tools, prompts
Key features              Multimodality, dynamic collaboration, security, task management, capability discovery     Modularity, security boundaries, reusable connectors, SDKs, tool discovery
Communication protocol    HTTP, JSON-RPC, SSE                                         JSON-RPC 2.0 over stdio, HTTP with SSE
Performance focus         Asynchronous communication, load handling                   Efficient context management, parallel processing, caching to improve throughput
Adoption and community    Good initial industry support, emerging ecosystem           Industry-wide adoption, rapid community growth

At the same time, I am also doing some thinking.

  • How do we distinguish between Agents and Tools? Is there really an absolute boundary?

  • From a technical point of view, A2A is applicable to more scenarios, including those MCP covers.

  • If there are many Agents and many MCP servers in the future, what kind of network will form? The former leans toward decentralized autonomy, the latter toward centralized management.

I'm still thinking these through; more practice is needed.