Designing Collaborative Multi-Agent Systems with the A2A Protocol

It feels like every other AI announcement lately mentions “agents.” And already, the AI community has 2025 pegged as “the year of AI agents,” sometimes without much more detail than “They’ll be amazing!” Often forgotten in this hype are the fundamentals. Everybody is dreaming of armies of agents, booking hotels and flights, researching complex topics, and writing PhD theses for us. And yet we see little substance that addresses a critical engineering challenge of these ambitious systems: How do these independent agents, built by different teams using different tech, often with completely opaque inner workings, actually collaborate?

But enterprises aren’t often fooled by these hype cycles and promises. Instead, they tend to cut through the noise and ask the hard questions: If every company spins up its own clever agent for accounting, another for logistics, a third for customer service, and you have your own personal assistant agent trying to wrangle them all—how do they coordinate? How does the accounting agent securely pass info to the logistics agent without a human manually copying data between dashboards? How does your assistant delegate booking a flight without needing to know the specific, proprietary, and likely undocumented inner workings of one particular travel agent?

Right now, the answer is often “they don’t” or “with a whole lot of custom, brittle, painful integration code.” It’s becoming a digital Tower of Babel: Agents get stuck in their own silos, unable to talk to each other. And without that collaboration, they can’t deliver on their promise of tackling complex, real-world tasks together.

The Agent2Agent (A2A) Protocol attempts to address these pressing questions. Its goal is to provide that missing common language, a set of rules for how different agents and AI systems can interact without needing to lay open their internal secrets or get caught in custom-built, one-off integrations.

Hendrick van Cleve III (Attr.) – The Tower of Babel (public domain)

In this article, we’ll dive into the details of A2A. We’ll look at:

  • The core ideas behind it: What underlying principles is it built on?
  • How it actually works: What are the key mechanisms?
  • Where it fits in the broader landscape, in particular, how it compares to and potentially complements the Model Context Protocol (MCP), which tackles the related (but different) problem of agents using tools.
  • What we think comes next in the area of multi-agent system design.

A2A Protocol Overview

At its core, the A2A protocol is an effort to establish a way for AI agents to communicate and collaborate. Its aim is to provide a standard framework allowing agents to:

  • Discover capabilities: Identify other available agents and understand their functions.
  • Negotiate interaction: Determine the appropriate modality for exchanging information for a specific task—simple text, structured forms, perhaps even bidirectional multimedia streams.
  • Collaborate securely: Execute tasks cooperatively, passing instructions and data reliably and safely.

But just listing goals like “discovery” and “collaboration” on paper is easy. We’ve seen plenty of ambitious tech standards stumble because they didn’t grapple with the messy realities early on (OSI network model, anyone?). When we’re trying to get countless different systems, built by different teams, to actually cooperate without creating chaos, we need more than a wishlist. We need some firm guiding principles baked in from the start. These reflect the hard-won lessons about what it takes to make complex systems actually work: How do we balance security, robustness, and practical usability, and what trade-offs are we willing to make between them?

With that in mind, A2A was built with these tenets:

  • Simple: Instead of reinventing the wheel, A2A leverages well-established and widely understood existing standards. This lowers the barrier to adoption and integration, allowing developers to build upon familiar technologies.
  • Enterprise ready: A2A includes robust mechanisms for authentication (verifying agent identities), security (protecting data in transit and at rest), privacy (ensuring sensitive information is handled appropriately), tracing (logging interactions for auditability), and monitoring (observing the health and performance of agent communications).
  • Async first: A2A is designed with asynchronous communication as a primary consideration, allowing tasks to proceed over extended periods and seamlessly integrate human-in-the-loop workflows.
  • Modality agnostic: A2A supports interactions across various modalities, including text, bidirectional audio/video streams, interactive forms, and even embedded iframes for richer user experiences. This flexibility allows agents to communicate and present information in the most appropriate format for the task and user.
  • Opaque execution: This is a cornerstone of A2A. Each agent participating in a collaboration remains a black box to the others. They don’t need to reveal their internal reasoning processes, their knowledge representation, memory, or the specific tools they might be using. Collaboration occurs through well-defined interfaces and message exchanges, preserving the autonomy and intellectual property of each agent. Note that, while agents operate this way by default (without revealing their specific implementation, tools, or way of thinking), an individual remote agent can choose to selectively reveal aspects of its state or reasoning process via messages, especially for UX purposes, such as providing user notifications to the caller agent. As long as the decision to reveal information rests with the remote agent, the interaction maintains its opaque nature.

Taken together, these tenets paint a picture of a protocol trying to be practical, secure, flexible, and respectful of the independent nature of agents. But principles on paper are one thing; how does A2A actually implement these ideas? To see that, we need to shift from the design philosophy to the nuts and bolts—the specific mechanisms and components that make agent-to-agent communication work.

Key Mechanisms and Components of A2A

Translating these principles into practice requires specific mechanisms. Central to enabling agents to understand each other within the A2A framework is the Agent Card. This component functions as a standardized digital business card for an AI agent, typically provided as a metadata file. Its primary purpose is to publicly declare what an agent is, what it can do, where it can be reached, and how to interact with it.

Here’s a simplified example of what an Agent Card might look like, conveying the essential information:

{
  "name": "StockInfoAgent",
  "description": "Provides current stock price information.",
  "url": "http://stock-info.example.com/a2a",
  "provider": { "organization": "ABCorp" },
  "version": "1.0.0",
  "skills": [
    {
      "id": "get_stock_price_skill",
      "name": "Get Stock Price",
      "description": "Retrieves current stock price for a company"
    }
  ]
}

(shortened for brevity)

The Agent Card serves as the key connector between the different actors in the A2A protocol. A client—which could be another agent or perhaps the application the user is interacting with—finds the Agent Card for the service it needs. It uses the details from the card, like the URL, to contact the remote agent (server), which then performs the requested task without exposing its internal methods and sends back the results according to the A2A rules.
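
To make that flow concrete, here’s a minimal discovery sketch in Python (using the requests library). The A2A spec recommends publishing the Agent Card at a well-known path (/.well-known/agent.json); the host below is the fictional one from the example card, so treat this as an illustration rather than a reference client:

import requests

# Minimal discovery sketch. The A2A spec recommends publishing the Agent
# Card at /.well-known/agent.json; the host here is the fictional one from
# the example card above.
AGENT_BASE_URL = "http://stock-info.example.com"

card = requests.get(f"{AGENT_BASE_URL}/.well-known/agent.json", timeout=10).json()

print(f"{card['name']}: {card['description']}")
for skill in card.get("skills", []):
    print(f"  - {skill['id']}: {skill['name']}")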

Once agents are able to read each other’s capabilities, A2A structures their collaboration around completing specific tasks. A task represents the fundamental unit of work requested by a client from a remote agent. Importantly, each task is stateful, allowing it to track progress over time, which is essential for handling operations that might not be instantaneous—aligning with A2A’s “async first” principle.

Communication related to a task primarily uses messages. These carry the ongoing dialogue, including initial instructions from the client, status updates, requests for clarification, or even intermediate “thoughts” from the agent. When the task is complete, the final tangible outputs are delivered as artifacts, which are immutable results like files or structured data. Both messages and artifacts are composed of one or more parts, the granular pieces of content, each with a defined type (like text or an image).

This entire exchange relies on standard web technologies like HTTP and common data formats, ensuring a broad foundation for implementation and compatibility. By defining these core objects—task, message, artifact, and part—A2A provides a structured way for agents to manage requests, exchange information, and deliver results, whether the work takes seconds or hours.
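
To ground these objects, here’s a hedged sketch of what a task request might look like on the wire. A2A rides on JSON-RPC 2.0 over HTTP; the method and field names below follow the spec as of this writing, while the endpoint and prompt reuse our fictional StockInfoAgent:

import uuid
import requests

# Hedged sketch of a task request on the wire. Method and field names
# follow the A2A spec at the time of writing; the endpoint and prompt
# reuse the fictional StockInfoAgent from above.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",
    "params": {
        "id": str(uuid.uuid4()),         # the taskId: a stateful unit of work
        "sessionId": str(uuid.uuid4()),  # groups related tasks together
        "message": {
            "role": "user",
            "parts": [
                {"type": "text", "text": "Current stock price for GOOGL"}
            ],
        },
    },
}

task = requests.post("http://stock-info.example.com/a2a", json=payload,
                     timeout=30).json()["result"]
print(task["status"]["state"])  # e.g. "working", "input-required", "completed"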

Security is, of course, a critical concern for any protocol aiming for enterprise adoption, and A2A addresses this directly. Rather than inventing entirely new security mechanisms, it leans heavily on established practices. A2A aligns with standards like the OpenAPI specification for defining authentication methods and generally encourages treating agents like other secure enterprise applications. This allows the protocol to integrate into existing corporate security frameworks, such as established identity and access management (IAM) systems for authenticating agents, applying existing network security rules and firewall policies to A2A endpoints, or potentially feeding A2A interaction logs into centralized security information and event management (SIEM) platforms for monitoring and auditing.

A core principle is keeping sensitive credentials, such as API keys or access tokens, separate from the main A2A message content. Clients are expected to obtain these credentials through an independent process. Once obtained, they are transmitted securely using standard HTTP headers, a common practice in web APIs. Remote agents, in turn, clearly state their authentication requirements—often within their Agent Cards—and use standard HTTP response codes to manage access attempts, signaling success or failure in a predictable way. This reliance on familiar web security patterns lowers the barrier to implementing secure agent interactions.
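
In practice, that separation looks deliberately unremarkable. A minimal sketch, reusing the request payload from the previous example and assuming a bearer token obtained out of band:

import requests

# Sketch: credentials ride in standard HTTP headers, never inside the A2A
# message body. The bearer token is obtained out of band (e.g., via OAuth).
headers = {"Authorization": "Bearer <token-obtained-out-of-band>"}

resp = requests.post("http://stock-info.example.com/a2a",
                     json=payload,  # the JSON-RPC payload from the sketch above
                     headers=headers, timeout=30)
if resp.status_code in (401, 403):  # standard HTTP codes signal auth outcomes
    print("Authentication failed or access denied")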

A2A also facilitates the creation of a distributed “interaction memory” across a multi-agent system by providing a standardized protocol for agents to exchange and reference task-specific information, including unique identifiers (taskId, sessionId), status updates, message histories, and artifacts. While A2A itself doesn’t store this memory, it enables each participating A2A client and server agent to maintain its portion of the overall task context. Collectively, these individual agent memories, linked and synchronized through A2A’s structured communication, form the comprehensive interaction memory of the entire multi-agent system, allowing for coherent and stateful collaboration on complex tasks.

So, in a nutshell, A2A is an attempt to bring rules and standardization to the rapidly evolving world of agents by defining how independent systems can discover each other, collaborate on tasks (even long-running ones), and handle security using well-trodden web paths, all while keeping their inner workings private. It’s focused squarely on agent-to-agent communication, trying to solve the problem of isolated digital workers unable to coordinate.

But getting agents to talk to each other is only one piece of the interoperability puzzle facing AI developers today. There’s another standard gaining significant traction that tackles a related yet distinct challenge: How do these sophisticated AI applications interact with the outside world—the databases, APIs, files, and specialized functions often referred to as “tools”? This brings us to Anthropic’s Model Context Protocol, or MCP.

MCP: Model Context Protocol Overview

It wasn’t so long ago, really, that large language models (LLMs), while impressive text generators, were often mocked for their sometimes hilarious blind spots. Ask one to do simple arithmetic, count the letters in a word, or tell you the current weather, and the results could be confidently delivered yet completely wrong. This wasn’t just a quirk; it highlighted a fundamental limitation: The models operated purely on the patterns learned from their static training data, disconnected from live information sources or the ability to execute reliable procedures. But those days are mostly over (or so it seems)—state-of-the-art AI models are vastly more effective than their predecessors from just a year or two ago.

A key reason for the effectiveness of AI systems (agents or not) is their ability to connect beyond their training data: interacting with databases and APIs, accessing local files, and employing specialized external tools. As with interagent communication, however, there are some hard challenges that need to be tackled first.

Integrating these AI systems with external “tools” involves collaboration between AI developers, agent architects, tool providers, and others. A significant hurdle is that tool integration methods are often tied to specific LLM providers (like OpenAI, Anthropic, or Google), and these providers handle tool usage differently. Defining a tool for one system requires a specific format; using that same tool with another system often demands a different structure.

Consider the following examples.

OpenAI’s API expects a function definition structured this way:

{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Retrieves weather data ...",
    "parameters": {...}
  }
}

Whereas Anthropic’s API uses a different layout:

{
  "name": "get_weather",
  "description": "Retrieves weather data ...",
  "input_schema": {...}
}

This incompatibility means tool providers must develop and maintain separate integrations for each AI model provider they want to support. If an agent built with Anthropic models needs certain tools, those tools must follow Anthropic’s format. If another developer wants to use the same tools with a different model provider, they essentially duplicate the integration effort, adapting definitions and logic for the new provider.
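
To see how mechanical (and tedious) this duplication is, consider a small illustrative converter between the two formats shown above. A real integration would also have to adapt invocation and response handling, not just the schema:

# Illustration of the duplicated effort: mechanically converting the OpenAI
# function format shown above into Anthropic's tool format. Real integrations
# must also adapt invocation and response handling, not just the schema.
def openai_to_anthropic(tool: dict) -> dict:
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn["description"],
        "input_schema": fn["parameters"],  # same JSON Schema, different key
    }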

Format differences aren’t the only challenge; language barriers also create integration difficulties. For example, getting a Python-based agent to directly use a tool built around a Java library requires considerable development effort.

This integration challenge is precisely what the Model Context Protocol was designed to solve. It offers a standard way for different AI applications and external tools to interact.

Similar to A2A, MCP operates using two key parts, starting with the MCP server. This component is responsible for exposing the tool’s functionality. It contains the underlying logic—maybe Python code hitting a weather API or routines for data access—developed in a suitable language. Servers commonly bundle related capabilities, like file operations or database access tools. The second component is the MCP client. This piece sits inside the AI application (the chatbot, agent, or coding assistant). It discovers available MCP servers and connects to them. When the AI app or model needs something from the outside world, the client talks to the right server using the MCP standard.

The key is that communication between client and server adheres to the MCP standard. This adherence ensures that any MCP-compatible client can interact with any MCP server, no matter the client’s underlying AI model or the language used to build the server.
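
As a concrete illustration, here’s a minimal weather server sketch built with the MCP Python SDK’s FastMCP helper. The weather lookup itself is stubbed out, so this shows the shape of a server rather than a production implementation:

from mcp.server.fastmcp import FastMCP

# A minimal MCP server sketch using the Python SDK's FastMCP helper. The
# weather lookup is stubbed; a real server would call an actual weather API.
mcp = FastMCP("weather")

@mcp.tool()
def get_weather(location: str) -> str:
    """Retrieves current weather for the given location."""
    return f"Sunny, 21°C in {location}"  # placeholder result

if __name__ == "__main__":
    mcp.run()  # exposes the tool to any MCP-compatible client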

Adopting this standard offers several advantages:

  • Build once, use anywhere: Create a capability as an MCP server once; any MCP-supporting application can use it.
  • Language flexibility: Develop servers in the language best suited for the task.
  • Leverage ecosystem: Use existing open source MCP servers instead of building every integration from scratch.
  • Enhance AI capabilities: Easily give agents, chatbots, and assistants access to diverse real-world tools.

Adoption of MCP is accelerating, demonstrated by providers such as GitHub and Slack, which now offer servers implementing the protocol.

MCP and A2A

But how do the Model Context Protocol and the Agent2Agent (A2A) Protocol relate? Do they solve the same problem or serve different functions? The lines can blur, especially since many agent frameworks allow treating one agent as a tool for another (agent as a tool).

Both protocols improve interoperability within AI systems, but they operate at different levels. Examining their implementations and goals reveals the key differentiators.

MCP focuses on standardizing the link between an AI application (or agent) and specific, well-defined external tools or capabilities. MCP uses precise, structured schemas (like JSON Schema) to define tools, establishing a clear API-like contract for predictable and efficient execution. For example, an agent needing the weather would use MCP to call a get_weather tool on an MCP weather server, specifying the location “London.” The required input and output are strictly defined by the server’s MCP schema. This approach removes ambiguity and solves the problem of incompatible tool definitions across LLM providers for that specific function call. MCP usually involves synchronous calls, supporting reliable and repeatable execution of functions (unless, of course, the weather in London has changed in the meantime, which is entirely plausible).
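
For the client side of that get_weather call, the MCP Python SDK provides session primitives. A hedged sketch, assuming the weather server from the earlier example is saved locally as weather_server.py:

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Client-side sketch of the get_weather call, assuming the weather server
# from the earlier sketch is saved as weather_server.py and run locally.
server_params = StdioServerParameters(command="python", args=["weather_server.py"])

async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("get_weather", {"location": "London"})
            print(result.content)

asyncio.run(main())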

A2A, on the other hand, standardizes how autonomous agents communicate and collaborate. It excels at managing complex, multistep tasks involving coordination, discussion, and delegation. Rather than depending on rigid function schemas, A2A interactions utilize natural language, making the protocol better suited for ambiguous goals or tasks requiring interpretation. A good example would be “Summarize market trends for sustainable packaging.” Asynchronous communication is a key tenet of A2A, which also includes mechanisms to oversee the lifecycle of potentially lengthy tasks. This involves tracking status (like working, completed, and input-required) and managing the necessary dialogue between agents. Consider a vacation planner agent using A2A to delegate book_flights and reserve_hotel tasks to specialized travel agents while monitoring their status. In essence, A2A’s focus is the orchestration of workflows and collaboration between agents.

This distinction highlights why MCP and A2A function as complementary technologies, not competitors. To borrow an analogy: MCP is like standardizing the wrench a mechanic uses—defining precisely how the tool engages with the bolt. A2A is like establishing a protocol for how that mechanic communicates with a specialist mechanic across the workshop (“Hearing a rattle from the front left, can you diagnose?”), initiating a dialogue and collaborative process.

In sophisticated AI systems, we can easily imagine them working together: A2A might orchestrate the overall workflow, managing delegation and communication between different agents, while those individual agents might use MCP under the hood to interact with specific databases, APIs, or other discrete tools needed to complete their part of the larger task.

Putting It All Together

We’ve discussed A2A for agent collaboration and MCP for tool interaction as separate concepts. But their real potential might lie in how they work together. Let’s walk through a simple, practical scenario to see how these two protocols could function in concert within a multi-agent system.

Imagine a user asks their primary interface agent—let’s call it the Host Agent—a straightforward question: “What’s Google’s stock price right now?”

The Host Agent, designed for user interaction and orchestrating tasks, doesn’t necessarily know how to fetch stock prices itself. However, it knows (perhaps by consulting an agent registry via an Agent Card) about a specialized Stock Info Agent that handles financial data. Using A2A, the Host Agent delegates the task: It sends an A2A message to the Stock Info Agent, essentially saying, “Request: Current stock price for GOOGL.”

The Stock Info Agent receives this A2A task. Now, this agent knows the specific procedure to get the data. It doesn’t need to discuss it further with the Host Agent; its job is to retrieve the price. To do this, it turns to its own toolset, specifically an MCP stock price server. Using MCP, the Stock Info Agent makes a precise, structured call to the server—effectively get_stock_price(symbol: "GOOGL"). This isn’t a collaborative dialogue like the A2A exchange; it’s a direct function call using the standardized MCP format.

The MCP server does its job: looks up the price and returns a structured response, maybe {"price": "174.92 USD"}, back to the Stock Info Agent via MCP.

With the data in hand, the Stock Info Agent completes its A2A task. It sends a final A2A message back to the Host Agent, reporting the result: "Result: Google stock is 174.92 USD."

Finally, the Host Agent takes this information received via A2A and presents it to the user.
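
In code, the whole exchange might look like the sketch below. Every helper here is a hypothetical stub standing in for real A2A and MCP client calls; the point is only to mark where each protocol does its work:

# Pseudocode recap of the flow above, with stubbed helpers standing in for
# real A2A/MCP client calls (all names here are hypothetical).
def find_agent_card(skill: str) -> dict:
    return {"url": "http://stock-info.example.com/a2a"}  # stub registry lookup

def mcp_call_tool(name: str, args: dict) -> dict:
    return {"price": "174.92 USD"}  # stub MCP server response

def stock_info_agent(task_text: str) -> str:
    # The remote agent fulfills its A2A task via a precise MCP tool call.
    return mcp_call_tool("get_stock_price", {"symbol": "GOOGL"})["price"]

def a2a_send_task(url: str, text: str) -> str:
    return stock_info_agent(text)  # stub: in reality an HTTP JSON-RPC call

def host_agent(user_query: str) -> str:
    card = find_agent_card(skill="get_stock_price_skill")  # discovery
    return f"Google stock is {a2a_send_task(card['url'], user_query)}"

print(host_agent("What's Google's stock price right now?"))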

Even in this simple example, the complementary roles become clear. A2A handles the higher-level coordination and delegation between autonomous agents (Host delegates to Stock Info). MCP handles the standardized, lower-level interaction between an agent and a specific tool (Stock Info uses the price server). This creates a separation of concerns: The Host Agent doesn’t need to know about MCP or stock APIs, and the Stock Info Agent doesn’t need to handle complex user interaction—it just fulfills A2A tasks, using MCP tools where necessary. Both agents remain largely opaque to each other, interacting only through the defined protocols. This modularity, enabled by using both A2A for collaboration and MCP for tool use, is key to building more complex, capable, and maintainable AI systems.

Conclusion and Future Work

We’ve outlined the challenges of making AI agents collaborate, explored Google’s A2A protocol as a potential standard for interagent communication, and compared and contrasted it with Anthropic’s Model Context Protocol. Standardizing tool use and agent interoperability are important steps forward in enabling effective and efficient multi-agent system (MAS) design.

But the story is far from over, and agent discoverability is one of the immediate next challenges that need to be tackled. When talking to enterprises, it becomes glaringly obvious that this is often very high on their priority list: While A2A defines how agents communicate once connected, the question of how they find each other in the first place remains a significant area for development. Simple approaches can be implemented—like publishing an Agent Card at a standard web address and capturing that address in a directory—but that feels insufficient for building a truly dynamic and scalable ecosystem. This is where we see the concept of curated agent registries come into focus, and it’s perhaps one of the most exciting areas of future work for MAS.

We imagine an internal “agent store” (akin to an app store) or professional listing for an organization’s AI agents. Developers could register their agents, complete with versioned skills and capabilities detailed in their Agent Cards. Clients needing a specific function could then query this registry, searching not just by name but by required skills, trust levels, or other vital attributes. Such a registry wouldn’t just simplify discovery; it would foster specialization, enable better governance, and make the whole system more transparent and manageable. It moves us from simply finding an agent to finding the right agent for the job based on its declared skills.

However, even sophisticated registries can only help us find agents based on those declared capabilities. Another fascinating, and perhaps more fundamental, challenge for the future: dealing with emergent capabilities. One of the remarkable aspects of modern agents is their ability to combine diverse tools in novel ways to tackle unforeseen problems. An agent equipped with various mapping, traffic, and event data tools, for instance, might have “route planning” listed on its Agent Card. But by creatively combining those tools, it might also be capable of generating complex disaster evacuation routes or highly personalized multistop itineraries—crucial capabilities likely unlisted simply because they weren’t explicitly predefined. How do we reconcile the need for predictable, discoverable skills with the powerful, adaptive problem-solving that makes agents so promising? Finding ways for agents to signal, or for clients to discover, these unlisted possibilities without sacrificing structure is a significant open question for the A2A community and the broader field.

Addressing this challenge adds another layer of complexity when envisioning future MAS architectures. Looking down the road, especially within large organizations, we might see the registry idea evolve into something akin to the “data mesh” concept—multiple, potentially federated registries serving specific domains. This could lead to an “agent mesh”: a resilient, adaptable landscape where agents collaborate effectively under a unified, centralized governance layer with distributed management capabilities (e.g., introducing the notion of a data/agent steward who manages the quality, accuracy, and compliance of a business unit’s data and agents). But ensuring this mesh can leverage both declared and emergent capabilities will be key. Exploring that fully, however, is likely a topic for another day.

Ultimately, protocols like A2A and MCP are vital building blocks, but they’re not the entire map. To build multi-agent systems that are genuinely collaborative and robust, we need more than just standard communication rules. It means stepping back and thinking hard about the overall architecture, wrestling with practical headaches like security and discovery (both the explicit kind and the implicit, emergent sort), and acknowledging that these standards themselves will have to adapt as we learn. The journey from today’s often-siloed agents to truly cooperative ecosystems is ongoing, but initiatives like A2A offer valuable markers along the way. It’s undoubtedly a tough engineering road ahead. Yet, the prospect of AI systems that can truly work together and tackle complex problems in flexible ways? That’s a destination worth the effort.