
MCP Hit 97 Million Downloads — But Can It Work in Production?

Every major AI company backs MCP. Then Perplexity's CTO said they're abandoning it. Tool descriptions eat 40-72% of context windows. Adoption and production-readiness are very different things.

Augmi Team
Tags: mcp, model-context-protocol, ai-agents, protocols, production, infrastructure


The protocol everyone adopted and nobody mastered


Something strange is happening with the Model Context Protocol. The numbers say it won. 97 million monthly SDK downloads. 10,000+ public servers. Official backing from OpenAI, Google, Anthropic, Microsoft, and Amazon. Bloomberg runs it across 9,500 engineers. Block claims 75% time savings. The Linux Foundation gave it a permanent home.

And yet Perplexity’s CTO just announced they’re ripping it out of their internal stack.

Both of these things are true at the same time. That tension tells you everything about where AI infrastructure actually stands in March 2026.

The adoption metrics don’t lie

Let’s start with what’s real. MCP’s Python and TypeScript SDKs crossed 97 million monthly downloads. Not GitHub stars or conference buzz. Developers pulling packages into actual codebases. For context, Express.js gets about 30 million weekly downloads. React gets about 25 million. MCP is in that tier now.

The enterprise deployments are just as concrete. Bloomberg applied dependency inversion through MCP, treating prompts and toolchains as configuration rather than hardcoded components. They built identity-aware, multi-tenant remote MCP servers and separated agent logic from application logic. Experimentation cycles that took weeks collapsed to minutes. The system is now used by 9,500 engineers across the organization.
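To make the dependency-inversion point concrete, here's a minimal sketch. Every name in it (AgentConfig, build_agent, the toolchain contents) is invented for illustration, not Bloomberg's actual API; the point is that the agent's prompt and tool list live in configuration, so swapping a toolchain is a config change rather than a code change.

```python
# Sketch of dependency inversion for agent toolchains: the agent reads its
# prompt and tool list from configuration instead of hardcoding them.
# All names here are illustrative, not Bloomberg's system.
from dataclasses import dataclass, field

@dataclass
class AgentConfig:
    system_prompt: str
    tools: list[str] = field(default_factory=list)

def build_agent(config: AgentConfig) -> dict:
    # In a real system this would wire up an MCP client for each named tool;
    # here we just return the resolved wiring to show the inversion.
    return {"prompt": config.system_prompt, "tools": sorted(config.tools)}

# Swapping a toolchain is now a config edit, not a code change.
research = AgentConfig("You answer market questions.", ["quotes", "news"])
agent = build_agent(research)
print(agent["tools"])  # ['news', 'quotes']
```

This is the shape that lets "experimentation cycles collapse to minutes": trying a new toolchain means editing configuration, not redeploying agent code.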

When Anthropic donated MCP to the Linux Foundation’s Agentic AI Foundation in December 2025, it removed the last credible objection. OpenAI, Google, Microsoft, Amazon, Block, Bloomberg, and Cloudflare all signed on as platinum members. The “single-vendor protocol” argument died. MCP belongs to the industry now.

Then why is Perplexity leaving?

At Ask 2026, Perplexity CTO Denis Yarats dropped a bomb that ricocheted through every AI engineering Slack channel I’m in. Perplexity is moving away from MCP for internal production systems. They still maintain an MCP server for external developers, but internally, they’re switching to APIs and CLIs.

The reason is specific and damning: tool descriptions eat context windows alive.

Every MCP tool comes with a schema. Name, description, parameters, types, response format. An AI agent needs all of this in its context to decide which tools to call. With a handful of tools, fine. With dozens or hundreds, the overhead becomes absurd. One team documented burning 143,000 of 200,000 available tokens (72% of their context window) on tool definitions alone. Before the user even asks a question.
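To see why the overhead compounds, consider a single tool definition in MCP's schema style. The tool below is invented, and the ~4-characters-per-token heuristic is a crude stand-in for a real tokenizer, but the scaling is the point:

```python
# Rough illustration of how tool schemas consume context budget.
# The schema shape follows MCP's tool-definition style; the tool itself
# is hypothetical and the token estimate is a 4-chars/token approximation.
import json

tool = {
    "name": "query_orders",
    "description": "Query the orders database by customer, date range, and status.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string", "description": "Internal customer ID"},
            "start_date": {"type": "string", "description": "ISO 8601 start date"},
            "end_date": {"type": "string", "description": "ISO 8601 end date"},
            "status": {"type": "string", "enum": ["open", "shipped", "cancelled"]},
        },
        "required": ["customer_id"],
    },
}

def approx_tokens(obj) -> int:
    return len(json.dumps(obj)) // 4  # ~4 characters per token

per_tool = approx_tokens(tool)
for n in (10, 100, 300):
    print(f"{n} similar tools ~ {n * per_tool:,} tokens before the first user message")
```

With ten tools the cost is noise. With a few hundred tools of this size, the definitions alone reach the tens of thousands of tokens, which is how a team ends up spending 143,000 tokens before the user types anything.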

Scalekit’s analysis found MCP costs 4-32x more tokens than CLI equivalents for identical operations. Not a rounding error. An architectural problem.

Yarats also cited authentication friction. Each MCP server handles its own auth flow, which creates compound complexity when an agent needs to talk to fifteen different servers. The Perplexity Agent API, their alternative, routes to models from OpenAI, Anthropic, Google, xAI, and NVIDIA through a single endpoint with one API key.

Here’s my read: Perplexity’s move is pragmatic, not prophetic. Their specific use case (many tools, tight token budgets, latency-sensitive search) hits MCP’s weakest point directly. An enterprise running five MCP servers for internal tooling won’t feel the same pain. But Yarats identified something the MCP community needs to take seriously. The protocol’s verbosity is a real constraint, not a theoretical one.

The security problem nobody wants to talk about

While the adoption debate plays out in public, a quieter and potentially more dangerous problem is building underneath.

On March 19, Qualys published a report calling MCP servers “the new shadow IT for AI.” Their framing is blunt: MCP servers get deployed to test something, bind to localhost on a random port, and then quietly become production dependencies that security teams never see.

The numbers are uncomfortable. According to Astrix research cited in the report, 53% of MCP servers rely on static secrets, long-lived tokens that are difficult to rotate and routinely over-scoped. Even read-only MCP endpoints leak valuable reconnaissance data: internal system names, tool schemas, resource paths, namespace structures.

But the deeper issue is architectural. MCP tools can trigger deployments, modify configurations, run database queries, create tickets. They’re described in natural language and invoked autonomously by AI agents based on context. A prompt injection attack doesn’t need to exploit a traditional vulnerability. It just needs to nudge an agent toward selecting the wrong tool with the wrong parameters.

Doyensec’s March 5 research documented specific attack vectors: tool poisoning (including “rug pulls” where a server changes tool behavior after gaining trust), prompt injection via tool responses, and OAuth implementation flaws enabling one-click account takeover. They found MCP servers that failed to properly bind OAuth state to user sessions, allowing CSRF-style attacks where a malicious link leaks authorization codes.

Solo.io went further, calling MCP authorization “a non-starter for enterprise” in its current form. The core issue: the MCP authorization flow is multi-hop by nature. A user talks to an AI host, which talks to an MCP client, which talks to one or more MCP servers, which talk to backend APIs. Identity and permissions must flow through every boundary. Enterprise SSO wasn’t designed for this topology.
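A toy model makes the topology concrete. Every function below is a stand-in, not real MCP machinery; the point is that the caller's identity must be forwarded at each hop, because any layer that substitutes its own service identity breaks per-user authorization at the backend:

```python
# Toy model of MCP's multi-hop authorization problem: the user's identity
# must survive host -> client -> server -> backend. All names are illustrative.
def host(user: str) -> str:
    return mcp_client(user)     # hop 1: the AI host forwards the caller

def mcp_client(user: str) -> str:
    return mcp_server(user)     # hop 2: the client forwards again

def mcp_server(user: str) -> str:
    return backend_api(user)    # hop 3: the server must NOT swap in its
                                # own service identity here

def backend_api(user: str) -> str:
    allowed = {"alice": ["orders:read"]}
    if "orders:read" not in allowed.get(user, []):
        raise PermissionError(f"{user} lacks orders:read")
    return f"orders for {user}"

print(host("alice"))
```

If the middle hop passed a shared service account instead of the user, the backend could no longer enforce per-user permissions at all. That substitution is exactly what happens in many current deployments, and it's the gap enterprise SSO was never designed to close.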

The three-layer stack taking shape

While these debates continue, something architecturally significant is crystallizing. Three protocols are forming a stack that looks like the layered model that made the internet work.

MCP sits at the tool layer. How agents access external capabilities: databases, APIs, file systems, services. Think of it as the application layer of agentic AI.

A2A handles collaboration. Google released A2A in April 2025 and donated it to the Linux Foundation two months later. It defines how agents discover each other (via Agent Cards at /.well-known/agent-card.json), delegate tasks, synchronize state, and authenticate. If MCP gives agents hands, A2A gives them colleagues.
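A minimal sketch of what discovery against an Agent Card can look like. The field names below are simplified for illustration and do not reproduce the full A2A schema; real cards are fetched from /.well-known/agent-card.json on the peer's host.

```python
# Simplified A2A-style discovery: read an Agent Card and check whether a
# peer advertises a capability. Field names are illustrative, not the
# complete A2A Agent Card schema.
import json

card_json = """
{
  "name": "supplier-agent",
  "url": "https://supplier.example.com/a2a",
  "skills": [
    {"id": "check_stock", "description": "Report stock levels for a SKU"},
    {"id": "place_order", "description": "Place a purchase order"}
  ]
}
"""

card = json.loads(card_json)

def has_skill(card: dict, skill_id: str) -> bool:
    return any(s["id"] == skill_id for s in card.get("skills", []))

print(has_skill(card, "check_stock"))   # True
print(has_skill(card, "cancel_order"))  # False
```

The well-known URL is the key design choice: an agent can decide whether a peer is worth delegating to before opening any stateful connection.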

WebMCP opens the browser. In February 2026, Chrome 146 Canary shipped with built-in WebMCP support, developed jointly by Google and Microsoft engineers through the W3C. WebMCP lets any website expose structured, callable tools to AI agents through a browser API (navigator.modelContext). Two approaches: a declarative API that annotates existing HTML forms, and an imperative JavaScript API for complex interactions.

This part doesn’t get enough attention. WebMCP means billions of existing web pages could become structured tool interfaces for agents. Not through scraping or prompt engineering, through a native browser API that websites opt into. I think this will matter more than most people realize.

Google published a developer guide showing all three protocols working together: a supply chain agent that checks inventory databases (MCP), coordinates with supplier agents (A2A), executes transactions, and renders dashboards, all through standardized protocol layers rather than custom integration code.

The roadmap admits what’s broken

The MCP team published their 2026 roadmap on March 9, and it’s one of the more honest technical roadmaps I’ve read. No aspirational hand-waving. Four priority areas, each addressing problems the community has already hit.

Transport scalability. Stateful sessions fight with load balancers. Horizontal scaling requires workarounds. No standard discovery mechanism exists for registries. The fix: evolving the HTTP transport model and creating .well-known metadata formats so servers can advertise capabilities without requiring a full connection.

Agent communication. The Tasks primitive (SEP-1686) shipped experimentally and real deployments surfaced gaps. Retry semantics for transient failures don’t exist. Expiry policies for completed results aren’t defined. Boring but essential. These are the details that separate experimental from production-grade.

Governance maturation. The current bottleneck is that everything flows through core maintainers. The new model: specialized working groups that can approve proposals in their domain, with core maintainers keeping strategic oversight. This is how you scale an open-source protocol without creating governance gridlock.

Enterprise readiness. Audit trails, SSO-integrated auth, gateway behavior, configuration portability. There’s no dedicated working group yet. It’s an open invitation. The roadmap explicitly notes that most solutions will extend the spec rather than modify core functionality.

No fixed release dates. Working groups control delivery timelines. This is either mature governance or a recipe for drift, depending on your experience with open-source foundations. Honestly, I could see it going either way.

What this means if you’re building

Here’s where I land after spending weeks in the data.

MCP will survive its growing pains. The combination of network effects, institutional backing, and vendor-neutral governance creates durability that individual technical complaints can’t overcome. HTTP survived much worse criticism in the 1990s.

The context window problem is real but solvable. Dynamic tool loading, registry-based filtering, and compressed schema formats are all active areas of development. As context windows expand (and they will), the ratio of overhead to useful context improves. But teams running 50+ tools today need workarounds now. Not eventually.
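As a sketch of registry-based filtering, the snippet below scores an invented tool registry against the user's request with naive keyword overlap and forwards only the top matches, so the other schemas never enter the context window. A production filter would use embeddings or server-side search, but the shape is the same:

```python
# Sketch of one context-window workaround: instead of sending every tool
# schema to the model, filter the registry down to tools that look relevant
# to the current request. Keyword overlap is a deliberately crude signal.
TOOLS = {
    "query_orders": "look up customer orders by id date or status",
    "refund_payment": "issue a refund for a payment or order",
    "create_ticket": "open a support ticket for an engineer",
    "deploy_service": "deploy a service to production",
}

def select_tools(query: str, registry: dict[str, str], k: int = 2) -> list[str]:
    words = set(query.lower().split())
    # Rank tools by how many query words appear in their description.
    scored = sorted(
        registry,
        key=lambda name: -len(words & set(registry[name].split())),
    )
    return scored[:k]

print(select_tools("why was my order refund rejected", TOOLS))
```

Only the selected schemas get serialized into the prompt; the rest of the registry stays out of the token budget entirely.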

Enterprise security is 12-18 months behind adoption. If you’re deploying MCP in a regulated industry, you need gateway solutions, centralized auth, and audit logging that the spec doesn’t yet provide. Third-party MCP gateways (Cloudflare, Solo.io, and others) are filling this gap, but the landscape is fragmented.

The three-layer stack is the right mental model. MCP for tools, A2A for agents, WebMCP for web interfaces. Building deeply against one while ignoring the others will create the same kind of technical debt that SOAP-era architectures accrued by ignoring REST.

Perplexity is an edge case, not the center. Their token-sensitive, many-tool, low-latency use case amplifies MCP’s weaknesses. Most enterprise deployments with 5-20 tools in a 200K context window won’t feel the same pressure. But if you’re building something like Perplexity, with many integrations and tight response budgets, pay attention to their experience.

The honest summary: MCP is probably the most important AI infrastructure protocol of the decade. It’s also immature, insecure by default, and expensive in ways its creators are only now addressing. Both things are true. The teams that succeed will be the ones who adopted early enough to learn the failure modes and disciplined enough to build guardrails the spec doesn’t yet require.

97 million downloads bought MCP inevitability. The next year determines whether it earns trust.


Sources: MCP Official Blog, Qualys TotalAI, Doyensec, Solo.io, Bloomberg case study via Glama, Perplexity Ask 2026, Google Developers Blog, VentureBeat, The New Stack, Linux Foundation AAIF announcement.
