
Vibe Coding is Dead, Long Live Agentic Engineering

Cursor hit $2B ARR. 41% of code is AI-generated. But 87% of AI PRs contain security vulnerabilities. The era of vibe coding is over. Agentic engineering is what comes next.

Augmi Team
agentic-engineering · vibe-coding · cursor · ai-coding · developer-tools · claude-code

A year ago, Andrej Karpathy fired off what he’d later call a “shower of thoughts throwaway tweet” and accidentally named an entire movement. Vibe coding, the practice of conversationally prompting an LLM to write code while you mostly just accept whatever it produces, captured the giddiness of early generative AI tooling. Developers loved it. Product managers loved it. Investors really loved it.

Then people started shipping vibe-coded software to production. Production fought back.

In February 2026, Karpathy returned with a correction. Vibe coding is "passé," he wrote. The new term is agentic engineering, "agentic because the new default is that you are not writing the code directly 99% of the time, you are orchestrating agents who do and acting as oversight. Engineering to emphasize that there is an art & science and expertise to it."

That distinction matters more than it sounds. On one side you've got 41% of worldwide code generated by AI and a $2 billion ARR company that barely existed three years ago. On the other: an 87% vulnerability rate in AI-generated pull requests, open source maintainers drowning in bot submissions, and a growing class of production failures that nobody fully understands because nobody fully wrote the code.

Here’s what’s actually happening.

The numbers are big and real

Let’s start with the stuff that’s hard to argue with.

Cursor crossed $2 billion in annualized revenue in February 2026, according to Bloomberg. That’s a doubling in three months. The first billion took three years. The second took one quarter. Cursor is now valued at $29.3 billion, and roughly 60% of its revenue comes from enterprise customers, a shift from its indie developer roots that tells you where the real money is flowing.

GitHub Copilot hit 4.7 million paid subscribers and penetrated approximately 90% of Fortune 100 companies. At $10/month for the base tier, it's the default entry point. Copilot's suggestion acceptance rate hovers around 30%, which sounds low until you realize that means roughly a third of everything it suggests lands in those codebases verbatim.

41% of all code written worldwide is now AI-generated. That figure comes from aggregated industry surveys and gets cited across enough independent sources that it’s become the consensus benchmark. In 2024, that represented roughly 256 billion lines. Three-quarters of professional developers are either actively using AI coding tools or planning to adopt them this year.

These aren’t projections. This is the baseline.

The production failure problem

Here’s what the growth charts don’t show: vibe-coded software is failing in production at rates that should worry anyone shipping it. The failure modes are predictable, which somehow makes it worse.

The single most common cause? Missing error handling when APIs fail. AI-generated code calls external services without checking response status. When the API returns a 500 or the network drops, the app crashes. A junior developer learns to handle this in their first month. But AI agents don’t learn from production incidents the way humans do. They learn from training data, and training data is full of happy-path examples.
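The fix is not exotic. Here is a minimal sketch of the check that happy-path training data tends to omit; the function and its response shape are illustrative, not drawn from any of the tools discussed:

```python
import json

def parse_api_response(status: int, body: str) -> dict:
    """Interpret an upstream API response without assuming the happy path."""
    if status != 200:
        # The branch vibe-coded output most often skips: a non-200 reply.
        return {"ok": False, "error": f"upstream returned {status}"}
    try:
        return {"ok": True, "data": json.loads(body)}
    except json.JSONDecodeError:
        # Upstream can also return 200 with a malformed or truncated body.
        return {"ok": False, "error": "malformed JSON from upstream"}
```

Two extra branches, maybe six lines, and the difference between a degraded response and a crashed app.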

The numbers get worse when you look systematically. AI-authored code shows 1.7x more issues than human-written code. Misconfigurations are 75% more common. Security vulnerabilities appear at 2.74x the human rate.

A recent study by Help Net Security tested three major coding agents (Claude Code, OpenAI Codex, and Google Gemini) by having them build real applications. Across 30 pull requests, 87% contained at least one security vulnerability. Broken access control appeared in every single agent’s output. OAuth implementations were consistently flawed. Rate limiting middleware was defined but never actually connected. JWT secrets were hardcoded as fallbacks.

The agents wrote code that looked correct. It compiled. It passed basic tests. And it was riddled with the kind of security holes that make a pentester’s day.
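The hardcoded-fallback flaw is easy to show in miniature. This is a hedged sketch of the pattern, not actual output from any of the tested agents:

```python
import os

# Pattern flagged in the study: a hardcoded fallback means the app
# "works" with a guessable signing key whenever the env var is missing.
#   JWT_SECRET = os.environ.get("JWT_SECRET", "dev-secret")  # insecure

def load_jwt_secret() -> str:
    """Fail fast at startup instead of silently using a known fallback."""
    secret = os.environ.get("JWT_SECRET")
    if not secret:
        raise RuntimeError("JWT_SECRET is not set; refusing to start")
    return secret
```

The insecure version compiles, runs, and passes tests, which is exactly why it survives review.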

The deeper problem is context degradation. The longer an agent works on a codebase, the worse its output gets. It loses track of earlier decisions. Code starts contradicting itself. An agent writing 1,000 PRs per week with even a 1% vulnerability rate creates 10 new security holes weekly. Scale the productivity gains, and you scale the risk.

The market fragmented, and that’s probably good

If you haven’t been tracking the AI coding tool landscape closely, you might think it’s still Copilot versus everything else. Not anymore. The market has split into two distinct stacks, and most serious engineering teams use tools from both.

Stack 1: IDE agents, tools embedded directly into your editor that enhance the moment-to-moment coding experience.

  • Cursor ($16/mo) leads here with the most polished UX, background agents, and the largest community. Its enterprise pivot is paying off.
  • Windsurf ($15/mo) ranked #1 in LogRocket’s Power Rankings in February 2026, differentiating through its “Flow” paradigm that maintains context across longer sessions.
  • GitHub Copilot ($10/mo) remains the volume leader with 4.7 million subscribers and that Fortune 100 footprint. It added Agent Mode this year.

Stack 2: Terminal agents, standalone CLI tools that operate autonomously across entire codebases.

  • Claude Code sits at the top of SWE-bench with Opus 4.6 (80.8% score) and a 1 million token context window. Agent Teams allows multi-agent orchestration. This is the one people reach for on complex refactors and autonomous feature builds.
  • Aider is the open-source option with deep git integration, where every AI change gets automatically committed with a descriptive message. Works with Claude, GPT, DeepSeek, and local models.
  • OpenAI Codex runs GPT 5.2 in a CLI interface. Google’s Antigravity launched with multi-agent orchestration from day one.

The practical recommendation emerging from developers who’ve used all of them: pick one from each stack. Cursor or Windsurf for your daily IDE work. Claude Code or Codex for the hard problems where you need an agent to think across hundreds of files for hours.

What Anthropic’s report actually says

Anthropic released its 2026 Agentic Coding Trends Report in March. It’s the most data-grounded assessment of where things stand. Eight trends, but three matter most.

Developers use AI in 60% of their work but fully delegate only 0-20% of tasks. Sit with that for a second. Even at companies where AI coding tools are standard, humans maintain active oversight on 80-100% of what the agent produces. Fully autonomous coding agents aren’t dead as a concept, but the reality is much further away than the hype suggests. Developers treat AI as a “constant collaborator,” not a replacement.

Multi-agent systems are replacing single agents. The report describes organizations moving from one agent doing everything to “groups of specialized agents working in parallel under an orchestrator.” One agent handles testing. Another reviews security. A third writes documentation. This requires a new skill that didn’t exist two years ago: task decomposition for AI systems. Figuring out how to break work into chunks that agents can handle independently, with clear interfaces between them.
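The orchestrator pattern the report describes can be sketched in a few lines. Everything here is illustrative: the role names and the `agents` mapping are assumptions, and a real system would wrap model calls rather than lambdas.

```python
from concurrent.futures import ThreadPoolExecutor

def orchestrate(task: str, agents: dict) -> dict:
    """Fan one decomposed task out to specialized agents in parallel,
    then gather their results under a single orchestrator."""
    with ThreadPoolExecutor() as pool:
        futures = {role: pool.submit(agent, task) for role, agent in agents.items()}
        return {role: f.result() for role, f in futures.items()}

# Toy stand-ins for specialized agents; real ones would call actual models.
agents = {
    "tests": lambda t: f"test plan for: {t}",
    "security": lambda t: f"security review of: {t}",
    "docs": lambda t: f"docs outline for: {t}",
}
```

The hard part is not the fan-out; it's the decomposition, deciding where one agent's responsibility ends and the next one's begins.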

Non-engineers are building software. Domain experts in sales, legal, marketing, and operations are using coding agents to build internal tools. This is different from the no-code movement because the output is actual code that can be version-controlled, tested, and maintained. The report calls this “agentic coding beyond engineering teams.” It’s quietly reshaping who builds software inside organizations.

The security crisis is real and getting structural

The 87% vulnerability rate from the Help Net Security study isn’t an outlier. It reflects a structural problem. AI coding agents don’t understand your application’s risk model. They don’t know your threat landscape or read your security policies. They produce code that’s functionally correct and structurally insecure.

The ten recurring vulnerability patterns are consistent across all major agents: broken access control, business logic failures, OAuth flaws, missing WebSocket auth, rate limiting gaps, weak JWT management. These are textbook OWASP Top 10 issues. The agents know about them in isolation (ask Claude about broken access control and you’ll get a perfect answer) but they consistently fail to apply that knowledge in context.
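Broken access control usually reduces to one missing line: the agent verifies that a user is authenticated but never that the resource belongs to them. A minimal sketch, with hypothetical data shapes:

```python
def get_document(doc_id: str, current_user_id: str, db: dict) -> dict:
    """Fetch a document with the ownership check agents consistently omit."""
    doc = db[doc_id]
    # Authentication proved who the caller is; this is the authorization
    # step that decides what they are allowed to see.
    if doc["owner_id"] != current_user_id:
        raise PermissionError("caller does not own this document")
    return doc
```

Ask an agent about this pattern in isolation and it will explain it perfectly. Embedded in a 40-file feature build, the check quietly disappears.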

The threat extends beyond code quality too. The ClawHub supply chain attack, where researchers found 1,184 malicious packages (roughly one in five in the ecosystem), shows that agent infrastructure itself is becoming an attack surface. When agents can install dependencies, execute code, and access credentials, a compromised agent skill becomes a full system compromise.

A generic security reminder in prompts improved secure output from 56% to 66% in testing with Claude Opus 4.5 Thinking. That’s meaningful but hardly sufficient. The gap between “sometimes writes secure code” and “reliably produces production-grade security” remains wide.

What “structured human oversight” actually looks like

Karpathy’s phrase “acting as oversight” is doing a lot of work, and the industry is just starting to figure out what it means in practice.

The Anthropic report’s data point (60% AI usage, 0-20% full delegation) suggests most teams have arrived at an informal version: use AI for the first draft, review everything, fix what it gets wrong. That works at small scale. It breaks fast. When your agents are producing thousands of lines per day across dozens of repositories, you need systems.

The emerging practice has a few components. Spec-driven development means writing functional specifications that guide agent behavior before the agent writes a line of code. Automated security scanning treats all AI-generated code as untrusted by default, same as you’d treat a pull request from a contractor you’ve never worked with. Agentic governors are senior engineers whose role shifts from writing code to reviewing agent output, maintaining architectural coherence, and verifying compliance. Scaled oversight uses AI itself to review AI output, with humans adjudicating disagreements.
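The "untrusted by default" rule amounts to a gate every agent diff must clear: automated scanners first, then an explicit human sign-off. The scanner interface and approval plumbing below are invented for illustration:

```python
from typing import Callable

Scanner = Callable[[str], list[str]]

def oversight_gate(diff: str, scanners: list[Scanner],
                   human_approved: bool) -> tuple[bool, list[str]]:
    """Merge an agent-authored diff only if every scanner passes
    AND a human has explicitly signed off."""
    findings: list[str] = []
    for scan in scanners:
        findings.extend(scan(diff))
    return (not findings and human_approved, findings)

# Toy scanner: flags a hardcoded-secret assignment in the diff.
def secret_scanner(diff: str) -> list[str]:
    return ["hardcoded secret"] if "JWT_SECRET =" in diff else []
```

Note that neither condition alone is enough: a clean scan without sign-off stays blocked, and so does an approved diff with findings.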

This is the operational reality of agentic engineering. Not glamorous. Doesn’t fit in a tweet. But it’s the difference between shipping reliable software and shipping time bombs.

Enterprise adoption is the real story

Gartner predicts 40% of enterprise applications will integrate task-specific AI agents by the end of 2026, up from less than 5% in 2025. That’s an 8x increase in twelve months.

But Gartner also predicts over 40% of agentic AI projects will be canceled by the end of 2027. Both can be true at the same time: massive adoption followed by massive culling as organizations discover the gap between demo-quality agent output and production-grade reliability.

The enterprise story is really about who buys these tools and why. Cursor’s revenue going from indie-developer-driven to 60% enterprise tells you where the money is moving. Individual developers adopted AI coding tools because they were fun and made them faster. Enterprises are adopting them because the productivity math is hard to ignore. But they’re also discovering that governance, security, and oversight infrastructure needs to come along for the ride.

Where this goes

Agentic engineering isn’t a rebrand of vibe coding. It’s an acknowledgment that vibe coding was a phase, a necessary and occasionally reckless one, and the industry has moved past it.

The developers who thrive here won’t be the ones who write the most code or even the ones who prompt the best. They’ll be the ones who can decompose complex problems into agent-manageable tasks and design oversight systems that catch vulnerabilities before they hit production. Maintaining architectural coherence across a codebase that’s increasingly written by machines is its own skill. We’re only starting to figure out what it looks like.

Writing code was never the hard part. The hard parts are deciding what to build, understanding why it matters, and making sure it works at scale. The agents handle the typing now. Everything else just got more important.


Sources: Bloomberg, TechCrunch, Anthropic 2026 Agentic Coding Trends Report, Gartner, Help Net Security, The New Stack, GitHub. All statistics sourced from Q1 2026 data unless otherwise noted.