Last updated: March 2026

OpenClaw held the top spot through much of February 2026. That is a meaningful signal for agent interest, but not enough on its own to settle the broader market story.
According to Artificial Analysis, an independent AI benchmarking firm, we’ve crossed into what they call the “post-ChatGPT era”: one where personal AI agents, not chatbots, define how people interact with AI. Their 2025 year-end report lays out five major trends, and the first three all point to the same conclusion: 2026 is the year agents go mainstream.
The Chinese tech publication “特工宇宙” (Agent Universe) connected the dots in a March 1 piece: OpenClaw’s sustained visibility may reflect more than one tool getting popular. It suggests the infrastructure for agentic AI is maturing enough to support wider adoption.
Reasoning Models Became Standard Issue
A year ago, OpenAI’s o1 was the only reasoning model on the market. By the end of 2025, every major lab had shipped one. These models don’t just respond — they “think” before answering, generating internal reasoning chains that boost performance on complex tasks.
The shift shows up in the benchmarks. At the start of 2025, the smartest models were non-reasoning systems. By December, the top three spots all belonged to reasoning-first architectures from OpenAI, Anthropic, and Google.
This matters for agents because reasoning models handle multi-step workflows better. They can plan, backtrack, and adjust strategy mid-task instead of blindly executing commands. That’s the difference between a chatbot that answers questions and an agent that gets things done.
Code Agents Led, Enterprise Agents Follow
2025 was the year coding agents proved the concept. Tools like Cursor, Windsurf, and Cline showed that AI could handle long-horizon tasks: refactoring codebases, debugging across multiple files, implementing features end-to-end.
Artificial Analysis expects 2026 to be the year that same capability spreads beyond code. Agents that can navigate enterprise software, coordinate across tools, and execute multi-step business workflows without constant human supervision.
OpenClaw sits right at this inflection point. It’s not a coding tool — it’s a general-purpose agent runtime that connects to email, calendars, messaging platforms, cloud services, and local systems. Its adoption curve suggests that users are increasingly willing to try agents that do more than write code.
The Security Reckoning Arrived Early
OpenClaw’s popularity also made it a target. In late February, security researchers at Oasis Security disclosed “ClawJacked,” a high-severity vulnerability that let malicious websites hijack locally running OpenClaw instances via WebSocket connections.
The attack worked because OpenClaw’s gateway trusted localhost connections too much. A developer visiting a compromised site could have their agent silently taken over — no user prompt, no warning. The attacker would gain full control: read logs, dump configs, interact with connected services.
OpenClaw patched it in under 24 hours (version 2026.2.25), but the incident exposed a broader problem: AI agents have massive blast radius. They hold credentials for multiple systems, execute commands across enterprise tools, and operate with elevated permissions. Compromise one agent, compromise everything it touches.
Other vulnerabilities followed: log poisoning (CVE-2026-25593), command injection (CVE-2026-24763), authentication bypass (CVE-2026-25475). All patched, but the pattern is clear — agentic AI introduces a new attack surface that traditional security models weren’t built for.
Microsoft’s Defender team issued a blunt advisory: treat OpenClaw like untrusted code execution with persistent credentials. Don’t run it on your main workstation. Use isolated VMs, dedicated non-privileged credentials, and continuous monitoring.
The Malware Ecosystem Evolved Too
ClawHub, OpenClaw’s skill marketplace, became a distribution channel for malware. Researchers found 71 malicious skills disguised as legitimate tools — cryptocurrency utilities that redirected funds, image generators that stole wallet keys, productivity scripts that downloaded Atomic Stealer.
One threat actor, operating as “BobVonNeumann” on Moltbook (a social network for AI agents), ran an agent-to-agent scam. Their malicious agent promoted infected skills directly to other agents, exploiting the trust that agents extend to each other by default. Supply chain attack meets social engineering.
Trend Micro documented the infection chain: a normal-looking SKILL.md file installs a “prerequisite” by fetching instructions from an external site. The instructions include a command to download and run an Atomic Stealer payload. The skill passes VirusTotal scans because the malicious code isn’t in the skill itself — it’s fetched at runtime.
This is the new threat model. Agents don’t just execute what you tell them. They fetch dependencies, follow instructions from external sources, and make decisions about what code to run. If an attacker can inject malicious instructions anywhere in that chain, they own the agent.
What the Numbers Say
Artificial Analysis tracked cost and intelligence improvements through 2025. The standout stat: o1-level intelligence got 128x cheaper over the year. Smaller models reached higher capability levels, and hardware/software optimizations drove per-token costs down.
By year-end, OpenAI, xAI, and Anthropic held a clear lead in reasoning model intelligence, pulling ahead of other labs. But the gap between open-weight and proprietary models stayed roughly constant — open models kept pace, but didn’t close the frontier gap.
China’s labs, particularly those in Beijing, emerged as the center of gravity for open-weight frontier models. DeepSeek, Moonshot AI, and Minimax all shipped competitive reasoning systems. The U.S. still has strong proprietary contenders such as Google Gemini 3.1 Pro, while China remains especially active in open-weight releases.
The Agent Workflow Shift
One insight from the report: when you move to agent workflows, more output tokens doesn’t mean more intelligence. What matters is effective tool use.
Reasoning models generate long internal monologues during the “thinking” phase, which inflates token counts. But the real performance gains come from how well the model coordinates external tools: APIs, databases, file systems, other agents.
This changes how you evaluate models. Benchmark scores on static datasets matter less. What matters is: can it maintain context across a 50-step workflow? Does it recover gracefully from API errors? Can it parallelize subtasks?
OpenClaw’s architecture is built for this. It’s not a model — it’s a runtime that lets any model (Claude, GPT, Gemini) orchestrate tools and maintain state across long-running tasks. That helps explain why it has continued to attract adoption even as the underlying models keep changing.
Where This Goes
Artificial Analysis predicts 2026 as the year general-purpose agents move from early adopters to mainstream enterprise use. The infrastructure is ready: reasoning models are standard, tool-calling is universal, and frameworks like OpenClaw handle the orchestration layer.
But the security model isn’t ready. Microsoft’s advice to run agents in isolated VMs isn’t practical at scale. If agents are going to operate across enterprise systems, we need better identity isolation, runtime sandboxing, and credential management.
The malware problem needs solving too. ClawHub’s open marketplace model is powerful but vulnerable. We need better supply chain security: code signing, sandboxed skill execution, runtime behavior monitoring.
OpenClaw’s February surge suggests that demand exists. The security incidents make clear that the risks are real. 2026 will determine whether the industry can scale agents safely, or whether security constraints slow adoption.
The post-ChatGPT era is here. The question is whether we’re ready for it.
Related guide: OpenAI’s $110B Funding Round and What It Signals for the AI Market.
Related Reading:
- Best AI Coding Tools 2026
- AI Security Tools and Best Practices
- Enterprise AI Adoption Trends