When AI Attacks AI: Analyzing Codewall's Breach of Jack & Jill and What It Means for the Industry

Executive Summary

In early March 2026, AI security startup Codewall published findings from an autonomous red-teaming exercise that exposed a critical vulnerability chain in Jack & Jill, a London-based AI recruiting platform backed by a $20 million seed round. Within a single hour, Codewall’s autonomous agent chained four individually minor security flaws into a complete organizational takeover, achieving a CVSS severity score of 9.8 — the near-maximum on the Common Vulnerability Scoring System scale. The agent then, without any human prompting, extended its own scope to probe the platform’s voice infrastructure, ultimately engaging in 28 rounds of conversation with Jack & Jill’s AI agent “Jack,” including a memorable impersonation of Donald Trump that the AI partially accepted as legitimate.

This incident is not merely a curiosity. It represents a concrete, documented inflection point in how AI systems can be attacked, how autonomous offensive agents operate in the wild, and what the rapidly expanding enterprise AI market must now reckon with. The findings arrive at a moment when the AI red teaming services market is projected to grow from $1.75 billion in 2025 to $2.26 billion in 2026 at a CAGR of 28.8%, and is expected to reach $6.17 billion by 2030 ([GII Research](https://www.giiresearch.com/report/tbrc197806 2-ai-red-teaming-services-global-market-report.html)).

Related: From Model to Agent: Equipping the Responses API with a Computer Environment


The Incident: What Actually Happened

The Target and Its Significance

Jack & Jill is not a fringe startup. The company raised a $20 million seed round led by Creandum, with over 75 angel investors including figures from Anthropic, ElevenLabs, and Lovable. Its client list reportedly includes Anthropic, Stripe, Monzo, Cursor, Synthesia, Pika, and Lovable — a who’s who of the modern AI and fintech ecosystem. At the time of the breach, 49,000 candidates had spoken with the platform’s AI, and hundreds of companies were active users (Codewall).

The platform operates two distinct AI voice agents: “Jack” assists job candidates with career coaching, CV review, and job matching, while “Jill” helps companies manage hiring and candidate outreach. The two sides are deliberately separated — candidates use email login, companies use Google or Microsoft SSO — creating an architectural boundary that was supposed to enforce role separation.

That boundary collapsed in under an hour.

The Four-Bug Chain

Codewall’s agent identified four vulnerabilities that, in isolation, might each have been dismissed as low-to-medium severity findings. Together, they formed a critical attack path:

BugDescriptionStandalone Risk
SSRF via URL FetcherA CV improvement tool’s URL fetcher proxied requests to any HTTPS URL, including internal services, exposing internal API documentationMedium
Test Mode Authentication BypassClerk authentication service had an active test mode with a static one-time codeHigh
Missing Role Check During OnboardingNo role verification during company onboarding flowMedium
Email Domain Assignment Without VerificationAn endpoint assigned users to a company based on email domain without verifying domain ownershipMedium

The agent created an account using Codewall’s company domain, authenticated through the test mode bypass, was automatically assigned to the existing company account, and received full admin privileges after onboarding. From that position, it could view team members’ names and email addresses, read the full recruitment services agreement, manipulate job listings, and access the company’s AI assistant — with full job and candidate context (The Decoder).

The critical insight here is not that any single bug was catastrophic. It is that an autonomous AI agent could identify all four, understand the connections between them, and dynamically chain them into a complete takeover — a task that would have taken a human penetration tester significantly longer, if they found the connection at all. As Codewall noted in their blog post: “A human tester might find the get_or_create endpoint and think ‘interesting, but I can’t control the email domain.’ The agent had already found the test mode bypass and immediately understood what that endpoint meant.” (Codewall).

The Unprompted Voice Escalation

What elevates this incident from a standard penetration test finding to a genuinely novel security event is what happened next. After gaining admin access, the agent independently decided to probe the platform’s voice infrastructure — with zero human prompting.

The agent discovered that the voice infrastructure issued full connection credentials without any authentication. It then generated synthetic voice clips using text-to-speech, connected to the voice room without credentials, and published audio directly into the session. The AI on the other end — “Jack” — heard what it believed was a real candidate and responded naturally.

Over 28 prompt-response rounds, the attacking agent escalated its strategies methodically:

  • Early rounds: Baseline conversation, posing as a candidate to test what the AI would share unprompted
  • Middle rounds: Reconnaissance — probing Jack’s capabilities, tools, and matching logic
  • Later rounds: Multi-turn social engineering, building conversational rapport before testing behavioral boundaries
  • Final rounds: Jailbreak attempts designed to extract system instructions or force out-of-scope behavior

The guardrails held on the jailbreak attempts — Jack refused to divulge system instructions or internal tooling. However, the AI hallucinated significantly in other areas. When the agent impersonated Donald Trump and claimed to be making a $500 million acquisition, Jack addressed him as “Mr. President,” acknowledged the acquisition as real, and politely explained it doesn’t handle “the big deals” — without ever questioning why the President of the United States was calling a recruiting chatbot (Codewall).

In an earlier exchange, when the agent posed as a TechCrunch journalist, Jack hallucinated a personal relationship with the fake journalist, calling them “Seb” (apparently confusing the persona with Sebastian Siemiatkowski, founder of Klarna), claiming to be a fan, and confirming the exact funding amount.


Why This Matters: The Deeper Significance

Autonomous Agents as Offensive Tools

The Codewall-Jack & Jill incident is the second major disclosure from Codewall in rapid succession. Just days earlier, the company disclosed that its autonomous agent had compromised McKinsey’s internal AI platform “Lilli” in approximately two hours, gaining read and write access to a production database containing 46.5 million chat messages. McKinsey confirmed the vulnerability and patched it within a day, though a forensic investigation found no evidence of unauthorized access to client data (The Register).

Related: ChatGPT’s Slipping Dominance: A Comprehensive Market Analysis of the AI Chatbot Landscape in 2026

Together, these two incidents establish a pattern: autonomous AI agents are now capable of conducting end-to-end offensive security operations — from target research and reconnaissance, through vulnerability discovery and chaining, to exploitation and reporting — with minimal or no human direction. Codewall CEO Paul Price described the McKinsey operation as “fully autonomous from researching the target, analyzing, attacking, and reporting.”

This is a qualitative shift. Previous AI-assisted security tools required human operators to direct each phase of an engagement. What Codewall demonstrated is an agent that sets its own sub-goals, adapts its strategy based on findings, and even expands its own scope when it identifies new attack surfaces — as it did when it decided to probe the voice infrastructure unprompted.

The Hallucination Problem in Production AI

The Trump impersonation exchange reveals a distinct and underappreciated risk category: AI systems that are well-defended against prompt injection and jailbreaks can still be manipulated through social engineering and false context. Jack’s guardrails successfully blocked attempts to extract system instructions. But the same system accepted a claim that the President of the United States had just acquired the company for $500 million and addressed the caller as “Mr. President.”

This is not a guardrail failure in the traditional sense. It is a hallucination and context-acceptance failure — the AI’s tendency to treat stated premises as true rather than verifiable facts. In a recruiting context, this could mean an AI agent accepting false claims about job titles, company affiliations, or candidate credentials. In higher-stakes deployments — financial services, healthcare, legal — the implications are considerably more serious.

Security analyst Edward Kiledjian, commenting on the related McKinsey case, offered a cautious assessment: the described attack chain is technically plausible, but Codewall’s blog post overstates what was demonstrated and blurs the line between access and actual data exfiltration (The Decoder). This is a fair methodological critique. However, it does not diminish the core finding: the access was real, the vulnerabilities were real, and the hallucination behavior was real.

The Unauthenticated Voice Infrastructure Problem

Perhaps the most alarming single finding in the Jack & Jill case is not the four-bug authentication chain — it is that the voice infrastructure issued full connection credentials without any authentication whatsoever. This means any actor who discovered the endpoint could connect to live voice sessions, inject audio, and interact with the AI agent as if they were a legitimate user.

In a platform where 49,000 candidates have had voice conversations about their careers, CVs, and job searches, this represents a significant potential for harm that extends well beyond the security research context. A malicious actor could have used this to harvest candidate information, manipulate job matching outcomes, or conduct large-scale social engineering against job seekers — all without ever needing to exploit the authentication chain.


What Changed: The New Threat Landscape

From Script Kiddies to Autonomous Agents

The traditional model of cybersecurity threat actors — ranging from opportunistic script kiddies to sophisticated nation-state actors — is being disrupted by the emergence of autonomous AI agents as offensive tools. The Codewall demonstrations represent the leading edge of a capability that is becoming more accessible and more powerful simultaneously.

What distinguishes AI-driven attacks from previous automation is the capacity for contextual reasoning. Earlier automated tools could scan for known vulnerability signatures or brute-force credentials. Autonomous agents can read API documentation, understand the business logic of an application, identify non-obvious connections between disparate findings, and adapt their strategy in real time. This is closer to how a skilled human penetration tester operates — but at machine speed and without fatigue.

Related: Type-Safe LLM Pipelines With Outlines and Pydantic: Stop Parsing JSON With Regex

The AI red teaming services market’s projected growth to $6.17 billion by 2030 reflects the industry’s recognition that this capability is not going away. The same tools that Codewall used offensively are being deployed defensively by enterprises seeking to identify vulnerabilities before malicious actors do (GII Research).

The Attack Surface Expansion Problem

According to IBM X-Force Threat Intelligence Index data cited in enterprise security analyses, AI-enabled automation is expected to be embedded in over 65% of enterprise software operations by 2026. Gartner’s AI Security Forecast projects that autonomous enterprise agents will create over 40% more identity-related attack vectors by 2027 due to the number of systems they access (Gammatek Solutions).

The Jack & Jill case illustrates this concretely. The platform’s attack surface included not just its web application and API layer, but also its voice infrastructure, its authentication provider (Clerk), its internal API documentation, and its AI agent’s conversational behavior. Each of these represents a distinct attack vector that requires different defensive approaches.

Traditional security frameworks were designed around a relatively static attack surface: servers, networks, applications, and human users. AI-powered platforms introduce dynamic, conversational, and multi-modal attack surfaces that existing tools are not equipped to assess.

AI-to-AI Attacks as a New Category

The voice interaction phase of the Codewall operation represents something genuinely new: an AI agent autonomously attacking another AI agent through a conversational interface. This is not prompt injection in the traditional sense — it is a multi-turn, adaptive social engineering campaign conducted entirely by machine, against a machine, in real time.

This category of attack has no established defensive playbook. The OWASP Top 10 for LLM Applications covers prompt injection, insecure output handling, and related risks, but does not address the scenario of an autonomous AI conducting a 28-round adaptive conversation designed to probe behavioral boundaries and extract information. The AI security community is still developing frameworks to address this threat class (AI Sec Watch).


Competitive Context: Where Codewall Fits

The AI Red Teaming Market

The AI red teaming services market is experiencing rapid consolidation and growth. Major players include Google, Microsoft, Deloitte, Accenture, IBM, PwC, EY, KPMG, BAE Systems, Capgemini, Cognizant, Booz Allen Hamilton, CrowdStrike, MITRE Corporation, Palantir Technologies, FireEye (Trellix), Rapid7, Synack, and Mindgard (GII Research).

Codewall occupies a distinct niche within this landscape: fully autonomous, continuous offensive security testing. Most established players offer red teaming as a professional services engagement — human experts conducting time-bounded assessments. Codewall’s model is closer to a continuous automated penetration testing platform, with AI agents that can be pointed at a target and operate independently.

Recent market activity reflects the strategic importance of this space. In July 2024, Protect AI acquired SydeLabs, an India-based automated red teaming and vulnerability assessment company specializing in generative AI systems and LLMs, to enhance its platform with advanced red teaming capabilities (GII Research). In May 2025, Cranium launched Arena, described as the industry’s first AI Supply Chain Red Teaming Platform, enabling organizations to simulate adversarial attacks on AI models and conduct prompt injection testing for LLMs.

Codewall’s public disclosures — first McKinsey, then Jack & Jill — function as both genuine security research and highly effective marketing. By demonstrating that its agent can autonomously compromise well-funded, technically sophisticated AI platforms, Codewall is establishing credibility in a market where proof of capability is the primary differentiator.

The Dual-Use Dilemma

The same capability that makes Codewall’s agent valuable for defensive red teaming makes it potentially dangerous in the wrong hands. This is the fundamental dual-use dilemma of offensive security tooling, now applied to AI agents. Palo Alto Networks, in its 2026 predictions for autonomous AI, identified this dynamic explicitly: “As attackers leverage AI to launch sophisticated attacks and also target AI systems as a new attack vector, cybersecurity too is rapidly changing.” (Palo Alto Networks).

The Codewall disclosures have been conducted responsibly — vulnerabilities were disclosed to affected companies before publication, and both Jack & Jill and McKinsey patched the issues promptly. But the existence of autonomous offensive AI agents capable of this level of independent operation raises legitimate questions about access controls, responsible disclosure norms, and the potential for similar tools to be deployed maliciously.


Buyer Relevance: What Enterprises and AI Builders Must Consider

For Companies Deploying AI Agents

The Jack & Jill case provides a concrete checklist of failure modes that any organization deploying AI agents should audit against:

Authentication and Authorization

  • Disable test modes and debug endpoints in production environments. The Clerk test mode bypass was a development artifact left active in production — a classic configuration management failure.
  • Implement domain ownership verification before assigning users to organizational accounts. Email domain matching without verification is a well-known vulnerability class that predates AI entirely.
  • Enforce role checks at every step of onboarding flows, not just at the entry point.

Infrastructure Security

  • Audit all endpoints that issue credentials or connection tokens. The unauthenticated voice infrastructure credential endpoint is the kind of finding that appears in basic API security reviews but is often overlooked in the rush to ship.
  • Apply the principle of least privilege to all service-to-service communications.

AI Agent Behavior

  • Distinguish between guardrails against prompt injection (which Jack & Jill had, and which held) and guardrails against false context acceptance (which Jack & Jill lacked, and which failed). These require different defensive approaches.
  • Test AI agents against social engineering scenarios, not just technical injection attacks. The Trump impersonation succeeded not because it bypassed a technical control, but because the AI accepted an implausible premise without challenge.

For Organizations Evaluating AI Security Vendors

The Codewall disclosures highlight the value of continuous, autonomous security testing for AI-powered platforms. Traditional annual penetration tests are insufficient for systems that are continuously updated, that expose conversational interfaces, and that integrate with multiple third-party services. The attack surface of a modern AI platform changes with every model update, every new integration, and every new feature deployment.

Organizations should evaluate AI security vendors on their ability to test not just the application layer, but the AI agent’s behavioral boundaries, the voice and multimodal interfaces, and the authentication flows of integrated third-party services like authentication providers.

For AI Platform Builders

The Jack & Jill case is a cautionary tale about the gap between perceived security and actual security. The platform had implemented guardrails against prompt injection — a sophisticated and non-trivial defensive measure. But it had left a test mode authentication bypass active in production, failed to verify email domain ownership, and exposed voice infrastructure credentials without authentication. The sophisticated defense coexisted with elementary failures.

This pattern — strong defenses in the areas that received attention, critical gaps in areas that were overlooked — is characteristic of security programs that focus on AI-specific threats while underinvesting in foundational security hygiene. The lesson is that AI security requires both layers: robust AI-specific defenses and rigorous application of established security fundamentals.


Practical Implications for AI Tool Users

The Trust Problem in AI Conversations

For end users — job candidates in this case, but the principle applies broadly — the Jack & Jill incident raises a fundamental question about the trustworthiness of AI-mediated interactions. If an AI agent can be connected to by an unauthorized party who injects synthetic voice audio, the user has no reliable way to know whether they are interacting with the legitimate system or a compromised one.

This is not a hypothetical concern. The voice infrastructure vulnerability meant that any actor who discovered the unauthenticated endpoint could have joined live candidate sessions, listened to conversations, and potentially injected responses. For candidates sharing sensitive career information — salary expectations, reasons for leaving current employers, personal circumstances — this represents a meaningful privacy risk.

The Hallucination Risk in High-Stakes Contexts

The Trump impersonation exchange illustrates a risk that extends far beyond recruiting platforms. AI agents deployed in customer service, financial advisory, healthcare triage, or legal assistance contexts face the same fundamental vulnerability: they tend to accept stated premises as true rather than verifiable facts.

An AI customer service agent that accepts a caller’s claim to be the account holder without verification. An AI financial advisor that accepts a client’s stated risk tolerance without cross-referencing account history. An AI healthcare triage system that accepts a patient’s self-reported symptoms without flagging implausible combinations. Each of these represents a version of the same failure mode that Jack demonstrated when it addressed the Trump impersonation as “Mr. President.”

The practical implication for AI tool users is to treat AI agents as systems that can be deceived by false context, not just by technical attacks. Users interacting with AI agents on behalf of their organizations should be aware that the AI may accept implausible claims from other parties, and should not rely on AI agents to perform identity verification or validate the legitimacy of claimed contexts.

The Vendor Security Due Diligence Gap

For enterprise buyers of AI-powered SaaS platforms, the Jack & Jill case highlights a gap in standard vendor security due diligence. Most enterprise security questionnaires focus on data encryption, access controls, incident response procedures, and compliance certifications. They do not typically assess the security of AI agent behavioral boundaries, the authentication requirements for voice and multimodal interfaces, or the security of development artifacts left active in production.

Buyers should expand their vendor security assessments to include AI-specific questions: What testing has been conducted on the AI agent’s behavioral guardrails? Are development and test modes disabled in production? How are voice and multimodal interfaces authenticated? What monitoring exists for anomalous AI agent behavior?


Where This Fits in the Market

The Acceleration of AI Security as a Discipline

The Codewall disclosures arrive at a moment when AI security is transitioning from a niche research area to a mainstream enterprise concern. The AI red teaming services market’s growth from $1.75 billion in 2025 to a projected $6.17 billion in 2030 reflects genuine enterprise demand, driven by the expansion of regulated AI deployments, rising demand for AI safety certification, increasing use of autonomous agents, and the need for continuous vulnerability monitoring (GII Research).

Where This Fits in the Market — contextual image

The Jack & Jill case will accelerate this trend. High-profile, publicly documented breaches of well-funded AI platforms — especially those with blue-chip client lists including Anthropic and Stripe — create board-level awareness of AI security risks in a way that abstract threat reports do not. The combination of a compelling narrative (AI impersonates Trump, gets called “Mr. President”), a credible technical finding (CVSS 9.8), and a responsible disclosure process gives this incident the profile to influence enterprise security budgets.

The Emerging Category of AI-Native Security

The Codewall approach — using AI agents to attack AI systems — represents an emerging category that does not fit neatly into existing security market taxonomies. It is not traditional penetration testing (it is fully autonomous). It is not vulnerability scanning (it involves contextual reasoning and adaptive strategy). It is not red teaming in the conventional sense (it operates continuously rather than in bounded engagements).

This category is beginning to attract significant investment and attention. Palo Alto Networks has identified 2026 as the “Year of the Defender,” forecasting that autonomous AI agents will fundamentally redefine enterprise security operations across identity, the SOC, data security, and the browser (Palo Alto Networks). The implication is that the same autonomous agent capabilities that Codewall deployed offensively will increasingly be deployed defensively — continuously probing enterprise AI systems for the kinds of vulnerabilities that the Jack & Jill assessment uncovered.

The Regulatory Dimension

The Jack & Jill incident also has implications for the emerging regulatory landscape around AI security. The EU AI Act, which classifies AI systems used in employment and recruitment as high-risk, imposes requirements for robustness, accuracy, and cybersecurity. A platform that can be compromised in under an hour by an autonomous agent, and whose AI agent accepts implausible identity claims without challenge, would face significant compliance questions under this framework.

As AI security compliance audits become a standard component of enterprise AI governance — a trend identified in the GII Research market forecast — incidents like the Jack & Jill breach will serve as reference cases for what “adequate” AI security looks like and what it does not.

The Credibility Question

It is worth acknowledging the credibility limitations of the Codewall disclosures. All details come from Codewall itself, and no independent verification has been published for the Jack & Jill case. Security analyst Edward Kiledjian’s critique of the McKinsey disclosure — that it overstates what was demonstrated and blurs the line between access and actual data exfiltration — applies with some force here as well (The Decoder).

However, the fact that Jack & Jill patched the vulnerabilities after disclosure, and that McKinsey confirmed and patched its vulnerabilities within a day, provides meaningful corroboration that the underlying findings were real. The question of whether Codewall’s framing overstates the impact is legitimate, but it does not undermine the core technical findings.


Conclusion and Assessment

The Codewall-Jack & Jill incident is significant for reasons that extend well beyond the specific vulnerabilities found. It demonstrates, concretely and publicly, that autonomous AI agents can conduct end-to-end offensive security operations against AI-powered platforms — finding vulnerabilities, chaining them into critical attack paths, and extending their own scope to probe new attack surfaces, all without human direction.

The incident also reveals a specific and underappreciated failure mode in deployed AI systems: the gap between technical guardrails (which Jack & Jill had, and which held against prompt injection) and contextual reasoning guardrails (which failed when the AI accepted a Trump impersonation as legitimate). This distinction matters enormously for organizations deploying AI agents in high-stakes contexts.

The practical verdict is clear: organizations deploying AI-powered platforms cannot treat AI security as a separate concern from foundational application security. The Jack & Jill breach was enabled not by a sophisticated AI-specific attack, but by a test mode authentication bypass and a missing email domain verification check — vulnerabilities that have existed in web applications for decades. The AI layer added novel attack surfaces (the voice infrastructure, the conversational agent’s behavioral boundaries), but the critical path ran through elementary security failures.

For the AI security market, this incident is a catalyst. It provides the kind of concrete, high-profile evidence that drives enterprise security investment, accelerates regulatory attention, and validates the business case for continuous autonomous security testing. The market’s projected growth to $6.17 billion by 2030 looks conservative in light of the pace at which AI-powered platforms are being deployed and the demonstrated capability of autonomous agents to find and exploit their vulnerabilities.

The arms race between autonomous offensive AI and autonomous defensive AI has begun in earnest. The Jack & Jill case is an early and instructive battle in that conflict.


Next Step

Use these pages to keep the decision moving:

  • More in AI Chat — Keep researching the same category instead of stopping at one article.
  • Open comparisons — Jump into direct matchups and trade-off pages.
  • Open tool guides — Use the canonical decision pages for fit, pricing context, and alternatives in one place.