Inside OpenAI: Engineers Managing 20 AI Agents Are Leaving Everyone Else Behind

Last updated: December 2025

There’s a podcast interview making the rounds right now that deserves attention. Sherwin Wu — the guy who runs OpenAI’s API and developer platform — sat down and basically described what engineering looks like inside OpenAI today. Not the polished marketing version. The actual day-to-day.

The short version: OpenAI describes Codex as part of the daily workflow for the vast majority of its engineers. A very large share of code now starts with AI assistance, and the teams leaning hardest into the tools appear to ship meaningfully more pull requests than their peers. The role of “software engineer” is starting to look more like “tech lead managing a fleet of AI agents.”

If you’re still debating whether AI will change programming, OpenAI’s own team already has the answer. They’re living it.

The Numbers That Matter

Here are the specific data points Sherwin shared, because they’re more concrete than the usual “AI is transforming everything” hand-waving:

OpenAI says roughly 95% of engineers use Codex daily
OpenAI says pull requests are routinely reviewed by Codex before human review
Heavy Codex users reportedly submit around 70% more PRs than light users — and the gap keeps growing
Code review time reportedly dropped from 10-15 minutes per PR to 2-3 minutes
One experimental team maintains a codebase that is 100% AI-generated — roughly 1 million lines of code from ~1,500 PRs, built by just 3 engineers
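To put those figures in proportion, here is a quick back-of-the-envelope calculation in Python. It uses only the numbers quoted above. Taking the midpoint of each review-time range is an assumption of this sketch, not something Sherwin stated, and the per-PR and per-engineer values are plain averages that say nothing about how the work was actually distributed.

```python
# Figures as quoted in the list above; everything derived is simple arithmetic.
REVIEW_BEFORE_MIN = (10, 15)   # minutes per PR before, (low, high)
REVIEW_AFTER_MIN = (2, 3)      # minutes per PR after, (low, high)
TOTAL_LINES = 1_000_000        # "roughly 1 million lines"
TOTAL_PRS = 1_500              # "~1,500 PRs"
ENGINEERS = 3


def midpoint(bounds):
    """Midpoint of a (low, high) range; using it is an assumption of this sketch."""
    low, high = bounds
    return (low + high) / 2


def review_time_reduction(before, after):
    """Fractional reduction in review time, computed from range midpoints."""
    return 1 - midpoint(after) / midpoint(before)


def summarize():
    """Derive averages from the quoted figures."""
    return {
        "review_time_reduction": review_time_reduction(REVIEW_BEFORE_MIN, REVIEW_AFTER_MIN),
        "lines_per_pr": TOTAL_LINES / TOTAL_PRS,
        "prs_per_engineer": TOTAL_PRS / ENGINEERS,
        "lines_per_engineer": TOTAL_LINES / ENGINEERS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.2f}")
```

On those assumptions, review time falls by about 80%, the average PR is roughly 667 lines, and each of the three engineers is behind about 500 PRs.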

That last example deserves a double-take. Three engineers. A million lines of production code. All of it, by OpenAI’s account, generated by Codex rather than typed by hand. OpenAI published the details in its Harness Engineering blog post, which presents the project as an internal product used across the company rather than a demo.

The New Engineering Workflow

Here’s what actually changed. Engineers at OpenAI are not coding in the traditional sense anymore. Sherwin described the workflow like this:

You fire up 10 to 20 parallel Codex threads. Each one is working on a different task. Your job is to guide, review, course-correct, and validate. You’re not typing code. You’re managing agents.
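The shape of that workflow can be sketched in a few lines of Python. This is a minimal illustration, not how Codex works internally: run_agent is a hypothetical stand-in that only simulates an agent thread, and the names run_fleet, review_queue, and max_parallel are invented for this example. The point is the structure: fan tasks out in parallel, cap the concurrency, and route every result, including failures, back to a human.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class AgentResult:
    task: str
    output: str
    ok: bool


async def run_agent(task: str) -> AgentResult:
    """Hypothetical stand-in for one agent thread; not a real Codex call."""
    await asyncio.sleep(0.01)  # simulate the agent working
    if "flaky" in task:
        raise RuntimeError(f"agent gave up on: {task}")
    return AgentResult(task=task, output=f"draft patch for {task}", ok=True)


async def run_fleet(tasks: list[str], max_parallel: int = 20) -> list[AgentResult]:
    """Run tasks concurrently, capped at max_parallel, without losing failures."""
    limiter = asyncio.Semaphore(max_parallel)

    async def guarded(task: str) -> AgentResult:
        async with limiter:
            try:
                return await run_agent(task)
            except Exception as exc:  # keep the failure so a human can triage it
                return AgentResult(task=task, output=str(exc), ok=False)

    # gather preserves the order of the input tasks
    return await asyncio.gather(*(guarded(t) for t in tasks))


def review_queue(results: list[AgentResult]) -> tuple[list[AgentResult], list[AgentResult]]:
    """Split results into drafts awaiting human review and failures to redirect."""
    drafts = [r for r in results if r.ok]
    failures = [r for r in results if not r.ok]
    return drafts, failures


if __name__ == "__main__":
    tasks = ["fix login bug", "add pagination", "flaky migration", "update docs"]
    drafts, failures = review_queue(asyncio.run(run_fleet(tasks)))
    print(f"{len(drafts)} drafts to review, {len(failures)} to course-correct")
```

Nothing in the sketch merges anything on its own. Every draft lands in a review queue and every failure is surfaced, which is where the guiding, reviewing, and course-correcting happens.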

The analogy he used was from SICP (Structure and Interpretation of Computer Programs) — programming as spellcasting. You don’t need to know every low-level detail. You need to know what to ask for and how to verify the result. AI just pushed that abstraction to its logical extreme.

But here’s the part that resonated most: he compared it to The Sorcerer’s Apprentice from Fantasia. Mickey Mouse enchants a broom to carry water, falls asleep, and wakes up to a flood. That’s the risk of running 20 AI agents without paying attention. The leverage is enormous, but so is the blast radius when things go wrong.

“When I see engineers running 20 Codex threads simultaneously,” Sherwin said, “I don’t just think ‘cool’ — I think about the skill, seniority, and judgment required to do that well.”

Why Many Enterprise AI Deployments Have Negative ROI

This was the most surprising part of the interview. Sherwin, whose team sells the API that powers many AI startups, admitted that many enterprise AI deployments probably have negative ROI.

His diagnosis: the problem is almost always a top-down mandate without bottom-up adoption.

The pattern he sees failing:

CEO announces “we’re going AI-first”
Everyone gets told to use AI tools
Nobody actually knows how
No internal champions emerge to figure out best practices
Tools sit unused, money wasted

The pattern that works:

Top-down buy-in (budget, tools, clear support)
A small “tiger team” of enthusiastic early adopters
That team figures out real workflows, documents them, runs internal demos
Excitement spreads organically

The tiger team doesn’t even need to be engineers. Sherwin said the best internal AI champions are often “engineering-adjacent” — ops people, support leads, Excel power users who get excited about new tools and naturally evangelize them.

The Scaffolding Problem

Sherwin dropped a line that should make a lot of AI startups nervous: “Models eat your scaffolding for breakfast.”

Think about the AI tooling landscape over the past few years:

2022-2023: Vector databases were the hot infrastructure play. Everyone needed embeddings, retrieval, RAG pipelines.
2024: Agent frameworks exploded. LangChain, CrewAI, AutoGen — everyone building orchestration layers.
2025-2026: Models got smart enough to handle much of this natively. A lot of that scaffolding became unnecessary.

His advice to builders: design for where models are going, not where they are today. The teams that win are the ones building products that are 80% good enough right now — and then “click” into place when the next model generation lands.

“Many companies succeeded by building around a capability that wasn’t quite there yet. When the model caught up, their product went from ‘okay’ to ‘amazing’ overnight.”

The One-Person Billion-Dollar Company (And Its Second-Order Effects)

Everyone’s heard Sam Altman’s “one-person billion-dollar company” prediction. Sherwin took it further with second- and third-order effects that most people haven’t considered:

Second-order: If one person can build a billion-dollar company, starting any company becomes dramatically easier. Expect an explosion of small, vertical SaaS businesses — not just one unicorn, but thousands of $10M-$50M companies run by one or two people.

Third-order: The VC model breaks. When most successful companies are small and profitable rather than massive and venture-scale, the entire startup funding ecosystem shifts. Great for individual founders. Challenging for funds that need 100x returns.

His vision: a world where a few massive platforms support an ecosystem of thousands of tiny, highly specialized software companies. Each one serves a narrow niche. Each one is run by one or two people with high agency and good AI tools.

A $10M/year business run by one person is “set for life” money. And that’s becoming achievable.

What’s Coming in 12-18 Months

Sherwin shared two predictions for the near future:

Longer task duration. Current frontier models can reliably handle tasks lasting a few minutes to maybe an hour. Within 12-18 months, expect models that can work on multi-hour tasks — you assign a 6-hour project, check in periodically, and get results. This changes the entire product surface area for AI tools.

Audio goes native. Multimodal models with native speech-to-speech capabilities are coming fast. Sherwin thinks audio is massively underestimated in enterprise — most business still runs on conversations, calls, and meetings. When AI can natively participate in spoken workflows, it unlocks a huge category that text-based tools can’t touch.

The Management Shift

One insight that doesn’t get enough attention: how AI changes engineering management.

Sherwin’s philosophy — spend 50%+ of your time on your top 10% performers. In the AI era, this becomes even more critical because top performers with AI tools pull away faster. The gap between your best engineer and your average engineer is no longer 2x — it might be 10x.

His metaphor: think of engineers as surgeons. The manager’s job is to be the surgical team — anticipating what the surgeon needs, removing obstacles before they appear, keeping the operating room running smoothly. In the AI era, the “surgeons” are moving faster than ever, so the support system matters more than ever.

He also predicted managers will be able to handle larger teams. The current best practice of 6-8 direct reports per manager may expand significantly as AI tools help managers understand what their teams are doing, track context across projects, and even predict upcoming blockers.

The Business Process Opportunity Nobody’s Talking About

The final point Sherwin made — and the one he seemed most passionate about — is that Silicon Valley is obsessed with AI for coding while ignoring the bigger opportunity: business process automation.

Most work in the world isn’t open-ended knowledge work like software engineering. It’s repeatable, rule-based, SOP-driven processes: customer support scripts, insurance claims processing, logistics coordination, compliance checks. These processes reward consistency and clearly defined rules, which is the kind of well-specified work AI agents can be pointed at.

“If you talk to any non-tech company,” Sherwin said, “they have massive amounts of business processes. This opportunity is enormous, and it’s way bigger than the discussion about it on Twitter would suggest.”

What This Means for You

If you’re an engineer: the window to learn AI-augmented workflows is open now, but it won’t stay open forever. Start with one tool — Codex, Cursor, Claude Code, whatever — and push it hard. The engineers who figure out how to manage 10-20 parallel AI threads will be the ones who are impossible to replace.

If you’re a manager: invest disproportionately in your top performers. Give them room to experiment with AI tools. Then make them teach everyone else.

If you’re building a product: design for where models will be in 12 months, not where they are today. And think hard about whether your “scaffolding” will survive the next model upgrade.

If you’re thinking about starting something: the barrier to building software has never been lower. A vertical SaaS product serving a specific niche, built and run by one person with AI tools — that’s not a fantasy anymore. It’s a business plan.

As Sherwin put it: “The next two to three years will be the most interesting period in tech and startups for a very long time. Don’t take it for granted.”