From Model to Agent: Equipping the Responses API with a Computer Environment

A Comprehensive Analysis of OpenAI’s Agentic Infrastructure, Pricing, and Competitive Landscape


Introduction

The trajectory of AI development has shifted decisively from single-turn question-answering toward long-horizon, autonomous agents capable of executing complex, multi-step workflows. OpenAI’s Responses API represents the clearest institutional expression of this shift. By evolving beyond the Chat Completions API and formally deprecating the Assistants API (scheduled for full phase-out in August 2026), OpenAI has consolidated its agentic infrastructure into a unified, extensible platform that equips language models with real execution environments, reusable skill bundles, and persistent state management (OpenAI Responses API Overhaul).

This report examines the practical implications of this architectural evolution for AI tool users and developers, covering the core features of the Responses API, the Skills and Shell tool primitives, current pricing structures, and how OpenAI’s offering compares to alternative multi-agent frameworks such as CrewAI and LangGraph.

Related: How to Use AI Without Getting Fired: A Professional’s Guide (2026)


The Responses API: Architecture and Core Philosophy

From Chat Completions to a Superset

The Responses API is best understood as a superset of the Chat Completions API. Where Chat Completions handled straightforward input-output exchanges, the Responses API introduces built-in support for multi-turn conversations, structured outputs, persistent agent workflows, and integrated tooling — all without requiring developers to manually manage conversation state (OpenAI Responses API Overhaul).

This is a meaningful architectural distinction. In the Chat Completions model, developers were responsible for reconstructing conversation history on every API call, managing context windows manually, and wiring together tool calls through custom orchestration logic. The Responses API offloads much of this complexity to the platform layer, using conversation IDs and structured text formats to maintain context across turns automatically.

Deprecation of the Assistants API

The Assistants API, which was OpenAI’s previous attempt at a stateful, tool-augmented agent interface, is being phased out as of August 2026. The Responses API absorbs and expands upon its best features — agent memory, tool integrations, and multi-turn support — while embedding them in a more scalable and unified framework. Developers transitioning from the Assistants API will find that the migration path is incremental, with OpenAI providing comprehensive guidance and continued support for Chat Completions during the transition period (OpenAI Responses API Overhaul).


Equipping Agents with a Computer Environment: Skills and Shell

The Shell Tool

The Shell tool is one of the most consequential additions to the Responses API ecosystem. It provides agents with access to a real terminal environment — either an OpenAI-hosted container or a locally managed runtime — where they can install dependencies, run scripts, read and write files, and produce artifacts such as reports or data outputs (Shell + Skills + Compaction: Tips for long-running agents).

Equipping Agents with a Computer Environment: Skills and Shell — contextual image

This is a fundamental capability upgrade. Rather than simulating actions through text, agents using the Shell tool can execute code in a sandboxed environment with controlled internet access. This enables use cases that were previously impractical: reading large datasets, updating files programmatically, writing and running applications, and producing structured outputs for downstream consumption.

The Shell tool supports two deployment modes:

  • Hosted containers: Managed by OpenAI, with environment.type = "container_auto". Skills are uploaded and unzipped into the runtime automatically.
  • Local shell mode: The developer controls the runtime. Skills are provided via local file paths rather than uploaded skill_reference attachments (Skills | OpenAI API).

Skills: Reusable, Versioned Instruction Bundles

Skills are the procedural layer that sits atop the Shell tool. A skill is a versioned bundle of files — scripts, templates, assets, and a required SKILL.md manifest — that can be uploaded once and referenced across multiple agent runs (Skills in OpenAI API).

The SKILL.md file is the discovery and routing mechanism. It contains frontmatter with a name and description that the model reads to decide whether to invoke the skill. The body of SKILL.md provides the full workflow: when to use the skill, how to run it, expected outputs, and edge cases. OpenAI’s documentation explicitly recommends including negative examples — cases where the skill should not be triggered — to improve routing accuracy (Skills in OpenAI API).

A typical skill folder structure looks like this:

my_skill/
├── SKILL.md # Required manifest
├── analyze.py # Optional scripts
├── requirements.txt # Optional dependencies
└── templates/ # Optional assets

Skills are uploaded via POST /v1/skills and referenced in API calls using skill_reference objects with optional version pinning. This versioning capability is practically significant: it allows teams to iterate on skill logic without breaking existing agent workflows, and to roll back to a known-good version if a new iteration introduces regressions.

Server-Side Compaction

For long-running agents, context window exhaustion is a real operational concern. The Responses API addresses this with server-side compaction — an automatic mechanism that summarizes and compresses long agentic runs so that agents never hit context limits mid-task (Shell + Skills + Compaction). This is particularly valuable for agents processing large datasets or conducting extended multi-step research workflows.

Practical Implications for Tool Users

The combination of Skills, Shell, and compaction creates a practical framework for deploying agents that do real knowledge work. A developer building a data analysis agent, for example, can:

  1. Package a CSV analysis workflow as a skill with a SKILL.md describing when to invoke it
  2. Upload the skill via the API
  3. Reference it in a Responses API call with the Shell tool configured for a hosted container
  4. Let the model decide when to invoke the skill, execute the analysis in the container, and write outputs to /mnt/output

This pattern — described in OpenAI’s cookbook as the “hosted shell pattern” — dramatically reduces the boilerplate required to build production-grade agents (Skills in OpenAI API).


Pricing: What It Costs to Run Agents in 2026

Model Pricing

As of early 2026, OpenAI’s flagship model for agentic workloads is GPT-5.2, priced at $1.75 per million input tokens and $14.00 per million output tokens. For enterprise use cases requiring maximum reasoning depth and extended context, GPT-5.2 Pro is available at $21.00 per million input tokens and $168.00 per million output tokens (AI API Pricing Comparison 2026).

Related: ChatGPT vs Claude vs Gemini for Coding (2026)

The latest model referenced in OpenAI’s own documentation is GPT-5.4, which appears in code examples for the Skills and Shell tools (Skills | OpenAI API). Budget-conscious developers can use GPT-5 mini for simpler tasks at significantly lower cost.

ModelInput (per 1M tokens)Output (per 1M tokens)Best For
GPT-5 miniLow single digitsLow single digitsSimple tasks, high volume
GPT-5.2$1.75$14.00General agentic workloads
GPT-5.2 Pro$21.00$168.00Enterprise, max reasoning

Rate Limits and Usage Tiers

Every OpenAI API account is subject to rate limits measured across multiple dimensions: requests per minute (RPM), requests per day (RPD), tokens per minute (TPM), tokens per day (TPD), and images per minute (IPM) for vision/audio workloads (ChatGPT API Pricing 2026).

Free-tier accounts are capped at modest throughput, but limits scale automatically as users spend more or move into paid tiers. OpenAI’s usage tier system ties monthly spending limits to account age and payment history, creating a progressive unlock mechanism for higher throughput.

Cost Considerations for Agentic Workloads

Agentic workloads introduce cost dynamics that differ from simple chat completions. Key considerations include:

  • Long context costs: Models like GPT-5 support 100K+ token contexts. Using large contexts increases token bills linearly. Retrieval-augmented generation (RAG) — sending only relevant document snippets rather than full corpora — is the recommended mitigation (ChatGPT API Pricing 2026).
  • Multimodal tokens: Audio and video tokens are priced separately. As multimodal use grows — chatbots that transcribe speech, agents that process images — developers must account for these additional token streams.
  • Prompt caching: OpenAI offers prompt caching to reduce costs on repeated context, which is particularly valuable for agents that reuse system prompts or skill instructions across many runs.

The overall pricing trend is favorable: each new model generation has tended to debut at higher capability with only modest price increases, and OpenAI’s cost-cutting has kept per-token rates in the low single-digit dollars per million tokens for most use cases (ChatGPT API Pricing 2026).


Competitive Landscape: Responses API vs. CrewAI vs. LangGraph

Framework Comparison Overview

The Responses API is not the only path to building multi-agent systems. CrewAI and LangGraph represent the two dominant open-source alternatives, each with a distinct philosophy and target audience.

CategoryOpenAI Responses APICrewAILangGraph
Primary audienceDevelopers using OpenAI modelsBeginners, rapid prototypersEngineering teams, production
Orchestration modelPlatform-managed, tool-nativeRole/task assignmentGraph-based state machine
Ease of setupModerate (API-first)EasySteep learning curve
ScalabilityHigh (hosted containers)ModerateHigh (async, distributed)
State managementServer-side (conversation IDs)LimitedExplicit, graph-based
Vendor lock-inHigh (OpenAI-specific)LowLow
Best forProduction agents on OpenAI modelsQuick MVPs, researchComplex, non-linear workflows

CrewAI

CrewAI adopts a collaborative intelligence approach, enabling multi-agent systems where specialized agents work together toward shared objectives. It is beginner-friendly, easy to install, and well-suited for sequential or goal-driven workflows. Its hierarchical process generates a supervisor agent to oversee task execution and agent coordination (CrewAI vs. LangGraph).

However, CrewAI’s flexibility is limited. Conditional logic within workflows can be tricky, and it is not well-suited for real-time, interaction-heavy use cases. It is primarily designed for research and quick prototypes rather than production-grade deployments.

Related: How Balyasny Asset Management built an AI research engine for investing

LangGraph

LangGraph, built on LangChain, takes a state-centric approach using directed acyclic graphs (DAGs) to model complex, non-linear agent interactions. It is designed for scale, with asynchronous and distributed systems in mind, and handles conditional logic and highly interconnected agents well (CrewAI vs. LangGraph).

The tradeoff is a steeper learning curve. LangGraph requires a deeper understanding of graph structures — nodes, edges, state transitions — and more effort for initial setup and configuration. It is the preferred choice for engineering teams building production workflows with complex decision-making requirements.

Where the Responses API Fits

The Responses API occupies a distinct position: it is the most tightly integrated option for developers already committed to OpenAI’s model ecosystem. Its Skills and Shell primitives provide capabilities that neither CrewAI nor LangGraph offer natively — specifically, the ability to execute code in OpenAI-hosted containers with versioned, reusable skill bundles.

The key tradeoff is vendor lock-in. The Responses API is inherently OpenAI-specific. Developers who need model-agnostic infrastructure, or who want to self-host their agent runtime, will find LangGraph more appropriate. For teams that want to prototype quickly without deep framework knowledge, CrewAI remains the most accessible entry point.


Open Responses: The Interoperability Initiative

OpenAI has also introduced “Open Responses,” an open-source specification designed to enable interoperability across AI platforms. This initiative allows agent workflows to be constructed without being restricted to proprietary technologies, opening new avenues for cross-platform compatibility (OpenAI Responses API Overhaul).

The Agent Skills open standard, which the Skills feature aligns with, is part of this broader interoperability push. By standardizing the SKILL.md manifest format and the skill bundle structure, OpenAI is positioning skills as a portable primitive that could, in principle, be adopted by other platforms.


Assessment and Practical Recommendations

Based on the available evidence, the Responses API with Skills and Shell represents the most complete platform-native solution for building production agents on OpenAI models as of March 2026. The combination of hosted execution environments, versioned skill bundles, server-side compaction, and integrated tooling (web search, file search, computer use) addresses the core operational challenges of long-horizon agentic work.

For teams evaluating their options:

  • Choose the Responses API if you are building production agents on OpenAI models, need hosted execution environments, and want to minimize infrastructure management overhead.
  • Choose LangGraph if you need model-agnostic infrastructure, complex conditional workflows, or self-hosted deployment with fine-grained state control.
  • Choose CrewAI if you are in the research or prototyping phase and need to move quickly with minimal setup.

The pricing is economically viable for most production use cases. At $1.75 per million input tokens for GPT-5.2, even moderately complex agentic workflows — processing thousands of documents, running multi-step analyses — remain cost-competitive with equivalent human labor. The key cost management levers are model selection (mini vs. standard vs. pro), prompt caching, RAG for context reduction, and careful monitoring of multimodal token usage.

The AI agent market is projected to grow from $5.1 billion in 2024 to $47.1 billion by 2030, driven precisely by the kind of unified APIs that lower barriers to agentic application development (OpenAI Responses API Overhaul). OpenAI’s Responses API, with its computer environment capabilities, is well-positioned to capture a significant share of that growth — provided developers are willing to accept the associated vendor dependency.