Last updated: March 2026

Investment analysts at Balyasny Asset Management used to spend days parsing thousands of documents for a single research task. Now they do it in hours. The firm built an AI research system on GPT-5.4 that is deployed across ~95% of its 180 investment teams, handling everything from merger arbitrage to macroeconomic scenario analysis.
This isn’t a chatbot wrapper. It’s a purpose-built research engine with an evaluation pipeline spanning 12+ dimensions, a federated deployment architecture, and agent workflows that reason like skilled analysts. Here’s how they built it and what other financial institutions can learn.
Why Balyasny needed an AI research engine
Investment research is high-stakes and time-sensitive. Analysts must synthesize market data, broker research, regulatory filings, earnings calls, and expert interviews to make conviction-driven decisions. Speed matters — a competitor who analyzes a merger filing 30 minutes faster has an edge.
Traditional research workflows hit three walls:
Volume. Financial data is exploding. A single merger arbitrage analysis might require reading 50+ SEC filings, 20 broker reports, and monitoring real-time news feeds. Human analysts can’t keep up.
Structure. Investment decisions require combining structured data (financial statements, price data) with unstructured data (analyst commentary, news articles). Off-the-shelf AI tools handle one or the other, not both.
Compliance. Financial institutions operate under strict regulatory standards. Any AI system must maintain audit trails, respect data access controls, and produce explainable outputs. Consumer AI tools don’t meet these requirements.
Balyasny saw an opportunity: build an AI system that thinks like an analyst, moves at machine speed, and operates within institutional compliance boundaries.
The technical architecture
In late 2022, Balyasny established an Applied AI team: 20 researchers, engineers, and domain experts tasked with building AI-native tools. Their flagship product is an AI investment research system designed to reason, retrieve, and act like a skilled analyst.
Model selection: GPT-5.4 as the reasoning engine
Before deploying any models, Balyasny built one of the most sophisticated evaluation pipelines in finance. It measures models across 12+ dimensions, including:
- Forecasting accuracy
- Numerical reasoning
- Scenario analysis
- Robustness to noisy inputs
- Multi-step planning
- Tool execution
- Hallucination reduction
These evaluations run against Balyasny’s internal benchmarks and proprietary financial data — not public test sets. The goal: find models that perform on real investment tasks, not academic benchmarks.
GPT-5.4 emerged as the strongest reasoning engine, particularly for multi-step planning and tool execution. But Balyasny doesn’t use GPT-5.4 exclusively. They run a hybrid system, with GPT-5.4 for complex reasoning tasks and internal models for specialized financial analysis, and they choose the model for each task based on empirical performance.
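To make the task-by-task selection concrete, here is a minimal sketch of how evaluation scores could drive a routing table. It is an illustration under stated assumptions, not Balyasny's pipeline: the model names, dimensions, and scores are placeholders, and `mean_scores` and `best_model_for` are hypothetical helpers.

```python
from statistics import mean

# Hypothetical evaluation results: model -> dimension -> per-case scores.
# Names and numbers are placeholders, not Balyasny's data.
EVAL_SCORES = {
    "frontier_model": {
        "multi_step_planning": [0.9, 0.8, 1.0],
        "numerical_reasoning": [0.7, 0.8, 0.6],
    },
    "internal_model": {
        "multi_step_planning": [0.6, 0.7, 0.5],
        "numerical_reasoning": [0.9, 0.9, 0.8],
    },
}


def mean_scores(results):
    """Average each model's per-case scores within every dimension."""
    return {
        model: {dim: mean(scores) for dim, scores in dims.items()}
        for model, dims in results.items()
    }


def best_model_for(dimension, results):
    """Pick the model with the highest mean score on one dimension."""
    averaged = mean_scores(results)
    candidates = {
        model: dims[dimension]
        for model, dims in averaged.items()
        if dimension in dims
    }
    if not candidates:
        raise KeyError(f"no model was evaluated on {dimension!r}")
    return max(candidates, key=candidates.get)


# Build a routing table: one winning model per evaluated dimension.
routing_table = {
    dim: best_model_for(dim, EVAL_SCORES)
    for dim in ("multi_step_planning", "numerical_reasoning")
}
print(routing_table)
```

With these placeholder scores, planning tasks route to the frontier model and numerical tasks to the internal one. A real pipeline would presumably use many more cases per dimension and account for variance before switching models.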
“We evaluate models the way we evaluate investments: on fundamentals. GPT-5.4 proved it could plan, reason, and execute with real rigor,” says Su Wang, Senior Research Scientist at Balyasny.
Deployment model: Centralized core, federated customization
Balyasny’s 180 investment teams operate across different asset classes — macro, commodities, equities, merger arbitrage. Each team has distinct workflows and data requirements. How do you build one AI system that serves all of them?
Balyasny chose a federated deployment model:
Centralized core: The Applied AI team develops and maintains:
- Agent frameworks (reasoning, planning, tool orchestration)
- Toolchains (data retrieval, document synthesis, calculation engines)
- Compliance guardrails (data access controls, audit logging, explainability)
Local customization: Each investment team can:
- Deploy team-specific agents tailored to their asset class
- Access scoped data (only what they’re authorized to see)
- Customize workflows without breaking compliance
This architecture means the Applied AI team focuses on scaling infrastructure and model evaluation, while investment teams focus on applying AI to their specific strategies. It also ensures universal compliance — critical in an industry where data security and risk management are non-negotiable.
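One way to picture the split between central guardrails and local customization is a layered configuration in which team overrides can never displace central settings. The sketch below assumes that design; `TeamConfig`, `CENTRAL_GUARDRAILS`, and the dataset names are hypothetical and not taken from Balyasny's system.

```python
from dataclasses import dataclass, field

# Guardrails owned by the central team; local configs cannot switch these off.
CENTRAL_GUARDRAILS = {"audit_logging": True, "explainability": True}


@dataclass(frozen=True)
class TeamConfig:
    """Hypothetical per-team configuration layered on the central core."""
    team: str
    allowed_datasets: frozenset
    overrides: dict = field(default_factory=dict)


def effective_settings(cfg):
    """Merge local overrides with central guardrails; guardrails always win."""
    blocked = set(cfg.overrides) & set(CENTRAL_GUARDRAILS)
    if blocked:
        raise ValueError(
            f"{cfg.team} may not override guardrails: {sorted(blocked)}"
        )
    return {**cfg.overrides, **CENTRAL_GUARDRAILS}


def can_read(cfg, dataset):
    """Scoped data access: a team reads only datasets it is authorized for."""
    return dataset in cfg.allowed_datasets


merger_arb = TeamConfig(
    team="merger_arb",
    allowed_datasets=frozenset({"sec_filings", "deal_news"}),
    overrides={"alert_threshold": 0.05},
)
print(effective_settings(merger_arb))
print(can_read(merger_arb, "commodities_positions"))  # False
```

Rejecting a conflicting override outright, rather than silently ignoring it, makes a misconfigured team agent fail at setup time instead of in production.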
“Our early investments and the cost curve were off. Today, every one of our investment teams can decide how to apply the latest AI to their process, in a secure environment and with real-time expert guidance,” says Kevin Byrne, Chief Operating Officer.
Agent workflows: How the system actually works
The AI research system operates as an agent: it reasons about tasks, retrieves relevant data, executes tools, and produces structured outputs. Here’s what that looks like in practice.
Workflow 1: Central Bank Speech Analysis
Before AI: An analyst spends 2 days reading a central bank speech, cross-referencing historical policy statements, analyzing economic indicators, and building scenario models.
With AI: The agent completes the same task in ~30 minutes.
How it works:
- Ingest: Agent reads the central bank speech and identifies key policy signals
- Retrieve: Agent pulls historical speeches, economic data, and market reactions to similar statements
- Analyze: Agent builds scenario models (e.g., “If the Fed raises rates 50bps, what happens to bond yields?”)
- Output: Structured report with traceable reasoning paths and cited sources
Impact: roughly a 96x speedup, counting 2 days as 48 hours against ~30 minutes. Analysts use the saved time for higher-level strategy work.
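The four steps above can be sketched as a small pipeline. This is a toy outline only: keyword matching stands in for the language model, and `HAWKISH_TERMS`, `HISTORY`, and the function names are hypothetical placeholders, including the speech labels.

```python
from dataclasses import dataclass


@dataclass
class Report:
    signals: list
    scenarios: dict
    sources: list  # every claim should trace back to one of these


# Toy stand-ins for the real model and data stores (all hypothetical).
HAWKISH_TERMS = ("inflation", "tightening", "restrictive")
HISTORY = {"inflation": "2023-06 speech", "tightening": "2022-09 speech"}


def ingest(speech):
    """Identify policy signals; a real system would use a language model."""
    text = speech.lower()
    return [term for term in HAWKISH_TERMS if term in text]


def retrieve(signals):
    """Look up earlier statements that used the same signals."""
    return [HISTORY[s] for s in signals if s in HISTORY]


def analyze(signals):
    """Build a crude scenario table keyed by an assumed rate move in bps."""
    tilt = "hawkish" if signals else "neutral"
    return {0: f"{tilt} hold", 50: f"{tilt} hike"}


def run_pipeline(speech):
    """Ingest, retrieve, analyze, then package a report with its sources."""
    signals = ingest(speech)
    sources = retrieve(signals)
    return Report(signals=signals, scenarios=analyze(signals), sources=sources)


report = run_pipeline(
    "Inflation remains elevated, so policy will stay restrictive."
)
print(report)
```

Keeping `sources` as a first-class field of the report is what makes the reasoning path traceable: a reviewer can check each scenario against the documents that informed it.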
Workflow 2: Merger Arbitrage Superforecaster
Before AI: Analysts manually track merger deals using spreadsheets, setting alerts for new SEC filings and news. Updating deal probability estimates is a manual, time-consuming process.
With AI: The Merger Arbitrage Superforecaster agent monitors deals continuously and updates probabilities in real time.
How it works:
- Monitor: Agent tracks all active merger deals, watching for new filings, press releases, and regulatory updates
- Evaluate: When new information arrives, agent re-evaluates deal probability using historical data and current market conditions
- Alert: Agent notifies analysts when probabilities shift significantly
- Explain: Agent provides reasoning for probability changes with cited sources
Impact: Replaced manual spreadsheets and alerts. Analysts get real-time updates instead of stale data.
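The article does not say how the agent recomputes deal probabilities. As one plausible illustration of the evaluate-and-alert steps, the sketch below uses a Bayesian update in odds form with a fixed alert threshold. The likelihood ratio, the 0.05 threshold, and both function names are assumptions made for the example.

```python
def update_probability(prior, likelihood_ratio):
    """Bayesian update in odds form: posterior odds = prior odds * ratio."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0:
        raise ValueError("likelihood_ratio must be positive")
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)


def should_alert(previous, current, threshold=0.05):
    """Flag shifts of at least `threshold` in absolute probability."""
    return abs(current - previous) >= threshold


# Assumed example: a deal at 80% completion odds receives adverse
# regulatory news judged half as likely if the deal were going to close.
prior = 0.80
posterior = update_probability(prior, likelihood_ratio=0.5)
print(round(posterior, 3), should_alert(prior, posterior))
```

Here the estimate falls from 0.80 to about 0.667, a shift large enough to trigger an alert. In practice the hard part is estimating the likelihood ratio for each new filing or headline, which is where historical deal data would come in.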
Workflow 3: Deep Research Synthesis
Before AI: Synthesizing tens of thousands of documents (earnings calls, broker research, expert interviews, regulatory filings) takes days.
With AI: The agent completes deep research tasks in hours.
How it works:
- Scope: Analyst defines research question (e.g., “What are the competitive dynamics in the EV battery market?”)
- Retrieve: Agent searches internal databases and external sources, pulling relevant documents
- Synthesize: Agent reads and summarizes key findings, identifying patterns and contradictions
- Structure: Agent produces a research report with sections, citations, and confidence levels
Impact: Days → hours. Higher analyst confidence due to comprehensive coverage and traceable reasoning.
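The synthesize and structure steps can be illustrated with a small grouping routine that keeps citations attached to every stance and flags topics where sources disagree. The finding records, source labels, and the `support` figure (a crude share-of-sources proxy for confidence) are all assumptions for this sketch.

```python
from collections import defaultdict

# Hypothetical extracted findings; in practice a model would produce these.
findings = [
    {"topic": "cell cost trend", "stance": "falling", "source": "broker_note_A"},
    {"topic": "cell cost trend", "stance": "falling", "source": "earnings_call_B"},
    {"topic": "cell cost trend", "stance": "rising", "source": "expert_interview_C"},
    {"topic": "capacity expansion", "stance": "accelerating", "source": "filing_D"},
]


def synthesize(items):
    """Group findings by topic, keep citations, and flag contradictions."""
    grouped = defaultdict(lambda: defaultdict(list))
    for item in items:
        grouped[item["topic"]][item["stance"]].append(item["source"])

    report = {}
    for topic, stances in grouped.items():
        total = sum(len(sources) for sources in stances.values())
        majority = max(stances, key=lambda s: len(stances[s]))
        report[topic] = {
            "majority_stance": majority,
            # Share of sources backing the majority stance.
            "support": len(stances[majority]) / total,
            "contradiction": len(stances) > 1,
            "citations": {s: list(srcs) for s, srcs in stances.items()},
        }
    return report


for topic, summary in synthesize(findings).items():
    print(topic, summary)
```

Counting sources is a weak confidence measure, since it treats every source as equally reliable. It is shown only to make the structure of the report concrete.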
The design partnership with OpenAI
Balyasny didn’t just buy GPT-5.4 and integrate it. They became a design partner, giving OpenAI direct visibility into how investment teams use AI in production.
OpenAI teams observed actual workflows: where the system succeeds, where it struggles, and what high performance looks like in a commercial context. That visibility led to faster iterations, tighter feedback loops, and better model behavior on finance-specific tasks.
For example, early feedback from merger arbitrage teams revealed that agents needed to continuously re-evaluate deal probabilities as new information arrived. Balyasny and OpenAI worked together to extend agent planning capabilities and tool access, replacing a slow, manual workflow with real-time probabilistic monitoring.
“We didn’t just tell OpenAI what we needed. We showed them. And that made all the difference,” says Jonathan Park, Product Manager at Balyasny.
Real-time feedback loops drive continuous improvement
Because AI is embedded in daily workflows, Balyasny collects structured feedback in real time:
- User evaluations (thumbs up/down, quality ratings)
- Outcome audits (did the AI’s analysis match reality?)
- Tool execution quality (did the agent use the right tools correctly?)
This feedback loop drives rapid improvements to both models and the orchestration layer. When analysts flag issues, the Applied AI team can diagnose problems, adjust prompts, fine-tune models, or extend tool capabilities.
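A structured feedback loop of this kind can start very simply. The sketch below aggregates thumbs up/down events per workflow and lists the workflows that fall below an approval floor. The event format, workflow names, and the 0.7 floor are assumptions for the example.

```python
from collections import Counter

# Hypothetical feedback events: (workflow name, thumbs_up).
feedback = [
    ("central_bank_speech", True),
    ("central_bank_speech", True),
    ("merger_arb", True),
    ("merger_arb", False),
    ("merger_arb", False),
]


def approval_rates(events):
    """Compute the share of thumbs-up ratings per workflow."""
    ups, totals = Counter(), Counter()
    for workflow, thumbs_up in events:
        totals[workflow] += 1
        ups[workflow] += int(thumbs_up)
    return {workflow: ups[workflow] / totals[workflow] for workflow in totals}


def needs_review(rates, floor=0.7):
    """Return workflows whose approval rate is below the floor, sorted."""
    return sorted(w for w, rate in rates.items() if rate < floor)


rates = approval_rates(feedback)
print(rates)
print(needs_review(rates))
```

Outcome audits and tool execution checks could feed the same aggregation as additional event types, giving the team one queue of workflows to diagnose.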
The system gets better every week because it’s learning from real investment decisions, not synthetic test cases.
Constraints and trade-offs
Building an AI research engine for finance isn’t just a technical challenge. It’s a compliance, cost, and organizational challenge.
Compliance overhead
Financial institutions operate under strict regulatory standards. Balyasny’s AI system must:
- Maintain audit trails for every decision
- Respect data access controls (analysts can only see what they’re authorized to see)
- Produce explainable outputs (regulators need to understand how decisions were made)
The federated deployment model adds complexity — every investment team gets customized access, but universal compliance guardrails must hold. This requires careful architecture and ongoing monitoring.
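To show how access checks and audit trails can fit together, here is a minimal sketch of an append-only log in which each entry commits to the hash of the previous one, so later edits are detectable. Hash chaining is a choice made for this example, not something the article attributes to Balyasny, and `AuditLog`, `ENTITLEMENTS`, and `read_dataset` are hypothetical names.

```python
import hashlib
import json


def _digest(body):
    """Hash an entry body deterministically."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self):
        self.entries = []

    def append(self, user, action, allowed):
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"user": user, "action": action, "allowed": allowed, "prev": prev}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self):
        """Recompute every hash and check that the chain is unbroken."""
        prev = "0" * 64
        for entry in self.entries:
            body = {k: entry[k] for k in ("user", "action", "allowed", "prev")}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True


# Hypothetical entitlements: which datasets each user may read.
ENTITLEMENTS = {"analyst_1": {"sec_filings"}}


def read_dataset(log, user, dataset):
    """Check access and record the attempt, whether or not it is allowed."""
    allowed = dataset in ENTITLEMENTS.get(user, set())
    log.append(user, f"read:{dataset}", allowed)
    return allowed


log = AuditLog()
read_dataset(log, "analyst_1", "sec_filings")
read_dataset(log, "analyst_1", "commodities_positions")
print(log.verify())  # True
log.entries[0]["allowed"] = False  # simulate tampering
print(log.verify())  # False
```

Logging denied attempts alongside allowed ones matters for compliance review: the record shows not only what was accessed but also what was requested and refused.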
Cost and infrastructure
Running GPT-5.4 at scale isn’t cheap. Balyasny processes tens of thousands of documents per research task, with hundreds of analysts using the system daily. The infrastructure costs are significant.
Balyasny mitigates this by:
- Using a hybrid model approach (GPT-5.4 for reasoning, cheaper models for simple tasks)
- Caching frequently accessed data
- Optimizing prompts to reduce token usage
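Two of these mitigations, caching and hybrid routing, can be sketched in a few lines. The example below caches summaries by content hash so an unchanged document is never sent to a model twice, and routes tasks to a cheaper model unless they need heavy reasoning. The class, the model labels, and the task names are assumptions for illustration.

```python
import hashlib


class SummaryCache:
    """Cache summaries by content hash to avoid repeated model calls."""

    def __init__(self, summarize):
        self._summarize = summarize
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, document):
        key = hashlib.sha256(document.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._summarize(document)
        return self._store[key]


# Hypothetical task kinds that justify the more expensive model.
REASONING_TASKS = {"scenario_analysis", "multi_step_planning"}


def pick_model(task_kind):
    """Route heavy reasoning to the large model, everything else to a small one."""
    return "reasoning_model" if task_kind in REASONING_TASKS else "small_model"


def fake_summarize(document):
    """Placeholder for an expensive model call."""
    return document[:40]


cache = SummaryCache(fake_summarize)
cache.get("10-K filing text for an example issuer")
cache.get("10-K filing text for an example issuer")
print(cache.hits, cache.misses)  # 1 1
print(pick_model("multi_step_planning"), pick_model("summarize"))
```

Keying on a hash of the content rather than on a filename means a revised filing is treated as a new document, while identical text arriving from two feeds is summarized once.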
Human oversight remains essential
AI augments analysts; it doesn’t replace them. High-stakes investment decisions still require human judgment. The AI system provides structured, explainable insights that increase conviction, but humans make the final call.
“It’s like adding a teammate who never forgets, always cites sources, and double-checks the details before sending anything back,” says Charlie Sweat, Portfolio Manager at Balyasny.
Common mistakes to avoid
Based on Balyasny’s experience, here are the pitfalls other financial institutions should avoid:
Deploying without evaluation. Don’t trust vendor benchmarks. Build your own evaluation pipeline using real financial data and real tasks. Models that perform well on academic benchmarks often fail on domain-specific work.
Treating AI as a black box. If you can’t explain how the AI reached a conclusion, you can’t use it for high-stakes decisions. Build explainability and audit trails from day one.
Ignoring compliance. Financial AI systems must meet institutional standards for data security, access controls, and regulatory compliance. Consumer AI tools don’t cut it.
Static deployment. AI systems need continuous feedback loops. If you’re not collecting structured feedback from users and iterating weekly, you’re falling behind.
What other financial institutions can learn
Balyasny’s approach offers a playbook for AI deployment in finance:
- Evaluate rigorously before deploying. Build evaluation pipelines that measure models on real tasks, not public benchmarks. Test against proprietary data and internal workflows.
- Involve AI vendors in actual workflows. Don’t just send requirements docs. Show vendors how your teams work, where they struggle, and what success looks like. That visibility drives better products.
- Design for feedback, not static tools. Embed AI in daily workflows so you can collect real-time feedback. Use that feedback to iterate rapidly on models and orchestration.
- Centralize infrastructure, customize locally. Build core components (agent frameworks, compliance guardrails) centrally. Let teams customize workflows for their specific needs while maintaining universal standards.
- Keep humans in the loop. AI augments expertise; it doesn’t replace it. Design systems that increase analyst confidence and conviction, not systems that make decisions autonomously.
The future roadmap
Balyasny continues to expand its AI capabilities:
Reinforcement Fine-Tuning (RFT): Sharpen model behavior on complex, high-value tasks by training on real investment outcomes.
Deeper agent orchestration: Expand agent capabilities across more financial domains (credit analysis, portfolio optimization, risk modeling).
Multimodal inputs: Process financial charts, statements, and filings as images, not just text.
Frontier model evaluation: Continuously assess new models (GPT-5.5, Claude Opus 5, etc.) for domain fit.
The goal isn’t to replace analysts. It’s to give them superpowers: the ability to apply first principles thinking faster, across more data, and with more structure.
“AI is enabling our teams to apply first principles thinking faster, across more data, and with more structure,” says Charlie Flanagan, Chief AI Officer at Balyasny.
For financial institutions considering AI deployment, the lesson from Balyasny is clear: evaluate rigorously, design for feedback, and keep humans in the loop. The firms that get this right will have a significant competitive advantage in the years ahead.