The 2026 H1 Agent Stack: Models, Harnesses, Runtimes, and AI Workspaces
A concise 2026 H1 landscape of AI agents, coding agents, harnesses, runtimes, browser and sandbox infrastructure, observability, governance, and AI workspaces, with MCPlato positioned as part of the workspace layer.
Published on 2026-05-29
The agent race in 2026 H1 no longer looks like a simple model leaderboard.
Better models still matter. Claude 4, Claude Sonnet 4.5, Claude Opus 4.8, Gemini 2.5 Pro, DeepSeek R1/V3.1, Qwen3-Coder, and Mistral Magistral all pushed the base layer forward in reasoning, coding, context, and tool use.[1][2][3][4][5][6][7][8] But the competitive question has changed:
Who can put those models into reliable work?
That means harnesses, runtimes, browsers, sandboxes, evals, observability, governance, permissions, and user-facing workspaces. The model is the engine. The agent product is the vehicle. The harness and workspace decide whether the vehicle can run inside a real company without losing state, authority, or trust.
The layered 2026 H1 agent stack
A useful way to read the market is as a stack, not a directory of logos.
Figure 1: The 2026 H1 agent stack is moving upward from model capability into execution, observability, governance, and workspace continuity.
| Layer | What it contributes | Representative examples |
|---|---|---|
| Foundation models | Reasoning, coding, long context, computer/tool use, planning | Claude 4 / Sonnet 4.5 / Opus 4.8, Gemini 2.5 Pro, DeepSeek R1/V3.1, Qwen3-Coder, Mistral Magistral |
| Agent products | Packaged workflows for coding, research, app building, operations, and enterprise processes | Claude Code, OpenAI Codex, GitHub Copilot coding agent, Cursor, Devin, Jules, Replit Agent, Lovable, Bolt.new, Manus, Perplexity Labs |
| Harness / runtime | State, retries, human-in-the-loop, orchestration, memory, structured tool calls | LangGraph/LangChain, LlamaIndex, AutoGen, CrewAI, OpenAI Agents SDK, Vercel AI SDK, Mastra, PydanticAI, Agno, Letta |
| Browser and sandbox infra | Safe execution environments, browser automation, code sandboxes, task isolation | Browserbase, Stagehand, Playwright MCP, E2B, Daytona, Temporal, Arcade, Composio |
| Observability and evals | Traces, cost, latency, regression tests, prompt/tool debugging, production review | LangSmith, Langfuse, Helicone, model and agent benchmarks |
| Enterprise governance | Visibility, access control, policy, agent inventory, auditability, compliance workflows | Microsoft Copilot Studio, Salesforce Agentforce, ServiceNow AI Control Tower, MCP-based integration patterns |
| AI workspace | The user-facing place where multi-step work, files, sessions, artifacts, and decisions persist | MCPlato, Dust, Hebbia, workspace-style agent platforms |
The important point is not that every product must cover every layer. It is that serious agent work now needs all of them somewhere in the system.
Product clusters, not a raw directory
1. Coding agents became the first mass-market agent category
Coding agents are the clearest proof that agents can move beyond chat. Claude Code became generally available alongside Claude 4 and is documented as an agentic coding tool for terminal and development workflows.[1][9] OpenAI Codex, GitHub Copilot coding agent, Cursor, Devin, Google Jules, and Replit Agent all point in the same direction: developers want agents that can inspect repositories, edit files, run commands, open pull requests, and continue work across local and cloud contexts.[10][11][12][13][14][15]
This cluster is ahead because software work already has useful guardrails: files, diffs, tests, logs, branches, CI, and review. The lesson for the rest of the market is not “everything should be coding.” It is that agents need reviewable artifacts and verification loops.
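The verification loop described above can be sketched in a few lines. This is a minimal illustration, not how any of the named coding agents work internally: `Artifact`, `verify`, `run_with_verification`, and the check names are all hypothetical, and `propose` stands in for whatever model call produces a candidate diff.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Artifact:
    """A reviewable unit of agent output, here a diff (hypothetical shape)."""
    description: str
    diff: str
    check_results: dict = field(default_factory=dict)


def verify(artifact: Artifact, checks: dict) -> bool:
    """Run every named check against the diff and record each result."""
    for name, check in checks.items():
        artifact.check_results[name] = check(artifact.diff)
    return all(artifact.check_results.values())


def run_with_verification(
    propose: Callable[[str], Artifact],
    checks: dict,
    max_attempts: int = 3,
) -> Optional[Artifact]:
    """Request proposals until one passes all checks or attempts run out."""
    feedback = ""
    for _ in range(max_attempts):
        artifact = propose(feedback)
        if verify(artifact, checks):
            return artifact  # accepted: every check passed
        failed = [n for n, ok in artifact.check_results.items() if not ok]
        feedback = "failed checks: " + ", ".join(failed)
    return None  # nothing verifiable was produced; escalate to a human
```

In a real setup the checks would likely be test runs, linters, or CI jobs rather than string predicates. The point of the sketch is only that acceptance depends on recorded check results, not on the model's own claim of completion.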
2. App builders and general agents turned prompts into workflows
Lovable, Bolt.new, Replit Agent, and Manus are examples of products centered on producing apps, websites, or executable work; Perplexity describes Labs as a creation feature for projects such as reports, dashboards, and lightweight apps.[16][17][18][19] OpenAI's developer documentation describes computer-use and agent-building primitives, including a visual browser tool surface, so its agent direction is better treated as part of the same workflow shift rather than as a simple chat feature.[20][21]
These products compress the distance between intent and artifact. Their challenge is the same challenge facing the broader agent market: once the task becomes long-running, multi-step, or externally visible, the product needs state, permissions, rollback, and a clear handoff from generated draft to production asset.
3. Enterprise agents are shifting from adoption to control
Enterprise agent platforms are now talking less like demo tools and more like operating systems for governed automation. Microsoft Copilot Studio emphasizes capabilities for scaling agent adoption.[22][23] Salesforce Agentforce 3 highlights visibility and control through a Command Center, MCP support, lower latency, and industry actions.[24] ServiceNow positions AI Control Tower as a product for managing the AI lifecycle and governing agents, models, and workflows.[25]
Zapier Agents, Lindy, Gumloop, Dust, and Hebbia sit closer to business-team workflow automation and knowledge work.[26][27][28][29][30] They matter because agent adoption is not only an engineering problem. Sales, finance, legal, operations, recruiting, research, and support teams also need agent systems that can use tools without quietly bypassing policy.
4. Frameworks and runtimes became the agent middle layer
LangGraph/LangChain, LangSmith, LlamaIndex, AutoGen, CrewAI, OpenAI Agents SDK, Vercel AI SDK, Mastra, PydanticAI, Agno, and Letta represent the build layer beneath packaged products.[31][32][33][34][35][36][37][38][39][40][41][42]
This layer is where durable state, memory, tool routing, human approval, structured outputs, and multi-agent orchestration become reusable primitives. It is also where many teams discover that “agent” is not one abstraction. A retrieval assistant, a coding worker, a browser operator, a finance analyst, and a customer-service agent need different runtime contracts.
5. Infra and observability became production requirements
Browserbase, Stagehand, Playwright MCP, E2B, Daytona, Temporal, Arcade, and Composio are not peripheral tools. They are part of the agent control plane.[43][44][45][46][47][48][49][50]
Agents need browsers because much of the working web still lacks clean APIs. They need sandboxes because code and tools must run in isolated environments. They need durable workflow engines because long tasks fail and resume. They need integration gateways because credentials, permissions, and action scopes should not be improvised inside a prompt.
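To make "long tasks fail and resume" concrete, here is a minimal checkpointing sketch. It is not the API of Temporal or any other engine named above; `run_steps` and the JSON checkpoint file are assumptions chosen for illustration, and a production workflow engine would also handle retries, timeouts, and concurrent workers.

```python
import json
from pathlib import Path


def run_steps(steps, checkpoint_path: Path) -> dict:
    """Run (name, fn) steps in order, skipping steps a previous run finished.

    Each fn receives the dict of results so far and must return a
    JSON-serializable value. Results are persisted after every step.
    """
    done = {}
    if checkpoint_path.exists():
        done = json.loads(checkpoint_path.read_text())
    for name, fn in steps:
        if name in done:
            continue  # finished in an earlier run; reuse the stored result
        done[name] = fn(done)
        checkpoint_path.write_text(json.dumps(done))  # checkpoint progress
    return done
```

If a step raises, everything completed before it is already on disk, so calling `run_steps` again with the same path resumes from the failed step instead of repeating earlier work.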
LangSmith, Langfuse, and Helicone show the same maturation from the observability side.[32][51][52] If an agent is touching customer data, production systems, or expensive model calls, teams need traces, evals, cost visibility, latency visibility, and regression checks.
Five trends to watch
1. Model-only differentiation is fading into runtime differentiation
The best models are converging on strong coding, tool use, long context, and planning. Anthropic reports Claude 4 coding results and Claude Code availability, while Gemini 2.5 Pro emphasizes coding and long-context capability, DeepSeek V3.1 frames itself as a step toward the agent era, and Qwen3-Coder highlights large-scale code-agent training environments.[1][4][6][7]
That makes the runtime more important, not less. When multiple base models can reason well enough, teams choose the stack that can preserve state, call tools safely, evaluate outcomes, and keep humans in control.
2. Observability is becoming the production gate
The question “Did the model answer?” is too weak for agents. Production teams need to know:
- Which tools were called?
- What state changed?
- What evidence supports completion?
- How much did the run cost?
- Where did latency appear?
- Which prompt, model, tool, or environment change caused a regression?
This is why LangSmith, Langfuse, Helicone, benchmark suites, and enterprise command centers are becoming part of the buying discussion. A company cannot govern what it cannot see.
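The questions above map fairly directly onto a per-run trace record. The sketch below is a hypothetical data shape, not the schema of LangSmith, Langfuse, or Helicone; the class and field names are assumptions, and cost is passed in by the caller rather than measured.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolCall:
    tool: str
    latency_s: float
    cost_usd: float
    state_change: Optional[str] = None  # e.g. "created report.md"


@dataclass
class RunTrace:
    model: str            # which model produced the run
    prompt_version: str   # which prompt revision was active
    calls: list = field(default_factory=list)
    evidence: list = field(default_factory=list)

    def record(self, tool, fn, *args, cost_usd=0.0, state_change=None):
        """Call fn(*args), time it, and append a ToolCall to the trace."""
        start = time.perf_counter()
        result = fn(*args)
        elapsed = time.perf_counter() - start
        self.calls.append(ToolCall(tool, elapsed, cost_usd, state_change))
        return result

    def summary(self) -> dict:
        """Answer the production questions from the recorded calls."""
        slowest = max(self.calls, key=lambda c: c.latency_s, default=None)
        return {
            "tools_called": [c.tool for c in self.calls],
            "state_changes": [c.state_change for c in self.calls if c.state_change],
            "evidence": list(self.evidence),
            "total_cost_usd": round(sum(c.cost_usd for c in self.calls), 6),
            "slowest_tool": slowest.tool if slowest else None,
            "model": self.model,
            "prompt_version": self.prompt_version,
        }
```

Keeping `model` and `prompt_version` on every trace is what makes the regression question answerable: two summaries can be compared to see which change coincided with a failure.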
3. Browser and code sandboxes are becoming first-class infra
Computer-use agents and coding agents need safe operating surfaces. Browserbase and Stagehand focus on browser automation for AI agents; Playwright MCP exposes browser control through MCP; E2B and Daytona focus on isolated execution environments; Temporal frames durable execution for agentic AI workflows.[43][44][45][46][47][53]
This is one of the most important shifts of 2026 H1: the “agent environment” is becoming a product category. The environment is where autonomy becomes either useful or dangerous.
4. Governance and protocols are becoming default expectations
MCP is important because it gives the market a shared language for connecting models to tools and context.[54][55] But protocols do not remove governance requirements. They make governance more urgent: once tools are easier to connect, teams need clearer policies for who can connect them, what actions are allowed, how credentials are scoped, and how activity is audited.
Salesforce Agentforce, ServiceNow AI Control Tower, and Microsoft Copilot Studio all reflect this enterprise reality.[24][25][23] Agent adoption now depends on visibility, policy, permissions, and operational ownership, not only prompt quality.
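A small sketch can show what "who can connect, what actions are allowed, and how activity is audited" looks like as a check in front of tool calls. Nothing here is part of the MCP specification or of any vendor product named above; `Grant`, `ToolPolicy`, and the role and tool names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    """Allows one role to perform a set of actions on one tool."""
    role: str
    tool: str
    actions: frozenset


@dataclass
class ToolPolicy:
    grants: list
    audit_log: list = field(default_factory=list)

    def authorize(self, user: str, role: str, tool: str, action: str) -> bool:
        """Deny by default; log every decision, allowed or not."""
        allowed = any(
            g.role == role and g.tool == tool and action in g.actions
            for g in self.grants
        )
        self.audit_log.append(
            {"user": user, "role": role, "tool": tool,
             "action": action, "allowed": allowed}
        )
        return allowed


# Usage: an analyst may read the CRM but not delete from it.
policy = ToolPolicy(grants=[Grant("analyst", "crm", frozenset({"read"}))])
can_read = policy.authorize("ana", "analyst", "crm", "read")
can_delete = policy.authorize("ana", "analyst", "crm", "delete")
```

The design choice worth noting is that denied requests are logged too, since attempted actions outside policy are often the more useful audit signal.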
5. Async multi-session workspace is the missing user layer
A single chat thread is a poor container for long work. Real agent work often branches: one session researches, another drafts, another tests, another reviews, another waits for a scheduled follow-up. Users need a place where those workstreams, files, decisions, and artifacts remain inspectable.
This is where MCPlato fits naturally. MCPlato is best understood as an AI workspace layer: an environment for local materials, multiple sessions, background or scheduled work, artifacts, and permissioned observable execution.[56] It should not be treated as a universal replacement for coding agents, enterprise control towers, or browser infrastructure. Its role is different: helping users organize and supervise AI work that spans documents, research, browser context, office outputs, and asynchronous follow-through.
In other words, MCPlato belongs on the workspace layer of the agent stack: close to the user, close to the materials, and above the lower-level runtime and infra components that make execution possible.
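As a rough illustration of the multi-session idea, the sketch below models a workspace as a set of named sessions that each keep their own artifacts and decisions. It is a generic, hypothetical data model and does not describe MCPlato's or any other product's internals.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    RUNNING = "running"
    WAITING = "waiting"   # e.g. paused until a scheduled follow-up
    DONE = "done"


@dataclass
class Session:
    name: str
    purpose: str
    status: Status = Status.RUNNING
    artifacts: list = field(default_factory=list)   # files, drafts, reports
    decisions: list = field(default_factory=list)   # human or agent decisions


@dataclass
class Workspace:
    sessions: dict = field(default_factory=dict)

    def open_session(self, name: str, purpose: str) -> Session:
        self.sessions[name] = Session(name, purpose)
        return self.sessions[name]

    def overview(self) -> list:
        """One line per session so a supervisor can scan parallel work."""
        return [
            f"{s.name} [{s.status.value}] "
            f"artifacts={len(s.artifacts)} decisions={len(s.decisions)}"
            for s in self.sessions.values()
        ]
```

Compared with a single chat thread, the difference is that each workstream's state stays separately inspectable after the conversation that produced it has ended.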
A practical decision framework
Figure 2: Agent stack choices should be based on autonomy horizon and governance pressure, not on a single universal ranking.
Use five questions before choosing an agent stack.
| Question | If the answer is “yes,” prioritize |
|---|---|
| Will the agent modify code, data, records, or external systems? | Sandbox, permissions, audit logs, review gates, rollback paths |
| Will the task run longer than one prompt or one session? | Durable state, checkpoints, background execution, workspace continuity |
| Will the agent use browsers or execute code? | Browser automation infra, isolated sandboxes, credential boundaries |
| Will multiple teams rely on the output? | Observability, evals, cost tracking, policy, ownership |
| Will users need to supervise many parallel workstreams? | AI workspace, multi-session orchestration, artifacts, summaries, handoff discipline |
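The table can also be read as a simple lookup. The sketch below encodes it that way; the priority lists are taken from the table, while the question keys (`modifies_systems` and so on) and the `prioritize` function are hypothetical names introduced for illustration.

```python
# Hypothetical keys for the five questions; values come from the table above.
PRIORITIES = {
    "modifies_systems": [
        "sandbox", "permissions", "audit logs", "review gates", "rollback paths",
    ],
    "long_running": [
        "durable state", "checkpoints", "background execution",
        "workspace continuity",
    ],
    "uses_browser_or_code": [
        "browser automation infra", "isolated sandboxes", "credential boundaries",
    ],
    "multi_team": [
        "observability", "evals", "cost tracking", "policy", "ownership",
    ],
    "parallel_workstreams": [
        "AI workspace", "multi-session orchestration", "artifacts",
        "summaries", "handoff discipline",
    ],
}


def prioritize(answers: dict) -> list:
    """Return priorities for every question answered yes, without duplicates."""
    unknown = set(answers) - set(PRIORITIES)
    if unknown:
        raise KeyError(f"unknown question keys: {sorted(unknown)}")
    result = []
    for key, items in PRIORITIES.items():
        if answers.get(key):
            result.extend(item for item in items if item not in result)
    return result
```

For example, `prioritize({"long_running": True})` returns the four items from the second row. Real decisions will weigh these rather than simply union them, so this is best treated as a checklist generator.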
A simple mapping helps:
- Short coding task: start with a coding-native agent such as Claude Code, Codex, Cursor, Jules, Devin, Replit Agent, or GitHub Copilot coding agent.
- App prototype: consider Lovable, Bolt.new, Replit Agent, or similar builder surfaces, then add review before production use.
- Business workflow automation: look at Copilot Studio, Agentforce, ServiceNow, Zapier Agents, Lindy, Gumloop, Dust, or Hebbia depending on data, governance, and domain fit.
- Custom agent product: assemble runtime and infra pieces such as LangGraph, LlamaIndex, CrewAI, OpenAI Agents SDK, Vercel AI SDK, MCP, Browserbase, E2B, Temporal, Composio, Langfuse, Helicone, and LangSmith.
- Cross-material knowledge work: use an AI workspace pattern, where MCPlato is a relevant example, especially when the work spans local materials, research, artifacts, multiple sessions, and permissioned execution.
Conclusion
The 2026 H1 agent landscape is not a battle between “models” and “products.” It is the emergence of a full stack.
Models provide the reasoning substrate. Agent products package common jobs. Harnesses and runtimes keep work stateful. Browser and sandbox infrastructure make tool use safer. Observability and evals make execution inspectable. Governance makes autonomy acceptable in organizations. AI workspaces give users a place to coordinate long-running work.
The winners will not simply be the teams with the biggest model benchmark number. They will be the teams that can turn model intelligence into reliable, reviewable, permissioned workflows.
References
1. Anthropic, “Introducing Claude 4,” https://www.anthropic.com/news/claude-4
2. Anthropic, “Claude Sonnet 4.5,” https://www.anthropic.com/news/claude-sonnet-4-5
3. Anthropic, “Claude Opus 4.8,” https://www.anthropic.com/news/claude-opus-4-8
4. Google, “Gemini 2.5 Pro coding performance,” https://developers.googleblog.com/en/gemini-2-5-pro-io-improved-coding-performance/
5. DeepSeek, “DeepSeek-R1 release,” https://api-docs.deepseek.com/news/news250120
6. DeepSeek, “DeepSeek-V3.1 release,” https://api-docs.deepseek.com/news/news250821
7. Qwen, “Qwen3-Coder,” https://qwenlm.github.io/blog/qwen3-coder/
8. Mistral AI, “Magistral,” https://mistral.ai/news/magistral
9. Anthropic, “Claude Code overview,” https://code.claude.com/docs/en/overview
10. OpenAI Codex developer documentation, https://developers.openai.com/codex
11. GitHub, “GitHub Copilot coding agent in public preview,” https://github.blog/changelog/2025-05-19-github-copilot-coding-agent-in-public-preview/
12. Cursor changelog, https://cursor.com/changelog
13. Cognition, “Devin 2,” https://cognition.ai/blog/devin-2
14. Google, “Jules now available,” https://blog.google/innovation-and-ai/models-and-research/google-labs/jules-now-available/
15. Replit, “Introducing Agent 3,” https://replit.com/blog/introducing-agent-3-our-most-autonomous-agent-yet
16. Lovable, https://lovable.dev/
17. Bolt.new, https://bolt.new/
18. Manus, https://manus.im/
19. Perplexity, “Getting started with Labs,” https://www.perplexity.ai/hub/getting-started
20. OpenAI developer documentation, “Computer use,” https://developers.openai.com/api/docs/guides/tools-computer-use
21. OpenAI developer documentation, “Agents,” https://developers.openai.com/api/docs/guides/agents
22. Microsoft Copilot Studio release plan, https://learn.microsoft.com/en-us/power-platform/release-plan/2025wave2/microsoft-copilot-studio/
23. Microsoft, “6 core capabilities to scale agent adoption in 2026,” https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/6-core-capabilities-to-scale-agent-adoption-in-2026/
24. Salesforce, “Salesforce launches Agentforce 3,” https://www.salesforce.com/ap/news/press-releases/2025/06/24/salesforce-launches-agentforce-3-to-solve-the-biggest-blockers-to-scaling-ai-agents-visibility-and-control/
25. ServiceNow, “AI Control Tower,” https://www.servicenow.com/products/ai-control-tower.html
26. Zapier, “AI agents survey,” https://zapier.com/blog/ai-agents-survey/
27. Lindy Agents, https://www.lindy.ai/agents
28. Gumloop, https://www.gumloop.com/
29. Dust documentation, “Welcome to Dust,” https://docs.dust.tt/docs/welcome-to-dust
30. Hebbia product, https://www.hebbia.com/product
31. LangChain, “LangChain and LangGraph 1.0,” https://www.langchain.com/blog/langchain-langgraph-1dot0
32. LangSmith platform, https://www.langchain.com/langsmith-platform
33. LlamaIndex, “Introducing LlamaIndex 0.11,” https://www.llamaindex.ai/blog/introducing-llamaindex-0-11
34. Microsoft Research, AutoGen, https://www.microsoft.com/en-us/research/project/autogen/
35. CrewAI, “CrewAI OSS 1.0,” https://blog.crewai.com/crewai-oss-1-0-we-are-going-ga/
36. OpenAI Agents SDK, https://openai.github.io/openai-agents-python/
37. Vercel AI SDK documentation, https://ai-sdk.dev/docs/introduction
38. Vercel, “Agentic infrastructure,” https://vercel.com/blog/agentic-infrastructure
39. Mastra, https://mastra.ai/
40. PydanticAI documentation, https://pydantic.dev/docs/ai/
41. Agno documentation, https://docs.agno.com/introduction
42. Letta, “Letta v1 agent,” https://www.letta.com/blog/letta-v1-agent
43. Browserbase for AI, https://www.browserbase.com/industry/ai
44. Browserbase Stagehand, https://www.browserbase.com/stagehand
45. Microsoft Playwright MCP, https://github.com/microsoft/playwright-mcp
46. E2B Enterprise, https://e2b.dev/enterprise
47. Daytona sandboxes, https://www.daytona.io/docs/en/sandboxes/
48. Temporal AI solutions, https://temporal.io/solutions/ai
49. Arcade, https://www.arcade.dev/
50. Composio, https://composio.dev/
51. Langfuse documentation, https://langfuse.com/docs
52. Helicone, https://www.helicone.ai/
53. Temporal, Agentic AI, https://temporal.io/ai/agentic-ai
54. Anthropic, “Model Context Protocol,” https://www.anthropic.com/news/model-context-protocol
55. Model Context Protocol, “2026 MCP Roadmap,” https://blog.modelcontextprotocol.io/posts/2026-mcp-roadmap/
56. MCPlato, https://mcplato.com/en/
