The 2026 H1 Agent Stack: Models, Harnesses, Runtimes, and AI Workspaces

A concise 2026 H1 landscape of AI agents, coding agents, harnesses, runtimes, browser and sandbox infrastructure, observability, governance, and AI workspaces — with MCPlato positioned as part of the workspace layer.

MCPlato Research Team · Published on 2026-05-29

The agent race in 2026 H1 no longer looks like a simple model leaderboard.

Better models still matter. Claude 4, Claude Sonnet 4.5, Claude Opus 4.8, Gemini 2.5 Pro, DeepSeek R1/V3.1, Qwen3-Coder, and Mistral Magistral all pushed the base layer forward in reasoning, coding, context, and tool use.¹⁻⁸ But the competitive question has changed:

Who can put those models into reliable work?

That means harnesses, runtimes, browsers, sandboxes, evals, observability, governance, permissions, and user-facing workspaces. The model is the engine. The agent product is the vehicle. The harness and workspace decide whether the vehicle can run inside a real company without losing state, authority, or trust.

The layered 2026 H1 agent stack

A useful way to read the market is as a stack, not a directory of logos.

Figure 1: The 2026 H1 agent stack is moving upward from model capability into execution, observability, governance, and workspace continuity.

Layer	What it contributes	Representative examples
Foundation models	Reasoning, coding, long context, computer/tool use, planning	Claude 4 / Sonnet 4.5 / Opus 4.8, Gemini 2.5 Pro, DeepSeek R1/V3.1, Qwen3-Coder, Mistral Magistral
Agent products	Packaged workflows for coding, research, app building, operations, and enterprise processes	Claude Code, OpenAI Codex, GitHub Copilot coding agent, Cursor, Devin, Jules, Replit Agent, Lovable, Bolt.new, Manus, Perplexity Labs
Harness / runtime	State, retries, human-in-the-loop, orchestration, memory, structured tool calls	LangGraph/LangChain, LlamaIndex, AutoGen, CrewAI, OpenAI Agents SDK, Vercel AI SDK, Mastra, PydanticAI, Agno, Letta
Browser and sandbox infra	Safe execution environments, browser automation, code sandboxes, task isolation	Browserbase, Stagehand, Playwright MCP, E2B, Daytona, Temporal, Arcade, Composio
Observability and evals	Traces, cost, latency, regression tests, prompt/tool debugging, production review	LangSmith, Langfuse, Helicone, model and agent benchmarks
Enterprise governance	Visibility, access control, policy, agent inventory, auditability, compliance workflows	Microsoft Copilot Studio, Salesforce Agentforce, ServiceNow AI Control Tower, MCP-based integration patterns
AI workspace	The user-facing place where multi-step work, files, sessions, artifacts, and decisions persist	MCPlato, Dust, Hebbia, workspace-style agent platforms

The important point is not that every product must cover every layer. It is that serious agent work now needs all of them somewhere in the system.

Product clusters, not a raw directory

1. Coding agents became the first mass-market agent category

Coding agents are the clearest proof that agents can move beyond chat. Claude Code became generally available alongside Claude 4 and is documented as an agentic coding tool for terminal and development workflows.¹,⁹ OpenAI Codex, GitHub Copilot coding agent, Cursor, Devin, Google Jules, and Replit Agent all point in the same direction: developers want agents that can inspect repositories, edit files, run commands, open pull requests, and continue work across local and cloud contexts.¹⁰⁻¹⁵

This cluster is ahead because software work already has useful guardrails: files, diffs, tests, logs, branches, CI, and review. The lesson for the rest of the market is not “everything should be coding.” It is that agents need reviewable artifacts and verification loops.

2. App builders and general agents turned prompts into workflows

Lovable, Bolt.new, Replit Agent, and Manus are examples of products centered on producing apps, websites, or executable work; Perplexity describes Labs as a creation feature for projects such as reports, dashboards, and lightweight apps.¹⁶⁻¹⁹ OpenAI's developer documentation describes computer-use and agent-building primitives, including a visual browser tool surface, which places its agent direction inside the same workflow shift rather than in the chat-feature category.²⁰,²¹

These products compress the distance between intent and artifact. Their challenge is the same challenge facing the broader agent market: once the task becomes long-running, multi-step, or externally visible, the product needs state, permissions, rollback, and a clear handoff from generated draft to production asset.

3. Enterprise agents are shifting from adoption to control

Enterprise agent platforms are now talking less like demo tools and more like operating systems for governed automation. Microsoft Copilot Studio emphasizes capabilities for scaling agent adoption.²²,²³ Salesforce Agentforce 3 highlights visibility and control through a Command Center, MCP support, lower latency, and industry actions.²⁴ ServiceNow positions AI Control Tower as a product for managing the AI lifecycle and governing agents, models, and workflows.²⁵

Zapier Agents, Lindy, Gumloop, Dust, and Hebbia sit closer to business-team workflow automation and knowledge work.²⁶⁻³⁰ They matter because agent adoption is not only an engineering problem. Sales, finance, legal, operations, recruiting, research, and support teams also need agent systems that can use tools without quietly bypassing policy.

4. Frameworks and runtimes became the agent middle layer

LangGraph/LangChain, LangSmith, LlamaIndex, AutoGen, CrewAI, OpenAI Agents SDK, Vercel AI SDK, Mastra, PydanticAI, Agno, and Letta represent the build layer beneath packaged products.³¹⁻⁴²

This layer is where durable state, memory, tool routing, human approval, structured outputs, and multi-agent orchestration become reusable primitives. It is also where many teams discover that “agent” is not one abstraction. A retrieval assistant, a coding worker, a browser operator, a finance analyst, and a customer-service agent need different runtime contracts.
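To make those primitives concrete, here is a minimal sketch of a runtime contract that combines checkpointed state, bounded retries, and a human-approval gate. It does not reproduce the API of any framework named above; `Step`, `Checkpoint`, and `run_workflow` are hypothetical names chosen for illustration, and a real runtime would persist the checkpoint outside process memory.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One unit of agent work (hypothetical structure for illustration)."""
    name: str
    run: Callable[[dict], dict]   # receives a copy of the state, returns updates
    needs_approval: bool = False  # pause for a human before running
    max_retries: int = 2          # extra attempts after the first failure


@dataclass
class Checkpoint:
    """Durable record of progress; kept in memory here for simplicity."""
    completed: List[str] = field(default_factory=list)
    state: dict = field(default_factory=dict)


def run_workflow(steps: List[Step], checkpoint: Checkpoint,
                 approve: Callable[[str, dict], bool]) -> str:
    """Run steps in order, skipping any that the checkpoint marks as done."""
    for step in steps:
        if step.name in checkpoint.completed:
            continue  # finished in an earlier run, so resume past it
        if step.needs_approval and not approve(step.name, checkpoint.state):
            return "paused:" + step.name
        for attempt in range(step.max_retries + 1):
            try:
                updates = step.run(dict(checkpoint.state))
                break
            except Exception:
                if attempt == step.max_retries:
                    return "failed:" + step.name
        checkpoint.state.update(updates)
        checkpoint.completed.append(step.name)
    return "done"
```

Calling `run_workflow` again with the same `Checkpoint` after a pause resumes at the first unfinished step rather than repeating earlier work. The different agent types mentioned above would mostly differ in which steps set `needs_approval` and how many retries they tolerate.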

5. Infra and observability became production requirements

Browserbase, Stagehand, Playwright MCP, E2B, Daytona, Temporal, Arcade, and Composio are not peripheral tools. They are part of the agent control plane.⁴³⁻⁵⁰

Agents need browsers because much of the working web still lacks clean APIs. They need sandboxes because code and tools must run in isolated environments. They need durable workflow engines because long tasks fail and resume. They need integration gateways because credentials, permissions, and action scopes should not be improvised inside a prompt.

LangSmith, Langfuse, and Helicone show the same maturation from the observability side.³²,⁵¹,⁵² If an agent is touching customer data, production systems, or expensive model calls, teams need traces, evals, cost visibility, latency visibility, and regression checks.

Five trends to watch

1. Model-only differentiation is fading into runtime differentiation

The best models are converging on strong coding, tool use, long context, and planning. Anthropic reports Claude 4 coding results and Claude Code availability; Gemini 2.5 Pro emphasizes coding and long-context capability; DeepSeek V3.1 frames itself as a step toward the agent era; and Qwen3-Coder highlights large-scale code-agent training environments.¹,⁴,⁶,⁷

That makes the runtime more important, not less. When multiple base models can reason well enough, teams choose the stack that can preserve state, call tools safely, evaluate outcomes, and keep humans in control.

2. Observability is becoming the production gate

The question “Did the model answer?” is too weak for agents. Production teams need to know:

Which tools were called?
What state changed?
What evidence supports completion?
How much did the run cost?
Where did latency appear?
Which prompt, model, tool, or environment change caused a regression?

This is why LangSmith, Langfuse, Helicone, benchmark suites, and enterprise command centers are becoming part of the buying discussion. A company cannot govern what it cannot see.
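The production questions above map fairly directly onto a trace record. The sketch below is one assumed shape for such a record, not the schema of LangSmith, Langfuse, or Helicone; `ToolCall`, `RunTrace`, and every field name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToolCall:
    tool: str
    latency_ms: float
    cost_usd: float
    state_change: str = ""  # empty when the call was read-only


@dataclass
class RunTrace:
    run_id: str
    model: str            # recorded so regressions can be tied to a config change
    prompt_version: str
    calls: List[ToolCall] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)  # e.g. test results, diffs

    def summary(self) -> dict:
        """Answer the review questions for a single run."""
        slowest = max(self.calls, key=lambda c: c.latency_ms, default=None)
        return {
            "tools_called": [c.tool for c in self.calls],
            "state_changes": [c.state_change for c in self.calls if c.state_change],
            "has_evidence": bool(self.evidence),
            "total_cost_usd": round(sum(c.cost_usd for c in self.calls), 6),
            "slowest_tool": slowest.tool if slowest else None,
            "config": (self.model, self.prompt_version),
        }
```

Comparing `summary()` output across runs that differ only in `config` is one simple way to start attributing a regression to a prompt or model change.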

3. Browser and code sandboxes are becoming first-class infra

Computer-use agents and coding agents need safe operating surfaces. Browserbase and Stagehand focus on browser automation for AI agents; Playwright MCP exposes browser control through MCP; E2B and Daytona focus on isolated execution environments; Temporal frames durable execution for agentic AI workflows.⁴³⁻⁴⁷,⁵³

This is one of the most important shifts of 2026 H1: the “agent environment” is becoming a product category. The environment is where autonomy becomes either useful or dangerous.

4. Governance and protocols are becoming default expectations

MCP is important because it gives the market a shared language for connecting models to tools and context.⁵⁴,⁵⁵ But protocols do not remove governance requirements. They make governance more urgent: once tools are easier to connect, teams need clearer policies for who can connect them, what actions are allowed, how credentials are scoped, and how activity is audited.

Salesforce Agentforce, ServiceNow AI Control Tower, and Microsoft Copilot Studio all reflect this enterprise reality.²³,²⁴,²⁵ Agent adoption now depends on visibility, policy, permissions, and operational ownership, not only prompt quality.
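The four policy questions in this section (who can connect a tool, which actions are allowed, how credentials are scoped, and how activity is audited) can be sketched as a small gateway check. This is an illustrative assumption, not part of the MCP specification or any vendor product; `ToolPolicy`, `Gateway`, and the role, tool, and scope strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ToolPolicy:
    allowed_roles: Set[str]      # who may use the tool
    allowed_actions: Set[str]    # which actions are permitted
    credential_scopes: Set[str]  # which credential scopes may be requested


@dataclass
class Gateway:
    policies: Dict[str, ToolPolicy]
    audit_log: List[Tuple[str, str, str, bool]] = field(default_factory=list)

    def authorize(self, role: str, tool: str, action: str, scope: str) -> bool:
        """Deny by default, and record every decision for later audit."""
        policy = self.policies.get(tool)
        allowed = (
            policy is not None
            and role in policy.allowed_roles
            and action in policy.allowed_actions
            and scope in policy.credential_scopes
        )
        self.audit_log.append((role, tool, action, allowed))
        return allowed
```

The point of the sketch is that denied requests are logged as well as granted ones, and that an unregistered tool is refused rather than passed through.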

5. Async multi-session workspace is the missing user layer

A single chat thread is a poor container for long work. Real agent work often branches: one session researches, another drafts, another tests, another reviews, another waits for a scheduled follow-up. Users need a place where those workstreams, files, decisions, and artifacts remain inspectable.

This is where MCPlato fits naturally. MCPlato is best understood as an AI workspace layer: an environment for local materials, multiple sessions, background or scheduled work, artifacts, and permissioned observable execution.⁵⁶ It should not be treated as a universal replacement for coding agents, enterprise control towers, or browser infrastructure. Its role is different: helping users organize and supervise AI work that spans documents, research, browser context, office outputs, and asynchronous follow-through.

In other words, MCPlato belongs on the workspace layer of the agent stack: close to the user, close to the materials, and above the lower-level runtime and infra components that make execution possible.

A practical decision framework

Figure 2: Agent stack choices should be based on autonomy horizon and governance pressure, not on a single universal ranking.

Use five questions before choosing an agent stack.

Question	If the answer is “yes,” prioritize
Will the agent modify code, data, records, or external systems?	Sandbox, permissions, audit logs, review gates, rollback paths
Will the task run longer than one prompt or one session?	Durable state, checkpoints, background execution, workspace continuity
Will the agent use browsers or execute code?	Browser automation infra, isolated sandboxes, credential boundaries
Will multiple teams rely on the output?	Observability, evals, cost tracking, policy, ownership
Will users need to supervise many parallel workstreams?	AI workspace, multi-session orchestration, artifacts, summaries, handoff discipline

A simple mapping helps:

Short coding task: start with a coding-native agent such as Claude Code, Codex, Cursor, Jules, Devin, Replit Agent, or GitHub Copilot coding agent.
App prototype: consider Lovable, Bolt.new, Replit Agent, or similar builder surfaces, then add review before production use.
Business workflow automation: look at Copilot Studio, Agentforce, ServiceNow, Zapier Agents, Lindy, Gumloop, Dust, or Hebbia depending on data, governance, and domain fit.
Custom agent product: assemble runtime and infra pieces such as LangGraph, LlamaIndex, CrewAI, OpenAI Agents SDK, Vercel AI SDK, MCP, Browserbase, E2B, Temporal, Composio, Langfuse, Helicone, and LangSmith.
Cross-material knowledge work: use an AI workspace pattern, where MCPlato is a relevant example, especially when the work spans local materials, research, artifacts, multiple sessions, and permissioned execution.
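The five-question table above can also be read as a lookup from "yes" answers to priorities. The following sketch encodes it that way; the dictionary keys are hypothetical shorthand for the five questions, and the priority strings are taken from the table.

```python
# Keys are shorthand for the five questions; values are the table's priorities.
PRIORITIES = {
    "modifies_systems": ["sandbox", "permissions", "audit logs",
                         "review gates", "rollback paths"],
    "long_running": ["durable state", "checkpoints",
                     "background execution", "workspace continuity"],
    "uses_browser_or_code": ["browser automation infra", "isolated sandboxes",
                             "credential boundaries"],
    "shared_output": ["observability", "evals", "cost tracking",
                      "policy", "ownership"],
    "parallel_workstreams": ["AI workspace", "multi-session orchestration",
                             "artifacts", "summaries", "handoff discipline"],
}


def priorities_for(answers: dict) -> list:
    """Collect priorities for every question answered 'yes', in table order."""
    result = []
    for question, items in PRIORITIES.items():
        if answers.get(question, False):
            for item in items:
                if item not in result:  # avoid listing a priority twice
                    result.append(item)
    return result
```

For example, `priorities_for({"modifies_systems": True, "long_running": True})` returns the sandbox and permission items first, followed by the durable-state items. Unanswered questions are treated as "no".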

Conclusion

The 2026 H1 agent landscape is not a battle between “models” and “products.” It is the emergence of a full stack.

Models provide the reasoning substrate. Agent products package common jobs. Harnesses and runtimes keep work stateful. Browser and sandbox infrastructure make tool use safer. Observability and evals make execution inspectable. Governance makes autonomy acceptable in organizations. AI workspaces give users a place to coordinate long-running work.

The winners will not simply be the teams with the biggest model benchmark number. They will be the teams that can turn model intelligence into reliable, reviewable, permissioned workflows.

References

1. Anthropic, “Introducing Claude 4,” https://www.anthropic.com/news/claude-4
2. Anthropic, “Claude Sonnet 4.5,” https://www.anthropic.com/news/claude-sonnet-4-5
3. Anthropic, “Claude Opus 4.8,” https://www.anthropic.com/news/claude-opus-4-8
4. Google, “Gemini 2.5 Pro coding performance,” https://developers.googleblog.com/en/gemini-2-5-pro-io-improved-coding-performance/
5. DeepSeek, “DeepSeek-R1 release,” https://api-docs.deepseek.com/news/news250120
6. DeepSeek, “DeepSeek-V3.1 release,” https://api-docs.deepseek.com/news/news250821
7. Qwen, “Qwen3-Coder,” https://qwenlm.github.io/blog/qwen3-coder/
8. Mistral AI, “Magistral,” https://mistral.ai/news/magistral
9. Anthropic, “Claude Code overview,” https://code.claude.com/docs/en/overview
10. OpenAI Codex developer documentation, https://developers.openai.com/codex
11. GitHub, “GitHub Copilot coding agent in public preview,” https://github.blog/changelog/2025-05-19-github-copilot-coding-agent-in-public-preview/
12. Cursor changelog, https://cursor.com/changelog
13. Cognition, “Devin 2,” https://cognition.ai/blog/devin-2
14. Google, “Jules now available,” https://blog.google/innovation-and-ai/models-and-research/google-labs/jules-now-available/
15. Replit, “Introducing Agent 3,” https://replit.com/blog/introducing-agent-3-our-most-autonomous-agent-yet
16. Lovable, https://lovable.dev/
17. Bolt.new, https://bolt.new/
18. Manus, https://manus.im/
19. Perplexity, “Getting started with Labs,” https://www.perplexity.ai/hub/getting-started
20. OpenAI developer documentation, “Computer use,” https://developers.openai.com/api/docs/guides/tools-computer-use
21. OpenAI developer documentation, “Agents,” https://developers.openai.com/api/docs/guides/agents
22. Microsoft Copilot Studio release plan, https://learn.microsoft.com/en-us/power-platform/release-plan/2025wave2/microsoft-copilot-studio/
23. Microsoft, “6 core capabilities to scale agent adoption in 2026,” https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/6-core-capabilities-to-scale-agent-adoption-in-2026/
24. Salesforce, “Salesforce launches Agentforce 3,” https://www.salesforce.com/ap/news/press-releases/2025/06/24/salesforce-launches-agentforce-3-to-solve-the-biggest-blockers-to-scaling-ai-agents-visibility-and-control/
25. ServiceNow, “AI Control Tower,” https://www.servicenow.com/products/ai-control-tower.html
26. Zapier, “AI agents survey,” https://zapier.com/blog/ai-agents-survey/
27. Lindy Agents, https://www.lindy.ai/agents
28. Gumloop, https://www.gumloop.com/
29. Dust documentation, “Welcome to Dust,” https://docs.dust.tt/docs/welcome-to-dust
30. Hebbia product, https://www.hebbia.com/product
31. LangChain, “LangChain and LangGraph 1.0,” https://www.langchain.com/blog/langchain-langgraph-1dot0
32. LangSmith platform, https://www.langchain.com/langsmith-platform
33. LlamaIndex, “Introducing LlamaIndex 0.11,” https://www.llamaindex.ai/blog/introducing-llamaindex-0-11
34. Microsoft Research, AutoGen, https://www.microsoft.com/en-us/research/project/autogen/
35. CrewAI, “CrewAI OSS 1.0,” https://blog.crewai.com/crewai-oss-1-0-we-are-going-ga/
36. OpenAI Agents SDK, https://openai.github.io/openai-agents-python/
37. Vercel AI SDK documentation, https://ai-sdk.dev/docs/introduction
38. Vercel, “Agentic infrastructure,” https://vercel.com/blog/agentic-infrastructure
39. Mastra, https://mastra.ai/
40. PydanticAI documentation, https://pydantic.dev/docs/ai/
41. Agno documentation, https://docs.agno.com/introduction
42. Letta, “Letta v1 agent,” https://www.letta.com/blog/letta-v1-agent
43. Browserbase for AI, https://www.browserbase.com/industry/ai
44. Browserbase Stagehand, https://www.browserbase.com/stagehand
45. Microsoft Playwright MCP, https://github.com/microsoft/playwright-mcp
46. E2B Enterprise, https://e2b.dev/enterprise
47. Daytona sandboxes, https://www.daytona.io/docs/en/sandboxes/
48. Temporal AI solutions, https://temporal.io/solutions/ai
49. Arcade, https://www.arcade.dev/
50. Composio, https://composio.dev/
51. Langfuse documentation, https://langfuse.com/docs
52. Helicone, https://www.helicone.ai/
53. Temporal, Agentic AI, https://temporal.io/ai/agentic-ai
54. Anthropic, “Model Context Protocol,” https://www.anthropic.com/news/model-context-protocol
55. Model Context Protocol, “2026 MCP Roadmap,” https://blog.modelcontextprotocol.io/posts/2026-mcp-roadmap/
56. MCPlato, https://mcplato.com/en/