June 11, 2026 · 6 min read · Perspective

From Overview to Running: Deploying the Co-Scientist Pattern

By the Micantis Team
This is the practical companion to Co-Scientist Is a Pattern, Not a Product. If you haven't read the overview, start there.

The overview argued that the co-scientist pattern is open: a substrate, a set of tools, a playbook, and an LLM you choose. That's the architecture. This is the piece about what you actually do with it on Monday morning.

The orchestration layer

Between the LLM and the tools sits an agent harness. The harness takes a user question, decides whether to call a tool or answer directly, executes the tool, feeds the result back to the LLM, decides whether more work is needed, and produces an answer. For anything beyond a single-shot prompt, the harness is where most of the engineering effort goes: sandboxed Python execution, long-running sessions that survive disconnections, scoped permissions on tool access, checkpointing so a 4-hour task doesn't lose state, end-to-end tracing so you can debug what the agent did.

Six months ago this was DIY territory. Today there are four reasonable ways to run it, and the choice matters.

Four ways to run it

The same three-layer architecture from the overview runs on any of these. The choice is about how much infrastructure you want to own.

Roll your own

Anthropic's SDK (or the equivalent from whatever model vendor you've chosen) plus a framework like LangGraph, or just async Python if your needs are simple. You write the agent loop, you maintain the sandboxing, you handle credential management and session state.

  • Pros: Full control. No vendor lock-in at the orchestration layer. Runs wherever you can run Python. Cheapest at scale once the infrastructure is amortized.
  • Cons: You are now maintaining production agent infrastructure. Every model upgrade means rework. Sandboxing, scoped permissions, and durable sessions are real engineering, not weekends.
  • When this makes sense: Your team has serious infrastructure capacity, your security posture demands it, or your agent needs are simple enough that a thin custom loop is sufficient.

Claude Managed Agents

Anthropic's productized agent harness, launched April 8, 2026.1 You define the agent's tasks, tools, and guardrails; Anthropic runs the orchestration on their infrastructure. Includes sandboxed code execution, long-running sessions (autonomous for hours, surviving disconnections), credential management, scoped permissions, and execution tracing in the Claude Console.

  • Pros: Production-grade infrastructure on day one. Tuned for Claude models specifically, which Anthropic reports gives measurably better outcomes than a generic loop. MCP support is first-class; the harness was built around it.
  • Cons: You're committed to Claude as the LLM at this layer. Pricing is $0.08 per session-hour active runtime plus standard token costs, so heavy workloads have real ongoing cost.
  • When this makes sense: You've decided Claude is the right LLM for the work, you want to ship in days rather than months, and the operational overhead of your own runtime isn't where you want your engineers spending time.

Google's Vertex AI agent offerings, or AWS Bedrock Agents

Both major clouds have productized agent runtimes that follow the same shape: orchestration harness, tool dispatch, sandboxing, session management. Bedrock Agents has the longer track record in regulated-industry deployments; Vertex AI's agent stack is the natural choice if you're already standardized on Gemini and Google Cloud.

  • Pros: If you're already a heavy GCP or AWS customer, the data-residency, identity, and compliance story is already settled. FedRAMP, GovCloud, and similar are available where they matter.
  • Cons: MCP support across the major cloud agent platforms is newer than Anthropic's. Check current documentation for the specific MCP server registration story before committing.
  • When this makes sense: You've already made your strategic cloud and LLM decisions, and you want the agent runtime to live inside those choices.

How Micantis wires in, regardless of runtime

The substrate exposes its capabilities as MCP servers: micantis.data_query for live lab data, micantis.spec_library for acceptance criteria, micantis.method_library for canonical analyses, micantis.test_plans for protocol generation, micantis.report_templates for output formats. Each is a named tool the agent reaches by name through MCP.

For any of the four runtimes above, the wire-up is the same conceptually: register the MCP server endpoints with the agent, configure authentication, and the harness handles the rest. For Managed Agents specifically, registration is a Claude Console operation. For DIY, you include an MCP client library in your agent loop and add the Micantis endpoints. The mechanics differ in detail; the architecture does not.

The playbook from the skeleton loads as the agent's system context at deployment time. Same file, every runtime. Update the playbook, redeploy (or reload context, depending on the runtime), and the agent's behavior updates with it. The playbook is portable across runtimes by design. If you start on Managed Agents and decide later to migrate to DIY or to Bedrock, the playbook comes with you intact.

The data residency question

For regulated customers, this is where the runtime choice matters most.

The relevant Anthropic update is the May 19, 2026 self-hosted sandboxes and MCP tunnels release for Managed Agents.2 With self-hosted sandboxes, the agent's code execution and tool calls happen inside your environment. The orchestration harness runs on Anthropic's infrastructure, but the data path is: your data → your sandbox → token-level API calls back to Claude for reasoning. Your data stays in your environment; only the prompts and the tool-call results that the LLM needs to reason over cross the boundary, and those can be filtered or redacted before they leave.

For Bedrock, PrivateLink keeps API traffic within AWS. For Vertex AI, VPC Service Controls do the same within GCP. For DIY with a fully local LLM (a model running on your own hardware), no data leaves your network at all.

Defense, aerospace, medical, and other regulated customers usually land on one of three: Managed Agents with self-hosted sandboxes, Bedrock or Vertex with private networking, or DIY with a local LLM. Each has a trade-off between ease of deployment and the data control story you can tell your security team. None of them require your data to traverse the public internet to a vendor's cloud, which is the constraint that actually matters.

What day one looks like

Pick a runtime. For most teams the fastest path is Managed Agents. The production infrastructure is already in place, and MCP support is first-class. Register the Micantis MCP servers as tools through the Claude Console. Load your filled-in playbook as the agent's system context.

Ask your first question. "What finished on the test floor over the weekend? Anything off-spec?"

The agent calls micantis.data_query to pull the latest cycler results, applies the acceptance criteria from micantis.spec_library, runs the standard analyses through micantis.method_library, formats the response per your playbook's house style, and surfaces anomalies your playbook says are worth flagging. You read the answer, drink your coffee, decide what to do about the three cells that drifted on impedance.

Time from blank console to first useful answer, with a filled-in playbook and the MCP endpoints registered: an afternoon. Time from "we're going to evaluate this" to a meaningful pilot: a week or two, depending on how much of the playbook your team already had written down in some form.


The substrate, the tools, the playbook — those are the durable pieces. They outlive whichever runtime you pick today, because they're not tied to one. Choose the runtime that fits your security posture, your cloud, your LLM strategy, and your team's appetite for infrastructure work. Re-choose later if the constraints change. What's underneath stays portable: open formats, MCP servers, your playbook, your judgment.

The orchestration harness used to be the thing engineers built themselves because no one was selling it. That's no longer true. Take advantage.


References

  1. Anthropic. Claude Managed Agents: get to production 10x faster. April 8, 2026. claude.com/blog/claude-managed-agents.
  2. Anthropic. New in Claude Managed Agents: self-hosted sandboxes and MCP tunnels. May 19, 2026. claude.com/blog/claude-managed-agents-updates.