Your agent doesn't have to answer in plain text

The Forge

Last issue I said we'd build one of these. Here it is.

MCP Apps shipped as an official MCP extension, and it closes a gap that has been open since the protocol started. Until now, a tool could only hand back text. Your agent calls a tool, the tool returns a blob of JSON or a paragraph of prose, and the model reads it back to you. That works for an answer. It falls apart the moment the right response is a button, a form, a chart, or a thing you click.

MCP Apps fix that. A tool can now point at a ui:// resource, a self-contained HTML page, and the host renders it in a sandboxed iframe right inside the conversation. The user clicks the button, the UI calls back to your server, and the result flows into both the page and the model's context. One tool, one HTML file, real UI.

Here is the part to get straight before you build one: an MCP App is not a website you bolt onto your agent. It is a tool that happens to render. The model still decides when to call it, the host still sandboxes it, and the value only shows up when the answer is genuinely interactive. A widget that displays one line of text is a worse version of returning that line of text. This issue walks through the full setup to ship your first app, plus the rule for when a tool should render and when it should just talk.

The short version: render UI when the user needs to act, return text when the user just needs to know.

The Blueprint

Three pieces get you a working app: one tool that points at a UI, one HTML file the tool renders, and one tunnel that hands it to Claude. Copy, paste, customize.

Step 0: install the packages. You need Node 20.11 or higher, because server.ts uses import.meta.dirname, which older Node versions do not provide. The ext-apps package carries both halves: the server helpers and the browser-side App class.

npm install @modelcontextprotocol/ext-apps @modelcontextprotocol/sdk
npm install -D vite vite-plugin-singlefile express cors tsx typescript

Step 1: write a tool that renders. A normal MCP tool returns content. An app tool returns content too, and its definition adds one field, _meta.ui.resourceUri, pointing at a ui:// resource. That single field is what tells the host "render this, do not just read it." Save this as server.ts:

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import {
  registerAppTool,
  registerAppResource,
  RESOURCE_MIME_TYPE,
} from "@modelcontextprotocol/ext-apps/server";
import express from "express";
import cors from "cors";
import fs from "node:fs/promises";
import path from "node:path";

const server = new McpServer({ name: "status-app", version: "1.0.0" });

// The ui:// scheme is the whole trick. It marks this as an app resource.
const resourceUri = "ui://build-status/app.html";

registerAppTool(
  server,
  "build-status",
  {
    title: "Build status",
    description: "Show the current build status as an interactive card.",
    inputSchema: {},
    _meta: { ui: { resourceUri } },
  },
  async () => {
    // Replace this with a real call to your CI or your own system.
    const status = { state: "passing", commit: "a1b2c3d", at: new Date().toISOString() };
    return { content: [{ type: "text", text: JSON.stringify(status) }] };
  },
);

registerAppResource(
  server,
  resourceUri,
  resourceUri,
  { mimeType: RESOURCE_MIME_TYPE },
  async () => {
    const html = await fs.readFile(path.join(import.meta.dirname, "dist", "app.html"), "utf-8");
    return { contents: [{ uri: resourceUri, mimeType: RESOURCE_MIME_TYPE, text: html }] };
  },
);

const expressApp = express();
expressApp.use(cors());
expressApp.use(express.json());
expressApp.post("/mcp", async (req, res) => {
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: undefined,
    enableJsonResponse: true,
  });
  res.on("close", () => transport.close());
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});
expressApp.listen(3001, () => console.log("MCP App on http://localhost:3001/mcp"));

Note the single tool doing double duty. When the model calls build-status, the host runs it, gets the result, fetches the UI, and pushes that first result into the page. The same tool is what the UI calls again when the user wants fresh data. You do not need a second endpoint for the button.
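To make that flow concrete, here is a simplified model of the host's side in plain TypeScript. The ToolDef, ToolResult, and HostDeps types and the runTool function are hypothetical stand-ins, not the SDK's or any host's real API; callTool and readResource represent the MCP requests a host would send to your server. The sketch only captures the decision described above: call the tool once, and if its definition carries a ui:// resourceUri, fetch the HTML and reuse that same result as the page's first payload.

```typescript
// Trimmed-down shapes for illustration; these are assumptions, not the SDK's types.
interface ToolDef {
  name: string;
  _meta?: { ui?: { resourceUri?: string } };
}
interface ToolResult {
  content: Array<{ type: string; text?: string }>;
}
// Hypothetical host hooks: a real host sends these as MCP requests to your server.
interface HostDeps {
  callTool(name: string, args: Record<string, unknown>): Promise<ToolResult>;
  readResource(uri: string): Promise<string>;
}

type HostOutcome =
  | { kind: "text"; result: ToolResult }
  | { kind: "app"; html: string; initialResult: ToolResult };

// Run the tool once. A ui:// resourceUri means "render"; otherwise the model just reads the text.
async function runTool(tool: ToolDef, deps: HostDeps): Promise<HostOutcome> {
  const result = await deps.callTool(tool.name, {});
  const uri = tool._meta?.ui?.resourceUri;
  if (!uri || !uri.startsWith("ui://")) {
    return { kind: "text", result };
  }
  // Fetch the page, then hand it the same result as its first payload.
  const html = await deps.readResource(uri);
  return { kind: "app", html, initialResult: result };
}
```

Swap in fakes for callTool and readResource and you can exercise this logic without a host at all.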

Step 2: write the UI. The page is plain HTML. The App class is the bridge: connect() opens the channel to the host, ontoolresult receives the data the host pushes in, and callServerTool() lets a click ask the server for more. Save the page as app.html:

<!DOCTYPE html>
<html lang="en">
  <head><meta charset="UTF-8" /><title>Build status</title></head>
  <body>
    <p><strong>Build:</strong> <code id="state">loading...</code></p>
    <p><small id="meta"></small></p>
    <button id="refresh">Refresh</button>
    <script type="module" src="/src/app.ts"></script>
  </body>
</html>

And the logic as src/app.ts:

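import { App } from "@modelcontextprotocol/ext-apps";

const stateEl = document.getElementById("state")!;
const metaEl = document.getElementById("meta")!;
const refreshBtn = document.getElementById("refresh") as HTMLButtonElement;
const app = new App({ name: "status-app", version: "1.0.0" });

// Parse the JSON status the tool returns as text and paint it into the card.
function render(text?: string) {
  if (!text) return;
  const s = JSON.parse(text);
  stateEl.textContent = s.state;
  metaEl.textContent = `commit ${s.commit}, checked ${s.at}`;
}

// Fires when the host hands the page its first tool result.
// Register it before connecting so the initial push cannot be missed.
app.ontoolresult = (result) => {
  render(result.content?.find((c) => c.type === "text")?.text);
};

// Open the channel to the host.
app.connect();

// A click asks the server for fresh data. Each call is a round-trip, so plan for latency:
// disable the button and show a loading state until the result comes back.
refreshBtn.addEventListener("click", async () => {
  refreshBtn.disabled = true;
  stateEl.textContent = "loading...";
  try {
    const result = await app.callServerTool({ name: "build-status", arguments: {} });
    render(result.content?.find((c) => c.type === "text")?.text);
  } catch (err) {
    stateEl.textContent = "refresh failed";
    console.error(err);
  } finally {
    refreshBtn.disabled = false;
  }
});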
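Step 3's build script runs INPUT=app.html vite build, which only yields a single self-contained dist/app.html if Vite is configured to read INPUT and inline every asset. Here is a minimal vite.config.ts sketch, assuming Vite's standard defineConfig and the viteSingleFile export from vite-plugin-singlefile; the error message and option choices are illustrative, not taken from the ext-apps repo.

```typescript
// vite.config.ts: a sketch, assuming the build script sets INPUT to the HTML entry.
import { defineConfig } from "vite";
import { viteSingleFile } from "vite-plugin-singlefile";

const INPUT = process.env.INPUT;
if (!INPUT) {
  throw new Error("Set INPUT to the HTML entry, e.g. INPUT=app.html vite build");
}

export default defineConfig({
  // Inline scripts, stylesheets, and assets into the HTML so the sandbox needs no external loads.
  plugins: [viteSingleFile()],
  build: {
    rollupOptions: { input: INPUT },
    outDir: "dist", // server.ts reads dist/app.html
  },
});
```

Save it next to package.json. With app.html at the project root as the entry, the bundled page lands at dist/app.html, the path server.ts reads.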

Step 3: build it, serve it, hand it to Claude. The HTML and its assets bundle into one file so the sandbox can load it without wrestling with CSP. Add these two scripts to package.json and set "type": "module", then build, serve, and tunnel:

# package.json scripts:
#   "build": "INPUT=app.html vite build",
#   "serve": "npx tsx server.ts"

npm run build
npm run serve

# in a second terminal, expose the local server:
npx cloudflared tunnel --url http://localhost:3001

Take the https://...trycloudflare.com URL the tunnel prints, open Claude, go to Settings, Connectors, Add custom connector, and paste it with /mcp on the end. Ask Claude for the build status. It calls your tool, the card renders in the chat, and the Refresh button calls back to your server. That is a real MCP App doing real work.

The Anvil

Now the part the launch demos skip: where apps break, and how to stop the bleeding.

The iframe is sandboxed on purpose, and it will bite you. Your UI renders with a deny-by-default content security policy. External scripts, fonts, and stylesheets do not load unless you configure CSP, which is exactly why the bundle-to-one-file step exists. Inline your assets or bundle them with vite-plugin-singlefile and the sandbox stays happy. Reach for a CDN <script> tag and your widget renders blank with a console error you will not see until you go looking.
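One way to catch the blank-widget failure before Claude does is to scan the built file for tags that still point off-origin. The findExternalAssets helper below is hypothetical, not part of ext-apps, and it is a regex heuristic rather than an HTML parser, so treat it as a lint that can miss edge cases. It flags script, link, and img tags whose src or href is an absolute or protocol-relative URL.

```typescript
// Rough pre-flight lint: list tags in built HTML that still load from an external URL.
// Regex-based, so it can miss unusual markup; it is not a substitute for testing in a real host.
function findExternalAssets(html: string): string[] {
  const pattern =
    /<(script|link|img)\b[^>]*\b(?:src|href)\s*=\s*["']((?:https?:)?\/\/[^"']+)["']/gi;
  const hits: string[] = [];
  for (const match of html.matchAll(pattern)) {
    // match[1] is the tag name, match[2] the external URL.
    hits.push(`${match[1].toLowerCase()}: ${match[2]}`);
  }
  return hits;
}
```

Run it on the contents of dist/app.html after npm run build (read with fs.readFile, for example) and fail the build if it returns anything. Relative paths and inline scripts pass; only off-origin references are reported.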

Every button click is a network round-trip. callServerTool() is not a local function call. It goes to your server and back. One Refresh button is fine. A widget that fires a tool call on every keystroke is a widget that feels broken. Debounce input, show a loading state, and design the UI so the user expects a beat between click and result.
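Both habits from that paragraph fit in a few lines of plain TypeScript. The debounce and singleFlight helpers below are generic illustrations, not ext-apps APIs: debounce waits until input has gone quiet before firing, and singleFlight collapses repeated clicks into the round-trip that is already in flight.

```typescript
// Fire fn only after calls have stopped for waitMs; earlier pending calls are dropped.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
}

// Allow one async call at a time: callers that arrive while it is pending
// share the same promise instead of starting another round-trip.
function singleFlight<T>(fn: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | undefined;
  return () => {
    if (!inFlight) {
      inFlight = fn().finally(() => {
        inFlight = undefined; // the next call after settling starts fresh
      });
    }
    return inFlight;
  };
}
```

In the widget you might wrap the Refresh handler as const refresh = singleFlight(() => app.callServerTool({ name: "build-status", arguments: {} })), and put any input-driven tool call behind debounce with a wait of, say, 300 ms.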

A widget is not always the right answer. This is the one that matters. The temptation with a new toy is to render everything. Resist it. If the tool returns a single fact, return text and let the model say it. UI earns its place when the user needs to do something: confirm a destructive action, pick from options, scrub a value, watch a thing update. Rendering a one-line answer as an interactive card is slower for the user and more code for you, for no gain.
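In code the choice is small: the only difference between the two surfaces is whether the tool definition carries the _meta.ui.resourceUri field from Step 1. The toolConfig helper below is hypothetical, written to show that difference; it is not an ext-apps function, and its output shape simply mirrors the config object passed to registerAppTool in server.ts.

```typescript
// Hypothetical helper: a text tool and an app tool differ only in _meta.ui.resourceUri.
interface ToolConfig {
  title: string;
  description: string;
  inputSchema: Record<string, unknown>;
  _meta?: { ui: { resourceUri: string } };
}

function toolConfig(
  title: string,
  description: string,
  ui?: { resourceUri: string },
): ToolConfig {
  const base = { title, description, inputSchema: {} };
  // No UI: the model reads the text result and says it.
  if (!ui) return base;
  // UI: the host fetches and renders the ui:// resource.
  if (!ui.resourceUri.startsWith("ui://")) {
    throw new Error(`app resources use the ui:// scheme, got ${ui.resourceUri}`);
  }
  return { ...base, _meta: { ui } };
}
```

A one-fact tool gets no ui argument and stays a plain tool; only the tool whose answer needs a click gets the ui:// URI.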

Custom connectors need a paid Claude plan. Adding your app as a connector in Claude itself requires Pro, Max, or Team. If you are just iterating on the UI, the ext-apps repo ships a basic-host you can run locally with no account at all. Point it at your server with SERVERS='["http://localhost:3001/mcp"]' npm start and develop against that, then move to Claude when the widget is real.

The rule of thumb: match the surface to the answer. Text for things the user needs to know, UI for things the user needs to do. The best MCP App is the one your user does not notice is an app, because the button was simply the obvious way to answer.

Sparks

A few more things worth your attention this week:

The fastest way to scaffold one of these is to let Claude do it. Anthropic shipped a create-mcp-app skill: install it in Claude Code (/plugin install mcp-apps@modelcontextprotocol-ext-apps), then say "create an MCP App that shows a color picker" and it writes the server, UI, and config for you. The manual build above is here so you understand what the skill generates.
Claude Managed Agents went to public beta. You define an agent in a YAML file (name, model, system prompt, guardrails) and Anthropic runs it on their infrastructure, taking you from prototype to deployed agent in days. Worth a look if you are tired of standing up your own agent hosting. We may build one in a future issue.
The 2026-07-28 MCP spec release candidate we flagged last issue lists MCP Apps as a first-class extension alongside the stateless core and the Tasks extension for long-running work. The UI you build today is on the roadmap, not a side experiment.
MCP Apps render across hosts, not just Claude. ChatGPT and Microsoft 365 Copilot both support the extension, so a well-built app is portable in a way most agent UI is not.

The Smith's Take

For two years the entire surface area between an agent and a person has been a text box. Every tool, every result, every action funneled through prose. MCP Apps are the first real crack in that, and the builders who get value from them are not the ones who render everything. They are the ones who looked at a result and asked one question: does the user need to read this, or do something with it?

That is the whole game. A tool that returns text is not lesser. A tool that returns UI is not better. They are answers to different questions, and the skill is knowing which question you are looking at. Render the button when there is a decision to make. Return the sentence when there is only a fact to deliver.

Build the status widget tonight. Point Claude at it through a tunnel, click the button, and watch the call come back to your server. Once you have seen one app render in the chat, you will know exactly which of your tools have been waiting for a face.

Build agents that actually work.

Michael