
MCP Sampling

Enable MCP servers to request LLM completions through your ADK-TS agent

MCP Sampling is a bidirectional communication mechanism in the Model Context Protocol. Normally your agent calls out to an MCP server to use its tools. With sampling, the MCP server can call back into your agent and request an LLM completion mid-execution.

This enables MCP servers that need AI reasoning as part of their tool execution — for example, a server that generates personalised content, a messaging bot that routes incoming messages through your agent, or a multi-turn orchestration workflow.

Quick Start

The fastest way to enable sampling is to pass a runner's ask method directly as the handler:

import { AgentBuilder, createSamplingHandler, McpToolset } from "@iqai/adk";

// 1. Create an agent that will handle sampling requests
const { runner } = await AgentBuilder.withModel("gemini-2.5-flash")
  .withInstruction("You are a helpful assistant.")
  .build();

// 2. Wrap its ask method as a sampling handler
const samplingHandler = createSamplingHandler(runner.ask);

// 3. Pass to an MCP toolset
const toolset = new McpToolset({
  name: "My MCP Server",
  description: "Server with sampling capabilities",
  samplingHandler,
  transport: {
    mode: "stdio",
    command: "node",
    args: ["./my-mcp-server/dist/index.js"],
  },
});

const tools = await toolset.getTools();

Model selection with runner.ask

When you pass runner.ask as the handler, the runner uses its own configured model (e.g. "gemini-2.5-flash" above). The model preference sent by the MCP server is ignored. To honour it, write a custom handler that reads request.model.

createSamplingHandler is a type helper

createSamplingHandler(handler) is an identity function — it returns the function you pass in unchanged. Its only purpose is TypeScript type inference, ensuring your function matches the SamplingHandler signature. You can omit it and pass the function directly if you prefer.
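
Conceptually, the helper amounts to the sketch below. The type names here are simplified stand-ins for illustration; the real SamplingHandler signature lives in @iqai/adk.

```typescript
// Simplified stand-in types — the actual SamplingHandler shape comes from @iqai/adk.
type SamplingRequest = { model: string; contents: unknown[] };
type SamplingHandler = (request: SamplingRequest) => Promise<string>;

// Identity at runtime; only the parameter's type annotation does any work.
function createSamplingHandler(handler: SamplingHandler): SamplingHandler {
  return handler;
}

const handler = createSamplingHandler(async request => `echo: ${request.model}`);
```

Because the helper returns its argument unchanged, the value it returns is reference-equal to the function you passed in.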

How It Works

When an MCP server calls session.requestSampling(), ADK-TS handles the entire protocol conversion for you:

  1. Receives the raw MCP request — the McpSamplingHandler validates it against the MCP schema.
  2. Converts MCP messages to ADK-TS format — { role, content: { type, text } } becomes Content[] with { role, parts: [{ text }] }. Roles are mapped: "user" stays "user", "assistant" becomes "model".
  3. Creates an LlmRequest with the converted contents, model preference, temperature, and maxTokens.
  4. Calls your sampling handler with this LlmRequest.
  5. Converts your response back to MCP format and returns it to the MCP server.

Supported content types

text, image (inline base64), and audio (inline base64) are fully converted between MCP and ADK-TS formats. tool_use and tool_result content types are converted to text placeholders.
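
The conversion in step 2 can be sketched as follows. The shapes mirror the descriptions in this section and the LlmRequest table in the next one, not the library's internal code:

```typescript
// Minimal sketch of the MCP -> ADK-TS message conversion described above.
// Shapes follow the documentation; they are not the library's internals.
type McpContent =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };
type McpMessage = { role: "user" | "assistant"; content: McpContent };

type Part = { text: string } | { inlineData: { mimeType: string; data: string } };
type Content = { role: "user" | "model"; parts: Part[] };

function mcpToAdk(message: McpMessage): Content {
  // Role mapping: "user" stays "user", "assistant" becomes "model".
  const role = message.role === "assistant" ? "model" : "user";
  const part: Part =
    message.content.type === "text"
      ? { text: message.content.text }
      : { inlineData: { mimeType: message.content.mimeType, data: message.content.data } };
  return { role, parts: [part] };
}
```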

Understanding the LlmRequest

Your sampling handler receives an LlmRequest with these fields:

Field                    Type        Description
model                    string      Model the MCP server prefers, or "gemini-2.0-flash" by default
contents                 Content[]   Conversation messages converted from MCP format. Each entry has role ("user" or "model") and parts (array of { text }, { inlineData }, etc.)
config.temperature       number      From the MCP request's temperature field
config.maxOutputTokens   number      From the MCP request's maxTokens field

System Prompt Placement

The system prompt is contents[0], not config.systemInstruction

If the MCP server sends a systemPrompt, McpSamplingHandler prepends it to the conversation as contents[0] (a "user" role message). It is not placed in config.systemInstruction. This means:

  • contents[0] — system prompt (if present)
  • contents[1] (or contents[0] if no system prompt) — the first user message

Extracting Text

Use the static LlmRequest.extractTextFromContent() helper to pull text out of a Content object:

import { AgentBuilder, createSamplingHandler, LlmRequest } from "@iqai/adk";

const { runner } = await AgentBuilder.withModel("gemini-2.5-flash")
  .withInstruction("You are a helpful assistant.")
  .build();

const samplingHandler = createSamplingHandler(async request => {
  // The last entry is the actual user message
  const lastContent = request.contents[request.contents.length - 1];
  const messageText = LlmRequest.extractTextFromContent(lastContent);

  // If the MCP server provided a systemPrompt, it is prepended as contents[0]
  const systemText =
    request.contents.length > 1
      ? LlmRequest.extractTextFromContent(request.contents[0])
      : "";

  console.log("System prompt:", systemText);
  console.log("Message:", messageText);

  return runner.ask(messageText);
});

Custom Sampling Handlers

Instead of forwarding straight to runner.ask, you can write handlers that inspect, transform, or route requests before calling any LLM.

For a complete runnable example, see the MCP integrations example.

Routing to Different Agents

A common pattern is to inspect the system prompt and route the request to a specialised agent:

import { AgentBuilder, createSamplingHandler, LlmRequest } from "@iqai/adk";

// Two specialised agents
const { runner: creativeRunner } = await AgentBuilder.withModel(
  "gemini-2.5-flash",
)
  .withInstruction("You are a warm, creative writer.")
  .build();

const { runner: factRunner } = await AgentBuilder.withModel("gemini-2.5-flash")
  .withInstruction("You are a concise, factual encyclopedia.")
  .build();

const samplingHandler = createSamplingHandler(async request => {
  const lastContent = request.contents[request.contents.length - 1];
  const messageText = LlmRequest.extractTextFromContent(lastContent);

  const systemText =
    request.contents.length > 1
      ? LlmRequest.extractTextFromContent(request.contents[0])
      : "";

  // Route based on keywords in the system prompt
  if (systemText.toLowerCase().includes("creative")) {
    return creativeRunner.ask(messageText);
  }
  if (systemText.toLowerCase().includes("encyclopedia")) {
    return factRunner.ask(messageText);
  }

  return creativeRunner.ask(messageText); // default
});

Adding Context Before Forwarding

Enrich the prompt with extra information before forwarding to the LLM:

const samplingHandler = createSamplingHandler(async request => {
  const lastContent = request.contents[request.contents.length - 1];
  const messageText = LlmRequest.extractTextFromContent(lastContent);

  const enrichedPrompt = `
Context: The current user is Alice, timezone UTC+5.
Active reminders: ${JSON.stringify(await getActiveReminders())}

MCP server request: ${messageText}
  `.trim();

  return runner.ask(enrichedPrompt);
});

Returning String vs LlmResponse

Your handler can return either a plain string or a full LlmResponse. Returning a string is simpler and covers most cases — the framework wraps it in the correct MCP response format automatically.

import { createSamplingHandler, LlmResponse } from "@iqai/adk";

// Simple — return a string (recommended for most cases)
const simpleHandler = createSamplingHandler(async _request => {
  return "Hello from the handler!";
});

// Advanced — return a full LlmResponse for explicit control
const advancedHandler = createSamplingHandler(async _request => {
  return new LlmResponse({
    content: {
      role: "model",
      parts: [{ text: "Hello from the handler!" }],
    },
  });
});

Honouring the MCP Server's Model Preference

When you need to use whichever model the MCP server requested:

import { AgentBuilder, createSamplingHandler, LlmRequest } from "@iqai/adk";

const samplingHandler = createSamplingHandler(async request => {
  const lastContent = request.contents[request.contents.length - 1];
  const messageText = LlmRequest.extractTextFromContent(lastContent);

  // request.model contains the server's preference (or the default "gemini-2.0-flash")
  const { runner } = await AgentBuilder.withModel(request.model).build();
  return runner.ask(messageText);
});

Writing an MCP Server That Uses Sampling

On the server side, use session.requestSampling() to request an LLM completion from the connected ADK-TS client.

FastMCP session access

In FastMCP, the context.session passed to a tool's execute function is the auth object, not the FastMCPSession. To call requestSampling(), use server.sessions[0] instead (for stdio transport there is always exactly one session).

import { FastMCP } from "fastmcp";
import { z } from "zod";

const server = new FastMCP({
  name: "my-server",
  version: "1.0.0",
});

server.addTool({
  name: "summarize_data",
  description: "Summarizes data using an LLM via sampling",
  parameters: z.object({
    data: z.string().describe("The data to summarize"),
  }),
  execute: async ({ data }) => {
    // Access the FastMCPSession (not context.session)
    const session = server.sessions[0];

    if (!session?.requestSampling) {
      return "Sampling not available.";
    }

    const response = await session.requestSampling({
      messages: [
        {
          role: "user",
          content: {
            type: "text",
            text: `Please summarize this data concisely:\n\n${data}`,
          },
        },
      ],
      systemPrompt: "You are a concise summarizer. Respond in 2-3 sentences.",
      maxTokens: 200,
    });

    return response?.content?.type === "text"
      ? response.content.text
      : "No response received.";
  },
});

await server.start({ transportType: "stdio" });

Sampling Request Options

Parameter          Type     Required   Description
messages           Array    Yes        Conversation messages with role and content
maxTokens          number   Yes        Maximum tokens in the response
systemPrompt       string   No         Prepended to the conversation on the ADK-TS client side as contents[0]
temperature        number   No         Controls randomness (0–1)
modelPreferences   object   No         Hint which model to use via hints[].name
includeContext     string   No         Context inclusion directive ("thisServer", "allServers")
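
Combining every option in the table, a full request payload might look like this. The values are purely illustrative:

```typescript
// Every option from the table in one payload — all values are illustrative.
const samplingRequest = {
  messages: [
    { role: "user", content: { type: "text", text: "Summarize this report." } },
  ],
  maxTokens: 300,                                  // required
  systemPrompt: "You are a concise summarizer.",   // becomes contents[0] on the client
  temperature: 0.3,                                // 0–1
  modelPreferences: { hints: [{ name: "gemini-2.5-flash" }] },
  includeContext: "thisServer",
};
```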

Multi-Turn Conversations

You can send multi-turn conversations in a single sampling request:

const response = await session.requestSampling({
  messages: [
    {
      role: "user",
      content: { type: "text", text: "My name is Alice." },
    },
    {
      role: "assistant",
      content: { type: "text", text: "Hello Alice! How can I help?" },
    },
    {
      role: "user",
      content: { type: "text", text: "What's my name?" },
    },
  ],
  maxTokens: 50,
});

Requesting a Specific Model

const response = await session.requestSampling({
  messages: [
    {
      role: "user",
      content: { type: "text", text: "Explain quantum computing." },
    },
  ],
  maxTokens: 500,
  modelPreferences: {
    hints: [{ name: "gemini-2.5-flash" }],
  },
});

Model preferences are hints, not guarantees

modelPreferences.hints is a suggestion. The sampling handler on the ADK-TS side decides which model is actually used. If no hint is provided, ADK-TS defaults to "gemini-2.0-flash".

Managing the Sampling Handler at Runtime

You can swap or remove the sampling handler after the toolset is created, without re-initialising the connection:

import {
  AgentBuilder,
  createSamplingHandler,
  LlmRequest,
  McpToolset,
} from "@iqai/adk";

const toolset = new McpToolset({
  name: "Dynamic Server",
  description: "Server with a swappable sampling handler",
  transport: { mode: "stdio", command: "node", args: ["./server.js"] },
});

const tools = await toolset.getTools();

const { runner: basicRunner } =
  await AgentBuilder.withModel("gemini-2.0-flash").build();
const { runner: advancedRunner } =
  await AgentBuilder.withModel("gemini-2.5-flash").build();

// Set initial handler
toolset.setSamplingHandler(async request => {
  const text = LlmRequest.extractTextFromContent(
    request.contents[request.contents.length - 1],
  );
  return basicRunner.ask(text);
});

// Upgrade to a better model later
toolset.setSamplingHandler(async request => {
  const text = LlmRequest.extractTextFromContent(
    request.contents[request.contents.length - 1],
  );
  return advancedRunner.ask(text);
});

// Disable sampling entirely
toolset.removeSamplingHandler();

Real-World Architecture Patterns

Coordinator Agent with Sub-Agents

For complex applications such as a Telegram bot, a common pattern is to use sampling as the entry point into a multi-agent system with persistent sessions.

The Telegram Personal Assistant example demonstrates this pattern in a production-ready application. It uses:

  • Sampling as the entry point — McpTelegram with createSamplingHandler(runner.ask) routes incoming messages into the agent system
  • Hierarchical multi-agent architecture — a coordinator delegates to specialised sub-agents (reminders, shopping lists) based on user intent
  • Database-backed sessions — PostgreSQL persistence via createDatabaseSessionService
  • Background services — poll state and send scheduled reminders back through the Telegram runner
  • Shared state across agents — all agents read/write to the same session via context.state