MCP Sampling
Enable MCP servers to request LLM completions through your ADK-TS agent
MCP Sampling is a bidirectional communication mechanism in the Model Context Protocol. Normally your agent calls out to an MCP server to use its tools. With sampling, the MCP server can call back into your agent and request an LLM completion mid-execution.
This enables MCP servers that need AI reasoning as part of their tool execution — for example, a server that generates personalised content, a messaging bot that routes incoming messages through your agent, or a multi-turn orchestration workflow.
Quick Start
The fastest way to enable sampling is to pass a runner's ask method directly as the handler:
```typescript
import { AgentBuilder, createSamplingHandler, McpToolset } from "@iqai/adk";

// 1. Create an agent that will handle sampling requests
const { runner } = await AgentBuilder.withModel("gemini-2.5-flash")
  .withInstruction("You are a helpful assistant.")
  .build();

// 2. Wrap its ask method as a sampling handler
const samplingHandler = createSamplingHandler(runner.ask);

// 3. Pass to an MCP toolset
const toolset = new McpToolset({
  name: "My MCP Server",
  description: "Server with sampling capabilities",
  samplingHandler,
  transport: {
    mode: "stdio",
    command: "node",
    args: ["./my-mcp-server/dist/index.js"],
  },
});

const tools = await toolset.getTools();
```

Model selection with runner.ask
When you pass runner.ask as the handler, the runner uses its own configured model (e.g. "gemini-2.5-flash" above). The model preference sent by the MCP server is ignored. To honour it, write a custom handler that reads request.model.
createSamplingHandler is a type helper
createSamplingHandler(handler) is an identity function — it returns the function you pass in unchanged. Its only purpose is TypeScript type inference, ensuring your function matches the SamplingHandler signature. You can omit it and pass the function directly if you prefer.
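Since the helper is just an identity function, its behaviour can be sketched in a few lines. The `SamplingHandler` type below is a simplified stand-in for illustration, not the real type from `@iqai/adk`:

```typescript
// Simplified stand-in types, for illustration only — the real
// SamplingHandler signature lives in @iqai/adk
type LlmRequest = { contents: unknown[]; model: string };
type SamplingHandler = (request: LlmRequest) => Promise<string>;

// The helper returns its argument unchanged; the type constraint is what
// gives you inference and checking at the call site
function createSamplingHandler<T extends SamplingHandler>(handler: T): T {
  return handler;
}

const handler: SamplingHandler = async () => "ok";

// Identity: the wrapped handler is the very same function object
console.log(createSamplingHandler(handler) === handler); // true
```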
How It Works
When an MCP server calls session.requestSampling(), ADK-TS handles the entire protocol conversion for you:
- Receives the raw MCP request — the `McpSamplingHandler` validates it against the MCP schema.
- Converts MCP messages to ADK-TS format — `{ role, content: { type, text } }` becomes `Content[]` with `{ role, parts: [{ text }] }`. Roles map directly: `"user"` stays `"user"`, `"assistant"` becomes `"model"`.
- Creates an `LlmRequest` with the converted contents, model preference, temperature, and `maxTokens`.
- Calls your sampling handler with this `LlmRequest`.
- Converts your response back to MCP format and returns it to the MCP server.
Supported content types
text, image (inline base64), and audio (inline base64) are fully converted between MCP and ADK-TS formats. tool_use and tool_result content types are converted to text placeholders.
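The text-message mapping described above can be illustrated with a small hand-written converter. This is a hypothetical sketch — the real conversion happens internally in `McpSamplingHandler` — using the message shapes the docs describe:

```typescript
// Hypothetical sketch of the role/shape conversion; the real work is done
// internally by McpSamplingHandler in @iqai/adk
type McpMessage = {
  role: "user" | "assistant";
  content: { type: "text"; text: string };
};
type Content = { role: "user" | "model"; parts: Array<{ text: string }> };

function mcpToAdkContent(msg: McpMessage): Content {
  return {
    // "user" stays "user", "assistant" becomes "model"
    role: msg.role === "assistant" ? "model" : "user",
    parts: [{ text: msg.content.text }],
  };
}

const converted = mcpToAdkContent({
  role: "assistant",
  content: { type: "text", text: "Hello Alice!" },
});
console.log(converted); // { role: "model", parts: [ { text: "Hello Alice!" } ] }
```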
Understanding the LlmRequest
Your sampling handler receives an LlmRequest with these fields:
| Field | Type | Description |
|---|---|---|
| `model` | `string` | Model the MCP server prefers, or `"gemini-2.0-flash"` by default |
| `contents` | `Content[]` | Conversation messages converted from MCP format. Each entry has `role` (`"user"` or `"model"`) and `parts` (an array of `{ text }`, `{ inlineData }`, etc.) |
| `config.temperature` | `number` | From the MCP request's `temperature` field |
| `config.maxOutputTokens` | `number` | From the MCP request's `maxTokens` field |
System Prompt Placement
The system prompt is contents[0], not config.systemInstruction
If the MCP server sends a systemPrompt, McpSamplingHandler prepends it to the conversation as contents[0] (a "user" role message). It is not placed in config.systemInstruction. This means:
- `contents[0]` — system prompt (if present)
- `contents[1]` (or `contents[0]` if no system prompt) — the first user message
Extracting Text
Use the static LlmRequest.extractTextFromContent() helper to pull text out of a Content object:
```typescript
import { AgentBuilder, createSamplingHandler, LlmRequest } from "@iqai/adk";

const { runner } = await AgentBuilder.withModel("gemini-2.5-flash")
  .withInstruction("You are a helpful assistant.")
  .build();

const samplingHandler = createSamplingHandler(async request => {
  // The last entry is the actual user message
  const lastContent = request.contents[request.contents.length - 1];
  const messageText = LlmRequest.extractTextFromContent(lastContent);

  // If the MCP server provided a systemPrompt, it is prepended as contents[0]
  const systemText =
    request.contents.length > 1
      ? LlmRequest.extractTextFromContent(request.contents[0])
      : "";

  console.log("System prompt:", systemText);
  console.log("Message:", messageText);

  return runner.ask(messageText);
});
```

Custom Sampling Handlers
Instead of forwarding straight to runner.ask, you can write handlers that inspect, transform, or route requests before calling any LLM.
For a complete runnable example, see the MCP integrations example.
Routing to Different Agents
A common pattern is to inspect the system prompt and route the request to a specialised agent:
```typescript
import { AgentBuilder, createSamplingHandler, LlmRequest } from "@iqai/adk";

// Two specialised agents
const { runner: creativeRunner } = await AgentBuilder.withModel(
  "gemini-2.5-flash",
)
  .withInstruction("You are a warm, creative writer.")
  .build();

const { runner: factRunner } = await AgentBuilder.withModel("gemini-2.5-flash")
  .withInstruction("You are a concise, factual encyclopedia.")
  .build();

const samplingHandler = createSamplingHandler(async request => {
  const lastContent = request.contents[request.contents.length - 1];
  const messageText = LlmRequest.extractTextFromContent(lastContent);
  const systemText =
    request.contents.length > 1
      ? LlmRequest.extractTextFromContent(request.contents[0])
      : "";

  // Route based on keywords in the system prompt
  if (systemText.toLowerCase().includes("creative")) {
    return creativeRunner.ask(messageText);
  }
  if (systemText.toLowerCase().includes("encyclopedia")) {
    return factRunner.ask(messageText);
  }
  return creativeRunner.ask(messageText); // default
});
```

Adding Context Before Forwarding
Enrich the prompt with extra information before forwarding to the LLM:
```typescript
const samplingHandler = createSamplingHandler(async request => {
  const lastContent = request.contents[request.contents.length - 1];
  const messageText = LlmRequest.extractTextFromContent(lastContent);

  const enrichedPrompt = `
Context: The current user is Alice, timezone UTC+5.
Active reminders: ${JSON.stringify(await getActiveReminders())}

MCP server request: ${messageText}
`.trim();

  return runner.ask(enrichedPrompt);
});
```

Returning String vs LlmResponse
Your handler can return either a plain string or a full LlmResponse. Returning a string is simpler and covers most cases — the framework wraps it in the correct MCP response format automatically.
```typescript
import { createSamplingHandler, LlmResponse } from "@iqai/adk";

// Simple — return a string (recommended for most cases)
const simpleHandler = createSamplingHandler(async _request => {
  return "Hello from the handler!";
});

// Advanced — return a full LlmResponse for explicit control
const advancedHandler = createSamplingHandler(async _request => {
  return new LlmResponse({
    content: {
      role: "model",
      parts: [{ text: "Hello from the handler!" }],
    },
  });
});
```

Honouring the MCP Server's Model Preference
When you need to use whichever model the MCP server requested:
```typescript
import { AgentBuilder, createSamplingHandler, LlmRequest } from "@iqai/adk";

const samplingHandler = createSamplingHandler(async request => {
  const lastContent = request.contents[request.contents.length - 1];
  const messageText = LlmRequest.extractTextFromContent(lastContent);

  // request.model contains the server's preference (or the default "gemini-2.0-flash")
  const { runner } = await AgentBuilder.withModel(request.model).build();
  return runner.ask(messageText);
});
```

Writing an MCP Server That Uses Sampling
On the server side, use session.requestSampling() to request an LLM completion from the connected ADK-TS client.
FastMCP session access
In FastMCP, the context.session passed to a tool's execute function is the auth object, not the FastMCPSession. To call requestSampling(), use server.sessions[0] instead (for stdio transport there is always exactly one session).
```typescript
import { FastMCP } from "fastmcp";
import { z } from "zod";

const server = new FastMCP({
  name: "my-server",
  version: "1.0.0",
});

server.addTool({
  name: "summarize_data",
  description: "Summarizes data using an LLM via sampling",
  parameters: z.object({
    data: z.string().describe("The data to summarize"),
  }),
  execute: async ({ data }) => {
    // Access the FastMCPSession (not context.session)
    const session = server.sessions[0];
    if (!session?.requestSampling) {
      return "Sampling not available.";
    }

    const response = await session.requestSampling({
      messages: [
        {
          role: "user",
          content: {
            type: "text",
            text: `Please summarize this data concisely:\n\n${data}`,
          },
        },
      ],
      systemPrompt: "You are a concise summarizer. Respond in 2-3 sentences.",
      maxTokens: 200,
    });

    return response?.content?.type === "text"
      ? response.content.text
      : "No response received.";
  },
});

await server.start({ transportType: "stdio" });
```

Sampling Request Options
| Parameter | Type | Required | Description |
|---|---|---|---|
| `messages` | `Array` | Yes | Conversation messages with `role` and `content` |
| `maxTokens` | `number` | Yes | Maximum tokens in the response |
| `systemPrompt` | `string` | No | Prepended to the conversation on the ADK-TS client side as `contents[0]` |
| `temperature` | `number` | No | Controls randomness (0–1) |
| `modelPreferences` | `object` | No | Hint which model to use via `hints[].name` |
| `includeContext` | `string` | No | Context-inclusion directive (`"thisServer"`, `"allServers"`) |
Multi-Turn Conversations
You can send multi-turn conversations in a single sampling request:
```typescript
const response = await session.requestSampling({
  messages: [
    {
      role: "user",
      content: { type: "text", text: "My name is Alice." },
    },
    {
      role: "assistant",
      content: { type: "text", text: "Hello Alice! How can I help?" },
    },
    {
      role: "user",
      content: { type: "text", text: "What's my name?" },
    },
  ],
  maxTokens: 50,
});
```

Requesting a Specific Model
```typescript
const response = await session.requestSampling({
  messages: [
    {
      role: "user",
      content: { type: "text", text: "Explain quantum computing." },
    },
  ],
  maxTokens: 500,
  modelPreferences: {
    hints: [{ name: "gemini-2.5-flash" }],
  },
});
```

Model preferences are hints, not guarantees
modelPreferences.hints is a suggestion. The sampling handler on the ADK-TS side decides which model is actually used. If no hint is provided, ADK-TS defaults to "gemini-2.0-flash".
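A handler that honours hints might resolve the model name with logic like the following. This is a hypothetical sketch (`resolveModel` and the supported-model list are illustrative, not part of ADK-TS); only the `"gemini-2.0-flash"` fallback mirrors the framework's documented default:

```typescript
// Hypothetical hint-resolution logic: pick the first hint we support,
// otherwise fall back to the framework's default model
const SUPPORTED = new Set(["gemini-2.5-flash", "gemini-2.0-flash"]);
const DEFAULT_MODEL = "gemini-2.0-flash";

function resolveModel(hints: Array<{ name?: string }> = []): string {
  for (const hint of hints) {
    if (hint.name && SUPPORTED.has(hint.name)) return hint.name;
  }
  return DEFAULT_MODEL;
}

console.log(resolveModel([{ name: "gemini-2.5-flash" }])); // "gemini-2.5-flash"
console.log(resolveModel([{ name: "claude-3" }])); // "gemini-2.0-flash"
console.log(resolveModel()); // "gemini-2.0-flash"
```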
Managing the Sampling Handler at Runtime
You can swap or remove the sampling handler after the toolset is created, without re-initialising the connection:
```typescript
import { AgentBuilder, LlmRequest, McpToolset } from "@iqai/adk";

const toolset = new McpToolset({
  name: "Dynamic Server",
  description: "Server with a swappable sampling handler",
  transport: { mode: "stdio", command: "node", args: ["./server.js"] },
});

const tools = await toolset.getTools();

const { runner: basicRunner } =
  await AgentBuilder.withModel("gemini-2.0-flash").build();
const { runner: advancedRunner } =
  await AgentBuilder.withModel("gemini-2.5-flash").build();

// Set initial handler
toolset.setSamplingHandler(async request => {
  const text = LlmRequest.extractTextFromContent(
    request.contents[request.contents.length - 1],
  );
  return basicRunner.ask(text);
});

// Upgrade to a better model later
toolset.setSamplingHandler(async request => {
  const text = LlmRequest.extractTextFromContent(
    request.contents[request.contents.length - 1],
  );
  return advancedRunner.ask(text);
});

// Disable sampling entirely
toolset.removeSamplingHandler();
```

Real-World Architecture Patterns
Coordinator Agent with Sub-Agents
For complex applications such as a Telegram bot, a common pattern is to use sampling as the entry point into a multi-agent system with persistent sessions:
The Telegram Personal Assistant example demonstrates this pattern in a production-ready application. It uses:
- Sampling as the entry point — `McpTelegram` with `createSamplingHandler(runner.ask)` routes incoming messages into the agent system
- Hierarchical multi-agent architecture — a coordinator delegates to specialised sub-agents (reminders, shopping lists) based on user intent
- Database-backed sessions — PostgreSQL persistence via `createDatabaseSessionService`
- Background services — poll state and send scheduled reminders back through the Telegram runner
- Shared state across agents — all agents read/write to the same session via `context.state`