
Production

Privacy controls, sampling, performance tuning, and troubleshooting for ADK-TS observability in production

Running telemetry in production requires three adjustments from a development setup: disable content capture to protect user data, reduce the sampling ratio to limit overhead, and increase the metric export interval to reduce network traffic.

Production Configuration

This is a complete production-ready initialization. Read the sections below for the reasoning behind each setting.

import { telemetryService } from "@iqai/adk";

await telemetryService.initialize({
  appName: process.env.OTEL_SERVICE_NAME || "my-agent-app",
  appVersion: process.env.APP_VERSION || "1.0.0",
  otlpEndpoint:
    process.env.OTEL_EXPORTER_OTLP_ENDPOINT ||
    "https://your-backend.com/v1/traces",
  otlpHeaders: {
    "api-key": process.env.OTEL_API_KEY || "",
  },
  environment: process.env.NODE_ENV || "production",
  enableTracing: true,
  enableMetrics: true,
  enableAutoInstrumentation: false,
  captureMessageContent: false,
  samplingRatio: 0.1,
  metricExportIntervalMs: 300000,
  resourceAttributes: {
    "deployment.name": process.env.DEPLOYMENT_NAME || "production",
    team: process.env.TEAM_NAME || "platform",
  },
});

Corresponding environment variables:

OTEL_SERVICE_NAME=my-agent-app
APP_VERSION=1.0.0
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-backend.com/v1/traces
OTEL_API_KEY=your-api-key
ADK_CAPTURE_MESSAGE_CONTENT=false
NODE_ENV=production

Privacy: Disable Content Capture

When captureMessageContent is enabled (the default), ADK-TS writes LLM prompts, completions, and tool arguments into span attributes. In production this almost certainly captures personally identifiable information.

Environment variable:

export ADK_CAPTURE_MESSAGE_CONTENT=false

Code:

await telemetryService.initialize({
  appName: "my-agent-app",
  otlpEndpoint: "https://your-backend.com/v1/traces",
  captureMessageContent: false,
});

With content capture disabled, span attributes like gen_ai.tool.call.arguments, gen_ai.input.messages, and gen_ai.output.messages are not set. Metadata — model name, token counts, duration, status — is always captured regardless of this setting.

Default is capture-on

captureMessageContent defaults to true. You must explicitly disable it in production or all LLM prompts will appear in your trace backend.
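
As a safeguard, a test can assert that exported span attributes carry none of the content keys named in this section. The sketch below is illustrative only: `findContentAttributes` is a hypothetical helper, not part of ADK-TS, and it assumes you can read a span's attributes as a plain object (for example, from an in-memory exporter in tests). The `gen_ai.request.model` key is used purely as sample metadata.

```typescript
// Attribute keys that this page says are omitted when content capture is off.
const CONTENT_ATTRIBUTE_KEYS = [
  "gen_ai.tool.call.arguments",
  "gen_ai.input.messages",
  "gen_ai.output.messages",
] as const;

// Hypothetical helper (not an ADK-TS API): returns the content keys present
// on a span's attributes. An empty result means no message content leaked.
function findContentAttributes(
  attributes: Record<string, unknown>,
): string[] {
  return CONTENT_ATTRIBUTE_KEYS.filter((key) => key in attributes);
}

// Sample attribute sets standing in for real exported spans.
const safeSpan = { "gen_ai.request.model": "gemini-2.5-flash" };
const leakySpan = { ...safeSpan, "gen_ai.input.messages": "[...]" };

console.log(findContentAttributes(safeSpan));
console.log(findContentAttributes(leakySpan));
```

Running a check like this against staging traffic before a release is one way to catch a configuration that silently fell back to the capture-on default.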

Sampling Ratio

At 100% sampling (samplingRatio: 1.0), every agent run produces a trace. For high-traffic services this creates significant storage and ingestion costs. Reduce sampling in production — you still get representative data for debugging while cutting overhead substantially.

await telemetryService.initialize({
  appName: "my-agent-app",
  otlpEndpoint: "https://your-backend.com/v1/traces",
  samplingRatio: 0.1, // capture 10% of traces
});

Recommended ratios by environment:

| Environment | Ratio | Notes |
| --- | --- | --- |
| Development | 1.0 | Capture everything |
| Staging | 0.5 | Half-rate for representative coverage |
| Production | 0.1–0.2 | Balance visibility with cost |
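
To make the cost trade-off concrete, the following sketch picks a ratio by environment and estimates how many traces per day a given ratio yields. Both helpers are hypothetical and not part of ADK-TS; production uses 0.1 to match the `samplingRatio` in the example configuration, and the estimate assumes a steady request rate with one trace per agent run.

```typescript
const PRODUCTION_RATIO = 0.1;

// Ratios recommended in this section, keyed by environment name.
const SAMPLING_RATIO_BY_ENV: Record<string, number> = {
  development: 1.0,
  staging: 0.5,
  production: PRODUCTION_RATIO,
};

// Hypothetical helper: unknown environments fall back to the production
// ratio so that a misnamed environment never samples at 100%.
function samplingRatioFor(environment: string): number {
  return SAMPLING_RATIO_BY_ENV[environment] ?? PRODUCTION_RATIO;
}

// Rough estimate of traces exported per day at a steady rate of agent runs.
function estimateTracesPerDay(runsPerSecond: number, ratio: number): number {
  const secondsPerDay = 24 * 60 * 60;
  return Math.round(runsPerSecond * secondsPerDay * ratio);
}

console.log(estimateTracesPerDay(10, samplingRatioFor("development")));
console.log(estimateTracesPerDay(10, samplingRatioFor("production")));
```

The result could be passed as `samplingRatio: samplingRatioFor(process.env.NODE_ENV ?? "production")` when calling `initialize()`.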

Metric Export Interval

Metrics are buffered and flushed on a timer. The default is 60 seconds; in production, 5–10 minutes is usually sufficient and reduces network overhead:

await telemetryService.initialize({
  appName: "my-agent-app",
  otlpEndpoint: "https://your-backend.com/v1/traces",
  metricExportIntervalMs: 300000, // 5 minutes
});
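
A quick calculation shows what the longer interval saves. The helpers below are hypothetical and only do the arithmetic; they assume one export request per interval per process.

```typescript
// Convert minutes to milliseconds so intervals read without magic numbers.
const minutes = (n: number): number => n * 60 * 1000;

// Number of metric export requests one process sends per day.
function exportsPerDay(intervalMs: number): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.floor(msPerDay / intervalMs);
}

console.log(exportsPerDay(60000));       // default 60-second interval
console.log(exportsPerDay(minutes(5)));  // 5-minute interval
```

Under that assumption, moving from the default to 5 minutes cuts exports from 1,440 to 288 per day per process. `minutes(5)` can also be passed directly as `metricExportIntervalMs`.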

Initialize Before Agents

Spans emitted before initialize() completes are not captured. Initialize telemetry before you construct any agent or runner:

// ✅ Correct order
await telemetryService.initialize({
  /* config */
});
const { runner } = await AgentBuilder.create("my-agent")
  .withModel("gemini-2.5-flash")
  .build();

// ❌ Traces will be missing
const { runner } = await AgentBuilder.create("my-agent")
  .withModel("gemini-2.5-flash")
  .build();
await telemetryService.initialize({
  /* config */
});
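
When an application has several entry points, one way to enforce this ordering is to wrap initialization so that it runs at most once and every caller awaits the same promise. This is a minimal sketch: `once` and `ensureTelemetry` are hypothetical names, not ADK-TS APIs, and a counter stands in for the real `telemetryService.initialize()` call so the snippet runs on its own.

```typescript
// Hypothetical helper (not an ADK-TS API): wraps an async initializer so
// that it runs at most once and every caller receives the same promise.
function once<T>(init: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = init();
    }
    return pending;
  };
}

// Stand-in for `() => telemetryService.initialize({ /* config */ })`.
let initCalls = 0;
const ensureTelemetry = once(async () => {
  initCalls += 1;
});

// Each entry point awaits ensureTelemetry() before building agents.
const first = ensureTelemetry();
const second = ensureTelemetry();
console.log(first === second, initCalls);
```

Note that this sketch also caches a rejected promise, so a failed initialization is not retried; whether that is desirable depends on your startup policy.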

Graceful Shutdown

shutdown() flushes all buffered spans and metrics before the process exits. Without it, any data buffered since the last export is silently dropped.

process.on("SIGTERM", async () => {
  await telemetryService.shutdown(5000);
  process.exit(0);
});

process.on("SIGINT", async () => {
  await telemetryService.shutdown(5000);
  process.exit(0);
});

The timeout (5000 ms) must cover the final export round-trip to your backend. Increase it if you are on a high-latency link or allow a longer graceful drain period. Use flush() instead of shutdown() if you want to flush without tearing down the provider, for example between test runs.
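
The two signal handlers above can share one function. The sketch below also guards against a second signal arriving mid-flush and exits non-zero if the flush fails. `createShutdownHandler` is a hypothetical helper, not an ADK-TS API; the shutdown and exit functions are injected so the snippet runs with fakes in place of `telemetryService.shutdown` and `process.exit`.

```typescript
type Shutdown = (timeoutMs: number) => Promise<void>;
type Exit = (code: number) => void;

// Hypothetical helper: builds one handler that can be registered for
// several signals, runs shutdown only once, and reports failure via exit(1).
function createShutdownHandler(
  shutdown: Shutdown,
  exit: Exit,
  timeoutMs = 5000,
): () => Promise<void> {
  let started = false;
  return async () => {
    if (started) return; // a second signal arrived while flushing
    started = true;
    try {
      await shutdown(timeoutMs);
      exit(0);
    } catch {
      exit(1);
    }
  };
}

// Fakes standing in for telemetryService.shutdown and process.exit.
let shutdownCalls = 0;
const exitCodes: number[] = [];
const handler = createShutdownHandler(
  async () => {
    shutdownCalls += 1;
  },
  (code) => exitCodes.push(code),
);

// In an application (assumed wiring):
//   const onSignal = createShutdownHandler(
//     (ms) => telemetryService.shutdown(ms),
//     (code) => process.exit(code),
//   );
//   process.on("SIGTERM", onSignal);
//   process.on("SIGINT", onSignal);
```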

Secrets in Environment Variables

Never hardcode API keys:

// ✅ Use environment variables
otlpHeaders: {
  "api-key": process.env.OTEL_API_KEY,
},

// ❌ Never hardcode credentials
otlpHeaders: {
  "api-key": "sk-abc123...",
},
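
Reading the key from the environment still leaves the case where the variable is unset, in which case an empty or undefined header would be sent. One option is to fail fast at startup. `requireEnv` below is a hypothetical helper, not part of ADK-TS; it takes the environment as a parameter so the snippet runs on its own.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the variable's value, or throws at startup
// instead of silently sending an empty credential.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In an application, pass process.env. A literal object with a placeholder
// value is used here only so the example is self-contained.
const env: Env = { OTEL_API_KEY: "placeholder-for-demo" };
const otlpHeaders = { "api-key": requireEnv(env, "OTEL_API_KEY") };
console.log(Object.keys(otlpHeaders));
```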

Structured Custom Attributes

Add business context to every span via resourceAttributes. These appear on all spans and metrics from this process, making it straightforward to filter by deployment or team in your backend:

resourceAttributes: {
  "deployment.name": "us-east-1-prod",
  "team": "platform",
  "customer.tier": "enterprise",
},

Troubleshooting

No Traces Appearing

Verify the endpoint is reachable:

curl -X POST https://your-backend.com/v1/traces \
  -H "Content-Type: application/json" \
  -d '{}'

Check that the OTLP receiver is running:

docker ps | grep jaeger   # or tempo, otel-collector, etc.
docker logs jaeger

Enable OpenTelemetry diagnostic logging to see what the SDK is doing:

import { diag, DiagConsoleLogger, DiagLogLevel } from "@opentelemetry/api";

diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG);

await telemetryService.initialize({
  /* config */
});

High Overhead

Reduce sampling and content capture:

samplingRatio: 0.05,           // 5% sampling
captureMessageContent: false,  // no content in spans
enableAutoInstrumentation: false,
metricExportIntervalMs: 600000, // 10-minute metric interval

Metrics Not Appearing

ADK-TS derives the metrics endpoint from the traces endpoint by replacing /v1/traces with /v1/metrics. If your traces endpoint does not contain /v1/traces, metrics will not export correctly.
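
The rule can be checked before deployment. The functions below illustrate the substitution as described here; they are not ADK-TS's actual source, and both names are hypothetical.

```typescript
// Illustration of the rule described above: swap the /v1/traces path
// segment for /v1/metrics. A URL without that segment is returned unchanged.
function deriveMetricsEndpoint(tracesEndpoint: string): string {
  return tracesEndpoint.replace("/v1/traces", "/v1/metrics");
}

// Hypothetical pre-flight check: true when the substitution can apply.
function hasDerivableMetricsEndpoint(tracesEndpoint: string): boolean {
  return tracesEndpoint.includes("/v1/traces");
}

console.log(deriveMetricsEndpoint("https://your-backend.com/v1/traces"));
console.log(hasDerivableMetricsEndpoint("https://your-backend.com/otlp"));
```

A startup check that logs a warning when `hasDerivableMetricsEndpoint` returns false would surface this misconfiguration earlier than a missing dashboard.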

Also check that your backend supports OTLP metrics ingestion — Jaeger does not. Use Grafana/Tempo, Datadog, or a Prometheus-backed collector for metrics.

Metrics are exported on a timer, so wait for at least one full metricExportIntervalMs before checking the backend.

Content Not Appearing in Spans

Check that ADK_CAPTURE_MESSAGE_CONTENT is not set to false in the environment, and that captureMessageContent is not set to false in the configuration object:

echo $ADK_CAPTURE_MESSAGE_CONTENT
# Should print "true" or be empty

Sensitive Data in Traces

If you accidentally shipped content to your backend:

  1. Set captureMessageContent: false immediately and redeploy.
  2. Contact your observability platform to purge affected traces.
  3. Audit your data retention and access control policies.

Alerts to Configure

Set up these alerts in your observability backend to catch problems before users do:

| Condition | Suggested threshold |
| --- | --- |
| Error rate | > 5% over 5 minutes |
| p95 agent latency | > 5 seconds |
| p95 LLM latency | > 10 seconds |
| Token usage per minute | > your budget cap |
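
Most backends evaluate these conditions for you, but it helps to know what the numbers mean. The sketch below computes a p95 with the nearest-rank method and an error rate, then compares them with the suggested thresholds. The sample data and function names are made up for illustration, and your backend may use a different percentile estimator.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// `percent`% of all samples are less than or equal to it.
function percentile(samples: number[], percent: number): number {
  if (samples.length === 0) {
    throw new Error("percentile requires at least one sample");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((percent * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Fraction of failed runs; an empty window counts as a 0% error rate.
function errorRate(errors: number, total: number): number {
  return total === 0 ? 0 : errors / total;
}

// Made-up window of 20 agent latencies in milliseconds.
const latenciesMs = [...new Array<number>(18).fill(800), 6200, 9000];

const agentLatencyAlert = percentile(latenciesMs, 95) > 5000; // > 5 seconds
const errorRateAlert = errorRate(7, 100) > 0.05;              // > 5%
console.log(agentLatencyAlert, errorRateAlert);
```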

Data Retention Guidelines

| Data | Recommended retention |
| --- | --- |
| Traces | 7–30 days |
| Metrics | 30–90 days |
| Logs | Per compliance requirements |
