Production
Privacy controls, sampling, performance tuning, and troubleshooting for ADK-TS observability in production
Running telemetry in production requires three adjustments from a development setup: disable content capture to protect user data, reduce the sampling ratio to limit overhead, and increase the metric export interval to reduce network traffic.
Production Configuration
This is a complete production-ready initialization. Read the sections below for the reasoning behind each setting.
```typescript
import { telemetryService } from "@iqai/adk";

await telemetryService.initialize({
  appName: process.env.OTEL_SERVICE_NAME || "my-agent-app",
  appVersion: process.env.APP_VERSION || "1.0.0",
  otlpEndpoint:
    process.env.OTEL_EXPORTER_OTLP_ENDPOINT ||
    "https://your-backend.com/v1/traces",
  otlpHeaders: {
    "api-key": process.env.OTEL_API_KEY || "",
  },
  environment: process.env.NODE_ENV || "production",
  enableTracing: true,
  enableMetrics: true,
  enableAutoInstrumentation: false,
  captureMessageContent: false, // keep prompts and tool arguments out of spans
  samplingRatio: 0.1, // export about 10% of traces
  metricExportIntervalMs: 300000, // flush metrics every 5 minutes
  resourceAttributes: {
    "deployment.name": process.env.DEPLOYMENT_NAME || "production",
    team: process.env.TEAM_NAME || "platform",
  },
});
```

Corresponding environment variables:
```bash
OTEL_SERVICE_NAME=my-agent-app
APP_VERSION=1.0.0
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-backend.com/v1/traces
OTEL_API_KEY=your-api-key
ADK_CAPTURE_MESSAGE_CONTENT=false
NODE_ENV=production
```

Privacy: Disable Content Capture
When `captureMessageContent` is enabled (the default), ADK-TS writes LLM prompts, completions, and tool arguments into span attributes. In production this almost certainly captures personally identifiable information. Disable it with the environment variable or in the configuration object:

```bash
export ADK_CAPTURE_MESSAGE_CONTENT=false
```

```typescript
await telemetryService.initialize({
  appName: "my-agent-app",
  otlpEndpoint: "https://your-backend.com/v1/traces",
  captureMessageContent: false,
});
```

With content capture disabled, span attributes such as `gen_ai.tool.call.arguments`, `gen_ai.input.messages`, and `gen_ai.output.messages` are not set. Metadata (model name, token counts, duration, status) is always captured regardless of this setting.
Default is capture-on
`captureMessageContent` defaults to `true`. You must explicitly disable it in production, or LLM prompts, completions, and tool arguments will appear in your trace backend.
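Because the default is capture-on, it can help to fail fast at startup when the options you are about to pass to `initialize()` would capture content in production. The sketch below is a hypothetical helper, not part of ADK-TS: `assertNoContentCaptureInProduction` and `TelemetryPrivacyOptions` are illustrative names. It assumes an unset `captureMessageContent` means capture is on (the documented default) and treats a missing `environment` as production. It inspects only the options object, so it does not see `ADK_CAPTURE_MESSAGE_CONTENT`.

```typescript
// Hypothetical startup guard; not part of the ADK-TS API.
// It inspects only the options object, not ADK_CAPTURE_MESSAGE_CONTENT.
interface TelemetryPrivacyOptions {
  environment?: string;
  captureMessageContent?: boolean;
}

function assertNoContentCaptureInProduction(
  options: TelemetryPrivacyOptions,
): void {
  // Assumption: a missing environment is treated as production, the safer choice.
  const isProduction = (options.environment ?? "production") === "production";
  // An unset value falls back to the documented default (true), so only an
  // explicit `false` counts as disabled.
  const capturesContent = options.captureMessageContent !== false;
  if (isProduction && capturesContent) {
    throw new Error(
      "captureMessageContent must be explicitly set to false in production",
    );
  }
}
```

Run the check on the same object you later pass to `telemetryService.initialize()`, so a missing `captureMessageContent: false` stops the deploy instead of leaking prompts.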
Sampling Ratio
At 100% sampling (`samplingRatio: 1.0`), every agent run produces a trace. For high-traffic services this creates significant storage and ingestion costs. Reduce sampling in production: at `samplingRatio: 0.1` roughly 90% fewer traces are exported, while the remaining sample is still representative enough for debugging.
```typescript
await telemetryService.initialize({
  appName: "my-agent-app",
  otlpEndpoint: "https://your-backend.com/v1/traces",
  samplingRatio: 0.1, // capture 10% of traces
});
```

Recommended ratios by environment:
| Environment | Ratio | Notes |
|---|---|---|
| Development | 1.0 | Capture everything |
| Staging | 0.5 | Half-rate for representative coverage |
| Production | 0.1–0.2 | Balance visibility with cost |
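If one codebase deploys to all three environments, you can derive `samplingRatio` from the environment name instead of hard-coding it. This is a minimal sketch assuming `NODE_ENV` holds `development`, `staging`, or `production`; the function names and the `SAMPLING_RATIO` override variable are hypothetical, not something ADK-TS reads. It uses 0.1, the low end of the production range in the table above, and falls back to it for unknown environments so a typo never enables full sampling.

```typescript
// Hypothetical helpers mapping an environment name to the ratios in the table.
const SAMPLING_BY_ENV = new Map<string, number>([
  ["development", 1.0],
  ["staging", 0.5],
  ["production", 0.1],
]);
const PRODUCTION_RATIO = 0.1;

function samplingRatioFor(environment: string | undefined): number {
  // Unknown or missing environments get the production ratio.
  return SAMPLING_BY_ENV.get(environment ?? "production") ?? PRODUCTION_RATIO;
}

// Optional override (hypothetical SAMPLING_RATIO variable), clamped to [0, 1].
// Non-numeric or empty values are ignored.
function resolveSamplingRatio(
  environment: string | undefined,
  override: string | undefined,
): number {
  if (override !== undefined && override.trim() !== "") {
    const parsed = Number(override);
    if (Number.isFinite(parsed)) return Math.min(1, Math.max(0, parsed));
  }
  return samplingRatioFor(environment);
}
```

In the initialization call this would read `samplingRatio: resolveSamplingRatio(process.env.NODE_ENV, process.env.SAMPLING_RATIO)`.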
Metric Export Interval
Metrics are buffered and flushed on a timer controlled by `metricExportIntervalMs`. The default is 60 seconds (1,440 exports per process per day); in production, 5–10 minutes is usually sufficient and reduces that to 288 or 144 exports per day:
```typescript
await telemetryService.initialize({
  appName: "my-agent-app",
  otlpEndpoint: "https://your-backend.com/v1/traces",
  metricExportIntervalMs: 300000, // 5 minutes
});
```

Initialize Before Agents
Spans emitted before `initialize()` completes are not captured. Await `initialize()` before constructing any agent or runner:
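```typescript
import { AgentBuilder, telemetryService } from "@iqai/adk";

// ✅ Correct order: telemetry is ready before the agent is built
await telemetryService.initialize({
  /* config */
});

const { runner } = await AgentBuilder.create("my-agent")
  .withModel("gemini-2.5-flash")
  .build();
```

```typescript
// ❌ Traces will be missing: the agent is built before telemetry is initialized
const { runner } = await AgentBuilder.create("my-agent")
  .withModel("gemini-2.5-flash")
  .build();

await telemetryService.initialize({
  /* config */
});
```

Graceful Shutdown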
`shutdown()` flushes all buffered spans and metrics before the process exits. Without it, anything recorded since the last export is silently dropped.
```typescript
import { telemetryService } from "@iqai/adk";

process.on("SIGTERM", async () => {
  await telemetryService.shutdown(5000); // flush timeout in ms
  process.exit(0);
});

process.on("SIGINT", async () => {
  await telemetryService.shutdown(5000);
  process.exit(0);
});
```

The timeout (5000 ms) covers the round trip to your backend. Increase it on high-latency links, but keep it shorter than your platform's termination grace period so the flush finishes before the process is killed. Use `flush()` instead of `shutdown()` to export buffered data without tearing down the provider, for example between test runs.
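Registering one async handler per signal means a second Ctrl+C, or a SIGINT followed by SIGTERM, starts another `shutdown()` while the first is still flushing. A minimal sketch of an idempotent handler, assuming only that the shutdown function takes a timeout in milliseconds and returns a promise, as `telemetryService.shutdown(5000)` does in the example above. `createShutdownHandler` is a hypothetical name; the shutdown and exit functions are injected so the logic can be exercised without a backend, and exiting with code 1 on a failed flush is a design choice, not ADK-TS behavior.

```typescript
// Hypothetical helper: flushes telemetry at most once, however many
// termination signals arrive, then exits.
type ShutdownFn = (timeoutMs: number) => Promise<void>;

function createShutdownHandler(
  shutdown: ShutdownFn,
  timeoutMs: number,
  exit: (code: number) => void = (code) => process.exit(code),
): () => Promise<void> {
  let inFlight: Promise<void> | undefined;
  return () => {
    // Later calls reuse the first shutdown attempt instead of starting another.
    if (!inFlight) {
      inFlight = shutdown(timeoutMs)
        .then(() => exit(0))
        .catch((err) => {
          console.error("Telemetry shutdown failed:", err);
          exit(1);
        });
    }
    return inFlight;
  };
}

// Usage with ADK-TS (telemetryService imported as shown earlier):
// const onSignal = createShutdownHandler((ms) => telemetryService.shutdown(ms), 5000);
// process.on("SIGTERM", onSignal);
// process.on("SIGINT", onSignal);
```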
Secrets in Environment Variables
Never hardcode API keys:
```typescript
// ✅ Use environment variables
otlpHeaders: {
  "api-key": process.env.OTEL_API_KEY,
},

// ❌ Never hardcode credentials
otlpHeaders: {
  "api-key": "sk-abc123...",
},
```

Structured Custom Attributes
Add business context to every span via resourceAttributes. These appear on all spans and metrics from this process, making it straightforward to filter by deployment or team in your backend:
```typescript
resourceAttributes: {
  "deployment.name": "us-east-1-prod",
  "team": "platform",
  "customer.tier": "enterprise",
},
```

Troubleshooting
No Traces Appearing
Verify the endpoint is reachable:
```bash
curl -X POST https://your-backend.com/v1/traces \
  -H "Content-Type: application/json" \
  -d '{}'
```

Check that the OTLP receiver is running:
```bash
docker ps | grep jaeger  # or tempo, otel-collector, etc.
docker logs jaeger
```

Enable OpenTelemetry diagnostic logging to see what the SDK is doing:
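```typescript
import { diag, DiagConsoleLogger, DiagLogLevel } from "@opentelemetry/api";
import { telemetryService } from "@iqai/adk";

// Register the diagnostic logger first so messages from SDK setup are printed
diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG);

await telemetryService.initialize({
  /* config */
});
```

High Overhead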
Lower the sampling ratio, turn off content capture and auto-instrumentation, and lengthen the metric export interval:
```typescript
samplingRatio: 0.05, // 5% sampling
captureMessageContent: false, // no content in spans
enableAutoInstrumentation: false,
metricExportIntervalMs: 600000, // 10-minute metric interval
```

Metrics Not Appearing
ADK-TS derives the metrics endpoint from the traces endpoint by replacing /v1/traces with /v1/metrics. If your traces endpoint does not contain /v1/traces, metrics will not export correctly.
Also check that your backend supports OTLP metrics ingestion. Jaeger and Grafana Tempo store traces only; send metrics to a backend such as Datadog, or through an OpenTelemetry Collector that exports to Prometheus.
Metrics are exported on a timer, so wait for at least one full metricExportIntervalMs before checking the backend.
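To catch a bad endpoint before deploying, you can reproduce the derivation rule described above. This sketch mirrors the documented behavior (replace `/v1/traces` with `/v1/metrics`); it is not the ADK-TS source, and `deriveMetricsEndpoint` and `checkMetricsEndpoint` are hypothetical names. It returns `undefined` when the traces endpoint has no `/v1/traces` segment, which is the case where metrics will not export correctly.

```typescript
// Hypothetical pre-deploy check; it mirrors the documented rule rather than
// calling ADK-TS itself.
function deriveMetricsEndpoint(tracesEndpoint: string): string | undefined {
  // Without a /v1/traces segment there is nothing to replace.
  if (!tracesEndpoint.includes("/v1/traces")) return undefined;
  return tracesEndpoint.replace("/v1/traces", "/v1/metrics");
}

function checkMetricsEndpoint(tracesEndpoint: string): void {
  const metricsEndpoint = deriveMetricsEndpoint(tracesEndpoint);
  if (metricsEndpoint === undefined) {
    console.warn(
      `OTLP endpoint "${tracesEndpoint}" does not contain /v1/traces; metrics will not export correctly.`,
    );
  } else {
    console.info(`Metrics would be sent to ${metricsEndpoint}`);
  }
}

checkMetricsEndpoint(
  process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? "http://localhost:4318/v1/traces",
);
```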
Content Not Appearing in Spans
Check that ADK_CAPTURE_MESSAGE_CONTENT is not set to false in the environment, and that captureMessageContent is not set to false in the configuration object:
```bash
echo $ADK_CAPTURE_MESSAGE_CONTENT
# Should print "true" or be empty
```

Sensitive Data in Traces
If you accidentally shipped content to your backend:
- Set `captureMessageContent: false` immediately and redeploy.
- Contact your observability platform to purge affected traces.
- Audit your data retention and access control policies.
Alerts to Configure
Set up these alerts in your observability backend to catch problems before users do:
| Condition | Suggested threshold |
|---|---|
| Error rate | > 5% over 5 minutes |
| p95 agent latency | > 5 seconds |
| p95 LLM latency | > 10 seconds |
| Token usage per minute | > your budget cap |
Data Retention Guidelines
| Data | Recommended retention |
|---|---|
| Traces | 7–30 days |
| Metrics | 30–90 days |
| Logs | Per compliance requirements |