Production Deployment
Privacy controls, performance tuning, best practices, and troubleshooting for observability
This guide covers production-ready configuration for observability in ADK-TS, including privacy controls, performance tuning, best practices, and troubleshooting.
Production Configuration
Here's a complete production-ready configuration:
import { telemetryService } from "@iqai/adk";
await telemetryService.initialize({
  // Required
  appName: process.env.OTEL_SERVICE_NAME || "my-agent-app",
  otlpEndpoint:
    process.env.OTEL_EXPORTER_OTLP_ENDPOINT ||
    "https://your-backend.com/v1/traces",
  // Environment
  environment: process.env.NODE_ENV || "production",
  appVersion: process.env.APP_VERSION || "1.0.0",
  // Authentication
  otlpHeaders: {
    "api-key": process.env.OTEL_API_KEY || "",
  },
  // Feature flags
  enableTracing: true,
  enableMetrics: true,
  enableAutoInstrumentation: false,
  // Privacy and performance
  captureMessageContent: false,
  samplingRatio: 0.1,
  metricExportIntervalMs: 300000,
  // Custom attributes
  resourceAttributes: {
    "deployment.name": process.env.DEPLOYMENT_NAME || "production",
    team: process.env.TEAM_NAME || "platform",
  },
});
Environment Variables
# Service identification
export OTEL_SERVICE_NAME=my-agent-app
export APP_VERSION=1.0.0
# Privacy
export ADK_CAPTURE_MESSAGE_CONTENT=false
# Performance
export OTEL_SAMPLING_RATIO=0.1
export METRIC_EXPORT_INTERVAL_MS=300000
# OTLP endpoint
export OTEL_EXPORTER_OTLP_ENDPOINT=https://your-backend.com/v1/traces
export OTEL_API_KEY=your-api-key
# Environment
export NODE_ENV=production
Configuration Reference
| Setting | Development | Production | Description |
|---|---|---|---|
| captureMessageContent | true | false | Capture LLM prompts/completions and tool arguments |
| samplingRatio | 1.0 | 0.1–0.2 | Fraction of traces to sample (0.0–1.0) |
| metricExportIntervalMs | 60000 | 300000–600000 | Metric export interval in milliseconds |
| enableAutoInstrumentation | false | false | HTTP/database auto-tracing (enable only if needed) |
Privacy in Production
Always set captureMessageContent: false in production to protect user data
and comply with privacy regulations.
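The environment variables listed earlier can be read into typed settings before they are passed to `telemetryService.initialize`. The following is a minimal sketch, not part of `@iqai/adk`: `readTelemetryEnv` and `parseNumberInRange` are hypothetical helpers, and the fallbacks are the production values from the table. It treats content capture as off unless the variable is exactly `true`, which is a conservative assumption and may differ from the library's own default.

```typescript
// Hypothetical helpers for illustration; not an @iqai/adk API.
type Env = Record<string, string | undefined>;

interface TelemetryEnvSettings {
  captureMessageContent: boolean;
  samplingRatio: number;
  metricExportIntervalMs: number;
}

// Parse a numeric variable, falling back when it is missing, empty,
// not a number, or outside [min, max].
function parseNumberInRange(
  raw: string | undefined,
  fallback: number,
  min: number,
  max: number,
): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  return isFinite(value) && value >= min && value <= max ? value : fallback;
}

function readTelemetryEnv(env: Env): TelemetryEnvSettings {
  return {
    // Assumption: capture stays off unless explicitly set to "true".
    captureMessageContent: env["ADK_CAPTURE_MESSAGE_CONTENT"] === "true",
    samplingRatio: parseNumberInRange(env["OTEL_SAMPLING_RATIO"], 0.1, 0, 1),
    metricExportIntervalMs: parseNumberInRange(
      env["METRIC_EXPORT_INTERVAL_MS"],
      300000,
      1,
      Infinity,
    ),
  };
}

// samplingRatio comes from the variable; the other fields use the fallbacks.
const exampleSettings = readTelemetryEnv({ OTEL_SAMPLING_RATIO: "0.2" });
console.log(exampleSettings);
```

In an application, `readTelemetryEnv(process.env)` could supply these three fields of the initialization options.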
Best Practices
Initialize Before Agent Operations
Always initialize telemetry before creating or running agents:
// ✅ Correct order
await telemetryService.initialize({
  /* config */
});
const agent = AgentBuilder.withModel("gemini-2.5-flash").build();
// ❌ Wrong - traces may be missed
const agent = AgentBuilder.withModel("gemini-2.5-flash").build();
await telemetryService.initialize({
  /* config */
});
Disable Content Capture
Protect user privacy by disabling content capture in production:
export ADK_CAPTURE_MESSAGE_CONTENT=false
await telemetryService.initialize({
  captureMessageContent: false,
});
When disabled, tool arguments, tool responses, LLM prompts, and completions are not recorded. Metadata like model name, token counts, and duration are still captured.
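To make the content/metadata split concrete, here is a small sketch of the kind of filtering this setting implies. It is illustrative only: `filterSpanAttributes` is a hypothetical function, and the attribute keys are placeholders rather than the names ADK-TS actually records.

```typescript
// Illustrative only: placeholder attribute keys, not the ones ADK-TS uses.
type SpanAttributes = Record<string, string | number | boolean>;

// Keys that carry user content (prompts, completions, tool payloads).
const CONTENT_KEYS = [
  "llm.prompt",
  "llm.completion",
  "tool.arguments",
  "tool.response",
];

function filterSpanAttributes(
  attrs: SpanAttributes,
  captureMessageContent: boolean,
): SpanAttributes {
  const kept: SpanAttributes = {};
  for (const key of Object.keys(attrs)) {
    const isContent = CONTENT_KEYS.indexOf(key) !== -1;
    // Content is dropped only when capture is disabled; metadata always passes.
    if (captureMessageContent || !isContent) {
      kept[key] = attrs[key];
    }
  }
  return kept;
}

const recorded = filterSpanAttributes(
  {
    "llm.model": "gemini-2.5-flash",
    "llm.total_tokens": 412,
    "llm.prompt": "Summarize this contract...",
  },
  false,
);
// Keeps llm.model and llm.total_tokens, drops llm.prompt.
console.log(Object.keys(recorded));
```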
Use Environment Variables for Secrets
Never hardcode API keys or endpoints:
// ✅ Good - use environment variables
otlpHeaders: {
  "api-key": process.env.OTEL_API_KEY,
},
// ❌ Bad - hardcoded secret
otlpHeaders: {
  "api-key": "sk-abc123...",
},
Implement Graceful Shutdown
Always shut down telemetry to flush pending data:
process.on("SIGTERM", async () => {
  await telemetryService.shutdown(5000);
  process.exit(0);
});
process.on("SIGINT", async () => {
  await telemetryService.shutdown(5000);
  process.exit(0);
});
The timeout (5000 ms) gives pending telemetry time to flush when the backend is slow, while keeping shutdown bounded. Adjust it based on your network conditions.
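The idea of a bounded shutdown can be sketched independently of the library. The helper below is an assumption for illustration (`withTimeout` is not an `@iqai/adk` function, and the real `shutdown` may be implemented differently): it waits for a task but gives up once the time budget is spent, so a slow backend cannot block process exit indefinitely.

```typescript
// Hypothetical helper for illustration; not an @iqai/adk API.
type ShutdownResult = "done" | "timeout";

// Resolves with "done" if the task settles within timeoutMs,
// otherwise with "timeout" once the budget has passed.
function withTimeout(
  task: Promise<unknown>,
  timeoutMs: number,
): Promise<ShutdownResult> {
  return new Promise<ShutdownResult>((resolve) => {
    const timer = setTimeout(() => resolve("timeout"), timeoutMs);
    const finish = () => {
      clearTimeout(timer); // Do not keep the process alive after the task ends.
      resolve("done");
    };
    // A rejected task also counts as finished, so shutdown never hangs on errors.
    task.then(finish, finish);
  });
}

// Example: a simulated flush that takes 50 ms fits in a 200 ms budget.
const slowFlush = new Promise<void>((resolve) => setTimeout(resolve, 50));
withTimeout(slowFlush, 200).then((result) => console.log(result)); // "done"
```

A larger budget tolerates slower backends at the cost of a longer worst-case exit time.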
Optimize Sampling for High Traffic
Reduce sampling ratio to minimize overhead in high-traffic scenarios:
await telemetryService.initialize({
  samplingRatio: 0.1, // Sample 10% of traces
});
Recommended sampling ratios:
- Development: 1.0 (100%)
- Staging: 0.5 (50%)
- Production: 0.1–0.2 (10–20%)
Adjust Metric Export Interval
Export metrics less frequently in production to reduce overhead:
await telemetryService.initialize({
  metricExportIntervalMs: 300000, // Export every 5 minutes
});
Recommended intervals:
- Development: 60000 (1 minute)
- Production: 300000–600000 (5–10 minutes)
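The recommended sampling ratios and export intervals can be kept in one lookup so each environment picks up consistent values. This is a sketch under assumptions: `TUNING_BY_ENV` and `tuningFor` are hypothetical names, production uses the lower end of each recommended range, and the staging export interval is not given in this guide, so the development value is reused as a placeholder.

```typescript
// Hypothetical lookup for illustration; not an @iqai/adk API.
type DeploymentEnv = "development" | "staging" | "production";

interface TelemetryTuning {
  samplingRatio: number;
  metricExportIntervalMs: number;
}

const TUNING_BY_ENV: Record<DeploymentEnv, TelemetryTuning> = {
  development: { samplingRatio: 1.0, metricExportIntervalMs: 60000 },
  // Assumption: the guide gives no staging interval; 60000 is a placeholder.
  staging: { samplingRatio: 0.5, metricExportIntervalMs: 60000 },
  production: { samplingRatio: 0.1, metricExportIntervalMs: 300000 },
};

function tuningFor(env: DeploymentEnv): TelemetryTuning {
  return TUNING_BY_ENV[env];
}

console.log(tuningFor("production"));
```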
Use HTTPS for OTLP Endpoints
Always use HTTPS in production:
// ✅ Good
otlpEndpoint: "https://your-backend.com/v1/traces",
// ❌ Bad for production
otlpEndpoint: "http://your-backend.com/v1/traces",
Follow OpenTelemetry Semantic Conventions
Use standard attribute names for consistency:
import { SEMCONV, ADK_ATTRS } from "@iqai/adk";
span.setAttribute(SEMCONV.GEN_AI_REQUEST_MODEL, "gpt-4");
span.setAttribute(ADK_ATTRS.SESSION_ID, sessionId);
Add Business Context with Custom Attributes
Include custom attributes relevant to your domain:
resourceAttributes: {
  "deployment.name": "production",
  "team": "platform",
  "region": "us-east-1",
  "customer.tier": "enterprise",
},
Record Exceptions in Catch Blocks
Always record exceptions with context:
try {
  await riskyOperation();
} catch (error) {
  telemetryService.recordException(error as Error, {
    "error.context": "data_validation",
    "error.severity": "high",
  });
  throw error;
}
Create Focused Spans
Create separate spans for distinct operations:
// ✅ Good - focused spans
await telemetryService.withSpan("fetch_data", async () => {
  return await fetchData();
});
await telemetryService.withSpan("process_data", async () => {
  return await processData();
});
// ❌ Bad - one span for everything
await telemetryService.withSpan("do_everything", async () => {
  await fetchData();
  await processData();
  await saveData();
});
Set Up Alerts for Critical Metrics
Create alerts for error rates and latency thresholds:
- Error rate exceeds 5%
- 95th percentile latency exceeds 2 seconds
- Token usage exceeds budget
- Sampling rate drops below threshold
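The first two thresholds above can be expressed as simple checks over a window of request samples. This is a minimal sketch, assuming you can export per-request duration and failure status from your backend; `RequestSample`, `errorRate`, `percentile`, and `checkAlerts` are hypothetical names, and most observability platforms provide equivalent alert rules natively.

```typescript
// Hypothetical types and helpers for illustration only.
interface RequestSample {
  durationMs: number;
  failed: boolean;
}

// Share of failed requests in the window (0 for an empty window).
function errorRate(samples: RequestSample[]): number {
  if (samples.length === 0) return 0;
  const failures = samples.filter((s) => s.failed).length;
  return failures / samples.length;
}

// Nearest-rank percentile: the smallest value with at least p% of
// the samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

function checkAlerts(samples: RequestSample[]): string[] {
  const alerts: string[] = [];
  if (errorRate(samples) > 0.05) {
    alerts.push("error rate exceeds 5%");
  }
  const p95 = percentile(samples.map((s) => s.durationMs), 95);
  if (p95 > 2000) {
    alerts.push("p95 latency exceeds 2 seconds");
  }
  return alerts;
}
```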
Monitor Data Ingestion Costs
Be aware of data ingestion costs for paid platforms. Use sampling and export intervals to control costs.
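A rough volume estimate shows how these two settings affect ingestion. The sketch below assumes one trace per request and a fixed average number of spans per trace; the traffic figures are made-up inputs, and the function names are hypothetical.

```typescript
// Back-of-the-envelope estimate; inputs are illustrative assumptions.
const SECONDS_PER_DAY = 86400;

// Spans exported per day, assuming one trace per request.
function sampledSpansPerDay(
  requestsPerSecond: number,
  spansPerTrace: number,
  samplingRatio: number,
): number {
  return Math.round(
    requestsPerSecond * SECONDS_PER_DAY * spansPerTrace * samplingRatio,
  );
}

// Number of metric export cycles per day for a given interval.
function metricExportsPerDay(metricExportIntervalMs: number): number {
  return Math.floor((SECONDS_PER_DAY * 1000) / metricExportIntervalMs);
}

console.log(sampledSpansPerDay(50, 8, 1.0)); // 34560000
console.log(sampledSpansPerDay(50, 8, 0.1)); // 3456000
console.log(metricExportsPerDay(60000)); // 1440
console.log(metricExportsPerDay(300000)); // 288
```

Under these assumptions, moving from 100% to 10% sampling cuts exported spans by a factor of ten, and a 5-minute interval exports metrics a fifth as often as a 1-minute interval.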
Configure Data Retention Policies
Set appropriate retention policies in your observability backend:
| Data Type | Recommended Retention |
|---|---|
| Traces | 7–30 days |
| Metrics | 30–90 days |
| Logs | Per compliance requirements |
Test Locally Before Deploying
Always test telemetry with Jaeger locally before deploying to production:
# Start Jaeger locally
docker run -d --name jaeger -p 4318:4318 -p 16686:16686 jaegertracing/all-in-one:latest
// Test your agent
await telemetryService.initialize({
  otlpEndpoint: "http://localhost:4318/v1/traces",
});
// View traces at http://localhost:16686
Troubleshooting
No Traces Appearing
Check endpoint URL:
# Verify endpoint is correct and reachable
curl -X POST https://your-backend.com/v1/traces
Verify backend is running:
# For Jaeger
docker ps | grep jaeger
# Check logs
docker logs jaeger
Enable debug logging:
import { diag, DiagConsoleLogger, DiagLogLevel } from "@opentelemetry/api";
diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG);
await telemetryService.initialize({
  /* config */
});
Check network connectivity:
# Test connection to OTLP endpoint
nc -zv your-backend.com 4318
High Overhead
If telemetry is causing performance issues:
Reduce sampling ratio:
samplingRatio: 0.05, // Sample only 5% of traces
Disable auto-instrumentation:
enableAutoInstrumentation: false,
Increase export interval:
metricExportIntervalMs: 600000, // Export every 10 minutes
Disable content capture:
captureMessageContent: false,
Content Not Captured
If you need content for debugging but it's not appearing:
Check environment variable:
echo $ADK_CAPTURE_MESSAGE_CONTENT
# Should be 'true' or unset for content capture
Explicitly enable in configuration:
captureMessageContent: true,
Connection Issues
Verify API keys have proper permissions:
# Test API key with curl
curl -H "api-key: $OTEL_API_KEY" https://your-backend.com/v1/traces
Check firewall rules allow outbound connections:
# Check if port is accessible
telnet your-backend.com 4318
Review header format requirements:
Some backends require specific header formats. Check your platform's documentation:
// Datadog example
otlpHeaders: {
  "DD-API-KEY": process.env.DD_API_KEY,
},
// Honeycomb example
otlpHeaders: {
  "x-honeycomb-team": process.env.HONEYCOMB_API_KEY,
  "x-honeycomb-dataset": "my-dataset",
},
Traces Contain Sensitive Data
If you accidentally captured sensitive data:
- Immediately disable content capture:
  export ADK_CAPTURE_MESSAGE_CONTENT=false
- Contact your observability platform to purge affected traces
- Review your data retention policies and ensure proper access controls
Metrics Not Appearing
Verify metrics endpoint:
ADK-TS automatically converts trace endpoint to metrics endpoint:
http://localhost:4318/v1/traces → http://localhost:4318/v1/metrics
Check backend supports metrics:
Jaeger only supports traces. Use Grafana/Tempo, Datadog, or Prometheus for metrics.
Verify the export interval has elapsed:
Metrics are exported at intervals. Wait for the configured metricExportIntervalMs before checking.
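When checking the metrics endpoint, it can help to reproduce the trace-to-metrics conversion by hand and confirm that the resulting URL is reachable. The function below is only a guess at the shape of that conversion: `toMetricsEndpoint` is a hypothetical name, and the library's actual implementation may handle other URL forms differently.

```typescript
// Hypothetical re-creation of the conversion; not the library's code.
function toMetricsEndpoint(traceEndpoint: string): string {
  // Swap a trailing /v1/traces path for /v1/metrics.
  // URLs that do not end in /v1/traces are returned unchanged.
  return traceEndpoint.replace(/\/v1\/traces\/?$/, "/v1/metrics");
}

console.log(toMetricsEndpoint("http://localhost:4318/v1/traces"));
// http://localhost:4318/v1/metrics
```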