Performance
Optimization, monitoring, and scalability considerations
Performance optimization in the ADK Runtime involves multiple layers, from efficient event processing to resource management and service optimization.
Event Processing Optimization
Minimizing Event Overhead
Events are the core communication mechanism, so optimizing their creation and processing is crucial:
// Efficient event creation
class OptimizedAgent extends BaseAgent {
protected async *runAsyncImpl(ctx: InvocationContext): AsyncGenerator<Event> {
// Batch multiple state changes in a single event
const actions = new EventActions({
stateDelta: {
"user_preference": "dark_mode",
"session_count": ctx.session.events.length + 1,
"last_activity": new Date().toISOString()
}
});
// Single event with multiple changes instead of multiple events
yield new Event({
invocationId: ctx.invocationId,
author: this.name,
content: { parts: [{ text: "Settings updated successfully" }] },
actions
});
}
}
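When state changes are computed at several points in an agent's logic, they can be collected and emitted once. The following is a minimal sketch, assuming state deltas are plain key-value objects; `StateDelta` and `mergeStateDeltas` are hypothetical names used for illustration and are not part of the ADK API.

```typescript
// Hypothetical stand-in for the object passed as `stateDelta`.
type StateDelta = Record<string, unknown>;

// Merge deltas in order; for duplicate keys the later value wins.
function mergeStateDeltas(...deltas: StateDelta[]): StateDelta {
  const merged: StateDelta = {};
  for (const delta of deltas) {
    Object.assign(merged, delta);
  }
  return merged;
}

const batchedDelta = mergeStateDeltas(
  { user_preference: "dark_mode" },
  { session_count: 3 },
  { user_preference: "light_mode" } // overrides the earlier value
);
console.log(batchedDelta);
```

The merged object can then be passed as the `stateDelta` of a single event instead of yielding one event per change.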
Efficient Serialization
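Keep event payloads small. Non-partial events are persisted to the session, so large inline content is stored and loaded along with it. Save bulky data as an artifact and reference it from the event: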
// Avoid large content in events
class EfficientAgent extends BaseAgent {
protected async *runAsyncImpl(ctx: InvocationContext): AsyncGenerator<Event> {
const largeData = await this.processLargeDataset();
// Store large data as artifact, reference in event
if (ctx.artifactService) {
await ctx.artifactService.saveArtifact({
appName: ctx.session.appName,
userId: ctx.session.userId,
sessionId: ctx.session.id,
filename: "analysis_results.json",
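artifact: {
inlineData: {
mimeType: "application/json",
// inlineData.data carries base64-encoded bytes, so encode the JSON text
data: Buffer.from(JSON.stringify(largeData)).toString("base64")
}
}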
});
// Event contains reference, not the data itself
yield new Event({
invocationId: ctx.invocationId,
author: this.name,
content: {
parts: [{
text: "Analysis complete. Results saved as analysis_results.json"
}]
}
});
}
}
}
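Deciding when a payload is "too large" for an event can be made explicit with a size check. This is a rough sketch under stated assumptions: `MAX_INLINE_BYTES` and `shouldStoreAsArtifact` are hypothetical names, and the 16 KiB limit is an arbitrary placeholder to tune for your storage backend.

```typescript
// Count UTF-8 bytes without relying on Node- or browser-specific globals.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (const ch of text) {
    const codePoint = ch.codePointAt(0) ?? 0;
    bytes += codePoint < 0x80 ? 1 : codePoint < 0x800 ? 2 : codePoint < 0x10000 ? 3 : 4;
  }
  return bytes;
}

// Hypothetical threshold (assumption): payloads above this go to an artifact.
const MAX_INLINE_BYTES = 16 * 1024;

// True when the JSON form of the payload exceeds the inline limit.
function shouldStoreAsArtifact(payload: unknown, limit: number = MAX_INLINE_BYTES): boolean {
  // JSON.stringify returns undefined for some inputs (e.g. undefined itself).
  const json = JSON.stringify(payload) ?? "";
  return utf8ByteLength(json) > limit;
}
```

An agent could call `shouldStoreAsArtifact(largeData)` before choosing between inline content and an artifact reference.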
Streaming for User Experience
Use partial events for better perceived performance:
class StreamingAgent extends BaseAgent {
protected async *runAsyncImpl(ctx: InvocationContext): AsyncGenerator<Event> {
const steps = [
"Analyzing request...",
"Gathering information...",
"Processing data...",
"Generating response..."
];
// Stream progress updates
for (const step of steps) {
yield new Event({
invocationId: ctx.invocationId,
author: this.name,
content: { parts: [{ text: step }] },
partial: true // Not persisted, just for streaming
});
await this.performStep(step);
}
// Final result
yield new Event({
invocationId: ctx.invocationId,
author: this.name,
content: { parts: [{ text: "Here's your complete analysis..." }] },
partial: false // Persisted to session
});
}
}
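On the consuming side, partial and final events are typically handled differently: every event updates what the user sees, while only final events are kept. A minimal sketch of that split follows, using a simplified, hypothetical `StreamedEvent` shape and `StreamSink` class rather than the real ADK `Event` type.

```typescript
// Hypothetical, simplified event shape for illustration only.
interface StreamedEvent {
  author: string;
  text: string;
  partial?: boolean;
}

class StreamSink {
  readonly persisted: StreamedEvent[] = [];
  private latestText = "";

  // Show every event to the user; keep only non-partial events for storage.
  handle(event: StreamedEvent): void {
    this.latestText = event.text;
    if (!event.partial) {
      this.persisted.push(event);
    }
  }

  // The text currently shown to the user.
  get display(): string {
    return this.latestText;
  }
}

const sink = new StreamSink();
sink.handle({ author: "agent", text: "Analyzing request...", partial: true });
sink.handle({ author: "agent", text: "Here's your complete analysis..." });
console.log(sink.display, sink.persisted.length);
```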
Resource Management
Connection Pooling
Reuse expensive resources such as LLM connections instead of creating one per call:
class ConnectionPool {
private pools = new Map<string, any[]>();
private maxPoolSize = 10; // Max idle connections kept per model (does not cap connections in use)
async getConnection(model: string): Promise<any> {
const pool = this.pools.get(model) || [];
if (pool.length > 0) {
return pool.pop();
}
// Create new connection if pool is empty
return await this.createNewConnection(model);
}
releaseConnection(model: string, connection: any): void {
const pool = this.pools.get(model) || [];
if (pool.length < this.maxPoolSize) {
pool.push(connection);
this.pools.set(model, pool);
} else {
// Pool is full, dispose connection
this.disposeConnection(connection);
}
}
}
// Use in LLM integration
const connectionPool = new ConnectionPool();
class OptimizedLlmAgent extends LlmAgent {
async callLlm(request: LlmRequest): Promise<LlmResponse> {
const connection = await connectionPool.getConnection(this.model);
try {
return await connection.call(request);
} finally {
connectionPool.releaseConnection(this.model, connection);
}
}
}
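The pool above limits how many idle connections are kept, but not how many exist at once. If the backend enforces a connection limit, callers can be made to wait instead. The sketch below shows one way to do that; `BoundedPool` and `withPooled` are hypothetical helpers, and the synchronous `create` callback is a simplifying assumption.

```typescript
class BoundedPool<T> {
  private idle: T[] = [];
  private waiters: Array<(item: T) => void> = [];
  private created = 0;
  private readonly create: () => T;
  private readonly maxSize: number;

  constructor(create: () => T, maxSize: number) {
    this.create = create;
    this.maxSize = maxSize;
  }

  acquire(): Promise<T> {
    if (this.idle.length > 0) {
      return Promise.resolve(this.idle.pop() as T);
    }
    if (this.created < this.maxSize) {
      const item = this.create();
      this.created++;
      return Promise.resolve(item);
    }
    // At capacity: wait until another caller releases an item.
    return new Promise<T>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  release(item: T): void {
    const waiter = this.waiters.shift();
    if (waiter) {
      waiter(item); // Hand the item straight to the longest-waiting caller.
    } else {
      this.idle.push(item);
    }
  }

  get size(): number {
    return this.created;
  }

  get waiting(): number {
    return this.waiters.length;
  }
}

// Run a task with a pooled item and always return the item, even on failure.
async function withPooled<T, R>(pool: BoundedPool<T>, task: (item: T) => Promise<R>): Promise<R> {
  const item = await pool.acquire();
  try {
    return await task(item);
  } finally {
    pool.release(item);
  }
}
```

Waiters are served in FIFO order, so a burst of requests queues up rather than opening more connections than `maxSize`.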
Memory Management
Bound in-memory caches and release temporary data when an invocation finishes:
class MemoryEfficientAgent extends BaseAgent {
private cache = new Map<string, any>();
private maxCacheSize = 1000;
protected async *runAsyncImpl(ctx: InvocationContext): AsyncGenerator<Event> {
try {
// WeakMap entries become collectable once their key objects are unreachable
const tempData = new WeakMap<object, unknown>();
// Process with memory awareness
const result = await this.processWithMemoryManagement(ctx, tempData);
yield new Event({
invocationId: ctx.invocationId,
author: this.name,
content: { parts: [{ text: result }] }
});
} finally {
// Explicit cleanup
this.cleanupCache();
}
}
private cleanupCache(): void {
if (this.cache.size > this.maxCacheSize) {
// Evict the oldest-inserted entries (FIFO, not LRU: Map iterates in insertion order)
const entries = Array.from(this.cache.entries());
const toRemove = entries.slice(0, Math.floor(this.maxCacheSize * 0.2));
for (const [key] of toRemove) {
this.cache.delete(key);
}
}
}
}
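If recently used entries should survive eviction, a `Map` can provide true least-recently-used behavior by re-inserting an entry on every read. This is a minimal sketch; `LruCache` is a hypothetical helper, not an ADK class.

```typescript
class LruCache<K, V> {
  private entries = new Map<K, V>();
  private readonly maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxSize) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  // Membership check that does not affect recency.
  has(key: K): boolean {
    return this.entries.has(key);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Eviction happens on insert, so the cache never exceeds `maxSize` and no separate cleanup pass is needed.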
Service Optimization
Cache session reads and invalidate the cache on writes to reduce round trips to storage:
class OptimizedSessionService extends BaseSessionService {
private cache = new Map<string, Session>();
private cacheExpiry = new Map<string, number>();
private cacheTtl = 5 * 60 * 1000; // 5 minutes
async getSession(appName: string, userId: string, sessionId: string): Promise<Session | null> {
const cacheKey = `${appName}:${userId}:${sessionId}`;
const now = Date.now();
// Check cache first
const cached = this.cache.get(cacheKey);
const expiry = this.cacheExpiry.get(cacheKey);
if (cached && expiry && expiry > now) {
return cached;
}
// Load from storage
const session = await this.loadFromStorage(appName, userId, sessionId);
if (session) {
// Cache the result
this.cache.set(cacheKey, session);
this.cacheExpiry.set(cacheKey, now + this.cacheTtl);
}
return session;
}
async appendEvent(session: Session, event: Event): Promise<Event> {
// Batch events for better performance
await this.batchEventAppend(session, event);
// Invalidate cache
const cacheKey = `${session.appName}:${session.userId}:${session.id}`;
this.cache.delete(cacheKey);
this.cacheExpiry.delete(cacheKey);
return event;
}
}
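The `batchEventAppend` call above is not defined in the example. One possible shape for such batching is a buffer that flushes when it reaches a size limit or when asked to. The sketch below is an assumption for illustration: `EventBatcher` is a hypothetical helper, and the synchronous `write` callback stands in for a real storage call.

```typescript
class EventBatcher<E> {
  private buffer: E[] = [];
  private readonly write: (batch: E[]) => void;
  private readonly maxBatchSize: number;

  constructor(write: (batch: E[]) => void, maxBatchSize: number = 50) {
    this.write = write;
    this.maxBatchSize = maxBatchSize;
  }

  // Buffer the event and flush automatically once the batch is full.
  add(event: E): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.maxBatchSize) {
      this.flush();
    }
  }

  // Write everything buffered so far as a single batch.
  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.write(batch);
  }

  get pending(): number {
    return this.buffer.length;
  }
}
```

Buffered events are not durable until they are flushed, so a caller would normally flush at the end of each invocation and on shutdown.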
Monitoring and Observability
Performance Metrics
Track key performance indicators for runtime health:
import { Registry, Histogram, Counter, Gauge } from 'prom-client';
class PerformanceMonitor {
private registry = new Registry();
private invocationDuration = new Histogram({
name: 'adk_invocation_duration_seconds',
help: 'Duration of invocations in seconds',
labelNames: ['agent_name', 'status'],
buckets: [0.1, 0.5, 1, 2, 5, 10, 30],
// Register only in this registry; by default prom-client also uses the global one,
// which makes a second PerformanceMonitor instance fail on duplicate metric names
registers: [this.registry]
});
private eventCount = new Counter({
name: 'adk_events_total',
help: 'Total number of events generated',
labelNames: ['event_type', 'agent_name'],
registers: [this.registry]
});
private activeInvocations = new Gauge({
name: 'adk_active_invocations',
help: 'Number of currently active invocations',
registers: [this.registry]
});
startInvocation(agentName: string): (status?: 'success' | 'error') => void {
const startTime = Date.now();
this.activeInvocations.inc();
// The returned callback records the outcome; it defaults to 'success'
return (status: 'success' | 'error' = 'success') => {
const duration = (Date.now() - startTime) / 1000;
this.invocationDuration.observe({ agent_name: agentName, status }, duration);
this.activeInvocations.dec();
};
}
recordEvent(eventType: string, agentName: string): void {
this.eventCount.inc({ event_type: eventType, agent_name: agentName });
}
}
// Integration with Runner
class MonitoredRunner extends Runner {
private monitor = new PerformanceMonitor();
async *runAsync(params: any): AsyncGenerator<Event> {
const endInvocation = this.monitor.startInvocation(this.agent.name);
let status: 'success' | 'error' = 'success';
try {
for await (const event of super.runAsync(params)) {
this.monitor.recordEvent(
event.content ? 'content' : 'action',
event.author
);
yield event;
}
} catch (error) {
// Record failures under their own label instead of counting them as successes
status = 'error';
throw error;
} finally {
endInvocation(status);
}
}
}
Telemetry Integration
Built-in OpenTelemetry support for distributed tracing:
import { tracer } from '@iqai/adk';
import { SpanStatusCode } from '@opentelemetry/api';
class TracedAgent extends BaseAgent {
protected async *runAsyncImpl(ctx: InvocationContext): AsyncGenerator<Event> {
const span = tracer.startSpan(`agent.${this.name}.execute`, {
attributes: {
'agent.name': this.name,
'invocation.id': ctx.invocationId,
'user.id': ctx.session.userId
}
});
try {
span.addEvent('agent.started');
// Your agent logic with span events
span.addEvent('processing.started');
const result = await this.processRequest(ctx);
span.addEvent('processing.completed', {
'result.length': result.length
});
yield new Event({
invocationId: ctx.invocationId,
author: this.name,
content: { parts: [{ text: result }] }
});
span.setStatus({ code: SpanStatusCode.OK });
} catch (error) {
span.recordException(error as Error);
span.setStatus({
code: SpanStatusCode.ERROR,
message: error instanceof Error ? error.message : 'Unknown error'
});
throw error;
} finally {
span.end();
}
}
}
Error Tracking
Count errors per agent and error type, and alert when a threshold is reached within a time window:
class ErrorTracker {
private errorCounts = new Map<string, { count: number; windowStart: number }>();
private errorThreshold = 10;
private timeWindow = 5 * 60 * 1000; // 5 minutes
recordError(error: Error, context: { agentName: string; invocationId: string }): void {
const errorKey = `${context.agentName}:${error.constructor.name}`;
const now = Date.now();
// Start a fresh window if none exists or the current one has expired.
// (One timer per error would reset counts too early and pile up timers.)
let entry = this.errorCounts.get(errorKey);
if (!entry || now - entry.windowStart >= this.timeWindow) {
entry = { count: 0, windowStart: now };
this.errorCounts.set(errorKey, entry);
}
entry.count++;
// Alert once the threshold is reached within the window
if (entry.count >= this.errorThreshold) {
this.sendAlert(errorKey, entry.count, error, context);
}
}
private sendAlert(errorKey: string, count: number, error: Error, context: any): void {
console.error(`Alert: High error rate for ${errorKey}: ${count} errors`, {
error: error.message,
stack: error.stack,
context
});
// Send to monitoring service
// this.monitoringService.sendAlert(...)
}
}
// Integration with agents
const errorTracker = new ErrorTracker();
class MonitoredAgent extends BaseAgent {
protected async *runAsyncImpl(ctx: InvocationContext): AsyncGenerator<Event> {
try {
// Your agent logic
yield new Event({
invocationId: ctx.invocationId,
author: this.name,
content: { parts: [{ text: "Success!" }] }
});
} catch (error) {
errorTracker.recordError(error as Error, {
agentName: this.name,
invocationId: ctx.invocationId
});
throw error;
}
}
}
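Counting errors "within a time window" can also be done with a sliding window, which keeps a timestamp per error and discards those older than the window on each update. The sketch below illustrates the idea; `SlidingWindowCounter` is a hypothetical helper, and the injectable clock exists only to make the behavior easy to test.

```typescript
class SlidingWindowCounter {
  private timestamps = new Map<string, number[]>();
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(windowMs: number, now: () => number = () => Date.now()) {
    this.windowMs = windowMs;
    this.now = now;
  }

  // Record one occurrence and return how many fall inside the current window.
  record(key: string): number {
    const current = this.now();
    const cutoff = current - this.windowMs;
    const recent = (this.timestamps.get(key) ?? []).filter((t) => t > cutoff);
    recent.push(current);
    this.timestamps.set(key, recent);
    return recent.length;
  }
}
```

Compared with a fixed window, this never resets a burst that straddles a window boundary, at the cost of storing one timestamp per recent error.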
Scalability Considerations
Horizontal Scaling
Design for stateless execution across multiple instances:
// Stateless agent design
class StatelessAgent extends BaseAgent {
// Avoid instance variables that hold state
// Use context and services for all state
protected async *runAsyncImpl(ctx: InvocationContext): AsyncGenerator<Event> {
// Get state from session, not instance variables
const userPreferences = ctx.session.state?.get('user_preferences') || {};
// Process using only the inputs passed in, never instance state
const result = await this.processStateless(ctx.userContent, userPreferences);
// Update state through the event's stateDelta
yield new Event({
invocationId: ctx.invocationId,
author: this.name,
content: { parts: [{ text: result }] },
actions: new EventActions({
stateDelta: {
'last_processed': new Date().toISOString()
}
})
});
}
private async processStateless(input: any, preferences: any): Promise<string> {
// Pure function - no side effects
return `Processed: ${JSON.stringify(input)} with preferences: ${JSON.stringify(preferences)}`;
}
}
Load Balancing
Distribute work across multiple Runner instances:
// Load balancer for Runner instances
class LoadBalancedRunnerPool {
private runners: Runner[] = [];
private currentIndex = 0;
constructor(runnerConfigs: any[]) {
this.runners = runnerConfigs.map(config => new Runner(config));
}
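async *runAsync(params: any): AsyncGenerator<Event> {
if (this.runners.length === 0) {
throw new Error("LoadBalancedRunnerPool has no runners");
}
// Round-robin load balancing; wrap the index so it stays within bounds
const runner = this.runners[this.currentIndex];
this.currentIndex = (this.currentIndex + 1) % this.runners.length;
for await (const event of runner.runAsync(params)) {
yield event;
}
}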
}
// Usage
const pool = new LoadBalancedRunnerPool([
{ appName: "app", agent: agent1, sessionService: service1 },
{ appName: "app", agent: agent2, sessionService: service2 },
{ appName: "app", agent: agent3, sessionService: service3 }
]);
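Round-robin assumes every invocation costs about the same. When durations vary widely, picking the instance with the fewest in-flight invocations may spread load more evenly. The following is a sketch of that alternative; `LeastBusySelector` is a hypothetical helper and works with any worker type, not only `Runner`.

```typescript
class LeastBusySelector<T> {
  private active = new Map<T, number>();

  constructor(workers: T[]) {
    if (workers.length === 0) {
      throw new Error("at least one worker is required");
    }
    for (const worker of workers) {
      this.active.set(worker, 0);
    }
  }

  // Pick the worker with the fewest in-flight tasks (first one wins ties).
  acquire(): T {
    let best: T | undefined;
    let bestCount = Infinity;
    for (const [worker, count] of this.active) {
      if (count < bestCount) {
        best = worker;
        bestCount = count;
      }
    }
    const chosen = best as T;
    this.active.set(chosen, bestCount + 1);
    return chosen;
  }

  // Mark one task on this worker as finished.
  release(worker: T): void {
    const count = this.active.get(worker);
    if (count === undefined) return; // Unknown worker: ignore.
    this.active.set(worker, Math.max(0, count - 1));
  }
}
```

A pool would call `acquire()` before starting an invocation and `release()` in a `finally` block once the event stream ends.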
Caching Strategies
Implement intelligent caching for better performance:
class CachingStrategy {
private cache = new Map<string, { data: any; expiry: number; hits: number }>();
private maxSize = 10000;
private defaultTtl = 5 * 60 * 1000; // 5 minutes
async get<T>(key: string, factory: () => Promise<T>, ttl?: number): Promise<T> {
const cached = this.cache.get(key);
const now = Date.now();
if (cached && cached.expiry > now) {
cached.hits++;
return cached.data;
}
// Generate new data
const data = await factory();
// Store in cache
this.cache.set(key, {
data,
// Use ?? so an explicit ttl of 0 is respected instead of falling back to the default
expiry: now + (ttl ?? this.defaultTtl),
hits: 1
});
// Cleanup if needed
this.cleanup();
return data;
}
private cleanup(): void {
if (this.cache.size <= this.maxSize) return;
// Remove expired entries first
const now = Date.now();
for (const [key, value] of this.cache.entries()) {
if (value.expiry <= now) {
this.cache.delete(key);
}
}
// Remove least used entries if still over limit
if (this.cache.size > this.maxSize) {
const entries = Array.from(this.cache.entries())
.sort((a, b) => a[1].hits - b[1].hits)
.slice(0, Math.floor(this.maxSize * 0.2));
for (const [key] of entries) {
this.cache.delete(key);
}
}
}
}
// Usage in agents
const cache = new CachingStrategy();
class CachedAgent extends BaseAgent {
protected async *runAsyncImpl(ctx: InvocationContext): AsyncGenerator<Event> {
const cacheKey = `agent_response:${this.name}:${JSON.stringify(ctx.userContent)}`;
const response = await cache.get(cacheKey, async () => {
return await this.expensiveOperation(ctx.userContent);
});
yield new Event({
invocationId: ctx.invocationId,
author: this.name,
content: { parts: [{ text: response }] }
});
}
}
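A cache like the one above still runs the factory once per caller when several requests miss the same key at the same time. Sharing the in-flight promise avoids that duplicate work. The sketch below shows the idea; `SingleFlight` is a hypothetical helper that could sit in front of the cache's factory call.

```typescript
class SingleFlight<T> {
  private inFlight = new Map<string, Promise<T>>();

  // Concurrent calls with the same key share one factory invocation.
  run(key: string, factory: () => Promise<T>): Promise<T> {
    const existing = this.inFlight.get(key);
    if (existing) return existing;

    const promise = factory().finally(() => {
      // Clear the entry once settled so later calls start a fresh computation.
      this.inFlight.delete(key);
    });
    this.inFlight.set(key, promise);
    return promise;
  }

  get pending(): number {
    return this.inFlight.size;
  }
}
```

Failures are shared as well: if the factory rejects, every waiting caller sees the same rejection, and the next call retries.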
Best Practices
Performance Guidelines
- Event Batching: Combine related state changes into single events
- Lazy Loading: Load services and resources only when needed
- Connection Reuse: Pool expensive connections (LLM, database)
- Caching: Cache expensive computations with appropriate TTL
- Monitoring: Track key metrics for performance insights
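The lazy loading guideline can be sketched as a memoized initializer that defers construction until first use. `lazy` is a hypothetical helper, and the object it builds here is a placeholder for any expensive service or client.

```typescript
// Wrap an initializer so it runs at most once, on first access.
function lazy<T>(init: () => T): () => T {
  let value: T | undefined;
  let initialized = false;
  return () => {
    if (!initialized) {
      value = init();
      initialized = true;
    }
    return value as T;
  };
}

let constructions = 0;
const getExpensiveService = lazy(() => {
  constructions++;
  return { name: "expensive-service" }; // Placeholder for a costly setup step.
});

// Nothing has been built yet; the first call constructs, the second reuses.
getExpensiveService();
getExpensiveService();
console.log(constructions);
```

If the initializer throws, nothing is cached and the next call tries again.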
Resource Optimization
- Memory Management: Use WeakMap/WeakSet for object-keyed data so entries are collected with their keys, and bound the size of other caches
- CPU Efficiency: Avoid blocking the event loop with long synchronous work
- I/O Optimization: Batch database operations where possible
- Network Efficiency: Minimize external service calls
Monitoring Strategy
- Key Metrics: Track invocation duration, event counts, error rates
- Distributed Tracing: Use OpenTelemetry for request correlation
- Alerting: Set up alerts for performance degradation
- Profiling: Profile regularly under production-like load
Performance Testing
Regular performance testing with realistic workloads is essential for maintaining optimal runtime performance. Consider load testing with concurrent invocations and monitoring resource utilization patterns.
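A load test with concurrent invocations can start as a small harness that runs a task with a fixed concurrency and reports latency percentiles. The sketch below is one way to do it, not a replacement for a dedicated load-testing tool; `runLoadTest` and `percentile` are hypothetical helpers, and `task` stands in for one full invocation.

```typescript
// Nearest-rank percentile over a copy of the samples.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Run `total` tasks with at most `concurrency` in flight; return latencies in ms.
async function runLoadTest(
  task: () => Promise<void>,
  total: number,
  concurrency: number
): Promise<number[]> {
  const latencies: number[] = [];
  let started = 0;

  async function worker(): Promise<void> {
    // The check and increment run without an await in between, so no task is started twice.
    while (started < total) {
      started++;
      const begin = Date.now();
      await task();
      latencies.push(Date.now() - begin);
    }
  }

  const workerCount = Math.max(0, Math.min(concurrency, total));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return latencies;
}
```

A run might look like `const latencies = await runLoadTest(invokeOnce, 200, 20)` followed by `percentile(latencies, 95)`, where `invokeOnce` is your own function that drives one invocation to completion. A rejected task aborts the run, which is usually what you want while tuning.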