Testing Agents
Current approaches and future automated testing methods
Agent testing involves validating behavior across multiple dimensions, from tool usage to response quality. Currently, @iqai/adk
supports manual testing approaches, with an automated evaluation framework coming soon.
Current Testing Approaches
Manual Testing with Examples
The most immediate way to test agents is through the provided examples and custom test scripts:
```typescript
import { LlmAgent, Runner, InMemorySessionService, type Event } from '@iqai/adk';

async function testAgent() {
  const agent = new LlmAgent({
    name: "test_agent",
    model: "gemini-2.5-flash",
    description: "Agent for testing",
    instruction: "You are a helpful assistant",
  });

  const sessionService = new InMemorySessionService();
  const session = await sessionService.createSession("test_app", "test_user");

  const runner = new Runner({
    appName: "test_app",
    agent,
    sessionService,
  });

  // Test specific scenarios
  await testScenario(runner, session.id, "What is 2 + 2?");
  await testScenario(runner, session.id, "Help me plan a meeting");
}

async function testScenario(runner: Runner, sessionId: string, query: string) {
  console.log(`Testing: ${query}`);

  const events: Event[] = [];
  for await (const event of runner.runAsync({
    userId: "test_user",
    sessionId,
    newMessage: { parts: [{ text: query }] },
  })) {
    events.push(event);

    // Log only complete (non-partial) responses
    if (event.content?.parts && !event.partial) {
      console.log(`Response: ${event.content.parts.map(p => p.text ?? '').join('')}`);
    }
  }

  // Manual validation
  validateResponse(events, query);
}

function validateResponse(events: Event[], query: string) {
  // Custom validation logic
  const finalEvents = events.filter(e => e.isFinalResponse());
  console.log(`Generated ${finalEvents.length} final responses`);

  // Check for specific criteria
  if (query.includes("plan")) {
    // Verify planning behavior
  }
  if (query.includes("calculate")) {
    // Verify calculation accuracy
  }
}
```
Session-Based Testing
Use session services to test multi-turn conversations:
```typescript
// Pass in the runner together with the session service it was constructed
// with, so both operate on the same session store.
async function testConversationFlow(runner: Runner, sessionService: InMemorySessionService) {
  const session = await sessionService.createSession("test_app", "user_123");

  const testTurns = [
    "Hello, I need help with my project",
    "It's a web development project using React",
    "What are the best practices for state management?",
    "Can you provide specific examples?"
  ];

  for (const turn of testTurns) {
    console.log(`\n--- Turn: ${turn} ---`);
    await testScenario(runner, session.id, turn);

    // Analyze session state after each turn
    const updatedSession = await sessionService.getSession("test_app", "user_123", session.id);
    console.log(`Session has ${updatedSession?.events?.length || 0} events`);
  }
}
```
Tool Usage Validation
Test agent tool interactions by monitoring function calls:
```typescript
function validateToolUsage(events: Event[], expectedTools: string[]) {
  const functionCalls = events
    .flatMap(e => e.getFunctionCalls())
    .map(fc => ({ name: fc.name, args: fc.args }));

  console.log('Function calls made:', functionCalls);

  // Validate that each expected tool was actually invoked
  const usedTools = functionCalls.map(fc => fc.name);
  for (const tool of expectedTools) {
    if (usedTools.includes(tool)) {
      console.log(`✅ Expected tool '${tool}' was used`);
    } else {
      console.log(`❌ Expected tool '${tool}' was NOT used`);
    }
  }
}
```
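For example, reusing the `runner` and `session` from the first example, you can collect events for a single query and check them against the tools you expect. The tool name `search_tool` is a placeholder for whatever tools your agent actually registers:

```typescript
// Collect all events for one query, then check tool usage.
// 'search_tool' is a placeholder -- substitute your agent's real tool names.
const events: Event[] = [];
for await (const event of runner.runAsync({
  userId: "test_user",
  sessionId: session.id,
  newMessage: { parts: [{ text: "What is the weather in Paris?" }] },
})) {
  events.push(event);
}
validateToolUsage(events, ['search_tool']);
```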
Testing Strategies
Unit Testing Patterns
Test individual agent capabilities in isolation using a test runner such as Vitest or Jest. The `runSingleQuery`, `runAndCollectEvents`, `createTestSession`, and `runQuery` helpers are small wrappers you define yourself; a sketch of the first two follows the tests:
```typescript
describe('Agent Capabilities', () => {
  test('should handle basic calculations', async () => {
    const response = await runSingleQuery(agent, "What is 15 * 7?");
    expect(response).toContain('105');
  });

  test('should use search tool for information queries', async () => {
    const events = await runAndCollectEvents(agent, "What is the weather in Paris?");
    const toolCalls = events.flatMap(e => e.getFunctionCalls());
    expect(toolCalls.some(fc => fc.name === 'search_tool')).toBe(true);
  });

  test('should maintain context across turns', async () => {
    const session = await createTestSession();
    await runQuery(session, "My name is Alice");
    const response = await runQuery(session, "What is my name?");
    expect(response.toLowerCase()).toContain('alice');
  });
});
```
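These helpers are not part of @iqai/adk. A minimal sketch of `runSingleQuery` and `runAndCollectEvents`, assuming the same imports as the first example, might look like this:

```typescript
// Minimal helper sketches -- not part of @iqai/adk. Each call creates a
// fresh session; adapt to reuse sessions for multi-turn tests.
async function runAndCollectEvents(agent: LlmAgent, query: string): Promise<Event[]> {
  const sessionService = new InMemorySessionService();
  const session = await sessionService.createSession("test_app", "test_user");
  const runner = new Runner({ appName: "test_app", agent, sessionService });

  const events: Event[] = [];
  for await (const event of runner.runAsync({
    userId: "test_user",
    sessionId: session.id,
    newMessage: { parts: [{ text: query }] },
  })) {
    events.push(event);
  }
  return events;
}

async function runSingleQuery(agent: LlmAgent, query: string): Promise<string> {
  const events = await runAndCollectEvents(agent, query);
  // Concatenate the text of the final response events
  return events
    .filter(e => e.isFinalResponse())
    .flatMap(e => e.content?.parts ?? [])
    .map(p => p.text ?? '')
    .join('');
}
```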
Integration Testing
Test complete workflows and agent interactions:
```typescript
async function testWorkflowIntegration() {
  // Test an end-to-end booking workflow
  const workflow = [
    { query: "I need to book a flight", expectedTools: ['search_flights'] },
    { query: "From New York to London on March 15", expectedTools: ['book_flight'] },
    { query: "Send confirmation to my email", expectedTools: ['send_email'] }
  ];

  for (const step of workflow) {
    const events = await runAndCollectEvents(agent, step.query);
    validateToolUsage(events, step.expectedTools);
  }
}
```
Performance Testing
Monitor response times and resource usage:
```typescript
async function testPerformance() {
  const queries = [
    "Simple question",
    "Complex multi-step query with calculations",
    "Query requiring multiple tool calls"
  ];

  for (const query of queries) {
    const startTime = Date.now();
    await runSingleQuery(agent, query);
    const duration = Date.now() - startTime;

    console.log(`Query: "${query}" took ${duration}ms`);

    // Set performance expectations
    if (duration > 10000) {
      console.warn(`⚠️ Query took longer than expected: ${duration}ms`);
    }
  }
}
```
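Single measurements of an LLM call are noisy. A sketch of averaging over several runs per query (the run count of 3 is an arbitrary assumption; tune it for your setup) could look like:

```typescript
// Sketch: average latency over several runs to smooth out LLM variance.
// The default run count (3) is arbitrary -- adjust for your environment.
async function measureAverageLatency(agent: LlmAgent, query: string, runs = 3): Promise<number> {
  const durations: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    await runSingleQuery(agent, query);
    durations.push(Date.now() - start);
  }
  return durations.reduce((sum, d) => sum + d, 0) / durations.length;
}
```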
Coming Soon: Automated Evaluation
Future Framework
A comprehensive automated evaluation framework is under development and will provide structured testing capabilities.
Unit Testing with Test Files
Rapid evaluation during active development:
Planned Features:
- Individual JSON test files with expected behaviors
- Fast execution for quick feedback loops
- Integration with development workflows
- Automated regression detection
Test File Structure (Coming Soon):
```json
{
  "name": "basic_calculation_test",
  "agent_config": {
    "name": "calculator_agent",
    "tools": ["calculator_tool"]
  },
  "test_cases": [
    {
      "user_query": "What is 42 + 17?",
      "expected_tools": ["calculator_tool"],
      "expected_response_contains": ["59"],
      "max_response_time_ms": 5000
    }
  ]
}
```
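Until the framework ships, nothing stops you from driving files in this shape with the manual APIs shown earlier. A minimal sketch follows; the `TestFile` type simply mirrors the JSON above, `loadAgent` is a hypothetical factory you would supply (not an @iqai/adk API), and `runAndCollectEvents` and `validateToolUsage` are the helpers defined earlier:

```typescript
import { readFile } from 'node:fs/promises';

// Mirrors the planned test file shape shown above.
interface TestFile {
  name: string;
  agent_config: { name: string; tools: string[] };
  test_cases: {
    user_query: string;
    expected_tools: string[];
    expected_response_contains: string[];
    max_response_time_ms: number;
  }[];
}

// Hypothetical factory: build an agent from the "agent_config" block.
declare function loadAgent(config: TestFile['agent_config']): LlmAgent;

async function runTestFile(path: string) {
  const testFile: TestFile = JSON.parse(await readFile(path, 'utf-8'));
  const agent = loadAgent(testFile.agent_config);

  for (const testCase of testFile.test_cases) {
    const start = Date.now();
    const events = await runAndCollectEvents(agent, testCase.user_query);
    const duration = Date.now() - start;
    const response = events
      .filter(e => e.isFinalResponse())
      .flatMap(e => e.content?.parts ?? [])
      .map(p => p.text ?? '')
      .join('');

    validateToolUsage(events, testCase.expected_tools);
    for (const expected of testCase.expected_response_contains) {
      console.log(response.includes(expected)
        ? `✅ Response contains '${expected}'`
        : `❌ Response missing '${expected}'`);
    }
    if (duration > testCase.max_response_time_ms) {
      console.warn(`⚠️ Exceeded time budget: ${duration}ms`);
    }
  }
}
```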
Integration Testing with EvalSets
Comprehensive evaluation of complex agent behaviors:
Planned Features:
- Multi-turn conversation scenarios
- Complex workflow validation
- Batch processing of evaluation scenarios
- Statistical analysis of results
EvalSet Structure (Coming Soon):
```json
{
  "evalset_name": "customer_support_scenarios",
  "sessions": [
    {
      "initial_state": { "user_type": "premium" },
      "conversation_turns": [
        {
          "user_message": "I have an issue with my account",
          "expected_agent_behavior": {
            "should_escalate": false,
            "should_use_tools": ["account_lookup"],
            "response_tone": "helpful"
          }
        }
      ]
    }
  ]
}
```
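If you want type-safe handling of such files before the framework lands, the structure above maps straightforwardly onto TypeScript interfaces. These names are illustrative, not the framework's eventual types:

```typescript
// Illustrative types mirroring the EvalSet JSON above -- not the
// framework's eventual API, just a convenient shape for your own tooling.
interface ExpectedAgentBehavior {
  should_escalate: boolean;
  should_use_tools: string[];
  response_tone: string;
}

interface ConversationTurn {
  user_message: string;
  expected_agent_behavior: ExpectedAgentBehavior;
}

interface EvalSession {
  initial_state: Record<string, unknown>;
  conversation_turns: ConversationTurn[];
}

interface EvalSet {
  evalset_name: string;
  sessions: EvalSession[];
}
```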
Automated Evaluation Classes
AgentEvaluator (Coming Soon):
- Comprehensive agent performance assessment
- Configurable evaluation criteria
- Integration with CI/CD pipelines
- Performance trend analysis
TrajectoryEvaluator (Coming Soon):
- Tool usage pattern analysis
- Decision path optimization
- Efficiency measurement
- Strategy consistency validation
ResponseEvaluator (Coming Soon):
- Semantic similarity scoring
- Quality assessment metrics
- Reference comparison
- Automated scoring algorithms
Best Practices
Test Design
Representative Scenarios:
- Cover typical user journeys
- Include edge cases and error conditions
- Test various complexity levels
- Validate cross-functional scenarios
Clear Success Criteria:
- Define specific, measurable expectations
- Set appropriate performance thresholds
- Balance automation with human judgment
- Document evaluation rationale
Test Maintenance
Version Control:
- Track test scenarios and expected outcomes
- Maintain test data consistency
- Document evaluation methodology
- Regular review and updates
Continuous Improvement:
- Analyze test failures systematically
- Update tests based on production feedback
- Expand coverage as agents evolve
- Balance comprehensive testing with execution speed
Integration Guidelines
Development Workflow:
- Run quick tests during development
- Comprehensive tests before deployment
- Automated regression testing
- Performance monitoring in production
Team Collaboration:
- Share test scenarios across team members
- Document testing conventions
- Regular review of testing strategy
- Clear escalation procedures for test failures
Getting Started
Start with manual testing using the provided examples, then gradually build custom test scripts as your agent's functionality grows. This gives you immediate validation while preparing for the upcoming automated framework.