# Model Fallback Plugin
Automatically retry and fall back to alternative models on rate limits
When your primary model hits a rate limit (HTTP 429), the Model Fallback Plugin automatically retries it, then falls back to alternative models if retries are exhausted. Your agent stays running even when one provider is overloaded.
## Quick start

```typescript
import { AgentBuilder } from "@iqai/adk";

const { runner } = await AgentBuilder.create("resilient_agent")
  .withModel("gpt-4o")
  .withFallbackModels("gpt-4o-mini", "gemini-2.0-flash")
  .build();

// If gpt-4o hits rate limits:
// 1. Retries gpt-4o up to 3 times (1s delay each)
// 2. Falls back to gpt-4o-mini (same retry logic)
// 3. Falls back to gemini-2.0-flash as last resort
```

## How it works
```
Request → Primary Model → 429 Error
        ↓
Retry same model (up to maxRetries)
with retryDelayMs delay between attempts
        ↓
Still failing? → Try fallback[0]
        ↓
Still failing? → Try fallback[1]
        ↓
All exhausted → Error propagates
```

Each fallback model gets the full retry cycle before moving to the next. After a request completes, the next request starts fresh with the primary model (no "sticky" fallback).
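The flow above can be sketched as a plain retry loop. This is an illustrative standalone sketch, not the plugin's source — `callModel`, the `CallFn` type, and the `status: 429` error shape are assumptions standing in for the real LLM call:

```typescript
// Illustrative sketch of the fallback flow — not the plugin's actual source.
type CallFn = (model: string) => Promise<string>;

async function withFallback(
  models: string[], // primary first, then fallbacks in priority order
  call: CallFn,
  maxRetries = 3,
  retryDelayMs = 1000,
): Promise<string> {
  let lastError: unknown;
  for (const model of models) {
    for (let attempt = 0; attempt < maxRetries; attempt++) {
      try {
        return await call(model); // success: no sticky fallback state is kept
      } catch (err) {
        // only rate-limit errors trigger retry/fallback; everything else propagates
        if ((err as { status?: number }).status !== 429) throw err;
        lastError = err;
        await new Promise((r) => setTimeout(r, retryDelayMs));
      }
    }
    // retries exhausted for this model → move on to the next fallback
  }
  throw lastError; // all models exhausted: error propagates to the caller
}
```

Note how a non-429 error is rethrown immediately, matching the plugin's behavior of letting auth and network errors propagate.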
## Usage
The simplest approach — `withFallbackModels()` creates and registers the plugin for you:
```typescript
import { AgentBuilder } from "@iqai/adk";

const { runner } = await AgentBuilder.create("resilient_agent")
  .withModel("gpt-4o")
  .withFallbackModels("gpt-4o-mini", "gemini-2.0-flash")
  .build();
```

You can also register the plugin manually on an `LlmAgent`:

```typescript
import { LlmAgent, ModelFallbackPlugin } from "@iqai/adk";

const agent = new LlmAgent({
  name: "resilient_agent",
  description: "Agent with automatic model fallback",
  model: "gpt-4o",
  plugins: [new ModelFallbackPlugin(["gpt-4o-mini", "gemini-2.0-flash"])],
});
```

For custom retry counts or delays:
```typescript
import { AgentBuilder, ModelFallbackPlugin } from "@iqai/adk";

const { runner } = await AgentBuilder.create("custom_fallback_agent")
  .withModel("gpt-4o")
  .withPlugins(
    new ModelFallbackPlugin(
      ["gpt-4o-mini", "gemini-2.0-flash"], // fallback models
      5, // maxRetries per model (default: 3)
      2000, // retryDelayMs (default: 1000)
    ),
  )
  .build();
```

## Configuration
| Option | Type | Default | Description |
|---|---|---|---|
| `fallbackModels` | `string[]` | Required | Ordered list of fallback model names |
| `maxRetries` | `number` | 3 | Retry attempts per model before falling back |
| `retryDelayMs` | `number` | 1000 | Delay (ms) between retry attempts |
Order fallback models by priority — most capable first, cheapest last. Mix providers (e.g., OpenAI primary, Google fallback) to avoid correlated rate limits.
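These settings also bound how long a fully exhausted request can take. As a rough sketch — assuming one `retryDelayMs` pause after every failed attempt, which is a simplification rather than the plugin's exact timing — the worst-case time spent sleeping is:

```typescript
// Rough upper bound on sleep time before all models are exhausted.
// Assumes one retryDelayMs pause after every failed attempt — a
// simplification, not the plugin's exact timing.
function worstCaseDelayMs(
  numFallbacks: number,
  maxRetries = 3,
  retryDelayMs = 1000,
): number {
  const modelsTried = 1 + numFallbacks; // primary + fallbacks
  return modelsTried * maxRetries * retryDelayMs;
}

// Defaults with two fallbacks: 3 models × 3 attempts × 1000 ms
console.log(worstCaseDelayMs(2)); // 9000
```

Raising `maxRetries` or `retryDelayMs` multiplies this bound, so keep caller timeouts in mind when tuning.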
## What gets detected
The plugin uses `RateLimitError.isRateLimitError()` to catch rate limits across all providers:
- Status code: HTTP 429
- Error types: `rate_limit_error`, `RateLimitError`
- Message patterns: "rate limit", "too many requests", "resource exhausted", "quota exceeded"
Only rate limit errors trigger fallback. Network errors, auth errors, and invalid requests propagate normally — this is by design.
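The checks above can be approximated with a predicate like the following — a sketch of the listed heuristics, not the library's actual `isRateLimitError()` implementation:

```typescript
// Approximation of the rate-limit checks listed above — not the actual
// implementation of RateLimitError.isRateLimitError().
const RATE_LIMIT_PATTERNS = [
  "rate limit",
  "too many requests",
  "resource exhausted",
  "quota exceeded",
];

function looksLikeRateLimit(error: unknown): boolean {
  const e = error as { status?: number; name?: string; message?: string };
  if (e.status === 429) return true; // HTTP 429 status code
  if (e.name === "RateLimitError" || e.name === "rate_limit_error") return true;
  const msg = (e.message ?? "").toLowerCase();
  return RATE_LIMIT_PATTERNS.some((p) => msg.includes(p)); // message patterns
}
```

Anything that fails all three checks — an auth failure, a network timeout, a malformed request — is left alone, which is why those errors propagate normally.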
## Using RateLimitError directly
```typescript
import { RateLimitError } from "@iqai/adk";

try {
  const response = await runner.ask("Hello!");
} catch (error) {
  if (RateLimitError.isRateLimitError(error)) {
    console.log("Rate limited, please wait...");
  }
}
```

## When to use it
- Rate-limit-prone APIs — models with strict per-minute or per-day quotas
- High-traffic production — many users sharing the same API key
- Multi-provider strategy — automatic cross-provider failover
- Cost optimization — fall back from expensive to cheaper models under load
## Troubleshooting
| Issue | Fix |
|---|---|
| Fallback not triggering | Verify it's a 429 error. Other errors (auth, network) aren't handled. |
| All models exhausted | Increase `maxRetries` or add more fallback models. |
| Non-429 errors not caught | By design — only rate limits trigger fallback. |
## Good to know
- Per-request state — each request starts fresh with the primary model
- No sticky fallback — the plugin doesn't "remember" rate-limited models
- Non-rate-limit errors — propagate normally, not handled by this plugin