# Model Fallback Plugin
Automatically retry and fall back to alternative models on rate limits
When your primary model hits a rate limit (HTTP 429), the Model Fallback Plugin automatically retries it, then falls back to alternative models if retries are exhausted. Your agent stays running even when one provider is overloaded.
## Quick start

```typescript
import { AgentBuilder } from "@iqai/adk";

const { runner } = await AgentBuilder.create("resilient_agent")
  .withModel("gpt-4o")
  .withFallbackModels("gpt-4o-mini", "gemini-2.0-flash")
  .build();

// If gpt-4o hits rate limits:
// 1. Retries gpt-4o up to 3 times (1s delay each)
// 2. Falls back to gpt-4o-mini (same retry logic)
// 3. Falls back to gemini-2.0-flash as last resort
```

## How it works
```
Request → Primary Model → 429 Error
        ↓
Retry same model (up to maxRetries)
with retryDelayMs delay between attempts
        ↓
Still failing? → Try fallback[0]
        ↓
Still failing? → Try fallback[1]
        ↓
All exhausted → Error propagates
```

Each fallback model gets the full retry cycle before moving to the next. After a request completes, the next request starts fresh with the primary model (no "sticky" fallback).
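The flow above can be sketched as a plain retry loop. This is an illustrative standalone sketch, not the plugin's source — `callModel`, the `CallFn` type, and the `status: 429` error shape are assumptions standing in for the real LLM call:

```typescript
// Illustrative sketch of the fallback flow — not the plugin's actual source.
type CallFn = (model: string) => Promise<string>;

async function withFallback(
  models: string[], // primary first, then fallbacks in priority order
  call: CallFn,
  maxRetries = 3,
  retryDelayMs = 1000,
): Promise<string> {
  let lastError: unknown;
  for (const model of models) {
    for (let attempt = 0; attempt < maxRetries; attempt++) {
      try {
        return await call(model); // success: no sticky fallback state is kept
      } catch (err) {
        // only rate-limit errors trigger retry/fallback; everything else propagates
        if ((err as { status?: number }).status !== 429) throw err;
        lastError = err;
        await new Promise((r) => setTimeout(r, retryDelayMs));
      }
    }
    // retries exhausted for this model → move on to the next fallback
  }
  throw lastError; // all models exhausted: error propagates to the caller
}
```

Note how a non-429 error is rethrown immediately, matching the plugin's behavior of letting auth and network errors propagate.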
## Usage
The simplest approach — `withFallbackModels()` creates and registers the plugin for you:
```typescript
import { AgentBuilder } from "@iqai/adk";

const { runner } = await AgentBuilder.create("resilient_agent")
  .withModel("gpt-4o")
  .withFallbackModels("gpt-4o-mini", "gemini-2.0-flash")
  .build();
```

You can also register the plugin manually on an `LlmAgent`:

```typescript
import { LlmAgent, ModelFallbackPlugin } from "@iqai/adk";

const agent = new LlmAgent({
  name: "resilient_agent",
  description: "Agent with automatic model fallback",
  model: "gpt-4o",
  plugins: [new ModelFallbackPlugin(["gpt-4o-mini", "gemini-2.0-flash"])],
});
```

For custom retry counts or delays:
```typescript
import { AgentBuilder, ModelFallbackPlugin } from "@iqai/adk";

const { runner } = await AgentBuilder.create("custom_fallback_agent")
  .withModel("gpt-4o")
  .withPlugins(
    new ModelFallbackPlugin(
      ["gpt-4o-mini", "gemini-2.0-flash"], // fallback models
      5, // maxRetries per model (default: 3)
      2000, // retryDelayMs (default: 1000)
    ),
  )
  .build();
```

## Configuration
| Option | Type | Default | Description |
|---|---|---|---|
| `fallbackModels` | `string[]` | Required | Ordered list of fallback model names |
| `maxRetries` | `number` | 3 | Retry attempts per model before falling back |
| `retryDelayMs` | `number` | 1000 | Delay (ms) between retry attempts |
Order fallback models by priority — most capable first, cheapest last. Mix providers (e.g., OpenAI primary, Google fallback) to avoid correlated rate limits.
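These settings also bound how long a fully exhausted request can take. As a rough sketch — assuming one `retryDelayMs` pause after every failed attempt, which is a simplification rather than the plugin's exact timing — the worst-case time spent sleeping is:

```typescript
// Rough upper bound on sleep time before all models are exhausted.
// Assumes one retryDelayMs pause after every failed attempt — a
// simplification, not the plugin's exact timing.
function worstCaseDelayMs(
  numFallbacks: number,
  maxRetries = 3,
  retryDelayMs = 1000,
): number {
  const modelsTried = 1 + numFallbacks; // primary + fallbacks
  return modelsTried * maxRetries * retryDelayMs;
}

// Defaults with two fallbacks: 3 models × 3 attempts × 1000 ms
console.log(worstCaseDelayMs(2)); // 9000
```

Raising `maxRetries` or `retryDelayMs` multiplies this bound, so keep caller timeouts in mind when tuning.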
## What gets detected
The plugin uses `RateLimitError.isRateLimitError()` to catch rate limits across all providers:
- Status code: HTTP 429
- Error types: `rate_limit_error`, `RateLimitError`
- Message patterns: "rate limit", "too many requests", "resource exhausted", "quota exceeded"
Only rate limit errors trigger fallback. Network errors, auth errors, and invalid requests propagate normally — this is by design.
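The checks above can be approximated with a predicate like the following — a sketch of the listed heuristics, not the library's actual `isRateLimitError()` implementation:

```typescript
// Approximation of the rate-limit checks listed above — not the actual
// implementation of RateLimitError.isRateLimitError().
const RATE_LIMIT_PATTERNS = [
  "rate limit",
  "too many requests",
  "resource exhausted",
  "quota exceeded",
];

function looksLikeRateLimit(error: unknown): boolean {
  const e = error as { status?: number; name?: string; message?: string };
  if (e.status === 429) return true; // HTTP 429 status code
  if (e.name === "RateLimitError" || e.name === "rate_limit_error") return true;
  const msg = (e.message ?? "").toLowerCase();
  return RATE_LIMIT_PATTERNS.some((p) => msg.includes(p)); // message patterns
}
```

Anything that fails all three checks — an auth failure, a network timeout, a malformed request — is left alone, which is why those errors propagate normally.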
## Using RateLimitError directly
```typescript
import { RateLimitError } from "@iqai/adk";

try {
  const response = await runner.ask("Hello!");
} catch (error) {
  if (RateLimitError.isRateLimitError(error)) {
    console.log("Rate limited, please wait...");
  }
}
```

## When to use it
- Rate-limit-prone APIs — models with strict per-minute or per-day quotas
- High-traffic production — many users sharing the same API key
- Multi-provider strategy — automatic cross-provider failover
- Cost optimization — fall back from expensive to cheaper models under load
## Troubleshooting
| Issue | Fix |
|---|---|
| Fallback not triggering | Verify it's a 429 error. Other errors (auth, network) aren't handled. |
| All models exhausted | Increase `maxRetries` or add more fallback models. |
| Non-429 errors not caught | By design — only rate limits trigger fallback. |
## Good to know
- Per-request state — each request starts fresh with the primary model
- No sticky fallback — the plugin doesn't "remember" rate-limited models
- Non-rate-limit errors — propagate normally, not handled by this plugin