
Model Fallback Plugin

Automatically retry and fall back to alternative models on rate limits

When your primary model hits a rate limit (HTTP 429), the Model Fallback Plugin automatically retries it, then falls back to alternative models if retries are exhausted. Your agent stays running even when one provider is overloaded.

Quick start

import { AgentBuilder } from "@iqai/adk";

const { runner } = await AgentBuilder.create("resilient_agent")
  .withModel("gpt-4o")
  .withFallbackModels("gpt-4o-mini", "gemini-2.0-flash")
  .build();

// If gpt-4o hits rate limits:
// 1. Retries gpt-4o up to 3 times (1s delay each)
// 2. Falls back to gpt-4o-mini (same retry logic)
// 3. Falls back to gemini-2.0-flash as last resort

How it works

Request → Primary Model → 429 Error
                    ↓
                    Retry same model (up to maxRetries)
                    with retryDelayMs delay between attempts
                    ↓
                    Still failing? → Try fallback[0]
                    ↓
                    Still failing? → Try fallback[1]
                    ↓
                    All exhausted → Error propagates

Each fallback model gets the full retry cycle before moving to the next. After a request completes, the next request starts fresh with the primary model (no "sticky" fallback).
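
The flow above can be sketched as a pair of nested loops. This is a minimal illustration, not the plugin's actual source: callWithFallback, FallbackOptions, and the isRateLimit callback are hypothetical names, and the sketch assumes each model gets one initial attempt plus maxRetries retries with a fixed delay before each retry.

```typescript
// Hypothetical sketch of retry-then-fallback; not the plugin's implementation.
type ModelCall<T> = (model: string) => Promise<T>;

interface FallbackOptions {
  maxRetries: number; // retries per model after the first attempt (assumption)
  retryDelayMs: number; // fixed delay before each retry
  isRateLimit: (error: unknown) => boolean; // rate-limit detector supplied by the caller
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function callWithFallback<T>(
  models: string[], // primary first, then fallbacks in priority order
  call: ModelCall<T>,
  options: FallbackOptions,
): Promise<T> {
  let lastError: unknown = new Error("no models configured");
  for (const model of models) {
    for (let attempt = 0; attempt <= options.maxRetries; attempt++) {
      try {
        return await call(model);
      } catch (error) {
        // Anything other than a rate limit propagates immediately.
        if (!options.isRateLimit(error)) throw error;
        lastError = error;
        // Wait before retrying the same model; no wait after its final attempt.
        if (attempt < options.maxRetries) await sleep(options.retryDelayMs);
      }
    }
    // Retries exhausted for this model: move on to the next one in the list.
  }
  // Every model stayed rate limited, so the last error propagates.
  throw lastError;
}
```

Because the function keeps no state between calls, every new request iterates from models[0] again, which mirrors the "no sticky fallback" behavior described above.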

Usage

The simplest approach — withFallbackModels() creates and registers the plugin for you:

import { AgentBuilder } from "@iqai/adk";

const { runner } = await AgentBuilder.create("resilient_agent")
  .withModel("gpt-4o")
  .withFallbackModels("gpt-4o-mini", "gemini-2.0-flash")
  .build();

With LlmAgent directly, create the plugin and pass it in plugins yourself:

import { LlmAgent, ModelFallbackPlugin } from "@iqai/adk";

const agent = new LlmAgent({
  name: "resilient_agent",
  description: "Agent with automatic model fallback",
  model: "gpt-4o",
  plugins: [new ModelFallbackPlugin(["gpt-4o-mini", "gemini-2.0-flash"])],
});

For custom retry counts or delays:

import { AgentBuilder, ModelFallbackPlugin } from "@iqai/adk";

const { runner } = await AgentBuilder.create("custom_fallback_agent")
  .withModel("gpt-4o")
  .withPlugins(
    new ModelFallbackPlugin(
      ["gpt-4o-mini", "gemini-2.0-flash"], // fallback models
      5,    // maxRetries per model (default: 3)
      2000, // retryDelayMs (default: 1000)
    ),
  )
  .build();

Configuration

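| Option         | Type     | Default  | Description                                  |
|----------------|----------|----------|----------------------------------------------|
| fallbackModels | string[] | Required | Ordered list of fallback model names         |
| maxRetries     | number   | 3        | Retry attempts per model before falling back |
| retryDelayMs   | number   | 1000     | Delay (ms) between retry attempts            |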

Order fallback models by priority — most capable first, cheapest last. Mix providers (e.g., OpenAI primary, Google fallback) to avoid correlated rate limits.
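
When tuning these options, it can help to estimate how much waiting a fully rate-limited request could accumulate. The helper below is a hypothetical back-of-the-envelope sketch, not part of the library. It assumes one delay before each retry and maxRetries retries per model, and it ignores the time the requests themselves take.

```typescript
// Hypothetical helper: worst-case time spent in retry delays for one request.
// Assumes each model (primary plus fallbacks) waits retryDelayMs before each
// of its maxRetries retries; request latency is not included.
function worstCaseRetryDelayMs(
  fallbackCount: number,
  maxRetries = 3,
  retryDelayMs = 1000,
): number {
  const modelCount = 1 + fallbackCount; // primary plus fallbacks
  return modelCount * maxRetries * retryDelayMs;
}

worstCaseRetryDelayMs(2); // 9000 (defaults, two fallbacks)
worstCaseRetryDelayMs(2, 5, 2000); // 30000 (custom settings from the example above)
```

Under these assumptions, raising maxRetries or retryDelayMs multiplies the worst-case wait across every model in the chain, so adding a fallback from a different provider may be a better lever than longer retries.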

What gets detected

The plugin uses RateLimitError.isRateLimitError() to catch rate limits across all providers:

  • Status code: HTTP 429
  • Error types: rate_limit_error, RateLimitError
  • Message patterns: "rate limit", "too many requests", "resource exhausted", "quota exceeded"

Only rate limit errors trigger fallback. Network errors, auth errors, and invalid requests propagate normally — this is by design.
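
To make the three signals concrete, here is a rough approximation of such a check. It is a sketch only: looksLikeRateLimit is a hypothetical function, the property names it inspects (status, type, name, message) are assumptions, and the real RateLimitError.isRateLimitError() may inspect errors differently.

```typescript
// Hypothetical approximation of a rate-limit check; not the library's code.
const RATE_LIMIT_TYPES = ["rate_limit_error", "RateLimitError"];
const RATE_LIMIT_PATTERNS = [
  "rate limit",
  "too many requests",
  "resource exhausted",
  "quota exceeded",
];

function looksLikeRateLimit(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  // Assumed error shape: provider SDKs differ in where they put these fields.
  const e = error as {
    status?: unknown;
    type?: unknown;
    name?: unknown;
    message?: unknown;
  };
  // 1. HTTP status code
  if (e.status === 429) return true;
  // 2. Error type or class name
  if (typeof e.type === "string" && RATE_LIMIT_TYPES.includes(e.type)) return true;
  if (typeof e.name === "string" && RATE_LIMIT_TYPES.includes(e.name)) return true;
  // 3. Case-insensitive message patterns
  if (typeof e.message === "string") {
    const message = e.message.toLowerCase();
    return RATE_LIMIT_PATTERNS.some((pattern) => message.includes(pattern));
  }
  return false;
}
```

In this sketch an authentication failure such as new Error("Unauthorized") matches none of the three checks, so it would propagate rather than trigger a fallback.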

Using RateLimitError directly

import { RateLimitError } from "@iqai/adk";

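try {
  // `runner` is the runner returned by AgentBuilder (see Quick start)
  const response = await runner.ask("Hello!");
  console.log(response);
} catch (error) {
  if (RateLimitError.isRateLimitError(error)) {
    console.log("Rate limited, please wait...");
  } else {
    throw error; // not a rate limit: rethrow instead of swallowing it
  }
}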

When to use it

  • Rate-limit-prone APIs — models with strict per-minute or per-day quotas
  • High-traffic production — many users sharing the same API key
  • Multi-provider strategy — automatic cross-provider failover
  • Cost optimization — fall back from expensive to cheaper models under load

Troubleshooting

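| Issue                     | Fix                                                                  |
|---------------------------|----------------------------------------------------------------------|
| Fallback not triggering   | Verify it's a 429 error. Other errors (auth, network) aren't handled. |
| All models exhausted      | Increase maxRetries or add more fallback models.                     |
| Non-429 errors not caught | By design: only rate limits trigger fallback.                        |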

Good to know

  • Per-request state — each request starts fresh with the primary model
  • No sticky fallback — the plugin doesn't "remember" rate-limited models
  • Non-rate-limit errors — propagate normally, not handled by this plugin
