agirouter.pricing


Pricing that scales from idea to production

BUILD


Get started with fast inference, reliability, and no daily rate limits


Free Llama Vision 11B + FLUX.1 [schnell]

$1 credit for all other models

  • Fully pay-as-you-go, with credits that are easy to add
  • No daily rate limits, up to 6,000 requests and 2M tokens per minute for LLMs
  • Deploy on-demand dedicated endpoints (no rate limits)
  • Monitoring dashboard with 24-hr data
  • Email and in-app chat support

SCALE


Scale production traffic with reserved GPUs and advanced configuration


INCLUDES EVERYTHING IN BUILD PLUS

  • Up to 9,000 requests per minute and 5M tokens per minute for LLMs
  • Premium support
  • Support via private Slack channel
  • Monitoring dashboard with 30-day data (coming soon!)
  • Discounts on monthly reserved dedicated GPUs
  • Advanced dedicated endpoint configuration
  • 99% availability SLA for dedicated endpoints
  • HIPAA compliance

ENTERPRISE


Private deployments and model optimization at scale


INCLUDES EVERYTHING IN SCALE PLUS

  • Custom rate limits and no token limits
  • VPC deployment
  • Enterprise-grade security & compliance
  • Monitoring dashboard with 1-year data (coming soon!)
  • Continuous model optimization
  • Dedicated success representative
  • 99.9% availability SLA for dedicated endpoints, with geo-redundancy
  • Priority access to hardware including H100 & H200 GPUs
  • Custom regions
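
As a rough illustration of how the per-minute LLM limits listed for each plan compare, the sketch below picks the smallest plan whose limits cover an expected load. The function name and the idea of modelling Enterprise as "no fixed cap" are assumptions made for this example, not part of any Agirouter API.

```python
# Per-minute LLM limits as listed for each plan above.
# ENTERPRISE has custom rate limits and no token limits, so it is
# modelled here as having no fixed cap (None). That is an assumption.
PLAN_LIMITS = [
    ("BUILD", 6_000, 2_000_000),
    ("SCALE", 9_000, 5_000_000),
    ("ENTERPRISE", None, None),
]


def smallest_plan(requests_per_min: int, tokens_per_min: int) -> str:
    """Return the first plan whose listed limits cover the expected load."""
    for name, max_rpm, max_tpm in PLAN_LIMITS:
        rpm_ok = max_rpm is None or requests_per_min <= max_rpm
        tpm_ok = max_tpm is None or tokens_per_min <= max_tpm
        if rpm_ok and tpm_ok:
            return name
    return "ENTERPRISE"  # not reached while the last entry has no caps


print(smallest_plan(4_000, 1_500_000))  # BUILD
print(smallest_plan(7_000, 1_000_000))  # SCALE
```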

Inference pricing

Over 100 leading open-source Chat, Multimodal, Language, Image, Code, and Embedding models are available through the Agirouter Inference API. For these models you pay just for what you use.

Serverless Endpoints

Prices are per 1 million tokens. For Chat, Multimodal, Language, and Code models, both input and output tokens are counted; for Embedding models, only input tokens are counted. Image models are priced by image size and number of steps.

LLAMA 3.2, LLAMA 3.1, LLAMA 3 MODELS

MODEL SIZE | TYPE | LITE | TURBO | REFERENCE
Up to 3B | Text | $0.06
8B | Text | $0.10 | $0.18 | $0.20
11B | Vision | $0.18
70B | Text | $0.54 | $0.88 | $0.90
90B | Vision | $1.20
405B | Text | $3.50

For vision models, images are converted to 1,601 to 6,404 tokens depending on image size.
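
To make the per-token arithmetic concrete, here is a minimal sketch that estimates the cost of a single request from the prices above. The function names are hypothetical, and the sketch assumes the token counts are already known (for example, from a tokenizer or a usage report).

```python
# USD per 1M tokens, copied from the 8B and 70B text rows of the Llama table.
LLAMA_PRICE_PER_1M = {
    ("8B", "lite"): 0.10, ("8B", "turbo"): 0.18, ("8B", "reference"): 0.20,
    ("70B", "lite"): 0.54, ("70B", "turbo"): 0.88, ("70B", "reference"): 0.90,
}
VISION_11B_PRICE_PER_1M = 0.18  # 11B Vision row

# Each image is converted to between 1,601 and 6,404 tokens.
IMAGE_TOKENS_MIN, IMAGE_TOKENS_MAX = 1_601, 6_404


def token_cost(input_tokens: int, output_tokens: int, price_per_1m: float) -> float:
    """Cost in USD when both input and output tokens are billed."""
    return (input_tokens + output_tokens) * price_per_1m / 1_000_000


def vision_cost_range(num_images: int, text_tokens: int, price_per_1m: float):
    """Lower and upper cost bounds, since the image token count depends on size."""
    low = token_cost(num_images * IMAGE_TOKENS_MIN + text_tokens, 0, price_per_1m)
    high = token_cost(num_images * IMAGE_TOKENS_MAX + text_tokens, 0, price_per_1m)
    return low, high


# 1,500 input tokens and 500 output tokens on the 70B Turbo tier.
print(token_cost(1_500, 500, LLAMA_PRICE_PER_1M[("70B", "turbo")]))

# One image plus 500 text tokens on the 11B vision model.
print(vision_cost_range(1, 500, VISION_11B_PRICE_PER_1M))
```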

QWEN MODELS

MODEL | PRICE PER 1M TOKENS
Qwen 2 72B | $0.90
Qwen 2.5 7B | $0.30
Qwen 2.5 72B | $1.20
Qwen 2.5 Coder 32B | $0.80
Qwen QwQ 32B Preview | $1.20

ALL OTHER CHAT, LANGUAGE, CODE AND MODERATION MODELS

MODEL SIZE | PRICE PER 1M TOKENS
Up to 4B | $0.10
4.1B - 8B | $0.20
8.1B - 21B | $0.30
21.1B - 41B | $0.80
41.1B - 80B | $0.90
80.1B - 110B | $1.80
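
The size tiers above amount to a simple lookup by parameter count. A minimal sketch of that lookup follows; the function name is hypothetical, and sizes that fall in the small gaps between listed ranges (for example 4.05B) are assumed to belong to the next tier up.

```python
from bisect import bisect_left

# Upper bound of each size tier in billions of parameters, with the
# matching USD price per 1M tokens, copied from the table above.
TIER_UPPER_BOUNDS_B = [4, 8, 21, 41, 80, 110]
TIER_PRICES = [0.10, 0.20, 0.30, 0.80, 0.90, 1.80]


def price_per_1m_tokens(params_billion: float) -> float:
    """Look up the listed price tier for a model of the given size."""
    if params_billion <= 0:
        raise ValueError("model size must be positive")
    # bisect_left returns the first tier whose upper bound is >= the size.
    idx = bisect_left(TIER_UPPER_BOUNDS_B, params_billion)
    if idx == len(TIER_UPPER_BOUNDS_B):
        raise ValueError("no tier is listed above 110B")
    return TIER_PRICES[idx]


print(price_per_1m_tokens(7))    # 0.2
print(price_per_1m_tokens(34))   # 0.8
```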

MIXTURE-OF-EXPERTS

MODEL SIZE | PRICE PER 1M TOKENS
Up to 56B total parameters | $0.60
56.1B - 176B total parameters | $1.20
176.1B - 480B total parameters | $2.40

FLUX IMAGE MODELS

MODEL | PRICE PER MP | IMAGES PER $1 (1MP)
FLUX.1 [dev] | $0.025 | 40
FLUX.1 [schnell] | $0.0027 | 370
FLUX1.1 [pro] | $0.04 | 25
FLUX.1 [pro] | $0.05 | 20
FLUX.1 Canny [dev] | $0.025 | 40
FLUX.1 Depth [dev] | $0.025 | 40

For all FLUX models except the pro models, prices are based on the default number of steps and scale linearly with additional steps.
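
The per-megapixel pricing and step scaling can be sketched as follows. Several things here are assumptions rather than facts from this page: the default step count is not stated, so the caller must supply it; 1 MP is taken to mean 1,000,000 pixels; and requests with fewer steps than the default are assumed to be billed at the base price.

```python
# USD per megapixel at the default step count, copied from the FLUX table.
FLUX_PRICE_PER_MP = {
    "FLUX.1 [dev]": 0.025,
    "FLUX.1 [schnell]": 0.0027,
    "FLUX1.1 [pro]": 0.04,
    "FLUX.1 [pro]": 0.05,
    "FLUX.1 Canny [dev]": 0.025,
    "FLUX.1 Depth [dev]": 0.025,
}


def flux_image_cost(model: str, width: int, height: int,
                    steps: int, default_steps: int) -> float:
    """Estimate the USD cost of one image.

    default_steps is a caller-supplied assumption; the page does not
    state the default step count for any model.
    """
    megapixels = width * height / 1_000_000  # assumes 1 MP = 1,000,000 pixels
    base = FLUX_PRICE_PER_MP[model] * megapixels
    if "[pro]" in model:
        return base  # pro models are excluded from step scaling
    # Assumption: only steps beyond the default raise the price.
    return base * max(steps, default_steps) / default_steps


# Doubling the steps on a 1 MP FLUX.1 [dev] image doubles the estimate.
print(flux_image_cost("FLUX.1 [dev]", 1000, 1000, steps=56, default_steps=28))
```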

STABILITY IMAGE MODELS

IMAGE SIZE | 25 STEPS | 50 STEPS | 75 STEPS | 100 STEPS
512x512 | $0.001 | $0.002 | $0.0035 | $0.005
1024x1024 | $0.01 | $0.02 | $0.035 | $0.05

RERANK MODELS

MODEL SIZE | PRICE PER 1M TOKENS
8B | $0.10

Dedicated endpoints

When hosting your own model, you pay per minute for the GPU endpoints, whether it is a model you fine-tuned using Agirouter Fine-tuning or any other model you choose to host. You can start or stop your endpoint at any time through the web-based Playground.

YOUR FINE-TUNED MODEL

HARDWARE TYPE | PRICE PER MINUTE HOSTED
1x RTX-6000 48GB | $0.034
1x L40 48GB | $0.034
1x L40S 48GB | $0.048
1x A100 PCIe 80GB | $0.050
1x A100 SXM 40GB | $0.050
1x A100 SXM 80GB | $0.054
1x H100 80GB | $0.098
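
Because dedicated endpoints are billed per minute while running, the cost of a session is the per-minute price multiplied by its uptime. A minimal sketch, with a hypothetical function name and the assumption that a stopped endpoint accrues no charges:

```python
# USD per minute hosted, copied from the hardware table above.
PRICE_PER_MINUTE = {
    "1x RTX-6000 48GB": 0.034,
    "1x L40 48GB": 0.034,
    "1x L40S 48GB": 0.048,
    "1x A100 PCIe 80GB": 0.050,
    "1x A100 SXM 40GB": 0.050,
    "1x A100 SXM 80GB": 0.054,
    "1x H100 80GB": 0.098,
}


def endpoint_cost(hardware: str, minutes_running: float) -> float:
    """Cost in USD for the minutes the endpoint was running."""
    return PRICE_PER_MINUTE[hardware] * minutes_running


# An H100 endpoint that runs for an 8-hour working day.
print(endpoint_cost("1x H100 80GB", 8 * 60))

# The same endpoint left running for a 30-day month.
print(endpoint_cost("1x H100 80GB", 30 * 24 * 60))
```

Stopping the endpoint outside working hours is the main lever here: under these assumptions, the 8-hour day costs a third of a full 24 hours of uptime.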

Interested in a dedicated endpoint for your own model?


Fine-tuning pricing

Pricing for fine-tuning is based on model size, dataset size, and the number of epochs.


Agirouter GPU Clusters Pricing

Agirouter Compute provides private, state-of-the-art clusters with H100, H200, and A100 GPUs, connected over fast 200 Gbps non-blocking Ethernet or up to 3.2 Tbps InfiniBand networks.

HARDWARE TYPE AVAILABLE | NETWORKING | PRICING
A100 PCIe 80GB | 200 Gbps non-blocking Ethernet | Starting at $1.30/hr
A100 SXM 80GB | 200 Gbps non-blocking Ethernet or 1.6 Tbps InfiniBand configs available | Starting at $1.30/hr
H100 80GB | 3.2 Tbps InfiniBand | Starting at $1.75/hr
H200 141GB | 3.2 Tbps InfiniBand | Starting at $2.09/hr