agirouter.pricing

Pricing that scales from idea to production

BUILD

Get started with fast inference, reliability, and no daily rate limits

Free Llama Vision 11B + FLUX.1 [schnell]

$1 credit for all other models

Fully pay as you go, and easily add credits
No daily rate limits; up to 6,000 requests per minute and 2M tokens per minute for LLMs
Deploy on-demand dedicated endpoints (no rate limits)
Monitoring dashboard with 24-hr data
Email and in-app chat support

SCALE

Scale production traffic with reserved GPUs and advanced configuration

INCLUDES EVERYTHING IN BUILD PLUS

Up to 9,000 requests per minute and 5M tokens per minute for LLMs
Premium support
Support via private Slack channel
Monitoring dashboard with 30-day data (coming soon!)
Discounts on monthly reserved dedicated GPU
Advanced dedicated endpoint configuration
99% availability dedicated endpoints SLA
HIPAA compliance

ENTERPRISE

Private deployments and model optimization at scale

INCLUDES EVERYTHING IN SCALE PLUS

Custom rate limits and no token limits
VPC deployment
Enterprise-grade security & compliance
Monitoring dashboard with 1-year data (coming soon!)
Continuous model optimization
Dedicated success representative
99.9% dedicated endpoints SLA with geo redundancy
Priority access to hardware including H100 & H200 GPUs
Custom regions
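
As a rough illustration of how the serverless LLM limits listed above separate the plans, the sketch below picks the smallest plan whose published per-minute limits cover an expected load. The function name and the idea of choosing a plan from these two numbers alone are assumptions made for illustration; ENTERPRISE limits are custom, so it serves as the fallback.

```python
# Published per-minute limits for serverless LLMs, copied from the plan lists above.
# Each entry: (plan name, max requests per minute, max tokens per minute).
PLAN_LIMITS = [
    ("BUILD", 6_000, 2_000_000),
    ("SCALE", 9_000, 5_000_000),
]


def smallest_plan(requests_per_minute, tokens_per_minute):
    """Return the first plan whose limits cover the expected load (hypothetical helper)."""
    for name, max_rpm, max_tpm in PLAN_LIMITS:
        if requests_per_minute <= max_rpm and tokens_per_minute <= max_tpm:
            return name
    # ENTERPRISE advertises custom rate limits and no token limits.
    return "ENTERPRISE"


if __name__ == "__main__":
    print(smallest_plan(5_000, 3_000_000))
```

These limits apply to serverless LLM traffic only; on-demand dedicated endpoints are listed as having no rate limits.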

Inference pricing

Over 100 leading open-source Chat, Multimodal, Language, Image, Code, and Embedding models are available through the Agirouter Inference API. For these models, you pay only for what you use.

Serverless Endpoints

Prices are per 1 million tokens. Chat, Multimodal, Language, and Code models are billed on input and output tokens; Embedding models are billed on input tokens only. Image models are priced by image size and number of steps.

LLAMA 3.2, LLAMA 3.1, LLAMA 3 MODELS

MODEL SIZE	TYPE	LITE	TURBO	REFERENCE
Up to 3B	Text	$0.06
8B	Text	$0.10	$0.18	$0.20
11B	Vision		$0.18
70B	Text	$0.54	$0.88	$0.90
90B	Vision		$1.20
405B	Text		$3.50

For vision models, images are converted to 1,601 to 6,404 tokens depending on image size.
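
The per-token billing described above can be sketched as a small calculator. The prices come from the Turbo column of the Llama table; the function names are hypothetical, and the sketch assumes image tokens are billed at the same rate as ordinary input tokens, which the page implies but does not state outright.

```python
# USD per 1M tokens, copied from the Turbo column of the Llama table above.
TURBO_PRICE_PER_1M = {
    "8B": 0.18,
    "11B Vision": 0.18,
    "70B": 0.88,
    "90B Vision": 1.20,
    "405B": 3.50,
}

# Per the note above, one image is converted to between 1,601 and 6,404 tokens.
IMAGE_TOKENS_MIN = 1_601
IMAGE_TOKENS_MAX = 6_404


def token_cost(price_per_1m, input_tokens, output_tokens):
    """Cost in USD when input and output tokens are billed at the same rate."""
    return price_per_1m * (input_tokens + output_tokens) / 1_000_000


def vision_cost_range(price_per_1m, text_tokens, output_tokens, images=1):
    """Lower and upper cost bounds, since the image token count depends on image size.

    Assumption: image tokens are charged like any other input token.
    """
    low = token_cost(price_per_1m, text_tokens + images * IMAGE_TOKENS_MIN, output_tokens)
    high = token_cost(price_per_1m, text_tokens + images * IMAGE_TOKENS_MAX, output_tokens)
    return low, high


if __name__ == "__main__":
    # 600k input + 400k output tokens on 70B Turbo: about $0.88.
    print(token_cost(TURBO_PRICE_PER_1M["70B"], 600_000, 400_000))
    print(vision_cost_range(TURBO_PRICE_PER_1M["11B Vision"], 1_000, 500))
```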

Qwen Models

MODEL SIZE	PRICE PER 1M TOKENS
Qwen 2 72B	$0.90
Qwen 2.5 7B	$0.30
Qwen 2.5 72B	$1.20
Qwen 2.5 Coder 32B	$0.80
Qwen QwQ 32B Preview	$1.20

ALL OTHER CHAT, LANGUAGE, CODE AND MODERATION MODELS

MODEL SIZE	PRICE PER 1M TOKENS
Up to 4B	$0.10
4.1B - 8B	$0.20
8.1B - 21B	$0.30
21.1B - 41B	$0.80
41.1B - 80B	$0.90
80.1B - 110B	$1.80

Mixture-of-experts

MODEL SIZE	PRICE PER 1M TOKENS
Up to 56B total parameters	$0.60
56.1B - 176B total parameters	$1.20
176.1B - 480B total parameters	$2.40
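
The two size-tier tables above (dense models and mixture-of-experts) amount to a lookup by parameter count, which might be sketched as follows. The function name is hypothetical, and sizes that fall between two published bounds (for example 4.05B) are assumed to land in the higher tier, since the page does not say.

```python
# (upper bound in billions of parameters, USD per 1M tokens), copied from the tables above.
DENSE_TIERS = [(4, 0.10), (8, 0.20), (21, 0.30), (41, 0.80), (80, 0.90), (110, 1.80)]
# Mixture-of-experts tiers use total parameters.
MOE_TIERS = [(56, 0.60), (176, 1.20), (480, 2.40)]


def price_per_1m_tokens(params_billions, mixture_of_experts=False):
    """Return the tier price for a model size (hypothetical helper).

    Assumption: a size between two published bounds is billed at the higher tier.
    """
    tiers = MOE_TIERS if mixture_of_experts else DENSE_TIERS
    for upper_bound, price in tiers:
        if params_billions <= upper_bound:
            return price
    raise ValueError(f"no published tier for {params_billions}B parameters")


if __name__ == "__main__":
    print(price_per_1m_tokens(7))
    print(price_per_1m_tokens(141, mixture_of_experts=True))
```

Llama and Qwen models have their own price tables, so this lookup covers only the remaining models.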

FLUX Image models

MODEL	PRICE PER MP	IMAGES PER $1 (1MP)
FLUX.1 [dev]	$0.025	40
FLUX.1 [schnell]	$0.0027	370
FLUX1.1 [pro]	$0.04	25
FLUX.1 [pro]	$0.05	20
FLUX.1 Canny [dev]	$0.025	40
FLUX.1 Depth [dev]	$0.025	40

For all FLUX models except pro, prices are based on the default number of steps and scale linearly with additional steps.
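
The FLUX pricing rule can be sketched as below. Several things here are assumptions: 1 MP is taken as 1,000,000 pixels, the default step count is not published on this page so the caller must supply it, and step counts at or below the default are assumed to cost the base price. The function names are hypothetical.

```python
from decimal import Decimal

# USD per megapixel, copied from the FLUX table above.
FLUX_PRICE_PER_MP = {
    "FLUX.1 [dev]": Decimal("0.025"),
    "FLUX.1 [schnell]": Decimal("0.0027"),
    "FLUX1.1 [pro]": Decimal("0.04"),
    "FLUX.1 [pro]": Decimal("0.05"),
    "FLUX.1 Canny [dev]": Decimal("0.025"),
    "FLUX.1 Depth [dev]": Decimal("0.025"),
}


def flux_image_cost(model, width, height, steps=None, default_steps=None):
    """Estimated USD cost of one image (hypothetical helper).

    Assumptions: 1 MP = 1,000,000 pixels; default_steps is supplied by the
    caller; only steps above the default increase the price.
    """
    megapixels = Decimal(width * height) / Decimal(1_000_000)
    cost = FLUX_PRICE_PER_MP[model] * megapixels
    is_pro = "[pro]" in model  # pro models are excluded from step scaling
    if not is_pro and steps and default_steps and steps > default_steps:
        cost *= Decimal(steps) / Decimal(default_steps)
    return cost


def images_per_dollar(model):
    """Whole 1 MP images per $1 at default steps, matching the table's last column."""
    return int(Decimal(1) / FLUX_PRICE_PER_MP[model])


if __name__ == "__main__":
    print(flux_image_cost("FLUX.1 [dev]", 1000, 1000))
    print(images_per_dollar("FLUX.1 [schnell]"))
```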

STABILITY IMAGE MODELS

IMAGE SIZE	25 STEPS	50 STEPS	75 STEPS	100 STEPS
512X512	$0.001	$0.002	$0.0035	$0.005
1024X1024	$0.01	$0.02	$0.035	$0.05
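
Unlike the FLUX rule, the Stability prices in the table above are not strictly linear in steps (75 steps costs 3.5 times the 25-step price), so a direct table lookup is the safer way to estimate a batch. A minimal sketch, with a hypothetical function name and only the sizes and step counts the table publishes:

```python
from decimal import Decimal

# USD per image, copied from the Stability table above: {(width, height): {steps: price}}.
STABILITY_PRICE = {
    (512, 512): {
        25: Decimal("0.001"), 50: Decimal("0.002"),
        75: Decimal("0.0035"), 100: Decimal("0.005"),
    },
    (1024, 1024): {
        25: Decimal("0.01"), 50: Decimal("0.02"),
        75: Decimal("0.035"), 100: Decimal("0.05"),
    },
}


def stability_batch_cost(width, height, steps, n_images=1):
    """USD cost of n_images at a published size and step count (hypothetical helper)."""
    try:
        unit_price = STABILITY_PRICE[(width, height)][steps]
    except KeyError:
        # Other sizes or step counts are not priced on this page, so do not guess.
        raise ValueError("no published price for this size/steps combination") from None
    return unit_price * n_images


if __name__ == "__main__":
    print(stability_batch_cost(1024, 1024, 50, n_images=10))
```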

RERANK MODELS

MODEL SIZE	PRICE PER 1M TOKENS
8B	$0.10

Dedicated endpoints

When hosting your own model, you pay per minute for the GPU endpoint, whether it is a model you fine-tuned with Agirouter Fine-tuning or any other model you choose to host. You can start or stop your endpoint at any time through the web-based Playground.

YOUR FINE-TUNED MODEL

HARDWARE TYPE	PRICE PER MINUTE HOSTED
1x RTX-6000 48GB	$0.034
1x L40 48GB	$0.034
1x L40S 48GB	$0.048
1x A100 PCIe 80GB	$0.050
1x A100 SXM 40GB	$0.050
1x A100 SXM 80GB	$0.054
1x H100 80GB	$0.098
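
Because dedicated endpoints are billed per minute hosted, an hourly or daily figure follows directly from the table above. The sketch below assumes charges accrue only while the endpoint is running, which is how the start/stop description reads; the function name is hypothetical.

```python
from decimal import Decimal

# USD per minute hosted, copied from the hardware table above.
PRICE_PER_MINUTE = {
    "1x RTX-6000 48GB": Decimal("0.034"),
    "1x L40 48GB": Decimal("0.034"),
    "1x L40S 48GB": Decimal("0.048"),
    "1x A100 PCIe 80GB": Decimal("0.050"),
    "1x A100 SXM 40GB": Decimal("0.050"),
    "1x A100 SXM 80GB": Decimal("0.054"),
    "1x H100 80GB": Decimal("0.098"),
}


def endpoint_cost(hardware, minutes_running):
    """USD cost for the minutes an endpoint was running (hypothetical helper).

    Assumption: a stopped endpoint accrues no charges.
    """
    return PRICE_PER_MINUTE[hardware] * minutes_running


if __name__ == "__main__":
    # One hour on a single H100, then a full day on an A100 SXM 80GB.
    print(endpoint_cost("1x H100 80GB", 60))
    print(endpoint_cost("1x A100 SXM 80GB", 24 * 60))
```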

Interested in a dedicated endpoint for your own model?

Fine-tuning pricing

Pricing for fine-tuning is based on model size, dataset size, and the number of epochs.

1. Download checkpoints and final model weights.
2. View job status and logs through the CLI or Playgrounds.
3. Deploy a model instantly once it’s fine-tuned.

Agirouter GPU Clusters Pricing

Agirouter Compute provides private, state-of-the-art clusters with H100, H200, and A100 GPUs, connected over fast 200 Gbps non-blocking Ethernet or up to 3.2 Tbps InfiniBand networks.

HARDWARE TYPE AVAILABLE	NETWORKING	PRICING
A100 PCIe 80GB	200 Gbps non-blocking Ethernet	Starting at $1.30/hr
A100 SXM 80GB	200 Gbps non-blocking Ethernet or 1.6 Tbps InfiniBand configs available	Starting at $1.30/hr
H100 80GB	3.2 Tbps InfiniBand	Starting at $1.75/hr
H200 141GB	3.2 Tbps InfiniBand	Starting at $2.09/hr
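
For a rough lower bound on cluster spend, the "starting at" rates above can be multiplied out. This sketch assumes the rates are per GPU per hour and that the starting rate applies to the whole reservation; neither is stated on this page, so treat the result as a floor rather than a quote. The function name is hypothetical.

```python
from decimal import Decimal

# "Starting at" USD rates from the table above (assumed to be per GPU per hour).
STARTING_RATE_PER_HOUR = {
    "A100 PCIe 80GB": Decimal("1.30"),
    "A100 SXM 80GB": Decimal("1.30"),
    "H100 80GB": Decimal("1.75"),
    "H200 141GB": Decimal("2.09"),
}


def cluster_floor_cost(hardware, gpus, hours):
    """Lower-bound USD cost for a cluster reservation (hypothetical helper)."""
    return STARTING_RATE_PER_HOUR[hardware] * gpus * hours


if __name__ == "__main__":
    # Eight H100s for one day at the starting rate.
    print(cluster_floor_cost("H100 80GB", gpus=8, hours=24))
```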
