Pricing that scales
from idea to production
BUILD
Get started with fast inference, reliability, and no daily rate limits
Free Llama Vision 11B + FLUX.1 [schnell]
$1 credit for all other models
- Fully pay as you go, and easily add credits
- No daily rate limits, up to 6,000 requests and 2M tokens per minute for LLMs
- Deploy on-demand dedicated endpoints (no rate limits)
- Monitoring dashboard with 24-hr data
- Email and in-app chat support
SCALE
Scale production traffic with reserved GPUs and advanced configuration
INCLUDES EVERYTHING IN BUILD PLUS
- Up to 9,000 requests per minute and 5M tokens per minute for LLMs
- Premium support
- Support via private Slack channel
- Monitoring dashboard with 30-day data (coming soon!)
- Discounts on monthly reserved dedicated GPU
- Advanced dedicated endpoint configuration
- 99% availability SLA for dedicated endpoints
- HIPAA compliance
ENTERPRISE
Private deployments and model optimization at scale
INCLUDES EVERYTHING IN SCALE PLUS
- Custom rate limits and no token limits
- VPC deployment
- Enterprise-grade security & compliance
- Monitoring dashboard with 1-year data (coming soon!)
- Continuous model optimization
- Dedicated success representative
- 99.9% SLA for dedicated endpoints, with geo-redundancy
- Priority access to hardware including H100 & H200 GPUs
- Custom regions
Inference pricing
Over 100 leading open-source Chat, Multimodal, Language, Image, Code, and Embedding models are available through the Agirouter Inference API. For these models you pay just for what you use.
Serverless Endpoints
Prices are per 1 million tokens. Chat, Multimodal, Language, and Code models are billed on both input and output tokens. Embedding models are billed on input tokens only. Image models are billed by image size and number of steps.
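As a rough illustration of the billing rule above, the sketch below estimates a charge from a per-1M-token price. The function name, its parameters, and the $0.10 embedding price are hypothetical and chosen only for the example; they are not part of any Agirouter SDK or price list.

```python
# Hypothetical helper (not an Agirouter API): estimate a serverless charge
# from a price quoted per 1 million tokens.

def token_cost(price_per_million: float, input_tokens: int,
               output_tokens: int = 0, bill_output: bool = True) -> float:
    """Return the estimated charge in dollars."""
    # Chat, Multimodal, Language, and Code models bill input + output tokens;
    # Embedding models bill input tokens only (pass bill_output=False).
    billable = input_tokens + (output_tokens if bill_output else 0)
    return billable / 1_000_000 * price_per_million

# A model priced at $0.90 per 1M tokens: input and output both count.
chat = token_cost(0.90, input_tokens=1_500, output_tokens=500)

# An embedding model at an assumed, illustrative $0.10 per 1M tokens:
# only the input tokens count.
embed = token_cost(0.10, input_tokens=2_000_000, bill_output=False)

print(f"chat: ${chat:.6f}  embedding: ${embed:.2f}")
```

The same function covers both cases because the only difference between them is whether output tokens are counted.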
LLAMA 3.2, LLAMA 3.1, LLAMA 3 MODELS
| MODEL SIZE | TYPE | LITE | TURBO | REFERENCE |
|---|---|---|---|---|
| Up to 3B | Text | $0.06 | | |
| 8B | Text | $0.10 | $0.18 | $0.20 |
| 11B | Vision | $0.18 | | |
| 70B | Text | $0.54 | $0.88 | $0.90 |
| 90B | Vision | $1.20 | | |
| 405B | Text | $3.50 | | |
For vision models, images are converted to 1,601 to 6,404 tokens depending on image size.
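Because the token count per image varies with image size, the cost of an image can only be bounded, not pinned down, from the figures above. A minimal sketch of that bound follows; the helper name is hypothetical, and the $0.18 price is the 11B Vision figure from the table.

```python
# Token range per image, as stated for vision models.
MIN_IMAGE_TOKENS = 1_601
MAX_IMAGE_TOKENS = 6_404

def image_cost_range(price_per_million: float, n_images: int = 1):
    """Return (lowest, highest) estimated dollar cost of the image tokens alone.

    Hypothetical helper: text tokens in the same request are not included.
    """
    low = n_images * MIN_IMAGE_TOKENS / 1_000_000 * price_per_million
    high = n_images * MAX_IMAGE_TOKENS / 1_000_000 * price_per_million
    return low, high

# 1,000 images sent to the 11B Vision model at $0.18 per 1M tokens.
low, high = image_cost_range(0.18, n_images=1_000)
print(f"between ${low:.4f} and ${high:.4f}")
```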
Qwen Models
| MODEL | PRICE PER 1M TOKENS |
|---|---|
| Qwen 2 72B | $0.90 |
| Qwen 2.5 7B | $0.30 |
| Qwen 2.5 72B | $1.20 |
| Qwen 2.5 Coder 32B | $0.80 |
| Qwen QwQ 32B Preview | $1.20 |
ALL OTHER CHAT, LANGUAGE, CODE AND MODERATION MODELS
| MODEL SIZE | PRICE PER 1M TOKENS |
|---|---|
| Up to 4B | $0.10 |
| 4.1B - 8B | $0.20 |
| 8.1B - 21B | $0.30 |
| 21.1B - 41B | $0.80 |
| 41.1B - 80B | $0.90 |
| 80.1B - 110B | $1.80 |
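The size tiers above amount to a lookup by parameter count. One way to sketch that lookup is a binary search over the tier upper bounds. The names are hypothetical, and treating a size that falls in a gap between listed ranges (for example 8.05B) as belonging to the next tier up is an assumption, since the table does not say.

```python
from bisect import bisect_left

# Upper bound of each tier in billions of parameters, with the matching
# price per 1M tokens, copied from the table above.
UPPER_BOUNDS_B = [4, 8, 21, 41, 80, 110]
PRICES = [0.10, 0.20, 0.30, 0.80, 0.90, 1.80]

def price_for_size(params_billion: float) -> float:
    """Return the listed price per 1M tokens for a dense model of this size."""
    if params_billion <= 0:
        raise ValueError("parameter count must be positive")
    # Index of the first tier whose upper bound is >= the model size.
    # Assumption: sizes in a gap between ranges round up to the next tier.
    i = bisect_left(UPPER_BOUNDS_B, params_billion)
    if i == len(PRICES):
        raise ValueError("no tier is listed above 110B")
    return PRICES[i]

print(price_for_size(7))    # falls in the 4.1B - 8B tier
print(price_for_size(34))   # falls in the 21.1B - 41B tier
```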
Mixture-of-experts
| MODEL SIZE | PRICE PER 1M TOKENS |
|---|---|
| Up to 56B total parameters | $0.60 |
| 56.1B - 176B total parameters | $1.20 |
| 176.1B - 480B total parameters | $2.40 |
FLUX Image models
| MODEL | PRICE PER MP | IMAGES PER $1 (1MP) |
|---|---|---|
| FLUX.1 [dev] | $0.025 | 40 |
| FLUX.1 [schnell] | $0.0027 | 370 |
| FLUX1.1 [pro] | $0.04 | 25 |
| FLUX.1 [pro] | $0.05 | 20 |
| FLUX.1 Canny [dev] | $0.025 | 40 |
| FLUX.1 Depth [dev] | $0.025 | 40 |
For all FLUX models except the pro models, prices are based on the default number of steps and scale linearly with additional steps.
STABILITY IMAGE MODELS
| IMAGE SIZE | 25 STEPS | 50 STEPS | 75 STEPS | 100 STEPS |
|---|---|---|---|---|
| 512x512 | $0.001 | $0.002 | $0.0035 | $0.005 |
| 1024x1024 | $0.01 | $0.02 | $0.035 | $0.05 |
RERANK MODELS
| MODEL SIZE | PRICE PER 1M TOKENS |
|---|---|
| 8B | $0.10 |
Dedicated endpoints
When you host your own model, you pay per minute for the GPU endpoint, whether it is a model you fine-tuned using Agirouter Fine-tuning or any other model you choose to host. You can start or stop your endpoint at any time through the web-based Playground.
YOUR FINE-TUNED MODEL
| HARDWARE TYPE | PRICE PER MINUTE HOSTED |
|---|---|
| 1x RTX-6000 48GB | $0.034 |
| 1x L40 48GB | $0.034 |
| 1x L40S 48GB | $0.048 |
| 1x A100 PCIe 80GB | $0.050 |
| 1x A100 SXM 40GB | $0.050 |
| 1x A100 SXM 80GB | $0.054 |
| 1x H100 80GB | $0.098 |
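Per-minute prices are easier to compare once converted to hourly or daily figures. The sketch below does that conversion from the table above. The function name is hypothetical, and it assumes charges accrue only while the endpoint is running, which the start/stop wording suggests but does not state outright.

```python
# Price per minute hosted, copied from the table above.
PRICE_PER_MINUTE = {
    "1x RTX-6000 48GB": 0.034,
    "1x L40 48GB": 0.034,
    "1x L40S 48GB": 0.048,
    "1x A100 PCIe 80GB": 0.050,
    "1x A100 SXM 40GB": 0.050,
    "1x A100 SXM 80GB": 0.054,
    "1x H100 80GB": 0.098,
}

def hosting_cost(hardware: str, minutes_running: float) -> float:
    """Estimate the dollar cost of a dedicated endpoint (hypothetical helper).

    Assumption: only minutes during which the endpoint is running are billed.
    """
    return PRICE_PER_MINUTE[hardware] * minutes_running

hourly = hosting_cost("1x H100 80GB", 60)
# An endpoint on an L40S that runs 8 hours a day for 5 days.
work_week = hosting_cost("1x L40S 48GB", 8 * 60 * 5)

print(f"H100 per hour: ${hourly:.2f}  L40S work week: ${work_week:.2f}")
```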
Fine-tuning pricing
Pricing for fine-tuning is based on model size, dataset size, and the number of epochs.
1. Download checkpoints and final model weights.
2. View job status and logs through CLI or Playgrounds.
3. Deploy a model instantly once it’s fine-tuned.
Agirouter GPU Clusters Pricing
Agirouter Compute provides private, state-of-the-art clusters with H100, H200, and A100 GPUs, connected over fast 200 Gbps non-blocking Ethernet or up to 3.2 Tbps InfiniBand networks.
| HARDWARE TYPE AVAILABLE | NETWORKING | PRICING |
|---|---|---|
| A100 PCIe 80GB | 200 Gbps non-blocking Ethernet | Starting at $1.30/hr |
| A100 SXM 80GB | 200 Gbps non-blocking Ethernet or 1.6 Tbps InfiniBand configs available | Starting at $1.30/hr |
| H100 80GB | 3.2 Tbps InfiniBand | Starting at $1.75/hr |
| H200 141GB | 3.2 Tbps InfiniBand | Starting at $2.09/hr |