# Models & Pricing
All prices are per million tokens.
| Model | Tinker ID | Type | Arch | Size | Context | Prefill | Sample | Train |
|---|---|---|---|---|---|---|---|---|
| Nemotron-3-Nano-30B-A3B | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | Hybrid | MoE | Medium | 64K | $0.13 | $0.33 | $0.40 |
| Nemotron-3-Super-120B-A12B | nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | Hybrid | MoE | Large | 64K | $0.38 | $0.96 | $1.16 |
| Nemotron-3-Super-120B-A12B (256K) | nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16:peft:262144 | Hybrid | MoE | Large | 256K | $0.76 | $1.92 | $2.32 |
| Qwen3.5-397B-A17B | Qwen/Qwen3.5-397B-A17B | Hybrid + Vision | MoE | Large | 64K | $2.00 | $5.00 | $6.00 |
| Qwen3.5-397B-A17B (256K) | Qwen/Qwen3.5-397B-A17B:peft:262144 | Hybrid + Vision | MoE | Large | 256K | $4.00 | $10.00 | $12.00 |
| Qwen3.5-35B-A3B | Qwen/Qwen3.5-35B-A3B | Hybrid + Vision | MoE | Medium | 64K | $0.36 | $0.89 | $1.07 |
| Qwen3.5-27B | Qwen/Qwen3.5-27B | Hybrid + Vision | Dense | Medium | 64K | $1.24 | $3.73 | $3.73 |
| Qwen3.5-4B | Qwen/Qwen3.5-4B | Hybrid + Vision | Dense | Compact | 64K | $0.22 | $0.67 | $0.67 |
| Qwen3-VL-235B-A22B-Instruct | Qwen/Qwen3-VL-235B-A22B-Instruct | Vision | MoE | Large | 32K | $0.43 | $1.07 | $1.29 |
| Qwen3-VL-30B-A3B-Instruct | Qwen/Qwen3-VL-30B-A3B-Instruct | Vision | MoE | Medium | 32K | $0.12 | $0.30 | $0.36 |
| Qwen3-235B-A22B-Instruct-2507 | Qwen/Qwen3-235B-A22B-Instruct-2507 | Instruction | MoE | Large | 32K | $0.43 | $1.07 | $1.29 |
| Qwen3-30B-A3B-Instruct-2507 | Qwen/Qwen3-30B-A3B-Instruct-2507 | Instruction | MoE | Medium | 32K | $0.12 | $0.30 | $0.36 |
| Qwen3-30B-A3B | Qwen/Qwen3-30B-A3B | Hybrid | MoE | Medium | 32K | $0.12 | $0.30 | $0.36 |
| Qwen3-30B-A3B-Base | Qwen/Qwen3-30B-A3B-Base | Base | MoE | Medium | 32K | $0.12 | $0.30 | $0.36 |
| Qwen3-32B | Qwen/Qwen3-32B | Hybrid | Dense | Medium | 32K | $0.44 | $1.33 | $1.33 |
| Qwen3-8B | Qwen/Qwen3-8B | Hybrid | Dense | Small | 32K | $0.13 | $0.40 | $0.40 |
| Qwen3-8B-Base | Qwen/Qwen3-8B-Base | Base | Dense | Small | 32K | $0.13 | $0.40 | $0.40 |
| Qwen3-4B-Instruct-2507 | Qwen/Qwen3-4B-Instruct-2507 | Instruction | Dense | Compact | 32K | $0.07 | $0.22 | $0.22 |
| GPT-OSS-120B | openai/gpt-oss-120b | Reasoning | MoE | Medium | 32K | $1.59 | $4.76 | $5.72 |
| GPT-OSS-120B (128K) | openai/gpt-oss-120b:peft:131072 | Reasoning | MoE | Medium | 128K | $3.18 | $9.52 | $11.44 |
| GPT-OSS-20B | openai/gpt-oss-20b | Reasoning | MoE | Small | 32K | $0.27 | $0.80 | $0.96 |
| DeepSeek-V3.1 | deepseek-ai/DeepSeek-V3.1 | Hybrid | MoE | Large | 64K | $0.68 | $1.71 | $2.06 |
| DeepSeek-V3.1-Base | deepseek-ai/DeepSeek-V3.1-Base | Base | MoE | Large | 64K | $0.68 | $1.71 | $2.06 |
| Llama-3.1-70B | meta-llama/Llama-3.1-70B | Base | Dense | Large | 32K | $1.51 | $4.53 | $4.53 |
| Llama-3.3-70B-Instruct | meta-llama/Llama-3.3-70B-Instruct | Instruction | Dense | Large | 32K | $1.51 | $4.53 | $4.53 |
| Llama-3.1-8B | meta-llama/Llama-3.1-8B | Base | Dense | Small | 32K | $0.13 | $0.40 | $0.40 |
| Llama-3.1-8B-Instruct | meta-llama/Llama-3.1-8B-Instruct | Instruction | Dense | Small | 32K | $0.13 | $0.40 | $0.40 |
| Llama-3.2-3B | meta-llama/Llama-3.2-3B | Base | Dense | Compact | 32K | $0.07 | $0.20 | $0.20 |
| Llama-3.2-1B | meta-llama/Llama-3.2-1B | Base | Dense | Compact | 32K | $0.04 | $0.13 | $0.13 |
| Kimi-K2-Thinking | moonshotai/Kimi-K2-Thinking | Reasoning | MoE | Large | 64K | $0.68 | $1.71 | $2.06 |
| Kimi-K2.5 | moonshotai/Kimi-K2.5 | Reasoning + Vision | MoE | Large | 32K | $0.68 | $1.71 | $2.06 |
| Kimi-K2.5 (128K) | moonshotai/Kimi-K2.5:peft:131072 | Reasoning + Vision | MoE | Large | 128K | $1.36 | $3.42 | $4.12 |
## Pricing Terms
- Prefill: Processing input/prompt tokens (forward pass only)
- Sample: Generating output tokens (forward pass + sampling)
- Train: Forward and backward pass for gradient computation
- Context: Maximum sequence length. Models with a `:peft:` suffix support extended context at higher prices
- Tinker ID: The exact string to pass to `create_lora_training_client(base_model=...)` or `create_sampling_client(base_model=...)` (see the sketch below)
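As a minimal sketch of how these IDs are used, assuming the `tinker` Python SDK exposes a `ServiceClient` entry point (the `create_*` signatures come from the bullet above; the entry point and keyword names beyond `base_model` are assumptions):

```python
# Minimal sketch: creating Tinker clients from the table's Tinker IDs.
# Assumes the tinker SDK exposes a ServiceClient as its entry point.
import tinker

service_client = tinker.ServiceClient()

# Standard-context model: pass the Tinker ID verbatim from the table.
training_client = service_client.create_lora_training_client(
    base_model="meta-llama/Llama-3.1-8B",
)

# Extended-context variant: the ":peft:<length>" suffix selects the
# 256K-context version of the model at the higher listed prices.
sampling_client = service_client.create_sampling_client(
    base_model="Qwen/Qwen3.5-397B-A17B:peft:262144",
)
```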
MoE models are priced by active parameters, making them significantly more cost-effective than dense models of similar quality.
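To make that concrete, here is a back-of-the-envelope comparison using the table's prices for the MoE Qwen3-30B-A3B against the similarly sized dense Qwen3-32B; the workload token counts are made up for illustration:

```python
# Back-of-the-envelope cost comparison using the per-million-token
# prices from the table above. The workload figures are illustrative.

def run_cost(prefill_tokens, sample_tokens, train_tokens, prices):
    """prices = (prefill, sample, train) in dollars per million tokens."""
    prefill, sample, train = prices
    return (
        prefill_tokens * prefill
        + sample_tokens * sample
        + train_tokens * train
    ) / 1e6

MOE = (0.12, 0.30, 0.36)    # Qwen3-30B-A3B (30B total, 3B active)
DENSE = (0.44, 1.33, 1.33)  # Qwen3-32B (dense)

workload = (100e6, 20e6, 50e6)  # prefill, sample, train tokens
print(f"MoE:   ${run_cost(*workload, MOE):,.2f}")    # $36.00
print(f"Dense: ${run_cost(*workload, DENSE):,.2f}")  # $137.10
```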
## Choosing a Model
- Cost-effective: Use MoE models (see the Arch column)
- Research/post-training: Use Base models
- Task-specific fine-tuning: Start with an Instruction or Hybrid model
- Low latency: Use Instruction models (no chain-of-thought)
- High intelligence: Use Reasoning or Hybrid models (chain-of-thought)
- Vision tasks: Use models with Vision in the Type column (an example mapping follows this list)
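To connect this guidance to the table, here is an illustrative mapping from each use case to one example Tinker ID; the specific picks are assumptions chosen from the table for illustration, not official recommendations:

```python
# Illustrative mapping from the guidance above to example Tinker IDs
# drawn from the table. The picks are examples, not recommendations.
MODEL_FOR = {
    "cost_effective": "Qwen/Qwen3-30B-A3B-Instruct-2507",   # MoE
    "post_training_research": "meta-llama/Llama-3.1-8B",    # Base
    "low_latency": "Qwen/Qwen3-4B-Instruct-2507",           # Instruction
    "high_intelligence": "deepseek-ai/DeepSeek-V3.1",       # Hybrid
    "vision": "Qwen/Qwen3-VL-30B-A3B-Instruct",             # Vision
}
```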
For the latest pricing, see the Tinker Console.