Models & Pricing
All prices are per million tokens.
| Model | Tinker ID | Type | Arch | Size | Context | Prefill | Sample | Train |
|---|---|---|---|---|---|---|---|---|
Pricing Terms
- Prefill: Processing input/prompt tokens (forward pass only)
- Sample: Generating output tokens (forward pass + sampling)
- Train: Forward and backward pass for gradient computation
- Context: Maximum sequence length. Models with a :peft: suffix support extended context at higher prices.
- Tinker ID: The exact string to pass to create_lora_training_client(base_model=...) or create_sampling_client(base_model=...)
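Because every price is quoted per million tokens, a job's cost is the sum of each token count multiplied by its rate, divided by one million. The sketch below illustrates that arithmetic only. The `Rates` class, the `estimate_cost` helper, and the example prices are hypothetical and are not taken from the pricing table or from the Tinker API.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Rates:
    """Per-million-token prices for one model (placeholder values, not real prices)."""
    prefill: float
    sample: float
    train: float


def estimate_cost(rates, prefill_tokens=0, sample_tokens=0, train_tokens=0):
    """Return the total cost in the same currency unit as the rates."""
    total = (
        prefill_tokens * rates.prefill    # prompt tokens, forward pass only
        + sample_tokens * rates.sample    # generated tokens
        + train_tokens * rates.train      # tokens seen in forward + backward passes
    )
    return total / TOKENS_PER_MILLION


# Hypothetical rates, chosen only to make the arithmetic easy to follow.
example_rates = Rates(prefill=0.10, sample=0.30, train=0.30)

cost = estimate_cost(
    example_rates,
    prefill_tokens=2_000_000,
    sample_tokens=500_000,
    train_tokens=4_000_000,
)
print(f"{cost:.2f}")  # prints 1.55
```

Substituting the Prefill, Sample, and Train values from a model's row in the table gives an estimate for that model.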
MoE models are priced by active parameters, making them significantly more cost-effective than dense models of similar quality.
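Many MoE model names encode both sizes: in a name such as Qwen3-30B-A3B, the first number is the total parameter count and the number after "A" is the active count per token. The sketch below reads the two figures from a name, assuming that naming convention holds. The `moe_sizes` helper is hypothetical and not part of the Tinker API, and names without this pattern are treated as not encoding an active size.

```python
import re

# Matches "-<total>B-A<active>B", e.g. the "-30B-A3B" part of "Qwen3-30B-A3B".
_MOE_PATTERN = re.compile(r"-(\d+(?:\.\d+)?)B-A(\d+(?:\.\d+)?)B")


def moe_sizes(model_name):
    """Return (total_billions, active_billions), or None if the name lacks the MoE pattern."""
    match = _MOE_PATTERN.search(model_name)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


for name in ["Qwen3-30B-A3B", "Qwen3-235B-A22B-Instruct-2507", "Qwen3-32B"]:
    print(name, moe_sizes(name))
```

Under this convention, Qwen3-30B-A3B has 30B total parameters but only about 3B active per token, and the active count is the one that pricing follows.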
Choosing a Model
- Cost-effective: Use MoE models (highlighted in amber)
- Research/post-training: Use Base models
- Task-specific fine-tuning: Start with an Instruction or Hybrid model
- Low latency: Use Instruction models (no chain-of-thought)
- High intelligence: Use Reasoning or Hybrid models (chain-of-thought)
- Vision tasks: Use models with Vision in the type
Retired Models
These models have been retired and can no longer be used for training or inference, grouped by retirement date. See Model deprecations for the recommended replacement for each.
June 12, 2026
- Qwen: Qwen3-235B-A22B-Instruct-2507, Qwen3-VL-235B-A22B-Instruct, Qwen3.5-35B-A3B, Qwen3.5-27B, Qwen3-32B, Qwen3-30B-A3B, Qwen3-30B-A3B-Instruct-2507, Qwen3-VL-30B-A3B-Instruct, Qwen3-30B-A3B-Base, Qwen3-8B-Base, Qwen3-4B-Instruct-2507
- Llama: Llama-3.3-70B-Instruct, Llama-3.1-70B, Llama-3.1-8B, Llama-3.1-8B-Instruct, Llama-3.2-3B, Llama-3.2-1B
- DeepSeek: DeepSeek-V3.1-Base
- Kimi: Kimi-K2-Thinking