Models & Pricing

All prices are per million tokens.

The Type, Arch, and Size columns use the following values:

Type: Base, Instruction, Reasoning, Hybrid, Vision
Architecture: Dense, MoE
Size: Compact, Small, Medium, Large

Model	Tinker ID	Type	Arch	Size	Context	Prefill	Sample	Train

Pricing Terms

Prefill: Processing input/prompt tokens (forward pass only)
Sample: Generating output tokens (forward pass + sampling)
Train: Forward and backward pass for gradient computation
Context: Maximum sequence length. Models with :peft: suffix support extended context at higher prices.
Tinker ID: The exact string to pass to create_lora_training_client(base_model=...) or create_sampling_client(base_model=...)

MoE models are priced by active parameters rather than total parameters, which makes them significantly more cost-effective than dense models of similar quality.
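
Because every rate is quoted per million tokens, a job's cost is the sum of each token count multiplied by its rate, divided by one million. The sketch below illustrates that arithmetic only. The `Rates` class, the `estimate_cost` helper, and the dollar figures are hypothetical placeholders, not actual Tinker prices or part of the Tinker API.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Rates:
    """Dollars per million tokens. Hypothetical container, not a Tinker type."""
    prefill: float
    sample: float
    train: float


def estimate_cost(rates: Rates, prefill_tokens: int = 0,
                  sample_tokens: int = 0, train_tokens: int = 0) -> float:
    """Return the total cost in dollars for the given token counts."""
    total = (
        prefill_tokens * rates.prefill
        + sample_tokens * rates.sample
        + train_tokens * rates.train
    )
    return total / TOKENS_PER_MILLION


# Placeholder rates chosen only to keep the arithmetic easy to follow.
example_rates = Rates(prefill=0.5, sample=1.5, train=2.0)
cost = estimate_cost(
    example_rates,
    prefill_tokens=2_000_000,
    sample_tokens=1_000_000,
    train_tokens=4_000_000,
)
print(f"${cost:.2f}")  # prints "$10.50"
```

Substitute the Prefill, Sample, and Train rates from the table row of the model you plan to use.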

Choosing a Model

Cost-effective: Use MoE models (highlighted in amber in the model table)
Research/post-training: Use Base models
Task-specific fine-tuning: Start with an Instruction or Hybrid model
Low latency: Use Instruction models (no chain-of-thought)
High intelligence: Use Reasoning or Hybrid models (chain-of-thought)
Vision tasks: Use models with Vision in the type
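
The guidance above can be read as a lookup from goal to model type, optionally narrowed to MoE when cost matters. The following is a minimal sketch under that reading. The `Model` tuple, the goal names, and the catalog entries are invented for illustration and do not correspond to real Tinker IDs.

```python
from typing import NamedTuple


class Model(NamedTuple):
    tinker_id: str
    model_type: str
    arch: str


# Goal -> recommended types, transcribed from the list above.
RECOMMENDED_TYPES = {
    "research": {"Base"},
    "fine-tuning": {"Instruction", "Hybrid"},
    "low-latency": {"Instruction"},
    "high-intelligence": {"Reasoning", "Hybrid"},
    "vision": {"Vision"},
}

# Hypothetical catalog entries; replace with rows from the model table.
CATALOG = [
    Model("example/dense-base", "Base", "Dense"),
    Model("example/moe-instruct", "Instruction", "MoE"),
    Model("example/moe-hybrid", "Hybrid", "MoE"),
    Model("example/dense-vision", "Vision", "Dense"),
]


def pick(catalog: list[Model], goal: str, moe_only: bool = False) -> list[str]:
    """Return the IDs of models whose type matches the goal."""
    wanted = RECOMMENDED_TYPES[goal]
    return [
        m.tinker_id
        for m in catalog
        # Substring match, since the guidance says "Vision in the type".
        if any(t in m.model_type for t in wanted)
        and (not moe_only or m.arch == "MoE")
    ]


print(pick(CATALOG, "fine-tuning", moe_only=True))
```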

Retired Models

These models have been retired and can no longer be used for training or inference. They are grouped below by retirement date. See Model deprecations for the recommended replacement for each.

June 12, 2026

Qwen: Qwen3-235B-A22B-Instruct-2507, Qwen3-VL-235B-A22B-Instruct, Qwen3.5-35B-A3B, Qwen3.5-27B, Qwen3-32B, Qwen3-30B-A3B, Qwen3-30B-A3B-Instruct-2507, Qwen3-VL-30B-A3B-Instruct, Qwen3-30B-A3B-Base, Qwen3-8B-Base, Qwen3-4B-Instruct-2507
Llama: Llama-3.3-70B-Instruct, Llama-3.1-70B, Llama-3.1-8B, Llama-3.1-8B-Instruct, Llama-3.2-3B, Llama-3.2-1B
DeepSeek: DeepSeek-V3.1-Base
Kimi: Kimi-K2-Thinking
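
Since retired models are rejected for both training and inference, it can help to check a model name before creating a client. The sketch below copies the names listed above into a lookup table. The `retirement_date` helper is hypothetical, and stripping an `org/` prefix before matching is an assumption about how IDs are written, not documented behavior.

```python
from typing import Optional

# Retirement date (ISO format) -> model names, copied from the list above.
RETIRED = {
    "2026-06-12": frozenset({
        "Qwen3-235B-A22B-Instruct-2507", "Qwen3-VL-235B-A22B-Instruct",
        "Qwen3.5-35B-A3B", "Qwen3.5-27B", "Qwen3-32B", "Qwen3-30B-A3B",
        "Qwen3-30B-A3B-Instruct-2507", "Qwen3-VL-30B-A3B-Instruct",
        "Qwen3-30B-A3B-Base", "Qwen3-8B-Base", "Qwen3-4B-Instruct-2507",
        "Llama-3.3-70B-Instruct", "Llama-3.1-70B", "Llama-3.1-8B",
        "Llama-3.1-8B-Instruct", "Llama-3.2-3B", "Llama-3.2-1B",
        "DeepSeek-V3.1-Base", "Kimi-K2-Thinking",
    }),
}


def retirement_date(model: str) -> Optional[str]:
    """Return the retirement date for a model, or None if it is not listed."""
    # Assumption: an ID may carry an "org/" prefix; compare the last segment.
    name = model.rsplit("/", 1)[-1]
    for date, names in RETIRED.items():
        if name in names:
            return date
    return None


print(retirement_date("Qwen3-32B"))  # prints "2026-06-12"
```

A caller could raise an error or fall back to the replacement listed under Model deprecations when this returns a date.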