Available Models in Tinker
The table below shows the models that are currently available in Tinker. We plan to update this list as new models are released.
What model should I use?
- In general, use the MoE models, which are more cost-effective than the dense models.
- Use 🐙 Base models only if you're doing research or running the full post-training pipeline yourself.
- If you want to create a model that is good at a specific task or domain, use an existing post-trained model and fine-tune it on your own data or environment (see the sketch after this list).
- If you care about latency, use one of the "⚡ Instruction" models, which will start outputting tokens without a chain-of-thought.
- If you care about intelligence and robustness, use one of the "🤔 Hybrid" models, which can use long chain-of-thought.
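For concreteness, here is a minimal sketch of passing one of these model names to the Tinker Python client when creating a LoRA training client. The `ServiceClient` and `create_lora_training_client` names follow Tinker's quickstart; treat the exact signature as an assumption and check the API reference for your client version.

```python
import tinker

# Connect to the Tinker service (assumes your API key is configured in the environment).
service_client = tinker.ServiceClient()

# Pick a model name from the table below. An ⚡ Instruction MoE model such as
# Qwen3-30B-A3B-Instruct-2507 is a cost-effective default for task-specific fine-tuning.
training_client = service_client.create_lora_training_client(
    base_model="Qwen/Qwen3-30B-A3B-Instruct-2507",
)
```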
Full Listing
Model Name | Training Type | Architecture | Size |
---|---|---|---|
Qwen/Qwen3-235B-A22B-Instruct-2507 | ⚡ Instruction | 🔀 MoE | 🦖 Large |
Qwen/Qwen3-30B-A3B-Instruct-2507 | ⚡ Instruction | 🔀 MoE | 🦅 Medium |
Qwen/Qwen3-30B-A3B | 🤔 Hybrid | 🔀 MoE | 🦅 Medium |
Qwen/Qwen3-30B-A3B-Base | 🐙 Base | 🔀 MoE | 🦅 Medium |
Qwen/Qwen3-32B | 🤔 Hybrid | 🧱 Dense | 🦅 Medium |
Qwen/Qwen3-8B | 🤔 Hybrid | 🧱 Dense | 🦆 Small |
Qwen/Qwen3-8B-Base | 🐙 Base | 🧱 Dense | 🦆 Small |
Qwen/Qwen3-4B-Instruct-2507 | ⚡ Instruction | 🧱 Dense | 🐣 Compact |
meta-llama/Llama-3.1-70B | 🐙 Base | 🧱 Dense | 🦖 Large |
meta-llama/Llama-3.3-70B-Instruct | ⚡ Instruction | 🧱 Dense | 🦖 Large |
meta-llama/Llama-3.1-8B | 🐙 Base | 🧱 Dense | 🦆 Small |
meta-llama/Llama-3.1-8B-Instruct | ⚡ Instruction | 🧱 Dense | 🦆 Small |
meta-llama/Llama-3.2-3B | 🐙 Base | 🧱 Dense | 🐣 Compact |
meta-llama/Llama-3.2-1B | 🐙 Base | 🧱 Dense | 🐣 Compact |
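You can also check what the service currently supports at runtime instead of relying on this table. The sketch below assumes the service client exposes a `get_server_capabilities()` call whose result lists supported models; the attribute names are assumptions, so verify them against the client reference.

```python
import tinker

service_client = tinker.ServiceClient()

# Ask the backend which base models it currently serves. The return shape
# (assumed here to expose `.supported_models`) may differ between client versions.
capabilities = service_client.get_server_capabilities()
for model in capabilities.supported_models:
    print(model)
```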
Legend
Training Types
- 🐙 Base: Foundation models trained on raw text data, suitable for post-training research and custom fine-tuning
- ⚡ Instruction: Models fine-tuned for following instructions and chat, optimized for fast inference
- 🤔 Hybrid: Models that can operate in both thinking and non-thinking modes (see the example after this list)
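To make the 🤔 Hybrid distinction concrete, Qwen3's chat template exposes an `enable_thinking` switch through the Hugging Face tokenizer, which controls whether the model is prompted to produce a long chain-of-thought before answering. This is Qwen3-specific template behavior, shown as an illustration rather than a Tinker API.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")
messages = [{"role": "user", "content": "What is 17 * 24?"}]

# enable_thinking=True (the default): the template allows a long chain-of-thought
# inside <think>...</think> tags before the final answer.
with_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# enable_thinking=False: the model is prompted to answer directly, which lowers
# latency at some cost in robustness on hard problems.
without_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```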
Architecture
- 🧱 Dense: Standard transformer architecture with all parameters active
- 🔀 MoE: Mixture of Experts architecture with sparse activation
Model Sizes
- 🐣 Compact: 1B-4B parameters
- 🦆 Small: 8B parameters
- 🦅 Medium: 30B-32B parameters
- 🦖 Large: 70B+ parameters
Note that the MoE models are much more cost-effective than the dense models. For example, the Qwen3-30B-A3B model has only 3B active parameters, so it costs roughly the same as a 3B dense model for training and inference.
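As a rough back-of-the-envelope check on that claim, per-token training compute scales with the number of parameters that are active per token (using the common ~6 × N FLOPs-per-token rule of thumb):

```python
# Rough per-token training FLOPs using the common ~6 * N rule of thumb,
# where N is the number of parameters active for each token.
def train_flops_per_token(active_params: float) -> float:
    return 6 * active_params

dense_30b = train_flops_per_token(30e9)    # dense 30B model: all parameters active
moe_30b_a3b = train_flops_per_token(3e9)   # Qwen3-30B-A3B: ~3B active parameters

print(f"Dense 30B:     {dense_30b:.1e} FLOPs/token")
print(f"Qwen3-30B-A3B: {moe_30b_a3b:.1e} FLOPs/token")
print(f"MoE advantage: ~{dense_30b / moe_30b_a3b:.0f}x fewer FLOPs per token")
```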