Model Lineup

Available Models in Tinker

The table below shows the models that are currently available in Tinker. We plan to update this list as new models are released.

What model should I use?

  • In general, use MoE models, which are more cost-effective than dense models (see the selection sketch after this list).
  • Use 🐙 Base models only if you're doing research or running the full post-training pipeline yourself.
  • If you want to create a model that is good at a specific task or domain, start from an existing post-trained model and fine-tune it on your own data or environment.
    • If you care about latency, use one of the "⚡ Instruction" models, which start outputting tokens without a chain-of-thought.
    • If you care about intelligence and robustness, use one of the "🤔 Hybrid" models, which can use a long chain-of-thought.
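The rules of thumb above can be encoded as a small helper. This is a minimal, hypothetical sketch: `pick_model()` is not part of Tinker, and the defaults simply reflect the recommendations in this list and the model names in the Full Listing below.

```python
# Hypothetical helper encoding the guidance above; pick_model() is not part of
# the Tinker SDK, and the model names come from the Full Listing table below.
def pick_model(latency_sensitive: bool = False, base_research: bool = False) -> str:
    """Return a reasonable default model name for common use cases."""
    if base_research:
        # 🐙 Base models: for post-training research or running the full pipeline yourself.
        return "Qwen/Qwen3-30B-A3B-Base"
    if latency_sensitive:
        # ⚡ Instruction models start outputting tokens without a chain-of-thought.
        return "Qwen/Qwen3-30B-A3B-Instruct-2507"
    # 🤔 Hybrid models can use a long chain-of-thought for harder tasks.
    return "Qwen/Qwen3-30B-A3B"


print(pick_model(latency_sensitive=True))  # -> Qwen/Qwen3-30B-A3B-Instruct-2507
```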

Full Listing

| Model Name | Training Type | Architecture | Size |
|---|---|---|---|
| Qwen/Qwen3-235B-A22B-Instruct-2507 | ⚡ Instruction | 🔀 MoE | 🦖 Large |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | ⚡ Instruction | 🔀 MoE | 🦅 Medium |
| Qwen/Qwen3-30B-A3B | 🤔 Hybrid | 🔀 MoE | 🦅 Medium |
| Qwen/Qwen3-30B-A3B-Base | 🐙 Base | 🔀 MoE | 🦅 Medium |
| Qwen/Qwen3-32B | 🤔 Hybrid | 🧱 Dense | 🦅 Medium |
| Qwen/Qwen3-8B | 🤔 Hybrid | 🧱 Dense | 🦆 Small |
| Qwen/Qwen3-8B-Base | 🐙 Base | 🧱 Dense | 🦆 Small |
| Qwen/Qwen3-4B-Instruct-2507 | ⚡ Instruction | 🧱 Dense | 🐣 Compact |
| meta-llama/Llama-3.1-70B | 🐙 Base | 🧱 Dense | 🦖 Large |
| meta-llama/Llama-3.3-70B-Instruct | ⚡ Instruction | 🧱 Dense | 🦖 Large |
| meta-llama/Llama-3.1-8B | 🐙 Base | 🧱 Dense | 🦆 Small |
| meta-llama/Llama-3.1-8B-Instruct | ⚡ Instruction | 🧱 Dense | 🦆 Small |
| meta-llama/Llama-3.2-3B | 🐙 Base | 🧱 Dense | 🐣 Compact |
| meta-llama/Llama-3.2-1B | 🐙 Base | 🧱 Dense | 🐣 Compact |
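Any model name in the table can be passed as the base model when creating a training client. The snippet below is a sketch based on the Tinker quickstart pattern; `ServiceClient` and `create_lora_training_client` are assumed here, so check the SDK reference if the names in your version differ.

```python
import tinker

# Sketch (assumed API): create a LoRA training client on top of a model from the
# table above. ServiceClient / create_lora_training_client follow the quickstart
# pattern; consult the SDK reference if your version differs.
service_client = tinker.ServiceClient()
training_client = service_client.create_lora_training_client(
    base_model="meta-llama/Llama-3.2-1B",  # any model name from the Full Listing
)
```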

Legend

Training Types

  • 🐙 Base: Foundation models trained on raw text data, suitable for post-training research and custom fine-tuning
  • ⚡ Instruction: Models fine-tuned for following instructions and chat, optimized for fast inference
  • 🤔 Hybrid: Models that can operate in both thinking and non-thinking modes

Architecture

  • 🧱 Dense: Standard transformer architecture with all parameters active
  • 🔀 MoE: Mixture of Experts architecture with sparse activation

Model Sizes

  • 🐣 Compact: 1B-4B parameters
  • 🦆 Small: 8B parameters
  • 🦅 Medium: 30B-32B parameters
  • 🦖 Large: 70B+ parameters

Note that MoE models are much more cost-effective than dense models. For example, Qwen3-30B-A3B has only 3B active parameters, so training and inference cost roughly the same as for a 3B dense model.
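To see why active parameters drive cost, here is a back-of-the-envelope sketch using the common ~6 FLOPs per parameter per token rule of thumb for training, counting only active parameters. This is an approximation for intuition, not Tinker's pricing model.

```python
# Back-of-the-envelope training cost: ~6 FLOPs per active parameter per token.
# This rule of thumb is an approximation for intuition, not Tinker's pricing model.
def train_flops_per_token(active_params: float) -> float:
    return 6 * active_params

moe_30b_a3b = train_flops_per_token(3e9)   # Qwen3-30B-A3B: only 3B active parameters
dense_3b    = train_flops_per_token(3e9)   # a 3B dense model: same active parameter count
dense_32b   = train_flops_per_token(32e9)  # Qwen3-32B: all 32B parameters active

print(f"Qwen3-30B-A3B : {moe_30b_a3b:.1e} FLOPs/token")  # ~1.8e10, same as the 3B dense model
print(f"3B dense      : {dense_3b:.1e} FLOPs/token")
print(f"Qwen3-32B     : {dense_32b:.1e} FLOPs/token")    # roughly 10x more per token
```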