Model Lineup

Available Models in Tinker

The table below shows the models that are currently available in Tinker. We plan to update this list as new models are released.

What model should I use?

  • In general, use MoE models, which are more cost-effective than dense models.
  • Use Base models only if you're doing research or running the full post-training pipeline yourself.
  • If you want to create a model that is good at a specific task or domain, use an existing post-trained model, and fine-tune it on your own data or environment.
    • If you care about latency, use one of the Instruction models, which will start outputting tokens without a chain-of-thought.
    • If you care about intelligence and robustness, use one of the Hybrid or Reasoning models, which can use long chain-of-thought.
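
The guidance above can be read as a small decision procedure. The sketch below is only an illustration of that reading: the function name `suggest_training_type` and its two boolean parameters are hypothetical and are not part of Tinker.

```python
def suggest_training_type(runs_full_post_training: bool, latency_sensitive: bool) -> str:
    """Map the selection guidance to a training type label from the legend.

    Hypothetical helper for illustration; not part of the Tinker API.
    """
    if runs_full_post_training:
        # Research or a full post-training pipeline starts from a Base model.
        return "Base"
    if latency_sensitive:
        # Instruction models start answering without a chain-of-thought.
        return "Instruction"
    # Otherwise favor models that can use long chain-of-thought.
    return "Hybrid or Reasoning"


print(suggest_training_type(runs_full_post_training=False, latency_sensitive=True))
```

The architecture choice is separate from this: per the first bullet, prefer an MoE model of the suggested training type when one is listed.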

Full Listing

Model Name | Training Type | Architecture | Size
Qwen/Qwen3.5-397B-A17B | Hybrid + Vision | MoE | Large
Qwen/Qwen3.5-35B-A3B | Hybrid + Vision | MoE | Medium
Qwen/Qwen3.5-27B | Hybrid + Vision | Dense | Medium
Qwen/Qwen3.5-4B | Hybrid + Vision | Dense | Compact
Qwen/Qwen3-VL-235B-A22B-Instruct | Vision | MoE | Large
Qwen/Qwen3-VL-30B-A3B-Instruct | Vision | MoE | Medium
Qwen/Qwen3-235B-A22B-Instruct-2507 | Instruction | MoE | Large
Qwen/Qwen3-30B-A3B-Instruct-2507 | Instruction | MoE | Medium
Qwen/Qwen3-30B-A3B | Hybrid | MoE | Medium
Qwen/Qwen3-30B-A3B-Base | Base | MoE | Medium
Qwen/Qwen3-32B | Hybrid | Dense | Medium
Qwen/Qwen3-8B | Hybrid | Dense | Small
Qwen/Qwen3-8B-Base | Base | Dense | Small
Qwen/Qwen3-4B-Instruct-2507 | Instruction | Dense | Compact
openai/gpt-oss-120b | Reasoning | MoE | Medium
openai/gpt-oss-20b | Reasoning | MoE | Small
deepseek-ai/DeepSeek-V3.1 | Hybrid | MoE | Large
deepseek-ai/DeepSeek-V3.1-Base | Base | MoE | Large
meta-llama/Llama-3.1-70B | Base | Dense | Large
meta-llama/Llama-3.3-70B-Instruct | Instruction | Dense | Large
meta-llama/Llama-3.1-8B | Base | Dense | Small
meta-llama/Llama-3.1-8B-Instruct | Instruction | Dense | Small
meta-llama/Llama-3.2-3B | Base | Dense | Compact
meta-llama/Llama-3.2-1B | Base | Dense | Compact
moonshotai/Kimi-K2-Thinking | Reasoning | MoE | Large
moonshotai/Kimi-K2.5 | Reasoning + Vision | MoE | Large
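
For scripted model selection, the listing can be kept as plain data and filtered on its columns. The following is a minimal sketch, assuming you copy the rows you care about by hand; the `Model` tuple, the `MODELS` list, and `find_models` are hypothetical names, not something Tinker provides.

```python
from typing import NamedTuple, Optional


class Model(NamedTuple):
    name: str
    training_type: str
    architecture: str
    size: str


# A few rows copied from the listing above; extend as needed.
MODELS = [
    Model("Qwen/Qwen3-30B-A3B", "Hybrid", "MoE", "Medium"),
    Model("Qwen/Qwen3-30B-A3B-Instruct-2507", "Instruction", "MoE", "Medium"),
    Model("Qwen/Qwen3-32B", "Hybrid", "Dense", "Medium"),
    Model("Qwen/Qwen3-8B", "Hybrid", "Dense", "Small"),
    Model("meta-llama/Llama-3.1-8B-Instruct", "Instruction", "Dense", "Small"),
    Model("openai/gpt-oss-20b", "Reasoning", "MoE", "Small"),
]


def find_models(
    training_type: Optional[str] = None,
    architecture: Optional[str] = None,
    size: Optional[str] = None,
) -> list:
    """Return names of models matching every filter that is not None."""
    return [
        m.name
        for m in MODELS
        if (training_type is None or m.training_type == training_type)
        and (architecture is None or m.architecture == architecture)
        and (size is None or m.size == size)
    ]


print(find_models(architecture="MoE", size="Medium"))
```

Because the listing is expected to change as new models are released, a hand-copied list like this can go stale and should be checked against the current table.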

Legend

Training Types

  • Base: Foundation models trained on raw text data, suitable for post-training research and custom fine-tuning.
  • Instruction: Models fine-tuned for following instructions and chat, optimized for fast inference.
  • Reasoning: Models that always produce chain-of-thought reasoning before the visible output that answers the prompt.
  • Hybrid: Models that can operate in both thinking and non-thinking modes. Non-thinking mode requires a special renderer or argument that disables chain-of-thought.
  • Vision: Vision-language models (VLMs) that can process images alongside text. See Vision Inputs for usage.

Architecture

  • Dense: Standard transformer architecture in which all parameters are active for every token.
  • MoE: Mixture of Experts architecture with sparse activation, in which only a subset of the parameters is active for each token.

Model Sizes

  • Compact: 1B-4B parameters
  • Small: 8B parameters
  • Medium: 30B-32B parameters
  • Large: 70B+ parameters

Note that MoE models are much more cost-effective than dense models because their cost is proportional to the number of active parameters, not the total number of parameters.
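
To make the active-versus-total distinction concrete, the sketch below estimates the fraction of parameters that are active from a model name. It assumes that a suffix such as `-235B-A22B` means roughly 235B total and 22B active parameters, which is the usual reading of the Qwen names in the listing; the other MoE models listed (gpt-oss, DeepSeek, Kimi) do not encode this in their names. The function `active_fraction` is a hypothetical helper, and the ratio is a rough indicator only, not a price.

```python
import re

# Assumed naming pattern: "-<total>B-A<active>B", e.g. "-235B-A22B".
_MOE_PATTERN = re.compile(r"-(\d+(?:\.\d+)?)B-A(\d+(?:\.\d+)?)B")


def active_fraction(model_name: str):
    """Return the active/total parameter ratio, or None if the name has no such suffix."""
    match = _MOE_PATTERN.search(model_name)
    if match is None:
        return None
    total_b = float(match.group(1))
    active_b = float(match.group(2))
    return active_b / total_b


for name in [
    "Qwen/Qwen3-235B-A22B-Instruct-2507",
    "Qwen/Qwen3-30B-A3B",
    "Qwen/Qwen3-32B",
]:
    print(name, active_fraction(name))
```

Under that assumption, Qwen/Qwen3-30B-A3B activates about a tenth of its parameters per token, while a dense model such as Qwen/Qwen3-32B (for which the helper returns None) uses all of them.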