Models & Pricing

All prices are per million tokens.

| Model | Tinker ID | Type | Arch | Size | Context | Prefill | Sample | Train |
|---|---|---|---|---|---|---|---|---|
| Nemotron-3-Nano-30B-A3B | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | Hybrid | MoE | Medium | 64K | $0.13 | $0.33 | $0.40 |
| Nemotron-3-Super-120B-A12B | nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | Hybrid | MoE | Large | 64K | $0.38 | $0.96 | $1.16 |
| Nemotron-3-Super-120B-A12B (256K) | nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16:peft:262144 | Hybrid | MoE | Large | 256K | $0.76 | $1.92 | $2.32 |
| Qwen3.5-397B-A17B | Qwen/Qwen3.5-397B-A17B | Hybrid + Vision | MoE | Large | 64K | $2.00 | $5.00 | $6.00 |
| Qwen3.5-397B-A17B (256K) | Qwen/Qwen3.5-397B-A17B:peft:262144 | Hybrid + Vision | MoE | Large | 256K | $4.00 | $10.00 | $12.00 |
| Qwen3.5-35B-A3B | Qwen/Qwen3.5-35B-A3B | Hybrid + Vision | MoE | Medium | 64K | $0.36 | $0.89 | $1.07 |
| Qwen3.5-27B | Qwen/Qwen3.5-27B | Hybrid + Vision | Dense | Medium | 64K | $1.24 | $3.73 | $3.73 |
| Qwen3.5-4B | Qwen/Qwen3.5-4B | Hybrid + Vision | Dense | Compact | 64K | $0.22 | $0.67 | $0.67 |
| Qwen3-VL-235B-A22B-Instruct | Qwen/Qwen3-VL-235B-A22B-Instruct | Vision | MoE | Large | 32K | $0.43 | $1.07 | $1.29 |
| Qwen3-VL-30B-A3B-Instruct | Qwen/Qwen3-VL-30B-A3B-Instruct | Vision | MoE | Medium | 32K | $0.12 | $0.30 | $0.36 |
| Qwen3-235B-A22B-Instruct-2507 | Qwen/Qwen3-235B-A22B-Instruct-2507 | Instruction | MoE | Large | 32K | $0.43 | $1.07 | $1.29 |
| Qwen3-30B-A3B-Instruct-2507 | Qwen/Qwen3-30B-A3B-Instruct-2507 | Instruction | MoE | Medium | 32K | $0.12 | $0.30 | $0.36 |
| Qwen3-30B-A3B | Qwen/Qwen3-30B-A3B | Hybrid | MoE | Medium | 32K | $0.12 | $0.30 | $0.36 |
| Qwen3-30B-A3B-Base | Qwen/Qwen3-30B-A3B-Base | Base | MoE | Medium | 32K | $0.12 | $0.30 | $0.36 |
| Qwen3-32B | Qwen/Qwen3-32B | Hybrid | Dense | Medium | 32K | $0.44 | $1.33 | $1.33 |
| Qwen3-8B | Qwen/Qwen3-8B | Hybrid | Dense | Small | 32K | $0.13 | $0.40 | $0.40 |
| Qwen3-8B-Base | Qwen/Qwen3-8B-Base | Base | Dense | Small | 32K | $0.13 | $0.40 | $0.40 |
| Qwen3-4B-Instruct-2507 | Qwen/Qwen3-4B-Instruct-2507 | Instruction | Dense | Compact | 32K | $0.07 | $0.22 | $0.22 |
| GPT-OSS-120B | openai/gpt-oss-120b | Reasoning | MoE | Medium | 32K | $1.59 | $4.76 | $5.72 |
| GPT-OSS-120B (128K) | openai/gpt-oss-120b:peft:131072 | Reasoning | MoE | Medium | 128K | $3.18 | $9.52 | $11.44 |
| GPT-OSS-20B | openai/gpt-oss-20b | Reasoning | MoE | Small | 32K | $0.27 | $0.80 | $0.96 |
| DeepSeek-V3.1 | deepseek-ai/DeepSeek-V3.1 | Hybrid | MoE | Large | 64K | $0.68 | $1.71 | $2.06 |
| DeepSeek-V3.1-Base | deepseek-ai/DeepSeek-V3.1-Base | Base | MoE | Large | 64K | $0.68 | $1.71 | $2.06 |
| Llama-3.1-70B | meta-llama/Llama-3.1-70B | Base | Dense | Large | 32K | $1.51 | $4.53 | $4.53 |
| Llama-3.3-70B-Instruct | meta-llama/Llama-3.3-70B-Instruct | Instruction | Dense | Large | 32K | $1.51 | $4.53 | $4.53 |
| Llama-3.1-8B | meta-llama/Llama-3.1-8B | Base | Dense | Small | 32K | $0.13 | $0.40 | $0.40 |
| Llama-3.1-8B-Instruct | meta-llama/Llama-3.1-8B-Instruct | Instruction | Dense | Small | 32K | $0.13 | $0.40 | $0.40 |
| Llama-3.2-3B | meta-llama/Llama-3.2-3B | Base | Dense | Compact | 32K | $0.07 | $0.20 | $0.20 |
| Llama-3.2-1B | meta-llama/Llama-3.2-1B | Base | Dense | Compact | 32K | $0.04 | $0.13 | $0.13 |
| Kimi-K2-Thinking | moonshotai/Kimi-K2-Thinking | Reasoning | MoE | Large | 64K | $0.68 | $1.71 | $2.06 |
| Kimi-K2.5 | moonshotai/Kimi-K2.5 | Reasoning + Vision | MoE | Large | 32K | $0.68 | $1.71 | $2.06 |
| Kimi-K2.5 (128K) | moonshotai/Kimi-K2.5:peft:131072 | Reasoning + Vision | MoE | Large | 128K | $1.36 | $3.42 | $4.12 |

Pricing Terms

  • Prefill: Processing input/prompt tokens (forward pass only)
  • Sample: Generating output tokens (forward pass + sampling)
  • Train: Forward and backward pass for gradient computation
  • Context: Maximum sequence length. Models with :peft: suffix support extended context at higher prices.
  • Tinker ID: The exact string to pass to create_lora_training_client(base_model=...) or create_sampling_client(base_model=...).
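The `:peft:` suffix pattern above can be sketched in code. This is a minimal illustration, not the official client: the `extended_context_id` helper is hypothetical, and the commented-out `tinker.ServiceClient` calls (which require an API key) are assumed from the function names in the bullet list above.

```python
def extended_context_id(base_id: str, context_tokens: int) -> str:
    """Build the Tinker ID for an extended-context variant by appending
    the :peft:<tokens> suffix, as shown in the table above."""
    return f"{base_id}:peft:{context_tokens}"

# 128K-context variant of GPT-OSS-120B:
model_id = extended_context_id("openai/gpt-oss-120b", 131072)
print(model_id)  # openai/gpt-oss-120b:peft:131072

# The resulting string is what you would pass as base_model, e.g.:
# import tinker
# service_client = tinker.ServiceClient()
# training_client = service_client.create_lora_training_client(base_model=model_id)
```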

MoE models are priced by active parameters, making them significantly more cost-effective than dense models of similar quality.
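Because every price is quoted per million tokens, estimating a job's cost is a straightforward weighted sum. The helper below is a hypothetical sketch (not part of the Tinker API), using the Qwen3-30B-A3B rates from the table.

```python
def estimate_cost_usd(prefill_tokens: int, sample_tokens: int, train_tokens: int,
                      prefill_price: float, sample_price: float, train_price: float) -> float:
    """Total cost in dollars, given token counts and $/million-token prices."""
    return (prefill_tokens * prefill_price
            + sample_tokens * sample_price
            + train_tokens * train_price) / 1_000_000

# Qwen3-30B-A3B: $0.12 prefill, $0.30 sample, $0.36 train per million tokens.
# 2M prefill + 0.5M sampled + 1M trained tokens:
cost = estimate_cost_usd(2_000_000, 500_000, 1_000_000, 0.12, 0.30, 0.36)
print(f"${cost:.2f}")  # $0.75
```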

Choosing a Model

  • Cost-effective: Use MoE models
  • Research/post-training: Use Base models
  • Task-specific fine-tuning: Start with an Instruction or Hybrid model
  • Low latency: Use Instruction models (no chain-of-thought)
  • High intelligence: Use Reasoning or Hybrid models (chain-of-thought)
  • Vision tasks: Use models with Vision in the type
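The selection criteria above can be applied programmatically. A hypothetical sketch, filtering a few rows copied from the pricing table by type and architecture and returning the cheapest match by train price:

```python
# (tinker_id, type, arch, train price in $/million tokens) — from the table above.
MODELS = [
    ("Qwen/Qwen3-30B-A3B-Base", "Base", "MoE", 0.36),
    ("Qwen/Qwen3-8B-Base", "Base", "Dense", 0.40),
    ("meta-llama/Llama-3.2-1B", "Base", "Dense", 0.13),
    ("deepseek-ai/DeepSeek-V3.1-Base", "Base", "MoE", 2.06),
]

def cheapest(models, model_type=None, arch=None):
    """Cheapest Tinker ID (by train price) matching the given filters."""
    candidates = [m for m in models
                  if (model_type is None or m[1] == model_type)
                  and (arch is None or m[2] == arch)]
    return min(candidates, key=lambda m: m[3])[0] if candidates else None

print(cheapest(MODELS, model_type="Base", arch="MoE"))  # Qwen/Qwen3-30B-A3B-Base
```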

For the latest pricing, see the Tinker Console.