
Recipes

Ready-to-run training recipes built on Tinker and Tinker Cookbook.

Getting started

Minimal launch scripts for building your own experiments:

  • rl_basic.py — template for reinforcement learning
  • sl_basic.py — template for supervised learning
  • rl_loop.py — minimal RL loop using the Tinker API directly
  • sl_loop.py — minimal SL loop using the Tinker API directly

All recipes

  • Chat SFT — Supervised fine-tuning on conversational datasets (NoRobots, Tulu3)
  • Math RL — RL for math reasoning on arithmetic, MATH, and GSM8K
  • Code RL — RL on competitive programming with sandboxed execution (DeepCoder)
  • Preference Learning — RLHF and DPO pipelines for aligning with human preferences
  • Search Tool RL — Tool-use RL for multi-hop QA with vector search (Search-R1)
  • Prompt Distillation — Internalize long prompts into model parameters
  • Multi-Agent RL — Multi-turn and self-play environments (Guess Number, 20 Questions, Tic-Tac-Toe)
  • Model Distillation — Off-policy SFT and on-policy KL distillation from teacher models
  • Rubric-Based Grading — LLM-as-judge rewards using structured rubrics
  • Verifiers RL — RL with environments from Prime Intellect's Environments Hub
  • VLM Image Classification — Fine-tune vision-language models as image classifiers
  • Harbor RL — RL on Harbor tasks (Terminal-Bench) with sandboxed bash agents
  • Agent RL — Train tool-using agents with MCP servers and LLM-judge grading (APEX)

Logging and resuming

All recipes support these CLI arguments:

  • wandb_project — log to Weights & Biases (omit for local-only logging)
  • log_path — custom output directory (default: /tmp/tinker-examples/<run_name>)

Each run writes two files under log_path:

  • {log_path}/metrics.jsonl — training metrics
  • {log_path}/checkpoints.jsonl — saved checkpoint records

To resume a run, pass the same log_path from a previous run.
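Because metrics.jsonl is a JSON Lines file (one JSON object per line), it is easy to inspect with a few lines of standard-library Python. A minimal sketch — the field names shown in the comment, such as step and loss, are assumptions for illustration, not the recipes' actual record schema:

```python
import json


def read_metrics(path):
    """Read a JSONL metrics file into a list of dicts, one per non-empty line."""
    records = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Example (hypothetical run directory and field names):
# metrics = read_metrics("/tmp/tinker-examples/my-run/metrics.jsonl")
# for rec in metrics:
#     print(rec.get("step"), rec.get("loss"))
```

The same helper works for checkpoints.jsonl, since it uses the same line-per-record format.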