
Recipes

Ready-to-run training recipes built on Tinker and Tinker Cookbook.

Getting started

Minimal launch scripts for building your own experiments:

  • rl_basic.py — template for reinforcement learning
  • sl_basic.py — template for supervised learning
  • rl_loop.py — minimal RL loop using the Tinker API directly
  • sl_loop.py — minimal SL loop using the Tinker API directly

All recipes

  • Chat SFT — Supervised fine-tuning on conversational datasets (NoRobots, Tulu3)
  • Math RL — RL for math reasoning on arithmetic, MATH, and GSM8K
  • Code RL — RL on competitive programming with sandboxed execution (DeepCoder)
  • Preference Learning — RLHF and DPO pipelines for aligning with human preferences
  • Search Tool RL — Tool-use RL for multi-hop QA with vector search (Search-R1)
  • Prompt Distillation — Internalize long prompts into model parameters
  • Multi-Agent RL — Multi-turn and self-play environments (Guess Number, 20 Questions, Tic-Tac-Toe)
  • Model Distillation — Off-policy SFT and on-policy KL distillation from teacher models
  • Rubric-Based Grading — LLM-as-judge rewards using structured rubrics
  • Verifiers RL — RL with environments from Prime Intellect's Environments Hub
  • VLM Image Classification — Fine-tune vision-language models as image classifiers
  • Harbor RL — RL on Harbor tasks (Terminal-Bench) with sandboxed bash agents
  • Agent RL — Train tool-using agents with MCP servers and LLM-judge grading (APEX)

Logging and resuming

All recipes support these CLI arguments:

  • wandb_project — log to Weights & Biases (omit for local-only logging)
  • log_path — custom output directory (default: /tmp/tinker-examples/<run_name>)

Each run writes two files under log_path:

  • {log_path}/metrics.jsonl — training metrics
  • {log_path}/checkpoints.jsonl — saved checkpoint records

To resume a run, pass the same log_path from a previous run.
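Because metrics.jsonl is a JSON Lines file (one JSON object per line), it is easy to inspect with a few lines of standard-library Python. A minimal sketch — the field names shown in the comment, such as step and loss, are assumptions for illustration, not the recipes' actual record schema:

```python
import json


def read_metrics(path):
    """Read a JSONL metrics file into a list of dicts, one per non-empty line."""
    records = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Example (hypothetical run directory and field names):
# metrics = read_metrics("/tmp/tinker-examples/my-run/metrics.jsonl")
# for rec in metrics:
#     print(rec.get("step"), rec.get("loss"))
```

The same helper works for checkpoints.jsonl, since it uses the same line-per-record format.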