# Recipes
Ready-to-run training recipes built on Tinker and Tinker Cookbook.
## Getting started
Minimal launch scripts for building your own experiments:
- `rl_basic.py` — template for reinforcement learning
- `sl_basic.py` — template for supervised learning
- `rl_loop.py` — minimal RL loop using the Tinker API directly
- `sl_loop.py` — minimal SL loop using the Tinker API directly
## All recipes
| Recipe | Description |
|---|---|
| Chat SFT | Supervised fine-tuning on conversational datasets (NoRobots, Tulu3) |
| Math RL | RL for math reasoning on arithmetic, MATH, and GSM8K |
| Code RL | RL on competitive programming with sandboxed execution (DeepCoder) |
| Preference Learning | RLHF and DPO pipelines for aligning with human preferences |
| Search Tool RL | Tool-use RL for multi-hop QA with vector search (Search-R1) |
| Prompt Distillation | Internalize long prompts into model parameters |
| Multi-Agent RL | Multi-turn and self-play environments (Guess Number, 20 Questions, Tic-Tac-Toe) |
| Model Distillation | Off-policy SFT and on-policy KL distillation from teacher models |
| Rubric-Based Grading | LLM-as-judge rewards using structured rubrics |
| Verifiers RL | RL with environments from Prime Intellect's Environments Hub |
| VLM Image Classification | Fine-tune vision-language models as image classifiers |
| Harbor RL | RL on Harbor tasks (Terminal-Bench) with sandboxed bash agents |
| Agent RL | Train tool-using agents with MCP servers and LLM-judge grading (APEX) |
## Logging and resuming
All recipes support these CLI arguments:

- `wandb_project` — log to Weights & Biases (omit for local-only logging)
- `log_path` — custom output directory (default: `/tmp/tinker-examples/<run_name>`)

Each run writes two files under its output directory:

- `{log_path}/metrics.jsonl` — training metrics
- `{log_path}/checkpoints.jsonl` — saved checkpoint records
To resume a run, rerun the recipe with the same `log_path` as the earlier run.
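Since `metrics.jsonl` is a JSON Lines file (one JSON object per line), it is easy to inspect without W&B. The sketch below is a hypothetical example: the filename comes from this page, but the record schema (`step`, `loss` keys) and the run directory name are assumptions for illustration only.

```python
import json
from pathlib import Path


def load_metrics(log_path):
    """Parse metrics.jsonl (one JSON object per line) into a list of dicts."""
    records = []
    with open(Path(log_path) / "metrics.jsonl") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Demo: write a tiny fake metrics file, then read it back.
# A real run's records will have whatever fields the recipe logs.
log_path = Path("/tmp/tinker-examples/demo-run")
log_path.mkdir(parents=True, exist_ok=True)
with open(log_path / "metrics.jsonl", "w") as f:
    f.write('{"step": 1, "loss": 0.9}\n{"step": 2, "loss": 0.7}\n')

metrics = load_metrics(log_path)
print(metrics[-1]["loss"])  # value from the most recent record
```

The same pattern works for `checkpoints.jsonl`, whose records point at saved checkpoints.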