Tutorials
A progressive learning path from your first API call to advanced training techniques. All tutorials are marimo notebooks; read them here or run them interactively.
Prerequisites
Install the Tinker SDK, cookbook, and marimo:
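A typical install looks like the following (the `tinker` and `marimo` package names are assumed to be the published PyPI names; check the Tinker docs if they differ):

```shell
# Install the Tinker SDK and marimo into your environment.
pip install tinker marimo

# The cookbook is used from a clone of its repo (see below); to make its
# modules importable, install it in editable mode from inside the clone:
# pip install -e .
```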
Set your API key (get one from the Tinker Console):
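For example, assuming the SDK reads the key from a `TINKER_API_KEY` environment variable (replace the placeholder with your real key):

```shell
# Export the key from the Tinker Console so the SDK can pick it up.
export TINKER_API_KEY="your-api-key-here"
```

Add this to your shell profile (e.g. `~/.bashrc`) to persist it across sessions.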
Run tutorials interactively
All tutorials are marimo notebooks in the tinker-cookbook repo. Clone the repo and run any tutorial:
```shell
git clone https://github.com/thinking-machines-lab/tinker-cookbook.git
cd tinker-cookbook
uv run marimo edit tutorials/101_hello_tinker.py
```
Or download a single file and run it directly:
```shell
curl -O https://raw.githubusercontent.com/thinking-machines-lab/tinker-cookbook/main/tutorials/101_hello_tinker.py
uv run marimo edit 101_hello_tinker.py
```
Basics
Start here. Learn the core Tinker SDK operations.
- Hello Tinker (Beginner): ServiceClient, SamplingClient, and basic text generation
- Your First SFT (Beginner): Renderers, Datum construction, forward-backward, optimizer step
- Async Patterns (Beginner): Futures, concurrent requests, throughput optimization
- First RL (Beginner): GRPO algorithm, reward functions, GSM8K math training
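The First RL tutorial centers on GRPO, whose core idea is group-relative advantages: sample several completions per prompt, score each with a reward function, and normalize each reward against its group. A minimal, SDK-free sketch of that normalization (illustrative only, not the cookbook's implementation):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against its group's mean and standard deviation,
    as in GRPO: completions better than the group average get a positive
    advantage, worse ones a negative one."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to one math problem, scored 1.0 if correct else 0.0:
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct answers share a positive advantage, wrong ones a negative one.
```

Because advantages are centered within each group, a prompt where every sample succeeds (or every sample fails) contributes no gradient signal, which is why group composition matters in GRPO.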
Core Concepts
Deep dives into specific SDK features. Each tutorial is standalone.
- Rendering (Intermediate): How tokenization and chat templates work across model families
- Loss Functions (Intermediate): Cross-entropy, importance sampling, PPO, custom losses
- Completers (Intermediate): TokenCompleter vs MessageCompleter for RL environments
- Weights Management (Intermediate): Save, load, download, and publish model weights
- Evaluations (Intermediate): Evaluate your fine-tuned models during and after training
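As a taste of what the Loss Functions tutorial covers, here is a token-level importance-sampling policy-gradient loss in plain Python (a numeric sketch only; the SDK's actual loss API differs, and in a real framework the gradient flows through the ratio):

```python
import math

def importance_sampling_loss(new_logprobs, old_logprobs, advantages):
    """Per-token policy-gradient loss weighted by importance ratios
    exp(logp_new - logp_old), averaged over tokens. When the sampling
    and training policies coincide, every ratio is 1 and this reduces
    to plain REINFORCE."""
    losses = []
    for lp_new, lp_old, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(lp_new - lp_old)
        losses.append(-ratio * adv)
    return sum(losses) / len(losses)

# With identical policies the ratios are 1, so the loss is -mean(advantage):
loss = importance_sampling_loss([-1.0, -2.0], [-1.0, -2.0], [0.5, -0.5])
# loss == 0.0 here because the advantages cancel
```

PPO's clipped objective is this same ratio-weighted term with the ratio clamped to `[1 - eps, 1 + eps]`, which the tutorial builds up from this base case.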
Cookbook Abstractions
Learn the higher-level training patterns from tinker-cookbook.
- Env & EnvGroupBuilder (Intermediate): Core RL types (Env, EnvGroupBuilder, RLDataset, ProblemEnv)
- Custom Environment (Intermediate): Implementing a custom ProblemEnv subclass with format compliance
- SFT with Config (Intermediate): Using train.Config and dataset builders for supervised learning
- RL with Config (Intermediate): Full GRPO implementation using cookbook abstractions
Advanced
Advanced techniques for experienced users.
- SL Hyperparameters (Advanced): Sweep learning rate and LoRA rank with the cookbook's sweep module
- RL Hyperparameters (Advanced): KL penalty, advantage estimation, reward shaping
- DPO & Preferences (Advanced): Direct preference optimization, RLHF pipeline
- Sequence Extension (Advanced): Multi-turn RL with conversation history
- Multi-Agent RL (Advanced): Self-play, competitive environments
- Prompt Distillation (Advanced): Distilling long system prompts into model weights
- RLHF Pipeline (Advanced): Full three-stage RLHF (SFT → preference model → RL training)
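The DPO & Preferences tutorial is built around the DPO objective, which scores a preference pair by how much more the policy (relative to a frozen reference model) prefers the chosen response over the rejected one. A self-contained numeric sketch of the per-pair loss (illustrative only, not the tutorial's implementation):

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair: -log(sigmoid(beta * margin)),
    where the margin is the policy's log-prob advantage for the chosen
    response minus the reference model's advantage for the same pair."""
    margin = (policy_chosen_lp - ref_chosen_lp) - (policy_rejected_lp - ref_rejected_lp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# A policy that already prefers the chosen response gets a lower loss
# than one that is indifferent between the two:
aligned = dpo_loss(-1.0, -5.0, -3.0, -3.0)      # margin = +4
indifferent = dpo_loss(-3.0, -3.0, -3.0, -3.0)  # margin = 0, loss = log(2)
```

Because the reference log-probs enter only through the margin, DPO needs no reward model or on-policy sampling, which is why it is often the lightest-weight entry point into preference tuning.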
Deployment
Getting trained models into production.