Tutorials
A progressive learning path from your first API call to advanced training techniques. All tutorials are marimo notebooks; read them here or run them interactively.
Prerequisites
Install the Tinker SDK, cookbook, and marimo:
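A typical install looks like the following (the `tinker` and `marimo` package names are assumed to be the published PyPI names; check the Tinker docs if they differ):

```shell
# Install the Tinker SDK and marimo into your environment.
pip install tinker marimo

# The cookbook is used from a clone of its repo (see below); to make its
# modules importable, install it in editable mode from inside the clone:
# pip install -e .
```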
Set your API key (get one from the Tinker Console):
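For example, assuming the SDK reads the key from a `TINKER_API_KEY` environment variable (replace the placeholder with your real key):

```shell
# Export the key from the Tinker Console so the SDK can pick it up.
export TINKER_API_KEY="your-api-key-here"
```

Add this to your shell profile (e.g. `~/.bashrc`) to persist it across sessions.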
Run tutorials interactively
All tutorials are marimo notebooks in the tinker-cookbook repo. Clone the repo and run any tutorial:
```shell
git clone https://github.com/thinking-machines-lab/tinker-cookbook.git
cd tinker-cookbook
uv run marimo edit tutorials/101_hello_tinker.py
```
Or download a single file and run it directly:
```shell
curl -O https://raw.githubusercontent.com/thinking-machines-lab/tinker-cookbook/main/tutorials/101_hello_tinker.py
uv run marimo edit 101_hello_tinker.py
```
Basics
Start here. Learn the core Tinker SDK operations.
- Hello Tinker (Beginner): ServiceClient, SamplingClient, and basic text generation
- Your First SFT (Beginner): Renderers, Datum construction, forward-backward, optimizer step
- Async Patterns (Beginner): Futures, concurrent requests, throughput optimization
- First RL (Beginner): GRPO algorithm, reward functions, GSM8K math training
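The First RL tutorial centers on GRPO, whose core idea is group-relative advantages: sample several completions per prompt, score each with a reward function, and normalize each reward against its group. A minimal, SDK-free sketch of that normalization (illustrative only, not the cookbook's implementation):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against its group's mean and standard deviation,
    as in GRPO: completions better than the group average get a positive
    advantage, worse ones a negative one."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to one math problem, scored 1.0 if correct else 0.0:
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct answers share a positive advantage, wrong ones a negative one.
```

Because advantages are centered within each group, a prompt where every sample succeeds (or every sample fails) contributes no gradient signal, which is why group composition matters in GRPO.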
Core Concepts
Deep dives into specific SDK features. Each tutorial is standalone.
- Rendering (Intermediate): How tokenization and chat templates work across model families
- Loss Functions (Intermediate): Cross-entropy, importance sampling, PPO, custom losses
- Completers (Intermediate): TokenCompleter vs MessageCompleter for RL environments
- Weights Management (Intermediate): Save, load, download, and publish model weights
- Evaluations (Intermediate): Evaluate your fine-tuned models during and after training
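As a taste of what the Loss Functions tutorial covers, here is a token-level importance-sampling policy-gradient loss in plain Python (a numeric sketch only; the SDK's actual loss API differs, and in a real framework the gradient flows through the ratio):

```python
import math

def importance_sampling_loss(new_logprobs, old_logprobs, advantages):
    """Per-token policy-gradient loss weighted by importance ratios
    exp(logp_new - logp_old), averaged over tokens. When the sampling
    and training policies coincide, every ratio is 1 and this reduces
    to plain REINFORCE."""
    losses = []
    for lp_new, lp_old, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(lp_new - lp_old)
        losses.append(-ratio * adv)
    return sum(losses) / len(losses)

# With identical policies the ratios are 1, so the loss is -mean(advantage):
loss = importance_sampling_loss([-1.0, -2.0], [-1.0, -2.0], [0.5, -0.5])
# loss == 0.0 here because the advantages cancel
```

PPO's clipped objective is this same ratio-weighted term with the ratio clamped to `[1 - eps, 1 + eps]`, which the tutorial builds up from this base case.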
Cookbook Abstractions
Learn the higher-level training patterns from tinker-cookbook.
- Env & EnvGroupBuilder (Intermediate): Core RL types (Env, EnvGroupBuilder, RLDataset, ProblemEnv)
- Custom Environment (Intermediate): Implementing a custom ProblemEnv subclass with format compliance
- SFT with Config (Intermediate): Using train.Config and dataset builders for supervised learning
- RL with Config (Intermediate): Full GRPO implementation using cookbook abstractions
Advanced
Advanced techniques for experienced users.
- SL Hyperparameters (Advanced): Sweep learning rate and LoRA rank with the cookbook's sweep module
- RL Hyperparameters (Advanced): KL penalty, advantage estimation, reward shaping
- DPO & Preferences (Advanced): Direct preference optimization, RLHF pipeline
- Sequence Extension (Advanced): Multi-turn RL with conversation history
- Multi-Agent RL (Advanced): Self-play, competitive environments
- Prompt Distillation (Advanced): Distilling long system prompts into model weights
- RLHF Pipeline (Advanced): Full three-stage RLHF (SFT → preference model → RL training)
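The DPO & Preferences tutorial is built around the DPO objective, which scores a preference pair by how much more the policy (relative to a frozen reference model) prefers the chosen response over the rejected one. A self-contained numeric sketch of the per-pair loss (illustrative only, not the tutorial's implementation):

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair: -log(sigmoid(beta * margin)),
    where the margin is the policy's log-prob advantage for the chosen
    response minus the reference model's advantage for the same pair."""
    margin = (policy_chosen_lp - ref_chosen_lp) - (policy_rejected_lp - ref_rejected_lp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# A policy that already prefers the chosen response gets a lower loss
# than one that is indifferent between the two:
aligned = dpo_loss(-1.0, -5.0, -3.0, -3.0)      # margin = +4
indifferent = dpo_loss(-3.0, -3.0, -3.0, -3.0)  # margin = 0, loss = log(2)
```

Because the reference log-probs enter only through the margin, DPO needs no reward model or on-policy sampling, which is why it is often the lightest-weight entry point into preference tuning.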
Deployment
Getting trained models into production.