Tutorial 303: SFT with Config

Prerequisites

Run it interactively

curl -O https://raw.githubusercontent.com/thinking-machines-lab/tinker-cookbook/main/tutorials/303_sft_with_config.py && marimo edit 303_sft_with_config.py

Configure and run a full SFT pipeline using train.Config, ChatDatasetBuilder, and evaluator builders -- zero custom loop code.

The cookbook's supervised training module provides a complete pipeline:

ChatDatasetBuilder -- loads and tokenizes chat data
train.Config -- bundles all hyperparameters
train.main(config) -- runs the pipelined training loop with checkpointing, evaluation, and logging

This is the recommended way to run SFT when you do not need a custom training loop.

Step 1 -- Define a ChatDatasetBuilder

A ChatDatasetBuilder converts raw data into tokenized Datum batches. We will create a simple instruction-following dataset inline.

import chz
import datasets

from tinker_cookbook.supervised.common import datum_from_model_input_weights
from tinker_cookbook.supervised.data import SupervisedDatasetFromHFDataset
from tinker_cookbook.supervised.types import (
    ChatDatasetBuilder,
    ChatDatasetBuilderCommonConfig,
    SupervisedDataset,
)

# Create a simple instruction-following dataset
EXAMPLES = [
    {
        "messages": [
            {"role": "user", "content": "What is 2 + 3?"},
            {"role": "assistant", "content": "2 + 3 = 5"},
        ]
    },
    {
        "messages": [
            {"role": "user", "content": "Translate 'hello' to French."},
            {"role": "assistant", "content": "Bonjour"},
        ]
    },
    {
        "messages": [
            {"role": "user", "content": "What color is the sky?"},
            {"role": "assistant", "content": "The sky is blue."},
        ]
    },
] * 10  # Repeat for a small dataset

@chz.chz
class SimpleDatasetBuilder(ChatDatasetBuilder):
    """Builds a toy instruction-following dataset."""

    def __call__(self) -> tuple[SupervisedDataset, SupervisedDataset | None]:
        hf_dataset = datasets.Dataset.from_list(EXAMPLES)
        renderer = self.renderer

        def example_to_data(example):
            model_input, weights = renderer.build_supervised_example(example["messages"])
            return [
                datum_from_model_input_weights(
                    model_input, weights, max_length=self.common_config.max_length
                )
            ]

        train_ds = SupervisedDatasetFromHFDataset(
            hf_dataset, batch_size=self.common_config.batch_size, flatmap_fn=example_to_data
        )
        # Return (train, eval); this toy dataset has no held-out eval split.
        return train_ds, None
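To make the (model_input, weights) pair more concrete, here is a toy sketch of the usual idea behind supervised weights: prompt tokens get weight 0 and assistant tokens get weight 1, so only the assistant's reply contributes to the loss. This is an illustration under stated assumptions, not the cookbook's renderer. The whitespace tokenizer, the role markers, and the function name toy_supervised_example are all invented for this example; a real renderer applies the model's chat template and tokenizer.

```python
# Toy illustration only: whitespace "tokens" stand in for a real tokenizer,
# and the <role> markers are invented for this sketch.
def toy_supervised_example(messages, max_length):
    tokens, weights = [], []
    for message in messages:
        content_tokens = message["content"].split()
        # Only assistant content is trained on; everything else gets weight 0.
        content_weight = 1.0 if message["role"] == "assistant" else 0.0
        tokens.append(f"<{message['role']}>")
        weights.append(0.0)  # role markers are never trained on in this sketch
        tokens.extend(content_tokens)
        weights.extend([content_weight] * len(content_tokens))
    # Truncate both sequences together so they stay aligned.
    return tokens[:max_length], weights[:max_length]


messages = [
    {"role": "user", "content": "What is 2 + 3?"},
    {"role": "assistant", "content": "2 + 3 = 5"},
]
tokens, weights = toy_supervised_example(messages, max_length=512)
for token, weight in zip(tokens, weights):
    print(f"{token:>12}  {weight}")
```

The point to take away is that tokens and weights always have the same length, and that truncation to max_length must be applied to both.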

Step 2 -- Build the Config

train.Config bundles the model name, dataset builder, learning rate, evaluation settings, and checkpoint paths. The train.main function handles the entire loop.

from tinker_cookbook.supervised import train

MODEL_NAME = "Qwen/Qwen3.5-4B"
LOG_PATH = "/tmp/tinker-tutorials/sft-config"

dataset_builder = SimpleDatasetBuilder(
    common_config=ChatDatasetBuilderCommonConfig(
        model_name_for_tokenizer=MODEL_NAME,
        renderer_name="qwen3_5_disable_thinking",
        max_length=512,
        batch_size=4,
    ),
)

config = train.Config(
    log_path=LOG_PATH,
    model_name=MODEL_NAME,
    recipe_name="tutorial_sft",
    dataset_builder=dataset_builder,
    learning_rate=1e-4,
    lr_schedule="linear",
    num_epochs=1,
    lora_rank=32,
    save_every=5,
    eval_every=5,
    max_steps=10,  # Short run for the tutorial
)

print(f"Model:         {config.model_name}")
print(f"Learning rate: {config.learning_rate}")
print(f"LR schedule:   {config.lr_schedule}")
print(f"LoRA rank:     {config.lora_rank}")
print(f"Log path:      {config.log_path}")

Output

Model:         Qwen/Qwen3.5-4B
Learning rate: 0.0001
LR schedule:   linear
LoRA rank:     32
Log path:      /tmp/tinker-tutorials/sft-config
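As a rough check on how long this run is, the step count can be worked out from the dataset size and the config values above. This is plain arithmetic, not cookbook code. It assumes that max_steps acts as a cap on the number of optimizer steps, and, because this tutorial does not say whether a trailing partial batch is dropped or kept, it computes both cases.

```python
import math

num_examples = 30  # 3 distinct examples * 10 repeats, from EXAMPLES
batch_size = 4
num_epochs = 1
max_steps = 10

# Assumption: the handling of a trailing partial batch is not stated in the
# tutorial, so both possibilities are shown.
steps_if_dropped = (num_examples // batch_size) * num_epochs
steps_if_kept = math.ceil(num_examples / batch_size) * num_epochs

for label, steps in [
    ("partial batch dropped", steps_if_dropped),
    ("partial batch kept", steps_if_kept),
]:
    # Assumption: max_steps caps the run if the data would allow more steps.
    planned = min(steps, max_steps)
    print(f"{label}: {planned} steps")
```

Under either assumption, one epoch over 30 examples ends before step 10, so max_steps only becomes the limiting factor if you add data or raise num_epochs.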

Step 3 -- Run training

A single call to train.main(config) runs the full pipeline: dataset construction, client setup, pipelined forward-backward passes, optimizer steps, checkpointing, and evaluation.

import marimo as mo

api_key = mo.ui.text(kind="password", label="Paste your Tinker API key")
api_key  # noqa: B018

import os

mo.stop(
    "TINKER_API_KEY" not in os.environ and not api_key.value,
    "Paste your API key above",
)

if api_key.value:
    os.environ["TINKER_API_KEY"] = api_key.value

# Run the full SFT pipeline
await train.main(config)
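The bare await above works because marimo cells support top-level await. In a plain Python script, await outside a function is a syntax error, and the standard-library way to drive a coroutine is asyncio.run. The sketch below shows the pattern with a stand-in coroutine so that it runs without a Tinker API key. The name stand_in_main and its dict argument are hypothetical; in a real script you would presumably pass train.main(config) to asyncio.run instead, assuming train.main has no notebook-specific requirements.

```python
import asyncio


async def stand_in_main(config: dict) -> str:
    """Hypothetical placeholder for an async entry point such as train.main."""
    await asyncio.sleep(0)  # yields to the event loop, like real async I/O would
    return f"finished run for {config['model_name']}"


# asyncio.run creates an event loop, runs the coroutine to completion,
# and closes the loop afterwards.
result = asyncio.run(stand_in_main({"model_name": "Qwen/Qwen3.5-4B"}))
print(result)
```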

Step 4 -- Inspect outputs

After training, checkpoints and metrics are saved under log_path. The final checkpoint can be loaded for sampling or further training.

from pathlib import Path

log_dir = Path(LOG_PATH)
if log_dir.exists():
    for f in sorted(log_dir.iterdir()):
        print(f"  {f.name}")
else:
    print("(Log directory not found -- training may not have run)")

Output

  checkpoints.jsonl
  code.diff
  config.json
  logs.log
  metrics.jsonl
  timing_spans.jsonl
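The .jsonl files in this listing can be inspected with a few lines of standard-library Python. The sketch below assumes each non-empty line is one JSON object, which is what the extension suggests. It deliberately prints whatever field names it finds instead of assuming specific metric or checkpoint keys, since those are not documented in this tutorial. The helper name read_jsonl is introduced here for illustration.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return one parsed record per non-empty line of a JSON Lines file."""
    records = []
    with Path(path).open() as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


log_dir = Path("/tmp/tinker-tutorials/sft-config")
for name in ("metrics.jsonl", "checkpoints.jsonl"):
    path = log_dir / name
    if not path.exists():
        print(f"{name}: not found")
        continue
    records = read_jsonl(path)
    print(f"{name}: {len(records)} records")
    if records:
        # Show the field names of the most recent record without guessing them.
        print(f"  fields in last record: {sorted(records[-1])}")
```

Reading the last record of checkpoints.jsonl this way is a reasonable starting point for finding the final checkpoint mentioned above.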

Summary

The train.Config + train.main() pattern gives you a production-ready SFT pipeline with:

Pipelined GPU requests for throughput
LR scheduling (linear, cosine, constant)
Periodic checkpointing with TTL
Pluggable evaluator builders
Resume from checkpoint
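For intuition about the three schedule names in the list above, here is a sketch of their common textbook forms, expressed as a multiplier on the base learning rate. This is an assumption about what the names mean, not a copy of the cookbook's implementation, which may differ in details such as warmup or the final value. The function lr_multiplier is hypothetical.

```python
import math


def lr_multiplier(schedule: str, step: int, total_steps: int) -> float:
    """Hypothetical helper: textbook form of each schedule, no warmup."""
    progress = step / total_steps
    if schedule == "constant":
        return 1.0
    if schedule == "linear":
        # Decays in a straight line from 1 at step 0 toward 0 at total_steps.
        return 1.0 - progress
    if schedule == "cosine":
        # Half a cosine wave: slow decay at the start and end, faster mid-run.
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    raise ValueError(f"unknown schedule: {schedule}")


base_lr = 1e-4
total_steps = 10
for step in (0, 5, 9):
    row = ", ".join(
        f"{name}={base_lr * lr_multiplier(name, step, total_steps):.2e}"
        for name in ("constant", "linear", "cosine")
    )
    print(f"step {step}: {row}")
```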

For custom training logic, drop down to the manual loop shown in tutorial 102.