Chat SFT

Supervised fine-tuning on conversational datasets to turn a base model into a chat assistant.

What you'll build

A chat-capable model fine-tuned with LoRA on the NoRobots or Tulu3 dataset. Training uses standard next-token prediction loss, computed on assistant turns.

Prerequisites

uv pip install tinker-cookbook

Key concepts

  • Supervised fine-tuning (SFT) — train on human-written assistant responses using next-token prediction loss
  • LoRA — parameter-efficient fine-tuning that trains low-rank adapter weights instead of the full model
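Computing the loss only on assistant turns can be sketched as a per-token mask. The whitespace "tokenizer" and `build_loss_mask` helper below are illustrative only, not the cookbook's actual API:

```python
def build_loss_mask(messages, tokenize):
    """Return (tokens, mask): mask is 1 only for assistant tokens,
    so next-token prediction loss is taken on assistant turns alone."""
    tokens, mask = [], []
    for msg in messages:
        toks = tokenize(f"{msg['role']}: {msg['content']}\n")
        tokens.extend(toks)
        mask.extend([1 if msg["role"] == "assistant" else 0] * len(toks))
    return tokens, mask

# Toy whitespace "tokenizer" for illustration only.
tok = lambda s: s.split()

conv = [
    {"role": "user", "content": "Hi there"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]
tokens, mask = build_loss_mask(conv, tok)
# Only the 6 assistant-turn tokens contribute to the loss.
```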

Run it

NoRobots dataset

python -m tinker_cookbook.recipes.chat_sl.train \
    model_name=Qwen/Qwen3-8B-Base \
    dataset=no_robots \
    learning_rate=5e-4 \
    batch_size=64 \
    lora_rank=64 \
    eval_every=20 \
    save_every=20 \
    wandb_project=cookbook_sl

Tulu3 dataset

python -m tinker_cookbook.recipes.chat_sl.train \
    model_name=Qwen/Qwen3-8B-Base \
    dataset=tulu3 \
    learning_rate=5e-4 \
    batch_size=128 \
    lora_rank=64 \
    eval_every=500 \
    save_every=500 \
    wandb_project=cookbook_sl

Expected results

| Dataset  | Steps | test/nll |
| -------- | ----- | -------- |
| NoRobots | 140   | 1.788    |
| Tulu3    | 1740  | 0.50     |

Tulu3 performance can be improved further by training for more steps with a higher lora_rank and a smaller batch_size.
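To put the test/nll numbers in perspective, per-token perplexity is just exp(nll), so the two results above correspond to roughly:

```python
import math

# Convert the reported test/nll values to per-token perplexity.
for name, nll in [("NoRobots", 1.788), ("Tulu3", 0.50)]:
    print(f"{name}: perplexity = {math.exp(nll):.2f}")
```

The datasets are not directly comparable this way (different test distributions), but lower nll means the model assigns more probability to the reference responses.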

Adding your own dataset

The base classes in tinker_cookbook/supervised/data.py support loading new data in the following ways:

  • SupervisedDatasetFromHFDataset — loads a dataset from HuggingFace Hub with a postprocessing function
  • StreamingSupervisedDatasetFromHFDataset — works similarly, but supports streaming for large datasets
  • FromConversationFileBuilder — supports data loading from a JSONL file

You can also pass a path to a JSONL file directly with dataset=path/to/file.jsonl.
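A minimal sketch of writing such a JSONL file, one conversation per line. The `messages` schema here is an assumption; check tinker_cookbook/supervised/data.py for the exact fields FromConversationFileBuilder expects:

```python
import json

# One conversation per line. The "messages" layout below is an assumed
# schema for illustration -- verify against the cookbook's data loaders.
conversations = [
    {"messages": [
        {"role": "user", "content": "What is LoRA?"},
        {"role": "assistant", "content": "A parameter-efficient fine-tuning method."},
    ]},
]

with open("my_chats.jsonl", "w") as f:
    for conv in conversations:
        f.write(json.dumps(conv) + "\n")
```

The resulting file can then be passed on the command line, e.g. dataset=my_chats.jsonl.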
