# Prompt Distillation
Internalize a long prompt into model parameters so the model behaves as if prompted, without needing the prompt at inference.
## What you'll build
A language classifier that predicts two-character language codes (en, fr, zh, ja, etc.) without seeing the classification prompt. A teacher model generates labeled examples with the prompt, then a student model is fine-tuned to replicate that behavior without it.
## Prerequisites
### Key concepts
- Context distillation — fine-tuning a model to replicate prompted behavior without the prompt present at inference
- Teacher-student — the teacher generates training data using the full prompt; the student learns to match that output without the prompt
## How it works
### Two-stage process
Prompt distillation (also known as context distillation) works in two stages:
- Creating data for distillation: A teacher language model uses the target prompt \(p\) to generate responses \(r\) on a set of queries \(q\); i.e. \(r \sim \text{teacher}(\cdot|p, q)\)
- Training the student model: A student model is fine-tuned to predict the responses \(r\) to the query \(q\) but without accessing \(p\), hence learning to behave as if the target prompt is in its context; i.e. \(\text{student}(\cdot | q)\) should predict \(r\)
After training, the model responds correctly without ever seeing the classification prompt at inference time. In the example recipe, the same model (Qwen/Qwen3-30B-A3B) is used as both teacher and student, though in general they need not be identical.
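The two stages can be sketched in a few lines of Python. This is a toy illustration, not the recipe's code: `teacher` stands in for sampling from the prompted teacher LM (here a trivial character-range rule so the sketch runs), and the actual fine-tuning step is left to any supervised fine-tuning trainer.

```python
# Hypothetical sketch of the two-stage distillation data flow.

CLASSIFY_PROMPT = "Reply with the two-letter language code of the text."

def teacher(prompt, query):
    # Stand-in for sampling r ~ teacher(. | p, q) with the prompt in context.
    # Here: a toy rule (CJK characters -> "zh", otherwise "en") so this runs.
    if any("\u4e00" <= ch <= "\u9fff" for ch in query):
        return "zh"
    return "en"

# Stage 1: use the prompt to generate responses on a set of queries.
queries = ["Hello, world!", "你好，世界"]
dataset = [{"query": q, "response": teacher(CLASSIFY_PROMPT, q)} for q in queries]

# Stage 2: fine-tune the student on (q, r) pairs WITHOUT the prompt,
# i.e. minimize -log student(r | q). Note the prompt never appears in
# the training records:
for example in dataset:
    assert "prompt" not in example
```

The key point the sketch makes concrete: the prompt exists only at data-generation time; the student's training examples contain just the raw query and the teacher's response.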
## Supported language labels
The language classification task supports the following labels:
```
ar (Arabic), de (German), el (Greek), en (English), es (Spanish),
fr (French), hi (Hindi), ru (Russian), tr (Turkish), ur (Urdu),
vi (Vietnamese), zh (Chinese - Simplified), ot (Other/Unknown)
```
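During evaluation it can be handy to coerce raw student outputs into this label set. A minimal sketch, where `normalize_label` is a hypothetical helper (not part of the recipe) that falls back to `ot` for anything unrecognized:

```python
# Supported codes, as listed above.
SUPPORTED_LABELS = {
    "ar", "de", "el", "en", "es", "fr", "hi",
    "ru", "tr", "ur", "vi", "zh", "ot",
}

def normalize_label(prediction: str) -> str:
    """Map a raw model output to a supported code, defaulting to 'ot'."""
    code = prediction.strip().lower()[:2]
    return code if code in SUPPORTED_LABELS else "ot"
```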
The recipe's `create_data.py` also defines handling strategies for inputs containing code, numerical content, or multiple languages.
## Run it
### Step 1: Generate training data
```bash
mkdir -p /tmp/tinker-datasets
python -m tinker_cookbook.recipes.prompt_distillation.create_data \
    output_file=/tmp/tinker-datasets/prompt_distillation_lang.jsonl
```
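Before training, it is worth eyeballing a few records of the generated file. A small helper for that (the exact field names in each record depend on `create_data.py` and are not guaranteed here):

```python
import json

def peek_jsonl(path, n=3):
    """Return the first n records of a JSONL file, one JSON object per line."""
    records = []
    with open(path) as f:
        for i, line in enumerate(f):
            if i >= n:
                break
            records.append(json.loads(line))  # raises if a line is malformed
    return records
```

For example, `peek_jsonl("/tmp/tinker-datasets/prompt_distillation_lang.jsonl")` prints nothing but returns the first three parsed records, which you can inspect to confirm queries and labels look sensible.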
### Step 2: Train the student model
## Expected results
After training, the model responds with the correct language code given raw text input, without any classification prompt. For example: