# Prompt Distillation
Internalize a long prompt into model parameters so the model behaves as if prompted, without needing the prompt at inference.
## What you'll build
A language classifier that predicts two-character language codes (en, fr, zh, ja, etc.) without seeing the classification prompt. A teacher model generates labeled examples with the prompt, then a student model is fine-tuned to replicate that behavior without it.
## Prerequisites
### Key concepts
- Context distillation — fine-tuning a model to replicate prompted behavior without the prompt present at inference
- Teacher-student — the teacher generates training data using the full prompt; the student learns to match that output without the prompt
## How it works
### Two-stage process
Prompt distillation (also known as context distillation) works in two stages:
- Creating data for distillation: A teacher language model uses the target prompt \(p\) to generate responses \(r\) on a set of queries \(q\); i.e. \(r \sim \text{teacher}(\cdot|p, q)\)
- Training the student model: A student model is fine-tuned to predict the responses \(r\) to the query \(q\) but without accessing \(p\), hence learning to behave as if the target prompt is in its context; i.e. \(\text{student}(\cdot | q)\) should predict \(r\)
After training, the model responds correctly without ever seeing the classification prompt at inference time. In the example recipe, the same model (Qwen/Qwen3-30B-A3B) is used as both teacher and student, though in general they need not be identical.
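The two stages can be sketched in a few lines of Python. This is a toy illustration, not the recipe's code: `teacher` stands in for sampling from the prompted teacher LM (here a trivial character-range rule so the sketch runs), and the actual fine-tuning step is left to any supervised fine-tuning trainer.

```python
# Hypothetical sketch of the two-stage distillation data flow.

CLASSIFY_PROMPT = "Reply with the two-letter language code of the text."

def teacher(prompt, query):
    # Stand-in for sampling r ~ teacher(. | p, q) with the prompt in context.
    # Here: a toy rule (CJK characters -> "zh", otherwise "en") so this runs.
    if any("\u4e00" <= ch <= "\u9fff" for ch in query):
        return "zh"
    return "en"

# Stage 1: use the prompt to generate responses on a set of queries.
queries = ["Hello, world!", "你好，世界"]
dataset = [{"query": q, "response": teacher(CLASSIFY_PROMPT, q)} for q in queries]

# Stage 2: fine-tune the student on (q, r) pairs WITHOUT the prompt,
# i.e. minimize -log student(r | q). Note the prompt never appears in
# the training records:
for example in dataset:
    assert "prompt" not in example
```

The key point the sketch makes concrete: the prompt exists only at data-generation time; the student's training examples contain just the raw query and the teacher's response.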
## Supported language labels
The language classification task supports the following labels:
```
ar (Arabic), de (German), el (Greek), en (English), es (Spanish),
fr (French), hi (Hindi), ru (Russian), tr (Turkish), ur (Urdu),
vi (Vietnamese), zh (Chinese - Simplified), ot (Other/Unknown)
```
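During evaluation it can be handy to coerce raw student outputs into this label set. A minimal sketch, where `normalize_label` is a hypothetical helper (not part of the recipe) that falls back to `ot` for anything unrecognized:

```python
# Supported codes, as listed above.
SUPPORTED_LABELS = {
    "ar", "de", "el", "en", "es", "fr", "hi",
    "ru", "tr", "ur", "vi", "zh", "ot",
}

def normalize_label(prediction: str) -> str:
    """Map a raw model output to a supported code, defaulting to 'ot'."""
    code = prediction.strip().lower()[:2]
    return code if code in SUPPORTED_LABELS else "ot"
```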
The recipe's `create_data.py` also defines handling strategies for inputs containing code, numerical content, or multiple languages.
## Run it
### Step 1: Generate training data
```bash
mkdir -p /tmp/tinker-datasets
python -m tinker_cookbook.recipes.prompt_distillation.create_data \
    output_file=/tmp/tinker-datasets/prompt_distillation_lang.jsonl
```
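Before training, it is worth eyeballing a few records of the generated file. A small helper for that (the exact field names in each record depend on `create_data.py` and are not guaranteed here):

```python
import json

def peek_jsonl(path, n=3):
    """Return the first n records of a JSONL file, one JSON object per line."""
    records = []
    with open(path) as f:
        for i, line in enumerate(f):
            if i >= n:
                break
            records.append(json.loads(line))  # raises if a line is malformed
    return records
```

For example, `peek_jsonl("/tmp/tinker-datasets/prompt_distillation_lang.jsonl")` prints nothing but returns the first three parsed records, which you can inspect to confirm queries and labels look sensible.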
### Step 2: Train the student model
## Expected results
After training, the model responds with the correct language code given raw text input, without any classification prompt. For example: