
Tutorial: Rendering

Prerequisites

Run it interactively

curl -O https://raw.githubusercontent.com/thinking-machines-lab/tinker-cookbook/main/tutorials/201_rendering.py && uv run marimo edit 201_rendering.py

Rendering converts a list of messages into a token sequence that a model can consume. While similar to HuggingFace chat templates, Tinker's rendering system handles the full model lifecycle: supervised learning, reinforcement learning, and deployment.

The renderer sits between your high-level conversation data and the low-level tokens the model sees:

Messages (list of dicts)  -->  Renderer  -->  Token IDs (list of ints)
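As a rough mental model, a ChatML-style renderer concatenates role-tagged segments and then tokenizes the result. The following is a toy sketch of that idea (illustrative only, not Tinker's actual implementation; `render_chatml` is a hypothetical helper):

```python
# Toy sketch of ChatML-style rendering -- illustrative only, not Tinker's implementation.
def render_chatml(messages):
    """Concatenate messages into the <|im_start|>role\n...<|im_end|> format."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    return "".join(parts)

text = render_chatml([{"role": "user", "content": "Hello!"}])
# A real renderer would now tokenize `text` into token IDs.
print(text)
```

Tinker's renderers do this work (plus tokenization and model-specific details) for you, so you never hand-build these strings.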

This tutorial covers the Renderer class and its key methods.

Setup

We need a tokenizer (to map between text and token IDs) and a renderer (to apply the model's chat format).

from tinker_cookbook import renderers, tokenizer_utils

tokenizer = tokenizer_utils.get_tokenizer("Qwen/Qwen3-30B-A3B")
renderer = renderers.get_renderer("qwen3", tokenizer)

Example conversation

We will use this conversation throughout the tutorial.

messages = [
    {"role": "system", "content": "Answer concisely; at most one sentence per response"},
    {"role": "user", "content": "What is the longest-lived rodent species?"},
    {"role": "assistant", "content": "The naked mole rat, which can live over 30 years."},
    {"role": "user", "content": "How do they live so long?"},
    {
        "role": "assistant",
        "content": "They evolved multiple protective mechanisms including special hyaluronic acid that prevents cancer, extremely stable proteins, and efficient DNA repair systems that work together to prevent aging.",
    },
]

build_generation_prompt() -- for sampling

Converts a conversation into a token prompt ready for the model to continue. This is used during RL rollouts and at deployment time.

Typically you pass all messages except the final assistant reply, so the model generates its own response.

# Remove the last assistant message so the model can generate one
prompt = renderer.build_generation_prompt(messages[:-1])
print("ModelInput:", prompt)
print()
print("Decoded tokens:")
print(tokenizer.decode(prompt.to_ints()))
Output
ModelInput: ModelInput(chunks=[EncodedTextChunk(tokens=[151644, 8948, 198], type='encoded_text'), EncodedTextChunk(tokens=[16141, 3529, 285, 974, 26, 518, 1429, 825, 11652, 817, 2033, 151645], type='encoded_text'), EncodedTextChunk(tokens=[198, 151644, 872, 198], type='encoded_text'), EncodedTextChunk(tokens=[3838, 374, 279, 22032, 61854, 20589, 306, 9419, 30, 151645], type='encoded_text'), EncodedTextChunk(tokens=[198, 151644, 77091, 198], type='encoded_text'), EncodedTextChunk(tokens=[785, 19020, 34651, 11244, 11, 892, 646, 3887, 916, 220, 18, 15, 1635, 13, 151645], type='encoded_text'), EncodedTextChunk(tokens=[198, 151644, 872, 198], type='encoded_text'), EncodedTextChunk(tokens=[4340, 653, 807, 3887, 773, 1293, 30, 151645], type='encoded_text'), EncodedTextChunk(tokens=[198, 151644, 77091, 198], type='encoded_text')])

Decoded tokens:
<|im_start|>system
Answer concisely; at most one sentence per response<|im_end|>
<|im_start|>user
What is the longest-lived rodent species?<|im_end|>
<|im_start|>assistant
The naked mole rat, which can live over 30 years.<|im_end|>
<|im_start|>user
How do they live so long?<|im_end|>
<|im_start|>assistant

The output is a ModelInput object containing the tokenized chat template. Notice how each message is wrapped in special tokens like <|im_start|> and <|im_end|>, and the final <|im_start|>assistant is left open for the model to fill in.

get_stop_sequences() -- stop tokens

When sampling, we need to know when the model has finished its response. get_stop_sequences() returns the token IDs (or strings) that signal end-of-generation.

stop_sequences = renderer.get_stop_sequences()
print(f"Stop sequences: {stop_sequences}")

# For Qwen3, this is the <|im_end|> token
for tok in stop_sequences:
    if isinstance(tok, int):
        print(f"  Token {tok} decodes to: {repr(tokenizer.decode([tok]))}")
Output
Stop sequences: [151645]
  Token 151645 decodes to: '<|im_end|>'
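To see why stop sequences matter, here is a toy sketch of how a sampling loop might truncate a token stream at the first stop token (illustrative only; Tinker's sampler handles this for you via `SamplingParams`, and `truncate_at_stop` is a hypothetical helper):

```python
# Toy sketch: truncate a sampled token stream at the first stop token.
# Illustrative only -- Tinker's sampler does this server-side via SamplingParams.
def truncate_at_stop(tokens, stop_token_ids):
    out = []
    for tok in tokens:
        out.append(tok)  # keep the stop token itself; parsing can then detect it
        if tok in stop_token_ids:
            break
    return out

sampled = [45, 7741, 34651, 151645, 9999]   # 151645 is <|im_end|> for Qwen3
print(truncate_at_stop(sampled, {151645}))  # -> [45, 7741, 34651, 151645]
```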

parse_response() -- decoding tokens back to a message

After sampling, you get raw token IDs. parse_response() converts them back into a structured message dict.

# Simulate some sampled tokens (in practice these come from the model)
fake_tokens = [45, 7741, 34651, 31410, 614, 4911, 76665, 13, 151645]

parsed_message, parse_success = renderer.parse_response(fake_tokens)
print(f"Parsed message: {parsed_message}")
print(f"Parse success: {parse_success}")
Output
Parsed message: {'role': 'assistant', 'content': 'Naked mole rats have unique adaptations.'}
Parse success: True
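Conceptually, the parser strips the trailing stop token and wraps the decoded text in an assistant message. The sketch below illustrates the shape of that logic (toy code, not Tinker's actual `parse_response`; the tiny `vocab` and `decode` stand-ins are hypothetical):

```python
# Toy sketch of response parsing -- not Tinker's actual parse_response.
def toy_parse_response(tokens, stop_token_id, decode):
    success = bool(tokens) and tokens[-1] == stop_token_id
    body = tokens[:-1] if success else tokens
    return {"role": "assistant", "content": decode(body)}, success

# A stand-in decode function for demonstration (a real tokenizer maps IDs to text).
vocab = {1: "Hello", 2: "!", 9: "<|im_end|>"}
decode = lambda toks: "".join(vocab[t] for t in toks)

msg, ok = toy_parse_response([1, 2, 9], stop_token_id=9, decode=decode)
print(msg, ok)  # -> {'role': 'assistant', 'content': 'Hello!'} True
```

The `parse_success` flag lets training code detect malformed generations (e.g. the model ran out of tokens before emitting a stop token).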

Putting it together: sampling a response

Here is the full pattern for generating a message from a model. This requires a running Tinker service (and TINKER_API_KEY).

import tinker
from tinker.types import SamplingParams

service_client = tinker.ServiceClient()
sampling_client = service_client.create_sampling_client(base_model="Qwen/Qwen3-30B-A3B")

prompt = renderer.build_generation_prompt(messages[:-1])
stop_sequences = renderer.get_stop_sequences()
sampling_params = SamplingParams(max_tokens=100, temperature=0.5, stop=stop_sequences)

output = sampling_client.sample(prompt, sampling_params=sampling_params, num_samples=1).result()
sampled_message, success = renderer.parse_response(output.sequences[0].tokens)
print(sampled_message)

build_supervised_example() -- for training

For supervised fine-tuning, we need to distinguish prompt tokens (context the model reads) from completion tokens (what the model should learn to produce). build_supervised_example() returns both the tokens and per-token loss weights.

  • Weight 0 = prompt (no loss computed)
  • Weight 1 = completion (model trains on these)

model_input, weights = renderer.build_supervised_example(messages)

# Show which tokens are prompt vs completion
token_ids = model_input.to_ints()
for i, (tok_id, w) in enumerate(zip(token_ids, weights.tolist())):
    label = "COMPLETION" if w > 0 else "prompt"
    print(f"  [{i:3d}] {label:10s}  {repr(tokenizer.decode([tok_id]))}")
Output
  [  0] prompt      '<|im_start|>'
  [  1] prompt      'system'
  [  2] prompt      '\n'
  [  3] prompt      'Answer'
  [  4] prompt      ' conc'
  [  5] prompt      'is'
  [  6] prompt      'ely'
  [  7] prompt      ';'
  [  8] prompt      ' at'
  [  9] prompt      ' most'
  [ 10] prompt      ' one'
  [ 11] prompt      ' sentence'
  [ 12] prompt      ' per'
  [ 13] prompt      ' response'
  [ 14] prompt      '<|im_end|>'
  [ 15] prompt      '\n'
  [ 16] prompt      '<|im_start|>'
  [ 17] prompt      'user'
  [ 18] prompt      '\n'
  [ 19] prompt      'What'
  [ 20] prompt      ' is'
  [ 21] prompt      ' the'
  [ 22] prompt      ' longest'
  [ 23] prompt      '-lived'
  [ 24] prompt      ' rod'
  [ 25] prompt      'ent'
  [ 26] prompt      ' species'
  [ 27] prompt      '?'
  [ 28] prompt      '<|im_end|>'
  [ 29] prompt      '\n'
  [ 30] prompt      '<|im_start|>'
  [ 31] prompt      'assistant'
  [ 32] prompt      '\n'
  [ 33] prompt      'The'
  [ 34] prompt      ' naked'
  [ 35] prompt      ' mole'
  [ 36] prompt      ' rat'
  [ 37] prompt      ','
  [ 38] prompt      ' which'
  [ 39] prompt      ' can'
  [ 40] prompt      ' live'
  [ 41] prompt      ' over'
  [ 42] prompt      ' '
  [ 43] prompt      '3'
  [ 44] prompt      '0'
  [ 45] prompt      ' years'
  [ 46] prompt      '.'
  [ 47] prompt      '<|im_end|>'
  [ 48] prompt      '\n'
  [ 49] prompt      '<|im_start|>'
  [ 50] prompt      'user'
  [ 51] prompt      '\n'
  [ 52] prompt      'How'
  [ 53] prompt      ' do'
  [ 54] prompt      ' they'
  [ 55] prompt      ' live'
  [ 56] prompt      ' so'
  [ 57] prompt      ' long'
  [ 58] prompt      '?'
  [ 59] prompt      '<|im_end|>'
  [ 60] prompt      '\n'
  [ 61] prompt      '<|im_start|>'
  [ 62] prompt      'assistant'
  [ 63] prompt      '\n'
  [ 64] COMPLETION  'They'
  [ 65] COMPLETION  ' evolved'
  [ 66] COMPLETION  ' multiple'
  [ 67] COMPLETION  ' protective'
  [ 68] COMPLETION  ' mechanisms'
  [ 69] COMPLETION  ' including'
  [ 70] COMPLETION  ' special'
  [ 71] COMPLETION  ' hy'
  [ 72] COMPLETION  'al'
  [ 73] COMPLETION  'ur'
  [ 74] COMPLETION  'onic'
  [ 75] COMPLETION  ' acid'
  [ 76] COMPLETION  ' that'
  [ 77] COMPLETION  ' prevents'
  [ 78] COMPLETION  ' cancer'
  [ 79] COMPLETION  ','
  [ 80] COMPLETION  ' extremely'
  [ 81] COMPLETION  ' stable'
  [ 82] COMPLETION  ' proteins'
  [ 83] COMPLETION  ','
  [ 84] COMPLETION  ' and'
  [ 85] COMPLETION  ' efficient'
  [ 86] COMPLETION  ' DNA'
  [ 87] COMPLETION  ' repair'
  [ 88] COMPLETION  ' systems'
  [ 89] COMPLETION  ' that'
  [ 90] COMPLETION  ' work'
  [ 91] COMPLETION  ' together'
  [ 92] COMPLETION  ' to'
  [ 93] COMPLETION  ' prevent'
  [ 94] COMPLETION  ' aging'
  [ 95] COMPLETION  '.'
  [ 96] COMPLETION  '<|im_end|>'

Only the final assistant message has weight 1 (completion). Everything else -- system prompt, user messages, and even earlier assistant messages -- has weight 0. This way the loss only encourages the model to produce the correct response, without overfitting to the prompt content (system instructions, questions), which the model should not need to memorize.
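The weights feed directly into the loss: each token's negative log-likelihood is multiplied by its weight, so prompt tokens contribute nothing. A sketch with toy numbers (not Tinker's training code; `masked_loss` is a hypothetical helper):

```python
# Toy sketch of weight-masked loss -- illustrative numbers, not Tinker's training code.
def masked_loss(nll_per_token, weights):
    """Average negative log-likelihood over tokens with nonzero weight."""
    total = sum(nll * w for nll, w in zip(nll_per_token, weights))
    return total / sum(weights)

nll = [2.0, 1.5, 0.5, 0.25]   # hypothetical per-token losses
weights = [0, 0, 1, 1]        # first two tokens are prompt, last two are completion
print(masked_loss(nll, weights))  # -> 0.375: prompt tokens contribute nothing
```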

TrainOnWhat -- controlling loss targets

By default, build_supervised_example trains on the last assistant message. The TrainOnWhat enum gives you more control:

Value                    Trains on
LAST_ASSISTANT_MESSAGE   Only the final assistant reply (default)
LAST_ASSISTANT_TURN      Final assistant turn including tool calls/responses
ALL_ASSISTANT_MESSAGES   Every assistant message in the conversation
ALL_MESSAGES             All messages regardless of role
ALL_TOKENS               Every token including special tokens
CUSTOMIZED               Per-message train flags from the dataset

# Train on ALL assistant messages instead of just the last one
_, weights_all = renderer.build_supervised_example(
    messages,
    train_on_what=renderers.TrainOnWhat.ALL_ASSISTANT_MESSAGES,
)
print(f"Tokens with weight > 0: {(weights_all > 0).sum().item()}")

# Compare with default (last assistant message only)
_, weights_last = renderer.build_supervised_example(messages)
print(f"Tokens with weight > 0 (default): {(weights_last > 0).sum().item()}")
Output
Tokens with weight > 0: 48
Tokens with weight > 0 (default): 33

Available renderers

Tinker ships renderers for several model families. Use get_renderer() with the appropriate name:

Name                     Model family        Notes
qwen3                    Qwen3               Thinking enabled (default)
qwen3_disable_thinking   Qwen3               Thinking disabled
llama3                   Llama 3             Omits the HF preamble
deepseekv3               DeepSeek V3         Non-thinking mode (default)
deepseekv3_thinking      DeepSeek V3         Thinking mode
nemotron3                NVIDIA Nemotron 3   Thinking enabled
kimi_k2                  Kimi K2             Thinking format

Each renderer produces the correct special tokens for its model family. The default renderers match HuggingFace's apply_chat_template output, so models trained with Tinker work with the OpenAI-compatible endpoint.

# Example: switching between renderers
# Each model family needs its own tokenizer
qwen_tokenizer = tokenizer_utils.get_tokenizer("Qwen/Qwen3-30B-A3B")
qwen_renderer = renderers.get_renderer("qwen3", qwen_tokenizer)

test_messages = [{"role": "user", "content": "Hello!"}]
prompt_tokens = qwen_renderer.build_generation_prompt(test_messages)
print("Qwen3 prompt:")
print(qwen_tokenizer.decode(prompt_tokens.to_ints()))
Output
Qwen3 prompt:
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant

Vision inputs with ImagePart

For vision-language models (like Qwen3-VL), message content can include images alongside text. Use ImagePart for images and TextPart for text within the same message.

from tinker_cookbook.renderers import ImagePart, Message, TextPart

# A multimodal message with an image and text
multimodal_message = Message(
    role="user",
    content=[
        ImagePart(type="image", image="https://example.com/photo.png"),
        TextPart(type="text", text="What is in this image?"),
    ],
)
print("Multimodal message:", multimodal_message)

# Text-only messages still work as plain strings
text_message = Message(role="user", content="Describe this in one word.")
print("Text message:", text_message)
Output
Multimodal message: {'role': 'user', 'content': [{'type': 'image', 'image': 'https://example.com/photo.png'}, {'type': 'text', 'text': 'What is in this image?'}]}
Text message: {'role': 'user', 'content': 'Describe this in one word.'}

To use vision renderers, you also need an image processor:

from tinker_cookbook.image_processing_utils import get_image_processor

model_name = "Qwen/Qwen3-VL-235B-A22B-Instruct"
tokenizer = tokenizer_utils.get_tokenizer(model_name)
image_processor = get_image_processor(model_name)

renderer = renderers.get_renderer("qwen3_vl_instruct", tokenizer, image_processor=image_processor)

The VL renderers handle vision special tokens (<|vision_start|>, <|vision_end|>) and image preprocessing automatically.
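As a rough illustration of where those markers sit in a multimodal prompt, here is a toy sketch (illustrative only; real VL renderers also expand each image into patch placeholder tokens based on the processed image size, and `render_part` is a hypothetical helper):

```python
# Toy sketch: where vision markers sit in a multimodal prompt.
# Illustrative only -- real VL renderers also expand the image into patch tokens.
def render_part(part):
    if part["type"] == "image":
        return "<|vision_start|>[image]<|vision_end|>"
    return part["text"]

content = [
    {"type": "image", "image": "https://example.com/photo.png"},
    {"type": "text", "text": "What is in this image?"},
]
rendered = "".join(render_part(p) for p in content)
print(rendered)
# -> <|vision_start|>[image]<|vision_end|>What is in this image?
```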

Custom renderers with register_renderer()

If you need a format not covered by the built-in renderers, you can register your own. This lets you use get_renderer() with a custom name throughout your codebase.

from tinker_cookbook.renderers.base import Renderer

# Define a factory function that creates your renderer
def my_renderer_factory(tokenizer, image_processor=None) -> Renderer:
    # In practice, you would return a custom Renderer subclass here.
    # For demonstration, we just return the built-in Qwen3 renderer.
    from tinker_cookbook.renderers.qwen3 import Qwen3Renderer

    return Qwen3Renderer(tokenizer)

# Register it under a namespaced name
renderers.register_renderer("MyOrg/custom_format", my_renderer_factory)

# Now you can use it via get_renderer
print(f"Registered renderers: {renderers.get_registered_renderer_names()}")

# Clean up
renderers.unregister_renderer("MyOrg/custom_format")
Output
Registered renderers: ['MyOrg/custom_format']

Summary

The renderer is the bridge between conversations and tokens. Its four key methods cover the full lifecycle:

Method                       Purpose                             Used in
build_generation_prompt()    Messages to prompt tokens           RL, inference
get_stop_sequences()         End-of-generation tokens            Sampling
parse_response()             Tokens back to a message            RL, inference
build_supervised_example()   Messages to tokens + loss weights   SFT, DPO

Use get_renderer(name, tokenizer) to get the right renderer for your model, and TrainOnWhat to control which parts of the conversation the model trains on.