
Tutorial: Rendering

Prerequisites

Run it interactively

curl -O https://raw.githubusercontent.com/thinking-machines-lab/tinker-cookbook/main/tutorials/201_rendering.py && uv run marimo edit 201_rendering.py

Rendering converts a list of messages into a token sequence that a model can consume. While similar to HuggingFace chat templates, Tinker's rendering system handles the full model lifecycle: supervised learning, reinforcement learning, and deployment.

The renderer sits between your high-level conversation data and the low-level tokens the model sees:

Messages (list of dicts)  -->  Renderer  -->  Token IDs (list of ints)
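As a rough mental model, a ChatML-style renderer concatenates role-tagged segments and then tokenizes the result. The following is a toy sketch of that idea (illustrative only, not Tinker's actual implementation; `render_chatml` is a hypothetical helper):

```python
# Toy sketch of ChatML-style rendering -- illustrative only, not Tinker's implementation.
def render_chatml(messages):
    """Concatenate messages into the <|im_start|>role\n...<|im_end|> format."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    return "".join(parts)

text = render_chatml([{"role": "user", "content": "Hello!"}])
# A real renderer would now tokenize `text` into token IDs.
print(text)
```

Tinker's renderers do this work (plus tokenization and model-specific details) for you, so you never hand-build these strings.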

This tutorial covers the Renderer class and its key methods.

Setup

We need a tokenizer (to map between text and token IDs) and a renderer (to apply the model's chat format).

from tinker_cookbook import renderers, tokenizer_utils

tokenizer = tokenizer_utils.get_tokenizer("Qwen/Qwen3-30B-A3B")
renderer = renderers.get_renderer("qwen3", tokenizer)

Example conversation

We will use this conversation throughout the tutorial.

messages = [
    {"role": "system", "content": "Answer concisely; at most one sentence per response"},
    {"role": "user", "content": "What is the longest-lived rodent species?"},
    {"role": "assistant", "content": "The naked mole rat, which can live over 30 years."},
    {"role": "user", "content": "How do they live so long?"},
    {
        "role": "assistant",
        "content": "They evolved multiple protective mechanisms including special hyaluronic acid that prevents cancer, extremely stable proteins, and efficient DNA repair systems that work together to prevent aging.",
    },
]

build_generation_prompt() -- for sampling

Converts a conversation into a token prompt ready for the model to continue. This is used during RL rollouts and at deployment time.

Typically you pass all messages except the final assistant reply, so the model generates its own response.

# Remove the last assistant message so the model can generate one
prompt = renderer.build_generation_prompt(messages[:-1])
print("ModelInput:", prompt)
print()
print("Decoded tokens:")
print(tokenizer.decode(prompt.to_ints()))
Output
ModelInput: ModelInput(chunks=[EncodedTextChunk(tokens=[151644, 8948, 198], type='encoded_text'), EncodedTextChunk(tokens=[16141, 3529, 285, 974, 26, 518, 1429, 825, 11652, 817, 2033, 151645], type='encoded_text'), EncodedTextChunk(tokens=[198, 151644, 872, 198], type='encoded_text'), EncodedTextChunk(tokens=[3838, 374, 279, 22032, 61854, 20589, 306, 9419, 30, 151645], type='encoded_text'), EncodedTextChunk(tokens=[198, 151644, 77091, 198], type='encoded_text'), EncodedTextChunk(tokens=[785, 19020, 34651, 11244, 11, 892, 646, 3887, 916, 220, 18, 15, 1635, 13, 151645], type='encoded_text'), EncodedTextChunk(tokens=[198, 151644, 872, 198], type='encoded_text'), EncodedTextChunk(tokens=[4340, 653, 807, 3887, 773, 1293, 30, 151645], type='encoded_text'), EncodedTextChunk(tokens=[198, 151644, 77091, 198], type='encoded_text')])

Decoded tokens:
<|im_start|>system
Answer concisely; at most one sentence per response<|im_end|>
<|im_start|>user
What is the longest-lived rodent species?<|im_end|>
<|im_start|>assistant
The naked mole rat, which can live over 30 years.<|im_end|>
<|im_start|>user
How do they live so long?<|im_end|>
<|im_start|>assistant

The output is a ModelInput object containing the tokenized chat template. Notice how each message is wrapped in special tokens like <|im_start|> and <|im_end|>, and the final <|im_start|>assistant is left open for the model to fill in.

get_stop_sequences() -- stop tokens

When sampling, we need to know when the model has finished its response. get_stop_sequences() returns the token IDs (or strings) that signal end-of-generation.

stop_sequences = renderer.get_stop_sequences()
print(f"Stop sequences: {stop_sequences}")

# For Qwen3, this is the <|im_end|> token
for tok in stop_sequences:
    if isinstance(tok, int):
        print(f"  Token {tok} decodes to: {repr(tokenizer.decode([tok]))}")
Output
Stop sequences: [151645]
  Token 151645 decodes to: '<|im_end|>'
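To see why stop sequences matter, here is a toy sketch of how a sampling loop might truncate a token stream at the first stop token (illustrative only; Tinker's sampler handles this for you via `SamplingParams`, and `truncate_at_stop` is a hypothetical helper):

```python
# Toy sketch: truncate a sampled token stream at the first stop token.
# Illustrative only -- Tinker's sampler does this server-side via SamplingParams.
def truncate_at_stop(tokens, stop_token_ids):
    out = []
    for tok in tokens:
        out.append(tok)  # keep the stop token itself; parsing can then detect it
        if tok in stop_token_ids:
            break
    return out

sampled = [45, 7741, 34651, 151645, 9999]   # 151645 is <|im_end|> for Qwen3
print(truncate_at_stop(sampled, {151645}))  # -> [45, 7741, 34651, 151645]
```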

parse_response() -- decoding tokens back to a message

After sampling, you get raw token IDs. parse_response() converts them back into a structured message dict.

# Simulate some sampled tokens (in practice these come from the model)
fake_tokens = [45, 7741, 34651, 31410, 614, 4911, 76665, 13, 151645]

parsed_message, parse_success = renderer.parse_response(fake_tokens)
print(f"Parsed message: {parsed_message}")
print(f"Parse success: {parse_success}")
Output
Parsed message: {'role': 'assistant', 'content': 'Naked mole rats have unique adaptations.'}
Parse success: True
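Conceptually, the parser strips the trailing stop token and wraps the decoded text in an assistant message. The sketch below illustrates the shape of that logic (toy code, not Tinker's actual `parse_response`; the tiny `vocab` and `decode` stand-ins are hypothetical):

```python
# Toy sketch of response parsing -- not Tinker's actual parse_response.
def toy_parse_response(tokens, stop_token_id, decode):
    success = bool(tokens) and tokens[-1] == stop_token_id
    body = tokens[:-1] if success else tokens
    return {"role": "assistant", "content": decode(body)}, success

# A stand-in decode function for demonstration (a real tokenizer maps IDs to text).
vocab = {1: "Hello", 2: "!", 9: "<|im_end|>"}
decode = lambda toks: "".join(vocab[t] for t in toks)

msg, ok = toy_parse_response([1, 2, 9], stop_token_id=9, decode=decode)
print(msg, ok)  # -> {'role': 'assistant', 'content': 'Hello!'} True
```

The `parse_success` flag lets training code detect malformed generations (e.g. the model ran out of tokens before emitting a stop token).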

Putting it together: sampling a response

Here is the full pattern for generating a message from a model. This requires a running Tinker service (and TINKER_API_KEY).

import tinker
from tinker.types import SamplingParams

service_client = tinker.ServiceClient()
sampling_client = service_client.create_sampling_client(base_model="Qwen/Qwen3-30B-A3B")

prompt = renderer.build_generation_prompt(messages[:-1])
stop_sequences = renderer.get_stop_sequences()
sampling_params = SamplingParams(max_tokens=100, temperature=0.5, stop=stop_sequences)

output = sampling_client.sample(prompt, sampling_params=sampling_params, num_samples=1).result()
sampled_message, success = renderer.parse_response(output.sequences[0].tokens)
print(sampled_message)

build_supervised_example() -- for training

For supervised fine-tuning, we need to distinguish prompt tokens (context the model reads) from completion tokens (what the model should learn to produce). build_supervised_example() returns both the tokens and per-token loss weights.

  • Weight 0 = prompt (no loss computed)
  • Weight 1 = completion (model trains on these)

model_input, weights = renderer.build_supervised_example(messages)

# Show which tokens are prompt vs completion
token_ids = model_input.to_ints()
for i, (tok_id, w) in enumerate(zip(token_ids, weights.tolist())):
    label = "COMPLETION" if w > 0 else "prompt"
    print(f"  [{i:3d}] {label:10s}  {repr(tokenizer.decode([tok_id]))}")
Output
  [  0] prompt      '<|im_start|>'
  [  1] prompt      'system'
  [  2] prompt      '\n'
  [  3] prompt      'Answer'
  [  4] prompt      ' conc'
  [  5] prompt      'is'
  [  6] prompt      'ely'
  [  7] prompt      ';'
  [  8] prompt      ' at'
  [  9] prompt      ' most'
  [ 10] prompt      ' one'
  [ 11] prompt      ' sentence'
  [ 12] prompt      ' per'
  [ 13] prompt      ' response'
  [ 14] prompt      '<|im_end|>'
  [ 15] prompt      '\n'
  [ 16] prompt      '<|im_start|>'
  [ 17] prompt      'user'
  [ 18] prompt      '\n'
  [ 19] prompt      'What'
  [ 20] prompt      ' is'
  [ 21] prompt      ' the'
  [ 22] prompt      ' longest'
  [ 23] prompt      '-lived'
  [ 24] prompt      ' rod'
  [ 25] prompt      'ent'
  [ 26] prompt      ' species'
  [ 27] prompt      '?'
  [ 28] prompt      '<|im_end|>'
  [ 29] prompt      '\n'
  [ 30] prompt      '<|im_start|>'
  [ 31] prompt      'assistant'
  [ 32] prompt      '\n'
  [ 33] prompt      'The'
  [ 34] prompt      ' naked'
  [ 35] prompt      ' mole'
  [ 36] prompt      ' rat'
  [ 37] prompt      ','
  [ 38] prompt      ' which'
  [ 39] prompt      ' can'
  [ 40] prompt      ' live'
  [ 41] prompt      ' over'
  [ 42] prompt      ' '
  [ 43] prompt      '3'
  [ 44] prompt      '0'
  [ 45] prompt      ' years'
  [ 46] prompt      '.'
  [ 47] prompt      '<|im_end|>'
  [ 48] prompt      '\n'
  [ 49] prompt      '<|im_start|>'
  [ 50] prompt      'user'
  [ 51] prompt      '\n'
  [ 52] prompt      'How'
  [ 53] prompt      ' do'
  [ 54] prompt      ' they'
  [ 55] prompt      ' live'
  [ 56] prompt      ' so'
  [ 57] prompt      ' long'
  [ 58] prompt      '?'
  [ 59] prompt      '<|im_end|>'
  [ 60] prompt      '\n'
  [ 61] prompt      '<|im_start|>'
  [ 62] prompt      'assistant'
  [ 63] prompt      '\n'
  [ 64] COMPLETION  'They'
  [ 65] COMPLETION  ' evolved'
  [ 66] COMPLETION  ' multiple'
  [ 67] COMPLETION  ' protective'
  [ 68] COMPLETION  ' mechanisms'
  [ 69] COMPLETION  ' including'
  [ 70] COMPLETION  ' special'
  [ 71] COMPLETION  ' hy'
  [ 72] COMPLETION  'al'
  [ 73] COMPLETION  'ur'
  [ 74] COMPLETION  'onic'
  [ 75] COMPLETION  ' acid'
  [ 76] COMPLETION  ' that'
  [ 77] COMPLETION  ' prevents'
  [ 78] COMPLETION  ' cancer'
  [ 79] COMPLETION  ','
  [ 80] COMPLETION  ' extremely'
  [ 81] COMPLETION  ' stable'
  [ 82] COMPLETION  ' proteins'
  [ 83] COMPLETION  ','
  [ 84] COMPLETION  ' and'
  [ 85] COMPLETION  ' efficient'
  [ 86] COMPLETION  ' DNA'
  [ 87] COMPLETION  ' repair'
  [ 88] COMPLETION  ' systems'
  [ 89] COMPLETION  ' that'
  [ 90] COMPLETION  ' work'
  [ 91] COMPLETION  ' together'
  [ 92] COMPLETION  ' to'
  [ 93] COMPLETION  ' prevent'
  [ 94] COMPLETION  ' aging'
  [ 95] COMPLETION  '.'
  [ 96] COMPLETION  '<|im_end|>'

Only the final assistant message has weight 1 (completion). Everything else -- system prompt, user messages, and even earlier assistant messages -- has weight 0. This way the loss only encourages the model to produce the correct response, without overfitting to the prompt content (system instructions, questions), which the model should not need to memorize.
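The weights feed directly into the loss: each token's negative log-likelihood is multiplied by its weight, so prompt tokens contribute nothing. A sketch with toy numbers (not Tinker's training code; `masked_loss` is a hypothetical helper):

```python
# Toy sketch of weight-masked loss -- illustrative numbers, not Tinker's training code.
def masked_loss(nll_per_token, weights):
    """Average negative log-likelihood over tokens with nonzero weight."""
    total = sum(nll * w for nll, w in zip(nll_per_token, weights))
    return total / sum(weights)

nll = [2.0, 1.5, 0.5, 0.25]   # hypothetical per-token losses
weights = [0, 0, 1, 1]        # first two tokens are prompt, last two are completion
print(masked_loss(nll, weights))  # -> 0.375: prompt tokens contribute nothing
```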

TrainOnWhat -- controlling loss targets

By default, build_supervised_example trains on the last assistant message. The TrainOnWhat enum gives you more control:

Value                    Trains on
LAST_ASSISTANT_MESSAGE   Only the final assistant reply (default)
LAST_ASSISTANT_TURN      Final assistant turn including tool calls/responses
ALL_ASSISTANT_MESSAGES   Every assistant message in the conversation
ALL_MESSAGES             All messages regardless of role
ALL_TOKENS               Every token including special tokens
CUSTOMIZED               Per-message train flags from the dataset

# Train on ALL assistant messages instead of just the last one
_, weights_all = renderer.build_supervised_example(
    messages,
    train_on_what=renderers.TrainOnWhat.ALL_ASSISTANT_MESSAGES,
)
print(f"Tokens with weight > 0: {(weights_all > 0).sum().item()}")

# Compare with default (last assistant message only)
_, weights_last = renderer.build_supervised_example(messages)
print(f"Tokens with weight > 0 (default): {(weights_last > 0).sum().item()}")
Output
Tokens with weight > 0: 48
Tokens with weight > 0 (default): 33

Available renderers

Tinker ships renderers for several model families. Use get_renderer() with the appropriate name:

Name                     Model family        Notes
qwen3                    Qwen3               Thinking enabled (default)
qwen3_disable_thinking   Qwen3               Thinking disabled
llama3                   Llama 3             Omits the HF preamble
deepseekv3               DeepSeek V3         Non-thinking mode (default)
deepseekv3_thinking      DeepSeek V3         Thinking mode
nemotron3                NVIDIA Nemotron 3   Thinking enabled
kimi_k2                  Kimi K2             Thinking format

Each renderer produces the correct special tokens for its model family. The default renderers match HuggingFace's apply_chat_template output, so models trained with Tinker work with the OpenAI-compatible endpoint.

# Example: switching between renderers
# Each model family needs its own tokenizer
qwen_tokenizer = tokenizer_utils.get_tokenizer("Qwen/Qwen3-30B-A3B")
qwen_renderer = renderers.get_renderer("qwen3", qwen_tokenizer)

test_messages = [{"role": "user", "content": "Hello!"}]
prompt_tokens = qwen_renderer.build_generation_prompt(test_messages)
print("Qwen3 prompt:")
print(qwen_tokenizer.decode(prompt_tokens.to_ints()))
Output
Qwen3 prompt:
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant

Vision inputs with ImagePart

For vision-language models (like Qwen3-VL), message content can include images alongside text. Use ImagePart for images and TextPart for text within the same message.

from tinker_cookbook.renderers import ImagePart, Message, TextPart

# A multimodal message with an image and text
multimodal_message = Message(
    role="user",
    content=[
        ImagePart(type="image", image="https://example.com/photo.png"),
        TextPart(type="text", text="What is in this image?"),
    ],
)
print("Multimodal message:", multimodal_message)

# Text-only messages still work as plain strings
text_message = Message(role="user", content="Describe this in one word.")
print("Text message:", text_message)
Output
Multimodal message: {'role': 'user', 'content': [{'type': 'image', 'image': 'https://example.com/photo.png'}, {'type': 'text', 'text': 'What is in this image?'}]}
Text message: {'role': 'user', 'content': 'Describe this in one word.'}

To use vision renderers, you also need an image processor:

from tinker_cookbook.image_processing_utils import get_image_processor

model_name = "Qwen/Qwen3-VL-235B-A22B-Instruct"
tokenizer = tokenizer_utils.get_tokenizer(model_name)
image_processor = get_image_processor(model_name)

renderer = renderers.get_renderer("qwen3_vl_instruct", tokenizer, image_processor=image_processor)

The VL renderers handle vision special tokens (<|vision_start|>, <|vision_end|>) and image preprocessing automatically.
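As a rough illustration of where those markers sit in a multimodal prompt, here is a toy sketch (illustrative only; real VL renderers also expand each image into patch placeholder tokens based on the processed image size, and `render_part` is a hypothetical helper):

```python
# Toy sketch: where vision markers sit in a multimodal prompt.
# Illustrative only -- real VL renderers also expand the image into patch tokens.
def render_part(part):
    if part["type"] == "image":
        return "<|vision_start|>[image]<|vision_end|>"
    return part["text"]

content = [
    {"type": "image", "image": "https://example.com/photo.png"},
    {"type": "text", "text": "What is in this image?"},
]
rendered = "".join(render_part(p) for p in content)
print(rendered)
# -> <|vision_start|>[image]<|vision_end|>What is in this image?
```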

Custom renderers with register_renderer()

If you need a format not covered by the built-in renderers, you can register your own. This lets you use get_renderer() with a custom name throughout your codebase.

from tinker_cookbook.renderers.base import Renderer

# Define a factory function that creates your renderer
def my_renderer_factory(tokenizer, image_processor=None) -> Renderer:
    # In practice, you would return a custom Renderer subclass here.
    # For demonstration, we just return the built-in Qwen3 renderer.
    from tinker_cookbook.renderers.qwen3 import Qwen3Renderer

    return Qwen3Renderer(tokenizer)

# Register it under a namespaced name
renderers.register_renderer("MyOrg/custom_format", my_renderer_factory)

# Now you can use it via get_renderer
print(f"Registered renderers: {renderers.get_registered_renderer_names()}")

# Clean up
renderers.unregister_renderer("MyOrg/custom_format")
Output
Registered renderers: ['MyOrg/custom_format']

Summary

The renderer is the bridge between conversations and tokens. Its four key methods cover the full lifecycle:

Method                       Purpose                             Used in
build_generation_prompt()    Messages to prompt tokens           RL, inference
get_stop_sequences()         End-of-generation tokens            Sampling
parse_response()             Tokens back to a message            RL, inference
build_supervised_example()   Messages to tokens + loss weights   SFT, DPO

Use get_renderer(name, tokenizer) to get the right renderer for your model, and TrainOnWhat to control which parts of the conversation the model trains on.