Tutorial 201: Rendering

Run it interactively:

curl -O https://raw.githubusercontent.com/thinking-machines-lab/tinker-cookbook/main/tutorials/201_rendering.py && marimo edit 201_rendering.py

Rendering converts a list of messages into a token sequence that a model can consume. While similar to HuggingFace chat templates, Tinker's rendering system handles the full training lifecycle: supervised learning, reinforcement learning, and deployment.

The renderer sits between your high-level conversation data and the low-level tokens the model sees:

Messages (list of dicts)  -->  Renderer  -->  Token IDs (list of ints)

This tutorial covers the Renderer class and its key methods.
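
To make the Messages --> Renderer --> Token IDs pipeline concrete before touching the real API, here is a minimal toy sketch. ToyTokenizer and toy_render are hypothetical names invented for illustration and are not part of tinker_cookbook; the toy uses one token per character and a ChatML-style layout, which only approximates what a real renderer and tokenizer do.

```python
import re

# Toy illustration only: ToyTokenizer and toy_render are hypothetical,
# not part of tinker_cookbook.
SPECIAL = re.compile(r"(<\|im_start\|>|<\|im_end\|>)")


class ToyTokenizer:
    """Assigns integer IDs to pieces of text in order of first use."""

    def __init__(self):
        self.piece_to_id = {}
        self.id_to_piece = []

    def _id(self, piece):
        if piece not in self.piece_to_id:
            self.piece_to_id[piece] = len(self.id_to_piece)
            self.id_to_piece.append(piece)
        return self.piece_to_id[piece]

    def encode(self, text):
        ids = []
        for part in SPECIAL.split(text):
            if SPECIAL.fullmatch(part):
                ids.append(self._id(part))  # special token stays whole
            else:
                ids.extend(self._id(ch) for ch in part)  # one token per character
        return ids

    def decode(self, ids):
        return "".join(self.id_to_piece[i] for i in ids)


def toy_render(messages, tokenizer):
    """Wrap each message in ChatML-style markers, then open an assistant turn."""
    text = ""
    for m in messages:
        text += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    text += "<|im_start|>assistant\n"
    return tokenizer.encode(text)


tok = ToyTokenizer()
ids = toy_render([{"role": "user", "content": "Hi"}], tok)
print(ids)
print(tok.decode(ids))
```

The real Renderer does the same job with the model's actual vocabulary and template rules, and returns a ModelInput rather than a bare list.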

Setup

We need a tokenizer (to map between text and token IDs) and a renderer (to apply the model's chat format). Qwen3.5 and Qwen3.6 models share the same qwen3_5 renderer, which is why the Qwen3.6 tokenizer below is paired with it.

from tinker_cookbook import renderers, tokenizer_utils

tokenizer = tokenizer_utils.get_tokenizer("Qwen/Qwen3.6-35B-A3B")
renderer = renderers.get_renderer("qwen3_5", tokenizer)
renderer  # noqa: B018

Example conversation

We will use this conversation throughout the tutorial.

messages = [
    {"role": "system", "content": "Answer concisely; at most one sentence per response"},
    {"role": "user", "content": "What is the longest-lived rodent species?"},
    {"role": "assistant", "content": "The naked mole rat, which can live over 30 years."},
    {"role": "user", "content": "How do they live so long?"},
    {
        "role": "assistant",
        "content": "They evolved multiple protective mechanisms including special hyaluronic acid that prevents cancer, extremely stable proteins, and efficient DNA repair systems that work together to prevent aging.",
    },
]

build_generation_prompt() -- for sampling

Converts a conversation into a token prompt ready for the model to continue. This is used during RL rollouts and at deployment time.

Typically you pass all messages except the final assistant reply, so the model generates its own response.

# Remove the last assistant message so the model can generate one
prompt = renderer.build_generation_prompt(messages[:-1])
print("ModelInput:", prompt)
print()
print("Decoded tokens:")
print(tokenizer.decode(prompt.to_ints()))
Output
ModelInput: ModelInput(chunks=[EncodedTextChunk(tokens=[248045, 8678, 198], type='encoded_text'), EncodedTextChunk(tokens=[15666, 3413, 284, 943, 26, 506, 1379, 799, 11316, 791, 1965, 248046], type='encoded_text'), EncodedTextChunk(tokens=[198, 248045, 846, 198], type='encoded_text'), EncodedTextChunk(tokens=[3710, 369, 279, 21354, 59769, 19964, 305, 9140, 30, 248046], type='encoded_text'), EncodedTextChunk(tokens=[198, 248045, 74455, 198], type='encoded_text'), EncodedTextChunk(tokens=[760, 18447, 33504, 10918, 11, 864, 628, 3756, 888, 220, 18, 15, 1578, 13, 248046], type='encoded_text'), EncodedTextChunk(tokens=[198, 248045, 846, 198], type='encoded_text'), EncodedTextChunk(tokens=[4199, 635, 781, 3756, 748, 1248, 30, 248046], type='encoded_text'), EncodedTextChunk(tokens=[198, 248045, 74455, 198, 248068, 198], type='encoded_text')])

Decoded tokens:
<|im_start|>system
Answer concisely; at most one sentence per response<|im_end|>
<|im_start|>user
What is the longest-lived rodent species?<|im_end|>
<|im_start|>assistant
The naked mole rat, which can live over 30 years.<|im_end|>
<|im_start|>user
How do they live so long?<|im_end|>
<|im_start|>assistant
<think>

The output is a ModelInput object holding the conversation tokenized in the model's chat format. Each message is wrapped in the special tokens <|im_start|> and <|im_end|>, and the final <|im_start|>assistant turn is left open for the model to fill in.
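
The printed ModelInput is a list of chunks, and to_ints() yields one flat token list. The sketch below mimics only that flattening step, using hypothetical ToyChunk and ToyModelInput classes (the real types live in the tinker package and may carry more fields) and the first three chunks copied from the output above.

```python
from dataclasses import dataclass
from itertools import chain


# Hypothetical stand-ins for illustration; not the real tinker types.
@dataclass
class ToyChunk:
    tokens: list


@dataclass
class ToyModelInput:
    chunks: list

    def to_ints(self):
        # Concatenate every chunk's tokens in order.
        return list(chain.from_iterable(c.tokens for c in self.chunks))


# First three chunks from the printed ModelInput:
# "<|im_start|>system\n", the system text plus <|im_end|>, and "\n<|im_start|>user\n".
toy = ToyModelInput(
    chunks=[
        ToyChunk([248045, 8678, 198]),
        ToyChunk([15666, 3413, 284, 943, 26, 506, 1379, 799, 11316, 791, 1965, 248046]),
        ToyChunk([198, 248045, 846, 198]),
    ]
)
flat = toy.to_ints()
print(len(flat))  # 19
```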

Because qwen3_5 is a thinking renderer, the prompt also ends with an open <think> tag that primes the model to reason before answering. If you'd prefer non-thinking mode instead, the qwen3_5_disable_thinking variant inserts a closed <think></think> so the model replies directly.
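
At the string level, the difference between the two variants comes down to how the assistant turn is opened. The function below is a hypothetical sketch, not the renderer's code: the open form matches the decoded prompt shown above, while the exact whitespace inside the closed <think></think> block is an assumption.

```python
def toy_assistant_prefix(thinking: bool) -> str:
    """Return the text that opens the assistant turn (illustrative only)."""
    header = "<|im_start|>assistant\n"
    if thinking:
        # Open tag: the model continues by writing its reasoning.
        return header + "<think>\n"
    # Closed, empty block: the model continues with the answer itself.
    # The whitespace here is assumed, not taken from the renderer source.
    return header + "<think>\n\n</think>\n\n"


print(repr(toy_assistant_prefix(thinking=True)))
print(repr(toy_assistant_prefix(thinking=False)))
```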

get_stop_sequences() -- stop tokens

When sampling, we need to know when the model has finished its response. get_stop_sequences() returns the token IDs (or strings) that signal end-of-generation.

stop_sequences = renderer.get_stop_sequences()
print(f"Stop sequences: {stop_sequences}")

# For Qwen3.5/3.6, this is the <|im_end|> token
for tok in stop_sequences:
    if isinstance(tok, int):
        print(f"  Token {tok} decodes to: {tokenizer.decode([tok])!r}")
Output
Stop sequences: [248046]
  Token 248046 decodes to: '<|im_end|>'
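
The sampler applies the stop sequences for you, but it can help to see what "stopping" means on a token list. This is a minimal sketch with a hypothetical helper, truncate_at_stop; it handles only integer stop tokens and skips any string stop sequences.

```python
def truncate_at_stop(tokens, stop_sequences):
    """Return (kept_tokens, hit_stop); the stop token itself is dropped."""
    # Only integer stop tokens are handled in this sketch.
    stop_ids = {s for s in stop_sequences if isinstance(s, int)}
    for i, t in enumerate(tokens):
        if t in stop_ids:
            return list(tokens[:i]), True
    return list(tokens), False


# 248046 is <|im_end|> in the output above; the other IDs are made up.
kept, hit_stop = truncate_at_stop([11, 12, 13, 248046, 99], [248046])
print(kept, hit_stop)  # [11, 12, 13] True
```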

parse_response() -- decoding tokens back to a message

After sampling, you get raw token IDs. parse_response() converts them back into a structured message dict and a ParseTermination enum that tells you how the response ended:

  • STOP_SEQUENCE — the renderer's expected stop signal fired (e.g. <|im_end|> for chat templates, \n\nUser: for RoleColon).
  • EOS — the model emitted EOS instead. Some renderers (notably RoleColonRenderer for base models) accept this as a clean parse on single-turn prompts.
  • MALFORMED — no clean termination (truncated, or multiple/conflicting stop signals).

Use termination.is_clean (any clean termination — what eval grading reads) or termination.is_stop_sequence (strict — what RL format-reward shaping reads).
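
The three outcomes and the two predicates can be sketched with a small enum. ToyTermination and toy_classify are hypothetical and only mirror the description above; in particular, treating EOS as always clean is an assumption, since the text says only some renderers accept it.

```python
from enum import Enum


# Hypothetical mirror of the termination states described above.
class ToyTermination(Enum):
    STOP_SEQUENCE = "stop_sequence"
    EOS = "eos"
    MALFORMED = "malformed"

    @property
    def is_clean(self):
        # Assumption: both STOP_SEQUENCE and EOS count as clean.
        return self in (ToyTermination.STOP_SEQUENCE, ToyTermination.EOS)

    @property
    def is_stop_sequence(self):
        return self is ToyTermination.STOP_SEQUENCE


def toy_classify(tokens, stop_id, eos_id):
    """Classify how a sampled token list ended (illustrative rules only)."""
    if tokens.count(stop_id) == 1 and tokens[-1] == stop_id:
        return ToyTermination.STOP_SEQUENCE
    if tokens and tokens[-1] == eos_id and stop_id not in tokens:
        return ToyTermination.EOS
    return ToyTermination.MALFORMED  # truncated or conflicting stop signals


for sample in ([5, 6, 9], [5, 6, 0], [5, 6], [5, 9, 6, 9]):
    t = toy_classify(sample, stop_id=9, eos_id=0)
    print(sample, t.value, t.is_clean, t.is_stop_sequence)
```

An RL format reward would typically key off is_stop_sequence, while an eval grader that only needs a usable answer can accept anything with is_clean set.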

# Simulate what the model emits during sampling: the assistant's reply text
# followed by the <|im_end|> stop token. (In practice these come from the
# sampler -- here we build them by hand so the example is reproducible.)
fake_tokens = tokenizer.encode(
    "They have efficient DNA repair and cancer-resistant cells.<|im_end|>"
)
parsed_message, termination = renderer.parse_response(fake_tokens)

print(f"Fake tokens: {fake_tokens}")
print(f"Parsed message: {parsed_message}")
print(f"Termination: {termination} (is_clean={termination.is_clean})")
Output
Fake tokens: [6651, 599, 10727, 15095, 12368, 321, 9108, 44422, 7515, 13, 248046]
Parsed message: {'role': 'assistant', 'content': 'They have efficient DNA repair and cancer-resistant cells.'}
Termination: stop_sequence (is_clean=True)

Putting it together: sampling a response

Here is the full pattern for generating a message from a model. This requires a running Tinker service (and TINKER_API_KEY).

import tinker
from tinker.types import SamplingParams

service_client = tinker.ServiceClient()
sampling_client = service_client.create_sampling_client(base_model="Qwen/Qwen3.6-35B-A3B")

prompt = renderer.build_generation_prompt(messages[:-1])
stop_sequences = renderer.get_stop_sequences()
sampling_params = SamplingParams(max_tokens=100, temperature=0.5, stop=stop_sequences)

output = sampling_client.sample(prompt, sampling_params=sampling_params, num_samples=1).result()
sampled_message, termination = renderer.parse_response(output.sequences[0].tokens)
print(sampled_message)

build_supervised_example() -- for training

For supervised fine-tuning, we need to distinguish prompt tokens (context the model reads) from completion tokens (what the model should learn to produce). build_supervised_example() returns both the tokens and per-token loss weights.

  • Weight 0 = prompt (no loss computed)
  • Weight 1 = completion (model trains on these)

model_input, weights = renderer.build_supervised_example(messages)

# Show which tokens are prompt vs completion
token_ids = model_input.to_ints()
for i, (tok_id, w) in enumerate(zip(token_ids, weights.tolist())):
    label = "COMPLETION" if w > 0 else "prompt"
    print(f"  [{i:3d}] {label:10s}  {tokenizer.decode([tok_id])!r}")
Output
  [  0] prompt      '<|im_start|>'
  [  1] prompt      'system'
  [  2] prompt      '\n'
  [  3] prompt      'Answer'
  [  4] prompt      ' conc'
  [  5] prompt      'is'
  [  6] prompt      'ely'
  [  7] prompt      ';'
  [  8] prompt      ' at'
  [  9] prompt      ' most'
  [ 10] prompt      ' one'
  [ 11] prompt      ' sentence'
  [ 12] prompt      ' per'
  [ 13] prompt      ' response'
  [ 14] prompt      '<|im_end|>'
  [ 15] prompt      '\n'
  [ 16] prompt      '<|im_start|>'
  [ 17] prompt      'user'
  [ 18] prompt      '\n'
  [ 19] prompt      'What'
  [ 20] prompt      ' is'
  [ 21] prompt      ' the'
  [ 22] prompt      ' longest'
  [ 23] prompt      '-lived'
  [ 24] prompt      ' rod'
  [ 25] prompt      'ent'
  [ 26] prompt      ' species'
  [ 27] prompt      '?'
  [ 28] prompt      '<|im_end|>'
  [ 29] prompt      '\n'
  [ 30] prompt      '<|im_start|>'
  [ 31] prompt      'assistant'
  [ 32] prompt      '\n'
  [ 33] prompt      'The'
  [ 34] prompt      ' naked'
  [ 35] prompt      ' mole'
  [ 36] prompt      ' rat'
  [ 37] prompt      ','
  [ 38] prompt      ' which'
  [ 39] prompt      ' can'
  [ 40] prompt      ' live'
  [ 41] prompt      ' over'
  [ 42] prompt      ' '
  [ 43] prompt      '3'
  [ 44] prompt      '0'
  [ 45] prompt      ' years'
  [ 46] prompt      '.'
  [ 47] prompt      '<|im_end|>'
  [ 48] prompt      '\n'
  [ 49] prompt      '<|im_start|>'
  [ 50] prompt      'user'
  [ 51] prompt      '\n'
  [ 52] prompt      'How'
  [ 53] prompt      ' do'
  [ 54] prompt      ' they'
  [ 55] prompt      ' live'
  [ 56] prompt      ' so'
  [ 57] prompt      ' long'
  [ 58] prompt      '?'
  [ 59] prompt      '<|im_end|>'
  [ 60] prompt      '\n'
  [ 61] prompt      '<|im_start|>'
  [ 62] prompt      'assistant'
  [ 63] prompt      '\n'
  [ 64] prompt      '<think>'
  [ 65] prompt      '\n\n'
  [ 66] prompt      '</think>'
  [ 67] prompt      '\n\n'
  [ 68] COMPLETION  'They'
  [ 69] COMPLETION  ' evolved'
  [ 70] COMPLETION  ' multiple'
  [ 71] COMPLETION  ' protective'
  [ 72] COMPLETION  ' mechanisms'
  [ 73] COMPLETION  ' including'
  [ 74] COMPLETION  ' special'
  [ 75] COMPLETION  ' hy'
  [ 76] COMPLETION  'alur'
  [ 77] COMPLETION  'onic'
  [ 78] COMPLETION  ' acid'
  [ 79] COMPLETION  ' that'
  [ 80] COMPLETION  ' prevents'
  [ 81] COMPLETION  ' cancer'
  [ 82] COMPLETION  ','
  [ 83] COMPLETION  ' extremely'
  [ 84] COMPLETION  ' stable'
  [ 85] COMPLETION  ' proteins'
  [ 86] COMPLETION  ','
  [ 87] COMPLETION  ' and'
  [ 88] COMPLETION  ' efficient'
  [ 89] COMPLETION  ' DNA'
  [ 90] COMPLETION  ' repair'
  [ 91] COMPLETION  ' systems'
  [ 92] COMPLETION  ' that'
  [ 93] COMPLETION  ' work'
  [ 94] COMPLETION  ' together'
  [ 95] COMPLETION  ' to'
  [ 96] COMPLETION  ' prevent'
  [ 97] COMPLETION  ' aging'
  [ 98] COMPLETION  '.'
  [ 99] COMPLETION  '<|im_end|>'

Only the final assistant message has weight 1 (completion). Everything else -- the system prompt, user messages, earlier assistant messages, and the assistant header with its empty <think></think> block -- has weight 0. The loss therefore rewards only the tokens of the target response; the model is not trained to reproduce prompt content (system instructions, questions) that it will always be given as input.
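
To see why weight-0 tokens cannot influence training, consider a weighted negative log-likelihood. This is a generic sketch, not Tinker's loss code: weighted_nll is a hypothetical helper, and normalizing by the total weight is an assumption (a real trainer may sum instead of averaging).

```python
def weighted_nll(logprobs, weights):
    """Mean negative log-likelihood over tokens with non-zero weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    return -sum(lp * w for lp, w in zip(logprobs, weights)) / total_weight


weights = [0, 0, 1, 1]  # two prompt tokens, two completion tokens

# Very different prompt log-probabilities, identical completion log-probabilities.
loss_a = weighted_nll([-5.0, -4.0, -0.5, -1.5], weights)
loss_b = weighted_nll([-0.1, -0.2, -0.5, -1.5], weights)
print(loss_a, loss_b)  # 1.0 1.0
```

Because the prompt terms are multiplied by zero, how well the model predicts the prompt has no effect on the loss.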

TrainOnWhat -- controlling loss targets

By default, build_supervised_example trains on the last assistant message. The TrainOnWhat enum gives you more control:

Value                     Trains on
LAST_ASSISTANT_MESSAGE    Only the final assistant reply (default)
LAST_ASSISTANT_TURN       Final assistant turn including tool calls/responses
ALL_ASSISTANT_MESSAGES    Every assistant message in the conversation
ALL_MESSAGES              All messages regardless of role
ALL_TOKENS                Every token including special tokens
CUSTOMIZED                Per-message train flags from the dataset

# Train on ALL assistant messages instead of just the last one
_, weights_all = renderer.build_supervised_example(
    messages,
    train_on_what=renderers.TrainOnWhat.ALL_ASSISTANT_MESSAGES,
)
print(f"Tokens with weight > 0: {(weights_all > 0).sum().item()}")

# Compare with default (last assistant message only)
_, weights_last = renderer.build_supervised_example(messages)
print(f"Tokens with weight > 0 (default): {(weights_last > 0).sum().item()}")
Output
Tokens with weight > 0: 47
Tokens with weight > 0 (default): 32
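
The two counts can be reproduced by hand from the per-token listing printed in the build_supervised_example() section. The sketch below groups those 100 tokens into spans and applies two of the modes; toy_weights and the "header"/"body" split are hypothetical bookkeeping for illustration, and the other TrainOnWhat values are deliberately left out because their exact token boundaries are not shown here.

```python
# (role, part, token_count) spans read off the per-token listing.
# "header" = role markers and separators; "body" = content plus <|im_end|>.
SPANS = [
    ("system", "header", 3), ("system", "body", 12),
    ("user", "header", 4), ("user", "body", 10),
    ("assistant", "header", 4), ("assistant", "body", 15),
    ("user", "header", 4), ("user", "body", 8),
    ("assistant", "header", 8),  # includes the empty <think></think> block
    ("assistant", "body", 32),
]


def toy_weights(spans, mode):
    """Expand spans into per-token weights for two illustrative modes."""
    last_assistant = max(
        i for i, (role, part, _) in enumerate(spans)
        if role == "assistant" and part == "body"
    )
    weights = []
    for i, (role, part, n) in enumerate(spans):
        is_assistant_body = role == "assistant" and part == "body"
        if mode == "all_assistant_messages":
            train = is_assistant_body
        elif mode == "last_assistant_message":
            train = is_assistant_body and i == last_assistant
        else:
            raise ValueError(f"mode not covered by this sketch: {mode}")
        weights.extend([1.0 if train else 0.0] * n)
    return weights


w_all = toy_weights(SPANS, "all_assistant_messages")
w_last = toy_weights(SPANS, "last_assistant_message")
print(len(w_all), int(sum(w_all)), int(sum(w_last)))  # 100 47 32
```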

Available renderers

Tinker ships renderers for several model families. Use get_renderer() with the appropriate name:

Name                         Model family                    Notes
qwen3_5                      Qwen3.5 / Qwen3.6 (incl. VL)    Thinking enabled (default)
qwen3_5_disable_thinking     Qwen3.5 / Qwen3.6 (incl. VL)    Thinking disabled
deepseekv3                   DeepSeek V3                     Non-thinking mode (default)
deepseekv3_thinking          DeepSeek V3                     Thinking mode
nemotron3                    NVIDIA Nemotron 3               Thinking enabled
kimi_k26                     Kimi K2.6                       Thinking enabled (default)
kimi_k26_disable_thinking    Kimi K2.6                       Thinking disabled

Each renderer produces the correct special tokens for its model family. The default renderers match HuggingFace's apply_chat_template output, so models trained with Tinker work with the OpenAI-compatible endpoint.
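
If you want to spot-check that claim for your own model, one option is to compare the renderer's token list (from build_generation_prompt(...).to_ints()) against the token list HuggingFace produces with apply_chat_template(..., tokenize=True, add_generation_prompt=True). The helper below is a hypothetical utility for reporting where two token lists diverge; it is not part of tinker_cookbook.

```python
def first_mismatch(a, b):
    """Return the first index where two token lists differ, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one list is a strict prefix of the other
    return None


print(first_mismatch([1, 2, 3], [1, 2, 3]))  # None
print(first_mismatch([1, 2, 3], [1, 9, 3]))  # 1
print(first_mismatch([1, 2], [1, 2, 3]))     # 2
```

Decoding a few tokens around the reported index usually makes the cause obvious, such as a missing newline or a different thinking prefix.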

# Example: switching between renderers
# Each model family needs its own tokenizer + matching renderer
_test_messages = [{"role": "user", "content": "Hello!"}]

for _model_name, _renderer_name in [
    ("Qwen/Qwen3.6-35B-A3B", "qwen3_5"),
    ("moonshotai/Kimi-K2.6", "kimi_k26"),
]:
    _tokenizer = tokenizer_utils.get_tokenizer(_model_name)
    _renderer = renderers.get_renderer(_renderer_name, _tokenizer)
    _prompt_tokens = _renderer.build_generation_prompt(_test_messages)
    print(f"--- {_model_name} ({_renderer_name}) ---")
    print(_tokenizer.decode(_prompt_tokens.to_ints()))
    print()
Output
--- Qwen/Qwen3.6-35B-A3B (qwen3_5) ---
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
<think>


--- moonshotai/Kimi-K2.6 (kimi_k26) ---
<|im_system|>system<|im_middle|>You are Kimi, an AI assistant created by Moonshot AI.<|im_end|><|im_user|>user<|im_middle|>Hello!<|im_end|><|im_assistant|>assistant<|im_middle|><think>

Vision inputs with ImagePart

For vision-language models (Qwen3.5 and Qwen3.6 models are all vision-capable), message content can include images alongside text. Use ImagePart for images and TextPart for text within the same message.

from tinker_cookbook.renderers import ImagePart, Message, TextPart

# A multimodal message with an image and text
multimodal_message = Message(
    role="user",
    content=[
        ImagePart(type="image", image="https://example.com/photo.png"),
        TextPart(type="text", text="What is in this image?"),
    ],
)
print("Multimodal message:", multimodal_message)

# Text-only messages still work as plain strings
text_message = Message(role="user", content="Describe this in one word.")
print("Text message:", text_message)
Output
Multimodal message: {'role': 'user', 'content': [{'type': 'image', 'image': 'https://example.com/photo.png'}, {'type': 'text', 'text': 'What is in this image?'}]}
Text message: {'role': 'user', 'content': 'Describe this in one word.'}
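
Since content may be either a plain string or a list of parts, code that inspects messages often normalizes it first. The sketch below works on plain dicts shaped like the printed output above; toy_content_parts and toy_count_images are hypothetical helpers, not tinker_cookbook functions.

```python
def toy_content_parts(message):
    """Return message content as a list of parts, wrapping plain strings."""
    content = message["content"]
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)


def toy_count_images(message):
    """Count image parts in a message."""
    return sum(1 for part in toy_content_parts(message) if part["type"] == "image")


multimodal = {
    "role": "user",
    "content": [
        {"type": "image", "image": "https://example.com/photo.png"},
        {"type": "text", "text": "What is in this image?"},
    ],
}
text_only = {"role": "user", "content": "Describe this in one word."}

print(toy_count_images(multimodal), toy_count_images(text_only))  # 1 0
```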

The same qwen3_5 renderer you set up above also handles images. You only need to load an image processor and pass it in:

from tinker_cookbook.image_processing_utils import get_image_processor

image_processor = get_image_processor("Qwen/Qwen3.6-35B-A3B")
renderer = renderers.get_renderer("qwen3_5", tokenizer, image_processor=image_processor)

With an image processor attached, the renderer handles the vision special tokens (<|vision_start|>, <|vision_end|>) and image preprocessing automatically.

Custom renderers with register_renderer()

If you need a format not covered by the built-in renderers, you can register your own. This lets you use get_renderer() with a custom name throughout your codebase.

# Define a factory function that creates your renderer
def my_renderer_factory(tokenizer, image_processor=None):
    # In practice, you would return a custom Renderer subclass here.
    # For demonstration, we just return the Qwen3.5 renderer.
    from tinker_cookbook.renderers.qwen3_5 import Qwen3_5Renderer

    return Qwen3_5Renderer(tokenizer)

# Register it under a namespaced name
renderers.register_renderer("MyOrg/custom_format", my_renderer_factory)

# Now you can look it up by name via get_renderer
custom_renderer = renderers.get_renderer("MyOrg/custom_format", tokenizer)
print(f"Registered renderers: {renderers.get_registered_renderer_names()}")

# Clean up
renderers.unregister_renderer("MyOrg/custom_format")
Output
Registered renderers: ['MyOrg/custom_format']

Summary

The renderer is the bridge between conversations and tokens. Its four key methods cover the full lifecycle:

Method                        Purpose                              Used in
build_generation_prompt()     Messages to prompt tokens            RL, inference
get_stop_sequences()          End-of-generation tokens             Sampling
parse_response()              Tokens back to a message             RL, inference
build_supervised_example()    Messages to tokens + loss weights    SFT, DPO

Use get_renderer(name, tokenizer) to get the right renderer for your model, and TrainOnWhat to control which parts of the conversation the model trains on.