Tutorial 203: Completers

Prerequisites

Rendering

Run it interactively

curl -O https://raw.githubusercontent.com/thinking-machines-lab/tinker-cookbook/main/tutorials/203_completers.py && marimo edit 203_completers.py

Completers are thin wrappers around SamplingClient that provide two levels of abstraction:

TokenCompleter -- operates on token IDs and ModelInput. Used by RL algorithms that work at the token level.
MessageCompleter -- operates on message dicts (role/content). Used by evaluators, LLM-as-judge patterns, and chat applications.

In this tutorial you will:

Build a TinkerTokenCompleter from a SamplingClient
Use it to generate tokens with stop conditions
Build a TinkerMessageCompleter with a renderer
Use it to generate structured message responses
Implement a simple LLM-as-judge pattern

import warnings

warnings.filterwarnings("ignore", message="IProgress not found")

import marimo as mo  # provides the mo.ui and mo.stop helpers used in Setup
import tinker

from tinker_cookbook.completers import (
    TinkerMessageCompleter,
    TinkerTokenCompleter,
)
from tinker_cookbook.renderers import get_renderer, get_text_content

TokenCompleter vs MessageCompleter

TokenCompleter                          MessageCompleter
+--------------------------+            +---------------------------+
| Input:  ModelInput        |            | Input:  list[Message]     |
|         (token IDs)       |            |         (role + content)  |
| Output: TokensWithLogprobs|            | Output: Message           |
|         (tokens + logps)  |            |         (role + content)  |
+--------------------------+            +---------------------------+
      Used by RL loops                    Used by evals / judges

TokenCompleter gives you raw tokens and log-probabilities -- essential for computing advantages and building RL datums. MessageCompleter hides the tokenization details and speaks the language of conversations.
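To make the two shapes concrete, here is a minimal sketch of the interfaces written as Python protocols. The names mirror the diagram above, but every definition here is an illustrative assumption rather than the tinker_cookbook source: ModelInput is stood in for by a plain list of token IDs, the stop reason is omitted, and the two Echo classes are toy implementations that exist only to show the call shape.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol, TypedDict


class Message(TypedDict):
    role: str
    content: str


@dataclass
class TokensWithLogprobs:
    # Illustrative stand-in: one log-probability per generated token.
    tokens: list[int]
    logprobs: list[float]


class TokenCompleter(Protocol):
    async def __call__(
        self, model_input: list[int], stop: list[str]
    ) -> TokensWithLogprobs: ...


class MessageCompleter(Protocol):
    async def __call__(self, messages: list[Message]) -> Message: ...


class EchoTokenCompleter:
    """Toy token-level completer: echoes the prompt tokens with logprob 0.0."""

    async def __call__(self, model_input, stop):
        return TokensWithLogprobs(
            tokens=list(model_input), logprobs=[0.0] * len(model_input)
        )


class EchoMessageCompleter:
    """Toy message-level completer: repeats the last message as the assistant."""

    async def __call__(self, messages):
        return {"role": "assistant", "content": messages[-1]["content"]}


# asyncio.run is for a standalone script; in a notebook cell, await directly.
token_out = asyncio.run(EchoTokenCompleter()([1, 2, 3], stop=[]))
message_out = asyncio.run(EchoMessageCompleter()([{"role": "user", "content": "hi"}]))
print(token_out)
print(message_out)
```

The point of the sketch is the asymmetry: the token-level result carries per-token numbers that a training loop can consume, while the message-level result is just another entry for the conversation list.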

Setup

Create a sampling client and a renderer. We will use these throughout the tutorial.

api_key = mo.ui.text(kind="password", label="Paste your Tinker API key")
api_key  # noqa: B018

import os

mo.stop(
    "TINKER_API_KEY" not in os.environ and not api_key.value,
    "Paste your API key above",
)

if api_key.value:
    os.environ["TINKER_API_KEY"] = api_key.value

MODEL_NAME = "Qwen/Qwen3.5-4B"

service_client = tinker.ServiceClient()
sampling_client = service_client.create_sampling_client(base_model=MODEL_NAME)
tokenizer = sampling_client.get_tokenizer()
renderer = get_renderer("qwen3_5_disable_thinking", tokenizer)

print(f"Sampling client ready for {MODEL_NAME}")

Output

Sampling client ready for Qwen/Qwen3.5-4B

TinkerTokenCompleter

TinkerTokenCompleter wraps a SamplingClient and exposes the TokenCompleter interface. You pass a ModelInput (tokenized prompt) and a stop condition (token IDs or strings).

token_completer = TinkerTokenCompleter(
    sampling_client=sampling_client,
    max_tokens=128,
    temperature=0.7,
)
print(
    f"TokenCompleter: max_tokens={token_completer.max_tokens}, temp={token_completer.temperature}"
)

Output

TokenCompleter: max_tokens=128, temp=0.7

Generate tokens with stop conditions

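TinkerTokenCompleter is an async callable: pass a ModelInput and stop sequences, then await the result. The result is a TokensWithLogprobs carrying the generated token IDs, their log-probabilities, and the stop reason. The top-level await below works inside a notebook cell; in a plain Python script, put the call inside an async function and run it with asyncio.run.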

# Build a prompt from messages
messages_for_tokens = [
    {"role": "user", "content": "What is 7 * 8?"},
]
model_input = renderer.build_generation_prompt(messages_for_tokens)
stop_sequences = renderer.get_stop_sequences()

# Generate tokens
token_result = await token_completer(model_input, stop=stop_sequences)

print(f"Generated {len(token_result.tokens)} tokens")
print(f"Stop reason: {token_result.stop_reason}")
print(f"Log-probs (first 5): {token_result.logprobs[:5]}")
print(f"Decoded: {tokenizer.decode(token_result.tokens)}")

Output

Generated 34 tokens
Stop reason: stop
Log-probs (first 5): [-0.008074609562754631, -0.3440316915512085, -1.9489881992340088, -0.00825585052371025, -7.74863383412594e-06]
Decoded: To calculate the product of 7 and 8:

$$7 \times 8 = 56$$

So, the answer is **56**.<|im_end|>
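Note that the decoded text above still ends with the stop marker <|im_end|>. At the token level nothing is trimmed for you, so if you need clean text you can strip it yourself. The following is a minimal sketch; strip_stop is a hypothetical helper (not part of tinker_cookbook), and it assumes the stop sequences are available as strings rather than token IDs.

```python
def strip_stop(text: str, stop_strings: list[str]) -> str:
    """Remove one trailing stop string from text, if present."""
    for stop in stop_strings:
        if stop and text.endswith(stop):
            return text[: -len(stop)]
    return text


decoded = "So, the answer is **56**.<|im_end|>"
cleaned = strip_stop(decoded, ["<|im_end|>"])
print(cleaned)  # So, the answer is **56**.
```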

The log-probabilities are always available on TinkerTokenCompleter results. In RL, these are used as the sampling logprobs for importance sampling correction:

sampling_logprobs = token_result.logprobs  # from the sampler
# Later, forward_backward computes target_logprobs from the learner
# The ratio exp(target - sampling) corrects for off-policy data
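As a worked example of that ratio, the sketch below computes exp(target - sampling) per token. The sampling values are the first three log-probabilities from the output above, rounded; the target values are made up for illustration and do not come from a real forward_backward call.

```python
import math

# Rounded from the sampler output shown earlier.
sampling_logprobs = [-0.0081, -0.3440, -1.9490]
# Hypothetical learner log-probs for the same tokens (illustrative only).
target_logprobs = [-0.0081, -0.3000, -2.1000]

# Per-token importance ratio: pi_learner(token) / pi_sampler(token).
ratios = [math.exp(t - s) for t, s in zip(target_logprobs, sampling_logprobs)]

for token_index, ratio in enumerate(ratios):
    print(f"token {token_index}: ratio = {ratio:.4f}")
```

A ratio of 1 means the learner and the sampler agree on that token. A ratio above 1 means the learner now assigns the token more probability than the sampler did, and a ratio below 1 means less.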

TinkerMessageCompleter

TinkerMessageCompleter wraps a SamplingClient and a Renderer to speak the message-level protocol. You pass a list of message dicts; it handles rendering, sampling, and parsing internally.

message_completer = TinkerMessageCompleter(
    sampling_client=sampling_client,
    renderer=renderer,
    max_tokens=256,
    temperature=0.7,
)
print("MessageCompleter ready")

Output

MessageCompleter ready

# Generate a message response
conversation = [
    {"role": "user", "content": "Explain what a hash table is in one sentence."},
]

response = await message_completer(conversation)
print(f"Role: {response['role']}")
print(f"Content: {get_text_content(response)}")

Output

Role: assistant
Content: A hash table is a data structure that stores key-value pairs in an array by using a hash function to map each key to a specific index, enabling average-case constant time complexity for insertions, deletions, and lookups.

Multi-turn conversations

MessageCompleter handles multi-turn conversations naturally -- just pass the full message history.

multi_turn = [
    {"role": "user", "content": "What is the largest planet in our solar system?"},
    {"role": "assistant", "content": "Jupiter."},
    {"role": "user", "content": "How many moons does it have?"},
]

followup = await message_completer(multi_turn)
print(f"Response: {get_text_content(followup)}")

Output

Response: Jupiter has **95 confirmed moons**.

This number is constantly updated by astronomers as new moons are discovered or existing ones are re-identified. Most of these moons are tiny, irregularly shaped rocky bodies discovered in recent decades. The majority of the well-known moons orbit in a system called the Galilean moons, which were discovered by Galileo Galilei in 1610. These four large moons are:
1.  **Io** (the most volcanically active body in the solar system)
2.  **Europa** (covered in ice with a likely subsurface ocean)
3.  **Ganymede** (the largest moon in the solar system, bigger than Mercury)
4.  **Callisto** (the oldest and most heavily cratered)

The remaining 91+ moons are much smaller and scattered throughout Jupiter's orbit, some trapped in resonance with the Galilean moons and others in distant, chaotic orbits.
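Because the completer is stateless, the caller owns the history. One way to keep it in sync is a small helper that appends both the user turn and the reply. This is a sketch under assumptions: chat_turn and fake_completer are hypothetical names, the fake stands in for message_completer so the example runs offline, and appending the reply unchanged assumes the renderer accepts the returned message as a history entry.

```python
import asyncio


async def chat_turn(completer, history, user_text):
    """Append the user turn, call the completer on the full history, append the reply."""
    history.append({"role": "user", "content": user_text})
    reply = await completer(history)
    history.append(reply)
    return reply


async def fake_completer(messages):
    # Stand-in for message_completer: reports how many messages it received.
    return {"role": "assistant", "content": f"seen {len(messages)} messages"}


async def demo():
    history = []
    await chat_turn(fake_completer, history, "What is the largest planet?")
    await chat_turn(fake_completer, history, "How many moons does it have?")
    return history


history = asyncio.run(demo())
for message in history:
    print(f"{message['role']}: {message['content']}")
```

On the second turn the completer sees three messages (user, assistant, user), which is exactly the full-history behavior described above.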

LLM-as-judge pattern

A common evaluation pattern uses one model as a "judge" to score outputs from another model (or the same model at a different checkpoint). The MessageCompleter makes this straightforward.

The pattern:

1. Generate a candidate answer using the model under evaluation
2. Ask the judge to score it
3. Parse the score from the judge's response

import re

# Step 1: Generate a candidate answer
question = "Why do leaves change color in autumn?"
candidate = await message_completer([{"role": "user", "content": question}])
candidate_text = get_text_content(candidate)
print(f"Candidate answer:\n{candidate_text}\n")

# Step 2: Ask the judge to score it
judge_prompt = f"""Rate the following answer on a scale of 1-5 for accuracy and clarity.

Question: {question}
Answer: {candidate_text}

Respond with ONLY a number from 1 to 5."""

judge_response = await message_completer([{"role": "user", "content": judge_prompt}])
judge_text = get_text_content(judge_response)

# Step 3: Parse the score
match = re.search(r"[1-5]", judge_text)
score = int(match.group()) if match else None
print(f"Judge response: {judge_text}")
print(f"Parsed score: {score}")

Output

Candidate answer:
Leaves change color in autumn primarily due to a combination of **temperature changes** and **reduced daylight**, which triggers a biological process that halts the leaf's life cycle.

During the growing season (spring and summer), leaves appear green because of pigments called **chlorophyll**. Chlorophyll is essential for photosynthesis—the process by which plants convert sunlight into energy. It absorbs red and blue light while reflecting green light, giving the leaf its color.

As days get shorter and temperatures drop in autumn, trees receive signals that it is time to conserve energy and prepare for winter dormancy. Consequently, the tree stops producing new chlorophyll. Since chlorophyll breaks down quickly without being replaced, the green color fades away, revealing the pigments that were present all along but hidden underneath:

*   **Yellow and Gold**: These colors come from **carotenoids**, which are always present in the leaf but masked by the dominant chlorophyll.
*   **Red and Orange**: These vibrant colors come from **anthocyanins** (red pigments). Interestingly, these are not present before fall; they are *produced* by the tree in autumn. The production of anthocyanins often happens when there is plenty of sugar

Judge response: 5
Parsed score: 5
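The pattern [1-5] matches the first such digit anywhere in the reply, so a judge that answers "10" would be parsed as 1. If that matters for your evaluation, a slightly stricter parser can require a standalone digit. The sketch below is one possible approach; parse_score is a hypothetical helper, not part of tinker_cookbook.

```python
import re
from typing import Optional


def parse_score(text: str) -> Optional[int]:
    """Return the first digit 1-5 that is not part of a longer number, else None."""
    match = re.search(r"(?<!\d)[1-5](?!\d)", text)
    return int(match.group()) if match else None


print(parse_score("5"))           # 5
print(parse_score("Score: 4/5"))  # 4
print(parse_score("10"))          # None
print(parse_score("no score"))    # None
```

Returning None on failure keeps unparseable replies distinguishable from low scores, so you can count or retry them instead of silently averaging them in.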

Using the judge as a reward function

In RL training, you can wrap this judge pattern into a reward function:

async def judge_reward(message_completer, question, answer):
    judge_prompt = f"Rate this answer 1-5.\nQ: {question}\nA: {answer}\nScore:"
    response = await message_completer([{"role": "user", "content": judge_prompt}])
    text = get_text_content(response)
    match = re.search(r"[1-5]", text)
    return float(match.group()) / 5.0 if match else 0.0  # scores 1-5 map to 0.2-1.0; 0.0 means no score was found

This is especially useful when you have a stronger model judging a weaker model's outputs, or when your reward function cannot be expressed as a simple string match.
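RL rollouts usually produce several answers per question, and judge calls are independent of each other, so they can be issued concurrently. The sketch below shows one way to do that with asyncio.gather. score_group is a hypothetical helper, and stub_reward is a stand-in with the same signature as judge_reward so the example runs without a model.

```python
import asyncio


async def score_group(reward_fn, completer, question, answers):
    """Score all answers concurrently; results come back in input order."""
    return await asyncio.gather(
        *(reward_fn(completer, question, answer) for answer in answers)
    )


async def stub_reward(completer, question, answer):
    # Stand-in for judge_reward: longer answers score higher, capped at 1.0.
    return min(len(answer), 5) / 5.0


rewards = asyncio.run(score_group(stub_reward, None, "Q?", ["a", "abc", "abcdefg"]))
print(rewards)  # [0.2, 0.6, 1.0]
```

In a real loop you would pass judge_reward and message_completer in place of the stub and None.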

Summary

Class                   Input                        Output              Use case
TinkerTokenCompleter    ModelInput + stop sequences  TokensWithLogprobs  RL rollouts, token-level control
TinkerMessageCompleter  list[Message]                Message             Evals, judges, chat apps

Both are async callables that wrap a SamplingClient. TokenCompleter gives you log-probabilities for RL; MessageCompleter handles rendering and parsing for you.

You can also implement the TokenCompleter or MessageCompleter interfaces with non-Tinker backends (e.g., a local vLLM server) for testing or hybrid setups.
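For unit tests, the simplest non-Tinker backend is one that returns canned replies. The sketch below assumes that the message-level interface is just an async callable taking a list of message dicts and returning one message dict, as used throughout this tutorial; ScriptedMessageCompleter is a hypothetical class, not part of tinker_cookbook.

```python
import asyncio


class ScriptedMessageCompleter:
    """Returns canned assistant replies in order and records each call."""

    def __init__(self, replies):
        self._replies = list(replies)
        self.calls = []

    async def __call__(self, messages):
        # Keep a copy of the input so tests can inspect the prompts later.
        self.calls.append(list(messages))
        content = self._replies.pop(0)
        return {"role": "assistant", "content": content}


completer = ScriptedMessageCompleter(["4"])
reply = asyncio.run(completer([{"role": "user", "content": "Rate this answer 1-5."}]))
print(reply)
print(len(completer.calls))
```

Passing an object like this wherever the tutorial uses message_completer lets you test judge parsing and reward code deterministically, without network calls.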