Tutorial 203: Completers
Completers are thin wrappers around SamplingClient that provide two levels of abstraction:
- TokenCompleter -- operates on token IDs and ModelInput. Used by RL algorithms that work at the token level.
- MessageCompleter -- operates on message dicts (role/content). Used by evaluators, LLM-as-judge patterns, and chat applications.
In this tutorial you will:
- Build a TinkerTokenCompleter from a SamplingClient
- Use it to generate tokens with stop conditions
- Build a TinkerMessageCompleter with a renderer
- Use it to generate structured message responses
- Implement a simple LLM-as-judge pattern
import warnings
warnings.filterwarnings("ignore", message="IProgress not found")
import tinker
from tinker_cookbook.completers import (
    TinkerMessageCompleter,
    TinkerTokenCompleter,
)
from tinker_cookbook.renderers import get_renderer, get_text_content
TokenCompleter vs MessageCompleter
       TokenCompleter                   MessageCompleter
+----------------------------+   +----------------------------+
| Input:  ModelInput         |   | Input:  list[Message]      |
|         (token IDs)        |   |         (role + content)   |
| Output: TokensWithLogprobs |   | Output: Message            |
|         (tokens + logps)   |   |         (role + content)   |
+----------------------------+   +----------------------------+
      Used by RL loops               Used by evals / judges
TokenCompleter gives you raw tokens and log-probabilities -- essential for computing advantages and building RL datums. MessageCompleter hides the tokenization details and speaks the language of conversations.
Setup
Create a sampling client and a renderer. We will use these throughout the tutorial.
import os
# Assumes an earlier notebook cell defined `mo` (marimo) and an `api_key` input widget.
mo.stop(
    "TINKER_API_KEY" not in os.environ and not api_key.value,
    "Paste your API key above",
)
if api_key.value:
    os.environ["TINKER_API_KEY"] = api_key.value
MODEL_NAME = "Qwen/Qwen3.5-4B"
service_client = tinker.ServiceClient()
sampling_client = service_client.create_sampling_client(base_model=MODEL_NAME)
tokenizer = sampling_client.get_tokenizer()
renderer = get_renderer("qwen3_5_disable_thinking", tokenizer)
print(f"Sampling client ready for {MODEL_NAME}")
TinkerTokenCompleter
TinkerTokenCompleter wraps a SamplingClient and exposes the TokenCompleter interface. You pass a ModelInput (tokenized prompt) and a stop condition (token IDs or strings).
token_completer = TinkerTokenCompleter(
    sampling_client=sampling_client,
    max_tokens=128,
    temperature=0.7,
)
print(
    f"TokenCompleter: max_tokens={token_completer.max_tokens}, temp={token_completer.temperature}"
)
Generate tokens with stop conditions
The TokenCompleter is an async callable. We pass a ModelInput and stop sequences. The result is a TokensWithLogprobs with the generated token IDs, their log-probabilities, and the stop reason.
# Build a prompt from messages
messages_for_tokens = [
    {"role": "user", "content": "What is 7 * 8?"},
]
model_input = renderer.build_generation_prompt(messages_for_tokens)
stop_sequences = renderer.get_stop_sequences()
# Generate tokens
token_result = await token_completer(model_input, stop=stop_sequences)
print(f"Generated {len(token_result.tokens)} tokens")
print(f"Stop reason: {token_result.stop_reason}")
print(f"Log-probs (first 5): {token_result.logprobs[:5]}")
print(f"Decoded: {tokenizer.decode(token_result.tokens)}")
The log-probabilities are always available on TinkerTokenCompleter results. In RL, these are used as the sampling logprobs for importance sampling correction:
sampling_logprobs = token_result.logprobs # from the sampler
# Later, forward_backward computes target_logprobs from the learner
# The ratio exp(target - sampling) corrects for off-policy data
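As a rough illustration of that correction, the sketch below computes per-token importance ratios from two lists of log-probabilities. The helper name `importance_ratios` and the numbers are made up for this example; in a real loop the sampling values would come from `token_result.logprobs` and the target values from the learner's forward pass.

```python
import math


def importance_ratios(sampling_logprobs, target_logprobs):
    """Per-token ratio pi_learner(token) / pi_sampler(token) = exp(target - sampling)."""
    if len(sampling_logprobs) != len(target_logprobs):
        raise ValueError("logprob sequences must have the same length")
    return [math.exp(t - s) for s, t in zip(sampling_logprobs, target_logprobs)]


# Illustrative values only, not real model output.
sampling = [-0.5, -1.2, -0.1]  # log-probs recorded when the tokens were sampled
target = [-0.5, -1.0, -0.3]    # log-probs the learner now assigns to the same tokens

ratios = importance_ratios(sampling, target)
print([round(r, 3) for r in ratios])  # prints [1.0, 1.221, 0.819]
```

A ratio of 1.0 means the learner still agrees with the sampler on that token; values above or below 1.0 show how far the policy has drifted since the data was collected.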
TinkerMessageCompleter
TinkerMessageCompleter wraps a SamplingClient and a Renderer to speak the message-level protocol. You pass a list of message dicts; it handles rendering, sampling, and parsing internally.
message_completer = TinkerMessageCompleter(
    sampling_client=sampling_client,
    renderer=renderer,
    max_tokens=256,
    temperature=0.7,
)
print("MessageCompleter ready")
# Generate a message response
conversation = [
    {"role": "user", "content": "Explain what a hash table is in one sentence."},
]
response = await message_completer(conversation)
print(f"Role: {response['role']}")
print(f"Content: {get_text_content(response)}")
Multi-turn conversations
MessageCompleter handles multi-turn conversations naturally -- just pass the full message history.
multi_turn = [
    {"role": "user", "content": "What is the largest planet in our solar system?"},
    {"role": "assistant", "content": "Jupiter."},
    {"role": "user", "content": "How many moons does it have?"},
]
followup = await message_completer(multi_turn)
print(f"Response: {get_text_content(followup)}")
Output
Response: Jupiter has **95 confirmed moons**.
This number is constantly updated by astronomers as new moons are discovered or existing ones are re-identified. Most of these moons are tiny, irregularly shaped rocky bodies discovered in recent decades. The majority of the well-known moons orbit in a system called the Galilean moons, which were discovered by Galileo Galilei in 1610. These four large moons are:
1. **Io** (the most volcanically active body in the solar system)
2. **Europa** (covered in ice with a likely subsurface ocean)
3. **Ganymede** (the largest moon in the solar system, bigger than Mercury)
4. **Callisto** (the oldest and most heavily cratered)
The remaining 91+ moons are much smaller and scattered throughout Jupiter's orbit, some trapped in resonance with the Galilean moons and others in distant, chaotic orbits.
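Because the completer is stateless, the caller owns the history. One way to keep a conversation going is a small helper that appends each user turn and each reply, sketched below. `chat_turn` and `echo_completer` are hypothetical names for this example; the stub stands in for a real message completer so the block runs without an API key, and it assumes the returned message can be appended to the history as-is.

```python
import asyncio


async def chat_turn(completer, history, user_text):
    """Send one user turn; return the reply and the extended history."""
    messages = history + [{"role": "user", "content": user_text}]
    reply = await completer(messages)
    return reply, messages + [reply]


async def echo_completer(messages):
    # Hypothetical stand-in for a message completer: answers with a canned string.
    return {"role": "assistant", "content": f"(reply to: {messages[-1]['content']})"}


async def demo():
    history = []
    _, history = await chat_turn(echo_completer, history, "What is the largest planet?")
    _, history = await chat_turn(echo_completer, history, "How many moons does it have?")
    return history


if __name__ == "__main__":
    for message in asyncio.run(demo()):
        print(f"{message['role']}: {message['content']}")
```

In a notebook, where an event loop is already running, you would `await demo()` instead of calling `asyncio.run`.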
LLM-as-judge pattern
A common evaluation pattern uses one model as a "judge" to score outputs from another model (or the same model at a different checkpoint). The MessageCompleter makes this straightforward.
The pattern:
- Generate a candidate answer using the model under evaluation
- Ask the judge to score it
- Parse the score from the judge's response
import re
# Step 1: Generate a candidate answer
question = "Why do leaves change color in autumn?"
candidate = await message_completer([{"role": "user", "content": question}])
candidate_text = get_text_content(candidate)
print(f"Candidate answer:\n{candidate_text}\n")
# Step 2: Ask the judge to score it
judge_prompt = f"""Rate the following answer on a scale of 1-5 for accuracy and clarity.
Question: {question}
Answer: {candidate_text}
Respond with ONLY a number from 1 to 5."""
judge_response = await message_completer([{"role": "user", "content": judge_prompt}])
judge_text = get_text_content(judge_response)
# Step 3: Parse the score
match = re.search(r"[1-5]", judge_text)
score = int(match.group()) if match else None
print(f"Judge response: {judge_text}")
print(f"Parsed score: {score}")
Output
Candidate answer:
Leaves change color in autumn primarily due to a combination of **temperature changes** and **reduced daylight**, which triggers a biological process that halts the leaf's life cycle.
During the growing season (spring and summer), leaves appear green because of pigments called **chlorophyll**. Chlorophyll is essential for photosynthesis—the process by which plants convert sunlight into energy. It absorbs red and blue light while reflecting green light, giving the leaf its color.
As days get shorter and temperatures drop in autumn, trees receive signals that it is time to conserve energy and prepare for winter dormancy. Consequently, the tree stops producing new chlorophyll. Since chlorophyll breaks down quickly without being replaced, the green color fades away, revealing the pigments that were present all along but hidden underneath:
* **Yellow and Gold**: These colors come from **carotenoids**, which are always present in the leaf but masked by the dominant chlorophyll.
* **Red and Orange**: These vibrant colors come from **anthocyanins** (red pigments). Interestingly, these are not present before fall; they are *produced* by the tree in autumn. The production of anthocyanins often happens when there is plenty of sugar
Judge response: 5
Parsed score: 5
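The pattern `[1-5]` accepts the first matching digit anywhere in the reply, so a judge that answers "10" or "3.5" would be read as 1 or 3. A slightly stricter parser, sketched here under the assumption that only a standalone integer from 1 to 5 should count, rejects those cases. `parse_score` is a hypothetical helper, not part of the cookbook.

```python
import re

# A digit 1-5 that is not part of a longer number or a decimal.
_SCORE_RE = re.compile(r"(?<![\d.])[1-5](?!\.?\d)")


def parse_score(judge_text):
    """Return the first standalone score in 1..5, or None if there is none."""
    match = _SCORE_RE.search(judge_text)
    return int(match.group()) if match else None


for text in ["5", "Score: 4/5", "I'd say 4.", "10", "3.5", "no idea"]:
    print(repr(text), "->", parse_score(text))
```

This still takes the first standalone digit, so a judge that repeats the scale ("on a scale of 1-5, I give it 4") would be misread; keeping the instruction to respond with only a number remains the main safeguard.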
Using the judge as a reward function
In RL training, you can wrap this judge pattern into a reward function:
async def judge_reward(message_completer, question, answer):
    judge_prompt = f"Rate this answer 1-5.\nQ: {question}\nA: {answer}\nScore:"
    response = await message_completer([{"role": "user", "content": judge_prompt}])
    text = get_text_content(response)
    match = re.search(r"[1-5]", text)
    # Scores 1-5 map to 0.2-1.0; an unparseable reply gets 0.0.
    return float(match.group()) / 5.0 if match else 0.0
This is especially useful when you have a stronger model judging a weaker model's outputs, or when your reward function cannot be expressed as a simple string match.
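RL rollouts usually produce several answers per question, and since the reward function is async the judge calls can run concurrently. The sketch below scores a batch with `asyncio.gather`. To stay self-contained it redefines `judge_reward` reading `response["content"]` directly rather than calling `get_text_content`, and it uses a hypothetical `keyword_judge` stub in place of a real judge model.

```python
import asyncio
import re


async def judge_reward(message_completer, question, answer):
    judge_prompt = f"Rate this answer 1-5.\nQ: {question}\nA: {answer}\nScore:"
    response = await message_completer([{"role": "user", "content": judge_prompt}])
    match = re.search(r"[1-5]", response["content"])  # assumes plain-string content
    return float(match.group()) / 5.0 if match else 0.0


async def score_batch(message_completer, question, answers):
    """Score every answer concurrently; results keep the order of `answers`."""
    tasks = (judge_reward(message_completer, question, a) for a in answers)
    return await asyncio.gather(*tasks)


async def keyword_judge(messages):
    # Hypothetical stub judge: rewards answers that mention chlorophyll.
    prompt = messages[0]["content"]
    return {"role": "assistant", "content": "4" if "chlorophyll" in prompt else "2"}


if __name__ == "__main__":
    answers = ["Chlorophyll breaks down, so chlorophyll no longer masks other pigments.", "Magic."]
    rewards = asyncio.run(score_batch(keyword_judge, "Why do leaves change color?", answers))
    print(rewards)  # prints [0.8, 0.4]
```

With a real judge you may want to cap concurrency (for example with an `asyncio.Semaphore`) so a large batch does not flood the sampling service.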
Summary
| Class | Input | Output | Use case |
|---|---|---|---|
| TinkerTokenCompleter | ModelInput + stop tokens | TokensWithLogprobs | RL rollouts, token-level control |
| TinkerMessageCompleter | list[Message] | Message | Evals, judges, chat apps |
Both are async callables that wrap a SamplingClient. TokenCompleter gives you log-probabilities for RL; MessageCompleter handles rendering and parsing for you.
You can also implement the TokenCompleter or MessageCompleter interfaces with non-Tinker backends (e.g., a local vLLM server) for testing or hybrid setups.
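For unit tests, a scripted stand-in is often enough. The sketch below mimics the call shape used in this tutorial (`await completer(model_input, stop=...)`) without touching any backend. Everything in it is hypothetical: `FakeTokensWithLogprobs` only mirrors the `tokens` and `logprobs` fields shown earlier, `stop` is assumed to be a list of token IDs, and including the stop token in the output is a choice made for this example rather than a statement about how Tinker behaves.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeTokensWithLogprobs:
    """Hypothetical stand-in for the real result type; only two fields."""
    tokens: list[int]
    logprobs: list[float]


class ScriptedTokenCompleter:
    """Replays a fixed token sequence, cutting off at the first stop token."""

    def __init__(self, script, logprob=-0.1):
        self.script = list(script)
        self.logprob = logprob  # the same made-up log-prob for every token

    async def __call__(self, model_input, stop):
        out = []
        for token in self.script:
            out.append(token)
            if token in stop:  # assumption: the stop token is kept in the output
                break
        return FakeTokensWithLogprobs(out, [self.logprob] * len(out))


if __name__ == "__main__":
    completer = ScriptedTokenCompleter([11, 12, 99, 13])
    result = asyncio.run(completer([1, 2, 3], stop=[99]))
    print(result.tokens)  # prints [11, 12, 99]
```

Code written against the completer call shape, such as a rollout loop, can then be exercised deterministically before it is pointed at a real SamplingClient.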