Completers
The concept of policies is crucial to the RL training process. In the Tinker Cookbook, policies are implemented as Completers
. Completers are abstractions that represent models or policies that can be sampled from, providing different levels of structure depending on your use case.
Overview of Completer Types
The Tinker Cookbook provides two main types of completers, each designed for different use cases:
- TokenCompleter: Operates on tokens and is used by RL algorithms
- MessageCompleter: Operates on messages and needs to be used with a renderer
The choice between these depends on whether you're working at the token level for RL training or at the message level for interacting with and evaluating the model.
TokenCompleter
The TokenCompleter
is the foundational interface used by RL algorithms because they work directly with tokens.
class TokenCompleter:
async def __call__(
self, model_input: types.ModelInput, stop: StopCondition
) -> TokensWithLogprobs:
This interface takes:
model_input
: The input to the model (of typetypes.ModelInput
)stop
: Stop conditions, either a list of strings or token IDs (combined into aStopCondition
class). When training with reinforcement learning, this should be defined by theinitial_observation
function of the environment.
It returns a TokensWithLogprobs
object containing:
tokens
: The generated token sequencemaybe_logprobs
: Optional log probabilities for each token
MessageCompleter
The MessageCompleter
operates at a higher level with structured messages, similarly to standard chat APIs. It takes a list of messages and returns a single assistant message response.
class MessageCompleter:
async def __call__(self, messages: list[renderers.Message]) -> renderers.Message:
For training purposes the TokenCompleter
is the class we will use for RL training as we need to optimize the same same set of tokens during the update step that the model output during rollout. The MessageCompleter
is useful for sampling where we need to use the model output for semantic purposes such as Judge models or multi-agent environments.
The Tinker Cookbook uses two concrete implementations of these interfaces - TinkerTokenCompleter
and TinkerMessageCompleter
which are both wrappers around a tinker.SamplingClient
. While the TinkerTokenCompleter operates directly on tokens, the TinkerMessageCompleter needs to be instantiated with a renderer to make it compatible with the inputs expected by the samping client.