Completers

The concept of policies is crucial to the RL training process. In the Tinker Cookbook, policies are implemented as Completers. Completers are abstractions that represent models or policies that can be sampled from, providing different levels of structure depending on your use case.

Overview of Completer Types

The Tinker Cookbook provides two main types of completers, each designed for different use cases:

TokenCompleter: Operates on tokens and is used by RL algorithms
MessageCompleter: Operates on messages and needs to be used with a renderer

The choice between these depends on whether you're working at the token level for RL training or at the message level for interacting with and evaluating the model.

TokenCompleter

The TokenCompleter is the foundational interface used by RL algorithms because they work directly with tokens.

class TokenCompleter:
    async def __call__(
        self, model_input: types.ModelInput, stop: StopCondition
    ) -> TokensWithLogprobs:

This interface takes:

model_input: The input to the model (of type types.ModelInput)
stop: Stop conditions, either a list of strings or token IDs (combined into a StopCondition class). When training with reinforcement learning, this should be defined by the initial_observation function of the environment.

It returns a TokensWithLogprobs object containing:

tokens: The generated token sequence
maybe_logprobs: Optional log probabilities for each token

MessageCompleter

The MessageCompleter operates at a higher level with structured messages, similarly to standard chat APIs. It takes a list of messages and returns a single assistant message response.

class MessageCompleter:
    async def __call__(self, messages: list[renderers.Message]) -> renderers.Message:

For training purposes the TokenCompleter is the class we will use for RL training as we need to optimize the same same set of tokens during the update step that the model output during rollout. The MessageCompleter is useful for sampling where we need to use the model output for semantic purposes such as Judge models or multi-agent environments.

The Tinker Cookbook uses two concrete implementations of these interfaces - TinkerTokenCompleter and TinkerMessageCompleter which are both wrappers around a tinker.SamplingClient. While the TinkerTokenCompleter operates directly on tokens, the TinkerMessageCompleter needs to be instantiated with a renderer to make it compatible with the inputs expected by the samping client.

Evaluations Under the Hood