tinker_cookbook.rl.Env

class tinker_cookbook.rl.Env(ABC)

Stateful environment that a single agent interacts with.

Return the starting observation and stop condition for this episode.

Returns: tuple[Observation, StopCondition]: The initial observation (model input) and the stop condition for the first generation step.

Advance the environment by one step given the agent's action.

Parameters:

action (Action) – Token IDs produced by the agent.
extra (ActionExtra | None) – Optional metadata about the action, such as the stop reason.

Returns: StepResult: The reward, next observation, and whether the episode is done.
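
The interface described above can be sketched as a small, self-contained example. This is an illustration under stated assumptions, not the library's code: the method names `initial_observation` and `step` are assumed (the text above does not name them), both are written as coroutines (also an assumption), and `Observation`, `StopCondition`, `Action`, `ActionExtra`, and `StepResult` are simplified stand-ins whose field names are hypothetical. `EchoEnv` and `run_episode` are made up for the example.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Optional

# Stand-in types (assumptions). The real library defines its own versions.
Observation = list    # model input, here a plain list of token IDs
StopCondition = list  # here a list of stop strings
Action = list         # token IDs produced by the agent
ActionExtra = dict    # optional metadata, e.g. {"stop_reason": "length"}


@dataclass
class StepResult:
    # Hypothetical field names mirroring "reward, next observation, done".
    reward: float
    next_observation: Observation
    episode_done: bool


class Env(ABC):
    """Local stand-in for the abstract base class documented above."""

    @abstractmethod
    async def initial_observation(self) -> tuple:
        """Return (observation, stop condition) for the first generation step."""

    @abstractmethod
    async def step(self, action: Action, extra: Optional[ActionExtra] = None) -> StepResult:
        """Advance the environment by one step given the agent's action."""


class EchoEnv(Env):
    """Toy single-turn environment: reward 1.0 if the agent repeats the prompt."""

    def __init__(self, prompt: list):
        self.prompt = prompt

    async def initial_observation(self) -> tuple:
        return self.prompt, ["\n"]

    async def step(self, action: Action, extra: Optional[ActionExtra] = None) -> StepResult:
        reward = 1.0 if action == self.prompt else 0.0
        # The episode ends after one step, so there is no further observation.
        return StepResult(reward=reward, next_observation=[], episode_done=True)


async def run_episode(env: Env, policy: Callable[[Observation], Action]) -> float:
    """Roll out one episode and return the summed reward."""
    obs, _stop = await env.initial_observation()
    total = 0.0
    while True:
        result = await env.step(policy(obs))
        total += result.reward
        if result.episode_done:
            return total
        obs = result.next_observation


if __name__ == "__main__":
    print(asyncio.run(run_episode(EchoEnv([1, 2, 3]), lambda obs: list(obs))))
```

Because the environment is stateful and serves a single agent, a sketch like this would create a fresh `EchoEnv` instance for every episode rather than reuse one across rollouts.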