tinker_cookbook.rl.Env
class tinker_cookbook.rl.Env(ABC)
Abstract base class for a stateful environment that a single agent interacts with. Subclasses implement initial_observation() and step().
initial_observation()
Return the starting observation and stop condition for this episode.
Returns: tuple[Observation, StopCondition]: The initial observation (model input) and the stop condition for the first generation step.
step(action, extra)
Advance the environment by one step given the agent's action.
Parameters:
- action (Action) – Token IDs produced by the agent.
- extra (ActionExtra | None) – Optional metadata about the action, such as the stop reason.
Returns: StepResult: The reward, next observation, and whether the episode is done.
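Example (illustrative sketch): the interaction loop described above, written against local stand-in types so it runs without the library installed. Everything here is an assumption made for illustration, not the library's definitions: the type aliases, the StepResult field names (reward, next_observation, episode_done), the hypothetical EchoEnv task, and the run_episode helper. The sketch uses plain synchronous methods; the real methods may be declared differently (for example as coroutines), so check the library source before copying signatures.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

# Stand-in types (assumptions, not the library's definitions).
Observation = list[int]    # model input as token IDs
Action = list[int]         # token IDs produced by the agent
StopCondition = list[str]  # e.g. stop strings for the next generation step
ActionExtra = dict         # optional metadata such as the stop reason


@dataclass
class StepResult:
    # Field names are hypothetical; they mirror the documented return value.
    reward: float
    next_observation: Observation
    episode_done: bool


class Env(ABC):
    """Local stand-in mirroring the documented interface."""

    @abstractmethod
    def initial_observation(self) -> tuple[Observation, StopCondition]: ...

    @abstractmethod
    def step(self, action: Action, extra: Optional[ActionExtra] = None) -> StepResult: ...


class EchoEnv(Env):
    """Hypothetical task: reward 1.0 when the agent repeats the target token."""

    def __init__(self, target: int, max_turns: int = 3):
        self.target = target
        self.max_turns = max_turns
        self.turns = 0  # per-episode state, which is why the env is stateful

    def initial_observation(self):
        return [self.target], ["\n"]

    def step(self, action, extra=None):
        self.turns += 1
        correct = action == [self.target]
        # The episode ends on a correct answer or when the turn budget runs out.
        done = correct or self.turns >= self.max_turns
        return StepResult(1.0 if correct else 0.0, [self.target], done)


def run_episode(env, policy):
    """Drive one episode and return the summed reward."""
    obs, _stop = env.initial_observation()
    total = 0.0
    while True:
        result = env.step(policy(obs), extra={"stop_reason": "stop"})
        total += result.reward
        if result.episode_done:
            return total
        obs = result.next_observation
```

Because the environment holds per-episode state (here, the turn counter), a fresh instance is created for each episode in this sketch rather than reusing one across episodes.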