tinker_cookbook.rl.Transition

class tinker_cookbook.rl.Transition(**)

A single step in a trajectory: the observation, the action taken, and the reward received, along with an episode-termination flag and optional metrics and logs.

Fields:

ob (Observation) – Observation the agent saw before taking the action.
ac (TokensWithLogprobs) – Action taken (tokens and their log-probabilities).
reward (float) – Immediate reward received after taking the action.
episode_done (bool) – Whether this transition ended the episode.
metrics (Metrics) – Numeric values that are aggregated and reported in training logs. Default: an empty dict (field(default_factory=dict)).
logs (Logs) – Diagnostic information for display and debugging tools; unlike metrics, it is not aggregated. Default: an empty dict (field(default_factory=dict)).
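
The field list above can be illustrated with a minimal, self-contained sketch. It does not import tinker_cookbook. TransitionSketch and TokensWithLogprobsSketch are hypothetical stand-ins that only mirror the documented field names, and the aliases for Observation, Metrics, and Logs are assumptions made for illustration; the library's real types may differ. The total_reward helper is also hypothetical.

```python
from dataclasses import dataclass, field

# Stand-in type aliases: assumptions for illustration, not the library's definitions.
Observation = list[int]
Metrics = dict[str, float]
Logs = dict[str, str]


@dataclass
class TokensWithLogprobsSketch:
    """Hypothetical stand-in for TokensWithLogprobs."""
    tokens: list[int]
    logprobs: list[float]


@dataclass
class TransitionSketch:
    """Hypothetical mirror of the documented Transition fields."""
    ob: Observation
    ac: TokensWithLogprobsSketch
    reward: float
    episode_done: bool
    # default_factory=dict gives every instance its own empty dict.
    metrics: Metrics = field(default_factory=dict)
    logs: Logs = field(default_factory=dict)


def total_reward(transitions: list[TransitionSketch]) -> float:
    """Sum the immediate rewards over a list of transitions."""
    return sum(t.reward for t in transitions)


# A two-step trajectory; only the last step ends the episode.
step1 = TransitionSketch(
    ob=[1, 2, 3],
    ac=TokensWithLogprobsSketch(tokens=[4], logprobs=[-0.5]),
    reward=0.0,
    episode_done=False,
)
step2 = TransitionSketch(
    ob=[1, 2, 3, 4],
    ac=TokensWithLogprobsSketch(tokens=[5], logprobs=[-0.25]),
    reward=1.0,
    episode_done=True,
    metrics={"correct": 1.0},
)

# Mutating one instance's metrics does not affect the other instance.
step1.metrics["length"] = 1.0
```

Because the defaults are declared with default_factory rather than a shared literal, omitting metrics or logs yields a fresh dict per transition, so per-step values can be added without leaking into other steps.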