Skip to content

tinker_cookbook.rl.Transition

class tinker_cookbook.rl.Transition(**)

A single (observation, action, reward) tuple from a trajectory.

Fields:

  • ob (Observation) – Observation the agent saw before taking the action.
  • ac (TokensWithLogprobs) – Action taken (tokens and their log-probabilities).
  • reward (float) – Immediate reward received after taking the action.
  • episode_done (bool) – Whether this transition ended the episode.
  • metrics (Metrics) – Numeric values aggregated and reported in training logs. Default: field(default_factory=dict).
  • logs (Logs) – Diagnostic info for display/debugging tools (not aggregated like metrics). Default: field(default_factory=dict).