tinker_cookbook.rl.Transition
class tinker_cookbook.rl.Transition(**)
A single (observation, action, reward) tuple from a trajectory.
Fields:
- ob (Observation) – Observation the agent saw before taking the action.
- ac (TokensWithLogprobs) – Action taken (tokens and their log-probabilities).
- reward (float) – Immediate reward received after taking the action.
- episode_done (bool) – Whether this transition ended the episode.
- metrics (Metrics) – Numeric values aggregated and reported in training logs. Default:
field(default_factory=dict). - logs (Logs) – Diagnostic info for display/debugging tools (not aggregated like metrics). Default:
field(default_factory=dict).