tinker_cookbook.rl.trajectory_to_data

tinker_cookbook.rl.trajectory_to_data(traj, traj_advantage)

Return one or more Datum objects corresponding to the trajectory.

Parameters:

  • traj (Trajectory) – A single trajectory containing transitions (observation-action pairs).
  • traj_advantage (float) – The scalar advantage to assign to all action tokens in this trajectory.

Returns:

  • list[tinker.Datum] – One or more training datums, each containing the model input, targets, sampled log-probs, advantages, and masks.
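The following is a minimal, self-contained sketch of the idea described above: one scalar advantage is broadcast to every action token, while observation tokens are masked out. It does not call tinker_cookbook or tinker. ToyTransition, ToyDatum, toy_trajectory_to_datum, and all field names are hypothetical stand-ins, and the sketch assumes each observation holds only the tokens that are new since the previous action and that a single datum is produced. The library's actual layout and splitting rules may differ.

```python
from dataclasses import dataclass


@dataclass
class ToyTransition:
    """Hypothetical stand-in for one observation-action pair."""
    ob_tokens: list[int]        # tokens the model was shown (assumed: new tokens only)
    ac_tokens: list[int]        # tokens the model sampled
    ac_logprobs: list[float]    # sampler log-probs, one per action token


@dataclass
class ToyDatum:
    """Hypothetical stand-in for a training datum (not tinker.Datum)."""
    input_tokens: list[int]
    target_tokens: list[int]
    logprobs: list[float]
    advantages: list[float]
    mask: list[float]


def toy_trajectory_to_datum(transitions, traj_advantage):
    """Flatten a toy trajectory into one next-token-prediction datum."""
    tokens, logprobs, advantages, mask = [], [], [], []
    for t in transitions:
        # Observation tokens carry no training signal: zero advantage, zero mask.
        n_ob = len(t.ob_tokens)
        tokens += t.ob_tokens
        logprobs += [0.0] * n_ob
        advantages += [0.0] * n_ob
        mask += [0.0] * n_ob
        # Every action token receives the same trajectory-level advantage.
        n_ac = len(t.ac_tokens)
        tokens += t.ac_tokens
        logprobs += t.ac_logprobs
        advantages += [traj_advantage] * n_ac
        mask += [1.0] * n_ac
    # Shift by one so that position i of the input predicts target i.
    return ToyDatum(
        input_tokens=tokens[:-1],
        target_tokens=tokens[1:],
        logprobs=logprobs[1:],
        advantages=advantages[1:],
        mask=mask[1:],
    )


traj = [
    ToyTransition(ob_tokens=[10, 11], ac_tokens=[20, 21], ac_logprobs=[-0.5, -0.25]),
    ToyTransition(ob_tokens=[12], ac_tokens=[22], ac_logprobs=[-1.0]),
]
datum = toy_trajectory_to_datum(traj, traj_advantage=2.0)
print(datum.advantages)
print(datum.mask)
```

In this toy layout the per-token arrays line up with target_tokens, so a loss can multiply advantages by mask position by position and ignore observation tokens.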