tinker_cookbook.rl.trajectory_to_data
tinker_cookbook.rl.trajectory_to_data(traj, traj_advantage)
Return one or more Datum objects corresponding to the trajectory.
Parameters:
- traj (Trajectory) – A single trajectory containing transitions (observation-action pairs).
- traj_advantage (float) – The scalar advantage to assign to all action tokens in this trajectory.
Returns: (list[tinker.Datum]) – One or more training datums, each containing model input, targets, sampled log-probs, advantages, and masks.
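Example (illustrative only): the sketch below shows one way a scalar trajectory advantage could be spread over action tokens while observation tokens are masked out. It does not use the real tinker types. `ToyTransition`, `ToyDatum`, and `toy_trajectory_to_data` are hypothetical stand-ins, and the one-datum-per-transition layout is an assumption made for simplicity; how the actual function groups transitions into datums is not described here.

```python
from dataclasses import dataclass


@dataclass
class ToyTransition:
    """Hypothetical stand-in for one observation-action pair."""
    ob_tokens: list[int]        # observation (prompt) tokens, at least one
    ac_tokens: list[int]        # sampled action tokens
    ac_logprobs: list[float]    # sampler log-probs, one per action token


@dataclass
class ToyDatum:
    """Hypothetical stand-in for a training datum."""
    input_tokens: list[int]
    target_tokens: list[int]
    logprobs: list[float]
    advantages: list[float]
    mask: list[float]


def toy_trajectory_to_data(
    transitions: list[ToyTransition], traj_advantage: float
) -> list[ToyDatum]:
    """Build one toy datum per transition (a simplifying assumption)."""
    data = []
    for t in transitions:
        full = t.ob_tokens + t.ac_tokens
        # Per-token log-probs and mask over the full sequence:
        # observation tokens get 0.0, action tokens get their sampled values.
        full_logprobs = [0.0] * len(t.ob_tokens) + list(t.ac_logprobs)
        full_mask = [0.0] * len(t.ob_tokens) + [1.0] * len(t.ac_tokens)
        # Next-token prediction: inputs drop the last token, and the
        # targets and per-target fields drop the first.
        mask = full_mask[1:]
        data.append(
            ToyDatum(
                input_tokens=full[:-1],
                target_tokens=full[1:],
                logprobs=full_logprobs[1:],
                advantages=[traj_advantage * m for m in mask],
                mask=mask,
            )
        )
    return data


traj = [ToyTransition(ob_tokens=[1, 2, 3], ac_tokens=[4, 5], ac_logprobs=[-0.1, -0.2])]
data = toy_trajectory_to_data(traj, traj_advantage=0.5)
print(data[0].advantages)  # [0.0, 0.0, 0.5, 0.5]
```

In this sketch every action token receives the same advantage value, and positions that predict observation tokens carry a mask and advantage of zero, so they would contribute nothing to a policy-gradient loss.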