tinker_cookbook.rl.trajectory_to_data

Convert a single trajectory into one or more Datum objects for training.

Parameters:

traj (Trajectory) – A single trajectory containing transitions (observation-action pairs).
traj_advantage (float) – The scalar advantage to assign to all action tokens in this trajectory.

Returns:

list[tinker.Datum] – One or more training datums, each containing model input, targets, sampled log-probs, advantages, and masks.
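The sketch below illustrates how one trajectory could become datums with the fields listed above. It is not the tinker_cookbook implementation. It does not use tinker.Datum. The names Transition, SketchDatum and trajectory_to_data_sketch are hypothetical stand-ins. It also assumes a reason why more than one datum can be returned: a new datum starts whenever a step's observation does not extend the tokens accumulated so far. Observation tokens get a mask and advantage of 0. Every action token gets traj_advantage with a mask of 1.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Transition:
    """Hypothetical stand-in for one observation-action step."""
    ob: list[int]             # full observation (context) tokens at this step
    ac: list[int]             # sampled action tokens
    ac_logprobs: list[float]  # sampler log-prob for each action token


@dataclass
class SketchDatum:
    """Hypothetical stand-in for tinker.Datum; fields mirror the docstring."""
    model_input: list[int]
    target_tokens: list[int]
    logprobs: list[float]
    advantages: list[float]
    mask: list[float]


def _make_datum(seq, lps, advs, mask) -> SketchDatum:
    # Next-token prediction: the model sees seq[:-1] and predicts seq[1:],
    # so the per-token fields are aligned with the targets (drop index 0).
    return SketchDatum(seq[:-1], seq[1:], lps[1:], advs[1:], mask[1:])


def trajectory_to_data_sketch(transitions: list[Transition],
                              traj_advantage: float) -> list[SketchDatum]:
    datums: list[SketchDatum] = []
    seq: list[int] = []
    lps: list[float] = []
    advs: list[float] = []
    mask: list[float] = []
    for t in transitions:
        if seq and t.ob[:len(seq)] == seq:
            # Observation extends the current sequence: append only new tokens.
            new_ob = t.ob[len(seq):]
        else:
            # Assumed rule: a non-extending observation closes the current datum.
            if len(seq) >= 2:
                datums.append(_make_datum(seq, lps, advs, mask))
            seq, lps, advs, mask = [], [], [], []
            new_ob = t.ob
        # Observation tokens are context only: no advantage, masked out.
        seq += new_ob
        lps += [0.0] * len(new_ob)
        advs += [0.0] * len(new_ob)
        mask += [0.0] * len(new_ob)
        # Action tokens carry the trajectory-level advantage and are trained on.
        seq += t.ac
        lps += t.ac_logprobs
        advs += [traj_advantage] * len(t.ac)
        mask += [1.0] * len(t.ac)
    if len(seq) >= 2:
        datums.append(_make_datum(seq, lps, advs, mask))
    return datums
```

For example, two steps where the second observation extends the first, followed by an unrelated third step, would produce two datums. Because the advantage is a single scalar per trajectory, every sampled token in it is pushed in the same direction. The mask keeps observation tokens out of the loss.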