tinker_cookbook.rl.compute_advantages

tinker_cookbook.rl.compute_advantages(trajectory_groups_P)

Compute an advantage for each trajectory by centering its reward within its group.

Parameters:

  • trajectory_groups_P (list[TrajectoryGroup]) – Groups of trajectories, where each group's rewards are centered independently.

Returns: list[torch.Tensor]: Per-group advantage tensors, each of shape (G,), where G is the number of trajectories in the corresponding group.
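To make "centered within groups" concrete, here is a minimal sketch of group-wise mean-centering. It is not the library's implementation. It assumes plain mean subtraction with no standard-deviation scaling, and it takes raw per-group reward lists instead of `TrajectoryGroup` objects. The helper name `center_group_rewards` is hypothetical.

```python
import torch


def center_group_rewards(rewards_per_group: list[list[float]]) -> list[torch.Tensor]:
    """Hypothetical helper: subtract each group's mean reward from its members.

    Assumes simple mean-centering; the real compute_advantages may differ.
    """
    advantages_P = []
    for rewards in rewards_per_group:
        # One tensor of shape (G,) per group; G can differ between groups.
        rewards_G = torch.tensor(rewards, dtype=torch.float32)
        # Centering is done independently per group, so one group's rewards
        # never affect another group's advantages.
        advantages_P.append(rewards_G - rewards_G.mean())
    return advantages_P


groups = [[1.0, 0.0, 0.0, 1.0], [2.0, 4.0]]
adv = center_group_rewards(groups)
print([a.tolist() for a in adv])  # [[0.5, -0.5, -0.5, 0.5], [-1.0, 1.0]]
```

Under mean-centering, a group in which every trajectory receives the same reward produces all-zero advantages, so that group contributes no learning signal.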