tinker_cookbook.rl.compute_advantages

Compute an advantage for each trajectory by centering rewards within its group, so that every advantage is relative to the other trajectories in the same group.

Parameters:

trajectory_groups_P (list[TrajectoryGroup]) – Groups of trajectories, where each group's rewards are centered independently.

Returns:

list[torch.Tensor] – One advantage tensor per input group, in the same order as trajectory_groups_P. Each tensor has shape (G,), where G is the number of trajectories in that group.
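To make the centering concrete, here is a minimal sketch of the arithmetic, not the library's implementation. It assumes that "centered" means subtracting the group's mean reward, that each trajectory contributes one scalar total reward, and that no further scaling is applied (for example, dividing by a standard deviation). The helper name center_within_groups is hypothetical, and plain Python lists stand in for TrajectoryGroup inputs and torch.Tensor outputs so the example has no dependencies.

```python
from statistics import fmean


def center_within_groups(reward_groups):
    """Hypothetical stand-in: subtract each group's mean reward from its members.

    reward_groups is a list of lists of floats, one inner list per group.
    Assumption: centering means mean subtraction, with no extra scaling.
    """
    advantages = []
    for rewards in reward_groups:
        group_mean = fmean(rewards)  # each group is centered independently
        advantages.append([r - group_mean for r in rewards])
    return advantages


# Two groups of different sizes (G = 4 and G = 2).
groups = [[1.0, 0.0, 0.0, 1.0], [2.0, 4.0]]
advantages = center_within_groups(groups)
print(advantages)  # [[0.5, -0.5, -0.5, 0.5], [-1.0, 1.0]]
```

Under these assumptions, the advantages in each group sum to zero, and a group whose trajectories all earn the same reward produces all-zero advantages.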