tinker_cookbook.rl.compute_advantages
tinker_cookbook.rl.compute_advantages(trajectory_groups_P)
Compute an advantage for each trajectory by centering rewards within its group, that is, by subtracting the group's mean reward from each trajectory's reward.
Parameters:
- trajectory_groups_P (list[TrajectoryGroup]) – Groups of trajectories, where each group's rewards are centered independently.
Returns: list[torch.Tensor] – one advantage tensor per group. Each tensor has shape (G,), where G is the number of trajectories in that group.
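The group-wise centering can be illustrated without the library. The following is a minimal, dependency-free sketch, assuming that "centered" means subtracting each group's mean total reward. `SimpleTrajectoryGroup` and `compute_advantages_sketch` are hypothetical names, not part of tinker_cookbook, and plain lists of floats stand in for the torch.Tensor values the real function returns.

```python
from dataclasses import dataclass


@dataclass
class SimpleTrajectoryGroup:
    """Hypothetical stand-in for TrajectoryGroup: one total reward per trajectory."""
    total_rewards: list[float]


def compute_advantages_sketch(groups: list[SimpleTrajectoryGroup]) -> list[list[float]]:
    """Center each group's rewards on that group's own mean (one list per group).

    Assumes every group is non-empty; an empty group would divide by zero.
    """
    advantages_per_group: list[list[float]] = []
    for group in groups:
        rewards = group.total_rewards
        mean_reward = sum(rewards) / len(rewards)
        # Advantage of trajectory i = its reward minus the mean reward of its own group.
        advantages_per_group.append([r - mean_reward for r in rewards])
    return advantages_per_group


groups = [
    SimpleTrajectoryGroup([1.0, 0.0, 0.0, 1.0]),  # G = 4, mean 0.5
    SimpleTrajectoryGroup([3.0, 5.0]),            # G = 2, mean 4.0
]
advantages = compute_advantages_sketch(groups)
print(advantages)  # [[0.5, -0.5, -0.5, 0.5], [-1.0, 1.0]]
```

Because each group is centered on its own mean, the advantages within a group sum to zero: a trajectory is rewarded or penalized only relative to the other trajectories in its group, and a group whose trajectories all receive the same reward yields all-zero advantages. Returning one entry per group also accommodates groups of different sizes, as in the example.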