tinker_cookbook.rl.assemble_training_data
tinker_cookbook.rl.assemble_training_data(trajectory_groups_P, advantages_P)
Convert trajectory groups and their per-group advantages into a flat list of training datums.
Parameters:
- trajectory_groups_P (list[TrajectoryGroup]) – Groups of trajectories to convert into training datums.
- advantages_P (list[torch.Tensor]) – Per-group advantage tensors, one per trajectory group, as returned by compute_advantages.
Returns: tuple[list[tinker.Datum], list[dict[str, int]]]: A flat list of training datums and a parallel list of metadata dicts mapping each datum back to its group_idx and traj_idx.
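The shape of the return value can be illustrated with a small stand-alone sketch. It does not call tinker_cookbook and is not the library's implementation: ToyTrajectory, ToyDatum, and toy_assemble are hypothetical stand-ins, plain lists of floats replace the torch.Tensor advantages, and it assumes exactly one datum per trajectory, which the real function may not guarantee. It only shows how a flat datum list and a parallel metadata list with group_idx and traj_idx line up.

```python
from dataclasses import dataclass


@dataclass
class ToyTrajectory:
    """Hypothetical stand-in for one trajectory (not a tinker_cookbook type)."""
    tokens: list


@dataclass
class ToyDatum:
    """Hypothetical stand-in for tinker.Datum."""
    tokens: list
    advantage: float


def toy_assemble(groups, advantages):
    """Flatten groups into datums plus parallel metadata.

    Assumption: one datum per trajectory, one advantage per trajectory.
    """
    if len(groups) != len(advantages):
        raise ValueError("expected one advantage list per trajectory group")

    data, metadata = [], []
    for group_idx, (group, group_adv) in enumerate(zip(groups, advantages)):
        if len(group) != len(group_adv):
            raise ValueError("expected one advantage per trajectory")
        for traj_idx, (traj, adv) in enumerate(zip(group, group_adv)):
            data.append(ToyDatum(tokens=traj.tokens, advantage=adv))
            # Metadata entry i describes datum i.
            metadata.append({"group_idx": group_idx, "traj_idx": traj_idx})
    return data, metadata


# Two groups: the first holds two trajectories, the second holds one.
groups = [
    [ToyTrajectory([1, 2]), ToyTrajectory([3])],
    [ToyTrajectory([4, 5, 6])],
]
advantages = [[0.5, -0.5], [0.0]]

data, metadata = toy_assemble(groups, advantages)
for datum, meta in zip(data, metadata):
    print(meta, datum.advantage)
```

Because the two lists are parallel, metadata[i] identifies the group and trajectory that produced data[i], which is useful when logging per-group statistics after the datums have been flattened.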