| train_on_policy.Config | |
| DatasetWithTeacher | Pairs a supervised dataset with its teacher model. |
| train_off_policy.Config | Configuration for off-policy distillation with soft teacher targets. |
| SDFTBatchProvider | Protocol for SDFT datasets that return builders alongside golden answers. |
| sdft.Config | Configuration for SDFT training. |
| TeacherConfig | Configuration for a teacher model. |
| DistillationDatasetConfig | Configuration for a dataset used in distillation. |
| PromptOnlyEnv | Environment that only provides prompts with no rewards. |
| PromptOnlyDataset | Dataset that provides prompts without rewards. |
| PromptOnlyDatasetBuilder | Builder for prompt-only datasets. |
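
The listing says `train_off_policy.Config` uses soft teacher targets but does not show the loss itself. As a rough illustration of what "soft targets" usually means in distillation, the sketch below computes the cross-entropy of a student's token distribution against a teacher's full distribution, averaged over positions. The function names (`softmax`, `log_softmax`, `soft_target_loss`) are hypothetical and are not part of the library; the actual loss used by `train_off_policy` may differ (for example in temperature, masking, or top-k truncation).

```python
import math


def softmax(logits):
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits):
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def soft_target_loss(teacher_logits, student_logits):
    """Hypothetical helper: mean cross-entropy H(teacher, student).

    Both arguments are lists of per-position logit lists with the same shape.
    Minimizing this over the student is equivalent to minimizing
    KL(teacher || student), since the teacher's entropy is constant.
    """
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row)          # teacher probabilities (the soft targets)
        log_q = log_softmax(s_row)  # student log-probabilities
        total += -sum(pi * lqi for pi, lqi in zip(p, log_q))
    return total / len(teacher_logits)


# One position, two-token vocabulary: a matching student versus a mismatched one.
matched = soft_target_loss([[0.0, 0.0]], [[0.0, 0.0]])
mismatched = soft_target_loss([[0.0, 0.0]], [[2.0, 0.0]])
```

When the student matches the teacher exactly, the loss reduces to the teacher's entropy; any mismatch raises it, which is what drives the student toward the teacher's distribution.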