tinker_cookbook.preference.Config
class tinker_cookbook.preference.Config(**)
Configuration for Direct Preference Optimization (DPO) training.
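The dpo_beta field is the β coefficient of the DPO objective. As a minimal sketch, assuming the trainer uses the standard per-pair DPO loss (the cookbook's actual implementation may differ in how log-probabilities are summed, batched, or masked), the role of β looks like this. The function name dpo_loss is hypothetical.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # plays the role of dpo_beta
) -> float:
    """Standard DPO loss for a single (chosen, rejected) preference pair.

    Each argument is the total log-probability of one full response under
    either the policy being trained or the frozen reference model.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # beta scales the preference margin before the sigmoid.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin), evaluated without overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy still equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

Raising the chosen response's log-ratio relative to the rejected one pushes the loss below log(2); the reverse pushes it above.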
Fields:
- log_path (str)
- model_name (str)
- dataset_builder (ChatDatasetBuilder)
- load_checkpoint_path (str | None) – Default: None
- renderer_name (str | None) – Default: None
- learning_rate (float) – Default: 1e-05
- lr_schedule (LRSchedule) – Default: 'linear'
- num_epochs (int) – Default: 1
- dpo_beta (float) – Default: 0.1
- lora_rank (int) – Default: 32
- num_replicas (int) – Default: 8
- base_url (str | None) – Default: None
- evaluator_builders (list[EvaluatorBuilder]) – Checkpointing and evaluation settings; a value of 0 disables any *_every field. Default: []
- infrequent_evaluator_builders (list[EvaluatorBuilder]) – Default: []
- save_every (int) – Default: 20
- eval_every (int) – Default: 10
- infrequent_eval_every (int) – Default: 100
- ttl_seconds (int | None) – Default: 604800 (7 days)
- rolling_save_every (int) – Rolling saves skip the sampler-weight export, which makes them cheaper than periodic checkpoints. Default: 0
- rolling_ttl_seconds (int) – Default: 7200 (2 hours)
- adam_beta1 (float) – Default: 0.9
- adam_beta2 (float) – Default: 0.95
- adam_eps (float) – Default: 1e-08
- wandb_project (str | None) – Default: None
- wandb_name (str | None) – Default: None
- enable_trace (bool) – Enables profiling. Default: False
- span_chart_every (int) – Default: 0
- reference_model_name (str | None) – Default: None
- max_steps (int | None) – Maximum number of training steps. If None, training runs for num_epochs * n_batches steps. Default: None
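The max_steps and lr_schedule fields together determine how long the run lasts and how the learning rate evolves. The sketch below makes two assumptions that this reference does not confirm: that a non-None max_steps caps the epoch-derived step count, and that the 'linear' schedule decays from learning_rate to zero over the run. Both helper names are hypothetical.

```python
from typing import Optional


def total_training_steps(num_epochs: int, n_batches: int, max_steps: Optional[int] = None) -> int:
    """Number of optimizer steps for the run (hypothetical helper)."""
    planned = num_epochs * n_batches  # documented behavior when max_steps is None
    if max_steps is None:
        return planned
    # ASSUMPTION: a non-None max_steps acts as an upper bound on the planned steps.
    return min(planned, max_steps)


def linear_lr(step: int, total_steps: int, learning_rate: float = 1e-5) -> float:
    """Learning rate at `step` (hypothetical helper).

    ASSUMPTION: 'linear' decays from learning_rate at step 0 to 0 at total_steps.
    """
    return learning_rate * max(0.0, 1.0 - step / total_steps)


steps = total_training_steps(num_epochs=1, n_batches=500)
print(steps, linear_lr(250, steps))
```

With the defaults (num_epochs=1, learning_rate=1e-05) and a hypothetical 500-batch dataset, this sketch gives 500 steps and a learning rate of 5e-06 halfway through.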
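The *_every fields share one convention: 0 disables the corresponding action. A sketch of that gating follows; should_run is a hypothetical helper, and since this reference does not say whether the trainer also fires at step 0 or at the final step, this version fires only on positive multiples of the interval.

```python
def should_run(step: int, every: int) -> bool:
    """Whether a periodic action fires at `step` (hypothetical helper)."""
    if every == 0:
        return False  # documented: 0 disables any *_every field
    # ASSUMPTION: fire on positive multiples of the interval only.
    return step > 0 and step % every == 0


# Default intervals from the field list above.
intervals = {
    "save_every": 20,
    "eval_every": 10,
    "infrequent_eval_every": 100,
    "rolling_save_every": 0,
}
for name, every in intervals.items():
    fired = sum(should_run(s, every) for s in range(1, 101))
    print(f"{name}: fires {fired} times in steps 1-100")
```

Under these assumptions, the defaults over 100 steps produce 5 checkpoints, 10 regular evaluations, 1 infrequent evaluation, and no rolling saves.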
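The adam_beta1, adam_beta2, and adam_eps fields are presumably forwarded to the optimizer that applies updates during training. The optimizer actually used may differ from textbook Adam (for example in weight decay or gradient clipping), so the following is only an illustration of what each hyperparameter controls, using the documented defaults. adam_step is a hypothetical name.

```python
import math
from typing import List, Tuple


def adam_step(
    params: List[float],
    grads: List[float],
    m: List[float],
    v: List[float],
    t: int,
    lr: float = 1e-5,     # learning_rate default
    beta1: float = 0.9,   # adam_beta1 default
    beta2: float = 0.95,  # adam_beta2 default
    eps: float = 1e-8,    # adam_eps default
) -> Tuple[List[float], List[float], List[float]]:
    """One textbook Adam step without weight decay; t counts from 1."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g      # running mean of gradients
        v_i = beta2 * v_i + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m_i / (1 - beta1 ** t)           # bias-corrected first moment
        v_hat = v_i / (1 - beta2 ** t)           # bias-corrected second moment
        # eps keeps the denominator away from zero for tiny gradients.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


params, m, v = adam_step([1.0], [0.5], [0.0], [0.0], t=1)
print(params)
```

On the first step, bias correction makes the update magnitude roughly equal to lr regardless of the gradient's scale, which is why the parameter above moves by about 1e-05.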