tinker_cookbook.supervised.SupervisedDatasetFromHFDataset

class tinker_cookbook.supervised.SupervisedDatasetFromHFDataset(SupervisedDataset)

A supervised dataset backed by a HuggingFace dataset.

Parameters:

  • hf_dataset – The HuggingFace dataset to draw rows from.
  • batch_size – Number of rows per batch.
  • map_fn – Function mapping a single row to a Datum. Mutually exclusive with flatmap_fn.
  • flatmap_fn – Function mapping a single row to multiple Datums. Mutually exclusive with map_fn.
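The difference between map_fn and flatmap_fn can be sketched in plain Python. This is an illustration only, not the library's implementation: rows_to_items is a hypothetical helper, strings stand in for Datum objects, and raising ValueError when both or neither function is given is an assumption about how the "mutually exclusive" rule might be enforced.

```python
from typing import Any, Callable, Optional, Sequence

Row = dict[str, Any]


def rows_to_items(
    rows: Sequence[Row],
    map_fn: Optional[Callable[[Row], Any]] = None,
    flatmap_fn: Optional[Callable[[Row], Sequence[Any]]] = None,
) -> list[Any]:
    """Hypothetical stand-in: convert rows using exactly one of the two functions."""
    # Assumption: passing both functions, or neither, is treated as an error.
    if (map_fn is None) == (flatmap_fn is None):
        raise ValueError("Provide exactly one of map_fn or flatmap_fn")
    if map_fn is not None:
        # map_fn: one output item per row.
        return [map_fn(row) for row in rows]
    # flatmap_fn: each row yields a sequence of items, flattened into one list.
    return [item for row in rows for item in flatmap_fn(row)]


rows = [{"text": "a b"}, {"text": "c"}]
one_per_row = rows_to_items(rows, map_fn=lambda r: r["text"].upper())
many_per_row = rows_to_items(rows, flatmap_fn=lambda r: r["text"].split())
```

With map_fn the output has the same length as the input rows; with flatmap_fn it can be longer or shorter, depending on how many items each row produces.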

get_batch(index)

Return a batch of Datum objects at the given index.

Parameters:

  • index (int) – Zero-based batch index.

Returns: list[tinker.Datum] – Training datums for this batch.
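One plausible reading of a zero-based batch index is that batch i covers rows i * batch_size up to, but not including, (i + 1) * batch_size. The sketch below shows only that arithmetic. batch_row_range is a hypothetical helper, and whether the real class keeps or drops a trailing partial batch is not stated on this page; the sketch keeps it.

```python
def batch_row_range(index: int, batch_size: int, num_rows: int) -> range:
    """Hypothetical helper: row indices covered by a zero-based batch index."""
    start = index * batch_size
    # Assumption: a final short batch is kept rather than dropped.
    end = min(start + batch_size, num_rows)
    return range(start, end)


first_batch = list(batch_row_range(0, batch_size=4, num_rows=10))
last_batch = list(batch_row_range(2, batch_size=4, num_rows=10))
```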

set_epoch(seed)

Shuffle the dataset for a new epoch.

Parameters:

  • seed (int) – Random seed for shuffling. Defaults to 0.
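A minimal sketch of how seeded per-epoch shuffling can interact with batch lookup. ToyShuffledDataset is a hypothetical pure-Python stand-in that borrows the method names from this page; it does not use the library, and the real class may shuffle differently (for example, by delegating to the underlying HuggingFace dataset).

```python
import random


class ToyShuffledDataset:
    """Hypothetical stand-in with the same method names as the documented class."""

    def __init__(self, rows, batch_size: int):
        self.rows = list(rows)
        self.batch_size = batch_size
        self.order = list(range(len(self.rows)))

    def __len__(self) -> int:
        # Assumption: only full batches are counted.
        return len(self.rows) // self.batch_size

    def set_epoch(self, seed: int = 0) -> None:
        # Rebuild the order from scratch so the result depends only on the seed.
        self.order = list(range(len(self.rows)))
        random.Random(seed).shuffle(self.order)

    def get_batch(self, index: int) -> list:
        start = index * self.batch_size
        picked = self.order[start:start + self.batch_size]
        return [self.rows[i] for i in picked]


ds = ToyShuffledDataset(range(8), batch_size=4)

ds.set_epoch(seed=1)
first_pass = [ds.get_batch(i) for i in range(len(ds))]

ds.set_epoch(seed=1)
second_pass = [ds.get_batch(i) for i in range(len(ds))]
```

In this sketch, calling set_epoch with the same seed reproduces the same batches, and passing a different seed each epoch (the epoch number, for instance) gives a different but repeatable order.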