tinker_cookbook.rl.RetryOnFailure

class tinker_cookbook.rl.RetryOnFailure(RolloutStrategy)

Retry failed or timed-out trajectories with fresh environments.

Parameters:

  • max_retries – Total retry budget across all trajectories in the group. For example, with max_retries=3 and a group of 8 envs, up to 3 individual trajectory failures will be retried.
  • per_rollout_timeout – Maximum seconds for a single rollout before it is cancelled and retried. 0 disables the timeout (default). Set this to catch sampling requests that hang indefinitely — e.g., per_rollout_timeout=1800 for a 30-minute deadline per rollout.

Fields:

  • max_retries (int) – Default: 3
  • per_rollout_timeout (float) – Default: 0
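
The timeout semantics described above (0 disables the deadline, a positive value cancels a hanging rollout) can be illustrated with plain `asyncio`. This is a minimal sketch, not the library's implementation: `run_with_optional_timeout` and `slow_rollout` are hypothetical names introduced only for illustration.

```python
import asyncio


async def run_with_optional_timeout(coro, timeout_s: float = 0.0):
    """Await `coro`, enforcing a deadline only when timeout_s > 0.

    Hypothetical helper: mirrors the documented convention that a
    timeout of 0 means "no timeout".
    """
    if timeout_s > 0:
        # asyncio.wait_for cancels the awaited coroutine when the deadline
        # passes and raises TimeoutError to the caller.
        return await asyncio.wait_for(coro, timeout=timeout_s)
    return await coro


async def slow_rollout(delay_s: float) -> str:
    """Stand-in for a rollout that takes `delay_s` seconds to finish."""
    await asyncio.sleep(delay_s)
    return "done"


async def demo():
    # Timeout disabled: the rollout is allowed to run to completion.
    finished = await run_with_optional_timeout(slow_rollout(0.01), timeout_s=0)

    # Timeout enabled: a rollout that outlives the deadline is cancelled.
    try:
        await run_with_optional_timeout(slow_rollout(1.0), timeout_s=0.05)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return finished, timed_out


finished, timed_out = asyncio.run(demo())
```

In a retry strategy, the caller would treat the timeout like any other trajectory failure and spend one unit of the retry budget on it.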

execute(env_group_builder, policy)

Run rollouts with automatic retry on individual trajectory failures.

Parameters:

  • env_group_builder (EnvGroupBuilder) – Builder used to create (and re-create on retry) environments for this rollout group.
  • policy (TokenCompleter) – The policy used to generate actions.

Returns: RolloutResult: Result containing the successfully completed trajectories, surviving environments, and a list of any errors encountered (including retried ones).

Raises:

  • Exception: Re-raises the failing exception when the retry budget is exhausted, after cancelling all remaining in-flight tasks.
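
The behaviour documented for `execute` (a retry budget shared across the group, errors recorded even when retried, and a re-raise after cancelling in-flight tasks once the budget runs out) can be sketched with plain `asyncio`. This is an illustration under stated assumptions, not the library's code: `GroupResult`, `run_group_with_retry`, `make_rollout`, and `flaky_rollout` are hypothetical names, and the sketch omits per-rollout timeouts and the tracking of surviving environments.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class GroupResult:
    """Hypothetical stand-in for a rollout result (not the library's RolloutResult)."""
    trajectories: list = field(default_factory=list)
    errors: list = field(default_factory=list)


async def run_group_with_retry(make_rollout, group_size: int, max_retries: int = 3):
    """Run `group_size` rollouts, retrying failures from one shared budget.

    `make_rollout(slot)` must return a fresh coroutine each time it is called,
    which plays the role of re-creating the environment on retry.
    """
    result = GroupResult()
    retries_left = max_retries
    pending = {asyncio.create_task(make_rollout(i)): i for i in range(group_size)}
    try:
        while pending:
            done, _ = await asyncio.wait(
                pending.keys(), return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                slot = pending.pop(task)
                exc = task.exception()
                if exc is None:
                    result.trajectories.append(task.result())
                    continue
                # Record every failure, including ones that get retried.
                result.errors.append(exc)
                if retries_left == 0:
                    raise exc  # budget exhausted: propagate the failure
                retries_left -= 1
                pending[asyncio.create_task(make_rollout(slot))] = slot
    finally:
        # On the error path, cancel whatever is still in flight and wait
        # for the cancellations to settle. On success this is a no-op.
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
    return result


attempts: dict = {}


async def flaky_rollout(slot: int) -> str:
    """Slots 0 and 1 fail on their first attempt, then succeed."""
    attempts[slot] = attempts.get(slot, 0) + 1
    await asyncio.sleep(0.01)
    if slot < 2 and attempts[slot] == 1:
        raise RuntimeError(f"slot {slot} failed")
    return f"trajectory-{slot}"


result = asyncio.run(run_group_with_retry(flaky_rollout, group_size=4, max_retries=3))
```

Here two of the four slots fail once, so two units of the budget are spent and the group still completes with four trajectories. Because the budget is shared rather than per-slot, a single unlucky slot can consume all of it.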