StepResult |
Result returned by :meth:`Env.step`. |
Transition |
A single (observation, action, reward) tuple from a trajectory. |
ActionExtra |
Extra metadata passed alongside an action to :meth:`Env.step`. |
Env |
Stateful environment that a single agent interacts with. |
Trajectory |
A complete episode: a sequence of transitions from one agent in one environment. |
RolloutError |
A captured error from a failed trajectory rollout. |
EnvGroupBuilder |
Builds a group of environments. |
TrajectoryGroup |
A group of trajectories produced by one :class:`EnvGroupBuilder`. |
RLDataset |
A dataset that produces batches of :class:`EnvGroupBuilder` instances. |
RLDatasetBuilder |
Abstract builder that constructs training and optional test RL datasets. |
ProblemEnv |
A single-turn Q&A environment that rewards correct answers and valid formatting. |
ProblemGroupBuilder |
Builds a group of ProblemEnv instances from a factory callable. |
MessageStepResult |
Result of a message-level step. |
MessageEnv |
Abstract base class for message-level environments. |
EnvFromMessageEnv |
Adapter that wraps a MessageEnv to implement the token-level Env interface. |
RolloutStrategy |
Controls how trajectories are collected from a group of environments. |
FailFast |
Default strategy: any trajectory error crashes the group. |
RetryOnFailure |
Retry failed trajectories with fresh environments. |
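
The relationships summarized above (an environment's step returns a result, each step is recorded as a transition, transitions form a trajectory, and a rollout strategy decides what happens when a trajectory fails) can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in written for illustration: the field names, the synchronous method signatures, the toy ``EchoEnv``, and the ``rollout`` / ``rollout_group`` helpers are assumptions, not the library's actual definitions. In particular, ``max_retries=0`` only approximates the fail-fast behavior, and ``max_retries > 0`` only approximates retrying with fresh environments.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class StepResult:
    # Hypothetical fields; the real StepResult may carry different data.
    reward: float
    episode_done: bool
    next_observation: Optional[str] = None


@dataclass
class Transition:
    observation: str
    action: str
    reward: float


@dataclass
class Trajectory:
    transitions: List[Transition] = field(default_factory=list)

    def total_reward(self) -> float:
        return sum(t.reward for t in self.transitions)


@dataclass
class RolloutError:
    message: str


class EchoEnv:
    """Toy single-turn environment: reward 1.0 if the action matches the target."""

    def __init__(self, question: str, target: str) -> None:
        self.question = question
        self.target = target

    def initial_observation(self) -> str:
        return self.question

    def step(self, action: str) -> StepResult:
        reward = 1.0 if action.strip() == self.target else 0.0
        return StepResult(reward=reward, episode_done=True)


def rollout(env: EchoEnv, policy: Callable[[str], str], max_steps: int = 8) -> Trajectory:
    """Run one episode and record every step as a Transition."""
    trajectory = Trajectory()
    observation = env.initial_observation()
    for _ in range(max_steps):
        action = policy(observation)
        result = env.step(action)
        trajectory.transitions.append(Transition(observation, action, result.reward))
        if result.episode_done or result.next_observation is None:
            break
        observation = result.next_observation
    return trajectory


def rollout_group(
    make_env: Callable[[], EchoEnv],
    policy: Callable[[str], str],
    group_size: int,
    max_retries: int = 0,
) -> Tuple[List[Trajectory], List[RolloutError]]:
    """Collect one trajectory per group slot.

    With max_retries=0 the first error propagates and aborts the group.
    With max_retries>0 a failed slot is retried with a freshly built env,
    and each failed attempt is captured as a RolloutError.
    """
    trajectories: List[Trajectory] = []
    errors: List[RolloutError] = []
    for _ in range(group_size):
        for attempt in range(max_retries + 1):
            try:
                trajectories.append(rollout(make_env(), policy))
                break
            except Exception as exc:
                errors.append(RolloutError(message=str(exc)))
                if attempt == max_retries:
                    raise
    return trajectories, errors


group, errors = rollout_group(
    lambda: EchoEnv("2 + 2 = ?", "4"), lambda obs: "4", group_size=3
)
print([t.total_reward() for t in group], errors)
```

Passing a factory callable rather than pre-built environments is what makes a retry start from a fresh environment in this sketch; an environment that failed mid-episode is discarded rather than reused.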