Skip to content

tinker_cookbook.tool_use.build_agent_tool_env

tinker_cookbook.tool_use.build_agent_tool_env(renderer, tools, initial_messages, reward_fn, max_turns, failed_parse_reward, max_trajectory_tokens, max_generation_tokens, context_overflow_reward)

Convenience method to build an EnvFromMessageEnv for tool-using agents.

Parameters:

  • renderer (Renderer) – The renderer for tokenizing messages.
  • tools (list[Tool]) – List of tools the agent can call (must implement Tool protocol).
  • initial_messages (list[Message]) – Initial conversation history (system prompt, user message, etc.).
  • reward_fn (RewardFn) – Function that grades a completed episode. Takes the full message history and returns (reward, metrics). Called once at episode end.
  • max_turns (int) – Maximum turns before episode ends.
  • failed_parse_reward (float) – Reward when model output fails to parse.
  • max_trajectory_tokens (int | None) – Maximum tokens in trajectory before terminating episode.
  • max_generation_tokens (int | None) – Maximum tokens per generation. When set, the episode terminates if the trajectory + generation budget would exceed max_trajectory_tokens, preventing context overflow errors.
  • context_overflow_reward (float) – Reward assigned when the episode is terminated due to context overflow. Defaults to -0.1.

Returns: An EnvFromMessageEnv ready for RL training.