tinker_cookbook.tool_use.build_agent_tool_env
tinker_cookbook.tool_use.build_agent_tool_env(renderer, tools, initial_messages, reward_fn, max_turns, failed_parse_reward, max_trajectory_tokens, max_generation_tokens, context_overflow_reward)
Convenience method to build an EnvFromMessageEnv for tool-using agents.
Parameters:
- renderer (Renderer) – The renderer for tokenizing messages.
- tools (list[Tool]) – List of tools the agent can call (must implement Tool protocol).
- initial_messages (list[Message]) – Initial conversation history (system prompt, user message, etc.).
- reward_fn (RewardFn) – Function that grades a completed episode. Takes the full message history and returns (reward, metrics). Called once at episode end.
- max_turns (int) – Maximum turns before episode ends.
- failed_parse_reward (float) – Reward when model output fails to parse.
- max_trajectory_tokens (int | None) – Maximum tokens in trajectory before terminating episode.
- max_generation_tokens (int | None) – Maximum tokens per generation. When set, the episode terminates if the trajectory + generation budget would exceed max_trajectory_tokens, preventing context overflow errors.
- context_overflow_reward (float) – Reward assigned when the episode is terminated due to context overflow. Defaults to -0.1.
Returns: An EnvFromMessageEnv ready for RL training.