Skip to content

tinker_cookbook.rl.ProblemEnv

class tinker_cookbook.rl.ProblemEnv(Env)

A single-turn Q&A environment that rewards correct answers and valid formatting.

get_question()

Return the question text for this problem.

check_answer(sample_str)

Return a reward (0.0 to 1.0) for the model's response.

Parameters:

  • sample_str (str) – The decoded text of the model's response.

Returns: bool: Whether the answer is correct.

check_format(sample_str)

Return a format compliance reward (0.0 to 1.0).

Parameters:

  • sample_str (str) – The decoded text of the model's response.

Returns: bool: Whether the response follows the expected format.

get_reference_answer()

Return the reference answer for logging purposes.

initial_observation()

Build the initial prompt from the conversation prefix and question.

step(action, extra)

Score the model's response for correctness and format compliance.

Parameters:

  • action (Action) – Token IDs of the model's response.
  • extra (ActionExtra | None) – Optional action metadata (unused).