tinker_cookbook.rl.ProblemEnv

class tinker_cookbook.rl.ProblemEnv(Env)

A single-turn Q&A environment that rewards correct answers and valid formatting.

Return the question text for this problem.

Return a reward (0.0 to 1.0) for the model's response.

Parameters:

Returns: bool: Whether the answer is correct.

Return a format compliance reward (0.0 to 1.0).

Parameters:

Returns: bool: Whether the response follows the expected format.

Return the reference answer for logging purposes.

Build the initial prompt from the conversation prefix and question.

Score the model's response for correctness and format compliance.

Parameters: