tinker_cookbook.rl.ProblemEnv
class tinker_cookbook.rl.ProblemEnv(Env)
A single-turn Q&A environment that rewards correct answers and valid formatting.
get_question()
Return the question text for this problem.
check_answer(sample_str)
Check whether the model's response contains the correct answer.
Parameters:
- sample_str (str) – The decoded text of the model's response.
Returns: bool: Whether the answer is correct.
check_format(sample_str)
Check whether the model's response follows the expected format.
Parameters:
- sample_str (str) – The decoded text of the model's response.
Returns: bool: Whether the response follows the expected format.
get_reference_answer()
Return the reference answer for logging purposes.
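As an illustration of the four methods above, here is a minimal sketch of a concrete problem. It deliberately does not import tinker_cookbook: `ProblemHooks` is a hypothetical stand-in that only mirrors the method names and return types listed here, and `AdditionProblem` and the "Answer: <number>" convention are assumptions made for the example. A real subclass would derive from ProblemEnv and may need constructor arguments this page does not document.

```python
import re
from abc import ABC, abstractmethod


class ProblemHooks(ABC):
    """Hypothetical stand-in mirroring the ProblemEnv hooks (not the library class)."""

    @abstractmethod
    def get_question(self) -> str: ...

    @abstractmethod
    def check_answer(self, sample_str: str) -> bool: ...

    @abstractmethod
    def check_format(self, sample_str: str) -> bool: ...

    @abstractmethod
    def get_reference_answer(self) -> str: ...


class AdditionProblem(ProblemHooks):
    """Toy problem: add two integers and answer as 'Answer: <number>'."""

    # Assumed answer convention for this example only.
    _ANSWER_RE = re.compile(r"Answer:\s*(-?\d+)")

    def __init__(self, a: int, b: int) -> None:
        self.a = a
        self.b = b

    def get_question(self) -> str:
        return f"What is {self.a} + {self.b}? Reply as 'Answer: <number>'."

    def check_format(self, sample_str: str) -> bool:
        # Format is valid if the response contains an 'Answer: <int>' line.
        return self._ANSWER_RE.search(sample_str) is not None

    def check_answer(self, sample_str: str) -> bool:
        # Correct only if a parsed integer equals the true sum.
        match = self._ANSWER_RE.search(sample_str)
        return match is not None and int(match.group(1)) == self.a + self.b

    def get_reference_answer(self) -> str:
        return str(self.a + self.b)
```

Keeping check_format separate from check_answer lets a response be scored as well formatted but wrong, or the reverse, which is the distinction the class description draws.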
initial_observation()
Build the initial prompt from the conversation prefix and question.
step(action, extra)
Score the model's response for correctness and format compliance.
Parameters:
- action (Action) – Token IDs of the model's response.
- extra (ActionExtra | None) – Optional action metadata (unused).
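This page does not say how step turns the two checks into a single reward. The sketch below shows one plausible combination, offered as an assumption rather than the library's formula: full credit for a correct answer, minus a small penalty when the format check fails. The function name `combine_reward` and the `format_coef` parameter with its default of 0.1 are hypothetical.

```python
def combine_reward(correct: bool, well_formatted: bool, format_coef: float = 0.1) -> float:
    """Assumed scoring rule: correctness reward minus a format penalty.

    A well-formatted response incurs no penalty; a badly formatted one
    loses `format_coef`, whether or not the answer is correct.
    """
    format_penalty = format_coef * (float(well_formatted) - 1.0)
    return format_penalty + float(correct)


# The four possible outcomes under this assumed rule.
for correct in (True, False):
    for well_formatted in (True, False):
        reward = combine_reward(correct, well_formatted)
        print(f"correct={correct} well_formatted={well_formatted} reward={reward:.2f}")
```

Under this rule a correct answer always outscores an incorrect one, so the format term acts as a tie-breaker rather than a competing objective.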