tinker_cookbook.stores.EvalStore
class tinker_cookbook.stores.EvalStore(**)
Manages evaluation runs across checkpoints.
url(path)
Return a human-readable URI for a path within this eval store.
Parameters:
- path (str)
create_run(model_name, benchmarks, checkpoint_path, checkpoint_name, config, run_id)
Create a new evaluation run and return its run_id.
Parameters:
- model_name (str)
- benchmarks (list[str])
- checkpoint_path (str | None)
- checkpoint_name (str | None)
- config (dict | None)
- run_id (str | None)
run_dir(run_id)
Return the filesystem path for a run. Kept for backward compatibility with BenchmarkConfig.save_dir.
Parameters:
- run_id (str)
finalize_run(run_id)
Collect scores from benchmark results and update metadata.
Parameters:
- run_id (str)
list_runs()
List all evaluation runs, most recent first.
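The "most recent first" ordering can be pictured as a descending sort on a creation timestamp. This is a minimal sketch, not the library's implementation: the `created_at` field and the run ids are hypothetical, since the run metadata schema is not described here.

```python
from datetime import datetime

# Hypothetical run metadata; the field names are assumptions for illustration.
runs = [
    {"run_id": "run-a", "created_at": "2024-05-01T10:00:00"},
    {"run_id": "run-c", "created_at": "2024-05-03T09:30:00"},
    {"run_id": "run-b", "created_at": "2024-05-02T18:45:00"},
]


def most_recent_first(run_list):
    """Return runs sorted by creation time, newest first."""
    return sorted(
        run_list,
        key=lambda r: datetime.fromisoformat(r["created_at"]),
        reverse=True,
    )


ordered_ids = [r["run_id"] for r in most_recent_first(runs)]
```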
read_run(run_id)
Load metadata for a specific run. Raises FileNotFoundError if missing.
Parameters:
- run_id (str)
list_benchmarks(run_id)
List benchmark names that have results for a run.
Parameters:
- run_id (str)
read_result(run_id, benchmark)
Get aggregated result for a benchmark.
Parameters:
- run_id (str)
- benchmark (str)
read_trajectories(run_id, benchmark, correct_only, incorrect_only, errors_only)
Get trajectories with optional filtering.
Parameters:
- run_id (str)
- benchmark (str)
- correct_only (bool)
- incorrect_only (bool)
- errors_only (bool)
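The three boolean filters can be sketched as follows. This is an illustration under stated assumptions: the `correct` and `error` fields on each record are hypothetical, and the reference does not say how the flags combine when more than one is set (this sketch requires every requested condition to hold).

```python
def filter_trajectories(trajs, correct_only=False, incorrect_only=False, errors_only=False):
    """Keep only trajectories that satisfy every requested filter."""
    kept = []
    for t in trajs:
        if correct_only and not t["correct"]:
            continue
        if incorrect_only and t["correct"]:
            continue
        if errors_only and t["error"] is None:
            continue
        kept.append(t)
    return kept


# Hypothetical trajectory records; field names are assumptions.
trajectories = [
    {"idx": 0, "correct": True, "error": None},
    {"idx": 1, "correct": False, "error": None},
    {"idx": 2, "correct": False, "error": "timeout"},
]

correct_ids = [t["idx"] for t in filter_trajectories(trajectories, correct_only=True)]
incorrect_ids = [t["idx"] for t in filter_trajectories(trajectories, incorrect_only=True)]
error_ids = [t["idx"] for t in filter_trajectories(trajectories, errors_only=True)]
```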
read_single_trajectory(run_id, benchmark, idx)
Get a single trajectory by index. This is an O(n) scan that loads all trajectories, so prefer read_trajectories when you need many of them.
Parameters:
- run_id (str)
- benchmark (str)
- idx (int)
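The cost of an indexed lookup in a line-oriented file can be illustrated with a small stand-in. This sketch assumes one JSON object per line and uses hypothetical record fields. It shows why repeated single lookups are wasteful compared with loading once and indexing the list.

```python
import io
import json

# Hypothetical JSONL content: one trajectory per line.
JSONL_TEXT = (
    '{"idx": 0, "correct": true}\n'
    '{"idx": 1, "correct": false}\n'
    '{"idx": 2, "correct": true}\n'
)


def load_all(stream):
    """Parse every non-empty line; cost grows with the number of trajectories."""
    return [json.loads(line) for line in stream if line.strip()]


def read_single(stream, idx):
    """Load everything, then index: the O(n) pattern described above."""
    return load_all(stream)[idx]


one = read_single(io.StringIO(JSONL_TEXT), 1)

# For many lookups, load once and index the list repeatedly.
cached = load_all(io.StringIO(JSONL_TEXT))
picked = [cached[i]["idx"] for i in (2, 0)]
```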
read_summary(run_id)
Read the combined summary for a run, or None if missing.
Parameters:
- run_id (str)
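read_run and read_summary handle missing data differently: the former raises FileNotFoundError, the latter returns None. The two contracts can be sketched with plain JSON files. The helper names and the `summary.json` file name are hypothetical and say nothing about how the store lays out its data.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def load_required(path: Path) -> dict:
    """Mirror the read_run contract: a missing file raises FileNotFoundError."""
    with path.open(encoding="utf-8") as f:  # open() raises FileNotFoundError itself
        return json.load(f)


def load_optional(path: Path) -> Optional[dict]:
    """Mirror the read_summary contract: a missing file yields None."""
    try:
        return load_required(path)
    except FileNotFoundError:
        return None


with tempfile.TemporaryDirectory() as tmp:
    missing = Path(tmp) / "summary.json"  # hypothetical file name
    optional_result = load_optional(missing)
    try:
        load_required(missing)
        raised = False
    except FileNotFoundError:
        raised = True
```

Callers of read_summary should therefore check for None, while callers of read_run should be prepared to catch the exception.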
write_result(run_id, result)
Save a benchmark result.
Parameters:
- run_id (str)
- result (BenchmarkResult)
write_trajectory(run_id, benchmark, traj)
Append one trajectory to the JSONL file.
Parameters:
- run_id (str)
- benchmark (str)
- traj (StoredTrajectory)
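Appending to a JSONL file means writing one serialized record per line in append mode, so existing lines are never rewritten. The following is a minimal sketch of that pattern with plain dicts. The file name and record fields are hypothetical, and the real method takes a StoredTrajectory rather than a dict.

```python
import json
import tempfile
from pathlib import Path


def append_jsonl(path: Path, record: dict) -> None:
    """Append one JSON object as a single line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "trajectories.jsonl"  # hypothetical file name
    append_jsonl(path, {"idx": 0, "correct": True})
    append_jsonl(path, {"idx": 1, "correct": False})
    lines = path.read_text(encoding="utf-8").splitlines()

records = [json.loads(line) for line in lines]
```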
write_summary(run_id, results)
Save a combined summary.
Parameters:
- run_id (str)
- results (dict[str, BenchmarkResult])
delete_run(run_id)
Delete all data for a run. Idempotent (no error if already gone).
Parameters:
- run_id (str)
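Idempotent deletion means a second call on an already-deleted run succeeds silently. One way to get that behavior for a directory-backed run is sketched below. Whether the store keeps each run in a single directory is an assumption, and the helper name and paths are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def delete_dir_idempotent(path: Path) -> None:
    """Remove a directory tree; do nothing if it is already gone."""
    try:
        shutil.rmtree(path)
    except FileNotFoundError:
        pass  # already deleted: treat as success


with tempfile.TemporaryDirectory() as tmp:
    run_path = Path(tmp) / "run-001"  # hypothetical run directory
    run_path.mkdir()
    (run_path / "metadata.json").write_text("{}", encoding="utf-8")

    delete_dir_idempotent(run_path)
    delete_dir_idempotent(run_path)  # second call must not raise
    still_exists = run_path.exists()
```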
alist_runs()
Async version of list_runs().
aread_trajectories(run_id, benchmark)
Async version of read_trajectories().
Parameters:
- run_id (str)
- benchmark (str)
aread_result(run_id, benchmark)
Async version of read_result().
Parameters:
- run_id (str)
- benchmark (str)
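The reference does not say how the async variants are implemented. One common pattern for exposing a blocking call to async code is to run it in a worker thread with `asyncio.to_thread`, sketched here with stand-in functions. `list_items` and `alist_items` are hypothetical names, not part of EvalStore.

```python
import asyncio


def list_items():
    """Stand-in for a blocking synchronous call such as a directory listing."""
    return ["run-b", "run-a"]


async def alist_items():
    """Run the blocking call in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(list_items)


result = asyncio.run(alist_items())
```

From async code, the store's own methods would simply be awaited, for example `await store.alist_runs()`, where `store` is an EvalStore instance.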