# Storage
Training and evaluation data — metrics, checkpoints, rollouts, trajectories — is saved through a unified storage layer. By default, data is written to local disk. To use cloud storage (GCS, S3, Azure), just change the path to a URI.
For cloud support, install the fsspec filesystem package for your provider (for example, `gcsfs` for GCS, `s3fs` for S3, or `adlfs` for Azure).
## Training

Pass a cloud URI as `log_dir`:
```python
# Local (default)
ml_logger = setup_logging(log_dir="/tmp/my_run", config=config)

# GCS
ml_logger = setup_logging(log_dir="gs://bucket/my_run", config=config)

# S3
ml_logger = setup_logging(log_dir="s3://bucket/my_run", config=config)
```
All training data — metrics, checkpoints, rollouts — is written to the cloud path automatically.
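In a training loop this means the usual logging calls need no cloud-specific code. A minimal sketch, assuming the logger returned by `setup_logging` exposes a `log_metrics(metrics, step=...)` method (check the logger interface in your version):

```python
# Sketch only: log_metrics(...) is an assumed method name, not confirmed by this page.
ml_logger = setup_logging(log_dir="gs://bucket/my_run", config=config)

for step in range(num_steps):
    loss = train_step()  # stand-in for your actual training step
    # Written under gs://bucket/my_run by the storage layer
    ml_logger.log_metrics({"loss": loss}, step=step)
```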
## Evaluation

Pass a cloud URI as `save_dir`:
```python
# Local
config = BenchmarkConfig(save_dir="/tmp/evals/run1")

# GCS
config = BenchmarkConfig(save_dir="gs://bucket/evals/run1")

result = await run_benchmark("gsm8k", client, renderer, config)
```
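`run_benchmark` is a coroutine, so from a plain script you can drive it with `asyncio.run`. A minimal sketch, where `client` and `renderer` are whatever you already constructed for evaluation (they are assumed here, not defined on this page):

```python
import asyncio

async def main() -> None:
    # save_dir may be a local path or a cloud URI, exactly as above
    config = BenchmarkConfig(save_dir="gs://bucket/evals/run1")
    # client and renderer come from your existing evaluation setup
    result = await run_benchmark("gsm8k", client, renderer, config)
    print(result)

asyncio.run(main())
```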
## Reading Data Back
```python
from tinker_cookbook.stores import TrainingRunStore, storage_from_uri

# Works with local paths and cloud URIs
store = TrainingRunStore(storage_from_uri("gs://bucket/my_run"))

config = store.read_config()
metrics = store.read_metrics()
rollouts = store.read_rollouts(0)
checkpoints = store.read_checkpoints()
```
## Supported Backends

| URI | Backend |
|---|---|
| `/local/path` | Local filesystem |
| `gs://bucket/prefix` | Google Cloud Storage |
| `s3://bucket/prefix` | Amazon S3 |
| `az://container/prefix` | Azure Blob Storage |
Any fsspec-supported filesystem can be used.
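Since these backends are plain fsspec filesystems, you can also inspect a run directory directly with fsspec, independent of the storage layer. A small sketch, assuming the matching filesystem package (e.g. `gcsfs` for GCS) is installed:

```python
import fsspec

# Resolve the URI into a filesystem object and a path within it
fs, path = fsspec.core.url_to_fs("gs://bucket/my_run")

# List what has been written for this run
for entry in fs.ls(path):
    print(entry)
```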
## Flush for Cloud Backends
Cloud writes are staged locally for performance. Call `flush()` at checkpoints to ensure data is uploaded:
```python
storage = storage_from_uri("gs://bucket/run")
store = TrainingRunStore(storage)

for step in range(num_steps):
    store.write_metrics({"loss": loss}, step=step)

    if step % save_every == 0:
        storage.flush()
```
Or use a context manager for automatic flush: