RL Training Outputs

Each RL training run writes files to log_path. This page describes each file and how to extract data from it.

Files written to log_path

| File | Format | Contents |
|---|---|---|
| metrics.jsonl | JSONL | One JSON object per training iteration with all scalar metrics |
| config.json | JSON | Serialized training config (hyperparams, model, dataset, etc.) |
| checkpoints.jsonl | JSONL | Checkpoint metadata (paths, loop state for resume) |
| train_iteration_NNNNNN.html | HTML | Human-readable logtree report for training rollouts |
| train_iteration_NNNNNN_logtree.json | JSON | Machine-readable export of the same logtree trace |
| train_iteration_NNNNNN_rollout_summaries.jsonl | JSONL | One JSON object per trajectory with rewards, metrics, and step-level data |
| eval_<name>_iteration_NNNNNN.html | HTML | Logtree report for eval rollouts |
| eval_<name>_iteration_NNNNNN_logtree.json | JSON | Machine-readable export of the eval logtree trace |
| eval_<name>_iteration_NNNNNN_rollout_summaries.jsonl | JSONL | Per-trajectory eval data (for RLTestSetEvaluator) |
| code.diff | text | Git diff at the time training started |

<name> is the evaluator name (sanitized for filenames); iteration numbers are zero-padded to 6 digits.
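Because iteration numbers are zero-padded to six digits, the per-iteration filenames sort lexicographically and can be matched with a simple regex. A minimal sketch (the filenames below are illustrative, not taken from a real run):

```python
import re

# Illustrative filenames following the patterns in the table above.
names = [
    "train_iteration_000003_rollout_summaries.jsonl",
    "eval_test_iteration_000003_rollout_summaries.jsonl",
    "config.json",
]

# Extract the iteration number from training rollout summary filenames.
pattern = re.compile(r"train_iteration_(\d{6})_rollout_summaries\.jsonl")
iters = [int(m.group(1)) for n in names if (m := pattern.fullmatch(n))]
print(iters)  # [3]
```

In a real run you would apply the same pattern to the directory listing of log_path.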

metrics.jsonl

Each line is a JSON object keyed by metric name. Common keys (the exact set varies by env and config):

  • progress/batch, progress/done_frac — iteration index and completion fraction
  • env/all/reward/total — mean total reward across all trajectories
  • env/all/<metric> — env-emitted metrics (e.g., format_parse, correct)
  • ac_tokens_per_turn — mean generated tokens per turn
  • entropy — per-token entropy
  • kl_sample_train_v1, kl_sample_train_v2 — KL divergence estimators
  • optim/lr — learning rate
  • time/... — wall-clock timings for different stages
For example, to load the metrics into a pandas DataFrame and plot reward over iterations:

import pandas as pd

# Each line of metrics.jsonl becomes one DataFrame row.
df = pd.read_json("path/to/metrics.jsonl", lines=True)
df.plot(x="progress/batch", y="env/all/reward/total")

*_rollout_summaries.jsonl

One line per trajectory. Best for aggregate analysis (reward distributions, per-step metrics).

import json
 
with open("train_iteration_000010_rollout_summaries.jsonl") as f:
    trajectories = [json.loads(line) for line in f]
 
# Each trajectory has:
# - metadata: schema_version, split, iteration, group_idx, traj_idx, tags, sampling_client_step
# - episode totals: total_reward, final_reward, trajectory_metrics, final_ob_len
# - steps: list of {step_idx, ob_len, ac_len, reward, episode_done, metrics, logs}
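The episode totals above are enough for quick aggregate analysis without touching the logtree files. A minimal sketch using hand-written trajectories in the documented shape (real files contain one such object per line):

```python
import statistics

# Illustrative trajectories with a subset of the fields documented above.
trajectories = [
    {"total_reward": 1.0, "steps": [{"step_idx": 0, "reward": 1.0, "episode_done": True}]},
    {"total_reward": 0.0, "steps": [{"step_idx": 0, "reward": 0.0, "episode_done": True}]},
]

# Aggregate statistics across trajectories.
mean_reward = statistics.mean(t["total_reward"] for t in trajectories)
mean_steps = statistics.mean(len(t["steps"]) for t in trajectories)
print(mean_reward, mean_steps)  # 0.5 1
```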

*_logtree.json

The logtree JSON contains full rollout transcripts: prompts, model responses, grading details, and reward breakdowns. Use this when you need the actual text content of rollouts.

Top level: title, started_at, path, root. root is a tree of nodes, each with tag, attrs, and children (either text strings or nested nodes).
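A small traversal sketch over a hand-built root node with the tag/attrs/children shape described above (the tag names here are illustrative):

```python
def iter_tags(node):
    """Yield the tag of every dict node in a logtree, depth-first."""
    if isinstance(node, dict):
        yield node.get("tag")
        for child in node.get("children", []):
            yield from iter_tags(child)

# Illustrative root node: children may be text strings or nested nodes.
root = {"tag": "root", "attrs": {}, "children": [
    {"tag": "group", "attrs": {}, "children": ["some text"]},
]}
print(list(iter_tags(root)))  # ['root', 'group']
```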

Some nodes carry a data field with structured content. Use data to extract typed data like conversation messages:

import json
 
def find_conversations(node):
    """Recursively find all nodes with conversation data."""
    results = []
    if isinstance(node, dict):
        if node.get("data", {}).get("type") == "conversation":
            results.append(node["data"])
        for child in node.get("children", []):
            if isinstance(child, dict):
                results.extend(find_conversations(child))
    return results
 
with open("eval_test_iteration_000020_logtree.json") as f:
    trace = json.load(f)
 
for conv in find_conversations(trace["root"]):
    for msg in conv["messages"]:
        print(f"{msg['role']}: {msg['content'][:100] if isinstance(msg['content'], str) else '...'}")

Note: num_groups_to_log (default: 4) controls how many trajectory groups get detailed env-level logging. Groups beyond this limit have no rollout content in the logtree — only the Trajectory Details section (turn-level stats) is always present.

config.json

Serialized chz config capturing all training hyperparameters. Useful for reproducing a run or comparing configs across experiments.
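One way to compare runs is a shallow key-by-key diff of their configs. A sketch over two illustrative config dicts (a real comparison would load each run's config.json; the keys below are made up for the example):

```python
# Illustrative configs from two hypothetical runs.
cfg_a = {"lr": 1e-4, "batch_size": 64, "model": "m1"}
cfg_b = {"lr": 3e-4, "batch_size": 64, "model": "m1"}

# Collect keys whose values differ between the two runs.
diff = {k: (cfg_a.get(k), cfg_b.get(k))
        for k in sorted(set(cfg_a) | set(cfg_b))
        if cfg_a.get(k) != cfg_b.get(k)}
print(diff)  # {'lr': (0.0001, 0.0003)}
```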

checkpoints.jsonl

Each line records a saved checkpoint with its path and the loop state at save time. Used by the resume logic to pick up where training left off.
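Since checkpoint records are appended one per line, the last line describes the most recent checkpoint. A sketch over illustrative lines (the exact keys here are assumptions for the example, not the documented schema):

```python
import json

# Illustrative checkpoints.jsonl contents; real records hold a
# checkpoint path plus the loop state at save time.
lines = [
    '{"path": "/ckpts/000005", "iteration": 5}',
    '{"path": "/ckpts/000010", "iteration": 10}',
]

# The final line is the latest checkpoint, which resume logic picks up.
latest = json.loads(lines[-1])
print(latest["path"])  # /ckpts/000010
```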