Quick Start

Installation

uv pip install tinker

Set your API key (get one from the Tinker Console):

export TINKER_API_KEY="your-api-key-here"

This gives you the Python SDK (import tinker) and the CLI (tinker run list, tinker checkpoint download).


This page walks through the two main LLM fine-tuning workflows — supervised fine-tuning (SFT) and reinforcement learning (RL) — showing how each step maps to Tinker SDK calls.

Supervised Fine-Tuning (SFT)

SFT trains a model to imitate examples:

ServiceClient (connect) → TrainingClient (create_lora) → forward_backward ("cross_entropy") → optim_step (update weights) → SamplingClient (save_weights) → sample (evaluate)
  1. Create clients — connect to Tinker, create a TrainingClient
  2. Prepare data — tokenize examples into Datum objects with loss masks
  3. Train — forward_backward (gradients) + optim_step (update weights)
  4. Evaluate — save weights, create a SamplingClient, sample (see the sketch below)
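
A minimal sketch of the whole loop (the batches variable is a placeholder for data prepared as in the cheatsheet below):

import tinker
from tinker import types

service_client = tinker.ServiceClient()
training_client = service_client.create_lora_training_client(
    base_model="Qwen/Qwen3-8B", rank=32
)

for batch in batches:  # each batch is a list of types.Datum
    fwdbwd_future = await training_client.forward_backward_async(
        data=batch, loss_fn="cross_entropy"
    )
    optim_future = await training_client.optim_step_async(
        types.AdamParams(learning_rate=1e-4)
    )
    fwdbwd_result = await fwdbwd_future.result_async()
    print(f"Loss: {fwdbwd_result.loss}")

# Evaluate: save weights and sample from the fine-tuned model
sampling_client = training_client.save_weights_and_get_sampling_client(name="sft-final")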

Reinforcement Learning (RL)

RL trains a model to maximize a reward signal:

TrainingClient (create_lora) → SamplingClient (on-policy) → sample (rollouts) → reward (+ logprobs) → forward_backward ("importance_sampling") → optim_step (update) → repeat

  1. Create clients — connect to Tinker, create a TrainingClient
  2. Get on-policy sampler — save_weights_and_get_sampling_client
  3. Sample rollouts — generate completions from the on-policy model
  4. Score — compute rewards and log-probabilities
  5. Train — forward_backward with an RL loss + optim_step
  6. Repeat — new weights → new SamplingClient → sample again (see the sketch below)
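
A minimal sketch of the loop, reusing the clients from the cheatsheet below (env_reward and build_rl_datum stand in for your own reward function and Datum construction):

for step in range(num_steps):
    # Steps 1-2: fresh on-policy sampler from the current weights
    sampling_client = training_client.save_weights_and_get_sampling_client(
        name=f"rl-step-{step}"
    )
    # Step 3: sample rollouts
    result = await sampling_client.sample_async(
        prompt=prompt, num_samples=8, sampling_params=params
    )
    # Step 4: score rollouts and compute advantages
    rewards = [env_reward(seq.tokens) for seq in result.sequences]
    baseline = sum(rewards) / len(rewards)
    data = [
        build_rl_datum(seq, reward - baseline)  # packs tokens, logprobs, advantages
        for seq, reward in zip(result.sequences, rewards)
    ]
    # Step 5: train on the scored rollouts
    fwdbwd_future = await training_client.forward_backward_async(data, "importance_sampling")
    optim_future = await training_client.optim_step_async(types.AdamParams(learning_rate=1e-4))
    await fwdbwd_future.result_async()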

API Cheatsheet

Create clients

import tinker
from tinker import types

# Entry point — reads TINKER_API_KEY from environment
service_client = tinker.ServiceClient()

# Training client (LoRA fine-tuning)
training_client = service_client.create_lora_training_client(
    base_model="Qwen/Qwen3-8B", rank=32
)

# Sampling client (text generation)
sampling_client = service_client.create_sampling_client(
    base_model="Qwen/Qwen3-8B"
)

# Tokenizer
tokenizer = training_client.get_tokenizer()

Subprocess sampling

SamplingClient is picklable — you can pass it to other processes for parallel sampling. If your training loop has CPU-heavy work (grading, environment logic), set TINKER_SUBPROCESS_SAMPLING=1 to run sample() and compute_logprobs() in a dedicated subprocess, preventing GIL contention.
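
Beyond the environment variable, picklability also lets you hand the client to worker processes yourself. A minimal sketch (sample_one, prompts, and params are illustrative, and the asyncio.run-per-worker pattern is an assumption, not something the SDK prescribes):

import asyncio
from concurrent.futures import ProcessPoolExecutor

def sample_one(sampling_client, prompt, params):
    # Runs in a worker process; the client is pickled across the boundary
    result = asyncio.run(
        sampling_client.sample_async(prompt=prompt, num_samples=1, sampling_params=params)
    )
    return result.sequences[0].tokens

# On spawn-based platforms, guard this with: if __name__ == "__main__":
with ProcessPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(sample_one, sampling_client, p, params) for p in prompts]
    completions = [f.result() for f in futures]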

Prepare training data

A Datum is a single training example. It contains input tokens, target tokens, and per-token loss weights (0 = ignore, 1 = compute loss).

datum = types.Datum(
    model_input=types.ModelInput.from_ints(tokens=input_tokens),
    loss_fn_inputs=dict(
        weights=weights,           # 0 for prompt, 1 for completion
        target_tokens=target_tokens  # shifted by 1 from input
    )
)
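
For example, here is how the shift and mask line up for a single prompt/completion pair (a sketch using the tokenizer from earlier):

prompt_tokens = tokenizer.encode("What is 2 + 2? ")
completion_tokens = tokenizer.encode("4")
tokens = prompt_tokens + completion_tokens

input_tokens = tokens[:-1]   # model input: everything except the last token
target_tokens = tokens[1:]   # each position predicts the next token
# Zero out prompt positions so loss is computed only on the completion
weights = [0] * (len(prompt_tokens) - 1) + [1] * len(completion_tokens)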

For RL losses (importance_sampling, PPO, CISPO), the loss_fn_inputs also includes logprobs and advantages:

rl_datum = types.Datum(
    model_input=types.ModelInput.from_ints(tokens=tokens),
    loss_fn_inputs=dict(
        target_tokens=target_tokens,
        weights=weights,
        logprobs=sampling_logprobs,  # from the rollout policy
        advantages=advantages,        # reward - baseline
    )
)
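
For instance, with several rollouts of the same prompt, a common baseline (illustrative, not mandated by the SDK) is the group mean:

# Hypothetical rewards for 4 rollouts of the same prompt
rewards = [1.0, 0.0, 1.0, 0.0]
baseline = sum(rewards) / len(rewards)                # 0.5
rollout_advantages = [r - baseline for r in rewards]  # [0.5, -0.5, 0.5, -0.5]
# Each scalar is then broadcast across that rollout's completion tokens
advantages = [rollout_advantages[0]] * len(target_tokens)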

Sample text

prompt = types.ModelInput.from_ints(tokenizer.encode("The capital of France is"))
params = types.SamplingParams(max_tokens=50, temperature=0.7, stop=["\n"])

# Single sample — sample_async returns the result directly
result = await sampling_client.sample_async(prompt=prompt, num_samples=1, sampling_params=params)
print(tokenizer.decode(result.sequences[0].tokens))

# Multiple samples in one call
result = await sampling_client.sample_async(prompt=prompt, num_samples=8, sampling_params=params)
for seq in result.sequences:
    print(tokenizer.decode(seq.tokens))

Compute log-probabilities

Used for scoring in RL (comparing the training policy against the sampling policy).

# Prompt logprobs
result = await sampling_client.sample_async(
    prompt=prompt, num_samples=1,
    sampling_params=types.SamplingParams(max_tokens=1),
    include_prompt_logprobs=True,
)
print(result.prompt_logprobs)  # [None, -9.5, -1.6, ...]

# Shorthand — compute_logprobs_async returns the result directly
logprobs = await sampling_client.compute_logprobs_async(prompt)

# Top-k logprobs (for distillation)
result = await sampling_client.sample_async(
    prompt=prompt, num_samples=1,
    sampling_params=types.SamplingParams(max_tokens=1),
    include_prompt_logprobs=True,
    topk_prompt_logprobs=5,
)
print(result.topk_prompt_logprobs)  # [None, [(token_id, logprob), ...], ...]
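
As a worked illustration of that comparison (values are hypothetical), the per-token importance ratio used by the RL losses is the exponential of the logprob gap:

import math

train_logprobs = [-1.2, -0.8, -2.0]     # current training policy
sampling_logprobs = [-1.0, -0.9, -2.1]  # policy that generated the rollout
ratios = [math.exp(t - s) for t, s in zip(train_logprobs, sampling_logprobs)]
# e.g. ratios[0] == exp(-0.2) ≈ 0.82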

Forward-backward

Computes gradients for the given data and loss function. Returns immediately with a future.

# SFT: cross-entropy loss
fwdbwd_future = await training_client.forward_backward_async(data=[datum], loss_fn="cross_entropy")
fwdbwd_result = await fwdbwd_future.result_async()
print(f"Loss: {fwdbwd_result.loss}")

# RL losses
fwdbwd_future = await training_client.forward_backward_async(data, "importance_sampling")
fwdbwd_future = await training_client.forward_backward_async(data, "ppo")
fwdbwd_future = await training_client.forward_backward_async(data, "cispo")
fwdbwd_future = await training_client.forward_backward_async(data, "dro")

# Custom loss
fwdbwd_future = await training_client.forward_backward_custom_async(data, my_loss_fn)

See Loss Functions for the math behind each loss.

Optimizer step

Updates model weights using the gradients from the last forward_backward.

optim_future = await training_client.optim_step_async(
    types.AdamParams(learning_rate=1e-4)
)
await optim_future.result_async()

Save and load weights

# Save weights → get a sampling client for evaluation
sampling_client = training_client.save_weights_and_get_sampling_client(name="checkpoint-1")

# Save full state (weights + optimizer) for resuming
training_client.save_state(name="step-100")

# Resume from weights only
training_client = await service_client.create_training_client_from_state_async(
    path="tinker://run-id/sampler_weights/checkpoint-1"
)

# Resume with optimizer state
training_client = await service_client.create_training_client_from_state_with_optimizer_async(
    path="tinker://run-id/weights/step-100"
)

Vision inputs

import requests

image_data = requests.get("https://example.com/image.png").content

model_input = tinker.ModelInput(chunks=[
    types.EncodedTextChunk(tokens=tokenizer.encode("<|im_start|>user\n<|vision_start|>")),
    types.ImageChunk(data=image_data, format="png"),
    types.EncodedTextChunk(tokens=tokenizer.encode("<|vision_end|>Describe this image<|im_end|>\n<|im_start|>assistant\n")),
])
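
The assembled model_input can then be passed anywhere a prompt is expected, for example (assuming a sampling client for a vision-capable model):

result = await sampling_client.sample_async(
    prompt=model_input, num_samples=1,
    sampling_params=types.SamplingParams(max_tokens=100),
)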

Concurrent requests

import asyncio

# Multiple samples from the same prompt — just increase num_samples
result = await sampling_client.sample_async(prompt=prompt, num_samples=16, sampling_params=params)
for seq in result.sequences:  # 16 independent completions
    print(tokenizer.decode(seq.tokens))

# Multiple different prompts in parallel — use asyncio.gather
# sample_async returns results directly, so gather gives you the results
results = await asyncio.gather(
    sampling_client.sample_async(prompt=prompt1, num_samples=1, sampling_params=params),
    sampling_client.sample_async(prompt=prompt2, num_samples=1, sampling_params=params),
    sampling_client.sample_async(prompt=prompt3, num_samples=1, sampling_params=params),
)

# Pipeline training: overlap forward-backward with the optimizer step
fwdbwd_future = await training_client.forward_backward_async(batch1, "cross_entropy")
optim_future = await training_client.optim_step_async(types.AdamParams(learning_rate=1e-4))
next_fwdbwd_future = await training_client.forward_backward_async(batch2, "cross_entropy")
await optim_future.result_async()

See Clock Cycles & Pipelining for more on throughput optimization.


Next steps