
Tutorial 05-2: Convert to PEFT LoRA Adapter

Run it interactively

curl -O https://raw.githubusercontent.com/thinking-machines-lab/tinker-cookbook/main/tutorials/502_lora_adapter.py && uv run marimo edit 502_lora_adapter.py

Instead of merging LoRA weights into the base model, you can export a standalone PEFT adapter. This is the preferred approach for serving with vLLM or SGLang, where you keep one base model and hot-swap lightweight adapters.

PEFT format vs merged:

                 Merged model              PEFT adapter
Size             Full model (GBs)          Just the LoRA matrices (MBs)
Deployment       Load like any HF model    Load base model + attach adapter
Multi-adapter    One model per adapter     One base + many adapters
Use with         Any framework             vLLM --lora-modules, SGLang --lora-paths, PEFT
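
The size gap in the first row follows directly from the LoRA parameterization: each adapted weight matrix of shape (d_out, d_in) is replaced by two low-rank factors holding r * (d_in + d_out) parameters instead of d_in * d_out. A quick sketch of the arithmetic, using a hypothetical 4096x4096 projection and the rank-16 setting this tutorial uses:

d_in = d_out = 4096               # hypothetical projection size
r = 16                            # LoRA rank used in this tutorial

full_params = d_in * d_out        # 16,777,216
lora_params = r * (d_in + d_out)  # 131,072
print(f"LoRA stores {full_params / lora_params:.0f}x fewer params per matrix")  # ~128x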

Setup: create a checkpoint

First we need a Tinker checkpoint to export. We create a training client, run one step of SFT, and save the weights.

import tinker
from tinker_cookbook import renderers
from tinker_cookbook.supervised.data import conversation_to_datum
from tinker_cookbook.tokenizer_utils import get_tokenizer

BASE_MODEL = "Qwen/Qwen3.5-4B"

service_client = tinker.ServiceClient()
training_client = await service_client.create_lora_training_client_async(
    base_model=BASE_MODEL, rank=16
)

_tokenizer = get_tokenizer(BASE_MODEL)
_renderer = renderers.get_renderer("qwen3", _tokenizer)
_messages = [
    {"role": "user", "content": "What is Tinker?"},
    {"role": "assistant", "content": "Tinker is a cloud training API for LLM fine-tuning."},
]
_datum = conversation_to_datum(_messages, _renderer, max_length=512)

_fwd = await training_client.forward_backward_async([_datum], loss_fn="cross_entropy")
_opt = await training_client.optim_step_async(tinker.AdamParams(learning_rate=1e-4))
await _fwd.result_async()
await _opt.result_async()

_save_result = training_client.save_weights_for_sampler(name="adapter-tutorial")
sampler_path = _save_result.result().path
print(f"Base model:  {BASE_MODEL}")
print(f"Checkpoint:  {sampler_path}")
Output
Base model:  Qwen/Qwen3.5-4B
Checkpoint:  tinker://040ac13a-4dca-5e21-bfe5-2c1e581ae9d4:train:0/sampler_weights/adapter-tutorial

Step 1: Download the checkpoint

Use weights.download() to fetch a Tinker checkpoint to local disk.

from tinker_cookbook import weights

adapter_dir = weights.download(
    tinker_path=sampler_path,
    output_dir="/tmp/tinker-adapter-tutorial/adapter",
)
print(f"Adapter downloaded to: {adapter_dir}")
Output
Adapter downloaded to: /tmp/tinker-adapter-tutorial/adapter
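
The download is the raw Tinker checkpoint, not yet in PEFT format. A quick listing confirms what landed on disk (the exact file names are Tinker internals and may differ between versions):

import os

for name in sorted(os.listdir(adapter_dir)):
    print(f"  {name}")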

Step 2: Convert to PEFT format

build_lora_adapter remaps Tinker's internal adapter keys to match the HuggingFace model's parameter names (which serving frameworks expect). No base model weights are downloaded or merged -- this is a lightweight operation.

PEFT_OUTPUT = "./peft_adapter"

weights.build_lora_adapter(
    base_model=BASE_MODEL,
    adapter_path=adapter_dir,
    output_path=PEFT_OUTPUT,
)
print(f"PEFT adapter saved to: {PEFT_OUTPUT}")
Output
PEFT adapter saved to: ./peft_adapter

Step 3: Inspect the output

The PEFT adapter directory contains just two files:

  • adapter_config.json -- metadata (base model, rank, alpha, target modules)
  • adapter_model.safetensors -- the LoRA weight matrices

import json
import os

for f in sorted(os.listdir(PEFT_OUTPUT)):
    size_mb = os.path.getsize(os.path.join(PEFT_OUTPUT, f)) / 1e6
    print(f"  {f:40s} {size_mb:>8.2f} MB")

# Show the adapter config
with open(os.path.join(PEFT_OUTPUT, "adapter_config.json")) as fh:
    config = json.load(fh)
print("\nadapter_config.json:")
print(json.dumps(config, indent=2))
Output
  adapter_config.json                          0.00 MB
  adapter_model.safetensors                  145.89 MB

adapter_config.json:
{
  "peft_type": "LORA",
  "auto_mapping": null,
  "base_model_name_or_path": "Qwen/Qwen3.5-4B",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "lora_alpha": 32,
  "lora_dropout": 0.0,
  "modules_to_save": null,
  "r": 16,
  "rank_pattern": {},
  "alpha_pattern": {},
  "target_modules": [
    "down_proj",
    "embed_tokens",
    "gate_proj",
    "in_proj_k",
    "in_proj_q",
    "in_proj_v",
    "in_proj_z",
    "k_proj",
    "o_proj",
    "out_proj",
    "q_proj",
    "up_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM"
}
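
As a sanity check before deploying, you can parse the config with PEFT itself; PeftConfig.from_pretrained reads adapter_config.json exactly as it will at load time:

from peft import PeftConfig

config = PeftConfig.from_pretrained(PEFT_OUTPUT)
print(config.base_model_name_or_path)  # must match the model you attach it to
print(config.r, config.lora_alpha)     # rank 16, alpha 32, as saved above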

Loading the adapter

With PEFT / transformers:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-4B")  # must be the adapter's base model
model = PeftModel.from_pretrained(base, "./peft_adapter")
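
From there, generation works like any transformers model. A minimal sketch (the prompt and decoding settings are illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-4B")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is Tinker?"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

If you decide you want merged weights after all, model.merge_and_unload() folds the adapter into the base model and returns a plain transformers model.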

With vLLM (CLI, multi-adapter serving):

vllm serve Qwen/Qwen3.5-4B \
    --enable-lora \
    --lora-modules my_adapter=./peft_adapter
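
Requests then select the adapter by the name registered above; passing the base model name instead queries the unadapted weights. A sketch using the OpenAI-compatible API (assuming the default port 8000):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.completions.create(
    model="my_adapter",  # the name given to --lora-modules
    prompt="What is Tinker?",
    max_tokens=64,
)
print(response.choices[0].text)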

With SGLang:

python -m sglang.launch_server \
    --model-path Qwen/Qwen3.5-4B \
    --lora-paths my_adapter=./peft_adapter
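
As with vLLM, requests pick the adapter by name. A sketch against SGLang's native /generate endpoint (assuming the default port 30000; lora_path refers to the name registered via --lora-paths):

import requests

response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "What is Tinker?",
        "sampling_params": {"max_new_tokens": 64},
        "lora_path": "my_adapter",
    },
)
print(response.json()["text"])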

Next steps