Tutorial 05-2: Convert to PEFT LoRA Adapter
Instead of merging LoRA weights into the base model, you can export a standalone PEFT adapter. This is the preferred approach for serving with vLLM or SGLang, where you keep one base model and hot-swap lightweight adapters.
PEFT format vs merged:
| | Merged model | PEFT adapter |
|---|---|---|
| Size | Full model (GBs) | Just the LoRA matrices (MBs) |
| Deployment | Load like any HF model | Load base model + attach adapter |
| Multi-adapter | One model per adapter | One base + many adapters |
| Use with | Any framework | vLLM `--lora-modules`, SGLang `--lora-paths`, PEFT |
Setup: create a checkpoint
First we need a Tinker checkpoint to export. We create a training client, run one step of SFT, and save the weights.
```python
import tinker
from tinker_cookbook import renderers
from tinker_cookbook.supervised.data import conversation_to_datum
from tinker_cookbook.tokenizer_utils import get_tokenizer

BASE_MODEL = "Qwen/Qwen3.5-4B"

service_client = tinker.ServiceClient()
training_client = await service_client.create_lora_training_client_async(
    base_model=BASE_MODEL, rank=16
)

_tokenizer = get_tokenizer(BASE_MODEL)
_renderer = renderers.get_renderer("qwen3", _tokenizer)

# A single toy conversation, rendered into a training datum
_messages = [
    {"role": "user", "content": "What is Tinker?"},
    {"role": "assistant", "content": "Tinker is a cloud training API for LLM fine-tuning."},
]
_datum = conversation_to_datum(_messages, _renderer, max_length=512)

# One SFT step: forward/backward pass, then an optimizer step
_fwd = await training_client.forward_backward_async([_datum], loss_fn="cross_entropy")
_opt = await training_client.optim_step_async(tinker.AdamParams(learning_rate=1e-4))
await _fwd.result_async()
await _opt.result_async()

# Save the weights in sampler format and get the checkpoint path
_save_result = training_client.save_weights_for_sampler(name="adapter-tutorial")
sampler_path = _save_result.result().path

print(f"Base model: {BASE_MODEL}")
print(f"Checkpoint: {sampler_path}")
```
Output
Step 1: Download the checkpoint
Use `weights.download()` to fetch a Tinker checkpoint to local disk.
```python
from tinker_cookbook import weights

adapter_dir = weights.download(
    tinker_path=sampler_path,
    output_dir="/tmp/tinker-adapter-tutorial/adapter",
)
print(f"Adapter downloaded to: {adapter_dir}")
```
Step 2: Convert to PEFT format
`build_lora_adapter` remaps Tinker's internal adapter keys to match the HuggingFace model's parameter names (which serving frameworks expect). No base model weights are downloaded or merged -- this is a lightweight operation.
```python
PEFT_OUTPUT = "./peft_adapter"

weights.build_lora_adapter(
    base_model=BASE_MODEL,
    adapter_path=adapter_dir,
    output_path=PEFT_OUTPUT,
)
print(f"PEFT adapter saved to: {PEFT_OUTPUT}")
```
Step 3: Inspect the output
The PEFT adapter directory contains just two files:
- `adapter_config.json` -- metadata (base model, rank, alpha, target modules)
- `adapter_model.safetensors` -- the LoRA weight matrices
```python
import json
import os

for f in sorted(os.listdir(PEFT_OUTPUT)):
    size_mb = os.path.getsize(os.path.join(PEFT_OUTPUT, f)) / 1e6
    print(f"  {f:40s} {size_mb:>8.2f} MB")

# Show the adapter config
with open(os.path.join(PEFT_OUTPUT, "adapter_config.json")) as fh:
    config = json.load(fh)
print("\nadapter_config.json:")
print(json.dumps(config, indent=2))
```
Output
```
  adapter_config.json                          0.00 MB
  adapter_model.safetensors                  145.89 MB

adapter_config.json:
{
  "peft_type": "LORA",
  "auto_mapping": null,
  "base_model_name_or_path": "Qwen/Qwen3.5-4B",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "lora_alpha": 32,
  "lora_dropout": 0.0,
  "modules_to_save": null,
  "r": 16,
  "rank_pattern": {},
  "alpha_pattern": {},
  "target_modules": [
    "down_proj",
    "embed_tokens",
    "gate_proj",
    "in_proj_k",
    "in_proj_q",
    "in_proj_v",
    "in_proj_z",
    "k_proj",
    "o_proj",
    "out_proj",
    "q_proj",
    "up_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM"
}
```
Loading the adapter
With PEFT / transformers:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# The base model must match the one the adapter was trained on
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-4B")
model = PeftModel.from_pretrained(base, "./peft_adapter")
```
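From here the combined model behaves like any transformers model. A quick smoke test (the prompt is arbitrary):
```python
from transformers import AutoTokenizer

# Generate a short completion with the adapter attached
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-4B")
inputs = tok("What is Tinker?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```
If you later want a standalone model after all, PEFT's `merge_and_unload()` folds the adapter into the base weights in memory.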
With vLLM (CLI -- multi-adapter serving):
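A minimal sketch, assuming vLLM's OpenAI-compatible server with LoRA support enabled (`my-adapter` is an arbitrary name you pick):
```bash
vllm serve Qwen/Qwen3.5-4B \
  --enable-lora \
  --lora-modules my-adapter=./peft_adapter
```
Clients then select the adapter by passing `"model": "my-adapter"` in requests; repeating `--lora-modules name=path` entries serves several adapters from one base model.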
With SGLang:
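A corresponding SGLang sketch (flag names as of recent SGLang releases; check the docs for your version):
```bash
python -m sglang.launch_server \
  --model-path Qwen/Qwen3.5-4B \
  --lora-paths my-adapter=./peft_adapter
```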
Next steps
- Export a Merged HuggingFace Model -- Merge LoRA into a standalone model
- Publish to HuggingFace Hub -- Upload the adapter with a custom model card