Tutorial 05-1: Export a Merged HuggingFace Model
Prerequisites
Run it interactively
After training a LoRA adapter with Tinker, you typically want a standalone model you can deploy anywhere. This tutorial shows how to merge your LoRA adapter into the base model, producing a complete HuggingFace model directory.
What merging does: During LoRA training, Tinker only updates small low-rank matrices (the adapter). The base model weights stay frozen. Merging adds the adapter deltas back into the base weights: W_merged = W_base + (B @ A) * (alpha / rank). The result is a normal model with no LoRA dependency.
Tinker checkpoint                Merged HuggingFace model
+-------------------+          +-----------------------------+
|  adapter weights  |   -->    | model shards (.safetensors) |
|  adapter config   |   -->    | config.json                 |
+-------------------+          | tokenizer files ...         |
   + base model                +-----------------------------+
  (from HF Hub)
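The merge formula above can be sketched in a few lines of NumPy, independent of Tinker (toy shapes here; a real weight matrix is thousands of units on each side):

```python
import numpy as np

# Toy dimensions for illustration only.
d_out, d_in, rank, alpha = 8, 8, 2, 16

rng = np.random.default_rng(0)
W_base = rng.normal(size=(d_out, d_in))  # frozen base weight
A = rng.normal(size=(rank, d_in))        # LoRA down-projection
B = np.zeros((d_out, rank))              # LoRA up-projection (zero-init)

# The merge: fold the scaled low-rank update into the base weight,
# leaving an ordinary dense matrix with no LoRA dependency.
W_merged = W_base + (B @ A) * (alpha / rank)

# With B still zero-initialized, merging is a no-op -- the standard
# LoRA starting point before any training has happened.
assert np.allclose(W_merged, W_base)
```

After training, B is nonzero and the merged matrix differs from the base by exactly the scaled product of the two adapter matrices.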
Setup: create a checkpoint
First we need a Tinker checkpoint to export. We create a training client, run one step of SFT, and save the weights. In practice, you would use a checkpoint from a real training run.
import tinker
from tinker_cookbook import renderers
from tinker_cookbook.supervised.data import conversation_to_datum
from tinker_cookbook.tokenizer_utils import get_tokenizer
BASE_MODEL = "Qwen/Qwen3-4B"
service_client = tinker.ServiceClient()
training_client = await service_client.create_lora_training_client_async(
base_model=BASE_MODEL, rank=16
)
# Build a minimal training example
_tokenizer = get_tokenizer(BASE_MODEL)
_renderer = renderers.get_renderer("qwen3", _tokenizer)
_messages = [
{"role": "user", "content": "What is Tinker?"},
{"role": "assistant", "content": "Tinker is a cloud training API for LLM fine-tuning."},
]
_datum = conversation_to_datum(_messages, _renderer, max_length=512)
# One training step + save
_fwd = await training_client.forward_backward_async([_datum], loss_fn="cross_entropy")
_opt = await training_client.optim_step_async(tinker.AdamParams(learning_rate=1e-4))
await _fwd.result_async()
await _opt.result_async()
_save_result = await training_client.save_weights_for_sampler_async(name="export-tutorial")
sampler_path = (await _save_result.result_async()).path
print(f"Base model: {BASE_MODEL}")
print(f"Checkpoint: {sampler_path}")
Output
Step 1: Download the checkpoint
Use weights.download() to fetch a Tinker checkpoint to local disk. The tinker_path follows the format tinker://<run_id>/sampler_weights/<name>.
from tinker_cookbook import weights
adapter_dir = weights.download(
tinker_path=sampler_path,
output_dir="/tmp/tinker-export-tutorial/adapter",
)
print(f"Adapter downloaded to: {adapter_dir}")
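Since the tinker:// path follows a URL-like format, its components can be picked apart with standard URL parsing if you need the run id or checkpoint name programmatically. A sketch, using a made-up run id rather than a real one:

```python
from urllib.parse import urlparse

# Hypothetical path in the documented format:
# tinker://<run_id>/sampler_weights/<name>
path = "tinker://run-abc123/sampler_weights/export-tutorial"

parsed = urlparse(path)
run_id = parsed.netloc                           # "run-abc123"
kind, name = parsed.path.lstrip("/").split("/")  # "sampler_weights", "export-tutorial"
print(run_id, kind, name)
```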
Step 2: Merge the adapter into a full model
build_hf_model downloads the base model from HuggingFace Hub, applies the LoRA deltas, and saves the merged result.
OUTPUT_PATH = "/tmp/tinker-export-tutorial/merged_model"
weights.build_hf_model(
base_model=BASE_MODEL,
adapter_path=adapter_dir,
output_path=OUTPUT_PATH,
)
print(f"Merged model saved to: {OUTPUT_PATH}")
Step 3: Inspect the output
The output directory is a standard HuggingFace model -- it contains config, tokenizer files, and safetensors shards.
import os
for _f in sorted(os.listdir(OUTPUT_PATH)):
_size_mb = os.path.getsize(os.path.join(OUTPUT_PATH, _f)) / 1e6
print(f" {_f:45s} {_size_mb:>8.1f} MB")
Output
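When the weights span multiple shards, HuggingFace also writes a model.safetensors.index.json in the output directory, mapping each tensor name to the shard file that holds it. A sketch of reading that mapping, using a small synthetic index in place of the real file (a real export would load it with json.load from OUTPUT_PATH):

```python
# Synthetic stand-in for model.safetensors.index.json; the tensor and
# shard names below are illustrative, not taken from a real export.
index = {
    "metadata": {"total_size": 8_000_000_000},
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
        "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
        "lm_head.weight": "model-00002-of-00002.safetensors",
    },
}

shards = sorted(set(index["weight_map"].values()))
print(f"{len(index['weight_map'])} tensors across {len(shards)} shard(s)")
# -> 3 tensors across 2 shard(s)
```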
Step 4: Verify the output
The merged model is a standard HuggingFace model — you can load it with transformers, serve it with vLLM, or deploy with any HF-compatible framework:
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(OUTPUT_PATH)
model = AutoModelForCausalLM.from_pretrained(OUTPUT_PATH, device_map="auto")
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
import json
# Verify the config is valid
with open(f"{OUTPUT_PATH}/config.json") as _f:
_config = json.load(_f)
# Some models nest text params under text_config (e.g. vision-language models)
_tc = _config.get("text_config", _config)
print(f"Architecture: {_config.get('architectures', ['unknown'])[0]}")
print(f"Hidden size: {_tc.get('hidden_size', 'unknown')}")
print(f"Num layers: {_tc.get('num_hidden_layers', 'unknown')}")
print(f"Vocab size: {_tc.get('vocab_size', 'unknown')}")
Output
Next steps
- Build a PEFT LoRA Adapter -- Convert to PEFT format for vLLM's --lora-modules
- Publish to HuggingFace Hub -- Upload the merged model with a custom model card