tinker_cookbook.weights.build_hf_model

tinker_cookbook.weights.build_hf_model(base_model, adapter_path, output_path, dtype, trust_remote_code, merge_strategy, dequantize, quantize, serving_format)

Build a complete HuggingFace model from Tinker LoRA adapter weights.

Parameters:

  • base_model (str) – HuggingFace model name (e.g. "Qwen/Qwen3.5-35B-A3B") or local path to a saved HuggingFace model.
  • adapter_path (str) – Local path to the Tinker adapter directory. Must contain adapter_model.safetensors and adapter_config.json.
  • output_path (str) – Directory where the merged model will be saved. Must not already exist.
  • dtype (str) – Data type for loading the base model. One of "bfloat16" (default), "float16", or "float32". Use "float32" for maximum precision during merge. Only used by merge_strategy="full"; the shard strategy preserves the on-disk dtype of each tensor.
  • trust_remote_code (bool | None) – Whether to trust remote code when loading HF models. Required for some newer model architectures (e.g. Qwen3.5). If None (default), falls back to the HF_TRUST_REMOTE_CODE environment variable, then False.
  • merge_strategy (str) – Controls how the merge is performed. "auto" (default) uses shard-by-shard processing for lower peak memory. "shard" forces shard-by-shard (fails if shards can't be resolved). "full" forces full-model loading (original behavior, higher memory but simpler).
  • dequantize (bool) – If True, dequantize quantized base model weights before merging. Not yet implemented for the standard merge path, but used internally by the quantized export path for models with native FP8 weights (e.g. DeepSeek V3.1).
  • quantize (str | None) – Output quantization method. Currently supported: "experts-fp8" — quantize routed expert weights to FP8 with blockwise scaling. Requires serving_format to be set. None (default) — no quantization.
  • serving_format (str | None) – Serving framework format for quantization metadata. Currently supported: "vllm" — write compressed-tensors config for vLLM. Required when quantize is set. None (default) — no serving-specific metadata.
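The argument constraints above can be sketched as a caller-side pre-flight check. This is an illustrative sketch, not part of the library: the helper name `check_build_args` and the exact error messages are assumptions, but each rule restates a constraint documented in the parameter list (valid `dtype` and `merge_strategy` values, a fresh `output_path`, and `quantize`/`serving_format` being set together or not at all).

```python
import os

# Valid values per the parameter documentation above.
VALID_DTYPES = {"bfloat16", "float16", "float32"}
VALID_STRATEGIES = {"auto", "shard", "full"}

def check_build_args(output_path, dtype="bfloat16", merge_strategy="auto",
                     quantize=None, serving_format=None):
    """Hypothetical pre-flight check mirroring the documented constraints."""
    # output_path must not already exist
    if os.path.exists(output_path):
        raise FileExistsError(f"output_path already exists: {output_path}")
    if dtype not in VALID_DTYPES:
        raise ValueError(f"unrecognized dtype: {dtype!r}")
    if merge_strategy not in VALID_STRATEGIES:
        raise ValueError(f"unrecognized merge_strategy: {merge_strategy!r}")
    # quantize and serving_format must both be set or both be None
    if (quantize is None) != (serving_format is None):
        raise ValueError("quantize and serving_format must both be set or both be None")
    if quantize is not None and quantize != "experts-fp8":
        raise ValueError(f"unrecognized quantize: {quantize!r}")
    if serving_format is not None and serving_format != "vllm":
        raise ValueError(f"unrecognized serving_format: {serving_format!r}")
```

Running such a check before a long merge lets bad combinations (for example, `quantize="experts-fp8"` without `serving_format="vllm"`) fail in milliseconds rather than after the base model has been loaded.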

Raises:

  • FileNotFoundError: If adapter files are missing.
  • FileExistsError: If output_path already exists.
  • KeyError: If adapter config is malformed.
  • ValueError: If tensor shapes are incompatible during merge; if dtype, merge_strategy, quantize, or serving_format is not a recognized value; or if quantize and serving_format are not both set or both unset.
  • NotImplementedError: If dequantize=True on the standard merge path.
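A caller can map the exception types listed above to short, actionable diagnostics. A minimal sketch, assuming nothing beyond those documented types; the wrapper name `run_build` is a hypothetical illustration, and the callable it wraps is passed in so the sketch stays self-contained.

```python
def run_build(build_fn, **kwargs):
    """Invoke a build_hf_model-like callable and translate the documented
    failure modes into short diagnostics. Returns True on success."""
    try:
        build_fn(**kwargs)
    except FileNotFoundError as e:
        print(f"adapter files missing: {e}")
    except FileExistsError as e:
        print(f"refusing to overwrite existing output: {e}")
    except KeyError as e:
        print(f"malformed adapter config: {e}")
    except ValueError as e:
        print(f"bad argument or incompatible tensor shapes: {e}")
    except NotImplementedError as e:
        print(f"unsupported option on this path: {e}")
    else:
        return True
    return False
```

Catching these specific types, rather than a bare `except Exception`, keeps genuinely unexpected errors (out-of-memory, corrupted safetensors files) loud while handling the failure modes the function documents.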