Supervised Learning
VLM Image Classification
This recipe will teach you how to train an image classifier powered by vision-language models on tinker.
python -m tinker_cookbook.recipes.vlm_classifier.train \
experiment_dir=./vlm_classifier \
wandb_project=vlm-classifier \
dataset=caltech101 \
renderer_name=qwen3_5_disable_thinking \
model_name=Qwen/Qwen3.6-35B-A3B
Currently, the qwen series of VLMs are supported. Running the above script after installing tinker-cookbook will fine-tune Qwen/Qwen3.6-35B-A3B on the caltech101 as an example.
Evaluation
Once trained, you can evaluate the class predictions from your VLM as follows:
python -m tinker_cookbook.recipes.vlm_classifier.eval \
dataset=caltech101 \
model_path=$YOUR_MODEL_PATH \
model_name=Qwen/Qwen3.6-35B-A3B \
renderer_name=qwen3_5_disable_thinking
This will print the test accuracy of your model.
Custom Datasets
You can add custom datasets by creating a custom SupervisedDatasetBuilder in tinker_cookbook.recipes.vlm_classifier.data if your dataset is available for download on Hugging Face, and has a column with your image, and a column with the image labels (note, you must also define a ClassLabel for mapping integer labels to a human-readable class name).
For more general datasets, you can subclass the base ClassifierDataset to load arbitrary image classification datasets in the provided classifier tooling.
Custom Evaluators
We provide a suite of evaluators in tinker_cookbook.recipes.vlm_classifier.eval for sampling from VLMs, parsing the predicted class name from the response, and computing evaluation metrics.
To define a custom evaluator for a new dataset, you can create a new EvaluatorBuilder if your dataset is available on Hugging Face, or you can subclass ClassifierEvaluator to add an arbitrary custom dataset and parsing strategy.