Replicating DeepCoder with Tinker
Competitive programming problems are a common testbed for RL with LLMs. The recent DeepCoder blog post introduces a dataset and training pipeline for this purpose. This recipe demonstrates a similar setup using Qwen3.5-4B with thinking enabled.
Running This Demo
Sandboxing
Sandboxing is essential for safely executing generated code during training and evaluation. Two sandbox backends are supported:
SandboxFusion (Default)
Sandbox Fusion provides local Docker-based sandboxing. You can start a local sandbox in Docker with:
docker run -it -p 8080:8080 \
-v ${TINKER_COOKBOOK_ROOT}/tinker_cookbook/recipes/code_rl/sandbox_config/local.yaml:/root/sandbox/sandbox/configs/local.yaml \
volcengine/sandbox-fusion:server-20250609
Here, ${TINKER_COOKBOOK_ROOT} is the absolute path to your local tinker-cookbook repository. The training script reads the sandbox endpoint from the SANDBOX_URL environment variable. By default it uses http://localhost:8080/run_code. Example:
If you prefer not to use Docker, you can set up the sandbox manually by following the instructions in the Sandbox Fusion repository.
Modal (Alternative)
Modal provides cloud-based sandboxed execution without local Docker setup. To use Modal:
-
Install the modal extra and authenticate:
-
Set the sandbox backend in your training command:
Optional environment variables for Modal:
- MODAL_POOL_SIZE: Number of concurrent sandboxes (default: 32)
- MODAL_CREATION_RATE_LIMIT: Max sandboxes created per second (default: 4)
Example command
Train a Qwen3.5-4B model with thinking enabled:
python -m tinker_cookbook.recipes.code_rl.train \
model_name="Qwen/Qwen3.5-4B" \
group_size=8 groups_per_batch=128 \
learning_rate=4e-5 \
lora_rank=32 \
max_tokens=24576
After 190 steps of training, you can expect the following performance on LiveCodeBench v6 (2025.02–2025.05):
| Model | Pass@1 | Pass@8 |
|---|---|---|
| Qwen3.5-4B (before training) | 23.6% | 34.3% |
| Qwen3.5-4B (after 190 steps) | 52.4% | 76.0% |
[1] Luo, M., Tan, S., Huang, R., Patel, A., Ariyak, A., Wu, Q., Shi, X., Xin, R., Cai, C., Weber, M., Zhang, C., Li, L. E., Popa, R. A., & Stoica, I. (2025). DeepCoder: A fully open-source 14B coder at O3-mini level.