Describe the bug
_convert_mixture_state_dict_to_diffusers converts flat Kohya-style keys like lora_transformer_transformer_blocks_0_<sublayer> into dotted diffusers paths. For each key it first sets:
diffusers_key = f"transformer_blocks.{i}"
and only appends a sublayer suffix (.attn.to_q, .attn.to_out.0, etc.) when "attn_" in k. Any non-attention sublayer in the block (e.g. norm1.linear, proj_mlp, ff.net, ...) has no matching branch, so diffusers_key is left as the bare block path transformer_blocks.{i}, which names a container module rather than a layer with its own weight. The converter still writes the LoRA weights under this key, as transformer_blocks.{i}.lora_A.weight / .lora_B.weight.
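The fall-through can be illustrated with a small standalone sketch. This is not the diffusers implementation: sketch_convert_key and the ATTN_SUFFIXES table are hypothetical names, and the table only covers a few attention sublayers for illustration. It assumes the flat key layout described above.

```python
import re

# Hypothetical, simplified stand-in for the branch logic described above.
# NOT the actual diffusers code; the suffix table is illustrative only.
ATTN_SUFFIXES = {
    "attn_to_q": ".attn.to_q",
    "attn_to_k": ".attn.to_k",
    "attn_to_v": ".attn.to_v",
    "attn_to_out_0": ".attn.to_out.0",
}


def sketch_convert_key(k: str) -> str:
    """Map a flat Kohya-style key to a dotted path, mimicking the bug."""
    m = re.match(r"lora_transformer_transformer_blocks_(\d+)_(.+)", k)
    if m is None:
        raise ValueError(f"unexpected key layout: {k}")
    i, sublayer = m.group(1), m.group(2)

    # The block path is set unconditionally first.
    diffusers_key = f"transformer_blocks.{i}"

    # A suffix is appended only on the attention branch.
    if "attn_" in k:
        for flat_name, dotted_suffix in ATTN_SUFFIXES.items():
            if sublayer == flat_name:
                diffusers_key += dotted_suffix

    # Non-attention sublayers fall through with the bare block path.
    return diffusers_key


print(sketch_convert_key("lora_transformer_transformer_blocks_0_attn_to_q"))
print(sketch_convert_key("lora_transformer_transformer_blocks_0_norm1_linear"))
```

In this sketch the attention key gets a full leaf path, while the norm1_linear key comes back as the bare block path, which is the condition that later breaks the base-parameter lookup.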
Downstream, _maybe_expand_lora_state_dict strips the LoRA suffix to look up the base parameter:
base_weight_param = transformer_state_dict[base_param_name] # base_param_name = "transformer_blocks.0.weight"
This raises KeyError, since no such parameter exists.
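A minimal sketch of why the lookup fails, using a toy dictionary in place of the real transformer state dict. The helper base_param_name_for is a hypothetical name, and its suffix handling is an assumption about what stripping the LoRA suffix amounts to, not the library's code.

```python
# Toy stand-in for the base model's state dict; only the key names matter.
transformer_state_dict = {
    "transformer_blocks.0.attn.to_q.weight": None,
    "transformer_blocks.0.norm1.linear.weight": None,
}


def base_param_name_for(lora_key: str) -> str:
    """Strip the LoRA suffix and point at the base weight (hypothetical helper)."""
    for suffix in (".lora_A.weight", ".lora_B.weight"):
        if lora_key.endswith(suffix):
            return lora_key[: -len(suffix)] + ".weight"
    raise ValueError(f"not a LoRA weight key: {lora_key}")


# Key produced when the converter falls through with the bare block path.
bad_key = "transformer_blocks.0.lora_A.weight"
base_param_name = base_param_name_for(bad_key)

try:
    transformer_state_dict[base_param_name]
    missing = None
except KeyError as err:
    missing = err.args[0]

print(missing)
```

A correctly converted key such as transformer_blocks.0.attn.to_q.lora_A.weight resolves to an existing entry in the toy dictionary; the bare block key resolves to transformer_blocks.0.weight, which matches the KeyError in the logs.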
Reproduction
Load any Kohya-style LoRA (prefix lora_transformer_..., the format handled by _convert_mixture_state_dict_to_diffusers) into a Flux- or Chroma-based pipeline where the LoRA touches a non-attention sublayer (e.g. norm1.linear, proj_mlp) of a transformer block:
import torch
from diffusers import DiffusionPipeline

# Any Flux- or Chroma-based checkpoint.
pipe = DiffusionPipeline.from_pretrained("<flux-or-chroma-repo>", torch_dtype=torch.bfloat16)
# Raises KeyError when the LoRA contains non-attention sublayer keys.
pipe.load_lora_weights("<path-to-lora-with-non-attention-sublayer-keys>.safetensors")
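To check whether a given LoRA file would hit this path, its key names can be screened before loading. The sketch below is an assumption-laden helper, not part of diffusers: non_attention_block_keys is a hypothetical name, the sample keys are made up for illustration, and the filter simply mirrors the "attn_" in k condition described above.

```python
def non_attention_block_keys(keys):
    """Return block-level keys that would miss the attention branch."""
    prefix = "lora_transformer_transformer_blocks_"
    return sorted(k for k in keys if k.startswith(prefix) and "attn_" not in k)


# Illustrative key names only; a real file's keys may differ.
sample_keys = [
    "lora_transformer_transformer_blocks_0_attn_to_q.lora_down.weight",
    "lora_transformer_transformer_blocks_0_norm1_linear.lora_down.weight",
    "lora_transformer_transformer_blocks_0_proj_mlp.lora_up.weight",
]

affected = non_attention_block_keys(sample_keys)
for key in affected:
    print(key)
```

For a real file, the key names could be read with safetensors.torch.load_file(path).keys() and passed to the same helper; a non-empty result suggests the LoRA touches a non-attention sublayer.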
Logs
Traceback (most recent call last):
  File ".../diffusers/loaders/lora_pipeline.py", line 1691, in load_lora_weights
    transformer_lora_state_dict = self._maybe_expand_lora_state_dict(
  File ".../diffusers/loaders/lora_pipeline.py", line 2203, in _maybe_expand_lora_state_dict
    base_weight_param = transformer_state_dict[base_param_name]
KeyError: 'transformer_blocks.0.weight'
System Info
- diffusers version: 0.38.0.dev0 (main)
- Python: 3.12.3
- Platform: Linux-6.8.0-36-generic-x86_64-with-glibc2.39
Who can help?
@sayakpaul @yiyixuxu
Drafted by Claude