diffusers generally splits fused QKV projections into separate Q, K, and V projections in its checkpoint format (for example, for diffusion transformer models). I am interested in collecting feedback on this design choice and on any problems that may arise from it. To be upfront, I think it is unlikely that we will change this design in the near future, but we would appreciate your feedback nevertheless.
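To make the discussion concrete, here is a minimal sketch of what such a split amounts to. It is not the actual diffusers conversion code. It assumes the fused weight stacks the Q, K, and V rows in that order along the output dimension, and the function names and checkpoint keys are illustrative only. Checkpoints that interleave Q, K, and V per attention head would need a different split.

```python
def split_fused_qkv(fused_weight):
    """Split a fused QKV weight with 3 * dim rows into separate Q, K, V weights.

    Assumption: rows are ordered [all Q rows, all K rows, all V rows].
    """
    rows = len(fused_weight)
    if rows % 3 != 0:
        raise ValueError(f"expected a row count divisible by 3, got {rows}")
    dim = rows // 3
    q = fused_weight[:dim]
    k = fused_weight[dim:2 * dim]
    v = fused_weight[2 * dim:]
    return q, k, v


def fuse_qkv(q, k, v):
    """Inverse of split_fused_qkv: stack the Q, K, V rows back into one matrix."""
    return q + k + v


# Toy fused weight with 6 rows and 2 columns (dim = 2), stored as nested lists.
# The key names below are hypothetical.
fused_state = {
    "attn.qkv.weight": [[float(r * 2 + c) for c in range(2)] for r in range(6)]
}

q, k, v = split_fused_qkv(fused_state["attn.qkv.weight"])
split_state = {
    "attn.to_q.weight": q,
    "attn.to_k.weight": k,
    "attn.to_v.weight": v,
}
```

Under this layout assumption the split is lossless, so `fuse_qkv(q, k, v)` reproduces the original fused matrix. Feedback on cases where that round trip is not this simple would be especially useful.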