[bitsandbytes] improve dtype mismatch handling for bnb + lora. #11270

Open · sayakpaul wants to merge 8 commits into main

Conversation

@sayakpaul (Member) commented on Apr 10, 2025:

What does this PR do?

If we try to do:

from diffusers import DiffusionPipeline, FluxControlPipeline
from PIL import Image
import torch

pipe = FluxControlPipeline.from_pretrained("eramth/flux-4bit", torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("black-forest-labs/FLUX.1-Canny-dev-lora")

pipe("a dog", control_image=Image.new(mode="RGB", size=(256, 256)))

we will run into the following error:

Traceback (most recent call last):
  File "/fsx/sayak/diffusers/bnb_torch_dtype.py", line 8, in <module>
    pipe("a dog", control_image=Image.new(mode="RGB", size=(256, 256)))
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/fsx/sayak/diffusers/src/diffusers/pipelines/flux/pipeline_flux_control.py", line 835, in __call__
    noise_pred = self.transformer(
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/fsx/sayak/diffusers/src/diffusers/models/transformers/transformer_flux.py", line 445, in forward
    hidden_states = self.x_embedder(hidden_states)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 712, in forward
    result = self.base_layer(x, *args, **kwargs)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 125, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: self and mat2 must have the same dtype, but got BFloat16 and Half

This PR fixes that.
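
Independently of the library fix, loading with a torch_dtype that matches the quantized checkpoint side-steps the mismatch. A sketch of that workaround (the float16 choice is an assumption based on the Half dtype in the trace above):

from diffusers import FluxControlPipeline
from PIL import Image
import torch

# assumption: the 4-bit checkpoint stores a float16 quant_state, so loading the
# pipeline in float16 keeps base weights, LoRA params, and activations consistent
pipe = FluxControlPipeline.from_pretrained("eramth/flux-4bit", torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights("black-forest-labs/FLUX.1-Canny-dev-lora")

pipe("a dog", control_image=Image.new(mode="RGB", size=(256, 256)))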

@sayakpaul requested review from DN6 and SunMarc on April 10, 2025 at 04:19.

@SunMarc (Member) left a comment:

LGTM! Just a small question.

On this snippet from the PR:

raise ValueError(
    f"Model is in {model.dtype} dtype while the current module weight will be dequantized to {module.weight.quant_state.dtype} dtype. "
    f"Please pass {module.weight.quant_state.dtype} as `torch_dtype` in `from_pretrained()`."
)
module_weight = dequantize_bnb_weight(

Since we specified dtype = model.dtype in dequantize_bnb_weight, won't module_weight have the same dtype as the model?

@sayakpaul (Member, Author):

Yes, it will. But the LoRA params would not be in that dtype, as they are derived early on from the module_weight's dtype. This is why, in the error trace, the error surfaces inside peft.

@SunMarc (Member), Apr 10, 2025:

To summarize, we have the following, right?

  • The LoRA params are created using the dtype from module_weight (this is maybe where module.weight.quant_state.dtype was used).
  • The module_weight is dequantized using model.dtype (so we are not actually using module.weight.quant_state.dtype there, no?). The model.dtype value comes from torch_dtype.

-> dtype mismatch, because the LoRA params do not have the same dtype as module_weight.

@sayakpaul (Member, Author):

Yeah. We don't really have any special treatment to handle the LoRA param dtype. Cc'ing @BenjaminBossan here.

> The module_weight is dequantized using model.dtype (so we are not actually using module.weight.quant_state.dtype there, no?). The model.dtype value comes from torch_dtype.

Well, we use the quant_state:

output_tensor = bnb.functional.dequantize_4bit(weight.data, weight.quant_state)

But then we also perform another type-casting:
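
Roughly, the helper looks something like this (a hedged sketch of the idea, not the exact diffusers implementation; the function name here is made up for illustration):

import bitsandbytes as bnb

def dequantize_bnb_weight_sketch(weight, dtype):
    # dequantize using the stored 4-bit quant_state ...
    output_tensor = bnb.functional.dequantize_4bit(weight.data, weight.quant_state)
    # ... and then cast the result to the requested dtype (model.dtype); this is
    # the extra type-casting step mentioned above
    return output_tensor.to(dtype)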

Member:

Just to clarify: this is unrelated to the LoRA parameters. Instead, what happens is that a PEFT LoraLayer wraps the base layer and calls self.base_layer(x), whose output should just be the result of the original layer. Due to the change in dtype, we encounter the dtype mismatch there.
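
A minimal sketch of that mismatch (illustrative only; the float16/bfloat16 combination is an assumption based on the trace in the PR description):

import torch

# the base layer ends up in float16 (following the quantized checkpoint),
# while the pipeline activations are bfloat16 (following torch_dtype)
base_layer = torch.nn.Linear(64, 64, dtype=torch.float16)
x = torch.randn(1, 64, dtype=torch.bfloat16)

base_layer(x)  # raises a RuntimeError about mismatched dtypes (cf. the trace above)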

@sayakpaul (Member, Author):

from diffusers import DiffusionPipeline, FluxControlPipeline
from PIL import Image
import torch

pipe = FluxControlPipeline.from_pretrained("eramth/flux-4bit", torch_dtype=torch.bfloat16).to("cuda")

pipe("a dog", control_image=Image.new(mode="RGB", size=(256, 256)))

This works though.

Member:

Yeah, it happens inside the LoRA layer, but what I mean is that the LoRA weights are not involved; it's the call to the base layer that is causing the issue.

@sayakpaul (Member, Author):

@BenjaminBossan @SunMarc instead of the current error-raising proposal, this solves the issue:

diff --git a/src/diffusers/loaders/lora_pipeline.py b/src/diffusers/loaders/lora_pipeline.py
index 2e241bc9f..080559357 100644
--- a/src/diffusers/loaders/lora_pipeline.py
+++ b/src/diffusers/loaders/lora_pipeline.py
@@ -2367,7 +2367,7 @@ class FluxLoraLoaderMixin(LoraBaseMixin):
                     # TODO: consider if this layer needs to be a quantized layer as well if `is_quantized` is True.
                     with torch.device("meta"):
                         expanded_module = torch.nn.Linear(
-                            in_features, out_features, bias=bias, dtype=module_weight.dtype
+                            in_features, out_features, bias=bias, dtype=transformer.dtype
                         )
                     # Only weights are expanded and biases are not. This is because only the input dimensions
                     # are changed while the output dimensions remain the same. The shape of the weight tensor

Does this work for you?

Currently, we keep the expanded module as nn.Linear even when the underlying model is quantized (as is the case here). But we eventually plan to move to using the respective quantized linear layer (determined by the quantization backend being used).
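
For illustration, that future direction could look roughly like the sketch below, under the assumption that bitsandbytes 4-bit is the active backend (bnb.nn.Linear4bit stands in for whichever quantized layer the backend provides):

import torch
import bitsandbytes as bnb

in_features, out_features, bias = 128, 3072, True  # example shapes
is_quantized = True  # cf. the TODO in the diff above
model_dtype = torch.bfloat16

if is_quantized:
    # use the backend's quantized linear layer for the expanded module
    expanded_module = bnb.nn.Linear4bit(in_features, out_features, bias=bias, compute_dtype=model_dtype)
else:
    with torch.device("meta"):
        expanded_module = torch.nn.Linear(in_features, out_features, bias=bias, dtype=model_dtype)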


@BenjaminBossan (Member):

> Does this work for you?

If the current situation is just temporary and the proposed change solves the initial issue, then that's fine with me. I do wonder whether we can always rely on the dtype attribute of the model; there can be more than one dtype, right?

@sayakpaul (Member, Author):

> I do wonder whether we can always rely on the dtype attribute of the model; there can be more than one dtype, right?

Eventually, we will resort to the module_weight.dtype solution, as that is more precise. But this is a special case for now.

@BenjaminBossan (Member):

Wait, I'm confused now. I had understood your comment to suggest that you would rather do that instead of raising an error, but the PR currently still raises the error. 😖

@sayakpaul (Member, Author):

My comment suggests using model.dtype, not module_weight.dtype. For now, that should be okay for the case we're covering. I definitely want to follow what I suggested in #11270 (comment) and eventually move (through a future PR) to:

> But we eventually plan to move to using the respective quantized linear layer (determined by the quantization backend being used).

This will include:

> we will resort to the module_weight.dtype solution, as that is more precise

Is this clearer now?

@BenjaminBossan (Member) left a comment:

Got it now. PR LGTM.
