[Models] handle initialization of new layers in a partially pre-trained model better #11279

Open
sayakpaul opened this issue Apr 10, 2025 · 2 comments
sayakpaul commented Apr 10, 2025

If we do

from diffusers import AutoModel 
import torch 

model = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", num_single_layers=40, torch_dtype=torch.bfloat16
).to("cuda")

it results in the following error:

Traceback (most recent call last):
  File "/fsx/sayak/diffusers/check_sharded_model.py", line 6, in <module>
    ).to("cuda")
  File "/fsx/sayak/diffusers/src/diffusers/models/modeling_utils.py", line 1353, in to
    return super().to(*args, **kwargs)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1343, in to
    return self._apply(convert)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 930, in _apply
    param_applied = fn(param)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1336, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
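For context, here is a minimal plain-Python sketch of the failure mode (no torch; the helper `load_partial_state_dict` and the key names are hypothetical). When the model is configured with more layers than the checkpoint provides, the extra keys are "missing" at load time; unless they are explicitly initialized, they remain empty placeholders (the analogue of meta tensors), and the first operation that needs their data fails.

```python
def load_partial_state_dict(model_keys, checkpoint):
    """Copy checkpoint values for keys the checkpoint covers, and
    explicitly initialize the rest instead of leaving them as
    uninitialized placeholders. Returns (state, missing_keys)."""
    state = {}
    missing = []
    for key in model_keys:
        if key in checkpoint:
            state[key] = checkpoint[key]
        else:
            missing.append(key)
            state[key] = 0.0  # stand-in for a proper random init
    return state, missing

# The model is configured with 40 single blocks, but the checkpoint
# (like FLUX.1-dev's default of 38) only covers 38 of them.
model_keys = [f"single_blocks.{i}.weight" for i in range(40)]
checkpoint = {f"single_blocks.{i}.weight": 1.0 for i in range(38)}

state, missing = load_partial_state_dict(model_keys, checkpoint)
# missing == ["single_blocks.38.weight", "single_blocks.39.weight"]
```

The fix the traceback hints at is the torch-level analogue of the `else` branch above: materialize the missing parameters (e.g. via `torch.nn.Module.to_empty()`) and then run the model's weight initialization on them, rather than leaving them on the meta device.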

@SunMarc since we discussed this in person.


SunMarc commented Apr 10, 2025

Is this the issue where missing keys are not initialized at all, resulting in an error when moving the model?

sayakpaul (Author) commented

Yes!
