[Models] handle initialization of new layers in a partially pre-trained model better #11279

Open
sayakpaul opened this issue Apr 10, 2025 · 2 comments
sayakpaul commented Apr 10, 2025

If we do

from diffusers import AutoModel 
import torch 

model = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", num_single_layers=40, torch_dtype=torch.bfloat16
).to("cuda")

it results in the following error:

Traceback (most recent call last):
  File "/fsx/sayak/diffusers/check_sharded_model.py", line 6, in <module>
    ).to("cuda")
  File "/fsx/sayak/diffusers/src/diffusers/models/modeling_utils.py", line 1353, in to
    return super().to(*args, **kwargs)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1343, in to
    return self._apply(convert)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 930, in _apply
    param_applied = fn(param)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1336, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
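For context, here is a minimal plain-Python sketch of the failure mode (no torch; the helper `load_partial_state_dict` and the key names are hypothetical). When the model is configured with more layers than the checkpoint provides, the extra keys are "missing" at load time; unless they are explicitly initialized, they remain empty placeholders (the analogue of meta tensors), and the first operation that needs their data fails.

```python
def load_partial_state_dict(model_keys, checkpoint):
    """Copy checkpoint values for keys the checkpoint covers, and
    explicitly initialize the rest instead of leaving them as
    uninitialized placeholders. Returns (state, missing_keys)."""
    state = {}
    missing = []
    for key in model_keys:
        if key in checkpoint:
            state[key] = checkpoint[key]
        else:
            missing.append(key)
            state[key] = 0.0  # stand-in for a proper random init
    return state, missing

# The model is configured with 40 single blocks, but the checkpoint
# (like FLUX.1-dev's default of 38) only covers 38 of them.
model_keys = [f"single_blocks.{i}.weight" for i in range(40)]
checkpoint = {f"single_blocks.{i}.weight": 1.0 for i in range(38)}

state, missing = load_partial_state_dict(model_keys, checkpoint)
# missing == ["single_blocks.38.weight", "single_blocks.39.weight"]
```

The fix the traceback hints at is the torch-level analogue of the `else` branch above: materialize the missing parameters (e.g. via `torch.nn.Module.to_empty()`) and then run the model's weight initialization on them, rather than leaving them on the meta device.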

@SunMarc since we discussed this in person.


SunMarc commented Apr 10, 2025

Is this the issue where missing keys are not initialized at all, resulting in an error when moving the model?

sayakpaul (Author) commented

Yes!
