venv/lib/python3.12/site-packages/diffusers/models/model_loading_utils.py", line 230, in load_model_dict_into_meta
    raise ValueError(
ValueError: Cannot load because text_model.embeddings.position_embedding.weight expected shape torch.Size([77, 768]), but got torch.Size([248, 768]). If you want to instead overwrite randomly initialized weights, please make sure to pass both `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.
Describe the bug
A text-encoder length of 77 tokens appears to be hardcoded somewhere in the single-file loading path, when the loader should instead use the dimensions of the weights actually stored in the model.
I have a diffusers-layout SD1.5 model, with LongCLIP.
https://huggingface.co/opendiffusionai/xllsd-alpha0
I can pull it locally, then convert to single file format, with
python convert_diffusers_to_original_stable_diffusion.py \
    --use_safetensors \
    --model_path $SRCM \
    --checkpoint_path $DESTM
But if I then try to convert it back, I get a shape error because the text encoder's position embedding has 248 positions instead of the expected 77.
I should point out that the model WORKS PROPERLY for diffusion when loaded in diffusers format, so I don't have some funky broken model here.
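To confirm that the single-file checkpoint really carries the longer position embedding, and that the 77 comes from the loader rather than from the file, the tensor shapes can be inspected directly. The snippet below is only a rough diagnostic sketch: the helper names and the `EXPECTED_CLIP_POSITIONS` constant are made up for illustration and are not part of diffusers. It assumes a mapping from tensor names to objects that expose a `.shape` attribute.

```python
from typing import Any, Dict, Mapping, Tuple

# Default CLIP context length, taken from the "expected shape" in the error.
# The constant name is hypothetical, not a diffusers identifier.
EXPECTED_CLIP_POSITIONS = 77


def position_embedding_shapes(state_dict: Mapping[str, Any]) -> Dict[str, Tuple[int, ...]]:
    """Return {key: shape} for every position-embedding weight in the mapping."""
    suffix = "position_embedding.weight"
    return {
        key: tuple(tensor.shape)
        for key, tensor in state_dict.items()
        if key.endswith(suffix)
    }


def mismatched_position_embeddings(
    state_dict: Mapping[str, Any],
    expected: int = EXPECTED_CLIP_POSITIONS,
) -> Dict[str, Tuple[int, ...]]:
    """Keep only entries whose first dimension (number of positions) differs from `expected`."""
    return {
        key: shape
        for key, shape in position_embedding_shapes(state_dict).items()
        if shape[0] != expected
    }
```

With the checkpoint from this report, the mapping could come from `safetensors.torch.load_file("XLLsd-phase0.safetensors")`. If `mismatched_position_embeddings(...)` then reports a first dimension of 248, the file itself is consistent with LongCLIP and the mismatch is introduced on the loading side.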
Reproduction
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import StableDiffusionPipeline, AutoencoderKL
import torch
pipe = StableDiffusionPipeline.from_single_file(
    "XLLsd-phase0.safetensors",
    torch_dtype=torch.float32,
    use_safetensors=True,
)
outname = "XLLsd_recreate"
pipe.save_pretrained(outname, safe_serialization=False)
Logs
System Info
Who can help?
No response