value errors in convert to/from diffusers from original stable diffusion #11285

ppbrown · 2025-04-10T17:16:42Z

Describe the bug

There is a hardcoded value of 77 tokens somewhere, when the conversion should use the text-encoder dimensions actually present in the model.
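A minimal sketch of the behaviour being asked for. It assumes the loader has the checkpoint's tensor shapes available as a plain dict, and it takes the number of text positions from the position-embedding tensor instead of hardcoding 77. `infer_max_position_embeddings` is a hypothetical helper, not the diffusers API. Matching on the key suffix is an assumption about the checkpoint layout.

```python
from typing import Dict, Tuple

# Context length of the standard CLIP ViT-L/14 text encoder used by SD1.5.
DEFAULT_MAX_POSITIONS = 77

# Assumed key suffix for the text-encoder position embedding. Matching on the
# suffix lets the same check work with or without a single-file prefix such as
# "cond_stage_model.transformer.".
POSITION_KEY_SUFFIX = "text_model.embeddings.position_embedding.weight"


def infer_max_position_embeddings(shapes: Dict[str, Tuple[int, ...]]) -> int:
    """Hypothetical helper: return the number of text positions stored in a checkpoint.

    Falls back to 77 only when no position-embedding tensor is present.
    """
    for key, shape in shapes.items():
        if key.endswith(POSITION_KEY_SUFFIX):
            # Shape is (num_positions, hidden_size); the first dimension is the token limit.
            return shape[0]
    return DEFAULT_MAX_POSITIONS


# Example: a LongCLIP text encoder stores 248 positions instead of 77.
shapes = {
    "cond_stage_model.transformer.text_model.embeddings.position_embedding.weight": (248, 768),
}
print(infer_max_position_embeddings(shapes))  # 248
```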

I have an SD1.5 model in diffusers layout that uses LongCLIP.

https://huggingface.co/opendiffusionai/xllsd-alpha0

I can pull it locally and convert it to single-file format with:

    python convert_diffusers_to_original_stable_diffusion.py \
        --use_safetensors \
        --model_path $SRCM \
        --checkpoint_path $DESTM

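However, when I try to convert the single-file checkpoint back to diffusers format, loading fails with a size-mismatch error: the text encoder's position embedding has 248 positions, not the expected 77.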

I should point out that the model WORKS PROPERLY for diffusion when loaded in diffusers format, so this is not a broken model.
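One way to confirm which side is wrong is to inspect the shapes actually stored in the converted file. The sketch below uses only the standard library and assumes the standard safetensors layout: an 8-byte little-endian header length followed by a JSON header of dtypes, shapes, and offsets. `read_safetensors_shapes` and the script name in the comment are hypothetical. Run against `$DESTM`, a LongCLIP conversion would be expected to list a 248-row position embedding rather than a 77-row one.

```python
import json
import struct
import sys
from typing import BinaryIO, Dict, Tuple


def read_safetensors_shapes(stream: BinaryIO) -> Dict[str, Tuple[int, ...]]:
    """Return {tensor_name: shape} from a safetensors stream without reading tensor data."""
    # The file starts with a little-endian unsigned 64-bit integer: the JSON header length.
    (header_len,) = struct.unpack("<Q", stream.read(8))
    header = json.loads(stream.read(header_len))
    # "__metadata__" is an optional string-to-string map, not a tensor entry.
    return {
        name: tuple(info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python inspect_positions.py $DESTM
    with open(sys.argv[1], "rb") as f:
        for name, shape in read_safetensors_shapes(f).items():
            if name.endswith("position_embedding.weight"):
                print(name, shape)
```

Only the JSON header is read, so this stays fast even for multi-gigabyte checkpoints.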

Reproduction

    import torch
    from diffusers import StableDiffusionPipeline

    # Load the single-file checkpoint produced by convert_diffusers_to_original_stable_diffusion.py
    pipe = StableDiffusionPipeline.from_single_file(
        "XLLsd-phase0.safetensors",
        torch_dtype=torch.float32,
        use_safetensors=True,
    )

    # Save it back out in diffusers layout
    outname = "XLLsd_recreate"
    pipe.save_pretrained(outname, safe_serialization=False)

Logs

    venv/lib/python3.12/site-packages/diffusers/models/model_loading_utils.py", line 230, in load_model_dict_into_meta
        raise ValueError(
    ValueError: Cannot load  because text_model.embeddings.position_embedding.weight expected shape torch.Size([77, 768]), but got torch.Size([248, 768]). If you want to instead overwrite randomly initialized weights, please make sure to pass both `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.
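To make the error concrete, here is a minimal sketch of the kind of shape check that fails here. It assumes the config dict mirrors the default SD1.5 text-encoder config (hidden size 768, 77 positions, vocabulary of 49408). The helper names are hypothetical and cover only the two embedding tensors; this is not diffusers' `load_model_dict_into_meta`.

```python
from typing import Dict, List, Tuple


def expected_clip_text_shapes(config: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Hypothetical helper: embedding shapes implied by a CLIP text-encoder config."""
    hidden = config["hidden_size"]
    return {
        "text_model.embeddings.position_embedding.weight": (config["max_position_embeddings"], hidden),
        "text_model.embeddings.token_embedding.weight": (config["vocab_size"], hidden),
    }


def find_shape_mismatches(config: Dict[str, int], checkpoint_shapes: Dict[str, Tuple[int, ...]]) -> List[str]:
    """List every tensor whose checkpoint shape disagrees with the config."""
    problems = []
    for key, expected in expected_clip_text_shapes(config).items():
        got = checkpoint_shapes.get(key)
        if got is not None and tuple(got) != expected:
            problems.append(f"{key}: expected {expected}, got {tuple(got)}")
    return problems


sd15_config = {"hidden_size": 768, "max_position_embeddings": 77, "vocab_size": 49408}
longclip_shapes = {
    "text_model.embeddings.position_embedding.weight": (248, 768),
    "text_model.embeddings.token_embedding.weight": (49408, 768),
}

# A config that assumes 77 positions reports the same mismatch as the log above.
print(find_shape_mismatches(sd15_config, longclip_shapes))

# Taking the position count from the checkpoint instead removes the mismatch.
fixed_config = dict(
    sd15_config,
    max_position_embeddings=longclip_shapes["text_model.embeddings.position_embedding.weight"][0],
)
print(find_shape_mismatches(fixed_config, longclip_shapes))  # []
```

The `ignore_mismatched_sizes=True` route suggested in the error message would leave the mismatched embedding randomly initialized. Sizing the config from the checkpoint keeps the trained LongCLIP weights.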

System Info

🤗 Diffusers version: 0.32.2
Platform: Linux-6.8.0-55-generic-x86_64-with-glibc2.39
Running on Google Colab?: No
Python version: 3.12.3
PyTorch version (GPU?): 2.6.0+cu124 (True)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Huggingface_hub version: 0.29.3
Transformers version: 4.50.0
Accelerate version: 1.5.2
PEFT version: not installed
Bitsandbytes version: 0.45.2
Safetensors version: 0.5.3
xFormers version: not installed
Accelerator: NVIDIA GeForce RTX 4090, 24564 MiB

Who can help?

No response

ppbrown added the "bug" (Something isn't working) label on Apr 10, 2025.

hlky mentioned this issue on Apr 13, 2025 in [single file] Detect CLIP max_position_embeddings #11306 (Closed).
