
MultiControlNetModel is not supported for SD3ControlNetInpaintingPipeline #11208


Open
DanilaAniva opened this issue Apr 4, 2025 · 3 comments · May be fixed by #11251
Labels
bug · contributions-welcome · help wanted

Comments

@DanilaAniva

Describe the bug

When using StableDiffusion3ControlNetInpaintingPipeline with SD3MultiControlNetModel, I receive an error:

NotImplementedError: MultiControlNetModel is not supported for SD3ControlNetInpaintingPipeline.

Reproduction

Example reproduction code:

import os
import torch
from diffusers.utils import load_image
from diffusers.pipelines import StableDiffusion3ControlNetInpaintingPipeline
from diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
from transformers import T5EncoderModel

# Load images
image = load_image(
    "https://huggingface.co/alimama-creative/SD3-Controlnet-Inpainting/resolve/main/images/dog.png"
)
mask = load_image(
    "https://huggingface.co/alimama-creative/SD3-Controlnet-Inpainting/resolve/main/images/dog_mask.png"
)

# Initialize ControlNet models (loaded in bfloat16 to match the pipeline dtype)
controlnetA = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Pose", torch_dtype=torch.bfloat16)
controlnetB = SD3ControlNetModel.from_pretrained(
    "alimama-creative/SD3-Controlnet-Inpainting",
    use_safetensors=True,
    extra_conditioning_channels=1,
    torch_dtype=torch.bfloat16,
)
controlnet = SD3MultiControlNetModel([controlnetA, controlnetB])

# Load transformer and text encoder
nf4_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16)
model_id = "stabilityai/stable-diffusion-3.5-large-turbo"
model_nf4 = SD3Transformer2DModel.from_pretrained(model_id, subfolder="transformer", quantization_config=nf4_config, torch_dtype=torch.bfloat16)
t5_nf4 = T5EncoderModel.from_pretrained("diffusers/t5-nf4", torch_dtype=torch.bfloat16)

# Initialize pipeline
pipe = StableDiffusion3ControlNetInpaintingPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large-turbo",
    token=os.getenv("HF_TOKEN"),
    controlnet=controlnet,
    transformer=model_nf4,
    text_encoder_3=t5_nf4,
    torch_dtype=torch.bfloat16
)

pipe.enable_model_cpu_offload()

# This fails with NotImplementedError
result_image = pipe(
    prompt="a cute dog with a hat",
    negative_prompt="low quality, bad anatomy",
    control_image=[image, image],
    control_mask=mask,
    num_inference_steps=30,
    guidance_scale=7.5,
    controlnet_conditioning_scale=[1.0, 1.0],
    output_type="pil",
).images[0]

Logs

Error


NotImplementedError: MultiControlNetModel is not supported for SD3ControlNetInpaintingPipeline.


The error is raised in `diffusers/pipelines/controlnet_sd3/pipeline_stable_diffusion_3_controlnet_inpainting.py` at line 1026. Full traceback:


---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[1], line 41
     38 pipe.enable_model_cpu_offload()
     40 # This fails with NotImplementedError
---> 41 result_image = pipe(
     42     prompt="a cute dog with a hat",
     43     negative_prompt="low quality, bad anatomy",
     44     control_image=[image, image],
     45     num_inference_steps=30,
     46     guidance_scale=7.5,
     47     controlnet_conditioning_scale=[1.0, 1.0],
     48     output_type="pil",
     49 ).images[0]

File ~/miniconda3/envs/bnb310/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/miniconda3/envs/bnb310/lib/python3.10/site-packages/diffusers/pipelines/controlnet_sd3/pipeline_stable_diffusion_3_controlnet_inpainting.py:1026, in StableDiffusion3ControlNetInpaintingPipeline.__call__(self, prompt, prompt_2, prompt_3, height, width, num_inference_steps, sigmas, guidance_scale, control_guidance_start, control_guidance_end, control_image, control_mask, controlnet_conditioning_scale, controlnet_pooled_projections, negative_prompt, negative_prompt_2, negative_prompt_3, num_images_per_prompt, generator, latents, prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds, output_type, return_dict, joint_attention_kwargs, clip_skip, callback_on_step_end, callback_on_step_end_tensor_inputs, max_sequence_length)
   1023     width = latent_width * self.vae_scale_factor
   1025 elif isinstance(self.controlnet, SD3MultiControlNetModel):
-> 1026     raise NotImplementedError("MultiControlNetModel is not supported for SD3ControlNetInpaintingPipeline.")
   1027 else:
   1028     assert False

NotImplementedError: MultiControlNetModel is not supported for SD3ControlNetInpaintingPipeline.


Expected Behavior
I expect `StableDiffusion3ControlNetInpaintingPipeline` to support `SD3MultiControlNetModel`, just as the non-inpainting `StableDiffusion3ControlNetPipeline` already does.
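
For reference, the non-inpainting `StableDiffusion3ControlNetPipeline` already handles `SD3MultiControlNetModel` by preparing each control image in a loop. Below is a minimal sketch of what the corresponding branch in the inpainting pipeline could look like, assuming it reuses the existing `prepare_image_with_mask` helper; this is an illustration, not the actual patch from #11251:

# Hypothetical replacement for the NotImplementedError branch in
# StableDiffusion3ControlNetInpaintingPipeline.__call__, mirroring the
# multi-controlnet loop in StableDiffusion3ControlNetPipeline.
elif isinstance(self.controlnet, SD3MultiControlNetModel):
    control_images = []
    for control_image_ in control_image:
        control_image_ = self.prepare_image_with_mask(
            image=control_image_,
            mask=control_mask,
            width=width,
            height=height,
            batch_size=batch_size * num_images_per_prompt,
            num_images_per_prompt=num_images_per_prompt,
            device=device,
            dtype=dtype,
            do_classifier_free_guidance=self.do_classifier_free_guidance,
            guess_mode=False,
        )
        control_images.append(control_image_)
    control_image = control_images

Note that a real fix also has to decide how to handle the mask channel: only the inpainting ControlNet is trained with an extra conditioning channel, so concatenating the mask to every control image (as `prepare_image_with_mask` does) would not match a ControlNet such as the pose one.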

System Info

Versions

Python version: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0]
PyTorch version: 2.2.0+cu118
CUDA version: 11.8
Diffusers version: 0.32.2
Transformers version: 4.50.3
Accelerate version: 1.7.0.dev0

Who can help?

@yiyixuxu @sayakpaul

@DanilaAniva added the bug label on Apr 4, 2025
@yiyixuxu (Collaborator) commented Apr 8, 2025

Do you have a use case where you would use multiple ControlNets with inpainting for SD3? cc @asomoza here too. Functionally, we should be able to support multi-controlnet.

@DanilaAniva (Author) commented

> Do you have a use case where you would use multiple ControlNets with inpainting for SD3? cc @asomoza here too. Functionally, we should be able to support multi-controlnet.

Yes, I have a specific use case requiring multiple ControlNets with SD3's inpainting capabilities. I would like to use depth control alongside inpainting to better preserve the anatomical features and structure of the original image.

Combining inpainting with depth control sometimes produces better results than using inpainting alone. This approach helps maintain the original image's spatial relationships while targeting specific areas for regeneration.

Here are examples of what I'm trying to do with SD3:

  1. I can use this approach with Flux models:
import torch
import numpy as np
from PIL import Image
from diffusers import FluxControlInpaintPipeline
from diffusers.utils import load_image
from image_gen_aux import DepthPreprocessor

pipe = FluxControlInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev",
    torch_dtype=torch.bfloat16,
)
# GPU optimization code...
pipe.to("cuda")

prompt = "a blue robot singing opera with human-like expressions"
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

# Create mask for the robot's head
head_mask = np.zeros_like(image)
head_mask[65:580,300:642] = 255
mask_image = Image.fromarray(head_mask)

# Process depth map
processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(image)[0].convert("RGB")

output = pipe(
    prompt=prompt,
    image=image,
    control_image=control_image,
    mask_image=mask_image,
    num_inference_steps=30,
    strength=0.9,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]

Example result: (image attached in the original issue)

  2. I can also use similar functionality with SD 1.5 models:
import torch
import numpy as np
from PIL import Image
import cv2
import controlnet_hinter
from diffusers import StableDiffusionControlNetInpaintPipeline, ControlNetModel

# Setup model
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "...inpaint_model...",
    controlnet=controlnet,
    torch_dtype=torch.float16
)
pipe.to("cuda")

# Load images
init_image = Image.open('image.jpg')
mask_img = Image.open('mask.png')

# Prepare mask
mask_array = np.array(mask_img) > 0
mask_array = cv2.resize(mask_array.astype(np.uint8), 
                     (init_image.size[0], init_image.size[1]), 
                     interpolation=cv2.INTER_NEAREST).astype(bool)
mask = Image.fromarray(mask_array.astype(np.uint8) * 255)

# Generate depth map
control_image = controlnet_hinter.hint_depth(init_image)
control_image = control_image.resize(init_image.size)

# Generation configuration
generator = torch.Generator("cuda").manual_seed(42)
config = {
    "negative_prompt": "bad quality, worst quality",
    "num_inference_steps": 30,
    "guidance_scale": 7.5,
    "strength": 0.7,
    "controlnet_conditioning_scale": 0.6,
    "control_guidance_start": 0.6,
    "control_guidance_end": 0.8
}

# Generate image
output = pipe(
    prompt="Elegant blonde woman displaying refined style...",
    image=init_image,
    mask_image=mask,
    control_image=control_image,
    # other parameters...
).images[0]

Example result: (image attached in the original issue)

Additionally, my primary concern with SD3 inpainting is its significant issues with anatomical consistency. When using SD3 inpainting without depth control, I frequently encounter severe distortions in human faces and body proportions that reduce output quality. Adding depth control would help maintain proper structural integrity while inpainting, addressing anatomical problems that are more pronounced in SD3 than in previous SD generations.
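
For concreteness, here is a rough sketch of what the depth-plus-inpainting combination could look like once `SD3MultiControlNetModel` support lands. The depth ControlNet checkpoint ID is a placeholder, and the calling convention (one control image per ControlNet, a shared mask) is my assumption about the fixed pipeline, not a confirmed API:

import torch
from diffusers.pipelines import StableDiffusion3ControlNetInpaintingPipeline
from diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel
from diffusers.utils import load_image
from image_gen_aux import DepthPreprocessor

# Placeholder checkpoint ID -- substitute a real SD3 depth ControlNet.
depth_controlnet = SD3ControlNetModel.from_pretrained("<sd3-depth-controlnet>", torch_dtype=torch.bfloat16)
inpaint_controlnet = SD3ControlNetModel.from_pretrained(
    "alimama-creative/SD3-Controlnet-Inpainting",
    use_safetensors=True,
    extra_conditioning_channels=1,
    torch_dtype=torch.bfloat16,
)
controlnet = SD3MultiControlNetModel([depth_controlnet, inpaint_controlnet])

pipe = StableDiffusion3ControlNetInpaintingPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large-turbo",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

init_image = load_image("image.jpg")
mask = load_image("mask.png")

# Depth map from the same preprocessor used in the Flux example above.
processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
depth_map = processor(init_image)[0].convert("RGB")

# Assumed calling convention: the depth map conditions the depth ControlNet,
# the source image conditions the inpainting ControlNet, one shared mask.
result = pipe(
    prompt="...",
    control_image=[depth_map, init_image],
    control_mask=mask,
    controlnet_conditioning_scale=[0.6, 1.0],
    num_inference_steps=30,
).images[0]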

@yiyixuxu (Collaborator) commented Apr 8, 2025

Thanks @DanilaAniva. I've opened this up to the community; we will add it ourselves if no one picks it up.
