
Commit cefa28f

sayakpaul and stevhliu authored
[docs] Promote AutoModel usage (#11300)
* docs: promote the usage of automodel.
* bitsandbytes
* Apply suggestions from code review

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
1 parent 8819cda commit cefa28f

File tree

8 files changed: +44 -38 lines
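Every file in this commit makes the same substitution: a concrete model class such as `FluxTransformer2DModel` or `UNet2DConditionModel` is replaced with the [`AutoModel`] API, which reads the checkpoint's config and instantiates the matching class itself. A minimal sketch of the promoted pattern, using the Flux transformer that appears throughout these docs (the `torch_dtype` argument is only illustrative):

```python
import torch
from diffusers import AutoModel  # instead of, e.g., FluxTransformer2DModel

# AutoModel inspects the config in the given subfolder and returns the
# concrete model class for that checkpoint.
transformer = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
```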

Diff for: docs/source/en/quantization/bitsandbytes.md

+16-16
@@ -49,7 +49,7 @@ For Ada and higher-series GPUs. we recommend changing `torch_dtype` to `torch.bf
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
 
-from diffusers import FluxTransformer2DModel
+from diffusers import AutoModel
 from transformers import T5EncoderModel
 
 quant_config = TransformersBitsAndBytesConfig(load_in_8bit=True,)
@@ -63,7 +63,7 @@ text_encoder_2_8bit = T5EncoderModel.from_pretrained(
 
 quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True,)
 
-transformer_8bit = FluxTransformer2DModel.from_pretrained(
+transformer_8bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -74,7 +74,7 @@ transformer_8bit = FluxTransformer2DModel.from_pretrained(
 By default, all the other modules such as `torch.nn.LayerNorm` are converted to `torch.float16`. You can change the data type of these modules with the `torch_dtype` parameter.
 
 ```diff
-transformer_8bit = FluxTransformer2DModel.from_pretrained(
+transformer_8bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -133,7 +133,7 @@ For Ada and higher-series GPUs. we recommend changing `torch_dtype` to `torch.bf
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
 
-from diffusers import FluxTransformer2DModel
+from diffusers import AutoModel
 from transformers import T5EncoderModel
 
 quant_config = TransformersBitsAndBytesConfig(load_in_4bit=True,)
@@ -147,7 +147,7 @@ text_encoder_2_4bit = T5EncoderModel.from_pretrained(
 
 quant_config = DiffusersBitsAndBytesConfig(load_in_4bit=True,)
 
-transformer_4bit = FluxTransformer2DModel.from_pretrained(
+transformer_4bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -158,7 +158,7 @@ transformer_4bit = FluxTransformer2DModel.from_pretrained(
 By default, all the other modules such as `torch.nn.LayerNorm` are converted to `torch.float16`. You can change the data type of these modules with the `torch_dtype` parameter.
 
 ```diff
-transformer_4bit = FluxTransformer2DModel.from_pretrained(
+transformer_4bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -217,11 +217,11 @@ print(model.get_memory_footprint())
 Quantized models can be loaded from the [`~ModelMixin.from_pretrained`] method without needing to specify the `quantization_config` parameters:
 
 ```py
-from diffusers import FluxTransformer2DModel, BitsAndBytesConfig
+from diffusers import AutoModel, BitsAndBytesConfig
 
 quantization_config = BitsAndBytesConfig(load_in_4bit=True)
 
-model_4bit = FluxTransformer2DModel.from_pretrained(
+model_4bit = AutoModel.from_pretrained(
     "hf-internal-testing/flux.1-dev-nf4-pkg", subfolder="transformer"
 )
 ```
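The hunk above loads an already-quantized checkpoint, where the quantization config travels with the model and nothing has to be re-specified. A self-contained sketch of the updated snippet; the memory-footprint check mirrors the `print(model.get_memory_footprint())` call visible in the hunk header:

```python
from diffusers import AutoModel

# The bitsandbytes config is stored in the checkpoint, so no
# quantization_config needs to be passed at load time.
model_4bit = AutoModel.from_pretrained(
    "hf-internal-testing/flux.1-dev-nf4-pkg", subfolder="transformer"
)
print(model_4bit.get_memory_footprint())
```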
@@ -243,13 +243,13 @@ An "outlier" is a hidden state value greater than a certain threshold, and these
 To find the best threshold for your model, we recommend experimenting with the `llm_int8_threshold` parameter in [`BitsAndBytesConfig`]:
 
 ```py
-from diffusers import FluxTransformer2DModel, BitsAndBytesConfig
+from diffusers import AutoModel, BitsAndBytesConfig
 
 quantization_config = BitsAndBytesConfig(
     load_in_8bit=True, llm_int8_threshold=10,
 )
 
-model_8bit = FluxTransformer2DModel.from_pretrained(
+model_8bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quantization_config,
@@ -305,7 +305,7 @@ NF4 is a 4-bit data type from the [QLoRA](https://hf.co/papers/2305.14314) paper
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
 
-from diffusers import FluxTransformer2DModel
+from diffusers import AutoModel
 from transformers import T5EncoderModel
 
 quant_config = TransformersBitsAndBytesConfig(
@@ -325,7 +325,7 @@ quant_config = DiffusersBitsAndBytesConfig(
     bnb_4bit_quant_type="nf4",
 )
 
-transformer_4bit = FluxTransformer2DModel.from_pretrained(
+transformer_4bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -343,7 +343,7 @@ Nested quantization is a technique that can save additional memory at no additio
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
 
-from diffusers import FluxTransformer2DModel
+from diffusers import AutoModel
 from transformers import T5EncoderModel
 
 quant_config = TransformersBitsAndBytesConfig(
@@ -363,7 +363,7 @@ quant_config = DiffusersBitsAndBytesConfig(
     bnb_4bit_use_double_quant=True,
 )
 
-transformer_4bit = FluxTransformer2DModel.from_pretrained(
+transformer_4bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
@@ -379,7 +379,7 @@ Once quantized, you can dequantize a model to its original precision, but this m
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
 
-from diffusers import FluxTransformer2DModel
+from diffusers import AutoModel
 from transformers import T5EncoderModel
 
 quant_config = TransformersBitsAndBytesConfig(
@@ -399,7 +399,7 @@ quant_config = DiffusersBitsAndBytesConfig(
     bnb_4bit_use_double_quant=True,
 )
 
-transformer_4bit = FluxTransformer2DModel.from_pretrained(
+transformer_4bit = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
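None of the hunks above show the quantized components being plugged into a pipeline; that part of the page is unchanged. As a rough sketch of how the updated 4-bit example fits together, where the `text_encoder_2` subfolder and the pipeline assembly are assumptions based on the hunk headers rather than part of this diff:

```python
import torch
from diffusers import AutoModel, FluxPipeline
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from transformers import T5EncoderModel

# 4-bit quantize the T5 text encoder (a transformers model).
text_encoder_2_4bit = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    quantization_config=TransformersBitsAndBytesConfig(load_in_4bit=True),
)

# 4-bit quantize the Flux transformer through AutoModel.
transformer_4bit = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=DiffusersBitsAndBytesConfig(load_in_4bit=True),
)

# Assemble the pipeline around the quantized components.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer_4bit,
    text_encoder_2=text_encoder_2_4bit,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()
```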

Diff for: docs/source/en/quantization/torchao.md

+12-9
@@ -26,13 +26,13 @@ The example below only quantizes the weights to int8.
 
 ```python
 import torch
-from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig
+from diffusers import FluxPipeline, AutoModel, TorchAoConfig
 
 model_id = "black-forest-labs/FLUX.1-dev"
 dtype = torch.bfloat16
 
 quantization_config = TorchAoConfig("int8wo")
-transformer = FluxTransformer2DModel.from_pretrained(
+transformer = AutoModel.from_pretrained(
     model_id,
     subfolder="transformer",
     quantization_config=quantization_config,
@@ -99,10 +99,10 @@ To serialize a quantized model in a given dtype, first load the model with the d
 
 ```python
 import torch
-from diffusers import FluxTransformer2DModel, TorchAoConfig
+from diffusers import AutoModel, TorchAoConfig
 
 quantization_config = TorchAoConfig("int8wo")
-transformer = FluxTransformer2DModel.from_pretrained(
+transformer = AutoModel.from_pretrained(
     "black-forest-labs/Flux.1-Dev",
     subfolder="transformer",
     quantization_config=quantization_config,
@@ -115,9 +115,9 @@ To load a serialized quantized model, use the [`~ModelMixin.from_pretrained`] me
 
 ```python
 import torch
-from diffusers import FluxPipeline, FluxTransformer2DModel
+from diffusers import FluxPipeline, AutoModel
 
-transformer = FluxTransformer2DModel.from_pretrained("/path/to/flux_int8wo", torch_dtype=torch.bfloat16, use_safetensors=False)
+transformer = AutoModel.from_pretrained("/path/to/flux_int8wo", torch_dtype=torch.bfloat16, use_safetensors=False)
 pipe = FluxPipeline.from_pretrained("black-forest-labs/Flux.1-Dev", transformer=transformer, torch_dtype=torch.bfloat16)
 pipe.to("cuda")
 
@@ -131,10 +131,10 @@ If you are using `torch<=2.6.0`, some quantization methods, such as `uint4wo`, c
 ```python
 import torch
 from accelerate import init_empty_weights
-from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig
+from diffusers import FluxPipeline, AutoModel, TorchAoConfig
 
 # Serialize the model
-transformer = FluxTransformer2DModel.from_pretrained(
+transformer = AutoModel.from_pretrained(
     "black-forest-labs/Flux.1-Dev",
     subfolder="transformer",
     quantization_config=TorchAoConfig("uint4wo"),
@@ -146,10 +146,13 @@ transformer.save_pretrained("/path/to/flux_uint4wo", safe_serialization=False, m
 # Load the model
 state_dict = torch.load("/path/to/flux_uint4wo/diffusion_pytorch_model.bin", weights_only=False, map_location="cpu")
 with init_empty_weights():
-    transformer = FluxTransformer2DModel.from_config("/path/to/flux_uint4wo/config.json")
+    transformer = AutoModel.from_config("/path/to/flux_uint4wo/config.json")
 transformer.load_state_dict(state_dict, strict=True, assign=True)
 ```
 
+> [!TIP]
+> The [`AutoModel`] API is supported for PyTorch >= 2.6 as shown in the examples below.
+
 ## Resources
 
 - [TorchAO Quantization API](https://github.com/pytorch/ao/blob/main/torchao/quantization/README.md)
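Taken together, the first torchao hunk amounts to the following runnable sketch; the pipeline assembly at the end is assumed from the surrounding doc rather than shown in this diff:

```python
import torch
from diffusers import AutoModel, FluxPipeline, TorchAoConfig

model_id = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16

# Weight-only int8 quantization of the Flux transformer, loaded via AutoModel.
quantization_config = TorchAoConfig("int8wo")
transformer = AutoModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=dtype,
)

pipe = FluxPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=dtype)
pipe.to("cuda")
```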

Diff for: docs/source/en/quicktour.md

+3
@@ -163,6 +163,9 @@ Models are initiated with the [`~ModelMixin.from_pretrained`] method which also
 >>> model = UNet2DModel.from_pretrained(repo_id, use_safetensors=True)
 ```
 
+> [!TIP]
+> Use the [`AutoModel`] API to automatically select a model class if you're unsure of which one to use.
+
 To access the model parameters, call `model.config`:
 
 ```py
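Applied to the quicktour's own snippet, the new tip works out to something like the following; the DDPM checkpoint is an assumption standing in for the quicktour's `repo_id`:

```python
from diffusers import AutoModel

# Assumed checkpoint; any repo (or subfolder) with a model config works the same way.
repo_id = "google/ddpm-cat-256"

# AutoModel reads the config and returns the matching class (a UNet2DModel here).
model = AutoModel.from_pretrained(repo_id, use_safetensors=True)
```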

Diff for: docs/source/en/training/adapt_a_model.md

+2-2
@@ -31,10 +31,10 @@ To adapt your text-to-image model for inpainting, you'll need to change the numb
 Initialize a [`UNet2DConditionModel`] with the pretrained text-to-image model weights, and change `in_channels` to 9. Changing the number of `in_channels` means you need to set `ignore_mismatched_sizes=True` and `low_cpu_mem_usage=False` to avoid a size mismatch error because the shape is different now.
 
 ```py
-from diffusers import UNet2DConditionModel
+from diffusers import AutoModel
 
 model_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"
-unet = UNet2DConditionModel.from_pretrained(
+unet = AutoModel.from_pretrained(
     model_id,
     subfolder="unet",
     in_channels=9,
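The hunk cuts off before the remaining keyword arguments; per the prose at the top of the hunk, the full call also needs `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`, so a completed sketch would look roughly like:

```python
from diffusers import AutoModel

model_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"

# in_channels=9 changes the input conv shape, so mismatched sizes must be
# ignored and low_cpu_mem_usage disabled, as the surrounding text explains.
unet = AutoModel.from_pretrained(
    model_id,
    subfolder="unet",
    in_channels=9,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
)
```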

Diff for: docs/source/en/training/distributed_inference.md

+2-2
@@ -165,10 +165,10 @@ flush()
 Load the diffusion transformer next which has 12.5B parameters. This time, set `device_map="auto"` to automatically distribute the model across two 16GB GPUs. The `auto` strategy is backed by [Accelerate](https://hf.co/docs/accelerate/index) and available as a part of the [Big Model Inference](https://hf.co/docs/accelerate/concept_guides/big_model_inference) feature. It starts by distributing a model across the fastest device first (GPU) before moving to slower devices like the CPU and hard drive if needed. The trade-off of storing model parameters on slower devices is slower inference latency.
 
 ```py
-from diffusers import FluxTransformer2DModel
+from diffusers import AutoModel
 import torch
 
-transformer = FluxTransformer2DModel.from_pretrained(
+transformer = AutoModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     device_map="auto",
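A self-contained version of the updated snippet; the `torch_dtype` argument is an assumption, the rest comes straight from the hunk:

```python
import torch
from diffusers import AutoModel

# device_map="auto" lets Accelerate spread the 12.5B-parameter transformer
# across the available GPUs first, then CPU/disk if needed.
transformer = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```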

Diff for: docs/source/en/tutorials/inference_with_big_models.md

+4-4
@@ -32,9 +32,9 @@ The denoiser checkpoint can also have multiple shards and supports inference tha
 For example, let's save a sharded checkpoint for the [SDXL UNet](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/tree/main/unet):
 
 ```python
-from diffusers import UNet2DConditionModel
+from diffusers import AutoModel
 
-unet = UNet2DConditionModel.from_pretrained(
+unet = AutoModel.from_pretrained(
     "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
 )
 unet.save_pretrained("sdxl-unet-sharded", max_shard_size="5GB")
@@ -43,10 +43,10 @@ unet.save_pretrained("sdxl-unet-sharded", max_shard_size="5GB")
 The size of the fp32 variant of the SDXL UNet checkpoint is ~10.4GB. Set the `max_shard_size` parameter to 5GB to create 3 shards. After saving, you can load them in [`StableDiffusionXLPipeline`]:
 
 ```python
-from diffusers import UNet2DConditionModel, StableDiffusionXLPipeline
+from diffusers import AutoModel, StableDiffusionXLPipeline
 import torch
 
-unet = UNet2DConditionModel.from_pretrained(
+unet = AutoModel.from_pretrained(
     "sayakpaul/sdxl-unet-sharded", torch_dtype=torch.float16
 )
 pipeline = StableDiffusionXLPipeline.from_pretrained(
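End to end, the two hunks above amount to the following sketch; loading from the locally saved `sdxl-unet-sharded` folder (rather than the `sayakpaul/sdxl-unet-sharded` Hub repo) and the `unet=unet` pipeline argument are assumptions, since the diff truncates that part:

```python
import torch
from diffusers import AutoModel, StableDiffusionXLPipeline

# Save the fp32 SDXL UNet (~10.4GB) as roughly 5GB shards.
unet = AutoModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
)
unet.save_pretrained("sdxl-unet-sharded", max_shard_size="5GB")

# Reload the shards in fp16 and plug them into the pipeline.
unet = AutoModel.from_pretrained("sdxl-unet-sharded", torch_dtype=torch.float16)
pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=unet, torch_dtype=torch.float16
)
```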

Diff for: docs/source/en/using-diffusers/loading_adapters.md

+2-2
@@ -134,7 +134,7 @@ The [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`] method loads L
 - the LoRA weights don't have separate identifiers for the UNet and text encoder
 - the LoRA weights have separate identifiers for the UNet and text encoder
 
-To directly load (and save) a LoRA adapter at the *model-level*, use [`~PeftAdapterMixin.load_lora_adapter`], which builds and prepares the necessary model configuration for the adapter. Like [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`], [`PeftAdapterMixin.load_lora_adapter`] can load LoRAs for both the UNet and text encoder. For example, if you're loading a LoRA for the UNet, [`PeftAdapterMixin.load_lora_adapter`] ignores the keys for the text encoder.
+To directly load (and save) a LoRA adapter at the *model-level*, use [`~loaders.PeftAdapterMixin.load_lora_adapter`], which builds and prepares the necessary model configuration for the adapter. Like [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`], [`~loaders.PeftAdapterMixin.load_lora_adapter`] can load LoRAs for both the UNet and text encoder. For example, if you're loading a LoRA for the UNet, [`~loaders.PeftAdapterMixin.load_lora_adapter`] ignores the keys for the text encoder.
 
 Use the `weight_name` parameter to specify the specific weight file and the `prefix` parameter to filter for the appropriate state dicts (`"unet"` in this case) to load.
 
@@ -155,7 +155,7 @@ image
 <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_attn_proc.png" />
 </div>
 
-Save an adapter with [`~PeftAdapterMixin.save_lora_adapter`].
+Save an adapter with [`~loaders.PeftAdapterMixin.save_lora_adapter`].
 
 To unload the LoRA weights, use the [`~loaders.StableDiffusionLoraLoaderMixin.unload_lora_weights`] method to discard the LoRA weights and restore the model to its original weights:
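For reference, a model-level call to the corrected [`~loaders.PeftAdapterMixin.load_lora_adapter`] reference looks roughly like this; the pipeline and the LoRA repo are assumptions used only to illustrate the `weight_name` and `prefix` parameters mentioned in the unchanged context:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Hypothetical SDXL LoRA repo; weight_name picks the file and
# prefix="unet" keeps only the UNet keys from the state dict.
pipeline.unet.load_lora_adapter(
    "jbilcke-hf/sdxl-cinematic-1",
    weight_name="pytorch_lora_weights.safetensors",
    prefix="unet",
)
```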

Diff for: docs/source/en/using-diffusers/merge_loras.md

+3-3
@@ -66,10 +66,10 @@ Let's dive deeper into what these steps entail.
 1. Load a UNet that corresponds to the UNet in the LoRA checkpoint. In this case, both LoRAs use the SDXL UNet as their base model.
 
 ```python
-from diffusers import UNet2DConditionModel
+from diffusers import AutoModel
 import torch
 
-unet = UNet2DConditionModel.from_pretrained(
+unet = AutoModel.from_pretrained(
     "stabilityai/stable-diffusion-xl-base-1.0",
     torch_dtype=torch.float16,
     use_safetensors=True,
@@ -136,7 +136,7 @@ feng_peft_model.load_state_dict(original_state_dict, strict=True)
 ```python
 from peft import PeftModel
 
-base_unet = UNet2DConditionModel.from_pretrained(
+base_unet = AutoModel.from_pretrained(
     "stabilityai/stable-diffusion-xl-base-1.0",
     torch_dtype=torch.float16,
     use_safetensors=True,
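Both hunks truncate the `from_pretrained` call after `use_safetensors=True`; a completed sketch of the base-UNet load, where `variant="fp16"` and `subfolder="unet"` are assumptions about the elided arguments:

```python
import torch
from diffusers import AutoModel

# Base SDXL UNet that both LoRAs in this guide were trained on.
# variant and subfolder are assumed; the diff cuts the call off before them.
unet = AutoModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
    subfolder="unet",
)
```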
