JunhaoZhuang committed on
Commit edf9d60 · verified · 1 Parent(s): 23e7e6a

Upload 317 files

This view is limited to 50 files because the commit contains too many changes.
Files changed (50)
  1. .gitattributes +1 -0
  2. diffusers/docs/.DS_Store +0 -0
  3. diffusers/docs/README.md +268 -0
  4. diffusers/docs/TRANSLATING.md +69 -0
  5. diffusers/docs/source/_config.py +9 -0
  6. diffusers/docs/source/en/_toctree.yml +540 -0
  7. diffusers/docs/source/en/advanced_inference/outpaint.md +231 -0
  8. diffusers/docs/source/en/api/activations.md +27 -0
  9. diffusers/docs/source/en/api/attnprocessor.md +54 -0
  10. diffusers/docs/source/en/api/configuration.md +30 -0
  11. diffusers/docs/source/en/api/image_processor.md +35 -0
  12. diffusers/docs/source/en/api/internal_classes_overview.md +15 -0
  13. diffusers/docs/source/en/api/loaders/ip_adapter.md +29 -0
  14. diffusers/docs/source/en/api/loaders/lora.md +47 -0
  15. diffusers/docs/source/en/api/loaders/peft.md +25 -0
  16. diffusers/docs/source/en/api/loaders/single_file.md +62 -0
  17. diffusers/docs/source/en/api/loaders/textual_inversion.md +27 -0
  18. diffusers/docs/source/en/api/loaders/unet.md +27 -0
  19. diffusers/docs/source/en/api/logging.md +96 -0
  20. diffusers/docs/source/en/api/models/asymmetricautoencoderkl.md +60 -0
  21. diffusers/docs/source/en/api/models/aura_flow_transformer2d.md +19 -0
  22. diffusers/docs/source/en/api/models/autoencoder_oobleck.md +38 -0
  23. diffusers/docs/source/en/api/models/autoencoder_tiny.md +57 -0
  24. diffusers/docs/source/en/api/models/autoencoderkl.md +58 -0
  25. diffusers/docs/source/en/api/models/autoencoderkl_cogvideox.md +37 -0
  26. diffusers/docs/source/en/api/models/cogvideox_transformer3d.md +30 -0
  27. diffusers/docs/source/en/api/models/consistency_decoder_vae.md +30 -0
  28. diffusers/docs/source/en/api/models/controlnet.md +50 -0
  29. diffusers/docs/source/en/api/models/controlnet_flux.md +45 -0
  30. diffusers/docs/source/en/api/models/controlnet_hunyuandit.md +37 -0
  31. diffusers/docs/source/en/api/models/controlnet_sd3.md +42 -0
  32. diffusers/docs/source/en/api/models/controlnet_sparsectrl.md +46 -0
  33. diffusers/docs/source/en/api/models/dit_transformer2d.md +19 -0
  34. diffusers/docs/source/en/api/models/flux_transformer.md +19 -0
  35. diffusers/docs/source/en/api/models/hunyuan_transformer2d.md +20 -0
  36. diffusers/docs/source/en/api/models/latte_transformer3d.md +19 -0
  37. diffusers/docs/source/en/api/models/lumina_nextdit2d.md +20 -0
  38. diffusers/docs/source/en/api/models/overview.md +28 -0
  39. diffusers/docs/source/en/api/models/pixart_transformer2d.md +19 -0
  40. diffusers/docs/source/en/api/models/prior_transformer.md +27 -0
  41. diffusers/docs/source/en/api/models/sd3_transformer2d.md +19 -0
  42. diffusers/docs/source/en/api/models/stable_audio_transformer.md +19 -0
  43. diffusers/docs/source/en/api/models/stable_cascade_unet.md +19 -0
  44. diffusers/docs/source/en/api/models/transformer2d.md +41 -0
  45. diffusers/docs/source/en/api/models/transformer_temporal.md +23 -0
  46. diffusers/docs/source/en/api/models/unet-motion.md +25 -0
  47. diffusers/docs/source/en/api/models/unet.md +25 -0
  48. diffusers/docs/source/en/api/models/unet2d-cond.md +31 -0
  49. diffusers/docs/source/en/api/models/unet2d.md +25 -0
  50. diffusers/docs/source/en/api/models/unet3d-cond.md +25 -0
.gitattributes CHANGED
@@ -63,3 +63,4 @@ examples/shadow/example1/reference_image_4.png filter=lfs diff=lfs merge=lfs -te
63
  examples/shadow/example1/reference_image_5.png filter=lfs diff=lfs merge=lfs -text
64
  examples/shadow/example2/input.png filter=lfs diff=lfs merge=lfs -text
65
  examples/shadow/example2/reference_image_0.png filter=lfs diff=lfs merge=lfs -text
66
+ diffusers/docs/source/en/imgs/access_request.png filter=lfs diff=lfs merge=lfs -text
diffusers/docs/.DS_Store ADDED
Binary file (6.15 kB).
 
diffusers/docs/README.md ADDED
@@ -0,0 +1,268 @@
1
+ <!---
2
+ Copyright 2024- The HuggingFace Team. All rights reserved.
3
+
4
+ Licensed under the Apache License, Version 2.0 (the "License");
5
+ you may not use this file except in compliance with the License.
6
+ You may obtain a copy of the License at
7
+
8
+ http://www.apache.org/licenses/LICENSE-2.0
9
+
10
+ Unless required by applicable law or agreed to in writing, software
11
+ distributed under the License is distributed on an "AS IS" BASIS,
12
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13
+ See the License for the specific language governing permissions and
14
+ limitations under the License.
15
+ -->
16
+
17
+ # Generating the documentation
18
+
19
+ To generate the documentation, you first have to build it. Several packages are necessary to build the docs;
20
+ you can install them with the following command from the root of the code repository:
21
+
22
+ ```bash
23
+ pip install -e ".[docs]"
24
+ ```
25
+
26
+ Then you need to install our open source documentation builder tool:
27
+
28
+ ```bash
29
+ pip install git+https://github.com/huggingface/doc-builder
30
+ ```
31
+
32
+ ---
33
+ **NOTE**
34
+
35
+ You only need to generate the documentation to inspect it locally (if you're planning changes and want to
36
+ check how they look before committing for instance). You don't have to commit the built documentation.
37
+
38
+ ---
39
+
40
+ ## Previewing the documentation
41
+
42
+ To preview the docs, first install the `watchdog` module with:
43
+
44
+ ```bash
45
+ pip install watchdog
46
+ ```
47
+
48
+ Then run the following command:
49
+
50
+ ```bash
51
+ doc-builder preview {package_name} {path_to_docs}
52
+ ```
53
+
54
+ For example:
55
+
56
+ ```bash
57
+ doc-builder preview diffusers docs/source/en
58
+ ```
59
+
60
+ The docs will be viewable at [http://localhost:3000](http://localhost:3000). You can also preview the docs once you have opened a PR. You will see a bot add a comment with a link to where the documentation with your changes lives.
61
+
62
+ ---
63
+ **NOTE**
64
+
65
+ The `preview` command only works with existing doc files. When you add a completely new file, you need to update `_toctree.yml` and restart the `preview` command (`ctrl-c` to stop it and call `doc-builder preview ...` again).
66
+
67
+ ---
68
+
69
+ ## Adding a new element to the navigation bar
70
+
71
+ Accepted files are Markdown (.md).
72
+
73
+ Create a file with its extension and put it in the source directory. You can then link it to the toc-tree by putting
74
+ the filename without the extension in the [`_toctree.yml`](https://github.com/huggingface/diffusers/blob/main/docs/source/en/_toctree.yml) file.
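+
+ For example, a hypothetical entry for a new `custom_guide.md` page might look like the following (`custom_guide` and the titles are placeholders, not real pages):
+
+ ```yaml
+ - sections:
+   - local: custom_guide     # docs/source/en/custom_guide.md, without the extension
+     title: My custom guide  # shown in the navigation bar
+   title: Guides
+ ```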
75
+
76
+ ## Renaming section headers and moving sections
77
+
78
+ It helps to keep the old links working when renaming a section header and/or moving sections from one document to another. This is because the old links are likely to be used in issues, forums, and social media, and it makes for a much better user experience if users reading those months later can still easily navigate to the originally intended information.
79
+
80
+ Therefore, we simply keep a little map of moved sections at the end of the document where the original section was. The key is to preserve the original anchor.
81
+
82
+ So if you renamed a section from "Section A" to "Section B", you can add this at the end of the file:
83
+
84
+ ```md
85
+ Sections that were moved:
86
+
87
+ [ <a href="#section-b">Section A</a><a id="section-a"></a> ]
88
+ ```
89
+ and of course, if you moved it to another file, then:
90
+
91
+ ```md
92
+ Sections that were moved:
93
+
94
+ [ <a href="../new-file#section-b">Section A</a><a id="section-a"></a> ]
95
+ ```
96
+
97
+ Use the relative style to link to the new file so that the versioned docs continue to work.
98
+
99
+ For an example of a rich set of moved sections, please see the very end of [the transformers Trainer doc](https://github.com/huggingface/transformers/blob/main/docs/source/en/main_classes/trainer.md).
100
+
101
+
102
+ ## Writing Documentation - Specification
103
+
104
+ The `huggingface/diffusers` documentation follows the
105
+ [Google documentation](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) style for docstrings,
106
+ although we can write them directly in Markdown.
107
+
108
+ ### Adding a new tutorial
109
+
110
+ Adding a new tutorial or section is done in two steps:
111
+
112
+ - Add a new Markdown (.md) file under `docs/source/<languageCode>`.
113
+ - Link that file in `docs/source/<languageCode>/_toctree.yml` on the correct toc-tree.
114
+
115
+ Make sure to put your new file under the proper section. It's unlikely to go in the first section (*Get Started*), so
116
+ depending on the intended targets (beginners, more advanced users, or researchers) it should go in sections two, three, or four.
117
+
118
+ ### Adding a new pipeline/scheduler
119
+
120
+ When adding a new pipeline:
121
+
122
+ - Create a file `xxx.md` under `docs/source/<languageCode>/api/pipelines` (don't hesitate to copy an existing file as template).
123
+ - Link that file in the (*Diffusers Summary*) section of `docs/source/api/pipelines/overview.md`, along with a link to the paper and a Colab notebook (if available).
124
+ - Write a short overview of the diffusion model:
125
+ - Overview with paper & authors
126
+ - Paper abstract
127
+ - Tips and tricks and how to use it best
128
+ - Possibly an end-to-end example of how to use it
129
+ - Add all the pipeline classes that should be linked in the diffusion model. These classes should be added using our Markdown syntax. By default as follows:
130
+
131
+ ```
132
+ [[autodoc]] XXXPipeline
133
+ - all
134
+ - __call__
135
+ ```
136
+
137
+ This will include every documented public method of the pipeline, as well as the `__call__` method, which is not documented by default. If you want to add additional methods that are not documented, list them after `all`:
138
+
139
+ ```
140
+ [[autodoc]] XXXPipeline
141
+ - all
142
+ - __call__
143
+ - enable_attention_slicing
144
+ - disable_attention_slicing
145
+ - enable_xformers_memory_efficient_attention
146
+ - disable_xformers_memory_efficient_attention
147
+ ```
148
+
149
+ You can follow the same process to create a new scheduler under the `docs/source/<languageCode>/api/schedulers` folder.
150
+
151
+ ### Writing source documentation
152
+
153
+ Values that should be put in `code` should be surrounded by backticks: \`like so\`. Note that argument names
154
+ and objects like True, None, or any strings should usually be put in `code`.
155
+
156
+ When mentioning a class, function, or method, it is recommended to use our syntax for internal links so that our tool
157
+ adds a link to its documentation with this syntax: \[\`XXXClass\`\] or \[\`function\`\]. This requires the class or
158
+ function to be in the main package.
159
+
160
+ If you want to create a link to some internal class or function, you need to
161
+ provide its path. For instance: \[\`pipelines.ImagePipelineOutput\`\]. This will be converted into a link with
162
+ `pipelines.ImagePipelineOutput` in the description. To get rid of the path and only keep the name of the object you are
163
+ linking to in the description, add a ~: \[\`~pipelines.ImagePipelineOutput\`\] will generate a link with `ImagePipelineOutput` in the description.
164
+
165
+ The same works for methods so you can either use \[\`XXXClass.method\`\] or \[\`~XXXClass.method\`\].
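+
+ For instance, a sentence inside a docstring might combine both forms like this (shown purely as an illustration):
+
+ ```
+ Returns an [`~pipelines.ImagePipelineOutput`] object. Load the pipeline with [`DiffusionPipeline.from_pretrained`]
+ before invoking [`~DiffusionPipeline.__call__`].
+ ```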
166
+
167
+ #### Defining arguments in a method
168
+
169
+ Arguments should be defined with the `Args:` (or `Arguments:` or `Parameters:`) prefix, followed by a line return and
170
+ an indentation. The argument should be followed by its type, with its shape if it is a tensor, a colon, and its
171
+ description:
172
+
173
+ ```
174
+ Args:
175
+ n_layers (`int`): The number of layers of the model.
176
+ ```
177
+
178
+ If the description is too long to fit in one line, another indentation is necessary before writing the description
179
+ after the argument.
180
+
181
+ Here's an example showcasing everything so far:
182
+
183
+ ```
184
+ Args:
185
+ input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
186
+ Indices of input sequence tokens in the vocabulary.
187
+
188
+ Indices can be obtained using [`AlbertTokenizer`]. See [`~PreTrainedTokenizer.encode`] and
189
+ [`~PreTrainedTokenizer.__call__`] for details.
190
+
191
+ [What are input IDs?](../glossary#input-ids)
192
+ ```
193
+
194
+ For optional arguments or arguments with defaults, we use the following syntax: imagine we have a function with the
195
+ following signature:
196
+
197
+ ```py
198
+ def my_function(x: str=None, a: float=3.14):
199
+ ```
200
+
201
+ then its documentation should look like this:
202
+
203
+ ```
204
+ Args:
205
+ x (`str`, *optional*):
206
+ This argument controls ...
207
+ a (`float`, *optional*, defaults to `3.14`):
208
+ This argument is used to ...
209
+ ```
210
+
211
+ Note that we always omit the "defaults to \`None\`" when None is the default for any argument. Also note that even
212
+ if the first line describing your argument type and its default gets long, you cannot break it across several lines. You can,
213
+ however, write as many lines as you want in the indented description (see the example above with `input_ids`).
214
+
215
+ #### Writing a multi-line code block
216
+
217
+ Multi-line code blocks can be useful for displaying examples. They are done between two lines of three backticks as usual in Markdown:
218
+
219
+
220
+ ````
221
+ ```
222
+ # first line of code
223
+ # second line
224
+ # etc
225
+ ```
226
+ ````
227
+
228
+ #### Writing a return block
229
+
230
+ The return block should be introduced with the `Returns:` prefix, followed by a line return and an indentation.
231
+ The first line should be the type of the return, followed by a line return. No need to indent further for the elements
232
+ building the return.
233
+
234
+ Here's an example of a single value return:
235
+
236
+ ```
237
+ Returns:
238
+ `List[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token.
239
+ ```
240
+
241
+ Here's an example of a tuple return, comprising several objects:
242
+
243
+ ```
244
+ Returns:
245
+ `tuple(torch.Tensor)` comprising various elements depending on the configuration ([`BertConfig`]) and inputs:
246
+ - **loss** (*optional*, returned when `masked_lm_labels` is provided) `torch.Tensor` of shape `(1,)` --
247
+ Total loss is the sum of the masked language modeling loss and the next sequence prediction (classification) loss.
248
+ - **prediction_scores** (`torch.Tensor` of shape `(batch_size, sequence_length, config.vocab_size)`) --
249
+ Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
250
+ ```
251
+
252
+ #### Adding an image
253
+
254
+ Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos, and other non-text files. We prefer to leverage a hf.co hosted `dataset` like
255
+ the ones hosted on [`hf-internal-testing`](https://huggingface.co/hf-internal-testing) in which to place these files and reference
256
+ them by URL. We recommend putting them in the following dataset: [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images).
257
+ If you are an external contributor, feel free to add the images to your PR and ask a Hugging Face member to migrate your images
258
+ to this dataset.
259
+
260
+ ## Styling the docstring
261
+
262
+ We have an automatic script running with the `make style` command that will make sure that:
263
+ - the docstrings fully take advantage of the line width
264
+ - all code examples are formatted using black, like the code of the Transformers library
265
+
266
+ This script may have some weird failures if you made a syntax mistake or if you uncover a bug. Therefore, it's
267
+ recommended to commit your changes before running `make style`, so you can revert the changes done by that script
268
+ easily.
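+
+ As a minimal sketch of that workflow (the file path below is only a placeholder):
+
+ ```bash
+ # commit your work first so the automated reformatting is easy to review or revert
+ git add src/diffusers/pipelines/my_pipeline.py
+ git commit -m "Update docstrings"
+
+ # then let the styling script reformat docstrings and code examples
+ make style
+ ```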
diffusers/docs/TRANSLATING.md ADDED
@@ -0,0 +1,69 @@
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ ### Translating the Diffusers documentation into your language
14
+
15
+ As part of our mission to democratize machine learning, we'd love to make the Diffusers library available in many more languages! Follow the steps below if you want to help translate the documentation into your language 🙏.
16
+
17
+ **🗞️ Open an issue**
18
+
19
+ To get started, navigate to the [Issues](https://github.com/huggingface/diffusers/issues) page of this repo and check if anyone else has opened an issue for your language. If not, open a new issue by selecting "🌐 Translating a New Language?" from the "New issue" button.
20
+
21
+ Once an issue exists, post a comment to indicate which chapters you'd like to work on, and we'll add your name to the list.
22
+
23
+
24
+ **🍴 Fork the repository**
25
+
26
+ First, you'll need to [fork the Diffusers repo](https://docs.github.com/en/get-started/quickstart/fork-a-repo). You can do this by clicking on the **Fork** button on the top-right corner of this repo's page.
27
+
28
+ Once you've forked the repo, you'll want to get the files on your local machine for editing. You can do that by cloning the fork with Git as follows:
29
+
30
+ ```bash
31
+ git clone https://github.com/<YOUR-USERNAME>/diffusers.git
32
+ ```
33
+
34
+ **📋 Copy-paste the English version with a new language code**
35
+
36
+ The documentation files all live under one top-level directory:
37
+
38
+ - [`docs/source`](https://github.com/huggingface/diffusers/tree/main/docs/source): All the documentation materials are organized here by language.
39
+
40
+ You'll only need to copy the files in the [`docs/source/en`](https://github.com/huggingface/diffusers/tree/main/docs/source/en) directory, so first navigate to your fork of the repo and run the following:
41
+
42
+ ```bash
43
+ cd ~/path/to/diffusers/docs
44
+ cp -r source/en source/<LANG-ID>
45
+ ```
46
+
47
+ Here, `<LANG-ID>` should be one of the ISO 639-1 or ISO 639-2 language codes -- see [here](https://www.loc.gov/standards/iso639-2/php/code_list.php) for a handy table.
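+
+ For example, a hypothetical Korean translation would copy the files like this (`ko` is the ISO 639-1 code for Korean):
+
+ ```bash
+ cd ~/path/to/diffusers/docs
+ cp -r source/en source/ko
+ ```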
48
+
49
+ **✍️ Start translating**
50
+
51
+ Now comes the fun part: translating the text!
52
+
53
+ The first thing we recommend is translating the part of the `_toctree.yml` file that corresponds to your doc chapter. This file is used to render the table of contents on the website.
54
+
55
+ > 🙋 If the `_toctree.yml` file doesn't yet exist for your language, you can create one by copy-pasting from the English version and deleting the sections unrelated to your chapter. Just make sure it exists in the `docs/source/<LANG-ID>/` directory!
56
+
57
+ The fields you should add are `local` (with the name of the file containing the translation; e.g. `autoclass_tutorial`), and `title` (with the title of the doc in your language; e.g. `Load pretrained instances with an AutoClass`) -- as a reference, here is the `_toctree.yml` for [English](https://github.com/huggingface/diffusers/blob/main/docs/source/en/_toctree.yml):
58
+
59
+ ```yaml
60
+ - sections:
61
+ - local: pipeline_tutorial # Do not change this! Use the same name for your .md file
62
+ title: Pipelines for inference # Translate this!
63
+ ...
64
+ title: Tutorials # Translate this!
65
+ ```
66
+
67
+ Once you have translated the `_toctree.yml` file, you can start translating the [MDX](https://mdxjs.com/) files associated with your docs chapter.
68
+
69
+ > 🙋 If you'd like others to help you with the translation, you should [open an issue](https://github.com/huggingface/diffusers/issues) and tag @patrickvonplaten.
diffusers/docs/source/_config.py ADDED
@@ -0,0 +1,9 @@
1
+ # docstyle-ignore
2
+ INSTALL_CONTENT = """
3
+ # Diffusers installation
4
+ ! pip install diffusers transformers datasets accelerate
5
+ # To install from source instead of the last release, comment the command above and uncomment the following one.
6
+ # ! pip install git+https://github.com/huggingface/diffusers.git
7
+ """
8
+
9
+ notebook_first_cells = [{"type": "code", "content": INSTALL_CONTENT}]
diffusers/docs/source/en/_toctree.yml ADDED
@@ -0,0 +1,540 @@
1
+ - sections:
2
+ - local: index
3
+ title: 🧨 Diffusers
4
+ - local: quicktour
5
+ title: Quicktour
6
+ - local: stable_diffusion
7
+ title: Effective and efficient diffusion
8
+ - local: installation
9
+ title: Installation
10
+ title: Get started
11
+ - sections:
12
+ - local: tutorials/tutorial_overview
13
+ title: Overview
14
+ - local: using-diffusers/write_own_pipeline
15
+ title: Understanding pipelines, models and schedulers
16
+ - local: tutorials/autopipeline
17
+ title: AutoPipeline
18
+ - local: tutorials/basic_training
19
+ title: Train a diffusion model
20
+ - local: tutorials/using_peft_for_inference
21
+ title: Load LoRAs for inference
22
+ - local: tutorials/fast_diffusion
23
+ title: Accelerate inference of text-to-image diffusion models
24
+ - local: tutorials/inference_with_big_models
25
+ title: Working with big models
26
+ title: Tutorials
27
+ - sections:
28
+ - local: using-diffusers/loading
29
+ title: Load pipelines
30
+ - local: using-diffusers/custom_pipeline_overview
31
+ title: Load community pipelines and components
32
+ - local: using-diffusers/schedulers
33
+ title: Load schedulers and models
34
+ - local: using-diffusers/other-formats
35
+ title: Model files and layouts
36
+ - local: using-diffusers/loading_adapters
37
+ title: Load adapters
38
+ - local: using-diffusers/push_to_hub
39
+ title: Push files to the Hub
40
+ title: Load pipelines and adapters
41
+ - sections:
42
+ - local: using-diffusers/unconditional_image_generation
43
+ title: Unconditional image generation
44
+ - local: using-diffusers/conditional_image_generation
45
+ title: Text-to-image
46
+ - local: using-diffusers/img2img
47
+ title: Image-to-image
48
+ - local: using-diffusers/inpaint
49
+ title: Inpainting
50
+ - local: using-diffusers/text-img2vid
51
+ title: Text or image-to-video
52
+ - local: using-diffusers/depth2img
53
+ title: Depth-to-image
54
+ title: Generative tasks
55
+ - sections:
56
+ - local: using-diffusers/overview_techniques
57
+ title: Overview
58
+ - local: training/distributed_inference
59
+ title: Distributed inference
60
+ - local: using-diffusers/merge_loras
61
+ title: Merge LoRAs
62
+ - local: using-diffusers/scheduler_features
63
+ title: Scheduler features
64
+ - local: using-diffusers/callback
65
+ title: Pipeline callbacks
66
+ - local: using-diffusers/reusing_seeds
67
+ title: Reproducible pipelines
68
+ - local: using-diffusers/image_quality
69
+ title: Controlling image quality
70
+ - local: using-diffusers/weighted_prompts
71
+ title: Prompt techniques
72
+ title: Inference techniques
73
+ - sections:
74
+ - local: advanced_inference/outpaint
75
+ title: Outpainting
76
+ title: Advanced inference
77
+ - sections:
78
+ - local: using-diffusers/sdxl
79
+ title: Stable Diffusion XL
80
+ - local: using-diffusers/sdxl_turbo
81
+ title: SDXL Turbo
82
+ - local: using-diffusers/kandinsky
83
+ title: Kandinsky
84
+ - local: using-diffusers/ip_adapter
85
+ title: IP-Adapter
86
+ - local: using-diffusers/pag
87
+ title: PAG
88
+ - local: using-diffusers/controlnet
89
+ title: ControlNet
90
+ - local: using-diffusers/t2i_adapter
91
+ title: T2I-Adapter
92
+ - local: using-diffusers/inference_with_lcm
93
+ title: Latent Consistency Model
94
+ - local: using-diffusers/textual_inversion_inference
95
+ title: Textual inversion
96
+ - local: using-diffusers/shap-e
97
+ title: Shap-E
98
+ - local: using-diffusers/diffedit
99
+ title: DiffEdit
100
+ - local: using-diffusers/inference_with_tcd_lora
101
+ title: Trajectory Consistency Distillation-LoRA
102
+ - local: using-diffusers/svd
103
+ title: Stable Video Diffusion
104
+ - local: using-diffusers/marigold_usage
105
+ title: Marigold Computer Vision
106
+ title: Specific pipeline examples
107
+ - sections:
108
+ - local: training/overview
109
+ title: Overview
110
+ - local: training/create_dataset
111
+ title: Create a dataset for training
112
+ - local: training/adapt_a_model
113
+ title: Adapt a model to a new task
114
+ - isExpanded: false
115
+ sections:
116
+ - local: training/unconditional_training
117
+ title: Unconditional image generation
118
+ - local: training/text2image
119
+ title: Text-to-image
120
+ - local: training/sdxl
121
+ title: Stable Diffusion XL
122
+ - local: training/kandinsky
123
+ title: Kandinsky 2.2
124
+ - local: training/wuerstchen
125
+ title: Wuerstchen
126
+ - local: training/controlnet
127
+ title: ControlNet
128
+ - local: training/t2i_adapters
129
+ title: T2I-Adapters
130
+ - local: training/instructpix2pix
131
+ title: InstructPix2Pix
132
+ title: Models
133
+ - isExpanded: false
134
+ sections:
135
+ - local: training/text_inversion
136
+ title: Textual Inversion
137
+ - local: training/dreambooth
138
+ title: DreamBooth
139
+ - local: training/lora
140
+ title: LoRA
141
+ - local: training/custom_diffusion
142
+ title: Custom Diffusion
143
+ - local: training/lcm_distill
144
+ title: Latent Consistency Distillation
145
+ - local: training/ddpo
146
+ title: Reinforcement learning training with DDPO
147
+ title: Methods
148
+ title: Training
149
+ - sections:
150
+ - local: optimization/fp16
151
+ title: Speed up inference
152
+ - local: optimization/memory
153
+ title: Reduce memory usage
154
+ - local: optimization/torch2.0
155
+ title: PyTorch 2.0
156
+ - local: optimization/xformers
157
+ title: xFormers
158
+ - local: optimization/tome
159
+ title: Token merging
160
+ - local: optimization/deepcache
161
+ title: DeepCache
162
+ - local: optimization/tgate
163
+ title: TGATE
164
+ - local: optimization/xdit
165
+ title: xDiT
166
+ - sections:
167
+ - local: using-diffusers/stable_diffusion_jax_how_to
168
+ title: JAX/Flax
169
+ - local: optimization/onnx
170
+ title: ONNX
171
+ - local: optimization/open_vino
172
+ title: OpenVINO
173
+ - local: optimization/coreml
174
+ title: Core ML
175
+ title: Optimized model formats
176
+ - sections:
177
+ - local: optimization/mps
178
+ title: Metal Performance Shaders (MPS)
179
+ - local: optimization/habana
180
+ title: Habana Gaudi
181
+ title: Optimized hardware
182
+ title: Accelerate inference and reduce memory
183
+ - sections:
184
+ - local: conceptual/philosophy
185
+ title: Philosophy
186
+ - local: using-diffusers/controlling_generation
187
+ title: Controlled generation
188
+ - local: conceptual/contribution
189
+ title: How to contribute?
190
+ - local: conceptual/ethical_guidelines
191
+ title: Diffusers' Ethical Guidelines
192
+ - local: conceptual/evaluation
193
+ title: Evaluating Diffusion Models
194
+ title: Conceptual Guides
195
+ - sections:
196
+ - local: community_projects
197
+ title: Projects built with Diffusers
198
+ title: Community Projects
199
+ - sections:
200
+ - isExpanded: false
201
+ sections:
202
+ - local: api/configuration
203
+ title: Configuration
204
+ - local: api/logging
205
+ title: Logging
206
+ - local: api/outputs
207
+ title: Outputs
208
+ title: Main Classes
209
+ - isExpanded: false
210
+ sections:
211
+ - local: api/loaders/ip_adapter
212
+ title: IP-Adapter
213
+ - local: api/loaders/lora
214
+ title: LoRA
215
+ - local: api/loaders/single_file
216
+ title: Single files
217
+ - local: api/loaders/textual_inversion
218
+ title: Textual Inversion
219
+ - local: api/loaders/unet
220
+ title: UNet
221
+ - local: api/loaders/peft
222
+ title: PEFT
223
+ title: Loaders
224
+ - isExpanded: false
225
+ sections:
226
+ - local: api/models/overview
227
+ title: Overview
228
+ - sections:
229
+ - local: api/models/controlnet
230
+ title: ControlNetModel
231
+ - local: api/models/controlnet_flux
232
+ title: FluxControlNetModel
233
+ - local: api/models/controlnet_hunyuandit
234
+ title: HunyuanDiT2DControlNetModel
235
+ - local: api/models/controlnet_sd3
236
+ title: SD3ControlNetModel
237
+ - local: api/models/controlnet_sparsectrl
238
+ title: SparseControlNetModel
239
+ title: ControlNets
240
+ - sections:
241
+ - local: api/models/aura_flow_transformer2d
242
+ title: AuraFlowTransformer2DModel
243
+ - local: api/models/cogvideox_transformer3d
244
+ title: CogVideoXTransformer3DModel
245
+ - local: api/models/dit_transformer2d
246
+ title: DiTTransformer2DModel
247
+ - local: api/models/flux_transformer
248
+ title: FluxTransformer2DModel
249
+ - local: api/models/hunyuan_transformer2d
250
+ title: HunyuanDiT2DModel
251
+ - local: api/models/latte_transformer3d
252
+ title: LatteTransformer3DModel
253
+ - local: api/models/lumina_nextdit2d
254
+ title: LuminaNextDiT2DModel
255
+ - local: api/models/pixart_transformer2d
256
+ title: PixArtTransformer2DModel
257
+ - local: api/models/prior_transformer
258
+ title: PriorTransformer
259
+ - local: api/models/sd3_transformer2d
260
+ title: SD3Transformer2DModel
261
+ - local: api/models/stable_audio_transformer
262
+ title: StableAudioDiTModel
263
+ - local: api/models/transformer2d
264
+ title: Transformer2DModel
265
+ - local: api/models/transformer_temporal
266
+ title: TransformerTemporalModel
267
+ title: Transformers
268
+ - sections:
269
+ - local: api/models/stable_cascade_unet
270
+ title: StableCascadeUNet
271
+ - local: api/models/unet
272
+ title: UNet1DModel
273
+ - local: api/models/unet2d
274
+ title: UNet2DModel
275
+ - local: api/models/unet2d-cond
276
+ title: UNet2DConditionModel
277
+ - local: api/models/unet3d-cond
278
+ title: UNet3DConditionModel
279
+ - local: api/models/unet-motion
280
+ title: UNetMotionModel
281
+ - local: api/models/uvit2d
282
+ title: UViT2DModel
283
+ title: UNets
284
+ - sections:
285
+ - local: api/models/autoencoderkl
286
+ title: AutoencoderKL
287
+ - local: api/models/autoencoderkl_cogvideox
288
+ title: AutoencoderKLCogVideoX
289
+ - local: api/models/asymmetricautoencoderkl
290
+ title: AsymmetricAutoencoderKL
291
+ - local: api/models/consistency_decoder_vae
292
+ title: ConsistencyDecoderVAE
293
+ - local: api/models/autoencoder_oobleck
294
+ title: Oobleck AutoEncoder
295
+ - local: api/models/autoencoder_tiny
296
+ title: Tiny AutoEncoder
297
+ - local: api/models/vq
298
+ title: VQModel
299
+ title: VAEs
300
+ title: Models
301
+ - isExpanded: false
302
+ sections:
303
+ - local: api/pipelines/overview
304
+ title: Overview
305
+ - local: api/pipelines/amused
306
+ title: aMUSEd
307
+ - local: api/pipelines/animatediff
308
+ title: AnimateDiff
309
+ - local: api/pipelines/attend_and_excite
310
+ title: Attend-and-Excite
311
+ - local: api/pipelines/audioldm
312
+ title: AudioLDM
313
+ - local: api/pipelines/audioldm2
314
+ title: AudioLDM 2
315
+ - local: api/pipelines/aura_flow
316
+ title: AuraFlow
317
+ - local: api/pipelines/auto_pipeline
318
+ title: AutoPipeline
319
+ - local: api/pipelines/blip_diffusion
320
+ title: BLIP-Diffusion
321
+ - local: api/pipelines/cogvideox
322
+ title: CogVideoX
323
+ - local: api/pipelines/consistency_models
324
+ title: Consistency Models
325
+ - local: api/pipelines/controlnet
326
+ title: ControlNet
327
+ - local: api/pipelines/controlnet_flux
328
+ title: ControlNet with Flux.1
329
+ - local: api/pipelines/controlnet_hunyuandit
330
+ title: ControlNet with Hunyuan-DiT
331
+ - local: api/pipelines/controlnet_sd3
332
+ title: ControlNet with Stable Diffusion 3
333
+ - local: api/pipelines/controlnet_sdxl
334
+ title: ControlNet with Stable Diffusion XL
335
+ - local: api/pipelines/controlnetxs
336
+ title: ControlNet-XS
337
+ - local: api/pipelines/controlnetxs_sdxl
338
+ title: ControlNet-XS with Stable Diffusion XL
339
+ - local: api/pipelines/dance_diffusion
340
+ title: Dance Diffusion
341
+ - local: api/pipelines/ddim
342
+ title: DDIM
343
+ - local: api/pipelines/ddpm
344
+ title: DDPM
345
+ - local: api/pipelines/deepfloyd_if
346
+ title: DeepFloyd IF
347
+ - local: api/pipelines/diffedit
348
+ title: DiffEdit
349
+ - local: api/pipelines/dit
350
+ title: DiT
351
+ - local: api/pipelines/flux
352
+ title: Flux
353
+ - local: api/pipelines/hunyuandit
354
+ title: Hunyuan-DiT
355
+ - local: api/pipelines/i2vgenxl
356
+ title: I2VGen-XL
357
+ - local: api/pipelines/pix2pix
358
+ title: InstructPix2Pix
359
+ - local: api/pipelines/kandinsky
360
+ title: Kandinsky 2.1
361
+ - local: api/pipelines/kandinsky_v22
362
+ title: Kandinsky 2.2
363
+ - local: api/pipelines/kandinsky3
364
+ title: Kandinsky 3
365
+ - local: api/pipelines/kolors
366
+ title: Kolors
367
+ - local: api/pipelines/latent_consistency_models
368
+ title: Latent Consistency Models
369
+ - local: api/pipelines/latent_diffusion
370
+ title: Latent Diffusion
371
+ - local: api/pipelines/latte
372
+ title: Latte
373
+ - local: api/pipelines/ledits_pp
374
+ title: LEDITS++
375
+ - local: api/pipelines/lumina
376
+ title: Lumina-T2X
377
+ - local: api/pipelines/marigold
378
+ title: Marigold
379
+ - local: api/pipelines/panorama
380
+ title: MultiDiffusion
381
+ - local: api/pipelines/musicldm
382
+ title: MusicLDM
383
+ - local: api/pipelines/pag
384
+ title: PAG
385
+ - local: api/pipelines/paint_by_example
386
+ title: Paint by Example
387
+ - local: api/pipelines/pia
388
+ title: Personalized Image Animator (PIA)
389
+ - local: api/pipelines/pixart
390
+ title: PixArt-α
391
+ - local: api/pipelines/pixart_sigma
392
+ title: PixArt-Σ
393
+ - local: api/pipelines/self_attention_guidance
394
+ title: Self-Attention Guidance
395
+ - local: api/pipelines/semantic_stable_diffusion
396
+ title: Semantic Guidance
397
+ - local: api/pipelines/shap_e
398
+ title: Shap-E
399
+ - local: api/pipelines/stable_audio
400
+ title: Stable Audio
401
+ - local: api/pipelines/stable_cascade
402
+ title: Stable Cascade
403
+ - sections:
404
+ - local: api/pipelines/stable_diffusion/overview
405
+ title: Overview
406
+ - local: api/pipelines/stable_diffusion/text2img
407
+ title: Text-to-image
408
+ - local: api/pipelines/stable_diffusion/img2img
409
+ title: Image-to-image
410
+ - local: api/pipelines/stable_diffusion/svd
411
+ title: Image-to-video
412
+ - local: api/pipelines/stable_diffusion/inpaint
413
+ title: Inpainting
414
+ - local: api/pipelines/stable_diffusion/depth2img
415
+ title: Depth-to-image
416
+ - local: api/pipelines/stable_diffusion/image_variation
417
+ title: Image variation
418
+ - local: api/pipelines/stable_diffusion/stable_diffusion_safe
419
+ title: Safe Stable Diffusion
420
+ - local: api/pipelines/stable_diffusion/stable_diffusion_2
421
+ title: Stable Diffusion 2
422
+ - local: api/pipelines/stable_diffusion/stable_diffusion_3
423
+ title: Stable Diffusion 3
424
+ - local: api/pipelines/stable_diffusion/stable_diffusion_xl
425
+ title: Stable Diffusion XL
426
+ - local: api/pipelines/stable_diffusion/sdxl_turbo
427
+ title: SDXL Turbo
428
+ - local: api/pipelines/stable_diffusion/latent_upscale
429
+ title: Latent upscaler
430
+ - local: api/pipelines/stable_diffusion/upscale
431
+ title: Super-resolution
432
+ - local: api/pipelines/stable_diffusion/k_diffusion
433
+ title: K-Diffusion
434
+ - local: api/pipelines/stable_diffusion/ldm3d_diffusion
435
+ title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
436
+ - local: api/pipelines/stable_diffusion/adapter
437
+ title: T2I-Adapter
438
+ - local: api/pipelines/stable_diffusion/gligen
439
+ title: GLIGEN (Grounded Language-to-Image Generation)
440
+ title: Stable Diffusion
441
+ - local: api/pipelines/stable_unclip
442
+ title: Stable unCLIP
443
+ - local: api/pipelines/text_to_video
444
+ title: Text-to-video
445
+ - local: api/pipelines/text_to_video_zero
446
+ title: Text2Video-Zero
447
+ - local: api/pipelines/unclip
448
+ title: unCLIP
449
+ - local: api/pipelines/unidiffuser
450
+ title: UniDiffuser
451
+ - local: api/pipelines/value_guided_sampling
452
+ title: Value-guided sampling
453
+ - local: api/pipelines/wuerstchen
454
+ title: Wuerstchen
455
+ title: Pipelines
456
+ - isExpanded: false
457
+ sections:
458
+ - local: api/schedulers/overview
459
+ title: Overview
460
+ - local: api/schedulers/cm_stochastic_iterative
461
+ title: CMStochasticIterativeScheduler
462
+ - local: api/schedulers/consistency_decoder
463
+ title: ConsistencyDecoderScheduler
464
+ - local: api/schedulers/cosine_dpm
465
+ title: CosineDPMSolverMultistepScheduler
466
+ - local: api/schedulers/ddim_inverse
467
+ title: DDIMInverseScheduler
468
+ - local: api/schedulers/ddim
469
+ title: DDIMScheduler
470
+ - local: api/schedulers/ddpm
471
+ title: DDPMScheduler
472
+ - local: api/schedulers/deis
473
+ title: DEISMultistepScheduler
474
+ - local: api/schedulers/multistep_dpm_solver_inverse
475
+ title: DPMSolverMultistepInverse
476
+ - local: api/schedulers/multistep_dpm_solver
477
+ title: DPMSolverMultistepScheduler
478
+ - local: api/schedulers/dpm_sde
479
+ title: DPMSolverSDEScheduler
480
+ - local: api/schedulers/singlestep_dpm_solver
481
+ title: DPMSolverSinglestepScheduler
482
+ - local: api/schedulers/edm_multistep_dpm_solver
483
+ title: EDMDPMSolverMultistepScheduler
484
+ - local: api/schedulers/edm_euler
485
+ title: EDMEulerScheduler
486
+ - local: api/schedulers/euler_ancestral
487
+ title: EulerAncestralDiscreteScheduler
488
+ - local: api/schedulers/euler
489
+ title: EulerDiscreteScheduler
490
+ - local: api/schedulers/flow_match_euler_discrete
491
+ title: FlowMatchEulerDiscreteScheduler
492
+ - local: api/schedulers/flow_match_heun_discrete
493
+ title: FlowMatchHeunDiscreteScheduler
494
+ - local: api/schedulers/heun
495
+ title: HeunDiscreteScheduler
496
+ - local: api/schedulers/ipndm
497
+ title: IPNDMScheduler
498
+ - local: api/schedulers/stochastic_karras_ve
499
+ title: KarrasVeScheduler
500
+ - local: api/schedulers/dpm_discrete_ancestral
501
+ title: KDPM2AncestralDiscreteScheduler
502
+ - local: api/schedulers/dpm_discrete
503
+ title: KDPM2DiscreteScheduler
504
+ - local: api/schedulers/lcm
505
+ title: LCMScheduler
506
+ - local: api/schedulers/lms_discrete
507
+ title: LMSDiscreteScheduler
508
+ - local: api/schedulers/pndm
509
+ title: PNDMScheduler
510
+ - local: api/schedulers/repaint
511
+ title: RePaintScheduler
512
+ - local: api/schedulers/score_sde_ve
513
+ title: ScoreSdeVeScheduler
514
+ - local: api/schedulers/score_sde_vp
515
+ title: ScoreSdeVpScheduler
516
+ - local: api/schedulers/tcd
517
+ title: TCDScheduler
518
+ - local: api/schedulers/unipc
519
+ title: UniPCMultistepScheduler
520
+ - local: api/schedulers/vq_diffusion
521
+ title: VQDiffusionScheduler
522
+ title: Schedulers
523
+ - isExpanded: false
524
+ sections:
525
+ - local: api/internal_classes_overview
526
+ title: Overview
527
+ - local: api/attnprocessor
528
+ title: Attention Processor
529
+ - local: api/activations
530
+ title: Custom activation functions
531
+ - local: api/normalization
532
+ title: Custom normalization layers
533
+ - local: api/utilities
534
+ title: Utilities
535
+ - local: api/image_processor
536
+ title: VAE Image Processor
537
+ - local: api/video_processor
538
+ title: Video Processor
539
+ title: Internal classes
540
+ title: API
diffusers/docs/source/en/advanced_inference/outpaint.md ADDED
@@ -0,0 +1,231 @@
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # Outpainting
14
+
15
+ Outpainting extends an image beyond its original boundaries, allowing you to add, replace, or modify visual elements in an image while preserving the original image. Like [inpainting](../using-diffusers/inpaint), you want to fill the white area (in this case, the area outside of the original image) with new visual elements while keeping the original image (represented by a mask of black pixels). There are a couple of ways to outpaint, such as with a [ControlNet](https://hf.co/blog/OzzyGT/outpainting-controlnet) or with [Differential Diffusion](https://hf.co/blog/OzzyGT/outpainting-differential-diffusion).
16
+
17
+ This guide will show you how to outpaint with an inpainting model, ControlNet, and a ZoeDepth estimator.
18
+
19
+ Before you begin, make sure you have the [controlnet_aux](https://github.com/huggingface/controlnet_aux) library installed so you can use the ZoeDepth estimator.
20
+
21
+ ```py
22
+ !pip install -q controlnet_aux
23
+ ```
24
+
25
+ ## Image preparation
26
+
27
+ Start by picking an image to outpaint with and remove the background with a Space like [BRIA-RMBG-1.4](https://hf.co/spaces/briaai/BRIA-RMBG-1.4).
28
+
29
+ <iframe
30
+ src="https://briaai-bria-rmbg-1-4.hf.space"
31
+ frameborder="0"
32
+ width="850"
33
+ height="450"
34
+ ></iframe>
35
+
36
+ For example, remove the background from this image of a pair of shoes.
37
+
38
+ <div class="flex flex-row gap-4">
39
+ <div class="flex-1">
40
+ <img class="rounded-xl" src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/original-jordan.png"/>
41
+ <figcaption class="mt-2 text-center text-sm text-gray-500">original image</figcaption>
42
+ </div>
43
+ <div class="flex-1">
44
+ <img class="rounded-xl" src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/no-background-jordan.png"/>
45
+ <figcaption class="mt-2 text-center text-sm text-gray-500">background removed</figcaption>
46
+ </div>
47
+ </div>
48
+
49
+ [Stable Diffusion XL (SDXL)](../using-diffusers/sdxl) models work best with 1024x1024 images, but you can resize the image to any size as long as your hardware has enough memory to support it. The transparent background in the image should also be replaced with a white background. Create a function (like the one below) that scales and pastes the image onto a white background.
50
+
51
+ ```py
52
+ import random
53
+
54
+ import requests
55
+ import torch
56
+ from controlnet_aux import ZoeDetector
57
+ from PIL import Image, ImageOps
58
+
59
+ from diffusers import (
60
+ AutoencoderKL,
61
+ ControlNetModel,
62
+ StableDiffusionXLControlNetPipeline,
63
+ StableDiffusionXLInpaintPipeline,
64
+ )
65
+
66
+ def scale_and_paste(original_image):
67
+ aspect_ratio = original_image.width / original_image.height
68
+
69
+ if original_image.width > original_image.height:
70
+ new_width = 1024
71
+ new_height = round(new_width / aspect_ratio)
72
+ else:
73
+ new_height = 1024
74
+ new_width = round(new_height * aspect_ratio)
75
+
76
+ resized_original = original_image.resize((new_width, new_height), Image.LANCZOS)
77
+ white_background = Image.new("RGBA", (1024, 1024), "white")
78
+ x = (1024 - new_width) // 2
79
+ y = (1024 - new_height) // 2
80
+ white_background.paste(resized_original, (x, y), resized_original)
81
+
82
+ return resized_original, white_background
83
+
84
+ original_image = Image.open(
85
+ requests.get(
86
+ "https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/no-background-jordan.png",
87
+ stream=True,
88
+ ).raw
89
+ ).convert("RGBA")
90
+ resized_img, white_bg_image = scale_and_paste(original_image)
91
+ ```
92
+
93
+ To avoid adding unwanted extra details, use the ZoeDepth estimator to provide additional guidance during generation and to ensure the shoes remain consistent with the original image.
94
+
95
+ ```py
96
+ zoe = ZoeDetector.from_pretrained("lllyasviel/Annotators")
97
+ image_zoe = zoe(white_bg_image, detect_resolution=512, image_resolution=1024)
98
+ image_zoe
99
+ ```
100
+
101
+ <div class="flex justify-center">
102
+ <img src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/zoedepth-jordan.png"/>
103
+ </div>
104
+
105
+ ## Outpaint
106
+
107
+ Once your image is ready, you can generate content in the white area around the shoes with [controlnet-inpaint-dreamer-sdxl](https://hf.co/destitech/controlnet-inpaint-dreamer-sdxl), an SDXL ControlNet trained for inpainting.
108
+
109
+ Load the inpainting ControlNet, the ZoeDepth model, and the VAE, and pass them to the [`StableDiffusionXLControlNetPipeline`]. Then you can create an optional `generate_image` function (for convenience) to outpaint an initial image.
110
+
111
+ ```py
112
+ controlnets = [
113
+ ControlNetModel.from_pretrained(
114
+ "destitech/controlnet-inpaint-dreamer-sdxl", torch_dtype=torch.float16, variant="fp16"
115
+ ),
116
+ ControlNetModel.from_pretrained(
117
+ "diffusers/controlnet-zoe-depth-sdxl-1.0", torch_dtype=torch.float16
118
+ ),
119
+ ]
120
+ vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")
121
+ pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
122
+ "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16, variant="fp16", controlnet=controlnets, vae=vae
123
+ ).to("cuda")
124
+
125
+ def generate_image(prompt, negative_prompt, inpaint_image, zoe_image, seed: int = None):
126
+ if seed is None:
127
+ seed = random.randint(0, 2**32 - 1)
128
+
129
+ generator = torch.Generator(device="cpu").manual_seed(seed)
130
+
131
+ image = pipeline(
132
+ prompt,
133
+ negative_prompt=negative_prompt,
134
+ image=[inpaint_image, zoe_image],
135
+ guidance_scale=6.5,
136
+ num_inference_steps=25,
137
+ generator=generator,
138
+ controlnet_conditioning_scale=[0.5, 0.8],
139
+ control_guidance_end=[0.9, 0.6],
140
+ ).images[0]
141
+
142
+ return image
143
+
144
+ prompt = "nike air jordans on a basketball court"
145
+ negative_prompt = ""
146
+
147
+ temp_image = generate_image(prompt, negative_prompt, white_bg_image, image_zoe, 908097)
148
+ ```
149
+
150
+ Paste the original image over the initial outpainted image. You'll improve the outpainted background in a later step.
151
+
152
+ ```py
153
+ x = (1024 - resized_img.width) // 2
154
+ y = (1024 - resized_img.height) // 2
155
+ temp_image.paste(resized_img, (x, y), resized_img)
156
+ temp_image
157
+ ```
158
+
159
+ <div class="flex justify-center">
160
+ <img src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/initial-outpaint.png"/>
161
+ </div>
162
+
163
+ > [!TIP]
164
+ > Now is a good time to free up some memory if you're running low!
165
+ >
166
+ > ```py
167
+ > pipeline=None
168
+ > torch.cuda.empty_cache()
169
+ > ```
170
+
171
+ Now that you have an initial outpainted image, load the [`StableDiffusionXLInpaintPipeline`] with the [RealVisXL](https://hf.co/SG161222/RealVisXL_V4.0) model to generate the final outpainted image with better quality.
172
+
173
+ ```py
174
+ pipeline = StableDiffusionXLInpaintPipeline.from_pretrained(
175
+ "OzzyGT/RealVisXL_V4.0_inpainting",
176
+ torch_dtype=torch.float16,
177
+ variant="fp16",
178
+ vae=vae,
179
+ ).to("cuda")
180
+ ```
181
+
182
+ Prepare a mask for the final outpainted image. To create a more natural transition between the original image and the outpainted background, blur the mask to help it blend better.
183
+
184
+ ```py
185
+ mask = Image.new("L", temp_image.size)
186
+ mask.paste(resized_img.split()[3], (x, y))
187
+ mask = ImageOps.invert(mask)
188
+ final_mask = mask.point(lambda p: p > 128 and 255)
189
+ mask_blurred = pipeline.mask_processor.blur(final_mask, blur_factor=20)
190
+ mask_blurred
191
+ ```
192
+
193
+ <div class="flex justify-center">
194
+ <img src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/blurred-mask.png"/>
195
+ </div>
196
+
197
+ Create a better prompt and pass it to the `generate_outpaint` function to generate the final outpainted image. Again, paste the original image over the final outpainted background.
198
+
199
+ ```py
200
+ def generate_outpaint(prompt, negative_prompt, image, mask, seed: int = None):
201
+ if seed is None:
202
+ seed = random.randint(0, 2**32 - 1)
203
+
204
+ generator = torch.Generator(device="cpu").manual_seed(seed)
205
+
206
+ image = pipeline(
207
+ prompt,
208
+ negative_prompt=negative_prompt,
209
+ image=image,
210
+ mask_image=mask,
211
+ guidance_scale=10.0,
212
+ strength=0.8,
213
+ num_inference_steps=30,
214
+ generator=generator,
215
+ ).images[0]
216
+
217
+ return image
218
+
219
+ prompt = "high quality photo of nike air jordans on a basketball court, highly detailed"
220
+ negative_prompt = ""
221
+
222
+ final_image = generate_outpaint(prompt, negative_prompt, temp_image, mask_blurred, 7688778)
223
+ x = (1024 - resized_img.width) // 2
224
+ y = (1024 - resized_img.height) // 2
225
+ final_image.paste(resized_img, (x, y), resized_img)
226
+ final_image
227
+ ```
228
+
229
+ <div class="flex justify-center">
230
+ <img src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/final-outpaint.png"/>
231
+ </div>
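+
+ If you want to keep the result, you can save it like any other PIL image (the file name is arbitrary):
+
+ ```py
+ final_image.save("outpainted-shoes.png")
+ ```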
diffusers/docs/source/en/api/activations.md ADDED
@@ -0,0 +1,27 @@
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # Activation functions
14
+
15
+ Customized activation functions for supporting various models in 🤗 Diffusers.
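+
+ As a quick sketch of how these modules are used (the dimensions below are arbitrary), they behave like regular PyTorch modules:
+
+ ```py
+ import torch
+ from diffusers.models.activations import GEGLU
+
+ # GEGLU projects dim_in to 2 * dim_out internally, then gates one half with GELU
+ act = GEGLU(dim_in=64, dim_out=128)
+ hidden_states = torch.randn(2, 16, 64)  # (batch, sequence, features)
+ output = act(hidden_states)             # shape: (2, 16, 128)
+ ```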
16
+
17
+ ## GELU
18
+
19
+ [[autodoc]] models.activations.GELU
20
+
21
+ ## GEGLU
22
+
23
+ [[autodoc]] models.activations.GEGLU
24
+
25
+ ## ApproximateGELU
26
+
27
+ [[autodoc]] models.activations.ApproximateGELU
diffusers/docs/source/en/api/attnprocessor.md ADDED
@@ -0,0 +1,54 @@
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # Attention Processor
14
+
15
+ An attention processor is a class for applying different types of attention mechanisms.
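+
+ For example, a processor can be swapped in on a model's attention layers. The sketch below assumes a Stable Diffusion checkpoint (the model name is only an example) and switches the UNet to the PyTorch 2.0 scaled dot-product attention processor:
+
+ ```py
+ import torch
+ from diffusers import StableDiffusionPipeline
+ from diffusers.models.attention_processor import AttnProcessor2_0
+
+ pipeline = StableDiffusionPipeline.from_pretrained(
+     "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
+ ).to("cuda")
+
+ # replace every attention processor in the UNet with the PyTorch 2.0 implementation
+ pipeline.unet.set_attn_processor(AttnProcessor2_0())
+ ```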
16
+
17
+ ## AttnProcessor
18
+ [[autodoc]] models.attention_processor.AttnProcessor
19
+
20
+ ## AttnProcessor2_0
21
+ [[autodoc]] models.attention_processor.AttnProcessor2_0
22
+
23
+ ## AttnAddedKVProcessor
24
+ [[autodoc]] models.attention_processor.AttnAddedKVProcessor
25
+
26
+ ## AttnAddedKVProcessor2_0
27
+ [[autodoc]] models.attention_processor.AttnAddedKVProcessor2_0
28
+
29
+ ## CrossFrameAttnProcessor
30
+ [[autodoc]] pipelines.text_to_video_synthesis.pipeline_text_to_video_zero.CrossFrameAttnProcessor
31
+
32
+ ## CustomDiffusionAttnProcessor
33
+ [[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor
34
+
35
+ ## CustomDiffusionAttnProcessor2_0
36
+ [[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor2_0
37
+
38
+ ## CustomDiffusionXFormersAttnProcessor
39
+ [[autodoc]] models.attention_processor.CustomDiffusionXFormersAttnProcessor
40
+
41
+ ## FusedAttnProcessor2_0
42
+ [[autodoc]] models.attention_processor.FusedAttnProcessor2_0
43
+
44
+ ## SlicedAttnProcessor
45
+ [[autodoc]] models.attention_processor.SlicedAttnProcessor
46
+
47
+ ## SlicedAttnAddedKVProcessor
48
+ [[autodoc]] models.attention_processor.SlicedAttnAddedKVProcessor
49
+
50
+ ## XFormersAttnProcessor
51
+ [[autodoc]] models.attention_processor.XFormersAttnProcessor
52
+
53
+ ## AttnProcessorNPU
54
+ [[autodoc]] models.attention_processor.AttnProcessorNPU
diffusers/docs/source/en/api/configuration.md ADDED
@@ -0,0 +1,30 @@
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # Configuration
14
+
15
+ Schedulers from [`~schedulers.scheduling_utils.SchedulerMixin`] and models from [`ModelMixin`] inherit from [`ConfigMixin`], which stores all the parameters passed to their respective `__init__` methods in a JSON configuration file.
16
+
17
+ <Tip>
18
+
19
+ To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with `huggingface-cli login`.
20
+
21
+ </Tip>
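+
+ As a small illustration (the checkpoint below is only an example), the parameters passed to a scheduler's `__init__` are stored on its `config` attribute and can be written back to disk with [`~ConfigMixin.save_config`]:
+
+ ```py
+ from diffusers import DDPMScheduler
+
+ scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256", subfolder="scheduler")
+ print(scheduler.config.num_train_timesteps)  # parameter recorded in scheduler_config.json
+ scheduler.save_config("my-scheduler")        # writes scheduler_config.json to ./my-scheduler
+ ```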
22
+
23
+ ## ConfigMixin
24
+
25
+ [[autodoc]] ConfigMixin
26
+ - load_config
27
+ - from_config
28
+ - save_config
29
+ - to_json_file
30
+ - to_json_string
diffusers/docs/source/en/api/image_processor.md ADDED
@@ -0,0 +1,35 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # VAE Image Processor
14
+
15
+ The [`VaeImageProcessor`] provides a unified API for [`StableDiffusionPipeline`]s to prepare image inputs for VAE encoding and to postprocess the outputs once they're decoded. This includes transformations such as resizing, normalization, and conversion between PIL Image, PyTorch, and NumPy arrays.
16
+
17
+ All pipelines with [`VaeImageProcessor`] accept PIL Image, PyTorch tensor, or NumPy array inputs and return outputs based on the `output_type` argument specified by the user. You can pass encoded image latents directly to the pipeline and return latents from the pipeline as a specific output with the `output_type` argument (for example `output_type="latent"`). This allows you to take the generated latents from one pipeline and pass them to another pipeline as input without leaving the latent space. It also makes it much easier to use multiple pipelines together by passing PyTorch tensors directly between them.
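+
+ For instance, the sketch below stays in latent space between the SDXL base and refiner pipelines (the checkpoint names are only illustrative):
+
+ ```py
+ import torch
+ from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
+
+ base = StableDiffusionXLPipeline.from_pretrained(
+     "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
+ ).to("cuda")
+ refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
+     "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
+ ).to("cuda")
+
+ prompt = "a majestic lion jumping from a big stone at night"
+
+ # output_type="latent" skips VAE decoding, so the tensors stay in latent space
+ latents = base(prompt=prompt, output_type="latent").images
+ image = refiner(prompt=prompt, image=latents).images[0]
+ ```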
18
+
19
+ ## VaeImageProcessor
20
+
21
+ [[autodoc]] image_processor.VaeImageProcessor
22
+
23
+ ## VaeImageProcessorLDM3D
24
+
25
+ The [`VaeImageProcessorLDM3D`] accepts RGB and depth inputs and returns RGB and depth outputs.
26
+
27
+ [[autodoc]] image_processor.VaeImageProcessorLDM3D
28
+
29
+ ## PixArtImageProcessor
30
+
31
+ [[autodoc]] image_processor.PixArtImageProcessor
32
+
33
+ ## IPAdapterMaskProcessor
34
+
35
+ [[autodoc]] image_processor.IPAdapterMaskProcessor
diffusers/docs/source/en/api/internal_classes_overview.md ADDED
@@ -0,0 +1,15 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # Overview
14
+
15
+ The APIs in this section are more experimental and prone to breaking changes. Most of them are used internally for development, but they may also be useful to you if you're interested in building a diffusion model with some custom parts or if you're interested in some of our helper utilities for working with 🤗 Diffusers.
diffusers/docs/source/en/api/loaders/ip_adapter.md ADDED
@@ -0,0 +1,29 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # IP-Adapter
14
+
15
+ [IP-Adapter](https://hf.co/papers/2308.06721) is a lightweight adapter that enables prompting a diffusion model with an image. It works by decoupling the cross-attention layers for the image and text features, and the image features are generated by an image encoder.
16
+
17
+ <Tip>
18
+
19
+ Learn how to load an IP-Adapter checkpoint and image in the IP-Adapter [loading](../../using-diffusers/loading_adapters#ip-adapter) guide, and you can see how to use it in the [usage](../../using-diffusers/ip_adapter) guide.
20
+
21
+ </Tip>
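+
+ The snippet below is a brief sketch of the typical flow (the checkpoint and image URL are only examples):
+
+ ```py
+ import torch
+ from diffusers import StableDiffusionPipeline
+ from diffusers.utils import load_image
+
+ pipe = StableDiffusionPipeline.from_pretrained(
+     "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
+ ).to("cuda")
+ pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
+ pipe.set_ip_adapter_scale(0.6)  # how strongly the image prompt influences generation
+
+ image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner.png")
+ out = pipe(prompt="a polar bear sitting in a chair drinking a milkshake", ip_adapter_image=image).images[0]
+ ```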
22
+
23
+ ## IPAdapterMixin
24
+
25
+ [[autodoc]] loaders.ip_adapter.IPAdapterMixin
26
+
27
+ ## IPAdapterMaskProcessor
28
+
29
+ [[autodoc]] image_processor.IPAdapterMaskProcessor
diffusers/docs/source/en/api/loaders/lora.md ADDED
@@ -0,0 +1,47 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # LoRA
14
+
15
+ LoRA is a fast and lightweight training method that inserts and trains a significantly smaller number of parameters instead of all the model parameters. This produces a smaller file (~100 MB) and makes it easier to quickly train a model to learn a new concept. LoRA weights are typically loaded into the denoiser, the text encoder, or both. The denoiser usually corresponds to a UNet ([`UNet2DConditionModel`], for example) or a Transformer ([`SD3Transformer2DModel`], for example). There are several classes for loading LoRA weights:
16
+
17
+ - [`StableDiffusionLoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and otherwise managing LoRA weights. This class can be used with any model.
18
+ - [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`StableDiffusionLoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model.
19
+ - [`SD3LoraLoaderMixin`] provides similar functions for [Stable Diffusion 3](https://huggingface.co/blog/sd3).
20
+ - [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
21
+ - [`LoraBaseMixin`] provides a base class with several utility methods for fusing, unfusing, and unloading LoRAs, and more.
22
+
23
+ <Tip>
24
+
25
+ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.
26
+
27
+ </Tip>
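+
+ A minimal sketch of the common flow, assuming an SDXL LoRA checkpoint from the Hub (the repository and file names are only examples):
+
+ ```py
+ import torch
+ from diffusers import StableDiffusionXLPipeline
+
+ pipe = StableDiffusionXLPipeline.from_pretrained(
+     "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
+ ).to("cuda")
+
+ # load_lora_weights comes from StableDiffusionXLLoraLoaderMixin
+ pipe.load_lora_weights(
+     "ostris/ikea-instructions-lora-sdxl",
+     weight_name="ikea_instructions_xl_v1_5.safetensors",
+     adapter_name="ikea",
+ )
+
+ pipe.fuse_lora(lora_scale=0.7)  # optionally merge the LoRA into the base weights
+ image = pipe("a lego set of a spaceship, ikea instructions").images[0]
+ ```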
28
+
29
+ ## StableDiffusionLoraLoaderMixin
30
+
31
+ [[autodoc]] loaders.lora_pipeline.StableDiffusionLoraLoaderMixin
32
+
33
+ ## StableDiffusionXLLoraLoaderMixin
34
+
35
+ [[autodoc]] loaders.lora_pipeline.StableDiffusionXLLoraLoaderMixin
36
+
37
+ ## SD3LoraLoaderMixin
38
+
39
+ [[autodoc]] loaders.lora_pipeline.SD3LoraLoaderMixin
40
+
41
+ ## AmusedLoraLoaderMixin
42
+
43
+ [[autodoc]] loaders.lora_pipeline.AmusedLoraLoaderMixin
44
+
45
+ ## LoraBaseMixin
46
+
47
+ [[autodoc]] loaders.lora_base.LoraBaseMixin
diffusers/docs/source/en/api/loaders/peft.md ADDED
@@ -0,0 +1,25 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # PEFT
14
+
15
+ Diffusers supports loading adapters such as [LoRA](../../using-diffusers/loading_adapters) with the [PEFT](https://huggingface.co/docs/peft/index) library through the [`~loaders.peft.PeftAdapterMixin`] class. This allows modeling classes in Diffusers, like [`UNet2DConditionModel`] and [`SD3Transformer2DModel`], to operate with an adapter.
16
+
17
+ <Tip>
18
+
19
+ Refer to the [Inference with PEFT](../../tutorials/using_peft_for_inference.md) tutorial for an overview of how to use PEFT in Diffusers for inference.
20
+
21
+ </Tip>
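+
+ With the PEFT backend installed, multiple adapters can be loaded and combined; the sketch below combines two LoRA adapters (the repository names are only examples):
+
+ ```py
+ import torch
+ from diffusers import StableDiffusionXLPipeline
+
+ pipe = StableDiffusionXLPipeline.from_pretrained(
+     "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
+ ).to("cuda")
+
+ pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy")
+ pipe.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel")
+
+ # Activate both adapters with per-adapter weights
+ pipe.set_adapters(["toy", "pixel"], adapter_weights=[0.8, 0.6])
+ image = pipe("toy_face of a hacker with a hoodie, pixel art").images[0]
+ ```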
22
+
23
+ ## PeftAdapterMixin
24
+
25
+ [[autodoc]] loaders.peft.PeftAdapterMixin
diffusers/docs/source/en/api/loaders/single_file.md ADDED
@@ -0,0 +1,62 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # Single files
14
+
15
+ The [`~loaders.FromSingleFileMixin.from_single_file`] method allows you to load:
16
+
17
+ * a model stored in a single file, which is useful if you're working with models from the diffusion ecosystem, like Automatic1111, that commonly rely on a single-file layout to store and share models
18
+ * a model stored in its originally distributed layout, which is useful if you're working with models finetuned with other services and want to load them directly into Diffusers model objects and pipelines
19
+
20
+ > [!TIP]
21
+ > Read the [Model files and layouts](../../using-diffusers/other-formats) guide to learn more about the Diffusers-multifolder layout versus the single-file layout, and how to load models stored in these different layouts.
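+
+ For example, a hedged sketch of both cases (the checkpoint URLs are only examples):
+
+ ```py
+ import torch
+ from diffusers import StableDiffusionXLPipeline, AutoencoderKL
+
+ # 1. A full pipeline from a single-file checkpoint
+ pipe = StableDiffusionXLPipeline.from_single_file(
+     "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors",
+     torch_dtype=torch.float16,
+ )
+
+ # 2. A single model from a single-file checkpoint
+ vae = AutoencoderKL.from_single_file(
+     "https://huggingface.co/stabilityai/sd-vae-ft-mse-original/blob/main/vae-ft-mse-840000-ema-pruned.safetensors"
+ )
+ ```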
22
+
23
+ ## Supported pipelines
24
+
25
+ - [`StableDiffusionPipeline`]
26
+ - [`StableDiffusionImg2ImgPipeline`]
27
+ - [`StableDiffusionInpaintPipeline`]
28
+ - [`StableDiffusionControlNetPipeline`]
29
+ - [`StableDiffusionControlNetImg2ImgPipeline`]
30
+ - [`StableDiffusionControlNetInpaintPipeline`]
31
+ - [`StableDiffusionUpscalePipeline`]
32
+ - [`StableDiffusionXLPipeline`]
33
+ - [`StableDiffusionXLImg2ImgPipeline`]
34
+ - [`StableDiffusionXLInpaintPipeline`]
35
+ - [`StableDiffusionXLInstructPix2PixPipeline`]
36
+ - [`StableDiffusionXLControlNetPipeline`]
37
+ - [`StableDiffusionXLKDiffusionPipeline`]
38
+ - [`StableDiffusion3Pipeline`]
39
+ - [`LatentConsistencyModelPipeline`]
40
+ - [`LatentConsistencyModelImg2ImgPipeline`]
41
+ - [`StableDiffusionControlNetXSPipeline`]
42
+ - [`StableDiffusionXLControlNetXSPipeline`]
43
+ - [`LEditsPPPipelineStableDiffusion`]
44
+ - [`LEditsPPPipelineStableDiffusionXL`]
45
+ - [`PIAPipeline`]
46
+
47
+ ## Supported models
48
+
49
+ - [`UNet2DConditionModel`]
50
+ - [`StableCascadeUNet`]
51
+ - [`AutoencoderKL`]
52
+ - [`ControlNetModel`]
53
+ - [`SD3Transformer2DModel`]
54
+ - [`FluxTransformer2DModel`]
55
+
56
+ ## FromSingleFileMixin
57
+
58
+ [[autodoc]] loaders.single_file.FromSingleFileMixin
59
+
60
+ ## FromOriginalModelMixin
61
+
62
+ [[autodoc]] loaders.single_file_model.FromOriginalModelMixin
diffusers/docs/source/en/api/loaders/textual_inversion.md ADDED
@@ -0,0 +1,27 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # Textual Inversion
14
+
15
+ Textual Inversion is a training method for personalizing models by learning new text embeddings from a few example images. The file produced from training is extremely small (a few KBs) and the new embeddings can be loaded into the text encoder.
16
+
17
+ [`TextualInversionLoaderMixin`] provides a function for loading Textual Inversion embeddings from Diffusers and Automatic1111 into the text encoder and loading a special token to activate the embeddings.
18
+
19
+ <Tip>
20
+
21
+ To learn more about how to load Textual Inversion embeddings, see the [Textual Inversion](../../using-diffusers/loading_adapters#textual-inversion) loading guide.
22
+
23
+ </Tip>
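+
+ A minimal sketch, assuming a Textual Inversion embedding from the `sd-concepts-library` (the checkpoint name is only an example):
+
+ ```py
+ import torch
+ from diffusers import StableDiffusionPipeline
+
+ pipe = StableDiffusionPipeline.from_pretrained(
+     "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
+ ).to("cuda")
+
+ # Loads the embedding and registers its activation token, here <cat-toy>
+ pipe.load_textual_inversion("sd-concepts-library/cat-toy")
+ image = pipe("a photo of a <cat-toy> on a beach").images[0]
+ ```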
24
+
25
+ ## TextualInversionLoaderMixin
26
+
27
+ [[autodoc]] loaders.textual_inversion.TextualInversionLoaderMixin
diffusers/docs/source/en/api/loaders/unet.md ADDED
@@ -0,0 +1,27 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # UNet
14
+
15
+ Some training methods - like LoRA and Custom Diffusion - typically target the UNet's attention layers, but these training methods can also target other non-attention layers. Instead of training all of a model's parameters, only a subset of the parameters are trained, which is faster and more efficient. This class is useful if you're *only* loading weights into a UNet. If you need to load weights into the text encoder or a text encoder and UNet, try using the [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`] function instead.
16
+
17
+ The [`UNet2DConditionLoadersMixin`] class provides functions for loading and saving weights, fusing and unfusing LoRAs, disabling and enabling LoRAs, and setting and deleting adapters.
18
+
19
+ <Tip>
20
+
21
+ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.
22
+
23
+ </Tip>
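+
+ A hedged sketch of loading LoRA weights into just the UNet; the base model and repository name are only examples of a UNet-only LoRA checkpoint:
+
+ ```py
+ import torch
+ from diffusers import AutoPipelineForText2Image
+
+ pipe = AutoPipelineForText2Image.from_pretrained(
+     "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
+ ).to("cuda")
+
+ # Loads LoRA layers into the UNet only (UNet2DConditionLoadersMixin)
+ pipe.unet.load_attn_procs("sayakpaul/sd-model-finetuned-lora-t4")
+ image = pipe("a pokemon with blue eyes").images[0]
+ ```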
24
+
25
+ ## UNet2DConditionLoadersMixin
26
+
27
+ [[autodoc]] loaders.unet.UNet2DConditionLoadersMixin
diffusers/docs/source/en/api/logging.md ADDED
@@ -0,0 +1,96 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # Logging
14
+
15
+ 🤗 Diffusers has a centralized logging system to easily manage the verbosity of the library. The default verbosity is set to `WARNING`.
16
+
17
+ To change the verbosity level, use one of the direct setters. For instance, to change the verbosity to the `INFO` level:
18
+
19
+ ```python
20
+ import diffusers
21
+
22
+ diffusers.logging.set_verbosity_info()
23
+ ```
24
+
25
+ You can also use the environment variable `DIFFUSERS_VERBOSITY` to override the default verbosity. You can set it
26
+ to one of the following: `debug`, `info`, `warning`, `error`, `critical`. For example:
27
+
28
+ ```bash
29
+ DIFFUSERS_VERBOSITY=error ./myprogram.py
30
+ ```
31
+
32
+ Additionally, some `warnings` can be disabled by setting the environment variable
33
+ `DIFFUSERS_NO_ADVISORY_WARNINGS` to a true value, like `1`. This disables any warning logged by
34
+ [`logger.warning_advice`]. For example:
35
+
36
+ ```bash
37
+ DIFFUSERS_NO_ADVISORY_WARNINGS=1 ./myprogram.py
38
+ ```
39
+
40
+ Here is an example of how to use the same logger as the library in your own module or script:
41
+
42
+ ```python
43
+ from diffusers.utils import logging
44
+
45
+ logging.set_verbosity_info()
46
+ logger = logging.get_logger("diffusers")
47
+ logger.info("INFO")
48
+ logger.warning("WARN")
49
+ ```
50
+
51
+
52
+ All methods of the logging module are documented below. The main methods are
53
+ [`logging.get_verbosity`] to get the current level of verbosity in the logger and
54
+ [`logging.set_verbosity`] to set the verbosity to the level of your choice.
55
+
56
+ In order from the least verbose to the most verbose:
57
+
58
+ | Method | Integer value | Description |
59
+ |----------------------------------------------------------:|--------------:|----------------------------------------------------:|
60
+ | `diffusers.logging.CRITICAL` or `diffusers.logging.FATAL` | 50 | only report the most critical errors |
61
+ | `diffusers.logging.ERROR` | 40 | only report errors |
62
+ | `diffusers.logging.WARNING` or `diffusers.logging.WARN` | 30 | only report errors and warnings (default) |
63
+ | `diffusers.logging.INFO` | 20 | only report errors, warnings, and basic information |
64
+ | `diffusers.logging.DEBUG` | 10 | report all information |
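+
+ For example, to check the current level and then restrict logging to errors only:
+
+ ```python
+ import diffusers
+
+ diffusers.logging.get_verbosity()                         # 30 (WARNING) by default
+ diffusers.logging.set_verbosity(diffusers.logging.ERROR)
+ ```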
65
+
66
+ By default, `tqdm` progress bars are displayed during model download. [`logging.disable_progress_bar`] and [`logging.enable_progress_bar`] are used to enable or disable this behavior.
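+
+ For example:
+
+ ```python
+ from diffusers.utils import logging
+
+ logging.disable_progress_bar()  # hide tqdm progress bars during downloads
+ logging.enable_progress_bar()   # turn them back on
+ ```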
67
+
68
+ ## Base setters
69
+
70
+ [[autodoc]] utils.logging.set_verbosity_error
71
+
72
+ [[autodoc]] utils.logging.set_verbosity_warning
73
+
74
+ [[autodoc]] utils.logging.set_verbosity_info
75
+
76
+ [[autodoc]] utils.logging.set_verbosity_debug
77
+
78
+ ## Other functions
79
+
80
+ [[autodoc]] utils.logging.get_verbosity
81
+
82
+ [[autodoc]] utils.logging.set_verbosity
83
+
84
+ [[autodoc]] utils.logging.get_logger
85
+
86
+ [[autodoc]] utils.logging.enable_default_handler
87
+
88
+ [[autodoc]] utils.logging.disable_default_handler
89
+
90
+ [[autodoc]] utils.logging.enable_explicit_format
91
+
92
+ [[autodoc]] utils.logging.reset_format
93
+
94
+ [[autodoc]] utils.logging.enable_progress_bar
95
+
96
+ [[autodoc]] utils.logging.disable_progress_bar
diffusers/docs/source/en/api/models/asymmetricautoencoderkl.md ADDED
@@ -0,0 +1,60 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # AsymmetricAutoencoderKL
14
+
15
+ An improved, larger variational autoencoder (VAE) model with KL loss for the inpainting task: [Designing a Better Asymmetric VQGAN for StableDiffusion](https://arxiv.org/abs/2306.04632) by Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua.
16
+
17
+ The abstract from the paper is:
18
+
19
+ *StableDiffusion is a revolutionary text-to-image generator that is causing a stir in the world of image generation and editing. Unlike traditional methods that learn a diffusion model in pixel space, StableDiffusion learns a diffusion model in the latent space via a VQGAN, ensuring both efficiency and quality. It not only supports image generation tasks, but also enables image editing for real images, such as image inpainting and local editing. However, we have observed that the vanilla VQGAN used in StableDiffusion leads to significant information loss, causing distortion artifacts even in non-edited image regions. To this end, we propose a new asymmetric VQGAN with two simple designs. Firstly, in addition to the input from the encoder, the decoder contains a conditional branch that incorporates information from task-specific priors, such as the unmasked image region in inpainting. Secondly, the decoder is much heavier than the encoder, allowing for more detailed recovery while only slightly increasing the total inference cost. The training cost of our asymmetric VQGAN is cheap, and we only need to retrain a new asymmetric decoder while keeping the vanilla VQGAN encoder and StableDiffusion unchanged. Our asymmetric VQGAN can be widely used in StableDiffusion-based inpainting and local editing methods. Extensive experiments demonstrate that it can significantly improve the inpainting and editing performance, while maintaining the original text-to-image capability. The code is available at https://github.com/buxiangzhiren/Asymmetric_VQGAN*
20
+
21
+ Evaluation results can be found in section 4.1 of the original paper.
22
+
23
+ ## Available checkpoints
24
+
25
+ * [https://huggingface.co/cross-attention/asymmetric-autoencoder-kl-x-1-5](https://huggingface.co/cross-attention/asymmetric-autoencoder-kl-x-1-5)
26
+ * [https://huggingface.co/cross-attention/asymmetric-autoencoder-kl-x-2](https://huggingface.co/cross-attention/asymmetric-autoencoder-kl-x-2)
27
+
28
+ ## Example Usage
29
+
30
+ ```python
31
+ from diffusers import AsymmetricAutoencoderKL, StableDiffusionInpaintPipeline
32
+ from diffusers.utils import load_image, make_image_grid
33
+
34
+
35
+ prompt = "a photo of a person with beard"
36
+ img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png"
37
+ mask_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/mask_256.png"
38
+
39
+ original_image = load_image(img_url).resize((512, 512))
40
+ mask_image = load_image(mask_url).resize((512, 512))
41
+
42
+ pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
43
+ pipe.vae = AsymmetricAutoencoderKL.from_pretrained("cross-attention/asymmetric-autoencoder-kl-x-1-5")
44
+ pipe.to("cuda")
45
+
46
+ image = pipe(prompt=prompt, image=original_image, mask_image=mask_image).images[0]
47
+ make_image_grid([original_image, mask_image, image], rows=1, cols=3)
48
+ ```
49
+
50
+ ## AsymmetricAutoencoderKL
51
+
52
+ [[autodoc]] models.autoencoders.autoencoder_asym_kl.AsymmetricAutoencoderKL
53
+
54
+ ## AutoencoderKLOutput
55
+
56
+ [[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
57
+
58
+ ## DecoderOutput
59
+
60
+ [[autodoc]] models.autoencoders.vae.DecoderOutput
diffusers/docs/source/en/api/models/aura_flow_transformer2d.md ADDED
@@ -0,0 +1,19 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # AuraFlowTransformer2DModel
14
+
15
+ A Transformer model for image-like data from [AuraFlow](https://blog.fal.ai/auraflow/).
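+
+ A minimal sketch of loading the transformer, assuming the `fal/AuraFlow` checkpoint layout on the Hub:
+
+ ```py
+ import torch
+ from diffusers import AuraFlowTransformer2DModel
+
+ transformer = AuraFlowTransformer2DModel.from_pretrained(
+     "fal/AuraFlow", subfolder="transformer", torch_dtype=torch.float16
+ )
+ ```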
16
+
17
+ ## AuraFlowTransformer2DModel
18
+
19
+ [[autodoc]] AuraFlowTransformer2DModel
diffusers/docs/source/en/api/models/autoencoder_oobleck.md ADDED
@@ -0,0 +1,38 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # AutoencoderOobleck
14
+
15
+ The Oobleck variational autoencoder (VAE) model with KL loss was introduced in [Stability-AI/stable-audio-tools](https://github.com/Stability-AI/stable-audio-tools) and [Stable Audio Open](https://huggingface.co/papers/2407.14358) by Stability AI. The model is used in 🤗 Diffusers to encode audio waveforms into latents and to decode latent representations into audio waveforms.
16
+
17
+ The abstract from the paper is:
18
+
19
+ *Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.*
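+
+ A minimal sketch of loading the VAE, assuming the Stable Audio Open checkpoint layout on the Hub:
+
+ ```python
+ import torch
+ from diffusers import AutoencoderOobleck
+
+ vae = AutoencoderOobleck.from_pretrained(
+     "stabilityai/stable-audio-open-1.0", subfolder="vae", torch_dtype=torch.float16
+ )
+ ```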
20
+
21
+ ## AutoencoderOobleck
22
+
23
+ [[autodoc]] AutoencoderOobleck
24
+ - decode
25
+ - encode
26
+ - all
27
+
28
+ ## OobleckDecoderOutput
29
+
30
+ [[autodoc]] models.autoencoders.autoencoder_oobleck.OobleckDecoderOutput
31
+
36
+ ## AutoencoderOobleckOutput
37
+
38
+ [[autodoc]] models.autoencoders.autoencoder_oobleck.AutoencoderOobleckOutput
diffusers/docs/source/en/api/models/autoencoder_tiny.md ADDED
@@ -0,0 +1,57 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # Tiny AutoEncoder
14
+
15
+ Tiny AutoEncoder for Stable Diffusion (TAESD) was introduced in [madebyollin/taesd](https://github.com/madebyollin/taesd) by Ollin Boer Bohan. It is a tiny distilled version of Stable Diffusion's VAE that can quickly decode the latents in a [`StableDiffusionPipeline`] or [`StableDiffusionXLPipeline`] almost instantly.
16
+
17
+ To use with Stable Diffusion v2.1:
18
+
19
+ ```python
20
+ import torch
21
+ from diffusers import DiffusionPipeline, AutoencoderTiny
22
+
23
+ pipe = DiffusionPipeline.from_pretrained(
24
+ "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
25
+ )
26
+ pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd", torch_dtype=torch.float16)
27
+ pipe = pipe.to("cuda")
28
+
29
+ prompt = "slice of delicious New York-style berry cheesecake"
30
+ image = pipe(prompt, num_inference_steps=25).images[0]
31
+ image
32
+ ```
33
+
34
+ To use with Stable Diffusion XL 1.0:
35
+
36
+ ```python
37
+ import torch
38
+ from diffusers import DiffusionPipeline, AutoencoderTiny
39
+
40
+ pipe = DiffusionPipeline.from_pretrained(
41
+ "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
42
+ )
43
+ pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesdxl", torch_dtype=torch.float16)
44
+ pipe = pipe.to("cuda")
45
+
46
+ prompt = "slice of delicious New York-style berry cheesecake"
47
+ image = pipe(prompt, num_inference_steps=25).images[0]
48
+ image
49
+ ```
50
+
51
+ ## AutoencoderTiny
52
+
53
+ [[autodoc]] AutoencoderTiny
54
+
55
+ ## AutoencoderTinyOutput
56
+
57
+ [[autodoc]] models.autoencoders.autoencoder_tiny.AutoencoderTinyOutput
diffusers/docs/source/en/api/models/autoencoderkl.md ADDED
@@ -0,0 +1,58 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # AutoencoderKL
14
+
15
+ The variational autoencoder (VAE) model with KL loss was introduced in [Auto-Encoding Variational Bayes](https://arxiv.org/abs/1312.6114v11) by Diederik P. Kingma and Max Welling. The model is used in 🤗 Diffusers to encode images into latents and to decode latent representations into images.
16
+
17
+ The abstract from the paper is:
18
+
19
+ *How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions are two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.*
20
+
21
+ ## Loading from the original format
22
+
23
+ By default, the [`AutoencoderKL`] should be loaded with [`~ModelMixin.from_pretrained`], but it can also be loaded
24
+ from the original format using [`FromOriginalModelMixin.from_single_file`] as follows:
25
+
26
+ ```py
27
+ from diffusers import AutoencoderKL
28
+
29
+ url = "https://huggingface.co/stabilityai/sd-vae-ft-mse-original/blob/main/vae-ft-mse-840000-ema-pruned.safetensors" # can also be a local file
30
+ model = AutoencoderKL.from_single_file(url)
31
+ ```
32
+
33
+ ## AutoencoderKL
34
+
35
+ [[autodoc]] AutoencoderKL
36
+ - decode
37
+ - encode
38
+ - all
39
+
40
+ ## AutoencoderKLOutput
41
+
42
+ [[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
43
+
44
+ ## DecoderOutput
45
+
46
+ [[autodoc]] models.autoencoders.vae.DecoderOutput
47
+
48
+ ## FlaxAutoencoderKL
49
+
50
+ [[autodoc]] FlaxAutoencoderKL
51
+
52
+ ## FlaxAutoencoderKLOutput
53
+
54
+ [[autodoc]] models.vae_flax.FlaxAutoencoderKLOutput
55
+
56
+ ## FlaxDecoderOutput
57
+
58
+ [[autodoc]] models.vae_flax.FlaxDecoderOutput
diffusers/docs/source/en/api/models/autoencoderkl_cogvideox.md ADDED
@@ -0,0 +1,37 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License. -->
11
+
12
+ # AutoencoderKLCogVideoX
13
+
14
+ The 3D variational autoencoder (VAE) model with KL loss used in [CogVideoX](https://github.com/THUDM/CogVideo) was introduced in [CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer](https://github.com/THUDM/CogVideo/blob/main/resources/CogVideoX.pdf) by Tsinghua University & ZhipuAI.
15
+
16
+ The model can be loaded with the following code snippet.
17
+
18
+ ```python
19
+ import torch
+ from diffusers import AutoencoderKLCogVideoX
20
+
21
+ vae = AutoencoderKLCogVideoX.from_pretrained("THUDM/CogVideoX-2b", subfolder="vae", torch_dtype=torch.float16).to("cuda")
22
+ ```
23
+
24
+ ## AutoencoderKLCogVideoX
25
+
26
+ [[autodoc]] AutoencoderKLCogVideoX
27
+ - decode
28
+ - encode
29
+ - all
30
+
31
+ ## AutoencoderKLOutput
32
+
33
+ [[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
34
+
35
+ ## DecoderOutput
36
+
37
+ [[autodoc]] models.autoencoders.vae.DecoderOutput
diffusers/docs/source/en/api/models/cogvideox_transformer3d.md ADDED
@@ -0,0 +1,30 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License. -->
11
+
12
+ # CogVideoXTransformer3DModel
13
+
14
+ A Diffusion Transformer model for 3D data from [CogVideoX](https://github.com/THUDM/CogVideo) was introduced in [CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer](https://github.com/THUDM/CogVideo/blob/main/resources/CogVideoX.pdf) by Tsinghua University & ZhipuAI.
15
+
16
+ The model can be loaded with the following code snippet.
17
+
18
+ ```python
19
+ import torch
+ from diffusers import CogVideoXTransformer3DModel
20
+
21
+ transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-2b", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
22
+ ```
23
+
24
+ ## CogVideoXTransformer3DModel
25
+
26
+ [[autodoc]] CogVideoXTransformer3DModel
27
+
28
+ ## Transformer2DModelOutput
29
+
30
+ [[autodoc]] models.modeling_outputs.Transformer2DModelOutput
diffusers/docs/source/en/api/models/consistency_decoder_vae.md ADDED
@@ -0,0 +1,30 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # Consistency Decoder
14
+
15
+ Consistency decoder can be used to decode the latents from the denoising UNet in the [`StableDiffusionPipeline`]. This decoder was introduced in the [DALL-E 3 technical report](https://openai.com/dall-e-3).
16
+
17
+ The original codebase can be found at [openai/consistencydecoder](https://github.com/openai/consistencydecoder).
18
+
19
+ <Tip warning={true}>
20
+
21
+ Inference is only supported for 2 iterations as of now.
22
+
23
+ </Tip>
24
+
25
+ The pipeline could not have been contributed without the help of [madebyollin](https://github.com/madebyollin) and [mrsteyk](https://github.com/mrsteyk) from [this issue](https://github.com/openai/consistencydecoder/issues/1).
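+
+ A short sketch of swapping it in as the pipeline VAE (the checkpoint names are only examples):
+
+ ```python
+ import torch
+ from diffusers import StableDiffusionPipeline, ConsistencyDecoderVAE
+
+ vae = ConsistencyDecoderVAE.from_pretrained("openai/consistency-decoder", torch_dtype=torch.float16)
+ pipe = StableDiffusionPipeline.from_pretrained(
+     "stable-diffusion-v1-5/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
+ ).to("cuda")
+
+ image = pipe("horse", generator=torch.manual_seed(0)).images[0]
+ ```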
26
+
27
+ ## ConsistencyDecoderVAE
28
+ [[autodoc]] ConsistencyDecoderVAE
29
+ - all
30
+ - decode
diffusers/docs/source/en/api/models/controlnet.md ADDED
@@ -0,0 +1,50 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # ControlNetModel
14
+
15
+ The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
16
+
17
+ The abstract from the paper is:
18
+
19
+ *We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
20
+
21
+ ## Loading from the original format
22
+
23
+ By default, the [`ControlNetModel`] should be loaded with [`~ModelMixin.from_pretrained`], but it can also be loaded
24
+ from the original format using [`FromOriginalModelMixin.from_single_file`] as follows:
25
+
26
+ ```py
27
+ from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
28
+
29
+ url = "https://huggingface.co/lllyasviel/ControlNet-v1-1/blob/main/control_v11p_sd15_canny.pth" # can also be a local path
30
+ controlnet = ControlNetModel.from_single_file(url)
31
+
32
+ url = "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/v1-5-pruned.safetensors" # can also be a local path
33
+ pipe = StableDiffusionControlNetPipeline.from_single_file(url, controlnet=controlnet)
34
+ ```
35
+
36
+ ## ControlNetModel
37
+
38
+ [[autodoc]] ControlNetModel
39
+
40
+ ## ControlNetOutput
41
+
42
+ [[autodoc]] models.controlnet.ControlNetOutput
43
+
44
+ ## FlaxControlNetModel
45
+
46
+ [[autodoc]] FlaxControlNetModel
47
+
48
+ ## FlaxControlNetOutput
49
+
50
+ [[autodoc]] models.controlnet_flax.FlaxControlNetOutput
diffusers/docs/source/en/api/models/controlnet_flux.md ADDED
@@ -0,0 +1,45 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team and The InstantX Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # FluxControlNetModel
14
+
15
+ FluxControlNetModel is an implementation of ControlNet for Flux.1.
16
+
17
+ The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
18
+
19
+ The abstract from the paper is:
20
+
21
+ *We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
22
+
23
+ ## Loading the model
24
+
25
+ By default, the [`FluxControlNetModel`] should be loaded with [`~ModelMixin.from_pretrained`].
26
+
27
+ ```py
28
+ from diffusers import FluxControlNetPipeline
29
+ from diffusers.models import FluxControlNetModel, FluxMultiControlNetModel
30
+
31
+ controlnet = FluxControlNetModel.from_pretrained("InstantX/FLUX.1-dev-Controlnet-Canny")
32
+ pipe = FluxControlNetPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", controlnet=controlnet)
33
+
34
+ controlnet = FluxControlNetModel.from_pretrained("InstantX/FLUX.1-dev-Controlnet-Canny")
35
+ controlnet = FluxMultiControlNetModel([controlnet])
36
+ pipe = FluxControlNetPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", controlnet=controlnet)
37
+ ```
38
+
39
+ ## FluxControlNetModel
40
+
41
+ [[autodoc]] FluxControlNetModel
42
+
43
+ ## FluxControlNetOutput
44
+
45
+ [[autodoc]] models.controlnet_flux.FluxControlNetOutput
diffusers/docs/source/en/api/models/controlnet_hunyuandit.md ADDED
@@ -0,0 +1,37 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team and Tencent Hunyuan Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # HunyuanDiT2DControlNetModel
14
+
15
+ HunyuanDiT2DControlNetModel is an implementation of ControlNet for [Hunyuan-DiT](https://arxiv.org/abs/2405.08748).
16
+
17
+ ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
18
+
19
+ With a ControlNet model, you can provide an additional control image to condition and control Hunyuan-DiT generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
20
+
21
+ The abstract from the paper is:
22
+
23
+ *We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
24
+
25
+ This code is implemented by Tencent Hunyuan Team. You can find pre-trained checkpoints for Hunyuan-DiT ControlNets on [Tencent Hunyuan](https://huggingface.co/Tencent-Hunyuan).
26
+
27
+ ## Example for loading HunyuanDiT2DControlNetModel
28
+
29
+ ```py
30
+ import torch
+ from diffusers import HunyuanDiT2DControlNetModel
31
+
32
+ controlnet = HunyuanDiT2DControlNetModel.from_pretrained("Tencent-Hunyuan/HunyuanDiT-v1.1-ControlNet-Diffusers-Pose", torch_dtype=torch.float16)
33
+ ```
34
+
35
+ ## HunyuanDiT2DControlNetModel
36
+
37
+ [[autodoc]] HunyuanDiT2DControlNetModel
diffusers/docs/source/en/api/models/controlnet_sd3.md ADDED
@@ -0,0 +1,42 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team and The InstantX Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # SD3ControlNetModel
14
+
15
+ SD3ControlNetModel is an implementation of ControlNet for Stable Diffusion 3.
16
+
17
+ The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
18
+
19
+ The abstract from the paper is:
20
+
21
+ *We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
22
+
23
+ ## Loading the model
24
+
25
+ By default, the [`SD3ControlNetModel`] should be loaded with [`~ModelMixin.from_pretrained`].
26
+
27
+ ```py
28
+ from diffusers import StableDiffusion3ControlNetPipeline
29
+ from diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel
30
+
31
+ controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Canny")
32
+ pipe = StableDiffusion3ControlNetPipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", controlnet=controlnet)
33
+ ```
34
+
35
+ ## SD3ControlNetModel
36
+
37
+ [[autodoc]] SD3ControlNetModel
38
+
39
+ ## SD3ControlNetOutput
40
+
41
+ [[autodoc]] models.controlnet_sd3.SD3ControlNetOutput
42
+
diffusers/docs/source/en/api/models/controlnet_sparsectrl.md ADDED
@@ -0,0 +1,46 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!-- Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License. -->
11
+
12
+ # SparseControlNetModel
13
+
14
+ SparseControlNetModel is an implementation of ControlNet for [AnimateDiff](https://arxiv.org/abs/2307.04725).
15
+
16
+ ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
17
+
18
+ The SparseCtrl version of ControlNet was introduced in [SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models](https://arxiv.org/abs/2311.16933) for achieving controlled generation in text-to-video diffusion models by Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, and Bo Dai.
19
+
20
+ The abstract from the paper is:
21
+
22
+ *The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has been significantly advanced in recent years. However, relying solely on text prompts often results in ambiguous frame composition due to spatial uncertainty. The research community thus leverages the dense structure signals, e.g., per-frame depth/edge sequences, to enhance controllability, whose collection accordingly increases the burden of inference. In this work, we present SparseCtrl to enable flexible structure control with temporally sparse signals, requiring only one or a few inputs, as shown in Figure 1. It incorporates an additional condition encoder to process these sparse signals while leaving the pre-trained T2V model untouched. The proposed approach is compatible with various modalities, including sketches, depth maps, and RGB images, providing more practical control for video generation and promoting applications such as storyboarding, depth rendering, keyframe animation, and interpolation. Extensive experiments demonstrate the generalization of SparseCtrl on both original and personalized T2V generators. Codes and models will be publicly available at [this https URL](https://guoyww.github.io/projects/SparseCtrl).*
23
+
24
+ ## Example for loading SparseControlNetModel
25
+
26
+ ```python
27
+ import torch
28
+ from diffusers import SparseControlNetModel
29
+
30
+ # load the fp32 variant in float16
31
+ # 1. Scribble checkpoint
32
+ controlnet = SparseControlNetModel.from_pretrained("guoyww/animatediff-sparsectrl-scribble", torch_dtype=torch.float16)
33
+
34
+ # 2. RGB checkpoint
35
+ controlnet = SparseControlNetModel.from_pretrained("guoyww/animatediff-sparsectrl-rgb", torch_dtype=torch.float16)
36
+
37
+ # For loading fp16 variant, pass `variant="fp16"` as an additional parameter
38
+ ```
39
+
40
+ ## SparseControlNetModel
41
+
42
+ [[autodoc]] SparseControlNetModel
43
+
44
+ ## SparseControlNetOutput
45
+
46
+ [[autodoc]] models.controlnet_sparsectrl.SparseControlNetOutput
diffusers/docs/source/en/api/models/dit_transformer2d.md ADDED
@@ -0,0 +1,19 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # DiTTransformer2DModel
14
+
15
+ A Transformer model for image-like data from [DiT](https://huggingface.co/papers/2212.09748).
16
+
17
+ ## DiTTransformer2DModel
18
+
19
+ [[autodoc]] DiTTransformer2DModel
diffusers/docs/source/en/api/models/flux_transformer.md ADDED
@@ -0,0 +1,19 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # FluxTransformer2DModel
14
+
15
+ A Transformer model for image-like data from [Flux](https://blackforestlabs.ai/announcing-black-forest-labs/).
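+
+ A minimal sketch of loading the transformer, assuming the `black-forest-labs/FLUX.1-dev` checkpoint layout on the Hub:
+
+ ```py
+ import torch
+ from diffusers import FluxTransformer2DModel
+
+ transformer = FluxTransformer2DModel.from_pretrained(
+     "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
+ )
+ ```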
16
+
17
+ ## FluxTransformer2DModel
18
+
19
+ [[autodoc]] FluxTransformer2DModel
diffusers/docs/source/en/api/models/hunyuan_transformer2d.md ADDED
@@ -0,0 +1,20 @@
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # HunyuanDiT2DModel
+
+ A Diffusion Transformer model for 2D data from [Hunyuan-DiT](https://github.com/Tencent/HunyuanDiT).
+
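+ A minimal loading sketch, assuming a Diffusers-format Hunyuan-DiT checkpoint with the transformer stored in a `transformer` subfolder; the repository id is illustrative.
+
+ ```python
+ import torch
+ from diffusers import HunyuanDiT2DModel
+
+ # Load the transformer component of a Hunyuan-DiT checkpoint (illustrative repo id)
+ transformer = HunyuanDiT2DModel.from_pretrained(
+     "Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers", subfolder="transformer", torch_dtype=torch.float16
+ )
+ ```
+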
+ ## HunyuanDiT2DModel
+
+ [[autodoc]] HunyuanDiT2DModel
diffusers/docs/source/en/api/models/latte_transformer3d.md ADDED
@@ -0,0 +1,19 @@
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # LatteTransformer3DModel
+
+ A Diffusion Transformer model for 3D data from [Latte](https://github.com/Vchitect/Latte).
+
+ ## LatteTransformer3DModel
+
+ [[autodoc]] LatteTransformer3DModel
diffusers/docs/source/en/api/models/lumina_nextdit2d.md ADDED
@@ -0,0 +1,20 @@
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # LuminaNextDiT2DModel
+
+ A next-generation Diffusion Transformer model for 2D data from [Lumina-T2X](https://github.com/Alpha-VLLM/Lumina-T2X).
+
+ ## LuminaNextDiT2DModel
+
+ [[autodoc]] LuminaNextDiT2DModel
diffusers/docs/source/en/api/models/overview.md ADDED
@@ -0,0 +1,28 @@
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # Models
+
+ 🤗 Diffusers provides pretrained models for popular algorithms and modules to create custom diffusion systems. The primary function of models is to denoise an input sample as modeled by the distribution \\(p_{\theta}(x_{t-1}|x_{t})\\).
+
+ All models are built from the base [`ModelMixin`] class which is a [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) providing basic functionality for saving and loading models, locally and from the Hugging Face Hub.
+
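+ Because every model inherits from [`ModelMixin`], loading from the Hub, saving locally, and reloading all follow the same pattern. A minimal sketch (the repository id and subfolder are illustrative):
+
+ ```python
+ from diffusers import UNet2DConditionModel
+
+ # Download a model from the Hub (here, the UNet of a Stable Diffusion checkpoint)
+ unet = UNet2DConditionModel.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet")
+
+ # Save it locally, then reload it from disk
+ unet.save_pretrained("./sd15-unet")
+ unet = UNet2DConditionModel.from_pretrained("./sd15-unet")
+ ```
+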
+ ## ModelMixin
+ [[autodoc]] ModelMixin
+
+ ## FlaxModelMixin
+
+ [[autodoc]] FlaxModelMixin
+
+ ## PushToHubMixin
+
+ [[autodoc]] utils.PushToHubMixin
diffusers/docs/source/en/api/models/pixart_transformer2d.md ADDED
@@ -0,0 +1,19 @@
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # PixArtTransformer2DModel
+
+ A Transformer model for image-like data from [PixArt-Alpha](https://huggingface.co/papers/2310.00426) and [PixArt-Sigma](https://huggingface.co/papers/2403.04692).
+
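+ A minimal loading sketch, assuming a Diffusers-format PixArt checkpoint with the transformer stored in a `transformer` subfolder; the repository id is illustrative.
+
+ ```python
+ import torch
+ from diffusers import PixArtTransformer2DModel
+
+ # Load the transformer component of a PixArt checkpoint (illustrative repo id)
+ transformer = PixArtTransformer2DModel.from_pretrained(
+     "PixArt-alpha/PixArt-XL-2-1024-MS", subfolder="transformer", torch_dtype=torch.float16
+ )
+ ```
+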
+ ## PixArtTransformer2DModel
+
+ [[autodoc]] PixArtTransformer2DModel
diffusers/docs/source/en/api/models/prior_transformer.md ADDED
@@ -0,0 +1,27 @@
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # PriorTransformer
+
+ The Prior Transformer was originally introduced in [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://huggingface.co/papers/2204.06125) by Ramesh et al. It is used to predict CLIP image embeddings from CLIP text embeddings; image embeddings are predicted through a denoising diffusion process.
+
+ The abstract from the paper is:
+
+ *Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. We show that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity. Our decoders conditioned on image representations can also produce variations of an image that preserve both its semantics and style, while varying the non-essential details absent from the image representation. Moreover, the joint embedding space of CLIP enables language-guided image manipulations in a zero-shot fashion. We use diffusion models for the decoder and experiment with both autoregressive and diffusion models for the prior, finding that the latter are computationally more efficient and produce higher-quality samples.*
+
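+ In practice the prior is usually loaded as part of a prior pipeline. A minimal sketch for loading it on its own, assuming the Kandinsky 2.1 prior checkpoint layout (the repository id and subfolder are illustrative):
+
+ ```python
+ import torch
+ from diffusers import PriorTransformer
+
+ # Load the prior component of a Kandinsky 2.1 checkpoint (illustrative repo id)
+ prior = PriorTransformer.from_pretrained(
+     "kandinsky-community/kandinsky-2-1-prior", subfolder="prior", torch_dtype=torch.float16
+ )
+ ```
+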
+ ## PriorTransformer
+
+ [[autodoc]] PriorTransformer
+
+ ## PriorTransformerOutput
+
+ [[autodoc]] models.transformers.prior_transformer.PriorTransformerOutput
diffusers/docs/source/en/api/models/sd3_transformer2d.md ADDED
@@ -0,0 +1,19 @@
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # SD3 Transformer Model
+
+ The Transformer model introduced in [Stable Diffusion 3](https://hf.co/papers/2403.03206). Its key novelty is the multimodal diffusion Transformer (MMDiT) block, which applies joint attention over text and image tokens while keeping separate weights for each modality.
+
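+ A minimal loading sketch, assuming the Diffusers-format Stable Diffusion 3 checkpoint with the transformer in a `transformer` subfolder; the repository id is illustrative (and gated behind a license acceptance on the Hub).
+
+ ```python
+ import torch
+ from diffusers import SD3Transformer2DModel
+
+ # Load the transformer component of a Stable Diffusion 3 checkpoint (illustrative repo id)
+ transformer = SD3Transformer2DModel.from_pretrained(
+     "stabilityai/stable-diffusion-3-medium-diffusers", subfolder="transformer", torch_dtype=torch.float16
+ )
+ ```
+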
+ ## SD3Transformer2DModel
+
+ [[autodoc]] SD3Transformer2DModel
diffusers/docs/source/en/api/models/stable_audio_transformer.md ADDED
@@ -0,0 +1,19 @@
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # StableAudioDiTModel
+
+ A Transformer model for audio waveforms from [Stable Audio Open](https://huggingface.co/papers/2407.14358).
+
+ ## StableAudioDiTModel
+
+ [[autodoc]] StableAudioDiTModel
diffusers/docs/source/en/api/models/stable_cascade_unet.md ADDED
@@ -0,0 +1,19 @@
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # StableCascadeUNet
+
+ A UNet model from the [Stable Cascade pipeline](../pipelines/stable_cascade.md).
+
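+ Stable Cascade ships two of these UNets, a prior and a decoder. A minimal loading sketch, assuming the official checkpoint layout and the `bf16` weight variant; the repository ids are illustrative.
+
+ ```python
+ import torch
+ from diffusers import StableCascadeUNet
+
+ # Prior (stage C) UNet
+ prior_unet = StableCascadeUNet.from_pretrained(
+     "stabilityai/stable-cascade-prior", subfolder="prior", variant="bf16", torch_dtype=torch.bfloat16
+ )
+
+ # Decoder (stage B) UNet
+ decoder_unet = StableCascadeUNet.from_pretrained(
+     "stabilityai/stable-cascade", subfolder="decoder", variant="bf16", torch_dtype=torch.bfloat16
+ )
+ ```
+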
+ ## StableCascadeUNet
+
+ [[autodoc]] models.unets.unet_stable_cascade.StableCascadeUNet
diffusers/docs/source/en/api/models/transformer2d.md ADDED
@@ -0,0 +1,41 @@
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # Transformer2DModel
+
+ A Transformer model for image-like data from [CompVis](https://huggingface.co/CompVis) that is based on the [Vision Transformer](https://huggingface.co/papers/2010.11929) introduced by Dosovitskiy et al. The [`Transformer2DModel`] accepts discrete (classes of vector embeddings) or continuous (actual embeddings) inputs.
+
+ When the input is **continuous**:
+
+ 1. Project the input and reshape it to `(batch_size, sequence_length, feature_dimension)`.
+ 2. Apply the Transformer blocks in the standard way.
+ 3. Reshape the output back into an image.
+
+ When the input is **discrete**:
+
+ <Tip>
+
+ It is assumed that one of the input classes is the masked latent pixel. The predicted classes of the unnoised image do not contain a prediction for the masked pixel because the unnoised image cannot be masked.
+
+ </Tip>
+
+ 1. Convert the input (classes of latent pixels) to embeddings and apply positional embeddings.
+ 2. Apply the Transformer blocks in the standard way.
+ 3. Predict the classes of the unnoised image.
+
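+ In practice this block is instantiated inside larger models (for example the Stable Diffusion UNet) rather than loaded on its own. The sketch below is a minimal construction for the continuous case; all configuration values are illustrative.
+
+ ```python
+ import torch
+ from diffusers import Transformer2DModel
+
+ # Continuous inputs: `in_channels` must be divisible by `norm_num_groups` (32 by default),
+ # and the inner dimension is num_attention_heads * attention_head_dim.
+ model = Transformer2DModel(num_attention_heads=8, attention_head_dim=8, in_channels=64, num_layers=1)
+
+ sample = torch.randn(1, 64, 32, 32)  # (batch, channels, height, width)
+ output = model(sample).sample        # same shape as the input
+ ```
+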
+ ## Transformer2DModel
+
+ [[autodoc]] Transformer2DModel
+
+ ## Transformer2DModelOutput
+
+ [[autodoc]] models.modeling_outputs.Transformer2DModelOutput
diffusers/docs/source/en/api/models/transformer_temporal.md ADDED
@@ -0,0 +1,23 @@
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # TransformerTemporalModel
+
+ A Transformer model for video-like data that applies attention over the temporal (frame) dimension.
+
+ ## TransformerTemporalModel
+
+ [[autodoc]] models.transformers.transformer_temporal.TransformerTemporalModel
+
+ ## TransformerTemporalModelOutput
+
+ [[autodoc]] models.transformers.transformer_temporal.TransformerTemporalModelOutput
diffusers/docs/source/en/api/models/unet-motion.md ADDED
@@ -0,0 +1,25 @@
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # UNetMotionModel
+
+ The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it performs the actual denoising, typically by predicting the noise residual at each step. There are several variants of the UNet model in 🤗 Diffusers, depending on its number of dimensions and whether it is a conditional model or not. [`UNetMotionModel`] extends the 2D conditional UNet with temporal motion modules so that video frames are denoised jointly; it is the UNet used by the AnimateDiff pipelines.
+
+ The abstract from the paper is:
+
+ *There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net.*
+
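+ A [`UNetMotionModel`] is typically created from an existing text-to-image UNet plus a motion adapter rather than trained from scratch. A minimal sketch, assuming the AnimateDiff motion adapter checkpoint and an illustrative Stable Diffusion 1.5 base repository:
+
+ ```python
+ from diffusers import MotionAdapter, UNet2DConditionModel, UNetMotionModel
+
+ unet2d = UNet2DConditionModel.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet")
+ motion_adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
+
+ # Combine the spatial weights of the 2D UNet with the temporal motion modules
+ unet = UNetMotionModel.from_unet2d(unet2d, motion_adapter=motion_adapter)
+ ```
+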
+ ## UNetMotionModel
+ [[autodoc]] UNetMotionModel
+
+ ## UNet3DConditionOutput
+ [[autodoc]] models.unets.unet_3d_condition.UNet3DConditionOutput
diffusers/docs/source/en/api/models/unet.md ADDED
@@ -0,0 +1,25 @@
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # UNet1DModel
+
+ The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it performs the actual denoising, typically by predicting the noise residual at each step. There are several variants of the UNet model in 🤗 Diffusers, depending on its number of dimensions and whether it is a conditional model or not. This is a 1D UNet model.
+
+ The abstract from the paper is:
+
+ *There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net.*
+
+ ## UNet1DModel
+ [[autodoc]] UNet1DModel
+
+ ## UNet1DOutput
+ [[autodoc]] models.unets.unet_1d.UNet1DOutput
diffusers/docs/source/en/api/models/unet2d-cond.md ADDED
@@ -0,0 +1,31 @@
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # UNet2DConditionModel
+
+ The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it performs the actual denoising, typically by predicting the noise residual at each step. There are several variants of the UNet model in 🤗 Diffusers, depending on its number of dimensions and whether it is a conditional model or not. This is a 2D UNet conditional model.
+
+ The abstract from the paper is:
+
+ *There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net.*
+
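+ Conditional 2D UNets are usually loaded from the `unet` subfolder of a full pipeline repository and then called with noisy latents, a timestep, and conditioning embeddings. A minimal sketch; the repository id and tensor shapes below are illustrative (they match a Stable Diffusion 1.5-style checkpoint with a CLIP hidden size of 768).
+
+ ```python
+ import torch
+ from diffusers import UNet2DConditionModel
+
+ unet = UNet2DConditionModel.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet")
+
+ sample = torch.randn(1, unet.config.in_channels, 64, 64)  # noisy latents
+ encoder_hidden_states = torch.randn(1, 77, 768)           # text embeddings
+ noise_pred = unet(sample, timestep=10, encoder_hidden_states=encoder_hidden_states).sample
+ ```
+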
+ ## UNet2DConditionModel
+ [[autodoc]] UNet2DConditionModel
+
+ ## UNet2DConditionOutput
+ [[autodoc]] models.unets.unet_2d_condition.UNet2DConditionOutput
+
+ ## FlaxUNet2DConditionModel
+ [[autodoc]] models.unets.unet_2d_condition_flax.FlaxUNet2DConditionModel
+
+ ## FlaxUNet2DConditionOutput
+ [[autodoc]] models.unets.unet_2d_condition_flax.FlaxUNet2DConditionOutput
diffusers/docs/source/en/api/models/unet2d.md ADDED
@@ -0,0 +1,25 @@
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # UNet2DModel
+
+ The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it performs the actual denoising, typically by predicting the noise residual at each step. There are several variants of the UNet model in 🤗 Diffusers, depending on its number of dimensions and whether it is a conditional model or not. This is a 2D UNet model.
+
+ The abstract from the paper is:
+
+ *There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net.*
+
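+ Unconditional 2D UNets predict the noise residual from just a noisy sample and a timestep. A minimal sketch using an illustrative DDPM checkpoint:
+
+ ```python
+ import torch
+ from diffusers import UNet2DModel
+
+ unet = UNet2DModel.from_pretrained("google/ddpm-cat-256")
+
+ sample = torch.randn(1, unet.config.in_channels, unet.config.sample_size, unet.config.sample_size)
+ noise_pred = unet(sample, timestep=10).sample  # predicted noise, same shape as the input
+ ```
+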
+ ## UNet2DModel
+ [[autodoc]] UNet2DModel
+
+ ## UNet2DOutput
+ [[autodoc]] models.unets.unet_2d.UNet2DOutput
diffusers/docs/source/en/api/models/unet3d-cond.md ADDED
@@ -0,0 +1,25 @@
+ <!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # UNet3DConditionModel
+
+ The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it performs the actual denoising, typically by predicting the noise residual at each step. There are several variants of the UNet model in 🤗 Diffusers, depending on its number of dimensions and whether it is a conditional model or not. This is a 3D UNet conditional model.
+
+ The abstract from the paper is:
+
+ *There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net.*
+
+ ## UNet3DConditionModel
+ [[autodoc]] UNet3DConditionModel
+
+ ## UNet3DConditionOutput
+ [[autodoc]] models.unets.unet_3d_condition.UNet3DConditionOutput