mi804 committed on
Commit a1919f5 · verified · 1 Parent(s): c72a4b5

Upload README.md

Files changed (1): README.md +58 -0
README.md ADDED
@@ -0,0 +1,58 @@
+ ---
+ frameworks:
+ - Pytorch
+ license: apache-2.0
+ tasks:
+ - any-to-any
+ ---
+
+ ## What is Nexus-Gen
+ Nexus-Gen is a unified model that combines the language reasoning capabilities of LLMs with the image synthesis power of diffusion models. To align the embedding spaces of the LLM and the diffusion model, we conduct a dual-phase alignment training process: (1) the autoregressive LLM learns to predict image embeddings conditioned on multimodal inputs, and (2) the vision decoder is trained to reconstruct high-fidelity images from these embeddings. While training the LLM, we identified a critical discrepancy between the training and inference phases of the autoregressive paradigm: at inference time, errors accumulate in the continuous embedding space and severely degrade generation quality. To avoid this issue, we introduce a prefilled autoregression strategy that prefills the input sequence with position-embedded special tokens instead of continuous embeddings. Through dual-phase training, Nexus-Gen handles image understanding, generation, and editing within a single model, as illustrated below.
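The contrast between ordinary autoregression and the prefilled strategy described above can be illustrated with a toy sketch. This is not Nexus-Gen's code: the scalar "embeddings", the `STEP_ERROR` constant, the "previous value plus one" target rule, and all function names are assumptions invented for illustration. It only shows why feeding imperfect predictions back as inputs lets errors pile up, while position-tagged placeholder inputs keep every prediction independent of earlier ones.

```python
STEP_ERROR = 0.1  # hypothetical fixed error made by every single prediction


def autoregressive_rollout(start, num_steps):
    """Free-running decoding: each prediction is fed back as the next input."""
    outputs, current = [], start
    for _ in range(num_steps):
        # The true rule is "previous + 1"; the model is off by STEP_ERROR,
        # and that error is carried into the next step's input.
        current = current + 1.0 + STEP_ERROR
        outputs.append(current)
    return outputs


def prefilled_rollout(start, num_steps):
    """Prefilled decoding: inputs are fixed placeholders tagged with a position,
    so no prediction depends on an earlier (imperfect) prediction."""
    return [start + position + STEP_ERROR for position in range(1, num_steps + 1)]


def max_abs_error(outputs, start):
    """Largest deviation from the toy targets start + 1, start + 2, ..."""
    targets = [start + i for i in range(1, len(outputs) + 1)]
    return max(abs(o - t) for o, t in zip(outputs, targets))


if __name__ == "__main__":
    steps = 8
    print("autoregressive:", max_abs_error(autoregressive_rollout(0.0, steps), 0.0))
    print("prefilled:     ", max_abs_error(prefilled_rollout(0.0, steps), 0.0))
```

In this toy setting the worst-case error of the free-running rollout grows linearly with the sequence length, whereas the prefilled rollout stays at the single-step error.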
+
+ For more information, please refer to our repo: https://github.com/modelscope/Nexus-Gen.git
+
+ ![cover](assets/illustrations/gen_edit.jpg)
+ ![architecture](assets/illustrations/architecture.png)
+
+ ## Getting Started
+ ### Installation
+ 1. Install [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio.git) from source:
+ ```shell
+ git clone https://github.com/modelscope/DiffSynth-Studio.git
+ cd DiffSynth-Studio
+ pip install -e .
+ ```
+ 2. Install the requirements:
+ ```shell
+ pip install -r requirements.txt
+ ```
+ 3. Install [ms-swift](https://github.com/modelscope/ms-swift.git) if you want to fine-tune Nexus-Gen:
+ ```shell
+ pip install ms-swift -U
+ ```
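After the installation steps above, a quick check can confirm that the packages are visible to the current Python environment. This is a minimal sketch and not part of the Nexus-Gen repository: the helper name `missing_packages` is hypothetical, and the import names `diffsynth` (for DiffSynth-Studio) and `swift` (for ms-swift) are assumptions that should be verified against each project's documentation.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names: DiffSynth-Studio -> "diffsynth", ms-swift -> "swift".
    missing = missing_packages(["torch", "diffsynth", "swift"])
    print("missing:", missing if missing else "none")
```

If `swift` is reported as missing, that only matters when you plan to fine-tune, since ms-swift is an optional dependency in the steps above.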
+ ### Prepare models
+ ```shell
+ python download_models.py
+ ```
+ ### Image Understanding
+ ```shell
+ python image_understanding.py
+ ```
+
+ ### Image Generation
+ Generate images from a detailed prompt:
+ ```shell
+ python image_generation.py
+ ```
+ Let Nexus-Gen polish the prompt first, then generate images:
+ ```shell
+ python image_generation_with_selfpolish.py
+ ```
+
+ ### Image Editing
+ ```shell
+ python image_editing.py
+ ```
+
+ ### Training Code
+ Nexus-Gen is trained with [ms-swift](https://github.com/modelscope/ms-swift.git) and [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio.git). You can find the training scripts in `train/scripts/train_decoder.sh` and `train_llm.sh`.