---
language:
- en
license: other
license_name: autodesk-non-commercial-3d-generative-v1.0
license_link: LICENSE.md
tags:
- make-a-shape
- sv-to-3d
---
---
# Model Card for Make-A-Shape Single-View to 3D Model
This model is part of the Make-A-Shape framework introduced in the Make-A-Shape paper. It generates high-quality 3D shapes with intricate geometric details, realistic structures, and complex topologies from single-view images.
## Model Details
### Model Description
Make-A-Shape is a 3D generative framework trained on a dataset of over 10 million publicly available 3D shapes. The single-view to 3D model is one of the conditional generation models in this framework. It generates a wide range of high-quality 3D shapes from single-view image inputs in about two seconds. The model uses a wavelet-tree representation and a subband adaptive training strategy to capture geometric detail and produce structurally plausible shapes.
- **Developed by:** Ka-Hei Hui, Aditya Sanghi, Arianna Rampini, Kamal Rahimi Malekshan, Zhengzhe Liu, Hooman Shayani, Chi-Wing Fu
- **Model type:** 3D Generative Model
- **License:** Autodesk Non-Commercial (3D Generative) v1.0
For more information, see the project page ([Autodesk Research](https://www.research.autodesk.com/publications/generative-ai-make-a-shape/), [GitHub Pages](https://edward1997104.github.io/make-a-shape/)) and [the ICML paper](https://proceedings.mlr.press/v235/hui24a.html).
### Model Sources
- **Repository:** [https://github.com/AutodeskAILab/Make-a-Shape](https://github.com/AutodeskAILab/Make-a-Shape)
- **Paper:** [Make-A-Shape: a Ten-Million-scale 3D Shape Model](https://proceedings.mlr.press/v235/hui24a.html)
- **Demo:** [in progress...]
## Uses
### Direct Use
Please look at the instructions [here](https://github.com/AutodeskAILab/Make-a-Shape?tab=readme-ov-file#single-view-to-3d) to test this model for research and academic purposes.
### Downstream Use
This model could potentially be used in various applications such as:
- 3D content creation for gaming and virtual environments
- Augmented reality applications
- Computer-aided design and prototyping
- Architectural visualization
### Out-of-Scope Use
The model should not be used for:
- Commercial use
- Generating 3D shapes of sensitive or copyrighted content without proper authorization
- Creating 3D models intended for harmful or malicious purposes
- Uses outside of the [Autodesk Acceptable Use Policy](https://www.autodesk.com/company/terms-of-use/en/acceptable-use)
## Bias, Risks, and Limitations
- The model may inherit biases present in the training dataset, which could lead to uneven representation of certain object types or styles.
- The quality of the generated 3D shape depends on the quality and clarity of the input image.
- The model may occasionally generate implausible shapes, especially when the input image is ambiguous or of low quality.
- The model's performance may degrade for object categories or styles that are underrepresented in the training data.
### Recommendations
Users should be aware of the potential biases and limitations of the model. It's recommended to:
- Use high-quality, clear input images for best results
- Verify and potentially post-process the generated 3D shapes for critical applications
- Be cautious when using the model for object categories that may be underrepresented in the training data
- Consider ethical implications and potential biases
- DO NOT USE for commercial or public-facing applications
## How to Get Started with the Model
Please look at the instructions [here](https://github.com/AutodeskAILab/Make-a-Shape?tab=readme-ov-file#single-view-to-3d).
## Training Details
### Training Data
The model was trained on a dataset of over 10 million 3D shapes aggregated from 18 different publicly-available sub-datasets, including ModelNet, ShapeNet, SMPL, Thingi10K, SMAL, COMA, House3D, ABC, Fusion 360, 3D-FUTURE, BuildingNet, DeformingThings4D, FG3D, Toys4K, ABO, Infinigen, Objaverse, and two subsets of ObjaverseXL (Thingiverse and GitHub).
### Training Procedure
#### Preprocessing
Each 3D shape in the dataset was converted into a truncated signed distance function (TSDF) with a resolution of 256³. The TSDF was then decomposed using a discrete wavelet transform to create the wavelet-tree representation used by the model.
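
The preprocessing step can be illustrated on a toy shape. The following is a minimal sketch, not the authors' pipeline: it builds a TSDF of a sphere at 32³ rather than 256³ and applies a single-level Haar transform written directly in NumPy. The Haar filter, the truncation value, and the function names (`sphere_tsdf`, `haar_split`, `haar_dwt3d`) are assumptions chosen for simplicity; refer to the paper for the actual wavelet filter and decomposition depth.

```python
import numpy as np

def sphere_tsdf(resolution=32, radius=0.5, truncation=0.1):
    """Truncated signed distance to a sphere, sampled on a cubic grid in [-1, 1]^3."""
    axis = np.linspace(-1.0, 1.0, resolution)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    sdf = np.sqrt(x**2 + y**2 + z**2) - radius  # negative inside, positive outside
    return np.clip(sdf, -truncation, truncation)

def haar_split(volume, axis):
    """One Haar analysis step along one axis: returns (low-pass, high-pass) halves."""
    even = np.take(volume, np.arange(0, volume.shape[axis], 2), axis=axis)
    odd = np.take(volume, np.arange(1, volume.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_dwt3d(volume):
    """Single-level 3D Haar DWT: one coarse subband plus seven detail subbands."""
    subbands = {"": volume}
    for axis in range(3):
        next_level = {}
        for name, band in subbands.items():
            low, high = haar_split(band, axis)
            next_level[name + "L"] = low
            next_level[name + "H"] = high
        subbands = next_level
    coarse = subbands.pop("LLL")
    return coarse, subbands

tsdf = sphere_tsdf()
coarse, details = haar_dwt3d(tsdf)
print(coarse.shape, len(details))  # (16, 16, 16) 7
```

Applying `haar_dwt3d` again to `coarse` would give the next, coarser level. Stacking such levels is the general idea behind a multi-level wavelet decomposition, though the exact tree layout used by the model is described in the paper rather than here.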
#### Training Hyperparameters
- **Training regime:** Please refer to the [paper](https://proceedings.mlr.press/v235/hui24a.html).
#### Speeds, Sizes, Times
- The model was trained on 48 × A10G GPUs for about 20 days, amounting to around 23,000 GPU hours.
- The model can generate shapes within two seconds for most conditions.
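
The GPU-hour figure follows from the GPU count and training duration given above, assuming all GPUs ran continuously for the full 20 days (the variable names are illustrative):

```python
num_gpus = 48
training_days = 20
hours_per_day = 24

# Total GPU hours, assuming every GPU is busy for the whole run.
gpu_hours = num_gpus * training_days * hours_per_day
print(gpu_hours)  # 23040, i.e. roughly 23,000 GPU hours
```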
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
The model was evaluated on a test set consisting of 2% of the shapes from each sub-dataset in the training data, as well as on the entire Google Scanned Objects (GSO) dataset, which was not part of the training data.
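
A per-sub-dataset hold-out of this kind could be sketched as follows. This is only an illustration: the card does not say how the 2% were selected, so the uniform random sampling, the seed, and the function name `split_per_subdataset` are all assumptions, and the dataset names and IDs are toy placeholders.

```python
import random

def split_per_subdataset(shape_ids_by_dataset, test_fraction=0.02, seed=0):
    """Hold out a fixed fraction of shapes from every sub-dataset independently."""
    rng = random.Random(seed)
    train, test = {}, {}
    for name, ids in shape_ids_by_dataset.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test[name] = shuffled[:n_test]
        train[name] = shuffled[n_test:]
    return train, test

# Toy stand-ins for two sub-datasets (hypothetical names and IDs).
toy = {
    "dataset_a": [f"a_{i}" for i in range(1000)],
    "dataset_b": [f"b_{i}" for i in range(250)],
}
train, test = split_per_subdataset(toy)
print({name: len(ids) for name, ids in test.items()})
```

Splitting each sub-dataset separately keeps every source represented in the test set in proportion to its size.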
#### Factors
The evaluation considered various factors such as the quality of generated shapes, the ability to capture fine details and complex structures, and the model's performance across different object categories.
#### Metrics
The model was evaluated using the following metrics:
- Intersection over Union (IoU)
- Light Field Distance (LFD)
- Chamfer Distance (CD)
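
Two of these metrics are simple enough to sketch in NumPy. The code below is an illustration under stated assumptions, not the evaluation code behind the reported numbers: IoU is computed on boolean occupancy grids, and Chamfer Distance uses the mean squared nearest-neighbor distance summed over both directions, which is one of several conventions in use. The function names are hypothetical. LFD is omitted because it requires rendering the shapes from multiple views.

```python
import numpy as np

def voxel_iou(occ_a, occ_b):
    """Intersection over Union of two boolean occupancy grids of equal shape."""
    occ_a = np.asarray(occ_a, dtype=bool)
    occ_b = np.asarray(occ_b, dtype=bool)
    union = np.logical_or(occ_a, occ_b).sum()
    if union == 0:
        return 1.0  # convention chosen here: two empty grids count as identical
    return np.logical_and(occ_a, occ_b).sum() / union

def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer Distance between (N, 3) and (M, 3) point sets.

    Uses squared Euclidean distances, averaged per direction and then summed.
    The brute-force (N, M) distance matrix is only suitable for small inputs.
    """
    points_a = np.asarray(points_a, dtype=float)
    points_b = np.asarray(points_b, dtype=float)
    diff = points_a[:, None, :] - points_b[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    return sq_dist.min(axis=1).mean() + sq_dist.min(axis=0).mean()

# Two overlapping slabs in a 4x4x4 grid: 16 shared voxels out of 48 in the union.
grid_a = np.zeros((4, 4, 4), dtype=bool)
grid_b = np.zeros((4, 4, 4), dtype=bool)
grid_a[:2] = True
grid_b[1:3] = True
print(voxel_iou(grid_a, grid_b))

# Two single-point clouds one unit apart.
print(chamfer_distance([[0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]]))
```

Higher IoU is better, while lower CD and LFD are better, which is the direction to keep in mind when reading the results.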
### Results
The single-view to 3D model achieved the following results on the "Our Val" dataset:
- LFD: 4071.33
- IoU: 0.4285
- CD: 0.01851
On the GSO dataset:
- LFD: 3406.61
- IoU: 0.5004
- CD: 0.01748
## Technical Specifications
### Model Architecture and Objective
The model uses a U-ViT architecture with learnable skip-connections between the convolution and deconvolution blocks. It employs a wavelet-tree representation and a subband adaptive training strategy to effectively capture both coarse and fine details of 3D shapes.
### Compute Infrastructure
#### Hardware
The model was trained on 48 × A10G GPUs.
## Citation
**BibTeX:**
@InProceedings{pmlr-v235-hui24a,
title = {Make-A-Shape: a Ten-Million-scale 3{D} Shape Model},
author = {Hui, Ka-Hei and Sanghi, Aditya and Rampini, Arianna and Rahimi Malekshan, Kamal and Liu, Zhengzhe and Shayani, Hooman and Fu, Chi-Wing},
booktitle = {Proceedings of the 41st International Conference on Machine Learning},
pages = {20660--20681},
year = {2024},
editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
volume = {235},
series = {Proceedings of Machine Learning Research},
month = {21--27 Jul},
publisher = {PMLR},
pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/hui24a/hui24a.pdf},
url = {https://proceedings.mlr.press/v235/hui24a.html},
abstract = {The progression in large-scale 3D generative models has been impeded by significant resource requirements for training and challenges like inefficient representations. This paper introduces Make-A-Shape, a novel 3D generative model trained on a vast scale, using 10 million publicly-available shapes. We first innovate the wavelet-tree representation to encode high-resolution SDF shapes with minimal loss, leveraging our newly-proposed subband coefficient filtering scheme. We then design a subband coefficient packing scheme to facilitate diffusion-based generation and a subband adaptive training strategy for effective training on the large-scale dataset. Our generative framework is versatile, capable of conditioning on various input modalities such as images, point clouds, and voxels, enabling a variety of downstream applications, e.g., unconditional generation, completion, and conditional generation. Our approach clearly surpasses the existing baselines in delivering high-quality results and can efficiently generate shapes within two seconds for most conditions.}
}
**APA:**
Hui, K. H., Sanghi, A., Rampini, A., Malekshan, K. R., Liu, Z., Shayani, H., & Fu, C. W. (2024). Make-A-Shape: a Ten-Million-scale 3D Shape Model. arXiv preprint arXiv:2401.08504.
## Model Card Contact
[[email protected]]