arxiv:2503.15567

Towards Unified Latent Space for 3D Molecular Latent Diffusion Modeling

Published on Mar 19

· Submitted by

Authors:

Abstract

3D molecule generation is crucial for drug discovery and material science, requiring models to process complex multi-modalities, including atom types, chemical bonds, and 3D coordinates. A key challenge is integrating these modalities of different shapes while maintaining SE(3) equivariance for 3D coordinates. To achieve this, existing approaches typically maintain separate latent spaces for invariant and equivariant modalities, reducing efficiency in both training and sampling. In this work, we propose Unified Variational Auto-Encoder for 3D Molecular Latent Diffusion Modeling (UAE-3D), a multi-modal VAE that compresses 3D molecules into latent sequences from a unified latent space, while maintaining near-zero reconstruction error. This unified latent space eliminates the complexities of handling multi-modality and equivariance when performing latent diffusion modeling. We demonstrate this by employing the Diffusion Transformer--a general-purpose diffusion model without any molecular inductive bias--for latent generation. Extensive experiments on GEOM-Drugs and QM9 datasets demonstrate that our method significantly establishes new benchmarks in both de novo and conditional 3D molecule generation, achieving leading efficiency and quality.

View arXiv page View PDF Add to collection

Community

lyc0930

Paper submitter 1 day ago

UAE-3D is a recent effort on addressing the important challenge in 3D molecule generation—integrating heterogeneous modalities (atom types, chemical bonds, and 3D coordinates) while preserving the SE(3) equivariance critical for spatial consistency. Previous methods often rely on separate latent spaces for invariant and equivariant information, which complicates both training and sampling. In contrast, UAE-3D, a multi-modal variational autoencoder that compresses 3D molecular data into a single, unified latent space with near-zero reconstruction error. This design removes the complexities inherent in handling multi-modality and equivariance in latent diffusion modeling.

librarian-bot

about 5 hours ago

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

Your need to confirm your account before you can post a new comment.

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2503.15567 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2503.15567 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2503.15567 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.