UniPart: Part-Level 3D Generation with Unified 3D Geom-Seg Latents

Xufan He1,2,*, Yushuang Wu2,*,‡, Xiaoyang Guo2, Chongjie Ye2,3, Jiaqing Zhou2, Tianlei Hu2, Xiaoguang Han3, Dong Du1,†

1Nanjing University of Science and Technology 2ByteDance 3The Chinese University of Hong Kong, Shenzhen
*Equal Contribution    †Corresponding Author    ‡Project Leader
This work was done by Xufan He as an intern at ByteDance supervised by Yushuang Wu.


Part-level 3D generation with unified 3D geometry and segmentation latents.


Qualitative results of image-conditioned part-level 3D generation. For each mesh pair, the first row shows the parts in an “exploded” view so that individual parts are easier to inspect. UniPart produces more reasonable part segmentations and higher-quality part geometries.
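
The “exploded” view mentioned in the caption can be reproduced for any set of part meshes. The following is a minimal sketch, not the authors' visualization code: it assumes each part is given as an array of vertex positions in a shared global frame, and the function name `explode_parts` and the `factor` parameter are hypothetical.

```python
import numpy as np


def explode_parts(parts, factor=0.5):
    """Push each part away from the object centroid (illustrative only).

    parts:  non-empty list of (N_i, 3) float arrays, one per part,
            all expressed in the same global coordinate frame.
    factor: hypothetical strength of the explosion; 0 keeps the object assembled.
    """
    # Centroid of all vertices of the whole object.
    object_center = np.concatenate(parts, axis=0).mean(axis=0)

    exploded = []
    for verts in parts:
        # Translate the part along the direction from the object centroid
        # to the part centroid; the part's shape itself is unchanged.
        part_center = verts.mean(axis=0)
        offset = factor * (part_center - object_center)
        exploded.append(verts + offset)
    return exploded


# Two toy "parts" on either side of the origin.
left = np.array([[-1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
right = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
exploded_left, exploded_right = explode_parts([left, right], factor=0.5)
```

Because every part is only translated, the exploded view preserves each part's geometry while making the boundaries between parts visible.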

Abstract

Part-level 3D generation is essential for applications requiring decomposable and structured 3D synthesis. However, existing methods either rely on implicit part segmentation with limited granularity control or depend on strong external segmenters trained on large annotated datasets. In this work, we observe that part awareness emerges naturally during whole-object geometry learning and propose Geom-Seg VecSet, a unified geometry-segmentation latent representation that jointly encodes object geometry and part-level structure. Building on this representation, we introduce UniPart, a two-stage latent diffusion framework for image-guided part-level 3D generation. The first stage performs joint geometry generation and latent part segmentation, while the second stage conditions part-level diffusion on both whole-object and part-specific latents. A dual-space generation scheme further enhances geometric fidelity by predicting part latents in both global and canonical spaces. Extensive experiments demonstrate that UniPart achieves superior segmentation controllability and part-level geometric quality compared with existing approaches.
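
To make the global/canonical distinction concrete, here is a minimal sketch of one common way to move a part between the two spaces. The abstract does not define the canonical space, so this is an assumption: the part's axis-aligned bounding box is centered at the origin and scaled uniformly so that its longest side spans [-1, 1]. The function names `to_canonical` and `to_global` are hypothetical.

```python
import numpy as np


def to_canonical(points, eps=1e-8):
    """Map part points from global space to an assumed canonical space.

    Assumption (not from the paper): canonical space centers the part's
    axis-aligned bounding box at the origin and scales it uniformly so the
    longest side spans [-1, 1].
    """
    lo = points.min(axis=0)
    hi = points.max(axis=0)
    center = (lo + hi) / 2.0
    # Half of the longest bounding-box side; eps guards degenerate parts.
    scale = max(float((hi - lo).max()) / 2.0, eps)
    return (points - center) / scale, center, scale


def to_global(canonical_points, center, scale):
    """Inverse of to_canonical: place the part back in the object frame."""
    return canonical_points * scale + center


part = np.array([[0.0, 0.0, 0.0], [2.0, 4.0, 2.0]])
canonical, center, scale = to_canonical(part)
restored = to_global(canonical, center, scale)
```

Under this assumed convention, a small part fills the whole canonical volume regardless of its size in the object, while the stored `center` and `scale` are enough to place it back in the global frame.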

Pipeline

Method Pipeline

The pipeline of UniPart. A Geom-Seg VAE encodes both whole-object geometry and part segmentation information into a unified representation, Geom-Seg VecSet. Image-guided part-level generation then follows a two-stage pipeline: a whole-level DiT first generates the whole geometry and the segmented part latents, and a part-level DiT is then conditioned on the input image together with the whole-object and part latents to generate part latents in dual spaces. The final object mesh is assembled from the full-resolution part meshes.
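
The final assembly step can be illustrated with a short sketch. It is not the authors' implementation: it assumes every part mesh is already expressed in the global object frame as a `(vertices, faces)` pair, and the function name `merge_part_meshes` is hypothetical. Faces index into their own part's vertex array, so each part's face indices must be shifted by the number of vertices placed before it.

```python
import numpy as np


def merge_part_meshes(parts):
    """Concatenate part meshes into one object mesh (illustrative only).

    parts: non-empty list of (vertices, faces) pairs, where vertices is
           (V_i, 3) float and faces is (F_i, 3) int indexing into vertices.
           All parts are assumed to share the global object frame.
    Returns merged vertices, merged faces, and a per-face part label.
    """
    vertices, faces, labels = [], [], []
    offset = 0
    for part_id, (v, f) in enumerate(parts):
        v = np.asarray(v, dtype=np.float64)
        f = np.asarray(f, dtype=np.int64)
        vertices.append(v)
        # Shift indices so they point into the merged vertex array.
        faces.append(f + offset)
        # Remember which part each face came from.
        labels.append(np.full(len(f), part_id, dtype=np.int64))
        offset += len(v)
    return (np.concatenate(vertices),
            np.concatenate(faces),
            np.concatenate(labels))


# Two single-triangle "parts".
tri_a = (np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0]]), np.array([[0, 1, 2]]))
tri_b = (np.array([[0.0, 0, 1], [1, 0, 1], [0, 1, 1]]), np.array([[0, 1, 2]]))
merged_vertices, merged_faces, face_labels = merge_part_meshes([tri_a, tri_b])
```

Keeping the per-face part label alongside the merged mesh preserves the part decomposition, so the object can still be split or recolored by part after assembly.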

BibTeX

@misc{he2025unipartpartlevel3dgeneration,
      title={UniPart: Part-Level 3D Generation with Unified 3D Geom-Seg Latents}, 
      author={Xufan He and Yushuang Wu and Xiaoyang Guo and Chongjie Ye and Jiaqing Zhou and Tianlei Hu and Xiaoguang Han and Dong Du},
      year={2025},
      eprint={2512.09435},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.09435}, 
}