SOPHY: Learning to Generate Simulation-Ready Objects
with PHYsical Materials
Technical University of Crete
[arXiv]   [Code]   [Data]  


Abstract

We present SOPHY, a generative model for 3D physics-aware shape synthesis. Unlike existing 3D generative models that focus solely on static geometry, or 4D models that produce physics-agnostic animations, our method jointly synthesizes shape, texture, and material properties related to physics-grounded dynamics, making the generated objects ready for simulation and interactive, dynamic environments. To train our model, we introduce a dataset of 3D objects annotated with detailed physical material attributes, along with an efficient pipeline for material annotation. Our method enables applications such as text-driven generation of interactive, physics-aware 3D objects and single-image reconstruction of physically plausible shapes. Furthermore, our experiments show that jointly modeling shape and material properties enhances the realism and fidelity of the generated shapes, improving performance in terms of both geometric quality and physical plausibility.
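For concreteness, a simulation-ready object in this sense bundles geometry and appearance with per-part physical material parameters that a simulator can consume directly. The sketch below is a minimal illustration of such a bundle; the class and field names are our own assumptions, not SOPHY's actual data format.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PartMaterial:
    """Physical parameters a simulator typically needs for one part.
    Illustrative only; the attribute set used by SOPHY may differ."""
    label: str              # e.g. "cushion" or "frame"
    density: float          # kg/m^3
    youngs_modulus: float   # Pa (stiffness)
    poisson_ratio: float    # dimensionless, in (0, 0.5)

@dataclass
class SimulationReadyObject:
    """Geometry + texture + per-part materials, ready for simulation."""
    vertices: List[Tuple[float, float, float]]
    faces: List[Tuple[int, int, int]]
    vertex_colors: List[Tuple[float, float, float]]
    part_materials: List[PartMaterial] = field(default_factory=list)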



Model Architecture
Results
(Best viewed on a PC rather than on a mobile device)
$\triangleright$ Image-to-4D Results $\triangleleft$
[Videos: each example shows the input image (Condition) and two rendered views (View 1, View 2), comparing B. Dec.$\dagger$, B. Perc.$\ddagger$, and SOPHY.]

Expected outcome: Throw a headband. (B. Dec.: headband too soft; B. Perc.: headband too stiff.)

Expected outcome: Throw a chair. (B. Dec.: poor alignment; B. Perc.: unrealistic shape.)

Expected outcome: Drop a bag. (B. Dec. and B. Perc.: too stiff, poor alignment.)

$\triangleright$ Text-to-4D Results $\triangleleft$
[Videos: each example shows two rendered views (View 1, View 2), comparing B. Dec.$\dagger$, B. Perc.$\ddagger$, and SOPHY.]

Input Condition: Drag a planter made of a plant, a soil, a wood vase neck, and a metal container. (B. Dec.: plant too stiff; B. Perc.: container too soft.)

Input Condition: Throw a teddy bear made of a polyester eye, a cotton ear, a denim mouth, a denim leg, a wool body, a cotton arm, a cotton head, and a wool design. (B. Dec.: unrealistic flat shape; B. Perc.: body too stiff.)

Input Condition: Throw a pillow made of a dark navy blue pillow fabric, and a dark gray fabric piping. (B. Dec.: poor alignment; B. Perc.: unrealistic dynamics.)

$\triangleright$ Generalization Results $\triangleleft$
(Input images are from Objaverse)
[Videos: each example shows the input image and the generated object from two views (View 1, View 2).]
Dataset Examples
$\triangleright$ Material Annotations $\triangleleft$
[Figures: 3D shapes with part-level material annotations. Examples: Teddy Bear, Loveseat.]
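To illustrate what such part-level annotations capture, here is a hypothetical record for the Teddy Bear example (the part/material pairings follow the teddy bear prompt above; the schema and numeric values are our own guesses, not the dataset's actual format):

# Hypothetical part-level annotation; schema and values are illustrative only.
teddy_bear_annotation = {
    "object": "Teddy Bear",
    "parts": [
        {"name": "body", "material": "wool",
         "density": 120.0, "youngs_modulus": 5.0e3, "poisson_ratio": 0.35},
        {"name": "ear", "material": "cotton",
         "density": 100.0, "youngs_modulus": 3.0e3, "poisson_ratio": 0.30},
    ],
}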
$\triangleright$ Simulation Sequences $\triangleleft$
[Videos: for each 3D shape, simulation sequences rendered from two views (View 1, View 2).]

Examples: Hat (Drop), Sofa (Drop), Bag (Throw), Teddy Bear (Throw), Chair (Tilt), Sofa (Tilt), Planter (Drag), Planter (Drag), Planter (Wind), Vase (Wind).
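The interaction types above (Drop, Throw, Tilt, Drag, Wind) could plausibly be parameterized as initial conditions and external forces in a simulator. The mapping below is purely our assumption for illustration; the paper's actual simulation setup may differ.

# Hypothetical mapping from interaction names to simple simulator settings.
# Structure and values are assumptions, not SOPHY's actual configuration.
INTERACTIONS = {
    "Drop":  {"initial_velocity": (0.0, 0.0, 0.0), "external_force": "gravity"},
    "Throw": {"initial_velocity": (2.0, 1.0, 0.0), "external_force": "gravity"},
    "Tilt":  {"ground_rotation_deg": 15.0, "external_force": "gravity"},
    "Drag":  {"pulled_part_velocity": (1.0, 0.0, 0.0)},
    "Wind":  {"force_field": (3.0, 0.0, 0.0)},  # uniform wind force
}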
BibTeX
@article{Cao_2025_SOPHY,
    author    = {Cao, Junyi and Kalogerakis, Evangelos},
    title     = {{SOPHY}: Learning to Generate Simulation-Ready Objects with Physical Materials},
    journal   = {arXiv:2504.12684},
    year      = {2025}
}
Remark

$\dagger$: "B. Dec." is a baseline method considered in our experiments. This baseline excludes color and material properties from the generation process, i.e., it generates a 3D shape, then predicts color conditioned on the shape through a decoder, and then the material through another decoder. The choice of this baseline attempts to answer the question of whether there is any benefit of incorporating the physical materials in the generation process. Please refer to our paper for more details.
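A minimal sketch of this decoupled pipeline follows; every function is a hypothetical stand-in, not a released API.

def generate_shape(condition):
    """Stage 1: generate geometry only (stub stand-in)."""
    return {"vertices": [(0.0, 0.0, 0.0)], "faces": []}

def decode_color(shape):
    """Stage 2: predict per-vertex color conditioned on the shape (stub)."""
    return [(0.5, 0.5, 0.5) for _ in shape["vertices"]]

def decode_material(shape):
    """Stage 3: predict material parameters from the shape alone (stub)."""
    return {"density": 500.0, "youngs_modulus": 1.0e5, "poisson_ratio": 0.3}

# Color and material are predicted after, and conditioned only on, geometry,
# so physical properties cannot influence the generated shape itself.
shape = generate_shape("an image or text prompt")
color = decode_color(shape)
material = decode_material(shape)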

$\ddagger$: "B. Perc." is a baseline method considered in our experiments. This baseline uses perceptual models to estimate material properties based on an off-the-shelf 3D generation model. Specifically, we adopt TRELLIS, a state-of-the-art 3D generation model, to generate textured 3D shapes given image or text conditions. To obtain the material properties of each generated shape, we then leverage an open-vocabulary 3D part segmentation model, Find3D, to get the part labels for the sampled surface points. Note that the query part names we provide to Find3D are derived from the set of part labels in our dataset, which comes from 3DCoMPaT200. Finally, we leverage ChatGPT-4o by providing it with two renderings of the generated 3D object and asking it to estimate the material properties for each part of the object retrieved by Find3D. Please refer to our paper appendix for more details.
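A hedged sketch of this perception-based pipeline follows; every function is a stub standing in for TRELLIS, Find3D, or ChatGPT-4o, not a real API call.

PART_QUERIES = ["body", "ear", "leg"]  # example names in the style of 3DCoMPaT200

def trellis_generate(condition):
    """Stand-in for TRELLIS: textured 3D shape from an image or text prompt."""
    return {"vertices": [(0.0, 0.0, 0.0)], "faces": [], "texture": None}

def find3d_segment(points, queries):
    """Stand-in for Find3D: open-vocabulary part label per surface point."""
    return [queries[0] for _ in points]

def gpt4o_estimate(renderings, part_names):
    """Stand-in for ChatGPT-4o: per-part material estimates from renderings."""
    return {p: {"density": 100.0, "youngs_modulus": 1.0e4} for p in part_names}

mesh = trellis_generate("an image or text prompt")
labels = find3d_segment(mesh["vertices"], PART_QUERIES)
renderings = ["view_1.png", "view_2.png"]  # two renderings of the object
materials = gpt4o_estimate(renderings, set(labels))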

This webpage is modified from the template provided here.