Nestwork

Conditional 3D Furnished House Layout Generation with Heterogeneous Graph Variational Autoencoders

Deep Learning, Graph VAE, House Layout Generation, Heterogeneous Graph

Research Project

2025

Authors

Shuhan Miao, Biru Cao, Junling Zhuang

Publication

Under review — details to be announced

Note: This research is currently under review and is planned for publication in 2025. More detailed information will be shared as it becomes available.

💡

Can a heterogeneous graph model generate coherent, fully furnished 3D house layouts by integrating room structure and furniture placement in a single framework? Does this approach improve realism and adaptability compared with traditional methods that handle the two steps separately?

Introduction

Nestwork is a framework designed to automatically create 3D house layouts along with fully furnished interiors. Traditional approaches often focus on room layouts alone (in 2D) or on single-room furniture arrangements, making it challenging to generate coherent, house-scale designs with detailed furniture arrangements. Nestwork addresses this gap by encoding both room nodes and furniture nodes within a single heterogeneous graph, capturing how different components interact spatially. This way, the framework can generate a complete, cohesive 3D house that respects user-defined constraints (such as the number or types of rooms) while automatically placing suitable furniture within each room.

Our proposed pipeline generates 3D furnished house layouts from semantic room graphs using a graph-based conditional Variational Autoencoder (VAE) with an autoregressive sampling mechanism.

Methods

Graph Formation: Each house is represented as a heterogeneous graph with three node types: a house node (the root), room nodes (e.g., living room, bedroom), and furniture nodes (e.g., sofa, table). Edges store spatial relationships and relevant geometric features (e.g., relative distances, orientations).
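To make the graph structure concrete, here is a minimal sketch of how such a heterogeneous graph could be stored. It is an illustration only, not the project's actual data format: the names `Node`, `HeteroGraph`, `add_edge`, and `children`, the relation labels "contains" and "adjacent", and the choice of distance and relative yaw as edge features are all assumptions made for this example.

```python
from dataclasses import dataclass, field
import math


@dataclass
class Node:
    node_id: int
    kind: str                         # "house", "room", or "furniture"
    category: str                     # e.g. "living room", "sofa"
    center: tuple = (0.0, 0.0, 0.0)   # bounding-box center (x, y, z)
    size: tuple = (0.0, 0.0, 0.0)     # bounding-box extents
    yaw: float = 0.0                  # rotation about the vertical axis, radians


@dataclass
class HeteroGraph:
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)   # (src, dst) -> edge features

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, src, dst, relation):
        # Hypothetical edge features: center distance and relative yaw.
        a, b = self.nodes[src], self.nodes[dst]
        self.edges[(src, dst)] = {
            "relation": relation,
            "distance": math.dist(a.center, b.center),
            "relative_yaw": (b.yaw - a.yaw) % (2 * math.pi),
        }

    def children(self, src, kind):
        # Destination nodes of the given kind reachable from src in one hop.
        return [d for (s, d) in self.edges
                if s == src and self.nodes[d].kind == kind]


# A toy house: one root, two rooms, and a sofa in the living room.
g = HeteroGraph()
g.add_node(Node(0, "house", "house"))
g.add_node(Node(1, "room", "living room", size=(8.0, 10.0, 2.8)))
g.add_node(Node(2, "room", "bedroom", center=(8.0, 0.0, 0.0), size=(8.0, 6.0, 2.8)))
g.add_node(Node(3, "furniture", "sofa", center=(3.0, 4.0, 0.0), size=(2.0, 0.9, 0.8)))
g.add_edge(0, 1, "contains")
g.add_edge(0, 2, "contains")
g.add_edge(1, 2, "adjacent")
g.add_edge(1, 3, "contains")
```

Keeping the node type on each node lets a single structure answer both "which rooms belong to this house" and "which furniture belongs to this room", which is the property the heterogeneous formulation relies on.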

Training: The encoder takes the full heterogeneous graph (rooms + furniture), learning to represent each node’s attributes (dimensions, orientation, shape) in a latent space. The decoder reconstructs bounding boxes, orientations, furniture categories, and furniture shape features from latent codes, thereby predicting a full 3D layout. An autoregressive prior sampling module predicts how many furniture pieces belong to each room and samples their latent codes from parent room features.
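The exact training objective is not stated here. As a hedged illustration, the sketch below shows two ingredients that conditional VAEs commonly use: the reparameterized latent sample and a KL term between the encoder's posterior and a prior, both taken as diagonal Gaussians. The function names, the diagonal-Gaussian assumption, and the toy numbers are assumptions for this example, not the authors' implementation. In a full model a reconstruction loss on boxes, orientations, categories, and shape features would be added to the KL term.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, log_var_q, mu_p, log_var_p):
        total += lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0
    return 0.5 * total


# Toy posterior for one furniture node (made-up numbers for illustration).
rng = random.Random(0)
mu_q, log_var_q = [0.2, -0.1, 0.4], [-1.0, -1.2, -0.8]

# Prior parameters; a standard normal here, but a room-conditioned or
# autoregressive prior would supply its own mean and log-variance instead.
mu_p, log_var_p = [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]

z = reparameterize(mu_q, log_var_q, rng)
kl = kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p)
```

Writing the KL term against a general Gaussian prior, rather than only a standard normal, keeps the same code usable when the prior is predicted from the parent room.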

Inference: The input during inference is a room-only graph (which rooms exist and how they connect). For each room, the autoregressive module decides whether any furniture belongs there and, if so, how many pieces, then infers their latent codes. The decoder converts those codes back into 3D boxes, orientations, and furniture shapes, producing a ready-to-visualize 3D furnished house.
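The inference loop can be sketched as follows. This is a toy under stated assumptions, not the trained model: `count_prior` and `next_latent` are hypothetical stand-ins for learned networks, the `MAX_ITEMS` table and the 4-dimensional room features are invented for the example, and conditioning each new code on the mean of the codes already sampled for that room is one plausible reading of "autoregressive". The decoder step is omitted.

```python
import random

# Hypothetical upper bounds on furniture count per room category.
MAX_ITEMS = {"living room": 5, "bedroom": 4, "corridor": 0}
LATENT_DIM = 4


def count_prior(category, rng):
    # Stand-in for a learned count predictor: uniform over 0..max.
    return rng.randint(0, MAX_ITEMS.get(category, 2))


def next_latent(room_feature, previous, rng):
    # Stand-in for a learned prior: the mean mixes the parent room feature
    # with the average of the latent codes sampled so far for this room.
    if previous:
        context = [sum(z[i] for z in previous) / len(previous)
                   for i in range(LATENT_DIM)]
    else:
        context = [0.0] * LATENT_DIM
    mean = [0.5 * r + 0.5 * c for r, c in zip(room_feature, context)]
    return [rng.gauss(m, 1.0) for m in mean]


def sample_house(rooms, rng):
    """Map each room name to a list of sampled furniture latent codes."""
    latents = {}
    for name, (category, feature) in rooms.items():
        n_items = count_prior(category, rng)
        codes = []
        for _ in range(n_items):
            # Each new code sees the codes already drawn for this room.
            codes.append(next_latent(feature, codes, rng))
        latents[name] = codes
    return latents


# Room-only input: name -> (category, made-up room feature vector).
rooms = {
    "living": ("living room", [0.5, 0.1, -0.2, 0.0]),
    "bed": ("bedroom", [-0.3, 0.4, 0.2, 0.1]),
    "hall": ("corridor", [0.0, 0.0, 0.0, 0.0]),
}
house_latents = sample_house(rooms, random.Random(0))
```

Sampling the count first means an empty room falls out naturally as a count of zero, so no separate "has furniture" decision is needed in this simplified version.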

Qualitative comparison of 3D furnished layouts generated from room-semantic graphs by different models. From left to right: input room graphs, Triplet-GCN (homogeneous GNN), Baseline 1 (i.i.d. prior sampling from a standard Gaussian), Baseline 2 (room-conditioned prior with i.i.d. sampling), and Ours (heterogeneous graph-based model with autoregressive prior sampling).

See how this model is applied in a web-based design tool on the post Nestwork-webapp.