
PavilionNet
A StructureNet-Based Approach to Encode 3D Objects on Synthetic Architecture Dataset
Deep Learning, Recursive Graph VAE, Procedural Geometry Generation
💡
How can deep learning, specifically a Variational Autoencoder (VAE) framework, be used to encode and interpolate procedural architectural geometries? Can this approach create a unified latent representation that eliminates the need for bespoke generative scripts for different typologies?
Introduction
3D object representation methods (point clouds, meshes, voxels, and neural implicit representations) typically focus on geometric features and often overlook the compositional structure of architectural designs. These designs are rarely single objects; they comprise multiple layers, each with its own architectural semantics. This project encodes the composition of 3D architectural designs using StructureNet [1], a hierarchical graph network. StructureNet captures the geometry, hierarchical semantics, and sibling relationships of 3D architecture, providing a more comprehensive representation than geometry alone.
StructureNet was originally trained on PartNet [2], a manually annotated dataset, which makes extending it to custom object categories difficult. To overcome this, the project adapts StructureNet to synthetic datasets produced with Grasshopper, a parametric modeling tool widely used in architectural design. A Grasshopper script was developed to generate pavilion designs across ten typologies, unifying data generation and annotation in a single process. The StructureNet model was then modified and trained on this dataset. The goal is to construct a high-quality latent space that provides a basis for future design inspiration and exploration.
Methods
This section describes the approach used to generate 3D architectural geometry: the application of StructureNet, a process for synthetic data generation, and the modifications made to the network architecture.
StructureNet: StructureNet represents 3D shapes by combining geometric information, hierarchical semantics, and relationships between sibling nodes. Built on a Variational Autoencoder (VAE) [3] architecture, it encodes the root node of a shape hierarchy into a 256-dimensional latent vector, from which shapes can be generated. It was developed on the PartNet dataset, which was annotated by 66 professionals across 24 categories. This dependence brings two limitations: the annotation process is labor-intensive, and the objects are sourced from online platforms and may deviate from the original designs.
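For reference, the generic VAE formulation from [3] that such an encoder builds on can be written as below. This is the textbook form, not the exact StructureNet objective: the symbols $\mu$, $\sigma$, and $\epsilon$ follow common convention, the weight $\beta$ and the single reconstruction term $\mathcal{L}_{\mathrm{recon}}$ are notational assumptions, and $d = 256$ matches the latent size mentioned above.

```latex
\begin{aligned}
z &= \mu + \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \\
D_{\mathrm{KL}}\big(q(z \mid x) \,\|\, \mathcal{N}(0, I)\big)
  &= -\tfrac{1}{2} \sum_{i=1}^{d} \left( 1 + \log \sigma_i^2 - \mu_i^2 - \sigma_i^2 \right), \\
\mathcal{L} &= \mathcal{L}_{\mathrm{recon}} + \beta \, D_{\mathrm{KL}}.
\end{aligned}
```

The first line is the reparameterization step that keeps sampling differentiable, and the second is the closed-form KL term for a diagonal Gaussian posterior against a standard normal prior.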
Synthetic Data Generation: To overcome these limitations, the project adapts StructureNet to synthetic datasets. This involves defining pavilion semantic labels and employing a Grasshopper script to generate varied architectural elements. Each element's geometric features are randomly produced within specific ranges, and their details are captured using oriented bounding boxes and hierarchical structures. This process, which integrates generation and annotation, includes a novel Grasshopper Python script for data normalization and edge detection between sibling nodes.
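The idea of sampling each element's geometry within fixed ranges and recording it as a labeled box in a hierarchy can be sketched in plain Python. This is only an illustration under stated assumptions: the labels, the numeric ranges, the dictionary layout, and the function names `sample_box` and `generate_pavilion` are hypothetical and are not taken from the project's Grasshopper script.

```python
import random

# Hypothetical semantic labels and parameter ranges (in meters); the ranges
# used in the actual Grasshopper definition are not given in the text.
PART_RANGES = {
    "ground":  {"width": (4.0, 8.0), "depth": (4.0, 8.0), "height": (0.1, 0.3)},
    "support": {"width": (0.2, 0.5), "depth": (0.2, 0.5), "height": (2.5, 4.0)},
    "roof":    {"width": (3.0, 6.0), "depth": (3.0, 6.0), "height": (0.2, 0.6)},
}


def sample_box(label, base_z, rng):
    """Sample one oriented bounding box resting on the plane z = base_z."""
    ranges = PART_RANGES[label]
    size = [
        rng.uniform(*ranges["width"]),
        rng.uniform(*ranges["depth"]),
        rng.uniform(*ranges["height"]),
    ]
    center = [0.0, 0.0, base_z + size[2] / 2.0]
    angle_deg = rng.uniform(0.0, 90.0)  # rotation about the vertical axis
    return {"label": label, "center": center, "size": size,
            "angle_deg": angle_deg, "children": []}


def generate_pavilion(seed):
    """Build a small labeled hierarchy: a root node with three stacked parts."""
    rng = random.Random(seed)
    ground = sample_box("ground", 0.0, rng)
    support = sample_box("support", ground["size"][2], rng)
    roof = sample_box("roof", ground["size"][2] + support["size"][2], rng)
    return {"label": "pavilion", "children": [ground, support, roof]}


pavilion = generate_pavilion(seed=42)
print([child["label"] for child in pavilion["children"]])
```

Because the generator writes the label and the box at the same time, each sample is annotated by construction, which is the point of integrating generation and annotation.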
Network Modification: The StructureNet framework was modified to suit synthetic architectural datasets. Because geometric features are represented only by oriented bounding boxes, the network was streamlined to a box-only version. The modified network is a Variational Autoencoder with a recursive encoder and decoder that process graph structures hierarchically. The encoder includes a Child Encoder based on graph convolutional neural networks and a Box Encoder for box parameters. The decoder has an additional Leaf Classifier block for predicting leaf nodes. Training involved tuning hyperparameters such as the latent feature dimension and loss weights to optimize performance on the custom dataset.
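The bottom-up recursion of such an encoder can be illustrated with a toy, dependency-free sketch. It is not the project's network: the weights are random rather than trained, the feature size `FEAT` and the 7-value box layout are hypothetical, and the Child Encoder is reduced to max-pooling followed by one layer, omitting the graph-convolution message passing along sibling edges.

```python
import math
import random

FEAT = 8  # hypothetical feature size; the text treats this as a tuned hyperparameter


def make_linear(n_in, n_out, rng):
    """Random weight matrix standing in for a trained linear layer."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def linear_tanh(weights, x):
    """Apply a linear map followed by a tanh nonlinearity."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


class RecursiveEncoder:
    def __init__(self, box_dim=7, seed=0):
        rng = random.Random(seed)
        self.box_enc = make_linear(box_dim, FEAT, rng)   # stand-in for the Box Encoder
        self.child_enc = make_linear(FEAT, FEAT, rng)    # stand-in for the Child Encoder

    def encode(self, node):
        # Leaf: encode the box parameters directly.
        if not node.get("children"):
            return linear_tanh(self.box_enc, node["box"])
        # Internal node: encode children first, then pool them into one feature.
        child_feats = [self.encode(child) for child in node["children"]]
        pooled = [max(column) for column in zip(*child_feats)]
        return linear_tanh(self.child_enc, pooled)


# Box layout assumed here: center (3), size (3), rotation angle (1).
roof = {"box": [0.0, 0.0, 0.5, 1.0, 1.0, 0.1, 0.0]}
column = {"box": [0.0, 0.0, 0.0, 0.1, 0.1, 0.9, 0.0]}
root = {"children": [roof, column]}
root_feature = RecursiveEncoder().encode(root)
print(len(root_feature))
```

Max-pooling makes the parent feature independent of the order in which children are listed, which suits part hierarchies where siblings have no natural ordering.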

Diagram of Proposed Pipeline
Procedurally-Generated Typologies
To address the limitations of the original dataset, the project adapted StructureNet for use with synthetic datasets, integrating part generation and annotation. A set of semantic labels for pavilion structures was defined, covering elements such as roofs, supporting structures, and ground features. A Grasshopper script then generated the elements from parameters such as height, position, aspect ratio, and rotation angle, in line with standard parametric modeling practice. The geometric features of each element were sampled randomly within specific ranges, and each subcomponent was represented by an oriented bounding box within a hierarchical structure for a more precise depiction of architectural forms. For network compatibility, new Grasshopper Python components were developed. These components normalized the generated pavilions into a unit sphere, integrated them into the root node, and identified edges between child nodes. This scripting was largely original and departed from the initial scripts used in StructureNet. The dataset created consisted of 1,000 pavilions, each with 10 distinct typologies, generated through a specialized Grasshopper script. These variations, inspired by real-world pavilion designs, were intended to provide diverse data for learning a comprehensive latent space. The dataset was divided into training, validation, and test sets in a 7:2:1 ratio.
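The normalization and edge-identification steps described above can be sketched as follows. This is a simplified stand-in for the Grasshopper Python components, not their actual code: boxes are treated as axis-aligned (rotation is ignored), "edge" is assumed to mean that two sibling boxes touch or overlap, and the function names and tolerance are hypothetical.

```python
import math


def box_corners(center, size):
    """Return the 8 corners of an axis-aligned box (rotation omitted for brevity)."""
    hx, hy, hz = (s / 2.0 for s in size)
    cx, cy, cz = center
    return [(cx + sx * hx, cy + sy * hy, cz + sz * hz)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]


def normalize_to_unit_sphere(boxes):
    """Translate and scale all boxes so every corner lies within a unit sphere."""
    corners = [c for b in boxes for c in box_corners(b["center"], b["size"])]
    centroid = [sum(c[i] for c in corners) / len(corners) for i in range(3)]
    radius = max(math.dist(c, centroid) for c in corners)
    return [{"label": b["label"],
             "center": [(b["center"][i] - centroid[i]) / radius for i in range(3)],
             "size": [s / radius for s in b["size"]]}
            for b in boxes]


def boxes_touch(a, b, tol=1e-6):
    """True if two axis-aligned boxes overlap or touch (within tol) on every axis."""
    return all(
        abs(a["center"][i] - b["center"][i]) <= (a["size"][i] + b["size"][i]) / 2.0 + tol
        for i in range(3)
    )


def sibling_edges(boxes, tol=1e-6):
    """List index pairs (i, j), i < j, of sibling boxes that are adjacent."""
    return [(i, j)
            for i in range(len(boxes))
            for j in range(i + 1, len(boxes))
            if boxes_touch(boxes[i], boxes[j], tol)]


parts = [
    {"label": "ground", "center": [0.0, 0.0, 0.1], "size": [4.0, 4.0, 0.2]},
    {"label": "support", "center": [0.0, 0.0, 1.7], "size": [0.3, 0.3, 3.0]},
    {"label": "roof", "center": [0.0, 0.0, 3.4], "size": [5.0, 5.0, 0.4]},
]
normalized = normalize_to_unit_sphere(parts)
print(sibling_edges(normalized))
```

In this example the support touches both the ground and the roof, while the ground and roof are not adjacent, so two edges are found. Uniform scaling preserves adjacency, so edges can be computed before or after normalization.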

Procedural Generation of Pavilion Typologies for Synthetic Dataset

Grasshopper Script for the Procedural Generation
Reconstruction Results
Training ran on Google Colab for 100 epochs and completed in approximately one day. Modifications to the model's structure and hyperparameters produced satisfactory reconstructions. Box reconstruction quality was high, but semantic reconstruction remained suboptimal for certain typologies. Even so, the model reliably encoded pavilions into latent codes and decoded them back into 3D compositions. An analysis of the latent space revealed that pavilions of the same typology tended to cluster together, which indicates that the model distinguishes architectural forms by their underlying typologies.

t-SNE Visualization of the Latent Space of Reconstruction Results
Interpolation
Interpolation blends the latent codes of a source pavilion and a target pavilion, both produced by the trained encoder. The interpolated latent codes are then decoded by the trained decoder, which makes it possible to explore new typologies. This is particularly useful for designers, as it merges two distinct typologies to reveal potential new designs. Such blending is hard to replicate in a manual design process; it relies on the numerical nature of the network's latent representation. The method illustrates the potential of machine learning to support design exploration in architecture.
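A minimal sketch of the blending step, assuming straight linear interpolation between the two latent codes (the text does not state which interpolation scheme was used). The function name `interpolate_latents` is hypothetical, and the encoder and decoder calls are left out because their interfaces are not described.

```python
def interpolate_latents(z_source, z_target, steps):
    """Return `steps` latent codes evenly spaced from z_source to z_target."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    if len(z_source) != len(z_target):
        raise ValueError("latent codes must have the same dimension")
    codes = []
    for k in range(steps):
        t = k / (steps - 1)  # t runs from 0.0 (source) to 1.0 (target)
        codes.append([(1.0 - t) * a + t * b for a, b in zip(z_source, z_target)])
    return codes


# Toy 2-dimensional codes; real codes would come from the trained encoder.
for code in interpolate_latents([0.0, 2.0], [1.0, 0.0], steps=3):
    print(code)
```

Each intermediate code would then be passed to the trained decoder to obtain one in-between pavilion.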

Interpolation Results of Pavilion Typologies

Rendering of Generated Pavilions
Conclusion
Architectural-scale objects like pavilions pose particular difficulties because of the large differences in size between components, which call for further adjustments to network structures and hyperparameters. The training process is time-consuming, and the current results are not yet optimal. Limitations also arise from the simplified semantic labeling used in the study. Future research should focus on enhancing semantic labeling, customizing network structures, and incorporating semantic constraints into objective functions for a more comprehensive exploration of the design space. Potential directions include exploring more detailed geometric representations, such as neural implicit representations or point clouds, while efficiently computing inter-part relationships. This study's insights lay the groundwork for advancing architectural design representation in machine learning and computational design.
[1] K. Mo et al., “StructureNet: Hierarchical Graph Networks for 3D Shape Generation,” ACM Transactions on Graphics (TOG), vol. 38, no. 6, pp. 1–19, 2019, doi: 10.1145/3355089.3356527.
[2] K. Mo et al., “PartNet: A Large-scale Benchmark for Fine-grained and Hierarchical Part-level 3D Object Understanding,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 909–918, 2019, doi: 10.1109/cvpr.2019.00100.
[3] D. P. Kingma and M. Welling, “Auto-Encoding Variational Bayes,” CoRR, vol. abs/1312.6114, 2013, [Online]. Available: https://api.semanticscholar.org/CorpusID:216078090