Deep Spatial Memory

Quantifying Spatial Experiences Via Agent-Modeling and Deep Learning

Deep Learning, Spatial Experience, Agent-driven Simulation, Isovist

Research Project

2024

Supervisor

Takehiko Nagakura

Authors

Shuhan Miao, Wenzhe Peng, Daniel Tsai, Takehiko Nagakura

Publication

2024 CAADRIA Conference

https://doi.org/10.52842/conf.caadria.2024.1.109

Note: This blog post is a simplified summary of the original research paper, intended to make the key ideas more accessible. For a more detailed discussion, including methodology and results, please refer to the full paper here: https://doi.org/10.52842/conf.caadria.2024.1.109.

💡

Can spatial experiences inside a building be quantitatively represented? How can we compare these spatial experiences across different buildings?

Introduction

In architectural theory, spatial experience is dynamic: it evolves from sequences of interconnected views shaped by past encounters and future expectations. Traditional computational methods such as Isovists[1] provide geometric insights but fall short in representing the sequential nature of that experience. To address this gap, the project introduces a methodology that combines agent-driven simulation, 3D Isovist sampling, and deep learning for quantitative analysis and comparison of spatial experiences in architecture. The methodology is first validated through a controlled experiment with various sequence typologies, which confirms that it recognizes typological similarities. A case study then compares Louis Kahn's designs with Roman architecture, quantitatively analysing their intertwined spatial experiences. This research offers a framework for quantitatively comparing spatial experiences across buildings and interpreting the nuanced impact of historical references on modern spaces.

Methods

This study integrates agent-driven simulation, 3D Isovist sampling, and self-supervised representation learning to analyze spatial sequences in architecture. It aims to simulate pedestrian behavior, sample spatial geometry, and extract latent spatial experience representations.

Agent-Driven Simulation: Utilizes PedSim[2], a Grasshopper plug-in, which simulates pedestrian paths using the social force model and anticipatory collision avoidance. The algorithm moves agents through spaces, visiting points of interest that represent architectural features, like columns and arches. These paths are recorded as 3D polylines, reflecting diverse experiences in a unified spatial sequence.
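As a rough illustration of the social force idea, the sketch below moves a single agent through a list of waypoints while point obstacles push it away. It is not PedSim's implementation: the function name, every parameter value, the point-obstacle model, and the fixed eye height used to turn the 2D walk into a 3D polyline are assumptions made for this example, and anticipatory collision avoidance between agents is left out.

```python
import math

def simulate_path(start, waypoints, obstacles, dt=0.1, desired_speed=1.3,
                  relax_time=0.5, repulse_strength=2.0, repulse_range=0.3,
                  reach_radius=0.3, eye_height=1.6, max_steps=5000):
    """Walk one agent through the waypoints; return the path as 3D points."""
    pos = [float(start[0]), float(start[1])]
    vel = [0.0, 0.0]
    path = [(pos[0], pos[1], eye_height)]
    for goal in waypoints:
        for _ in range(max_steps):
            dx, dy = goal[0] - pos[0], goal[1] - pos[1]
            dist = math.hypot(dx, dy)
            if dist < reach_radius:
                break  # waypoint reached, head for the next one
            # Driving force: relax the velocity toward the desired velocity.
            fx = (desired_speed * dx / dist - vel[0]) / relax_time
            fy = (desired_speed * dy / dist - vel[1]) / relax_time
            # Repulsive force from each obstacle, decaying exponentially.
            for ox, oy in obstacles:
                rx, ry = pos[0] - ox, pos[1] - oy
                r = math.hypot(rx, ry) or 1e-9
                mag = repulse_strength * math.exp(-r / repulse_range)
                fx += mag * rx / r
                fy += mag * ry / r
            # Explicit Euler integration of velocity and position.
            vel[0] += fx * dt
            vel[1] += fy * dt
            pos[0] += vel[0] * dt
            pos[1] += vel[1] * dt
            path.append((pos[0], pos[1], eye_height))
    return path

path = simulate_path((0.0, 0.0), [(5.0, 0.0), (5.0, 5.0)], obstacles=[(2.5, 0.4)])
```

Running the function repeatedly with slightly different start points or waypoint orders would give a family of distinct paths through the same spatial sequence.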

3D Isovist Sampling: Implements a custom Python-scripted component in Grasshopper. Each simulated path is resampled into 30 points. At each point, rays are projected in all directions and the distance to each intersection point is computed. These distances are mapped to grayscale values and assembled into depth panorama images[3]. The images capture spatial geometry and are formatted as video sequences for further analysis.
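The sampling step can be sketched outside Grasshopper as follows. This is a minimal illustration, not the project's component: the scene is reduced to a single axis-aligned box so that ray intersection stays short, and the function names, image size, maximum distance, and the "nearer is brighter" grayscale mapping are all assumptions.

```python
import math

def resample_polyline(points, n=30):
    """Return n points evenly spaced by arc length along a 3D polyline."""
    seg = [math.dist(a, b) for a, b in zip(points, points[1:])]
    total = sum(seg)
    out = []
    for k in range(n):
        target = total * k / (n - 1)
        i = 0
        # Walk along the segments until the target length falls inside one.
        while i < len(seg) - 1 and target > seg[i]:
            target -= seg[i]
            i += 1
        t = min(1.0, target / seg[i]) if seg[i] > 0 else 0.0
        a, b = points[i], points[i + 1]
        out.append(tuple(a[j] + t * (b[j] - a[j]) for j in range(3)))
    return out

def depth_panorama(origin, box_min, box_max, width=64, height=32, max_dist=50.0):
    """Cast one ray per pixel from a point inside a box; return grayscale rows."""
    image = []
    for row in range(height):
        polar = math.pi * (row + 0.5) / height          # 0 = up, pi = down
        line = []
        for col in range(width):
            azimuth = 2 * math.pi * (col + 0.5) / width
            d = (math.sin(polar) * math.cos(azimuth),
                 math.sin(polar) * math.sin(azimuth),
                 math.cos(polar))
            # Distance to the first wall hit when looking out from inside the box.
            t_hit = min(
                ((box_max[j] if d[j] > 0 else box_min[j]) - origin[j]) / d[j]
                for j in range(3) if abs(d[j]) > 1e-12
            )
            depth = min(t_hit, max_dist)
            line.append(round(255 * (1 - depth / max_dist)))  # nearer = brighter
        image.append(line)
    return image

stations = resample_polyline([(1.0, 1.0, 1.6), (9.0, 1.0, 1.6), (9.0, 9.0, 1.6)])
video = [depth_panorama(p, (0.0, 0.0, 0.0), (10.0, 10.0, 4.0)) for p in stations]
```

Here `video` is a list of 30 panorama frames, one per sample point, which is the shape of input a video model would consume.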

Self-Supervised Representation Learning (SSL): Involves training models on large datasets without predefined labels. The study employs MemDPC[4], a model using frame prediction as its pretext task. It divides each video into blocks, encoding them into embeddings. These are time-aggregated using Recurrent Neural Networks (RNNs) to extract context features, which are then condensed into a feature vector representing the latent spatial experience. The study uses sequential typologies as labels to translate SSL-acquired features into human-readable terms, analyzing these features to discern underlying similarities.
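The data flow described above (video, then blocks, then embeddings, then a recurrent summary) can be shown with a toy sketch. It is not MemDPC: the real model uses learned encoders and recurrent units trained on the frame prediction task, whereas the block length, the averaging "encoder", and the fixed random weights below are untrained stand-ins chosen only to make the shapes visible.

```python
import math
import random

def split_into_blocks(frames, block_len):
    """Cut a frame sequence into consecutive, non-overlapping blocks."""
    return [frames[i:i + block_len]
            for i in range(0, len(frames) - block_len + 1, block_len)]

def encode_block(block):
    """Stand-in encoder: average the frames of a block element-wise."""
    n = len(block)
    return [sum(frame[j] for frame in block) / n for j in range(len(block[0]))]

def aggregate(embeddings, hidden_size=4, seed=0):
    """Simple recurrence h_t = tanh(W_x x_t + W_h h_{t-1}) with random weights."""
    rng = random.Random(seed)
    dim = len(embeddings[0])
    w_x = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden_size)]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
           for _ in range(hidden_size)]
    h = [0.0] * hidden_size
    for x in embeddings:
        h = [math.tanh(sum(w_x[i][j] * x[j] for j in range(dim)) +
                       sum(w_h[i][k] * h[k] for k in range(hidden_size)))
             for i in range(hidden_size)]
    return h  # context feature summarising the whole sequence

# 30 flattened toy frames of 8 values each, grouped into 6 blocks of 5 frames.
frames = [[(i + j) / 40 for j in range(8)] for i in range(30)]
blocks = split_into_blocks(frames, 5)
context = aggregate([encode_block(b) for b in blocks])
```

The final `context` list plays the role of the feature vector that later steps cluster and classify.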

Diagram of Proposed Pipeline

Experiment I: Sequence Typologies

The first experiment tests the self-supervised learning (SSL) model's ability to extract features from a dataset of 2,700 path sequences. Each sequence starts and ends in one of three space types ('room', 'passage', or 'exterior'), which gives nine distinct start-to-end combinations. During SSL training the data is augmented with brightness adjustments, playback speed variations, cropping, and horizontal flips.
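To make the labelling and augmentation concrete, the sketch below enumerates the nine start-to-end typologies and applies the four augmentations to a toy video. The label strings, the parameter ranges (brightness gain, speed steps, crop margin, flip probability), and the order of the operations are assumptions for illustration; the source does not state the values used in training.

```python
import random
from itertools import product

SPACE_TYPES = ("room", "passage", "exterior")
# Every (start, end) pairing of the three space types: 3 x 3 = 9 typologies.
TYPOLOGIES = [f"{start} to {end}" for start, end in product(SPACE_TYPES, repeat=2)]

def augment(video, rng, crop_margin=2):
    """Apply the four augmentations to a video.

    video: list of frames, each a list of rows of 0-255 grayscale values.
    """
    # Brightness: scale every pixel by one random gain, clamped to 255.
    gain = rng.uniform(0.8, 1.2)
    video = [[[min(255, round(p * gain)) for p in row] for row in frame]
             for frame in video]
    # Playback speed: keep every frame, or only every second frame.
    video = video[::rng.choice((1, 2))]
    # Cropping: cut the same random window out of every frame.
    top = rng.randint(0, crop_margin)
    left = rng.randint(0, crop_margin)
    height = len(video[0]) - crop_margin
    width = len(video[0][0]) - crop_margin
    video = [[row[left:left + width] for row in frame[top:top + height]]
             for frame in video]
    # Horizontal flip: mirror each row with probability 0.5.
    if rng.random() < 0.5:
        video = [[row[::-1] for row in frame] for frame in video]
    return video

toy_video = [[[100] * 16 for _ in range(8)] for _ in range(30)]
augmented = augment(toy_video, random.Random(0))
```

Passing a seeded `random.Random` keeps each augmentation reproducible, which helps when debugging a training pipeline.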

The SSL model's performance is first evaluated based on its validation accuracies in the frame prediction task, achieving 75% top-1 and 99% top-5 accuracies. This indicates robust feature learning without relying on labeled data. To further assess the learned features, unsupervised clustering using the K-means algorithm is applied. When comparing the clustering results with the ground truth of sequential typology labels, a 96% accuracy is achieved. The clustering’s relevance and precision are also confirmed by an Adjusted Mutual Information score of 0.95, indicating a high correlation between the unsupervised clustering outcomes and the actual labels. This demonstrates that the SSL-extracted features possess a self-clustered organization closely related to human-defined categories.
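One way to score unsupervised clusters against typology labels is sketched below. The source does not say how clusters were matched to labels, so the majority-vote mapping here is an assumed convention, and the small K-means routine with farthest-point initialisation is a simplified stand-in for a library implementation. The Adjusted Mutual Information score is not reproduced.

```python
import math
from collections import Counter, defaultdict

def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        assign = [min(range(k), key=lambda i: math.dist(p, centers[i]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assign) if a == i]
            if members:
                centers[i] = tuple(sum(v) / len(members) for v in zip(*members))
    return assign

def cluster_accuracy(cluster_ids, true_labels):
    """Map each cluster to its most frequent true label, then score agreement."""
    members = defaultdict(list)
    for c, y in zip(cluster_ids, true_labels):
        members[c].append(y)
    correct = sum(Counter(ys).most_common(1)[0][1] for ys in members.values())
    return correct / len(true_labels)

# Toy 2D "features": three well-separated groups of five points each.
points, labels = [], []
for name, (cx, cy) in {"room": (0.0, 0.0), "passage": (10.0, 0.0),
                       "exterior": (0.0, 10.0)}.items():
    for i in range(5):
        points.append((cx + 0.1 * i, cy - 0.1 * i))
        labels.append(name)
accuracy = cluster_accuracy(kmeans(points, 3), labels)
```

Majority-vote mapping can flatter a clustering when several clusters share one label, so a chance-adjusted score such as Adjusted Mutual Information is a useful companion metric.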

Additionally, a Multi-Layer Perceptron (MLP) classifier is trained on these features to convert them into human-readable labels. The classifier outputs a 6-D probability vector, predicting the start and end types of the sequences. It achieves high validation (99%) and test set (98%) accuracies, confirming its effectiveness in accurately categorizing sequential typologies from the SSL-extracted features.
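A 6-D output can be read as two 3-way distributions, one for the start type and one for the end type. The sketch below decodes such a vector under that reading. The ordering of the entries, the assumption that each half sums to one, and the product rule used to score the nine combined typologies are all assumptions for illustration, not details confirmed by the source.

```python
SPACE_TYPES = ("room", "passage", "exterior")

def decode(prob6):
    """Split a 6-D vector into start/end predictions and a 9-way joint table."""
    start, end = prob6[:3], prob6[3:]
    start_label = SPACE_TYPES[max(range(3), key=start.__getitem__)]
    end_label = SPACE_TYPES[max(range(3), key=end.__getitem__)]
    # Assumed: treat start and end as independent to score each combination.
    joint = {f"{s} to {e}": ps * pe
             for s, ps in zip(SPACE_TYPES, start)
             for e, pe in zip(SPACE_TYPES, end)}
    return start_label, end_label, joint

start_label, end_label, joint = decode([0.7, 0.2, 0.1, 0.1, 0.1, 0.8])
```

In this example the decoded pair is 'room' and 'exterior', and the joint table assigns its largest value to "room to exterior".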

Left: Nine Sequential Typologies, Right: T-SNE Visualization of Classified Labels

Sampled Trajectories in Four Case Study Buildings

Experiment II: Comparative Architectural Case Study

The second experiment investigates the influence of Roman architecture on Louis Kahn's designs by fine-tuning the SSL model and MLP classifier from the first experiment. This study aims to analyze complex spatial relationships and nuanced design influences, a challenging task for traditional methods. The experiment is designed to mirror human cognitive processes, interpreting new spatial experiences based on prior knowledge.

The focus is on Kahn's designs, particularly as influenced by his 1950 visit to Rome, which is documented in various sources[5]. The Indian Institute of Management (IIM) was chosen for its design references to Roman ruins, intended to evoke monumentality. Additionally, the Pantheon, Trajan’s Market, and Baths of Caracalla, known to have influenced Kahn, were included in the study.

The experiment involves a sampling strategy that selects trajectories based on architectural significance. These trajectories highlight unique design elements, areas frequently visited, or spaces with notable architectural details. The final selection includes 12 trajectories, each simulated with 100 paths to enrich the dataset.

The SSL model and MLP classifier were fine-tuned using a typology dataset combined with a subset of the case study dataset, consisting of 31% of the total data. This subset included 5 trajectories with distinguishable sequential types, chosen to avoid ambiguous labels. The fine-tuned models achieved a 96% validation accuracy on the combined dataset and 94% on the labelled case study subset. They effectively grouped similar sequence typologies in the case study dataset.

The K-means clustering and nearest neighbour similarity analysis showed notable similarities between the IIM and Trajan’s Market, particularly in architectural elements like terraces and colonnades. They also highlighted differences with the Pantheon, Caracalla’s Baths, and a standard courtyard. This analysis suggests that the framework can uncover subtle connections between architectural elements, providing insights into Kahn's design strategy and philosophy. It complements traditional qualitative analyses by enabling a data-driven exploration of design choices.
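A nearest neighbour similarity query over feature vectors can be sketched as below. The source does not name the similarity measure, so cosine similarity is an assumption, and the trajectory names and three-dimensional vectors are invented placeholders rather than results from the study.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbours(query, bank, k=3):
    """Return the k most similar entries as (similarity, name) pairs."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in bank.items()]
    return sorted(scored, reverse=True)[:k]

# Hypothetical feature vectors, for illustration only.
bank = {
    "trajectory_a": [1.0, 0.0, 0.2],
    "trajectory_b": [0.0, 1.0, 0.1],
    "trajectory_c": [0.9, 0.1, 0.3],
}
ranking = nearest_neighbours([1.0, 0.05, 0.25], bank, k=2)
```

Applied to real features, the same query would be run per trajectory, with neighbours drawn from other buildings to surface cross-building similarities.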

Left is a nested scatter plot illustrating the predicted probabilities for 12 sampled trajectories across 9 sequential types. On the right, 3D plots highlight the probability vectors for specific start and end types, connecting pairs of points for each building

Left half shows the prediction results of simulated paths in Pantheon Porch using trained classifier. Right half shows analysis of 'Exterior to Exterior' feature vectors

Conclusion

This research demonstrates a powerful analytical pipeline that combines deep learning's computational pattern recognition with the nuanced interpretation required in architectural studies. Future enhancements include incorporating additional sensory channels into the model, like semantic segmentation or object detection, and refining Isovist data resolution for a more detailed understanding of architectural elements. These improvements aim to directly visualize the links between model features and specific architectural aspects. Moreover, by expanding the dataset and using clustering algorithms, the research seeks to quantify common spatial experience attributes, exemplified by 'monumentality' in the case study, thus establishing a new approach in architectural analysis that marries qualitative insights with quantitative accuracy.

[1] M. Benedikt, “To Take Hold of Space: Isovists and Isovist Fields,” Environment and Planning B: Planning and Design, vol. 6, no. 1, pp. 47–65, 1979, doi: 10.1068/b060047.

[2] J. Riise, PedSim. 2022. [Online]. Available: https://github.com/julianriise/pedsim

[3] W. Peng, F. Zhang, and T. Nagakura, “Machines’ Perception of Space,” in Proceedings of the 37th Annual Conference of the Association of Computer Aided Design in Architecture (ACADIA), Oct. 2017, pp. 474–481.

[4] T. Han, W. Xie, and A. Zisserman, “Memory-augmented Dense Predictive Coding for Video Representation Learning,” arXiv, 2020, doi: 10.48550/arxiv.2008.01065.

[5] E. Barizza, Rome and the Legacy of Louis I. Kahn. Routledge Research in Architecture. Abingdon, Oxon; New York, NY: Routledge, an imprint of the Taylor & Francis Group.