VO Thesis Proposal 082716

APPLICATIONS OF DATA SCIENCE IN ARCHITECTURE
Can data science methods be applied to architectural practice?
Doctor of Professional Practice

Thesis Proposal
Carnegie Mellon University
School of Architecture
Victor Okhoya
August 2016
Thesis Committee:
Ramesh Krishnamurti, PhD. Professor, CMU School of Architecture
John Haymaker, PhD. Director of Research, Perkins+Will
Aarti Singh, PhD. Associate Professor, CMU Machine Learning Department
Daniel Cardoso Llach, PhD. Assistant Professor, CMU School of Architecture
Contents
1. ABSTRACT
2. INTRODUCTION
3. PROBLEM STATEMENT
4. PURPOSE
5. METHODOLOGIES
6. LITERATURE REVIEW
7. CASE STUDIES
8. EXPERIMENT
9. OUTLINE
10. PREVIOUS WORK
11. TIME LINES
12. BIBLIOGRAPHY
1. ABSTRACT
Data science can be defined as a set of fundamental principles that support and guide the principled
extraction of information and knowledge from data (Hosack et al., 2015). It is one of the fastest
growing areas of computer science and is finding applications in several fields like healthcare,
finance and manufacturing. Glassdoor, a popular website where employees review companies and
their management, rated data scientist as the best job in the United States in 2016.¹ However, data
science methods have not yet been vigorously engaged in either architectural research or practice.
This thesis investigates whether data science methods can be applied to solve problems and support
decision making in architectural practice.
The thesis is motivated by three ideas. First, disciplines closely related to architecture are using data
science methods to good effect. In particular, the thesis will review data science examples from
Construction Management and Building Performance Analysis. Second, architectural decision
making in practice can be shown to lack rigor in some circumstances. This thesis will argue that data
science methods can improve the rigor and accuracy of architectural decision making. It will discuss
examples of decision making in architecture that data science methods can impact. Third, data
science methods point to a potential paradigm shift in digital design technology. This is because data
science methods herald the possibility of autonomously intelligent design, and the application of
computational creativity to architectural design.
Applying data science to architecture requires clear definitions of both data science methods and
architectural practice. A conceptual framework is developed to define these terms and to provide a
context for relating them to the broader research question. The conceptual framework is based on
the sources of data identified in large contemporary architectural practices like Perkins+Will, the
definition of architectural services given in The Architect's Handbook of Professional Practice of the
American Institute of Architects (AIA), and a description of data science methods derived from The
Field Guide to Data Science by the management consultants Booz Allen Hamilton.
The thesis will examine four case studies based on projects and research undertaken at Perkins+Will
between 2015 and 2017. The first is a case study of micro-polling as a client engagement strategy and
statistical analysis of micro-polling survey results. The second studies the generation, analysis and
visualization of parametric energy analysis data, and design optimization based on the data analysis.
The third examines the utilization of Autodesk Revit journal data for anomaly detection. The last case
study explores using Revit model data for project performance monitoring.
Finally, a validating experiment will be performed in which early stage design is undertaken first
using conventional design methods and then using data science methods. A comparative analysis
methodology will be used to evaluate the impact of the data science methods on the design process.
Conclusions will then be drawn from the case study analyses and the experiment.
1 Retrieved on 6 August 2016 from https://www.glassdoor.com/List/Best-Jobs-in-America-LST_KQ0,20.htm
2. INTRODUCTION
Data science is an amalgamation of several disciplines, including machine learning, statistical
analysis and data visualization. It has become an important emergent discipline because digital data
is now pervasive: according to the McKinsey Global Institute, digital data is now in every sector, in
every economy, and in every organization that uses digital technology. The ability to store and
analyze data has become more accessible with improvements in computing and storage such as cloud
computing. This data can be used to generate value (Table 1). For example, it is estimated that the
potential value of data to the US health care industry could be as much as $300 billion per year
(Roberts & Sikes, 2011).
Table 1. Data by the numbers (Roberts & Sikes, 2011).
Figure | Description
$600 | Cost of a disk drive that can store all the music in the world.
5 billion | Mobile phones in use in 2010.
30 billion | Pieces of content shared on Facebook every month.
40% | Projected growth in data generated per year.
235 terabytes | Amount of data collected by the US Library of Congress by April 2011.
15 out of 17 | Sectors in the US that have more data per company than the US Library of Congress.
$600 billion | Potential annual consumer surplus from using personal location data globally.
60% | Potential increase in retailers' operating margins possible with big data.
140,000-190,000 | Deep analytical talent positions needed to take full advantage of data in the US.
1.5 million | Data-savvy managers needed to take full advantage of data in the US.
Data science is concerned with the collection, preparation, analysis, visualization, management and
preservation of large collections of information (Stanton, 2012). As mentioned, data science is
related to several disciplines including statistics, artificial intelligence (AI), data analytics, business
intelligence and data mining. Given that we are generating large amounts of data through the
internet as well as in industry, data science seeks to transform this data into value.
This is already happening in several disciplines. According to Kaggle, a leading data science website,
industries using data science methods include healthcare, finance, retail, insurance, construction, life
sciences, hospitality, manufacturing, travel, education and utilities. These industries are using data
science methods for marketing, sales, logistics, risk analysis, customer support and human
resources.² The question then arises: how does data science impact architectural practice? This thesis
is an investigation of the relationship between data science methods and architectural practice.
2 Retrieved on 5 March 2016 from https://www.kaggle.com/wiki/DataScienceUseCases.
3. PROBLEM STATEMENT
The thesis will seek to answer the question: can data science methods be applied to architectural
practice? The justification for this question is based on the author's personal observations and
experiences working for fifteen years in architectural practice in North America. In the author's
experience, architectural practitioners have not yet embraced data science methods. In addition,
while architectural researchers have begun to investigate data science methods in architecture, they
are not yet as engaged as researchers from related disciplines such as civil engineering.
Two pieces of evidence are given for these claims. First, a review of thirty projects undertaken in the
last five years by two large North American practices, Kasian Architecture and Interior Designs
(Kasian)³ and Perkins+Will⁴, shows that, compared to other recent computational technologies like
Building Information Modeling (BIM) and computational design, very few architectural projects
have applied data science methods in their execution (Table 2). Second, a review of an international
architectural computation research publication, the International Journal of Architectural
Computing (IJAC)⁵, shows a share of data science related abstracts about a quarter of that found in a
comparable international civil engineering computation publication, the Journal of Computing in
Civil Engineering (JCCE)⁶, over the last three years (Table 3). Taken together, these two pieces of
evidence suggest a gap in architectural research and practice with respect to data science methods.
This gap is at odds with the fact that architects have historically contemplated aspects of data science
such as artificial intelligence. Christopher Alexander in Notes on the Synthesis of Form (Alexander, 1964),
for example, tried to apply AI thinking to solve the problem of growing complexity in architecture.
Nicholas Negroponte in The Architecture Machine (Negroponte, 1975) envisioned an architecture
machine with which a designer could have a creative, symbiotic dialogue. Cedric Price, according to
Royston Landau (Landau, 1984), sought in his Generator project to create an intelligent machine that
allowed users to set the terms of their interaction with architecture instead of accepting the
imposed will of the designer.⁷ Investigating the impact of data science on architectural practice can
be seen in the light of this tradition.
3 Retrieved on 19 July 2016 from http://kasian.com/
4 Retrieved on 19 July 2016 from http://perkinswill.com/
5 Retrieved on 18 March 2015 from http://www.multi-science.co.uk/ijac.htm
6 Retrieved on 18 March 2015 from http://ascelibrary.org/page/jccee5/editorialboard
7 Reproduced from Bayesian Networks as an Architectural Decision Support Tool, an unpublished paper submitted to the CMU SOA in 2015.
Table 2. Use of data science methods on 30 recent projects at Kasian and Perkins+Will
Columns: Project | Description | Data Science Methods | Computational Design | BIM
Projects: Okanagan Integrated Health Care Facility; BC Hydro, Vernon; Ifrane Palace; Joseph Brant Hospital;
Qatar Ministry of Interior General Hospital; Royal Inland Hospital; St. Michael's Hospital Redevelopment;
Vancouver Airport Terminal Improvement Project; Audi Service; Porsche Vancouver; Kelowna Downtown Hotel;
RCMP Headquarters, Kelowna; Sport Check, Vancouver; Willow Park; Armory; Vandusen Botanical Gardens;
Shannon Mews; Chinook Hospital; Earth Systems Sciences Building; King Abdullah Financial District;
Pitt Rivers School; Centre for Interactive Research on Sustainability; San Francisco Airport
Administrative Campus; Orchard Commons; True North; Marine Gateway; Great Northern Way; Ryerson
University Residence; YVR Miller Road; Ottawa Light Rail Transit.
Project types: Diagnostic medical facility; Industrial warehouse; Luxury residential; Design Build Finance
hospital redevelopment; 150 bed general hospital; Clinical services building; Design Build Finance patient
care tower; Airport terminal renovation; Car dealership; Hotel; Urban recreation building; Regional
hospital; Academic building; Research centre; Police station; Retail store; Mixed use residential;
Commercial building; K-12 School; Corporate campus building; University academic and residential
building; Commercial Mixed Use; University residence; Middle school building; Light rail transit stations;
Light industrial.
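Table 3. Data science related topics from journals of computing in architecture and civil engineering.
Journal | Year | Data science related entries | Total number of abstracts | % of data science entries per abstract
IJAC | 2013 | 2 | 22 |
IJAC | 2014 | 3 | 27 |
IJAC | 2015 | 5 | 17 |
IJAC | 2013-2015 | 10 | 66 | 15.15%
JCCE | 2013 | 34 | 69 |
JCCE | 2014 | 47 | 68 |
JCCE | 2015 | 66 | 103 |
JCCE | 2013-2015 | 147 | 240 | 61.25%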
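As a check on Table 3, the pooled percentages and the "about a quarter" comparison in the problem statement can be recomputed from the yearly counts. The Python sketch below is illustrative only: the dictionary simply transcribes the counts in Table 3, and the function name is hypothetical.

```python
# Yearly counts transcribed from Table 3: year -> (data science related entries, total abstracts).
COUNTS = {
    "IJAC": {2013: (2, 22), 2014: (3, 27), 2015: (5, 17)},
    "JCCE": {2013: (34, 69), 2014: (47, 68), 2015: (66, 103)},
}


def pooled_share(journal: str) -> float:
    """Percentage of abstracts that are data science related, pooled over all years."""
    years = COUNTS[journal].values()
    entries = sum(e for e, _ in years)
    abstracts = sum(a for _, a in years)
    return 100.0 * entries / abstracts


ijac = pooled_share("IJAC")
jcce = pooled_share("JCCE")
print(f"IJAC: {ijac:.2f}%  JCCE: {jcce:.2f}%  ratio: {ijac / jcce:.2f}")
# IJAC: 15.15%  JCCE: 61.25%  ratio: 0.25
```

Pooling the counts before dividing, rather than averaging the yearly percentages, weights each year by its number of abstracts, which reproduces the percentages reported in the table.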
4. PURPOSE
4.1 Goals
The thesis question can be broken down into the following goals:
- Define data science and data science methods in the context of architectural practice
- Provide a rationale for researchers and practitioners to engage data science methods in architecture
- Establish a conceptual framework for analyzing and describing data science methods in architectural practice
- Analyze case studies of research and projects that have sought to apply data science methods in architectural practice
- Perform an experiment that validates the benefits of data science in architectural practice compared to conventional design methods
- Develop conclusions in response to the thesis question based on the analyses and the experiment undertaken
4.2 Rationale
The thesis will begin by providing a literature-based rationale for architectural researchers and
practitioners to vigorously engage data science methods. It will give three arguments for why data
science should be of interest to architectural researchers and practitioners. First, the thesis will
demonstrate that data science methods are being used to good effect in disciplines closely related to
architecture. In particular, the thesis will look at machine learning and genetic algorithm methods
being used in construction management, and machine learning methods being used to solve building
performance analysis problems.
Second, the thesis will argue that data science methods represent a more rigorous approach to
analysis and decision making in architecture. Using examples from practice the thesis will argue that
architectural decision making, in certain contexts, is in need of improved rigor and that data science
methods can provide such improvements including the ability to analyze complex problems more
accurately, the ability to improve the quality of decision making, and the ability to visualize problems
and solutions more effectively.
Third, the thesis will claim that architects need to be concerned about data science methods because
data science represents a potential paradigm shift in machine aided human cognition.
Whereas other historical information technologies in architecture like Computer Aided Design
(CAD) and Building Information Modelling (BIM) have seen the machine assist people in decision
making and task performance, data science is ushering in an age of autonomous machine
intelligence. It is likely that creative problem solving by the machine will come to the fore in this
new paradigm and we will move from trying to solve problems ourselves to trying to create
machines that will solve our problems for us. This fundamental shift in the person-machine
relationship makes it important for architects to begin investigating data science methods in design.
The thesis is written primarily from a design software expert's perspective. That said, the
thesis should be of interest to architectural practitioners, architectural researchers and
computer scientists. For architectural practitioners it provides an opportunity to improve process
and outcomes while leveraging the mountains of digital data they now routinely generate. For
architectural researchers it contributes to cross-disciplinary research between architecture and
computation. For computer scientists it opens a niche sector in which to find applications for
innovative data science methods.
4.3 Definitions
In order to effectively discuss the application of data science methods on architectural projects,
some definitions are required. In particular, it is useful to provide definitions of data science
methods as well as of architectural practice as they are discussed here. Data science methods are
defined as activities associated with the data science process (Figure 1). Data science methods are
thus distinct from general data processing activities. Data processing can be defined as the collection
and manipulation of items of data to produce meaningful information (French, 1996). Data
processing is, as such, broadly defined and includes many data science activities in its definition.
However, it is clear that not all data processing activities are data science methods. For example,
relational database theory is not considered a part of data science according to this view. Architects,
like many other professionals, have been involved with data processing activities but not as much
with data science methods.
In this thesis the data science process is defined as comprising four steps: data collection,
data preparation, data analysis and data visualization. Each of these steps has a range of associated
activities (Table 4). It is these activities that we refer to as data science methods.
Figure 1. The data science process

It should be noted that the data science process as defined here is conceptually simplified. In reality
the different steps are more iterative. For example, data visualization is useful during exploration in
the data preparation stage as a means to establish the best strategy for analysis. Data visualization
can also be an integral part of the data analysis process, particularly with interactive visualizations.
Nevertheless, in the data science conceptual framework proposed in this thesis, data visualization is
primarily focused on the results of data analysis, with the understanding that this is a simplification.
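To make the four steps concrete, the following is a minimal Python sketch of the data science process as a chain of functions, one per step. The sensor readings and function names are hypothetical, and a real workflow would iterate between steps rather than run straight through once.

```python
from statistics import mean

# Hypothetical raw records, e.g. room temperature readings exported from a sensor feed.
RAW = ["21.5", "22.0", "", "n/a", "23.5", "22.0"]


def collect() -> list[str]:
    """Data collection: gather raw values from a source (here a hard-coded list)."""
    return list(RAW)


def prepare(raw: list[str]) -> list[float]:
    """Data preparation: drop empty or non-numeric entries and convert the rest to floats."""
    clean = []
    for value in raw:
        try:
            clean.append(float(value))
        except ValueError:
            continue  # skip values that cannot be parsed as numbers
    return clean


def analyze(values: list[float]) -> dict[str, float]:
    """Data analysis: compute simple descriptive statistics."""
    return {"n": len(values), "mean": mean(values), "max": max(values)}


def visualize(summary: dict[str, float]) -> str:
    """Data visualization: render the results as a plain-text report."""
    return ", ".join(f"{k}={v:g}" for k, v in summary.items())


print(visualize(analyze(prepare(collect()))))  # n=4, mean=22.25, max=23.5
```

Keeping each step in its own function mirrors the framework's separation of concerns: any one step can be rerun or replaced (for example, exploring the prepared data visually before choosing an analysis) without touching the others.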
Similarly, architectural practice is defined in terms of architectural services as described in The
Architect's Handbook of Professional Practice (Demkin, 2001). Services are defined in three categories:
planning and pre-design, design and construction, and operations and maintenance (Table 5).
Planning and pre-design services seek to assist the client in defining the problem and constraints of
the project. Design and construction services describe activities between the schematic design phase
and the contract administration phase of a project. Operations and maintenance refers to activities
undertaken post-construction including post-occupancy evaluation and facilities management.
4.4 Contributions
The main contribution of this thesis is to help close the research and practice gap with respect to the
application of data science methods in architecture. While other authors have discussed the question
of data in contemporary architectural practice, this thesis will focus on system and
implementation details. It discusses not only the theoretical and strategic value of data
science but also specific tools and technologies, at a level of technical detail intended for design
software experts. For example, Deutsch (Deutsch, 2015) has written a comprehensive strategic treatise
on data driven methods in architecture but does not enter into implementation details. In an article
titled How Big Data is Transforming Architecture (Davis, 2015), Davis outlines how big data is impacting the
architectural profession but again does not provide implementation details. In both cases, this thesis
provides complementary material by investigating how data science strategic objectives can be
implemented in practice.
4.5 Conceptual Framework
In order to relate the case studies to each other and to the overall research question, a
conceptual framework of data science in architectural practice is desirable (Figure 2). This framework
is based on the classification of data sources in architectural practice and the description of data
science methods, both given in Table 4, as well as on the architectural services identified in Table 5.
As in most other industries, data sources in architecture have increased tremendously in recent years.
This can be attributed to several factors. Faster computers generate data at a faster rate and produce
more of it. Cloud computing frameworks allow for the processing of larger volumes of data. There
are many more data authoring tools and applications than ever before, and there are also many more
proficient users of these tools. All this leads to an explosion of data in today's
architectural practices. Table 4 shows a classification of the common data sources identified at
Perkins+Will.
Figure 2. The data science conceptual framework in architectural practice

As mentioned above, the data science process involves data collection, data preparation, data
analysis and data visualization. The methods associated with each step (Table 4) are inspired by the
forty-five methods listed as data analytic techniques by Booz Allen Hamilton in The Field Guide to Data
Science (Booz Allen Hamilton, 2016). They were selected because the author has observed them in
research and practice at Perkins+Will. Data collection involves gathering raw data from the listed data
sources for the purposes of analysis and visualization. Such data is rarely ready for analysis in its raw
form, and substantial effort is required to prepare it. This is the data preparation process.
Thereafter, appropriate data analysis techniques, selected from a range of statistical and artificial
intelligence methods, are applied to the data. Once the data is analyzed, it is presented for decision
making using appropriate data visualization formats.
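Three of the data preparation methods listed in Table 4 (cleaning, filtering and normalization) can be illustrated with a small Python sketch. The design options, window-to-wall ratios and energy use values below are invented for illustration and are not Perkins+Will data; min-max scaling is assumed as the normalization method.

```python
# Hypothetical parametric energy results: (design option, window-to-wall ratio, annual kWh/m2).
ROWS = [
    ("A", 0.30, 112.0),
    ("B", 0.45, None),   # missing result: removed by cleaning
    ("C", 0.60, 131.0),
    ("D", 0.90, 158.0),  # outside the range of interest: removed by filtering
    ("E", 0.40, 118.0),
]


def clean(rows):
    """Cleaning: drop rows that contain missing values."""
    return [r for r in rows if all(v is not None for v in r)]


def filter_rows(rows, max_wwr=0.8):
    """Filtering: keep only options at or below the window-to-wall ratio of interest."""
    return [r for r in rows if r[1] <= max_wwr]


def normalize(values):
    """Normalization: min-max scale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values identical: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


prepared = filter_rows(clean(ROWS))
eui = [r[2] for r in prepared]
scaled = dict(zip((r[0] for r in prepared), normalize(eui)))
print(scaled)
```

Normalizing after cleaning and filtering matters: if option D were left in, its high value would compress the scaled range of the options that are actually under consideration.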
Architectural practice is defined in terms of architectural services as shown in Table 5. According to
the AIA's The Architect's Handbook of Professional Practice (Demkin, 2001), these are the services that a
typical architectural organization offers its clients. The conceptual framework provides the means to
describe how a particular service can be enhanced using data science methods by processing data
derived from the sources of data in architecture. To apply the framework, a service is selected from
Table 5 and data sources applicable to that service are identified from Table 4. A description of the
data science process applied to the service, and based on the data sources, is then developed.
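The procedure for applying the framework can be sketched as a small Python data structure that checks a proposed application against Tables 4 and 5. This is an illustrative sketch only: the class and function names are hypothetical, and only three of the services from Table 5 are transcribed. The usage example encodes the Building Data Analytics energy monitoring example described in this section.

```python
from dataclasses import dataclass, field

# Partial transcription of Table 5: service -> service category (three services only).
SERVICE_CATEGORIES = {
    "Strategic Facility Planning": "Planning and Predesign",
    "Energy Analysis and Design": "Design and Construction",
    "Energy Monitoring": "Operations and Maintenance",
}

# Data sources from the Data Collection column of Table 4.
DATA_SOURCES = {
    "Client Engagement Data", "Space Planning Data", "GIS Data", "BIM Data",
    "Energy Analysis Data", "Computational Design Data", "Business Systems Data",
    "Post Occupancy Data", "Unstructured Data",
}

STEPS = ("collection", "preparation", "analysis", "visualization")


@dataclass
class FrameworkApplication:
    service: str
    category: str
    sources: list[str]
    process: dict[str, str] = field(default_factory=dict)


def apply_framework(service, sources, process):
    """Select a service, check its data sources, and record all four process steps."""
    if service not in SERVICE_CATEGORIES:
        raise KeyError(f"unknown service: {service}")
    unknown = [s for s in sources if s not in DATA_SOURCES]
    if unknown:
        raise ValueError(f"unknown data sources: {unknown}")
    missing = [s for s in STEPS if s not in process]
    if missing:
        raise ValueError(f"process is missing steps: {missing}")
    return FrameworkApplication(service, SERVICE_CATEGORIES[service], list(sources), dict(process))


bda = apply_framework(
    "Energy Monitoring",
    ["Post Occupancy Data"],
    {
        "collection": "IoT sensors, metering data and utility data",
        "preparation": "Microsoft Azure pipeline with Java scripting",
        "analysis": "machine learning for predictive performance monitoring",
        "visualization": "web-based dashboards",
    },
)
print(bda.category)  # Operations and Maintenance
```

Rejecting an application that omits a step is a design choice that reflects the framework's premise that every case study is described across all four stages of the data science process.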
The thesis documents an example of the application of the conceptual framework based on a
research project named Building Data Analytics, undertaken at the Carnegie Mellon University (CMU)
School of Architecture (SOA) by Lasternas and Aziz⁸ (Figure 3). The service provided is an energy
monitoring service from the Operations and Maintenance category. The data source is post-occupancy
data. Data collection is achieved using Internet of Things (IoT) sensors, metering data and utility
data. A sophisticated pipeline involving Microsoft Azure and scripting in Java is used for data
preparation. Data analysis is performed using machine learning algorithms that enable predictive
building performance monitoring. Finally, web-based dashboard interfaces are used for visualization
and reporting.
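The algorithms used in the Building Data Analytics project are not described here, so the following Python sketch substitutes a deliberately simple technique to illustrate what predictive performance monitoring involves: a least-squares baseline that predicts daily energy use from outdoor temperature, with new readings flagged when they deviate from the prediction by more than a chosen multiple of the residual standard deviation. The data, threshold and function names are assumptions for illustration.

```python
from statistics import mean, pstdev

# Hypothetical training period: daily mean outdoor temperature (deg C) and metered energy (kWh).
TRAIN_TEMP = [2.0, 5.0, 8.0, 11.0, 14.0, 17.0]
TRAIN_KWH = [420.0, 390.0, 362.0, 330.0, 301.0, 270.0]


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


# Fit the baseline and measure how far the training readings scatter around it.
a, b = fit_line(TRAIN_TEMP, TRAIN_KWH)
residual_sd = pstdev([yi - (a + b * xi) for xi, yi in zip(TRAIN_TEMP, TRAIN_KWH)])


def predict(temp):
    """Predicted daily energy use (kWh) at the given outdoor temperature."""
    return a + b * temp


def flag(temp, kwh, k=3.0):
    """True if a new reading deviates from the baseline by more than k residual SDs."""
    return abs(kwh - predict(temp)) > k * residual_sd


print(flag(10.0, 380.0))  # True
print(flag(10.0, 340.0))  # False
```

At 10 °C the fitted baseline predicts roughly 340 kWh, so a reading of 380 kWh is flagged for investigation while 340 kWh is treated as normal. A production system would use richer features (occupancy, schedules, time of day) and more capable models.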
Four case studies will be discussed within the context of the conceptual framework, as shown in
Table 6. Two case studies are service based: one involves a service from the Planning and Pre-design
category and the other a service from the Design and Construction category. The other two case
studies are based on BIM data sources. The case studies represent research projects and real world
project applications of data science methods at Perkins+Will between 2015 and 2017.
4.6 Limitations
This thesis will focus on the application of data science methods in research and on
projects at a single large North American practice, Perkins+Will. In discussing data science
methods, the thesis will restrict its focus to the methods identified as part of the data science
conceptual framework in section 4.5. This list is not as comprehensive as the catalogue of methods
in The Field Guide to Data Science (Booz Allen Hamilton, 2016), but it does represent data science
methods as observed by the author in practice at Perkins+Will. It is assumed that these methods,
and the conceptual framework, can generalize to other similar architectural practices. Similarly, in
discussing architectural practice, the thesis is restricted to the services architects provide as defined
in The Architect's Handbook of Professional Practice. Other aspects of architectural practice could be
amenable to data science research, but this thesis will not consider that question. Finally, there are
other topics within the broad definition of data science, such as Big Data Analytics or Business
Intelligence, that the thesis will not explicitly address.
8 Lasternas, B. & Aziz, A. have undertaken the Building Data Analytics project at the CMU School of Architecture since 2013.
Table 4. Data science methods.
DATA COLLECTION (Data Sources): Client Engagement Data, Space Planning Data, GIS Data, BIM Data,
Energy Analysis Data, Computational Design Data, Business Systems Data, Post Occupancy Data,
Unstructured Data
DATA PREPARATION: Filtering, Cleaning, Querying, Transformation, Normalization, Dimensionality Reduction
DATA ANALYSIS: Machine Learning, Bayesian Networks, Statistical Analysis, Genetic Algorithms,
Markov Decision Process, Design of Experiments
DATA VISUALIZATION: Parallel Coordinates Plots, Dendrograms, Tree Maps, Scatter Plot Matrices,
Box Plots, 3D Graphs, Pivot Charts, Dashboards, Sankey Diagrams
Table 5. Architectural Services (Demkin, 2001).
ARCHITECTURAL SERVICES
Planning and Predesign: Programming, Research Services, Site Analysis, Strategic Facility Planning,
Zoning Process Assistance.
Design and Construction: Accessibility Compliance, Architectural Acoustics, Building Design, Code
Compliance, Construction Documentation Drawings, Construction Documentation Specifications,
Construction Management, Construction Procurement, Contract Administration, Design-Build, Energy
Analysis and Design, Environmental Graphic Design, Historic Preservation, Interior Design, Lighting
Design, Seismic Analysis and Design, Space Planning, Sustainable Building Design.
Operations and Maintenance: Commissioning, Construction Defect Analysis, Energy Monitoring, Facility
Management, Indoor Air Quality Consulting, Move Management, Post Occupancy Evaluation.
Figure 3. The Building Data Analytics process. (Source: CMU SOA)
Table 6. Case studies based on the conceptual framework

SERVICES BASED CASE STUDIES

Micro-polling
Service Category: Planning and Pre-Design
Service/Data Source: Strategic Facilities Planning
Description: Gather survey data during client engagement and establish a framework for statistical
analysis of survey data.
Data Collection: Micro-polling using Current technology.
Data Preparation: Transformation of micro-poll survey data for analysis.
Data Analysis: Establish a framework for statistical analysis of survey data.
Data Visualization: Use Watson Analytics and MS Excel for analysis and visualization.

Parametric Energy Analysis
Service Category: Design and Construction
Service/Data Source: Energy Analysis
Description: Generate large parametric energy analysis datasets and use them for interactive
visualization and design optimization.
Data Collection: Cloud based data generation using Microsoft Azure.
Data Preparation: Prepare data using computational design tools like Grasshopper and energy analysis
tools like EnergyPlus.
Data Analysis: Use an objective function in MS Excel and Design of Experiments in JMP to analyze and
optimize data.
Data Visualization: Use Parallel Coordinates Plots, Bayesian Networks and Pivot Charts with Slicers for
visualization.

BIM BASED CASE STUDIES

Anomaly Detection
Service/Data Source: Building Information Modeling
Description: Gather and parse Revit journal data and use it to detect potential file issues by performing
anomaly detection.
Data Collection: Gather journal files using PowerShell scripting.
Data Preparation: Parse Revit data using the Revit Journal Reader.
Data Analysis: Pipe the parsed data into an anomaly detection tool like HTM Studio.
Data Visualization: Use HTM Studio to visualize and report the anomalies in the data.

Project Performance Monitoring
Service/Data Source: Building Information Modeling
Description: Gather Revit model data and Deltek Vision data and use predictive analysis to predict
project performance based on model data.
Data Collection: Use Imaginit Clarity to automate export of Revit data and store it in MS SQL Server.
Export data from Deltek Vision in csv format.
Data Preparation: Transform data using an MS SQL Server query and use a stored procedure to merge it
with Deltek Vision data.
Data Analysis: Run machine learning algorithms to train on Revit data using Deltek Vision labels.
Data Visualization: Perform predictive classification on new Revit models and output reports.
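HTM Studio applies Hierarchical Temporal Memory models to time-series data. As a much simpler stand-in, and not a reproduction of HTM Studio, the following Python sketch flags anomalous values in a metric parsed from Revit journal files using the modified z-score (based on the median and the median absolute deviation) with the commonly used threshold of 3.5. The session names and file-open times are hypothetical.

```python
from statistics import median

# Hypothetical per-session metric parsed from Revit journal files:
# seconds taken to open the central model in each session.
OPEN_SECONDS = {
    "session-01": 42.0, "session-02": 45.0, "session-03": 40.0, "session-04": 44.0,
    "session-05": 43.0, "session-06": 41.0, "session-07": 180.0,
}


def modified_z_scores(values):
    """Robust z-scores based on the median and the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]  # no spread: treat nothing as anomalous
    return [0.6745 * (v - med) / mad for v in values]


def anomalies(metric, threshold=3.5):
    """Return the keys whose modified z-score exceeds the threshold in absolute value."""
    keys = list(metric)
    scores = modified_z_scores([metric[k] for k in keys])
    return [k for k, z in zip(keys, scores) if abs(z) > threshold]


print(anomalies(OPEN_SECONDS))  # ['session-07']
```

A robust score is used because, with only a handful of sessions, a single extreme value inflates the ordinary standard deviation enough to mask itself: with seven values, an ordinary z-score can never exceed about 2.45.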
5. METHODOLOGIES
In discussing research methodology for this thesis, Groat & Wang's book Architectural Research
Methods will be referenced.⁹ Groat & Wang recognize three levels of research activity: systems of
inquiry, research methodology and research methods. They identify systems of inquiry with research
philosophies like positivism, constructivism and critical theories. They identify research
methodologies with research strategies and research methods with research tactics. Based on their
exposition, three principal research methods will be used in this thesis: literature review, case studies
and experiment.
Since data science is a technical discipline, the system of inquiry for this thesis is a post-positivist
approach. Post-positivist approaches are an evolution of positivist approaches. Positivist approaches
assume an objective reality that can be fully understood. Post-positivist approaches are more
nuanced: they assume an objective reality that can be known only up to a level of probability.
Post-positivism is particularly suited to data science, since many data science methods are themselves
stochastic. The ontological assumption of post-positivist systems of inquiry is that reality is objective,
while the epistemological assumption is that the researcher is independent of the research and
observes research variables in a dispassionate manner.
While the underlying philosophy of the research will be post-positivist, the specific research
methods will be literature review, case studies and experiment. Literature review will be used to
define data science and provide the rationale for the thesis. Four case studies will be used to
investigate whether architects can apply data science to architectural practice. An experiment will be
conducted to compare the conventional design process to a data science driven design process.
Finally, conclusions will be drawn based on the case studies and the experiment.
6. LITERATURE REVIEW
6.1 Definition and importance of data science
According to O'Neil and Schutt (O'Neil & Schutt, 2013), data science involves statistics (traditional
analysis), data munging (parsing, scraping and formatting data) and visualization (graphs,
interactive tools, etc.). They cite Drew Conway's Data Science Venn Diagram (Figure 4) as a pithy
depiction of what data science entails. In the diagram, data science is depicted as the overlap of
mathematical and statistical skills, computer science skills (hacking skills) and domain
knowledge (substantive expertise). They explain that data science is emerging as an important
discipline at this point in history because of datafication, which they describe as the process of
taking all aspects of life and converting them into data.
9 Discussion is based on the book Architectural Research Methods (David Wang, 2002).
Figure 4. The Venn Diagram of Data Science 10

Data science is helping organizations across industry become more competitive in an increasingly
data driven world. Some of the benefits of data science in industry identified by The Field Guide to
Data Science (Booz Allen Hamilton, 2016) are summarized in Table 7. Data science is also improving
decision making. According to Milkman et al. (Milkman, Chugh, & Bazerman, 2009), better decision
making is needed more than ever because errors are costly and will only get costlier. Data driven
decision making has been linked to measurable gains in productivity: research from MIT and Wharton
reports that organizations that use data driven decision making are 4-6% more productive than those
that do not (Provost & Fawcett, 2013).
Most significantly, perhaps, data science can enhance innovation and problem solving. IBM Watson
(Kelly, 2015), a pioneering cognitive computing platform, is pushing intelligent computation to new
frontiers. In medical imaging, for example, Watson analyzes X-rays, MRIs and ultrasound images. It
processes the natural language of medical journals, textbooks and articles, uses machine learning to
correct and improve its understanding, and develops deep knowledge representations and reasoning
that can help surface possible diagnoses. In this way Watson can help doctors identify diagnoses and
treatments that they might otherwise miss.
6.2 Data driven methods in architecture
In Data-Driven Design and Construction, Deutsch provides a contemporary treatise on data driven
methods in architectural practice (Deutsch, 2015). He discusses the factors leading to the leveraging
of data in the construction industry, describes methods for capturing, analyzing and applying
building data together with the benefits and challenges of doing so, and builds a case for the
significance of data for architectural professionals. He concludes with a discussion of the future of
data in the Architecture, Engineering and Construction (AEC) industry.
10 Retrieved May 21, 2016 from http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Table 7. The Business Impacts of Data Science (Booz Allen Hamilton, 2016)
17-49% increase in productivity when organizations increase data usability by 10%
11-42% return on assets (ROA) when organizations increase data access by 10%
241% increase in ROI when organizations use big data to improve competitiveness
1000% increase in ROI when deploying analytics across most of the organization, aligning daily operations with senior management's goals, and incorporating big data
5-6% performance improvement for organizations making data-driven decisions
Deutsch identifies five factors that compel architectural practitioners to leverage data driven
methods. First is technology: the ability to process large quantities of data, access to cloud
computation and less expensive storage have made data driven methods easier to adopt. Second,
people are an important catalyst for change, as a new generation embraces computation in all aspects
of life and develops new processes to leverage data driven methods in design. Third, not only is
there more data than ever before, but it is also easier than ever to access through cloud
frameworks, web portals, company intranets, social media and traditional websites. Fourth, building
performance is becoming more important with increasing global concerns about sustainability, and
building performance analysis methods tend to be heavily data driven. Fifth, architects have begun
to understand that theirs is a fragmented industry with equally fragmented processes, to the
detriment of their project delivery methods. They are increasingly looking to technology, including
data driven technology, to help improve these disjointed processes.
Deutsch also recognizes five trends leading to the increase of data in the AEC industry. First,
instrumentation is being added to almost everything. The internet of things, as the network of
sensors and instruments is often called, is a massive source of real time data. Second,
datafication, described earlier, entails the conversion of all aspects of practice to data: analog
content and processes are everywhere being converted to digital content and data driven processes.
Third, production methods and the demands of the supply chain require construction components
to be represented as data, which supports fabrication, procurement, tracking and installation. Fourth,
data is being relied upon more and more for the validation of designs and design decisions. Fifth,
generating, analyzing and visualizing data leads to deeper insights into problems and their
potential solutions.
6.3 Data science in related fields
Data science is having an impact in fields closely related to architecture. In construction
management, data science methods have been used to predict project success as well as to estimate
the cost at completion of construction projects. In Project success prediction using an evolutionary support
vector machine inference model, Cheng et al. (Cheng, Wu, & Wu, 2010) describe a model that predicts project
success using a tool that integrates a support vector machine (SVM) with a genetic algorithm. In
Estimate at completion for construction projects using evolutionary Gaussian process inference model, Huang and Cheng
(Huang & Cheng, 2011) employ a data driven artificial intelligence method for Estimate at
Completion (EAC): historical data are extracted from previous projects and input into a Gaussian
process algorithm for learning, and particle swarm optimization is then used to optimize the
process.
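Neither cited model is reproduced here. As a minimal sketch of the Gaussian process step alone, the code below fits a zero-mean Gaussian process regressor with a squared-exponential kernel to a handful of hypothetical points (fraction of schedule complete against cost-overrun fraction) and predicts the value at completion. The data, the kernel length scale and all function names are illustrative assumptions; the particle swarm optimization step is omitted and the kernel settings are simply fixed by hand.

```python
import math

def rbf(a, b, length=0.2, variance=1.0):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(m):
    """Lower-triangular factor L with L @ L.T == m, for a symmetric positive-definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low

def forward_sub(low, b):
    """Solve low @ x == b for a lower-triangular matrix low."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x

def back_sub_transposed(low, b):
    """Solve low.T @ x == b, reusing the lower-triangular factor."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def gp_predict(xs, ys, x_new, noise=1e-4, **kernel):
    """Posterior mean and variance of a zero-mean GP regressor at x_new."""
    n = len(xs)
    k_train = [[rbf(xs[i], xs[j], **kernel) + (noise if i == j else 0.0)
                for j in range(n)] for i in range(n)]
    low = cholesky(k_train)
    alpha = back_sub_transposed(low, forward_sub(low, ys))   # (K + noise * I)^-1 y
    k_star = [rbf(x, x_new, **kernel) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(low, k_star)
    var = rbf(x_new, x_new, **kernel) - sum(t * t for t in v)
    return mean, var

# Hypothetical history: fraction of schedule complete -> cost overrun fraction.
progress = [0.1, 0.3, 0.5, 0.7, 0.9]
overrun = [0.00, 0.02, 0.05, 0.07, 0.08]
mean_at_completion, var_at_completion = gp_predict(progress, overrun, 1.0, length=0.2)
```

Because the prior mean is zero, a prediction far from the historical data reverts to zero overrun with a variance near 1, which is how the model expresses its uncertainty; in practice the targets would usually be centered before fitting.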
In building performance analysis, data science methods have been used to solve occupant modeling
problems as well as to improve the efficiency of building systems during building operations. In Improving
Efficiency and Reliability of Building Systems using Machine Learning and Automated Online Evaluation (Wu et
al., 2012), the authors present an approach that uses machine learning and automated online
evaluation of historical and real time building data to improve the efficiency of building operations. In
An Occupant Behavior Model Based on Artificial Intelligence for Energy Building Simulation, Bonte et al. (Bonte,
Thellier, Lartigue, & Perles, 2014) propose a new method aimed at reducing the uncertainty created
by oversimplified occupant models. Behavioural adaptation is considered the most important
occupant influence on building energy performance, and thermal comfort is one of its main aspects.
The authors argue that statistical analysis is insufficient for the complex task of analyzing thermal
comfort based on human behaviour, and that a better model forecasts occupant behaviour
using AI. Their specific approach uses reinforcement learning.
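Bonte et al.'s model is not reproduced here. The sketch below only illustrates tabular Q-learning, one common reinforcement learning algorithm, on an invented toy problem: a simulated occupant who can open or close a window to move the indoor temperature between five hypothetical bins and who is penalized for discomfort. The states, actions, rewards and learning parameters are all assumptions chosen for illustration.

```python
import random

ACTIONS = (-1, 0, 1)   # hypothetical: open window (cool), do nothing, close window (warm)
N_STATES = 5           # hypothetical indoor temperature bins: 0 = too cold ... 4 = too hot
COMFORT = 2            # bin the simulated occupant finds comfortable

def step(state, action):
    """Toy deterministic environment: shift the temperature bin, clamped to the valid range."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = -abs(nxt - COMFORT)   # discomfort penalty, 0 when comfortable
    return nxt, reward

def train(episodes=2000, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)                        # explore
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])  # exploit
            s2, r = step(s, a)
            target = r + gamma * max(q[(s2, act)] for act in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q

def policy(q, state):
    """Greedy action learned for a temperature bin."""
    return max(ACTIONS, key=lambda act: q[(state, act)])

q_table = train()
learned = {s: policy(q_table, s) for s in range(N_STATES)}
```

The learned policy moves the occupant toward the comfortable bin and then leaves the window alone; a realistic occupant model would replace the toy environment with a building simulation and a reward derived from comfort indices.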
6.4 Data science as a rigorous analysis method
Architects make decisions all the time, and many of these decisions are critical to the appropriate and
successful design and construction of their projects. In a world where improved decision making is
increasingly important, how can architects improve the quality and accuracy of design decisions?
This thesis argues that data science can improve decision making and that this should be an
incentive for architects and researchers to pursue data science methods. Two examples from the
author's experience in practice are discussed: the first is the analysis of survey data during client
engagement and the second is the selection of best performing design options during early stage
design.
As part of the client engagement process for a corporate administrative campus project, survey data
was gathered from end users with the goal of developing a basis of design document. The survey
sought to capture user experiences, concerns and departmental priorities, among other things. The
gathered data was analyzed by the architect and the results were included in a report shared with the
client. Unfortunately, the analysis methods used were rudimentary, with more emphasis placed on
the graphic outputs than on appropriate statistical methods. Indeed, some of the conclusions were
found to conflict with the actual data.
The second example involves early stage decision making with respect to energy and daylight
analysis. Architects are becoming more conscious of the need to improve building performance in
line with the global movement for a more sustainable planet. Many architects perform building
performance analysis early in the design process to obtain guidance for critical early decisions.
However, some of the key design drivers in early stage design are known to be antagonistic. For
example, improving daylight performance often adversely affects thermal performance, and vice
versa. Unfortunately, many architects do not use formal methods to arrive at optimized decisions
when resolving this type of conflict.
Data science methods can provide a framework for the statistical analysis of survey data that
improves the quality of the analysis and tests design hypotheses for statistical significance. This
improves the quality of design by providing more accurate information for decision making, as well
as allowing the discovery of insights that are not evident without mining the data. Data science
methods can also assist with the optimization of early stage building performance analysis by
developing objective functions that capture the interactive combinations of critical design factors
like daylighting and thermal comfort. Optimization methods like genetic algorithms, or the design of
experiments, can then be applied to these objective functions to maximize desirability. The resulting
combinations of inputs that yield optimized objective functions are important information for the
designer to possess and should lead to better performing design outcomes.
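As a minimal sketch of this idea, assuming simple invented surrogate functions in place of actual daylight and thermal simulation results, the code below evaluates a full factorial design of experiments over window-to-wall ratio and shading depth, scores each response with a Derringer-Suich "larger is better" desirability function and combines the scores with a geometric mean. The response functions, limits and levels are hypothetical; a genetic algorithm could replace the exhaustive search when the design space is too large to enumerate.

```python
import itertools
import math

def desirability_larger(y, low, high, s=1.0):
    """Derringer-Suich 'larger is better' desirability, clipped to [0, 1]."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** s

def overall_desirability(ds):
    """Geometric mean of individual desirabilities (0 if any response is unacceptable)."""
    if any(d == 0.0 for d in ds):
        return 0.0
    return math.exp(sum(math.log(d) for d in ds) / len(ds))

# Hypothetical surrogate models standing in for simulation results.
def daylight_score(wwr, shade):
    return 100 * wwr * (1 - 0.5 * shade)   # invented % of floor area daylit

def comfort_score(wwr, shade):
    return 90 - 50 * wwr + 20 * shade      # invented % of occupied hours comfortable

WWR_LEVELS = [0.2, 0.4, 0.6, 0.8]
SHADE_LEVELS = [0.0, 0.5, 1.0]

def best_design():
    """Full factorial search; returns (overall desirability, wwr, shading depth)."""
    results = []
    for wwr, shade in itertools.product(WWR_LEVELS, SHADE_LEVELS):
        d = overall_desirability([
            desirability_larger(daylight_score(wwr, shade), 10, 60),
            desirability_larger(comfort_score(wwr, shade), 50, 90),
        ])
        results.append((d, wwr, shade))
    return max(results)
```

Because the geometric mean is zero whenever one response is unacceptable, the search favours balanced compromises over designs that excel on daylight but fail on comfort, which is the antagonism described above.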
6.5 Data science as a paradigm shift
Paradigm shifts are important because they have deeper impacts than the normal progress of
scientific development through incremental discovery. Paradigm shifts tend to be revolutionary in
character with far reaching implications, not just for the discipline in question, but for the entire
endeavour of human inquiry, and sometimes for human history itself. They are, therefore,
particularly worthy of attention whenever they are perceived to occur. This thesis claims that
data science methods are of particular interest to the architect, as they are to others, because they
represent a potential paradigm shift in computation and human cognition.
In what way does data science represent a paradigm shift? To answer this, we first need to
recognize that the broad umbrella of data science includes several artificial intelligence
methods and models. While data science and AI are not synonymous, there is substantial overlap,
with many modern AI methods being explicitly data driven. There are approaches to AI, such as
decision theory, that are not part of data science, and there are also parts of data science that are
separate from AI. Nonetheless, AI methods like machine learning, Bayesian networks and artificial
neural networks are an important part of data science.
From its inception in the 1950s, AI promised to be one of the most consequential
paradigm shifts in the history of human cognition. For the first time humans would have another
sapient being to help with the challenging business of knowledge acquisition. Humans had devised
many marvelous machines to help do work, but they had yet to devise a machine that could help
acquire knowledge or autonomously solve problems. The big promise of early AI was the prospect
of autonomous intelligence.
However, early AI soon fell short of these expectations. Many of its promises went undelivered and the
discipline lapsed into what has been described as an AI winter (Russell & Norvig, 2009). Only in the
last two decades has there been a resurgence in AI interest and research. This resurgence has been
driven in large part by data driven intelligence; in short, by data science.
The arrival of true AI will undoubtedly be a monumental achievement, and if we are now in the
throes of its birth, it is hard to think of any more significant subject for investigation. At
the very least, the advent of intelligent computation will allow machines to solve problems that
humans cannot solve because of their sheer complexity. We are already seeing this begin to happen with
cognitive computing in healthcare (Kelly, 2015). In architecture, we must also start to ask what
impact data science in general, and AI methods in particular, will have on the design process.
7. CASE STUDIES
7.1 Micro-polling and statistical analysis
Micro-polling is a survey technique used during client engagement. It has two main
benefits for architectural surveys. First, it is mobile-based, which means it is possible to capture user
responses at different times of the day and while users are at different locations within a
building. Time and location make a difference to user experience, and capturing this nuance is
useful for design decision making. Second, micro-polling is typically set up to be repetitive, sending
the same questions at intervals over a prescribed survey duration. This increases
participation, fidelity and sample size and makes the end results more accurate.
The micro-polling process allows design teams to collect insights related to the thermal, acoustic and
visual comfort of occupants. Issues like productivity, engagement and wellness can also be tracked.
At Perkins+Will, micro-polling involves the use of a proprietary cloud-based computing
platform (CBCP) called Current (Figure 5). Current lets the design team send out polls to the client in
real time using email or text messages, allowing users to respond on their mobile devices
anywhere and at any time. Current thus has an advantage over traditional surveys: it captures
users' responses at the moment they are experiencing a particular space or design condition.
Figure 5. Micro-polling process

Once micro-polling data has been collected and prepared for analysis, systematic analysis methods
are required to derive the most valuable insights for the designer. In this thesis a methodology for
statistical analysis of architectural survey data is examined. The methodology has the following steps.
First, identify the key design variables. These are usually items like acoustic discomfort, thermal
discomfort and daylighting. Second, establish the descriptive statistics for the key design variables.
The maximum, minimum, mean, standard deviation and outlier values provide important cues to
the nature of the data as well as the character of the surveyed population. Third, identify the inputs
that are most closely correlated with the key design variables. These are the key design drivers. They
are the inputs that need most design attention in order to influence the key design variables. Fourth,
develop a set of design hypotheses and use statistical tests to test these hypotheses for statistical
significance.
7.2 Parametric Energy Analysis
Conventional energy analysis methods study only a handful of design options, whereas parametric
energy analyses typically perform large numbers of simulations. The large design space is produced
by the stepwise variation of energy analysis parameters to generate thousands of combinations of
inputs for design exploration and optimization. One study reports that parametric energy analysis
optimizations can reduce total site energy consumption by as much as 33% compared to baseline
cases (Karaguzel, Zhang, & Lam, 2014).
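To show how quickly stepwise variation multiplies into a large design space, the sketch below enumerates a full factorial combination of four parameters with `itertools.product`. The parameter names and ranges are hypothetical placeholders, not values from any Perkins+Will study.

```python
import itertools

def stepwise(start, stop, step):
    """Inclusive list of values from start to stop in increments of step (rounded to avoid float drift)."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 6) for i in range(count)]

def design_space(parameters):
    """Yield every combination of the stepwise parameter values (full factorial)."""
    names = list(parameters)
    for combo in itertools.product(*(parameters[n] for n in names)):
        yield dict(zip(names, combo))

# Hypothetical energy model parameters and ranges.
PARAMETERS = {
    "window_to_wall_ratio": stepwise(0.2, 0.8, 0.1),   # 7 values
    "orientation_deg": stepwise(0, 315, 45),           # 8 values
    "wall_r_value": [13, 19, 25],                      # 3 values
    "shading_depth_m": stepwise(0.0, 1.2, 0.3),        # 5 values
}
```

Even these four modest parameters give 7 x 8 x 3 x 5 = 840 combinations, and every additional parameter multiplies the count again.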
However, simulating large design spaces comes at a cost. Even on powerful computers, each
simulation run of a design space exploration takes several seconds to compute, and on medium to large
projects, where individual runs take considerably longer, the processing time for several thousand
parametric energy analysis runs can stretch to several days. This is not practical given the relentless
pace at which projects are executed in today's practices. Therefore, cloud based computing platforms
are invoked to provide the required computing power. Figure 6 depicts the cloud based parametric
energy analysis process that will be investigated in this case study.
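The arithmetic behind the case for cloud computing can be sketched as follows, assuming one minute per run (an illustrative figure, not a measured one) and ideal parallel scaling. The `simulate` function is a hypothetical placeholder for an energy simulation engine; a thread pool is used because in practice each call would launch an external simulation process, so the Python threads would mostly wait on it.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def wall_clock_hours(runs, minutes_per_run, workers=1):
    """Idealized elapsed time when runs are split evenly across parallel workers."""
    rounds = math.ceil(runs / workers)   # each worker completes one run per round
    return rounds * minutes_per_run / 60

def simulate(window_to_wall_ratio):
    """Hypothetical stand-in for one energy simulation; returns an invented EUI value."""
    return 80 + 40 * window_to_wall_ratio

def run_batch(designs, workers=8):
    """Dispatch simulations concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, designs))
```

Under these assumptions, 5,000 runs take about 83 hours (roughly three and a half days) on a single machine but 50 minutes when spread across 100 workers.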
In addition, this case study reviews three data visualization strategies for parametric energy analysis
data: Parallel Coordinates Plots, Pivot Charts and Bayesian Networks. Parallel Coordinates Plots are a
graphic representation of data in which every instance in a dataset is represented by a polyline that
intersects several vertical axes, each axis representing a variable in the design space (Figure 7). Pivot
Charts are a graphic representation of Microsoft's pivot tables, and slicers allow users to filter the
data shown in the pivot chart interactively (Figure 8). Finally, Bayesian Networks are a graphical
reasoning tool: a network of variables linked by conditional probabilities, used to compute the
probabilities of interdependent variables given evidence about others (Figure 9).
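As a minimal sketch of how a Bayesian network computes the values of interdependent variables, the code below encodes a hypothetical three-node network in which window-to-wall ratio influences both daylight quality and cooling load, and answers queries by full enumeration of the joint distribution. The network structure and all probabilities are invented for illustration; real tools use more efficient inference algorithms.

```python
import itertools

# Hypothetical conditional probability tables (not from the source).
P_WWR = {"high": 0.5, "low": 0.5}
P_DAYLIGHT = {"high": {"good": 0.8, "poor": 0.2}, "low": {"good": 0.3, "poor": 0.7}}
P_COOLING = {"high": {"high": 0.7, "low": 0.3}, "low": {"high": 0.2, "low": 0.8}}

NAMES = ("wwr", "daylight", "cooling")
DOMAINS = {"wwr": ("high", "low"), "daylight": ("good", "poor"), "cooling": ("high", "low")}

def joint(wwr, daylight, cooling):
    """Joint probability factorized along the network edges: WWR -> daylight, WWR -> cooling."""
    return P_WWR[wwr] * P_DAYLIGHT[wwr][daylight] * P_COOLING[wwr][cooling]

def query(variable, evidence):
    """Posterior distribution of one variable given evidence, by full enumeration."""
    totals = {value: 0.0 for value in DOMAINS[variable]}
    for combo in itertools.product(*(DOMAINS[n] for n in NAMES)):
        assignment = dict(zip(NAMES, combo))
        if all(assignment[k] == v for k, v in evidence.items()):
            totals[assignment[variable]] += joint(*combo)
    z = sum(totals.values())
    return {value: p / z for value, p in totals.items()}
```

For example, `query("cooling", {"daylight": "good"})` reasons backwards from good daylight to a likely high window-to-wall ratio and then forwards to a raised probability of high cooling load, which is the kind of trade-off a designer can explore interactively.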
Figure 6. Parametric energy analysis process.
Figure 7. Parallel Coordinates Plot (Source: Perkins+Will)
Figure 8. Pivot Chart with Slicers
Figure 9. Bayesian Network

7.3 Autodesk Revit journal data for anomaly detection
Revit journal files (Figure 10) record activity during Revit sessions. Although cryptic, they are human
readable, and parsing them can provide useful information. Autodesk Product Support, for
example, typically uses these journal files to troubleshoot Revit support requests.
Figure 10. Revit Journal file
Figure 11. Revit Journal Reader interface
Okhoya 11 developed a Revit journal parser dubbed the Revit Journal Reader (Figure 11). The parser
iterates through the journal files at specified network locations and reads in their text. It uses
regular expressions to identify specified text patterns corresponding to a user command and
returns session data associated with the command: Date, User, Project and View. In this way the
Journal Reader obtains a record of all instances of specific commands executed during all the Revit
sessions for all users on a project. The data extracted by the Revit Journal Reader can be exported to
CSV format for further analysis.
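The Revit Journal Reader itself is not reproduced here, and real Revit journal files are considerably more verbose than the sample below. As a minimal sketch of the regular-expression approach only, the code assumes a hypothetical, simplified one-line-per-command format carrying the timestamp, user, project, view and command identifier together; the format, its field labels and the command identifiers are invented for illustration.

```python
import csv
import io
import re

# Hypothetical simplified line format (illustration only, not the real journal syntax).
LINE_RE = re.compile(
    r"^'C (?P<date>\d{2}-\w{3}-\d{4} [\d:.]+);\s+"
    r"user=(?P<user>\S+)\s+project=(?P<project>\S+)\s+"
    r'view="(?P<view>[^"]*)"\s+cmd=(?P<command>\S+)'
)

def parse_journal(text, command):
    """Return Date/User/Project/View records for every occurrence of `command`."""
    records = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group("command") == command:
            records.append({"Date": m.group("date"), "User": m.group("user"),
                            "Project": m.group("project"), "View": m.group("view")})
    return records

def to_csv(records):
    """Serialize parsed records as CSV text for further analysis."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["Date", "User", "Project", "View"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

SAMPLE = """'C 21-May-2016 10:15:32.123; user=jdoe project=Campus.rvt view="Level 1" cmd=ID_OBJECTS_WALL
'C 21-May-2016 10:16:05.004; user=jdoe project=Campus.rvt view="Level 1" cmd=ID_EDIT_UNDO
'C 21-May-2016 10:17:41.950; user=asmith project=Campus.rvt view="Section 3" cmd=ID_OBJECTS_WALL
"""
```

Named groups keep each extracted field tied to its column, so adapting the parser to the real journal syntax means changing only the pattern.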
Gathering the Revit journal files is, however, not a trivial task, as they are typically spread out
across user machines on a network. Hunter 12 developed a Microsoft PowerShell script that can trawl
user machine locations across a network and gather all journal files into a single location. Once
this is done, it is easier to perform a journal read on the files.
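Hunter's PowerShell script is not reproduced here. As a rough Python analogue, assuming each user machine's journal folder is reachable as a path (for example through a network share, an assumption) and that journal files follow a `journal.*.txt` naming pattern (also assumed), the sketch below copies every matching file into one folder. The `demo` function builds two fake machine folders in a temporary directory purely to exercise the code.

```python
import shutil
import tempfile
from pathlib import Path

def gather_journals(source_roots, destination, pattern="journal.*.txt"):
    """Copy every matching journal file under each source root into one folder."""
    dest = Path(destination)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for root in map(Path, source_roots):
        for path in root.rglob(pattern):
            # Prefix with the machine folder and sub-path so identical file names do not collide.
            flat_name = "_".join((root.name,) + path.relative_to(root).parts)
            target = dest / flat_name
            shutil.copy2(path, target)
            copied.append(target)
    return copied

def demo():
    """Build two hypothetical 'machines' in a temporary folder and gather their journals."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        roots = []
        for machine in ("PC01", "PC02"):
            journals = base / machine / "Journals"
            journals.mkdir(parents=True)
            (journals / "journal.0001.txt").write_text("' sample journal text\n")
            roots.append(base / machine)
        copied = gather_journals(roots, base / "all_journals")
        return sorted(p.name for p in copied)
```

Prefixing each copy with its source folder matters because every machine starts numbering its journals from the same values, so plain copying would overwrite files.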
This case study will begin by briefly describing the method used to collect Revit journal data at
Perkins+Will. It will then focus on the structure of Revit journal data and the techniques used to
parse and extract the data into tabular data formats. It will then describe how parsed data can be
introduced into HTM Studio, an anomaly detection tool, in order to detect anomalies within the data.
Finally, it will describe how anomalies can be visualized in HTM Studio (Figure 12). It will also discuss
how the number of anomalies in a project can be used to flag the file for model management review.
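HTM Studio's hierarchical temporal memory approach is not reproduced here. To illustrate only the final step, counting anomalies to flag a project for review, the sketch below swaps in a much simpler technique: a trailing-window z-score on hypothetical daily counts of a command extracted from journal data. The window length, threshold and review limit are assumptions.

```python
import statistics

def zscore_anomalies(counts, window=7, threshold=3.0):
    """Indices whose value deviates from the trailing-window mean by more than `threshold` stdevs."""
    anomalies = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mean = statistics.mean(history)
        sd = statistics.pstdev(history)
        if sd == 0:
            if counts[i] != mean:      # any change from a perfectly flat history
                anomalies.append(i)
            continue
        if abs(counts[i] - mean) / sd > threshold:
            anomalies.append(i)
    return anomalies

def flag_for_review(counts, max_anomalies=2, **kwargs):
    """Flag a project's model for model management review if it has too many anomalies."""
    return len(zscore_anomalies(counts, **kwargs)) > max_anomalies

# Hypothetical daily counts of one command on one project.
daily_counts = [10, 12, 11, 9, 10, 11, 12, 10, 11, 60, 10, 12]
```

One limitation of this simple method is that a spike inflates the standard deviation of the windows that follow it, which can hide further anomalies for several days.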
Figure 12. Revit journal file anomalies in HTM Studio

7.4 Autodesk Revit model data for project performance monitoring
Revit model data has a relational database structure. This can be seen by using the Export to ODBC
compliant database feature in Revit (Figure 13), which allows an export to any ODBC database format,
such as Microsoft Access, Microsoft SQL Server or Oracle, using an appropriate database driver. Many
Revit add-on applications take advantage of this relational database structure to create
bi-directional links with Revit.
11 Revit Journal Reader is a log file parsing application developed by Victor Okhoya in 2009.
12 PowerShell script developed by Mathew Hunter, Site IT Lead, Perkins+Will, in 2011.
Figure 13. Revit model data in MS SQL Server

It stands to reason that the data exported from Revit can be gathered and used for data science
activities. In general, however, a large volume of model data from several hundred projects is
required, and collecting this amount of data is a non-trivial exercise. In 2015, Petermann 13 used
Imaginit Clarity scheduled tasks to extract data from 532 Revit project models from the Chicago
office of Perkins+Will. Clarity is a web-based service that automates the extraction of data from
Revit models. The data can be output in PDF, DWG, NWD and DWF formats, among a host of other
common CAD data formats.
Once this data has been collected, it must be prepared for data analysis. How this is done will
depend on the specific analysis intended. This thesis seeks to investigate the use of Revit model
data as a tool for project performance monitoring. Data extracted from hundreds of Revit project
models will be labeled with time sheet data from Deltek Vision (Figure 14). Vision is the accounting,
invoicing and time tracking enterprise application at Perkins+Will.
Once this data has been labeled with data from Vision, machine learning algorithms will be trained to
predict the project hours required to complete projects based on model data. If the training is
successful, this will be a valuable tool for project managers to use in project performance tracking.
Based on data in their Revit models, they can get an idea of how well their project is performing with
respect to predicted time to complete estimates.
In order to prepare the raw Revit data in MS SQL Server for labeling with Vision data, it will need to
be transformed into a format suitable for labeling and predictive classification. First, target model
categories suitable for the machine learning exercise will be identified. These include wall volumes,
floor volumes, curtain wall areas, ceiling areas, door counts, window counts and stair counts. Next,
an SQL query will be used to extract distinct project names from the project name table, and each
target model category will be aggregated by project name. Finally, Vision total hours for each project
name identified from Revit will be used to label the data. This labeled data will be used for machine
learning and predictive classification.
13 Revit model data extraction exercise conducted by Matt Petermann, Digital Practice Manager, Perkins+Will, in 2015.
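The actual tables produced by Revit's ODBC export are not reproduced here. As a minimal sketch of the aggregation and labeling steps, using Python's built-in `sqlite3` as a stand-in for MS SQL Server and hypothetical table and column names (`projects`, `walls`, `doors`, `vision_hours`), the query below aggregates two target categories by distinct project name and labels each project with its total Vision hours. Correlated subqueries avoid the row multiplication that joining several category tables at once would cause, and the inner join drops projects with no Vision record.

```python
import sqlite3

SCHEMA = """
CREATE TABLE projects (project_name TEXT);
CREATE TABLE walls (project_name TEXT, volume REAL);
CREATE TABLE doors (project_name TEXT, door_id INTEGER);
CREATE TABLE vision_hours (project_name TEXT, total_hours REAL);
"""

LABELED_DATASET_QUERY = """
SELECT p.project_name,
       (SELECT COALESCE(SUM(w.volume), 0) FROM walls AS w
         WHERE w.project_name = p.project_name) AS wall_volume,
       (SELECT COUNT(*) FROM doors AS d
         WHERE d.project_name = p.project_name) AS door_count,
       v.total_hours
FROM (SELECT DISTINCT project_name FROM projects) AS p
JOIN vision_hours AS v ON v.project_name = p.project_name
ORDER BY p.project_name
"""

def demo():
    """Load a few hypothetical rows and return one labeled feature row per project."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO projects VALUES (?)", [("A",), ("A",), ("B",), ("C",)])
    conn.executemany("INSERT INTO walls VALUES (?, ?)", [("A", 10.0), ("A", 20.0), ("B", 5.0)])
    conn.executemany("INSERT INTO doors VALUES (?, ?)", [("A", 1), ("A", 2)])
    conn.executemany("INSERT INTO vision_hours VALUES (?, ?)", [("A", 1200.0), ("B", 300.0)])
    rows = conn.execute(LABELED_DATASET_QUERY).fetchall()
    conn.close()
    return rows
```

In this sample, project C has model data but no Vision hours, so it is excluded from the labeled set.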
Figure 14. Time sheet data in Deltek Vision

8. EXPERIMENT
Assuming the thesis question can be answered in the affirmative, there is still the question of
showing that applying data science methods to architectural practice has practical benefits. This is
the validation question, and DEAM will be used to perform this validation.
DEAM, the Design Exploration Assessment Methodology (Clevenger, Haymaker, & Ehrich, 2013), is a
methodology for comparing design process impacts and outcomes. It was developed in response to the
need for a method to rationally compare design processes. In this thesis it will be used to compare
the conventional design process to a data science driven design process on the Sprout Space project.
DEAM uses defined metrics to measure and compare design processes along challenge, strategy and
exploration axes. These axes represent, respectively, the design problem, the approach to solving it
and the design alternatives evaluated through the solution process. DEAM enables quantitative and
objective assessment of the guidance, or strategic decision making, provided during design
exploration. The level of guidance is an objective measure of the impact of an approach, such
as data science driven design, on the design process.
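DEAM's formal metric definitions are not reproduced here. As a loose illustration of the kind of comparison the exploration axis supports, the sketch below summarizes hypothetical value scores of the alternatives generated by each process (count, average, range, maximum, and the iteration at which the maximum was first reached). These summaries are simplified assumptions in the spirit of DEAM, not its published metrics, and the scores are invented.

```python
import statistics

def exploration_summary(values):
    """Summarize value scores of design alternatives, listed in the order they were generated."""
    best = max(values)
    return {
        "alternatives": len(values),
        "average": statistics.mean(values),
        "range": best - min(values),
        "maximum": best,
        "iterations_to_maximum": values.index(best) + 1,
    }

def compare_processes(conventional, data_driven):
    """Pair each summary statistic as (conventional, data driven)."""
    a, b = exploration_summary(conventional), exploration_summary(data_driven)
    return {key: (a[key], b[key]) for key in a}

# Hypothetical normalized value scores for alternatives generated by each process.
conventional = [0.52, 0.60, 0.58]
data_driven = [0.55, 0.71, 0.66, 0.74]
```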
The DEAM process will be undertaken on Sprout Space (Figure 15), a research initiative at
Perkins+Will looking at learning environments. A fixed set of design parameters will be provided
with each parameter being constrained to a given range. The designer will be permitted to use
variations of the design parameters that satisfy the constraints to develop their design. They will also
be free to use any tools to perform energy or daylighting simulations using conventional methods.
Separately, the design will be analyzed using parametric energy analysis with data driven design
optimization. The outcomes of both approaches will be analyzed using DEAM in order to evaluate
the impact of the data science process.
Figure 15. Sprout Space (Source: Perkins+Will).

9. OUTLINE
The thesis will be delivered in two parts. In the first part, a brief introduction to data science will be
given and the rationale for the thesis discussed using literature review methods. A data science
conceptual framework will be developed to identify the key constructs of the study and how they
relate to the research question. The second part of the thesis will examine four case studies of the
application of data science methods in architectural practice. An experiment will then be performed
to compare conventional practice methods to data science methods. A critical comparative analysis
using the Design Exploration Assessment Methodology (DEAM) will be used to evaluate the impact
of applying data science methods (see Section 8).
9.1
9.2
Definitions
Data science defined
Why is data science important?
What does data science involve?
Data science in architecture
Rationale
Data science in related disciplines
Data science methods in construction management
Data science methods for building performance analysis
Data science as a rigorous analysis method
Data science as a paradigm shift
Conceptual Framework
Sources of data in architectural practice
Architectural services
The data science process
9.3
Example of Conceptual Framework Application

Building Data Analytics (Lasternas & Aziz)
9.4
Case Studies
Micro-polling and Statistical Analysis
Service: Planning Pre-design: Strategic Facility Planning
Data Source: Client Engagement Data
Data Collection: Micro-polling survey
Data Preparation: Transformation of survey data
Data Analysis: Statistical Analysis of survey data
Data Visualization: Survey data visualization strategy
Parametric Energy Analysis
Service: Design Construction: Energy Analysis and Design
Data Source: Energy Analysis Data
Data Collection: Cloud based data generation in MS Azure
Data Preparation: Computational Design data generation in Grasshopper
Data Analysis: Multi-objective optimization, Design of Experiments
Data Visualization: Parallel Coordinates Plots, Bayesian Networks, Pivot Charts
Anomaly Detection with Revit Journal Data

Service:
Data Source: Revit Journal data
Data Collection: PowerShell scripting
Data Preparation: Revit Journal Reader
Data Analysis: Anomaly detection in HTM Studio
Data Visualization: HTM Studio as a visualization tool
Project Performance Monitoring with Revit Model Data
Service:
Data Source: Revit Model data
Data Collection: Imaginit Clarity
Data Preparation: MS SQL Server queries, Deltek Vision data
Data Analysis: Machine learning and predictive classification
Data Visualization: Predictive classification reporting
9.5
9.6
Validation
Perkins+Will Research Sprout Space
The Design Exploration Assessment Methodology
The Sprout Space project
Using DEAM on Sprout Space
Conclusions
10. PREVIOUS WORK

The proposed thesis will rely on previous work done by the author as part of the CMU DPP
program. The works are primarily research papers submitted as course work or for qualification as a
DPP candidate, and experiment reports of initial experimental work conducted to better understand
the feasibility of various case studies proposed in this thesis. The key works are listed below and the
experimental reports are gathered into an appendix document that accompanies this proposal.
Table 8. Table of Previous Work

Research Papers:
TITLE
DESCRIPTION
Application of Machine Learning to Occupant

Modeling for Building Performance Analysis
Bayesian Networks as an Architectural Decision
Support Tool
Towards the Application of Machine Learning on
Architectural Projects
Monitoring Revit Projects Efficiently
Data Visualization Strategy for Early Stage
Principles of Practice Coursework (2015)
Bayesian Networks for Early Design Energy

Analysis Decision Support
Machine Learning for Door Scheduling
Space Planning using Machine Learning and
Genetic Algorithms
Presented to Thesis Committee (2015)
Experiment Reports:
DPP Qualifying Paper (2015)

DPP Qualifying Paper (2015)
Autodesk University 2011 presentation
Submitted to Acadia 2016 conference

11. TIME LINES

Table 9. Time lines for thesis development
DATE                            DURATION   TASK
September 2016                  4 weeks    Chapter 1: What is data science?
October 2016                    4 weeks    Chapter 2: Motivations for the thesis
November 2016                   4 weeks    Chapter 3: Conceptual framework
December 2016 - January 2017    8 weeks    Chapter 4: Micro-polling & statistical survey
February - March 2017           8 weeks    Chapter 5: Parametric energy analysis
April - May 2017                8 weeks    Chapter 6: Building Information Modelling
June - July 2017                8 weeks    Chapter 7: Validation
August 2017                     4 weeks    Conclusions
End of August 2017                         Defend Thesis
12. BIBLIOGRAPHY
12.1 Works Cited
Alexander, C. (1964). Notes on the Synthesis of Form. Cambridge, MA: Harvard University Press.
Bonte, M., Thellier, F., Lartigue, B., & Perles, A. (2014). An occupant behavior model based on artificial
intelligence for energy building simulation. In Proceedings of the 13th International IBPSA Conference BS2013,
Chambery, France.
Booz Allen Hamilton. (2016). The Field Guide to Data Science (2nd ed.). McLean, Virginia: Booz Allen Hamilton.
Cheng, M.-Y., Wu, Y.-W., & Wu, C.-F. (2010). Project success prediction using an evolutionary support
vector machine inference model. Automation in Construction, 19(3), 302-307.
Clevenger, C. M., Haymaker, J. R., & Ehrich, A. (2013). Design exploration assessment methodology: testing
the guidance of design processes. Journal of Engineering Design, 24(3), 165-184.
http://doi.org/10.1080/09544828.2012.698256
David Wang, L. G. (2002). Architectural Research Methods. Wiley.
Davis, D. (2015). How Big Data is Transforming Architecture. Architect. Retrieved from
http://www.architectmagazine.com/technology/how-big-data-is-transforming-architecture_o
Demkin, J. (Ed.). (2001). The Architect's Handbook of Professional Practice (13th ed.). New York: John Wiley &
Sons.
Deutsch, R. (2015). Data-Driven Design and Construction: 25 Strategies for Capturing, Analyzing and Applying Building
Data. Wiley. Retrieved from https://books.google.ca/books?id=uyKsBwAAQBAJ
French, C. (1996). Data Processing and Information Technology (10th ed.). London: Thomson.
Hosack, B., & Sagers, G. (2015). Applied doctorates in IT: A case for designing data science graduate
programs. Journal of the Midwest Association for Information Systems, 1(1), 61-68.
Huang, C.-C., & Cheng, M.-Y. (2011). Estimate at completion for construction projects using evolutionary
Gaussian process inference model. In Multimedia Technology (ICMT), 2011 International Conference on (pp.
4414-4417).
Karaguzel, O. T., Zhang, R., & Lam, K. P. (2014). Coupling of whole-building energy simulation and multidimensional numerical optimization for minimizing the life cycle costs of office buildings. Building
Simulation, 7(2), 111-121. http://doi.org/10.1007/s12273-013-0128-5
Kelly, J. E. (2015). Computing, cognition and the future of knowing. IBM White Paper, 7.
Landau, R. (1984). A Philosophy of Enabling. In The Square Book. London: Architectural Association.
Milkman, K. L., Chugh, D., & Bazerman, M. H. (2009). How can decision making be improved? Perspectives on
Psychological Science, 4(4), 379-383.
Negroponte, N. (1975). The architecture machine. Computer-Aided Design, 7(3), 190-195.
http://doi.org/10.1016/0010-4485(75)90009-3
Provost, F., & Fawcett, T. (2013). Data science and its relationship to big data and data-driven decision
making. Big Data, 1(1), 51-59.
Roberts, R. P., & Sikes, J. (2011). McKinsey Global Survey results: A rising role for IT. McKinsey Global Survey
Results, (Exhibit 1), 1-9.
Russell, S., & Norvig, P. (2009). Artificial Intelligence: A Modern Approach (3rd ed.). Prentice Hall.
Stanton, J. (2012). An Introduction to Data Science. Retrieved from
http://jsresearch.net/groups/teachdatascience/wiki/welcome/attachments/72f24/DataScienceBook1_1.pdf
Wu, L., Kaiser, G., Solomon, D., Winter, R., Boulanger, A., & Anderson, R. (2012). Improving efficiency and
reliability of building systems using machine learning and automated online evaluation. In Systems,
Applications and Technology Conference (LISAT), 2012 IEEE Long Island (pp. 1-6).
12.2 References
Alexander, C. (1964). Notes on the Synthesis of Form. Cambridge, MA: Harvard University Press.
Alfonsi, E., Capolongo, S., & Buffoli, M. (2014). Evidence based design and healthcare: an unconventional
approach to hospital design. Annali Di Igiene, 26(2), 137-143.
Aussem, A. (2010). Bayesian networks. Neurocomputing (Vol. 73). http://doi.org/10.1016/j.neucom.2009.11.001
Bell, G., Hey, T., & Szalay, A. (2009). Computer science. Beyond the data deluge. Science (New York, N.Y.),
323(5919), 1297-1298. http://doi.org/10.1126/science.1170411
Bonte, M., Thellier, F., Lartigue, B., & Perles, A. (2014). An occupant behavior model based on artificial
intelligence for energy building simulation. In Proceedings of the 13th International IBPSA Conference BS2013,
Chambery, France.
Booz Allen Hamilton. (2016). The Field Guide to Data Science (2nd ed.). McLean, Virginia: Booz Allen Hamilton.
Brewka, G. (1996). Artificial intelligence: A modern approach by Stuart Russell and Peter Norvig, Prentice Hall. Series in
Artificial Intelligence, Englewood Cliffs, NJ. The Knowledge Engineering Review (Vol. 11).
http://doi.org/10.1017/S0269888900007724
Carbonari, A., Vaccarini, M., & Giretti, A. (2014). Bayesian Networks for Supporting Model Based Predictive
Control of Smart Buildings. http://doi.org/10.5772/58470
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys
(CSUR), 41(3), 1-58. http://doi.org/10.1145/1541880.1541882
Cheng, M.-Y., Wu, Y.-W., & Wu, C.-F. (2010). Project success prediction using an evolutionary support
vector machine inference model. Automation in Construction, 19(3), 302-307.
Clayton, M., Kunz, J., & Fischer, M. (1998). The Charrette Test Method.
Clevenger, C. M., Haymaker, J. R., & Ehrich, A. (2013). Design exploration assessment methodology: testing
the guidance of design processes. Journal of Engineering Design, 24(3), 165-184.
http://doi.org/10.1080/09544828.2012.698256
Cochrane, A. L. (1971). Effectiveness and Efficiency: Random reflections on health services. The Nuffield
Provincial Hospitals Trust. http://doi.org/10.1136/bmj.328.7438.529
Corbusier, L. (1986). Towards a new architecture. New York: Dover Publications.
David Wang, L. G. (2002). Architectural Research Methods. Wiley.
Davis, D. (2015). How Big Data is Transforming Architecture. Architect. Retrieved from
http://www.architectmagazine.com/technology/how-big-data-is-transforming-architecture_o
Delen, D., & Demirkan, H. (2013). Data, information and analytics as services. Decision Support Systems, 55(1),
359-363. http://doi.org/10.1016/j.dss.2012.05.044
Demkin, J. (Ed.). (2001). The Architect's Handbook of Professional Practice (13th ed.). New York: John Wiley &
Sons.
Deutsch, R. (2015). Data-Driven Design and Construction: 25 Strategies for Capturing, Analyzing and Applying Building
Data. Wiley. Retrieved from https://books.google.ca/books?id=uyKsBwAAQBAJ
Euclid, Heath, T. L., & Densmore, D. (2002). Euclid's Elements: all thirteen books complete in one volume: the Thomas
L. Heath translation. Green Lion Press. Retrieved from
https://books.google.ca/books?id=nc1UAAAAYAAJ
French, C. (1996). Data Processing and Information Technology (10th ed.). London: Thomson.
Friedow, B. (2012). An Evidence Based Design Guide for Interior Designers. University of Nebraska-Lincoln.
Han, J., Kamber, M., & Pei, J. (2011). Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann.
Heschong Mahone Group. (1999). Daylighting in Schools.
Hosack, B., & Sagers, G. (2015). Applied doctorates in IT: A case for designing data science graduate
programs. Journal of the Midwest Association for Information Systems, 1(1), 61-68.
Huang, C.-C., & Cheng, M.-Y. (2011). Estimate at completion for construction projects using evolutionary
gaussian process inference model. In Multimedia Technology (ICMT), 2011 International Conference on (pp.
4414-4417).
Jencks, C. (1977). The language of post-modern architecture. New York: Rizzoli.
Karaguzel, O. T., Zhang, R., & Lam, K. P. (2014). Coupling of whole-building energy simulation and multidimensional numerical optimization for minimizing the life cycle costs of office buildings. Building
Simulation, 7(2), 111-121. http://doi.org/10.1007/s12273-013-0128-5
Kelly, J. E. (2015). Computing, cognition and the future of knowing. IBM White Paper, 7.
Korolija, I., Marjanovic-Halburd, L., Zhang, Y., & Hanby, V. I. (2013). UK office buildings archetypal model
as methodological approach in development of regression models for predicting building energy
consumption from heating and cooling demands. Energy and Buildings, 60, 152-162.
http://doi.org/10.1016/j.enbuild.2012.12.032
Kricheff, R. (2014). Data Analytics for Corporate Debt Markets: Using Data for Investing, Trading, Capital Markets, and
Portfolio Management. Pearson.
Kuhn, T. S. (1996). The Structure of Scientific Revolutions (3rd ed.). Chicago: University of Chicago Press.
Landau, R. (1984). A Philosophy of Enabling. In The Square Book. London: Architectural Association.
Liu, C. (2008). A Simulation-Based Experience in Learning Structures of Bayesian Networks to Represent
How Students Learn Composite Concepts. International Journal of Artificial Intelligence in Education, 18,
237-285. Retrieved from http://iospress.metapress.com/content/3074000428p22130/
Lorentz, H. A., Einstein, A., Minkowski, H., Weyl, H., & Sommerfeld, A. (1952). The Principle of Relativity: A
Collection of Original Memoirs on the Special and General Theory of Relativity. Dover. Retrieved from
https://books.google.ca/books?id=S1dmLWLhdqAC
Manning, H. P. (2013). Introductory Non-Euclidean Geometry. Dover Publications. Retrieved from
https://books.google.ca/books?id=EOa_ykDmmLUC
Margaritis, D. (2003). Learning Bayesian Network Model Structure from Data (Doctoral dissertation). Carnegie
Mellon University, Pittsburgh, PA.
Mattmann, C. A. (2013). Computing: A vision for data science. Nature, 493(7433), 473-475.
http://doi.org/10.1038/493473a
McKinsey & Company. (2011). Big data: The next frontier for innovation, competition, and productivity.
McKinsey Global Institute, (June), 1-156.
Milkman, K. L., Chugh, D., & Bazerman, M. H. (2009). How can decision making be improved? Perspectives on
Psychological Science, 4(4), 379-383.
Negroponte, N. (1975). The architecture machine. Computer-Aided Design, 7(3), 190-195.
http://doi.org/10.1016/0010-4485(75)90009-3
Newton, I., Motte, A., & Chittenden, N. W. (1850). Newton's Principia: The Mathematical Principles of Natural
Philosophy. Geo. P. Putnam. Retrieved from https://books.google.ca/books?id=N-hHAQAAMAAJ
Nguyen, A.-T., & Reiter, S. (2015). A performance comparison of sensitivity analysis methods for building
energy models. Building Simulation, 8(6), 651-664. http://doi.org/10.1007/s12273-015-0245-4
Nightingale, F. (1960). Notes on Nursing: What It Is, and What It Is Not. London: Harrison.
Oh, J., Hwang, J., Smith, S. F., & Koile, K. (2006). Learning from Main Streets. Artificial Intelligence, 325-340.
O'Neil, C., & Schutt, R. (2013). Doing Data Science. O'Reilly. Retrieved from
http://proquest.safaribooksonline.com.proxy.library.cmu.edu/book/databases/9781449363871
Petrov, T. P. (n.d.). Application of Bayesian belief networks for continuous risk evaluation and decision
support of safety management in mining.
Provost, F., & Fawcett, T. (2013). Data science and its relationship to big data and data-driven decision
making. Big Data, 1(1), 51-59.
Roberts, R. P., & Sikes, J. (2011). McKinsey Global Survey results: A rising role for IT. McKinsey Global Survey
Results, (Exhibit 1), 1-9.
Russell, A. D., Chiu, C.-Y., & Korde, T. (2009). Visual representation of construction management data.
Automation in Construction, 18(8), 1045-1062. http://doi.org/10.1016/j.autcon.2009.05.006
Russell, S., & Norvig, P. (2009). Artificial Intelligence: A Modern Approach (3rd ed.). Prentice Hall.
Stanton, J. (2012). An Introduction to Data Science. Retrieved from
http://jsresearch.net/groups/teachdatascience/wiki/welcome/attachments/72f24/DataScienceBook1_1.pdf
Studio, H. A. (2011). Energy Modeling: A Guide For The Building Professional. Energy, (May), 15. Retrieved
from http://rechargecolorado.org
Suppes, P. (1960). Axiomatic Set Theory. Dover Publications. Retrieved from
https://books.google.ca/books?id=sxr4LrgJGeAC
Ulrich, R. S. (1984). View through a window may influence recovery from surgery. Science, 224(4647), 420-421.
Venturi, R. (1977). Complexity and Contradiction in Architecture (2nd ed.). New York: Museum of Modern Art.
Wang, Y. (2009). On cognitive computing. Int. J. Software Sci. Comput. Intell., 1(3), 1-15.
Wang, Y., Baciu, G., Yao, Y., Kinsner, W., Chan, K., Zhang, B., Zhu, H. (2010). Perspectives on Cognitive
Informatics and Cognitive Computing. International Journal of Cognitive Informatics and Natural Intelligence,
4(1), 1-29. http://doi.org/10.4018/jcini.2010010101
Wu, L., Kaiser, G., Solomon, D., Winter, R., Boulanger, A., & Anderson, R. (2012). Improving efficiency and
reliability of building systems using machine learning and automated online evaluation. In Systems,
Applications and Technology Conference (LISAT), 2012 IEEE Long Island (pp. 1-6).
Zhang, Y., & Korolija, I. (2010). Performing complex parametric simulations with jEPlus. SET2010: 9th
International Conference on Sustainable Energy Technologies. Retrieved from
http://www.iesd.dmu.ac.uk/~yzhang/wiki/lib/exe/fetch.php?media=software:java:set2010-shanghaise102.pdf
Zúñiga-Cañón, C. L., & Burguillo, J. C. (2014). Advances in Artificial Intelligence -- IBERAMIA 2014: 14th
Ibero-American Conference on AI, Santiago de Chile, Chile, November 24-27, 2014, Proceedings. In L.
C. A. Bazzan & K. Pichara (Eds.), (pp. 698-709). Cham: Springer International Publishing.
http://doi.org/10.1007/978-3-319-12027-0_56