Documente Academic
Documente Profesional
Documente Cultură
Engineering Informatics
Manuscript Draft
Order of Authors: Muhammad Bilal, MSc, BSc; Lukumon O. Oyedele, PhD, LLM,
MSc, BSc (Hons.); Junaid Qadir, PhD, MSc, BSc; Kamran Munir, PhD, MSc,
BSc; Saheed O Ajayi, MSc, BSc; Olugbenga O Akinade, MSc, BSc; Hakeem A
Owolabi, MSc, BSc; Hafiz A Alaka, MSc, BSc; Maruf Pasha, PhD, MSc, BSc
Cover Letter & Title Page
The Editor
Advanced Engineering Informatics
Please find attached the full manuscript for review and publication in the Advanced Engineering
Informatics. The manuscript is titled: “Big Data in the Construction Industry: A Review of Present
Status, Opportunities, and Future Trends”.
The overall aim of this study is to understand state of the construction industry for the adoption Big
Data technologies. It is highlighted that the industry is already employing many applications of Data
Science, however, the adoption of Big Data technologies is still at nascent stage and lags the wider
uptake like other industries. To this end, opportunities in the various sub-domains of construction
industry are presented along with the future works and potential pitfalls of Big Data in this industry.
This study provides an excellent reference of diverse Big Data concepts and terminologies for the
researchers and practitioners in the construction industry.
Please a quick turnover regarding the review of the manuscript and subsequent publication would
be greatly appreciated.
Should you require further information, please feel free to contact me at the above e-mail address.
Kind Regards,
for
By
Muhammad Bilal1,a; Lukumon O. Oyedele2,a,* ; Junaid Qadir3,b; Kamran Munir4,a, Saheed O. Ajayi5,a;
Olugbenga O. Akinade6,a; Hakeem A. Owolabi7,a; Hafiz A. Alaka8,a; Maruf Pasha9,c;
(Authors)
Affiliations
a
Bristol Enterprise, Research and Innovation Centre (BERIC)
Bristol Business School
University of West of the England, Bristol, United Kingdom
b
School of Electrical Engineering & Computer Science (SEECS)
National University of Sciences & Technology (NUST), Islamabad, Pakistan
c
Department of Information Technology,
Bahauddin Zakariya University, Multan, Pakistan
Research Highlights
Click here to view linked References
1
2 Big Data in the Construction Industry: A Review of Present
3
4 Status, Opportunities, and Future Trends
5
6
7
8
9
10 Research Highlights
11
12
13
14
15 • Existing works for Big Data Analytics/Engineering in the construction industry are
16
17
18 discussed
19
20 • It is highlighted that the adoption of Big Data is still at nascent stage
21
22
23 • Opportunities to employ Big Data technologies in construction sub-domains are
24
25 highlighted
26
27
28 • Future works for Big Data technologies are presented
29
30
31
• Pitfalls of Big Data technologies in the construction industry are also pointed out
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
Manuscript
Click here to view linked References
1
1
2
3
4
Big Data in the Construction Industry: A Review of
5
6 Present Status, Opportunities, and Future Trends
7
8
9
10
11 1 Abstract—The ability to process large amounts of data and computing), compressed, in diverse proprietary formats, and 51
2 to extract useful insights from data has revolutionised society. intertwined [6]. Accordingly, this diverse data is collated in 52
12 This phenomenon—dubbed as Big Data—has applications for a
3 federated BIM models, which are enriched gradually and 53
13 wide assortment of industries, including the construction industry.
4
persisted beyond the end-of-life of facilities. BIM files can 54
14 5 The construction industry already deals with large volumes of quickly get voluminous, with the design data of a 3-story 55
15 6 heterogeneous data; which is expected to increase exponentially
as technologies such as sensor networks and the Internet of
building model easily reaching 50GB in size [7]. Noticeably, 56
16 7
this data in any form and shape has intrinsic value to the 57
17 8 Things are commoditized. In this paper, we present a detailed
9 survey of the literature, investigating the application of Big Data performance of the industry. With the advent of embedded 58
18 devices and sensors, facilities have even started to generate
10 techniques in the construction industry. We reviewed related 59
19 11 works published in the databases of American Association of massive data during the operations and maintenance stage, 60
20 12 Civil Engineers (ASCE), Institute of Electrical and Electronics eventually leading to more rich sources of Big BIM Data. This 61
21 13 Engineers (IEEE), Association of Computing Machinery (ACM), vast accumulation of BIM data has pushed the construction 62
22 14 and Elsevier Science Direct Digital Library. While the application industry to enter the Big Data era. 63
23 15 of data analytics in the construction industry is not new, the Big Data has three defining attributes (a.k.a. 3V‘s), namely 64
24 16 adoption of Big Data technologies in this industry remains at a (i) volume (terabytes, petabytes of data and beyond); (ii) 65
27 20 the construction industry. This paper fills the void and presents a data). The 3V‘s of Big Data are clearly evident in construction 68
28 21 wide-ranging interdisciplinary review of literature of fields such data. Construction data is typically large, heterogeneous, and 69
29 22 as statistics, data mining and warehousing, machine learning, and dynamic [8]. Construction data is voluminous due to large 70
30 23 Big Data Analytics in the context of the construction industry. volumes of design data, schedules, Enterprise Resource Plan- 71
31 24 We discuss the current state of adoption of Big Data in the ning (ERP) systems, financial data, etc. The diversity of con- 72
32 25 construction industry and discuss the future potential of such struction data can be observed by noting the various formats 73
26 technologies across the multiple domain-specific sub-areas of the supported in construction applications including DWG (short
33 27 construction industry. We also propose open issues and directions
74
44 36 increased in 2012 is approximately 2.5 quintillion (1018 ) bytes To understand the subtleties of Big Data, we need to 85
45 37 per day [2]. This data growth brings significant opportunities disambiguate between two of its complementary aspects: Big 86
46 38 to scientists for identifying useful insights and knowledge. Data Engineering (BDE) and Big Data Analytics (BDA). The 87
47 39 Arguably, the accessibility of data can improve the status domain of BDE is primarily concerned with supporting the 88
48 40 quo in various fields by strengthening existing statistical and relevant data storage and processing activities, needed for 89
49 41 algorithmic methods [3], or by even making them redundant analytics [9]. BDE encompasses technology stacks such as 90
50 42 [4]. Hadoop and Berkeley Data Analytics Stack (BDAS). Big Data 91
51 43 The construction industry is not an exception to the per- Analytics (BDA), the second integral aspect, relates to the tasks 92
52 44 vasive digital revolution. The industry is dealing with sig- responsible for extracting the knowledge to drive decision- 93
53 45 nificant data arising from diverse disciplines throughout the making [9]. BDA is mostly concerned with the principles, 94
54 46 life cycle of a facility. Building Information Modelling (BIM) processes, and techniques to understand the Big Data. The 95
55 47 is envisioned to capture multi-dimensional CAD information essence of BDA is to discover the latent patterns buried 96
56 48 systematically for supporting multidisciplinary collaboration inside Big Data and derive useful insights therefrom [10]. 97
57 49 among the stakeholders [5]. BIM data is typically 3D ge- These insights have the capability to transform the future 98
58 50 ometric encoded, compute intensive (graphics and Boolean of many industries through data-driven decision-making. This 99
59
60
61
62
63
64
65
2
1
2
3 1 ability to identify, understanding and reacting to the latent and produce final results which are stored back to the file 55
4 2 trends promptly is indeed a competitive edge in this hyper- system. Hadoop—a popular Big Data platform—introduced 56
6 4 Contributions of this paper: While some data-driven so- to execute MR programs successfully. In a typical Hadoop 58
7 5 lutions have been proposed for the fields of the construction cluster, several mappers and reducers simultaneously run MR 59
8 6 industry, there is currently no comprehensive survey of the programs. MR is a powerful model for batch-processing tasks. 60
literature, targeting the application of Big Data in the context However, it is struggling with applications that require real- 61
9 7
8 of the construction industry. This paper fills the void and time, graph, or iterative processing. Latest versions of Hadoop 62
10
9 presents a wide-ranging interdisciplinary study of fields such as have encountered this issue to some extent where processing 63
11 aspect of MR is detached from rest of the ecosystem. To this
12 10 Statistics, Data Mining and Warehousing, Machine Learning, 64
11 Big Data and their applications in the construction industry. end, Yet Another Resource Negotiator (YARN) is introduced 65
13 that has taken Hadoop to an actually computationally-agnostic 66
14 12 Organization of this paper: The discussion in this paper
Big Data platform. MR runs as a service over YARN, while 67
15 13 follow the review structure shown in Fig. 1. We start with a
YARN handles scheduling and resource management related 68
16 14 thorough review of extant literature on BDE and BDA in the
functionalities. This separation has made Hadoop suitable for 69
17 15 construction industry in Section II and III, respectively. After
implementing innovative applications. 70
18 16 which, opportunities of Big Data in the construction industry
19 17 sub-domains are presented in Section IV. Discussions about 2) Directed Acyclic Graphs (DAG): DAG is an alternative 71
18 open research issues and future work, and pitfalls of Big Data processing model for Big Data platforms. In contrary to MR, 72
20
19 in the construction industry are then presented in Section V DAG relaxes the rigid map-then-reduce style of MR to a more 73
21
20 and VI, respectively. generic notion. BDAS—an emerging Big Data platform— 74
22
supports this kind of data processing through its resilient 75
23 component called Spark [14]. Spark holds supremacy over MR
21 II. B IG DATA E NGINEERING (BDE) 76
24 in many aspects. Particularly, in-memory computation and high 77
25 22 Big Data Engineering (BDE) provide infrastructure to sup-
23 port Big Data Analytics (BDA). Some discussions about the expressiveness are keys to wider adoption of Spark. These 78
26 capabilities heralded the Spark a natural choice to support 79
27 24 Big Data platforms worth consideration to understand the BDE
25 adequately. Various Big Data platforms are developed so far iterative as well as reactive applications [14]. Spark is reported 80
28 to have ten times faster than MR on disk-resident tasks, 81
29 26 with varied characteristics, which can be divided into two
27 groups: (i) horizontal scaling platforms (HSPs), the ones that whereas hundred times faster for memory-resident tasks [11]. 82
30 Fig. 3 shows components of Spark. These technologies are 83
31 28 distribute processing across multiple servers and scale out by
29 adding new machines to the cluster. (ii) And vertical scaling designed to support functions that are vital to the development 84
32 of enterprise applications. 85
30 platforms (VSPs), in which scaling is achieved by upgrading
33
31 hardware (processor or memory or disk) of the underlying [Fig. 3 about here.] 86
34
32 server since it is a single server-based configuration. In the
35 interest of brevity of this paper, the discussion here is confined
33 Examples of Construction Research using Big Data 87
36 to HSPs, notably Hadoop and BDAS only. We refer interested
34 Processing: MR and Spark have use cases across myriad 88
37 readers to Singh et al. [11] for a detailed explanation on their
35 information systems (IS) of the construction industry. Despite 89
38 36 comparison and selection criterion. significance, these tools are rarely used to process BIM data 90
39 37 Due to clear performance gains of BDAS over Hadoop, it in construction industry applications. 91
40 is getting more attention recently. However, BDAS is in its
38 Chang et al. [16] customised MR for BIM data (MR4B) 92
41 39 infancy with limited support and supporting tools. Whereas, to optimise the retrieval of partial BIM models. They found 93
42 40 Hadoop is still widely adopted and has become the de- legacy data distribution logic of Hadoop MR inadequate, since 94
43 41 facto framework for Big Data applications. These platforms BIM data is intertwined as well as highly relative, and merely 95
44 42 offer tools to store and process Big Data. Some of the most placing it randomly might sparsely distribute BIM elements 96
45 43 prominent tools are discussed in the subsequent sections. across different blocks on Hadoop cluster nodes. Such place- 97
46 ment degrades querying performance due to increased disk 98
47 44 A. Big Data Processing I/O required to bring sparsely distributed data together for 99
48 analysis using MR. To overcome this, a data pre-partition
45 [Fig. 2 about here.] 100
49 and processing step is devised to parse, analyse and partition 101
50 46 Parallel and distributed computation is at the core of BDE. logically relevant parts of BIM data (by floor number or 102
51 47 A large number of processing models are developed for this material family) and store it in the adjacent spaces on the 103
52 48 purpose, which includes but not limited to: Hadoop cluster. Node multi-threading is introduced to utilise 104
53 49 1) MapReduce (MR): MR is the distributed processing the CPU maximally during analysis [16]. This way Hadoop 105
54 50 model to handle Big Data [13]. The entire analytical tasks is customised for BIM data and querying components are 106
55 51 in MR are written as two functions, i.e., map and reduce implemented as YARN applications. A BIM system for clash 107
56 52 (see Fig. 2), which are submitted to separate processes called detection and quantity estimation is developed to exploit the 108
57 53 Mappers and Reducers. Mapper read data, process it, and gen- proposed YARN applications. It is reported that the system has 109
58 54 erate intermediate results. Reducers work on mappers’ output improved the performance manifold, and the required tasks are 110
59
60
61
62
63
64
65
3
1
2
3 1 executed at real time with reasonable response time. More importantly, NoSQL systems eschew the rigid schema- 55
4 2 Lin et al. [7] presented the development of a specialised big oriented storage in favour of schema-less storage to achieve 56
5 3 BIM data storage and retrieval system for experts and naive flexibility [18]. Today these systems are prevalent in myriad 57
6 4 BIM users. The intentions are to develop a highly interactive data-intensive applications in many industries. Pointedly, the 58
7 5 user interface for querying BIM data through mobile devices architecture of NoSQL systems is well suited to fragmented 59
8 6 to maximise its utility and usability. User queries in plain nature of construction industry’s data. 60
7 English are re-formulated using the proposed natural language NoSQL systems store schema-less data in a non-relational 61
9
8 processing approach to retrieve highly complex BIM data, data model. Presumably, there are four data models for these 62
10 systems.
9 which are mapped onto a variety of visualisations. To optimise 63
11
10 query execution, an MR join pre-processing is demonstrated 1) Key-Value: This is the simplest data model to store 64
12 to merge two BIM collections before query evaluation. The
11 unstructured data. However, the underlying data is not 65
13 12 response time is reported to have enhanced by more than 40% self-describing. 66
14 13 compared to the same join pre-processing written in traditional 2) Document: This data model is suitable for storing self- 67
15 14 technologies. describing entities. However, the storage of this model 68
16 can be inefficient. 69
17 3) Columnar: This data model favours the storage of sparse 70
18 15 B. Big Data Storage datasets, grouped sub-columns, and aggregated columns. 71
19 16 Another aspect of BDE is the Big Data storage, which is 4) Graph: This is a relatively new data model that supports 72
20 17 provided either by the distributed file systems or emerging relationship traversal over a huge dataset of property- 73
21 18 NoSQL databases. These technologies are briefly discussed in graphs. Graph databases are getting popular than other 74
22 19 the following subsections. data models (see Fig. 4, where the x-axis represents 75
23 the period of popularity and y-axis shows a change in 76
20 1) Distributed File Systems: In this subsection, we are
24 popularity). Table VIII describes features of 12 prominent 77
21 discussing two competing distributed file systems, namely
25
22 HDFS and Tachyon. databases. 78
31 27 higher in such settings, it provides greater fault tolerance Storage: Despite significance for massive BIM data storage, 82
32 28 for hardware failures. Data distribution and replications existing applications are still lacking their successful imple- 83
33 29 are the key traits of HDFS to achieve the fault tolerance mentation. Das et al. [20] proposed Social-BIM to capture 84
34 30 and high availability. There are however situations when social interactions of users along with the building models. 85
35 31 the usage of HDFS degrades performance, particularly in A distributed BIM framework, called BIMCloud, is developed 86
36 32 applications requiring low-latency data access. Similarly, to store this data through IFC. Apache Cassandra, hosted 87
33 it is also not ideal for storing a large number of small on Amazon EC2, is used. Jeong et al. [21] proposed a 88
37
34 files due to the associated overhead for managing their hybrid data management infrastructure comprised two tiers. 89
38
35 metadata. Lastly, HDFS is not the choice of technology The client tier that utilises MongoDB for storing the structured 90
39 data temporarily for efficiently completing analytical tasks,
36 if applications require a significant number of concurrent 91
40 37 modifications at random places in data. whereas, the central tier employs Apache Cassandra to store 92
41 permanently the streams of sensor data generated over time.
42 38 • Tachyon is the BDAS flagship distributed file system 93
39 that extends HDFS and provides access to the distributed Cheng et al. [22] have also employed the Apache Cassandra for 94
43 presenting their query language to extract partial BIM models. 95
44 40 data at memory speed across the cluster. Some of the
41 features where Tachyon has outsmarted HDFS include: Similarly, Lin et al. [7] exploited MongoDB to store BIM data 96
45 of building models for distributed processing through MapRe- 97
42 (i) in-memory data caching for enhanced performance
46 duce. MongoDB is tailored for IFC, with minor alterations to 98
43 and (ii) backwards compatibility to work seamlessly
47 IFC hierarchy for supporting MR-efficient query execution. 99
44 with Spark as well as MR tasks without any code
48 45 changes required to the programs.
49 III. B IG DATA A NALYTICS 100
50 46 2) NoSQL Databases: Relational databases served IT in- Big Data Analytics has a rich intellectual tradition and 101
51 47 dustry for the past couple of decades as de facto data man- borrows from a wide variety of fields. There have been 102
52 48 agement standard. However, recently applications emerged traditionally many related disciplines that have essentially the 103
53 49 that demanded more scalability, performance, and flexibility. same core focus: finding useful patterns in data (but with a 104
54 50 Relational databases are found unsuitable for these applications different emphasis). These related fields are Statistics (18301 ) 105
55 51 due to their specialised storage and processing needs. Conse- 1 While it can be difficult to pin down the exact time of genesis of a
56 52 quently, new systems came into being—called “Not only SQL” technology, the year in which the domain’s seminal work was proposed is
57 53 (NoSQL) systems—to fill this technological gap. NoSQL sys- provided to approximately sequence the various Big Data Analytics enabling
58 54 tems improved traditional data management in numerous ways. technologies chronologically.
59
60
61
62
63
64
65
4
1
2
3 1 [23], Data Mining (1980), Predictive Analytics (1989 [24]), Databases are crucial to empowering various aspects of 54
4 2 Business Analytics (1997), Knowledge Discovery from Data data mining, in particular by taking care of the activities of 55
5 3 (KDD) (2002), Data Analytics (2010), Data Science (2010) efficient data access, group and ordering of operations and 56
6 4 and now Big Data (2012). Fig. 5 shows the relevance of these optimising the queries to scale up data mining algorithms. 57
7 5 multidisciplinary fields to Big Data. So, Big Data Analytics Databases provide native support for analytics in the form 58
8 6 is a broadening of the field of data analytics and incorporates of data warehousing. In data warehousing, the copy of the 59
7 many of the techniques that have already been performed. This transactional data is stored specifically structured for querying 60
9
8 is the key reason that most of the existing work, presented in and the analysis [37], [41]. The transactional data is collated 61
10
9 subsequent subsections, has focused on data analytics rather from the operational databases using a process usually known 62
11 10 than Big Data is that the Big Data revolution—i.e., the ability as Extract, Transform, and Load (ETL) [42]. Data in the 63
12 11 to process large amounts of diverse data on a large scale—has warehouse is typically analysed through the Online Analytical 64
13 12 only recently happened. Existing approaches can be possibly Processing (OLAP). OLAP outperforms SQL in computing the 65
14 13 extended to the environments, dealing with large, diverse summaries (roll-up) and breakdown (roll-down) of the data. 66
15 14 datasets. Examples of Construction Research using Data Mining: 67
16 Kim et al. [31] employed data mining techniques to identify 68
17 15 [TABLE 2 about here.] the key factors that cause delays in construction projects. They 69
18 presented knowledge discovery in databases (KDD) framework
16 Some ML-based tools have been developed for Big Data 70
19 to analyse massive construction datasets. Limitations of ML
17 analytics. Table IX highlight some of the important ones. To 71
20 18 showcase the implementation of BDA, we use MLlib (MLbase) algorithms (such as incorrect prediction) are discussed and 72
21 19 code in the subsequent subsections. overcome through statistical methods. Buchheit et al. [43] also 73
22 presented KDD process for the construction industry. Data 74
23 20 [Fig. 5 about here.] preprocessing is found to be the most challenging and time- 75
25 21 1) Statistics: In scientific studies, rigorous and efficient applicability of KDD to construction datasets for identifying 77
26 22 techniques are used to answer research questions. Careful causes of construction delays, cost overrun, and quality con- 78
28 24 investigations. Statistics is the study of collecting, analysing, Carrilli et al. [32] used data mining to learn from past 80
29 25 and drawing conclusions from the data, with the primary projects and improve future project delivery. Approaches such 81
30 26 focus on selecting the right tools and techniques at every as text analysis, link analysis and dimensional matrix analysis 82
31 27 data analysis stage [29]. Right from the data collections, are performed on data from multiple projects. Liao et al. [45] 83
32 28 to efficiently analysing it, and then inferring or formulating employed association rule mining to proactively prevent oc- 84
33 29 conclusions out of it, all of these steps comes under the scope cupational injuries. In another similar study [46], data mining 85
34 30 of statistics [30]. Various fields of analytics are borrowing is used to explore the causes and distribution of occupational 86
35 31 techniques from statistics [29]. injuries and revealed that falls and collapses are the primary 87
36 32 Examples of Construction Research using Statistics: The reasons of occupational fatalities. While Oh et al. [47] em- 88
37 33 industry is employing statistical methods in a variety of appli- ployed DW in construction productivity data, which is utilised 89
38 34 cation areas, such as identifying causes of construction delays using a multi-layer analysis through OLAP in the proposed 90
39 35 [31], learning from post-project reviews (PPRs) [32], decision system. SQL is quite prevalent in the industry for querying 91
40 36 support for construction litigation [33], detecting structural partial BIM models query languages such as Express Query 92
41 37 damages of buildings [34], identifying actions of workers and Language (EQL) and Building Information Modelling Query 93
42 38 heavy machinery [35], [36], etc., are to name a few. Language (BIMQL) are developed in the various construction 94
54 49 ability of particular technique(s) for solving the given business a sub-field of Artificial Intelligence (AI), focuses on the task 105
55 50 problem. Models with the highest accuracy and tolerance are of enabling computational systems to learn from data about 106
56 51 chosen and applied to the actual data for generating predictive specific task automatically. ML tasks can be categorized into: 107
57 52 results (including predictions, rules, probability, and predictive i) classification (or supervised learning); ii) clustering (or 108
59
60
61
62
63
64
65
5
1
2
3 1 [51]. status, education level, number of family members to support, 56
4 2 ML has many applications across the construction applica- safety knowledge, drinking habits, direct employer, and indi- 57
5 3 tions, such as the modelling of judicial reasoning and pre- vidual safety behaviour. 58
14 11 cerned about predicting the numerical value of a target variable focused decisions are involved. Since these algorithms learn 66
15 12 based on input variables. For instance, estimating the cost of by examples, carefully crafted examples of correct decisions 67
16 13 the design based on design specifications. Regression can be of aside with input data are vital for algorithms to learn pre- 68
17 14 the following types. The simple linear regression that is used cisely. These algorithms learn to mimic the examples of right 69
18 15 for modelling the relationship between a dependent variable y decisions contrary to clustering in which algorithms decide 70
19 16 and one explanatory variable x. Multiple linear regression that on their own without prior guidance. Classification intends 71
20 17 is used for modelling the relationship between one dependent to choose a single choice from the limited set of possible 72
21 18 variable (continuous) and two or more explanatory variables. choices. Prominent classification algorithms include Logistic 73
22 19 This is commonly used regression approach. The logistic Regression, Naive Bayes, Decision Trees, and Support Vector 74
23 20 regression that is used for modelling the relationship between Machine (SVM). These algorithms are slightly discussed in 75
24 21 on categorical dependent variable and one or more explanatory the subsequent sections. 76
29 26 2 toDF ( ” l a b e l ” , ” f e a t u r e s ” ) within the given set of cases. The attributes are considered 81
27 3
30 28 4 v a l r e g = new L o g i s t i c R e g r e s s i o n ( ) . s e t M a x I t e r ( 1 5 ) independent of each other, and this consideration is known as 82
33 31 7 model . t r a n s f o r m ( d f ) . show ( ) cation is made by taking into account the prior information 85
32 8
34 33 9
and likelihood of incoming information to constitute posteriori 86
44 2 val t r ai n in g = s p l i t s (0) 94
41 reveal that cost and time overruns are frequently occurring 3 val t e s t = s p l i t s (1) 95
45 42 delay factors. Similarly, Sambasivan et al. [62] studied relation- 4 v a l model = N a i v e B a y e s . t r a i n ( t r a i n i n g , lambda = 1 . 0 , 96
46 43 ship between the cause and effect of delays in the Malaysian modelType = ” m u l t i n o m i a l ” ) 97
47 44 construction industry using regression models. 5 v a l p r e d i c t i o n A n d L a b e l = t e s t . map ( p => ( model . 98
53 50 Chan et al. [64] employed multiple regression analysis for Classifiers: Jiang et al. [34] presented a Bayesian probabilis- 103
54 51 predicting the partnering success of contracting parties. tic methodology for detecting the structural damages. Bayes 104
55 52 Fang et al. [65] applied logistic regression analysis to factor evaluation metric is computed from Bayes theorem 105
56 53 explore the relationship between safety climate and individual and Gaussian distribution assumption for accurate damage 106
57 54 behaviour. The results demonstrate the vivid relationship of identification. The effectiveness of the proposed techniques 107
58 55 safety climate and personal behaviour such as gender, marital is reported for assessing damage confidence of structures 108
59
60
61
62
63
64
65
6
1
2
3 1 over five damaged scenarios of four-story buildings bench- an SVM has a unique solution (since it involves optimisation 57
4 2 mark. Gong et al. [35] presented a framework for automated of a concave function). SVM uses kernel methods to map data 58
5 3 classification of actions of workers and heavy machinery from input/parametric space to higher level dimensional feature 59
6 4 in complex construction scenarios. They employed Bag-of- space. Listing 4 shows MLlib code to illustrate SVM, where 60
7 5 Video-Features-Model alongside the Bayesian probability for algorithm builds a model, compute accuracy on test data, and 61
8 6 evaluating and tuning action discovery. It is revealed that the evaluate the model. 62
14 7 v a l model = SVMWithSGD . t r a i n ( t r a i n , n u m I t e r a t i o n s ) 69
13 data, the proposed approach is found to measure the stiffness 8 70
15 14 accurately. The approaches as mentioned earlier are reportedly 9 v a l s c o r e A n d L a b e l s = t e s t . map { p o i n t => 71
16 15 revealed as compute-intensive; hence require contemporary 10 v a l s c o r e = model . p r e d i c t ( p o i n t . f e a t u r e s ) ( s c o r e , 72
17 16 Big Data technologies for enhanced accuracy and response. point . label )} 73
74
18 11
34 32 5 val impurity = ” gini ” ment classification using models, based on SVM and Latent 89
33 6 val maxDepth = 5
35 34 7 val maxBins = 32
Semantic Analysis (LSA). The classification accuracy of these 90
36 35 8 val model = D e c i s i o n T r e e . t r a i n C l a s s i f i e r ( t r a i n D a t a , models is compared and contrasted against the Gold standard 91
38 37 maxDepth , maxBins ) attained (with accuracy between 71% to 91%) than the pre- 93
38 9 v a l l a b e l A n d P r e d s = t e s t D a t a . map { p o i n t => v a l viously used models. In another study [70], a construction
39 39 p r e d i c t i o n = model . p r e d i c t ( p o i n t . f e a t u r e s ) (
94
40 40 point . label , prediction ) legal decision support system is developed using SVM. SVM 95
43 41 Examples of Construction Research using Decision Trees: second, and third-degree polynomial kernel SVM models are 98
44 42 Pietrzyk et al. [66] studied the issue of mould germination in compared and contrasted. Highest accuracy is revealed for the 99
45 43 building structures using fault tree analysis. Structure related first and second-degree polynomial SVM, of 76% and 85% 100
46 44 deficiencies that are introduced during the construction process respectively, implemented using TF-IDF. Similarly, SVM is 101
47 45 are identified and classified. A probabilistic quantification used in fault detection system for HVAC under real working 102
48 46 model is generated to compare building structures based on conditions [71]. The SVM classifiers for fault detection and 103
47 their tendency for mould germination. Desai et al. [67] have isolation (FDI) are developed. The proposed approach can 104
49
48 employed decision trees to analyse and assess the labour pro- efficiently detect and isolate many typical HVAC faults. 105
50
51 49 ductivity in the construction industry. The traditional decision
4) Artificial Neural Networks (ANN): Artificial Neural 106
55 53 3) Support Vector Machines (SVM): SVM is a widely problems. Multi-layer perceptron (MLP) is the most commonly 110
56 54 used technique that is remarkable for being practical and used type of ANN. ANNs are typically made up of three 111
57 55 theoretically sound, simultaneously. SVM is rooted in the field layers including an input layer, hidden (intermediate) layer, 112
58 56 of statistical learning theory, and is systematic: e.g., training and output layer. 113
59
60
61
62
63
64
65
7
1
2
3 1 Data samples in MLP neural network are normalised and evolution process. It computes better solutions to optimisation 59
4 2 are fed into the input layer. This data moves from the input problems using the concepts such as inheritance, mutation, 60
5 3 layer to one or two hidden layers and is finally passed onto the selection, and crossover. Typically GA algorithms involve 61
6 4 output layer, producing an output of the given ANN algorithm. creating two integral components, including (i) genetic rep- 62
7 5 Typically, x:y:z is used to describe the ANN topology in which resentation (array of bits) of the problem, and (ii) a fitness 63
8 6 x, y, z corresponds to the number of nodes in input, hidden, function to evaluate solution domain. The process starts with 64
7 and output layers, respectively. During the training phase, initiating a solution randomly and then keeps improving it 65
9
8 the values of connections between nodes (a.k.a weights) are through iterative application of mutation, crossover, inversion 66
10
9 adjusted. Back propagation, simulated annealing and genetic and selection unless an optimal solution is found. 67
11 10 algorithms are commonly used for training ANNs. Listing 5 Examples of Construction Research using GA: Chen et 68
12 11 shows MLlib code to explain lifecycle stages of ANN model al. [76] used GA to develop cost/schedule integrated planning 69
13 12 development and evaluation. system (CSIPS) which is focused on assigning crew optimally 70
14 under complex set of constraints pertaining to resources and 71
15 13 1 v al s p l i t s = d a t a . randomSplit ( Array ( 0 . 6 , 0 . 4 ) , seed
14 = 1234L ) workforce. GA couple with BIM and object sequencing matrix 72
16 15 2 val t r a i n = s p l i t s (0) is used to achieve feasible crew assignment in CSIPS system. 73
17 16 3 val t e s t = s p l i t s (1) Similarly, Moon et al. [77] developed an active BIM system 74
18 17 4
for assessing the risks imposed by schedule and workspace 75
19 18 5 v al l a y e r s = Array [ I n t ] ( 4 , 5 , 4 , 3)
19 6
conflicts that typically happens during the construction phase 76
20 20 7 v a l t r a i n e r = new M u l t i l a y e r P e r c e p t r o n C l a s s i f i e r ( ) of a project. This active BIM system used fuzzy and GA 77
21 21 8 . setLayers ( layers ) . setBlockSize (128) algorithms for efficiently generating the optimal plan for 78
35 35 diagnosis. Fang et al. [74] employed ANN for structural Listing 6. A Snippet of MLlib Code for Latent Semantic Analysis
36 36 damage detection. Back propagation algorithm, empowered
37 37 by heuristics-based tunable steepest descent method, is used 1) Precision—is the fraction of retrieved documents, which 92
38 38 for training the neural network. Frequency response functions are relevant. It is useful to assess the quality of LSA 93
39 39 (FRF) are used for structural damage detection. A case study of approaches. 94
40 40 cantilevered beam is analysed for unseen, single, and multiple 2) Recall—is the fraction of the relevant documents, which 95
41 41 damage types. Similarly, ANN is employed alongside GA in are retrieved. Recall mostly informs about the complete- 96
42 42 [73] for fault classification, in which ANN and GA comple- ness of LSA approaches. 97
43 43 mented each other in reconstructing the missing input data. 3) F-Measure—is often used to combine precision and recall 98
44 44 Moselhi et al. [75] deliberated the usefulness of ANN over the for assessing the accuracy of tests. 99
45 45 conventional expert-based systems, employed in developing Listing 6 shows MLlib code to demonstrate the implemen- 100
46 46 various applications for the construction industry. A generic tation of LDA, where a corpus is created, and documents are 101
47 47 neural network based architecture is described, which is val- clustered based on word distribution. 102
54 55 Data), which seek special attention in all the construction evaluation of proposed technique provided satisfactory classi- 109
55 56 industry applications where ANN is employed. fication results. Mahfouz et al. [69] employed a hybrid ML- 110
57 57 5) Genetic Algorithms (GA): Genetic Algorithms (GA) are on top of SVM and LSA. The presented results are relatively 112
58 58 evolutionary ML algorithms that are inspired by the natural better than approaches based on a single ML technique. Salama 113
59
60
61
62
63
64
65
8
1
2
3 1 et al. [78] employed LSA-based classifiers for this purpose Based on these ratios (as independent variables), an automated 56
4 2 where the documents clauses are automatically classified into ML-based algorithm (Ripple Down Rules) is employed to 57
5 3 predefined categories such as environmental, health, etc., be- classify the overrun potential of construction projects into 58
6 4 fore rule extraction. The developed method is reported to following discrete values of Near, Overrun, BigOverrun. This 59
7 5 achieve 100% and 96% recall and precision respectively. exploration has revealed interesting rules for assessing the 60
13 11 classification, document analysis, image-based classification, uses site videos to measure the workers’ behaviour towards 66
14 12 the classification for predicting project overrun, and finally, the safety. The proposed approach analyse the 3D skeleton motion 67
15 13 classification for safety analysis. Pointedly, these applications model of the workers to identify their actions. Since safe and 68
16 14 need to be revamped with the Big Data technologies, since they unsafe actions are known, so the training data is correctly 69
17 15 present similar challenges of high dimensionality, velocity, labelled for safe and unsafe actions, which is exploited by 70
18 16 and variety. Besides, these applications also involve classy the classifier for learning. As a case study, the motion of 71
19 17 computation while performing domain-specific tasks. worker while climbing the ladder is analysed. It is revealed 72
27 25 The prototype system is evaluated and found very relevant. characteristics. Intuitively, clustering is akin to unsupervised 78
28 26 Rehman et al. [81] classified construction documents into classification: while classification in supervised learning as- 79
29 27 two distinct groups of good and bad information-containing sumed the availability of a correctly labelled training set, the 80
30 28 documents. Three layered ML approach is employed. Decision unsupervised task of clustering seeks to identify the structure 81
31 29 Trees (DT), Naive Bayes, SVM, and KNN algorithms are used of input data directly. Items in one cluster are similar to each 82
32 30 to check the accuracy of classification. Except for the DT, the other whereas different from the items of other clusters. Some 83
33 31 rest of algorithms have significantly improved the classification of the examples of clustering algorithms include K-means, O- 84
34 32 accuracy. Similarly, Liu et al.[82] presented the process for means, fuzzy K-means, and canopy. Listing 7 shows MLlib 85
35 33 structured document retrieval for engineering based document code for clustering data using K-Means and evaluating the 86
37 1 val numClusters = 2 88
35 Document Analysis: Soibelman et al. [83] proposed a 2 v a l n u m I t e r a t i o n s = 20 89
38 comprehensive platform to store and analyse unstructured doc-
36 3 v a l c l u s t e r s = KMeans . t r a i n ( p a r s e d D a t a , n u m C l u s t e r s , 90
39 37 uments used within a construction project. The system captures numIterations ) 91
40 38 the essential attributes of these document types containing 4 v a l WSSSE = c l u s t e r s . c o m p u t e C o s t ( p a r s e d D a t a ) 92
41 39 diverse data about text, web, image, and linking and stores Listing 7. A Snapshot of MLlib Code for K-Means
42 40 it in an analytic-friendly format. These documents are then
43 41 automatically linked to the appropriate binary files (building Examples of Construction Research using Clustering: 93
44 42 models) using different ML classifiers, which dramatically Ng et al. [88] used clustering to group the facilities based 94
45 43 improved the information retrieval and significantly reduced on the deficiency descriptions stored in the facility condition 95
46 44 overall searching time of project managers. assessment database. The results have shown that facility 96
47 deficiencies are unique and always a function of location and 97
48 45 Image-Based Classification: Construction site photography
46 logs comprise a significant chunk of construction documen- type of the facility. Fan et al. [89] employed clustering for 98
49 developing construction case retrieval system to identifying 99
50 47 tation. A novel ML-based classification system is proposed
48 in [84] which uses Whitening Transform (WT), SVM, and accidents occurred in the past. The goal is to resolve the 100
51 disputes before provoking litigation and work interruptions. 101
49 Biased Discriminant Transform (BDT) algorithms to classify
52 It is noticed that the NLP based approaches performed far 102
50 and index construction site images. The proposed approach has
53 51 significantly boosted the results of traditional search engines. better than case-based reasoning techniques, while measuring 103
55 52 Predicting Overrun Potential: Williams et al. [85] analysed A hybrid approach is adopted in [90] to group construction 105
56 53 highway project bidding data for interested trends informing project documents automatically. The approach initially uses 106
57 54 about project overruns. Data exploration revealed that bids clustering to generate classes for these documents based on 107
58 55 with higher ratios tend to have significant cost overruns. textual similarity measures. Later on text classifier is used 108
59
60
61
62
63
64
65
9
1
2
3 1 to classify relevant documents from the construction docu- et al. [96] developed IR system called Knowledge Map Model 54
4 2 ment information system. This hybrid approach has drastically System (KMMS) to facilitate construction professionals for 55
5 3 improved the recall and F-measure. Clustering becomes non- managing and reusing construction knowledge from a va- 56
6 4 trivial with massive datasets comprising millions of dimen- riety of unstructured documents. Fan et al. [97] proposed 57
26 23 of contract documents. The Kappa score and F-measure have idea of document discourse, which determines the semantic 78
27 24 significantly improved knowledge acquisition, while construct- similarity of documents. A classification algorithm, using 79
28 25 ing legal ontology. The works in [92], [93], [94] proposed document discourse, is implemented for classifying project 80
29 26 an NLP-based information extraction system for automated documents. The system is evaluated by a group of experts. 81
54 49 Examples of Construction Research using IR: Demian et massive datasets to uncover non-obvious correlations related to 104
55 50 al. [95] developed CoMem-XML system to augment searching design, procurement, materials, and supply-chain, which could 105
56 51 through granularity and context. The system is enhanced lead to waste during the actual construction stage. It explores 106
57 52 for contextual similarity, which is revealed to be of greater waste data in a forward-looking way [104], [106]. Advanced 107
58 53 usefulness and usability to construction professionals. Tserng analytical approaches could be employed to forecast waste and 108
59
60
61
62
63
64
65
10
1
2
3 1 prescribe the best course of actions to pre-emptively minimise in next generation GD tools. Table XIV summarises the state 55
13 12 ularly, robust waste generation estimation models, BIM-based solutions are still tedious and time-consuming for efficient 65
14 13 optimal materials selection during design specification, and process automation. There are two aspects of these systems. 66
15 14 holistic waste minimisation framework are key research areas Firstly, adequate knowledge management is at the crux of 67
16 15 which call for the applications of these Big Data technologies these systems to achieve accuracy. Wang et al. [38] proposed 68
17 16 to be employed. Table XIV summarises the state of the art and a knowledge-based system for acquiring, formulating, and de- 69
18 17 potential opportunities for resource and waste optimisation. ploying knowledge in BIM-enabled MEP design coordination. 70
19 18 Some of these opportunities are further explained in Section However, much is required in this direction. Additionally, for 71
22 20 [TABLE 7 about here.] These aspects are the subject of Big Data technologies, which 74
25 21 B. Value Added Services capabilities. Table XIV summarises the state of the art and 77
32 27 generate many designs automatically based on the specified maintaining, and rehabilitating the pavement structures. These 84
33 28 design objectives, such as functional requirements, material models use a large number of variables and their great combi- 85
34 29 type, manufacturing method, performance criteria, and cost nations, in which they influence each other as well as overall 86
35 30 restriction, among others. The intended GD tools employ so- model performance, and are developed using simple statistical 87
36 31 phisticated algorithms to synthesise design space and generate approach (like linear regression) to computational intelligence 88
37 32 a wide assortment of design solutions that meet the given techniques (as ANN). Karagah et al. [109] evaluated various 89
33 design requirements. These designs are presented to designers prediction models for predicting their accuracy for pavement
38 90
34 for evaluation based on their performance. This evaluation deterioration trends. Their evaluation shows that these system
39 91
35 enables the designers to reiterate designs by adjusting design involve computation-savvy analysis, which is time-consuming
40 92
36 goals and constraints unless a design is produced to their and hard for traditional technologies to process at a real
41 37 satisfaction. Advancements in this field can bring lots of bene-
93
52 48 The tool has to generate and compare a permutation of (1) the problems that have clearly defined and logical solutions; 104
53 49 models for single client requirement. This field requires more and (2) the problems that have approximate heuristic solutions 105
54 50 R&D for getting mature to be usable in the enterprise-grade (and no logic-based straightforward solution applies). The 106
55 51 applications. These challenges of GD tools are expressly the former category is handled through automated approaches, 107
56 52 jurisdiction of using Big Data technologies. These technologies whereas the later ones are tackled through visualisation. 108
57 53 can undoubtedly bring new levels of usability, accessibility, Human knowledge, creativity, and intuition are pivotal for 109
58 54 and democratisation in the design exploration and optimisation effective visualisation. Human knowledge works perfectly with 110
59
60
61
62
63
64
65
11
1
2
3 1 smaller datasets, but its application in involving high dimen- networking services (BSNS) over the cloud-enabled platform. 58
4 2 sional larger datasets becomes impractical. The field of Visual The goal is to enhance the overall comprehension of BIM 59
6 4 soning and visualisation to solve complex analytical problems. However, a robust framework is required to capture every 61
7 5 Such systems are phenomenal to empower analytical abilities useful social interaction into the BIM right from the design 62
6 of users while perceiving, understanding, and reasoning about to end-of-life of the building. Since data of social interactions 63
8
7 complex and uncertain situations. VA is one of the key domains are likely to be in variety, velocity, and volume, Big Data 64
9
8 that require Big Data technologies to execute data visualisation technologies could be harnessed to develop interesting domain 65
10 applications for enhancing the productivity of stakeholders.
9 to provide personal views and interactive exploration of data. 66
11 Table XIV summarises the state of the art and potential
10 One of the key reasons behind the widespread adoption 67
12 opportunities for this subdomain.
11 of BIM lies in its versatile visualisation capabilities. Existing 68
13
12 software are quite competitive to visualise all dimensions (nD) 6) Personalized Services: In personalised services, the pri-
14 69
13 of the design using the right set of tools and techniques. mary emphasis lies on an adaptation of the given facilities 70
15 14 In this context, Castronov et al. [110] studied the role of
16 based on the user’ choice. The users are empowered to 71
15 visualisation in 4D construction management. Shortcomings of control the overall usage of services the way they desire. 72
17 16 existing BIM visualisation are identified, and general guideli-
18 These systems adapt based on various parameters such as user 73
17 nes/ protocols are prescribed for developing 4D visualisation behaviour. The input to such services could be manual as well 74
19 18 in BIM authoring tools. To enable participation of technically
20 as automatic. 75
19 unskilled BIM users, Zhadanovsky et al. [154] studied the issue Gao et al. [116] developed SPOT+ system to enable office 76
21 of generating master plan visualisation. Similarly to promote
20
workers to personalise the indoor thermal comfort. SPOT+ 77
22 sustainable energy use, Goodwin et al. [111] employed VA
21
used Predictive Personal Vote (PPV) to automatically adjust in- 78
23 22 for classifying energy users. The data of household energy door thermal comfort that mainly involve heating. The system 79
24 23 consumption along with geo-demographic data is used for turns on the heating before the arrival of occupants whereas 80
25 24 deeper insights. Classification is reported to enable clusters turns the heating off immediately after their departure. Rabbani 81
26 25 and trends for understanding energy usage. However, state-of- et al. [117] proposed an enhanced personalised thermal com- 82
27 26 the-art approaches of visualisation are needed during clustering fort system called SPOT* that enables users to adjust lower 83
28 27 process and decision making to enable overall comprehension. and upper bounds of indoor temperature as desired, which is 84
29 28 Chuang et al. [112] studied the development of a cloud-enabled automatically regulated accordingly. SPOT* supports heating 85
30 29 web-based system for BIM visualisation and manipulation. The as well cooling of indoor spaces. The system has significant 86
31 30 system improved communication and distribution of relevant potential for energy reduction while maintaining the overall 87
32 31 information among the stakeholders. comfort at desired level. Panagopoulos et al. [118] proposed 88
33 32 The scope of BIM is widening with more applications the AdaHeat system that uses intelligent agents to regulate 89
34 33 from construction as well FM stage has started utilising and the heating for domestic consumption. A novel aspect of this 90
35 34 extending it. As BIM data grows, these models get highly system is that it requires minimal user input. Chen et al. 91
36 35 dimensional, so the visualisation of high-dimensional BIM [119] studied the correlation of human behaviour and energy 92
37 36 models is challenging. VA is essential to both BIM and consumption in smart homes. Computational models to predict 93
38 37 Big Data and provides sophisticated techniques to improve energy consumption based on user behaviour are developed. 94
39 38 BIM and Big Data visualisation for better comprehension and These models are used to develop a web-based system that 95
40 39 interpretation. Table XIV summarises the state of the art and provides user with insights based on behaviour for optimal 96
55 54 Meadati et al. [114] studied the integration of RFID, BIM, and Facilities management (FM) integrate organisational pro- 110
56 55 social media to support facility managers in locating data from cesses to maintain the agreed services that support and improve 111
57 56 multiple documents. Jiao et al. [115] brought the web3D-based the effectiveness of its primary activities. Operations and man- 112
58 57 AR environment for integration of BIM and business social agement are the central parts of FM and are the longest stage in 113
59
60
61
62
63
64
65
12
1
2
3 1 whole building lifecycle. Mostly FM activities (such as assets data sources and to process it for generating actionable in- 57
4 2 management, preventive maintenance, etc.) are laborious, and sights. Despite BEMSs use the state-of-the-art multi-processor 58
5 3 the efficiency of such tasks can improve by incorporating infrastructures, the issue of data management and processing 59
6 4 suitable supporting technology. Localization information is is reported to have taxed the boundaries of these systems. 60
7 5 of great importance to these technology solutions. Today Hong et al. [125] proposed a cloud-based storage system to 61
8 6 these facilities utilise advanced automation and integration to store and process energy data generated from a network of 62
7 measure, monitor, control, and optimise building operations thousands of Zigbee sensors. To persist this data, Singh et al. 63
9
8 and maintenance. They provide adaptive, real-time control over [126] proposed cloud-based storage and processing architec- 64
10
9 an ever-expanding array of building activities in response to a ture. Berges et al. [127], [128] proposed novel approach for 65
11 10 wide range of internal and external data streams. As investment identifying appliances and their events (on/off or low/high) to 66
12 11 ramps up and more intelligent systems are brought online, measure their electric consumption precisely from the electric 67
13 12 more data will enter the energy management platform at faster influx. It is reportedly revealed that proposed approach requires 68
14 13 speeds. emerging data management and processing capabilities for real 69
15 14 Taneha et al. [120] proposed an approach to determine the life deployment. Similarly, Goodwin et al. [111] employed 70
16 15 FM personal location information using localization technolo- visual analytics for energy users classification. It is highlighted 71
17 16 gies to support FM related activities. The system employs that state-of-the-art approaches of visualisation are at the 72
18 17 three technologies like RFID, Wireless LAN, and Inertial core of clustering process, decision making, and enhanced 73
19 18 measurement units (IMUs) for this localization. To reduce the overall comprehension of energy consumption. Wei et al. [128] 74
20 19 FM cost, Ng et al. [121] applied knowledge discovery and proposed an IOT-based framework to monitor and analyse the 75
21 20 data mining over the facilities maintenance databases. Liu et energy consumption of Smart Buildings. 76
22 21 al. [122] evaluated the capabilities of BIM to support the FM The software as mentioned above perfectly presents the 77
23 22 operations. A detailed needs of FM professionals are identified opportunities for Big Data analytics to advance the field. Point- 78
24 23 to harness BIM to support relevant tasks. The factors affecting edly, energy-related data is of immense importance for various 79
25 24 the maintainability of facilities are mainly considered. analytics, which is usually discarded by building owners and 80
26 25 Motamedi et al. [155] highlighted three challenges faced by utility companies at a time interval. To present this data nicely 81
27 26 the majority of FM systems. These include (i) inefficient & for advanced analytics is the next frontier of innovation in this 82
28 27 time-consuming searching interfaces, (ii) no unified interface field. Table XIV summarises the state of the art and potential 83
29 28 for FM system to exchange information, and (iii) inability to opportunities for this subdomain. 84
37 state of the art and potential opportunities for this subdomain. (BIM) is conceived to revolutionise construction industry in 91
38
many aspects [156], [131]. BIM is empowered with an extra 92
39
layer of data, captured throughout the whole building lifecycle 93
40 D. Energy Management & Analytics
38
[131], [132]. This data can be unleashed to develop useful ap- 94
41 39 Two type of energy software are prevalent. Firstly, building plications for improving the overall building delivery process. 95
42 40 energy simulation software to model the energy consumption Theoretically, BIM is declared as the de facto standard for 96
43 41 of buildings. Their accuracy depends on the accuracy of managing building data, its applications, in practice, across 97
44 42 provided parameters that are fine-tuned by experts. This fine every lifecycle stages of building are yet to develop, however. 98
45 43 tweaking is laborious and time-consuming. Automatic fine Preconstruction stages are well-known for widely adopting the 99
46 44 tweaking involves lots of computations. Sanyal et al. [123] BIM, whereas, it is progressively used lesser in the later stages 100
47 45 studied the automatic generation of accurate input model of building lifecycle [5]. Substantial research is made to extend 101
48 46 with proposed Autotune workflow for the EnergyPlus energy BIM for encapsulating different types of related data. 102
49 47 simulation software. Pointedly, it is informed that the software Goedert et al. [133] extended BIM for construction process 103
50 48 operates on raw data of about 270 terabytes, and condenses documentation. Chiang et al. [157] integrated power consump- 104
51 49 that to approximately 80 terabytes of useful data. Data storage, tion data with BIM models. Isikdag et al. [134] integrated 105
52 50 transfer, and processing such datasets is inevitably the subject geographic information systems (GIS) data with BIM for 106
53 51 of Big Data technologies. developing a fire response system. Yeh et al. [158] employed 107
54 52 Secondly, Building Energy Management Systems (BEMSs) BIM for onsite building information retrieval using augmented 108
55 53 are vital for buildings. And as part of their architecture, reality. Wang et al. [38] extended BIM for spatial conflict data 109
56 54 hundreds to thousands of sensors are installed to capture for MEP models. Yu et al. [135] integrated BIMserver with 110
57 55 data. Linda et al. [124] used computational intelligence based OpenStudion (a platform for assessing the energy efficiency 111
58 56 anomaly detection to fuse data from multiple heterogeneous of building designs). Das et al. [20] tailored BIM for social 112
59
60
61
62
63
64
65
13
1
2
3 1 interactions taking place while reviewing and commenting Amarnath et al. [161] deployed Revit Server on the cloud for 58
4 2 on different aspects of the design. Zheng et al. [136] in- collaboration and coordination of architectural and structural 59
5 3 tegrated BIM with diverse project data sources. Chaung et models. Rawai et al. [162] explored cloud computing for 60
6 4 al. [112] exploited BIM for cloud-enabled design exploration green and sustainable developments. Fathi et al. [142] used 61
7 5 and manipulation. Jiao et al. [113] sorted out the issues of the cloud for BIM-based context-aware computing. Beach et 62
8 6 integrating BIM with project schedules, progress monitoring al. [143] discussed the issues of enabling Google SketchUp 63
7 data, and work assignments. Meadit et al. [114] integrated over the Amazon EC2 cloud. Chong et al. [144] evaluated 64
9
8 RFID data into BIM to locate project documents. Volk et al. existing cloud computing applications and highlighted Google 65
10
9 [129] illustrated the automated creation of BIM models for Apps, Autodesk BIM 360, and Viewpoint, among others, 66
11 10 existing buildings. support majority of designers features on the cloud. Grilo et 67
12 al. [145] used the cloud for creating e-procurement platform—
11 These illustrate the gradual increase in the size and scope 68
13 Cloud Marketplaces. Jiao et al. [113] exploited cloud frame-
12 of the contents of BIM models, which eventually restricts the 69
14 work for integrating project management data with building
13 capabilities of traditional BIM-based storage and processing 70
15 14 systems. To tackle this, Jiao et al. [6] tailored MapReduce for models. Jiao et al. [115] integrated cloud computing with 71
16 15 storage and processing BIM. However, there are still many latest technologies such as AR and business social networking 72
17 16 use cases which may require sophisticated customizations to services to create virtual environment to visualise better and 73
18 17 the way BIM is stored and processed. So in future, we are understand BIM models. Wong et al. [137] highlighted the 74
19 18 expecting BIM specialised Big Data storage and processing legal issues related to cloud-based BIM models, including 75
20 19 platforms. Up until recently, BIM is envisaged to contain data security, responsibility, liability, and design ownership. 76
21 20 of construction industry only; however the emergence of linked Cloud computing has already accelerated the uptake of IT 77
22 21 building data has changed this perception. Despite linking BIM in the construction industry by transforming many domain 78
23 22 data to inter-industry applications, many interesting applica- specific applications as discussed above. And the role of 79
24 23 tions can be developed by enabling the integration of BIM with Big Data in this transformation is overwhelming. Table XIV 80
25 24 Linked Open Data (LOD) datasets, such as weather, flooding, summarises the state of the art and potential opportunities for 81
29 28 area of BIM. Table XIV summarises the state of the art and fact about Internet is it keeps evolving since its percep- 84
30 29 potential opportunities for this subdomain. tion. It started with Internet-of-Computers and had evolved 85
30 2) Big Data with Cloud Computing: Cloud computing is shift. With fast emerging technologies, the devices are getting 87
32
31 Internet computing paradigm in which on-demand access to a smaller and powerful, and the broadband connectivity is get- 88
33
32 shared pool of configurable resources is provided [159]. The ting cheaper and ubiquitous. This has led to the proliferation 89
34
33 idea is to outsource data storage and computation to third-party of connected devices on the Internet, eventually resulted in an 90
35 exciting trend coined as the Internet-of-Things (IOT) [150].
34 datacentres. Multiple users can simultaneously access the cloud 91
36 The primary vision behind IOT is to bring together the smart
35 services without having to purchase individual licenses. Cloud 92
37 36 computing offers three service models. (i) Infrastructure-as-a- devices and objects the vital parts of Internet. Fusing these 93
38 37 service (IaaS): In IaaS, the user is provided with an abstrac- exciting physical and digital worlds are creating fascinating 94
39 38 tion to manage virtual/physical computers and cloud network opportunities of growth. Some of the popular areas where 95
40 39 services. (ii) Platform-as-a-service (PaaS): In PaaS, a user IOT applications are successfully demonstrated across the 96
41 40 is provided with services pertaining to development environ- industries include logistics, transport, assets tracking, smart 97
42 41 ments such as operating systems, programming languages, or homes, smart buildings, to energy, defence and agriculture. 98
43 42 databases, among others; (iii) Software-as-a-service (SaaS): In Elghamrawy et al. [146] demonstrated RFID usage for 99
44 43 SaaS, the user is provided access to enterprise applications via construction monitoring and quality control. Meadati et al. 100
45 44 the internet such as Revit 360. [114] integrated RFID with 3D BIM documents of assets 101
46 45 Cloud computing is widely adopted in the construction for searching and locating objects quickly. Wei et al. [128] 102
47 46 industry since it supports the integration of tasks in BIM-based proposed an IOT-based framework for building energy mon- 103
48 47 applications. Hong et al. [125] utilised cloud computing for itoring. Zanella et al. [148] presented specifications of urban 104
49 48 building energy management systems using Zigbee sensors. IOT to envision the idea of Smart Cities. Kortuem et al. [163] 105
50 49 Das et al. [20] proposed a cloud-based BIM framework for discussed the technical specifications of the smart object for 106
51 50 integrating stakeholders interactions with BIM. Zhang et al. petrochemical and road construction industries. Curry et al. 107
52 51 [136] utilised private clouds to offer BIM services across the [147] examined the storage and processing of energy sensors 108
53 52 whole building lifecycle. Klinc et al. [139] proposed SaaS data using cloud-based data management framework. 109
54 53 platform for the structural analysis applications. Kumar et The applications of IOT are non-trivial and often deploy 110
55 54 al. [140] employed cloud for SMEs design and construction hundreds or even thousands of sensor devices for data collec- 111
56 55 firms. Chuang et al. [112] used cloud computing for BIM tion. Since construction industry presents unlimited use cases 112
57 56 design exploration and manipulation. Redmond et al. [160] for IOT, Big Data is inherently the subject of interest. IOT and 113
58 57 employed cloud for interoperability between BIM applications. Big Data are complementary trends, with former to generate 114
59
60
61
62
63
64
65
14
1
2
3 1 large volumes of data and the later to store and analyse buildings APIs can bridge this technology gap and enable 58
4 2 these data at the real time in construction specific domain integration of sensors, users, control systems, machinery for 59
5 3 applications. Table XIV summarises the state of the art and providing innovative smart building services that promise com- 60
6 4 potential opportunities for this subdomain. fort, safety, and energy. Table XIV summarises the potential 61
10 8 energy and producing lots of greenhouse gases [159]. Smart reality (AR), which is an offshoot of virtual reality, is the field 65
11 9 building technology is a paradigm shift to embrace the integra- in which computer-generated virtual objects are superimposed 66
12 10 tion of contemporary technologies with the prevailing building over real-world scenes to produce mix worlds. It enables a 67
13 11 systems for striking the trade-off between the comfort maximi- semi-immersive environment that accurately aligns real scenes 68
14 12 sation and energy minimisation [149]. Building systems such with corresponding virtual world imagery. This mixed overlay 69
15 13 as building automation, life safety, telecommunications, user enables the users to obtain additional information about the 70
16 14 systems, facility management systems, among others, provide real world. It is an emerging technology for enhancing human 71
18 16 allows the users to control their interactions with building Rankohi et al. [165] argued that visualisation and simulation 73
19 17 services better. The smart building incorporates technologies aspects of the construction industry apps can be revamped 74
20 18 into building systems through a unified view. Often, these with AR to enhance their usability. Some of the exciting AR 75
21 19 systems generate vast amounts of data and majority of this application areas are highlighted such as virtual site visits, 76
22 20 data remain untapped and often discarded. To truly realise proactive schedule dispute identification and resolution, and 77
23 21 smart buildings, this data of unprecedented size need to be as-planned vs. as-built comparison. Chi et al. [166] pointed 78
22 analysed—a task that presents significant data management out the following four pillars for wider AR adoption in the 79
24
23 and processing issues. To this end, Big Data analytics is of construction industry. (i) Localization, the ability to accurately 80
25
24 immense importance to optimise total building performance impose virtual object on the real-life scene. (ii) A natural user 81
26 interface, which provides easy and intuitive user experiences to
27 25 via predictive analytics. 82
26 McKinsey [159] highlighted smart buildings amongst the increase the usability of AP software. (iii) Cloud computing, 83
28 which enables apps to store and retrieve information seam- 84
29 27 top ten emerging technology businesses. Azam et al. [149]
28 implemented a prototype software Project Dasher to illustrate lessly everywhere, and (iv) mobile devices, which are getting 85
30 smaller, cheaper, and powerful and play a vital role in AR 86
31 29 Smart Buildings. Data of sensors related to motion, CO2 ,
30 temperature, airflow, lighting, and other acoustics properties environment. William et al. [153] went ahead by bringing 87
32 BIM, mobile technology and AR together. The BIM aspects of 88
33 31 are gathered and analysed. It is reportedly revealed that more
32 than 2 billion data entries are accumulated in 3 months that geometry translation, indoor localization, attribute assignment, 89
34 and registration are explored for integration with mobile AR. 90
33 reached the limits of legacy relational databases. Stankovic
35 The study proposed BIM2MAR, which provides general guide- 91
34 et al. [150] developed sensor based fire-fighting systems for
36 lines for integrating BIM with mobile AR. It is emphasised 92
35 skyscraper office building with the authorities to detect fires,
37 36 alter fire situations, and aid in evacuation. Bonino et al. [151] robust BIM integration requires new approaches for BIM 93
38 37 studied complex event processing in smart buildings. spChain geometry conversion and indoor localisation of BIM using 94
39 38 framework is proposed to support the real-time processing of geo-coordinates. Jiao et al. [115] developed a web3D-based 95
40 39 sensor data. Miller et al. [152] analysed significant energy data AR environment to integrate BIM, business social networking 96
41 40 through proposed DayFilter approach to precisely identify the services (BSNS), and cloud services. 97
42 41 diurnal patterns from the data. AR and Big Data inevitably converge. The complexity 98
54 53 APIs are often proprietary and lack standardization in many There are many interesting open research issues within the 109
55 54 cases. For these reasons, these APIs can only be exploited by construction industry for Big Data. Some of these include (but 110
56 55 the BMS software itself, and are not amenable to the third- are not limited to) the following: 111
4 2 the construction industry. Estimating construction waste accu- limited utilisation across the construction and FM stages of 58
5 3 rately, at the early stages of design or as the project proceeds, is the building. The real intent of BIM could never be achieved 59
6 4 core to so many exciting project activities. Particularly, waste until it is employed in every stage of the building lifecycle. 60
7 5 estimation is preliminary to waste minimisation at the early At present, no such mechanism can facilitate the tracking of 61
8 6 stages of design, where it provides insights about how the de- progress of various construction sites using automated tools. 62
7 sign is generating waste. These insights enable the designers to It is indeed labour-intensive as well impractical (to some 63
9
8 explore further and carry out corrective measures proactively, extent) to update the BIM model with such minute details 64
10
9 for waste efficiency at the early stages of design. So, construc- pertaining to the daily construction progress. As a result, real- 65
11 10 tion waste estimation has become the key research question time construction progress monitoring is not an easy task, 66
12 11 in construction waste management research. This estimation because managers are required to visit their sites regularly and 67
13 12 requires thorough design exploration and optimisation from a assess the progress subjectively with the intended schedule, 68
14 13 myriad of dimensions. Existing waste estimation models are which is less effective and error-prone. Employing Big Data 69
15 14 based on very limited, and static project attributes such as and sensing technologies could move the state of the art in 70
16 15 GFA, project contract sum, etc. [107], [108], [167], [168]. domain of construction progress monitoring to the next level. 71
17 16 However, these attributes are incapable of informing about Using latest imaging technology, the progress of the on-going 72
18 17 the true size of construction waste, hence unable to generate construction is captured at the real time. Big Data analytics 73
19 18 a reliable waste estimate, regardless of how much data is will process the real-time streams of these images to measure 74
20 19 used during their model development. A comprehensive waste the daily change and updated the BIM models and construction 75
21 20 estimation model that considers dynamic project attributes of schedule accordingly. The project managers are presented with 76
22 21 deconstruction, standardisation and dimension coordination, an update to date progress on the schedule, which will, in turn, 77
23 22 reuse and recycling, and procurement, among together, needs enable them to see whether they are lagging behind on the 78
24 23 to be developed. The model is also required to consider project or still follow the schedule. Accordingly, the project 79
25 24 many attributes of construction materials, which heralds the managers can proactively respond in case of any delay is 80
26 25 development of a comprehensive materials database using reported. This will save them a lot of money due to penalty 81
27 26 open and linked data standards. The waste estimation model whenever the deadline is missed, and improve the overall 82
28 27 and construction materials database will be bundled into a project monitoring and control. This is also aligned with the 83
29 28 standard and handy simulation tool, where waste estimates are vision of BIM adoption. In this way, Big Data can help the 84
30 29 visualised onto design elements through analytical dashboard industry to deliver the projects on time. 85
34 33 to backstage its storage and computation related workloads. requirements and the designers experience. Thus, such designs 88
37 35 Existing interoperability efforts in the construction industry the data collected by manufacturers on hundred or thousands 91
36 are mainly concerned about exchanging the building data of their product lines during design specification, which might 92
38
37 between domain-specific applications (architectural, structural, be quite valuable. Similarly, many other sources of data can 93
39
38 MEP, energy simulation, etc.) pertaining to the construction be relevant to designs such as users’ sentiments while inter- 94
40
39 industry. However, many interesting use cases can be achieved acting with facilities, weather, flooding, energy consumption, 95
41 40 from greater integration of BIM data with external data commute pattern in that vicinity, and population densities, 96
42 41 sources such as materials, GIS, sensors, geodata, etc. This to name a few. These datasets could be harnessed to sup- 97
43 42 interoperability, at a wider scale, enables the construction port for example the generation of an optimal construction 98
44 43 industry to achieve automation of its business processes, which schedule. And the good thing is that these data are captured 99
45 44 can improve the overall efficiency of the project participants. using technologies such as the web, sensors, smart meters, 100
46 45 Linked data coupled with the Web of data technologies are mobile phones, etc., and are made available through open 101
47 46 found phenomenal for this integration. Substantial progress is data initiative (in most cases). However, the design world 102
48 47 made to develop various enabling artefacts for this integration is still detached from harnessing these data sources for their 103
49 48 such as ifcOWL ontology [169], [170]. However, much has purpose. Currently, there is no such tool that can facilitate the 104
50 49 yet to be done. To this end, the development of a robust BDA designers to leverage these data during their design activities. 105
51 50 enabled platform that supports the storage and processing of If this is achieved, this can result in the paradigm shift 106
52 51 these diverse linked data sets pertaining to the building as well of Design with Data, where these diverse data sources are 107
53 52 as other data, is required. This platform can provide the basis integrated within the BIM authoring tools and made available 108
54 53 for the development of interesting applications, particularly for to architects, engineers, contractors, and facility managers at 109
55 54 energy analytics and smart buildings. early design stages. Big Data Analytics is indeed the key 110
59
60
61
62
63
64
65
16
1
2
3 1 requirements of sustainability, users, environment, and even that construction industry can gain huge revenue from this 54
4 2 broader infrastructures of emerging concept of smart cities. investment as experienced by other industries, provided the 55
6 3 VI. P ITFALLS OF B IG DATA IN C ONSTRUCTION implication of Big Data is, however, difficult to quantify. More 57
10 7 concern. This section discusses some of these challenges and To monitor project site activities at real-time, instant data 61
11 8 provides suggestions to deal with them for the successful transmission between project sites (dams, highways, etc.) and 62
12 9 implementation and dissemination of Big Data technologies centralised Big Data repository should be supported. However, 63
13 10 across various domain applications of the construction indus- project sites usually have low bandwidth; due to unavailability 64
14 11 try. of sophisticated networking infrastructure in rural, underde- 65
15 veloped areas. Advanced wireless sensor networks need to be
16 12 A. Data Security, Privacy and Protection: 66
21 17 access control, intrusion prevention, Denial of Service (DoS) The effectiveness of Big Data cannot be measured just by 71
22 18 prevention, etc. [171], [172], [173]. These issues also require accumulating large volumes of data; it is more of the use 72
23 19 more study in the context of BIM-related construction data, cases or industrial problems that dictate the usefulness of 73
24 20 and the appropriate solutions also need to be adopted in the these technologies. It is feared that the construction industry 74
25 21 underlying analytics workflows. might not extract the full value of accessible Big BIM Data 75
26 22 B. Data Quality of Construction Industry Datasets: if the conceived use cases are vague. To this end, researchers 76
28 23 The construction industry is well-known for fragmented data problems that are the subject of Big Data. This way Big Data 78
29 24 management practices. Despite the aggressive promotion of as a technology will not be the driving force rather the industry 79
30 25 BIM, companies using BIM are rare. Null values, misleading itself will lead the innovation by applying contemporary tools 80
31 26 values, outliers, non-standardised values, among others, are to solve its topical issues. Additionally, Big Data is not the 81
32 27 some of the essential traits of industry data. And producing silver bullet, it merely sets the stage. Skilled professionals 82
33 28 high-valued analytics is challenging due to poor data manage- and domain experts, empowered with sophisticated analytical 83
34 29 ment practices. High-quality data is preliminary for successful workflows, are equally necessary to reap the overall benefits. 84
35 30 Big Data projects. It is observed that analytics projects usually Without whom, the applications are likely to get into the 85
31 require approximately 80% of time cleaning noisy datasets pitfall of producing too much information that should not be
36 86
32 before embarking on analytics. So, Big Data projects in delivering significant insights for the purpose. 87
37
33 construction industry shall also be specially taken care of, for
38
34 data quality related issues. Otherwise, the resulting insights are
39 35 likely to mislead, which in turn will result in unpleasant and VII. C ONCLUSIONS 88
52 47 construction business is considered amongst the low-profit- precursor to modern Big Data Analytics techniques have been 101
53 48 margin businesses, and introducing such costly add-ons to deployed in various domain-specific construction applications. 102
54 49 projects are more likely to be opposed and difficult to be Principal Big Data technology streams are explained to help 103
55 50 defended. However, Big Data has the potential to enhance the readers to understand the complicated subject. Concepts of Big 104
56 51 overall project delivery by optimising processes and reducing Data Engineering and Big Data Analytics are demarcated; the 105
57 52 risks that companies usually bear due to myriad inefficien- works utilising these technologies across various subdomains 106
58 53 cies such as delays, litigations, etc. It is highly optimistic of the construction industry are deliberated. 107
59
60
61
62
63
64
65
17
1
2
3 1 Through our research, we conclude that while data-driven [14] V. S. Agneeswaran, Big Data Analytics Beyond Hadoop: Real-Time 62
2 analytics have long been used in the construction industry due Applications with Storm, Spark, and More Hadoop Alternatives. FT 63
4 Press, 2014. 64
5 3 to the broad applicability of such techniques in many con-
4 struction subdomains, the adoption of the recent, much agiler [15] H. Karau, A. Konwinski, P. Wendell, and M. Zaharia, Learning Spark: 65
6 Lightning-Fast Big Data Analysis. ” O’Reilly Media, Inc.”, 2015. 66
7 5 and powerful, Big Data technology has been relatively slow.
[16] C.-Y. Chang and M.-D. Tsai, “Knowledge-based navigation system for 67
8 6 Although Big Data trend is gradually creeping in the industry; building health diagnosis,” Advanced Engineering Informatics, vol. 27, 68
7 its applicability is amplified further by many other emerging no. 2, pp. 246–260, 2013.
9 69
8 trends such as BIM, IOT, cloud computing, smart buildings, [17] T. White, Hadoop: The definitive guide. ” O’Reilly Media, Inc.”, 2012.
10 70
9 and augmented reality, which are also slightly elaborated. We
11 [18] P. Helland, “If you have too much data, then’good enough’is good 71
10 presented some of the prominent future works along with enough,” Communications of the ACM, vol. 54, no. 6, pp. 40–47, 2011. 72
12 11 potential pitfalls associated with Big Data while adopting it
13 [19] M. Gelbmann, “Graph DBMSs are gaining in popularity faster than 73
12 in the industry. To the best of our knowledge, this is the any other database category,” 2014. 74
14 13 first in-depth review of the applications of Big Data related [20] M. Das, J. C. Cheng, and S. S. Kumar, “BIMCloud: A distributed 75
15 14 techniques in the construction industry. In our work, we have cloud-based social BIM framework for project collaboration,” in the 76
16 15 identified many potential application areas in which Big Data 15th International Conference on Computing in Civil and Building 77
17 16 techniques can significantly advance the state-of-the-art in the Engineering (ICCCBE 2014), Florida, United States, 2014. 78
18 17 construction industry. This work is of utility and relevance to [21] S. Jeong, J. Byun, D. Kim, H. Sohn, I. H. Bae, and K. H. Law, “A 79
19 18 all the construction researchers and practitioners who will like data management infrastructure for bridge monitoring,” in SPIE Smart 80
Structures and Materials+ Nondestructive Evaluation and Health 81
20 19 to harness the power of Big Data in the construction industry Monitoring, pp. 94350P–94350P, International Society for Optics and 82
21 20 for developing exciting business applications. Photonics, 2015. 83
22 [22] J. C. Cheng and M. Das, “A cloud computing approach to partial 84
23 21 R EFERENCES exchange of BIM models,” in Proc. 30th CIB W78 International 85
26 24 Mifflin Harcourt, 2013. natural science and statistics, 1830–1835,” The Historical Journal, 88
vol. 26, no. 03, pp. 587–616, 1983. 89
27 25 [2] E. Siegel, Predictive Analytics: The Power to Predict who Will Click,
26 Buy, Lie, Or Die. John Wiley & Sons, 2013. [24] R. O. Duda, P. E. Hart, et al., Pattern classification and scene analysis, 90
28 vol. 3. Wiley New York, 1973. 91
29 27 [3] “Blog post by Anand Rajaraman on how ‘More Data Usually
28 Beats Better Algorithms’ (2013) [Online].” http://anand.typepad.com/ [25] S. Owen, R. Anil, T. Dunning, and E. Friedman, Mahout in action. 92
30 29 datawocky/2008/03/more-data-usual.html. Accessed: 2013-09-12. Manning Shelter Island, 2011. 93
31 30 [4] C. Anderson, “The end of theory,” Wired Magazine, vol. 16, 2008. [26] R. Ihaka and R. Gentleman, “R: a language for data analysis and 94
32 31 [5] R. Eadie, M. Browne, H. Odeyinka, C. McKeown, and S. McNiff, graphics,” Journal of computational and graphical statistics, vol. 5, 95
33 32 “BIM implementation throughout the UK construction project lifecy- no. 3, pp. 299–314, 1996. 96
34 33 cle: An analysis,” Automation in Construction, vol. 36, pp. 145–151, [27] R. Yadav, Spark Cookbook. Packt Publishing Ltd, 2015. 97
35 34 2013. [28] J. Dean, Big data, data mining, and machine learning: value creation 98
36 35 [6] Y. Jiao, S. Zhang, Y. Li, Y. Wang, B. Yang, and L. Wang, “An for business leaders and practitioners. John Wiley and Sons, 2014. 99
36 augmented Mapreduce framework for building information modeling [29] L. Wasserman, All of statistics: a concise course in statistical infer-
37 37 applications,” in Computer Supported Cooperative Work in Design
100
ence. Springer Science & Business Media, 2013. 101
38 38 (CSCWD), Proceedings of the 2014 IEEE 18th International Confer-
[30] D. S. Moore and G. P. McCabe, Introduction to the Practice of 102
39 39 ence on, pp. 283–288, IEEE, 2014.
Statistics. WH Freeman/Times Books/Henry Holt & Co, 1989. 103
40 40 [7] J.-R. Lin, Z.-Z. Hu, J.-P. Zhang, and F.-Q. Yu, “A natural-language-
41 based approach to intelligent data retrieval and representation for cloud [31] H. Kim, L. Soibelman, and F. Grobler, “Factor selection for delay 104
41 analysis using knowledge discovery in databases,” Automation in 105
42 BIM,” Computer-Aided Civil and Infrastructure Engineering, 2015.
42 Construction, vol. 17, no. 5, pp. 550–560, 2008. 106
43 [8] G. Aouad, M. Kagioglou, R. Cooper, J. Hinks, and M. Sexton, “Tech-
43 44 nology management of it in construction: a driver or an enabler?,” [32] P. Carrillo, J. Harding, and A. Choudhary, “Knowledge discovery 107
44 45 Logistics Information Management, vol. 12, no. 1/2, pp. 130–137, from post-project reviews,” Construction Management and Economics, 108
46 47 [9] F. Provost and T. Fawcett, Data Science for Business: What you need [33] T. S. Mahfouz, Construction legal support for differing site conditions 110
48 to know about data mining and data-analytic thinking. ” O’Reilly (DSC) through statistical modeling and machine learning (ML). PhD 111
47 thesis, Iowa State University, 2009. 112
49 Media, Inc.”, 2013.
48 [34] X. Jiang and S. Mahadevan, “Bayesian probabilistic inference for
50 [10] F. T. Matsunaga, J. D. Brancher, and R. M. Busto, “Data mining 113
49 51 techniques and tasks for multidisciplinary applications: a system- nonparametric damage detection of structures,” Journal of engineering 114
50 52 atic review,” Revista Eletrônica Argentina-Brasil de Tecnologias da mechanics, vol. 134, no. 10, pp. 820–831, 2008. 115
51 53 Informação e da Comunicação, vol. 1, no. 2, 2015. [35] J. Gong, C. H. Caldas, and C. Gordon, “Learning and classifying 116
52 54 [11] D. Singh and C. K. Reddy, “A survey on platforms for big data actions of construction workers and equipment using bag-of-video- 117
55 analytics,” Journal of Big Data, vol. 29, no. 9, 2015. feature-words and Bayesian network models,” Advanced Engineering 118
53 Informatics, vol. 25, no. 4, pp. 771–782, 2011. 119
54 56 [12] J. Qadir, N. Ahad, E. Mushtaq, and M. Bilal, “SDNs, clouds, and
57 big data: New opportunities,” in Frontiers of Information Technology [36] Y. Huang and J. L. Beck, “Novel sparse Bayesian learning for struc- 120
55 58 (FIT), 2014 12th International Conference on, pp. 28–33, IEEE, 2014. tural health monitoring using incomplete modal data,” in Proceedings 121
56 59 [13] J. Dean and S. Ghemawat, “Mapreduce: simplified data processing on of the 2013 ASCE International Workshop on Computing in Civil 122
57 60 large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107– Engineering, Los Angeles, 2013. 123
58 61 113, 2008. [37] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From data mining to 124
59
60
61
62
63
64
65
18
1
2
3 1 knowledge discovery in databases,” AI magazine, vol. 17, no. 3, p. 37, [58] D. Arditi and T. Pulket, “Predicting the outcome of construction 65
2 1996. litigation using boosted decision trees,” Journal of Computing in Civil 66
4 Engineering, vol. 19, no. 4, pp. 387–393, 2005. 67
3 [38] L. Wang and F. Leite, “Knowledge discovery of spatial conflict
5 4 resolution philosophies in BIM-enabled mep design coordination using [59] J.-H. Chen and S. Hsu, “Hybrid ann-cbr model for disputed change 68
6 5 data mining techniques: a proof-of-concept,” in Proceedings of the orders in construction projects,” Automation in Construction, vol. 17, 69
7 6 ASCE International Workshop on Computing in Civil Engineering no. 1, pp. 56–64, 2007. 70
8 7 (WCCE13), pp. 419–426, 2013. [60] M. Siu, R. Ekyalimpa, M. Lu, and S. Abourizk, “Applying regres- 71
9 8 [39] M. Kantardzic, Data mining: concepts, models, methods, and algo- sion analysis to predict and classify construction cycle time,” Proc., 72
9 rithms. John Wiley & Sons, 2011. Computing in Civil Engineering (2013). 73
10
10 [40] M. J. Berry and G. S. Linoff, Data mining techniques: for marketing, [61] A. Aibinu and G. Jagboro, “The effects of construction delays on 74
11 sales, and customer relationship management. John Wiley & Sons,
11 project delivery in nigerian construction industry,” International jour- 75
12 12 2004. nal of project management, vol. 20, no. 8, pp. 593–599, 2002. 76
13 13 [41] T. Rujirayanyong and J. J. Shi, “A project-oriented data warehouse for [62] M. Sambasivan and Y. W. Soon, “Causes and effects of delays 77
14 14 construction,” Automation in Construction, vol. 15, no. 6, pp. 800–807, in malaysian construction industry,” International Journal of project 78
15 15 2006. management, vol. 25, no. 5, pp. 517–526, 2007. 79
16 16 [42] X. Wu, X. Zhu, G.-Q. Wu, and W. Ding, “Data mining with big data,” [63] S. M. Trost and G. D. Oberlender, “Predicting accuracy of early cost 80
17 17 Knowledge and Data Engineering, IEEE Transactions on, vol. 26, estimates using factor analysis and multivariate regression,” Journal of 81
18 no. 1, pp. 97–107, 2014. Construction Engineering and Management, vol. 129, no. 2, pp. 198– 82
18 204, 2003.
19 [43] R. Buchheit, J. Garrett Jr, S. Lee, and R. Brahme, “A knowledge 83
19 20 discovery case study for the intelligent workplace,” Proc., Computing [64] A. P. Chan, D. W. Chan, Y. Chiang, B. Tang, E. H. Chan, and K. S. 84
20 21 in Civil Engineering, pp. 914–921, 2000. Ho, “Exploring critical success factors for partnering in construction 85
21 22 [44] L. Soibelman and H. Kim, “Data preparation process for construction projects,” Journal of Construction Engineering and Management, 86
22 23 knowledge generation through knowledge discovery in databases,” vol. 130, no. 2, pp. 188–198, 2004. 87
23 24 Journal of Computing in Civil Engineering, 2002. [65] D. Fang, Y. Chen, and L. Wong, “Safety climate in construction indus- 88
24 25 [45] C.-W. Liao and Y.-H. Perng, “Data mining for occupational injuries try: A case study in hong kong,” Journal of construction engineering 89
26 in the taiwan construction industry,” Safety Science, vol. 46, no. 7, and management, 2006. 90
25 27 pp. 1091–1102, 2008.
26 [66] K. Pietrzyk, “A systemic approach to moisture problems in buildings 91
28 [46] C.-W. Cheng, S.-S. Leu, Y.-M. Cheng, T.-C. Wu, and C.-C. Lin, for mould safety modelling,” Building and Environment, vol. 86, 92
27 29 “Applying data mining techniques to explore factors contributing pp. 50–60, 2015. 93
28 30 to occupational injuries in taiwan’s construction industry,” Accident [67] V. S. Desai and S. Joshi, “Application of decision tree technique to 94
29 31 Analysis & Prevention, vol. 48, pp. 214–222, 2012. analyze construction project data,” in Information Systems, Technology 95
30 32 [47] S.-W. Oh, M.-H. Kim, and Y.-S. Kim, “The application of data ware- and Management, pp. 304–313, Springer, 2010. 96
31 33 house for developing construction productivity management system,” [68] H.-B. Liu and Y.-B. Jiao, “Application of genetic algorithm-support 97
34 Korean Journal of Construction Engineering and Management, vol. 7, vector machine (ga-svm) for damage identification of bridge,” Interna-
32 35 no. 2, pp. 127–137, 2006.
98
tional Journal of Computational Intelligence and Applications, vol. 10, 99
33 36 [48] D. Koonce, L. Huang, and R. Judd, “EQL an express query language,” no. 04, pp. 383–397, 2011. 100
34 37 Computers & industrial engineering, vol. 35, no. 1, pp. 271–274, 1998. [69] T. Mahfouz, J. Jones, and A. Kandil, “A machine learning approach 101
35 38 [49] W. Mazairac and J. Beetz, “BIMQL–an open query language for for automated document classification: A comparison between svm 102
36 39 building information models,” Advanced Engineering Informatics, and lsa performances,” International Journal of Engineering Research 103
37 40 vol. 27, no. 4, pp. 444–456, 2013. & Innovation, p. 53, 2010. 104
38 41 [50] L. Soibelman, L. Y. Liu, and J. Wu, “Data fusion and modeling [70] T. Mahfouz and A. Kandil, “Construction legal decision support using 105
42 for construction management knowledge discovery,” in International support vector machine (svm),” Proc. of the CRC 2010: Innovation 106
39 Conference on Computing in Civil and Building Engineering, Weimar,
43 for Reshaping Construction, 2010. 107
40 44 Germany, 2004.
[71] D. Dehestani, F. Eftekhari, Y. Guo, S. Ling, S. Su, and H. Nguyen, 108
41 45 [51] I. H. Witten and E. Frank, Data Mining: Practical machine learning “Online support vector machine application for model based fault 109
42 46 tools and techniques. Morgan Kaufmann, 2005. detection and isolation of hvac system,” International Journal of 110
43 47 [52] J. E. Diekmann and T. A. Kruppenbacher, “Claims analysis and Machine Learning and Computing, vol. 1, no. 1, p. 66, 2011. 111
44 48 computer reasoning,” Journal of construction engineering and man- [72] Q. Chen, Y. Chan, and K. Worden, “Structural fault diagnosis and iso- 112
45 49 agement, vol. 110, no. 4, pp. 391–408, 1984. lation using neural networks based on response-only data,” Computers 113
50 [53] D. Arditi, F. E. Oksay, and O. B. Tokdemir, “Predicting the outcome of & structures, vol. 81, no. 22, pp. 2165–2172, 2003. 114
46
51 construction litigation using neural networks,” Computer-Aided Civil [73] T. Marwala and S. Chakraverty, “Fault classification in structures with 115
47 52 and Infrastructure Engineering, vol. 13, no. 2, pp. 75–81, 1998. incomplete measured data using autoassociative neural networks and 116
48 53 [54] M. P. Kim, “Us army corps engineers construction contract claims genetic algorithm,” Bangalore-Current Science, vol. 90, no. 4, p. 542, 117
49 54 guidance system,” in Excellence in the Constructed Project, pp. 203– 2006. 118
50 55 209, ASCE, 1989. [74] X. Fang, H. Luo, and J. Tang, “Structural damage detection using neu- 119
51 56 [55] K. Chau, “Predicting construction litigation outcome using particle ral network with learning rate improvement,” Computers & structures, 120
52 57 swarm optimization,” in Innovations in Applied Artificial Intelligence, vol. 83, no. 25, pp. 2150–2161, 2005. 121
58 pp. 571–578, Springer, 2005. [75] O. Moselhi, T. Hegazy, and P. Fazio, “Neural networks as tools in
53 122
59 [56] O. B. Tokdemir, “Using case-based reasoning to predict the outcome construction,” Journal of Construction Engineering and Management, 123
54 60 of construction litigation,” Computer-Aided Civil and Infrastructure 1991. 124
55 61 Engineering, vol. 14, no. 6, pp. 385–393, 1999. [76] Y.-J. Chen, C.-W. Feng, Y.-R. Wang, H.-M. Wu, et al., “Using BIM 125
56 62 [57] K. Chau, “Prediction of construction litigation outcome–a case-based model and genetic algorithms to optimize the crew assignment for 126
57 63 reasoning approach,” in Advances in Applied Artificial Intelligence, construction project planning,” International Journal of Technology, 127
59
60
61
62
63
64
65
19
1
2
3 1 [77] H. Moon, H. Kim, L. Kang, and C. Kim, “BIM functions for [97] H. Fan, F. Xue, and H. Li, “Project-based as-needed information 66
2 optimized construction management in civil engineering,” Gerontech- retrieval from unstructured AEC documents,” Journal of Management 67
4 3 nology, vol. 11, no. 2, p. 67, 2012. in Engineering, vol. 31, no. 1, p. A4014012, 2014. 68
5 4 [78] D. M. Salama and N. M. El-Gohary, “Semantic text classification for [98] J.-y. Hsu et al., “Content-based text mining technique for retrieval of 69
6 5 supporting automated compliance checking in construction,” Journal cad documents,” Automation in Construction, vol. 31, pp. 65–74, 2013. 70
7 6 of Computing in Civil Engineering, p. 04014106, 2013. [99] H.-T. Lin, N.-W. Chi, and S.-H. Hsieh, “A concept-based information 71
8 7 [79] T. Mahfouz, “Unstructured construction document classification model retrieval approach for engineering domain-specific technical docu- 72
9 8 through support vector machine (SVM),” pp. 126–133, 2011. ments,” Advanced Engineering Informatics, vol. 26, no. 2, pp. 349– 73
39 43 struction, vol. 42, pp. 36–49, 2014. developing performance prediction models,” in Computing in Civil 107
and Building Engineering (2014), pp. 1222–1229, ASCE. 108
40 44 [91] M. Al Qady and A. Kandil, “Concept relation extraction from con-
struction documents using natural language processing,” Journal of [110] F. Castronovo, S. Lee, D. Nikolic, and J. I. Messner, “Visualization 109
41 45
in 4d construction management software: A review of standards and 110
42 46 Construction Engineering and Management, vol. 136, no. 3, pp. 294–
47 302, 2009. guidelines,” in In: Proceedings of the International Conference on 111
43 Computing in Civil and Building Engineering. Orlando USA., pp. 315– 112
48 [92] J. Zhang and N. M. El-Gohary, “Semantic NLP-based information 322, 2014. 113
44 49 extraction from construction regulatory documents for automated
45 50 compliance checking,” Journal of Computing in Civil Engineering, [111] S. Goodwin and J. Dykes, “Visualising variations in household energy 114
52 58 tion,” in Proc., 2012 ASCE Construction Research Congress (CRC), cloud approach to unified lifecycle data management in architecture, 122
57 64 map model in construction industry,” Journal of Civil Engineering and Education, Research and Practice, pp. 570–78, 2010. 128
58 65 Management, vol. 16, no. 3, pp. 332–344, 2010. [115] Y. Jiao, S. Zhang, Y. Li, Y. Wang, and B. Yang, “Towards cloud 129
59
60
61
62
63
64
65
20
1
2
3 1 augmented reality for construction application by BIM and SNS case study approach,” Automation in construction, vol. 24, pp. 149– 65
2 integration,” Automation in Construction, vol. 33, pp. 37–47, 2013. 159, 2012. 66
4
3 [116] P. X. Gao and S. Keshav, “Optimal personal comfort management [133] J. D. Goedert and P. Meadati, “Integrating construction process doc- 67
5 4 using spot+,” in Proceedings of the 5th ACM Workshop on Embedded umentation into building information modeling,” Journal of construc- 68
6 5 Systems For Energy-Efficient Buildings, pp. 1–8, ACM, 2013. tion engineering and management, vol. 134, no. 7, pp. 509–516, 2008. 69
7 6 [117] A. Rabbani and S. Keshav, “The spot* system for flexible personal [134] U. Isikdag, J. Underwood, G. Aouad, and N. Trodd, “Investigating the 70
8 7 heating and cooling,” in Proceedings of the 2015 ACM Sixth Inter- role of building information models as a part of an integrated data 71
9 8 national Conference on Future Energy Systems, pp. 209–210, ACM, layer: a fire response management case,” Architectural Engineering 72
9 2015. and Design Management, vol. 3, no. 2, pp. 124–142, 2007. 73
10
11 10 [118] A. A. Panagopoulos, M. Alam, A. Rogers, and N. R. Jennings, [135] N. Yu, Y. Jiang, L. Luo, S. Lee, A. Jallow, D. Wu, J. I. Messner, 74
11 “Adaheat: A general adaptive intelligent agent for domestic heating R. M. Leicht, and J. Yen, “Integrating BIMserver and openstudio for 75
12 12 control,” in Proceedings of the 2015 International Conference on energy efficient building,” in Proceedings of 2013 ASCE International 76
13 13 Autonomous Agents and Multiagent Systems, pp. 1295–1303, Inter- Workshop on Computing in Civil Engineering, 2013. 77
14 14 national Foundation for Autonomous Agents and Multiagent Systems, [136] J. Zhang, Q. Liu, F. Yu, Z. Hu, and W. Zhao, “A framework of cloud- 78
15 15 2015. computing-based BIM service for building lifecycle,” Computing in 79
16 16 [119] C. Chen and D. J. Cook, “Behavior-based home energy prediction,” in Civil and Building Engineering, pp. 1514–1521, 2014. 80
17 Intelligent Environments (IE), 2012 8th International Conference on, [137] J. Wong, X. Wang, H. Li, G. Chan, and H. Li, “A review of cloud-based
17 pp. 57–63, IEEE, 2012.
81
18 BIM technology in the construction sector,” The Journal of Information 82
18 Technology in Construction, vol. 19, pp. 281–291, 2014.
19 [120] S. Taneja, A. Akcamete, B. Akinci, J. Garrett, L. Soibelman, and E. W. 83
19 20 East, “Analysis of three indoor localization technologies to support [138] Y. Hsieh, “Cloud-computing based parameter identification system– 84
20 21 facility management field activities,” in Proceedings of the Interna- with applications in geotechnical engineering,” in Computing in Civil 85
21 22 tional Conference on Computing in Civil and Building Engineering, and Building Engineering (2014), pp. 1384–1392, ASCE. 86
23 24 [121] H. S. Ng and L. Soibelman, “Knowledge discovery in maintenance Engineering SaaS web application framework,” in Proceedings of the 88
25 databases: Enhancing the maintainability in higher education facili- 2014 International Conference on Computing in Civil and Building 89
24 ties,” in Proc., 2003 Construction Research Congress, ASCE, Reston,
26 Engineering, pp. 23–25, 2014. 90
25 27 Va, 2003.
[140] B. Kumar, J. C. Cheng, and L. McGibbney, “Cloud computing and its 91
26 28 [122] R. Liu and R. Issa, “Issues in BIM for facility management from implications for construction it,” in Computing in Civil and Building 92
27 29 industry practitioners perspectives,” Computing in Civil Engineering Engineering, Proceedings of the International Conference, vol. 30, 93
29 31 [123] J. Sanyal and J. New, “Simulation and big data challenges in tuning [141] A. Redmond, A. V. Hore, and R. West, “Building support for cloud 95
30 32 building energy models,” in Modeling and Simulation of Cyber- computing in the Irish construction industry,” 2010. 96
33 Physical Energy Systems (MSCPES), 2013 Workshop on, pp. 1–6, [142] F. Fathi, M. Abedi, S. Rambat, S. Rawai, M. Z. Zakiyudin, et al.,
31 34 IEEE, 2013.
97
“Context-aware cloud computing for construction collaboration,” Jour- 98
32 35 [124] O. Linda, D. Wijayasekara, M. Manic, and C. Rieger, “Computational nal of Cloud Computing, vol. 2012, pp. 1–11, 2012. 99
33 36 intelligence based anomaly detection for building energy management [143] T. H. Beach, O. F. Rana, Y. Rezgui, and M. Parashar, “Cloud 100
34 37 systems,” in Resilient Control Systems (ISRCS), 2012 5th International computing for the architecture, engineering & construction sector: 101
35 38 Symposium on, pp. 77–82, IEEE, 2012. requirements, prototype & experience,” Journal of Cloud Computing, 102
36 39 [125] I. Hong, J. Byun, and S. Park, “Cloud computing-based building vol. 2, no. 1, pp. 1–16, 2013. 103
37 40 energy management system with zigbee sensor network,” in Innovative [144] H.-Y. Chong, J. S. Wong, and X. Wang, “An explanatory case study on 104
41 Mobile and Internet Services in Ubiquitous Computing (IMIS), 2012 cloud computing applications in the built environment,” Automation in 105
38 42 Sixth International Conference on, pp. 547–551, IEEE, 2012. Construction, vol. 44, pp. 152–162, 2014. 106
39 43 [126] R. P. Singh, S. Keshav, and T. Brecht, “A cloud-based consumer-centric [145] A. Grilo and R. Jardim-Goncalves, “Cloud-marketplaces: Distributed 107
40 44 architecture for energy data analytics,” in Proceedings of the fourth e-procurement for the AEC sector,” Advanced Engineering Informat- 108
41 45 international conference on Future energy systems, pp. 63–74, ACM, ics, vol. 27, no. 2, pp. 160–172, 2013. 109
42 46 2013.
[146] T. Elghamrawy and F. Boukamp, “Managing construction informa- 110
43 47 [127] M. Berges, E. Goldman, H. S. Matthews, and L. Soibelman, “Learning tion using rfid-based semantic contexts,” Automation in construction, 111
44 48 systems for electric consumption of buildings,” in ASCI international vol. 19, no. 8, pp. 1056–1066, 2010. 112
49 workshop on computing in civil engineering, vol. 38, 2009.
45 [147] E. Curry, J. ODonnell, E. Corry, S. Hasan, M. Keane, and S. ORiain, 113
50 [128] C. Wei and Y. Li, “Design of energy consumption monitoring and “Linking building data in the cloud: Integrating cross-domain building 114
46 energy-saving management system of intelligent building based on
51 data using linked data,” Advanced Engineering Informatics, vol. 27, 115
47 52 the internet of things,” in Electronics, Communications and Control no. 2, pp. 206–219, 2013. 116
48 53 (ICECC), 2011 International Conference on, pp. 3650–3652, IEEE,
[148] A. Zanella, N. Bui, A. Castellani, L. Vangelista, and M. Zorzi, 117
49 54 2011.
“Internet of things for smart cities,” Internet of Things Journal, IEEE, 118
50 55 [129] R. Volk, J. Stengel, and F. Schultmann, “Building information mod- vol. 1, no. 1, pp. 22–32, 2014. 119
56 eling (BIM) for existing buildingsliterature review and future needs,”
51 57 Automation in Construction, vol. 38, pp. 109–127, 2014.
[149] A. Khan and K. Hornbæk, “Big data from the built environment,” in 120
58 64 [132] K. Barlish and K. Sullivan, “How to measure the benefits of BIMa vol. 1, pp. 415–447, 2013. 128
59
60
61
62
63
64
65
21
1
2
3 1 [152] C. Miller, Z. Nagy, and A. Schlueter, “Automated daily pattern filtering [173] O. Tene and J. Polonetsky, “Big data for all: Privacy and user control 66
2 of measured building performance data,” Automation in Construction, in the age of analytics,” Nw. J. Tech. & Intell. Prop., vol. 11, p. xxvii, 67
4 3 vol. 49, pp. 1–17, 2015. 2012. 68
5 4 [153] G. Williams, M. Gheisari, P.-J. Chen, and J. Irizarry, “BIM2MAR: An
6 5 efficient BIM translation to mobile augmented reality applications,”
7 6 Journal of Management in Engineering, 2014.
8 7 [154] B. Zhadanovsky and S. Sinenko, “Visualization of design, organization
9 8 of construction and technological solutions,” in Computing in Civil and
9 Building Engineering (2014), pp. 137–142, ASCE.
10
10 [155] A. Motamedi, Improving Facilities Lifecycle Management Using RFID
11 11 Localization And BIM-Based Visual Analytics. PhD thesis, Concordia
12 12 University, 2013.
13 13 [156] N. Gu and K. London, “Understanding and facilitating BIM adoption
14 14 in the AEC industry,” Automation in construction, vol. 19, no. 8,
15 15 pp. 988–999, 2010.
16 16 [157] C. Chiang, T. Ho, and C. Chou, “A BIM-enabled platform for
17 power consumption data collection and analysis,” Computing in Civil
17 18 Engineering, pp. 90–98, 2012.
18 19 [158] K.-C. Yeh, M.-H. Tsai, and S.-C. Kang, “On-site building information
19 20 retrieval by using projection-based augmented reality,” Journal of
20 21 Computing in Civil Engineering, 2012.
21 22 [159] J. Bughin, M. Chui, and J. Manyika, “Clouds, big data, and smart as-
22 23 sets: Ten tech-enabled business trends to watch,” McKinsey Quarterly,
24 vol. 56, no. 1, pp. 75–86, 2010.
23
25 [160] A. Redmond, A. Hore, M. Alshawi, and R. West, “Exploring how
24 26 information exchanges can be enhanced through cloud BIM,” Automa-
25 27 tion in Construction, vol. 24, pp. 175–183, 2012.
26 28 [161] C. Amarnath, A. Sawhney, and J. U. Maheswari, “Cloud computing
27 29 to enhance collaboration, coordination and communication in the con-
28 30 struction industry,” in Information and Communication Technologies
31 (WICT), 2011 World Congress on, pp. 1235–1240, IEEE, 2011.
29
32 [162] N. M. Rawai, M. S. Fathi, M. Abedi, and S. Rambat, “Cloud comput-
30 33 ing for green construction management,” in Intelligent System Design
31 34 and Engineering Applications (ISDEA), 2013 Third International
32 35 Conference on, pp. 432–435, IEEE, 2013.
33 36 [163] G. Kortuem, F. Kawsar, D. Fitton, and V. Sundramoorthy, “Smart ob-
34 37 jects as building blocks for the internet of things,” Internet Computing,
38 IEEE, vol. 14, no. 1, pp. 44–51, 2010.
35
39 [164] H.-A. Jacobsen, R. H. Katz, H. Schmeck, and C. Goebel, “Smart build-
36 40 ings and smart grids (dagstuhl seminar 15091),” Dagstuhl Reports,
37 41 vol. 5, no. 2, 2015.
38 42 [165] S. Rankohi and L. Waugh, “Review and analysis of augmented reality
39 43 literature for construction industry,” Visualization in Engineering,
40 44 vol. 1, no. 1, pp. 1–18, 2013.
41 45 [166] H.-L. Chi, S.-C. Kang, and X. Wang, “Research trends and opportu-
46 nities of augmented reality applications in architecture, engineering,
42 47 and construction,” Automation in construction, vol. 33, pp. 116–122,
43 48 2013.
44 49 [167] B. Bossink and H. Brouwers, “Construction waste: quantification and
45 50 source evaluation,” Journal of construction engineering and manage-
46 51 ment, vol. 122, no. 1, pp. 55–60, 1996.
47 52 [168] V. W. Tam, C. M. Tam, S. Zeng, and W. C. Ng, “Towards adoption
53 of prefabrication in construction,” Building and environment, vol. 42,
48 54 no. 10, pp. 3642–3654, 2007.
49 55 [169] P. Pauwels and W. Terkaj, “Express to owl for construction industry:
50 56 towards a recommendable and usable ifcowl ontology,” Automation in
51 57 Construction, vol. 63, pp. 100–133, 2016.
52 58 [170] J. Beetz, J. Van Leeuwen, and B. De Vries, “Ifcowl: A case of
53 59 transforming express schemas into ontologies,” Artificial Intelligence
60 for Engineering Design, Analysis and Manufacturing, vol. 23, no. 01,
54 61 pp. 89–101, 2009.
55 62 [171] R. Wong, “Big data privacy,” Journal of Information Technology and
56 63 Software Engineering, vol. 2, p. e114, 2012.
57 64 [172] A. Cavoukian and J. Jonas, Privacy by design in the age of big data.
58 65 Information and Privacy Commissioner of Ontario, Canada, 2012.
59
60
61
62
63
64
65
22
1
2
3 1 L IST OF F IGURES
4 2 1 Proposed Review Structure of the Paper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5 3 2 An overview of MapReduce processing [12] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
6 4 3 Apache Spark and Related Technology Stack [15] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
7 5 4 Databases Popularity Trend [19] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
8 6 5 Multidisciplinary Nature of Big Data Analytics (Adapted from [28]) . . . . . . . . . . . . . . . . . . . . . . . . . 27
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
FIGURES 23
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48 Fig. 1. Proposed Review Structure of the Paper
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
FIGURES 24
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
Fig. 2. An overview of MapReduce processing [12]
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
FIGURES 25
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36 Fig. 3. Apache Spark and Related Technology Stack [15]
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
FIGURES 26
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18 Fig. 4. Databases Popularity Trend [19]
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
FIGURES 27
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35 Fig. 5. Multidisciplinary Nature of Big Data Analytics (Adapted from [28])
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
FIGURES 28
1
2
3 1 L IST OF TABLES
4 I Prominent NoSQL Systems and Their Critical Features. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2339
8 V Classification-based works for the construction industry, categorized per classification technique . . . . . . . . . . 30 2343
15 XII Classification-based works for the construction industry, categorized per classification technique . . . . . . . . . . 37 2350
8 HBase HBase is distributed data store that extended Google Bigtable to scale on HDFS. Its novelty
Columnar
Java Locks Hadoop
Consistent,
Key-Value Partition Tolerance
9 lies in storing and accessing data with random access. It doesn‘t restrict the kind of data
being stored.
10
Disk
11 Hadoop Consistent,
HyperTable Hypertable supports data distribution for scalable data management. It offers maximum Columnar C++ MVCC
12 GlusterFS, Partition Tolerance
efficiency and superior performance. However, it lacks data management features such as
Kosmos File System
transaction and join processing.
13
Document Disk,GFS, Consistent,
14 MongoDB MongoDB is a document-oriented database. It facilitates storage of documents with
Key-Value
C++ Locks
Plugin Partition Tolerance
variable schemas and is suitable for applications, storing complex types.
15
Document Erlang, High Availability,
16 CouchDB CouchDB is suitable for large scale web and mobile applications. It facilitate data storage
Key-Value C
MVCC Disk
Partition Tolerance
that are queried through web browsers, via HTTP. JavaScript is used to index, integrate,
17 and transform the database.
18 C, GFS Consistent,
19 MarkLogic MarkLogic facilitates storing documents efficiently for easy and intuitive search. It is Document Java, ACID Hadoop High Availability,
suitable for applications that derive revenue, streamline operations, risk management, and Python S3, RDF Partition Tolerance
20 security.
21 Redis Redis is in-memory system that can be used as a database, cache, and message broker. Key-Value
ANSI
Locks RAM
Consistent,
22 C Partition Tolerance
When configured on cluster, it becomes scalable and highly available. It also supports
transaction processing.
23
High Availability,
24 Riak Riak is a distributed database that provides scalability and high availability. It achieves Key-Value Erlang ACID Disk, Plugin
Partition Tolerance
performance and fault tolerance through built-in distribution and replications.
25
Consistency,
26 BarkeleyDB Berkeley DB is embedded database for key-value dataset. It is written in C but supports Key-Value Java ACID RDF Availability,
27 application development for C++, PHP, Java, Perl, among others. Partition-Tolerance
28 Neo4J Neo4J is a semantic store for creating, updating, deleting, and retrieving graph data. It Graph Java Locks Disk
High Availability,
Partition Tolerance
29 captures relationships natively and processes queries as paths through language called
Cypher. Neo4J is good option for applications, dealing with connected data.
30
Disk, Consistent,
31 OrientDB OrientDB is a system for large-scale and distributed graph management. The core features Graph Java ACID Plug-in High Availability,
include multi-master replication and sharding. RAM, SSD Partition Tolerance
32
33 Columnar Consistency,
Oracle Document Berkeley DB Availability,
Oracle NoSQL is designed specifically to provide highly reliability, scalability, and Java ACID
34 NoSQL
maximum availability across the cluster of storage nodes. Data is replicated to survive
Key-Value Architecture, RDF Limited
Graph Partition-Tolerance
35 rapid failure and load balancing for distributed query processing.
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
TABLES 30
1
2 TABLE II. B IG DATA A NALYTICS (BDA) ML T OOLS
3
4 Supported Supported
Tool Name Description ML at Scale
5 Languages Algorithms
6 -Collaborative Filtering
7 Apache Mahout is an open-source machine learning framework for quickly writing scalable and high -Java -Classification
Yes
Mahout [25] performance ML applications. -Scala. -Clustering
8 -Regression
9 -Collaborative Filtering
R is an open-source programming language for statistical analysis. R is extremely extensible. With
10 R [26] huge developer base, thousands of R packages are available to provide variety of functionalities.
Many
Yes
-Classification
11 languages -Clustering
The graphics supported by R are highly polished and very powerful.
-Regression
12
Spark has constituted a novel ML platform called MLbase, which has brought together highly
13 robust ML components, such as ML optimizer, MLI, and MLlib, to support the full lifecycle -Java
-Collaborative Filtering
-Classification
14 MLbase [27] activities, required to implement as well as use ML algorithms. ML optimizer automates the tasks -Scala Yes
-Clustering
of ML pipeline construction to efficiently search algorithms of MLI and MLlib. MLI is the API to -Python
15 develop ML algorithms using high-level constructs. MLlib is the Spark distributed ML library.
-Regression
16
Oryx is an open-source ML library that has evolved over time out of the libraries and toolkits -Collaborative Filtering
17 developed by Cloudera. Based on the distributed input from HDFS, it builds predictive models -Classification
Oryx [14] -Java Yes
18 that are written to output in predictive model markup language (PMML). An interesting feature -Clustering
of Oryx is its ability to keep the model updated under emerging streams of data from Hadoop. -Regression
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
TABLES 31
1
2 TABLE III. S UMMARY OF WORKS WITH STATISTICAL METHODS
3
4 Purpose of use Technique(s) employed References
5 -Frequency charts
6 Identifying the causes of -Correlation matrix
[31]
construction delays -Factor analysis
7 -Bayesian networks
8
Learning from -Link analysis
9 [32]
post project reviews (PPRs) -Dimensional matrix analysis
10 -Naı̈ve Bayes
11 Decision support
-Decision trees [33]
systems for construction litigation
12 -Rule inductive
Response to Reviewers
1
2
3 Big Data in the Construction Industry: A Review of Present Status,
4 Opportunities, and Future Trends
5
6
7 Advanced Engineering Informatics
8
9
10 We would like to firstly thank the anonymous reviewers for their valuable and constructive
11
12 comments, which helped us in improving the paper at several places. We took each comment
13
14 into consideration and did the necessary modifications in the revised version.
15
16
17 In this document, we provide a point-by-point reply to the reviewer recommendations. To
18
19 improve the readability of revised manuscript, we have used different colour formatting in the
20
21 revised manuscript. Please note that this colour coding is not relevant while reading this
22
23 rebuttal document.
24
25
26 Red colour—text where correction are made to improve the English and grammar of
27
28 manuscript;
29
30 Green colour—paragraphs where text is reduced to shorten the paper;
31
32 Purple colour—Places where noun phrases are rephrased to improve readability of the
33
34 paper as highlighted by the reviewers;
35
36
37 Responses to reviewers’ comments are given below in the table:
38
39
40
41 Reviewers’ Comments Revision Made
42
43
44 REVIEWER 1
45
46
47
48 Not all comments were Thanks for the comment, this comment is fully implemented.
49
50 addressed.
51
52 o The subsection on SHM is removed in the revised
53
54 1 I suggest removing all manuscript. See Page 13: line number 24.
55
56 topics outside of the
57
58 construction industry (for
59
60
61
62
63
64
65
example SHM)
1
2
3
4
5
6
7 Instead of shortening the Many thanks for pointing this out; we have now addressed it.
8
9 2 paper, the paper seems to We removed some sections and shortened others to reduce the
10
11 have become longer. overall size of the paper. This has resulted in the reduction of
12
13 one and a half page from the revised manuscript. The places
14
15
where text is edited to reduce the size is coloured in GREEN in
16 the revised manuscript.
17
18
19 Page 4 line numbers 70-85;
20
21 Page 6 line numbers 13-58;
22
23 Page 7 line numbers 1-16;
24
25 Page 11 line numbers 3-59;
26
27 Page 12 line numbers 1-21 & 100-110;
28
29 Page 13 line numbers 1-33;
30
31 Page 16 line numbers 100-109;
32
33 Page 17 line numbers 1-42;
34
35
36
37
38
39
40 The authors need to reread Noun phrase “Construction industry Works using …” is
41
42 the text and improve the rephrased as “Examples of Construction Research using …”.
43
44 English. These changes are coloured in PURPLE in the revised
45
46
manuscript. See the following places:
47
48
49 3
Even the reply document Page 3 line number 49;
50 has English errors Page 4 line number 67;
51
52 indicating that the authors Page 6 line numbers 6 & 40;
53
54 need to ask a native English Page 7 line number 57;
55
speaker to read and correct
56 Page 8 line numbers 36 & 82;
57
58 Page 9 line number 4;
59
60
61
62
63
64
65
the final text. Page 10 line numbers 15, 52, 87, & 103;
1
2 Page 12 line numbers 1, 32, & 65;
3
4
5
For example, the noun
6 phrase "Construction To improve the English, the manuscript is carefully read and
7
8 Industry Works" can be changes are made at many places accordingly. These changes
9
10 improved. are coloured in RED in the revised manuscript. Every page has
11
12 many such corrections.
13
14
15
Another example Page 1-25, excluding the page 14.
16 "Personalised services
17
18 always require to scan"
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65