LEVERAGING OPERATIONAL DATA FOR INTELLIGENT DECISION SUPPORT IN
CONSTRUCTION EQUIPMENT MANAGEMENT
Research Proposal for the Degree of Doctor of Philosophy in
Construction Engineering and Management
Hole School of Construction Engineering
Department of Civil and Environmental Engineering
University of Alberta
February 24th, 2006
Dr. Simaan M. AbouRizk
Dept. of Civil and Environmental Engineering,
The University of Alberta, Edmonton, Canada T6G 2W2,
3-133 Markin/CNRL Natural Resources Engineering Facility,
Tel: +1 780/492-4235, Fax: +1 780/492-0249,
Dr. Hyoungkwan Kim
School of Civil and Environmental Engineering, Yonsei University,
134 Shinchon-dong, Seodaemun-gu, Seoul 120-749, Korea,
Tel: +82 2/2123-2799, Fax: +82 2/364-5300,
Computerized construction equipment management has greatly simplified daily tasks such as
equipment tracking, maintenance, repair and operations. Recent developments in computer hardware
and software technologies, coupled with modern controls in construction equipment, further
empower contractors with automatic data collection, integrated systems,
parameter-driven reporting and automation of some managerial functions.
Research in construction equipment management has mainly focused on automation and robotic
technologies, real-time data communications and information processing [Arditi et al. 1997; Chen and
Liew 2003; LeBlond et al. 1998; Schexnayder and David 2002], and statistical data analysis for decision
support [Gillespie and Hyde 2004; Lucko and Vorster 2002]. While commercial solutions for
computerized construction equipment management are being developed at a fast pace, research in
this area remains limited because such systems are primarily designed to replace
routine tasks in equipment management.
This research is motivated by two problems faced by large contractors in equipment management.
First, the data collected by various computer systems and applications tend to be noisy,
heterogeneous and scattered, which makes retrieving information from them difficult. Second,
there are no convenient yet powerful computer tools for automatically uncovering implicit knowledge in
the collected data for decision support. Current construction equipment
management systems can generate a wide variety of customized reports but provide only
limited features for information retrieval and knowledge discovery.
The proposed solutions to these problems are data warehousing and data mining, two emerging
technologies in computer science. Data warehousing consolidates and re-organizes enterprise data
into a centralized data repository for efficient data analysis and information retrieval. Other
benefits of data warehousing include improved data quality, integrated data, and an analysis-friendly
structure. Data mining can automatically discover hidden rules, patterns or unusual behavior in the
data and present that knowledge explicitly to the user; it can also predict the future
occurrence of events. From a technological viewpoint, both data warehousing and data mining
techniques can be integrated seamlessly with the current equipment management system.
In partnership with a large road building contractor in Canada, and based on its current construction
equipment management system, “MTrack”, developed by the NSERC/Alberta Construction Research
Chair [NSERC/Alberta Construction Research Chair 2005], this research will accomplish the
following:
1. Build a prototype construction equipment data warehouse as the enterprise data source for
decision support. Explore the opportunities and challenges at different stages of data
warehousing, including planning, design and implementation, for equipment management.
2. Design and test a novel nonparametric outlier mining algorithm for generic problem detection
in construction equipment data, as well as other engineering data. Test, evaluate and
modify current data mining algorithms for decision support in construction equipment
management.
3. Design and implement the prototype intelligent equipment management system using an integrated
equipment data warehouse and embedded data mining models; make recommendations on
system planning and design.
PROBLEM STATEMENT AND RESEARCH MOTIVATION
Academic research and industrial development in the area of construction equipment management
are largely focused on the operational level. Examples include automation controls of equipment
operations, real-time data collation and diagnostics, computer-aided equipment maintenance and
repair control, order processing and inventory control. These technologies enable the contractors to
capture the operational data and obtain various summary reports on equipment management in an
efficient manner. Nevertheless, the usability of the large amounts of data collected is undermined by a
number of problems as stated below:
1. Data quality is generally poor. Data in some applications or information systems, especially
legacy systems, contain considerable noise due to entry errors and the lack of a validation mechanism.
2. Data are scattered across different systems, applications, or departments, though they
characterize the same domain problem.
3. Data are not stored in a structure efficient for data analysis. Most data are stored in relational
databases, spreadsheets, text files etc. Answering unanticipated business questions based on these
data repositories is technically challenging.
4. Advanced computer tools for automatically discovering knowledge from the data are lacking. The
hidden rules, patterns or irregularities in the data are commonly uncovered by equipment
managers using statistical tools in a trial-and-error approach.
Most data generated in construction equipment management operations are stored in relational
databases. Transactional systems such as an equipment management system are built on a relational
database model and are designed to capture operational data efficiently. These process-oriented
systems guarantee that data are added and updated efficiently during daily operations;
however, they do not perform well for sophisticated data analyses. Extracting information from a
transactional database requires building queries across different database objects, a task that in
practice falls to database specialists. Using an operational system for decision support becomes even
more inefficient with today’s increasing data volume and complexity. To tackle this problem, data
warehousing technology is employed to re-package and present the data in an integrated data repository
using a multi-dimensional data model.
Compared to a relational database, the data warehouse has two distinct features that facilitate dynamic
decision support: subject-orientation and a multi-dimensional structure. Subject-orientation means that data
models center on individual subjects, such as work order cost or fuel consumption, and that each
subject-oriented data model contains all the information on its subject. The multi-dimensional model has a
star-shaped structure, with a fact table in the center and a number of dimension tables surrounding and
connecting to the fact table. Such a data structure makes it possible to analyze each subject along any
combination of dimensions and at various granularities. In a single equipment data warehouse, the data
collected across the enterprise are scrubbed, integrated and re-structured, so the data warehouse can answer
various business questions through simple point-and-click and other visual operations. It also serves as a
universal data source for automated knowledge discovery and other analysis tasks on equipment data.
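As a minimal sketch of what analysis "along any combination of dimensions" means, the Python fragment below sums one measure over every subset of a small set of dimensions. The fact rows, the dimension names (year, department, equipment_class) and the measure fuel_litres are hypothetical values made up for illustration; they are not taken from MTrack or from the contractor's data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: dimension attributes plus one numeric measure.
FACTS = [
    {"year": 2004, "department": "Paving", "equipment_class": "Grader", "fuel_litres": 1200.0},
    {"year": 2004, "department": "Paving", "equipment_class": "Loader", "fuel_litres": 800.0},
    {"year": 2005, "department": "Paving", "equipment_class": "Grader", "fuel_litres": 1500.0},
    {"year": 2005, "department": "Earthworks", "equipment_class": "Loader", "fuel_litres": 950.0},
]
DIMENSIONS = ("year", "department", "equipment_class")


def roll_up(facts, dims, measure):
    """Sum `measure` for every distinct combination of values of `dims`."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)


def all_views(facts, dimensions, measure):
    """Roll the measure up over every subset of the dimensions."""
    return {
        dims: roll_up(facts, dims, measure)
        for k in range(len(dimensions) + 1)
        for dims in combinations(dimensions, k)
    }


if __name__ == "__main__":
    print(roll_up(FACTS, ("year",), "fuel_litres"))
    print(roll_up(FACTS, ("department", "equipment_class"), "fuel_litres"))
```

The empty subset gives the grand total and the full subset gives the finest granularity; a production warehouse would normally delegate this to an OLAP engine rather than compute it in application code.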
Knowledge buried in the data is a valuable asset of a contractor. Traditional approaches such as statistical
analysis, visualization, and mathematical modeling become inefficient for large amounts of data. Our
interview with the collaborating contractor found that only a small portion of the data collected in
equipment management is used for direct decision support due to a “lack of tools”. There is an urgent need
for converting data and scraps of information into knowledge using automated approaches.
Data mining is an interdisciplinary field at the confluence of statistics, machine learning, database
technology, information science and related areas, and is capable of “extracting interesting (non-trivial, implicit,
previously unknown and potentially useful) information or patterns from data in large databases” [Han and
Kamber 2000]. Depending on its purpose, data mining is categorized as descriptive data mining or
predictive data mining. The former helps to better understand the data by uncovering the relationships
and patterns in the database, while the latter is used to generate data-driven models for prediction,
classification, forecasting etc.
A data mining model has the following advantages over a traditional mathematical model or
expert system. First, the model is data-driven: data mining models are obtained from data using specific
algorithms, so they are based on derived facts rather than expert opinions or personal experience.
Second, a data mining model may be the only viable solution when the system is too complex to
be described by other models.
Even though data mining is a well-researched area and has been applied in various industries, applying
data mining techniques to construction equipment management faces some specific challenges, such as
noisy data, dynamic changes of data, lack of pre-labelled data, and its exploratory nature. This research
will select some data mining algorithms and equipment management problems for in-depth investigation.
One of the data mining tasks in this research is outlier mining. Hawkins defines an outlier as “an observation
which deviates so much from other observations as to arouse suspicions that it was generated by a
different mechanism” [Hawkins 1980]. Searching, sorting and ranking outliers in an equipment database can
identify problems in equipment field operations, equipment performance, management decisions etc.
However, neither traditional statistical methods nor current outlier mining algorithms provide flexible
and reliable solutions when applied to real-world datasets, because of their stringent assumptions about data
distributions or the sensitivity of their results to the input parameters. Based on the idea of resolution
change used by Foss and Zaiane [Foss and Zaiane 2002] in a non-parametric clustering algorithm, this
research will explore the design and implementation of a non-parametric outlier mining algorithm for
generic problem detection in engineering data, such as equipment data.
Prediction of a continuous variable from a number of known attributes, with either categorical or
continuous values, is a common problem in construction equipment management. Traditional solutions,
such as statistical analysis and artificial neural networks, suffer from problems such as inaccurate results,
black-box models or difficult system integration. The AutoRegressive Tree (ART) technique proposed by Meek et
al. offers a way to overcome these problems by using a transparent data mining
model. Other data mining tasks such as time-series forecasting of equipment cost will also be studied in
this research.
A comprehensive review of construction equipment automation was conducted by Chen and Liew [Chen
and Liew 2003]. The authors pointed out that automation and robotic technologies have been an important
research area in construction since 1980, aimed at overcoming problems in safety, quality, productivity and
competition. Other sources have also reported on innovation in construction equipment and its influence on the
industry [LeBlond et al. 1998; Jahren 2000].
A pilot research project and an investigation into the application of data warehousing technology in
construction were conducted by Chau et al. (2002) at the University of Hong Kong. The authors built a
decision support system based on Online Analytical Processing (OLAP) for inventory management of
construction materials. Ma et al. [Ma et al. 2005] applied data warehousing techniques to improve document
management in construction for multi-party, multi-purpose use.
In outlier mining, Knorr and Ng first introduced the distance-based outlier and the DB(p,D)-outlier mining
algorithm [Knorr and Ng 1998], which can efficiently deal with large, multi-dimensional datasets. The
problem with the DB-outlier definition is that it cannot cope with datasets containing clusters with
significantly different densities. To overcome this deficiency, Breunig et al. proposed the Local
Outlier Factor (LOF) and an LOF-based outlier mining algorithm [Breunig et al. 2000]. Other researchers
have suggested improvements to the LOF method, such as the Connectivity-based Outlier Factor
(COF) proposed by Tang et al. [Tang et al. 2002] for cases of low density patterns.
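To make the distance-based definition concrete, the sketch below applies it directly: a point is treated as a DB(p,D)-outlier when at least a fraction p of the dataset lies farther than distance D from it. This is a naive quadratic scan written only to illustrate the definition; it is not one of the efficient algorithms of Knorr and Ng, and the sample points and parameter values are invented for the example.

```python
import math


def db_outliers(points, p, d):
    """Return indices of DB(p, D)-outliers using a naive O(n^2) scan.

    A point is flagged when at least a fraction `p` of all points in the
    dataset lies at a distance greater than `d` from it.
    """
    n = len(points)
    flagged = []
    for i, a in enumerate(points):
        # Count the other points that are farther than d from point i.
        far = sum(
            1 for j, b in enumerate(points)
            if j != i and math.dist(a, b) > d
        )
        if far >= p * n:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical 2-D data: one tight group plus one isolated point.
    sample = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
    print(db_outliers(sample, p=0.8, d=3.0))
```

Because p and D are global, a single setting cannot suit both a dense cluster and a sparse one in the same dataset, which is the limitation the density-based methods above were designed to address.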
Many studies in the construction industry have explored the application of data mining techniques for data
analysis and decision support. Soibelman and Kim [Soibelman and Kim 2002] conducted systematic research on data
preparation and the entire Knowledge Discovery in Databases (KDD) process for construction knowledge
generation; as an example, the researchers applied the decision tree algorithm C4.5 to the evaluation of
construction delays in pipeline installation. Caldas et al. [Caldas et al. 2002] proposed an automated approach for
classification of construction documents through the integration of a model-based information system with
a Support Vector Machine (SVM). Lu et al. applied an artificial neural network to estimating
construction productivity [Lu et al. 2000]; Wilmot and Mei conducted research on highway cost
estimation using neural networks [Wilmot and Mei 2005]; Lee et al. investigated the application of decision
trees to classify and quantify the cumulative impact of change orders on productivity [Lee et al. 2004].
An intelligent construction equipment management system based on MTrack is proposed as a workbench
throughout the research. First, an in-depth investigation will be conducted into the techniques and challenges
of building an enterprise-wide equipment data warehouse from disparate operational data sources.
Second, automated knowledge discovery from operational data for decision support will be explored
using various data mining techniques.
Data warehousing refers to all the processes needed to build up and implement the data warehouse. The
procedures for data warehousing include: (1) Identification of data sources - the data may come from
different operational systems, applications, or flat files; (2) Data staging, which usually involves data
Extraction, Transformation and Loading (ETL) from the heterogeneous sources to a consolidated data
warehouse; (3) Presentation of the data to the user through data access tools. The three steps are illustrated
in Figure 1.
[Figure 1 labels: External Data (fueling records, e-data etc.); Misc. databases (legacy accounting
system, etc.); Data Extraction, Transformation and Loading (ETL); Equipment Data]
Figure 1. Procedures for building equipment data warehouse
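The three procedures can be sketched in a few lines of Python using only the standard library. Everything specific here is an assumption made for illustration: the two sources (a fueling CSV export and a list of legacy records), their field names, the validation rules, and the fuel_fact table are hypothetical and do not describe the contractor's systems.

```python
import csv
import io
import sqlite3

# Two hypothetical sources with inconsistent conventions.
FUEL_CSV = """unit,date,litres
 gr-101 ,2005-06-01,120.5
LD-202,2005-06-01,abc
ld-202,2005-06-02,98
"""
LEGACY_ROWS = [
    {"EquipNo": "GR-101", "Day": "2005-06-03", "FuelL": "110"},
    {"EquipNo": "", "Day": "2005-06-03", "FuelL": "75"},
]


def extract():
    """Yield raw (unit, date, litres) tuples from every source."""
    for row in csv.DictReader(io.StringIO(FUEL_CSV)):
        yield row["unit"], row["date"], row["litres"]
    for row in LEGACY_ROWS:
        yield row["EquipNo"], row["Day"], row["FuelL"]


def transform(raw_rows):
    """Standardize unit ids and set aside rows that fail validation."""
    clean, rejected = [], []
    for unit, date, litres in raw_rows:
        unit = unit.strip().upper()
        try:
            value = float(litres)
        except ValueError:
            rejected.append((unit, date, litres))
            continue
        if not unit or value <= 0:
            rejected.append((unit, date, litres))
            continue
        clean.append((unit, date, value))
    return clean, rejected


def load(conn, rows):
    """Append the cleaned rows to the consolidated table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fuel_fact (unit TEXT, day TEXT, litres REAL)"
    )
    conn.executemany("INSERT INTO fuel_fact VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run extract, transform and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    clean, rejected = transform(extract())
    load(conn, clean)
    return conn, rejected
```

Keeping the rejected rows, rather than silently dropping them, gives a simple record of the data quality problems found in each source.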
The architectural design and multi-dimensional modeling of an equipment data warehouse will also be
explored in this research.
Architectural design: High-level planning and design of the equipment data warehouse adopts the Data
Warehouse Bus (DWB) Architecture proposed by Kimball and Ross [Kimball and Ross 2002]. A bus matrix depicting the
whole picture of the data warehouse is used to identify subjects for operational processes within the
enterprise and to obtain a master suite of standardized dimensions and facts that are uniformly interpreted
across the enterprise.
Multidimensional modeling: Multidimensional data models will be designed based on the data
warehouse bus matrix. The preferred model structure is the star schema, which has a measurement fact table at
the center, with all the associated dimensions arranged around it. In the data warehouse, the star schema
represents a subject of interest as a data cube, with all the numerical measurements in the central fact
table and all the descriptive attributes in the surrounding dimension tables. Questions of when, where,
who, etc., can be answered after the schema is transformed into a dimensional data cube. Proper modeling
of each data cube, with its underlying fact table and dimension tables, enables comprehensive data analysis
on an individual subject. All the stars in the system will collectively provide an integrated view of
equipment management performance.
Figure 2 shows the multidimensional data model for the subject “Repair Cost”, where the fact table in the
center consolidates all cost measurements while the surrounding dimensions contain different descriptive
attributes with various levels of detail.
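[Figure 2 labels: Manufacturer dimension (PK man_wk; manID; Manufacturer); Department dimension
(PK Dpt_wk; the_Department); fact table keys (PK,FK1 man_wk; PK,FK3 Dpt_wk; PK,FK4 rcType_wk);
fact table measures (Labor dollar amount; Parts dollar amount; Total dollar amount)]
Figure 2. Multidimensional data model for subject “Repair Cost”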
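A star schema of this kind could be prototyped as follows, here with Python's built-in sqlite3 module. The key names man_wk, dpt_wk and rctype_wk follow the labels visible in Figure 2; the table names, the repair-type attribute, the sample rows and the query are assumptions added for illustration, and the dimension behind the figure's missing FK2 is left out because its label did not survive.

```python
import sqlite3

# Dimension tables surround one central fact table of cost measurements.
SCHEMA = """
CREATE TABLE dim_manufacturer (man_wk INTEGER PRIMARY KEY, manufacturer TEXT);
CREATE TABLE dim_department   (dpt_wk INTEGER PRIMARY KEY, the_department TEXT);
CREATE TABLE dim_repair_type  (rctype_wk INTEGER PRIMARY KEY, repair_type TEXT);
CREATE TABLE fact_repair_cost (
    man_wk       INTEGER REFERENCES dim_manufacturer(man_wk),
    dpt_wk       INTEGER REFERENCES dim_department(dpt_wk),
    rctype_wk    INTEGER REFERENCES dim_repair_type(rctype_wk),
    labor_amount REAL,
    parts_amount REAL,
    total_amount REAL
);
"""


def build_demo_warehouse():
    """Create the schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dim_manufacturer VALUES (?, ?)",
        [(1, "Maker A"), (2, "Maker B")],
    )
    conn.executemany(
        "INSERT INTO dim_department VALUES (?, ?)",
        [(1, "Paving"), (2, "Earthworks")],
    )
    conn.executemany(
        "INSERT INTO dim_repair_type VALUES (?, ?)",
        [(1, "Scheduled"), (2, "Breakdown")],
    )
    conn.executemany(
        "INSERT INTO fact_repair_cost VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 1, 1, 100.0, 50.0, 150.0),
            (1, 1, 2, 200.0, 100.0, 300.0),
            (2, 1, 1, 80.0, 20.0, 100.0),
            (2, 2, 2, 40.0, 60.0, 100.0),
        ],
    )
    conn.commit()
    return conn


def total_cost_by_manufacturer_and_department(conn):
    """Roll total repair cost up along two of the dimensions."""
    query = """
        SELECT m.manufacturer, d.the_department, SUM(f.total_amount)
        FROM fact_repair_cost AS f
        JOIN dim_manufacturer AS m ON m.man_wk = f.man_wk
        JOIN dim_department   AS d ON d.dpt_wk = f.dpt_wk
        GROUP BY m.manufacturer, d.the_department
        ORDER BY m.manufacturer, d.the_department
    """
    return conn.execute(query).fetchall()
```

Every analytical question has the same shape: join the fact table to the dimensions of interest and aggregate the measures, which is what makes the structure convenient for point-and-click tools.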
A non-parametric outlier mining algorithm will be proposed in this research for generic problem detection
in engineering data. Based on the clustering algorithm TURN* [Foss and Zaiane 2002], an outlier factor,
the Resolution Outlier Factor (ROF), and an ROF-based outlier mining algorithm will be studied and tested in
this research using both synthetic and real-world datasets. Instead of tracking statistical properties of each
cluster, the ROF-based outlier mining algorithm will track the isolativity (degree of isolation) of each data
point, based on its behavior in merging into its neighboring clusters during resolution change. Preliminary
test results suggest that the ROF definition and ROF-based algorithm perform better on real-world datasets
than the current distance-based DB(p,D) outlier [Knorr and Ng 1998] and density-based LOF outlier [Breunig et al.
2000] mining algorithms. Figure 3 shows the proposed flowchart for the algorithm.
[Figure 3 flowchart: Start from initial maximum resolution -> Data clustering based on closeness of ... ->
Record Resolution-based Outlier Factor -> All points in a single cluster? (NO) ->
Sort dataset as per Resolution-based Outlier Factor -> Get top N]
Figure 3. ROF-based outlier mining algorithm
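The loop in Figure 3 can be sketched roughly as below. This is an illustrative reading of the flowchart, not the proposed algorithm: the proposal does not state the ROF formula, so the scoring rule used here (at each resolution step, add the point's previous cluster size minus one, divided by its current cluster size) is an assumption. The neighbourhood-radius clustering, the growth factor, and the function names are also hypothetical. Under this assumed score, points that stay alone or in small clusters while the resolution coarsens accumulate low values, so the lowest scores are reported as the top outliers.

```python
import math


def cluster_sizes(points, radius):
    """Size of each point's cluster when points within `radius` are linked."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    counts = {}
    for r in roots:
        counts[r] = counts.get(r, 0) + 1
    return [counts[r] for r in roots]


def resolution_scores(points, start_radius, growth=1.5, max_steps=50):
    """Accumulate an assumed per-point score while the resolution coarsens."""
    n = len(points)
    scores = [0.0] * n
    previous = [1] * n  # at the finest resolution every point stands alone
    radius = start_radius
    for _ in range(max_steps):
        current = cluster_sizes(points, radius)
        for i in range(n):
            scores[i] += (previous[i] - 1) / current[i]
        previous = current
        if max(current) == n:  # all points merged into a single cluster
            break
        radius *= growth
    return scores


def top_n_outliers(points, n_top, start_radius, growth=1.5):
    """Indices of the n_top points with the lowest accumulated score."""
    scores = resolution_scores(points, start_radius, growth)
    return sorted(range(len(points)), key=lambda i: scores[i])[:n_top]


if __name__ == "__main__":
    data = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05), (10, 10)]
    print(top_n_outliers(data, n_top=1, start_radius=0.05))
```

The starting radius and growth factor only control how finely the resolution range is sampled, not what counts as an outlier. That difference from a distance threshold is the appeal of a resolution-based ranking.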
One of the other data mining algorithms to be investigated is the AutoRegressive Tree (ART) technique
proposed by Meek et al. The ART technique combines a decision tree model with
regression models: a regression model is built on each leaf node of the decision tree to predict
a continuous target variable. ART will be tested and evaluated for real-time evaluation of “estimated
work orders” and time-series forecasting of equipment cost. This research will investigate and try to solve
the technical problems associated with the application of these algorithms, for example: How should the
algorithm parameters be tuned to get the best results? Are there other similar algorithms that provide
better performance? Is it possible to improve the results by modifying the algorithm?
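The idea of a regression model on each leaf can be illustrated with a deliberately small sketch: a tree with a single split on one continuous attribute and a least-squares line fitted in each of the two leaves. The single split, the squared-error split criterion, and all function names are simplifications chosen for brevity; they are not taken from Meek et al. For a time series, the attribute x would be a lagged value of the series and y the next value.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def fit_one_split_tree(xs, ys, min_leaf=3):
    """Pick the threshold on x that minimizes the summed leaf squared error."""
    pairs = list(zip(xs, ys))
    best = None
    for threshold in sorted(set(xs)):
        left = [(x, y) for x, y in pairs if x <= threshold]
        right = [(x, y) for x, y in pairs if x > threshold]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        la, lb, lsse = fit_line(*zip(*left))
        ra, rb, rsse = fit_line(*zip(*right))
        if best is None or lsse + rsse < best["sse"]:
            best = {
                "threshold": threshold,
                "left": (la, lb),
                "right": (ra, rb),
                "sse": lsse + rsse,
            }
    return best


def predict(model, x):
    """Route x to its leaf and apply that leaf's regression line."""
    a, b = model["left"] if x <= model["threshold"] else model["right"]
    return a + b * x


if __name__ == "__main__":
    # Hypothetical data with two regimes: y = 2x up to x = 4, then y = 100 - x.
    xs = list(range(10))
    ys = [2 * x if x <= 4 else 100 - x for x in xs]
    model = fit_one_split_tree(xs, ys)
    print(model["threshold"], predict(model, 2.5), predict(model, 7))
```

The fitted model stays readable, a threshold plus two line equations, and that readability is the transparency advantage over a black-box predictor.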
From the perspective of system implementation, the proposed equipment data warehouse will work as an
“add-on” data source alongside the current transactional databases, and all the data mining algorithms will
be integrated with the current equipment management system as “plug-in” components, as shown in Figure
4. The computer technologies for the high-level system integration are Microsoft Data Mining Extensions
(DMX) and communication protocols such as OLE DB for Data Mining [Tang and MacLennan 2005].
Unlike other studies on the application of data warehousing and data mining technologies in
construction, I will primarily focus on the algorithmic level of data mining, proposing novel data mining
algorithms and tailoring current algorithms for mining engineering data. At the same time, I will address
problems in the conceptual design of an equipment data warehouse, as well as system integration between
data warehousing/data mining and the current equipment management system. The latter will
benefit the construction industry by facilitating the transfer of data warehousing and data mining
technologies. The expected contributions of my research are as follows:
1. This research will provide guidelines for applying data warehousing technology to construction
equipment management for improved decision support. These include the opportunities, challenges,
and suggestions for planning and design of an equipment data warehouse, as well as software
2. A novel non-parametric outlier mining algorithm is proposed for generic problem detection in both
equipment management and other engineering applications. This will contribute to the body of
knowledge in the data mining community.
3. A number of current data mining algorithms, such as the family of decision tree algorithms, will be tested,
evaluated and modified for intelligent decision support in construction equipment management. This
research will report the findings and make recommendations on the general application of data mining
technology in construction equipment management.
4. This research will summarize and make recommendations on the architectural design and
implementation of an intelligent equipment management information system using combined data
warehousing/data mining techniques, to meet industrial expectations.
[Figure 4 labels: Equipment Management System Presentation Layer (Graphical User Interface);
Current Software Components; Data Mining Model Components]
Figure 4. Proposed intelligent equipment management system
Arditi, D., Kale, S. and Tangkar, M. (1997). “Innovation in construction equipment and its flow into
the construction industry.” J. Constr. Engrg. and Mgmt., ASCE, 123(4),371-378
Breunig, M., Kriegel, H., Ng, R., and Sander, J. (2000). “LOF: identifying density-based local outliers.”
Proceedings of ACM SIGMOD 2000 International Conference on Management of Data, Dallas,
Caldas, C. H., Soibelman, L. and Han J. (2002) “Automated Classification of Construction Project
Documents.” ASCE Journal of Computing in Civil Engineering, 16(4), 234-243
Chau, K.W., Cao, Y., Anson, M., and Zhang J. (2002). “Application of Data Warehouse and Decision
Support System in Construction Management.” Automation in Construction, 12, 213-224.
Chen, W. F. and Liew, J. R. (2003). Civil Engineering Handbook, second edition, Chapter 6. CRC
Press, Florida, USA.
Foss, A., and Zaiane, O. (2002). “A parameter-less method for efficiently discovering clusters of
arbitrary shape in large datasets.” Proceedings of 2002 IEEE International Conference on Data
Mining (ICDM'02), Maebashi City, Japan
Gillespie, J.S. and Hyde, A.S. (2004) The Replace/Repair Decision For Heavy Equipment. Virginia
Transportation Research Council. Final report: VTRC 05-R8
Han, J. and Kamber, M. (2000). Data Mining: Concepts and Techniques, Morgan Kaufmann
Publishers, August 2000.
Hawkins, D. (1980). Identification of Outliers. Chapman and Hall, London.
Jahren, C.T. (2000). “Transportation construction equipment.” Transportation in the New Millennium,
TRB Annual meeting, January 2000.
Kimball, R. and Ross, M. (2002). The Data Warehouse Toolkit: The Complete Guide to Dimensional
Modeling, second edition, John Wiley & Sons, Inc., New York, pp. 13–88.
Knorr, E., and Ng, R. (1998) “Algorithms for mining distance-based outliers in large datasets.”
Proceedings of Very Large Data Bases Conference, New York, USA
LeBlond, D., Owen, F., Gibson, G. E., Haas, C. T. and Traver, A.E. (1998). “Control improvement for
advanced construction equipment.” J. Constr. Engrg. and Mgmt., ASCE, 124(4),289-296
Lee, M., Hanna, A.S. and Loh, W.Y. (2004). “Decision Tree Approach to Classify and Quantify
cumulative Impact of Change Orders on Productivity.” J. Comp. in Civ. Engrg., ASCE, 18(2),
Lu, M., AbouRizk, S.M. and Hermann U.H. (2000). “Estimating labor productivity using probability
inference neural network” J. Comp. in Civ. Engrg., ASCE, 14(4), 241-248
Lucko, G. and Vorster, M.C. (2002) “Predicting the Residual Value of Heavy Construction
Equipment.” Proceedings of the 4th Joint International Symposium on Information Technology
in Civil Engineering 2003, Tennessee, USA.
Ma, Z., Wong, K.D., Heng, L. and Jun Y. (2005) “Utilizing exchanged documents in construction
projects for decision support based on data warehousing technique.” Automation in Construction,
NSERC/Alberta Construction Research Chair, (2005). http://irc.construction.ualberta.ca/ html/research/
Schexnayder, C.J. and David S.A. (2002). “Past and Future of Construction Equipment—Part IV” J.
Constr. Engrg. and Mgmt., ASCE, 128(4),279-286
Soibelman, L and Kim, H. (2002). “Data Preparation Process for Construction Knowledge Generation
through Knowledge Discovery in Databases.” ASCE Journal of Computing in Civil Engineering,
Tang, J., Chen, Z., Fu, A., and Cheung, D. (2002). “Enhancing effectiveness of outlier detections for
low density patterns.”, Proceedings of the 6th Pacific-Asia Conference on Advances in
Knowledge Discovery and Data Mining, Taipei, Taiwan. pp. 535 – 548
Tang, Z. and MacLennan, J. (2005). Data Mining with SQL Server 2005. Wiley Publishing, Inc.
Wilmot C.G. and Mei, B. (2005). “Neural Network Modeling of Highway Construction Costs.” J.
Constr. Engrg. and Mgmt., ASCE, 131(7),765-771