These slides are based on the following two research questions:
1) What are the possible areas where Deep Learning can be applied in the construction industry?
2) What are the problems associated with the application of Deep Learning in the construction industry?
2. Presentation by:
Gaurav Verma
M.Tech [IEM]
21103038
Submitted to:
Prof. Sudhir Misra
Dept. of Civil Engineering
IIT Kanpur
Literature Review on Different Topics
3. OUTLINE
1. Research Questions
2. Classification of DL Applications in Construction Industry
3. Paper 1: A deep hybrid learning model to detect unsafe behaviour
4. Paper 2: Detecting Non-hardhat use
5. Paper 3: Automated text classification of near misses
6. Paper 4: Deep Learning for site safety
7. Paper 5: CNN for Pavement Roughness Assessment
8. Paper 6: Construction Vehicles Tracking
9. What else I have done?
4. RESEARCH QUESTIONS
RQ1. What are the applications of Deep Learning in Construction Industry?
RQ2. What are the challenges in the applications of Deep Learning in Construction Industry?
5. CLASSIFICATION OF DL APPLICATIONS
Classification of DL Applications in Construction Industry:
• Construction Safety & Management
• Equipment Tracking
• Sewer Assessment
• Crack Detection
• 3D Point Cloud Enhancement
• Miscellaneous Applications
7. A deep hybrid learning model to detect unsafe behavior: Integrating convolution neural networks and long short-term memory
8. INTRODUCTION
• Approximately 88% of all accidents that occur during construction materialize as a
consequence of unsafe behavior of workers.
• Conventional methods for assessing workers' behavior have been predominantly
observational. While such methods may provide useful information, they are
time-consuming, labor-intensive, and subjective. Given these limitations, computer vision
technology, which has been used for object recognition, can be applied to identify
workers' unsafe actions on-site.
Reference: https://doi.org/10.1016/j.autcon.2017.11.002
9. DEEP LEARNING MODEL
• The deep models are trained to compute feature representations from the action videos,
which are structured using a combination of CNNs and LSTM models.
• 1 video stream = 25 video clips.
• 2048-dimension feature vector per clip.
• 25 feature vectors per video stream.
• The LSTM has an advanced RNN architecture, which can learn long-range dependencies
due to its memory cell.
Reference: https://doi.org/10.1016/j.autcon.2017.11.002
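As a rough illustration of how an LSTM memory cell carries information across the 25 clip-level feature vectors described above, the recurrence can be sketched in plain Python. This is a minimal sketch, not the paper's implementation: the class `LSTMCell`, the helper `encode_clips`, the random untrained weights, and the reduced dimensions (8 inputs instead of 2048, 4 hidden units) are all assumptions made for readability.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


class LSTMCell:
    """Hypothetical LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        size = hidden_dim + input_dim
        self.hidden_dim = hidden_dim
        # One weight matrix and bias per gate: input (i), forget (f),
        # output (o) and candidate memory (g).
        self.w = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(size)]
                   for _ in range(hidden_dim)]
            for gate in "ifog"
        }
        self.b = {gate: [0.0] * hidden_dim for gate in "ifog"}

    def _gate(self, name, z, activation):
        pre = matvec(self.w[name], z)
        return [activation(p + b) for p, b in zip(pre, self.b[name])]

    def step(self, x, h, c):
        z = h + x  # concatenate previous hidden state and current input
        i = self._gate("i", z, sigmoid)
        f = self._gate("f", z, sigmoid)
        o = self._gate("o", z, sigmoid)
        g = self._gate("g", z, math.tanh)
        # The memory cell keeps part of its old content (f * c) and adds new
        # content (i * g); this is what allows long-range dependencies.
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def encode_clips(clip_features, cell):
    """Run the cell over per-clip feature vectors; return the final hidden state."""
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    for x in clip_features:
        h, c = cell.step(x, h, c)
    return h


# Assumed toy input: 25 clips, each with an 8-dimensional feature vector.
rng = random.Random(1)
clips = [[rng.random() for _ in range(8)] for _ in range(25)]
video_code = encode_clips(clips, LSTMCell(input_dim=8, hidden_dim=4))
```

In a trained model the per-clip vectors would come from the CNN and the final hidden state would feed a classifier; both parts are omitted here.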
10. EXPERIMENT DETAILS
• Occupational Safety and Health Administration (OSHA) accident statistics were used.
• Falls are one of the leading causes of accidents in construction, accounting for 34% of
fatalities and 24% of non-fatal injuries. Notably, falls from ladders account for 9% of
deaths and 6% of injuries.
• Video recordings of a person climbing and dismounting from a ladder were collected.
• Each video is on average 8 s in length, and has a resolution of 1920 × 1080.
• For each class of actions, 50 samples (i.e., the number of cycles) were collected.
Reference: https://doi.org/10.1016/j.autcon.2017.11.002
11. IMPLEMENTATION DETAILS
• Total of 200 videos: 160 for the training set and 40 for the testing set.
Reference: https://doi.org/10.1016/j.autcon.2017.11.002
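The 160/40 division of the 200 videos corresponds to an 80/20 split. A minimal sketch of such a split is given below, assuming a plain random shuffle; the source does not say how the videos were assigned, and the function name `split_dataset`, the seed, and the placeholder video names are hypothetical.

```python
import random


def split_dataset(items, train_ratio=0.8, seed=0):
    """Shuffle a copy of items and cut it into training and testing parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


# Placeholder identifiers standing in for the 200 recorded videos (assumption).
videos = [f"video_{i:03d}" for i in range(200)]
train_set, test_set = split_dataset(videos)
print(len(train_set), len(test_set))
```

A stratified split (equal numbers per action class) would be a natural alternative, but the slides do not state which was used.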
2 Types of Labels Used
Label Used | Label Code | Actions | Accuracy in DL Model
Two types of labels | (0,1) | 0: Normal ladder climbing; 1: Abnormal ladder climbing | 97 %
Four types of labels | (0,1,2,3) | 0: Normal climbing; 1: With an object; 2: Backward facing; 3: Reaching far | 92 %
13. LIMITATION & FUTURE SCOPE
Limitations: Unable to identify unsafe behaviours when multiple workers appear in a single
image.
Future Scopes: A model that simultaneously accommodates multiple pieces of
equipment/workers contained within video frames.
Reference: https://doi.org/10.1016/j.autcon.2017.11.002
15. INTRODUCTION
• According to the United States' Bureau of Labor Statistics, the number of fatalities in the
US increased gradually from 849 in 2012 to 985 in 2015.
• According to the UK Health and Safety Executive (HSE), 38 construction workers
suffered fatal injuries in Great Britain between April 2014 and March 2015, while this
figure rose to 45 during the same period the following year.
• From 2003 to 2010, 2210 construction workers in the United States died as a result of
traumatic brain injuries, accounting for 24% of the total number of deaths from
construction accidents.
• A survey conducted by the US Bureau of Labor Statistics (BLS) suggests that 84% of
workers who had suffered impact injuries to the head were not wearing head protection
equipment.
Reference: https://doi.org/10.1016/j.autcon.2017.09.018
16. DEEP LEARNING MODEL
• For non-hardhat-use (NHU) detection, high recognition precision and recall are more
important than speed. Therefore, the paper proposes Faster R-CNN for detecting NHU
workers on construction sites.
Advantages of Faster R-CNN:
1. Robust in dealing with complex construction site environments.
2. High precision of Faster R-CNN can fulfill the needs of practical engineering
applications.
3. Coupled with the short processing time of Faster R-CNN, real-time monitoring of NHU
can be achieved.
Reference: https://doi.org/10.1016/j.autcon.2017.09.018
17. EXPERIMENT DETAILS
• More than 100,000 image frames were collected from surveillance videos at 25 different
construction projects. To create a comprehensive dataset (of assorted situations), the
videos were collected over more than one year.
• A total of 81,000 images from this dataset were randomly selected to comprise the
training dataset.
• All the images in the testing dataset were classified into several categories based on
weather, illumination, individuals' posture, visual range and occlusions.
Reference: https://doi.org/10.1016/j.autcon.2017.09.018
Performance Evaluation Metrics
Precision = True Positive / (True Positive + False Positive)
Recall = True Positive / (True Positive + False Negative)
Miss Rate = 1 - Recall
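The three metrics can be computed directly from the TP, FP and FN counts. The short sketch below uses the counts reported for the "Large" visual range in the results; the function name `detection_metrics` is an assumption, and the code assumes non-zero denominators.

```python
def detection_metrics(tp, fp, fn):
    """Return precision, recall and miss rate as fractions between 0 and 1."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {"precision": precision, "recall": recall, "miss_rate": 1.0 - recall}


# Counts for the "Large" visual range: TP = 3374, FP = 226, FN = 280.
metrics = detection_metrics(tp=3374, fp=226, fn=280)
percentages = {name: round(100 * value, 1) for name, value in metrics.items()}
print(percentages)  # {'precision': 93.7, 'recall': 92.3, 'miss_rate': 7.7}
```

The printed values agree with the precision, recall and miss rate listed for that row.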
18. RESULTS
Results under different Visual Range
Value TP FP FN Precision (%) Recall (%) Miss Rate (%) Speed (s)
Large 3374 226 280 93.7 92.3 7.7 0.212
Middle 2065 91 102 95.8 95.3 4.7 0.207
Small 1089 18 47 98.4 95.9 4.1 0.204
Results under different Weather Conditions
Value TP FP FN Precision (%) Recall (%) Miss Rate (%) Speed (s)
Sunny 2459 83 123 96.7 95.2 4.8 0.204
Cloudy 2155 98 94 95.7 95.8 4.2 0.202
Misty 1586 107 98 93.7 94.2 5.8 0.209
Rainy 2186 123 164 94.7 93.0 7.0 0.210
• Results are reported in the same way for the other categories: illumination levels,
individual postures, and occlusions.
Reference: https://doi.org/10.1016/j.autcon.2017.09.018
19. LIMITATION & FUTURE SCOPE
Limitations: Currently, this algorithm is able to detect NHU workers but not identify the
workers involved.
Future Scopes: It is recommended that future research focus on the identification and
integration of worker information into real-time safety monitoring systems, as this will
then enable disciplinary action and targeted safety training to be carried out.
Reference: https://doi.org/10.1016/j.autcon.2017.09.018
21. INTRODUCTION
• A near miss has been defined as an unplanned event that has the potential to cause but
does not result in personal injury, environmental or equipment damage, or interruption to
regular operation.
• Approximately 91% of accidents produced no injuries, while 9% resulted in minor injuries
and less than 1% in major injuries.
• The analysis of near-miss data can be labour-intensive and time-consuming, and it requires
an understanding of safety to be able to derive meaningful insights.
• Classifying near-miss text with Deep Learning models can help site managers identify
work areas and instances where an accident is likely to occur.
Reference: https://doi.org/10.1016/j.aei.2020.101060
22. DEEP LEARNING MODEL
• This paper utilized deep learning and Bidirectional Encoder Representations from
Transformers (BERT) to develop a robust automatic text classification model of
near-misses.
• BERT's model architecture is a multi-layer bidirectional transformer encoder.
• The encoder consists of six identical layers. Each layer has two sub-layers: (1) a multi-head
self-attention mechanism; and (2) a simple, position-wise fully connected feed-forward
network.
• Each of the two sub-layers is wrapped in a residual connection followed by layer
normalization, and outputs 768-dimensional vectors.
Reference: https://doi.org/10.1016/j.aei.2020.101060
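The "residual connection followed by layer normalization" step can be illustrated with a small plain-Python sketch. It is an illustration under assumptions, not the BERT implementation: the names `layer_norm` and `add_and_norm` are hypothetical, the learned gain and bias of layer normalization are omitted, and a 4-dimensional toy vector stands in for the 768-dimensional one.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalise a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add_and_norm(x, sublayer):
    """Residual connection around a sub-layer, followed by layer normalisation."""
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])


# Toy stand-in for a sub-layer (attention or feed-forward); an assumption.
def toy_sublayer(v):
    return [0.5 * a for a in v]


out = add_and_norm([1.0, 2.0, 3.0, 4.0], toy_sublayer)
```

Because the input is added back before normalisation, the sub-layer only has to learn a correction to its input, which is the usual motivation for the residual connection.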
23. SOURCE OF DATA
• Approximately 3280 near-miss events are stored. Each near-miss contains its location,
time, name, description, safety level, categories, and images.
• These 3280 near-misses have been classified into 170 categories, such as quality of main
concrete structure, template installation, monitoring data overrun.
• The database is randomly divided into a training and testing database with a ratio of 8:2.
In other words, 2624 near-misses are used for training the BERT model, and 657 for testing
its performance.
Reference: https://doi.org/10.1016/j.aei.2020.101060
24. EXPERIMENT
Data Cleaning
• Punctuation is removed.
• All words are converted to lowercase.
• Each sentence is truncated to its first N (64) words.
Word-piece Tokenization
• Completely data-driven.
• A greedy longest-match-first algorithm is used.
• Example: “unaffable” → “un”, “##aff”, “##able”.
Text-feature Construction
• All the distinct tokens from the previous step are arranged and numbered from 1 to k.
• If the length of a sentence is less than N, it is padded with 0.
• Each sentence is converted to an input feature of length N.
Reference: https://doi.org/10.1016/j.aei.2020.101060
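The three preprocessing stages (cleaning, greedy longest-match-first word-piece tokenization, and fixed-length feature construction) can be sketched end to end as follows. This is a minimal sketch under assumptions: the function names, the four-entry toy vocabulary, and the `[UNK]` fallback are hypothetical, and a real word-piece vocabulary is learned from data rather than written by hand.

```python
import string

N = 64  # maximum sentence length, as stated on the slide


def clean(sentence, n=N):
    """Remove punctuation, lowercase, and keep only the first n words."""
    no_punct = sentence.translate(str.maketrans("", "", string.punctuation))
    return no_punct.lower().split()[:n]


def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # marks a continuation piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # assumed fallback when no piece matches
        pieces.append(match)
        start = end
    return pieces


def build_index(token_lists):
    """Number the distinct tokens from 1 to k in order of first appearance."""
    index = {}
    for tokens in token_lists:
        for token in tokens:
            index.setdefault(token, len(index) + 1)
    return index


def to_feature(tokens, index, n=N):
    """Map tokens to ids and pad with 0 up to length n."""
    ids = [index[t] for t in tokens[:n]]
    return ids + [0] * (n - len(ids))


# Toy vocabulary (assumption) that reproduces the "unaffable" example.
vocab = {"un", "##aff", "##able", "worker"}
tokens = [p for w in clean("Unaffable, worker!") for p in wordpiece(w, vocab)]
index = build_index([tokens])
feature = to_feature(tokens, index, n=8)
print(tokens)   # ['un', '##aff', '##able', 'worker']
print(feature)  # [1, 2, 3, 4, 0, 0, 0, 0]
```

The example uses n=8 only to keep the printed feature short; with the default N = 64 the same sentence would be padded with sixty zeros.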
26. LIMITATION & FUTURE SCOPE
Limitations:
• The developed model could not classify near-miss reports with 100% accuracy because of
the sheer number of categories (L = 170), which contained too few events.
• The data source is in Chinese. In this experiment, we translated the data into English.
Thus, the quality of the translation may have affected the experimental results.
Future Scopes:
• Further research is required to improve the accuracy of classifying safety data,
particularly in the context of annotating training text.
• Also, future research needs to focus on creating larger datasets and using unsupervised
learning to improve the accuracy of text classification.
Reference: https://doi.org/10.1016/j.aei.2020.101060
27. Deep learning for site safety: Real-time detection of personal protective equipment
28. INTRODUCTION
• The U.S. Occupational Safety and Health Administration (OSHA) and similar agencies in other
countries require that all personnel, working in close proximity of site hazards, wear proper PPE to
minimize the risk of being exposed to or injured by hazards.
• According to a report by the National Institute for Occupational Safety and Health (NIOSH),
between 2003 and 2010, a total of 2,210 construction fatalities occurred because of
traumatic brain injury (TBI), which represented 25% of all construction fatalities during
that period.
[Figure: Percentage of fatal injuries caused by the “fatal four” in the construction industry in 2017.]
• Three deep learning models are introduced for real-time detection of Personal Protective
Equipment (PPE).
Reference: https://doi.org/10.1016/j.autcon.2020.103085
29. DEEP LEARNING MODELS
First Approach: The algorithm detects workers, hats, and vests, and then a machine learning
model (e.g., neural network and decision tree) verifies whether each detected worker is
properly wearing a hat or vest.
Second Approach: The algorithm simultaneously detects individual workers and verifies PPE
compliance with a single convolutional neural network (CNN) framework.
Third Approach: The algorithm first detects only the workers in the input image, which are
then cropped and classified by CNN-based classifiers (i.e., VGG-16, ResNet-50, and
Xception) according to the presence of PPE attire.
Reference: https://doi.org/10.1016/j.autcon.2020.103085
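To make the verification step of the first approach more concrete, the sketch below matches detected hats and vests to a detected worker using bounding boxes. Note that the source describes a learned model (e.g., neural network or decision tree) for this step; the hand-written geometric rule here is a simplified stand-in, not the paper's method. The box format `(x1, y1, x2, y2)` with the y-axis pointing down, the "top third" head region, and all function names are assumptions.

```python
def center(box):
    """Centre point of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def inside(point, box):
    """True if the point lies within the box (edges included)."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


def check_worker(worker, hats, vests):
    """Rule-based stand-in for the learned verifier (assumption)."""
    x1, y1, x2, y2 = worker
    # Assumed head region: the top third of the worker box.
    head_region = (x1, y1, x2, y1 + (y2 - y1) / 3.0)
    has_hat = any(inside(center(h), head_region) for h in hats)
    has_vest = any(inside(center(v), worker) for v in vests)
    return {"hat": has_hat, "vest": has_vest}


# Made-up detections: one worker, one hat near the head, no vest.
worker_box = (100, 50, 200, 350)
result = check_worker(worker_box, hats=[(120, 40, 180, 90)], vests=[])
print(result)  # {'hat': True, 'vest': False}
```

A learned verifier would replace `check_worker` and could use richer features than box centres, which is presumably why the paper relies on one.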
33. FUTURE SCOPE
• The dataset can be expanded to detect other common PPE components, e.g., safety glasses
and gloves.
• The DL model can also be extended to identify the non-compliant worker who is not
wearing the required PPE.
Reference: https://doi.org/10.1016/j.autcon.2020.103085
36. INTRODUCTION
• Various technologies, including vehicle-mounted laser profiling systems, have been developed and
adopted for road roughness (e.g., IRI—International Roughness Index) measurement; however,
their high cost limits their use.
• Yearly inspections with limited coverage using the vehicle-mounted laser profiling systems
may not effectively reflect the overall health conditions of our extensive road networks in a timely
manner.
• These IRI estimation approaches that use vehicle dynamics, no matter which sensors are used,
intrinsically require a precise calibration of the vehicle model. This is typically done with known
road profiles or bump-induced vehicle responses at controlled vehicle speeds.
Reference: https://doi.org/10.1111/mice.12546
So, What?
37. INTRODUCTION
• Regardless of the calibration method, existing IRI estimation approaches require precisely
calibrated vehicle models, which is impractical for ordinary passenger vehicles.
• For example, the number and locations of passengers in the vehicle may change every day, the
vehicle speed varies all the time, and suspension characteristics change with aging.
Furthermore, the dynamic properties of vehicle mechanics may also change over time. The
location, direction, and way of smartphone mounting may also be different every time. These
ambient variations are closely related to the vehicle dynamics (i.e., vehicle mass, damping, pitch
inertia, etc.), so the corresponding sensor measurements change significantly.
• Therefore, previous calibrations of the vehicle model quickly become invalid for the IRI estimation
under altered vehicle dynamics and sensor installations.
• This study develops a CNN-based road roughness (i.e., discrete IRI) estimation method
that utilizes anonymous passenger vehicles and their dynamic responses to compensate
for the drawbacks of current profiling-based technologies.
Reference: https://doi.org/10.1111/mice.12546
42. What Else?
1. Data Structures & Algorithms: will be used in the preparation of Deep Learning models.
2. Various terminologies regarding DL.
3. Object-Oriented Programming.