Attribution-NonCommercial-NoDerivs 2.0 Korea (CC BY-NC-ND 2.0 KR)
Provided you follow the conditions below, you are free:
- to copy, distribute, transmit, display, perform, and broadcast this work.
You must comply with the following conditions:
- For any reuse or distribution, you must clearly indicate the license terms applied to this work.
- These conditions do not apply if you obtain separate permission from the copyright holder.
Your rights under copyright law are not affected by the above.
This is a human-readable summary of the license (Legal Code).
Disclaimer
- Attribution. You must attribute the work to the original author.
- NonCommercial. You may not use this work for commercial purposes.
- NoDerivs. You may not alter, transform, or build upon this work.
Master Thesis
A Comprehensive Study on Nondestructive
Inspection of Reinforced Concrete Utility
Poles with ISOMAP and Random Forest
Saeed Ullah
S&T Information Science
UNIVERSITY OF SCIENCE AND TECHNOLOGY
February 2019
A Comprehensive Study on Nondestructive
Inspection of Reinforced Concrete Utility
Poles with ISOMAP and Random Forest
Saeed Ullah
A Dissertation Submitted in Partial Fulfillment of the Requirements
for the Degree of Master of Engineering
February 2019
UNIVERSITY OF SCIENCE AND TECHNOLOGY
S&T Information Science
Supervisor: Minjoong Jeong
We hereby approve the M.S.
thesis of “Saeed Ullah”.
February 2019
UNIVERSITY OF SCIENCE AND TECHNOLOGY
ACKNOWLEDGEMENT
The gratification that accompanies the successful completion of this thesis
would not be complete without mentioning the people who made it possible,
and whose motivation, encouragement, and guidance have been a vital source of
inspiration throughout the course of my master's research.
I would like to express my sincere gratitude to my supervisor Dr. Minjoong Jeong
for his valuable guidance, inspiration and productive suggestions throughout the
course of this work. I really thank him for his patience, motivation, and support, which
helped me throughout my stay at KISTI. It has been an honor for me to be his student.
I am also very grateful to all the other members of my thesis committee, Dr.
Ilyoup Sohn, Dr. Min Sun Yeom, Dr. Young Mahn Han and Dr. Ji Hoon Kang for
their valuable advice and comments. I thank them all for reviewing my thesis and
providing several insightful suggestions. Without their suggestions and comments,
I would not have been able to improve my thesis.
I would also like to thank all of my colleagues and friends in KISTI, UST and in
Korea who made my time enjoyable and memorable. I really thank all of them for
accepting me into their team with an open heart, for keeping me involved in various
team and cultural activities, for helping me get accustomed to Korean life,
for encouraging me, and for always being there to help. Thank you all for being
my family away from home.
Finally, I would like to thank my family for all of their motivation, support and
love. I would especially like to thank my father and mother for their unconditional
love, support and blessings. I also thank my younger sister for always inspiring and
encouraging me to work harder.
I would like to dedicate this thesis to my parents, who worked very hard to provide
me with the best education and always encouraged me to achieve higher and to follow
my own dreams.
초록 (Abstract in Korean)
A Study on Nondestructive Inspection of Reinforced Concrete
Utility Poles Using ISOMAP and Random Forest
Reinforced concrete utility poles are very common in transmission lines owing to
their increased mechanical strength, improved electrical resistance, and economic
efficiency. Under actual service conditions, these poles can develop structural
safety problems caused by cracking, corrosion, deterioration, and breakage of the
internal reinforcing steel wires. Periodic safety inspections are therefore
required to evaluate the structural safety of the poles. However, on-site
nondestructive inspection of the reinforcing steel inside a pole is complicated
and uneconomical. This study proposes a machine-learning-based method that uses
multi-channel magnetic field signals, which can be acquired simply and
efficiently from actual poles. The collected sensor data were used for pattern
learning to determine whether a pole is safe or damaged. To this end, the
collected multidimensional sensor data were reduced in dimensionality with the
ISOMAP (Isometric Feature Mapping) algorithm, and the damage state was learned
with a random forest. With the proposed method, the damage state of the steel
wires inside a pole could be predicted effectively using only the measured sensor
data. Combined with portable sensor equipment to be developed in the future, this
method can be applied to evaluating the structural safety of reinforced concrete
utility poles.
ABSTRACT
A Comprehensive Study on Nondestructive Inspection of
Reinforced Concrete Utility Poles with ISOMAP and
Random Forest
Reinforced concrete poles are very popular in transmission lines due to their
economic efficiency, enhanced mechanical strength and better electrical
resistance. However, these poles have structural safety issues in their service
terms caused by cracks, corrosion, deterioration, and short-circuiting of
internal reinforcing steel wires. Therefore, these poles must be periodically
examined to evaluate their structural safety. There are many methods of
performing external inspection after installation at an actual site. However,
on-site nondestructive safety inspection of steel reinforcement wires inside
poles is very difficult. In this study, we developed a machine-learning based
application that classifies the magnetic field signals of multiple channels
acquired from the actual poles. Initially, the signal data were gathered by
inserting sensors into the poles, and these data were then used to learn the
patterns of safe and damaged features. These features were then processed
with the isometric feature mapping (ISOMAP) dimensionality reduction
algorithm. Subsequently, the resulting reduced data were processed with the
random forest algorithm. The proposed method could determine whether the
internal wires of the poles were broken based on the actual sensor data.
This method can be applied for evaluating the structural integrity of
concrete poles in combination with portable devices for signal measurement
(under development). 

*A thesis submitted to the committee of the University of Science and Technology in partial
fulfillment of the requirements for the degree of Master of Engineering conferred in
February 2019.
Contents
1. Introduction ............................................................ 1
   1.1 Introduction ........................................................ 1
   1.2 Inspection Methods of Concrete Materials ............................ 2
       1.2.1 Advantages of NDT Methods .................................... 3
       1.2.2 Different Types of NDT Methods ............................... 4
   1.3 Literature Review ................................................... 7
   1.4 The Proposed System ................................................. 8
   1.5 Thesis Organization ................................................. 9
2. Experimental Setup ..................................................... 10
   2.1 Experimental Setup ................................................. 10
3. The Proposed System .................................................... 13
   3.1 The Dataset ........................................................ 13
   3.2 Dimensionality Reduction and its Importance ........................ 14
   3.3 The Flowchart of the Proposed System ............................... 16
   3.4 ISOMAP ............................................................. 17
       3.4.1 Deciding the Number of Dimensions for the ISOMAP Algorithm .. 19
   3.5 Random Forest ...................................................... 21
4. Results and Discussion ................................................. 26
   4.1 Performance Evaluation of Our Classifier ........................... 26
   4.2 Comparison of Random Forest with SVM and Decision Trees on Our Data 31
5. Conclusions ............................................................ 35
References ................................................................ 36
List of Figures
1. Magnetic sensing device with eight channels ............................ 11
2. Field testing; laboratory testing; a broken steel wire; multiple steel
   wires .................................................................. 12
3. A safe signal; a crack signal; safe signals; and crack signals ......... 14
4. Flowchart of the proposed system ....................................... 16
5. Flowchart of the ISOMAP algorithm ...................................... 18
6. The residual variance of ISOMAP in the dataset ......................... 20
7. Random forest of three decision trees .................................. 23
8. ROC curve of random forest algorithm ................................... 30
9. ROC curve of SVM ....................................................... 32
10. ROC curve of decision tree ............................................ 32
11. Performance measure graph of different algorithms ..................... 33
List of Tables
1. Specifications of all the parts of the magnetic sensing device ......... 12
2. Sample dataset for illustration of random forest algorithm ............. 22
3. Illustration of vote casting mechanism in random forest ................ 23
4. Performance evaluation with different number of trees .................. 25
5. Confusion matrix of the proposed system ................................ 27
Chapter 1
Introduction
1.1. Introduction
Reinforced concrete poles are widely used for telephone and electric transmission. The cost
effectiveness, longer life span (over 50 years), greater mechanical strength, potential to cover
longer distances, and better electrical resistance are some of the key reasons for their
widespread usage [1]. Compared with steel poles, reinforced concrete poles offer greater
variety in architectural shapes, better electrical resistance, lower maintenance costs, and no
emission of hazardous materials. Compared with timber poles, concrete poles are more
resistant to hurricanes and are not susceptible to decay or fire [2]. The disadvantage of these poles
is the high cost of transportation because of being bulky and heavy. However, the heavy weight
of these poles also helps in resisting the high winds in coastal areas. In reinforced concrete
poles, structural safety defects can occur mainly because of cracks, spalling, corrosion,
deterioration and short-circuit of internal reinforcing steel. The main causes of these problems
in reinforced concrete poles are poor construction practices, atmospheric exposure, hurricanes,
flood, earthquakes, moisture changes, various mechanical, physical and chemical reactions [1,
3]. Corrosion in reinforcing steel and deterioration of the concrete can lead to the loss of service
properties and failure of reinforced concrete supports. Corrosion of steel reinforcement is a
major cause of deterioration in structural concrete [4]. Rust and other corrosion products in
reinforced concrete can cause tensile stresses, cover cracking, delamination, and spalling [5].
Corrosion can severely affect the mechanical behavior of reinforcement, and the damage it
causes is troublesome and very expensive to repair [3 – 6]. Structural problems need more
attention than non-structural problems: about half of the accidents caused by utility poles are
related to structural problems [1, 7]. These defects are a major reason for the reduced life
expectancy, structural strength, and serviceability of reinforced concrete utility poles. Structural
problems can lead to failure of poles at later stage. Moreover, the failure of poles can cause
serious problems, such as falling onto people or animals, causing injury (or death), and service
disruption. Earlier detection of such problems can save precious lives, time, and money.
Therefore, it is necessary to identify such issues at an early stage, for which accurate inspection
and regular monitoring are mandatory. Further, reinforced concrete materials deteriorate
with the passage of time due to challenging environments and various loads, which affects their
strength and serviceability severely. Deterioration in reinforced concrete structures occurs at
varying levels, and in many situations the impairment is invisible because it lies under the
ground or inside the poles. Therefore, to ensure the structural integrity of reinforced concrete
utility poles, poles already in service must be regularly monitored or frequently inspected
[8].
It is very important to perform accurate assessment of the integrity of existing poles because
maintenance and repair costs are growing rapidly. Hence, there is a rising demand to develop
more accurate, consistent, and reliable inspection methods for assessing the condition of in-service
poles [2]. Stewart et al. [5] argued that the lack of long-term predictive capability
means that structural reliability assessments must be updated at regular intervals.
1.2. Inspection Methods of Concrete Materials
Inspection or evaluation methods for concrete materials can be broadly divided into two
main types: semi-destructive (partially destructive) inspection and non-destructive inspection.
In semi-destructive methods, the surface of the concrete is slightly damaged and could be repaired
once the inspection is performed. Examples of semi-destructive testing methods are core tests,
pullout and pull off tests [9]. Further, such inspection methods may require the product to be taken
out of service for inspection and are labor intensive. For in-service utility poles, this is not an
option since it would require the power line to be temporarily shut down and possibly the utility
pole may be dismounted. In non-destructive tests (NDT), inspection can be performed without
harming the surface of the concrete. These tests are sometimes called Nondestructive
Evaluation (NDE) [9, 10]. Semi-destructive methods are beyond the scope of this thesis; the main
focus here is on NDT methods.
1.2.1. Advantages of NDT Methods
Direct sampling and observation of a structure are not always possible when the damage is
internal or when physical access to the structure of interest is restricted. In
such situations, NDT methods are commonly used [10]. NDT is defined as the process of
inspecting, testing, or evaluating materials without affecting the structure or serviceability of
any part of the system. Carrying out assessment without affecting functionality of the system
is the main aim of NDT [10, 11]. NDT has the potential to be performed with minimal
expenditures of time and manpower. NDT techniques show good sensitivity to concrete
properties and defects. NDT techniques are able to easily identify delaminated, cracked and
spalled portions of the concrete. NDT methods are also able to provide information about
presence of voids, honeycomb, measurements of size and location of steel reinforcement,
corrosion action on the reinforcement, and the extent of damage caused by chemical exposure,
accidental fire, or freezing and thawing. In recent years, many more accurate, reliable, and
quantifiable NDT methods have been developed [9 – 11].
Non-destructive test methods are being progressively developed and applied to identify
different kinds of defects in concrete structures. In the civil and structural engineering industry,
a wide range of NDT methods are being used for the assessment of concrete structure [11]. This
abundance in applications of NDT methods is due to technological advancement in hardware
and software for data acquisition, improvement in data collection and analysis techniques and
the ability to perform quick and inclusive assessments of existing construction [12].
Different NDT methods for concrete (pre-stressed and reinforced) materials along with
their shortcomings are described in the next section.
1.2.2. Different Types of NDT Methods
NDT methods for concrete (pre-stressed and reinforced) materials can be categorized into
different types, which are: visual inspection, stress wave based methods, nuclear based
(radiometric and radiographic) methods, magnetic and electrical methods, penetrability
methods, infrared thermography and radar [9 – 21].
1.2.2.1. Visual Inspection
Visual inspection is the most traditional inspection technique. Visual inspection can provide
valuable information to a well trained eye [9]. In this method, inspectors usually drill holes in
poles, and record the signs of deterioration. As the inspection proceeds, a careful and complete
record of all the observations should be made. Visual inspection is useful for identification of
the impaired spots and assessing corrosion in uncovered reinforcement. This method is also
helpful in providing an early indication of the condition of the concrete for allowing the
formulation of a follow-up testing program [9]. The devices used in this method are microscopes,
optical magnifiers, magnifying glasses or small digital video cameras, and rulers [9, 12]. The
disadvantages of this method are that it requires highly qualified and experienced
inspectors and that only visible surfaces can be inspected. Internal defects therefore go
unassessed, and in most cases detailed properties of the concrete cannot be determined.
1.2.2.2. Stress Wave Techniques
Stress wave techniques have also been successfully applied for the inspection of reinforced
concrete. In these techniques, waves are propagated through the medium [9]. Several stress
wave based testing techniques have been developed for instance, impact-echo method,
ultrasonic through transmission method, ultrasonic-echo method, impulse response, acoustic
emission, and Spectral Analysis of Surface Waves (SASW) [10, 12]. The equipment required
for such kind of tests are sensors for wave detection, a source for propagation of waves, a data
acquisition and analysis system [10, 20]. These methods are very effective in locating large
voids or delaminations in concrete structures. However, these methods have some limitations:
they need experienced operators and expensive devices, they involve complex signal
processing, and they do not provide information on the depth of a defect [10, 12].
1.2.2.3. Nuclear Methods
Different kinds of nuclear methods have also been developed for nondestructive inspection
of concrete materials [9]. Nuclear based techniques are usually subdivided into two groups:
Radiometric and Radiographic techniques [12]. In these methods, information regarding the
test object is gained with the help of high-energy electromagnetic radiation [12]. Radiometry
is using electromagnetic radiation (gamma rays) to assess the density of fresh or hardened
concrete [12]. In radiometry-based techniques, radiation is emitted by a radioactive isotope and
detected by a sensor [9, 12]. Radiography-based methods work like those used to produce
medical X-rays: radiation is passed through the test object to produce a photograph of the
internal structure of the concrete [12].
Photographic film records of internal structure of concrete are usually made in both of these
methods. In these methods, the exposure of the film is proportional to the intensity of the
transmitted radiation: the higher the intensity, the greater the exposure, and
vice versa [12]. Problems associated with nuclear methods include the need for licensed
operators, measurements that are sensitive to chemical composition and affected by near-surface
material, bulky and expensive X-ray equipment, difficulty identifying cracks
perpendicular to the radiation beam, and safety issues. Radiographic procedures are also
extremely costly, and the high voltages used by X-ray equipment are very dangerous.
Moreover, the images usually do not convey significant detail about the depth of a defect,
and only a small surface area can be examined because of the limited coverage of the sensitive
film [12].
1.2.2.4. Magnetic and Electrical Methods
Many magnetic and electrical based techniques have also been developed for the inspection
of the concrete structure. Some of the magnetic and electrical methods are covermeters, half-cell
potential, and linear polarization [10, 12]. In these techniques, magnetic, electric, or
electromagnetic fields are generated around the surface of inspection and then those fields are
analyzed for further results. Usually, these methods require knowledge about the quantity and
location of reinforcement for evaluating the strength of reinforced concrete members [9, 12].
Some of the limitations of these methods are lower accuracy, the need for experienced personnel
for testing and interpretation, and the need for an electrical connection to the reinforcement. In
most cases, these methods cannot identify the presence of a second layer of reinforcement inside
a concrete structure [12].
1.2.2.5. Penetrability Methods
Different penetrability methods such as Initial Surface-Absorption Test (ISAT), Figg (water-
absorption Test), CLAM test, Steinert method, Figg air-permeability test, Schönlin test, and
Surface airflow test are also being used for the assessment of concrete structure [12]. These
methods are usually very simple and inexpensive to perform. However, problems
associated with these methods include sensitivity to the moisture condition of the concrete, the
need for drilling in most of the penetrability methods, and lengthy test times. The concrete
surface is also damaged by these methods [12].
1.2.2.6. Infrared Thermography Methods
Infrared thermography based methods are used for identifying subsurface defects within and
below concrete structures [9, 10, 12]. In these methods, the emission of thermal radiations is
sensed from the inspection surface and a visual image is produced [9, 15]. These methods have
extensively been applied for internal voids and cracks identification in the concrete structures
[9, 12]. Some of the limitations of infrared-based methods are the need for expensive
equipment, the requirement of suitable environmental conditions for testing, and the inability to
measure the depth and thickness of a subsurface anomaly. The test response varies with
environmental conditions, and trained individuals are required for meaningful and correct
interpretation of the acquired data [9, 12].
1.2.2.7. Radar (Radio Detection and Ranging)
Radar is a well-known NDT method for the inspection of concrete materials [9]. Radar is also
referred to as Ground Penetrating Radar (GPR). In radar, electromagnetic energy is propagated
through different dielectric materials [9]. An electromagnetic pulse is radiated
by a transmitter antenna, reflected at the surface and internal layer boundaries
of the inspection object, and then recorded by a receiver
antenna [9, 10]. Radar is most useful for recognizing delaminations and the types of defects
occurring in plain or overlaid reinforced concrete decks. Radar has shown good ability in void
detection and in measuring the thickness of concrete materials, and it can
scan large surface areas in a short period of time [9, 10]. Among the limitations of
radar are the large amount of data acquired during scans and the experienced operators required
for operating the equipment and interpreting the results. Further, the pulses
from high-resolution antennae have very limited penetration depth [9, 12].
Beside all of the above well-known NDT categorized methods, numerous case studies exist
in which different NDT based methods have been merged. Different NDT methods are combined
for the purpose of achieving the best performance, improving the assessment results
interpretations, accomplishing quick and accurate results or reducing the limitations of
individual methods [10].
1.3. Literature Review
Dackermann et al. [22] developed an NDT method based on guided wave propagation and
machine-learning techniques for timber utility poles. They combined improved signal
processing methods with multi-sensor based system and advanced machine-learning
algorithms. Wave signals were captured through sensors, and then machine-learning algorithms
were applied in order to evaluate the condition of the pole. They achieved accuracy up to 95%
with the use of different classification algorithms.
Recently, Dackermann et al. [23, 24] developed NDT methods for timber poles, self-
compacting concrete poles without steel reinforcement, and generic concrete poles without
steel reinforcement, with the use of advanced signal processing and machine-learning
techniques. They applied stress wave methods for data gathering. They used a large number of
lower price tactile transducers and accelerometers (along with other devices). They achieved
accuracy up to 93%. They applied Principal Component Analysis (PCA) for the purpose of
feature reduction and Support Vector Machines (SVM) for the classification of signals
regarding prediction of the pole damage condition. They adopted Frequency Response
Functions (FRFs) and PCA for extracting signal features when capturing single-mode stress
waves for condition assessment.
To the best of our knowledge, the literature contains hardly any NDT method that applies
machine learning or data analysis techniques specifically to
reinforced concrete utility poles.
1.4. The Proposed System
In this study, we propose a novel NDT method for structural health monitoring of reinforced
concrete utility poles based on dimensionality reduction and machine learning techniques. For
the purpose of training the machine-learning algorithm, the data gathered through magnetic sensors
at an actual site by SMART C&S [25] were used. Further, ISOMAP [26] is applied to the data for
feature reduction, and the resulting refined data constitute the input to a random forest classifier
[27, 28]. In particular, the refined signals are trained with a random forest method and are
categorized into safe and crack signals that are based on actual field experiments. Here, safe
signals are those corresponding to non-damaged wires, whereas crack signals correspond to
damaged wires. Two other well-known machine-learning algorithms are also applied on the
data: Support Vector Machine (SVM) [29] and decision trees [30].
All of the algorithms used in this study were written and executed in Python with the
scikit-learn machine-learning library [31].
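The overall pipeline can be sketched in scikit-learn as follows. This is a minimal illustration, not the thesis implementation: the data here are random stand-ins for the sensor signals (the actual dataset is proprietary), and the hyperparameters (n_neighbors, n_components, n_estimators, test split) are assumptions for the sketch rather than the values used in the study.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the real data: 240 signals, 101 Hall Effect features.
rng = np.random.default_rng(0)
X = rng.normal(size=(240, 101))
y = rng.integers(0, 2, size=240)  # 0 = safe signal, 1 = crack signal

# Step 1: reduce the 101-dimensional signals with ISOMAP.
X_low = Isomap(n_neighbors=10, n_components=3).fit_transform(X)

# Step 2: train a random forest on the reduced features.
X_tr, X_te, y_tr, y_te = train_test_split(X_low, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"test accuracy: {acc:.2f}")
```

With random labels the accuracy is meaningless; on the real dataset, this same reduce-then-classify structure follows the flowchart described in Chapter 3.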
1.5. Thesis Organization
This thesis is organized as follows: In Chapter 1, reinforced concrete poles are briefly
introduced and compared with timber and steel poles. Different types of inspection methods on
concrete materials along with the benefits and the need of NDT methods are concisely
explained. Further, different types of NDT methods and their limitations are described in this
chapter. At last, the literature review and the proposed system are presented. In Chapter 2, the
complete experimental setup for data gathering is described. The proposed system resulting
from this research work is elaborated in Chapter 3. The experimental results are presented and
discussed in Chapter 4. Finally, the conclusions and the directions towards future work are
given in Chapter 5.
Chapter 2
Experimental Setup
In this chapter, the data-gathering techniques applied, the devices used for data
gathering, and their complete specifications are described.
2.1 Experimental Setup
The complete setup for data gathering consists of a magnetic sensing device, a Data
Acquisition (DAQ) device, a cable, and a portable computer. The magnetic sensing device (a Hall
Effect sensor) detects the magnetic field; the DAQ device converts the
magnetic field signals into digital values that can later be processed by a computer; and the
cable connects the magnetic sensing device to the DAQ device. The whole
arrangement of these devices is shown in Figure 1. To inspect a utility pole, the magnetic sensing
device can be inserted into the pole through holes, and the signals can be gathered with the help
of the magnetic sensing and DAQ device. Later, this device can be attached to a portable
computer to check the signals with the patterns of this application (the proposed system).
Regarding the training of the algorithm, the signals were gathered with the same device and the
dataset was manually configured by the field engineers of SMART C&S. They collected signals
from 30 reinforced concrete utility poles containing signals from damaged and non-damaged
wires. They labeled each signal manually on the basis of the condition of each wire. For
collecting signals from those 30 reinforced concrete utility poles, they uninstalled the utility
poles and broke down the concrete cover over the steel wires to pull the wires out of the concrete. In
particular, the field engineers pulled all of the wires out of those 30 poles and gathered the signals
from each wire. They inserted the sensors into the poles through the maintenance bolt hole
and gathered the signals from the ground portion up to the bolt hole of the pole, because
defects in the wires inside poles occur mostly in the ground portion. Note that uninstallation
of the pole is not necessary during the inspection. The signals gathered from those 30 utility
poles can be used for the whole lifetime of the proposed system for the inspection.
Furthermore, the measuring section for signal gathering inside the poles was set to a
maximum height of 4 m, and the measurement rate was fixed to 0.3 m/s. The diameter of steel
wires in all of the poles was from 9 to 12 mm, and there were 16 total steel (eight tension and
eight reinforcing) wires in each utility pole. The thickness of the concrete cover over the steel
wires was from 12 to 24 mm in each pole. They tried to maintain a constant speed in all of the
experiments and repeated every experiment on each wire 3 to 4 times. The field engineers
gathered 101 Hall Effect values for every signal from all of the wires. These Hall Effect values
are then referred to as “features” in the dataset. The dataset could contain many ambiguous,
repeated, unnecessary and insignificant features. Therefore, the dataset was filtered into
meaningful features for the purpose of using it with a classification algorithm, which is further
explained in Chapter 3.
The eight-channel magnetic sensing device along with all the parts is depicted in Figure 1.
The specifications of all the parts of this device are displayed in Table 1.
Figure 1: Magnetic sensing device with eight channels.
Table 1: Specifications of all the parts of the magnetic sensing device.
Sensor and DAQ
  ADC Resolution: 16 bit
  ADC Input Channels: 8 differential input channels
  ADC Sampling Rate: 50 S/s
Main Cable
  Length: 6 m
  Diameter: 22 mm
Figure 2(a) illustrates the process of removing a maintenance bolt and securing the bolt hole
to insert the magnetic sensing device into a working utility pole. Figure 2(b) shows an example
of laboratory verification experiments of the data-gathering device. Figure 2(c) represents an
example of a broken steel wire and Figure 2(d) shows multiple steel wires inside the poles.
Figure 2. (a) Field testing; (b) laboratory testing; (c) a broken steel wire; (d) multiple steel wires.
Chapter 3
The Proposed System
In this chapter, the dataset used for training the algorithm is explained, the
signals from the dataset are presented in graphical form, and the need for dimensionality
reduction techniques is explained. Further, the proposed system is elaborated in a complete
step-wise manner.
3.1. The Dataset
The dataset for this research project is composed of n samples with m features,
R = {(x_i^u, y_i), i = 1, 2, ..., n}, where n = 240, m = 101, and u = 1, 2, ..., m. Here x_i is
the input data, and every row x_i has a label y_i ∈ {0, 1}, where 0 denotes a safe signal and
1 indicates a crack signal. Each row x_i represents a signal, and u indexes the features of
each signal, which are the Hall Effect values of that signal.
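In code, this dataset maps naturally onto a feature matrix and a label vector. The sketch below uses random placeholder values, since the actual sensor readings are not reproduced here; only the shapes and label encoding follow the description above.

```python
import numpy as np

n, m = 240, 101                     # n samples, m Hall Effect features per signal
rng = np.random.default_rng(42)
X = rng.normal(size=(n, m))         # placeholder for the signals x_i
y = rng.integers(0, 2, size=n)      # labels y_i: 0 = safe, 1 = crack

signal, label = X[0], y[0]          # one signal and its label
print(signal.shape)                 # (101,): one Hall Effect value per feature
print(np.unique(y).tolist())        # the two classes present in the labels
```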
Figure 3 depicts plots of safe and crack signals from the dataset in two-dimensional graphs.
Figure 3(a) represents a single sample of a safe signal from the dataset. Figure 3(b) shows a
single sample of a crack signal in the dataset. Figure 3(c) shows all of the safe signals in the
dataset. Figure 3(d) depicts all the crack signals from the dataset.
Figure 3. (a) A safe signal; (b) a crack signal; (c) safe signals; and, (d) crack signals.
3.2. Dimensionality Reduction and its Importance
Because the data were gathered with magnetic sensors, the dataset contains many
replicated features. To remove these replicated features, it is essential to apply a dimensionality
reduction technique. Dimensionality reduction is also important for finding meaningful low-dimensional
hidden structures in high-dimensional data and for improving the performance of
classification algorithms. Dimensionality reduction techniques project higher-dimensional
data into a lower-dimensional space while retaining most of the relevant information. This helps
with data compression, reduces storage space, makes data much easier for machine learning
algorithms to analyze, and is also helpful for visualization.
Dimensionality reduction techniques are very useful in various domains, for example,
document categorization, protein disorder prediction, life sciences, computer vision based
tasks and machine learning models.
Dimensionality reduction can be expressed mathematically for a given dataset. Consider a dataset represented as a matrix X of size r × s, where r is the number of rows and s the number of columns. Normally, the rows of the matrix represent data points while the columns are treated as features. Dimensionality reduction reduces the number of features of each data point, turning the matrix X into a new matrix X' of size r × t, where t < s. For visualization purposes, t is usually set to 1, 2, or 3.
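As a minimal sketch of this r × s → r × t reduction (using scikit-learn's PCA purely for illustration; the matrix sizes are hypothetical, not the thesis data):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data matrix X of size r x s: r = 240 data points, s = 101 features
rng = np.random.default_rng(0)
X = rng.random((240, 101))

# Reduce each data point to t = 2 features (t is usually 1, 2, or 3 for visualization)
X_prime = PCA(n_components=2).fit_transform(X)

print(X.shape)        # (240, 101)
print(X_prime.shape)  # (240, 2)
```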
Many algorithms have been developed to compute the dimensionality reduction of a dataset [32]. Simpler algorithms such as Principal Component Analysis (PCA) produce an embedding by maximizing the variance retained in the data. State-of-the-art algorithms instead seek a combination of features such that the distances of the high-dimensional space are well preserved in the new embedding. Dimensionality reduction remains an active area of research, with new techniques being developed to produce better embeddings.
3.3. The Flowchart of the Proposed System
The overall flow of the proposed system is depicted in Figure 4.
Figure 4: Flowchart of the proposed system.
As the flowchart in Figure 4 illustrates, ISOMAP is applied for the dimensionality reduction of the data; it is a manifold-based global geometric framework used mainly for non-linear dimensionality reduction. In recent years, numerous non-linear dimensionality reduction algorithms have been developed to address the limitations of traditional linear techniques [33, 34]. Many linear techniques, for instance PCA, classical scaling, and factor analysis, often fail to handle non-linear data properly [32, 34]. In contrast, non-linear methods can perform well on high-dimensional non-linear data. In the last decade, non-linear dimensionality reduction techniques have become very popular owing to their superior performance on high-dimensional data compared with linear techniques [34]. Maaten et al.
[34] applied twelve non-linear dimensionality reduction algorithms and PCA to ten datasets (five artificial and five natural) and showed that most of the non-linear techniques performed better than the linear one on high-dimensional complex datasets. Jeong [35] showed that ISOMAP outperforms PCA (a linear dimensionality reduction technique) on high-dimensional data.
Finally, the random forest classification algorithm is applied to the ISOMAP-reduced data, and training is stopped once the out-of-bag (OOB) error falls below 0.05 (5%). All of these steps and algorithms are explained in detail in the following sections.
3.4. ISOMAP
ISOMAP is a well-known manifold-based dimensionality reduction technique. A manifold is any topological space that is locally Euclidean; roughly, any object that is approximately "flat" on small scales is a manifold. Manifold learning is an approach to non-linear dimensionality reduction. It refers to uncovering the manifold structure in a dataset: the data usually lie along a low-dimensional manifold embedded in a high-dimensional space, where the lower-dimensional space reveals the primary parameters while the higher-dimensional space represents the feature space [36]. Manifold learning is an important problem in various data processing domains, for instance pattern recognition, machine learning, data compression, and database navigation [33]. Manifold-based algorithms work on the notion that the dimensionality of a dataset is merely artificially high [36].
ISOMAP is one of the first algorithms introduced specifically for manifold-based learning [36], and it recovers the manifold structure very well. ISOMAP is an extension of Multidimensional Scaling (MDS), a classical method for embedding dissimilarity information into a Euclidean space. In ISOMAP, the global geometric features of a dataset are preserved in the embedding space. The main idea of ISOMAP is to replace Euclidean distances with an approximation of the geodesic distances on the manifold, where geodesic distance is the distance between two points measured along the manifold. For every point, ISOMAP estimates geodesic distances using shortest-path distances and then, using classical MDS, computes an embedding of these distances in a Euclidean space. ISOMAP attempts to map points that are neighbors on the manifold to neighboring points in the low-dimensional space, and faraway points to faraway points.
ISOMAP is a three-step process, as shown in Figure 5.
Figure 5: Flowchart of the ISOMAP algorithm.
The three steps in the flowchart of Figure 5 are explained as follows:
1) Construct neighborhood graph; k-nearest neighbors of every data point are
defined and represented by a graph G; in G, every point is connected to its nearest
neighbors by edges.
2) Compute the shortest paths; the geodesic distances between all the pairs of
points are estimated using the Dijkstra algorithm [37]; the squares of these
distances are stored in a matrix of graph distances D(G).
3) Construct d-dimensional embedding; the classical MDS algorithm is applied on
D(G) in order to find a new low-dimensional embedding of the data in a d-
dimensional Euclidean space Y.
Recently, ISOMAP has been extended for handling conformal mappings and very large data
sets [36, 38].
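The three steps above can be sketched with scikit-learn's Isomap implementation (the input matrix and parameter values here are hypothetical stand-ins, not the thesis data):

```python
import numpy as np
from sklearn.manifold import Isomap

# Hypothetical stand-in for a 240 x 101 signal matrix
rng = np.random.default_rng(0)
X = rng.random((240, 101))

# Step 1 uses k = n_neighbors to build the graph G; steps 2 (shortest paths)
# and 3 (classical MDS on D(G)) are carried out internally by fit_transform
iso = Isomap(n_neighbors=5, n_components=8)
Y = iso.fit_transform(X)

print(Y.shape)  # (240, 8): the d-dimensional embedding with d = 8
```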
3.4.1. Deciding the Number of Dimensions for the ISOMAP Algorithm
Finding the intrinsic dimensionality for a dimensionality reduction algorithm is very important. Setting the dimensionality lower than the intrinsic dimensionality increases the risk of losing important information, whereas setting it higher makes the algorithm very slow and lets it retain many redundant and repeated features. Maaten et al. [34] noted that the classification performance of non-linear dimensionality reduction techniques on many natural datasets did not improve unless the intrinsic dimensionality estimator was used properly. It is therefore essential to estimate the intrinsic dimensionality before applying a dimensionality reduction technique. In this case, the dataset is composed of 240 data items with 101 features, and the intrinsic number of dimensions of the ISOMAP space must be determined first. For this purpose, the residual variance [26, 39] has been calculated, which is typically used to evaluate the error of dimensionality reduction. Residual variance is defined in Equation (1), as follows:
Rd = 1 − r²(G, Dd)    (1)

where Rd is the residual variance, G is the geodesic distance matrix, Dd is the Euclidean distance matrix in the d-dimensional space, and r(G, Dd) denotes the correlation coefficient of G and Dd. The value of d is determined by trial and error so as to reduce the residual variance.
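Equation (1) can be computed from a fitted ISOMAP model; the sketch below assumes scikit-learn's Isomap (whose dist_matrix_ attribute holds the geodesic distances G) and random stand-in data:

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
X = rng.random((240, 101))  # hypothetical stand-in for the signal matrix

iso = Isomap(n_neighbors=5, n_components=8)
Y = iso.fit_transform(X)

G = iso.dist_matrix_         # geodesic distance matrix G
D_d = pairwise_distances(Y)  # Euclidean distances in the d-dimensional space

# Rd = 1 - r^2(G, Dd), with r the correlation coefficient of the flattened matrices
r = np.corrcoef(G.ravel(), D_d.ravel())[0, 1]
R_d = 1.0 - r ** 2
print(R_d)  # residual variance; repeating this over d locates the "elbow"
```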
Figure 6: The residual variance of ISOMAP in the dataset.
Figure 6 clearly shows that the residual variance decreases as the number of dimensions d is increased. It is recommended in [26, 39] to select the number of dimensions at which this curve stops decreasing significantly with added dimensions. To estimate the intrinsic dimensionality of our data, we therefore searched for the "elbow" at which the curve flattens; the arrow marks the approximate intrinsic dimensionality of the dataset. As long as increasing the dimensions still reduces the residual variance, ISOMAP explains the data better and the classification algorithm performs somewhat better. For example, on this dataset, 50 dimensions (residual variance 0.03) perform slightly better than 8 dimensions. We nevertheless set eight dimensions for the classification algorithm because of the computational cost of both ISOMAP and the random forest, and because eight dimensions already achieved good performance. Once the residual variance reaches zero, increasing the dimensions no longer improves performance.
3.5. Random Forest
ISOMAP reduces the dataset to eight dimensions, on which a random forest classification algorithm is applied to learn the features in the data. The random forest is an ensemble method, and ensemble methods generally perform better than single classifiers. Ensemble methods are built from a set of classifiers and weigh their predictions to render the final output [40]. They employ more than one classification technique, combine the results, and are notably less prone to overfitting. Various ensemble methods have been proposed, of which boosting [41] and bagging [42] of classification trees are the most famous. In boosting-based techniques, consecutive trees assign extra weight to points inaccurately predicted by previous predictors, and a weighted vote is taken at the end for prediction; the performance of a tree thus depends on the performance of the preceding trees. In bagging-based techniques, each tree is grown independently using a bootstrap sample from the dataset, consecutive trees do not rely on previous trees, and a majority vote is taken at the end for prediction. In regular trees, each node is split among all variables using the best-split method; in the random forest, every node is split among a subset of predictors randomly picked at that node, again using the best-split method [28]. Random forest is considered an enhanced version of bagging. It is widely used in recent research projects and real-world applications [43], and has successfully been applied in land cover classification [44], bioinformatics [45], pattern recognition [46], ecology [47], medicine [48], astronomy [49], and much more.
To demonstrate the random forest algorithm and its voting mechanism, a simple dataset is taken, shown in Table 2. Table 2 contains eight samples with four features. A random forest is formed to predict the value of the Play feature, that is, whether a child is able to play or not; the other features are Outlook, HWDone (Homework Done), and Weekend. From the dataset in Table 2, it is clear that if the child has completed his homework on a sunny day, the child can play whether it is a weekday or the weekend, while on a rainy weekday the child cannot play even if he has finished the homework.
Table 2: Sample dataset for illustration of random forest algorithm.
Outlook HW Done Weekend Play
Sunny True True Yes
Sunny True False Yes
Sunny False True Yes
Sunny False False No
Rainy True True Yes
Rainy True False No
Rainy False True Yes
Rainy False False No
To classify new samples, a random forest of three decision trees is formed, as illustrated in Figure 7. Table 3 displays the outcome of vote casting for classifying a sample. From Table 3, it is clear that decision trees A and C vote "Yes" while decision tree B votes "No". By majority voting, the winning vote is therefore "Yes", so in this case the child can play.
Figure 7: Random forest of three decision trees.
Table 3: Illustration of vote casting mechanism in random forest.
Decision Tree Vote
A Yes
B No
C Yes
This demonstration of the random forest algorithm, along with the dataset and Figure 7, was adapted from [50].
Furthermore, the random forest algorithm consists of three steps, which are:
1) Construct ntrees bootstrap samples from the input data while using the
Classification And Regression Trees (CART) [51] methodology.
2) Grow an unpruned tree for each of the ntrees, randomly sample mtry of the
predictors, and choose the best split among those variables.
3) Aggregate the predictions of the ntrees trees to predict new data. In this
study, the scikit-learn [31] implementation is used, which combines the
trees by averaging their probabilistic predictions.
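The three steps can be sketched with scikit-learn's RandomForestClassifier, which grows ntrees CART trees on bootstrap samples and averages their probabilistic predictions (the data and labels below are hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical ISOMAP-reduced data: 240 samples, 8 features, labels 0 = safe / 1 = crack
rng = np.random.default_rng(0)
X = rng.random((240, 8))
y = rng.integers(0, 2, size=240)

# n_estimators = ntrees (step 1); max_features = mtry predictors tried per split (step 2)
clf = RandomForestClassifier(n_estimators=8, max_features=3, random_state=0)
clf.fit(X, y)

# Step 3: the forest averages the per-tree class probabilities for new data
proba = clf.predict_proba(X[:5])
print(proba.shape)  # (5, 2): one probability per class for each sample
```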
In the random forest technique, the generalization error of a forest depends on the strength of the individual trees and the correlation between the trees in the forest [27]. Breiman [27] applied random forest and Adaboost [52] (a tree-based ensemble method built on boosting principles) to 13 different datasets and reported that the random forest did not overfit during training, while Adaboost overfitted in many cases. Breiman [27] further noted that the random forest worked well with categorical variables and also performed well on data with weak variables. The accuracy of the random forest model was comparable to that of Adaboost and occasionally superior, and random forest handled noisy data better than Adaboost [27]. Random forest can also perform well with a very large number of input variables. It is simple and easily parallelized, so that several random forests can be executed on different machines and their votes combined to obtain the final outcome [28].
Choosing the number of trees in a random forest is an open question. Breiman [27] suggested that the greater the number of trees, the better the performance of the random forest; however, finding the optimal number of trees is very difficult. Oshiro et al. [43] applied the random forest algorithm to 29 different datasets while growing the number of trees L at exponential rates with base two, i.e., L = 2^j, j = 1, 2, …, 12. They concluded that a larger number of trees sometimes only increases the computational cost of the forest, with no significant impact on its performance. In this study, the same method as Oshiro et al. [43] is applied to decide the number of trees in the random forest classifier. The number of trees and the corresponding performance on the dataset are reported in Table 4.
Table 4: Performance evaluation with different numbers of trees.

No. of trees   2      4      8      16     32     64     128    256    512
Accuracy       0.93   0.94   0.97   0.97   0.97   0.97   0.97   0.98   0.98
Table 4 shows that using 8, 16, 32, 64, or 128 trees in the random forest resulted in the same performance on the dataset. Therefore, the number of trees has been set to eight; a larger number of trees would increase the computational cost without significant benefit.
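The tree-count sweep of Table 4 can be reproduced in the style of Oshiro et al.'s L = 2^j scheme; the synthetic dataset below is illustrative only, so the accuracies will not match Table 4:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 240-sample, 8-feature dataset
X, y = make_classification(n_samples=240, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=30, random_state=0)

accuracy = {}
for j in range(1, 10):  # L = 2^j trees, j = 1..9 (2 up to 512)
    L = 2 ** j
    clf = RandomForestClassifier(n_estimators=L, random_state=0).fit(X_tr, y_tr)
    accuracy[L] = clf.score(X_te, y_te)

print(accuracy)  # pick the smallest L after which accuracy stops improving
```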
Many methods have been proposed for deciding the number of randomly sampled predictors in a random forest; the most common choice is √p, where p is the number of predictors. Here, the dataset for the random forest algorithm has eight features; therefore, we choose mtry = 3.
When training a machine-learning algorithm, it is essential to know when to stop: whether the algorithm has learnt enough to be tested on unseen data or still needs more training. In random forests, the OOB (out-of-bag) error estimate is used for this purpose; it checks the performance of a random forest model and is computed by predicting each sample using only the trees whose bootstrap samples did not contain it, and averaging the resulting errors [27, 53]. In this study, the training of the model was stopped when the OOB error dropped below 5%, which indicates a good fit, with more than 95% of the out-of-bag predictions being correct. As training a random forest is randomized, the model was executed 100 times and the mean OOB error was calculated over all the OOB error values.
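A minimal sketch of this OOB-based stopping check, assuming scikit-learn's oob_score_ attribute (the data are synthetic, so the 5% threshold may or may not be met here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data
X, y = make_classification(n_samples=240, n_features=8, random_state=0)

# oob_score=True makes the forest score each sample with only the trees
# whose bootstrap samples excluded it; OOB error = 1 - oob_score_
clf = RandomForestClassifier(n_estimators=100, oob_score=True,
                             random_state=0).fit(X, y)
oob_error = 1.0 - clf.oob_score_

print(oob_error)
accept_model = oob_error < 0.05  # the stopping criterion used in this study
```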
Chapter 4
Results and Discussion
In this chapter, the results obtained from the proposed system on the available data are presented and elaborated with different performance metrics. Further, the random forest algorithm is compared with decision trees and SVM on our data.
4.1. Performance Evaluation of Our Classifier
For training the classification algorithm, 210 data items were manually chosen, while 30 data items were manually selected for testing. The structural safety labels were detached and not exposed to the algorithm during evaluation. The training set was composed of 184 safe signals and 26 crack signals, while the test set contained 27 safe signals and 3 crack signals. The data items were distributed in this way so that the algorithm was trained on enough crack signals. The predicted results from the algorithm were then compared with the labels. The accuracy was calculated as the ratio between correctly predicted instances and the total number of examined instances, as defined in Equation (2). As the random forest algorithm renders different results on different executions, it was executed 100 times and the accuracy taken as the mean of all 100 calculated accuracies; in this case the accuracy was 97%. Further, the dataset consists of 211 safe signals and 29 crack signals and is therefore imbalanced. An algorithm could predict only safe signals on such data and still achieve high accuracy. In such cases, accuracy alone is not enough to determine the performance of an algorithm: it is biased towards the majority class and presents several other weaknesses, such as low distinctiveness, discriminability, and informativeness. Furthermore, a classifier may perform well on one metric while being suboptimal on others, and different performance metrics measure different tradeoffs in the predictions made by a classifier.
Therefore, it is essential to assess algorithms on a set of performance metrics [54]; it is not good to rely on accuracy alone. To discriminate accurately and to select an optimal model, the confusion matrix, precision, recall [55], and F-measure [55, 56] have also been calculated. The confusion matrix is widely used for describing the performance of a classifier, because it shows the ways in which a classification model is confused during prediction. Its rows and columns correspond to the classification labels: the rows represent the actual class, while the columns represent the predicted class. The main diagonal elements of the confusion matrix are TP and TN, which denote correctly classified instances, while the other elements (FP and FN) denote incorrectly classified instances. Here, TP means true positive, when the algorithm correctly predicts the positive class, while TN means true negative, when the algorithm correctly predicts the negative class. FP indicates false positive, when the algorithm fails to predict the negative class correctly, and FN represents false negative, when the algorithm fails to predict the positive class correctly. In this data, the safe signals are denoted as the positive class, while the crack signals are the negative class. The confusion matrix of the random forest algorithm on the test data is presented in Table 5.
Table 5: Confusion matrix of the proposed system.

                        Predicted Safe Signals    Predicted Crack Signals
Actual Safe Signals     TP = 26                   FN = 1
Actual Crack Signals    FP = 0                    TN = 3
Note that the total number of test points is 30; the total number of correctly classified instances is 26 + 3 = 29, while the total number of incorrectly classified instances is 0 + 1 = 1.
The confusion matrix shows that the algorithm made only one mistake: a safe signal was predicted as a crack signal. On the basis of the confusion matrix, the result is quite good. However, an analysis based exclusively on the confusion matrix is not sufficient when evaluating the performance of an algorithm. For imbalanced classes, precision and recall constitute a useful measure of prediction success; they are commonly used in information retrieval for the evaluation of retrieval performance [57]. Precision measures the fraction of correctly predicted positive patterns among all patterns predicted as positive, whereas recall measures the effectiveness of a classifier in identifying positive patterns. For a full evaluation of a model's effectiveness, examining both precision and recall is necessary: high precision represents a low false positive rate, while high recall represents a low false negative rate. Accuracy A, precision P, and recall R are defined in Equation (2):
A = (TP + TN) / (TP + FP + FN + TN),   P = TP / (TP + FP),   R = TP / (TP + FN)    (2)
The precision of the classifier on the test data is 97%, and the recall is also 97%, both very high values. However, precision and recall are still not sufficient to select an optimal solution or algorithm. To find a balance between precision and recall, the F-measure is used, which takes both into account when computing its score. The F-measure indicates how many instances the classifier classifies correctly without missing a significant number of instances; the greater the F-measure, the better the performance of the model. The F-measure is defined as the harmonic mean of precision and recall, as expressed in Equation (3):
F-measure = 2 / (1/P + 1/R)    (3)
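The counts in Table 5 can be plugged into Equations (2) and (3) directly; the sketch below rebuilds illustrative label vectors from those counts (safe = 1 is the positive class) and checks them with scikit-learn's metrics:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Labels rebuilt from Table 5's counts: TP = 26, FN = 1, FP = 0, TN = 3
y_true = [1] * 27 + [0] * 3            # 27 actual safe, 3 actual crack
y_pred = [1] * 26 + [0] * 1 + [0] * 3  # one safe signal predicted as crack

print(confusion_matrix(y_true, y_pred))  # rows: actual, columns: predicted
print(accuracy_score(y_true, y_pred))    # (TP + TN) / total = 29/30
print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 26/26
print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 26/27
print(f1_score(y_true, y_pred))          # 2 / (1/P + 1/R)
```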
The F-measure of the algorithm is 97%. Sometimes even precision, recall, and F-measure fail to identify an optimal classifier, because when computing their scores these measures do not balance the positive and negative classes appropriately. In such situations, Receiver Operating Characteristic (ROC) analysis is preferred, which compares the True Positive Rate (TPR) with the False Positive Rate (FPR) of a classification algorithm. ROC graphs are very helpful for organizing classifiers and visualizing their performance [55]. They are widely applied in medical decision-making problems and are now increasingly used in data mining and machine learning [55]. ROC graphs are two-dimensional graphs in which FPR is plotted on the X axis and TPR on the Y axis. TPR is also referred to as the hit rate or recall, while FPR is known as the false alarm rate of a classifier. TPR measures the fraction of correctly labeled positive instances, while FPR measures the fraction of negative instances incorrectly predicted as positive. TPR and FPR are defined as:
TPR = TP / (TP + FN)   and   FPR = FP / (FP + TN)    (4)
From Equation (4), it is clear that TPR is the same as recall; ROC graphs show the balance between TP and FP more directly. The point (0, 0) in the bottom left-hand corner corresponds to a classifier that commits no FP errors but also gains no TPs, while (1, 1) in the top right-hand corner corresponds to a classifier that produces all the TPs and all the FP errors. The point (0, 1) in the top left-hand corner denotes a perfect classifier, for which the FPR is 0% and the TPR is 100%, whereas (1, 0) in the bottom right-hand corner represents the worst classifier, with an FPR of 100% and a TPR of 0%. The AUC (area under the ROC curve) measures the entire two-dimensional area underneath the whole ROC curve. It can be interpreted as the probability that a randomly picked positive instance is ranked more highly than a randomly selected negative instance; it thus measures how well the predictions are ranked and summarizes the performance of a classifier in a single number. The greater the AUC, the better the performance of the classifier. The blue region in Figure 8 depicts the ROC curve of our random forest model, and the whole area underneath it is the AUC of the model.
Figure 8: ROC curve of random forest algorithm.
In a ROC curve, the goal is to be close to the upper left-hand corner. Looking at the ROC curve in Figure 8, it can easily be seen that it is close to optimal. The AUC score is 0.99, which demonstrates a very good result.
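The TPR/FPR sweep and the AUC can be computed with scikit-learn; the scores below are made-up probabilities (1 = safe), not the model's actual outputs:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted positive-class probabilities
y_true = [1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
y_score = [0.90, 0.80, 0.85, 0.70, 0.20, 0.35, 0.30, 0.40, 0.75, 0.95]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points of the ROC curve
auc = roc_auc_score(y_true, y_score)               # area under that curve

print(auc)
```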
The performance of the algorithm has now been measured in terms of classification accuracy, confusion matrix, precision, recall, F-measure, and ROC curve. In this study, all of these were calculated to evaluate the classifier, and the results demonstrate that the classifier is very reliable.
4.2. Comparison of Random Forest with SVM and Decision Trees on
Our Data
It is common practice in machine learning to apply different models to a dataset, check their performance on test data, and choose the model that performs best. Therefore, to evaluate and compare the random forest algorithm with other machine learning algorithms on the ISOMAP-reduced data, SVM and decision trees were also applied. SVM attempts to find the optimal separating hyperplane between objects of different classes; it is best suited to binary classification but can also be configured for multi-class tasks. Decision trees are applied to both classification and regression tasks. In a decision tree, each node represents a feature, each link represents a decision, and each leaf represents an outcome. The root node is the attribute that best classifies the training data, and this process is repeated for each branch of the tree. In this study, SVM classifiers with different kernels, including linear, polynomial, sigmoid, and Radial Basis Function (RBF), were applied. The accuracy of the SVM was 86% with the linear kernel, 93% with the polynomial kernel, 90% with the sigmoid kernel, and 90% with the RBF kernel. Therefore, the kernel with the greatest accuracy, here the polynomial kernel, was chosen for comparison with the random forest on the different performance evaluators. For decision trees, unpruned trees with entropy-based and gini-index-based splitting using the best-split method were implemented. The accuracy of the entropy-based tree was 93%, while that of the gini-index-based tree was 96%; the gini-index-based tree was therefore chosen for comparison with the random forest model.
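The kernel and split-criterion sweep described above can be sketched as follows (synthetic data; the accuracies will therefore differ from the 86-96% figures reported):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the ISOMAP-reduced data
X, y = make_classification(n_samples=240, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=30, random_state=0)

# SVM with the four kernels tried in the text
svm_acc = {k: SVC(kernel=k).fit(X_tr, y_tr).score(X_te, y_te)
           for k in ("linear", "poly", "sigmoid", "rbf")}

# Unpruned decision trees with entropy- and gini-based best splits
tree_acc = {c: DecisionTreeClassifier(criterion=c, random_state=0)
               .fit(X_tr, y_tr).score(X_te, y_te)
            for c in ("entropy", "gini")}

print(svm_acc)
print(tree_acc)
```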
Using SVM and decision trees, the greatest achieved accuracies were 93% and 96%, respectively; the other performance measures of SVM and decision trees are shown in Figure 11. The ROC curves of SVM and decision trees on the test data are displayed in Figure 9 and Figure 10, respectively. They clearly show that SVM balances the negative and positive classes better than the decision tree, but when both curves are compared with the ROC of the random forest model, the random forest is the clear winner.
Figure 9: ROC curve of SVM.
Figure 10: ROC curve of decision tree.
Further, a complete comparison of all these algorithms on the test data is depicted in Figure 11.
Figure 11: Performance measure graph of different algorithms.
Figure 11 clearly shows that the random forest outperforms SVM and decision trees. The precision and recall of the random forest and the decision tree are almost the same, but the accuracy, F-measure, and AUC score of the random forest are the best of all.
Furthermore, a random forest is composed of many decision trees: several trees are combined and each votes for its respective class. It is therefore expected that the random forest will outperform a single decision tree, whose result rests on one tree alone. The comparison between random forest and SVM is more difficult, and deciding that one algorithm outperforms the other is not straightforward. Liaw et al. [28] stated that random forest outperforms SVM, decision trees, and many other machine learning algorithms. Random forest is an ensemble method, and Dietterich [40] noted that ensemble-based methods compare well with single classifiers. Further, Gislason et al., Qi et al., and Jia et al. [44, 45, 58] applied random forest to different kinds of datasets and compared it with SVM, concluding that on some datasets random forest works better, while in other cases the performance of SVM was better than that of the random forest.
Chapter 5
Conclusions
In this study, a structural safety assessment method for reinforced concrete utility poles using ISOMAP and random forest is proposed. The proposed system is able to identify the condition of the wires inside reinforced concrete poles. It is also easy and inexpensive to adopt, because only a few devices are needed to carry out all the steps of the proposed system. ISOMAP is used for data reduction, and a random forest classifier is then applied for classification. The random forest algorithm outperformed other machine learning algorithms (such as SVM and decision trees) on the dataset. The performance of the system was evaluated with different performance measures, including accuracy, precision, recall, F-measure, and ROC curves, and even this first attempt achieved quite good performance.
At the moment, only a limited number of trained data items from field experiments were available; therefore, machine learning techniques were chosen rather than deep learning methods. In the future, more experimental data can be obtained from field engineers, and deep learning methods can then be applied to achieve better performance with higher reliability.
Another possible direction is to transform the existing time-domain signals into the frequency domain through transformation and filtering methods; various methods, such as the Fourier transform or time-dependent convolutional methods including low-pass and high-pass filters, could be applied, with dimensionality reduction and machine learning algorithms then applied to the transformed data. Further, in this study the random forest algorithm was compared with only two other machine learning algorithms. In the future, our random forest model will be compared with more up-to-date ensemble-based methods on the same dataset, and the performance of the classifier can also be evaluated with more performance measures.
References
1. Kliukas, R., Daniunas, A., Gribniak, V., Lukoseviciene, O., Vanagas, E., & Patapavicius, A.
(2018), ‘Half a century of reinforced concrete electric poles maintenance: Inspection, field-
testing, and performance assessment’, Structure and Infrastructure Engineering, 14(9),
1221-1232.
2. Baraneedaran, S., Gad, E. F., Flatley, I., Kamiran, A., & Wilson, J. L. (2009), ‘Review of in-
service assessment of timber poles’, Proceedings of the Australian Earthquake Engineering
Society, Newcastle, Australia.
3. Miszczyk, A., Szocinski, M., & Darowicki, K. (2016), ‘Restoration and preservation of the
reinforced concrete poles of fence at the former Auschwitz concentration and extermination
camp’, Case Studies in Construction Materials, 4, 42-48.
4. Cairns, J., Plizzari, G. A., Du, Y., Law, D. W., & Franzoni, C. (2005), ‘Mechanical properties
of corrosion-damaged reinforcement’, ACI Materials Journal, 102(4), 256.
5. Stewart, M. G. (2012), ‘Spatial and time-dependent reliability modelling of corrosion
damage, safety and maintenance for reinforced concrete structures’, Structure and
Infrastructure Engineering, 8(6), 607-619.
6. Ying, L., & Vrouwenvelder, A. C. W. M. (2007), ‘Service life prediction and repair of
concrete structures with spatial variability’, Heron, 52 (4).
7. Doukas, H., Karakosta, C., Flamos, A., & Psarras, J. (2011), ‘Electric power transmission:
An overview of associated burdens’, International journal of energy research, 35(11), 979-
988.
8. Val, D. V., & Stewart, M. G. (2009), ‘Reliability assessment of ageing reinforced concrete
structures—current situation and future challenges’, Structural Engineering
International, 19(2), 211-219.
9. International Atomic Energy Agency (IAEA) (2002), ‘Guidebook on non-destructive testing of
concrete structures’, Training Course Series No. 17, Vienna: IAEA.
10. Breysse, D. (2012), ‘Non destructive assessment of concrete structures: Reliability and
limits of single and combined techniques’, RILEM State of the Art Reports, vol. 1, Dordrecht:
Springer.
11. Helal, J., Sofi, M., & Mendis, P. (2015), ‘Non-destructive testing of concrete: A review of
methods’, Electronic Journal of Structural Engineering, 14(1), 97-105.
12. Davis, A. G., Ansari, F., Gaynor, R. D., Lozen, K. M., Rowe, T. J., Caratin, H., & Hertlein,
B. H. (1998), ‘Nondestructive test methods for evaluation of concrete in structures’, American
Concrete Institute, ACI, 228.
13. Hermanek, P., & Carmignato, S. (2016), ‘Reference object for evaluating the accuracy of
porosity measurements by X-ray computed tomography’, Case studies in nondestructive
testing and evaluation, 6, 122-127.
14. Makar, J., & Desnoyers, R. (2001), ‘Magnetic field techniques for the inspection of steel
under concrete cover’, NDT & E International, 34(7), 445-456.
15. Milovanović, B., & Banjad Pečur, I. (2013), ‘Detecting defects in reinforced concrete using
the method of infrared thermography’, HDKBR INFO Magazin, 3(3), 3-13.
16. Akhtar, S. (2013), ‘Review of nondestructive testing methods for condition monitoring of
concrete structures’, Journal of construction engineering.
17. Sannikov, D. V., Kolevatov, A. S., Vavilov, V. P., & Kuimova, M. V. (2018), ‘Evaluating
the Quality of Reinforced Concrete Electric Railway Poles by Thermal Nondestructive
Testing’, Applied Sciences, 8(2), 222.
18. Milovanović, B., & Banjad Pečur, I. (2016), ‘Review of active IR thermography for
detection and characterization of defects in reinforced concrete’, Journal of Imaging, 2(2), 11.
19. Szymanik, B., Frankowski, P. K., Chady, T., & John Chelliah, C. R. A. (2016), ‘Detection
and inspection of steel bars in reinforced concrete structures using active infrared thermography
with microwave excitation and eddy current sensors’, Sensors, 16(2), 234.
20. Zhang, J. K., Yan, W., & Cui, D. M. (2016), ‘Concrete condition assessment using impact-
echo method and extreme learning machines’, Sensors, 16(4), 447.
21. Cui, D. M., Yan, W., Wang, X. Q., & Lu, L. M. (2017), ‘Towards intelligent interpretation
of low strain pile integrity testing results using machine learning techniques’, Sensors, 17(11),
2443.
22. Dackermann, U., Skinner, B., & Li, J. (2014), ‘Guided wave–based condition assessment of
in situ timber utility poles using machine learning algorithms’, Structural Health
Monitoring, 13(4), 374-388.
23. Dackermann, U., Yu, Y., Niederleithinger, E., Li, J., & Wiggenhauser, H. (2017), ‘Condition
Assessment of Foundation Piles and Utility Poles Based on Guided Wave Propagation Using a
Network of Tactile Transducers and Support Vector Machines’, Sensors, 17(12), 2938.
24. Dackermann, U., Yu, Y., Li, J., Niederleithinger, E., & Wiggenhauser, H. (2015,
September), ‘A new non-destructive testing system based on narrow-band frequency excitation
for the condition assessment of pole structures using frequency response functions and principle
component analysis’, In Proceedings of the International Symposium Non-Destructive Testing
in Civil Engineering, Berlin, Germany, 15-17.
25. SMART C&S. Available online: http://smartcs.co.kr/
26. Tenenbaum, J. B., De Silva, V., & Langford, J. C. (2000), ‘A global geometric framework
for nonlinear dimensionality reduction’, science, 290(5500), 2319-2323.
27. Breiman, L. (2001), ‘Random forests’, Machine learning, 45(1), 5-32.
28. Liaw, A., & Wiener, M. (2002), ‘Classification and regression by randomForest’, R
news, 2(3), 18-22.
29. Osuna, E., Freund, R., & Girosi, F. (1997), ‘Support vector machines: Training and
applications’, Cambridge, MA, USA: Massachusetts Institute of Technology (MIT).
30. Quinlan, J. R. (1986), ‘Induction of decision trees’, Machine learning, 1(1), 81-106.
31. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... &
Vanderplas, J. (2011), ‘Scikit-learn: Machine learning in Python’, Journal of machine learning
research, 12(Oct), 2825-2830.
32. Shi, H., Yin, B., Kang, Y., Shao, C., & Gui, J. (2017), ‘Robust L-Isomap with a Novel
Landmark Selection Method’, Mathematical Problems in Engineering, 2017.
33. Ghodsi, A. (2006), ‘Dimensionality reduction a short tutorial’, Department of Statistics and
Actuarial Science, Univ. of Waterloo, Ontario, Canada, 37, 38.
34. Van Der Maaten, L., Postma, E., & Van den Herik, J. (2009), ‘Dimensionality reduction: a
comparative review’, J Mach Learn Res, 10, 66-71.
35. Jeong, M., Choi, J. H., & Koh, B. H. (2014), ‘Isomap‐based damage classification of
cantilevered beam using modal frequency changes’, Structural Control and Health
Monitoring, 21(4), 590-602.
36. Cayton, L. (2005), ‘Algorithms for manifold learning’, Univ. of California at San Diego
Tech. Rep, 12(1-17), 1.
37. Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2001), ‘Introduction to
algorithms’, Cambridge, MA, USA: MIT Press.
38. Silva, V. D., & Tenenbaum, J. B. (2003), ‘Global versus local methods in nonlinear
dimensionality reduction’, In Advances in neural information processing systems, 721-728.
39. Liang, Y. M., Shih, S. W., Shih, A. C. C., Liao, H. Y. M., & Lin, C. C. (2009, May),
‘Unsupervised analysis of human behavior based on manifold learning’, In International
Symposium on Circuits and Systems (ISCAS), Taipei, Taiwan, 2605-2608.
40. Dietterich, T. G. (2000, June), ‘Ensemble methods in machine learning’, In International
workshop on multiple classifier systems, Springer, Berlin, Heidelberg, 1-15.
41. Schapire, R. E., Freund, Y., Bartlett, P., & Lee, W. S. (1998), ‘Boosting the margin: A new
explanation for the effectiveness of voting methods’, The annals of statistics, 26(5), 1651-1686.
42. Breiman, L. (1996), ‘Bagging predictors’, Machine learning, 24(2), 123-140.
43. Oshiro, T. M., Perez, P. S., & Baranauskas, J. A. (2012, July), ‘How many trees in a random
forest?’, In International Workshop on Machine Learning and Data Mining in Pattern
Recognition, Springer, Berlin, Heidelberg, 154-168.
44. Gislason, P. O., Benediktsson, J. A., & Sveinsson, J. R. (2006), ‘Random forests for land
cover classification’, Pattern Recognition Letters, 27(4), 294-300.
45. Boulesteix, A. L., Janitza, S., Kruppa, J., & König, I. R. (2012), ‘Overview of random forest
methodology and practical guidance with emphasis on computational biology and
bioinformatics’, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(6),
493-507.
46. Rogez, G., Rihan, J., Ramalingam, S., Orrite, C., & Torr, P. H. (2008, June), ‘Randomized
trees for human pose detection’, In IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 1-8.
47. Cutler, D. R., Edwards Jr, T. C., Beard, K. H., Cutler, A., Hess, K. T., Gibson, J., & Lawler, J.
J. (2007), ‘Random forests for classification in ecology’, Ecology, 88(11), 2783-2792.
48. Klassen, M., Cummings, M., & Saldana, G. (2008), ‘Investigation of Random Forest
Performance with Cancer Microarray Data’, In proceedings of the ISCA 23rd International
Conference on Computers and Their Applications (CATA), Cancun, Mexico, 64-69.
49. Gao, D., Zhang, Y. X., & Zhao, Y. H. (2009), ‘Random forest algorithm for classification of
multiwavelength data’, Research in Astronomy and Astrophysics, 9(2), 220.
50. Fawagreh, K., Gaber, M. M., & Elyan, E. (2014), ‘Random forests: from early developments
to recent advancements’, Systems Science & Control Engineering: An Open Access Journal, 2(1),
602-609.
51. Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984), ‘Classification and
Regression Trees’, Belmont, California, USA: Wadsworth Int. Group.
52. Freund, Y., & Schapire, R. E. (1996, July), ‘Experiments with a new boosting algorithm’,
Machine Learning: Proceedings of the Thirteenth International Conference, 148–156.
53. Hastie, T., Tibshirani, R., & Friedman, J. (2001), ‘The elements of statistical learning’, New
York, NY, USA, Springer.
54. Caruana, R., & Niculescu-Mizil, A. (2006, June), ‘An empirical comparison of supervised
learning algorithms’, In Proceedings of the 23rd International Conference on Machine learning,
Pittsburgh, PA, USA, 161-168.
55. Fawcett, T. (2006), ‘An introduction to ROC analysis’, Pattern recognition letters, 27(8),
861-874.
56. Sasaki, Y. (2007), ‘The truth of the F-measure’, Teach Tutor mater, 1(5), 1-5.
57. Lewis, D. D. (1991), ‘Evaluating text categorization’, In Speech and Natural Language:
Proceedings of a Workshop Held at Pacific Grove, California.
58. Jia, S., Hu, X., & Sun, L. (2013), ‘The comparison between random forest and support vector
machine algorithm for predicting β-hairpin motifs in proteins’, Engineering, 5(10), 391.
  • 17. 2 problems can lead to failure of poles at later stage. Moreover, the failure of poles can cause serious problems, such as falling onto people or animals, causing injury (or death), and service disruption. Earlier detection of such problems can save precious lives, time, and money. Therefore, it is necessary to identify such issues at an initial stage for which accurate inspection and monitoring is mandatory on regular basis. Further, reinforced concrete materials deteriorate with the passage of time due to challenging environments and various loads, which affects their strength and serviceability very badly. Deterioration in reinforced concrete structures occurs at varying levels and in many situations the impairment is invisible, which is usually under the ground or inside the poles. Therefore, for ensuring the structural integrity of reinforced concrete utility poles, those already in service poles must be regularly monitored or frequently inspected [8]. It is very important to perform accurate assessment of the integrity of existing poles because maintenance and repair cost are growing rapidly. Hence, there is a rising demand to develop more accurate, consistent and reliable inspection methods for assessing conditions of in-service poles [2]. Stewart et al. [5] claimed that the shortage of long-standing predictive capabilities show that the upgradation of structural reliability valuations are essential at regular intervals. 1.2. Inspection Methods of Concrete Materials Inspection or evaluation methods of concrete materials can be broadly distributed into two main types. Semi-destructive or partially destructive inspection and non-destructive inspection. In semi-destructive methods, the surface of the concrete is slightly damaged and could be repaired once the inspection is performed. Examples of semi-destructive testing methods are core tests, pullout and pull off tests [9]. 
Further, such inspection methods may require the structure to be taken out of service and are labor intensive. For in-service utility poles, this is not an option, since it would require the power line to be temporarily shut down and possibly the pole to be dismounted.

In non-destructive tests (NDT), inspection is performed without harming the surface of the concrete. These tests are sometimes called Nondestructive Evaluation (NDE) [9, 10]. Semi-destructive methods are outside the scope of this thesis; the main focus here is on NDT methods.

1.2.1. Advantages of NDT Methods

Direct sampling and observation of a structure are not always possible, for instance when the damage is internal or when physical access to the structure of interest is prohibited. In such situations, NDT methods are commonly used [10]. NDT is defined as the process of inspecting, testing, or evaluating materials without affecting the structure or serviceability of any part of the system; carrying out an assessment without affecting the functionality of the system is the main aim of NDT [10, 11]. NDT can be performed with minimal expenditure of time and manpower. NDT techniques show good sensitivity to concrete properties and defects, and can easily identify delaminated, cracked, and spalled portions of the concrete. They can also provide information about the presence of voids and honeycombing, the size and location of steel reinforcement, corrosion of the reinforcement, and the extent of damage caused by chemical exposure, accidental fire, or freezing and thawing. In recent years, many more accurate, reliable, and quantifiable NDT methods have been developed [9–11].

Non-destructive test methods are progressively being developed and applied for identifying different kinds of defects in concrete structures, and a wide range of NDT methods is used in the civil and structural engineering industry for the assessment of concrete structures [11]. This abundance of NDT applications is due to technological advances in hardware and software for data acquisition, improvements in data collection and analysis techniques, and the ability to perform quick and comprehensive assessments of existing construction [12].
Different NDT methods for concrete (pre-stressed and reinforced) materials along with their shortcomings are described in the next section.
1.2.2. Different Types of NDT Methods

NDT methods for concrete (pre-stressed and reinforced) materials can be categorized into the following types: visual inspection, stress wave based methods, nuclear (radiometric and radiographic) methods, magnetic and electrical methods, penetrability methods, infrared thermography, and radar [9–21].

1.2.2.1. Visual Inspection

Visual inspection is the traditional inspection technique, and it can provide valuable information to a well-trained eye [9]. In this method, inspectors usually drill holes in poles and record the signs of deterioration. As the inspection proceeds, a careful and complete record of all observations should be made. Visual inspection is useful for identifying impaired spots and for assessing corrosion in exposed reinforcement. It also provides an early indication of the condition of the concrete, allowing a follow-up testing program to be formulated [9]. The devices used in this method are microscopes, optical magnifiers, magnifying glasses or small digital video cameras, and rulers [9, 12]. The disadvantages of this method are the need for highly qualified and experienced inspectors and the fact that only visible surfaces can be inspected. Internal defects therefore go unassessed, and in most cases detailed properties of the concrete cannot be determined.

1.2.2.2. Stress Wave Techniques

Stress wave techniques have also been applied successfully to the inspection of reinforced concrete. In these techniques, waves are propagated through the medium [9]. Several stress wave based testing techniques have been developed, for instance the impact-echo method, the ultrasonic through-transmission method, the ultrasonic-echo method, impulse response, acoustic emission, and Spectral Analysis of Surface Waves (SASW) [10, 12].
The equipment required for such tests consists of sensors for wave detection, a source for wave propagation, and a data acquisition and analysis system [10, 20]. These methods are very effective at locating large voids or delaminations in concrete structures. However, they have some limitations: they require experienced operators and expensive devices, involve complex signal processing, and do not provide information on the depth of a defect [10, 12].

1.2.2.3. Nuclear Methods

Various nuclear methods have also been developed for the nondestructive inspection of concrete materials [9]. Nuclear techniques are usually subdivided into two groups, radiometric and radiographic [12]. In these methods, information about the test object is gained with the help of high-energy electromagnetic radiation [12]. Radiometry uses electromagnetic radiation (gamma rays) to assess the density of fresh or hardened concrete: radiation emitted by a radioactive isotope is perceived by a detector [9, 12]. Radiographic methods are similar to those used to produce medical X-rays: radiation is passed through the test object to produce a photograph of the internal structure of the concrete [12]. In both approaches, photographic film records of the internal structure are usually made, and the exposure of the film is proportional to the intensity of the radiation: the higher the intensity, the greater the exposure, and vice versa [12].

Problems associated with nuclear methods include the need for licensed operators, measurements that are sensitive to chemical composition and affected by near-surface material, bulky and expensive X-ray equipment, difficulty in identifying cracks perpendicular to the radiation beam, and safety issues. Radiographic procedures are also extremely costly, and the high voltages used in X-ray equipment are very dangerous. Moreover, the images usually do not deliver much significant detail about the depth of a defect, and only a small surface area can be examined because of the limited size of the sensitive film [12].

1.2.2.4. Magnetic and Electrical Methods

Many magnetic and electrical techniques have also been developed for the inspection of concrete structures. Examples are covermeters, half-cell potential, and linear polarization [10, 12]. In these techniques, magnetic, electric, or electromagnetic fields are generated around the inspection surface, and those fields are then analyzed. Usually, these methods require knowledge of the quantity and location of the reinforcement in order to evaluate the strength of reinforced concrete members [9, 12]. Some limitations of these methods are low accuracy, the need for experienced personnel for testing and interpretation, and the need for an electrical connection to the reinforcement. In most cases, these methods cannot identify the presence of a second layer of reinforcement inside a concrete structure [12].

1.2.2.5. Penetrability Methods

Various penetrability methods, such as the Initial Surface-Absorption Test (ISAT), the Figg water-absorption test, the CLAM test, the Steinert method, the Figg air-permeability test, the Schönlin test, and the surface airflow test, are also used for the assessment of concrete structures [12]. These methods are usually very simple and inexpensive to perform. However, they are sensitive to the moisture condition of the concrete, most of them require drilling, and the test times are lengthy. The concrete surface is also damaged in these methods [12].

1.2.2.6. Infrared Thermography Methods

Infrared thermography methods are used to identify subsurface defects within and below concrete structures [9, 10, 12]. In these methods, the thermal radiation emitted from the inspection surface is sensed and a visual image is produced [9, 15]. These methods have been applied extensively to the identification of internal voids and cracks in concrete structures [9, 12]. Their limitations include the need for expensive equipment, the requirement of suitable environmental conditions for testing, and the inability to measure the depth and thickness of a subsurface anomaly.
The test response also varies with environmental conditions, and trained individuals are required for a meaningful and correct interpretation of the acquired data [9, 12].
1.2.2.7. Radar (Radio Detection and Ranging)

Radar is a well-known NDT method for the inspection of concrete materials [9], also referred to as Ground Penetrating Radar (GPR). In radar, electromagnetic energy is propagated through dielectric materials [9]. An electromagnetic pulse is radiated by a transmitter antenna, reflected at the surface and at internal layer boundaries of the inspection object, and recorded by a receiver antenna [9, 10]. Radar is most useful for recognizing delaminations and the types of defects occurring in plain or overlaid reinforced concrete decks. It performs well in detecting voids and measuring the thickness of concrete members, and it can scan large surface areas in a short period of time [9, 10]. Its limitations are that a large amount of data is acquired during scans, experienced operators are required to operate the equipment and interpret the results, and the pulses from high-resolution antennae have very limited penetration depth [9, 12].

Besides the well-known NDT categories above, numerous case studies exist in which different NDT methods have been merged. NDT methods are combined to achieve the best performance, improve the interpretation of assessment results, obtain quick and accurate results, or reduce the limitations of individual methods [10].

1.3. Literature Review

Dackermann et al. [22] developed an NDT method based on guided wave propagation and machine-learning techniques for timber utility poles. They combined improved signal processing methods with a multi-sensor system and advanced machine-learning algorithms. Wave signals were captured by sensors, and machine-learning algorithms were then applied to evaluate the condition of the pole.
They achieved an accuracy of up to 95% using different classification algorithms.
Recently, Dackermann et al. [23, 24] developed NDT methods for timber poles, self-compacting concrete poles without steel reinforcement, and generic concrete poles without steel reinforcement, using advanced signal processing and machine-learning techniques. They applied stress wave methods for data gathering, using a large number of low-cost tactile transducers and accelerometers (along with other devices), and achieved an accuracy of up to 93%. They applied Principal Component Analysis (PCA) for feature reduction and Support Vector Machines (SVM) for classifying the signals to predict the damage condition of a pole. They adopted Frequency Response Functions (FRFs) and PCA for extracting signal features when capturing single-mode stress waves for condition assessment.

It is very difficult to find any NDT method in the literature that applies machine learning or data analysis techniques specifically to reinforced concrete utility poles.

1.4. The Proposed System

In this study, we propose a novel NDT method for structural health monitoring of reinforced concrete utility poles based on dimensionality reduction and machine-learning techniques. To train the machine-learning algorithm, we used data gathered through magnetic sensors at an actual site by SMART C&S [25]. ISOMAP [26] is applied to the data for feature reduction, and the resulting refined data constitute the input to a random forest classifier [27, 28]. In particular, the refined signals are used to train a random forest and are categorized into safe and crack signals based on actual field experiments. Safe signals are those corresponding to non-damaged wires, whereas crack signals correspond to damaged wires. Two other well-known machine-learning algorithms are also applied to the data: Support Vector Machines (SVM) [29] and decision trees [30].
All of the algorithms used in this study were written in Python and executed with the scikit-learn machine-learning library [31].

1.5. Thesis Organization

This thesis is organized as follows. In Chapter 1, reinforced concrete poles are briefly introduced and compared with timber and steel poles; the different types of inspection methods for concrete materials, the benefits of and need for NDT methods, the various NDT methods and their limitations, the literature review, and the proposed system are presented. In Chapter 2, the complete experimental setup for data gathering is described. The proposed system resulting from this research work is elaborated in Chapter 3. The experimental results are presented and discussed in Chapter 4. Finally, the conclusions and directions for future work are given in Chapter 5.
Chapter 2
Experimental Setup

In this chapter, the data gathering techniques, the devices used for data gathering, and their complete specifications are described.

2.1. Experimental Setup

The setup for data gathering consists of a magnetic sensing device, a Data Acquisition (DAQ) device, a cable, and a portable computer. The magnetic sensing device (a Hall effect sensor) detects the magnetic field; the DAQ device converts the magnetic field signals into digital values that can later be manipulated by a computer; and the cable connects the magnetic sensing device to the DAQ device. The arrangement of these devices is shown in Figure 1. To inspect a utility pole, the magnetic sensing device is inserted into the pole through holes, and the signals are gathered with the magnetic sensing and DAQ devices. The device can then be attached to a portable computer to check the signals against the patterns of the proposed system.

For training the algorithm, the signals were gathered with the same device, and the dataset was manually labeled by the field engineers of SMART C&S. They collected signals from 30 reinforced concrete utility poles, containing signals from both damaged and non-damaged wires, and labeled each signal manually on the basis of the condition of each wire. To collect these signals, they uninstalled the 30 poles and broke down the concrete cover over the steel wires to pull the wires out of the concrete. In particular, the field engineers pulled all of the wires out of the 30 poles and gathered the signals from each wire. They inserted the sensors into the poles through the maintenance bolt hole and gathered the signals from the ground portion up to the bolt hole, because defects in the wires occur mostly in the ground portion of a pole.
Note that uninstalling a pole is not necessary during a routine inspection; the signals gathered from the 30 dismantled poles can be used throughout the lifetime of the proposed system.

The measuring section for signal gathering inside the poles was set to a maximum height of 4 m, and the measurement rate was fixed at 0.3 m/s. The diameter of the steel wires in all of the poles was 9 to 12 mm, and each utility pole contained 16 steel wires in total (eight tension and eight reinforcing wires). The thickness of the concrete cover over the steel wires was 12 to 24 mm in each pole. The engineers tried to maintain a constant speed in all of the experiments and repeated every experiment on each wire 3 to 4 times. They gathered 101 Hall effect values for every signal from every wire; these Hall effect values are referred to as "features" in the dataset. Because the dataset could contain many ambiguous, repeated, unnecessary, and insignificant features, it was filtered down to meaningful features before being used with a classification algorithm, as explained further in Chapter 3. The eight-channel magnetic sensing device with all of its parts is depicted in Figure 1, and the specifications of all the parts are given in Table 1.

Figure 1: Magnetic sensing device with eight channels.
Table 1: Specifications of all the parts of the magnetic sensing device.

Sensor and DAQ
- ADC resolution: 16 bit
- ADC input channels: 8 differential input channels
- ADC sampling rate: 50 S/s

Main cable
- Length: 6 m
- Diameter: 22 mm

Figure 2(a) illustrates the process of removing a maintenance bolt and securing the bolt hole in order to insert the magnetic sensing device into a working utility pole. Figure 2(b) shows an example of laboratory verification experiments with the data-gathering device. Figure 2(c) shows an example of a broken steel wire, and Figure 2(d) shows multiple steel wires inside a pole.

Figure 2: (a) Field testing; (b) laboratory testing; (c) a broken steel wire; (d) multiple steel wires.
Chapter 3
The Proposed System

In this chapter, the dataset used for training the algorithm is explained, signals from the dataset are presented in graphical form, and the need for dimensionality reduction techniques is explained. The proposed system is then elaborated step by step.

3.1. The Dataset

The dataset for this research project is composed of n samples with m features, R = {(x_i^u, y_i), i = 1, 2, ..., n}, where n = 240, m = 101, and u = 1, 2, ..., m. Here x_i is the input data, and every row x_i has a label y_i ∈ {0, 1}, where 0 denotes a safe signal and 1 a crack signal. Each row x_i represents one signal, and u indexes the features of that signal, i.e., its Hall effect values. Figure 3 depicts safe and crack signals from the dataset as two-dimensional plots: Figure 3(a) shows a single safe signal, Figure 3(b) a single crack signal, Figure 3(c) all of the safe signals, and Figure 3(d) all of the crack signals.
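The shape of this dataset can be sketched as follows. The arrays here are random placeholders standing in for the real Hall effect measurements, and the variable names X and y are illustrative, not from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# n = 240 signals, each with m = 101 Hall effect values (placeholders here).
n, m = 240, 101
X = rng.normal(size=(n, m))        # x_i: one signal per row
y = rng.integers(0, 2, size=n)     # y_i: 0 = safe signal, 1 = crack signal

print(X.shape, y.shape)            # (240, 101) (240,)
```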
Figure 3: (a) A safe signal; (b) a crack signal; (c) safe signals; and (d) crack signals.

3.2. Dimensionality Reduction and its Importance

Because the data were gathered with magnetic sensors, the dataset contains many replicated features. To reduce these replicated features, it is essential to apply a dimensionality reduction technique. Dimensionality reduction is also important for finding meaningful low-dimensional hidden structure in high-dimensional data and for improving the performance of classification algorithms. Dimensionality reduction techniques project high-dimensional data into a lower-dimensional space while retaining most of the relevant information. This compresses the data and reduces storage space, makes the data much easier for machine-learning algorithms to analyze, and is also helpful for visualization. Dimensionality reduction techniques are useful in many domains, for example document categorization, protein disorder prediction, the life sciences, computer vision tasks, and machine-learning models.
Dimensionality reduction can be expressed mathematically in the context of a given dataset. Consider a dataset represented as a matrix X of size r × s, where r is the number of rows and s the number of columns. Normally, the rows of the matrix represent data points while the columns are the features. Dimensionality reduction reduces the number of features of each data point: it turns the matrix X into a new matrix X' of size r × t, where t < s. For visualization purposes, t is usually set to 1, 2, or 3. Many algorithms have been developed to compute such a reduction of a dataset [32]. Simpler algorithms such as Principal Component Analysis (PCA) produce an embedding by maximizing the variance retained in the data. State-of-the-art algorithms instead seek a combination of features such that the distances of the high-dimensional space are well maintained in the new embedding. Dimensionality reduction remains an active area of research, with new techniques being developed to produce better embeddings.
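As a minimal sketch of this X → X' mapping, scikit-learn's PCA (the simpler algorithm mentioned above) reduces an r × s matrix to r × t; the random matrix here is only a stand-in for real data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(240, 101))    # r = 240 data points, s = 101 features

# Reduce to t = 2 features (t < s), a typical choice for visualization.
X_prime = PCA(n_components=2).fit_transform(X)
print(X_prime.shape)               # (240, 2)
```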
3.3. The Flowchart of the Proposed System

The overall flow of the proposed system is depicted in Figure 4.

Figure 4: Flowchart of the proposed system.

As the flowchart in Figure 4 shows, ISOMAP is applied for dimensionality reduction of the data. ISOMAP is a manifold-based global geometric framework used mainly for non-linear dimensionality reduction. In recent years, numerous non-linear dimensionality reduction algorithms have been developed to address the limitations of traditional linear techniques [33, 34]. Many linear techniques, for instance PCA, classical scaling, and factor analysis, often fail to handle non-linear data properly [32, 34]. In contrast, non-linear methods can perform well on high-dimensional non-linear data, and in the last decade they have become very popular owing to their superior performance on such data compared with linear techniques [34]. Maaten et al. [34] applied twelve non-linear dimensionality reduction algorithms and PCA to ten datasets (five artificial and five natural) and showed that most of the non-linear techniques performed better than the linear one on high-dimensional complex datasets. Jeong [35] showed that ISOMAP outperforms PCA (a linear dimensionality reduction technique) on higher-dimensional data. Finally, a random forest classification algorithm is applied to the ISOMAP-reduced data, and training of the random forest is stopped once the out-of-bag (OOB) error drops below 0.05 (5%). All of these steps and algorithms are explained in detail in the next sections.

3.4. ISOMAP

ISOMAP is a well-known manifold-based dimensionality reduction technique. A manifold is a topological space that is locally Euclidean; typically, any object that is approximately "flat" at small scales is a manifold. Manifold learning is an approach to non-linear dimensionality reduction. It refers to uncovering the manifold structure in a dataset, where the data lie along a low-dimensional manifold embedded in a high-dimensional space; the lower-dimensional space reveals the underlying parameters, while the higher-dimensional space is the feature space [36]. Manifold learning is an important problem in various data processing domains, for instance pattern recognition, machine learning, data compression, and database navigation [33]. Manifold-based algorithms work on the notion that the dimensionality of a dataset is only artificially high [36]. ISOMAP is one of the first algorithms introduced specifically for manifold learning [36], and it recovers the manifold structure very well. ISOMAP is an extension of Multidimensional Scaling (MDS), a classical method for embedding dissimilarity information into a Euclidean space.
In ISOMAP, the global geometric features of a dataset are preserved in the embedding space. The main idea of ISOMAP is to replace Euclidean distances with an approximation of the geodesic distances on the manifold, where the geodesic distance between two points is the distance measured along the manifold. For every point, ISOMAP estimates the geodesic distances using shortest-path distances, and an embedding of these distances in a Euclidean space is then estimated using classical MDS. ISOMAP attempts to map neighboring points on the manifold to neighboring points in the low-dimensional space, and faraway points to faraway points. ISOMAP is a three-step process, shown in Figure 5.

Figure 5: Flowchart of the ISOMAP algorithm.

The three steps in Figure 5 are as follows:

1) Construct the neighborhood graph: the k-nearest neighbors of every data point are determined and represented by a graph G, in which every point is connected to its nearest neighbors by edges.

2) Compute the shortest paths: the geodesic distances between all pairs of points are estimated using Dijkstra's algorithm [37], and the squares of these distances are stored in a matrix of graph distances D(G).

3) Construct the d-dimensional embedding: the classical MDS algorithm is applied to D(G) to find a new low-dimensional embedding of the data in a d-dimensional Euclidean space Y.

Recently, ISOMAP has been extended to handle conformal mappings and very large datasets [36, 38].
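The three steps above are implemented, for example, in scikit-learn's Isomap, which builds the k-nearest-neighbor graph, runs the shortest-path computation (path_method="D" selects Dijkstra's algorithm), and applies classical MDS internally. The swiss-roll data here is only a stand-in for the thesis dataset:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# A swiss roll: a 2-D manifold embedded in 3-D space.
X, _ = make_swiss_roll(n_samples=500, random_state=0)

# k-NN graph -> Dijkstra shortest paths -> classical MDS embedding.
iso = Isomap(n_neighbors=10, n_components=2, path_method="D")
Y = iso.fit_transform(X)
print(Y.shape)                     # (500, 2)
```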
3.4.1. Deciding the Number of Dimensions for the ISOMAP Algorithm

Finding the intrinsic dimensionality for a dimensionality reduction algorithm is very important. Setting the target dimensionality lower than the intrinsic dimensionality increases the risk of losing important information. On the other hand, setting it higher than the intrinsic dimensionality can make the algorithm very slow and causes it to retain many redundant and repeated features. Maaten et al. [34] noted that the classification performance of non-linear dimensionality reduction techniques on many natural datasets did not improve unless an intrinsic dimensionality estimator was used properly. It is therefore essential to estimate the intrinsic dimensionality before applying a dimensionality reduction technique.

In this case, the dataset is composed of 240 data items with 101 features. First, we determine the intrinsic number of dimensions of the ISOMAP space. For this purpose, the residual variance [26, 39] is calculated, which is typically used to evaluate the error of a dimensionality reduction. The residual variance is defined in Equation (1):

R_d = 1 - r^2(G, D_d)    (1)

where R_d is the residual variance, G is the geodesic distance matrix, D_d is the Euclidean distance matrix in the d-dimensional embedding space, and r(G, D_d) is the correlation coefficient of G and D_d. The value of d is chosen by trial and error so as to reduce the residual variance.
Figure 6: The residual variance of ISOMAP in the dataset.

Figure 6 clearly shows that the residual variance decreases as the number of dimensions d is increased. It is recommended in [26, 39] to select the number of dimensions at which this curve stops decreasing significantly with added dimensions. To estimate the intrinsic dimensionality of our data, we therefore searched for the "elbow", the point at which the curve no longer decreases significantly with increasing dimensions; the arrow in the figure marks the approximate intrinsic dimensionality of the dataset. As long as added dimensions still remove some residual variance, ISOMAP explains the data slightly better and the classification algorithm performs correspondingly better. For example, in this dataset the performance with 50 dimensions (residual variance 0.03) is slightly better than with 8 dimensions. We nevertheless set eight dimensions for the classification algorithm because of the computational cost of both ISOMAP and the random forest, and because we had already achieved good performance with eight dimensions. Once the residual variance reaches zero, performance no longer improves with increasing dimensions.
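Equation (1) can be evaluated directly from a fitted Isomap model: in scikit-learn, dist_matrix_ holds the geodesic distances G and embedding_ the d-dimensional coordinates, so R_d follows from the Pearson correlation of the two distance sets. This is a sketch on swiss-roll data (intrinsic dimensionality 2), not the thesis dataset:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=300, random_state=0)

residuals = {}
for d in (1, 2, 3):
    iso = Isomap(n_neighbors=10, n_components=d).fit(X)
    # Upper-triangle order matches pdist's condensed-distance order.
    geo = iso.dist_matrix_[np.triu_indices_from(iso.dist_matrix_, k=1)]
    euc = pdist(iso.embedding_)        # Euclidean distances in d-D space
    r, _ = pearsonr(geo, euc)
    residuals[d] = 1.0 - r ** 2        # residual variance R_d, Equation (1)

print(residuals)
```

The elbow criterion then amounts to picking the smallest d past which residuals[d] stops dropping noticeably.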
3.5. Random Forest

ISOMAP reduces the dataset to eight dimensions, on which a random forest classification algorithm is applied to train on the features in the data. The random forest is an ensemble method, and ensemble methods generally perform better than single classifiers. Ensemble methods are built from a set of classifiers and combine their weighted predictions to render the final output [40]. They employ more than one classification technique and then combine the results, and they are notably less prone to overfitting. Various ensemble methods have been proposed, of which boosting [41] and bagging [42] of classification trees are the most famous. In boosting techniques, consecutive trees assign extra weight to points incorrectly predicted by previous predictors, and a weighted vote is taken at the end for prediction; the performance of a tree thus depends on the performance of the preceding trees. In bagging techniques, each tree is grown independently using a bootstrap sample of the dataset, consecutive trees do not rely on previous trees, and a majority vote is taken at the end for prediction. In a regular tree, each node is split using the best split among all variables. In the random forest method, each node is split using the best split among a subset of predictors randomly chosen at that node [28]. Random forest is therefore considered an enhanced version of bagging. It is widely used in recent research projects and real-world applications [43], and has been applied successfully in land cover classification [44], bioinformatics [45], pattern recognition [46], ecology [47], medicine [48], astronomy [49], and much more.
To demonstrate the random forest algorithm and its voting mechanism, consider the simple dataset shown in Table 2, which contains eight samples with four features: Outlook, HWDone (homework done), Weekend, and Play. A random forest is formed to predict the value of the Play feature, i.e., whether a child may play or not. From Table 2 it is clear that if the child has completed his homework on a sunny day, he can play whether it is a weekday or the weekend, while on a rainy weekday he cannot play even if the homework is finished.

Table 2: Sample dataset for illustration of the random forest algorithm.

Outlook | HWDone | Weekend | Play
Sunny   | True   | True    | Yes
Sunny   | True   | False   | Yes
Sunny   | False  | True    | Yes
Sunny   | False  | False   | No
Rainy   | True   | True    | Yes
Rainy   | True   | False   | No
Rainy   | False  | True    | Yes
Rainy   | False  | False   | No

To classify new samples, a random forest of three decision trees is formed, as illustrated in Figure 7. Table 3 shows the outcome of the vote for classifying a sample: decision trees A and C vote "Yes" while tree B votes "No", so by majority vote the winning class is "Yes" and in this case the child can play.
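The majority vote over three trees can be reproduced in code. The numeric encoding of Table 2 and the bootstrap sampling are assumptions made here; the actual trees of Figure 7 come from [50] and may split differently.

```python
from collections import Counter

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Table 2 encoded as [Outlook (1 = Sunny), HWDone, Weekend] -> Play.
X = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 0, 0],
              [0, 1, 1], [0, 1, 0], [0, 0, 1], [0, 0, 0]])
y = np.array(["Yes", "Yes", "Yes", "No", "Yes", "No", "Yes", "No"])

# Grow three trees on bootstrap samples and take a majority vote.
rng = np.random.default_rng(1)
trees = [DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
         for idx in (rng.integers(0, len(X), len(X)) for _ in range(3))]

sample = np.array([[1, 1, 0]])  # Sunny, homework done, weekday
votes = [t.predict(sample)[0] for t in trees]
winner = Counter(votes).most_common(1)[0][0]
print(votes, "->", winner)
```

Each tree casts one vote, exactly as in Table 3, and the most common label wins.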
Figure 7: Random forest of three decision trees.

Table 3: Illustration of the vote-casting mechanism in a random forest.

Decision Tree | Vote
A             | Yes
B             | No
C             | Yes

This demonstration of the random forest algorithm, along with the dataset and Figure 7, was adopted from [50]. The random forest algorithm itself consists of three steps:
1) Construct ntrees bootstrap samples from the input data using the Classification And Regression Trees (CART) [51] methodology.
2) Grow an unpruned tree for each of the ntrees bootstrap samples; at each node, randomly sample mtry of the predictors and choose the best split among those variables.
3) Aggregate the predictions of the ntrees trees to predict new data.

In this study, a scikit-learn [31] based method is applied, which combines the classifiers by averaging their probabilistic predictions. In the random forest technique, the generalization error of a forest depends on the strength of the individual trees and the correlation between the trees [27]. Breiman [27] applied random forest and Adaboost [52] (a tree-based ensemble method built on the boosting principle) to 13 different datasets and reported that the random forest did not overfit during training, whereas Adaboost overfitted in many cases. He further observed that the random forest worked well with categorical variables and also performed well on data with weak variables. The accuracy of the random forest was comparable to that of Adaboost and occasionally superior, and the random forest handled noisy data better [27]. Random forest also copes well with a very large number of input variables, is simple, and is easily parallelized: several forests can be executed on different machines and their votes combined to obtain the final outcome [28]. Choosing the number of trees in a random forest is an open question. Breiman [27] suggested that the more trees, the better the performance, but finding the optimal number of trees is difficult. Oshiro et al. [43] applied the random forest algorithm to 29 different datasets while varying the number of trees L in exponential steps of base two, i.e., L = 2^j, j = 1, 2, ..., 12.
They concluded that beyond a certain point a larger number of trees only increases the computational cost, with no significant impact on performance. In this study, the same method as Oshiro et al. [43] is applied to decide the number of trees in the random forest classifier; the number of trees and the corresponding performance on the dataset are reported in Table 4.
Table 4: Performance evaluation with different numbers of trees.

No. of Trees | 2    | 4    | 8    | 16   | 32   | 64   | 128  | 256  | 512
Accuracy     | 0.93 | 0.94 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.98 | 0.98

Table 4 shows that 8, 16, 32, 64, or 128 trees all yield the same performance on the dataset. The number of trees was therefore set to eight; a larger number would only increase the computational cost without a significant gain. Many methods have been proposed for choosing the number of predictors sampled at each node, but the common choice is √p, where p is the number of predictors. The reduced dataset has eight features, so we chose mtry = 3. When training a machine-learning algorithm, it is essential to know when to stop: whether the algorithm has learnt enough to be tested on unseen data or still needs more training. In a random forest, the OOB (out-of-bag) error estimate serves this purpose and assesses the performance of the model: for each sample, the prediction error is estimated using only the trees whose bootstrap samples did not contain that sample [27, 53]. In this study, training was stopped once the OOB error dropped below 5%, which indicates a good fit: more than 95% of the out-of-bag predictions are correct. Because the forest is built with randomness, the model was executed 100 times and the mean of the 100 OOB error values was taken.
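The tree-count sweep of Table 4, the mtry = 3 choice, and the repeated OOB-error estimate can be sketched as follows; the imbalanced synthetic data is a placeholder for the 8-dimensional ISOMAP-reduced dataset, so the numbers will not match Table 4.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Imbalanced synthetic stand-in for the 8-feature reduced data (240 samples).
X, y = make_classification(n_samples=240, n_features=8, weights=[0.88],
                           random_state=0)

# Sweep the number of trees in powers of two, as in Oshiro et al. [43].
for n in (8, 32, 128, 512):
    rf = RandomForestClassifier(n_estimators=n, max_features=3,  # mtry = sqrt(8) ~ 3
                                oob_score=True, random_state=0).fit(X, y)
    print(n, "OOB error:", round(1.0 - rf.oob_score_, 3))

# Mean OOB error over repeated random runs (the thesis averages 100 runs).
errs = [1.0 - RandomForestClassifier(n_estimators=8, max_features=3,
                                     oob_score=True,
                                     random_state=s).fit(X, y).oob_score_
        for s in range(10)]
mean_oob_error = float(np.mean(errs))
print("mean OOB error:", mean_oob_error)
```

`oob_score_` is scikit-learn's accuracy on out-of-bag samples, so `1 - oob_score_` plays the role of the OOB error used as the stopping check above.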
Chapter 4
Results and Discussion

In this chapter, the results obtained by the proposed system on the available data are presented and elaborated with different performance metrics. Furthermore, the random forest algorithm is compared with decision trees and SVM on our data.

4.1. Performance Evaluation of Our Classifier

For training the classification algorithm, 210 data items were manually chosen, and another 30 data items were manually selected for testing. The structural safety labels were detached and not exposed to the algorithm during evaluation. The training set was composed of 184 safe signals and 26 cracked signals, while the test set contained 27 safe signals and 3 cracked signals; the data were distributed in this way so that the algorithm saw enough cracked signals during training. The predicted results were then compared with the labels. Accuracy was calculated as the ratio of correctly predicted instances to the total number of examined instances, as defined in Equation (2). Because the random forest yields different results on different executions, it was executed 100 times, and the accuracy was taken as the mean of the 100 calculated accuracies; in this case it was 97%. Note that the dataset consists of 211 safe signals and 29 crack signals and is therefore imbalanced: an algorithm that predicted only safe signals would still achieve a high accuracy. In such cases, accuracy alone is not enough to determine the performance of an algorithm. Accuracy is biased toward the majority class and has several other weaknesses, such as limited distinctiveness, discriminability, and informativeness. Furthermore, a classifier may perform well on one metric while being suboptimal on others.
Moreover, different performance metrics measure different tradeoffs in the predictions made by a classifier.
Therefore, it is essential to assess algorithms on a set of performance metrics [54]; in this case it is not good to rely only on accuracy. To discriminate accurately and select an optimal model, the confusion matrix, precision, recall [55], and F-measure [55, 56] have also been calculated. The confusion matrix is widely used for describing the performance of a classifier because it shows the ways in which a classification model is confused during prediction. Its rows and columns correspond to the classification labels: the rows represent the actual class, and the columns the predicted class. The main diagonal elements, TP and TN, denote correctly classified instances, while the off-diagonal elements (FP and FN) denote incorrectly classified instances. Here, TP (true positive) is when the algorithm correctly predicts the positive class, and TN (true negative) is when it correctly predicts the negative class; FP (false positive) is when the algorithm fails to predict the negative class correctly, and FN (false negative) is when it fails to predict the positive class correctly. In this data, the safe signals are the positive class, and the crack signals are the negative class. The confusion matrix of the random forest algorithm on the test data is presented in Table 5.

Table 5: Confusion matrix of the proposed system.

                     | Predicted Safe Signals | Predicted Crack Signals
Actual Safe Signals  | TP = 26                | FN = 1
Actual Crack Signals | FP = 0                 | TN = 3

Note that there are 30 test points in total: 26 + 3 = 29 instances are correctly classified, while 0 + 1 = 1 instance is misclassified.
The confusion matrix shows that the algorithm makes only one mistake: a safe signal is predicted as a crack signal. On the basis of the confusion matrix the result is quite good; however, an analysis based exclusively on the confusion matrix is not sufficient for evaluating the performance of an algorithm. In the case of imbalanced classes, precision and recall constitute a useful measure of prediction success; both are commonly used in information retrieval for evaluating retrieval performance [57]. Precision measures the fraction of correctly predicted positive patterns among all patterns predicted as positive, whereas recall measures the effectiveness of a classifier in identifying positive patterns. For a full evaluation of a model, both precision and recall must be examined: high precision indicates a low false positive rate, while high recall indicates a low false negative rate. Accuracy A, precision P, and recall R are defined in Equation (2).

A = (TP + TN) / (TP + FP + FN + TN),  P = TP / (TP + FP),  R = TP / (TP + FN)   (2)

The precision of the classifier on the test data is 97%, and the recall is also 97%; both are very high values. However, precision and recall are still not sufficient to select an optimal solution or algorithm. To balance precision and recall, the F-measure is used; it considers both precision and recall and indicates how many instances the classifier classifies correctly without missing a significant number of instances. The greater the F-measure, the better the performance of the model. The F-measure is defined as the harmonic mean of precision and recall, as expressed in Equation (3).

F-measure = 2 / (1/P + 1/R)   (3)
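The counts of Table 5 can be plugged into Equations (2) and (3) directly; with scikit-learn's class-weighted averages, the roughly 97% precision and recall reported above are reproduced. The label vectors below are reconstructed from the confusion matrix, not taken from the original data.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Test labels reconstructed from Table 5: 27 safe (positive), 3 crack (negative);
# one safe signal was misclassified as crack.
y_true = np.array(["safe"] * 27 + ["crack"] * 3)
y_pred = np.array(["safe"] * 26 + ["crack"] * 1 + ["crack"] * 3)

print(confusion_matrix(y_true, y_pred, labels=["safe", "crack"]))
acc = accuracy_score(y_true, y_pred)                        # Eq. (2), A = 29/30
prec = precision_score(y_true, y_pred, average="weighted")  # Eq. (2), P
rec = recall_score(y_true, y_pred, average="weighted")      # Eq. (2), R
f1 = f1_score(y_true, y_pred, average="weighted")           # Eq. (3)
print(acc, prec, rec, f1)  # each approximately 0.97
```

The weighted averages combine the per-class scores in proportion to class size, which is how the 97% figures arise from the heavily imbalanced test set.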
The F-measure of the algorithm is 97%. Even precision, recall, and F-measure can sometimes fail to identify an optimal classifier, because none of these measures balances the positive and negative classes appropriately when computing its score. In such situations, Receiver Operating Characteristic (ROC) analysis is preferred, which compares the True Positive Rate (TPR) with the False Positive Rate (FPR) of a classification algorithm. ROC graphs are very helpful for organizing classifiers and visualizing their performance [55]. They are widely applied in medical decision-making problems and are increasingly used in data mining and machine learning [55]. A ROC graph is a two-dimensional plot with FPR on the X axis and TPR on the Y axis. TPR is also referred to as the hit rate or recall, while FPR is known as the false alarm rate of a classifier. TPR measures the fraction of correctly labeled positive instances, and FPR the fraction of negative instances incorrectly predicted as positive:

TPR = TP / (TP + FN)  and  FPR = FP / (FP + TN)   (4)

From Equation (4) it is clear that TPR is the same as recall; a ROC graph shows the balance between TP and FP. The point (0,0) in the bottom left corner represents a classifier that commits no FP errors but also gains no TPs, while (1,1) in the top right corner represents a classifier that attains all TPs along with all FP errors. The point (0,1) in the top left corner denotes a perfect classifier, with an FPR of 0 and a TPR of 100%, whereas (1,0) in the bottom right corner represents the worst classifier, with an FPR of 100% and a TPR of 0. In ROC analysis, the AUC (area under the ROC curve), the entire two-dimensional area underneath the ROC curve, is also used.
AUC can be interpreted as the probability that a randomly picked positive instance is ranked higher than a randomly selected negative instance; it measures how well the predictions are ranked and thus the quality of the model's predictions, summarizing the performance of a classifier in a single number. The greater the AUC, the better the performance of the classifier. The blue block in Figure 8 depicts the ROC curve of our random forest model, and the area underneath it is the AUC of the model.

Figure 8: ROC curve of the random forest algorithm.

In a ROC curve, the goal is to approach the upper left corner, and the curve in Figure 8 is clearly quite close to this optimum. The AUC score is 0.99, which is an excellent result. The performance of the algorithm can thus be measured in terms of classification accuracy, the confusion matrix, precision, recall, F-measure, and ROC curves; in this study all of them were calculated, and the results demonstrate that the classifier is very reliable.
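The ROC/AUC computation itself is a few lines with scikit-learn; the scores below are hypothetical, well-separated class probabilities rather than the model's actual outputs, so the AUC here is exactly 1.0 rather than the 0.99 reported above.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

rng = np.random.default_rng(0)
y_true = np.array([1] * 27 + [0] * 3)  # 1 = safe (positive class)
# Hypothetical probability scores: positives above 0.6, negatives below 0.5.
scores = np.where(y_true == 1,
                  rng.uniform(0.6, 1.0, y_true.size),
                  rng.uniform(0.0, 0.5, y_true.size))

fpr, tpr, _ = roc_curve(y_true, scores)  # sweeps Eq. (4) over every threshold
roc_auc = auc(fpr, tpr)
print(roc_auc)  # 1.0, since the two score ranges do not overlap
```

Plotting `fpr` against `tpr` yields a curve of the kind shown in Figure 8.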
4.2. Comparison of Random Forest with SVM and Decision Trees on Our Data

It is common practice in machine learning to apply several models to a dataset, check their performance on test data, and choose the model that performs best. Therefore, to compare the random forest with other machine learning algorithms on the ISOMAP-reduced data, SVM and decision trees were also applied. SVM attempts to find the optimal separating hyperplane between objects of different classes; it is best suited for binary classification but can also be configured for multi-class tasks. Decision trees are applied to both classification and regression tasks: each node represents a feature, each link a decision, and each leaf an outcome, and the root node is the attribute that best classifies the training data, with the process repeated for each branch of the tree. In this study, SVM classifiers with different kernels, including linear, polynomial, sigmoid, and Radial Basis Function (RBF), were applied. The accuracy of the SVM was 86% with the linear kernel, 93% with the polynomial kernel, 90% with the sigmoid kernel, and 90% with the RBF kernel; the most accurate kernel, the polynomial one, was therefore taken for comparison with the random forest on the different performance evaluators. For decision trees, unpruned trees with entropy-based and Gini-index-based best splits were implemented; the entropy-based tree achieved 93% accuracy and the Gini-based tree 96%, so the Gini-based tree was chosen for comparison with the random forest model.
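The kernel and split-criterion comparison described above can be reproduced schematically; the synthetic data and the train/test split are placeholders, so the accuracies will not match the thesis figures.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder for the ISOMAP-reduced data: 240 samples, 8 features, imbalanced.
X, y = make_classification(n_samples=240, n_features=8, weights=[0.88],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=30, random_state=0)

accs = {}
for kernel in ("linear", "poly", "sigmoid", "rbf"):
    accs["svm_" + kernel] = SVC(kernel=kernel).fit(X_tr, y_tr).score(X_te, y_te)
for criterion in ("entropy", "gini"):
    accs["tree_" + criterion] = DecisionTreeClassifier(
        criterion=criterion, random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
print(accs)
```

Picking the best-scoring entry of `accs` per model family mirrors the selection of the polynomial SVM and the Gini-based tree above.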
With SVM and decision trees, the greatest accuracies achieved were 93% and 96%, respectively; the other performance measures of SVM and decision trees are shown in Figure 11. The ROC curves of SVM and decision trees on the test data are displayed in Figure 9 and Figure 10, respectively. They clearly show that SVM balances the negative and positive classes better than the decision tree, but when both curves are compared with the ROC of the random forest model, the random forest is the clear winner.

Figure 9: ROC curve of SVM.

Figure 10: ROC curve of the decision tree.
A complete comparison of all these algorithms on the test data is depicted in Figure 11.

Figure 11: Performance measure graph of the different algorithms.

Figure 11 clearly shows that the random forest outperforms SVM and decision trees. The precision and recall of the random forest and the decision tree are almost the same, but the accuracy, F-measure, and AUC score of the random forest are the best of all. This is not surprising: a random forest combines many decision trees, each voting for its respective class, so it can be expected to outperform a single decision tree, whose result rests on one tree alone. The comparison between random forest and SVM, however, is much harder, and declaring that one algorithm outperforms the other is difficult. Liaw et al. [28] stated that random forest outperforms SVM, decision trees, and many other machine learning algorithms, and Dietterich [40] noted that ensemble-based methods work well compared with a single classifier. On the other hand, Gislason et al., Qi et al., and Jia et al. [44, 45, 58] applied random forest to different kinds of datasets and compared it with SVM, concluding that on some datasets random forest works better while on others SVM performs better.
Chapter 5
Conclusions

In this study, a structural safety assessment method for reinforced concrete utility poles using ISOMAP and random forest is proposed. The proposed system is able to identify the condition of the wires inside reinforced concrete poles, and it is easy and inexpensive to adopt, because only a few devices are needed to carry out all of its steps. ISOMAP is used for data reduction, and a random forest classifier is then applied for classification. The random forest algorithm outperformed the other machine learning algorithms (SVM and decision trees) on the dataset. The performance of the system was evaluated with different performance measures, including accuracy, precision, recall, F-measure, and ROC curves, and even in this first attempt the achieved performance was quite good. At present, only a limited number of labeled data items from field experiments is available; machine learning techniques were therefore chosen over deep learning methods. In the future, more experimental data can be collected from the field engineers, and deep learning methods can then be applied to achieve better performance and higher reliability. Another possible extension is to transform the existing time-domain signals into the frequency domain through transformation and filtering methods such as the Fourier transform or time-dependent convolutional methods, including low-pass and high-pass filters, and then apply dimensionality reduction and machine learning to the transformed data. Finally, in this study the random forest was compared with only two other machine learning algorithms; in the future, our random forest model will be compared with more state-of-the-art ensemble-based methods on the same dataset, and the performance of the classifier can also be evaluated with additional performance measures.
  • 51. 36 References 1. Kliukas, R., Daniunas, A., Gribniak, V., Lukoseviciene, O., Vanagas, E., & Patapavicius, A. (2018), ‘Half a century of reinforced concrete electric poles maintenance: Inspection, field- testing, and performance assessment’, Structure and Infrastructure Engineering, 14(9), 1221-1232. 2. Baraneedaran, S., Gad, E. F., Flatley, I., Kamiran, A., & Wilson, J. L. (2009), ‘Review of in- service assessment of timber poles’, Proceedings of the Australian Earthquake Engineering Society, Newcastle, Australia. 3. Miszczyk, A., Szocinski, M., & Darowicki, K. (2016), ‘Restoration and preservation of the reinforced concrete poles of fence at the former Auschwitz concentration and extermination camp’, Case Studies in Construction Materials, 4, 42-48. 4. Cairns, J., Plizzari, G. A., Du, Y., Law, D. W., & Franzoni, C. (2005), ‘Mechanical properties of corrosion-damaged reinforcement’, ACI Materials Journal, 102(4), 256. 5. Stewart, M. G. (2012), ‘Spatial and time-dependent reliability modelling of corrosion damage, safety and maintenance for reinforced concrete structures’, Structure and Infrastructure Engineering, 8(6), 607-619. 6. Ying, L., & Vrouwenvelder, A. C. W. M. (2007), ‘Service life prediction and repair of concrete structures with spatial variability’, Heron, 52 (4). 7. Doukas, H., Karakosta, C., Flamos, A., & Psarras, J. (2011), ‘Electric power transmission: An overview of associated burdens’, International journal of energy research, 35(11), 979- 988. 8. Val, D. V., & Stewart, M. G. (2009), ‘Reliability assessment of ageing reinforced concrete structures—current situation and future challenges’, Structural Engineering International, 19(2), 211-219. 9. No, T. C. S. (2002), ‘Guidebook on non-destructive testing of concrete structures’, Vienna: International Atomic Energy Agency (IAEA).
  • 52. 37 10. Breysse D. (2012), ‘Non destructive assessment of concrete structures: Reliability and Limits of Single and Combined Techniques’, Dordrecht: RILEM State of the Art Reports, vol 1. Springer. 11. Helal, J., Sofi, M., & Mendis, P. (2015), ‘Non-destructive testing of concrete: A review of methods’, Electronic Journal of Structural Engineering, 14(1), 97-105. 12. Davis, A. G., Ansari, F., Gaynor, R. D., Lozen, K. M., Rowe, T. J., Caratin, H., & Hertlein, B. H. (1998), ‘Nondestructive test methods for evaluation of concrete in structures’, American Concrete Institute, ACI, 228. 13. Hermanek, P., & Carmignato, S. (2016), ‘Reference object for evaluating the accuracy of porosity measurements by X-ray computed tomography’, Case studies in nondestructive testing and evaluation, 6, 122-127. 14. Makar, J., & Desnoyers, R. (2001), ‘Magnetic field techniques for the inspection of steel under concrete cover’, NDT & e International, 34(7), 445-456. 15. Milovanović, B., & Banjad Pečur, I. (2013), ‘Detecting defects in reinforced concrete using the method of infrared thermography’, HDKBR INFO Magazin, 3(3), 3-13. 16. Akhtar, S. (2013), ‘Review of nondestructive testing methods for condition monitoring of concrete structures’, Journal of construction engineering. 17. Sannikov, D. V., Kolevatov, A. S., Vavilov, V. P., & Kuimova, M. V. (2018), ‘Evaluating the Quality of Reinforced Concrete Electric Railway Poles by Thermal Nondestructive Testing’, Applied Sciences, 8(2), 222. 18. Milovanović, B., & Banjad Pečur, I. (2016), ‘Review of active IR thermography for detection and characterization of defects in reinforced concrete’, Journal of Imaging, 2(2), 11. 19. Szymanik, B., Frankowski, P. K., Chady, T., & John Chelliah, C. R. A. (2016), ‘Detection and inspection of steel bars in reinforced concrete structures using active infrared thermography with microwave excitation and eddy current sensors’, Sensors, 16(2), 234. 20. Zhang, J. K., Yan, W., & Cui, D. M. 
(2016), ‘Concrete condition assessment using impact- echo method and extreme learning machines’, Sensors, 16(4), 447.
  • 53. 38 21. Cui, D. M., Yan, W., Wang, X. Q., & Lu, L. M. (2017), ‘Towards intelligent interpretation of low strain pile integrity testing results using machine learning techniques’, Sensors, 17(11), 2443. 22. Dackermann, U., Skinner, B., & Li, J. (2014), ‘Guided wave–based condition assessment of in situ timber utility poles using machine learning algorithms’, Structural Health Monitoring, 13(4), 374-388. 23. Dackermann, U., Yu, Y., Niederleithinger, E., Li, J., & Wiggenhauser, H. (2017), ‘Condition Assessment of Foundation Piles and Utility Poles Based on Guided Wave Propagation Using a Network of Tactile Transducers and Support Vector Machines’, Sensors, 17(12), 2938. 24. Dackermann, U., Yu, Y., Li, J., Niederleithinger, E., & Wiggenhauser, H. (2015, September), ‘A new non-destructive testing system based on narrow-band frequency excitation for the condition assessment of pole structures using frequency response functions and principle component analysis’, In Proceedings of the International Symposium Non-Destructive Testing in Civil Engineering, Berlin, Germany, 15-17. 25. SMART C&S. Available online: http://smartcs.co.kr/ 26. Tenenbaum, J. B., De Silva, V., & Langford, J. C. (2000), ‘A global geometric framework for nonlinear dimensionality reduction’, science, 290(5500), 2319-2323. 27. Breiman, L. (2001), ‘Random forests’, Machine learning, 45(1), 5-32. 28. Liaw, A., & Wiener, M. (2002), ‘Classification and regression by randomForest’, R news, 2(3), 18-22. 29. Osuna, E., Freund, R., & Girosi, F. (1997), ‘Support vector machines: Training and applications’, Cambridge, MA, USA: Massachusetts Institute of Technology (MIT). 30. Quinlan, J. R. (1986), ‘Induction of decision trees’, Machine learning, 1(1), 81-106. 31. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Vanderplas, J. (2011), ‘Scikit-learn: Machine learning in Python’, Journal of machine learning research, 12(Oct), 2825-2830. 32. 
Shi, H., Yin, B., Kang, Y., Shao, C., & Gui, J. (2017), ‘Robust L-Isomap with a Novel
  • 54. 39 Landmark Selection Method’, Mathematical Problems in Engineering, 2017. 33. Ghodsi, A. (2006), ‘Dimensionality reduction a short tutorial’, Department of Statistics and Actuarial Science, Univ. of Waterloo, Ontario, Canada, 37, 38. 34. Van Der Maaten, L., Postma, E., & Van den Herik, J. (2009), ‘Dimensionality reduction: a comparative review’, J Mach Learn Res, 10, 66-71. 35. Jeong, M., Choi, J. H., & Koh, B. H. (2014), ‘Isomap‐based damage classification of cantilevered beam using modal frequency changes’, Structural Control and Health Monitoring, 21(4), 590-602. 36. Cayton, L. (2005), ‘Algorithms for manifold learning’, Univ. of California at San Diego Tech. Rep, 12(1-17), 1. 37. Cormen, T. H., Leiserson, C. E., & Rivest, R. L., Clifford stein (2001), ‘Introduction to algorithms’, Cambridge, MA, USA: MIT Press. 38. Silva, V. D., & Tenenbaum, J. B. (2003), ‘Global versus local methods in nonlinear dimensionality reduction’, In Advances in neural information processing systems, 721-728. 39. Liang, Y. M., Shih, S. W., Shih, A. C. C., Liao, H. Y. M., & Lin, C. C. (2009, May), ‘Unsupervised analysis of human behavior based on manifold learning’, In International Symposium on Circuits and Systems (ISCAS), Taipei, Taiwan, 2605-2608. 40. Dietterich, T. G. (2000, June), ‘Ensemble methods in machine learning’, In International workshop on multiple classifier systems, Springer, Berlin, Heidelberg, 1-15. 41. Schapire, R. E., Freund, Y., Bartlett, P., & Lee, W. S. (1998), ‘Boosting the margin: A new explanation for the effectiveness of voting methods’, The annals of statistics, 26(5), 1651-1686. 42. Breiman, L. (1996), ‘Bagging predictors’, Machine learning, 24(2), 123-140. 43. Oshiro, T. M., Perez, P. S., & Baranauskas, J. A. (2012, July), ‘How many trees in a random forest?’, In International Workshop on Machine Learning and Data Mining in Pattern Recognition, Springer, Berlin, Heidelberg, 154-168. 44. Gislason, P. O., Benediktsson, J. A., & Sveinsson, J. R. 
(2006), ‘Random forests for land cover classification’, Pattern Recognition Letters, 27(4), 294-300.
  • 55. 40 45. Boulesteix, A. L., Janitza, S., Kruppa, J., & König, I. R. (2012), ‘Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics’, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(6), 493-507. 46. Rogez, G., Rihan, J., Ramalingam, S., Orrite, C., & Torr, P. H. (2008, June), ‘Randomized trees for human pose detection’, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-8. 47. Cutler, D. R., Edwards Jr, T. C., Beard, K. H., Cutler, A., Hess, K. T., Gibson, J., & Lawler, J. J. (2007), ‘Random forests for classification in ecology’, Ecology, 88(11), 2783-2792. 48. Klassen, M., Cummings, M., & Saldana, G. (2008), ‘Investigation of Random Forest Performance with Cancer Microarray Data’, In proceedings of the ISCA 23rd International Conference on Computers and Their Applications (CATA), Cancun, Mexico, 64-69. 49. Gao, D., Zhang, Y. X., & Zhao, Y. H. (2009), ‘Random forest algorithm for classification of multiwavelength data’, Research in Astronomy and Astrophysics, 9(2), 220. 50. Fawagreh, K., Gaber, M. M., & Elyan, E. (2014), ‘Random forests: from early developments to recent advancements’, Systems Science & Control Engineering: An Open Access Journal, 2(1), 602-609. 51. Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984), ‘Classification and Regression Trees’, Belmont, California, USA: Wadsworth Int. Group. 52. Freund, Y., & Schapire, R. E. (1996, July), ‘Experiments with a new boosting algorithm’, Machine Learning: Proceedings of the Thirteenth International Conference, 148–156. 53. Hastie, T., Tibshirani, R., & Friedman, J. (2001), ‘The elements of statistical learning’, New York, NY, USA, Springer. 54. Caruana, R., & Niculescu-Mizil, A. (2006, June), ‘An empirical comparison of supervised learning algorithms’, In Proceedings of the 23rd International Conference on Machine learning, Pittsburgh, PA, USA, 161-168. 55. Fawcett, T. 
(2006), ‘An introduction to ROC analysis’, Pattern recognition letters, 27(8), 861-
  • 56. 41 874. 56. Sasaki, Y. (2007), ‘The truth of the F-measure’, Teach Tutor mater, 1(5), 1-5. 57. Lewis, D. D. (1991), ‘Evaluating text categorization’, In Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California. 58. Jia, S., Hu, X., & Sun, L. (2013), ‘The comparison between random forest and support vector machine algorithm for predicting β-hairpin motifs in proteins’, Engineering, 5(10), 391.