reference paper.pdf

Non-Dominated Sorting Genetic Algorithm-III Based Meta Decision Tree
Gursimran Kaur
Computer Science and Engineering Department
Thapar Institute of Engineering and Technology
(Deemed to be University)
Patiala, India
gkaur.me16@thapar.edu
Harkiran Kaur
Computer Science and Engineering Department
Thapar Institute of Engineering and Technology
(Deemed to be University)
Patiala, India
harkiran.kaur@thapar.edu
Abstract—Buying or selling a property is a financial as well as
an emotional undertaking. Today, these processes can be
handled with greater accuracy and optimization than before.
Although existing models classify real estate purchasing
well, they suffer from a parameter tuning issue. This issue
is addressed here with a well-known meta-heuristic
optimization technique, NSGA-III. It iteratively optimizes
the meta-J48 model to improve the classification rate using
mutation and crossover operators. The obtained solutions are
non-dominated, so the proposed model can improve accuracy
and other measures concurrently. Extensive experiments have
been performed. The proposed technique is found to
outperform existing techniques in terms of Accuracy,
TP-Rate, TN-Rate, Precision and F-Measure. Therefore, the
proposed technique is applicable for real-time real-estate
users.
Keywords- Data Mining; Real Estate Database; NSGA-III;
Confusion Matrix
I. INTRODUCTION
Data mining is an analytical process used to extract hidden
patterns from available data sources. Big data problems, in
turn, can be better addressed by an enhanced data analysis
process. Data mining is not a stand-alone process but an
essential part of the Knowledge Discovery from Data (KDD)
process. It comprises a set of methods and techniques aimed
specifically at extracting patterns from raw data.
Data mining [1] is a promising and well-known field that
attracts a great deal of attention due to the wide
availability of data in diverse forms. Moreover, there is an
urgent need to turn this data quickly into meaningful and
useful information and, finally, into knowledge. Although
data mining is often used as a synonym for Knowledge
Discovery from Data, it is in fact only one essential part
of the KDD process. In simple words, data mining is a
process in which intelligent techniques are applied to
preprocessed data (clean, complete, transformed and reduced)
to extract the desired data patterns. It results in hidden
patterns that support decision making. Data mining takes its
input from the data warehouse or from on-line analytical
processing (OLAP) servers. It comprises many intelligent
techniques for analyzing the given data. Its job is not
restricted to summarizing the data but extends to extracting
hidden, useful patterns for decision making. Based upon the
services provided by data systems, data mining systems can
be categorized as:
• Machine Learning System: These systems are not capable of
handling bulky data. They are based on statistical data
analysis that follows the rules and procedures of
experimental systems.
• Database System: These systems (information retrieval) are
responsible for retrieving data or information. According to
the requirements, the data can be aggregated or generalized.
These systems are also capable of answering queries over
large databases.
“Data mining incorporates various technologies from multiple
disciplines like file system, database, data warehouse,
machine learning, pattern recognition, fuzzy based approach,
neural networks, image processing, data visualization,
statistics, information retrieval, spatial and temporal data
analysis, signal processing, seismic data”.
An NSGA-III based tree machine learning technique is used to
classify real estate data. NSGA-III is a well-known
metaheuristic technique which can find optimal solutions.
Initially, a set of reference points is created. For a
multi-objective problem, each reference point is a vector
with one component per objective. Next, the initial
population of N members is generated randomly. The ideal
point is estimated from the minimum value found so far for
each objective and is kept up to date during the search [2].
The newly generated solutions are then merged with the
present population to form a combined population, which is
normalized using the ideal point. After normalization, a
clustering operator splits the members of the combined
population into a set of clusters, where each cluster is
represented by a reference point. Then, non-dominated
sorting based on a dominance relation (not Pareto dominance)
classifies the population into different non-domination
levels. Dominance, which is a key concept in NSGA-III, is
presented later. Once non-dominated sorting has been
completed, the remaining steps fill the population slots one
level at a time, starting from the first level. Unlike in
NSGA-II and the standard NSGA-III, solutions in the last
accepted level are selected at random, because the dominance
relation already emphasizes both convergence and
diversity [3].
2018 2nd International Conference on Micro-Electronics and Telecommunication Engineering
978-1-5386-6918-1/18/$31.00 ©2018 IEEE
DOI 10.1109/ICMETE.2018.00045

Genetic Algorithm (GA) is a metaheuristic that mimics the
process of natural selection. Genetic operators are of three
types, as explained below:
• Selection: Selection manages the probabilistic survival of
the fittest, in that fitter chromosomes survive. Fitness is
a comparable measure of how well a chromosome solves the
problem at hand. There are different methods to perform
selection in genetic algorithms, including tournament
selection, roulette wheel selection, proportionate
selection, rank selection and steady-state selection.
• Crossover: This operation is performed by choosing a
random gene along the length of the chromosomes and swapping
all the genes after that point. The most common crossover
selects two solution strings at random from the mating pool
and exchanges some portion of the strings between them. The
crossover point is chosen at random. A crossover probability
is also introduced to give each solution string the freedom
to determine whether it undergoes crossover or not [4].
• Mutation: Mutation is the occasional introduction of new
features into the solution strings of the population pool to
maintain diversity in the population. Although crossover has
the main responsibility of searching for the optimal
solution, mutation is also used for this purpose. The
mutation operator changes a 1 to 0 or vice versa with a
given mutation probability. The mutation probability is
generally kept low for steady convergence.
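The three operators above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: chromosomes are taken to be lists of 0/1 genes, selection is shown as a tournament, and the function names and default probabilities (p_cross, p_mut) are hypothetical.

```python
import random


def tournament_select(population, fitness, k=2, rng=random):
    """Return the fittest of k distinct, randomly drawn chromosomes."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]


def single_point_crossover(a, b, p_cross=0.9, rng=random):
    """Swap the tails of two equal-length strings after a random cut point."""
    if len(a) != len(b):
        raise ValueError("parents must have equal length")
    # With probability 1 - p_cross the parents are copied unchanged.
    if len(a) < 2 or rng.random() >= p_cross:
        return a[:], b[:]
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def bit_flip_mutation(chrom, p_mut=0.01, rng=random):
    """Flip each gene (1 to 0 or 0 to 1) independently with probability p_mut."""
    return [1 - g if rng.random() < p_mut else g for g in chrom]
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable. A low `p_mut` keeps most genes intact, which matches the remark above about steady convergence.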
II. RELATED WORK
In this section, papers on prescriptive analysis in
different domains are studied. In this literature, a variety
of models were found for performing optimization. Guided by
these papers, the Genetic Algorithm (GA) and its variants
NSGA-I, NSGA-II and NSGA-III were chosen to carry forward
for research. The following papers cover the related work.
Li K. et al. (2018) in [5] report that flip chip technology
has been widely used in IC packaging, and that the
combination of flip chip technology and solder joint
interconnection technology has been used in the
manufacturing of electronic devices worldwide. As flip chips
develop towards high density and ultra-fine pitch, their
inspection faces great challenges. The paper develops an
intelligent system for the detection of flip chips based on
vibration. Thirty-four features, including 18 time-domain
features and 16 frequency-domain features, were extracted
from the raw vibration data. The support vector machine
(SVM) was used for the recognition and classification of
flip chips. To improve the classification accuracy of SVM,
cross validation (CV) and the genetic algorithm (GA) were
each used to optimize the parameters of SVM. SVM, CV-SVM and
GA-SVM were applied separately for classification and the
results were compared. By comparison, GA-SVM can recognize
and classify flip chips quickly with high accuracy. Thus,
GA-SVM is effective for the defect inspection of flip chips.
Yi J. H. et al. (2018) in [6] explain that one of the major
difficulties of handling Big Data optimization problems by
means of traditional multi-objective evolutionary algorithms
(MOEAs) is their high computational cost. This issue has
been handled efficiently by the non-dominated sorting
genetic algorithm, third version (NSGA-III). However, a
concern with the NSGA-III algorithm is that it uses a fixed
rate for the mutation operator. To cope with this issue, the
study introduces an adaptive mutation operator to enhance
the performance of the standard NSGA-III algorithm. The
proposed adaptive mutation operator strategy is evaluated
using three crossover operators of NSGA-III: simulated
binary crossover (SBX), uniform crossover (UC) and single
point crossover (SI). Accordingly, three improved NSGA-III
algorithms (NSGA-III SBXAM, NSGA-III SIAM, and NSGA-III
UCAM) are developed. These enhanced algorithms are then
applied to solve different Big Data optimization problems.
Experimental results show that NSGA-III with UC and the
adaptive mutation operator outperforms the other NSGA-III
algorithms.
Hu C. et al. (2018) in [7] note that contamination events in
drinking Water Distribution Systems (WDSs) have occurred
frequently in recent years, causing severe damage, economic
loss, and long-lasting societal impact. A simple and
effective method to monitor a WDS in real time is deploying
water quality sensors. However, the placement of such
sensors in a water distribution network (WDN) has become a
major concern worldwide. The paper first analyzes sensor
placement mathematically and proves that it is NP-hard. It
then distinguishes single- and multi-objective optimization
and attempts, for the first time, to propose a modified
NSGA-III to solve many-objective optimization for the sensor
placement problem. WDNs of two sizes are used, and
simulation results show the validity and effectiveness of
the proposed model and method. Future research directions
are also identified and discussed.
Elarbi M. et al. (2018) in [8] observe that decomposition
has recently gained wide interest for solving
multi-objective optimization problems involving more than
three objectives, also known as Many-objective Optimization
Problems (MaOPs). In the last few years, there have been
many proposals to use decomposition to handle unconstrained
problems. However, fewer works have been devoted to
proposing new decomposition-based algorithms for constrained
many-objective problems. The paper proposes the ISC-Pareto
dominance (Isolated Solution-based Constrained Pareto
dominance) relation that can: (1) handle constrained
many-objective problems characterized by different types of
difficulties and (2) favor the selection of not only
infeasible solutions associated with isolated sub-regions
but also infeasible solutions with smaller CV (Constraint
Violation) values. This constraint-handling strategy is
incorporated into the framework of the Constrained NSGA-III
to produce a new algorithm called Isolated Solution-based
Constrained NSGA-III. The empirical results show that the
constraint-handling strategy gives better and competitive
results when compared against three recently proposed
constrained decomposition-based many-objective evolutionary
algorithms, in addition to a penalty-based version of
NSGA-III, on the CDTLZ benchmark problems involving up to
fifteen objectives. Moreover, the efficacy of ISC-NSGA-III
on a real-world water management problem is demonstrated.
Chahardoli S. et al. (2018) in [9] investigate a perforated
capped-end conical steel absorber in order to optimize its
wall thickness and hole height. The holes were placed on the
edge of the absorber to lower the peak force at collapse.
For this purpose, the absorber was first simulated using
LS-Dyna software. After the simulated model was verified
against experimental data, the hole height and wall
thickness of the absorber were optimized to achieve maximum
energy absorption together with minimum peak force. A total
of 96 distinct cases were simulated, of which 7 cases were
subjected to experimental tests. The optimization was
performed using the NSGA-III and MOEA/D algorithms
implemented in MATLAB. The response surface method was used
to determine the input functions for these algorithms.
Finally, the optimal position for the holes in conical
absorbers was found to be the nearest point to the upper
base of the truncated cone. Generally good agreement was
observed between the results of the NSGA-III and MOEA/D
algorithms, and the algorithms could in some cases predict
the optimal wall thickness and hole position with acceptable
accuracy.
Zhu Y. et al. (2017) in [10] highlight that feature
selection can improve classification accuracy and reduce the
computational complexity of classification. Data features in
intrusion detection systems (IDS) persistently present the
problem of imbalanced classification, in which some classes
have only a few instances while others have many. This
imbalance can clearly limit classification efficiency, yet
few efforts have been made to address it. In the paper, a
many-objective formulation is proposed for feature selection
in IDS, which uses two strategies for population evolution,
namely a special domination method and a predefined targeted
search. It can differentiate traffic not only between normal
and abnormal but also by abnormality type. Based on this
formulation, NSGA-III is used to obtain an adequate feature
subset with good performance. An improved many-objective
optimization algorithm (I-NSGA-III) is further proposed
using a novel niche preservation procedure. It consists of a
bias-selection process that selects the individual with the
fewest selected features and a fit-selection process that
selects the individual with the maximum sum weight of its
objectives. Experimental results show that I-NSGA-III can
alleviate the imbalance problem with higher classification
accuracy for classes having fewer instances. Moreover, it
can achieve both higher classification accuracy and lower
computational complexity.
Tavana M. et al. (2016) in [11] note that X-bar control
charts are widely used to monitor business and manufacturing
processes. The study considers an X-bar control chart design
problem with multiple and often conflicting objectives,
including the expected time the process remains in
statistical control, the type I error, and the detection
power. An integrated multi-objective algorithm is proposed
for optimizing economical control chart design. The authors
apply multi-objective optimization methods based on the
reference-point-based NSGA-III and a multi-objective
particle swarm optimization (MOPSO) algorithm to solve the
optimization problem efficiently. Then, two different
multiple criteria decision making (MCDM) methods, Data
Envelopment Analysis (DEA) and the technique for order of
preference by similarity to ideal solution (TOPSIS), are
used to reduce the number of Pareto optimal solutions to a
manageable size. Four DEA methods compare the optimal
solutions based on relative efficiency, and the TOPSIS
method then ranks the efficient optimal solutions. Several
metrics are used to compare the performance of the NSGA-III
and MOPSO algorithms. In addition, the DEA and TOPSIS
methods are used to compare the performance of NSGA-III and
MOPSO. A well-known case study is formulated and solved to
demonstrate the applicability and exhibit the efficacy of
the proposed optimization algorithm.
III. PROPOSED WORK
NSGA-III is a well-known metaheuristic technique which can
find an optimal path between a given set of nodes with the
sink as destination. Therefore, an NSGA-III based tree
bagger is proposed. The tree bagger creates multiple trees,
in the same way as a random forest. Training the model
requires all of its input parameters to be set. Here,
however, the tuning is not done manually but by an
optimization technique. The Genetic Algorithm (GA) was
considered during the research because it is easy to
implement. However, GA has the following issues:
• Poor convergence speed
• May get stuck in local optima
• May suffer from premature convergence
To overcome these issues, researchers have developed
variants of GA, namely NSGA-I, NSGA-II and NSGA-III. Each
variant has its own features, but NSGA-III was adopted here
because it overcomes all of the above issues of GA, as shown
in the figure. It can also optimize up to thirteen
parameters at once, whereas accuracy, true positive rate,
true negative rate, F-measure and precision are just five
parameters. Optimizing thirteen parameters at once may,
however, reduce the speed a little. NSGA-III creates random
solutions and evaluates the tree bagger on the basis of each
random solution. The predicted results are then checked, and
the best accuracy found so far is saved. Recombination is
then performed to create a child, whose accuracy is checked
in turn. If the child has better results than the parent,
the model updates the parent's accuracy.
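The evaluate, recombine and replace cycle described above can be sketched as follows. This is a simplified illustration, not the authors' MATLAB implementation and not a full NSGA-III: reference points and niching are omitted, plain Pareto dominance is used as a stand-in for the dominance relation, and toy_evaluate is a hypothetical placeholder for training the tree bagger and returning its measures (all to be maximized).

```python
import random


def dominates(u, v):
    """True if objective vector u Pareto-dominates v (all objectives maximized)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def non_dominated(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def toy_evaluate(x):
    """Hypothetical stand-in for training a model with parameters x."""
    return (1 - abs(x[0] - 0.3), 1 - abs(x[1] - 0.7))


def evolve(evaluate, n_params, pop_size=10, generations=20, p_mut=0.2, seed=0):
    """Random start, recombine, and keep a child only if it dominates its parent."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_params)] for _ in range(pop_size)]
    scores = [evaluate(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            mate = pop[rng.randrange(pop_size)]
            # Uniform crossover between the parent and a random mate.
            child = [a if rng.random() < 0.5 else b for a, b in zip(pop[i], mate)]
            # Mutation: occasionally reset a gene to a fresh random value.
            child = [rng.random() if rng.random() < p_mut else g for g in child]
            child_score = evaluate(child)
            if dominates(child_score, scores[i]):
                pop[i], scores[i] = child, child_score
    return pop, scores
```

Calling `evolve(toy_evaluate, n_params=2)` returns the final population with its objective vectors, and `non_dominated(scores)` then extracts the trade-off set. In the setting of this paper, the objective vector would hold the five measures reported in Section IV.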
IV. RESULTS AND DISCUSSION
This section compares the existing and proposed techniques.
Some well-known performance parameters have been chosen to
show that the performance of the proposed algorithm is
superior to that of the existing techniques. For
experimentation and implementation, the proposed technique
is evaluated using MATLAB 2013a and the Statistics and
Machine Learning Toolbox. Here the parameters of the
existing algorithms are compared with those of the proposed
algorithm, i.e., the merging of the J48 decision tree, SVM
and NSGA-III. Tabular and graphical comparisons are made
between the existing and proposed methodologies on the basis
of TP-Rate, TN-Rate, Accuracy, F-Measure and Precision.
A. Accuracy
Accuracy is also called Rand accuracy or the Rand index. It
is measured with respect to the ground truth. Accuracy is
calculated by the following equation, in which true
positives (TP), true negatives (TN), false positives (FP)
and false negatives (FN) are considered:
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
TABLE I. ACCURACY EVALUATION
Training data (%)   J48      SVM      NSGA-III
10                  0.9525   0.9850   0.9918
20                  0.9563   0.9850   0.9912
30                  0.9500   0.9842   0.9925
40                  0.9444   0.9850   0.9922
50                  0.9505   0.9885   0.9917
60                  0.9429   0.9871   0.9912
70                  0.9486   0.9850   0.9924
80                  0.9437   0.9841   0.9928
90                  0.9383   0.9753   0.9920
Table I reports the accuracy of each method. Higher accuracy
is better, and the proposed algorithm shows superior results
compared to the existing methods, as its accuracy is the
highest in each case.
Figure 1 compares the J48, SVM and NSGA-III methods, where
the x-axis indicates the size of the training data and the
y-axis indicates accuracy. The blue line indicates the J48
technique, and the other two lines indicate SVM and
NSGA-III. The accuracy of the proposed technique is higher
than that of the existing ones.
Figure 1. Accuracy.
B. True Positive Rate
True positive rate is an essential measure of a classifier.
It defines how many positives are correctly identified among
all actual positive cases, and so reflects the correct
positive results obtained while the system is working. True
positive rate is calculated as:
\[ \text{TP-Rate} = \frac{TP}{TP + FN} \]
TABLE II. TRUE POSITIVE RATE EVALUATION
Training data (%)   J48      SVM      NSGA-III
10                  0.9254   0.9231   0.9605
20                  0.9296   0.9542   0.9579
30                  0.9342   0.9611   0.9587
40                  0.9272   0.9593   0.9605
50                  0.9235   0.9724   0.9607
60                  0.9033   0.9788   0.9579
70                  0.9254   0.9832   0.9592
80                  0.9202   0.9692   0.9592
90                  0.9227   0.9754   0.9592
Table II reports the true positive rate of each method. A
higher true positive rate is better. The proposed algorithm
is higher than J48 in each case and higher than SVM at 10%,
20% and 40% training data, while SVM is higher at the
remaining training sizes.
Figure 2 compares the true positive rate of the existing and
proposed methods, where the x-axis indicates the size of the
training data and the y-axis indicates the true positive
rate. The red line indicates the proposed technique and the
blue line indicates the previous one. In the figure, the
true positive rate of the proposed technique is
comparatively higher than that of the existing one.
Figure 2. True positive rate.
C. True Negative Rate
True negative rate is the proportion of negatives that are
correctly identified as negatives. It measures how many of
the actual negative cases are predicted as negative, i.e.,
outcomes that are predicted negative and are actually
negative. The formula for calculating the true negative rate
is:
\[ \text{TN-Rate} = \frac{TN}{TN + FP} \]
TABLE III. TRUE NEGATIVE RATE EVALUATION
Training data (%)   J48      SVM      NSGA-III
10                  0.9688   0.9899   0.9986
20                  0.9626   0.9999   0.9984
30                  0.9820   0.9542   0.9986
40                  0.9601   0.9974   0.9986
50                  0.9660   0.9979   0.9985
60                  0.9754   0.9965   0.9984
70                  0.9583   0.9954   0.9986
80                  0.9618   0.9946   0.9986
90                  0.9532   0.9720   0.9986
Table III reports the true negative rate of each method. A
higher true negative rate is better. The proposed algorithm
is the highest in each case except at 20% training data,
where SVM is marginally higher (0.9999 against 0.9984).
Figure 3 compares the true negative rate of the existing and
proposed methods, where the x-axis indicates the size of the
training data and the y-axis indicates the true negative
rate. The red line indicates the proposed technique and the
blue line indicates the previous one. In the figure, the
true negative rate of the proposed technique is
comparatively higher than that of the existing one.
Figure 3. True negative rate.
D. F-Measure
F-Measure is also called the F1 score. It combines both
precision and recall and is generally used to check accuracy
and reliability. It computes the harmonic mean of precision
and recall, with 1 as the best value and 0 as the worst.
F-Measure can be calculated using the following formula:
\[ F\text{-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
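As a quick consistency check of the harmonic-mean definition, the NSGA-III values at 10% training data from Tables II and V can be substituted, assuming that recall is the true positive rate reported in Table II:

```latex
\[
F\text{-Measure}
  = \frac{2 \times 0.9981 \times 0.9605}{0.9981 + 0.9605}
  = \frac{1.9174}{1.9586}
  \approx 0.9789
\]
```

This agrees with the value 0.9790 reported for NSGA-III in Table IV to within the rounding of the inputs.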
TABLE IV. F-MEASURE EVALUATION
Training data (%)   J48      SVM      NSGA-III
10                  0.9394   0.9600   0.9790
20                  0.9395   0.9466   0.9775
30                  0.9552   0.9802   0.9780
40                  0.9391   0.9775   0.9790
50                  0.9409   0.9846   0.9791
60                  0.9330   0.9869   0.9775
70                  0.9353   0.9888   0.9782
80                  0.9345   0.9810   0.9782
90                  0.9310   0.9694   0.9782
Table IV reports the F-Measure of each method. A higher
F-Measure is better. The proposed algorithm is higher than
J48 in each case and higher than SVM at 10%, 20%, 40% and
90% training data, while SVM is higher at the remaining
training sizes.
Figure 4 compares the F-Measure of the existing and proposed
methods, where the x-axis indicates the size of the training
data and the y-axis indicates the F-Measure. The red line
indicates the proposed technique and the blue line indicates
the previous one. In the figure, the F-Measure of the
proposed technique is comparatively higher than that of the
existing one.
Figure 4. F-Measure.
E. Precision
Precision is the proportion of cases predicted as positive
that are actually positive. It is also known as the positive
predictive value. Higher precision signifies that an
algorithm returned substantially more relevant results than
irrelevant ones. Precision can be calculated using the
formula:
\[ \text{Precision} = \frac{TP}{TP + FP} \]
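The five measures defined in this section can all be computed from the four confusion-matrix counts. The sketch below is illustrative only: the function name, the dictionary keys and the convention of returning 0 for an undefined ratio are assumptions, not part of the original MATLAB experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the five measures of Tables I-V from confusion-matrix counts."""

    def ratio(num, den):
        # Assumed convention: an undefined ratio (zero denominator) is reported as 0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)   # positive predictive value
    tp_rate = ratio(tp, tp + fn)     # recall, sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "tp_rate": tp_rate,
        "tn_rate": ratio(tn, tn + fp),  # specificity
        "precision": precision,
        # Harmonic mean of precision and recall.
        "f_measure": ratio(2 * precision * tp_rate, precision + tp_rate),
    }
```

For example, with 8 true positives, 6 true negatives, 2 false positives and 4 false negatives, the accuracy is 14/20 = 0.7, the true positive rate is 8/12, the true negative rate is 6/8, the precision is 8/10 and the F-Measure is 8/11.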
TABLE V. PRECISION EVALUATION
Training data (%)   J48      SVM      NSGA-III
10                  0.9538   0.9999   0.9981
20                  0.9496   0.9899   0.9980
30                  0.9771   0.9842   0.9980
40                  0.9513   0.9965   0.9981
50                  0.9590   0.9972   0.9981
60                  0.9647   0.9952   0.9980
70                  0.9453   0.9944   0.9981
80                  0.9492   0.9930   0.9981
90                  0.9395   0.9653   0.9981
Table V reports the precision of each method. Higher
precision is better. The proposed algorithm is the highest
in each case except at 10% training data, where SVM is
higher (0.9999 against 0.9981).
Figure 5 compares the precision of the existing and proposed
methods, where the x-axis indicates the size of the training
data and the y-axis indicates precision. The red line
indicates the proposed technique and the blue line indicates
the previous one. In the figure, the precision of the
proposed technique is comparatively higher than that of the
existing one.
Figure 5. Precision.
ACKNOWLEDGMENT
It gives me immense pleasure in expressing thanks and
profound gratitude to Harkiran Kaur, Lecturer, Computer
Science & Engineering Department, Thapar Institute of
Engineering and Technology, Patiala for her valuable
guidance and continual encouragement throughout this
research work.
REFERENCES
[1] Larose, Daniel T., “Data mining methods & models”, John Wiley
& Sons, 2006.
[2] Bhargava, Neeraj, "Decision tree analysis on j48 algorithm for data
mining", Proceedings of International Journal of Advanced Research
in Computer Science and Software Engineering, vol. 3 no. 6, 2013.
[3] Mkaouer, Wiem, et al. "Many-objective software remodularization
using NSGA-III." ACM Transactions on Software Engineering and
Methodology (TOSEM), vol. 24 no. 3, 2017.
[4] Mühlenbein, Heinz, M. Schomisch & Joachim Born, "The parallel
genetic algorithm as function optimizer", Parallel computing, vol. 17
no. 6-7, pp. 619-632, 2012.
[5] Li K, Wang L, Wu J, Zhang Q, Liao G, Su L., “Using GA-SVM for
defect inspection of flip chips based on vibration signals”,
Microelectronics Reliability, vol. 81, pp. 159-66, 2018.
[6] Yi JH, Deb S, Dong J, Alavi AH, Wang GG, “An improved NSGA-
III algorithm with adaptive mutation operator for Big Data
optimization problems”, Future Generation Computer Systems, 2018.
[7] Hu C, Dai L, Yan X, Gong W, Liu X, Wang L., “Modified NSGA-III
for sensor placement in water distribution system”, Information
Sciences, 2018.
[8] Elarbi M, Bechikh S, Said LB., “On the Importance of Isolated
Infeasible Solutions in the Many-objective Constrained NSGA-III”,
Knowledge-Based Systems, 2018.
[9] Chahardoli S, Hadian H, Vahedi R., “Optimization of hole height and
wall thickness in perforated capped-end conical absorbers under axial
quasi-static loading (using NSGA-III and MOEA/D algorithms)”,
Thin-Walled Structures, vol. 127, pp. 540-55, 2018.
[10] Tavana M, Li Z, Mobin M, Komaki M, Teymourian E., “Multi-
objective control chart design optimization using NSGA-III and
MOPSO enhanced with DEA and TOPSIS”, Expert Systems with
Applications, vol. 50, pp. 17-39, 2016.
[11] Liu, Huan, and Hiroshi Motoda, “Feature selection for knowledge
discovery and data mining”, Springer Science & Business Media, vol.
454, 2012.
