1. Parallel Rule Generation for an Efficient
Classification System
Talha Ghaffar
MS(CS), 9/23/2015, MS Thesis Defense
2. Scope of Presentation
• Introduction
• Background Study & Literature Review
• Proposed Technique
• Applications And Research Contribution
• Implementation
• Experimental Results
• Future Work
• Conclusion
3. Introduction
• Nowadays, many organizations utilize large databases for
analytical purposes
• With the growing size of training data, researchers are focusing
their research on developing or improving data mining techniques
to keep up with that growth
• Major challenges while handling complex and large data:
– Sifting through the data efficiently
– Extracting relevant and useful information accurately
– Analyzing the extracted information and guiding
organizations' decisions and actions reliably
4. Introduction ctd.
• Limited computational resources
• Applying sequential data mining techniques is inefficient, with
the inherent drawback of long response times
• Research suggests that when data mining techniques are
implemented on parallel machines, improved processing and
response times are achieved
• Classification: a core task in data mining, the field concerned
with extracting knowledge or patterns from databases by
building predictive or descriptive models (learning models)
5. [Diagram: a model is learned from the Training Set, then applied to the Test Set]

Training Set:
Att1  Att3    Att2   Class
No    Small   40K    No
No    Medium  20K    No
Yes   Large   120K   Yes
Yes   Small   70K    Yes
No    Medium  45K    Yes

Learn Model -> Classifier (e.g. IF <condition> THEN <action>)

Apply Model to Test Set:
Att1  Att3    Att2   Class
No    Small   25K    ?
Yes   Medium  20K    ?
Yes   Large   100K   ?
No    Small   30K    ?
Yes   Small   55K    ?
7. Background Study
• Machine Learning: a definition needs to incorporate two important
elements: a computer-based knowledge acquisition process, and a
statement of where skills or knowledge can be obtained.
– Mitchell describes machine learning as the study of computer
algorithms that improve automatically through experience.
– Alpaydin defines machine learning as “the capability of the computer
program to acquire or develop new knowledge or skills from existing or
non-existing examples for the sake of optimizing performance criterion”.
• In contrast to Mitchell’s definition, which lacks a knowledge
acquisition process, Alpaydin’s definition is preferable for this
research domain
8. Background Study ctd.
• Building individual classifiers on subsets of the data set, using
appropriate learning models, results in accurate rule sets on the
individual machines.
• Any accuracy drop-off due to parallelism can be reduced to
nearly negligible.
• Redundancy of records is possible and needs to be removed
when applying the subset approach.
• In the context of this work, using subsets for training is the
better and more efficient approach.
• Supervised Learning
9. Background Study ctd.
• Several methods for classification have been introduced over the
years - e.g. decision trees, artificial neural networks, nearest
neighbor classifiers, support vector machines, and so on
• Decision trees have decent accuracy and, moreover, are easier to
interpret, which is a crucial advantage when it comes to data
mining
• I also suspect that once an algorithm gains acceptance, it takes
time before scalable and parallelized versions of that algorithm
appear. For these reasons, decision trees are preferred
10. Background Study ctd.
• Common techniques used to overcome the problems of
large datasets and memory limitations are the
following:
1. Data sampling
2. Feature selection
3. Data pre-processing
4. Parallel processing
11. Background Study ctd.
• Parallel Approaches:
• Independent Partitioning
– Each processor is provided the complete data set
– All processors process the same data set as input, build and generate
rules from it, and the resulting rules are afterwards combined using
combining techniques
• Parallel Sequential Partitioning
– Every processor generates a particular subset of the concepts
• Replicated Sequential Partitioning
– Each processor processes one particular horizontal partition of the
data set and executes what is more or less the sequential algorithm,
as each processor can view only partial information
– A local set of concepts is produced, which is afterwards coordinated
to add up to the global set of concepts
12. Background Study ctd.
Combining Rule   Description
Maximum Rule     Selects the classifier with the maximum confidence value.
                 This rule can generate adverse results if the classifier
                 with the maximum confidence value is over-trained.
Sum Rule         Effective if every individual classifier is independent of
                 the others. When a large set of similar classifiers is
                 generated, it helps reduce the noise in large sets of
                 so-called weak classifiers.
Minimum Rule     Selects the outcome of the classifier that has the least
                 objection against a certain class.
Product Rule     Effective if every individual classifier is independent of
                 the others.
Median Rule      Similar to the sum rule but may yield more robust results.
13. Proposed Technique
• A three-step approach: it initially divides the very large dataset
into data chunks, processes them on N user-defined processors on
different machines, generates the final merged decision rule file,
and resolves any conflicts that arise later on
– Data Pre-Processing
– Parallel Rule Generation
– Rule Merging and Conflict Resolution
• Data Pre-Processing
– In this step, we divide the large dataset into N (N = user-specified
number) smaller datasets.
– A round-robin approach is used, which gives a random, symmetric
distribution of the data
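The round-robin pre-processing step can be sketched as follows; a minimal sketch assuming records arrive as a simple list (the function and variable names are illustrative, not from the thesis implementation):

```python
# Minimal sketch of the data pre-processing step: split a training set into
# N smaller datasets by round-robin assignment (record i goes to chunk i mod N).
def round_robin_split(records, n):
    chunks = [[] for _ in range(n)]
    for i, record in enumerate(records):
        chunks[i % n].append(record)
    return chunks

# Example: 10 records distributed symmetrically over 3 chunks.
print(round_robin_split(list(range(10)), 3))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Shuffling the records before splitting would give the random, unbiased distribution the slides call for.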
14. [Diagram: Data Pre-Processing uses Round Robin to split the Training Set
into small data chunks; Parallel Rule Generation runs a learning algorithm on
processors P1..PN, each producing IF-then-Action rules; Rule Merging &
Conflict Resolution combines them into the final rule set (IF { } then
{ Action; })]
15. Proposed Technique
• Parallel Rule Generation:
– In this step, each of the smaller datasets from the previous step is
given to a different processor so that the process of classification
can be performed in parallel on each processor.
– We can use any classification algorithm for generating rules, or we
can use multiple classifiers on different processors for rule
generation.
– These rules are in if-then-else form. The rules that each processor
generates are valid only for the data that was provided to that
particular processor.
– The rules generated on one processor may conflict with rules
generated on another processor, and more than one processor may
generate the same rules
16. Proposed Technique
• Parallel Rule Generation:
– Two additional steps need to be performed at this stage:
1. Calculate the support of each individual rule and store it with the
rule.
2. Calculate the confidence of each individual rule and store it with the
rule.
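The two per-rule statistics can be sketched as below. This is a hypothetical encoding in which a rule is an antecedent predicate plus a predicted class, and records are dicts with a "class" key; all names are illustrative assumptions, not the thesis implementation:

```python
# Sketch: support = fraction of the chunk matching the rule's antecedent;
# confidence = fraction of matching records whose class equals the rule's
# predicted class. Record/rule representation is an illustrative assumption.
def support_and_confidence(records, antecedent, predicted_class):
    matches = [r for r in records if antecedent(r)]
    support = len(matches) / len(records) if records else 0.0
    correct = sum(1 for r in matches if r["class"] == predicted_class)
    confidence = correct / len(matches) if matches else 0.0
    return support, confidence

records = [
    {"outlook": "sunny", "class": "yes"},
    {"outlook": "sunny", "class": "no"},
    {"outlook": "rain", "class": "yes"},
    {"outlook": "sunny", "class": "yes"},
]
s, c = support_and_confidence(records, lambda r: r["outlook"] == "sunny", "yes")
print(s, round(c, 2))  # 0.75 0.67
```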
17. Proposed Technique
• Rule Merging and Conflict Resolution:
• In this step, the rules generated by all the processors are
combined to get the final and complete rule set. While merging
the rules we encounter these problems:
– Redundancy of rules, i.e. the same rule occurring more than once.
– Conflicting rules, i.e. different decisions for the same rule conditions
18. Proposed Technique
• Use sufficiently large data sets on each processor; this reduces
the probability of conflicting rules and increases the probability
of similar rules
• Make the data distribution across processors random, so that the
distribution is unbiased and every processor gets almost similar
data and produces an almost similar rule set
• Take the union of all rule sets; this removes multiple
occurrences of the same rule and also includes all possible
unique rules
• If a conflict appears, select the rule with greater coverage and
confidence
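The union step above can be sketched as follows, assuming each rule is represented as a hashable (condition, action) pair (an illustrative assumption):

```python
# Sketch: merge per-processor rule sets by set union, so each unique rule
# appears only once in the final rule set.
def merge_rule_sets(rule_sets):
    merged = set()
    for rules in rule_sets:
        merged |= set(rules)
    return merged

# Two processors produced overlapping rule sets; the union keeps 3 unique rules.
p1 = [("outlook=sunny", "yes"), ("outlook=rain", "no")]
p2 = [("outlook=sunny", "yes"), ("humidity=high", "no")]
print(sorted(merge_rule_sets([p1, p2])))
```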
19. Proposed Technique
• If a conflict appears, select the rule with greater coverage and
confidence.
20. Proposed Technique
• Conflicting rules with the same support but different confidence
• The rule with greater confidence will be selected.
21. Proposed Technique
• Conflicting rules with the same confidence but different support
• The rule with greater support will be selected.
22. Proposed Technique
• Conflicting rules with different confidence and different support
• In that case we can use the formula:
X = α(confidence) + β(support)
• where α and β are two weights whose values lie between 0 and 1
and whose sum is always 1
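The scoring formula above can be sketched as follows; α = 0.7 is an illustrative choice, and the dict-based rule representation is an assumption:

```python
# Sketch: resolve a conflict between two rules that differ in both confidence
# and support by scoring each as x = alpha*confidence + beta*support, with
# alpha + beta = 1, and keeping the higher-scoring rule.
def resolve_conflict(rule_a, rule_b, alpha=0.7):
    beta = 1.0 - alpha

    def score(rule):
        return alpha * rule["confidence"] + beta * rule["support"]

    return rule_a if score(rule_a) >= score(rule_b) else rule_b

a = {"confidence": 0.9, "support": 0.1, "action": "yes"}
b = {"confidence": 0.5, "support": 0.8, "action": "no"}
print(resolve_conflict(a, b)["action"])  # yes (score 0.66 vs 0.59)
```

Shifting α toward 1 favors confidence over support; the best weighting would need to be tuned empirically.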
23. Proposed Technique
• Once conflict resolution is over, the next step is optimization of
the results.
• For that purpose a GA (genetic algorithm) is used.
24. Rule Set Optimization Through GA
• Genetic algorithms are a family of computational models based
on biological evolution.
• One complete solution is represented in the form of a simple
vector called a chromosome.
• A set of chromosomes is called a generation.
• The solution evolves from one generation to another on the basis
of a fitness function, selection criteria and reproduction
operators.
• The final rule set obtained after conflict resolution and combining
the individual rule sets is further optimized with the help of the
GA.
• After applying the GA, not only is the number of rules in the final
classifier reduced, but accuracy is also increased.
25. Rule Set Optimization Through GA
• In our case the problem is represented as follows.
• In the proposed solution, the rule set is encoded as a
chromosome using a string of bits, where each bit represents
one rule
• 1 represents the presence and 0 the absence of a rule in the
chromosome.
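The bit-string encoding can be sketched as follows; the confidence values are illustrative, and the fitness function is the sum of confidences of the rules present, as described on the next slide:

```python
import random

# Sketch: a chromosome is a bit string over the merged rule set (1 = rule
# present, 0 = absent); fitness is the sum of confidences of present rules.
rule_confidences = [0.9, 0.6, 0.8, 0.4, 0.7]  # illustrative, one per rule

def random_chromosome(n_rules):
    return [random.randint(0, 1) for _ in range(n_rules)]

def fitness(chromosome):
    return sum(c for bit, c in zip(chromosome, rule_confidences) if bit)

print(round(fitness([1, 0, 1, 0, 1]), 2))  # 2.4 (= 0.9 + 0.8 + 0.7)
```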
26. Rule Set Optimization Through GA
• The algorithm is initialized with a random generation.
• A fitness function is used to calculate the fitness of each
classifier so that the next generation can be selected.
• In our case the fitness function is simply the sum of the
confidences of the rules present in the chromosome.
• A chromosome with a higher fitness value is considered a
candidate for the next generation.
• The next generation is produced using two genetic operators:
crossover and mutation.
• One-point crossover is chosen in our case.
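The two operators named above can be sketched as follows for bit-string chromosomes; the function names and the bit-flip style of mutation are illustrative assumptions:

```python
import random

# Sketch: one-point crossover swaps the tails of two parent chromosomes at a
# random cut point; mutation flips each bit independently with a small rate.
def one_point_crossover(parent_a, parent_b):
    point = random.randint(1, len(parent_a) - 1)  # cut strictly inside the string
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(chromosome, rate=0.05):
    return [1 - bit if random.random() < rate else bit for bit in chromosome]

child1, child2 = one_point_crossover([1, 1, 1, 1], [0, 0, 0, 0])
print(child1, child2)  # e.g. [1, 1, 0, 0] [0, 0, 1, 1]
```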
27. Rule Set Optimization Through GA
• The algorithm stops on either of two conditions:
1. The maximum number of iterations is exhausted.
2. The algorithm has converged to an optimal point and no further
improvement is possible.
28. Rule Set Optimization Through GA
• The parameters used for the GA are the following:
Parameters for GA Values
Cross-Over Rate 95%
Mutation Rate 5%
Population Size 5000
Number of Generations 3000
29. Rule Set Optimization Through GA
• After applying the GA to the rule sets, the following reduction in
the number of rules is seen:
Datasets          # of Rules   Optimized # of Rules
TAE 90 72
Zoo 25 20
Balance Scale 280 190
Tic-Tac-Toe 131 122
Car Evaluation 221 209
Breast-Cancer 50 41
Mushroom 20 17
Nursery 62 48
30. Application Areas And Significance
• Improved efficiency due to parallelism
• Overcoming memory limitations
• Computation reusability
• Continuous learning system
• Scalability
• One generic classifier
• Heterogeneous and flexible classifier
• Improved accuracy
31. Results And Findings
• The methodology adopted for the compilation of results is as
follows.
• First and most critical was the selection of data sets.
• 8 different well-known state-of-the-art data sets, each of a
different size, were used in the experimentation process, so that
the technique could be tested against datasets of all sizes.
32. Results And Findings
• First of all, the data is divided into three sets:
1. Training set (66%)
2. Validation set (17%)
3. Test set (17%)
• The training set was further divided into n small partitions.
• Each partition was used to build a separate classifier.
• All these classifiers were then combined into one final
classifier.
33. Results And Findings
• The final classifier is then optimized with the GA.
• At this stage the validation set is used for tuning and
optimization purposes.
• Finally, the optimized classifier is evaluated: the test set is used
and the results are computed over it.
34. Results And Findings
• The percentage split of the training, validation and test data
sets is as follows:
Data Sets        Details
Zoo              Discrete, 66% training, 17% validation, 17% test set
Balance Scale    Discrete, 66% training, 17% validation, 17% test set
Tic-Tac-Toe      Discrete, 66% training, 17% validation, 17% test set
Car Evaluation   Discrete, 66% training, 17% validation, 17% test set
Mushroom         Discrete, 66% training, 17% validation, 17% test set
35. Results And Findings
• Results before optimization are the following:
Data Sets Accuracy
Zoo 88%
Balance Scale 35%
Tic-Tac-Toe 88%
Car Evaluation 86%
Mushroom 100%
36. Results And Findings
• Results after optimization are the following:
Data Sets Accuracy
Zoo 95%
Balance Scale 73%
Tic-Tac-Toe 89%
Car Evaluation 98.6%
Mushroom 100%
37. Future Work
• In every proposed technique there is always room for
improvement and enhancement. The proposed technique can be
extended further in the following directions:
• Particle swarm optimization algorithms could be chosen to
optimize the rule set.
• This technique divides the dataset horizontally; dividing the
dataset vertically could be considered, which may result in
improved accuracy.
• Further work on parameter optimization can be done.
Mitchell’s definition does not reflect anything related to a knowledge acquisition process for the stated computer programs; it is therefore considered insufficient in our domain of research.
Supervised Learning: covers learning algorithms that reason from externally provided instances as input to produce general hypotheses and make predictions about future unknown instances.
Neural nets and support vector machines are nowadays considered to be the state of the art.
Each partition will get an equal number of instances and the same type of data, which will eventually result in similar decision rules.