Parallel Rule Generation for an Efficient
Classification System
Talha Ghaffar
MS (CS)
MS Thesis Defense, 9/23/2015
Scope of Presentation
• Introduction
• Background Study & Literature Review
• Proposed Technique
• Applications And Research Contribution
• Implementation
• Experimental Results
• Future Work
• Conclusion
Introduction
• Nowadays, many organizations maintain large databases for analytical purposes
• As training data keeps growing, researchers are focusing on developing and improving data mining techniques that can keep pace with that growth
• Major challenges while handling complex and large data:
– Sifting through the data efficiently
– Extracting relevant and useful information accurately
– Analyzing the extracted information and guiding organizations' decisions and actions reliably
Introduction ctd.
• Limited computational resources
• Applying sequential data mining techniques is inefficient here, with the inherent drawback of long response times
• Research suggests that when data mining techniques are implemented on parallel machines, improved processing and response times are achieved
• Classification: a core task in data mining, the field concerned with extracting knowledge or patterns from databases by building predictive or descriptive (learning) models
[Figure: the classification process. A learning algorithm builds a model (classifier) from the training set, e.g. a set of IF <condition> THEN <action> rules; the model is then applied to the test set, whose class labels are unknown.]

Training Set:
Att1  Att3    Att2   Class
No    Small   40 K   No
No    Medium  20 K   No
Yes   Large   120 K  Yes
Yes   Small   70 K   Yes
No    Medium  45 K   Yes

Test Set:
Att1  Att3    Att2   Class
No    Small   25 K   ?
Yes   Medium  20 K   ?
Yes   Large   100 K  ?
No    Small   30 K   ?
Yes   Small   55 K   ?
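To make the workflow concrete, here is a minimal, hypothetical sketch in Python: a couple of illustrative IF-THEN rules stand in for the learned model (they are not the rules produced in the thesis) and are applied to records shaped like the tables above.

```python
# Minimal illustration of the "learn model / apply model" workflow above.
# The rules below are hypothetical examples, not the actual learned classifier.

test_set = [
    {"Att1": "No",  "Att3": "Small",  "Att2": 25_000},
    {"Att1": "Yes", "Att3": "Medium", "Att2": 20_000},
    {"Att1": "Yes", "Att3": "Large",  "Att2": 100_000},
]

def classify(record):
    """Apply IF-THEN rules to a single record; default action is 'No'."""
    if record["Att1"] == "Yes" and record["Att2"] > 60_000:  # IF ... THEN Yes
        return "Yes"
    if record["Att3"] == "Large":                            # IF ... THEN Yes
        return "Yes"
    return "No"                                              # default action

for record in test_set:
    print(record, "->", classify(record))
```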
Background Study
• Machine Learning: a definition needs to incorporate two important elements, namely a computer-based knowledge acquisition process and a statement of where skills or knowledge can be obtained.
– Mitchell describes machine learning as the study of computer algorithms that improve automatically through experience.
– Alpaydin defines machine learning as “the capability of the computer program to acquire or develop new knowledge or skills from existing or non existing examples for the sake of optimizing performance criterion”.
• Unlike Mitchell’s definition, which lacks a knowledge acquisition process, Alpaydin’s definition is the preferred one for this research domain
Background Study ctd.
• Building individual classifiers on subsets of the data set using appropriate learning models will result in accurate rule sets on the individual machines.
• The possible drop in accuracy due to parallelism can be reduced to nearly negligible levels.
• Redundancy of records is possible and will need to be removed when applying the subset approach.
• In the context of this work, using subsets for training is the better and more efficient approach.
• Supervised Learning
Background Study ctd.
• Several methods for classification have been introduced over the
years - e.g. decision trees, artificial neural networks, nearest
neighbor classifiers, support vector machines and so on
• Decision trees have decent accuracy and moreover are easier to
interpret, which is a crucial advantage when it comes to data
mining
• I also suspect that once an algorithm gains acceptance, it takes
time before scalable and parallelized versions of that algorithm
appear. For these reasons, decision trees are preferred
Background Study ctd.
• Common techniques used to overcome the problem of large datasets and memory limitations are as follows:
1. Data sampling.
2. Feature selection.
3. Data pre-processing.
4. Parallel processing.
Background Study ctd.
• Parallel Approaches:
• Independent Partitioning
– Each processor is provided the complete data set
– All processors process the same data set as input and generate rules from it; the resulting rule sets are then combined afterwards using combining techniques
• Parallel Sequential Partitioning
– Every processor is allowed to generate a particular subset of concepts
• Replicated Sequential Partitioning
– Each processor processes one particular horizontal partition of the data set and executes what is more or less the sequential algorithm, since each processor can see only partial information
– Each processor produces a local set of concepts, which is afterwards coordinated and added to the global set of concepts
Background Study ctd.
Combining rules and their descriptions:
• Maximum Rule: selecting the classifier with the maximum confidence value seems reasonable, but this rule can give adverse results if that classifier is over-trained.
• Sum Rule: effective if every individual classifier is independent of the others; when a large set of similar classifiers is generated, it helps reduce the noise in large sets of so-called weak classifiers.
• Minimum Rule: selects the outcome of the classifier that has the least objection against a certain class.
• Product Rule: effective if every individual classifier is independent of the others.
• Median Rule: similar to the Sum Rule, but may yield more robust results.
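As a rough illustration of these combining rules, the sketch below assumes each base classifier outputs one confidence score per class; the array layout and values are illustrative, not taken from the thesis.

```python
import numpy as np

# conf[i, c] = confidence of classifier i in class c (illustrative values).
conf = np.array([
    [0.7, 0.3],   # classifier 1
    [0.6, 0.4],   # classifier 2
    [0.2, 0.8],   # classifier 3
])

combiners = {
    "maximum": conf.max(axis=0),         # Maximum Rule
    "sum":     conf.sum(axis=0),         # Sum Rule
    "minimum": conf.min(axis=0),         # Minimum Rule (least objection)
    "product": conf.prod(axis=0),        # Product Rule
    "median":  np.median(conf, axis=0),  # Median Rule
}

for name, scores in combiners.items():
    print(f"{name:8s} -> class {int(scores.argmax())} (scores {scores})")
```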
Proposed Technique
• A three-step approach: the very large dataset is first divided into data chunks, these are processed on N user-defined processors on different machines, a final merged decision-rule file is generated, and any conflicts that arise are resolved
– Data Pre-Processing
– Parallel Rule Generation
– Rule Merging and Conflict Resolution
• Data Pre-Processing
– In this step, we divide the large dataset into N (N = user-specified number) smaller datasets.
– A round-robin approach is used, which gives a random, symmetric distribution of the data (a minimal sketch of the round-robin split follows)
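A minimal sketch of the round-robin split, assuming in-memory records; the thesis implementation may handle files or databases differently.

```python
def round_robin_split(records, n):
    """Deal records into n chunks in round-robin fashion."""
    chunks = [[] for _ in range(n)]
    for i, record in enumerate(records):
        chunks[i % n].append(record)
    return chunks

# Example: 10 records dealt to N = 3 processors.
chunks = round_robin_split(list(range(10)), 3)
print(chunks)   # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```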
[Figure: overview of the proposed technique. Data Pre-Processing distributes the training set round-robin into small data chunks; Parallel Rule Generation feeds one chunk to each of the processors P1 ... PN, where the learning algorithm produces an IF-THEN-Action rule set per processor; Rule Merging & Conflict Resolution combines these into the final merged rule set.]
Proposed Technique
• Parallel Rule Generation:
– In this step, each of the smaller datasets from the previous step is given to a different processor, so that classification can be performed in parallel on each processor.
– Any classification algorithm can be used for generating rules, or multiple classifiers can be used on different processors for rule generation.
– These rules are in if-then-else form. The rules generated by a processor are only valid for the data provided to that particular processor.
– Rules generated on one processor may conflict with rules generated on another processor, and more than one processor may generate the same rules (a sketch of this parallel step follows)
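A rough sketch of the parallel step, assuming scikit-learn decision trees as the per-chunk learner (the thesis allows any classification algorithm) and a multiprocessing pool standing in for the separate machines.

```python
from multiprocessing import Pool
from sklearn.tree import DecisionTreeClassifier

def train_on_chunk(chunk):
    """Fit one decision tree on one data chunk; rules can later be read off the
    tree (e.g. with sklearn.tree.export_text)."""
    X, y = chunk
    return DecisionTreeClassifier().fit(X, y)

def parallel_rule_generation(chunks, n_procs):
    """One worker process per chunk stands in for one machine/processor."""
    with Pool(processes=n_procs) as pool:
        return pool.map(train_on_chunk, chunks)

if __name__ == "__main__":
    chunks = [([[0], [1], [2], [3]], [0, 0, 1, 1]),
              ([[0], [1], [2], [3]], [0, 1, 1, 1])]
    models = parallel_rule_generation(chunks, n_procs=2)
    print(len(models), "per-chunk models trained")
```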
Proposed Technique
• Parallel Rule Generation:
– Two additional steps need to be performed at this stage (a sketch follows the list):
1. Calculate the support of each individual rule and store it with the rule.
2. Calculate the confidence of each individual rule and store it with the rule.
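A small sketch of the two calculations, using the usual association-rule definitions of support and confidence; the thesis may define them slightly differently, and the rule representation here is illustrative.

```python
def support_and_confidence(records, antecedent, consequent):
    """Standard association-rule definitions over a list of record dicts:
    support    = fraction of records matching antecedent AND consequent,
    confidence = fraction of antecedent-matching records also matching the consequent."""
    n = len(records)
    matches_if   = [r for r in records if antecedent(r)]
    matches_both = [r for r in matches_if if consequent(r)]
    support    = len(matches_both) / n if n else 0.0
    confidence = len(matches_both) / len(matches_if) if matches_if else 0.0
    return support, confidence

records = [
    {"Att1": "Yes", "Class": "Yes"},
    {"Att1": "Yes", "Class": "No"},
    {"Att1": "No",  "Class": "No"},
]
rule_if   = lambda r: r["Att1"] == "Yes"
rule_then = lambda r: r["Class"] == "Yes"
print(support_and_confidence(records, rule_if, rule_then))   # (0.333..., 0.5)
```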
Proposed Technique
• Rule Merging and Conflict Resolution:
• In this step, the rules generated by all the processors are combined to obtain the final, complete rule set. While merging the rules we encounter these problems:
– Redundancy of rules, i.e. the same rule occurring more than once.
– Conflicting rules, i.e. the same rule conditions leading to different decisions
Proposed Technique
• Use sufficiently large data sets on each processor; this reduces the probability of conflicting rules and increases the probability of similar rules
• Make the data distribution across processors random, so that the distribution is unbiased; every processor then receives a similar kind of data and produces a similar rule set
• Take the union of all rule sets; this removes rules that occur more than once while including every unique rule
• If a conflict appears, select the rule with higher coverage and confidence
Proposed Technique
• If a conflict appears, select the rule with higher coverage and confidence
• The rule with higher coverage and confidence will be selected.
Proposed Technique
• Conflicting rules with the same support but different confidence
• The rule with higher confidence will be selected.
Proposed Technique
• Conflicting rules with the same confidence but different support
• The rule with higher support will be selected.
Proposed Technique
• Conflicting rules with different confidence and different support
• In that case we can use the formula
X = α · confidence + β · support
• where α and β are two weights whose values lie between 0 and 1 and whose sum is always 1 (see the sketch below)
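A minimal sketch of the conflict-resolution policy above: compare support and confidence directly, and fall back to X = α·confidence + β·support when both differ. The field names and the α = β = 0.5 default are assumptions for illustration.

```python
def resolve_conflict(rule_a, rule_b, alpha=0.5, beta=0.5):
    """Pick one of two conflicting rules; alpha + beta must sum to 1."""
    assert abs(alpha + beta - 1.0) < 1e-9
    sa, ca = rule_a["support"], rule_a["confidence"]
    sb, cb = rule_b["support"], rule_b["confidence"]
    if sa == sb:                              # same support -> higher confidence wins
        return rule_a if ca >= cb else rule_b
    if ca == cb:                              # same confidence -> higher support wins
        return rule_a if sa >= sb else rule_b
    score_a = alpha * ca + beta * sa          # X = alpha * confidence + beta * support
    score_b = alpha * cb + beta * sb
    return rule_a if score_a >= score_b else rule_b

r1 = {"name": "R1", "support": 0.10, "confidence": 0.90}
r2 = {"name": "R2", "support": 0.40, "confidence": 0.75}
print(resolve_conflict(r1, r2)["name"])       # R2 (score 0.575 vs 0.50)
```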
Proposed Technique
• Once conflict resolution is complete, the next step is optimization of the results.
• For that purpose, a genetic algorithm (GA) is used.
Rule Set Optimization Through GA
• Genetic algorithms are a family of computational models based on biological evolution.
• One complete solution is represented as a simple vector called a chromosome.
• A set of chromosomes is called a generation.
• The solution evolves from one generation to the next on the basis of a fitness function, a selection criterion and reproduction operators.
• The final rule set obtained after combining the individual rule sets and resolving conflicts is further optimized with the help of the GA.
• After applying the GA, not only is the number of rules in the final classifier reduced, but accuracy is also increased.
Rule Set Optimization Through GA
• In our case the problem is represented as follows.
• In the proposed solution, the rule set is encoded as a chromosome using a string of bits, where each bit represents one rule
• 1 represents the presence and 0 the absence of a rule in the chromosome (a minimal encoding sketch follows).
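A minimal sketch of this bit-string encoding; the rule names and random initialization are illustrative.

```python
import random

def random_chromosome(n_rules):
    """One candidate classifier: bit i == 1 keeps rule i, bit i == 0 drops it."""
    return [random.randint(0, 1) for _ in range(n_rules)]

def decode(chromosome, rules):
    """Return the subset of rules selected by the chromosome."""
    return [rule for bit, rule in zip(chromosome, rules) if bit == 1]

rules = ["R1", "R2", "R3", "R4", "R5"]    # placeholder rule names
chromosome = random_chromosome(len(rules))
print(chromosome, "->", decode(chromosome, rules))
```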
Rule Set Optimization Through GA
• The algorithm is initialized with a random generation.
• A fitness function is used to calculate the fitness of each classifier so that the next generation can be selected.
• In our case the fitness function is simply the sum of the confidences of the rules present in the chromosome.
• Chromosomes with higher fitness values are considered candidates for the next generation.
• The next generation is produced using two genetic operators, crossover and mutation.
• One-point crossover is chosen in our case (a sketch of these operators follows).
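A sketch of the operators just described: fitness as the sum of confidences of the rules present in the chromosome, one-point crossover, and bit-flip mutation. The mutation helper and the toy values are assumptions for illustration.

```python
import random

def fitness(chromosome, rule_confidences):
    """Sum of the confidences of the rules present (bit == 1) in the chromosome."""
    return sum(c for bit, c in zip(chromosome, rule_confidences) if bit == 1)

def one_point_crossover(parent_a, parent_b):
    """Swap the tails of two parents at a random cut point."""
    point = random.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]

def mutate(chromosome, rate=0.05):
    """Flip each bit with probability `rate` (5%, as in the GA parameters)."""
    return [1 - bit if random.random() < rate else bit for bit in chromosome]

confidences = [0.9, 0.4, 0.7, 0.6]
parent_a, parent_b = [1, 0, 1, 0], [0, 1, 1, 1]
child_a, child_b = one_point_crossover(parent_a, parent_b)
print(fitness(parent_a, confidences), fitness(parent_b, confidences))  # approx. 1.6 and 1.7
print(child_a, child_b, mutate(child_a))
```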
Rule Set Optimization Through GA
• The algorithm stops on either of two conditions (a sketch of the full loop follows):
1. The maximum number of iterations is exhausted.
2. The algorithm has converged to an optimal point and no further improvement is possible.
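A compact, self-contained sketch of the overall GA loop with both stopping conditions. Truncation selection and the patience threshold used to detect convergence are assumptions; the crossover and mutation rates default to the values given on the next slide.

```python
import random

def run_ga(n_bits, fitness_fn, pop_size=50, max_generations=3000,
           crossover_rate=0.95, mutation_rate=0.05, patience=50):
    """Evolve bit-string chromosomes; stop on max generations or on convergence."""
    def crossover(a, b):
        p = random.randint(1, n_bits - 1)            # one-point crossover
        return a[:p] + b[p:], b[:p] + a[p:]

    def mutate(c):
        return [1 - bit if random.random() < mutation_rate else bit for bit in c]

    population = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best, stale = None, 0
    for _ in range(max_generations):                 # stopping condition 1
        population.sort(key=fitness_fn, reverse=True)
        top = fitness_fn(population[0])
        if best is not None and top <= best:
            stale += 1
            if stale >= patience:                    # stopping condition 2: converged
                break
        else:
            best, stale = top, 0
        parents, children = population[:pop_size // 2], []
        while len(children) < pop_size:              # truncation selection (illustrative)
            a, b = random.sample(parents, 2)
            if random.random() < crossover_rate:
                a, b = crossover(a, b)
            children += [mutate(a), mutate(b)]
        population = children[:pop_size]
    return max(population, key=fitness_fn)

# Toy usage: maximise the sum of per-rule confidences selected by the chromosome.
confidences = [random.random() for _ in range(20)]
best = run_ga(20, lambda c: sum(x for bit, x in zip(c, confidences) if bit))
print(best, sum(x for bit, x in zip(best, confidences) if bit))
```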
Rule Set Optimization Through GA
• The parameters used for the GA are as follows:
– Cross-over rate: 95%
– Mutation rate: 5%
– Population size: 5000
– Number of generations: 3000
Rule Set Optimization Through GA
• After applying the GA to the rule set, the following reduction in the number of rules is seen (rules before → after optimization):
– TAE: 90 → 72
– Zoo: 25 → 20
– Balance Scale: 280 → 190
– Tic-Tac-Toe: 131 → 122
– Car Evaluation: 221 → 209
– Breast-Cancer: 50 → 41
– Mushroom: 20 → 17
– Nursery: 62 → 48
Application Areas And Significance
• Improved efficiency due to parallelism.
• Overcoming memory limitations.
• Computation reusability.
• Continuous learning system.
• Scalability
• One generic classifier.
• Heterogeneous and flexible classifier.
• Improved Accuracy.
Results And Findings
• The methodology adopted for compiling the results is as follows.
• The first and most critical part was the selection of data sets.
• Eight different well-known data sets were used in the experiments, each of a different size, so that the technique could be tested against data sets of all sizes.
Results And Findings
• First of all, the data is divided into three sets:
1. Training set (66%)
2. Validation set (17%)
3. Test set (17%)
• The training set was further divided into n small partitions.
• Each partition was used to build a separate classifier.
• All these classifiers were then combined into one final classifier (a split sketch follows).
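A small sketch of the 66/17/17 split using scikit-learn's train_test_split; the Iris data here is only a stand-in for the actual datasets.

```python
from sklearn.datasets import load_iris            # stand-in dataset for illustration
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Peel off 34% for validation + test, then split that part half-and-half,
# giving roughly 66% / 17% / 17% of the original data.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.34, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

print(len(X_train), len(X_val), len(X_test))      # roughly 99, 25, 26 of 150
```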
Results And Findings
• The final classifier is then optimized with the GA.
• At this stage the validation set is used for tuning and optimization.
• Finally, the optimized classifier is evaluated: the test set is used and results are computed over it.
Results And Findings
• The split of training, validation and test data for each data set is as follows:
– Zoo: discrete; 66% training, 17% validation, 17% test
– Balance Scale: discrete; 66% training, 17% validation, 17% test
– Tic-Tac-Toe: discrete; 66% training, 17% validation, 17% test
– Car Evaluation: discrete; 66% training, 17% validation, 17% test
– Mushroom: discrete; 66% training, 17% validation, 17% test
Results And Findings
• Results before optimization:
– Zoo: 88%
– Balance Scale: 35%
– Tic-Tac-Toe: 88%
– Car Evaluation: 86%
– Mushroom: 100%
Results And Findings
• Results after optimization:
– Zoo: 95%
– Balance Scale: 73%
– Tic-Tac-Toe: 89%
– Car Evaluation: 98.6%
– Mushroom: 100%
Future Work
• In every proposed technique there is always room for improvement and enhancement. The proposed technique can be extended in the following directions:
• Particle swarm optimization algorithms could be used to optimize the rule set.
• This technique divides the dataset horizontally; dividing the dataset vertically could also be considered and may result in improved accuracy.
• Further work on parameter optimization can be done.
Questions ??
Editor's Notes
  1. Mitchell’s definition does not reflect anything related to knowledge acquisition process for the stated computer programs, therefore it is considered insufficient in our domain of research.
  2. Supervised Learning: covers learning algorithms that reason from externally provided instances as input to produce general hypothesis and make predictions about future unknown instances.
  3. Neural nets and support vector machines are nowadays considered to be the state-of-the-art.
  4. Each partition will get an equal number of instances and the same type of data, which will eventually result in similar kinds of decision rules.