Parallel Rule Generation for an Efficient
Classification System
Talha Ghaffar
MS (CS)
9/23/2015 MS Thesis Defense
Scope of Presentation
• Introduction
• Background Study & Literature Review
• Proposed Technique
• Applications And Research Contribution
• Implementation
• Experimental Results
• Future Work
• Conclusion
Introduction
• Nowadays, many organizations utilize large databases for
analytical purposes
• With the growing size of training data, researchers are focusing
their work on developing or improving data mining techniques to
keep pace with that growth
• Major challenges while handling complex and large data:
– Sifting through the data efficiently
– Extracting relevant and useful information accurately
– Analyzing the extracted information and guiding
organizations’ decisions and actions reliably
Introduction ctd.
• Limited Computational Resources
• Applying sequential data mining techniques seems inefficient,
because they inherently have long response times.
• Research suggests that when data mining techniques are
implemented on parallel machines, processing and response
times improve
• Classification: a core task in data mining, the field that is
concerned with the extraction of knowledge or patterns from
databases through the building of predictive or
descriptive models (learning models)
Training Set:
Att1   Att3     Att2    Class
No     Small    40K     No
No     Medium   20K     No
Yes    Large    120K    Yes
Yes    Small    70K     Yes
No     Medium   45K     Yes

Training Set → Learn Model → Classifier (e.g. IF ... then Action) → Apply Model → Test Set

Test Set:
Att1   Att3     Att2    Class
No     Small    25K     ?
Yes    Medium   20K     ?
Yes    Large    100K    ?
No     Small    30K     ?
Yes    Small    55K     ?
Background Study
• Machine Learning: a definition needs to incorporate two important
elements: a computer-based knowledge acquisition process, and a
statement of where the skills or knowledge can be obtained.
– Mitchell describes machine learning as the study of computer
algorithms that improve automatically through experience.
– Alpaydin defines machine learning as “the capability of the computer
program to acquire or develop new knowledge or skills from existing or
non existing examples for the sake of optimizing performance criterion”.
• Unlike Mitchell’s definition, which lacks the knowledge
acquisition process, Alpaydin’s definition is preferred in this
research domain
Background Study ctd.
• Building individual classifiers on subsets of the data set using
appropriate learning models results in an accurate set of rules
on each individual machine.
• Any drop-off caused by the parallelism can be reduced as far as
possible.
• Redundant records are possible and need to be
removed when applying the subset approach.
• In the context of my work, using subsets for training is the
better and more efficient approach.
• Supervised Learning
Background Study ctd.
• Several methods for classification have been introduced over the
years - e.g. decision trees, artificial neural networks, nearest
neighbor classifiers, support vector machines and so on
• Decision trees have decent accuracy and moreover are easier to
interpret, which is a crucial advantage when it comes to data
mining
• I also suspect that once an algorithm gains acceptance, it takes
time before scalable and parallelized versions of that algorithm
appear. For these reasons, decision trees are preferred
Background Study ctd.
• Common techniques used to overcome
the problem of large datasets and memory
limitations are as follows:
1. Data sampling.
2. Feature selection.
3. Data pre-processing.
4. Parallel processing.
Background Study ctd.
• Parallel Approaches:
• Independent Partitioning
– Each processor is provided with the complete data set
– All processors process the same data set as input and generate
rules from it; the rules are then merged using combining
techniques
• Parallel Sequential Partitioning
– Every processor generates a particular subset of the concepts
• Replicated Sequential Partitioning
– Each processor processes one horizontal partition of the data set
and executes what is more or less the sequential algorithm,
as each processor can view only partial information
– Each processor produces a local set of concepts, which is afterwards
coordinated and added to the global set of concepts.
Background Study ctd.
Combining Rules and their Description:
• Maximum Rule: It seems reasonable, i.e. select the classifier with the
maximum confidence values. This rule can generate adverse
results if the particular classifier with maximum confidence
values is over-trained.
• Sum Rule: This rule is effective if every individual classifier is
independent of each other.
When large set of similar classifiers is generated, it is helpful
to reduce the noise in large sets of so-called weak classifiers.
• Minimum Rule: This rule selects the outcome of the classifier that has the
least objection against a certain class.
• Product Rule: This rule is effective if every individual classifier is
independent of each other.
• Median Rule: This rule is similar to the sum rule but may yield more robust
results.
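The combining rules above can be sketched in a few lines. This is only an illustration, not the thesis implementation: it assumes each classifier reports one confidence value per class, and the function name `combine` and the example numbers are made up for the sketch.

```python
from math import prod
from statistics import median

def combine(scores, rule):
    """Combine per-classifier class confidences with one of the five rules.

    scores: list of dicts, one per classifier, mapping class label -> confidence.
    Returns the class label with the highest combined score.
    """
    reducers = {
        "max": max,
        "min": min,
        "sum": sum,
        "product": prod,
        "median": median,
    }
    reduce_fn = reducers[rule]
    labels = scores[0].keys()
    combined = {label: reduce_fn([s[label] for s in scores]) for label in labels}
    return max(combined, key=combined.get)

# Hypothetical confidences from three classifiers for the classes "Yes" and "No".
example = [
    {"Yes": 0.9, "No": 0.1},
    {"Yes": 0.4, "No": 0.6},
    {"Yes": 0.45, "No": 0.55},
]
for rule in ("max", "min", "sum", "product", "median"):
    print(rule, combine(example, rule))
```

With these made-up numbers the single very confident classifier decides the maximum rule, while the median rule sides with the two moderate classifiers, which is the kind of robustness the table refers to.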
Proposed Technique
• A three-step approach which first divides the very large dataset into
data chunks, processes them on N defined processors on
different machines, then generates the final merged decision rule file
and resolves any conflicts that arise
– Data Pre-Processing
– Parallel Rule Generation
– Rule Merging and Conflict Resolution
• Data Pre-Processing
– In this step, we divide the large dataset into N smaller datasets
(N is a user-specified number).
– A round-robin approach is used, which distributes the data
evenly across the chunks
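A minimal sketch of the round-robin pre-processing step, assuming the dataset fits in a Python list. The function name `round_robin_split` is illustrative. If a random spread is wanted, the records would have to be shuffled before dealing, which is an assumption and not something the slides state.

```python
def round_robin_split(records, n):
    """Deal records into n chunks in turn: record i goes to chunk i % n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    chunks = [[] for _ in range(n)]
    for i, record in enumerate(records):
        chunks[i % n].append(record)
    return chunks

# Ten dummy records dealt to three processors.
chunks = round_robin_split(list(range(10)), 3)
print(chunks)
```

Chunk sizes differ by at most one record, so every processor receives an almost equal share of the data.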
[Figure: overview of the proposed technique. Data Pre-Processing: the Training Set is split by Round Robin into Small Data Chunks. Parallel Rule Generation: processors P1, P2, P3, ..., PN each run a Learning Algo. on one chunk and produce IF-then-Action rules. Rule Merging and Conflict Resolution: the rule sets are merged into the final IF { } then { Action; } rule set.]
Proposed Technique
• Parallel Rule Generation:
– In this step, each of the smaller datasets from the previous step is given to
a different processor so that classification can be
performed in parallel on each processor.
– We can use any classification algorithm for generating rules, or we can use
multiple classifiers on different processors for rule generation.
– These rules are in if-then-else form. The rules that each processor
generates are only valid for the data provided to that
particular processor.
– It is possible that rules generated on one processor conflict
with rules generated on another processor, and it is also possible that
more than one processor generates the same rules
Proposed Technique
• Parallel Rule Generation:
– Two additional steps need to be performed at this stage:
1. Calculate the support of each individual rule and store it with the
rule.
2. Calculate the confidence of each individual rule and store it with the
rule.
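The slides do not define support and confidence, so the sketch below assumes the usual association-rule definitions: support is the fraction of all records that match both the conditions and the class of the rule, and confidence is the fraction of the records matching the conditions that also have the rule's class. The rule representation and the toy data are hypothetical.

```python
def support_and_confidence(rule, records):
    """Return (support, confidence) of an IF-THEN rule on a list of records.

    rule: (conditions, label), where conditions maps attribute -> required value.
    records: list of (attributes, label) pairs, attributes being a dict.
    """
    conditions, label = rule
    # Class labels of the records that satisfy every condition of the rule.
    covered = [
        rec_label
        for attrs, rec_label in records
        if all(attrs.get(k) == v for k, v in conditions.items())
    ]
    correct = sum(1 for rec_label in covered if rec_label == label)
    support = correct / len(records) if records else 0.0
    confidence = correct / len(covered) if covered else 0.0
    return support, confidence

# Hypothetical chunk held by one processor.
chunk = [
    ({"Att1": "No", "Att3": "Small"}, "No"),
    ({"Att1": "No", "Att3": "Medium"}, "No"),
    ({"Att1": "Yes", "Att3": "Large"}, "Yes"),
    ({"Att1": "Yes", "Att3": "Small"}, "Yes"),
    ({"Att1": "No", "Att3": "Medium"}, "Yes"),
]
rule = ({"Att1": "No"}, "No")  # IF Att1 = No THEN Class = No
support, confidence = support_and_confidence(rule, chunk)
print(support, confidence)
```

Here the rule covers three records and is right on two of them, so its support is 2/5 and its confidence is 2/3.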
Proposed Technique
• Rule Merging and Conflict Resolution:
• In this step, the rules generated by all the processors are
combined to get the final and complete rule set. While merging
the rules we encounter these problems:
– Redundancy of rules, i.e. the same rule occurring more than once.
– Conflicting rules, i.e. the same conditions leading to different decisions.
Proposed Technique
• Use sufficiently large data sets on each processor; this
reduces the probability of conflicting rules and increases the
probability of similar rules
• Make the data distribution on each processor random, so that
the distribution is unbiased and every processor gets almost
the same type of data and produces an almost similar rule set
• Take the union of all rule sets; this removes repeated
occurrences of a single rule while still including all
possible unique rules
• If a conflict appears, select the rule with more coverage and
confidence
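The union step can be sketched as follows. The representation is an assumption made for illustration: a rule is a tuple (conditions, label, support, confidence) with the conditions stored as a frozenset, and when the same rule arrives from several processors the first copy is kept. Rules that share conditions but disagree on the label are returned separately as conflicts.

```python
def merge_rule_sets(rule_sets):
    """Union of rule sets; returns (merged, conflicts)."""
    seen = {}  # conditions -> {label: (support, confidence)}
    for rules in rule_sets:
        for conditions, label, support, confidence in rules:
            # setdefault keeps the first copy, so repeated rules collapse to one.
            seen.setdefault(conditions, {}).setdefault(label, (support, confidence))
    merged, conflicts = [], []
    for conditions, labels in seen.items():
        # More than one label for the same conditions means a conflict.
        target = conflicts if len(labels) > 1 else merged
        for label, (support, confidence) in labels.items():
            target.append((conditions, label, support, confidence))
    return merged, conflicts

# Hypothetical rule sets from two processors.
a = frozenset({("Att1", "No")})
b = frozenset({("Att3", "Large")})
p1 = [(a, "No", 0.40, 0.67), (b, "Yes", 0.20, 1.00)]
p2 = [(a, "Yes", 0.10, 0.33), (b, "Yes", 0.25, 1.00)]
merged, conflicts = merge_rule_sets([p1, p2])
print(len(merged), len(conflicts))
```

The duplicated rule on `b` survives once, while the two rules on `a` are handed to conflict resolution.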
Proposed Technique
• If a conflict appears, the rule with more coverage and
confidence is selected.
Proposed Technique
• Conflicting rules with the same support and different confidence:
the rule with the higher confidence is selected.
Proposed Technique
• Conflicting rules with the same confidence and different support:
the rule with the higher support is selected.
Proposed Technique
• Conflicting rules with different confidence and different support:
in that case we can use the formula
X = α(confidence) + β(support)
• where α and β are two weights whose values lie between
0 and 1, such that their sum is always 1 (α + β = 1).
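A small sketch of the weighted score. The assumption here, not stated explicitly on the slide, is that the conflicting rule with the larger X wins. The function names and the example values are illustrative only.

```python
def rule_score(support, confidence, alpha=0.5):
    """X = alpha * confidence + beta * support, with beta = 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    return alpha * confidence + beta * support

def resolve_conflict(rules, alpha=0.5):
    """Pick one rule among conflicting ones (same conditions, different class).

    Each rule is (label, support, confidence). Assumption: the largest X wins.
    """
    return max(rules, key=lambda r: rule_score(r[1], r[2], alpha))

# Two hypothetical conflicting rules: (label, support, confidence).
conflicting = [("Yes", 0.30, 0.60), ("No", 0.10, 0.70)]
winner = resolve_conflict(conflicting, alpha=0.5)
print(winner)
```

With equal weights the "Yes" rule scores 0.45 against 0.40, so its larger support outweighs the slightly higher confidence of the "No" rule; with α = 1 only confidence counts and "No" would win. The same score also reproduces the simpler cases: when two rules have equal support, the one with higher confidence gets the larger X for any α > 0.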
Proposed Technique
• Once conflict resolution is over, the next step is optimization
of the results.
• For that purpose a genetic algorithm (GA) is used.
Rule Set Optimization Through GA
• Genetic algorithms are a family of computational models
based on biological evolution.
• One complete solution is represented in the form of a simple
vector called a chromosome.
• A set of chromosomes is called a generation.
• The solution evolves from one generation to the next on the basis
of a fitness function, selection criteria and reproduction
operators.
• The final rule set, obtained after combining the individual rule sets
and resolving conflicts, is further optimized with the help
of the GA.
• After applying the GA, not only is the number of rules in the final
classifier reduced, but accuracy is also increased.
Rule Set Optimization Through GA
• In our case the problem is represented as follows.
• In the proposed solution, the rule set is encoded as a
chromosome using a string of bits, where each bit represents
one rule
• 1 represents the presence and 0 the absence of a rule
in the chromosome.
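The bit-string encoding can be illustrated with two small helpers. The names `encode` and `decode` and the placeholder rule names are assumptions made for this sketch.

```python
def decode(chromosome, rules):
    """Return the rules whose bit is 1 in the chromosome."""
    if len(chromosome) != len(rules):
        raise ValueError("one bit per rule is required")
    return [rule for bit, rule in zip(chromosome, rules) if bit == 1]

def encode(selected, rules):
    """Return the bit string that marks which of `rules` are in `selected`."""
    return [1 if rule in selected else 0 for rule in rules]

rules = ["R1", "R2", "R3", "R4"]  # placeholder rule names
chromosome = [1, 0, 1, 1]         # R2 is absent from this candidate rule set
subset = decode(chromosome, rules)
print(subset)
```

Every chromosome therefore stands for one candidate subset of the merged rule set, and its length equals the number of rules.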
Rule Set Optimization Through GA
• The algorithm is initialized with a random generation.
• A fitness function is used to calculate the fitness of each classifier
so that the new generation can be selected.
• In our case the fitness function is simply the sum of the confidences of
the rules present in the chromosome.
• A chromosome with a higher fitness value is considered a candidate
for the next generation.
• The next generation is produced using two genetic operators:
crossover and mutation.
• One-point crossover is chosen in our case.
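The fitness function and the two operators can be sketched as below. The slides name one-point crossover but do not describe the mutation operator, so independent bit flips are an assumption here; the confidence values and the seed are arbitrary.

```python
import random

def fitness(chromosome, confidences):
    """Sum of the confidences of the rules whose bit is set."""
    return sum(c for bit, c in zip(chromosome, confidences) if bit)

def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length parents at a random cut point."""
    point = rng.randint(1, len(parent_a) - 1)
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate` (assumed operator)."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]

rng = random.Random(0)               # fixed seed only to make the sketch repeatable
confidences = [0.9, 0.4, 0.7, 0.6]   # hypothetical per-rule confidences
child_a, child_b = one_point_crossover([1, 1, 1, 1], [0, 0, 0, 0], rng)
print(child_a, child_b, fitness(child_a, confidences))
```

Crossing an all-ones parent with an all-zeros parent makes the cut point visible: each child is a run of one parent's bits followed by the other's.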
Rule Set Optimization Through GA
• The algorithm stops when either of two conditions is met:
1. The maximum number of iterations is exhausted.
2. The algorithm has converged to an optimal point and no further
convergence is possible.
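A compact GA loop showing both stopping conditions. Several details are assumptions, since the slides do not specify them: parents are taken from the fitter half of the population, and convergence is detected as `patience` generations without improvement of the best fitness. The crossover and mutation rates default to the 95% and 5% listed in the parameter table; the population size and generation limit are deliberately tiny here.

```python
import random

def run_ga(confidences, pop_size=20, max_generations=50, patience=5,
           crossover_rate=0.95, mutation_rate=0.05, seed=0):
    """Evolve bit strings (one bit per rule); return (best, generations_run)."""
    rng = random.Random(seed)
    n = len(confidences)

    def fitness(ch):
        return sum(c for bit, c in zip(ch, confidences) if bit)

    # Random initial generation.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    stale = 0        # generations without improvement
    generation = 0
    # Stop on either condition: iteration budget exhausted, or no more progress.
    while generation < max_generations and stale < patience:
        generation += 1
        # Fitter chromosomes become candidates (parents) for the next generation.
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            if rng.random() < crossover_rate:
                point = rng.randint(1, n - 1)  # one-point crossover
                a, b = a[:point] + b[point:], b[:point] + a[point:]
            for child in (a, b):
                children.append(
                    [1 - bit if rng.random() < mutation_rate else bit for bit in child]
                )
        population = children[:pop_size]
        champion = max(population, key=fitness)
        if fitness(champion) > fitness(best):
            best, stale = champion, 0
        else:
            stale += 1
    return best, generation

best, generations = run_ga([0.9, 0.4, 0.7, 0.6, 0.8, 0.3])
print(best, generations)
```

Note that a fitness defined purely as a sum of confidences never penalises keeping a rule, so in this sketch nothing pushes the rule count down; any reduction in practice would have to come from other parts of the method that the slides do not detail.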
Rule Set Optimization Through GA
• The parameters used for the GA are as follows.
Parameters for GA        Values
Cross-Over Rate          95%
Mutation Rate            5%
Population Size          5000
Number of Generations    3000
Rule Set Optimization Through GA
• After applying GA on the rule set, the following reduction of
rules in rule set is seen:
Datasets          # of Rules    Optimized Set of Rules
TAE               90            72
Zoo               25            20
Balance Scale     280           190
Tic-Tac-Toe       131           122
Car Evaluation    221           209
Breast-Cancer     50            41
Mushroom          20            17
Nursery           62            48
Application Areas And Significance
• Improved efficiency due to parallelism
• Overcoming memory limitations
• Computation reusability
• Continuous learning system
• Scalability
• One generic classifier
• Heterogeneous and flexible classifier
• Improved accuracy
Results And Findings
• The methodology that I adopted for compiling the results is
as follows.
• The first and most critical part was the selection of data sets.
• Eight different well-known, state-of-the-art data sets were used in the
experiments; each is of a different size, so that
we can test our technique against data sets of all sizes.
Results And Findings
• First of all, the data is divided into three sets:
1. Training set (66%)
2. Validation set (17%)
3. Test set (17%)
• The training set was further divided into n small partitions.
• Each partition was used to build a separate classifier.
• Then all these classifiers were combined to make one final
classifier.
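The 66/17/17 split can be sketched as follows. Shuffling before cutting is an assumption, since the slides do not say how records were assigned to the three sets; the function name and the seed are illustrative.

```python
import random

def three_way_split(records, train=0.66, validation=0.17, seed=0):
    """Shuffle and cut records into training, validation and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    # Whatever remains after training and validation forms the test set.
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train_set, validation_set, test_set = three_way_split(range(100))
print(len(train_set), len(validation_set), len(test_set))
```

The training list returned here is what would then be dealt out into the n partitions, one per classifier.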
Results And Findings
• The final classifier is then optimized with the GA.
• At this stage the validation set is used for tuning and optimization
purposes.
• Finally, the optimized classifier is evaluated:
the test set is used here and the results are computed over
it.
Results And Findings
• Following are the percentage split details of the training, validation
and test data sets:
Data Sets         Details
Zoo               Discrete; 66% training, 17% validation, 17% test set
Balance Scale     Discrete; 66% training, 17% validation, 17% test set
Tic-Tac-Toe       Discrete; 66% training, 17% validation, 17% test set
Car Evaluation    Discrete; 66% training, 17% validation, 17% test set
Mushroom          Discrete; 66% training, 17% validation, 17% test set
Results And Findings
• Results before optimization are as follows.
Data Sets         Accuracy
Zoo               88%
Balance Scale     35%
Tic-Tac-Toe       88%
Car Evaluation    86%
Mushroom          100%
Results And Findings
• Results after optimization are as follows.
Data Sets         Accuracy
Zoo               95%
Balance Scale     73%
Tic-Tac-Toe       89%
Car Evaluation    98.6%
Mushroom          100%
Future Work
• In every proposed technique, there is always room for
improvement and enhancement. The proposed technique can be
extended further in the following directions:
• Particle swarm optimization algorithms can be chosen to
optimize the rule set.
• This technique divides the dataset horizontally; we can consider
dividing the dataset vertically, which may result in improved accuracy.
• Further work on parameter optimization can be done.
Questions?
MJDuyan
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
Nguyen Thanh Tu Collection
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
Priyankaranawat4
 
IGCSE Biology Chapter 14- Reproduction in Plants.pdf
IGCSE Biology Chapter 14- Reproduction in Plants.pdfIGCSE Biology Chapter 14- Reproduction in Plants.pdf
IGCSE Biology Chapter 14- Reproduction in Plants.pdf
Amin Marwan
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
iammrhaywood
 
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
imrankhan141184
 

Recently uploaded (20)

Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
 
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxBeyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
 
A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
 
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPLAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
 
Bed Making ( Introduction, Purpose, Types, Articles, Scientific principles, N...
Bed Making ( Introduction, Purpose, Types, Articles, Scientific principles, N...Bed Making ( Introduction, Purpose, Types, Articles, Scientific principles, N...
Bed Making ( Introduction, Purpose, Types, Articles, Scientific principles, N...
 
Solutons Maths Escape Room Spatial .pptx
Solutons Maths Escape Room Spatial .pptxSolutons Maths Escape Room Spatial .pptx
Solutons Maths Escape Room Spatial .pptx
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
 
Leveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit InnovationLeveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit Innovation
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
 
Liberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdfLiberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdf
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
 
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumPhilippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
 
IGCSE Biology Chapter 14- Reproduction in Plants.pdf
IGCSE Biology Chapter 14- Reproduction in Plants.pdfIGCSE Biology Chapter 14- Reproduction in Plants.pdf
IGCSE Biology Chapter 14- Reproduction in Plants.pdf
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
 
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
 

Parallel Rule Generation For Efficient Classification System

  • 1. Parallel Rule Generation for an Efficient Classification System • Talha Ghaffar, MS(CS) • MS Thesis Defense, 9/23/2015
  • 2. Scope of Presentation • Introduction • Background Study & Literature Review • Proposed Technique • Applications and Research Contribution • Implementation • Experimental Results • Future Work • Conclusion
  • 3. Introduction • Nowadays, many organizations use large databases for analytical purposes • As training data grows, researchers are focusing on developing or improving data mining techniques to keep pace with that growth • Major challenges when handling large, complex data: – Sifting through the data efficiently – Extracting relevant and useful information accurately – Analyzing the extracted information and guiding organizations' decisions and actions reliably
  • 4. Introduction ctd. • Limited computational resources • Sequential data mining techniques are inefficient on such data because of their inherently long response times • Research suggests that implementing data mining techniques on parallel machines improves processing and response time • Classification: a core task in data mining, the field concerned with extracting knowledge or patterns from databases by building predictive or descriptive models (learning models)
  • 5. [Figure: classification workflow. A model is learned from the training set ("Learn Model"), giving a classifier (e.g. "IF then Action"), which is then applied to the test set ("Apply Model").]
      Training Set
      Att1   Att3     Att2    Class
      No     Small    40K     No
      No     Medium   20K     No
      Yes    Large    120K    Yes
      Yes    Small    70K     Yes
      No     Medium   45K     Yes
      Test Set
      Att1   Att3     Att2    Class
      No     Small    25K     ?
      Yes    Medium   20K     ?
      Yes    Large    100K    ?
      No     Small    30K     ?
      Yes    Small    55K     ?
  • 7. Background Study • Machine Learning: a definition needs to incorporate two important elements: a computer-based knowledge acquisition process, and a statement of where the skills or knowledge can be obtained. – Mitchell describes machine learning as the study of computer algorithms that improve automatically through experience. – Alpaydin defines machine learning as “the capability of the computer program to acquire or develop new knowledge or skills from existing or non existing examples for the sake of optimizing performance criterion”. • Unlike Mitchell’s definition, which lacks the knowledge acquisition process, Alpaydin's definition is preferred in this research domain
  • 8. Background Study ctd. • Building individual classifiers on subsets of the data set with appropriate learning models yields an accurate set of rules on each machine. • Any drop-off caused by parallelism can be reduced as far as possible. • Records may be redundant across subsets and need to be removed when applying the subset approach. • In the context of this work, training on subsets is the better and more efficient approach. • Supervised Learning
  • 9. Background Study ctd. • Several classification methods have been introduced over the years, e.g. decision trees, artificial neural networks, nearest neighbor classifiers, support vector machines, and so on • Decision trees have decent accuracy and are easier to interpret, which is a crucial advantage in data mining • I also suspect that once an algorithm gains acceptance, it takes time before scalable and parallelized versions of it appear. For these reasons, decision trees are preferred
  • 10. Background Study ctd. • Common techniques used to overcome the problem of large datasets and memory limitations are as follows: 1. Data sampling 2. Feature selection 3. Data pre-processing 4. Parallel processing
  • 11. Background Study ctd. • Parallel Approaches: • Independent Partitioning – Each processor is given the complete data set – All processors process the same data set as input and generate rules from it; the rules are then merged using combining techniques • Parallel Sequential Partitioning – Each processor generates a particular subset of the concepts • Replicated Sequential Partitioning – Each processor processes one horizontal partition of the data set and runs what is more or less the sequential algorithm, since each processor sees only partial information – Each processor produces a local set of concepts, which is then coordinated and added to the global set of concepts.
  • 12. Background Study ctd. • Combining rules and their descriptions:
      – Maximum Rule: Select the classifier with the maximum confidence values, which seems reasonable. This rule can give adverse results if the classifier with the maximum confidence values is over-trained.
      – Sum Rule: Effective if the individual classifiers are independent of each other. When a large set of similar classifiers is generated, it helps reduce the noise in large sets of so-called weak classifiers.
      – Minimum Rule: Selects the outcome of the classifier that has the least objection against a certain class.
      – Product Rule: Effective if the individual classifiers are independent of each other.
      – Median Rule: Similar to the sum rule but may yield more robust results.
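The five combining rules above can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: it assumes each classifier outputs one confidence value per class, and the function name `combine` and the example scores are hypothetical.

```python
import math
import statistics
from typing import Dict, List

# Each rule reduces the per-classifier confidences for one class to one number.
COMBINERS = {
    "max": max,
    "sum": sum,
    "min": min,
    "product": math.prod,  # requires Python 3.8+
    "median": statistics.median,
}


def combine(scores: List[Dict[str, float]], rule: str) -> str:
    """Return the class with the highest combined confidence.

    `scores` holds one dict per classifier, mapping class label -> confidence
    (an assumed input format; the slides do not specify one).
    """
    combiner = COMBINERS[rule]
    classes = scores[0].keys()
    combined = {c: combiner([s[c] for s in scores]) for c in classes}
    return max(combined, key=combined.get)


# Hypothetical outputs of three classifiers for one test record.
scores = [
    {"Yes": 0.90, "No": 0.10},
    {"Yes": 0.40, "No": 0.60},
    {"Yes": 0.45, "No": 0.55},
]
max_choice = combine(scores, "max")        # one very confident classifier dominates
median_choice = combine(scores, "median")  # the middle opinion decides instead
```

In this example the maximum rule follows the single highly confident classifier, while the median rule sides with the other two, which is the kind of robustness difference the table describes.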
  • 13. Proposed Technique • A three-step approach that first divides the very large dataset into data chunks, processes the chunks on N processors on different machines, and then generates the final merged decision rule file and resolves any conflicts that arise: – Data Pre-Processing – Parallel Rule Generation – Rule Merging and Conflict Resolution • Data Pre-Processing – In this step, the large dataset is divided into N smaller datasets (N is user specified). – A round robin approach is used, which gives a random, symmetric distribution of the data
  • 14. [Figure: pipeline of the proposed technique. Data Pre-Processing: the training set is split round robin into small data chunks. Parallel Rule Generation: processors P1, P2, P3, ..., PN each apply a learning algorithm to one chunk and produce "IF then Action" rules. Rule Merging and Conflict Resolution: the rule sets are merged into one final "IF { } then { Action; }" rule set.]
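The round robin pre-processing step can be illustrated with a short sketch. The function name `round_robin_partition` is hypothetical; the slides do not say whether records are shuffled first, so no shuffling is done here and any randomness in the chunks would have to come from the input order.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def round_robin_partition(records: Sequence[T], n_chunks: int) -> List[List[T]]:
    """Deal records into n_chunks lists, one record at a time, in turn."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[T]] = [[] for _ in range(n_chunks)]
    for index, record in enumerate(records):
        # Record 0 goes to chunk 0, record 1 to chunk 1, ..., wrapping around.
        chunks[index % n_chunks].append(record)
    return chunks


# Ten records dealt to three processors.
chunks = round_robin_partition(list(range(10)), 3)
```

Chunk sizes differ by at most one record, which matches the goal of giving every processor a similar share of the data.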
  • 15. Proposed Technique • Parallel Rule Generation: – In this step, each of the smaller datasets from the previous step is given to a different processor so that classification can run in parallel. – Any classification algorithm can be used to generate rules, or different classifiers can be used on different processors. – The rules are in if-then-else form. The rules a processor generates are valid only for the data provided to that processor. – Rules generated on one processor may conflict with rules generated on another, and more than one processor may generate the same rules
  • 16. Proposed Technique • Parallel Rule Generation: – Two additional steps are performed at this stage: 1. Calculate the support of each rule and store it with the rule. 2. Calculate the confidence of each rule and store it with the rule.
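A sketch of the two per-rule calculations follows. The slides do not define support or confidence, so this assumes the usual definitions: support is the fraction of all records that match the rule's conditions and carry its class, and confidence is the fraction of records matching the conditions that carry its class. The function name and the record format are hypothetical; the data is the training table from slide 5 (Att2 omitted for brevity).

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def support_and_confidence(
    conditions: Dict[str, str],
    label: str,
    records: List[Record],
    class_key: str = "Class",
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule IF conditions THEN label."""
    covered = [r for r in records if all(r.get(k) == v for k, v in conditions.items())]
    correct = [r for r in covered if r[class_key] == label]
    support = len(correct) / len(records) if records else 0.0
    confidence = len(correct) / len(covered) if covered else 0.0
    return support, confidence


records = [
    {"Att1": "No", "Att3": "Small", "Class": "No"},
    {"Att1": "No", "Att3": "Medium", "Class": "No"},
    {"Att1": "Yes", "Att3": "Large", "Class": "Yes"},
    {"Att1": "Yes", "Att3": "Small", "Class": "Yes"},
    {"Att1": "No", "Att3": "Medium", "Class": "Yes"},
]

# IF Att1 = Yes THEN Class = Yes: covers 2 records, both correct.
rule_yes = support_and_confidence({"Att1": "Yes"}, "Yes", records)
# IF Att1 = No THEN Class = No: covers 3 records, 2 correct.
rule_no = support_and_confidence({"Att1": "No"}, "No", records)
```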
  • 17. Proposed Technique • Rule Merging and Conflict Resolution: • In this step, the rules generated by all processors are combined into the final, complete rule set. Merging raises two problems: – Redundant rules, i.e. the same rule occurring more than once. – Conflicting rules, i.e. different decisions for the same conditions.
  • 18. Proposed Technique • Use sufficiently large data sets on each processor, which reduces the probability of conflicting rules and increases the probability of similar rules • Randomize the data distribution across processors so that it is unbiased and every processor gets similar data and produces a similar rule set • Take the union of all rule sets, which removes repeated occurrences of a rule while including every unique rule • If a conflict appears, select the rule with more coverage and confidence
  • 19. Proposed Technique • If a conflict appears, the rule with more coverage and confidence is selected.
  • 20. Proposed Technique • Conflicting rules with the same support and different confidence • The rule with the higher confidence is selected.
  • 21. Proposed Technique • Conflicting rules with the same confidence and different support • The rule with the higher support is selected.
  • 22. Proposed Technique • Conflicting rules with different confidence and different support • In that case we can use the formula X = α(confidence) + β(support) • where α and β are two weights, each between 0 and 1, whose sum is always 1.
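The merging and conflict-resolution cases above can be sketched with one scoring function. This is an illustration under stated assumptions: the `Rule` class, `score`, and `merge_rule_sets` are hypothetical names, the rule with the larger X is assumed to win, and α = 0.5 is an arbitrary default because the slides give no values for α and β.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Rule:
    conditions: FrozenSet[Tuple[str, str]]  # (attribute, value) pairs
    label: str
    support: float
    confidence: float


def score(rule: Rule, alpha: float = 0.5) -> float:
    """X = alpha * confidence + beta * support, with alpha + beta = 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    return alpha * rule.confidence + beta * rule.support


def merge_rule_sets(rule_sets: Iterable[Iterable[Rule]], alpha: float = 0.5) -> List[Rule]:
    """Union the rule sets, keeping one rule per set of conditions.

    Duplicates collapse to a single rule; conflicting rules (same conditions,
    different label) are resolved in favour of the higher score.
    """
    best: Dict[FrozenSet[Tuple[str, str]], Rule] = {}
    for rules in rule_sets:
        for rule in rules:
            current = best.get(rule.conditions)
            if current is None or score(rule, alpha) > score(current, alpha):
                best[rule.conditions] = rule
    return list(best.values())


att1_yes = frozenset({("Att1", "Yes")})
att1_no = frozenset({("Att1", "No")})
processor_1 = [Rule(att1_yes, "Yes", support=0.4, confidence=1.0)]
processor_2 = [
    Rule(att1_yes, "No", support=0.3, confidence=0.6),  # conflicts with processor 1
    Rule(att1_no, "No", support=0.4, confidence=0.7),
]
merged = merge_rule_sets([processor_1, processor_2])
decisions = {rule.conditions: rule.label for rule in merged}
```

With any α strictly between 0 and 1, the same score also reproduces the two simpler cases: equal support falls back to comparing confidence, and equal confidence to comparing support.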
  • 23. Proposed Technique • Once conflict resolution is complete, the next step is to optimize the result. • A genetic algorithm (GA) is used for this purpose.
  • 24. Rule Set Optimization Through GA • Genetic algorithms are a family of computational models based on biological evolution. • One complete solution is represented as a simple vector called a chromosome. • A set of chromosomes is called a generation. • The solution evolves from one generation to the next on the basis of a fitness function, selection criteria, and reproduction operators. • The final rule set obtained after combining the individual rule sets and resolving conflicts is further optimized with the GA. • After applying the GA, the number of rules in the final classifier decreases and accuracy increases.
  • 25. Rule Set Optimization Through GA • In our case, the problem is represented as follows. • The rule set is encoded as a chromosome, a string of bits in which each bit represents one rule • 1 represents the presence and 0 the absence of a rule in the chromosome.
  • 26. Rule Set Optimization Through GA • The algorithm is initialized with a random generation. • A fitness function calculates the fitness of each classifier so that the new generation can be selected. • In our case, the fitness function is simply the sum of the confidences of the rules present in the chromosome. • Chromosomes with higher fitness are candidates for the next generation. • The next generation is produced using two genetic operators: crossover and mutation. • One-point crossover is used in our case.
  • 27. Rule Set Optimization Through GA • The algorithm stops on either of two conditions: 1. The maximum number of iterations is exhausted. 2. The algorithm has converged to an optimal point and no further improvement is possible.
  • 28. Rule Set Optimization Through GA • The parameters used for the GA are as follows:
      Parameter               Value
      Crossover Rate          95%
      Mutation Rate           5%
      Population Size         5000
      Number of Generations   3000
  • 29. Rule Set Optimization Through GA • After applying the GA to the rule set, the following reduction in the number of rules is seen:
      Dataset          # of Rules   Optimized Set of Rules
      TAE              90           72
      Zoo              25           20
      Balance Scale    280          190
      Tic-Tac-Toe      131          122
      Car Evaluation   221          209
      Breast-Cancer    50           41
      Mushroom         20           17
      Nursery          62           48
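The GA described on the preceding slides can be sketched as follows, using the bit-string encoding, the sum-of-confidences fitness, one-point crossover, and bit-flip mutation. Several details are assumptions rather than the thesis method: tournament selection and elitism are swapped in because the slides do not name a selection scheme, only the iteration-limit stopping condition is modelled, validation-set tuning is left out, and the population and generation counts are shrunk from 5000 and 3000 so the example runs quickly. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Chromosome = List[int]  # one bit per rule: 1 = rule kept, 0 = rule dropped


def fitness(chromosome: Sequence[int], confidences: Sequence[float]) -> float:
    """Sum of the confidences of the rules present in the chromosome."""
    return sum(conf for bit, conf in zip(chromosome, confidences) if bit)


def one_point_crossover(
    a: Sequence[int], b: Sequence[int], point: int
) -> Tuple[Chromosome, Chromosome]:
    """Swap the tails of two parents at the given cut point."""
    return list(a[:point]) + list(b[point:]), list(b[:point]) + list(a[point:])


def mutate(chromosome: Sequence[int], rate: float, rng: random.Random) -> Chromosome:
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def evolve(
    confidences: Sequence[float],
    population_size: int = 20,   # slides use 5000
    generations: int = 30,       # slides use 3000
    crossover_rate: float = 0.95,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Chromosome:
    rng = random.Random(seed)
    n = len(confidences)

    def score(c: Sequence[int]) -> float:
        return fitness(c, confidences)

    # Random initial generation.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(population_size)]
    for _ in range(generations):
        # Elitism (assumption): carry the best chromosome over unchanged.
        next_generation = [max(population, key=score)]
        while len(next_generation) < population_size:
            # Tournament selection of size 2 (assumption).
            parent_a = max(rng.sample(population, 2), key=score)
            parent_b = max(rng.sample(population, 2), key=score)
            if n > 1 and rng.random() < crossover_rate:
                child_a, child_b = one_point_crossover(parent_a, parent_b, rng.randint(1, n - 1))
            else:
                child_a, child_b = list(parent_a), list(parent_b)
            next_generation.append(mutate(child_a, mutation_rate, rng))
            if len(next_generation) < population_size:
                next_generation.append(mutate(child_b, mutation_rate, rng))
        population = next_generation
    return max(population, key=score)


# Hypothetical confidences for a four-rule merged rule set.
best = evolve([0.9, 0.5, 0.7, 0.8])
```

Note that a fitness based only on summed confidence rewards keeping rules, so the reduction in rule count reported above presumably also depends on the validation-set tuning that this sketch does not model.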
  • 30. Application Areas and Significance • Improved efficiency due to parallelism • Overcoming memory limitations • Computation reusability • Continuous learning system • Scalability • One generic classifier • Heterogeneous and flexible classifier • Improved accuracy
  • 31. Results and Findings • The methodology adopted for compiling the results is as follows. • The first and most critical part was the selection of data sets. • Eight well-known data sets were used in the experiments, each of a different size, so that the technique could be tested against data sets of all sizes.
  • 32. Results and Findings • First, the data is divided into three sets: 1. Training set (66%) 2. Validation set (17%) 3. Test set (17%) • The training set was further divided into n small partitions. • Each partition was used to build a separate classifier. • All these classifiers were then combined into one final classifier.
  • 33. Results and Findings • The final classifier is then optimized with the GA. • At this stage the validation set is used for tuning and optimization. • Finally, the optimized classifier is evaluated on the test set, and the results are computed over it.
  • 34. Results and Findings • The percentage split of the training, validation, and test sets is as follows:
      Data Set         Details
      Zoo              Discrete, 66% training, 17% validation, 17% test
      Balance Scale    Discrete, 66% training, 17% validation, 17% test
      Tic-Tac-Toe      Discrete, 66% training, 17% validation, 17% test
      Car Evaluation   Discrete, 66% training, 17% validation, 17% test
      Mushroom         Discrete, 66% training, 17% validation, 17% test
  • 35. Results and Findings • Results before optimization are as follows:
      Data Set         Accuracy
      Zoo              88%
      Balance Scale    35%
      Tic-Tac-Toe      88%
      Car Evaluation   86%
      Mushroom         100%
  • 36. Results and Findings • Results after optimization are as follows:
      Data Set         Accuracy
      Zoo              95%
      Balance Scale    73%
      Tic-Tac-Toe      89%
      Car Evaluation   98.6%
      Mushroom         100%
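The 66% / 17% / 17% split can be sketched as below. The function name `split_dataset` is hypothetical, and shuffling with a fixed seed before splitting is an assumption; the slides do not say how records were assigned to the three sets.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    records: Sequence[T],
    train_frac: float = 0.66,
    validation_frac: float = 0.17,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle, then cut into training, validation, and test sets.

    The test set takes whatever remains after the first two cuts.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]
    return train, validation, test


train, validation, test = split_dataset(list(range(100)))
```

The training set returned here is what would then be divided into n partitions for the parallel rule generation step.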
  • 37. Future Work • Every proposed technique has room for improvement and enhancement. The proposed technique can be extended in the following directions: • Particle swarm optimization could be used to optimize the rule set. • This technique divides the dataset horizontally; dividing it vertically could also be considered and may improve accuracy. • Further work on parameter optimization can be done.
  • 38. Questions?

Editor's Notes

  1. Mitchell’s definition says nothing about the knowledge acquisition process of the computer programs it describes, so it is considered insufficient in our domain of research.
  2. Supervised Learning: covers learning algorithms that reason from externally provided instances to produce a general hypothesis and make predictions about future unknown instances.
  3. Neural nets and support vector machines are nowadays considered to be the state of the art.
  4. Each partition gets an equal number of instances and the same type of data, which eventually results in similar decision rules.