1. Parallel Rule Generation for an Efficient
Classification System
Talha Ghaffar
MS(CS), 9/23/2015, MS Thesis Defense
2. Scope of Presentation
• Introduction
• Background Study & Literature Review
• Proposed Technique
• Applications And Research Contribution
• Implementation
• Experimental Results
• Future Work
• Conclusion
3. Introduction
• Nowadays, many organizations utilize large databases for
analytical purposes
• With the growing size of training data, researchers are focusing
their research on developing or improving data mining techniques
to keep up with that growth
• Major challenges while handling complex and large data:
– Sifting through the data efficiently
– Extracting relevant and useful information accurately
– Analyzing the extracted information and guiding
organizations' decisions and actions reliably
4. Introduction ctd.
• Limited computational resources
• Applying sequential data mining techniques is inefficient, with
the inherent drawback of long response times
• Research suggests that when data mining techniques are
implemented on parallel machines, improved processing and
response times are achieved
• Classification: a core task in data mining, the field concerned
with extracting knowledge or patterns from databases by
building predictive or descriptive models (learning models)
5. [Diagram: a model is learned from the Training Set, then applied to the Test Set]

Training Set:
Att1  Att3    Att2   Class
No    Small   40K    No
No    Medium  20K    No
Yes   Large   120K   Yes
Yes   Small   70K    Yes
No    Medium  45K    Yes

Learn Model -> Classifier (e.g. IF <condition> THEN <action>)

Apply Model to Test Set:
Att1  Att3    Att2   Class
No    Small   25K    ?
Yes   Medium  20K    ?
Yes   Large   100K   ?
No    Small   30K    ?
Yes   Small   55K    ?
7. Background Study
• Machine Learning: a definition needs to incorporate two important
elements: a computer-based knowledge acquisition process, and a
statement of where skills or knowledge can be obtained.
– Mitchell describes machine learning as the study of computer
algorithms that improve automatically through experience.
– Alpaydin defines machine learning as “the capability of the computer
program to acquire or develop new knowledge or skills from existing or
non-existing examples for the sake of optimizing performance criterion”.
• In contrast to Mitchell’s definition, which lacks a knowledge
acquisition process, Alpaydin’s definition is preferable for this
research domain
8. Background Study ctd.
• Building individual classifiers on subsets of the data set, using
appropriate learning models, results in accurate rule sets on the
individual machines.
• Any accuracy drop-off due to parallelism can be reduced to
nearly negligible.
• Redundancy of records is possible and needs to be removed
when applying the subset approach.
• In the context of this work, using subsets for training is the
better and more efficient approach.
• Supervised Learning
9. Background Study ctd.
• Several methods for classification have been introduced over the
years - e.g. decision trees, artificial neural networks, nearest
neighbor classifiers, support vector machines, and so on
• Decision trees have decent accuracy and, moreover, are easier to
interpret, which is a crucial advantage when it comes to data
mining
• I also suspect that once an algorithm gains acceptance, it takes
time before scalable and parallelized versions of that algorithm
appear. For these reasons, decision trees are preferred
10. Background Study ctd.
• Common techniques used to overcome the problems of
large datasets and memory limitations are the
following:
1. Data sampling
2. Feature selection
3. Data pre-processing
4. Parallel processing
11. Background Study ctd.
• Parallel Approaches:
• Independent Partitioning
– Each processor is provided the complete data set
– All processors process the same data set as input, build and generate
rules from it, and the resulting rules are afterwards combined using
combining techniques
• Parallel Sequential Partitioning
– Every processor generates a particular subset of the concepts
• Replicated Sequential Partitioning
– Each processor processes one particular horizontal partition of the
data set and executes what is more or less the sequential algorithm,
as each processor can view only partial information
– A local set of concepts is produced, which is afterwards coordinated
to add up to the global set of concepts
12. Background Study ctd.
Combining Rule   Description
Maximum Rule     Selects the classifier with the maximum confidence value.
                 This rule can generate adverse results if the classifier
                 with the maximum confidence value is over-trained.
Sum Rule         Effective if every individual classifier is independent of
                 the others. When a large set of similar classifiers is
                 generated, it helps reduce the noise in large sets of
                 so-called weak classifiers.
Minimum Rule     Selects the outcome of the classifier that has the least
                 objection against a certain class.
Product Rule     Effective if every individual classifier is independent of
                 the others.
Median Rule      Similar to the sum rule but may yield more robust results.
13. Proposed Technique
• A three-step approach: it initially divides the very large dataset
into data chunks, processes them on N user-defined processors on
different machines, generates the final merged decision rule file,
and resolves any conflicts that arise later on
– Data Pre-Processing
– Parallel Rule Generation
– Rule Merging and Conflict Resolution
• Data Pre-Processing
– In this step, we divide the large dataset into N (N = user-specified
number) smaller datasets.
– A round-robin approach is used, which gives a random, symmetric
distribution of the data
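The round-robin pre-processing step can be sketched as follows; a minimal sketch assuming records arrive as a simple list (the function and variable names are illustrative, not from the thesis implementation):

```python
# Minimal sketch of the data pre-processing step: split a training set into
# N smaller datasets by round-robin assignment (record i goes to chunk i mod N).
def round_robin_split(records, n):
    chunks = [[] for _ in range(n)]
    for i, record in enumerate(records):
        chunks[i % n].append(record)
    return chunks

# Example: 10 records distributed symmetrically over 3 chunks.
print(round_robin_split(list(range(10)), 3))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Shuffling the records before splitting would give the random, unbiased distribution the slides call for.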
14. [Diagram: Data Pre-Processing uses Round Robin to split the Training Set
into small data chunks; Parallel Rule Generation runs a learning algorithm on
processors P1..PN, each producing IF-then-Action rules; Rule Merging &
Conflict Resolution combines them into the final rule set (IF { } then
{ Action; })]
15. Proposed Technique
• Parallel Rule Generation:
– In this step, each of the smaller datasets from the previous step is
given to a different processor so that the process of classification
can be performed in parallel on each processor.
– We can use any classification algorithm for generating rules, or we
can use multiple classifiers on different processors for rule
generation.
– These rules are in if-then-else form. The rules that each processor
generates are valid only for the data that was provided to that
particular processor.
– The rules generated on one processor may conflict with rules
generated on another processor, and more than one processor may
generate the same rules
16. Proposed Technique
• Parallel Rule Generation:
– Two additional steps need to be performed at this stage:
1. Calculate the support of each individual rule and store it with the
rule.
2. Calculate the confidence of each individual rule and store it with the
rule.
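The two per-rule statistics can be sketched as below. This is a hypothetical encoding in which a rule is an antecedent predicate plus a predicted class, and records are dicts with a "class" key; all names are illustrative assumptions, not the thesis implementation:

```python
# Sketch: support = fraction of the chunk matching the rule's antecedent;
# confidence = fraction of matching records whose class equals the rule's
# predicted class. Record/rule representation is an illustrative assumption.
def support_and_confidence(records, antecedent, predicted_class):
    matches = [r for r in records if antecedent(r)]
    support = len(matches) / len(records) if records else 0.0
    correct = sum(1 for r in matches if r["class"] == predicted_class)
    confidence = correct / len(matches) if matches else 0.0
    return support, confidence

records = [
    {"outlook": "sunny", "class": "yes"},
    {"outlook": "sunny", "class": "no"},
    {"outlook": "rain", "class": "yes"},
    {"outlook": "sunny", "class": "yes"},
]
s, c = support_and_confidence(records, lambda r: r["outlook"] == "sunny", "yes")
print(s, round(c, 2))  # 0.75 0.67
```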
17. Proposed Technique
• Rule Merging and Conflict Resolution:
• In this step, the rules generated by all the processors are
combined to get the final and complete rule set. While merging
the rules we encounter these problems:
– Redundancy of rules, i.e. the same rule occurring more than once.
– Conflicting rules, i.e. different decisions for the same rule conditions
18. Proposed Technique
• Use sufficiently large data sets on each processor; this reduces
the probability of conflicting rules and increases the probability
of similar rules
• Make the data distribution across processors random, so that the
distribution is unbiased and every processor gets almost similar
data and produces an almost similar rule set
• Take the union of all rule sets; this removes multiple
occurrences of the same rule and also includes all possible
unique rules
• If a conflict appears, select the rule with greater coverage and
confidence
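The union step above can be sketched as follows, assuming each rule is represented as a hashable (condition, action) pair (an illustrative assumption):

```python
# Sketch: merge per-processor rule sets by set union, so each unique rule
# appears only once in the final rule set.
def merge_rule_sets(rule_sets):
    merged = set()
    for rules in rule_sets:
        merged |= set(rules)
    return merged

# Two processors produced overlapping rule sets; the union keeps 3 unique rules.
p1 = [("outlook=sunny", "yes"), ("outlook=rain", "no")]
p2 = [("outlook=sunny", "yes"), ("humidity=high", "no")]
print(sorted(merge_rule_sets([p1, p2])))
```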
19. Proposed Technique
• If a conflict appears, select the rule with greater coverage and
confidence.
20. Proposed Technique
• Conflicting rules with the same support but different confidence
• The rule with greater confidence will be selected.
21. Proposed Technique
• Conflicting rules with the same confidence but different support
• The rule with greater support will be selected.
22. Proposed Technique
• Conflicting rules with different confidence and different support
• In that case we can use the formula:
X = α(confidence) + β(support)
• where α and β are two weights whose values lie between 0 and 1
and whose sum is always 1
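The scoring formula above can be sketched as follows; α = 0.7 is an illustrative choice, and the dict-based rule representation is an assumption:

```python
# Sketch: resolve a conflict between two rules that differ in both confidence
# and support by scoring each as x = alpha*confidence + beta*support, with
# alpha + beta = 1, and keeping the higher-scoring rule.
def resolve_conflict(rule_a, rule_b, alpha=0.7):
    beta = 1.0 - alpha

    def score(rule):
        return alpha * rule["confidence"] + beta * rule["support"]

    return rule_a if score(rule_a) >= score(rule_b) else rule_b

a = {"confidence": 0.9, "support": 0.1, "action": "yes"}
b = {"confidence": 0.5, "support": 0.8, "action": "no"}
print(resolve_conflict(a, b)["action"])  # yes (score 0.66 vs 0.59)
```

Shifting α toward 1 favors confidence over support; the best weighting would need to be tuned empirically.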
23. Proposed Technique
• Once conflict resolution is over, the next step is optimization of
the results.
• For that purpose a GA (genetic algorithm) is used.
24. Rule Set Optimization Through GA
• Genetic algorithms are a family of computational models based
on biological evolution.
• One complete solution is represented in the form of a simple
vector called a chromosome.
• A set of chromosomes is called a generation.
• The solution evolves from one generation to another on the basis
of a fitness function, selection criteria and reproduction
operators.
• The final rule set obtained after conflict resolution and combining
the individual rule sets is further optimized with the help of the
GA.
• After applying the GA, not only is the number of rules in the final
classifier reduced, but accuracy is also increased.
25. Rule Set Optimization Through GA
• In our case the problem is represented as follows.
• In the proposed solution, the rule set is encoded as a
chromosome using a string of bits, where each bit represents
one rule
• 1 represents the presence and 0 the absence of a rule in the
chromosome.
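The bit-string encoding can be sketched as follows; the confidence values are illustrative, and the fitness function is the sum of confidences of the rules present, as described on the next slide:

```python
import random

# Sketch: a chromosome is a bit string over the merged rule set (1 = rule
# present, 0 = absent); fitness is the sum of confidences of present rules.
rule_confidences = [0.9, 0.6, 0.8, 0.4, 0.7]  # illustrative, one per rule

def random_chromosome(n_rules):
    return [random.randint(0, 1) for _ in range(n_rules)]

def fitness(chromosome):
    return sum(c for bit, c in zip(chromosome, rule_confidences) if bit)

print(round(fitness([1, 0, 1, 0, 1]), 2))  # 2.4 (= 0.9 + 0.8 + 0.7)
```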
26. Rule Set Optimization Through GA
• The algorithm is initialized with a random generation.
• A fitness function is used to calculate the fitness of each
classifier so that the next generation can be selected.
• In our case the fitness function is simply the sum of the
confidences of the rules present in the chromosome.
• A chromosome with a higher fitness value is considered a
candidate for the next generation.
• The next generation is produced using two genetic operators:
crossover and mutation.
• One-point crossover is chosen in our case.
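The two operators named above can be sketched as follows for bit-string chromosomes; the function names and the bit-flip style of mutation are illustrative assumptions:

```python
import random

# Sketch: one-point crossover swaps the tails of two parent chromosomes at a
# random cut point; mutation flips each bit independently with a small rate.
def one_point_crossover(parent_a, parent_b):
    point = random.randint(1, len(parent_a) - 1)  # cut strictly inside the string
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(chromosome, rate=0.05):
    return [1 - bit if random.random() < rate else bit for bit in chromosome]

child1, child2 = one_point_crossover([1, 1, 1, 1], [0, 0, 0, 0])
print(child1, child2)  # e.g. [1, 1, 0, 0] [0, 0, 1, 1]
```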
27. Rule Set Optimization Through GA
• The algorithm stops on either of two conditions:
1. The maximum number of iterations is exhausted.
2. The algorithm has converged to an optimal point and no further
improvement is possible.
28. Rule Set Optimization Through GA
• The parameters used for the GA are the following:
Parameters for GA Values
Cross-Over Rate 95%
Mutation Rate 5%
Population Size 5000
Number of Generations 3000
29. Rule Set Optimization Through GA
• After applying the GA to the rule sets, the following reduction in
the number of rules is seen:
Datasets          # of Rules   Optimized # of Rules
TAE 90 72
Zoo 25 20
Balance Scale 280 190
Tic-Tac-Toe 131 122
Car Evaluation 221 209
Breast-Cancer 50 41
Mushroom 20 17
Nursery 62 48
30. Application Areas And Significance
• Improved efficiency due to parallelism
• Overcoming memory limitations
• Computation reusability
• Continuous learning system
• Scalability
• One generic classifier
• Heterogeneous and flexible classifier
• Improved accuracy
31. Results And Findings
• The methodology adopted for the compilation of results is as
follows.
• First and most critical was the selection of data sets.
• 8 different well-known state-of-the-art data sets, each of a
different size, were used in the experimentation process, so that
the technique could be tested against datasets of all sizes.
32. Results And Findings
• First of all, the data is divided into three sets:
1. Training set (66%)
2. Validation set (17%)
3. Test set (17%)
• The training set was further divided into n small partitions.
• Each partition was used to build a separate classifier.
• All these classifiers were then combined into one final
classifier.
33. Results And Findings
• The final classifier is then optimized with the GA.
• At this stage the validation set is used for tuning and
optimization purposes.
• Finally, the optimized classifier is evaluated: the test set is used
and the results are computed over it.
34. Results And Findings
• The percentage split of the training, validation and test data
sets is as follows:
Data Sets        Details
Zoo              Discrete, 66% training, 17% validation, 17% test set
Balance Scale    Discrete, 66% training, 17% validation, 17% test set
Tic-Tac-Toe      Discrete, 66% training, 17% validation, 17% test set
Car Evaluation   Discrete, 66% training, 17% validation, 17% test set
Mushroom         Discrete, 66% training, 17% validation, 17% test set
35. Results And Findings
• Results before optimization are the following:
Data Sets Accuracy
Zoo 88%
Balance Scale 35%
Tic-Tac-Toe 88%
Car Evaluation 86%
Mushroom 100%
36. Results And Findings
• Results after optimization are the following:
Data Sets Accuracy
Zoo 95%
Balance Scale 73%
Tic-Tac-Toe 89%
Car Evaluation 98.6%
Mushroom 100%
37. Future Work
• In every proposed technique there is always room for
improvement and enhancement. The proposed technique can be
extended further in the following directions:
• Particle swarm optimization algorithms could be chosen to
optimize the rule set.
• This technique divides the dataset horizontally; dividing the
dataset vertically could be considered, which may result in
improved accuracy.
• Further work on parameter optimization can be done.
Mitchell’s definition does not reflect anything related to a knowledge acquisition process for the stated computer programs; it is therefore considered insufficient in our domain of research.
Supervised Learning: covers learning algorithms that reason from externally provided instances as input to produce general hypotheses and make predictions about future unknown instances.
Neural nets and support vector machines are nowadays considered to be the state of the art.
Each partition will get an equal number of instances and the same type of data, which will eventually result in similar decision rules.