Visualizing Probabilistic Classification Data in WEKA
Supervised by
Dr. Bilal Alsallakh
Designed, Implemented and Tested by
Mohammed Saed Haj Ali
Kinda Altarbouch
F.I.T.E of Damascus, Syria – AI Department 2015
Probabilistic Multi-class Classification
[Diagram: a Sample goes into a Probabilistic Classifier, which outputs a Probability for each of the Classes]
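The idea of a classifier that outputs one probability per class can be sketched as follows. This is a minimal illustration, not the method of any classifier named in these slides: the softmax step, the raw scores, and the function names are assumptions made for the example.

```python
import math

def softmax(scores):
    """Turn raw per-class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(scores, classes):
    """Return (predicted class, {class: probability}) for one sample."""
    probs = softmax(scores)
    best = max(range(len(classes)), key=lambda i: probs[i])
    return classes[best], dict(zip(classes, probs))

# Hypothetical raw scores for one sample over three digit classes.
label, distribution = predict([2.0, 0.5, 0.1], ["4", "5", "8"])
```

The predicted class is only the top entry of `distribution`; the remaining probabilities are the data the rest of this deck sets out to visualize.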
Probabilistic Classification Data
[UCI Machine Learning Repository]
Samples
Input Output
Neural Nets
HMMs
Naive Bayesian
Data Mining and Machine Learning Platforms
Analyzing Probabilistic Classification Data
Interactive Exploration Environment
Classification Results
Confusion Matrix
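For reference, a confusion matrix counts how often each actual class was predicted as each class. A minimal sketch, assuming the labels are available as two parallel lists (the sample labels below are made up):

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

# Made-up labels for six samples.
actual    = ["4", "4", "5", "8", "8", "5"]
predicted = ["4", "8", "5", "8", "8", "4"]
matrix = confusion_matrix(actual, predicted, ["4", "5", "8"])
```

The matrix records only the final decision for each sample, not the probabilities behind it.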
Classification Results
Probabilistic
Classifier
Analyze Probability Distributions
[Figure: three histogram panels, each with an axis labeled "Probability in ..." ranging from 0 to 100%]
• High vs. low discrimination (e.g. “4” vs. “8”)
• High probability vs. correct classification (e.g. inconsistent for “5”)
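One way to obtain the counts behind such a per-class probability histogram is sketched below. The binning scheme, the sample layout, and the split into correctly and incorrectly classified samples are assumptions for illustration; the tool may compute its histograms differently.

```python
def probability_histogram(samples, target_class, bins=5):
    """Bin the probability given to target_class, for samples predicted as it.

    samples: list of (actual, predicted, {class: probability}).
    Returns (correct_counts, incorrect_counts), one count per bin.
    """
    correct = [0] * bins
    incorrect = [0] * bins
    for actual, predicted, probs in samples:
        if predicted != target_class:
            continue
        p = probs[target_class]
        index = min(int(p * bins), bins - 1)  # p == 1.0 falls in the last bin
        if actual == predicted:
            correct[index] += 1
        else:
            incorrect[index] += 1
    return correct, incorrect

# Made-up samples: (actual class, predicted class, probabilities).
samples = [
    ("5", "5", {"5": 0.95, "8": 0.05}),
    ("5", "5", {"5": 0.55, "8": 0.45}),
    ("8", "5", {"5": 0.90, "8": 0.10}),
    ("8", "8", {"5": 0.20, "8": 0.80}),
]
correct, incorrect = probability_histogram(samples, "5")
```

In this toy data a misclassified sample lands in the highest-probability bin for “5”, which is the kind of inconsistency between high probability and correct classification noted above.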
Interactive Selection
Can a data feature separate correctly and incorrectly classified samples?
Sort features by separability:
• Median distance
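Sorting features by median distance could look like the sketch below. It assumes the measure is the absolute difference between the feature's median over correctly classified samples and its median over incorrectly classified ones; the slides do not spell out the formula or any normalization. The feature values are made up.

```python
from statistics import median

def median_distance(values, is_correct):
    """Absolute gap between the medians of the correct and incorrect groups."""
    right = [v for v, ok in zip(values, is_correct) if ok]
    wrong = [v for v, ok in zip(values, is_correct) if not ok]
    if not right or not wrong:
        return 0.0  # one group is empty, so there is nothing to separate
    return abs(median(right) - median(wrong))

def rank_features(features, is_correct):
    """Return (feature name, score) pairs, most separating feature first."""
    scores = {name: median_distance(vals, is_correct)
              for name, vals in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up values for two features over five samples.
features = {
    "X8": [1.0, 1.2, 0.9, 5.0, 5.5],
    "Y3": [2.0, 2.1, 1.9, 2.0, 2.2],
}
is_correct = [True, True, True, False, False]
ranking = rank_features(features, is_correct)
```

Features measured on different scales would need to be normalized before their scores are comparable; that step is left out here.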
Group Separability
Improvement > 8%
Usage Scenarios
Visual Inspection
Correct Actual Class from C5 to C0
Analyzing Classifier Behavior
Defining Post-Classification Rules
Rectify certain false negatives
Rectify certain false positives
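A post-classification rule can be thought of as a condition on a sample plus a replacement class. The sketch below is hypothetical: the rule, its threshold on X8, and the sample layout are invented for illustration and are not taken from the tool.

```python
def apply_rules(samples, rules):
    """Return the final label per sample after applying the rules.

    samples: list of dicts with 'features' and 'predicted'.
    rules: list of (condition, new_class); the first matching rule wins.
    """
    corrected = []
    for sample in samples:
        label = sample["predicted"]
        for condition, new_class in rules:
            if condition(sample):
                label = new_class
                break
        corrected.append(label)
    return corrected

# Hypothetical rule: samples predicted as C5 with a low X8 value become C0.
rules = [
    (lambda s: s["predicted"] == "C5" and s["features"]["X8"] < 2.0, "C0"),
]
samples = [
    {"features": {"X8": 1.0}, "predicted": "C5"},
    {"features": {"X8": 6.0}, "predicted": "C5"},
    {"features": {"X8": 1.0}, "predicted": "C3"},
]
corrected = apply_rules(samples, rules)
```

If the relabeled sample truly belongs to C0, the same rule removes a false positive for C5 and a false negative for C0.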
Conclusion
Classification probabilities
rich in information
explains classifier behavior
Interactive exploration
Reveals several insights
Guides new improvements
More work is needed!
Improvement analysis, comparison, ...
Future Work
Separability by 2-features
X8 and Y3
Defining Post-Classification Rules
Rectify certain false negatives
Rectify certain false positives
Integration with Further Data Mining Frameworks
The implementation is modular and well-separated from WEKA
Integration with further Java-based frameworks (KNIME, RapidMiner) is straightforward
Conclusion
A Visual Inspection Tool for Classification Data
▪ Integrated with WEKA
▪ Works with any classifier
Helps in
▪ Understanding classifier behavior
▪ Finding bugs in data and algorithms
▪ Improving classification performance
Summary
Backup Slides
Task C: Group Separability
Can a data feature separate correctly and incorrectly classified samples?
Sort features by separability:
• p-Value
• F-measure
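The slides do not say which statistical test produces the p-value. As one possible stand-in, the sketch below runs an exact two-sided permutation test on the difference of group means; it enumerates every split, so it is practical only for small groups. The F-measure variant is not shown.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_gap(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = mean_gap(sum(group_a))
    extreme = 0
    splits = 0
    # Try every way of assigning n_a of the pooled values to the first group.
    for indices in combinations(range(len(pooled)), n_a):
        splits += 1
        gap = mean_gap(sum(pooled[i] for i in indices))
        if gap >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    return extreme / splits

# Made-up feature values for correctly and incorrectly classified samples.
p_separated = permutation_p_value([1, 2, 3, 4], [10, 11, 12, 13])
p_identical = permutation_p_value([1, 2, 3], [1, 2, 3])
```

A small p-value suggests the feature takes different values for correctly and incorrectly classified samples, so features can be sorted by ascending p-value.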
Defining Post-Classification Rules
Use a different classifier for selected samples
Neural Nets
Majority vote
Naïve Bayesian
Samples Filter
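Using different classifiers for the samples picked by a filter, combined by majority vote, might be sketched as below. The stand-in classifiers, the confidence-based filter, and the field names are all hypothetical; real classifiers would replace the placeholder functions.

```python
from collections import Counter

def majority_vote(predictions):
    """Most frequent label; ties go to the label that was seen first."""
    return Counter(predictions).most_common(1)[0][0]

def reclassify(samples, sample_filter, classifiers):
    """Re-predict only the samples selected by the filter."""
    result = []
    for sample in samples:
        if sample_filter(sample):
            votes = [classify(sample) for classify in classifiers]
            result.append(majority_vote(votes))
        else:
            result.append(sample["predicted"])
    return result

# Placeholder classifiers standing in for trained models.
def neural_net(sample):
    return "8"

def naive_bayes(sample):
    return "4" if sample["x"] < 0.5 else "8"

def third_model(sample):
    return "4"

def low_confidence(sample):
    # Hypothetical filter: select samples whose top probability is low.
    return sample["confidence"] < 0.6

samples = [
    {"x": 0.2, "predicted": "8", "confidence": 0.55},
    {"x": 0.9, "predicted": "8", "confidence": 0.97},
]
final = reclassify(samples, low_confidence, [neural_net, naive_bayes, third_model])
```

Only the first sample passes the filter, so only its label is replaced by the vote.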
Probability Sub-Histograms
Task A: Analyze Probability Distributions
Separability by 2-features
• E.g. X8 and Y3
• Open question
Multi-class Classification
Sample Classes
Classifier
72%
0 1 2 3 4 5 6 7 8 9
Probabilistic Classifiers
Neural Nets Hidden Markov Models Naive Bayesian
Feature Analysis
Defining post-classification rules
