WEKA: Practical Machine Learning Tools and Techniques
Decision Trees: Dealing with numeric attributes
Standard method: binary splits.
Steps to decide where to split:
- Evaluate the information gain for every possible split point of the attribute.
- Choose the "best" split point.
But this is computationally intensive.
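As a concrete illustration of this search, here is a minimal sketch in plain Python (my own illustration, not WEKA code) that evaluates the information gain of every candidate split point of a numeric attribute and keeps the best one:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """values: numeric attribute values; labels: class labels in the same order."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    n = len(pairs)
    best_point, best_gain = None, -1.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                                  # no split between equal values
        split = (pairs[i - 1][0] + pairs[i][0]) / 2   # midpoint between adjacent values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        gain = base - (len(left) / n * entropy(left) + len(right) / n * entropy(right))
        if gain > best_gain:
            best_point, best_gain = split, gain
    return best_point, best_gain

# The temperature data from the example below:
temps  = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
labels = ["Yes", "No", "Yes", "Yes", "Yes", "No", "No",
          "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No"]
print(best_split(temps, labels))
```

Every midpoint between two adjacent (distinct) sorted values is a candidate split, which is why the search is expensive for attributes with many distinct values.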
Decision Trees: Example
Split on the temperature attribute:

Temperature: 64  65  68  69  70  71  72  72  75  75  80  81  83  85
Class:       Yes No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No

temperature < 71.5: 4 yes, 2 no
temperature > 71.5: 5 yes, 3 no

info([4,2],[5,3]) = 6/14 info([4,2]) + 8/14 info([5,3]) = 0.939 bits
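Working out the weighted entropy behind the 0.939-bit figure:

\[
\begin{aligned}
\mathrm{info}([4,2]) &= -\tfrac{4}{6}\log_2\tfrac{4}{6} - \tfrac{2}{6}\log_2\tfrac{2}{6} \approx 0.918 \text{ bits}\\
\mathrm{info}([5,3]) &= -\tfrac{5}{8}\log_2\tfrac{5}{8} - \tfrac{3}{8}\log_2\tfrac{3}{8} \approx 0.954 \text{ bits}\\
\mathrm{info}([4,2],[5,3]) &= \tfrac{6}{14}(0.918) + \tfrac{8}{14}(0.954) \approx 0.939 \text{ bits}
\end{aligned}
\]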
Decision Trees: Dealing with missing values
Split instances with missing values into pieces:
- A piece going down a branch receives a weight proportional to the popularity of the branch.
- The weights sum to 1.
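For instance (a minimal sketch with hypothetical branch counts), the fractional weights are just the branch populations normalised so that they sum to 1:

```python
# Hypothetical counts of training instances that went down each branch.
branch_counts = {"sunny": 5, "overcast": 4, "rainy": 5}

total = sum(branch_counts.values())
# The piece sent down each branch carries a weight proportional to that
# branch's popularity; the weights sum to 1.
weights = {branch: count / total for branch, count in branch_counts.items()}
print(weights)   # {'sunny': 0.357..., 'overcast': 0.285..., 'rainy': 0.357...}
```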
Decision Trees: Pruning
Pruning makes the decision tree less complex by removing parts that overfit the training data.
There are two types of pruning:
- Prepruning: deciding during tree building.
- Postpruning: pruning after the tree has been constructed.
The two postpruning operations generally used are:
- Subtree replacement
- Subtree raising
To decide whether to apply postpruning, we compare the error rate before and after the pruning.
Decision Trees: Subtree raising
Decision Trees: Subtree replacement
Classification rules: Criteria for choosing tests
- p/t ratio: maximizes the proportion of positive instances, with the stress on accuracy.
- p [log(p/t) - log(P/T)]: maximizes the number of positive instances covered, with less stress on accuracy.
Here p and t are the positive and total counts covered after adding the test, and P and T the corresponding counts before it.
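A minimal sketch (my own illustration, using the before-the-test counts P and T described above) of the two criteria:

```python
from math import log2

def accuracy_criterion(p, t):
    """p/t: fraction of the instances covered by the rule-plus-test that are positive."""
    return p / t

def info_gain_criterion(p, t, P, T):
    """p * [log(p/t) - log(P/T)]: rewards tests that keep many positive instances
    while improving the proportion of positives relative to the state before the test."""
    return p * (log2(p / t) - log2(P / T))
```

The first criterion favours very accurate rules even when they cover few instances; the second favours tests that retain a larger number of positive instances at somewhat lower accuracy.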
Classification rules: Generating good rules
We can reduce overfitting by pruning either during construction or after the rules have been fully constructed.
To prune during construction, we check each newly added test: if the error rate on the pruning set increases because of the new test, we remove it.
Classification rules: Obtaining rules from partial decision trees (algorithm)
Classification rules
Classification rules
As node 4 was not replaced, we stop at this stage.
Each leaf node now gives us a possible rule: choose the leaf that covers the greatest number of instances.
Extending linear models: Support vector machines
- Support vector machines are algorithms for learning linear classifiers.
- They use the maximum-margin hyperplane, which helps avoid overfitting.
- The instances closest to the maximum-margin hyperplane are the support vectors; all other instances can be ignored.
Extending linear models
Extending linear models: Support vector machines
- Support vectors: all instances for which alpha(i) > 0.
- b and the alpha(i) are determined using software packages.
- The hyperplane can be written in terms of the support vectors, and also in terms of a kernel function (see below).
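In the standard formulation, the maximum-margin hyperplane is expressed in terms of the support vectors as

\[
x = b + \sum_{i \in \mathrm{SV}} \alpha_i \, y_i \, \mathbf{a}(i) \cdot \mathbf{a},
\]

and, with a kernel function K replacing the dot product,

\[
x = b + \sum_{i \in \mathrm{SV}} \alpha_i \, y_i \, K\bigl(\mathbf{a}(i), \mathbf{a}\bigr),
\]

where the a(i) are the support vectors, a is the test instance, the y_i are the class values (+1 or -1), and b and the alpha_i are the learned parameters.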
Extending linear models: Multilayer perceptron
- We can create a network of perceptrons to approximate arbitrary target concepts.
- The multilayer perceptron is an example of an artificial neural network.
- It consists of an input layer, one or more hidden layers, and an output layer.
- The structure of an MLP is usually found by experimentation.
- The parameters (weights) can be found using backpropagation.
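A minimal sketch (my own illustration with arbitrary layer sizes, not WEKA's MultilayerPerceptron) of the input -> hidden -> output structure as a single forward pass:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 4))    # 3 inputs -> 4 hidden units
W_output = rng.normal(size=(4, 1))    # 4 hidden units -> 1 output

x = np.array([0.2, -1.0, 0.5])        # one input instance
hidden = sigmoid(x @ W_hidden)        # hidden-layer activations
output = sigmoid(hidden @ W_output)   # network prediction
```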
Extending linear models: Examples
Extending linear models: Backpropagation
- Activation: f(x) = 1/(1 + exp(-x))
- Error: E = 1/2 (y - f(x))^2
- We minimize the error by computing dE/dw for each weight (see below), then for every training instance apply the update
  w(i) = w(i) - L (dE/dw(i)),
  where L is the learning rate.
- The weights w start from assumed initial values.
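For the sigmoid f and squared error above, the chain rule gives dE/dw(i) = -(y - f(x)) f(x) (1 - f(x)) x(i), which is presumably the expression the slide derives. A minimal sketch of the resulting per-instance update for a single sigmoid unit (plain Python, my own illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(instances, targets, weights, learning_rate=0.1, epochs=100):
    """instances: list of feature vectors; targets: list of 0/1 labels.
    learning_rate corresponds to L in the update rule above."""
    for _ in range(epochs):
        for x, y in zip(instances, targets):
            out = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            # dE/dw_i = -(y - f(x)) * f(x) * (1 - f(x)) * x_i   for E = 1/2 (y - f(x))^2
            grads = [-(y - out) * out * (1 - out) * xi for xi in x]
            weights = [w - learning_rate * g for w, g in zip(weights, grads)]
    return weights
```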
Clustering: Incremental clustering
Steps:
- The tree starts as an empty root node.
- Add instances one by one, updating the tree appropriately at each stage.
- To update, find the right leaf for an instance; this may involve restructuring the tree.
- Restructuring operations: merging and replacement.
- Decisions are made using category utility.
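Category utility, in its standard definition, measures how much better attribute values can be predicted given the clustering than without it:

\[
CU(C_1, \dots, C_k) = \frac{1}{k} \sum_{l=1}^{k} P(C_l) \sum_{i} \sum_{j} \Bigl[ P(a_i = v_{ij} \mid C_l)^2 - P(a_i = v_{ij})^2 \Bigr],
\]

where the C_l are the clusters, the a_i are the attributes, and the v_ij are the possible values of attribute a_i.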
Clustering: Example of incremental clustering
EM Algorithm
EM = Expectation-Maximization: generalizes k-means to a probabilistic setting.
Iterative procedure:
- E ("expectation") step: calculate the cluster probability for each instance.
- M ("maximization") step: estimate the distribution parameters from the cluster probabilities.
Cluster probabilities are stored as instance weights.
Stop when the improvement is negligible.
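A minimal sketch of this loop (my own illustration, not WEKA's EM class) for a two-component one-dimensional Gaussian mixture; for brevity it runs a fixed number of iterations instead of checking that the log-likelihood improvement has become negligible:

```python
import numpy as np

def em_two_gaussians(data, iterations=50):
    data = np.asarray(data, dtype=float)
    means = np.array([data.min(), data.max()])    # initial guesses for the means
    stds = np.array([data.std(), data.std()])     # initial standard deviations
    mix = np.array([0.5, 0.5])                    # mixing weights
    for _ in range(iterations):
        # E step: cluster probability of each instance under each component.
        dens = np.stack([
            mix[k] * np.exp(-(data - means[k]) ** 2 / (2 * stds[k] ** 2))
            / (stds[k] * np.sqrt(2 * np.pi))
            for k in range(2)
        ])
        resp = dens / dens.sum(axis=0)            # responsibilities, used as instance weights
        # M step: re-estimate the parameters from the weighted instances.
        for k in range(2):
            w = resp[k]
            means[k] = (w * data).sum() / w.sum()
            stds[k] = np.sqrt((w * (data - means[k]) ** 2).sum() / w.sum())
            mix[k] = w.mean()
    return means, stds, mix
```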
Visit more self-help tutorials
Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free and self-guiding and does not involve any additional support. Visit us at www.dataminingtools.net
