Practical Machine Learning Tools and Techniques
Decision Trees: Dealing with numeric attributes. Standard method: binary splits. To decide where to split, evaluate the information gain for every possible split point of the attribute, then choose the “best” split point. This is computationally intensive.
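The split-point search described above can be sketched as follows. This is a minimal illustration, not the WEKA implementation: the function names and the toy data are assumptions, and candidate split points are taken as midpoints between adjacent distinct values. Re-partitioning the data for every candidate is what makes the naive search expensive.

```python
from math import log2

def info(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def class_counts(labels):
    """Count how often each class label occurs."""
    return [labels.count(c) for c in sorted(set(labels))]

def best_binary_split(values, labels):
    """Try every midpoint between adjacent distinct values; return (split, gain)."""
    pairs = sorted(zip(values, labels))
    base = info(class_counts(labels))
    best = (None, -1.0)
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no split point between equal values
        split = (lo + hi) / 2
        left = [lab for v, lab in pairs if v < split]
        right = [lab for v, lab in pairs if v >= split]
        # Weighted average information of the two branches.
        remainder = (len(left) * info(class_counts(left))
                     + len(right) * info(class_counts(right))) / len(pairs)
        gain = base - remainder
        if gain > best[1]:
            best = (split, gain)
    return best

# Hypothetical toy data: the class changes cleanly between 3 and 10.
split, gain = best_binary_split([1, 2, 3, 10, 11, 12],
                                ["a", "a", "a", "b", "b", "b"])
print(split, gain)
```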
Decision Trees: Example. Split on the temperature attribute:
temperature:  64   65   68   69   70   71   72   72   75   75   80   81   83   85
class:        Yes  No   Yes  Yes  Yes  No   No   Yes  Yes  Yes  No   Yes  Yes  No
temperature < 71.5: yes/4, no/2
temperature > 71.5: yes/5, no/3
info([4,2],[5,3]) = 6/14 info([4,2]) + 8/14 info([5,3]) = 0.939 bits
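The arithmetic in this example can be checked with a short script. The helper names info and split_info are chosen here for illustration; the data is the temperature column from the slide.

```python
from math import log2

def info(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def split_info(*branches):
    """Average information of the branches, weighted by branch size."""
    n = sum(sum(b) for b in branches)
    return sum(sum(b) / n * info(b) for b in branches)

temperature = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
label = ["yes", "no", "yes", "yes", "yes", "no", "no",
         "yes", "yes", "yes", "no", "yes", "yes", "no"]

below = [l for t, l in zip(temperature, label) if t < 71.5]
above = [l for t, l in zip(temperature, label) if t > 71.5]
counts_below = [below.count("yes"), below.count("no")]  # [4, 2]
counts_above = [above.count("yes"), above.count("no")]  # [5, 3]

result = split_info(counts_below, counts_above)
print(round(result, 3))  # 0.939
```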
Decision Trees: Dealing with missing values. Split instances with missing values into pieces. A piece going down a branch receives a weight proportional to the popularity of the branch. The weights sum to 1.
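A minimal sketch of the weighting scheme, assuming "popularity" means the number of training instances that went down each branch. The function name and the branch counts are hypothetical.

```python
def split_missing(weight, branch_counts):
    """Divide an instance's weight across branches in proportion to branch popularity."""
    total = sum(branch_counts.values())
    return {branch: weight * n / total for branch, n in branch_counts.items()}

# Hypothetical node: 5 training instances went down "sunny", 4 "overcast", 5 "rainy".
pieces = split_missing(1.0, {"sunny": 5, "overcast": 4, "rainy": 5})
print(pieces)
```

A piece that reaches another test on a missing attribute further down can be split again by calling the same function with the piece's weight.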
Decision Trees: Pruning. Pruning makes the decision tree less complex by removing parts that overfit the training data. There are two types of pruning. Prepruning: deciding during tree building when to stop growing a branch. Postpruning: pruning after the tree has been constructed. The two postpruning operations generally used are subtree replacement and subtree raising. To decide whether to postprune, we compare the error rate before and after pruning.
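The before/after comparison can be sketched for subtree replacement. This sketch assumes errors are measured on a held-out pruning set, which is one possible choice; the slides do not say how the error rate is estimated. The tree representation (nested dicts with "attr" and "branches", leaves as plain class labels) and all names are hypothetical.

```python
from collections import Counter

def classify(tree, instance):
    """Follow branches until a leaf (a plain class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][instance[tree["attr"]]]
    return tree

def errors(tree, data):
    """Number of (instance, label) pairs the tree misclassifies."""
    return sum(classify(tree, x) != y for x, y in data)

def replace_subtrees(tree, data):
    """Bottom-up subtree replacement: swap a subtree for a majority-class leaf
    whenever that does not increase the error on the data reaching it.
    Note: modifies the branches of `tree` in place."""
    if not isinstance(tree, dict) or not data:
        return tree
    for value, sub in list(tree["branches"].items()):
        subset = [(x, y) for x, y in data if x[tree["attr"]] == value]
        tree["branches"][value] = replace_subtrees(sub, subset)
    leaf = Counter(y for _, y in data).most_common(1)[0][0]
    return leaf if errors(leaf, data) <= errors(tree, data) else tree

# Hypothetical tree with an over-specific test on "windy" under outlook = sunny.
tree = {"attr": "outlook", "branches": {
    "sunny": {"attr": "windy", "branches": {True: "yes", False: "no"}},
    "rainy": "yes"}}
pruning_set = [
    ({"outlook": "sunny", "windy": True}, "no"),
    ({"outlook": "sunny", "windy": False}, "no"),
    ({"outlook": "rainy", "windy": False}, "yes"),
    ({"outlook": "rainy", "windy": True}, "yes"),
]
pruned = replace_subtrees(tree, pruning_set)
print(pruned)
```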
Decision Trees Subtree raising:
Decision Trees Subtree replacement
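Classification rules: Criteria for choosing tests. p/t ratio: maximizes the proportion of positive instances covered, with stress on accuracy. p[log(p/t) - log(P/T)]: favors tests that cover a larger number of positive instances, at some cost in accuracy. Here p and t are the positive and total instances covered by the rule after adding the test, and P and T are the corresponding counts before adding it.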
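The difference between the two criteria shows up on a small example. The counts below are made up for illustration, and the function names are not from the slides; both criteria assume p > 0.

```python
from math import log2

def accuracy_criterion(p, t):
    """p/t: fraction of covered instances that are positive."""
    return p / t

def info_gain_criterion(p, t, P, T):
    """p[log(p/t) - log(P/T)]: information gained, scaled by positives covered."""
    return p * (log2(p / t) - log2(P / T))

# Hypothetical rule that currently covers T = 20 instances, P = 10 of them positive.
P, T = 10, 20
test_a = (2, 2)    # candidate test A: covers 2 instances, both positive
test_b = (10, 12)  # candidate test B: covers 12 instances, 10 positive

acc_a, acc_b = accuracy_criterion(*test_a), accuracy_criterion(*test_b)
gain_a = info_gain_criterion(*test_a, P, T)
gain_b = info_gain_criterion(*test_b, P, T)
print(acc_a, acc_b, gain_a, gain_b)
```

On these counts the p/t ratio prefers test A (perfectly accurate but tiny coverage), while the information-based criterion prefers test B.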
Classification rules: Generating good rules. We can reduce overfitting by pruning, either during construction or after the rules have been fully constructed. To prune during construction, we check each newly added test: if the error rate on the pruning set increases because of the new test, we remove it.
Classification rules Obtaining rules from partial decision trees: Algorithm
Classification rules
Classification rules: As node 4 was not replaced, we stop at this stage. Each leaf node now gives us a possible rule. Choose the leaf that covers the greatest number of instances.
Extending linear models: Support vector machines. Support vector machines are algorithms for learning linear classifiers. They use the maximum margin hyperplane, which helps avoid overfitting. The instances closest to the maximum margin hyperplane are the support vectors; all other instances can be ignored.
Extending linear models
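Extending linear models: Support vector machines. The hyperplane can be written as: x = b + sum over support vectors i of alpha(i) y(i) a(i)·a, where y(i) is the class value (+1 or -1) of training instance a(i) and a is the instance being classified. Support vectors: all instances for which alpha(i) > 0. b and alpha are determined using software packages. The hyperplane can also be written using a kernel K as: x = b + sum over support vectors i of alpha(i) y(i) K(a(i), a).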
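A minimal sketch of evaluating the hyperplane expression in its kernel form, b plus the sum over support vectors of alpha(i) * y(i) * K(a(i), a). The two support vectors, their alpha values, and b are hand-picked for a trivially separable toy problem rather than produced by a solver, and all function names are assumptions.

```python
def linear_kernel(u, v):
    """Plain dot product; gives the ordinary linear hyperplane."""
    return sum(ui * vi for ui, vi in zip(u, v))

def poly_kernel(u, v, degree=2):
    """Polynomial kernel (u . v)^degree, usable in place of linear_kernel."""
    return linear_kernel(u, v) ** degree

def decision(a, support_vectors, labels, alphas, b, kernel=linear_kernel):
    """Value of the hyperplane expression for instance a; its sign is the class."""
    return b + sum(alpha * y * kernel(sv, a)
                   for sv, y, alpha in zip(support_vectors, labels, alphas))

# Toy problem (assumed values): one support vector per class on the first axis.
support_vectors = [(1.0, 0.0), (-1.0, 0.0)]
labels = [+1, -1]
alphas = [0.5, 0.5]
b = 0.0

print(decision((3.0, 5.0), support_vectors, labels, alphas, b))   # 3.0
print(decision((-2.0, 1.0), support_vectors, labels, alphas, b))  # -2.0
```

Only the support vectors appear in the sum, which is why the remaining training instances can be discarded once alpha and b are known.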
Extending linear models: Multilayer perceptron. We can create a network of perceptrons to approximate arbitrary target concepts. A multilayer perceptron is an example of an artificial neural network. It consists of an input layer, hidden layer(s), and an output layer. The structure of an MLP is usually found by experimentation. The parameters can be found using backpropagation.
Extending linear models Examples:
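Extending linear models: Backpropagation. f(x) = 1/(1+exp(-x)), where x is the weighted sum of the inputs a(i). Error E = ½(y-f(x))^2. We try to minimize the error and get: dE/dw(i) = (f(x) - y) f'(x) a(i), with f'(x) = f(x)(1 - f(x)). Now calculate the above expression for all training instances and update: w(i) = w(i) - L(dE/dw(i)), where L is the learning rate. The weights w are given assumed starting values.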
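The update rule can be sketched for a single sigmoid unit, which is the simplest case and has no hidden layer. The gradient used is dE/dw(i) = (f(x) - y) f(x)(1 - f(x)) a(i), summed over all training instances. The training data (logical OR with a constant bias input), the zero starting weights, and the learning rate are assumptions chosen for illustration.

```python
from math import exp

def f(x):
    """Sigmoid activation."""
    return 1.0 / (1.0 + exp(-x))

def output(w, a):
    """Unit output for input vector a under weights w."""
    return f(sum(wi * ai for wi, ai in zip(w, a)))

def total_error(w, data):
    """Sum of 1/2 (y - f(x))^2 over all training instances."""
    return sum(0.5 * (y - output(w, a)) ** 2 for a, y in data)

def gradient_step(w, data, L):
    """One batch update w(i) = w(i) - L * dE/dw(i)."""
    grad = [0.0] * len(w)
    for a, y in data:
        out = output(w, a)
        delta = (out - y) * out * (1.0 - out)  # (f(x) - y) * f'(x)
        for i, ai in enumerate(a):
            grad[i] += delta * ai
    return [wi - L * gi for wi, gi in zip(w, grad)]

# Logical OR; the first input is a constant 1 acting as the bias.
data = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]
w = [0.0, 0.0, 0.0]  # assumed starting weights
before = total_error(w, data)
for _ in range(1000):
    w = gradient_step(w, data, L=0.5)
after = total_error(w, data)
print(before, after)
```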
Clustering: Incremental clustering. Steps: the tree starts as an empty root node. Add instances one by one. Update the tree appropriately at each stage. To update, find the right leaf for the instance; this may involve restructuring the tree. Restructuring: merging and replacement. Decisions are made using category utility.
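The slide names category utility without defining it. A sketch under the usual definition for nominal attributes: for k clusters, CU is (1/k) times the sum over clusters C of Pr[C] times the sum over attributes i and values j of (Pr[a_i = v_ij | C]^2 - Pr[a_i = v_ij]^2). The function name and the four toy instances are hypothetical.

```python
from collections import Counter

def category_utility(clusters):
    """clusters: list of clusters, each a list of instances (tuples of nominal values)."""
    everything = [inst for cluster in clusters for inst in cluster]
    n, n_attrs = len(everything), len(everything[0])

    def sum_sq_probs(instances):
        # Sum over attributes i and values j of Pr[a_i = v_ij]^2 within `instances`.
        total = 0.0
        for i in range(n_attrs):
            counts = Counter(inst[i] for inst in instances)
            total += sum((c / len(instances)) ** 2 for c in counts.values())
        return total

    baseline = sum_sq_probs(everything)
    return sum(len(c) / n * (sum_sq_probs(c) - baseline)
               for c in clusters) / len(clusters)

sunny_hot, rainy_cool = ("sunny", "hot"), ("rainy", "cool")
good = category_utility([[sunny_hot, sunny_hot], [rainy_cool, rainy_cool]])
mixed = category_utility([[sunny_hot, rainy_cool], [sunny_hot, rainy_cool]])
print(good, mixed)  # 0.5 0.0
```

A clustering that groups like with like scores higher, so when adding an instance the tree update compares the category utility of the candidate placements and restructurings and keeps the best one.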
Clustering Example of incremental clustering:
EM Algorithm: EM = Expectation-Maximization. Generalizes k-means to a probabilistic setting. Iterative procedure: E “expectation” step: calculate the cluster probability for each instance. M “maximization” step: estimate the distribution parameters from the cluster probabilities. Store the cluster probabilities as instance weights. Stop when the improvement is negligible.
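The iterative procedure can be sketched for the simplest case: two clusters over a single numeric attribute, each modeled as a normal distribution. The starting means, the starting standard deviation of 1, the equal starting priors, the log-likelihood stopping test, and the toy data are all assumptions made for this illustration.

```python
from math import exp, log, pi, sqrt

def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return exp(-((x - mean) ** 2) / (2 * std ** 2)) / (sqrt(2 * pi) * std)

def weighted_std(data, weights, mean, total):
    """Weighted standard deviation, floored to avoid a zero spread."""
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, data)) / total
    return max(sqrt(var), 1e-3)

def em_two_clusters(data, mean_a, mean_b, tol=1e-6, max_iter=100):
    """EM for a mixture of two 1-D normal clusters A and B."""
    std_a = std_b = 1.0  # assumed starting spread
    prior_a = 0.5        # assumed starting cluster prior
    previous = float("-inf")
    for _ in range(max_iter):
        # E step: probability that each instance belongs to cluster A.
        dens_a = [prior_a * normal_pdf(x, mean_a, std_a) for x in data]
        dens_b = [(1 - prior_a) * normal_pdf(x, mean_b, std_b) for x in data]
        weights = [a / (a + b) for a, b in zip(dens_a, dens_b)]
        # Stop when the log-likelihood no longer improves noticeably.
        likelihood = sum(log(a + b) for a, b in zip(dens_a, dens_b))
        if likelihood - previous < tol:
            break
        previous = likelihood
        # M step: re-estimate parameters, using the probabilities as instance weights.
        others = [1 - w for w in weights]
        total_a, total_b = sum(weights), sum(others)
        mean_a = sum(w * x for w, x in zip(weights, data)) / total_a
        mean_b = sum(w * x for w, x in zip(others, data)) / total_b
        std_a = weighted_std(data, weights, mean_a, total_a)
        std_b = weighted_std(data, others, mean_b, total_b)
        prior_a = total_a / len(data)
    return mean_a, mean_b, weights

data = [-0.5, 0.0, 0.5, 9.5, 10.0, 10.5]
mean_a, mean_b, weights = em_two_clusters(data, mean_a=1.0, mean_b=8.0)
print(round(mean_a, 3), round(mean_b, 3))
```

Unlike k-means, each instance keeps a probability of membership in each cluster rather than a hard assignment, and those probabilities act as the weights in the M step.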
Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net

 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
SAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloudSAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloud
 
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 

WEKA: Practical Machine Learning Tools And Techniques

  • 2. Decision Trees: Dealing with numeric attributes. Standard method: binary splits. Steps to decide where to split: evaluate the info gain for every possible split point of the attribute, then choose the “best” split point. But this is computationally intensive.
  • 3. Decision Trees: Example. Split on the temperature attribute. Values: 64 65 68 69 70 71 72 72 75 75 80 81 83 85. Classes: Yes No Yes Yes Yes No No Yes Yes Yes No Yes Yes No. temperature < 71.5: yes/4, no/2. temperature > 71.5: yes/5, no/3. Info([4,2],[5,3]) = 6/14 info([4,2]) + 8/14 info([5,3]) = 0.939 bits.
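The arithmetic on this slide can be checked with a short sketch. The helper names `info`, `split_info`, `counts_for`, and `best_split` are illustrative, not part of WEKA; the data are the temperature values and class labels listed on the slide. `best_split` assumes, as is common, that candidate split points are the midpoints between adjacent distinct values.

```python
from math import log2

def info(counts):
    """Entropy in bits of a list of class counts such as [4, 2]."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def split_info(*branches):
    """Weighted average entropy over the branches of a split."""
    n = sum(sum(b) for b in branches)
    return sum(sum(b) / n * info(b) for b in branches)

temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
plays = ["yes", "no", "yes", "yes", "yes", "no", "no",
         "yes", "yes", "yes", "no", "yes", "yes", "no"]

def counts_for(threshold):
    """[yes, no] counts below and above a candidate split point."""
    below = [p for t, p in zip(temps, plays) if t < threshold]
    above = [p for t, p in zip(temps, plays) if t > threshold]
    return ([below.count("yes"), below.count("no")],
            [above.count("yes"), above.count("no")])

def best_split():
    """Try every midpoint between adjacent distinct values (an assumption)."""
    values = sorted(set(temps))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda c: split_info(*counts_for(c)))

below, above = counts_for(71.5)
print(below, above, round(split_info(below, above), 3))  # [4, 2] [5, 3] 0.939
```

Minimizing the weighted entropy is equivalent to maximizing info gain, since the entropy before the split is the same for every candidate.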
  • 4. Decision Trees: Dealing with missing values. Split instances with missing values into pieces. A piece going down a branch receives a weight proportional to the popularity of the branch. The weights sum to 1.
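A minimal sketch of the weighting idea, assuming "popularity" means the number of training instances that went down each branch. The function name `split_weights` is hypothetical.

```python
def split_weights(branch_counts, weight=1.0):
    """Divide an instance's weight among branches in proportion to
    how many training instances went down each branch."""
    total = sum(branch_counts)
    return [weight * c / total for c in branch_counts]

# With 6 instances on one branch and 8 on the other, an instance
# with a missing value is split into pieces of weight 6/14 and 8/14.
pieces = split_weights([6, 8])
print(pieces, sum(pieces))
```

A piece that reaches a further split with a missing value can be divided again by passing its current weight as `weight`.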
  • 5. Decision Trees: Pruning. Pruning makes the decision tree less complex in order to reduce overfitting. There are two types of pruning. Prepruning: deciding during tree building when to stop growing a branch. Postpruning: pruning after the tree has been constructed. The two postpruning operations generally used are subtree replacement and subtree raising. To decide whether to prune, we compare the estimated error rate before and after the pruning.
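  • 8. Classification rules: Criteria for choosing tests. p/t ratio (p positive instances among the t instances covered by the rule): maximizes the ratio of positive instances, with stress on accuracy. p[log(p/t) – log(P/T)] (P and T are the positive and total counts before the new test was added): favors covering a larger number of positive instances, with lesser accuracy.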
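The difference between the two criteria shows up on a small made-up example. The counts below are hypothetical. Here `P` and `T` denote the positive and total instances covered before the candidate test is added, and `p` and `t` the same counts afterwards; these definitions follow the usual textbook form of the information-gain criterion and are an assumption about what the slide intends.

```python
from math import log2

def accuracy(p, t):
    """Fraction of covered instances that are positive."""
    return p / t

def info_gain(p, t, P, T):
    """p * [log(p/t) - log(P/T)]: gain weighted by positives covered."""
    return p * (log2(p / t) - log2(P / T))

P, T = 10, 20             # before adding a test: 10 positives out of 20
test_a = (2, 2)           # hypothetical test A: covers 2 instances, both positive
test_b = (8, 10)          # hypothetical test B: covers 10 instances, 8 positive

for name, (p, t) in [("A", test_a), ("B", test_b)]:
    print(name, accuracy(p, t), round(info_gain(p, t, P, T), 2))
```

The p/t ratio prefers test A (perfect accuracy on very few instances), while the information-based measure prefers test B because it keeps many more positive instances.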
  • 9. Classification rules: Generating good rules. We can reduce overfitting by pruning either during construction or after construction is complete. To prune during construction, we check each newly added test: if the error rate on the pruning set increases because of the new test, we remove it.
  • 10. Classification rules Obtaining rules from partial decision trees: Algorithm
  • 12. Classification rules: As node 4 was not replaced, we stop at this stage. Now each leaf node gives us a possible rule. Choose the leaf that covers the greatest number of instances.
  • 13. Extending linear models: Support vector machines. Support vector machines are algorithms for learning linear classifiers. They use the maximum margin hyperplane, which helps avoid overfitting. The instances closest to the maximum margin hyperplane are the support vectors; all other instances can be ignored.
  • 15. Extending linear models: Support vector machines. The hyperplane can be written as a weighted sum over the support vectors. Support vectors: all instances for which alpha(i) > 0. b and alpha are determined using software packages. The hyperplane can also be written using a kernel.
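The slide's own equations did not survive in the transcript. As a hedged reconstruction, the standard textbook form of the maximum margin hyperplane is shown below. The notation is an assumption: $\mathbf{a}$ is the instance being classified, $\mathbf{a}(i)$ the $i$-th support vector, $y_i \in \{-1, +1\}$ its class value, and $K$ a kernel function.

```latex
% Linear form: a sum over the support vectors only
x = b + \sum_{i \,\in\, \text{support vectors}} \alpha_i \, y_i \, \mathbf{a}(i) \cdot \mathbf{a}

% Kernel form: the dot product is replaced by a kernel function K
x = b + \sum_{i \,\in\, \text{support vectors}} \alpha_i \, y_i \, K\bigl(\mathbf{a}(i), \mathbf{a}\bigr)
```

Instances with $\alpha_i = 0$ contribute nothing to either sum, which is why only the support vectors matter.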
  • 16. Extending linear models: Multilayer perceptron. We can create a network of perceptrons to approximate arbitrary target concepts. A multilayer perceptron is an example of an artificial neural network. It consists of an input layer, hidden layer(s), and an output layer. The structure of an MLP is usually found by experimentation. Its parameters can be found using backpropagation.
  • 18. Extending linear models: Backpropagation. f(x) = 1/(1+exp(-x)). Error = ½(y-f(x))^2. So we try to minimize the error and get its derivative dE/dw(i) with respect to each weight. Calculate this expression for all training instances and update: w(i) = w(i) – L(dE/dw(i)), where L is the learning rate. We start from assumed initial values of w.
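The update rule can be sketched for the simplest case, a single sigmoid unit with no hidden layer, so this is plain gradient descent rather than full backpropagation through several layers. The training data (the logical OR function), the zero initial weights, and the names `train` and `total_error` are illustrative assumptions.

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def total_error(instances, w):
    """Sum of 1/2 (y - f(x))^2 over all training instances."""
    return sum(0.5 * (y - sigmoid(sum(wi * ai for wi, ai in zip(w, a)))) ** 2
               for a, y in instances)

def train(instances, weights, rate=0.5, epochs=1000):
    """Batch gradient descent: w(i) = w(i) - L * dE/dw(i)."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for a, y in instances:
            out = sigmoid(sum(wi * ai for wi, ai in zip(w, a)))
            # dE/dw(i) = -(y - f(x)) * f'(x) * a(i), with f'(x) = f(x)(1 - f(x))
            delta = -(y - out) * out * (1.0 - out)
            for i, ai in enumerate(a):
                grad[i] += delta * ai
        w = [wi - rate * g for wi, g in zip(w, grad)]
    return w

# Each instance: (inputs with a leading constant 1.0 for the bias, target y)
or_data = [((1.0, 0.0, 0.0), 0.0), ((1.0, 0.0, 1.0), 1.0),
           ((1.0, 1.0, 0.0), 1.0), ((1.0, 1.0, 1.0), 1.0)]

start = [0.0, 0.0, 0.0]   # assumed initial weights
learned = train(or_data, start)
print(total_error(or_data, start), total_error(or_data, learned))
```

In a multilayer perceptron the same derivative is propagated backwards through the hidden layers using the chain rule; the weight update itself keeps this form.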
  • 19. Clustering: Incremental clustering. Steps: the tree starts as an empty root node. Add instances one by one. Update the tree appropriately at each stage. To update, find the right leaf for an instance. This may involve restructuring the tree. Restructuring: merging and replacement. Decisions are made using category utility.
  • 20. Clustering Example of incremental clustering:
  • 21. EM Algorithm: EM = Expectation-Maximization. Generalizes k-means to a probabilistic setting. Iterative procedure: E “expectation” step: calculate the cluster probability for each instance. M “maximization” step: estimate the distribution parameters from the cluster probabilities. Store the cluster probabilities as instance weights. Stop when the improvement is negligible.
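A minimal sketch of the E and M steps for a mixture of two one-dimensional Gaussians. The choice of Gaussian clusters, the toy data, the starting parameters, and the small floor on the standard deviation are all assumptions made for illustration; "improvement" is measured here as the increase in log-likelihood.

```python
from math import exp, log, pi, sqrt

def normal_pdf(x, mu, sigma):
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

def em_two_gaussians(data, mu_a, mu_b, sigma_a=1.0, sigma_b=1.0,
                     p_a=0.5, tol=1e-6, max_iter=200):
    prev_ll = None
    for _ in range(max_iter):
        # E step: probability that each instance belongs to cluster A
        dens = [(p_a * normal_pdf(x, mu_a, sigma_a),
                 (1 - p_a) * normal_pdf(x, mu_b, sigma_b)) for x in data]
        w_a = [da / (da + db) for da, db in dens]
        ll = sum(log(da + db) for da, db in dens)
        # Stop when the improvement in log-likelihood is negligible
        if prev_ll is not None and ll - prev_ll < tol:
            break
        prev_ll = ll
        # M step: re-estimate parameters using the probabilities as weights
        w_b = [1 - w for w in w_a]
        s_a, s_b = sum(w_a), sum(w_b)
        mu_a = sum(w * x for w, x in zip(w_a, data)) / s_a
        mu_b = sum(w * x for w, x in zip(w_b, data)) / s_b
        # Floor on sigma (an added guard) keeps a cluster from collapsing to zero width
        sigma_a = max(sqrt(sum(w * (x - mu_a) ** 2 for w, x in zip(w_a, data)) / s_a), 1e-3)
        sigma_b = max(sqrt(sum(w * (x - mu_b) ** 2 for w, x in zip(w_b, data)) / s_b), 1e-3)
        p_a = s_a / len(data)
    return mu_a, sigma_a, mu_b, sigma_b, p_a, w_a

data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
mu_a, sigma_a, mu_b, sigma_b, p_a, weights = em_two_gaussians(data, mu_a=0.0, mu_b=6.0)
print(round(mu_a, 2), round(mu_b, 2), round(p_a, 2))
```

Unlike k-means, each instance keeps a probability of belonging to every cluster instead of a hard assignment.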
  • 22. Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net