Classification techniques (i.e., decision tree methods) are recommended when the data mining task involves classification or prediction of outcomes, and the goal is to produce rules that can be easily explained and translated into SQL or a natural query language. A classification tree labels records and assigns variables to discrete classes, and it can also provide a measure of confidence that the classification is correct.
Clustering is the process for dividing/separating the population or data points into several groups & each group has similar data points. Clustering is an unsupervised approach.
The document provides an overview of decision tree learning algorithms:
- Decision trees are a supervised learning method that can represent discrete functions and efficiently process large datasets.
- Basic algorithms like ID3 use a top-down greedy search to build decision trees by selecting attributes that best split the training data at each node.
- The quality of a split is typically measured by metrics like information gain, with the goal of creating pure, homogeneous child nodes.
- Fully grown trees may overfit, so algorithms incorporate a bias toward smaller, simpler trees with informative splits near the root.
Basics of Decision Tree Learning. This slide deck includes the definition of a decision tree, a basic example, the basic construction of a decision tree, and a MATLAB example.
This document provides an introduction to machine learning with Apache Mahout. It defines machine learning as a branch of artificial intelligence that uses statistics and large datasets to make smart decisions. Common applications include spam filtering, credit card fraud detection, medical diagnostics, and search engines. Apache Mahout is a platform for machine learning algorithms that allows users to build their own algorithms or use existing functionality like recommender engines, classification, and clustering.
A lot of people talk about Data Mining, Machine Learning and Big Data. It clearly must be important, right?
A lot of people are also trying to sell you snake oil - sometimes half-arsed and overpriced products or solutions promising a world of insight into your customers or users if you hand over your data to them. Instead, trying to understand your own data and what you could do with it should be the first thing you look at.
In this talk, we'll introduce some basic terminology about Data and Text Mining as well as Machine Learning, and will have a look at what you can do on your own to understand more about your data and discover patterns in it.
This document discusses decision trees and entropy. It begins by providing examples of binary and numeric decision trees used for classification. It then describes characteristics of decision trees such as nodes, edges, and paths. Decision trees are used for classification by organizing attributes, values, and outcomes. The document explains how to build decision trees using a top-down approach and discusses splitting nodes based on attribute type. It introduces the concept of entropy from information theory and how it can measure the uncertainty in data for classification. Entropy is the minimum number of questions needed to identify an unknown value.
Dr. Oner Celepcikay, ITS 632, Week 4: Classification (DustiBuckner14)
This document discusses machine learning classification methods. It provides an example of using a decision tree to classify tax refund requests based on attributes like marital status, income, and refund amount. It explains how decision trees are constructed using a training set of records to learn a model, which can then be applied to a test set to classify new records. The document outlines the process of splitting nodes based on attribute tests to minimize impurity measures like Gini index or classification error.
This document discusses decision tree induction and attribute selection measures. It describes common measures like information gain, gain ratio, and Gini index that are used to select the best splitting attribute at each node in decision tree construction. It provides examples to illustrate information gain calculation for both discrete and continuous attributes. The document also discusses techniques for handling large datasets like SLIQ and SPRINT that build decision trees in a scalable manner by maintaining attribute value lists.
Machine learning algorithms can learn through supervised, unsupervised, or reinforcement learning. Supervised learning involves providing labeled examples to learn a function that maps inputs to outputs. Unsupervised learning identifies hidden patterns in unlabeled data. Reinforcement learning involves an agent learning through trial-and-error interactions with a dynamic environment. Machine learning has applications in areas like computer vision, natural language processing, medical diagnosis, and more.
The document discusses decision trees and decision tree learning algorithms. It defines decision trees as tree-structured models that represent a series of decisions that lead to an outcome. Each node in the tree represents a test on an attribute, and branches represent outcomes of the test. It describes how decision tree learning algorithms work by recursively splitting the data into purer subsets based on attribute values, until a leaf node is reached that predicts the label. The document discusses information gain and Gini impurity as metrics for selecting the best attribute to split on at each node to gain the most information about the label.
This document outlines an agenda for a data science boot camp covering various machine learning topics over several hours. The agenda includes discussions of decision trees, ensembles, random forests, data modelling, and clustering. It also provides examples of data leakage problems and discusses the importance of evaluating model performance. Homework assignments involve building models with Weka and identifying the minimum attributes needed to distinguish between red and white wines.
The document discusses clean code principles such as writing code for readability by other programmers, using meaningful names, following the DRY principle of not repeating yourself, and focusing on writing code that is maintainable and changeable. It provides examples of clean code versus less clean code and emphasizes that code is written primarily for human consumption by other programmers, not for computers. The document also discusses principles like the Single Responsibility Principle and the Boy Scout Rule of leaving the code cleaner than how you found it. It questions how to measure clean code and emphasizes the importance of writing tests for code and refactoring legacy code without tests.
Getting Started with Keras and TensorFlow – StampedeCon AI Summit 2017 (StampedeCon)
This technical session provides a hands-on introduction to TensorFlow using Keras in the Python programming language. TensorFlow is Google’s scalable, distributed, GPU-powered compute graph engine that machine learning practitioners use for deep learning. Keras provides a Python-based API that makes it easy to create well-known types of neural networks in TensorFlow. Deep learning is a group of exciting new technologies for neural networks. Through a combination of advanced training techniques and neural network architectural components, it is now possible to train neural networks of much greater complexity. Deep learning allows a model to learn hierarchies of information in a way that is similar to the function of the human brain.
Classification: Basic Concepts and Decision Trees (sathish sak)
Given a collection of records (training set):
Each record contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with training set used to build the model and test set used to validate it.
The document provides a summary of various machine learning algorithms and their key features:
- K-nearest neighbors is interpretable, handles small data well but not noise, with no automatic feature learning. Prediction and training are fast.
- Linear regression is interpretable, handles small data and irrelevant features well, with fast prediction and training but requires feature scaling.
- Decision trees are somewhat interpretable with average accuracy, handling small data and irrelevant features depending on algorithm. Prediction and training speed varies by algorithm.
- Random forests have less interpretability than decision trees but higher accuracy, handling small data and noise better depending on settings. Prediction and training speed varies.
- Neural networks generally have the lowest interpretability but can automatically learn features from raw data.
The document provides an overview of data mining concepts including association rules, classification, clustering, and applications of data mining. It discusses association rule mining algorithms like Apriori and FP-growth, decision tree algorithms for classification, and k-means clustering. It also lists some commercial data mining tools and potential applications in various domains like marketing, risk analysis, manufacturing, and bioinformatics.
The document provides an overview of data mining concepts including association rules, classification, clustering, and applications of data mining. It discusses association rule mining algorithms like Apriori and FP-growth, decision tree algorithms for classification, and K-means clustering. Commercial data mining tools from companies like Oracle, SAS, and IBM are also mentioned. The document concludes that data mining can be used to discover patterns in many types of data and the results may include association rules, sequential patterns, and classification trees.
The document provides an overview of data mining concepts including association rules, classification, and clustering algorithms. It introduces data mining and knowledge discovery processes. Association rule mining aims to find relationships between variables in large datasets using the Apriori and FP-growth algorithms. Classification algorithms build a model to predict class membership for new records based on a decision tree. Clustering algorithms group similar records together without predefined classes.
Jan Vitek – Distributed Random Forest, 5-2-2013 (Sri Ambati)
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
University Course Timetabling by using Multi Objective Genetic Algorithms (Halil Kaşkavalcı)
University course timetabling is a research area in combinatorial optimization. Since the problem is NP-hard, exhaustive search is not feasible, so smarter methods need to be applied. University timetables should be feasible and decent. Constraints introduced by faculty and department can be categorized as hard and soft objectives: hard objectives must be satisfied strictly, whereas soft objectives should be fulfilled as much as possible. In this work, Yeditepe University Computer and Engineering department’s course timetabling problem is solved using multi objective genetic algorithms. Yeditepe University’s timetabling problem introduces constraints which are not covered in the literature; these constraints are handled in this work, and a graphical user interface is implemented for a user-friendly experience.
The document discusses decision tree models in machine learning. It begins by defining key terms like decision nodes, branches, and leaf nodes. It then explains how decision trees are built in a top-down manner by recursively splitting the training data based on selected attributes. The document also covers different algorithms for building decision trees like ID3, C4.5, and CART. It discusses measures used for attribute selection like information gain, gain ratio, and Gini index. Finally, it provides an example of how to build a decision tree to classify whether to play tennis based on weather attributes.
The document discusses clustering customers based on their purchasing behavior using the k-means clustering algorithm. It begins with defining clustering and k-means clustering. It then outlines the steps of the k-means algorithm and provides an example. The document describes the dataset used, which contains customer purchase and demographic information. It then shows the Python code to implement k-means clustering on the dataset to group customers into clusters based on annual income and spending score.
- The document discusses decision trees, a type of supervised machine learning model that can be used for classification or regression problems.
- Decision trees represent the data in a tree structure, with internal nodes representing attributes and leaf nodes representing class labels.
- An example decision tree is presented to classify whether a company will be profitable or not based on attributes like age, competition type, and product type. The document then walks through calculating information gain and entropy to determine the optimal attributes to use at each node in the decision tree.
Data mining and machine learning techniques like classification and clustering are increasingly being used to extract useful information from large datasets. Data mining helps provide better customer service and aids scientists in hypothesis formation by analyzing patterns in data from various sources like business transactions, sensor networks, and scientific experiments. Classification algorithms such as decision trees can be applied to datasets containing attributes for individuals and a target variable to predict, like credit worthiness, to build a predictive model. Clustering algorithms like K-means group unlabeled data into clusters without a predefined target variable to discover hidden patterns in the data.
This presentation educates you about top data science project ideas for beginner, intermediate and advanced levels, such as fake news detection using Python, detecting forest fires, detection of road lane lines, sentiment analysis, speech recognition, developing chatbots, detection of credit card fraud, and customer segmentation.
For more topics stay tuned with Learnbay.
This presentation educates you about how to create a table in MySQL using Python, with example syntax.
For more topics stay tuned with Learnbay.
2. Enriching training and learning session…
§ Training Checklist
– Sitting arrangement (F2F)
– Quality over Quantity
– Everyone to have their own machines for hands-on practice
– Illuminated and happy glowing training room (no candle-light-dinner ambience)
– Anyone wanting to step out, feel free
– Feel free to ask for breaks
– Feel free to ask the same question again till you understand
– Let me know if you want me to skip Practice Exercises in between the session
– Brief side-talks are okay
– I don’t speak to walls, respect each other
[Diagram: “Enriching Training” balanced between Involvement, Content and Duration]
4. Learning Objectives
§ What is a Classification Technique?
§ CHAID, CART, C4.5 intro
§ Gini Gain computation
§ Why are Classification Tree algorithms recursive?
§ What is pre-pruning and post-pruning in a Classification Tree?
§ What is Loss?
§ What is Validation? What is Cross-Validation?
§ Why should you avoid over-fitting?
§ Performance Measures
6. What is Classification?
The action or process of classifying something according to shared qualities or characteristics.
7. Defining characteristics of each animal classification
§ Mammals – Mammals are vertebrates (backboned animals). Mammals are warm-blooded and have hair. Mammals are able to move around using limbs.
§ Birds – Birds are warm-blooded vertebrates, having a body covered with feathers, forelimbs modified into wings, scaly legs, a beak, and no teeth, and bearing young in a hard-shelled egg.
§ Insects – Any of various small invertebrate animals which typically have a well-defined head, thorax, and abdomen, only three pairs of legs, and typically one or two pairs of wings.
§ Amphibians – Any cold-blooded vertebrate that lives on land but breeds in water.
§ Reptiles – A class of cold-blooded, air-breathing vertebrates with a completely ossified skeleton and a body usually covered with scales or horny plates.
§ Fish – A limbless cold-blooded vertebrate animal with gills and fins, living wholly in water.
8. Why Classify?
To Explain (Profile)
Explaining in the classification world is called Profiling.
or
To Predict (Classify)
Predicting the class of new records is called Classifying.
9. Win Back Campaign Classification Analysis
[Classification tree diagram: the root node holds 10,000 Dud accounts and 3,500 Win-Back (W.B.) accounts, a 35.0% W.B. rate. The root is split on inactivity period (<6 months, 6–12 months, >12 months), and the branches are split further on lien charges (>5K, 1K–5K, <1K), account balance (< / >= 1,000), salary account type (TRUE/FALSE), gender, and count of transactions in the last active month (< / >= 10). Each node reports Dud and W.B. counts with their percentages; the best segments reach W.B. rates of roughly 66–90% (e.g., LienChrg < 1K at 89.8%, CntTxnsLastActiveMth >= 10 at 79.5%).]
Legend: Dud = dud accounts (inactive for a long period); W.B. = Win Back.
10. Main issues of classification tree learning
§ Choosing the splitting criterion
– Impurity-based criteria
– Information gain
– Statistical measures of association
§ Binary or multiway splits
– Multiway split
– Binary split
§ Finding the right-sized tree
– Pre-pruning
– Post-pruning
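To make the impurity-based criteria concrete, here is a minimal Python sketch (our addition, not from the original deck; the function names and toy class counts are illustrative only) computing entropy and the information gain of a candidate split:

import math

def entropy(counts):
    # Entropy in bits of a node with the given class counts.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, children_counts):
    # Parent entropy minus the size-weighted entropy of the child nodes.
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted

# Toy split: parent (4 positives, 6 negatives) -> children (3, 3) and (1, 3).
print(information_gain([4, 6], [[3, 3], [1, 3]]))  # ~0.0464 bits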
11. Popular Classification Techniques
§ CHAID – CHi-squared Automatic Interaction Detector. The “Chi-squared” part of the name arises because the technique essentially involves automatically constructing many cross-tabs and working out the statistical significance of the proportions. The most significant relationships are used to control the structure of a tree diagram.
– CHAID is a non-binary decision tree; a recursive partitioning algorithm.
– Continuous variables must be grouped into a finite number of bins to create categories.
§ CLASSIFICATION AND REGRESSION TREES (CART) are binary decision trees, which split on a single variable at each node.
– The CART algorithm recursively goes through an exhaustive search of all variables and split values to find the optimal splitting rule for each node.
§ C4.5 builds decision trees from a set of training data using the concept of information entropy.
14. CART | Splitting Criteria (source: K2Analytics.co.in)
§ CART uses the Gini Index as its measure of impurity.
§ Gini of a node: Gini(t) = 1 − Σ_j [p(j | t)]², where p(j | t) is the relative frequency of class j at node t.
§ The Gini of a split is computed as the weighted average Gini of each child node: Gini(split) = Σ_i (n_i / n) · Gini(i), where n_i is the number of records at child i and n is the total number of records in the parent node.
§ Gini Gain = Gini(t) − Gini(split)
www.cs.kent.edu/~jin/DM07/ClassificationDecisionTree.ppt
15. Gini calculations
Example data (10 customers):
Cust_ID | Gender | Occupation | Age | Target
1 | M | Sal | 22 | 1
2 | M | Sal | 22 | 0
3 | M | Self-Emp | 23 | 1
4 | M | Self-Emp | 23 | 0
5 | M | Self-Emp | 24 | 1
6 | M | Self-Emp | 24 | 0
7 | F | Sal | 25 | 1
8 | F | Sal | 25 | 0
9 | F | Sal | 26 | 0
10 | F | Self-Emp | 26 | 0
Splitting on Gender: Root Node N:10, T:4 → M: N:6, T:3 and F: N:4, T:1.
Node | Gini computation formula | Gini Index
Overall | 1 − ((4/10)² + (6/10)²) | 0.48
Gender = M | 1 − ((3/6)² + (3/6)²) | 0.50
Gender = F | 1 − ((1/4)² + (3/4)²) | 0.375
Gender split | (6/10) × 0.5 + (4/10) × 0.375 | 0.45
Gini Gain | Gini(Overall) − Gini(Gender) | 0.03
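As a quick check of the arithmetic above, a small Python sketch (our addition; the helper names are ours) that reproduces the slide’s Gini values:

def gini(counts):
    # Gini impurity of a node given its class counts, e.g. [targets, non-targets].
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def gini_split(children_counts):
    # Size-weighted average Gini of the child nodes of a split.
    n = sum(sum(child) for child in children_counts)
    return sum(sum(child) / n * gini(child) for child in children_counts)

overall = gini([4, 6])                # 0.48
split = gini_split([[3, 3], [1, 3]])  # (6/10)*0.50 + (4/10)*0.375 = 0.45
print(overall - split)                # Gini Gain = 0.03

The same two helpers can be reused for the exercise on the next slide.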
17. Exercise… Compute the Gini Gain
Root Node: N:100, T:40
Gender = M: N:25, T:10
Gender = F: N:75, T:30
Also consider the alternative split Visits > 3 (branches Y / N).
18. Sampling…
## Creating Development and Validation samples
## dummy_df = pd.read_csv("/home/utkarsh/Desktop/bank.csv", na_values=['NA'])
## x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5)
import pandas as pd
CTDF_dev = pd.read_csv("datafile/DEV_SAMPLE.csv", sep=",")
CTDF_holdout = pd.read_csv("datafile/HOLDOUT_SAMPLE.csv", sep=",")
Sampling code note: separate Dev & Val samples are provided, so we import them directly rather than use the sampling code.
19. Decision Tree code to build a CART Tree
## imports for building and visualising the tree
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz
# optional, for rendering the exported tree: matplotlib, pydotplus, IPython.display.Image

## calling the Decision Tree function to build the tree
model_dt = DecisionTreeClassifier(max_depth=8, criterion="gini",
                                  min_samples_split=100, min_samples_leaf=10)
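A minimal fit-and-score sketch (our addition; the “Target” column name and the dev/holdout layout are assumptions based on the sampling slide):

X_train = CTDF_dev.drop(columns=["Target"])   # assumed label column "Target"
y_train = CTDF_dev["Target"]
X_test = CTDF_holdout.drop(columns=["Target"])
y_test = CTDF_holdout["Target"]

model_dt.fit(X_train, y_train)
print("Holdout accuracy:", model_dt.score(X_test, y_test))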
20. Decision Tree control arguments
§ min_samples_split: the minimum number of observations that must exist in a node in order for a split to be attempted.
§ min_samples_leaf: the minimum number of observations in any terminal leaf node. If only one of min_samples_leaf or min_samples_split is specified, the code either sets min_samples_split to min_samples_leaf*3 or min_samples_leaf to min_samples_split/3, as appropriate.
§ max_depth: the maximum depth of the tree. If None, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
§ criterion: the function to measure the quality of the split. It can be "gini" for the Gini impurity or "entropy" for the information gain.
21. Loss, Mis-Classification Error and Response Rate
§ Loss is the number of cases mis-classified in a given node.
§ Mis-Classification Error is the ratio of the total number of cases mis-classified to the total number of cases.
– We are interested in the mis-classification error for the full tree.
§ Response Rate is the ratio of the number of responders (Target = 1) to the total number of cases.
– We are interested in finding nodes where the response rate is very high.
[Example tree: the root node holds 14,000 obs (1,235 Target = 1; 12,765 Target = 0) and is split on Holding Period >= 10 into a node with 9,182 obs (443 Target = 1; 8,739 Target = 0) and a node with 4,818 obs (792 Target = 1; 4,026 Target = 0); the latter is split on ABC > X into leaves with 600 obs (400 Target = 1; 200 Target = 0) and 4,218 obs (392 Target = 1; 3,826 Target = 0).]
What is the mis-classification error for the above tree?
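To answer the slide’s question, a short sketch (our addition) that assigns each leaf to its majority class and counts the mis-classified cases:

# Leaves of the example tree as (Target=1 count, Target=0 count).
leaves = [(443, 8739), (400, 200), (392, 3826)]
errors = sum(min(ones, zeros) for ones, zeros in leaves)
total = sum(ones + zeros for ones, zeros in leaves)
print(errors / total)  # 1035 / 14000, i.e. about 7.4% mis-classification error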
22. Plotting the Classification Tree
Let us export the output to PDF format to have a clear view of the tree.
23. Concepts | Greedy Algorithm
[Image: coins; from the solutions below, the denominations are 25, 10, 5 and 1 paise.]
Make 31 paise using any combination of the above coins.
Optimal solution with fewest coins: 25 + 5 + 1
What if the 5 paise coin is not there?
Optimal solution with fewest coins: 10 * 3 + 1
Greedy algorithm solution: 25 + 1 * 6
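A minimal sketch of this greedy strategy (our addition), which helps explain why decision-tree induction, itself a greedy split-by-split search, need not find the globally smallest tree:

def greedy_change(amount, coins):
    # Repeatedly take the largest coin that still fits; locally optimal only.
    taken = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            taken.append(coin)
    return taken

print(greedy_change(31, [25, 10, 5, 1]))  # [25, 5, 1], also the optimum (3 coins)
print(greedy_change(31, [25, 10, 1]))     # [25, 1, 1, 1, 1, 1, 1], 7 coins vs. the optimal 4 (10*3 + 1)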
24. Concepts | Cross Validation
§ Cross Validation is part of the CART algorithm.
§ It is a method to see how well the model performs on unseen data.
§ Typically the xval parameter for cross-validation is set to 10.
K-Fold CV with 10 partitions P1–P10: each fold trains on nine partitions and tests on the remaining one (Fold 1 tests on P10, Fold 2 on P9, …, Fold 10 on P1).
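A short sketch (our addition, using scikit-learn’s cross-validation utilities rather than rpart’s built-in xval; X_train and y_train are carried over from the earlier fitting sketch) of 10-fold cross-validation for the tree defined above:

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

model_dt = DecisionTreeClassifier(max_depth=8, criterion="gini",
                                  min_samples_split=100, min_samples_leaf=10)
# 10-fold CV: each fold holds out one tenth of the data as the test partition.
scores = cross_val_score(model_dt, X_train, y_train, cv=10)
print(scores.mean(), scores.std())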
25. Concepts | Over-fitting
[Chart: accuracy (0–100%) against tree size (no. of nodes, 0–100) for training data vs. test data.]
§ If you grow the tree too long you will run the risk of over-fitting.
§ The classification model may not work well on unseen data.
How do we avoid over-fitting?
Stopping rule: don’t expand a node if the impurity reduction of the best split is below some threshold.
Pruning: grow a very large tree and merge back nodes.
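Both remedies map onto scikit-learn parameters; a sketch of the stopping-rule approach (our addition; X_train and y_train carried over, and the threshold value is illustrative), where min_impurity_decrease is the impurity-reduction threshold mentioned above:

from sklearn.tree import DecisionTreeClassifier

# Pre-pruning (stopping rule): refuse splits whose weighted impurity
# decrease falls below the threshold.
pre_pruned = DecisionTreeClassifier(criterion="gini",
                                    min_impurity_decrease=0.001)
pre_pruned.fit(X_train, y_train)
print(pre_pruned.tree_.node_count)  # smaller thresholds allow bigger trees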
26. Concepts | Parsimony Principle & Re-substitution Error
§ The parsimony principle is basic to all science and tells us to choose the simplest scientific explanation that fits the evidence.
§ Resubstitution Error: it measures what fraction of the cases in a node is classified incorrectly if we assign every case to the majority class in that node; it always favours a large tree.
§ To counter-balance the resubstitution error we need a penalty component that favours a smaller tree.
[Example sub-tree (shown as N; misclassified; predicted class): the sub-tree node (530; 113; 0) splits on SCR < 334 into Node 14 (122; 10; 0) and Node 15 (408; 103; 0); Node 15 splits on Gender: M, O into Node 30 (388; 90; 0) and Node 31 (20; 7; 1).]
Re(pruned) = 113 / 530
Re(leaves) = 107 / 530
27. Cost Component Pruning
§ “Cost-complexity” – a measure of the average error reduced per leaf.
§ Calculate the number of errors for each node if collapsed to a leaf.
§ Compare to the errors in the leaves, taking into account the extra nodes used.
[Same sub-tree as the previous slide: sub-tree node (530; 113; 0) with leaves Node 14 (122; 10; 0), Node 30 (388; 90; 0) and Node 31 (20; 7; 1).]
Setting the cost of the collapsed node equal to the cost of keeping the three leaves:
Re(pruned) + 1·α = Re(leaves) + 3·α
113/530 + α = 107/530 + 3α
α = 3/530 ≈ 0.0057
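scikit-learn implements this same minimal cost-complexity pruning; a sketch (our addition; X_train and y_train carried over, and the ccp_alpha value simply echoes the α derived above) of choosing α from the pruning path:

from sklearn.tree import DecisionTreeClassifier

# Grow a large tree, then inspect the effective alphas of its pruning path.
full_tree = DecisionTreeClassifier(criterion="gini")
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
print(path.ccp_alphas)  # candidate alpha values, smallest to largest

# Refit with a chosen alpha: a larger alpha means a heavier penalty and a smaller tree.
pruned = DecisionTreeClassifier(criterion="gini", ccp_alpha=0.0057)
pruned.fit(X_train, y_train)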
28. Pruning
§ Pruning works on the average cost-complexity reduced per leaf in a Decision Tree.
§ Generally it is a trial-and-error method: improve accuracy while reducing the depth of the tree or the average number of nodes, without over-fitting.
§ Practically, we create a tree structure which is then refined under certain pre-assumptions to improve the performance and accuracy of the Decision Tree classifier.
http://stats.stackexchange.com/questions/92547/r-rpart-cross-validation-and-1-se-rule-why-is-the-column-in-cptable-called-xst
https://stats.stackexchange.com/questions/13471/how-to-choose-the-number-of-splits-in-rpart
30. Model Evaluation
Various measures to assess model performance:
§ Error Matrix
§ Gini Coefficient
§ AUC
§ KS
§ Lift Chart
https://www.youtube.com/watch?v=OAl6eAyP-yo
Demo of the Rattle interface to build a model and generate various model evaluation measures.
32. Area Under Curve
Sensitivity = True Positive Rate = True Positives / Total Positives = a / (a + b)
Specificity = True Negatives / Total Negatives = d / (c + d)
False Positive Rate = 1 − Specificity
Classification Matrix:
         | Predicted Y | Predicted N
Actual Y |      a      |      b
Actual N |      c      |      d
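To close, a small sketch (our addition) computing these measures plus AUC with scikit-learn; y_test, X_test and the fitted model_dt are carried over from the earlier slides:

from sklearn.metrics import confusion_matrix, roc_auc_score

y_pred = model_dt.predict(X_test)
# sklearn orders the matrix as [[TN, FP], [FN, TP]] for labels (0, 1).
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

sensitivity = tp / (tp + fn)  # a / (a + b) in the slide's notation
specificity = tn / (tn + fp)  # d / (c + d)
auc = roc_auc_score(y_test, model_dt.predict_proba(X_test)[:, 1])
print(sensitivity, specificity, auc)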