This document provides an overview of classification and decision tree induction. It discusses basic concepts of classification and prediction. Classification involves analyzing labeled datasets to build a model, while prediction involves forecasting future trends. Decision tree induction is then explained as a common classification technique. It involves learning classification rules from training data and using test data to evaluate the rules. The document outlines the decision tree induction process and algorithms. It also discusses attribute selection measures, pruning techniques, and compares decision trees to naive Bayesian classification.
Reference: Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques, 3rd ed., The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann Publishers, July 2011. ISBN 978-0123814791.
2. Basic Concepts
• There are two forms of data analysis that can be used to extract models describing
important classes or to predict future data trends:
» Classification
» Prediction
• Classification analyzes class-labeled datasets.
• Classification :- Examples: bank loan decisions, marketing, medical diagnosis.
• Prediction :- Example: how much a given customer will spend during a sale at his company.
3. How does classification work?
• Data classification is a two-step process:
» Learning step
» Classification step
• Learning step :- The training datasets are analyzed by a classification algorithm, and
the learned classifier is represented in the form of classification rules.
• Classification step :- Test data are used to estimate the accuracy of the classification rules.
If the accuracy is considered acceptable, the rules can be applied to the classification of
new data tuples.
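The two steps can be sketched in plain Python. The tiny dataset and the one-rule classifier below are hypothetical, chosen only to make the learning/classification split concrete:

```python
from collections import Counter, defaultdict

# Hypothetical training and test tuples: (attribute value, class label).
train = [("youth", "no"), ("youth", "no"), ("middle_aged", "yes"),
         ("middle_aged", "yes"), ("senior", "yes"), ("senior", "yes"),
         ("senior", "no")]
test = [("youth", "no"), ("middle_aged", "yes"), ("senior", "yes")]

# Learning step: the training tuples are analyzed and the classifier is
# represented as classification rules (here: one majority-class rule per value).
votes = defaultdict(Counter)
for value, label in train:
    votes[value][label] += 1
rules = {value: counts.most_common(1)[0][0] for value, counts in votes.items()}

# Classification step: test tuples estimate the accuracy of the rules.
correct = sum(rules[value] == label for value, label in test)
accuracy = correct / len(test)
print(rules)     # the learned classification rules
print(accuracy)  # if acceptable, apply the rules to new tuples
```
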
8. Decision Tree Induction
• A decision tree includes a root node, branches, and leaf nodes.
• Each internal node denotes a test on an attribute,
• each branch denotes the outcome of a test,
• each leaf node holds a class label.
• The topmost node in the tree is the root node.
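The structure described above maps directly onto a small data type; a minimal sketch (the class and field names are illustrative, not from the slides):

```python
class TreeNode:
    """One node of a decision tree: internal nodes test an attribute,
    leaf nodes hold a class label."""
    def __init__(self, attribute=None, label=None):
        self.attribute = attribute   # test attribute (internal node)
        self.label = label           # class label (leaf node)
        self.branches = {}           # outcome of the test -> child node

    def is_leaf(self):
        return self.label is not None

# The topmost node (root) tests "age"; each branch is one test outcome.
root = TreeNode(attribute="age")
root.branches["middle_aged"] = TreeNode(label="yes")   # leaf node
print(root.is_leaf())                          # False: internal node
print(root.branches["middle_aged"].is_leaf())  # True: holds a class label
```
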
10. • Decision trees can easily be converted to classification rules.
• Decision trees can handle multidimensional data.
• The learning and classification steps of decision tree induction are
simple and fast.
• They generally have good accuracy.
• Application areas include medicine, manufacturing and production,
financial analysis, astronomy, and molecular biology.
11. Decision Tree Induction Algorithm :-
Three parameters in DTI
• Input
• Method
• Output
• Input :-
– Data Partition, D (training tuples and their class labels)
– Attribute_List
– Attribute_Selection_Method
• Method :-
1. Create a node N;
2. if tuples in D are all of the same class, C, then
3. return N as a leaf node labeled with the class C;
12. 4. if attribute_list is empty then
5. return N as a leaf node labeled with the majority class in D;
6. apply Attribute_selection_method(D, attribute_list) to find the best
splitting_criterion;
7. label node N with splitting_criterion;
8. if splitting_attribute is discrete-valued and multiway splits allowed then
attribute_list ← attribute_list − splitting_attribute;
9. for each outcome j of splitting_criterion
10. let Dj be the set of data tuples in D satisfying outcome j;
11. if Dj is empty then
12. attach a leaf labeled with the majority class in D to node N;
13. else attach the node returned by Generate_decision_tree(Dj, attribute_list)
to node N;
endfor
14. return N;
• Output :-
A Decision Tree
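The pseudocode above can be sketched in Python. This is a simplified illustration, not the exact textbook procedure: it assumes discrete-valued attributes with multiway splits, uses information gain as the Attribute_selection_method, and represents each tuple as a dict and each internal node as a nested dict:

```python
import math
from collections import Counter

def info(labels):
    """Expected information (entropy) needed to classify the tuples."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_attribute(D, attribute_list, class_key):
    """Attribute_selection_method: the attribute with the highest gain."""
    base = info([t[class_key] for t in D])
    def gain(a):
        info_a = sum((n / len(D)) * info([t[class_key] for t in D if t[a] == v])
                     for v, n in Counter(t[a] for t in D).items())
        return base - info_a
    return max(attribute_list, key=gain)

def generate_decision_tree(D, attribute_list, class_key="class"):
    labels = [t[class_key] for t in D]
    # Steps 2-3: all tuples in the same class -> leaf labeled with that class.
    if len(set(labels)) == 1:
        return labels[0]
    # Steps 4-5: attribute list empty -> leaf labeled with the majority class.
    if not attribute_list:
        return Counter(labels).most_common(1)[0][0]
    # Steps 6-8: select the splitting attribute and remove it from the list.
    a = best_attribute(D, attribute_list, class_key)
    remaining = [x for x in attribute_list if x != a]
    node = {a: {}}                        # step 7: label node N
    for v in set(t[a] for t in D):        # steps 9-13: one branch per outcome
        Dj = [t for t in D if t[a] == v]  # Dj is never empty here, since v
        node[a][v] = generate_decision_tree(Dj, remaining, class_key)
    return node                           # step 14

D = [{"age": "youth", "class": "no"}, {"age": "middle_aged", "class": "yes"},
     {"age": "senior", "class": "yes"}]
print(generate_decision_tree(D, ["age"]))
# {'age': {'youth': 'no', 'middle_aged': 'yes', 'senior': 'yes'}} (branch order may vary)
```

Because each outcome value v is drawn from D itself, the "Dj is empty" case (steps 11-12) never triggers in this sketch; a full implementation would handle it as the pseudocode does.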
13. There are three possible scenarios for the splitting attribute A:
1. Discrete-valued
2. Continuous-valued
3. Discrete-valued and a binary tree
1. Discrete-valued :- If A is discrete-valued, then one branch is grown for each
known value of A.
14. 2. Continuous-valued :- If A is continuous-valued, then two branches are
grown, corresponding to A <= split point and A > split point.
where split point is the split-point returned by Attribute selection
method as part of the splitting criterion.
15. 3. Discrete-valued and a binary tree :- If A is discrete-valued and a binary tree
must be produced, then the test is of the form A ∈ S_A, where S_A is the
splitting subset for A.
16. Attribute Selection Measures
• An attribute selection measure provides a ranking for each attribute.
• The attribute having the best score for the measure is chosen as the splitting
attribute for the given tuples.
• Three popular attribute selection measures:
i. Information gain
ii. Gain ratio
iii. Gini index
17. Information Gain
• The attribute with the highest information gain is chosen as the splitting attribute for node N.
• This attribute minimizes the information needed to classify the tuples in the
resulting partitions and reflects the least randomness or “impurity” in these
partitions.
• Information gain is defined as the difference between the original information
requirement (i.e., based on just the proportion of classes) and the new
requirement (i.e., obtained after partitioning on A). That is,
Gain(A) = Info(D) − Info_A(D)
• The expected information needed to classify a tuple in D is given by
Info(D) = − Σ_{i=1..m} p_i log2(p_i)
• where p_i is the probability that an arbitrary tuple in D belongs to class Ci.
18. • For each individual attribute A, the expected information after partitioning is
Info_A(D) = Σ_{j=1..v} (|Dj| / |D|) × Info(Dj)
• Info_A(D) is the expected information required to classify a tuple from D
based on the partitioning by A.
• Here A denotes the attribute used for partitioning.
19. Example Database
(How to select an attribute from a given database)
Note :- The attribute with the highest information gain becomes the root node.
20. • From the given database,
• Two class categories (Buys_Computer): {yes, no}; yes --> 9 tuples, no --> 5 tuples;
total --> 14 tuples.
• Info(D) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940 bits
• Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694 bits
• Gain(age) = Info(D) − Info_age(D) = 0.940 − 0.694 = 0.246 bits
21. Similarly, we can compute:
Gain(income) = 0.029 bits,
Gain(student) = 0.151 bits,
Gain(credit_rating) = 0.048 bits.
Age has the highest information gain among the attributes, so it is selected as
the splitting attribute. Node N is labeled with age, and branches are grown
for each of the attribute’s values.
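The numbers above can be checked with a short calculation. The yes/no counts assumed here (9/5 overall; 2/3, 4/0, and 3/2 across the three age partitions) follow the standard buys_computer example:

```python
import math

def info(counts):
    """Info(D) = -sum(p_i * log2(p_i)) over the class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

info_d = info([9, 5])                   # expected information for D
partitions = [[2, 3], [4, 0], [3, 2]]   # age = youth / middle_aged / senior
n = sum(sum(p) for p in partitions)
info_age = sum(sum(p) / n * info(p) for p in partitions)
gain_age = info_d - info_age

print(f"{info_d:.3f}")    # 0.940
print(f"{info_age:.3f}")  # 0.694
print(f"{gain_age:.3f}")  # ~0.246 (the slides subtract the rounded values)
```
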
25. • Similarly, within the age = youth partition we can compute:
Gain(student) = 0.971
Gain(credit_rating) = 0.021
Student has the highest information gain among these attributes, so it is
selected as the splitting attribute for that branch.
28. Gain Ratio
• The information gain measure is biased toward tests with many outcomes.
• It prefers to select attributes having a large number of values.
• For example, consider an attribute that acts as a unique identifier, such as
product_ID. A split on product_ID would result in a large number of
partitions (as many as there are values), each one containing just one tuple.
• Because each partition is pure, the information required to classify data set
D based on this partitioning would be Info_product_ID(D) = 0.
• Therefore, the information gained by partitioning on this attribute is
maximal.
• Yet such a partitioning is useless for classification.
29. • C4.5, a successor of ID3, uses an extension to information gain known as
the gain ratio.
• It applies a kind of normalization to information gain using a “split
information” value defined analogously with Info(D) as
SplitInfoA(D) = -Σj (|Dj|/|D|) log2(|Dj|/|D|)
• The gain ratio is defined as GainRatio(A) = Gain(A) / SplitInfoA(D), and the
attribute with the maximum gain ratio is selected as the splitting attribute.
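The gain-ratio normalization can be sketched as follows; the partition sizes for income (4 low, 6 medium, 4 high) come from the example database, and Gain(income) = 0.029 bits is taken from slide 21:

```python
from math import log2

def split_info(sizes):
    """SplitInfoA(D) = -sum(|Dj|/|D| * log2(|Dj|/|D|)) over A's partitions."""
    total = sum(sizes)
    return -sum((n / total) * log2(n / total) for n in sizes if n)

# income splits the 14 tuples into |low| = 4, |medium| = 6, |high| = 4
si_income = split_info([4, 6, 4])       # ≈ 1.557
gain_ratio_income = 0.029 / si_income   # Gain(income) / SplitInfo_income(D) ≈ 0.019
```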
30. Gini Index
• The Gini index measures the impurity of D, a data partition or set of training
tuples, as
Gini(D) = 1 - Σi pi²
• Pi --> Probability that a tuple in D belongs to class Ci, estimated by
|Ci,D|/|D|
--> If a binary split on A partitions D into D1 and D2, the Gini index of D
given that partitioning is
GiniA(D) = (|D1|/|D|) Gini(D1) + (|D2|/|D|) Gini(D2)
31. • The reduction in impurity that would be incurred by a binary split on a
discrete- or continuous-valued attribute A is
ΔGini(A) = Gini(D) - GiniA(D)
• The attribute that maximizes the reduction in impurity (or, equivalently, has
the minimum Gini index) is selected as the splitting attribute. This attribute
and either its splitting subset (for a discrete-valued splitting attribute) or
split-point (for a continuous-valued splitting attribute) together form the
splitting criterion.
• Example :- Induction of a decision tree using the Gini index. Let D be the
training data shown earlier in the customers table, where 9 tuples belong to
the class buys_computer = yes and the remaining 5 tuples belong to the class
buys_computer = no. A (root) node N is created for the tuples in D. We first
use Eq. (8.7) for the Gini index to compute the impurity of D:
Gini(D) = 1 - (9/14)² - (5/14)² = 0.459
32. • Solution :-
• To find the splitting criterion for the tuples in D, we need to compute the
Gini index for each attribute.
• Let’s start with the attribute income and consider each of the possible
splitting subsets.
• Consider the subset {low, medium}. This would result in 10 tuples in
partition D1 satisfying the condition “income Є {low, medium}.” The
remaining 4 tuples of D would be assigned to partition D2. The Gini index
value computed based on this partitioning is
GiniincomeЄ{low,medium}(D) = (10/14) Gini(D1) + (4/14) Gini(D2) = 0.443
34. • Similarly, we evaluate the remaining attributes: age, student, and
credit_rating.
• The binary split “age Є {youth, senior}” gives the minimum Gini index
overall, with a reduction in impurity of
ΔGini = 0.459 - 0.357 = 0.102
• This split yields the maximum reduction in the impurity of the tuples in D
and is returned as the splitting criterion. Node N is labeled with the
criterion, two branches are grown from it, and the tuples are partitioned
accordingly.
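A small sketch of the Gini computations in this example; the label lists are my own encoding of the 9-yes/5-no data:

```python
def gini(labels):
    """Gini(D) = 1 - sum(pi^2) over the classes in the partition."""
    total = len(labels)
    counts = {}
    for l in labels:
        counts[l] = counts.get(l, 0) + 1
    return 1 - sum((n / total) ** 2 for n in counts.values())

def gini_split(left, right):
    """Gini index of D under a binary split into partitions D1 and D2."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)

D = ["yes"] * 9 + ["no"] * 5
g_d = gini(D)                                          # ≈ 0.459
# age Є {youth, senior} puts 10 tuples (5 yes, 5 no) in D1 and 4 (all yes) in D2
g_age = gini_split(["yes"] * 5 + ["no"] * 5, ["yes"] * 4)  # ≈ 0.357
reduction = g_d - g_age                                # ≈ 0.102
```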
35. Tree Pruning
• In some data sets (i.e., large data sets), the induced decision tree has many
branches that reflect anomalies due to noise and outliers.
• Tree pruning methods address this problem of overfitting the data.
• They typically use statistical measures to remove the least reliable
branches.
• “How does tree pruning work?” There are two common approaches to tree
pruning:
1. Prepruning
2. Postpruning
1. Prepruning :-
In the prepruning approach, a tree is pruned by halting its construction early.
Example : by deciding not to further split or partition the subset of
training tuples at a given node.
36. The leaf may hold the most frequent class among the subset tuples or
the probability distribution of those tuples.
2. Postpruning :-
The postpruning approach removes subtrees from a “fully grown” tree.
A subtree at a given node is pruned by removing its branches and
replacing it with a leaf.
The leaf is labeled with the most frequent class among the subtree
being replaced.
Example :- An unpruned decision tree and a pruned version of it.
38. • Decision trees can suffer from
– repetition
– replication
• Repetition :-
Repetition occurs when an attribute is repeatedly tested along a given
branch of the tree (e.g., “age < 60?,” followed by “age < 45?,” and so on).
• Replication :-
In replication, duplicate subtrees exist within the tree. These situations
can impede the accuracy and comprehensibility of a decision tree. (e.g., the
subtree headed by the node “credit rating?”).
Fig : (a) Repetition and (b) Replication
40. Bayes Classification Methods
• Bayesian classifiers are statistical classifiers.
• They can predict class membership probabilities such as the probability that
a given tuple belongs to a particular class.
• Bayesian classification is based on Bayes’ theorem.
• Studies comparing classification algorithms have found a simple Bayesian
classifier known as the naive Bayesian classifier to be comparable in
performance with decision tree classifiers.
• Bayesian classifiers have also exhibited high accuracy and speed when
applied to large databases.
41. Bayes’ Theorem
• Let X be a data tuple.
• In Bayesian terms, X is considered “evidence.”
• Let H be some hypothesis such as that the data tuple X belongs to a
specified class C.
• For classification problems, we want to determine P(H/X) from P(H),
P(X/H), and P(X).
• P(H/X) is the posterior probability, or a posteriori probability, of
H conditioned on X.
• Bayes’ theorem gives P(H/X) = P(X/H) P(H) / P(X).
42. Example :- Suppose our world of data tuples is confined to customers described
by the attributes age and income, and that X is a 35-year-old customer
with an income of $40,000. Suppose that H is the hypothesis that our customer
will buy a computer. Then P(H/X) reflects the probability that customer X will
buy a computer given that we know the customer’s age and income.
P(H) --> prior probability, or a priori probability, of H.
Example :- The probability that any given customer will buy a computer,
regardless of age, income, or any other information.
P(X) --> prior probability, or a priori probability of X.
Example :- The probability that a person from our set of customers is 35 years
old and earns $40,000.
43. Naive Bayesian Classification
• The naive Bayesian classifier, or simple Bayesian classifier, works as follows:
Step 1 :- Let D be a training set of tuples and their associated class labels. As
usual, each tuple is represented by an n-dimensional attribute vector, X = (x1,
x2, : : : , xn), depicting n measurements made on the tuple from n attributes,
respectively, A1, A2, : : : , An.
Step 2 :- Suppose that there are m classes, C1, C2, : : : , Cm. Given a tuple, X,
the classifier will predict that X belongs to the class having the highest
posterior probability, conditioned on X. That is, the naive Bayesian classifier
predicts that tuple X belongs to the class Ci if and only if
P(Ci/X) > P(Cj/X) for 1 ≤ j ≤ m, j ≠ i.
44. • Thus, we maximize P(Ci/X). The class Ci for which P(Ci/X) is maximized is
called the maximum posteriori hypothesis. By Bayes’ theorem,
P(Ci/X) = P(X/Ci) P(Ci) / P(X).
• Step 3 :- As P(X) is constant for all classes, only P(X/Ci) P(Ci) needs to be
maximized.
If the class prior probabilities are not known, then it is commonly
assumed that the classes are equally likely, that is, P(C1) = P(C2) = ........ =
P(Cm), and we would therefore maximize P(X/Ci). Otherwise, we maximize
P(X/Ci) P(Ci). Note that the class prior probabilities may be estimated by
P(Ci) = |Ci, D|/|D|, where |Ci,D| is the number of training tuples of class Ci
in D.
45. • Step 4 :- Given data sets with many attributes, it would be extremely
computationally expensive to compute P(X/Ci). To reduce computation in
evaluating P(X/Ci), the naive assumption of class-conditional independence
is made. This presumes that the attributes’ values are conditionally
independent of one another, given the class label of the tuple (i.e., that there
are no dependence relationships among the attributes). Thus,
P(X/Ci) = Π P(xk/Ci) = P(x1/Ci) × P(x2/Ci) × ... × P(xn/Ci)
where xk refers to the value of attribute Ak for tuple X.
• To compute P(X/Ci), consider the following:
a) If Ak is categorical, then P(xk/Ci) is the number of tuples of class Ci in
D having the value xk for Ak, divided by |Ci,D|, the number of tuples of
class Ci in D.
46. b) If Ak is continuous-valued, then we need to do a bit more work, but the
calculation is straightforward. A continuous-valued attribute is
typically assumed to have a Gaussian distribution with a mean µ and
standard deviation σ, defined by
g(x, µ, σ) = (1 / (√(2π) σ)) e^(-(x-µ)² / (2σ²)), so that P(xk/Ci) = g(xk, µCi, σCi)
where µCi is the mean (average) and σCi the standard deviation of the values
of attribute Ak for the training tuples of class Ci.
• Step 5 :- To predict the class label of X, P(X/Ci) P(Ci) is evaluated for each
class Ci. The classifier predicts that the class label of tuple X is the class Ci
if and only if
P(X/Ci) P(Ci) > P(X/Cj) P(Cj) for 1 ≤ j ≤ m, j ≠ i.
• In other words, the predicted class label is the class Ci for which
P(X/Ci) P(Ci) is the maximum.
47. Example :-
Predicting a class label using naive Bayesian classification. We wish to
predict the class label of a tuple using naive Bayesian classification, given the
same training data as in customer dataset for decision tree induction. The
training data were shown earlier in Table customer dataset. The data tuples
are described by the attributes age, income, student, and credit rating. The
class label attribute, buys computer, has two distinct values (namely, {yes,
no}). Let C1 correspond to the class buys computer = yes and C2 correspond
to buys computer = no. The tuple we wish to classify is
X = (age = youth, income = medium, student = yes, credit-rating = fair)
We need to maximize P(X/Ci) P(Ci), for i = 1, 2. P(Ci), the prior probability
of each class, can be computed based on the training tuples:
P(C1) = P(buys_computer = yes) = 9/14 = 0.643
P(C2) = P(buys_computer = no) = 5/14 = 0.357
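The whole worked example can be sketched in Python. The 14-tuple table below is a reconstruction of the standard training set the slides refer to, and `naive_bayes` is a hypothetical helper name:

```python
from collections import Counter

# (age, income, student, credit_rating, buys_computer)
data = [
    ("youth",  "high",   "no",  "fair",      "no"),
    ("youth",  "high",   "no",  "excellent", "no"),
    ("middle", "high",   "no",  "fair",      "yes"),
    ("senior", "medium", "no",  "fair",      "yes"),
    ("senior", "low",    "yes", "fair",      "yes"),
    ("senior", "low",    "yes", "excellent", "no"),
    ("middle", "low",    "yes", "excellent", "yes"),
    ("youth",  "medium", "no",  "fair",      "no"),
    ("youth",  "low",    "yes", "fair",      "yes"),
    ("senior", "medium", "yes", "fair",      "yes"),
    ("youth",  "medium", "yes", "excellent", "yes"),
    ("middle", "medium", "no",  "excellent", "yes"),
    ("middle", "high",   "yes", "fair",      "yes"),
    ("senior", "medium", "no",  "excellent", "no"),
]

def naive_bayes(x):
    """Pick the class Ci maximizing P(X/Ci) P(Ci) under class-conditional independence."""
    labels = [row[-1] for row in data]
    scores = {}
    for c, count in Counter(labels).items():
        rows_c = [row for row in data if row[-1] == c]
        p = count / len(data)                      # prior P(Ci) = |Ci,D| / |D|
        for k, xk in enumerate(x):                 # multiply the P(xk/Ci) estimates
            p *= sum(1 for row in rows_c if row[k] == xk) / count
        scores[c] = p
    return max(scores, key=scores.get), scores

X = ("youth", "medium", "yes", "fair")
pred, scores = naive_bayes(X)   # pred == "yes"; scores["yes"] ≈ 0.028, scores["no"] ≈ 0.007
```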
50. Regression
• Regression is a data mining function that predicts a number. Age, weight,
distance, temperature, income, or sales could all be predicted using
regression techniques. For example, a regression model could be used to
predict children's height, given their age, weight, and other factors.
• A regression task begins with a data set in which the target values are
known.
• For example, a regression model that predicts children's height could be
developed based on observed data for many children over a period of time.
The data might track age, height, weight, developmental milestones, family
history, and so on. Height would be the target, the other attributes would be
the predictors, and the data for each child would constitute a case.
• Regression models are tested by computing various statistics that measure
the difference between the predicted values and the expected values.
51. • Regression is for predicting a numeric attribute. Regression analysis can
be used to model the relationship between one or more independent (predictor)
variables and a dependent (response) variable.
• Two types of Regression
1. Linear Regression
2. Multiple Regression
1. Linear Regression :-
Simple linear regression is a method that enables you to determine the
relationship between a continuous process output (Y) and one factor (X).
The relationship is typically expressed in terms of a mathematical equation
such as Y = b + mX
Here,
Y -- response, b -- intercept constant, m -- slope constant, X -- predictor variable.
55. Fig for Linear Regression :-
2. Multiple Regression :-
Y = b0 + b1X1 + b2X2 + .... + bkXk + e
where Y is the dependent variable (response), X1, X2, ..., Xk are the
independent variables (predictors), and e is the random error. b0, b1, b2, ...., bk
are known as the regression coefficients, which have to be estimated from the
data.
The multiple linear regression algorithm in XLMiner chooses the regression
coefficients so as to minimize the sum of squared differences between the
predicted values and the actual values.
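A least-squares sketch for the simple case Y = b + mX; the age/height observations below are hypothetical illustration data, not taken from the slides:

```python
def fit_line(xs, ys):
    """Least-squares estimates for Y = b + mX:
    m = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2), b = y_bar - m * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - m * x_bar, m

# hypothetical observations: children's age (years) vs. height (cm)
ages    = [4, 6, 8, 10, 12]
heights = [102, 115, 128, 139, 149]
b, m = fit_line(ages, heights)   # b = 79.4, m = 5.9
predicted = b + m * 9            # height estimate for a 9-year-old: 132.5 cm
```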
57. Model Evaluation and Selection
• By this point you may have built a classification model. The questions that
arise are which classifier is best and how accurate it is.
• For example, suppose you used data from previous sales to build a classifier
to predict customer purchasing behavior. You would like an estimate of how
accurately the classifier can predict the purchasing behavior of future
customers, that is, future customer data on which the classifier has not been
trained.
• But what is accuracy? How can we estimate it? Are some measures of a
classifier’s accuracy more appropriate than others? How can we obtain a
reliable accuracy estimate?
58. • This section describes various evaluation metrics for the predictive accuracy of a
classifier.
– Holdout and random subsampling
– cross-validation and bootstrap methods
• These are common techniques for assessing accuracy, based on randomly
sampled partitions of the given data.
• What if we have more than one classifier and want to choose the “best” one?
This is referred to as model selection.
• The last two sections address this issue and discuss how to use tests of
statistical significance to assess whether the difference in accuracy between two
classifiers is due to chance.
• Techniques to Improve Classification Accuracy presents how to compare
classifiers based on cost–benefit and receiver operating characteristic (ROC)
curves.
59. Metrics for Evaluating Classifier Performance
• This section presents measures for assessing how good or how “accurate”
your classifier is at predicting the class label of tuples.
• We will consider the case where the class tuples are more or less evenly
distributed, as well as the case where classes are unbalanced.
• They include accuracy (also known as recognition rate), sensitivity (or
recall), specificity, precision, F1, and Fβ.
• Note that although accuracy is a specific measure, the word “accuracy” is
also used as a general term to refer to a classifier’s predictive abilities.
• Before we discuss the various measures, we need to become comfortable
with some terminology. Recall that we can talk in terms of positive tuples
(tuples of the main class of interest) and negative tuples (all other tuples).
• Given two classes, for example, the positive tuples may be buys computer =
yes while the negative tuples are buys computer = no.
60. • Suppose we use our classifier on a test set of labeled tuples. P is the number of positive
tuples and N is the number of negative tuples. For each tuple, we compare the
classifier’s class label prediction with the tuple’s known class label.
• There are four additional terms we need to know that are the “building blocks” used in
computing many evaluation measures. Understanding them will make it easy to grasp
the meaning of the various measures.
– True positives (TP): These refer to the positive tuples that were correctly labeled by
the classifier. Let TP be the number of true positives.
– True negatives (TN): These are the negative tuples that were correctly labeled by
the classifier. Let TN be the number of true negatives.
– False positives (FP): These are the negative tuples that were incorrectly labeled as
positive (e.g., tuples of class buys computer = no for which the classifier predicted
buys computer = yes). Let FP be the number of false positives.
– False negatives (FN): These are the positive tuples that were mislabeled as negative
(e.g., tuples of class buys computer = yes for which the classifier predicted buys
computer = no). Let FN be the number of false negatives.
61. Confusion Matrix
• These terms are summarized in the confusion matrix,
• The confusion matrix is a useful tool for analyzing how well your classifier
can recognize tuples of different classes. TP and TN tell us when the
classifier is getting things right, while FP and FN tell us when the classifier
is getting things wrong.
62. Confusion matrix layout:
                  Predicted yes   Predicted no   Total
   Actual yes     TP              FN             P
   Actual no      FP              TN             N
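The four counts combine into the named measures as follows; the test-set counts used here are hypothetical:

```python
def evaluate(tp, tn, fp, fn):
    """Common measures derived from the confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)  # recognition rate
    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)                   # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, specificity, f1

# hypothetical test set: P = 100 positive tuples, N = 100 negative tuples
acc, prec, rec, spec, f1 = evaluate(tp=90, tn=80, fp=20, fn=10)
# acc = 0.85, rec = 0.9, spec = 0.8
```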
63. • In addition to accuracy-based measures, classifiers can also be compared
with respect to the following additional aspects:
– Speed: This refers to the computational costs involved in generating and
using the given classifier.
– Robustness: This is the ability of the classifier to make correct
predictions given noisy data or data with missing values. Robustness is
typically assessed with a series of synthetic data sets representing
increasing degrees of noise and missing values.
– Scalability: This refers to the ability to construct the classifier efficiently
given large amounts of data. Scalability is typically assessed with a series
of data sets of increasing size.
– Interpretability: Interpretability is subjective and therefore more difficult
to assess. Decision trees and classification rules can be easy to interpret,
yet their interpretability may diminish the more they become complex.
64. Holdout Method and Random Subsampling
Holdout Method
• The holdout method is what we have alluded to so far in our discussions
about accuracy.
• In this method, the given data are randomly partitioned into two
independent sets, a training set and a test set.
• Typically, two-thirds of the data are allocated to the training set, and the
remaining one-third is allocated to the test set.
Random subsampling
• Random subsampling is a variation of the holdout method in which the
holdout method is repeated k times.
• The overall accuracy estimate is taken as the average of the accuracies
obtained from each iteration.
65. Cross-Validation
• In k-fold cross-validation, the initial data are randomly partitioned into k
mutually exclusive subsets or “folds,” D1, D2, : : : , Dk, each of
approximately equal size.
• Training and testing are performed k times. In iteration i, partition Di is
reserved as the test set, and the remaining partitions are collectively used to
train the model.
• In the first iteration, subsets D2, : : : , Dk collectively serve as the training set
to obtain a first model, which is tested on D1;
• The second iteration is trained on subsets D1, D3, : : : , Dk and tested on D2;
and so on.
• For classification, the accuracy estimate is the overall number of correct
classifications from the k iterations, divided by the total number of tuples in
the initial data.
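The fold construction can be sketched as follows; round-robin assignment is one simple way to get near-equal, mutually exclusive folds:

```python
def k_fold_indices(n, k):
    """Partition tuple indices 0..n-1 into k mutually exclusive folds D1..Dk."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)   # round-robin keeps fold sizes within 1 of each other
    return folds

folds = k_fold_indices(14, 5)
for i, test_fold in enumerate(folds):
    train = [j for f in folds if f is not test_fold for j in f]
    # iteration i: train a model on `train`, then test it on `test_fold`
```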
66. Bootstrap
• The bootstrap method samples the given training tuples uniformly with
replacement.
• Each time a tuple is selected, it is equally likely to be selected again and re-added
to the training set.
• For instance, imagine a machine that randomly selects tuples for our training set.
In sampling with replacement, the machine is allowed to select the same tuple
more than once.
• There are several bootstrap methods. A commonly used one is the .632 bootstrap,
which works as follows.
• Suppose we are given a data set of d tuples. The data set is sampled d times, with
replacement, resulting in a bootstrap sample or training set of d samples.
• It is very likely that some of the original data tuples will occur more than
once in this sample.
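A quick simulation of sampling with replacement illustrates the .632 figure; the seed and data-set size are arbitrary choices:

```python
import random

random.seed(42)                  # fixed seed so the run is reproducible
d = 1000                         # number of tuples in the data set
sample = [random.randrange(d) for _ in range(d)]   # sample d times with replacement
unique_fraction = len(set(sample)) / d
# In expectation, 1 - (1 - 1/d)^d ≈ 63.2% of the original tuples appear in the
# bootstrap sample; the remaining ~36.8% can serve as the test set.
```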
67. • ROC Curve Refer text book page no 375
Bagging
• Bagging is a method of increasing accuracy. Here is how it works.
• Suppose that you are a patient and would like to have a diagnosis made
based on your symptoms. Instead of asking one doctor, you may choose to
ask several. If a certain diagnosis occurs more than any other, you may
choose this as the final or best diagnosis.
• That is, the final diagnosis is made based on a majority vote, where each
doctor gets an equal vote. Now replace each doctor by a classifier, and you
have the basic idea behind bagging. Intuitively, a majority vote made by a
large group of doctors may be more reliable than a majority vote made by a
small group.
• Given a set, D, of d tuples, bagging works as follows.
68. Bagging Works
• For iteration i (i =1, 2, : : : , k), a training set, Di , of d tuples is sampled
with replacement from the original set of tuples, D. Note that the term
bagging stands for bootstrap aggregation. Each training set is a bootstrap
sample, as described in the bootstrap topic.
• Because sampling with replacement is used, some of the original tuples of D
may not be included in Di , whereas others may occur more than once. A
classifier model, Mi , is learned for each training set, Di .
• To classify an unknown tuple, X, each classifier, Mi , returns its class
prediction, which counts as one vote. The bagged classifier, M*, counts the
votes and assigns the class with the most votes to X. Bagging can be applied
to the prediction of continuous values by taking the average value of each
prediction for a given test tuple.
69. Bagging Algorithm
• Algorithm: Bagging. The bagging algorithm—create an ensemble of classification
models for a learning scheme where each model gives an equally weighted
prediction.
• Input:
– D, a set of d training tuples;
– k, the number of models in the ensemble;
– a classification learning scheme (decision tree algorithm, naïve Bayesian,
etc.).
• Output: The ensemble, a composite model, M*.
• Method:
– (1) for i = 1 to k do // create k models:
– (2) create bootstrap sample, Di , by sampling D with replacement;
– (3) use Di and the learning scheme to derive a model, Mi ;
– (4) endfor
• To use the ensemble to classify a tuple, X:
• let each of the k models classify X and return the majority vote;
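The bagging procedure above can be sketched end to end; the threshold “stump” base learner and the toy 1-D data are illustrative stand-ins for a real classifier and training set:

```python
import random

def train_stump(sample):
    """Tiny base learner: a one-attribute threshold test (a decision 'stump')."""
    best = None
    for t, _ in sample:
        for sign in (1, -1):
            errors = sum(1 for x, y in sample
                         if (sign if x >= t else -sign) != y)
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    _, t, sign = best
    return lambda x: sign if x >= t else -sign

def bagging(D, k, seed=0):
    """Learn k models Mi on bootstrap samples Di; M* classifies by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(k):
        Di = [rng.choice(D) for _ in D]      # bootstrap sample of size |D|
        models.append(train_stump(Di))
    return lambda x: 1 if sum(m(x) for m in models) >= 0 else -1

# toy 1-D data: class +1 whenever x >= 5
D = [(x, 1 if x >= 5 else -1) for x in range(10)]
M_star = bagging(D, k=11)
```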
70. Random Forests
• Another ensemble method is called random forests. Imagine that each of the
classifiers in the ensemble is a decision tree classifier so that the collection of
classifiers is a “forest.”
• The individual decision trees are generated using a random selection of attributes at
each node to determine the split.
• Each tree depends on the values of a random vector sampled independently and with
the same distribution for all trees in the forest. During classification, each tree votes
and the most popular class is returned.
• Random forests can be built using bagging in tandem with random attribute
selection.
• The trees are grown to maximum size and are not pruned. Random forests formed
this way, with random input selection, are called Forest-RI.
• Another form of random forest, called Forest-RC, uses random linear combinations
of the input attributes. Instead of randomly selecting a subset of the attributes, it
creates new attributes (or features) that are a linear combination of the existing
attributes.
71. • Random forests are comparable in accuracy to AdaBoost, yet are more
robust to errors and outliers.
• The generalization error for a forest converges as long as the number of
trees in the forest is large. Thus, overfitting is not a problem.
• The accuracy of a random forest depends on the strength of the individual
classifiers and a measure of the dependence between them.
• Random forests are insensitive to the number of attributes selected for
consideration at each split.
• Random forests give internal estimates of variable importance.
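The random attribute selection at each node (the Forest-RI idea) can be sketched as follows; `choose_split` and the toy node data are hypothetical illustrations:

```python
import random
from math import log2
from collections import Counter

def info(labels):
    """Entropy of the class labels at a node."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def choose_split(rows, labels, rng, F):
    """Forest-RI idea: at each node, evaluate only a random subset of F attributes
    and split on the best of those (here, judged by information gain)."""
    candidates = rng.sample(range(len(rows[0])), F)
    def gain(a):
        parts = {}
        for row, label in zip(rows, labels):
            parts.setdefault(row[a], []).append(label)
        return info(labels) - sum(len(p) / len(labels) * info(p)
                                  for p in parts.values())
    return max(candidates, key=gain)

# toy node data: attribute 1 predicts the class perfectly, attribute 0 does not
rows   = [(0, "a"), (1, "a"), (0, "b"), (1, "b")]
labels = ["yes", "yes", "no", "no"]
best = choose_split(rows, labels, random.Random(0), F=2)   # -> 1
```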