This document discusses classification, a type of supervised machine learning in which algorithms predict categorical class labels. Classification is a two-step process: 1) model construction, using a training dataset to derive rules or formulas, and 2) model usage, applying the model to classify new data. Common applications include credit approval, target marketing, medical diagnosis, and treatment effectiveness analysis. The document also covers Bayesian classification, which uses probability distributions over class labels to classify new data instances based on their attribute values and the associated probabilities.
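The Bayesian idea can be made concrete with a minimal naive Bayes sketch over categorical attributes. The credit-style toy data, attribute names, and smoothing choice below are illustrative, not taken from the document:

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Estimate P(class) and per-attribute value counts per class."""
    class_counts = Counter(labels)
    # cond[cls][i][value] = count of attribute i taking `value` within class cls
    cond = defaultdict(lambda: defaultdict(Counter))
    for row, cls in zip(rows, labels):
        for i, value in enumerate(row):
            cond[cls][i][value] += 1
    return class_counts, cond

def predict(row, class_counts, cond):
    """Pick the class maximizing P(class) * prod P(value | class).
    Add-one smoothing uses the per-class value count as a stand-in
    for the true vocabulary size (a simplification for this sketch)."""
    total = sum(class_counts.values())
    best_cls, best_score = None, -1.0
    for cls, n in class_counts.items():
        score = n / total
        for i, value in enumerate(row):
            score *= (cond[cls][i][value] + 1) / (n + len(cond[cls][i]) + 1)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

# Toy credit-approval data: (income, history) -> approve?
rows = [("high", "good"), ("high", "bad"), ("low", "good"), ("low", "bad")]
labels = ["yes", "yes", "yes", "no"]
counts, cond = train_naive_bayes(rows, labels)
print(predict(("high", "good"), counts, cond))
```

The conditional-independence assumption (each attribute contributes a separate probability factor) is what keeps training a matter of simple counting.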
This is a presentation about Gradient Boosted Trees. It starts from the basics of data mining, builds up to ensemble methods such as bagging and boosting, and then develops Gradient Boosted Trees.
Random forests are an ensemble learning method that constructs multiple decision trees during training and outputs the class that is the mode of the classes of the individual trees. It improves upon decision trees by reducing variance. The algorithm works by:
1) Randomly sampling cases and variables to grow each tree.
2) Splitting nodes using the Gini index or information gain on the randomly selected variables.
3) Growing each tree fully without pruning.
4) Aggregating the predictions of all trees using a majority vote. This reduces variance compared to a single decision tree.
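The four steps above can be sketched in miniature. Here each "tree" is reduced to a one-split stump so that the bootstrap-plus-random-features-plus-voting structure stays visible; the toy data, tree count, and feature-subset size are all illustrative:

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a label multiset."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def train_stump(rows, labels, feature_ids):
    """Pick the split (over the given feature subset) with the lowest
    weighted Gini impurity; store the majority label on each side."""
    best = None
    for f in feature_ids:
        for t in set(r[f] for r in rows):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # degenerate sample: fall back to the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return lambda r: majority
    _, f, t, left_label, right_label = best
    return lambda r: left_label if r[f] <= t else right_label

def random_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    trees = []
    for _ in range(n_trees):
        # 1) bootstrap-sample the cases and a random subset of the variables
        idx = [rng.randrange(len(rows)) for _ in rows]
        feats = rng.sample(range(n_features), max(1, n_features // 2))
        trees.append(train_stump([rows[i] for i in idx],
                                 [labels[i] for i in idx], feats))
    # 4) aggregate with a majority vote over all trees
    return lambda r: Counter(t(r) for t in trees).most_common(1)[0][0]

# Toy data: class is 1 when the first feature exceeds 5 (second is redundant)
rows = [(x, 10 - x) for x in range(10)]
labels = [int(x > 5) for x, _ in rows]
forest = random_forest(rows, labels)
print(forest((9, 1)), forest((1, 9)))
```

Real random forests grow full multi-level trees (step 3, no pruning); the stump stands in for that to keep the sketch short.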
The document discusses K-means clustering, an unsupervised machine learning algorithm that partitions observations into k clusters defined by centroids. It compares clustering to classification, noting clustering does not use training data and maps observations into natural groupings. The K-means algorithm is then explained, with the steps of initializing centroids, assigning observations to the closest centroid, revising centroids as cluster means, and repeating until convergence. Applications of clustering in business contexts like banking, retail, and insurance are also briefly mentioned.
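The centroid loop described here — initialize, assign to the closest centroid, recompute centroids as cluster means, repeat until convergence — can be sketched as a minimal Lloyd's-algorithm implementation on 2-D points (the toy points are illustrative):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: assign points to the nearest centroid,
    then move each centroid to its cluster mean, until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # initialize centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                           # assign to closest centroid
            j = min(range(k), key=lambda j: (p[0] - centroids[j][0]) ** 2
                                            + (p[1] - centroids[j][1]) ** 2)
            clusters[j].append(p)
        # revise each centroid as the mean of its cluster
        new = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
               if c else centroids[j] for j, c in enumerate(clusters)]
        if new == centroids:                       # converged
            break
        centroids = new
    return centroids

# Two well-separated toy groups
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(kmeans(pts, 2)))
```

Note that, unlike classification, no labels appear anywhere: the grouping emerges purely from distances.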
This document discusses classification and prediction. Classification predicts categorical class labels by classifying data based on a training set and class labels. Prediction models continuous values and predicts unknown values. Some applications are credit approval, marketing, medical diagnosis, and treatment analysis. Classification involves a learning step to describe classes and a classification step to classify new data. Prediction involves estimating accuracy by comparing test results to known labels. Issues with classification and prediction include data preparation, comparing methods, and decision tree induction algorithms.
Meta-learning, also known as learning to learn, is a subset of machine learning that aims to improve the performance of learning algorithms. It does this by using the outputs and metadata from machine learning algorithms as input to optimize aspects of the learning process. This allows meta-learning algorithms to learn which machine learning algorithms work best for certain datasets and prediction tasks. They can then help reduce the number of experiments needed to find high performing models and build models that generalize well from only a few examples.
New! Neo4j AuraDS: The Fastest Way to Get Started with Data Science in the Cloud (Neo4j)
The document discusses Neo4j's new Graph Data Science as a Service (GDSaaS) product called AuraDS. AuraDS provides full access to Neo4j's Graph Data Science platform and algorithms in a fully managed cloud service, allowing users to focus on analytics instead of database administration. It introduces the key capabilities and integration options available through AuraDS.
Introduction to Machine Learning: Machine learning (ML) is a type of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict new output values.
This document discusses data preprocessing techniques for data mining. It explains that real-world data is often dirty, containing issues like missing values, noise, and inconsistencies. Major tasks in data preprocessing include data cleaning, integration, transformation, reduction, and discretization. Data cleaning techniques are especially important and involve filling in missing values, identifying and handling outliers, resolving inconsistencies, and reducing redundancy from data integration. Other techniques discussed include binning data for smoothing noisy values and handling missing data through various imputation methods.
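Two of the cleaning techniques mentioned, mean imputation for missing values and equal-frequency binning for smoothing noisy values, can be sketched as follows (the sample values are illustrative):

```python
from statistics import mean

def impute_mean(values):
    """Fill missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Equal-frequency binning: sort the values, split into fixed-size bins,
    and replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for i in range(0, len(ordered), bin_size):
        bin_ = ordered[i:i + bin_size]
        smoothed.extend([mean(bin_)] * len(bin_))
    return smoothed

print(impute_mean([4, None, 8]))
print(smooth_by_bin_means([4, 8, 9, 15, 21, 21, 24, 25, 26], 3))
```

Smoothing by bin means trades fine-grained detail for robustness to noise; other variants replace values by bin medians or bin boundaries instead.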
Integrating Clickstream Data into Solr for Ranking and Dynamic Facet Optimiza... (Lucidworks)
This document discusses using clickstream data to improve search ranking and facet optimization in Solr. It describes compiling relevance feedback from click data, indexing click signals into Solr to boost document ranking, and reordering facets based on engagement data learned from the clickstream. The approach led to improvements in key metrics like CTR, conversion rates, and sales. Ongoing work includes learning-to-rank models and discovering relevant facets for queries to better guide users' product discovery.
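The facet-reordering idea can be illustrated outside Solr with a minimal sketch: rank facet values by a smoothed click-through rate computed from clickstream counts. The facet names, counts, and prior below are illustrative, not the actual Lucidworks implementation:

```python
def reorder_facets(facet_values, clicks, impressions):
    """Order facet values by observed click-through rate, with a small
    additive prior so rarely shown values are not ranked purely on luck."""
    def ctr(v):
        return (clicks.get(v, 0) + 1) / (impressions.get(v, 0) + 10)
    return sorted(facet_values, key=ctr, reverse=True)

facets = ["brand:A", "brand:B", "brand:C"]
clicks = {"brand:A": 5, "brand:B": 40, "brand:C": 2}
impressions = {"brand:A": 100, "brand:B": 200, "brand:C": 10}
print(reorder_facets(facets, clicks, impressions))
```

The prior matters: "brand:C" has a high raw CTR (2/10) from very few impressions, and the smoothing keeps it from outranking the heavily validated "brand:B".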
The document describes the C4.5 algorithm for building decision trees. It begins with an overview of decision trees and the goals of minimizing tree levels and nodes. It then outlines the steps of the C4.5 algorithm: 1) Choose the attribute that best differentiates training instances, 2) Create a tree node for that attribute and child nodes for each value, 3) Recursively create subordinate nodes until reaching criteria or no remaining attributes. An example applies these steps to build a decision tree to predict customers' responses to a life insurance promotion using attributes like age, income and insurance status.
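Step 1, choosing the attribute that best differentiates the training instances, is where C4.5 applies its gain-ratio criterion: information gain normalized by the entropy of the split itself. A minimal sketch, with toy promotion-style data and attribute names that are illustrative:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """C4.5's criterion: information gain divided by the split information,
    so attributes with many values are not unduly favored."""
    n = len(labels)
    partitions = {}
    for r, y in zip(rows, labels):
        partitions.setdefault(r[attr], []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info else 0.0

# Toy promotion data: (age_group, income) -> responded?
rows = [("young", "low"), ("young", "high"), ("old", "low"), ("old", "high")]
labels = ["no", "no", "yes", "yes"]
best = max(range(2), key=lambda a: gain_ratio(rows, labels, a))
print(best)
```

Here age separates the responses perfectly while income carries no information, so the algorithm would place age at the root (step 2) and recurse on its children (step 3).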
Slides explaining the distinction between bagging and boosting in terms of the bias-variance trade-off, followed by some lesser-known aspects of supervised learning: the effect of the tree-split metric on feature importance, the effect of the decision threshold on classification accuracy, and how to adjust a model's threshold for classification.
Note: the limitations of the accuracy metric (baseline accuracy), alternative metrics, their use cases, and their advantages and limitations are also briefly discussed.
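The threshold effect can be demonstrated with a small sketch: the same model scores yield different confusion-matrix counts, accuracy, and recall as the decision threshold moves (the scores and labels below are illustrative):

```python
def confusion(probs, labels, threshold):
    """TP/FP/FN/TN counts when scores at or above `threshold` are called positive."""
    tp = sum(p >= threshold and y == 1 for p, y in zip(probs, labels))
    fp = sum(p >= threshold and y == 0 for p, y in zip(probs, labels))
    fn = sum(p < threshold and y == 1 for p, y in zip(probs, labels))
    tn = sum(p < threshold and y == 0 for p, y in zip(probs, labels))
    return tp, fp, fn, tn

probs  = [0.1, 0.3, 0.45, 0.6, 0.8, 0.95]   # model scores
labels = [0,   0,   1,    0,   1,   1]      # true classes
for t in (0.5, 0.4):
    tp, fp, fn, tn = confusion(probs, labels, t)
    accuracy = (tp + tn) / len(labels)
    recall = tp / (tp + fn)
    print(t, accuracy, recall)
```

Lowering the threshold from 0.5 to 0.4 recovers the missed positive at 0.45, raising recall to 1.0; on a heavily imbalanced dataset, accuracy alone (the baseline-accuracy problem noted above) would hide this trade-off.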
Machine learning involves developing systems that can learn from data and experience. The document discusses several machine learning techniques including decision tree learning, rule induction, case-based reasoning, supervised and unsupervised learning. It also covers representations, learners, critics and applications of machine learning such as improving search engines and developing intelligent tutoring systems.
A short presentation introducing machine learning for beginners: what it is, how it works, the popular machine learning techniques and learning models (supervised, unsupervised, semi-supervised, and reinforcement learning), and how they work, with industry use cases and popular examples.
Survey on data mining techniques in heart disease prediction (Sivagowry Shathesh)
This document summarizes research on using data mining techniques to predict heart disease. It discusses previous work using classification, clustering, association rule mining, and other techniques on several heart disease datasets. Classification algorithms like naive Bayes, decision trees, and neural networks have been widely used, with naive Bayes often found to provide the best performance. Feature selection and attribute reduction are also examined. The document provides an overview of the key steps and techniques in medical data mining and predictive analysis for heart disease.
Handbook of Research on AI and Machine Learning Applications in Customer Supp... (IGI Global)
In the modern data-driven era, artificial intelligence (AI) and machine learning (ML) technologies that allow a computer to mimic intelligent human behavior are essential for organizations to achieve business excellence and assist organizations in extracting useful information from raw data. AI and ML have existed for decades, but in the age of big data, this sort of analysis is in higher demand than ever, especially for customer support and analytics.
The Handbook of Research on AI and Machine Learning Applications in Customer Support and Analytics investigates the applications of AI and ML and how they can be implemented to enhance customer support and analytics at various levels of organizations. This book is ideal for marketing professionals, managers, business owners, researchers, practitioners, academicians, instructors, university libraries, and students, and covers topics such as artificial intelligence, machine learning, supervised learning, deep learning, customer sentiment analysis, data mining, neural networks, and business analytics.
The many academic areas covered in this publication include, but are not limited to:
Artificial Intelligence
Business Analytics
Business Intelligence
Customer Engagement
Customer Sentiment Analysis
Data Mining
Deep Learning
Machine Learning
Neural Networks
Supervised Learning
Artificial Intelligence (AI) and Machine Learning (ML) have rapidly advanced in recent years and revolutionized numerous industries, including customer support and analytics. These technologies have gained popularity due to their ability to process vast amounts of data and provide insights that enhance customer experiences and optimize business operations. Customer-facing businesses have experienced particularly significant impacts from AI and ML, transforming customer support, analytics, and experience.
This book, titled Handbook of Research on AI and Machine Learning Applications in Customer Support and Analytics, explores the diverse applications of AI and ML in these domains. In the modern data-driven era, AI and ML technologies that allow computers to mimic intelligent human behavior are essential for organizations to achieve business excellence. The ability of AI and ML to extract useful information from raw data is in higher demand than ever, especially for customer support and analytics.
The book investigates the applications of AI and ML and how they can be implemented to enhance customer support and analytics at various levels of organizations. It covers topics such as artificial intelligence, machine learning, supervised learning, customer sentiment analysis, data mining, customer analytics, optimization strategies, predictive analytics, AI-based product suggestion, query auto-suggestion, and business analytics. This book is ideal for marketing professionals, managers, business owners, researchers, practitioners, academicians, instructors, university libraries, and students.
This document discusses rule-based classification. It describes how rule-based classification models use if-then rules to classify data. It covers extracting rules from decision trees and directly from training data. Key points include using sequential covering algorithms to iteratively learn rules that each cover positive examples of a class, and measuring rule quality based on both coverage and accuracy to determine the best rules.
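The sequential covering idea can be sketched as a greedy loop that learns one single-test rule at a time, scores candidates by accuracy and coverage as described, and removes the examples each rule covers. The weather-style toy data is illustrative:

```python
def learn_one_rule(rows, labels, target):
    """Greedily pick the single attribute=value test with the best accuracy
    on the target class (ties broken toward higher coverage)."""
    best = None
    for a in range(len(rows[0])):
        for v in sorted(set(r[a] for r in rows)):
            covered = [y for r, y in zip(rows, labels) if r[a] == v]
            accuracy = covered.count(target) / len(covered)
            coverage = covered.count(target)
            if best is None or (accuracy, coverage) > best[:2]:
                best = (accuracy, coverage, a, v)
    return best[2], best[3]

def sequential_covering(rows, labels, target):
    """Learn rules one at a time, removing the examples each rule covers,
    until no positive examples of the target class remain."""
    rules = []
    rows, labels = list(rows), list(labels)
    while target in labels:
        a, v = learn_one_rule(rows, labels, target)
        rules.append((a, v))
        keep = [i for i, r in enumerate(rows) if r[a] != v]
        rows = [rows[i] for i in keep]
        labels = [labels[i] for i in keep]
    return rules

# Toy data: (outlook, temperature) -> play or stay
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "hot")]
labels = ["play", "play", "play", "stay"]
print(sequential_covering(rows, labels, "play"))
```

Each learned pair is read as an if-then rule, e.g. (0, "sunny") means "IF outlook = sunny THEN play"; real systems (FOIL, RIPPER) grow multi-condition rules and prune them, which this sketch omits.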
Detecting fraud with Python and machine learning (wgyn)
- Machine learning models are used to detect fraud by estimating the probability of fraud given transaction features.
- Building and updating fraud detection models involves significant work in feature engineering, model training, evaluation, and monitoring in production.
- Debugging a model that was performing poorly revealed an important predictive feature - whether a customer's email address was provided - that improved the model once incorporated.
The document discusses credit card fraud detection. It defines credit card fraud as unauthorized purchases made using someone's credit card or account. Credit card fraud detection builds models from past credit card transactions to distinguish fraudulent from legitimate transactions. Model performance is evaluated with metrics like true positives, false positives, accuracy, sensitivity, specificity, and precision. The dataset used contains over 284,000 credit card transactions, with variables like amount and time and a class variable indicating whether each transaction is legitimate or fraudulent. An XGBoost model is used for fraud prediction in the user interface. XGBoost is an optimized gradient boosting algorithm that combines weak learners into a strong learner through sequential iterations to improve predictions.
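XGBoost itself is a large, optimized system, but the weak-to-strong principle it relies on can be sketched with plain gradient boosting for squared error: each new stump is fitted to the residuals the ensemble has left so far. The toy data, learning rate, and round count are illustrative:

```python
def fit_stump(xs, residuals):
    """One-split regression stump: pick the threshold minimizing squared error."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gradient_boost(xs, ys, n_rounds=50, lr=0.1):
    """Sequential iterations: each weak stump corrects the residual error
    of the ensemble built so far (gradient of squared loss)."""
    base = sum(ys) / len(ys)
    pred = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        s = fit_stump(xs, residuals)
        stumps.append(s)
        pred = [p + lr * s(x) for p, x in zip(pred, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
model = gradient_boost(xs, ys)
print(round(model(2), 2), round(model(5), 2))
```

No single stump fits the data, but fifty of them, each nudging the prediction a tenth of the way toward the remaining error, converge on it; XGBoost adds regularization, second-order gradients, and many systems-level optimizations on top of this loop.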
This document summarizes a student project that aimed to predict students' final grades based on various demographic and social factors. The students analyzed a dataset of 396 student observations with 33 attributes and used classification algorithms like logistic regression, naive Bayes, and k-nearest neighbors. Their key findings were that variables like alcohol consumption did not significantly impact grades, while factors like failures, sex, age, and extracurricular activities were statistically significant. The various models tested achieved similar accuracies of 65-70% in predicting student grades.
A start guide to the concepts and algorithms in machine learning, including regression frameworks, ensemble methods, clustering, optimization, and more. Mathematical knowledge is not assumed, and pictures/analogies demonstrate the key concepts behind popular and cutting-edge methods in data analysis.
Updated to include newer algorithms, such as XGBoost, and more geometrically/topologically-based algorithms. It also includes a short overview of time series analysis.
The document discusses using correlations to build predictive models and compares the success rates of two algorithms, Algorithm A and Algorithm B, on different user groups. It also notes that while average comment length decreases over time across all users, length increases over time for each yearly cohort. References are provided on causation and spurious correlations.
Machine Learning and Real-World Applications (MachinePulse)
This presentation was created by Ajay, Machine Learning Scientist at MachinePulse, to present at a Meetup on Jan. 30, 2015. These slides provide an overview of widely used machine learning algorithms. The slides conclude with examples of real world applications.
Ajay Ramaseshan is a Machine Learning Scientist at MachinePulse. He holds a Bachelor's degree in Computer Science from NITK Surathkal and a Master's in Machine Learning and Data Mining from Aalto University School of Science, Finland. He has extensive experience in the machine learning domain and has dealt with various real-world problems.
Qualicorp Scales to Millions of Customers and Data Relationships to Provide W... (Neo4j)
Ricardo Antonio Batista, CIO, Qualicorp Administradora De Beneficios
Atila Ferreira de Resende, IT Manager, Qualicorp
André Luiz Pereira, Neo4j Project Lead, Qualicorp
Eurico Carlos Catule, IT Manager, Qualicorp
Andre Serpa, Vice President, Latin America, Neo4j
The document provides an overview of decision tree learning algorithms:
- Decision trees are a supervised learning method that can represent discrete functions and efficiently process large datasets.
- Basic algorithms like ID3 use a top-down greedy search to build decision trees by selecting attributes that best split the training data at each node.
- The quality of a split is typically measured by metrics like information gain, with the goal of creating pure, homogeneous child nodes.
- Fully grown trees may overfit, so algorithms incorporate a bias toward smaller, simpler trees with informative splits near the root.
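The top-down greedy search described in these bullets can be sketched as a recursive ID3-style builder: stop at pure nodes, otherwise split on the attribute with the highest information gain and recurse. The toy weather data is illustrative, and real implementations add the pruning and small-tree bias noted above:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attrs):
    """Top-down greedy induction of a decision tree."""
    if len(set(labels)) == 1:
        return labels[0]                                 # pure leaf
    if not attrs:
        return Counter(labels).most_common(1)[0][0]      # majority leaf
    def gain(a):
        remainder = 0.0
        for v in set(r[a] for r in rows):
            subset = [y for r, y in zip(rows, labels) if r[a] == v]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder
    a = max(attrs, key=gain)                             # best split
    children = {}
    for v in set(r[a] for r in rows):
        sub_rows = [r for r in rows if r[a] == v]
        sub_labels = [y for r, y in zip(rows, labels) if r[a] == v]
        children[v] = id3(sub_rows, sub_labels, [b for b in attrs if b != a])
    return (a, children)

def classify(tree, row):
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[row[attr]]
    return tree

# Toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("overcast", "hot"), ("rain", "mild")]
labels = ["no", "no", "yes", "yes"]
tree = id3(rows, labels, [0, 1])
print(classify(tree, ("overcast", "hot")))
```

On this data the outlook attribute yields pure, homogeneous children in a single split, so the greedy criterion places it at the root and the tree stops there.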
Human Activity Recognition (HAR) systems aim to recognize human activities through sensors in order to provide assistance. The key steps in designing a HAR system are:
1) Acquiring sensor data and preprocessing it by removing noise.
2) Segmenting the preprocessed data into windows that may contain activities.
3) Extracting features from each window to reduce the data into discriminative features.
4) Training a classification model on the extracted features to predict activity labels, and evaluating the model's performance using methods like a confusion matrix.
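Steps 2 and 3, segmentation into windows and feature extraction, can be sketched as follows. The window size, step, and the toy accelerometer-style stream are illustrative:

```python
from statistics import mean, stdev

def sliding_windows(signal, size, step):
    """Step 2: segment the signal into fixed-size (optionally overlapping) windows."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def extract_features(window):
    """Step 3: reduce each window to a few discriminative statistics."""
    return (mean(window), stdev(window), max(window) - min(window))

# Toy accelerometer-magnitude stream: flat segment (standing), then oscillation (walking)
signal = [1.0] * 8 + [0.5, 1.5] * 4
windows = sliding_windows(signal, size=8, step=8)
features = [extract_features(w) for w in windows]
print(features)
```

The two windows produce clearly different (mean, standard deviation, range) vectors even though their means coincide, which is exactly what step 4's classifier would be trained on.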
Supercharging your Data with Azure AI Search and Azure OpenAI (Peter Gallagher)
The slides from my talk - "Supercharging your Data with Azure AI Search and Azure OpenAI" first given at .NET Notts on November 27th.
In this session we will take a look at how we can combine the power of Azure AI Search and OpenAI to allow us to gain insights over our own data.
Using a .NET 8 Blazor app along with SignalR and C#, we'll begin by taking a walk through the Azure OpenAI Service looking at the basics of GenAI, the OpenAI Playground and the .NET SDK.
We'll then take a look at Azure AI Search, including chunking, indexes, vectorisation, facets, search, and more.
Finally, we'll move on to looking at how we can combine AI Search and OpenAI to supercharge our own data.
This session will appeal to both beginners to Azure OpenAI and AI Search as well as learners wishing to expand their knowledge of these services to further their skillset.
This document provides an introduction and overview of text analytics for SMS spam filtering classification. It discusses the classification of spam and ham SMS, describes the company Sky Bits Technology which focuses on analytics solutions, and performs Porter's Five Forces and SWOT analyses of the analytics industry and company. It also covers basic concepts in text mining such as preprocessing, transformation, feature selection, and classification methods. The objective is to develop a text classification model using R Studio to automatically categorize SMS as spam or ham.
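The preprocessing and transformation steps mentioned can be sketched in Python rather than R (the stopword list and messages below are illustrative):

```python
import re
from collections import Counter

STOPWORDS = {"a", "the", "to", "is", "you"}   # illustrative, not a full list

def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stopwords: typical text-mining prep."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def term_frequencies(texts):
    """Bag-of-words counts, usable as features for a spam/ham classifier."""
    return [Counter(preprocess(t)) for t in texts]

msgs = ["WIN a FREE prize now!!!", "Are you free for lunch?"]
print(term_frequencies(msgs))
```

A classifier would then be trained on these count vectors (often after feature selection to keep only the most discriminative terms), mirroring the pipeline the document describes in R Studio.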
The document discusses machine learning techniques for finding patterns in data. It covers classification algorithms like decision trees and neural networks that can predict outcomes for new data based on patterns learned from training examples. The document also discusses concepts like bias, which refers to the assumptions built into machine learning algorithms that guide their search for patterns and prevent overfitting to noise in the training data. Examples are provided to illustrate classification problems and solutions like rules learned to predict gameplay based on weather conditions.
In this session we will take a look at how we can combine the power of Azure AI Search and OpenAI to allow us to gain insights over our own data.
Using a .NET 8 Blazor app along with SignalR and C#, we'll begin by taking a walk through the Azure OpenAI Service looking at the basics of GenAI, the OpenAI Playground and the .NET SDK.
We'll then take a look at Azure AI Search including; Chunking, Indexes, Vectorisation, Facets, Search and more.
Finally, we'll move on to looking at how we can combine AI Search and OpenAI to supercharge our own data.
This session will appeal to both beginners to Azure OpenAI and AI Search as well as learners wishing to expand their knowledge of these services to further their skillset.
This document provides an introduction and overview of text analytics for SMS spam filtering classification. It discusses the classification of spam and ham SMS, describes the company Sky Bits Technology which focuses on analytics solutions, and performs Porter's Five Forces and SWOT analyses of the analytics industry and company. It also covers basic concepts in text mining such as preprocessing, transformation, feature selection, and classification methods. The objective is to develop a text classification model using R Studio to automatically categorize SMS as spam or ham.
The document discusses machine learning techniques for finding patterns in data. It covers classification algorithms like decision trees and neural networks that can predict outcomes for new data based on patterns learned from training examples. The document also discusses concepts like bias, which refers to the assumptions built into machine learning algorithms that guide their search for patterns and prevent overfitting to noise in the training data. Examples are provided to illustrate classification problems and solutions like rules learned to predict gameplay based on weather conditions.
The document provides an overview of concepts related to computing for bioinformatics including machine learning, data mining, knowledge discovery, statistics, databases, and data visualization. It discusses techniques like classification, clustering, association rule mining, and anomaly detection. It also presents examples of applying these techniques to problems like weather prediction, contact lens recommendation, and soybean disease diagnosis.
The document discusses the classification and regression tree (CART) algorithm. It provides details on how CART builds decision trees using a greedy algorithm that recursively splits nodes based on thresholds of predictor variables. CART uses the Gini index criterion to find the optimal splits that result in homogenous subsets. An example is provided to demonstrate how CART constructs a decision tree to classify examples based on various predictor variables like outlook, temperature, humidity, and wind.
The document discusses machine learning concepts related to classification, including linear regression, decision trees, and neural networks. It provides an example of using weather data to classify whether a game will be played or not based on attributes like temperature and humidity. Rules are generated to make the classification based on patterns in the data.
The document discusses machine learning techniques for finding patterns in data and using those patterns to make predictions. It covers topics like classification algorithms, decision trees, neural networks, learning as a search process, and how machine learning systems use bias to avoid overfitting training data. Examples are provided on classifying weather data to determine if a baseball game should be played, classifying iris flowers, predicting CPU performance, and diagnosing soybean diseases.
Decision tree in artificial intelligenceMdAlAmin187
The document presents an overview of decision trees, including what they are, common algorithms like ID3 and C4.5, types of decision trees, and how to construct a decision tree using the ID3 algorithm. It provides an example applying ID3 to a sample dataset about determining whether to go out based on weather conditions. Key advantages of decision trees are that they are simple to understand, can handle both numerical and categorical data, and closely mirror human decision making. Limitations include potential for overfitting and lower accuracy compared to other models.
What is the Covering (Rule-based) algorithm?
Classification Rules- Straightforward
1. If-Then rule
2. Generating rules from Decision Tree
Rule-based Algorithm
1. The 1R Algorithm / Learn One Rule
2. The PRISM Algorithm
3. Other Algorithm
Application of Covering algorithm
Discussion on e/m-learning application
Data mining involves using algorithms to find patterns in large datasets. It is commonly used in market research to perform tasks like classification, prediction, and association rule mining. The document discusses several common data mining techniques like decision trees, naive Bayes classification, and regression trees. It also covers related topics like cross-validation, bagging, and boosting methods used for improving model performance.
Data mining involves using algorithms to find patterns in large datasets. It is commonly used in market research to perform tasks like classification, prediction, and association rule mining. The document discusses several common data mining techniques like decision trees, naive Bayes classification, and regression trees. It also covers related topics like cross-validation, bagging, and boosting methods used for improving model performance.
Data mining involves using algorithms to find patterns in large datasets. It is commonly used in market research to perform tasks like classification, prediction, and association rule mining. The document discusses several common data mining techniques like decision trees, naive Bayes classification, and regression trees. It also covers related topics like cross-validation, bagging, and boosting methods used for improving model performance.
The document describes the process of constructing decision trees. It begins with an example weather dataset and shows how to build a decision tree to predict whether to play or not based on attributes like outlook, temperature, etc. It then discusses the key steps in constructing decision trees which include selecting the best attribute to split on at each node based on information gain. It also discusses overfitting and the need for tree pruning. The document provides formulas to calculate information gain and discusses strategies like using a chi-squared test to select statistically robust splits during tree construction.
This document discusses classification, which involves using a training dataset to build a model that can predict the class of new data. It provides an example classification dataset on weather conditions and whether an outdoor activity was held. The document explains that classification involves a two-step process of model construction using a training set, and then model usage to classify future test data and estimate the accuracy of the predictions. An example classification process is described where attributes of employees are used to build a model to predict whether someone is tenured based on their rank and years of experience.
Machine learning session6(decision trees random forrest)Abhimanyu Dwivedi
Concepts include decision tree with its examples. Measures used for splitting in decision tree like gini index, entropy, information gain, pros and cons, validation. Basics of random forests with its example and uses.
Naive Bayes is a simple supervised machine learning algorithm used for classification. It is based on Bayes' theorem and works by calculating the probability of a data point belonging to a particular class. The algorithm was applied to a weather prediction problem using humidity, temperature, and wind speed data to predict if the weather will be sunny or rainy. A frequency table and likelihood table were created from the sample data and Bayes' theorem was used to calculate the conditional probabilities and predict the weather. The key advantages of Naive Bayes are that it requires little training data, is fast to predict outcomes, and handles both continuous and discrete data. It can be used for problems like text classification, spam filtering, and recommending products online.
Machine learning algorithms can learn through supervised, unsupervised, or reinforcement learning. Supervised learning involves providing labeled examples to learn a function that maps inputs to outputs. Unsupervised learning identifies hidden patterns in unlabeled data. Reinforcement learning involves an agent learning through trial-and-error interactions with a dynamic environment. Machine learning has applications in areas like computer vision, natural language processing, medical diagnosis, and more.
1) The 1R algorithm generates a one-level decision tree by considering each attribute individually and creating branches for each attribute value. It assigns the majority class to each branch and chooses the attribute with the minimum error.
2) Naive Bayes classification assumes attributes are independent and calculates the probability of each class using Bayes' theorem. It handles missing and numeric attributes.
3) Decision tree algorithms like ID3 use a divide-and-conquer approach, selecting the attribute that maximizes information gain at each node to create branches. Gain ratio addresses issues with highly branched attributes.
1) The 1R algorithm generates a one-level decision tree by considering each attribute individually and assigning the majority class to each branch. It chooses the attribute with the minimum classification error.
2) Naive Bayes classification assumes attributes are independent and calculates the probability of each class using Bayes' rule. It handles missing and numeric attributes.
3) Decision tree algorithms like ID3 use a divide-and-conquer approach, recursively splitting the data on attributes that maximize information gain or gain ratio at each node.
4) Rule-based algorithms like PRISM generate rules to cover instances of each class sequentially, maximizing the ratio of correctly covered to total covered instances at each step.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
OpenID AuthZEN Interop Read Out - AuthorizationDavid Brossard
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Project Management Semester Long Project - Acuityjpupo2018
Acuity is an innovative learning app designed to transform the way you engage with knowledge. Powered by AI technology, Acuity takes complex topics and distills them into concise, interactive summaries that are easy to read & understand. Whether you're exploring the depths of quantum mechanics or seeking insight into historical events, Acuity provides the key information you need without the burden of lengthy texts.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
2. 2
Classification
- Predicts categorical class labels
- Classifies data (constructs a model) based on the training set and the values (class labels) of a classifying attribute, and uses the model to classify new data
Typical applications:
- Credit approval
- Target marketing
- Medical diagnosis
- Treatment effectiveness analysis
3. 3
Classification: A Two-Step Process
Model construction: describing a set of predetermined classes
- Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
- The set of tuples used for model construction is the training set
- The model is represented as classification rules, decision trees, or mathematical formulae
Model usage: classifying future or unknown objects
- Estimate the accuracy of the model: the known label of each test sample is compared with the classified result from the model
- Accuracy rate is the percentage of test-set samples that are correctly classified by the model
- The test set must be independent of the training set, otherwise over-fitting will occur
4. 4
Classification Process (1): Model Construction
Training data:
NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no
The classification algorithm constructs the classifier (model):
IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
5. 5
Classification Process (2): Use the Model in Prediction
Testing data (fed to the classifier):
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes
Unseen data: (Jeff, Professor, 4) → Tenured?
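The two-step process on these slides can be sketched directly in code. The following is a minimal illustration (all function and variable names are ours, not from the slides): the rule learned in step 1 is evaluated on the testing data and then applied to the unseen instance.

```python
def predict_tenured(rank, years):
    """The classifier (model) from the slide:
    IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"

test_set = [  # (NAME, RANK, YEARS, TENURED) from the testing data
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

# Step 2a: estimate accuracy on the independent test set
correct = sum(predict_tenured(rank, years) == label
              for _, rank, years, label in test_set)
accuracy = correct / len(test_set)
print(accuracy)  # 0.75 -- the rule misclassifies Merlisa (7 years, not tenured)

# Step 2b: classify the unseen instance (Jeff, Professor, 4)
print(predict_tenured("Professor", 4))  # yes
```

Note that Merlisa is an error of the model on the test set, which is exactly the kind of discrepancy the accuracy estimate is meant to surface.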
6. 6
Supervised vs. Unsupervised Learning
Supervised learning (classification)
- Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
- New data is classified based on the training set
Unsupervised learning (clustering)
- The class labels of the training data are unknown
- Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
7. 7
Issues Regarding Classification and Prediction (1): Data Preparation
Data cleaning
- Preprocess the data to reduce noise and handle missing values
Relevance analysis (feature selection)
- Remove irrelevant or redundant attributes
Data transformation
- Generalize and/or normalize the data
8. 8
Issues Regarding Classification and Prediction (2): Evaluating Classification Methods
Predictive accuracy
Speed and scalability
- Time to construct the model
- Time to use the model
Robustness
- Handling noise and missing values
Scalability
- Efficiency in disk-resident databases
Interpretability
- Understanding and insight provided by the model
Goodness of rules
- Decision tree size
- Compactness of classification rules
9. 9
Simplicity First: 1R
Simple algorithms often work very well!
There are many kinds of simple structure, e.g.:
- One attribute does all the work
- All attributes contribute equally and independently
- A weighted linear combination might do
- Instance-based: use a few prototypes
- Use simple logical rules
The success of a method depends on the domain
10. 10
Inferring Rudimentary Rules
1R: learns a one-level decision tree, i.e. a set of rules that all test one particular attribute
Basic version (assumes nominal attributes):
- One branch for each attribute value
- Each branch assigns the most frequent class
- Error rate: the proportion of instances that don't belong to the majority class of their corresponding branch
- Choose the attribute with the lowest error rate
11. 11
Pseudo-code for 1R
For each attribute,
    For each value of the attribute, make a rule as follows:
        count how often each class appears
        find the most frequent class
        make the rule assign that class to this attribute-value
    Calculate the error rate of the rules
Choose the rules with the smallest error rate
Note: "missing" is treated as a separate attribute value
12. 12
Evaluating the Weather Attributes

Attribute  Rules            Errors  Total errors
Outlook    Sunny → No       2/5     4/14
           Overcast → Yes   0/4
           Rainy → Yes      2/5
Temp       Hot → No*        2/4     5/14
           Mild → Yes       2/6
           Cool → Yes       1/4
Humidity   High → No        3/7     4/14
           Normal → Yes     1/7
Windy      False → Yes      2/8     5/14
           True → No*       3/6
* indicates a tie

Weather data:
Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No
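The 1R pseudo-code from the previous slide can be made runnable on this weather data. The sketch below (the helper name `one_r` is ours) reproduces the total-error column of the table: 4/14 for Outlook and Humidity, 5/14 for Temp and Windy.

```python
from collections import Counter

data = [  # the weather data from the slides: (Outlook, Temp, Humidity, Windy, Play)
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]
attributes = ["Outlook", "Temp", "Humidity", "Windy"]

def one_r(data, attr_index):
    """1R for one attribute: one rule per value, assigning its majority class.
    Returns (rules, number of misclassified instances)."""
    counts = {}  # attribute value -> Counter of class labels
    for row in data:
        counts.setdefault(row[attr_index], Counter())[row[-1]] += 1
    rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    # errors per branch: instances outside that branch's majority class
    errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
    return rules, errors

for i, name in enumerate(attributes):
    rules, errors = one_r(data, i)
    print(f"{name}: {errors}/{len(data)} errors")  # Outlook 4, Temp 5, Humidity 4, Windy 5
```

Choosing the attribute with the smallest error then selects Outlook or Humidity (a 4/14 tie, just as in the table).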
15. 15
Dealing with Numeric Attributes
Discretize numeric attributes:
- Divide each attribute's range into intervals
- Sort instances according to the attribute's values
- Place breakpoints where the (majority) class changes
- This minimizes the total error
Example: temperature from the weather data
64  65  68  69  70  71  72  72  75  75  80  81  83  85
Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No

Outlook   Temperature  Humidity  Windy  Play
Sunny     85           85        False  No
Sunny     80           90        True   No
Overcast  83           86        False  Yes
Rainy     75           80        False  Yes
…         …            …         …      …
16. 16
The Problem of Overfitting
This procedure is very sensitive to noise:
- One instance with an incorrect class label will probably produce a separate interval
Simple solution: enforce a minimum number of instances in the majority class per interval
17. 17
Discretization Example
Example (with min = 3):
64  65  68  69  70  71  72  72  75  75  80  81  83  85
Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No
Final result for the temperature attribute:
64  65  68  69  70  71  72  72  75  75  80  81  83  85
Yes  No  Yes Yes Yes | No No Yes Yes Yes | No Yes Yes No
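One way to read the merge step is as a greedy pass over the sorted instances: a new interval starts only once the current one already holds at least min = 3 instances of its majority class and the next label breaks that majority. The sketch below is our reading of the slide, not code from it (the `discretize` helper and its exact merge rule are assumptions); it reproduces the three intervals of the final result above.

```python
pairs = [  # sorted temperature values with class labels, from the slides
    (64, "Yes"), (65, "No"), (68, "Yes"), (69, "Yes"), (70, "Yes"),
    (71, "No"), (72, "No"), (72, "Yes"), (75, "Yes"), (75, "Yes"),
    (80, "No"), (81, "Yes"), (83, "Yes"), (85, "No"),
]

def discretize(pairs, min_count=3):
    """Greedy merge: start a new interval only when the current one already
    has at least min_count instances of its majority class and the next
    label differs from that majority class."""
    intervals = [[pairs[0]]]
    for value, label in pairs[1:]:
        labels = [l for _, l in intervals[-1]]
        majority = max(set(labels), key=labels.count)
        if labels.count(majority) >= min_count and label != majority:
            intervals.append([(value, label)])  # breakpoint here
        else:
            intervals[-1].append((value, label))
    return intervals

intervals = discretize(pairs)
for interval in intervals:  # three intervals: 64-70, 71-75, 80-85
    print(interval)
```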
18. 18
With Overfitting Avoidance
Resulting rule set:
Attribute    Rules                    Errors  Total errors
Outlook      Sunny → No               2/5     4/14
             Overcast → Yes           0/4
             Rainy → Yes              2/5
Temperature  ≤ 77.5 → Yes             3/10    5/14
             > 77.5 → No*             2/4
Humidity     ≤ 82.5 → Yes             1/7     3/14
             > 82.5 and ≤ 95.5 → No   2/6
             > 95.5 → Yes             0/1
Windy        False → Yes              2/8     5/14
             True → No*               3/6
19. 20
Bayesian (Statistical) Modeling
"Opposite" of 1R: use all the attributes
Two assumptions: attributes are
- equally important
- statistically independent (given the class value), i.e. knowing the value of one attribute says nothing about the value of another (if the class is known)
The independence assumption is almost never correct!
But … this scheme works well in practice
20. 21
Probabilities for Weather Data
Counts and relative frequencies per class (9 "yes" days, 5 "no" days out of 14):

Outlook   Yes  No     Temperature  Yes  No     Humidity  Yes  No     Windy  Yes  No
Sunny     2    3      Hot          2    2      High      3    4      False  6    2
Overcast  4    0      Mild         4    2      Normal    6    1      True   3    3
Rainy     3    2      Cool         3    1

Outlook   Yes  No     Temperature  Yes  No     Humidity  Yes  No     Windy  Yes  No
Sunny     2/9  3/5    Hot          2/9  2/5    High      3/9  4/5    False  6/9  2/5
Overcast  4/9  0/5    Mild         4/9  2/5    Normal    6/9  1/5    True   3/9  3/5
Rainy     3/9  2/5    Cool         3/9  1/5

Play: Pr[yes] = 9/14, Pr[no] = 5/14
(derived from the weather data table on the "Evaluating the Weather Attributes" slide)
21. 22
Probabilities for Weather Data
A new day:
Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?
(counts and relative frequencies as in the tables on the previous slide)
22. 23
Bayes’s rule
Probability of event H given evidence E :
A priori probability of H :
Probability of event before evidence is seen
A posteriori probability of H :
Probability of event after evidence is seen
Pr[H | E] = Pr[E | H] × Pr[H] / Pr[E]
Thomas Bayes
Born: 1702 in London, England
Died: 1761 in Tunbridge Wells, Kent, England
from Bayes “Essay towards solving a problem in the
doctrine of chances” (1763)
23. 24
Naïve Bayes for classification
Classification learning: what’s the probability of the
class given an instance?
Evidence E = instance
Event H = class value for instance
Naïve assumption: evidence splits into parts (i.e.
attributes) that are independent
Pr[H | E] = Pr[E1 | H] × Pr[E2 | H] × … × Pr[En | H] × Pr[H] / Pr[E]
24. 25
Weather data example
Outlook Temp. Humidity Windy Play
Sunny Cool High True ?
Evidence E
Probability of
class “yes”
Pr[yes | E] = Pr[Outlook = Sunny | yes]
× Pr[Temperature = Cool | yes]
× Pr[Humidity = High | yes]
× Pr[Windy = True | yes]
× Pr[yes] / Pr[E]
= (2/9 × 3/9 × 3/9 × 3/9 × 9/14) / Pr[E]
25. 26
Probabilities for weather data
Outlook Temp. Humidity Windy Play
Sunny Cool High True ?
A new day: Likelihood of the two classes
For “yes” = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
For “no” = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206
Conversion into a probability by normalization:
P(“yes”) = 0.0053 / (0.0053 + 0.0206) = 0.205
P(“no”) = 0.0206 / (0.0053 + 0.0206) = 0.795
Outlook Temperature Humidity Windy Play
Yes No Yes No Yes No Yes No Yes No
Sunny 2 3 Hot 2 2 High 3 4 False 6 2 9 5
Overcast 4 0 Mild 4 2 Normal 6 1 True 3 3
Rainy 3 2 Cool 3 1
Sunny 2/9 3/5 Hot 2/9 2/5 High 3/9 4/5 False 6/9 2/5 9/14 5/14
Overcast 4/9 0/5 Mild 4/9 2/5 Normal 6/9 1/5 True 3/9 3/5
Rainy 3/9 2/5 Cool 3/9 1/5
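The arithmetic on this slide can be checked with a short Python sketch (not part of the slides; the conditional probabilities are read straight from the count table above):

```python
from functools import reduce

# Conditional probabilities for the new day, read off the count table
# (9 "yes" days and 5 "no" days out of 14)
p_yes = [2/9, 3/9, 3/9, 3/9]   # Sunny, Cool, High, True given "yes"
p_no  = [3/5, 1/5, 4/5, 3/5]   # the same attribute values given "no"

like_yes = reduce(lambda a, b: a * b, p_yes) * 9/14   # ≈ 0.0053
like_no  = reduce(lambda a, b: a * b, p_no)  * 5/14   # ≈ 0.0206

# Normalization turns the two likelihoods into probabilities
post_yes = like_yes / (like_yes + like_no)   # ≈ 0.205
post_no  = like_no  / (like_yes + like_no)   # ≈ 0.795
```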
26. 27
The “zero-frequency problem”
What if an attribute value doesn’t occur with every class
value?
(e.g. “Humidity = high” for class “yes”)
Probability will be zero!
A posteriori probability will also be zero!
(No matter how likely the other values are!)
Remedy: add 1 to the count for every attribute value-class
combination (Laplace estimator)
Result: probabilities will never be zero!
(also: stabilizes probability estimates)
Pr[Humidity = High | yes] = 0
⇒ Pr[yes | E] = 0
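A minimal sketch of the Laplace estimator (the counts here are illustrative, not taken from the weather data):

```python
# Hypothetical counts for an attribute value given class "yes":
# "high" was never observed, so its raw estimate is zero.
counts = {"high": 0, "normal": 9}
n = sum(counts.values())       # number of "yes" instances
k = len(counts)                # number of attribute values

raw     = {v: c / n for v, c in counts.items()}
laplace = {v: (c + 1) / (n + k) for v, c in counts.items()}

# raw["high"] is 0.0 and would zero out the whole product;
# laplace["high"] is 1/11 -- small, but never zero.
```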
27. 28
*Modified probability estimates
In some cases adding a constant different from 1
might be more appropriate
Example: attribute outlook for class yes
Weights don’t need to be equal
(but they must sum to 1)
Sunny: (2 + μ/3) / (9 + μ)   Overcast: (4 + μ/3) / (9 + μ)   Rainy: (3 + μ/3) / (9 + μ)
With weights p1, p2, p3 (summing to 1):
Sunny: (2 + μp1) / (9 + μ)   Overcast: (4 + μp2) / (9 + μ)   Rainy: (3 + μp3) / (9 + μ)
28. 29
Missing values
Training: instance is not included in
frequency count for attribute value-class
combination
Classification: attribute will be omitted from
calculation
Example: Outlook Temp. Humidity Windy Play
? Cool High True ?
Likelihood of “yes” = 3/9 × 3/9 × 3/9 × 9/14 = 0.0238
Likelihood of “no” = 1/5 × 4/5 × 3/5 × 5/14 = 0.0343
P(“yes”) = 0.0238 / (0.0238 + 0.0343) = 41%
P(“no”) = 0.0343 / (0.0238 + 0.0343) = 59%
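The same calculation in Python: the Outlook factor is simply dropped from the product.

```python
# Outlook is missing, so only Temp=Cool, Humidity=High, Windy=True contribute
like_yes = (3/9) * (3/9) * (3/9) * 9/14   # ≈ 0.0238
like_no  = (1/5) * (4/5) * (3/5) * 5/14   # ≈ 0.0343

p_yes = like_yes / (like_yes + like_no)   # ≈ 0.41
p_no  = like_no  / (like_yes + like_no)   # ≈ 0.59
```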
29. 30
Numeric attributes
Usual assumption: attributes have a normal or
Gaussian probability distribution (given the class)
The probability density function for the normal
distribution is defined by two parameters:
Sample mean
Standard deviation
Then the density function f(x) is
μ = (1/n) Σᵢ xᵢ
σ² = (1/(n−1)) Σᵢ (xᵢ − μ)²

f(x) = (1 / (√(2π) σ)) e^(−(x−μ)² / (2σ²))

Karl Gauss, 1777–1855, great German mathematician
30. 31
Statistics for
weather data
Example density value:
f(temperature = 66 | yes) = (1 / (√(2π) × 6.2)) e^(−(66 − 73)² / (2 × 6.2²)) = 0.0340
Outlook Temperature Humidity Windy Play
Yes No Yes No Yes No Yes No Yes No
Sunny 2 3 64, 68, 65, 71, 65, 70, 70, 85, False 6 2 9 5
Overcast 4 0 69, 70, 72, 80, 70, 75, 90, 91, True 3 3
Rainy 3 2 72, … 85, … 80, … 95, …
Sunny 2/9 3/5 μ=73 μ=75 μ=79 μ=86 False 6/9 2/5 9/14 5/14
Overcast 4/9 0/5 σ=6.2 σ=7.9 σ=10.2 σ=9.7 True 3/9 3/5
Rainy 3/9 2/5
31. 32
Classifying a new day
A new day:
Missing values during training are not included in
calculation of mean and standard deviation
Outlook Temp. Humidity Windy Play
Sunny 66 90 true ?
Likelihood of “yes” = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 = 0.000036
Likelihood of “no” = 3/5 × 0.0291 × 0.0380 × 3/5 × 5/14 = 0.000136
P(“yes”) = 0.000036 / (0.000036 + 0. 000136) = 20.9%
P(“no”) = 0.000136 / (0.000036 + 0. 000136) = 79.1%
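A sketch of the Gaussian-density calculation. The densities are computed directly here, so there are small rounding differences from the slide's printed figures:

```python
import math

def gauss(x, mu, sigma):
    """Normal density f(x) used for numeric attributes given the class."""
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (math.sqrt(2*math.pi) * sigma)

# Parameters from the statistics table: class "yes" has temp mu=73, sigma=6.2
# and humidity mu=79, sigma=10.2; class "no" has temp mu=75, sigma=7.9 and
# humidity mu=86, sigma=9.7.
like_yes = (2/9) * gauss(66, 73, 6.2) * gauss(90, 79, 10.2) * (3/9) * (9/14)
like_no  = (3/5) * gauss(66, 75, 7.9) * gauss(90, 86, 9.7)  * (3/5) * (5/14)

p_yes = like_yes / (like_yes + like_no)   # ≈ 0.21; the slide's 20.9% rounds
                                          # the densities before multiplying
```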
32. 33
Naïve Bayes: discussion
Naïve Bayes works surprisingly well (even if
independence assumption is clearly violated)
Why? Because classification doesn’t require
accurate probability estimates as long as
maximum probability is assigned to correct
class
However: adding too many redundant
attributes will cause problems (e.g. identical
attributes)
Note also: many numeric attributes are not
normally distributed.
33. Naïve Bayes Extensions
Improvements:
select best attributes (e.g. with greedy
search)
often works as well or better with just a
fraction of all attributes
Bayesian Networks
34. 35
Summary
OneR – uses rules based on just one attribute
Naïve Bayes – use all attributes and Bayes rules
to estimate probability of the class given an
instance.
Simple methods frequently work well, but …
Complex methods can be better (as we will
see)
36. 37
Decision Tree for PlayTennis
Outlook
Sunny Overcast Rain
Humidity
High Normal
Wind
Strong Weak
No Yes
Yes
Yes
No
Outlook Temp. Humidity Windy Play
Sunny Cool High True ?
37. 38
Decision Tree for PlayTennis
Outlook
Sunny Overcast Rain
Humidity
High Normal
No Yes
Each internal node tests an attribute
Each branch corresponds to an
attribute value node
Each leaf node assigns a classification
38. 39
No
Decision Tree for PlayTennis
Outlook
Sunny Overcast Rain
Humidity
High Normal
Wind
Strong Weak
No Yes
Yes
Yes
No
Outlook Temperature Humidity Wind PlayTennis
Sunny Hot High Weak ?
39. 40
Decision Tree for Conjunction
Outlook
Sunny Overcast Rain
Wind
Strong Weak
No Yes
No
Outlook=Sunny ∧ Wind=Weak
No
40. 41
Decision Tree for Disjunction
Outlook
Sunny Overcast Rain
Yes
Outlook=Sunny ∨ Wind=Weak
Wind
Strong Weak
No Yes
Wind
Strong Weak
No Yes
41. 42
Decision Tree
Outlook
Sunny Overcast Rain
Humidity
High Normal
Wind
Strong Weak
No Yes
Yes
Yes
No
• decision trees represent disjunctions of conjunctions
(Outlook=Sunny ∧ Humidity=Normal)
∨ (Outlook=Overcast)
∨ (Outlook=Rain ∧ Wind=Weak)
42. 43
When to consider Decision Trees
Instances describable by attribute-value pairs
Target function is discrete valued
Disjunctive hypothesis may be required
Possibly noisy training data
Missing attribute values
Examples:
Medical diagnosis
Credit risk analysis
Object classification for robot manipulator
43. 44
Motivation # 1: Analysis Tool
•Suppose that a company has a database of sales
data, lots of sales data
•How can that company’s CEO use this data to figure
out an effective sales strategy?
•Safeway, Giant, etc. loyalty cards: what are they for?
44. 45
Motivation # 1: Analysis Tool
(cont’d)
Ex’ple Bar Fri Hun Pat Type Res wait
x1 no no yes some french yes yes
x4 no yes yes full thai no yes
x5 no yes no full french yes no
x6 … x11 (remaining rows not shown)
Sales data
“if buyer is male & and age between 24-35 & married
then he buys sport magazines”
induction
Decision Tree
45. 46
Motivation # 1: Analysis Tool
(cont’d)
•Decision trees have been frequently used in IDSS
•Some companies:
•SGI: provides tools for decision tree visualization
•Acknosoft (France), Tech:Inno (Germany):
combine decision trees with CBR technology
•Several applications
•Decision trees are used for Data Mining
46. 47
Parenthesis: Expert Systems
•Have been used in :
medicine
oil and mineral exploration
weather forecasting
stock market predictions
financial credit, fault analysis
some complex control systems
•Two components:
Knowledge Base
Inference Engine
47. 48
The Knowledge Base in Expert Systems
A knowledge base consists of a collection of IF-THEN
rules:
if buyer is male & age between 24-50 & married
then he buys sport magazines
if buyer is male & age between 18-30
then he buys PC games magazines
Knowledge bases of fielded expert systems contain
hundreds and sometimes even thousands such rules.
Frequently rules are contradictory and/or overlap
48. 49
The Inference Engine in Expert Systems
The inference engine reasons on the rules in the
knowledge base and the facts of the current problem
Typically the inference engine will contain policies to
deal with conflicts, such as “select the most specific
rule in case of conflict”
Some expert systems incorporate probabilistic
reasoning, particularly those doing predictions
49. 50
Expert Systems: Some Examples
MYCIN. Encodes expert knowledge to identify
kinds of bacterial infections. Contains 500 rules and
uses some form of uncertain reasoning
DENDRAL. Interprets mass spectra of
organic chemical compounds
MOLGEN. Plans gene-cloning experiments in
laboratories.
XCON. Used by DEC to configure, or set up, VAX
computers. Contained 2500 rules and could handle
computer system setups involving 100-200 modules.
50. 51
Main Drawback of Expert Systems: The
Knowledge Acquisition Bottle-Neck
The main problem with expert systems is that acquiring
knowledge from human specialists is a difficult,
cumbersome and long activity.
Name KB #Rules Const. time
(man/years)
Maint. time
(man/months)
MYCIN KA 500 10 N/A
XCON KA 2500 18 3
KB = Knowledge Base
KA = Knowledge Acquisition
51. 52
Motivation # 2: Avoid Knowledge
Acquisition Bottle-Neck
•GASOIL is an expert system for designing gas/oil separation
systems stationed offshore
•The design depends on multiple factors including:
proportions of gas, oil and water, flow rate, pressure, density, viscosity,
temperature and others
•To build that system by hand would have taken 10 person-years
•It took only 3 person-months by using inductive learning!
•GASOIL saved BP millions of dollars
52. 53
Motivation # 2 : Avoid Knowledge
Acquisition Bottle-Neck
Name KB #Rules Const. time
(man/years)
Maint. time
(man/months)
MYCIN KA 500 10 N/A
XCON KA 2500 18 3
GASOIL IDT 2800 1 0.1
BMT KA
(IDT)
30000+ 9 (0.3) 2 (0.1)
KB = Knowledge Base
KA = Knowledge Acquisition
IDT = Induced Decision Trees
53. 54
Training Examples
Day Outlook Temp. Humidity Wind Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Weak Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
54. 55
(Figures: two candidate decision trees built from the training examples —
one splitting first on Temp with Outlook and Wind below it, the other
splitting first on Humidity with Outlook below it. Several different trees
can be consistent with the same training data.)
56. 57
Top-Down Induction of Decision
Trees ID3
1. A ← the “best” decision attribute for the next node
2. Assign A as decision attribute for node
3. For each value of A create new descendant
4. Sort training examples to leaf node according to
the attribute value of the branch
5. If all training examples are perfectly classified
(same value of target attribute) stop, else
iterate over new leaf nodes.
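The five steps above can be sketched as a short recursive Python function (a simplified illustration; the data layout and helper names are my own, not from the slides):

```python
import math
from collections import Counter

def id3(examples, attributes):
    """Recursive sketch of ID3. `examples` are (features_dict, label)
    pairs; returns a nested dict tree, or a class label at a leaf."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:                 # perfectly classified: stop
        return labels[0]
    if not attributes:                        # nothing left to split on
        return Counter(labels).most_common(1)[0][0]

    def entropy(exs):
        counts = Counter(label for _, label in exs)
        n = sum(counts.values())
        return -sum(c/n * math.log2(c/n) for c in counts.values())

    def gain(a):
        parts = {}
        for ex in examples:
            parts.setdefault(ex[0][a], []).append(ex)
        return entropy(examples) - sum(len(p)/len(examples) * entropy(p)
                                       for p in parts.values())

    best = max(attributes, key=gain)          # step 1: pick the best attribute
    tree = {best: {}}
    rest = [a for a in attributes if a != best]
    for value in {ex[0][best] for ex in examples}:
        subset = [ex for ex in examples if ex[0][best] == value]
        tree[best][value] = id3(subset, rest) # steps 3-5: recurse per branch
    return tree

# Tiny made-up training set:
data = [({"outlook": "sunny",    "windy": "false"}, "no"),
        ({"outlook": "overcast", "windy": "true"},  "yes"),
        ({"outlook": "rain",     "windy": "false"}, "yes"),
        ({"outlook": "rain",     "windy": "true"},  "no")]
tree = id3(data, ["outlook", "windy"])
# tree == {"outlook": {"sunny": "no", "overcast": "yes",
#                      "rain": {"windy": {"false": "yes", "true": "no"}}}}
```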
58. 59
Entropy
S is a sample of training examples
p+ is the proportion of positive examples
p- is the proportion of negative examples
Entropy measures the impurity of S
Entropy(S) = -p+ log2 p+ - p- log2 p-
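As a quick check, the formula in Python (taking 0 · log 0 = 0, the usual convention):

```python
import math

def entropy(p_pos, p_neg):
    """Entropy(S) = -p+ log2 p+ - p- log2 p-  (0 log 0 taken as 0)."""
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

e = entropy(9/14, 5/14)   # ≈ 0.940 for 9 positive / 5 negative examples
# entropy(0.5, 0.5) == 1.0 (maximally impure); entropy(1.0, 0.0) == 0.0 (pure)
```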
59. 60
Information Gain (ID3/C4.5)
Select the attribute with the highest information gain
Assume there are two classes, P and N
Let the set of examples S contain p elements of class P
and n elements of class N
The amount of information, needed to decide if an
arbitrary example in S belongs to P or N is defined as
I(p, n) = −(p / (p + n)) log2(p / (p + n)) − (n / (p + n)) log2(n / (p + n))
60. 61
Information Gain in Decision
Tree Induction
Assume that using attribute A a set S will be partitioned
into sets {S1, S2 , …, Sv}
If Si contains pi examples of P and ni examples of N,
the entropy, or the expected information needed to
classify objects in all subtrees Si is
The encoding information that would be gained by
branching on A
E(A) = Σᵢ₌₁..v ((pᵢ + nᵢ) / (p + n)) I(pᵢ, nᵢ)

Gain(A) = I(p, n) − E(A)
61. 62
Information Gain
Gain(S,A): expected reduction in entropy due to sorting S on
attribute A
A1=?
True False
[21+, 5-] [8+, 30-]
[29+,35-] A2=?
True False
[18+, 33-] [11+, 2-]
[29+,35-]
Gain(S,A) = Entropy(S) − Σ v∈values(A) |Sv|/|S| · Entropy(Sv)
Entropy([29+,35-]) = -29/64 log2 29/64 – 35/64 log2 35/64
= 0.99
63. 64
Training Dataset
age income student credit_rating buys_computer
<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
64. 65
Attribute Selection by Information
Gain Computation
Class P: buys_computer =
“yes”
Class N: buys_computer =
“no”
I(p, n) = I(9, 5) =0.940
Compute the entropy for age:
Hence
Similarly
age pi ni I(pi, ni)
<=30 2 3 0.971
30…40 4 0 0
>40 3 2 0.971
E(age) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694
Gain(age) = I(p, n) − E(age) = 0.940 − 0.694 = 0.246
Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048
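These attribute-selection numbers can be reproduced with a short script (helper names are my own):

```python
import math

def info(p, n):
    """I(p, n): expected information to classify an example in S."""
    total = p + n
    return -sum(c/total * math.log2(c/total) for c in (p, n) if c)

# (pi, ni) for age = "<=30", "31..40", ">40"
parts = [(2, 3), (4, 0), (3, 2)]
N = sum(p + n for p, n in parts)                       # 14 examples

e_age = sum((p + n)/N * info(p, n) for p, n in parts)  # ≈ 0.694
gain_age = info(9, 5) - e_age                          # ≈ 0.247
                                                       # (0.246 with the
                                                       # slide's rounding)
```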
65. 66
Output: A Decision Tree for “buys_computer”
age?
<=30 → student? (no → no, yes → yes)
31..40 → yes
>40 → credit rating? (fair → yes, excellent → no)
66. 67
Training Examples
Day Outlook Temp. Humidity Wind Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Weak Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
67. 68
Selecting the Next Attribute
Humidity
High Normal
[3+, 4-] [6+, 1-]
S=[9+,5-]
E=0.940
Gain(S,Humidity)
=0.940-(7/14)*0.985
– (7/14)*0.592
=0.151
E=0.985 E=0.592
Wind
Weak Strong
[6+, 2-] [3+, 3-]
S=[9+,5-]
E=0.940
E=0.811 E=1.0
Gain(S,Wind)
=0.940-(8/14)*0.811
– (6/14)*1.0
=0.048
68. 69
Selecting the Next Attribute
Outlook
Sunny Rain
[2+, 3-] [3+, 2-]
S=[9+,5-]
E=0.940
Gain(S,Outlook)
=0.940-(5/14)*0.971
-(4/14)*0.0 – (5/14)*0.971
=0.247
E=0.971 E=0.971
Over
cast
[4+, 0]
E=0.0
Temp ?
70. 71
ID3 Algorithm
Outlook
Sunny Overcast Rain
Humidity
High Normal
Wind
Strong Weak
No Yes
Yes
Yes
No
[D3,D7,D12,D13]
[D8,D9,D11] [D6,D14]
[D1,D2] [D4,D5,D10]
71. 72
Converting a Tree to Rules
Outlook
Sunny Overcast Rain
Humidity
High Normal
Wind
Strong Weak
No Yes
Yes
Yes
No
R1: If (Outlook=Sunny) ∧ (Humidity=High) Then PlayTennis=No
R2: If (Outlook=Sunny) ∧ (Humidity=Normal) Then PlayTennis=Yes
R3: If (Outlook=Overcast) Then PlayTennis=Yes
R4: If (Outlook=Rain) ∧ (Wind=Strong) Then PlayTennis=No
R5: If (Outlook=Rain) ∧ (Wind=Weak) Then PlayTennis=Yes
72. 73
Continuous Valued Attributes
Create a discrete attribute to test continuous
Temperature = 24.5 °C
(Temperature > 20.0 °C) = {true, false}
Where to set the threshold?
Temperature 15 °C 18 °C 19 °C 22 °C 24 °C 27 °C
PlayTennis No No Yes Yes Yes No
73. 74
Attributes with many Values
Problem: if an attribute has many values, maximizing InformationGain
will select it.
E.g.: Imagine using Date=12.7.1996 as attribute
perfectly splits the data into subsets of size 1
Use GainRatio instead of information gain as criteria:
GainRatio(S,A) = Gain(S,A) / SplitInformation(S,A)
SplitInformation(S,A) = −Σᵢ₌₁..c |Si|/|S| log2 |Si|/|S|
Where Si is the subset for which attribute A has the value vi
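A minimal sketch of the split-information penalty (the Outlook subset sizes are from the weather data; the Date-like example is hypothetical):

```python
import math

def split_information(sizes):
    """SplitInformation(S, A) = -sum |Si|/|S| * log2(|Si|/|S|)."""
    total = sum(sizes)
    return -sum(s/total * math.log2(s/total) for s in sizes if s)

# Outlook splits the 14 weather examples into subsets of size 5, 4, 5:
si_outlook = split_information([5, 4, 5])      # ≈ 1.577
gain_ratio_outlook = 0.247 / si_outlook        # Gain(S, Outlook) ≈ 0.247

# A Date-like attribute splits the data into 14 singletons, so its split
# information is log2(14) ≈ 3.81, heavily penalizing its gain ratio.
si_date = split_information([1] * 14)
```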
74. 75
Attributes with Cost
Consider:
Medical diagnosis : blood test costs 1000 SEK
Robotics: width_from_one_feet has cost 23 secs.
How to learn a consistent tree with low expected
cost?
Replace Gain by:
Gain²(S,A) / Cost(A) [Tan, Schlimmer 1990]
(2^Gain(S,A) − 1) / (Cost(A) + 1)^w, where w ∈ [0,1] [Nunez 1988]
75. 76
Unknown Attribute Values
What if examples are missing values of A?
Use the training example anyway, and sort it through the tree
If node n tests A, assign the most common value of A among the other
examples sorted to node n
Or assign the most common value of A among the examples with the same
target value
Or assign probability pi to each possible value vi of A and pass
fraction pi of the example down each descendant branch
Classify new examples in the same fashion
79. 80
Neural Networks
Advantages
prediction accuracy is generally high
robust, works when training examples contain errors
output may be discrete, real-valued, or a vector of
several discrete or real-valued attributes
fast evaluation of the learned target function
Criticism
long training time
difficult to understand the learned function (weights)
not easy to incorporate domain knowledge
80. 81
A Neuron
The n-dimensional input vector x is mapped into
variable y by means of the scalar product and a
nonlinear function mapping
(Figure: a neuron with inputs x0 … xn, weights w0 … wn, a weighted sum
with bias μk, and an activation function f producing output y.)

y = f( Σᵢ wᵢ xᵢ − μₖ )
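A one-neuron sketch with a sigmoid as the nonlinear mapping (the weights, inputs, and threshold are toy values of my own choosing):

```python
import math

def neuron(x, w, threshold):
    """One neuron: weighted sum of the inputs minus a bias/threshold,
    passed through a sigmoid activation (one common choice of f)."""
    net = sum(wi * xi for wi, xi in zip(w, x)) - threshold
    return 1.0 / (1.0 + math.exp(-net))

y = neuron(x=[1.0, 0.5, -1.0], w=[0.2, 0.4, 0.1], threshold=-0.1)
# net = 0.4, so y = sigmoid(0.4) ≈ 0.599
```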
81. Network Training
The ultimate objective of training
obtain a set of weights that makes almost all the
tuples in the training data classified correctly
Steps
Initialize weights with random values
Feed the input tuples into the network one by one
For each unit
Compute the net input to the unit as a linear combination
of all the inputs to the unit
Compute the output value using the activation function
Compute the error
Update the weights and the bias
82
82. Multi-Layer Perceptron
(Figure: a multi-layer perceptron. The input vector xi feeds the input
nodes, which connect through weights wij to the hidden nodes and on to
the output nodes, producing the output vector.)

Net input and output of unit j:
I_j = Σᵢ w_ij O_i + θ_j
O_j = 1 / (1 + e^(−I_j))

Error of unit j:
Err_j = O_j (1 − O_j) (T_j − O_j)   (output unit, with target T_j)
Err_j = O_j (1 − O_j) Σₖ Err_k w_jk   (hidden unit)

Weight and bias updates (learning rate l):
w_ij = w_ij + (l) Err_j O_i
θ_j = θ_j + (l) Err_j
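The training steps can be exercised on a tiny 2-2-1 network; all weights and the single training example below are made-up toy values, and the update rules follow the sigmoid/backpropagation scheme described above:

```python
import math

def sigmoid(i):
    return 1.0 / (1.0 + math.exp(-i))

l = 0.5                                 # learning rate
x = [1.0, 0.0]                          # one input tuple
target = 1.0
w_hid = [[0.1, 0.2], [-0.1, 0.3]]       # w_hid[j][i]: input i -> hidden j
b_hid = [0.0, 0.0]
w_out = [0.2, -0.2]                     # hidden j -> output
b_out = 0.0

# Forward pass: I_j = sum_i w_ij O_i + theta_j,  O_j = 1/(1 + e^-I_j)
o_hid = [sigmoid(sum(w*xi for w, xi in zip(ws, x)) + b)
         for ws, b in zip(w_hid, b_hid)]
o_out = sigmoid(sum(w*o for w, o in zip(w_out, o_hid)) + b_out)

# Backward pass: output error, then hidden errors
err_out = o_out * (1 - o_out) * (target - o_out)
err_hid = [o * (1 - o) * err_out * w for o, w in zip(o_hid, w_out)]

# Updates: w_ij += l * Err_j * O_i,  theta_j += l * Err_j
w_out = [w + l * err_out * o for w, o in zip(w_out, o_hid)]
b_out += l * err_out
w_hid = [[w + l * e * xi for w, xi in zip(ws, x)]
         for ws, e in zip(w_hid, err_hid)]
b_hid = [b + l * e for b, e in zip(b_hid, err_hid)]
```

One such step nudges the network's output toward the target; repeating it over all training tuples is the training loop described on the previous slide.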
83. 85
Other Classification Methods
k-nearest neighbor classifier
case-based reasoning
Genetic algorithm
Rough set approach
Fuzzy set approaches
84. 86
Instance-Based Methods
Instance-based learning:
Store training examples and delay the processing
(“lazy evaluation”) until a new instance must be
classified
Typical approaches
k-nearest neighbor approach
Instances represented as points in a Euclidean
space.
Locally weighted regression
Constructs local approximation
Case-based reasoning
Uses symbolic representations and knowledge-
based inference
85. 87
The k-Nearest Neighbor Algorithm
All instances correspond to points in the n-D space.
The nearest neighbor are defined in terms of
Euclidean distance.
The target function could be discrete- or real- valued.
For discrete-valued, the k-NN returns the most
common value among the k training examples nearest
to xq.
Voronoi diagram: the decision surface induced by 1-
NN for a typical set of training examples.
(Figure: a Voronoi partition of the plane around the + and − training
points, with a query point xq.)
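A minimal sketch of the discrete-valued case (majority vote among the k nearest training points; the 2-D data set is a toy example):

```python
import math
from collections import Counter

def knn_classify(query, examples, k=3):
    """k-NN for a discrete target: majority vote among the k training
    examples closest to the query in Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    nearest = sorted(examples, key=lambda ex: dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D training set with two classes
data = [((1, 1), "+"), ((1, 2), "+"), ((2, 1), "+"),
        ((6, 6), "-"), ((6, 7), "-"), ((7, 6), "-")]
print(knn_classify((2, 2), data, k=3))  # "+"
print(knn_classify((6, 5), data, k=3))  # "-"
```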
86. 88
Discussion on the k-NN Algorithm
The k-NN algorithm for continuous-valued target functions
Calculate the mean values of the k nearest neighbors
Distance-weighted nearest neighbor algorithm
Weight the contribution of each of the k neighbors
according to their distance to the query point xq
giving greater weight to closer neighbors
Similarly, for real-valued target functions
Robust to noisy data by averaging k-nearest neighbors
Curse of dimensionality: distance between neighbors could
be dominated by irrelevant attributes.
To overcome it, axes stretch or elimination of the least
relevant attributes.
Weight for distance-weighted k-NN: w ≡ 1 / d(xq, xi)²
87. 89
Case-Based Reasoning
Also uses: lazy evaluation + analyze similar instances
Difference: Instances are not “points in a Euclidean space”
Example: Water faucet problem in CADET (Sycara et al’92)
Methodology
Instances represented by rich symbolic descriptions
(e.g., function graphs)
Multiple retrieved cases may be combined
Tight coupling between case retrieval, knowledge-based
reasoning, and problem solving
Research issues
Indexing based on syntactic similarity measure, and
when failure, backtracking, and adapting to additional
cases
88. 90
Remarks on Lazy vs. Eager Learning
Instance-based learning: lazy evaluation
Decision-tree and Bayesian classification: eager evaluation
Key differences
Lazy method may consider query instance xq when deciding how to
generalize beyond the training data D
Eager methods cannot, since they have already chosen their global
approximation before seeing the query
Efficiency: Lazy - less time training but more time predicting
Accuracy
Lazy method effectively uses a richer hypothesis space since it uses
many local linear functions to form its implicit global approximation
to the target function
Eager: must commit to a single hypothesis that covers the entire
instance space
89. 91
Genetic Algorithms
GA: based on an analogy to biological evolution
Each rule is represented by a string of bits
An initial population is created consisting of randomly
generated rules
e.g., IF A1 and Not A2 then C2 can be encoded as 100
Based on the notion of survival of the fittest, a new
population is formed to consist of the fittest rules and
their offspring
The fitness of a rule is represented by its classification
accuracy on a set of training examples
Offspring are generated by crossover and mutation
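A minimal sketch of the two operators on bit-string rules (toy encoding, no fitness loop):

```python
import random

# Rules as fixed-length bit strings (e.g. "IF A1 AND NOT A2 THEN C2" -> "100")
def crossover(p1, p2):
    """Single-point crossover: the children swap tails at a random cut."""
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(rule, rate=0.1):
    """Flip each bit independently with probability `rate`."""
    return "".join("10"[int(b)] if random.random() < rate else b
                   for b in rule)

c1, c2 = crossover("100", "011")   # e.g. "111" and "000" if cut after bit 1
```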
90. 92
Rough Set Approach
Rough sets are used to approximately or “roughly”
define equivalence classes
A rough set for a given class C is approximated by two
sets: a lower approximation (certain to be in C) and an
upper approximation (cannot be described as not
belonging to C)
Finding the minimal subsets (reducts) of attributes (for
feature reduction) is NP-hard but a discernibility matrix
is used to reduce the computation intensity
91. 93
Fuzzy Set
Approaches
Fuzzy logic uses truth values between 0.0 and 1.0 to
represent the degree of membership (such as using
fuzzy membership graph)
Attribute values are converted to fuzzy values
e.g., income is mapped into the discrete categories
{low, medium, high} with fuzzy values calculated
For a given new sample, more than one fuzzy value may
apply
Each applicable rule contributes a vote for membership
in the categories
Typically, the truth values for each predicted category
are summed