From the 2017 HPCC Systems Community Day:
Roger Dev gives an update on the new algorithms and other innovative features in our machine learning library.
Roger Dev
Senior Architect, LexisNexis Risk Solutions
Roger is a Senior Architect working with John Holt on the Machine Learning Team. He recently joined HPCC Systems from CA Technologies. Roger has been involved in the implementation and utilization of machine learning and AI techniques for many years, and has over 20 patents in diverse areas of software technology.
A short presentation for beginners introducing machine learning: what it is, how it works, the popular machine learning techniques and learning models (supervised, unsupervised, semi-supervised, reinforcement learning), and how they work in various industry use cases, with popular examples.
R, Data Wrangling & Kaggle Data Science Competitions (Krishna Sankar)
Presentation for my tutorial at Big Data Tech Con http://goo.gl/ZRoFHi
This is the R version of my pycon tutorial + a few updates
It is a work in progress; I will update it with daily snapshots until done.
Immersive Recommendation incorporates cross-platform and diverse personal digital traces into recommendations. Our context-aware topic modeling algorithm systematically profiles users' interests based on their traces from different contexts, and our hybrid recommendation algorithm makes high-quality recommendations by fusing users' personal profiles, item profiles, and existing ratings. The proposed model showed significant improvement over the state-of-the-art algorithms, suggesting the value of using this new user-centric recommendation model to improve recommendation quality, including in cold-start situations.
Future of AI-powered automation in business (Louis Dorard)
Starting from examples of current use cases of AI in business and in everyday life, we'll see what the future holds and we'll mention questions to address when giving autonomy to intelligent machines. We'll also aim at demystifying how AI works, in particular how machines can use data to automatically learn business rules and actions to perform in different contexts.
Writing machine learning code is now possible with ML.NET, a native .NET library that has recently reached its 1.0 milestone. Let's look at what we can do with this library and which scenarios it can handle.
Big data classification refers to the process of categorizing or classifying data based on predefined categories or classes. It involves using machine learning algorithms and statistical techniques to analyze large volumes of data and assign them to specific classes or categories.
Here are some key aspects of big data classification:
Training Data: Classification algorithms require a labeled dataset for training. This dataset consists of examples where the class or category of each data point is known. The training data is used to build a classification model that can generalize patterns and relationships between the input features and the corresponding classes.
Feature Selection: In classification, the features or attributes of the data play a crucial role in determining the class labels. Feature selection involves identifying the most relevant and informative features that contribute to accurate classification. This helps reduce dimensionality and improve the performance of the classification model.
Classification Algorithms: Various machine learning algorithms can be used for big data classification, such as decision trees, random forests, support vector machines (SVM), logistic regression, and deep learning techniques like neural networks. Each algorithm has its strengths, limitations, and suitability for different types of data and classification tasks.
Model Training and Evaluation: The classification model is trained using the labeled training data. The model learns the patterns and relationships between the features and the corresponding class labels. After training, the model is evaluated using evaluation metrics such as accuracy, precision, recall, F1 score, and area under the receiver operating characteristic (ROC) curve to assess its performance and generalization ability.
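The metrics listed above can be computed directly from a model's predictions. Here is a minimal sketch in plain Python for the binary case; the labels and predictions below are made up for illustration, not output from any real model:

```python
# Compute the standard classification metrics for one positive class,
# using only the standard library. Definitions are the usual ones:
# precision = TP / (TP + FP), recall = TP / (TP + FN),
# F1 = harmonic mean of precision and recall.

def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative example: 6 test points, 4 predicted correctly.
truth = [1, 1, 1, 0, 0, 0]
preds = [1, 1, 0, 0, 0, 1]
print(binary_metrics(truth, preds))
```

In practice a library such as scikit-learn provides these metrics (and ROC/AUC) directly, but the arithmetic is no more than the above.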
Predictive Classification: Once the classification model is trained and evaluated, it can be used to predict the classes of new, unseen data. The model takes the input features of the new data and applies the learned patterns to assign the appropriate class label.
Handling Big Data Challenges: Big data classification comes with specific challenges, including the volume, velocity, variety, and veracity of data. Processing large-scale datasets requires distributed computing frameworks like Apache Hadoop or Apache Spark. Additionally, data preprocessing, feature engineering, and model optimization techniques are used to handle the complexity and scalability of big data.
Big data classification finds applications in various domains, including customer segmentation, fraud detection, image recognition, sentiment analysis, and medical diagnosis, among others. It enables organizations to extract valuable insights from massive datasets and automate the process of categorizing and organizing data.
To successfully perform big data classification, it is important to have a good understanding of the data, the domain, and the appropriate choice of algorithms and techniques.
We provide real time big data training in Chennai by industrial experts with real time scenarios.
Our advanced topics will take students beyond their expectations, to a high level of knowledge in big data technology.
For More Info.Reach our Big Data Technical Team@ +91 96677211551/56
The Experience of Big data Training Experts Team.
www.thecreatingexperts.com
SAP BEST INSTITUTES IN CHENNAI
http://www.youtube.com/watch?v=UpWthI0P-7g
The Magical Art of Extracting Meaning From Data (lmrei)
Basics of recommender systems, machine learning, supervised classification, and unsupervised classification via clustering. Code in Python, including simple examples: k-Nearest Neighbors, Naive Bayes, Non-negative Matrix Factorization.
Codebits 2010 presentation.
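The k-Nearest Neighbors method mentioned above fits in a few lines of Python. This is a toy sketch in the spirit of the tutorial's simple examples; the 2-D points and labels are invented for illustration:

```python
# Tiny k-Nearest Neighbors classifier: predict the majority label among
# the k training points closest (Euclidean distance) to the query point.
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of ((x, y), label) pairs; return predicted label for query."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Two made-up clusters: "a" near (1, 1) and "b" near (4, 4).
points = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"),
          ((4.0, 4.0), "b"), ((4.2, 3.9), "b"), ((3.8, 4.1), "b")]
print(knn_predict(points, (1.1, 0.9)))  # query near the "a" cluster
```

There is no training step at all: the stored examples are the model, which is why k-NN is the classic example of a non-parametric method.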
The deep continual learning community should move beyond studying forgetting in class-incremental learning scenarios! In this tutorial, which we gave at
#CoLLAs2023, Antonio Carta and I try to explain why and how! 👇
Do you agree?
DataEngConf: Building the Next New York Times Recommendation Engine (Hakka Labs)
By Alex Spangher (Data Engineer, New York Times Digital)
Machine learning is a discipline characterized by systematic approaches and common threads to seemingly diverse problems. In this talk I'll cover several approaches taken during our work on the next New York Times Recommendation Engine, specifically focusing on spatial reasoning, dimensionality reduction, and testing strategies. Topics covered will include implicit regression, Bayesian modeling, and neural networks. The talk will focus on the commonalities between the different approaches taken.
Information to Wisdom: Commonsense Knowledge Extraction and Compilation - Part 3 (Dr. Aparna Varde)
This is the 3rd part of the tutorial on commonsense knowledge (CSK) at ACM WSDM 2021 by Simon Razniewski, Niket Tandon, and Aparna Varde. It focuses on evaluation of the acquired knowledge, both intrinsic and extrinsic, as well as highlights and outlook, with a brief perspective on COVID and open issues for further research.
Abstract: Commonsense knowledge is a foundational cornerstone of artificial intelligence applications. Whereas information extraction and knowledge base construction for instance-oriented assertions, such as Brad Pitt’s birth date, or Angelina Jolie’s movie awards, has received much attention, commonsense knowledge on general concepts (politicians, bicycles, printers) and activities (eating pizza, fixing printers) has only been tackled recently. In this tutorial we present state-of-the-art methodologies towards the compilation and consolidation of such commonsense knowledge (CSK). We cover text-extraction-based, multi-modal and Transformer-based techniques, with special focus on the issues of web search and ranking, as of relevance to the WSDM community.
Iterative knowledge extraction from social networks. The Web Conference 2018 (Marco Brambilla)
Knowledge in the world continuously evolves, and ontologies are largely incomplete, especially regarding data belonging to the so-called long tail. We propose a method for discovering emerging knowledge by extracting it from social content. Once initialized by domain experts, the method is capable of finding relevant entities by means of a mixed syntactic-semantic method. The method uses seeds, i.e. prototypes of emerging entities provided by experts, for generating candidates; then, it associates candidates to feature vectors built by using terms occurring in their social content and ranks the candidates by using their distance from the centroid of seeds, returning the top candidates. Our method can run iteratively, using the results as new seeds.
In this paper we address the following research questions: (1) How does the reconstructed domain knowledge evolve if the candidates of one extraction are recursively used as seeds? (2) How does the reconstructed domain knowledge spread geographically? (3) Can the method be used to inspect the past, present, and future of knowledge? (4) Can the method be used to find emerging knowledge?
This work was presented at The Web Conference 2018, MSM workshop.
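The ranking step described above (candidates represented as feature vectors and ranked by their distance from the centroid of the seed vectors) can be sketched as follows. The entity names and vectors here are hypothetical stand-ins for illustration, not data from the paper:

```python
# Sketch of seed-centroid ranking: compute the centroid of the seed
# feature vectors, then rank candidate entities by Euclidean distance
# from that centroid (closest first) and return the top candidates.
import math

def rank_candidates(seed_vectors, candidates, top_n=2):
    """candidates: dict mapping name -> feature vector (all same length).
    Return the top_n candidate names closest to the seeds' centroid."""
    dims = len(seed_vectors[0])
    centroid = [sum(v[i] for v in seed_vectors) / len(seed_vectors)
                for i in range(dims)]
    return sorted(candidates,
                  key=lambda name: math.dist(candidates[name], centroid))[:top_n]

# Hypothetical term-frequency-style vectors for two seeds and three candidates.
seeds = [[1.0, 0.0, 1.0], [0.8, 0.2, 0.9]]
cands = {"ent_a": [0.9, 0.1, 1.0],
         "ent_b": [0.0, 1.0, 0.0],
         "ent_c": [0.7, 0.3, 0.8]}
print(rank_candidates(seeds, cands))
```

Feeding the returned top candidates back in as new seeds gives the iterative variant the paper's research question (1) investigates.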
Improving Semantic Search Using Query Log Analysis (Stuart Wrigley)
Despite the attention Semantic Search is continuously gaining, several challenges affecting tool performance and user experience remain unsolved. Among these are matching user terms with the search space, adopting view-based interfaces on the open Web, and supporting users while building their queries. This paper proposes an approach that moves a step forward towards tackling these challenges by creating models of usage of Linked Data concepts and properties, extracted from semantic query logs as a source of collaborative knowledge. We use two sets of query logs from the USEWOD workshops to create our models and show the potential of using them in the mentioned areas.
Foundations of Machine Learning - StampedeCon AI Summit 2017 (StampedeCon)
This presentation will cover all aspects of modeling, from preparing data to training and evaluating the results. There will be descriptions of the mainline ML methods, including neural nets, SVMs, boosting, bagging, trees, forests, and deep learning. Common problems of overfitting and dimensionality will be covered, with discussion of modeling best practices. Other topics will include field standardization, encoding categorical variables, and feature creation and selection. It will be a soup-to-nuts overview of all the necessary procedures for building state-of-the-art predictive models.
Similar to Nearest Neighbor and Decision Tree (NN DT)
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Search and Society: Reimagining Information Access for Radical Futures (Bhaskar Mitra)
The field of information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
KATERYNA ABZIATOVA, "Effective test planning: key aspects and pract..." (QADay)
Lviv Direction QADay 2024 (Professional Development)
KATERYNA ABZIATOVA
"Effective test planning: key aspects and practical tips"
https://linktr.ee/qadayua
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply applying machine learning to just any symbolic structure is not sufficient to really harvest the gains of NeSy. These gains will only be realized when the symbolic structures have an actual semantics. I give an operational definition of semantics as "predictable inference".
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... (UiPathCommunity)
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Kubernetes & AI - Beauty and the Beast!?! @KCD Istanbul 2024 (Tobias Schneck)
As AI technology pushes into IT, I was wondering, as an "infrastructure container Kubernetes guy," how this fancy AI technology gets managed from an infrastructure operations view. Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you with a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply AI to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already gotten working for real.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
"Impact of front-end architecture on development cost", Viktor Turskyi (Fwdays)
I have heard many times that architecture is not important for the front-end. I have also seen, many times, how developers implement features on the front-end by just following the standard rules of a framework, thinking that this is enough to successfully launch the project; then the project fails. How can this be prevented, and which approach should you choose? I have launched dozens of complex projects, and during the talk we will analyze which approaches have worked for me and which have not.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... (Jeffrey Haguewood)
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
1. Nearest Neighbor, Decision Tree
Danna Gurari
University of Texas at Austin
Spring 2021
https://www.ischool.utexas.edu/~dannag/Courses/IntroToMachineLearning/CourseContent.html
2. Review
• Last week:
• Binary classification applications
• Evaluating classification models
• Biological neurons: inspiration
• Artificial neurons: Perceptron & Adaline
• Gradient descent
• Assignments (Canvas)
• Lab assignment 1 due tonight
• Problem set 3 due next week
• Questions?
3. Today’s Topics
• Multiclass classification applications and evaluating models
• Motivation for new ML era: need non-linear models
• Nearest neighbor classification
• Decision tree classification
• Parametric versus non-parametric models
4. Today’s Topics
• Multiclass classification applications and evaluating models
• Motivation for new ML era: need non-linear models
• Nearest neighbor classification
• Decision tree classification
• Parametric versus non-parametric models
10. Goal: Design Models that Generalize Well to
New, Previously Unseen Examples
Input:
Label: Cat Dog Person
11. 1. Split data into a “training set” and “test set”
Goal: Design Models that Generalize Well to
New, Previously Unseen Examples
Training Data Test Data
Input:
Label: Cat Dog Person
12. Training Data Test Data
Goal: Design Models that Generalize Well to
New, Previously Unseen Examples
Input:
Label: Cat Dog
2. Train model on “training set” to try to minimize prediction error on it
Person
13. Prediction Model
Goal: Design Models that Generalize Well to
New, Previously Unseen Examples
3. Apply trained model on “test set” to measure generalization error
? ? ?
Input:
Label:
14. Prediction Model
Goal: Design Models that Generalize Well to
New, Previously Unseen Examples
3. Apply trained model on “test set” to measure generalization error
Cat ? ?
Input:
Label:
15. Prediction Model
Goal: Design Models that Generalize Well to
New, Previously Unseen Examples
3. Apply trained model on “test set” to measure generalization error
Cat Giraffe ?
Input:
Label:
16. Prediction Model
Goal: Design Models that Generalize Well to
New, Previously Unseen Examples
3. Apply trained model on “test set” to measure generalization error
Cat Giraffe Person
Input:
Label:
17. Prediction Model
Goal: Design Models that Generalize Well to
New, Previously Unseen Examples
3. Apply trained model on “test set” to measure generalization error
Cat Giraffe Person
Input:
Label:
Cat Pumpkin
Human Annotated Label: Person
19. Today’s Topics
• Multiclass classification applications and evaluating models
• Motivation for new ML era: need non-linear models
• Nearest neighbor classification
• Decision tree classification
• Parametric versus non-parametric models
20. Recall: Historical Context of ML Models
[Timeline figure: 1613 human “computers”; early 1800s linear regression; 1945 first programmable machine; 1950 Turing Test; 1956 AI; 1957 Perceptron; 1959 machine learning; 1st AI winter (1974–1980); 2nd AI winter (1987–1993)]
21. Recall: Vision for Perceptron
Frank Rosenblatt
(Psychologist)
“[The perceptron is] the embryo of an
electronic computer that [the Navy] expects
will be able to walk, talk, see, write,
reproduce itself and be conscious of its
existence…. [It] is expected to be finished in
about a year at a cost of $100,000.”
1958 New York Times article: https://www.nytimes.com/1958/07/08/archives/new-
navy-device-learns-by-doing-psychologist-shows-embryo-of.html
https://en.wikipedia.org/wiki/Frank_Rosenblatt
22. “Perceptrons” Book: Instigator for “AI Winter”
[Timeline figure: 1613 human “computers”; early 1800s linear regression; 1945 first programmable machine; 1950 Turing Test; 1956 AI; 1957 Perceptron; 1959 machine learning; 1st AI winter (1974–1980); 2nd AI winter (1987–1993)]
1969: Minsky & Papert publish a book called “Perceptrons” to discuss its limitations
23. Perceptron Limitation: XOR Problem
XOR = “Exclusive Or”
- Input: two binary values x1 and x2
- Output:
- 1, when exactly one input equals 1
- 0, otherwise
x1 x2 | x1 XOR x2
0  0  | ?
0  1  | ?
1  0  | ?
1  1  | ?
25. Perceptron Limitation: XOR Problem
XOR = “Exclusive Or”
- Input: two binary values x1 and x2
- Output:
- 1, when exactly one input equals 1
- 0, otherwise
x1 x2 | x1 XOR x2
0  0  | 0
0  1  | ?
1  0  | ?
1  1  | ?
26. Perceptron Limitation: XOR Problem
XOR = “Exclusive Or”
- Input: two binary values x1 and x2
- Output:
- 1, when exactly one input equals 1
- 0, otherwise
x1 x2 | x1 XOR x2
0  0  | 0
0  1  | 1
1  0  | ?
1  1  | ?
27. Perceptron Limitation: XOR Problem
XOR = “Exclusive Or”
- Input: two binary values x1 and x2
- Output:
- 1, when exactly one input equals 1
- 0, otherwise
x1 x2 | x1 XOR x2
0  0  | 0
0  1  | 1
1  0  | 1
1  1  | ?
28. Perceptron Limitation: XOR Problem
XOR = “Exclusive Or”
- Input: two binary values x1 and x2
- Output:
- 1, when exactly one input equals 1
- 0, otherwise
x1 x2 | x1 XOR x2
0  0  | 0
0  1  | 1
1  0  | 1
1  1  | 0
29. Perceptron Limitation: XOR Problem
x1 x2 | x1 XOR x2
0  0  | 0
0  1  | 1
1  0  | 1
1  1  | 0
How to separate 1s from 0s with a perceptron (linear function)?
30. Perceptron Limitation: XOR Problem
Frank Rosenblatt
(Psychologist)
“[The perceptron is] the embryo of an
electronic computer that [the Navy] expects
will be able to walk, talk, see, write,
reproduce itself and be conscious of its
existence…. [It] is expected to be finished in
about a year at a cost of $100,000.”
1958 New York Times article: https://www.nytimes.com/1958/07/08/archives/new-
navy-device-learns-by-doing-psychologist-shows-embryo-of.html
How can a machine be “conscious”
when it can’t solve the XOR problem?
31. How to Overcome Limitation?
Non-linear models: e.g.,
• Linear regression: perform non-linear transformation of input features
• K-nearest neighbors
• Decision trees
• And many more to follow…
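The XOR limitation above can be demonstrated in a few lines. This is a sketch in pure Python (not from the slides): a perceptron trained with the classic update rule never classifies all four XOR cases, while even 1-nearest-neighbor does.

```python
# Sketch (pure Python, not from the slides): a trained perceptron cannot
# classify all four XOR cases, while 1-nearest-neighbor can.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]  # x1 XOR x2

# Perceptron with a step activation, trained by the perceptron rule.
w1 = w2 = b = 0.0
for _ in range(1000):
    for (x1, x2), t in zip(X, y):
        pred = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
        w1 += 0.1 * (t - pred) * x1
        w2 += 0.1 * (t - pred) * x2
        b += 0.1 * (t - pred)
perceptron_acc = sum(
    (1 if w1 * x1 + w2 * x2 + b > 0 else 0) == t for (x1, x2), t in zip(X, y)
) / 4

# 1-NN: copy the label of the closest training point (squared Euclidean).
def nearest_label(q):
    return min(zip(X, y), key=lambda p: (p[0][0] - q[0]) ** 2 + (p[0][1] - q[1]) ** 2)[1]

knn_acc = sum(nearest_label(x) == t for x, t in zip(X, y)) / 4
print(perceptron_acc, knn_acc)  # perceptron stays below 1.0; 1-NN reaches 1.0
```

Since XOR is not linearly separable, no weight setting can get the perceptron past 3 of 4 correct.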
32. Today’s Topics
• Multiclass classification applications and evaluating models
• Motivation for new ML era: need non-linear models
• Nearest neighbor classification
• Decision tree classification
• Parametric versus non-parametric models
33. Historical Context of ML Models
[Timeline figure: 1613 human “computers”; early 1800s linear regression; 1945 first programmable machine; 1950 Turing Test; 1951 k-nearest neighbors; 1956 AI; 1957 Perceptron; 1959 machine learning; 1st AI winter (1974–1980); 2nd AI winter (1987–1993)]
E. Fix and J.L. Hodges. Discriminatory Analysis, Nonparametric Discrimination: Consistency Properties. Technical Report 4, USAF School of Aviation Medicine, Randolph Field, 1951.
34. Recall: Instance-Based Learning
Memorizes examples and uses a similarity
measure to those examples to make predictions
Figure Source: Hands-on Machine Learning with Scikit-Learn & TensorFlow, Aurelien Geron
35. e.g., Predict What Scene The Image Shows
Input Output
Kitchen
Database Search
37. e.g., Predict What Scene The Image Shows
2. Organize Database so Visually Similar Examples Neighbor Each Other
Adapted from slides by Antonio Torralba
38. 3. Predict Class Using Label of Most Similar Example(s) in the Database
Input:
Label: ?
Adapted from slides by Antonio Torralba
e.g., Predict What Scene The Image Shows
39. 3. Predict Class Using Label of Most Similar Example(s) in the Database
e.g., Predict What Scene The Image Shows
Input:
Label: Cathedral
Adapted from slides by Antonio Torralba
40. 3. Predict Class Using Label of Most Similar Example(s) in the Database
e.g., Predict What Scene The Image Shows
Input:
Label: ?
Adapted from slides by Antonio Torralba
41. 3. Predict Class Using Label of Most Similar Example(s) in the Database
e.g., Predict What Scene The Image Shows
Input:
Label: Highway
Adapted from slides by Antonio Torralba
43. K-Nearest Neighbor Classification
Training Data:
• Given:
• x = {10.5, 3}
• Predict:
• When k = 1:
• Class 1
• When k = 6:
• How to avoid ties?
• Set “k” to odd value
for binary problems
• Prefer “closer”
neighbors
Novel Examples:
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
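The prediction rule on this slide can be sketched as a minimal k-NN classifier. The slide's training set appears only as a figure, so the points and labels below are hypothetical stand-ins.

```python
# Minimal k-NN classifier (training points are made up for illustration;
# the slide's dataset is shown only as a figure).
import math
from collections import Counter

train = [((9.0, 3.5), 1), ((10.0, 2.8), 1), ((10.8, 3.2), 1), ((8.5, 4.0), 1),
         ((11.5, 1.0), 0), ((12.0, 0.5), 0), ((13.0, 1.5), 0)]

def knn(query, k):
    # Take the k training points closest to the query (Euclidean distance),
    # then majority-vote over their labels.
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn((10.5, 3), 1))  # label of the single closest training point
print(knn((10.5, 3), 3))  # majority vote over the three closest
```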
44. K-Nearest Neighbors: Measuring Distance
How to measure the distance between a novel example and a training example?
• Commonly used: the Minkowski distance:
• When p = 2, Euclidean distance:
• When p = 1, Manhattan distance:
https://en.wikipedia.org/wiki/Taxicab_geometry
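The three distance formulas referenced on this slide appear only as images in the deck; in standard notation they are:

```latex
% Minkowski distance between feature vectors x and y:
d_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}
% When p = 2, Euclidean distance:
d_2(\mathbf{x}, \mathbf{y}) = \sqrt{ \sum_{i=1}^{n} (x_i - y_i)^2 }
% When p = 1, Manhattan distance:
d_1(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} |x_i - y_i|
```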
45. K-Nearest Neighbors: Measuring Distance
• Given:
• x = {10.5, 3}
• k =1:
Euclidean Distance
Training Data:
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
46. K-Nearest Neighbors: Measuring Distance
• Given:
• x = {10.5, 3}
• k =1:
Euclidean Distance
Training Data:
Note: May want to
scale the data to the
same range first
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
47. K-Nearest Neighbors: Measuring Distance
• Given:
• x = {10.5, 3}
• k =1:
Manhattan Distance
Training Data:
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
48. K-Nearest Neighbors: Measuring Distance
• Given:
• x = {10.5, 3}
• k =1:
Manhattan Distance
Training Data:
Note: May want to
scale the data to the
same range first
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
49. K-Nearest Neighbors: Measuring Distance
How to measure distance when the data has categorical attributes?
• e.g., Train = blue
• e.g., Test = blue; identical values so assign distance 0
• e.g., Test = white; different values so assign distance 1
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
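The rule described above (distance 0 for identical values, 1 for different values) can be sketched directly; summed over attributes it is the Hamming distance.

```python
# Distance for categorical attributes as described on the slide: per
# attribute, 0 when the values match and 1 when they differ (summing
# over attributes gives the Hamming distance).
def categorical_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

print(categorical_distance(["blue"], ["blue"]))           # 0: identical values
print(categorical_distance(["blue"], ["white"]))          # 1: different values
print(categorical_distance(["blue", "S"], ["red", "S"]))  # 1: one mismatch
```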
50. K-Nearest Neighbor Classification: What “K”?
When K=1: When K=3:
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
51. K-Nearest Neighbor Classification: What “K”?
When K=1: When K=3:
Given that predictions can change with different
values of “k”, what value of “k” should one use?
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
52. K-Nearest Neighbor Classification: What “K”?
What happens to the decision boundary as “k” grows?
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
53. K-Nearest Neighbor Classification: What “K”?
What happens when “k” equals the training data size?
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
54. K-Nearest Neighbor Classification: What “K”?
(smaller “k”: higher model complexity) (larger “k”: lower model complexity)
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
55. K-Nearest Neighbor Classification: What “K”?
At what value
for “k” is model
overfitting the
most?
k = 1
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
56. K-Nearest Neighbor Classification: What “K”?
What is the best
value for “k”?
k = 6
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
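The accuracy-vs-k curve behind this slide amounts to scoring each candidate "k" on held-out data and keeping the best. A sketch with toy data (not the slide's dataset):

```python
# Sketch (toy data, not the slide's dataset): choose "k" by accuracy on
# a held-out validation set.
import math
from collections import Counter

train = [((1.0, 1.0), 0), ((1.2, 0.8), 0), ((0.9, 1.3), 0), ((1.1, 1.1), 0),
         ((3.0, 3.0), 1), ((3.2, 2.8), 1), ((2.9, 3.3), 1), ((3.1, 3.1), 1)]
val = [((1.05, 1.0), 0), ((3.05, 3.0), 1), ((0.95, 1.2), 0), ((3.15, 2.9), 1)]

def predict(query, k):
    # k-NN prediction using the training set only.
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Score each candidate k on the validation set and keep the best.
scores = {k: sum(predict(x, k) == t for x, t in val) / len(val)
          for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```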
57. K-Nearest Neighbor: How to Use to Predict
More than Two Classes?
• Tally the number of neighbors belonging to each class and choose the majority-vote winner
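The tally described above generalizes directly to more than two classes (the labels here are illustrative):

```python
# Majority voting over neighbor labels works for any number of classes:
# tally the labels and take the most common one.
from collections import Counter

neighbor_labels = ["cat", "dog", "cat", "person", "cat"]
majority = Counter(neighbor_labels).most_common(1)[0][0]
print(majority)  # cat
```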
58. What are Strengths of KNN?
• Adapts as new data is added
• Training is relatively fast
• Easy to understand
59. What are Weaknesses of KNN?
• For large datasets, requires large storage space
• For large datasets, this approach can be very slow or infeasible
• Note: can improve speed with efficient data structures such as KD-trees
• Vulnerable to noisy/irrelevant examples
• Sensitive to imbalanced datasets where more frequent class will
dominate majority voting
60. Today’s Topics
• Multiclass classification applications and evaluating models
• Motivation for new ML era: need non-linear models
• Nearest neighbor classification
• Decision tree classification
• Parametric versus non-parametric models
61. Historical Context of ML Models
[Timeline figure: 1613 human “computers”; early 1800s linear regression; 1945 first programmable machine; 1950 Turing Test; 1951 k-nearest neighbors; 1956 AI; 1957 Perceptron; 1959 machine learning; 1962 concept learning; 1st AI winter (1974–1980); 1979 ID3; 2nd AI winter (1987–1993)]
Hunt, E.B. (1962). Concept Learning: An Information Processing Problem. New York: Wiley.
Quinlan, J.R. (1979). Discovering rules by induction from large collections of examples. In D. Michie (Ed.), Expert Systems in the Microelectronic Age. Edinburgh University Press.
Quinlan, J.R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.
66. Decision Tree: Generic Structure
• Goal: predict class label (aka: class, ground truth)
• Representation: Tree
• Internal (non-leaf) nodes = test an attribute
• Branches = attribute values
• Leaf = classification label
68. Decision Tree: Generic Structure
• Goal: predict class label
• Representation: Tree
• Internal (non-leaf) nodes = test an attribute
• Branches = attribute values
• Leaf = classification label
How can a machine learn a decision tree?
72. Next “Best” Attribute: Use Entropy
Entropy = − Σ (i = 1 to N) p_i log2(p_i), where N = number of classes, p_i = fraction of examples belonging to class i, and the base-2 log encodes entropy in bits
In a binary setting,
• Entropy is 0 when fraction of examples belonging to a class is 0 or 1
• Entropy is 1 when fraction of examples belonging to each class is 0.5
https://www.logcalculator.net/log-2
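The two binary edge cases on this slide can be checked numerically; a minimal sketch:

```python
# Entropy in bits, matching the slide's binary edge cases: 0 when one
# class holds all the examples, 1 when two classes split 50/50.
import math

def entropy(fractions):
    # Skip zero fractions: the limit of -p*log2(p) as p -> 0 is 0.
    return sum(-p * math.log2(p) for p in fractions if p > 0)

print(entropy([1.0, 0.0]))  # 0.0
print(entropy([0.5, 0.5]))  # 1.0
```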
73. Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Current entropy?
76. Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Entropy if we split on “Type”?
• Left tree: “Comedy” = ?
78. Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Entropy if we split on “Type”?
• Left tree: “Comedy” = 0.92
• Right tree: “Drama” = ?
80. Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Entropy if we split on “Type”?
• Left tree: “Comedy” = 0.92
• Right tree: “Drama” = 0.97
• Information gain by split on “Type”?
81. Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Entropy if we split on “Length”?
• Left tree: “Short” = ?
82. Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Entropy if we split on “Length”?
• Left tree: “Short” = 0.92
• Middle tree: “Medium” = ?
83. Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Entropy if we split on “Length”?
• Left tree: “Short” = 0.92
• Middle tree: “Medium” = 0.82
• Right tree: “Long” = ?
84. Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Entropy if we split on “Length”?
• Left tree: “Short” = 0.92
• Middle tree: “Medium” = 0.82
• Right tree: “Long” = 0
• Information gain by split on “Length”?
85. Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Entropy if we split on “IMDb Rating”?
• Order attribute values:
{4.5, 5.1, 6.9, 7.2, 7.5, 8.0, 8.3, 9.3}
Split at midpoint and measure entropy
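The recipe above (sort the observed values, split at each adjacent midpoint) can be sketched with the ratings listed on the slide:

```python
# Candidate split thresholds for a continuous attribute, per the slide's
# recipe: sort the values and take the midpoint of each adjacent pair.
ratings = sorted([4.5, 5.1, 6.9, 7.2, 7.5, 8.0, 8.3, 9.3])
midpoints = [round((a + b) / 2, 2) for a, b in zip(ratings, ratings[1:])]
print(midpoints)  # includes 7.05, the threshold used in the later split on IMDb Rating
```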
91. Decision Tree: What is Our First Split?
• Greedy approach (finding an optimal tree is NP-complete)
IG = 0 IG = 0.19 IG = 0.95
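The information-gain numbers on this slide come from the formula IG = parent entropy minus the size-weighted average child entropy. The movie table itself is shown only as a figure, so the class counts below ([yes, no] per node) are illustrative rather than the slide's exact data.

```python
# Information gain = parent entropy - size-weighted average child entropy.
# Class counts here are illustrative, not the slide's exact movie data.
import math

def entropy(counts):
    total = sum(counts)
    return sum(-c / total * math.log2(c / total) for c in counts if c)

def information_gain(parent, children):
    n = sum(parent)
    return entropy(parent) - sum(sum(c) / n * entropy(c) for c in children)

# A 4-yes/4-no parent split into two mixed children:
ig = information_gain([4, 4], [[3, 1], [1, 3]])
print(round(ig, 2))  # 0.19
```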
92. Decision Tree: What Tree Results?
IMDb Rating
<= 7.05 > 7.05
Recurse on this tree?
93. Decision Tree: What Tree Results?
No
Recurse on this tree?
IMDb Rating
<= 7.05 > 7.05
95. Decision Tree: Generic Learning Algorithm
• Greedy approach (finding an optimal tree is NP-complete)
Key decision: the splitting criterion, e.g.:
• Entropy (maximize information gain)
• Gini index (used in the CART algorithm)
• Gain ratio (used in the C4.5 algorithm)
• Mean squared error (for regression trees)
• …
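Two of the criteria above can be sketched as impurity functions over class-probability lists (gain ratio additionally divides information gain by the entropy of the split itself, and MSE is the regression analogue):

```python
# Entropy and Gini impurity over class-probability lists: both are
# maximal at a 50/50 split and zero at a pure node.
import math

def entropy(ps):
    return sum(-p * math.log2(p) for p in ps if p > 0)

def gini(ps):
    return 1 - sum(p * p for p in ps)

print(entropy([0.5, 0.5]), gini([0.5, 0.5]))  # 1.0 0.5 (most impure)
print(entropy([1.0]), gini([1.0]))            # 0.0 0.0 (pure node)
```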
96. Overfitting
• At what tree size, does overfitting begin?
http://www.alanfielding.co.uk/multivar/crt/dt_example_04.htm
Evaluate on training data
Evaluate on test data
97. Regularization to Avoid Overfitting
• Pruning
• Pre-pruning: stop tree growth earlier
• Post-pruning: prune tree afterwards
https://www.clariba.com/blog/tech-20140811-decision-trees-with-sap-predictive-analytics-and-sap-hana-emilio-nieto
98. Today’s Topics
• Multiclass classification applications and evaluating models
• Motivation for new ML era: need non-linear models
• Nearest neighbor classification
• Decision tree classification
• Parametric versus non-parametric models
99. Machine Learning Goal
• Learn function that maps input features (X) to an output prediction (Y)
Y = f(x)
100. Machine Learning Goal
• Learn function that maps input features (X) to an output prediction (Y)
• Parametric model: has a fixed number of parameters
Y = f(x)
101. Machine Learning Goal
• Learn function that maps input features (X) to an output prediction (Y)
• Parametric model: has a fixed number of parameters
• Non-parametric model: does not fix the number of parameters in advance (it can grow with the training data)
Y = f(x)
102. Class Discussion
• For each model, is it parametric or non-parametric?
• Linear regression
• K-nearest neighbors
• What are advantages and disadvantages of parametric models?
• What are advantages and disadvantages of non-parametric models?
103. Today’s Topics
• Multiclass classification applications and evaluating models
• Motivation for new ML era: need non-linear models
• Nearest neighbor classification
• Decision tree classification
• Parametric versus non-parametric models
104. References Used for Today’s Material
• Chapter 3 of Deep Learning book by Goodfellow et al.
• http://www.cs.utoronto.ca/~fidler/teaching/2015/slides/CSC411/06_trees.pdf
• http://www.cs.utoronto.ca/~fidler/teaching/2015/slides/CSC411/tutorial3_CrossVal-DTs.pdf
• http://www.cs.cmu.edu/~epxing/Class/10701/slides/classification15.pdf