A basic introduction to 
Machine Learning 
Catalina Hallett 
SwiftKey
Machine learning 
• Machine learning deals with the construction of 
computer systems that act upon information 
learned from data, rather than on a set of specific 
instructions 
• The aim of a machine learning system is to 
generalize from experience, i.e. to perform 
accurately on new, unseen examples/tasks after 
having experienced a learning data set 
• All-pervasive: web search, marketing, financial 
predictions, voice and image recognition, self-driving 
cars
Supervised vs unsupervised learning 
• Supervised learning: 
– the output is known 
– Training data is labelled with its output 
– It learns a function from the inputs to the outputs 
which can be used to generate an output for a new 
instance 
• Unsupervised learning 
– The output is unknown 
– Training data is unlabelled 
– It aims at discovering information from data
Supervised learning for classification 
• Marketing: 
– which promotions are more likely to be effective 
– which customers are more likely to need a certain product 
– Identifying positive/negative feedback 
• Machine vision: Image (face) recognition, handwriting 
identification, fingerprint identification 
• Spam/plagiarism detection 
• Natural language processing: text categorisation (e.g., 
for indexing), parsing, word sense disambiguation, 
speech identification
Supervised learning for classification 
• Step 1: Learning 
Given a target concept: 
– Collect a set of training examples that are 
representative of the concept 
– Identify features that are relevant in describing the 
concept 
– Learn a model that “explains” the concept (select an 
algorithm & fine tune it) 
• Step 2: Classification 
Use the model learnt in the previous step to classify an 
unseen instance
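
The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "model" is a nearest-centroid classifier (chosen for brevity, not named in the slides), and the features (height in cm, weight in kg) and the data are hypothetical.

```python
from collections import defaultdict


def learn(examples):
    """Step 1: build a model from (feature_vector, label) pairs.

    The model is simply the mean feature vector (centroid) of each class.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def classify(model, features):
    """Step 2: give an unseen instance the label of the closest centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: sq_dist(model[label], features))


# Hypothetical labelled training examples: (height_cm, weight_kg) -> class
training = [
    ((25.0, 4.0), "cat"),
    ((23.0, 3.5), "cat"),
    ((110.0, 19.0), "girl"),
    ((120.0, 23.0), "girl"),
]

model = learn(training)                       # Step 1: learning
prediction = classify(model, (30.0, 5.0))     # Step 2: classification
print(prediction)
```

Any other learning algorithm could take the place of `learn`; the split into a learning step and a classification step stays the same.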
What is Kitty? 
Labelled training examples 
– Class: Girl 
– Class: Cat 
A girl? A cat? 
Features 
– Has a bow 
– Wears clothes 
– Is <5 apples tall 
– Has whiskers 
– Has round face 
– Has cat ears 
– Walks on 2 feet
Decision trees 
[Figure: decision tree over the labelled examples. Internal nodes test features (Round face, Has whiskers, 5 apples tall, Has bow) with yes/no branches; each node shows the Girl/Cat counts that reach it, and the leaves are labelled Girl or Cat.]
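
How a decision tree picks the feature to test at a node can be sketched as follows. The slides do not say which splitting criterion is used; information gain (reduction in entropy) is assumed here as one common choice, and the four examples are hypothetical rather than the ones from the figure.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on a yes/no feature."""
    gain = entropy(labels)
    for value in (True, False):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain


# Hypothetical training examples described by two boolean features
rows = [
    {"has_whiskers": True,  "has_bow": True},
    {"has_whiskers": True,  "has_bow": False},
    {"has_whiskers": False, "has_bow": True},
    {"has_whiskers": False, "has_bow": False},
]
labels = ["cat", "cat", "girl", "girl"]

# The root node tests the feature with the highest information gain
best = max(rows[0], key=lambda f: information_gain(rows, labels, f))
print(best)
```

In this toy data `has_whiskers` separates the classes perfectly while `has_bow` tells us nothing, so the former is chosen. A full tree repeats the same selection on each branch until the leaves are pure or another stopping rule applies.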
K-nearest neighbour 
• Compare the classification target with the set of 
training examples using a distance function 
• Choose as output the class that the majority of the k 
closest neighbours belong to 
[Figure: a classification target among girl and cat training examples, with its neighbourhoods for K=1, K=3 and K=5]
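
A minimal sketch of k-nearest neighbour, assuming Euclidean distance as the distance function (the slides leave the choice open). The 2-D points are hypothetical and were picked so that the predicted class changes with k.

```python
import math
from collections import Counter


def knn_classify(training, target, k):
    """Return the majority class among the k training examples closest to target."""
    neighbours = sorted(training, key=lambda ex: math.dist(ex[0], target))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labelled examples: (feature_vector, class)
training = [
    ((1.0, 0.0), "girl"),
    ((2.0, 0.0), "cat"),
    ((0.0, 2.5), "cat"),
    ((3.0, 0.0), "girl"),
    ((0.0, 3.5), "girl"),
]
target = (0.0, 0.0)

for k in (1, 3, 5):
    print(k, knn_classify(training, target, k))
```

With these points the answer is "girl" for k=1, "cat" for k=3 and "girl" again for k=5, which is why k is a tuning parameter rather than a detail.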
Many, many algorithms 
• Neural networks 
• Support vector machines (SVM) 
• Boosting 
• Naïve Bayes 
• Fisher linear discriminant 
… each of them with a large number of possible 
tuning parameters 
… each of them with advantages and disadvantages 
according to size of training data, speed, accuracy, 
overfitting risk, etc
How do you select the right one? 
• “No free lunch” – there is no single ML 
algorithm that outperforms all others on every 
task 
• Some algorithms are known to work better for 
certain classes of problems, given certain 
circumstances 
[Which estimator] 
• Trial and error
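
"Trial and error" usually means fitting several candidates and comparing them on data that was not used for training. The sketch below assumes a simple hold-out set and two deliberately trivial candidates (a majority-class baseline and 1-nearest-neighbour); the data and the candidate names are hypothetical.

```python
import math
from collections import Counter


def majority_baseline(train):
    """Candidate 1: always predict the most frequent training class."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: most_common


def one_nearest_neighbour(train):
    """Candidate 2: predict the class of the single closest training example."""
    return lambda x: min(train, key=lambda ex: math.dist(ex[0], x))[1]


def accuracy(predict, examples):
    """Fraction of examples whose predicted class matches the true class."""
    return sum(predict(x) == y for x, y in examples) / len(examples)


train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
held_out = [((0.1, 0.1), "a"), ((1.0, 0.9), "b"),
            ((0.0, 0.2), "a"), ((1.1, 1.0), "b")]

candidates = {"majority": majority_baseline, "1-NN": one_nearest_neighbour}
scores = {name: accuracy(fit(train), held_out) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The same loop works for any set of candidates, including one algorithm with different tuning parameters. Cross-validation would be a more robust variant of the same idea.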
Unsupervised learning 
• Deals with identifying patterns 
• It works with observed patterns (assumed to 
be independent samples from some 
probability distribution) 
• Has some explicit or implicit knowledge of 
what is important 
• Has no knowledge or expectations of target 
outputs
Main approaches 
• Clustering – trying to group objects in such a 
way that objects in the same cluster are more 
similar to each other than to objects in a 
different cluster 
• Feature extraction – tries to identify statistical 
regularities or irregularities in data
Clustering techniques 
• K-means clustering - partitions n instances 
into k clusters in which each instance belongs 
to the cluster with the nearest mean 
– Initialise k means, randomly or using some rules 
– Partition the data according to the initial means 
– Calculate the centroid of each cluster and use it as the new mean 
– Repeat until convergence is reached (assignments to clusters no longer change) 
* Images courtesy of Wikipedia
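
The k-means loop described above can be sketched as follows. To keep the example reproducible, the initial means are passed in explicitly instead of being drawn at random; Euclidean distance and the sample points are assumptions made for illustration.

```python
import math


def k_means(points, initial_means, max_iter=100):
    """Cluster points around len(initial_means) means; returns (means, assignment)."""
    means = [tuple(m) for m in initial_means]
    assignment = None
    for _ in range(max_iter):
        # Partition the data: each point goes to the cluster with the nearest mean
        new_assignment = [
            min(range(len(means)), key=lambda j: math.dist(p, means[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: assignments no longer change
        assignment = new_assignment
        # Use the centroid of each cluster as its new mean
        for j in range(len(means)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old mean if a cluster ends up empty
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, assignment


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
means, assignment = k_means(points, initial_means=[(0.0, 0.0), (10.0, 10.0)])
print(means, assignment)
```

The result depends on the initial means, which is why implementations commonly run the algorithm several times from different starting points.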
More models … 
• Distribution models: clusters are modelled 
using statistical distributions 
– Expectation-maximization algorithm: use a fixed 
number of Gaussian distributions, initialised 
randomly. Optimise their parameters to fit the 
data set better
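
A minimal sketch of expectation-maximization for a mixture of one-dimensional Gaussians. It assumes 1-D data and a fixed number of iterations, and it takes fixed starting parameters instead of random ones so that the run is reproducible; the data are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, iterations=50):
    """Fit a mixture of len(means) Gaussians to data; returns updated parameters."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point
        resp = []
        for x in data:
            dens = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                    for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsing component
    return means, variances, weights


data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_gmm_1d(
    data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
print(means, weights)
```

Unlike k-means, each point belongs to every cluster with some probability (its responsibility), and the clusters are described by a mean and a variance rather than by a mean alone.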
• Density-based clustering: “a cluster is a set of 
data objects spread in the data space over a 
contiguous region of high density of objects. 
Density-based clusters are separated from 
each other by contiguous regions of low 
density of objects” (Kriegel et al., 2011) 
• Objects in low density areas are considered 
outliers
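
The slides do not name a specific density-based algorithm. As an illustration, here is a compact sketch in the style of DBSCAN, one well-known member of this family; the parameter names `eps` (neighbourhood radius) and `min_pts` (minimum neighbours for a dense region), the Euclidean distance and the sample points are assumptions.

```python
import math

NOISE = -1  # label given to objects in low-density areas (outliers)


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # dense point: keep growing the cluster
        cluster += 1
    return labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (5.0, 5.0)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The two dense groups become clusters 0 and 1, and the isolated point at (5, 5) is labelled as an outlier. The number of clusters is not fixed in advance, in contrast to k-means.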

What is Machine Learning?
