SlideShare a Scribd company logo
Teaching AI about
human knowledge
Ines Montani
Explosion AI
Explosion AI is a digital studio
specialising in Artificial Intelligence
and Natural Language Processing.
Open-source library for industrial-strength
Natural Language Processing
spaCy’s next-generation Machine Learning library
for deep learning with text
Data Store
coming soon: pre-trained, customisable models

for a variety of languages and domains
Machine learning is
programming by
example.
Examples are your source
code, training is compilation.
draw examples from the same distribution as
the runtime inputs
goal: system’s prediction given some input
matches label a human would have assigned
examples
training
predictioninput
labels
How machines “learn”
def train_tagger(examples):
W = defaultdict(lambda: zeros(n_tags))
for (word, prev, next), human_tag in examples:
scores = W[word] + W[prev] + W[next]
guess = scores.argmax()
if guess != human_tag:
for feat in (word, prev, next):
W[feat][guess] -= 1
W[feat][human_tag] += 1
the weights we'll train
if the guess was wrong, adjust weights
get the best-scoring tag
score each tag given weights & context
examples = words, tags, contexts
decrease score for bad tag in this context
increase score for good tag in this context
Example: training a simple part-of-speech tagger with the perceptron algorithm

(teach the model to recognise verbs, nouns, etc.)
The bottleneck in AI
is data, not algorithms.
Algorithms are general,
training data is specific.
data quality, data quantity and accuracy
problems are still the biggest problems in AI 

(Source: The State of AI survey)
you can extract knowledge from all kinds of
sources, e.g. sentiment from emoji on Reddit 😍
you usually need at least some data specific to
your problem annotated by humans
Where human knowledge
in AI really comes from
Images: Amazon Mechanical Turk, depressing.org
Mechanical Turk
human annotators
~$5 per hour
boring tasks
low incentives
Don’t expect great data if
you’re boring the shit out of
underpaid people.
Why are we “designing around” this?

“Taking a HIT: Designing around Rejection, Mistrust, Risk, and
Workers’ Experiences in Amazon Mechanical Turk” 

(McInnis et al., 2016)
data collection needs the same treatment as all
other human-facing processes
good UX + purpose + incentives = better quality
UX-driven data collection
with active learning
assist human with good UX and task structure
the things that are hard for the computer are
usually easy for the human, and vice versa
don’t waste time on what the model already
knows, ask human about what the model is 

most interested in
SOLUTION #1
model human
tasks
human annotates
all tasks
annotated tasks
are used as training
data for model
Batch learning vs. active learning approach to annotation and training
BATCH
BATCH
model human
tasks
human annotates
all tasks
annotated tasks
are used as training
data for model
model chooses
one task
human annotates
chosen task
annotated single task influences model’s
decision on what to ask next
Batch learning vs. active learning approach to annotation and training
BATCH
BATCH
ACTIVE
Import knowledge with
pre-trained models
start off with general information about the
language, the world etc.
fine-tune and improve to fit custom needs
big models can work with little training data
backpropagate error signals to correct model
SOLUTION #2
user input
word meanings
entity labels
phrase meanings
intent
your examples
“whats the best way to catalinas”
fit meaning
representations
to your data
Backpropagation
user input
word meanings
entity labels
phrase meanings
intent
your examples
“whats the best way to catalinas”
fit meaning
representations
to your data
add workaround
for read-only model
Backpropagation
If you don’t own the
weights of your model,
you’re missing the
point of deep learning.
Deep learning is about
reading and writing.
deep learning = learning internal
representations to help solve your task.
backpropagate errors on the task you care
about to adjust the internal representations
SaaS is great for read-only software – but deep
learning needs read and write access
1. AI systems today are no magically smart
black boxes. They’re made of annotations and
statistics.
2. AI has an HR problem. To improve the
technology, we need to fix the way we collect
data from humans.
3. AI needs write access. This challenges the
current trends towards thin clients and
centralised cloud computing.
Thanks!
💥 Explosion AI

explosion.ai
📲 Follow us on Twitter

@explosion_ai

@_inesmontani

@honnibal

More Related Content

What's hot

Ai vs machine learning vs deep learning
Ai vs machine learning vs deep learningAi vs machine learning vs deep learning
Ai vs machine learning vs deep learning
Sanjay Patel
 
SkillsFuture Festival at NUS 2019- Machine Learning for Humans
SkillsFuture Festival at NUS 2019- Machine Learning for HumansSkillsFuture Festival at NUS 2019- Machine Learning for Humans
SkillsFuture Festival at NUS 2019- Machine Learning for Humans
NUS-ISS
 
Aaa ped-1- Python: Introduction to AI, Python and Colab
Aaa ped-1- Python: Introduction to AI, Python and ColabAaa ped-1- Python: Introduction to AI, Python and Colab
Aaa ped-1- Python: Introduction to AI, Python and Colab
AminaRepo
 
Deep Learning Explained: The future of Artificial Intelligence and Smart Netw...
Deep Learning Explained: The future of Artificial Intelligence and Smart Netw...Deep Learning Explained: The future of Artificial Intelligence and Smart Netw...
Deep Learning Explained: The future of Artificial Intelligence and Smart Netw...
Melanie Swan
 
Cognitive Computing
Cognitive ComputingCognitive Computing
Cognitive Computing
Varun Trivedi
 
Distributed Natural Language Processing Systems in Python
Distributed Natural Language Processing Systems in PythonDistributed Natural Language Processing Systems in Python
Distributed Natural Language Processing Systems in Python
Clare Corthell
 
Deep neural networks and tabular data
Deep neural networks and tabular dataDeep neural networks and tabular data
Deep neural networks and tabular data
JimmyLiang20
 
How to Build AI That Works For People | Seattle Interactive Conference 2018
How to Build AI That Works For People | Seattle Interactive Conference 2018How to Build AI That Works For People | Seattle Interactive Conference 2018
How to Build AI That Works For People | Seattle Interactive Conference 2018
Seattle Interactive Conference
 
Unit 1 part 1
Unit 1   part 1Unit 1   part 1
Unit 1 part 1
Kalai Selvi
 
Introduction to Artificial Intelligence
Introduction to Artificial IntelligenceIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence
Kalai Selvi
 
Data science and Artificial Intelligence
Data science and Artificial IntelligenceData science and Artificial Intelligence
Data science and Artificial Intelligence
Suman Srinivasan
 
Ai in finance
Ai in financeAi in finance
Ai in finance
Vishwas N
 
Searching techniques in AI
Searching techniques in AISearching techniques in AI
Searching techniques in AI
Kalai Selvi
 
Unit 1 part 2
Unit 1  part 2Unit 1  part 2
Unit 1 part 2
Kalai Selvi
 
SkillsFuture Festival at NUS 2019- Industrial Deep Learning and Latest AI Al...
 SkillsFuture Festival at NUS 2019- Industrial Deep Learning and Latest AI Al... SkillsFuture Festival at NUS 2019- Industrial Deep Learning and Latest AI Al...
SkillsFuture Festival at NUS 2019- Industrial Deep Learning and Latest AI Al...
NUS-ISS
 
Deep Learning - Hype, Reality and Applications in Manufacturing
Deep Learning - Hype, Reality and Applications in ManufacturingDeep Learning - Hype, Reality and Applications in Manufacturing
Deep Learning - Hype, Reality and Applications in Manufacturing
Adam Cook
 
10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems
Xavier Amatriain
 
Privacy-Preserving Machine Learning: secure user data without sacrificing mod...
Privacy-Preserving Machine Learning: secure user data without sacrificing mod...Privacy-Preserving Machine Learning: secure user data without sacrificing mod...
Privacy-Preserving Machine Learning: secure user data without sacrificing mod...
Manning Publications
 
Databeers Dub #2 - Brian Mac Namee - Deep Learning, A Different Kind of Machi...
Databeers Dub #2 - Brian Mac Namee - Deep Learning, A Different Kind of Machi...Databeers Dub #2 - Brian Mac Namee - Deep Learning, A Different Kind of Machi...
Databeers Dub #2 - Brian Mac Namee - Deep Learning, A Different Kind of Machi...
Databeers Dublin
 
Introduction To Machine Learning | Edureka
Introduction To Machine Learning | EdurekaIntroduction To Machine Learning | Edureka
Introduction To Machine Learning | Edureka
Edureka!
 

What's hot (20)

Ai vs machine learning vs deep learning
Ai vs machine learning vs deep learningAi vs machine learning vs deep learning
Ai vs machine learning vs deep learning
 
SkillsFuture Festival at NUS 2019- Machine Learning for Humans
SkillsFuture Festival at NUS 2019- Machine Learning for HumansSkillsFuture Festival at NUS 2019- Machine Learning for Humans
SkillsFuture Festival at NUS 2019- Machine Learning for Humans
 
Aaa ped-1- Python: Introduction to AI, Python and Colab
Aaa ped-1- Python: Introduction to AI, Python and ColabAaa ped-1- Python: Introduction to AI, Python and Colab
Aaa ped-1- Python: Introduction to AI, Python and Colab
 
Deep Learning Explained: The future of Artificial Intelligence and Smart Netw...
Deep Learning Explained: The future of Artificial Intelligence and Smart Netw...Deep Learning Explained: The future of Artificial Intelligence and Smart Netw...
Deep Learning Explained: The future of Artificial Intelligence and Smart Netw...
 
Cognitive Computing
Cognitive ComputingCognitive Computing
Cognitive Computing
 
Distributed Natural Language Processing Systems in Python
Distributed Natural Language Processing Systems in PythonDistributed Natural Language Processing Systems in Python
Distributed Natural Language Processing Systems in Python
 
Deep neural networks and tabular data
Deep neural networks and tabular dataDeep neural networks and tabular data
Deep neural networks and tabular data
 
How to Build AI That Works For People | Seattle Interactive Conference 2018
How to Build AI That Works For People | Seattle Interactive Conference 2018How to Build AI That Works For People | Seattle Interactive Conference 2018
How to Build AI That Works For People | Seattle Interactive Conference 2018
 
Unit 1 part 1
Unit 1   part 1Unit 1   part 1
Unit 1 part 1
 
Introduction to Artificial Intelligence
Introduction to Artificial IntelligenceIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence
 
Data science and Artificial Intelligence
Data science and Artificial IntelligenceData science and Artificial Intelligence
Data science and Artificial Intelligence
 
Ai in finance
Ai in financeAi in finance
Ai in finance
 
Searching techniques in AI
Searching techniques in AISearching techniques in AI
Searching techniques in AI
 
Unit 1 part 2
Unit 1  part 2Unit 1  part 2
Unit 1 part 2
 
SkillsFuture Festival at NUS 2019- Industrial Deep Learning and Latest AI Al...
 SkillsFuture Festival at NUS 2019- Industrial Deep Learning and Latest AI Al... SkillsFuture Festival at NUS 2019- Industrial Deep Learning and Latest AI Al...
SkillsFuture Festival at NUS 2019- Industrial Deep Learning and Latest AI Al...
 
Deep Learning - Hype, Reality and Applications in Manufacturing
Deep Learning - Hype, Reality and Applications in ManufacturingDeep Learning - Hype, Reality and Applications in Manufacturing
Deep Learning - Hype, Reality and Applications in Manufacturing
 
10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems
 
Privacy-Preserving Machine Learning: secure user data without sacrificing mod...
Privacy-Preserving Machine Learning: secure user data without sacrificing mod...Privacy-Preserving Machine Learning: secure user data without sacrificing mod...
Privacy-Preserving Machine Learning: secure user data without sacrificing mod...
 
Databeers Dub #2 - Brian Mac Namee - Deep Learning, A Different Kind of Machi...
Databeers Dub #2 - Brian Mac Namee - Deep Learning, A Different Kind of Machi...Databeers Dub #2 - Brian Mac Namee - Deep Learning, A Different Kind of Machi...
Databeers Dub #2 - Brian Mac Namee - Deep Learning, A Different Kind of Machi...
 
Introduction To Machine Learning | Edureka
Introduction To Machine Learning | EdurekaIntroduction To Machine Learning | Edureka
Introduction To Machine Learning | Edureka
 

Viewers also liked

Artificial Intelligence - A.I. Aggressive Technologies
Artificial Intelligence - A.I. Aggressive Technologies Artificial Intelligence - A.I. Aggressive Technologies
Artificial Intelligence - A.I. Aggressive Technologies
exouniversity
 
Buyer's Remorse - Using analytics to mitigate M&A risk
Buyer's Remorse - Using analytics to mitigate M&A riskBuyer's Remorse - Using analytics to mitigate M&A risk
Buyer's Remorse - Using analytics to mitigate M&A risk
KDDanalytics
 
Types of databases
Types of databasesTypes of databases
Types of databasesPAQUIAAIZEL
 
International AI and ML Landscape
International AI and ML LandscapeInternational AI and ML Landscape
International AI and ML Landscape
Pranath Sisodiya
 
The Top Skills That Can Get You Hired in 2017
The Top Skills That Can Get You Hired in 2017The Top Skills That Can Get You Hired in 2017
The Top Skills That Can Get You Hired in 2017
LinkedIn
 

Viewers also liked (6)

Artificial Intelligence - A.I. Aggressive Technologies
Artificial Intelligence - A.I. Aggressive Technologies Artificial Intelligence - A.I. Aggressive Technologies
Artificial Intelligence - A.I. Aggressive Technologies
 
Database and different types of databases available in market
Database and different types of databases available in marketDatabase and different types of databases available in market
Database and different types of databases available in market
 
Buyer's Remorse - Using analytics to mitigate M&A risk
Buyer's Remorse - Using analytics to mitigate M&A riskBuyer's Remorse - Using analytics to mitigate M&A risk
Buyer's Remorse - Using analytics to mitigate M&A risk
 
Types of databases
Types of databasesTypes of databases
Types of databases
 
International AI and ML Landscape
International AI and ML LandscapeInternational AI and ML Landscape
International AI and ML Landscape
 
The Top Skills That Can Get You Hired in 2017
The Top Skills That Can Get You Hired in 2017The Top Skills That Can Get You Hired in 2017
The Top Skills That Can Get You Hired in 2017
 

Similar to Teaching AI about human knowledge

Machine learning-in-details-with-out-python-code
Machine learning-in-details-with-out-python-codeMachine learning-in-details-with-out-python-code
Machine learning-in-details-with-out-python-code
Osama Ghandour Geris
 
Lect 7 intro to M.L..pdf
Lect 7 intro to M.L..pdfLect 7 intro to M.L..pdf
Lect 7 intro to M.L..pdf
HassanElalfy4
 
Azure machine learning indiandotnet
Azure machine learning indiandotnetAzure machine learning indiandotnet
Azure machine learning indiandotnet
Indiandotnet
 
Machine Learning Basics
Machine Learning BasicsMachine Learning Basics
Machine Learning Basics
Suresh Arora
 
Machine learning
Machine learningMachine learning
Machine learning
ADARSHMISHRA126
 
Machine Learning Ch 1.ppt
Machine Learning Ch 1.pptMachine Learning Ch 1.ppt
Machine Learning Ch 1.ppt
ARVIND SARDAR
 
Artificial intelligence slides beginners
Artificial intelligence slides beginners Artificial intelligence slides beginners
Artificial intelligence slides beginners
Antonio Fernandes
 
Machine Learning lecture1(introduction)
Machine Learning lecture1(introduction)Machine Learning lecture1(introduction)
Machine Learning lecture1(introduction)
cairo university
 
Brochure data science learning path board-infinity (1)
Brochure   data science learning path board-infinity (1)Brochure   data science learning path board-infinity (1)
Brochure data science learning path board-infinity (1)
NirupamNishant2
 
Machine Learning Fundamentals.docx
Machine Learning Fundamentals.docxMachine Learning Fundamentals.docx
Machine Learning Fundamentals.docx
HaritvKrishnagiri
 
Machine learning
Machine learningMachine learning
Machine learning
Abrar ali
 
Data science essentials in azure ml
Data science essentials in azure mlData science essentials in azure ml
Data science essentials in azure ml
Mostafa
 
Machine Learning.pptx
Machine Learning.pptxMachine Learning.pptx
Machine Learning.pptx
chadhar227
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Eng Teong Cheah
 
Savage_Samuel.doc
Savage_Samuel.docSavage_Samuel.doc
Savage_Samuel.docbutest
 
Machine Learning
Machine Learning Machine Learning
Machine Learning
AyanGain
 
Module 8: Natural language processing Pt 1
Module 8:  Natural language processing Pt 1Module 8:  Natural language processing Pt 1
Module 8: Natural language processing Pt 1
Sara Hooker
 
Cracking The Technical Interview Uw
Cracking The Technical Interview   UwCracking The Technical Interview   Uw
Cracking The Technical Interview Uw
careercup
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .pptbutest
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
pradeepskvch
 

Similar to Teaching AI about human knowledge (20)

Machine learning-in-details-with-out-python-code
Machine learning-in-details-with-out-python-codeMachine learning-in-details-with-out-python-code
Machine learning-in-details-with-out-python-code
 
Lect 7 intro to M.L..pdf
Lect 7 intro to M.L..pdfLect 7 intro to M.L..pdf
Lect 7 intro to M.L..pdf
 
Azure machine learning indiandotnet
Azure machine learning indiandotnetAzure machine learning indiandotnet
Azure machine learning indiandotnet
 
Machine Learning Basics
Machine Learning BasicsMachine Learning Basics
Machine Learning Basics
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine Learning Ch 1.ppt
Machine Learning Ch 1.pptMachine Learning Ch 1.ppt
Machine Learning Ch 1.ppt
 
Artificial intelligence slides beginners
Artificial intelligence slides beginners Artificial intelligence slides beginners
Artificial intelligence slides beginners
 
Machine Learning lecture1(introduction)
Machine Learning lecture1(introduction)Machine Learning lecture1(introduction)
Machine Learning lecture1(introduction)
 
Brochure data science learning path board-infinity (1)
Brochure   data science learning path board-infinity (1)Brochure   data science learning path board-infinity (1)
Brochure data science learning path board-infinity (1)
 
Machine Learning Fundamentals.docx
Machine Learning Fundamentals.docxMachine Learning Fundamentals.docx
Machine Learning Fundamentals.docx
 
Machine learning
Machine learningMachine learning
Machine learning
 
Data science essentials in azure ml
Data science essentials in azure mlData science essentials in azure ml
Data science essentials in azure ml
 
Machine Learning.pptx
Machine Learning.pptxMachine Learning.pptx
Machine Learning.pptx
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Savage_Samuel.doc
Savage_Samuel.docSavage_Samuel.doc
Savage_Samuel.doc
 
Machine Learning
Machine Learning Machine Learning
Machine Learning
 
Module 8: Natural language processing Pt 1
Module 8:  Natural language processing Pt 1Module 8:  Natural language processing Pt 1
Module 8: Natural language processing Pt 1
 
Cracking The Technical Interview Uw
Cracking The Technical Interview   UwCracking The Technical Interview   Uw
Cracking The Technical Interview Uw
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .ppt
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 

Recently uploaded

20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 

Recently uploaded (20)

20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 

Teaching AI about human knowledge

  • 1. Teaching AI about human knowledge Ines Montani Explosion AI
  • 2. Explosion AI is a digital studio specialising in Artificial Intelligence and Natural Language Processing. Open-source library for industrial-strength Natural Language Processing spaCy’s next-generation Machine Learning library for deep learning with text Data Store coming soon: pre-trained, customisable models
 for a variety of languages and domains
  • 4. Examples are your source code, training is compilation. draw examples from the same distribution as the runtime inputs goal: system’s prediction given some input matches label a human would have assigned examples training predictioninput labels
  • 5. How machines “learn” def train_tagger(examples): W = defaultdict(lambda: zeros(n_tags)) for (word, prev, next), human_tag in examples: scores = W[word] + W[prev] + W[next] guess = scores.argmax() if guess != human_tag: for feat in (word, prev, next): W[feat][guess] -= 1 W[feat][human_tag] += 1 the weights we'll train if the guess was wrong, adjust weights get the best-scoring tag score each tag given weights & context examples = words, tags, contexts decrease score for bad tag in this context increase score for good tag in this context Example: training a simple part-of-speech tagger with the perceptron algorithm
 (teach the model to recognise verbs, nouns, etc.)
  • 6. The bottleneck in AI is data, not algorithms.
  • 7. Algorithms are general, training data is specific. data quality, data quantity and accuracy problems are still the biggest problems in AI 
 (Source: The State of AI survey) you can extract knowledge from all kinds of sources, e.g. sentiment from emoji on Reddit 😍 you usually need at least some data specific to your problem annotated by humans
  • 8. Where human knowledge in AI really comes from Images: Amazon Mechanical Turk, depressing.org Mechanical Turk human annotators ~$5 per hour boring tasks low incentives
  • 9. Don’t expect great data if you’re boring the shit out of underpaid people. Why are we “designing around” this?
 “Taking a HIT: Designing around Rejection, Mistrust, Risk, and Workers’ Experiences in Amazon Mechanical Turk” 
 (McInnis et al., 2016) data collection needs the same treatment as all other human-facing processes good UX + purpose + incentives = better quality
  • 10. UX-driven data collection with active learning assist human with good UX and task structure the things that are hard for the computer are usually easy for the human, and vice versa don’t waste time on what the model already knows, ask human about what the model is 
 most interested in SOLUTION #1
  • 11. model human tasks human annotates all tasks annotated tasks are used as training data for model Batch learning vs. active learning approach to annotation and training BATCH BATCH
  • 12. model human tasks human annotates all tasks annotated tasks are used as training data for model model chooses one task human annotates chosen task annotated single task influences model’s decision on what to ask next Batch learning vs. active learning approach to annotation and training BATCH BATCH ACTIVE
  • 13. Import knowledge with pre-trained models start off with general information about the language, the world etc. fine-tune and improve to fit custom needs big models can work with little training data backpropagate error signals to correct model SOLUTION #2
  • 14. user input word meanings entity labels phrase meanings intent your examples “whats the best way to catalinas” fit meaning representations to your data Backpropagation
  • 15. user input word meanings entity labels phrase meanings intent your examples “whats the best way to catalinas” fit meaning representations to your data add workaround for read-only model Backpropagation
  • 16. If you don’t own the weights of your model, you’re missing the point of deep learning.
  • 17. Deep learning is about reading and writing. deep learning = learning internal representations to help solve your task. backpropagate errors on the task you care about to adjust the internal representations SaaS is great for read-only software – but deep learning needs read and write access
  • 18. 1. AI systems today are no magically smart black boxes. They’re made of annotations and statistics. 2. AI has an HR problem. To improve the technology, we need to fix the way we collect data from humans. 3. AI needs write access. This challenges the current trends towards thin clients and centralised cloud computing.
  • 19. Thanks! 💥 Explosion AI
 explosion.ai 📲 Follow us on Twitter
 @explosion_ai
 @_inesmontani
 @honnibal