SlideShare a Scribd company logo
1 of 29
Download to read offline
Probabilistic Fact-Finding at Scale
Large-scale machine learning at Google
Corinna Cortes	

Google Research	

corinna@google.com
1
pageMachine Learning at Google 2015
Knowledge Panels a Commodity
2
pageMachine Learning at Google 2015
3/3/2015: http://searchengineland.com/google-researchers-
introduce-system-rank-web-pages-facts-not-links-215835
3
pageMachine Learning at Google 2015
Structured Snippets
[harder they come, harder they fall]	

[10467 usps]	

!
[amd a10 7850k specs]
4
Link
Link
Link
pageMachine Learning at Google 2015
Structured Snippets
Launched Q3, 2014	

• http://googleresearch.blogspot.com/2014/09/
introducing-structured-snippets-now.html	

• 40+ online articles (US as well as international),
including articles from Engadget, PC Magazine,
Search Engine Land, …
5
pageMachine Learning at Google 2015
Structured Snippets
Highlights	

• “[...] makes Google smarter and means you'll have to do less clicking in some
cases, so we're all for it.” -- TheVerge, 2014-09-23	

• “[...] the results themselves have usually been skimpy; you've seen preview text,
and that's about it.Thankfully, Google has made that sneak peek considerably
more useful.” -- Engadget, 2014-09-23	

• “[...] Now, Google is making search even more convenient.” -- PC Magazine,
2014-09-23	

• “[...] show Google's ongoing emphasis on extracting structure and entities
from unstructured content.” -- Search Engine Watch, 2014-09-24	

• "[...] The big pro for users, of course, is that structured snippets make
searching for specific facts easier and faster. Google’s search results are
smarter and more detailed, so users will have to do less clicking to find the
information they’re searching for." -- Mareeg Media, 2014-09-26	

• …
6
pageMachine Learning at Google 2015
Probabilistic Fact-Finding at Scale
Knowledge databases of curated facts:	

• Freebase, https://www.freebase.com	

• Google Knowledge Graph,
http://www.google.com/insidesearch/features/search/knowledge.html	

Web-scale probabilistic knowledge base that
combines extractions from Web content
7
pageMachine Learning at Google 2015
Outline
WebTables	

• How do we find the good tables on the web?	

KnowledgeVault	

• How do we find other quality facts on the web?	

Lattice Regression	

• How do we form interpretable non-linear models?
8
pageMachine Learning at Google 2015
Web Tables
“Applying WebTables in Practice”, CIDR 2015
Sreeram Balakrishnan,Alon Halevy, Boulos Harb, Hongrae Lee, Jayant Madhavan,
Afshin Rostamizadeh,Warren Shen, Kenneth Wilder, Fei Wu, CongYu.	

March 5, 2014: 11 billion HTML tables reduced to
147 million quasi-relational Web tables,
http://webdatacommons.org/webtables/
9
pageMachine Learning at Google 2015
Finding the ‘Good’ Tables
Machine learning classifier
• feature design	

• training example generation	

• model selection
10
pageMachine Learning at Google 2015
Feature Design
Complicating fact: the semantics of the table is often
determined by the surrounding text;	

Detecting subject columns: many tables contain a
subject column, the other columns contains properties
of the subject:	

• Binary classifier, 94% accuracy;	

• Header row of other columns used as a schema.	

Determining possible column classes:
• Annotate entries using the Google Knowledge Graph,
(subject, predicate, value), pick the largest class(es).
11
pageMachine Learning at Google 2015
Feature Design
Determining properties of the subject column:	

• Captions may mention the properties	

• Query stream and text used to determine what
properties appear with what classes.“Biperpedia:
An Ontology for Search Applications”,VLDB 2014,
Rahul Gupta,Alon Halevy, Xuezhi Wang, Steven Whang, Fei Wu.	

Additional structural features
• number of rows and columns, mean/variance of the
number of characters per cell, the fraction of non-
empty cells, the fraction of cells <th> tags, and the
number of distinct tokens in a table.
12
pageMachine Learning at Google 2015
Machine Learning
Training data: original 98% negative, 2% positive	

Heuristics: eliminate tables based on rules:	

• tiny tables (less than 3 rows and 2 columns),
calendars, password tables, table-of-content tables;	

Simple classifier: filter the tables based on cheap
features;	

A stratified sample of good and bad tables:	

• 35% positive, 65% negative.	

SVM classifier, multi-kernel learning, high accuracy.
13
pageMachine Learning at Google 2015
Outline
WebTables	

• How do we find the good tables on the web?	

KnowledgeVault	

• How do we find other quality facts on the web?	

Lattice Regression	

• How do we form interpretable non-linear models?
14
pageMachine Learning at Google 2015
KnowledgeVault
“Google Researchers Introduce System To Rank
Web Pages On Facts, Not Links”,
http://searchengineland.com/google-researchers-introduce-system-rank-web-
pages-facts-not-links-215835	

• “Knowledge-Based Trust: Estimating the
Trustworthiness of Web Sources”,
http://arxiv.org/pdf/1502.03519v1.pdf	

• “KnowledgeVault:A Web-Scale Approach to
Probabilistic Knowledge Fusion”, KDD 2014,
http://dl.acm.org/citation.cfm?id=2623623
15
pageMachine Learning at Google 2015
KnowledgeVault
Combines noisy extractions + prior knowledge	

• Table facts	

• Text extractors with scores derived from KG	

• Graph-based paths with scores derived from KG	

• Applies learning: P(entity, predicate, entity)
16
Barack!
Obama
Honolulu
Barack!
Obama, Sr.
USA
Place of birth
Son of
President of
Place of birth? President of? Lives in?
pageMachine Learning at Google 2015
KnowledgeVault
Triplets (entity, predicate, entity) + score.	

• (</m/02mjmr, /people/person/place_of_birth, 	

/m/02hrh0_>);	

• /m/02mjmr is the Freebase id for Barack Obama;	

• m/02hrh0_ is the id for Honolulu;	

• score is the KV probability of correctness.
17
pageMachine Learning at Google 2015
Text Extraction
Tag text: entity recognition, part of speech tagging,
dependency parsing, co-reference resolution, …	

Train relation extractors using distant supervision:	

• given predicate,“married to”:	

• find KG validated pairs for that predicate with this
predicate (BarackObama, MichelleObama),
(BillClinton, HillaryClinton);	

• find occurrences of the pairs, assuming they may
express the predicate;	

• build classifier to score the predicate between the
pairs based on the diverse occurrences.18
pageMachine Learning at Google 2015
Graph-Based Scores
Graph of entities	

Walk the graph to learn that two people that are
both ParentOf may be married. Binary classifier
based on evidence from different paths.
19
Barack!
Obama
Michele!
Obama
Married To
Natasha!
Obama
Malia Ann!
Obama
Parent of
Parent of
Parent of
19
Bill !
Clinton
Hillary !
Clinton
Married !
To?
Chelsea!
Clinton
Parent of
Parent of
Ted
Willke
Ted
Dunning
Josh !
Wills
Shon
Burton
Nikolaos
Vasiloglou Courtney!
Burton
Michael
Mossa
Danny
Bickson
pageMachine Learning at Google 2015
KnowledgeVault
Example of learned graph-paths for
‘went_to_college’
20
pageMachine Learning at Google 2015
KnowledgeVault
Train a classifier on the confidence scores from text-
and graph-based extraction	

• Target values:	

curated facts	

• AUC= 0.911
21
pageMachine Learning at Google 2015
Outline
WebTables	

• How do we find the good tables on the web?	

KnowledgeVault	

• How do we find other quality facts on the web?	

Lattice Regression	

• How do we form interpretable non-linear models?
22
pageMachine Learning at Google 2015
Lattice Regression
Interpolated look-up tables as function class
23
pageMachine Learning at Google 2015
Lattice Regression
Interpolated look-up tables as function class	

• low-dimensional functions	

• determine values at corners, linearly interpolate in
between:	

• for draft paper on monotonic lattice regression
see mayagupta.org
24
pageMachine Learning at Google 2015
Linear Interpolating in D Dimensions
Linear interpolating in 1 dimension	

In 2 dimensions
25
1
1
2
10 x
2
3 4
f(x) = 1 + ( 2 1)x = 1(1 x) + 2x
= 1
2
1 x
x
x
x1
x2
f(x) =
1
2
3
4
(1 x1)(1 x2)
x1(1 x2)
(1 x1)x2
x1x2
= 1(1 x2) + 3x2
2(1 x2) + 4x2
1 x1
x1
pageMachine Learning at Google 2015
Lattice Regression in D Dimensions
In dimension D the lattice has vertices	

Monotonicity:	

• D constraints per vertex, vertices	

• monotonicity constraints
26
2D
1 2
34
5
78
2 1 4 1 5 1
D2(D 1)
6
2D
f(x) = (x), , R2D
pageMachine Learning at Google 2015
Lattice Regression
Loss function	

• cost measured in Mean Squared Error	

• R is a regularizer	

• A represents all the constraints
27
argmin
m
i=1
l(yi, (xi)) + R( ), s.t. A b
pageMachine Learning at Google 2015
Lattice Regression
Examining the output function for a misclassified
point:	

• what score is it worth improving?
Feature j
f(x)
Classification!
threshold
Feature k
f(x)
Classification!
threshold
pageMachine Learning at Google 2015
Outline
WebTables	

• How do we find the good tables on the web?	

KnowledgeVault	

• How do we find other quality facts on the web?	

Lattice Regression	

• How do we form interpretable non-linear models?
29

More Related Content

Similar to Corinna Cortes, Head of Research, Google at MLconf NYC

Building Data Products with Python (Georgetown)
Building Data Products with Python (Georgetown)Building Data Products with Python (Georgetown)
Building Data Products with Python (Georgetown)Benjamin Bengfort
 
2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptx2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptxgdgsurrey
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningPaco Nathan
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLPaco Nathan
 
Pragmatic Machine Learning @ ML Spain
Pragmatic Machine Learning @ ML SpainPragmatic Machine Learning @ ML Spain
Pragmatic Machine Learning @ ML SpainLouis Dorard
 
Integrating Content Strategy into Your User Experience (UX) Design Process
Integrating Content Strategy into Your User Experience (UX) Design ProcessIntegrating Content Strategy into Your User Experience (UX) Design Process
Integrating Content Strategy into Your User Experience (UX) Design ProcessMightybytes
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLMárton Kodok
 
Google Analytics Konferenz 2019_Google Cloud Platform_Carl Fernandes & Ksenia...
Google Analytics Konferenz 2019_Google Cloud Platform_Carl Fernandes & Ksenia...Google Analytics Konferenz 2019_Google Cloud Platform_Carl Fernandes & Ksenia...
Google Analytics Konferenz 2019_Google Cloud Platform_Carl Fernandes & Ksenia...e-dialog GmbH
 
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019GoDataDriven
 
Software Development Practices.pdf
Software Development Practices.pdfSoftware Development Practices.pdf
Software Development Practices.pdfEzhumalai p
 
Google spreadsheets
Google spreadsheetsGoogle spreadsheets
Google spreadsheetsRaceel
 
Store, Extract, Transform, Load, Visualize. Untagged Conference
Store, Extract, Transform, Load, Visualize. Untagged ConferenceStore, Extract, Transform, Load, Visualize. Untagged Conference
Store, Extract, Transform, Load, Visualize. Untagged ConferenceAni Lopez
 
Visualising montioring and evaluation data
Visualising montioring and evaluation dataVisualising montioring and evaluation data
Visualising montioring and evaluation dataRob Worthington
 
How to Scale and Grow your Enterprise Technical SEO Strategy
How to Scale and Grow your Enterprise Technical SEO StrategyHow to Scale and Grow your Enterprise Technical SEO Strategy
How to Scale and Grow your Enterprise Technical SEO StrategySearch Engine Journal
 
Website essentials and analytics
Website essentials and analyticsWebsite essentials and analytics
Website essentials and analyticsRyan Jones
 

Similar to Corinna Cortes, Head of Research, Google at MLconf NYC (20)

Building Data Products with Python (Georgetown)
Building Data Products with Python (Georgetown)Building Data Products with Python (Georgetown)
Building Data Products with Python (Georgetown)
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptx2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptx
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
Architecting for Data Science
Architecting for Data ScienceArchitecting for Data Science
Architecting for Data Science
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
Pragmatic Machine Learning @ ML Spain
Pragmatic Machine Learning @ ML SpainPragmatic Machine Learning @ ML Spain
Pragmatic Machine Learning @ ML Spain
 
Table structured schema markup
Table structured schema markupTable structured schema markup
Table structured schema markup
 
Integrating Content Strategy into Your User Experience (UX) Design Process
Integrating Content Strategy into Your User Experience (UX) Design ProcessIntegrating Content Strategy into Your User Experience (UX) Design Process
Integrating Content Strategy into Your User Experience (UX) Design Process
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQL
 
Google cloud certification data engineer
Google cloud certification data engineerGoogle cloud certification data engineer
Google cloud certification data engineer
 
Google Analytics Konferenz 2019_Google Cloud Platform_Carl Fernandes & Ksenia...
Google Analytics Konferenz 2019_Google Cloud Platform_Carl Fernandes & Ksenia...Google Analytics Konferenz 2019_Google Cloud Platform_Carl Fernandes & Ksenia...
Google Analytics Konferenz 2019_Google Cloud Platform_Carl Fernandes & Ksenia...
 
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
 
Software Development Practices.pdf
Software Development Practices.pdfSoftware Development Practices.pdf
Software Development Practices.pdf
 
Google spreadsheets
Google spreadsheetsGoogle spreadsheets
Google spreadsheets
 
Store, Extract, Transform, Load, Visualize. Untagged Conference
Store, Extract, Transform, Load, Visualize. Untagged ConferenceStore, Extract, Transform, Load, Visualize. Untagged Conference
Store, Extract, Transform, Load, Visualize. Untagged Conference
 
Visualising montioring and evaluation data
Visualising montioring and evaluation dataVisualising montioring and evaluation data
Visualising montioring and evaluation data
 
How to Scale and Grow your Enterprise Technical SEO Strategy
How to Scale and Grow your Enterprise Technical SEO StrategyHow to Scale and Grow your Enterprise Technical SEO Strategy
How to Scale and Grow your Enterprise Technical SEO Strategy
 
Website essentials and analytics
Website essentials and analyticsWebsite essentials and analytics
Website essentials and analytics
 
Not Your Mom's SEO
Not Your Mom's SEONot Your Mom's SEO
Not Your Mom's SEO
 

More from MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

More from MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Recently uploaded

Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Recently uploaded (20)

Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Corinna Cortes, Head of Research, Google at MLconf NYC

  • 1. Probabilistic Fact-Finding at Scale Large-scale machine learning at Google Corinna Cortes Google Research corinna@google.com 1
  • 2. pageMachine Learning at Google 2015 Knowledge Panels a Commodity 2
  • 3. pageMachine Learning at Google 2015 3/3/2015: http://searchengineland.com/google-researchers- introduce-system-rank-web-pages-facts-not-links-215835 3
  • 4. pageMachine Learning at Google 2015 Structured Snippets [harder they come, harder they fall] [10467 usps] ! [amd a10 7850k specs] 4 Link Link Link
  • 5. pageMachine Learning at Google 2015 Structured Snippets Launched Q3, 2014 • http://googleresearch.blogspot.com/2014/09/ introducing-structured-snippets-now.html • 40+ online articles (US as well as international), including articles from Engadget, PC Magazine, Search Engine Land, … 5
  • 6. pageMachine Learning at Google 2015 Structured Snippets Highlights • “[...] makes Google smarter and means you'll have to do less clicking in some cases, so we're all for it.” -- TheVerge, 2014-09-23 • “[...] the results themselves have usually been skimpy; you've seen preview text, and that's about it.Thankfully, Google has made that sneak peek considerably more useful.” -- Engadget, 2014-09-23 • “[...] Now, Google is making search even more convenient.” -- PC Magazine, 2014-09-23 • “[...] show Google's ongoing emphasis on extracting structure and entities from unstructured content.” -- Search Engine Watch, 2014-09-24 • "[...] The big pro for users, of course, is that structured snippets make searching for specific facts easier and faster. Google’s search results are smarter and more detailed, so users will have to do less clicking to find the information they’re searching for." -- Mareeg Media, 2014-09-26 • … 6
  • 7. pageMachine Learning at Google 2015 Probabilistic Fact-Finding at Scale Knowledge databases of curated facts: • Freebase, https://www.freebase.com • Google Knowledge Graph, http://www.google.com/insidesearch/features/search/knowledge.html Web-scale probabilistic knowledge base that combines extractions from Web content 7
  • 8. pageMachine Learning at Google 2015 Outline WebTables • How do we find the good tables on the web? KnowledgeVault • How do we find other quality facts on the web? Lattice Regression • How do we form interpretable non-linear models? 8
  • 9. pageMachine Learning at Google 2015 Web Tables “Applying WebTables in Practice”, CIDR 2015 Sreeram Balakrishnan,Alon Halevy, Boulos Harb, Hongrae Lee, Jayant Madhavan, Afshin Rostamizadeh,Warren Shen, Kenneth Wilder, Fei Wu, CongYu. March 5, 2014: 11 billion HTML tables reduced to 147 million quasi-relational Web tables, http://webdatacommons.org/webtables/ 9
  • 10. pageMachine Learning at Google 2015 Finding the ‘Good’ Tables Machine learning classifier • feature design • training example generation • model selection 10
  • 11. pageMachine Learning at Google 2015 Feature Design Complicating fact: the semantics of the table is often determined by the surrounding text; Detecting subject columns: many tables contain a subject column, the other columns contains properties of the subject: • Binary classifier, 94% accuracy; • Header row of other columns used as a schema. Determining possible column classes: • Annotate entries using the Google Knowledge Graph, (subject, predicate, value), pick the largest class(es). 11
  • 12. pageMachine Learning at Google 2015 Feature Design Determining properties of the subject column: • Captions may mention the properties • Query stream and text used to determine what properties appear with what classes.“Biperpedia: An Ontology for Search Applications”,VLDB 2014, Rahul Gupta,Alon Halevy, Xuezhi Wang, Steven Whang, Fei Wu. Additional structural features • number of rows and columns, mean/variance of the number of characters per cell, the fraction of non- empty cells, the fraction of cells <th> tags, and the number of distinct tokens in a table. 12
  • 13. pageMachine Learning at Google 2015 Machine Learning Training data: original 98% negative, 2% positive Heuristics: eliminate tables based on rules: • tiny tables (less than 3 rows and 2 columns), calendars, password tables, table-of-content tables; Simple classifier: filter the tables based on cheap features; A stratified sample of good and bad tables: • 35% positive, 65% negative. SVM classifier, multi-kernel learning, high accuracy. 13
  • 14. pageMachine Learning at Google 2015 Outline WebTables • How do we find the good tables on the web? KnowledgeVault • How do we find other quality facts on the web? Lattice Regression • How do we form interpretable non-linear models? 14
  • 15. pageMachine Learning at Google 2015 KnowledgeVault “Google Researchers Introduce System To Rank Web Pages On Facts, Not Links”, http://searchengineland.com/google-researchers-introduce-system-rank-web- pages-facts-not-links-215835 • “Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources”, http://arxiv.org/pdf/1502.03519v1.pdf • “KnowledgeVault:A Web-Scale Approach to Probabilistic Knowledge Fusion”, KDD 2014, http://dl.acm.org/citation.cfm?id=2623623 15
  • 16. pageMachine Learning at Google 2015 KnowledgeVault Combines noisy extractions + prior knowledge • Table facts • Text extractors with scores derived from KG • Graph-based paths with scores derived from KG • Applies learning: P(entity, predicate, entity) 16 Barack! Obama Honolulu Barack! Obama, Sr. USA Place of birth Son of President of Place of birth? President of? Lives in?
  • 17. pageMachine Learning at Google 2015 KnowledgeVault Triplets (entity, predicate, entity) + score. • (</m/02mjmr, /people/person/place_of_birth, /m/02hrh0_>); • /m/02mjmr is the Freebase id for Barack Obama; • m/02hrh0_ is the id for Honolulu; • score is the KV probability of correctness. 17
  • 18. pageMachine Learning at Google 2015 Text Extraction Tag text: entity recognition, part of speech tagging, dependency parsing, co-reference resolution, … Train relation extractors using distant supervision: • given predicate,“married to”: • find KG validated pairs for that predicate with this predicate (BarackObama, MichelleObama), (BillClinton, HillaryClinton); • find occurrences of the pairs, assuming they may express the predicate; • build classifier to score the predicate between the pairs based on the diverse occurrences.18
  • 19. pageMachine Learning at Google 2015 Graph-Based Scores Graph of entities Walk the graph to learn that two people that are both ParentOf may be married. Binary classifier based on evidence from different paths. 19 Barack! Obama Michele! Obama Married To Natasha! Obama Malia Ann! Obama Parent of Parent of Parent of 19 Bill ! Clinton Hillary ! Clinton Married ! To? Chelsea! Clinton Parent of Parent of Ted Willke Ted Dunning Josh ! Wills Shon Burton Nikolaos Vasiloglou Courtney! Burton Michael Mossa Danny Bickson
  • 20. pageMachine Learning at Google 2015 KnowledgeVault Example of learned graph-paths for ‘went_to_college’ 20
  • 21. pageMachine Learning at Google 2015 KnowledgeVault Train a classifier on the confidence scores from text- and graph-based extraction • Target values: curated facts • AUC= 0.911 21
  • 22. pageMachine Learning at Google 2015 Outline WebTables • How do we find the good tables on the web? KnowledgeVault • How do we find other quality facts on the web? Lattice Regression • How do we form interpretable non-linear models? 22
  • 23. pageMachine Learning at Google 2015 Lattice Regression Interpolated look-up tables as function class 23
  • 24. pageMachine Learning at Google 2015 Lattice Regression Interpolated look-up tables as function class • low-dimensional functions • determine values at corners, linearly interpolate in between: • for draft paper on monotonic lattice regression see mayagupta.org 24
  • 25. pageMachine Learning at Google 2015 Linear Interpolating in D Dimensions Linear interpolating in 1 dimension In 2 dimensions 25 1 1 2 10 x 2 3 4 f(x) = 1 + ( 2 1)x = 1(1 x) + 2x = 1 2 1 x x x x1 x2 f(x) = 1 2 3 4 (1 x1)(1 x2) x1(1 x2) (1 x1)x2 x1x2 = 1(1 x2) + 3x2 2(1 x2) + 4x2 1 x1 x1
  • 26. pageMachine Learning at Google 2015 Lattice Regression in D Dimensions In dimension D the lattice has vertices Monotonicity: • D constraints per vertex, vertices • monotonicity constraints 26 2D 1 2 34 5 78 2 1 4 1 5 1 D2(D 1) 6 2D f(x) = (x), , R2D
  • 27. pageMachine Learning at Google 2015 Lattice Regression Loss function • cost measured in Mean Squared Error • R is a regularizer • A represents all the constraints 27 argmin m i=1 l(yi, (xi)) + R( ), s.t. A b
  • 28. pageMachine Learning at Google 2015 Lattice Regression Examining the output function for a misclassified point: • what score is it worth improving? Feature j f(x) Classification! threshold Feature k f(x) Classification! threshold
  • 29. pageMachine Learning at Google 2015 Outline WebTables • How do we find the good tables on the web? KnowledgeVault • How do we find other quality facts on the web? Lattice Regression • How do we form interpretable non-linear models? 29