© Searchmetrics. All rights reserved. Do not distribute without permission.
Enriching content with Knowledge Base
by Search Keywords and Wikidata
Fang Xu
f.xu@searchmetrics.com
@allxufang
Data Science@Searchmetrics
Data-driven search and content optimization marketing
• Learning from keywords
• Content optimization
• Data visualization
Looooots of Data
• 120 Million Domains
• 600 Million Keywords
• 120 Billion Links
• 25,000 Billion Social Signals
• 25 PB raw data
Content Production in Real Time
Authors submit content:
• Rate the content's effectiveness
• Feedback to optimize and enrich it
Beyond keywords
• Keyword
  • Typos
  • Ambiguous
  • Sparse
• Entity
  • Augmented with metadata
  • Relations among entities
Entity
Q64 (the Wikidata item for Berlin)
Knowledge Base (KB)
Image from http://brendangriffen.com/blog/gow-programming-languages
KB Timeline
[Figure: knowledge base timeline with milestones in 2001, 2005, 2008, 2012 and 2014, including Knowledge Vault]
Why Wikidata
• Free collaborative KB
• Continuous evolution
• Open multilingual data
• Mapping to other KBs
Link content to KB
• Entity linking: free text to entities
  • Blog posts
  • Tweets
  • Keywords
  • User-generated content
• Entities from a knowledge base
  • Wikipedia
  • Wikidata
  • Domain-specific KBs
Entity Linking
Image from Milne and Witten (2008b). Learning to Link with Wikipedia. In CIKM 2008
Main Problems
• Identify the important keywords to link in the text
• Link each keyword to the right entity
Dictionary of keywords to KB entities
Search keyword mentions in text
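Searching for keyword mentions can be sketched as a greedy longest-match scan over word n-grams. This is a minimal illustration under assumptions, not the system's actual matcher: the dictionary entries, the lowercase `\w+` tokenization and the `max_len` cap of five tokens are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

TOKEN_RE = re.compile(r"\w+")


def find_mentions(
    text: str, dictionary: Dict[str, List[str]], max_len: int = 5
) -> List[Tuple[str, List[str]]]:
    """Greedy longest-match search for dictionary keywords in text.

    Returns (keyword, candidate_entities) pairs in order of appearance.
    """
    tokens = TOKEN_RE.findall(text.lower())
    mentions: List[Tuple[str, List[str]]] = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i : i + n])
            if phrase in dictionary:
                mentions.append((phrase, dictionary[phrase]))
                i += n  # continue after the matched phrase
                break
        else:
            i += 1  # no keyword starts at this token
    return mentions


# Hypothetical dictionary: keyword -> candidate Wikipedia URIs
dictionary = {
    "berlin": ["Berlin"],
    "berlin wall": ["Berlin_Wall"],
    "germany": ["Germany"],
}
print(find_mentions("The Berlin Wall divided Berlin, Germany.", dictionary))
# [('berlin wall', ['Berlin_Wall']), ('berlin', ['Berlin']), ('germany', ['Germany'])]
```

Preferring the longest match keeps "berlin wall" from being split into "berlin" plus an unlinked "wall".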
Mapping keywords to the Wikipedia URIs in their top SERP results
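One way to build that dictionary, assuming each keyword's SERP is available as a ranked list of result URLs (a hypothetical input format), is to keep the English Wikipedia article URLs found among the top results. The `top_k=10` cutoff, the `lang` parameter and the function names are illustrative assumptions.

```python
from typing import Dict, List
from urllib.parse import unquote, urlparse


def wiki_uris_from_serp(results: List[str], top_k: int = 10, lang: str = "en") -> List[str]:
    """Return the distinct Wikipedia URIs among the top_k result URLs, in rank order."""
    uris: List[str] = []
    for url in results[:top_k]:
        parsed = urlparse(url)
        if parsed.netloc == f"{lang}.wikipedia.org" and parsed.path.startswith("/wiki/"):
            uri = unquote(parsed.path[len("/wiki/"):])  # decode e.g. %C3%A4 -> ä
            if uri not in uris:
                uris.append(uri)
    return uris


def build_keyword_dictionary(serps: Dict[str, List[str]], top_k: int = 10) -> Dict[str, List[str]]:
    """Map each keyword to the Wikipedia URIs ranking in its top SERP results."""
    dictionary: Dict[str, List[str]] = {}
    for keyword, results in serps.items():
        uris = wiki_uris_from_serp(results, top_k)
        if uris:  # keywords without any Wikipedia result are dropped
            dictionary[keyword] = uris
    return dictionary
```

Keeping the URIs in rank order leaves room to weight candidates by SERP position later.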
Not all keywords are useful
Keyword cleaning:
• Navigational or factual words
• Infrequent words
• Non-Latin characters
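The cleaning rules could be combined into one filter roughly as follows. The minimum volume of 20, the character ranges accepted as Latin and the `NAVIGATIONAL` word list are hypothetical placeholders; detecting factual intent is not attempted here.

```python
import re
from typing import Dict, Set

# Basic Latin letters plus Latin-1 Supplement and Latin Extended-A/B,
# digits, whitespace and a few punctuation marks common in keywords.
LATIN_RE = re.compile(r"^[A-Za-z0-9\u00C0-\u024F\s'&.\-]+$")

# Hypothetical navigational terms; a real list would be curated from data.
NAVIGATIONAL: Set[str] = {"login", "www", "com", "website", "homepage"}


def clean_keywords(
    volumes: Dict[str, int], min_volume: int = 20, navigational: Set[str] = NAVIGATIONAL
) -> Dict[str, int]:
    """Drop infrequent, non-Latin and navigational keywords; keep the rest with their volumes."""
    kept: Dict[str, int] = {}
    for keyword, volume in volumes.items():
        if volume < min_volume:  # infrequent
            continue
        if not LATIN_RE.match(keyword):  # contains non-Latin characters
            continue
        if set(keyword.lower().split()) & navigational:  # navigational term present
            continue
        kept[keyword] = volume
    return kept
```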
Not all keywords are useful
Keyword filtering:
• Starting or ending tokens
• Stopwords
• Part-of-speech tags
• Wikipedia popularity:
  • popular Wikipedia URIs for one keyword
• Search popularity:
  • popular keywords for one Wikipedia URI
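Two of the filtering rules can be sketched as small functions: dropping keywords that start or end with a stopword, and keeping only the Wikipedia URIs that receive a meaningful share of one keyword's links. The stopword list and the 5% `min_share` threshold are assumptions; part-of-speech filtering is left out.

```python
from typing import Dict, List

# Small illustrative stopword list (an assumption; use a full list in practice).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "or", "the", "to"}


def has_edge_stopword(keyword: str) -> bool:
    """True if the keyword is empty or starts or ends with a stopword."""
    tokens = keyword.lower().split()
    return not tokens or tokens[0] in STOPWORDS or tokens[-1] in STOPWORDS


def popular_uris(uri_counts: Dict[str, int], min_share: float = 0.05) -> List[str]:
    """Keep the Wikipedia URIs that get at least min_share of one keyword's links."""
    total = sum(uri_counts.values())
    if total == 0:
        return []
    ranked = sorted(uri_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [uri for uri, count in ranked if count / total >= min_share]


# Hypothetical link counts for the keyword "tree"
tree_counts = {"Tree": 9282, "Tree_(graph_theory)": 294, "Tree_(data_structure)": 257}
print(popular_uris(tree_counts))  # ['Tree']
```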
Search Popularity Filtering
Keyword                 Search popularity (volume)
germany                 268583
germany facts           4291
germany article         24
german encyclopedia     23
germany encyclopedia    19
germany t               18
ger many                16
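Search popularity filtering for a single Wikipedia URI might keep only the keywords whose volume is within some ratio of that URI's most searched keyword. The `min_ratio` of 1% is an assumed threshold; the volumes are the ones in the table above.

```python
from typing import Dict, List


def popular_keywords(volumes: Dict[str, int], min_ratio: float = 0.01) -> List[str]:
    """Keep keywords whose volume is at least min_ratio of the top keyword's volume."""
    if not volumes:
        return []
    top = max(volumes.values())
    ranked = sorted(volumes.items(), key=lambda kv: kv[1], reverse=True)
    return [kw for kw, vol in ranked if vol >= min_ratio * top]


# Keywords mapped to the Wikipedia URI "Germany", with volumes from the table
germany = {
    "germany": 268583,
    "germany facts": 4291,
    "germany article": 24,
    "german encyclopedia": 23,
    "germany encyclopedia": 19,
    "germany t": 18,
    "ger many": 16,
}
print(popular_keywords(germany))  # ['germany', 'germany facts']
```

A relative threshold adapts to each URI, so rare entities are not emptied by a global volume cutoff.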
Entity data
Parse the Wikidata dump and extract entities as JSON:
{
entity: "Berlin",
Freebase Id: "/m/0156q",
OpenStreetMap Relation identifier: 62422,
alias: ["Berlin, Germany"],
capital of:
[ "Germany", "Kingdom of Prussia", "Weimar Republic",
"Brandenburg-Prussia", "Free State of Prussia", ... ],
contains administrative territorial entity:
[ "Mitte", "Friedrichshain-Kreuzberg", "Pankow",
"Charlottenburg-Wilmersdorf", "Spandau", "Steglitz-Zehlendorf",
"Tempelhof-Schöneberg", "Neukölln", "Treptow-Köpenick", ... ],
coordinate location:
[ {
altitude: null,
latitude: 52.516666666667,
longitude: 13.383333333333,
precision: 0.016666666666667
} ],
country: "Germany",
... ... }
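The extraction step might look like the streaming parser below. It assumes the dump layout published under dumps.wikimedia.org/wikidatawiki/entities/ (one JSON entity per line inside a top-level array) and reads only a few properties corresponding to fields in the entity data above. Item-valued claims come back as Q-ids (for example Q183 rather than "Germany"), so producing the labels shown above would need a second lookup pass; the helper names are hypothetical.

```python
import gzip
import json
from typing import Any, Dict, Iterable, Iterator, Optional

# Wikidata property IDs for some of the fields shown in the entity data.
PROPERTIES = {
    "P646": "Freebase ID",
    "P402": "OpenStreetMap relation ID",
    "P1376": "capital of",
    "P150": "contains administrative territorial entity",
    "P625": "coordinate location",
    "P17": "country",
}


def iter_entities(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one entity per dump line, skipping the enclosing '[' and ']' lines."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def snak_value(snak: Dict[str, Any]) -> Optional[Any]:
    """Return a claim's plain value; item values are returned as Q-ids."""
    datavalue = snak.get("datavalue")
    if datavalue is None:  # 'somevalue' / 'novalue' snaks carry no value
        return None
    value = datavalue["value"]
    if datavalue["type"] == "wikibase-entityid":
        return value.get("id", f"Q{value.get('numeric-id')}")
    return value  # strings, or dicts such as globe coordinates


def summarize(entity: Dict[str, Any], lang: str = "en") -> Dict[str, Any]:
    """Reduce a raw entity to its label, aliases and selected property values."""
    record: Dict[str, Any] = {
        "id": entity["id"],
        "entity": entity.get("labels", {}).get(lang, {}).get("value"),
        "alias": [a["value"] for a in entity.get("aliases", {}).get(lang, [])],
    }
    claims = entity.get("claims", {})
    for pid, name in PROPERTIES.items():
        values = [snak_value(c["mainsnak"]) for c in claims.get(pid, [])]
        values = [v for v in values if v is not None]
        if values:
            record[name] = values
    return record


def read_dump(path: str) -> Iterator[Dict[str, Any]]:
    """Stream summaries from a compressed dump such as latest-all.json.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for entity in iter_entities(f):
            yield summarize(entity)
```

Streaming line by line avoids loading the full dump, which is far too large to parse as a single JSON document.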
Link to the right Wikipedia entity
Word Sense Disambiguation
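Link to Most Common Entities
Wikipedia entity commonness (Milne and Witten 2008b), for surface text w and entity e:
$$\mathrm{Commonness}(e, w) = \frac{|L_{w,e}|}{\sum_{i} |L_{w,e_i}|}$$
where $|L_{w,e}|$ is the number of Wikipedia links with surface text w that point to entity e.
Commonness of candidate entities for the surface text "tree":
Tree                      92.82%
Tree (graph theory)        2.94%
Tree (data structure)      2.57%
Tree (set theory)          0.15%
Phylogenetic tree          0.07%
Christmas tree             0.07%
Binary tree                0.04%
Family tree                0.04%
… ...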
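Commonness is a normalized link count, so it takes only a few lines to compute. The link counts below are made up for illustration and are not the statistics behind the percentages above.

```python
from typing import Dict, Optional


def commonness(link_counts: Dict[str, int]) -> Dict[str, float]:
    """Commonness of each entity for one surface text w: |L_{w,e}| / sum_i |L_{w,e_i}|."""
    total = sum(link_counts.values())
    if total == 0:
        return {}
    return {entity: count / total for entity, count in link_counts.items()}


def most_common_entity(link_counts: Dict[str, int]) -> Optional[str]:
    """Link to the entity with the highest commonness (a context-free baseline)."""
    scores = commonness(link_counts)
    return max(scores, key=scores.get) if scores else None


# Hypothetical link counts for the surface text "tree"
tree_links = {"Tree": 900, "Tree_(graph_theory)": 60, "Tree_(data_structure)": 40}
print(commonness(tree_links)["Tree"])  # 0.9
print(most_common_entity(tree_links))  # Tree
```

This baseline ignores context entirely, which is why a text about programming can still be linked to the plant.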
Disambiguation
https://en.wikipedia.org/wiki/Tree_data_structure
https://en.wikipedia.org/wiki/Tree
Disambiguation using context
Entity Disambiguation
• Build a Word2Vec model for Wikipedia entities
• Calculate the Word2Vec similarity of each candidate entity to the contextual entities
$$\mathrm{similarity}(\texttt{Tree\_data\_structure}, \mathrm{context}) > \mathrm{similarity}(\texttt{Tree}, \mathrm{context})$$
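Context-based disambiguation can be sketched as picking the candidate whose vector has the highest mean cosine similarity to the vectors of the context entities. The three-dimensional vectors are toy values invented for the example, and averaging over the context entities is one assumed aggregation among several possible ones.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def context_similarity(candidate: str, context: List[str], vectors: Dict[str, Vector]) -> float:
    """Mean similarity of a candidate to the context entities that have vectors."""
    if candidate not in vectors:
        return 0.0
    scores = [cosine(vectors[candidate], vectors[c]) for c in context if c in vectors]
    return sum(scores) / len(scores) if scores else 0.0


def disambiguate(candidates: List[str], context: List[str], vectors: Dict[str, Vector]) -> str:
    """Pick the candidate entity most similar to the surrounding entities."""
    return max(candidates, key=lambda e: context_similarity(e, context, vectors))


# Toy entity vectors (invented for illustration, not trained)
VECTORS = {
    "Tree": [0.9, 0.1, 0.0],
    "Tree_data_structure": [0.1, 0.9, 0.2],
    "Binary_tree": [0.0, 0.8, 0.3],
    "Linked_list": [0.1, 0.9, 0.1],
    "Forest": [0.95, 0.05, 0.0],
}
print(disambiguate(["Tree", "Tree_data_structure"], ["Binary_tree", "Linked_list"], VECTORS))
# Tree_data_structure
```

With a trained gensim model, `model.wv.similarity(a, b)` returns the cosine similarity of two vocabulary items and could replace the hand-written `cosine`.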
Relatedness between Entities
Entity Relatedness
Image from Milne and Witten (2008a). An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links
Relatedness Score
• Jaccard similarity
• Word2Vec similarity of entity to context
$$J(e_1, e_2) = \frac{|L(e_1) \cap L(e_2)|}{|L(e_1) \cup L(e_2)|}$$
where $L(e)$ is the set of links to entity e.
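Jaccard similarity between the sets of links pointing to two entities can be computed directly. The inlink sets here are hypothetical, and `most_related` is an illustrative helper rather than part of the described system.

```python
from typing import Dict, List, Set, Tuple


def jaccard(links_a: Set[str], links_b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B| for two link sets; 0.0 when both are empty."""
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 0.0


def most_related(entity: str, inlinks: Dict[str, Set[str]], top_n: int = 5) -> List[Tuple[str, float]]:
    """Rank other entities by Jaccard similarity of their inlink sets, dropping zero scores."""
    scores = [
        (other, jaccard(inlinks[entity], links))
        for other, links in inlinks.items()
        if other != entity
    ]
    ranked = sorted((s for s in scores if s[1] > 0), key=lambda s: s[1], reverse=True)
    return ranked[:top_n]


# Hypothetical inlink sets
INLINKS = {
    "Berlin": {"Germany", "Prussia", "Berlin_Wall", "Albert_Einstein"},
    "Brandenburg": {"Germany", "Prussia", "Potsdam"},
    "Tree": {"Forest", "Leaf"},
}
print(most_related("Berlin", INLINKS))  # [('Brandenburg', 0.4)]
```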
Wikipedia Data Parsing
Wikipedia Dump
'''Berlin''' is the [[Capital city|capital]] of [[Germany]] and one of its 16
[[states of Germany|states]]. With a population of approximately 3.5
million people,<ref name="Population" /> Berlin is the second [[Largest
cities of the European Union by population within city limits|most
populous city proper]] and the seventh [[Largest urban areas of the
European Union|most populous urban area]] in the [[European Union]].
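Link annotations can be pulled out of wikitext like the sample above with a regular expression over `[[target]]` and `[[target|anchor]]` links. This is a simplified sketch, not a full wikitext parser: it ignores templates and nested links in image captions, and the skipped namespace prefixes are a partial, assumed list.

```python
import re
from typing import List, Tuple

# [[target]] or [[target|anchor]]; both parts may span line breaks.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")
# Links into these namespaces are not article links (partial list, an assumption).
SKIP_PREFIXES = ("File:", "Image:", "Category:")


def to_uri(title: str) -> str:
    """Drop any #section, collapse whitespace, uppercase the first letter, use underscores."""
    title = " ".join(title.split("#")[0].split())
    return (title[:1].upper() + title[1:]).replace(" ", "_")


def extract_links(wikitext: str) -> List[Tuple[str, str]]:
    """Return (surface text, target URI) pairs for the article links in wikitext."""
    pairs: List[Tuple[str, str]] = []
    for match in LINK_RE.finditer(wikitext):
        target, anchor = match.group(1), match.group(2)
        if target.strip().startswith(SKIP_PREFIXES):
            continue
        uri = to_uri(target)
        if not uri:  # e.g. [[#Section]] links within the same page
            continue
        surface = " ".join((anchor or target).split())
        pairs.append((surface, uri))
    return pairs
```

For the Berlin sample this yields pairs such as ("capital", "Capital_city") and ("states", "States_of_Germany").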
Wikipedia Article as JSON
Word2Vec Training
• Collection of plain article text
... ...
can4linux ||open_source|| ||controller_area_network|| ||linux_kernel||
||device_driver||
development started 1990s philips 82c200 controller stand chip
1995 version created bus linux laboratory automation project linux lab project
||freie_universität_berlin||
nxp sja1000 successor supported controller philips 82c200 intel 82527
development powerful ||microcontroller||s integrated controllers capable
... ...
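Before training, each line of annotated text has to be split into tokens so that an annotation such as ||linux_kernel|| stays a single vocabulary item. The regular expression below is an assumption about the annotation format inferred from the sample above, including dropping letters glued to an annotation (as in ||microcontroller||s).

```python
import re
from typing import Iterable, Iterator, List

# An entity annotation such as ||linux_kernel|| (trailing letters like the plural
# "s" in ||microcontroller||s are consumed and dropped), or any other run of
# characters without whitespace or "|".
TOKEN_RE = re.compile(r"(\|\|[^|\s]+\|\|)\w*|([^\s|]+)")


def tokenize(line: str) -> List[str]:
    """Split one line of annotated article text into word and entity tokens."""
    return [m.group(1) or m.group(2) for m in TOKEN_RE.finditer(line)]


def sentences(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one token list per non-empty line."""
    for line in lines:
        tokens = tokenize(line)
        if tokens:
            yield tokens
```

The token lists could then be passed to gensim, e.g. `Word2Vec(sentences=..., vector_size=100, window=5, min_count=5)` in gensim 4.x (older versions call the dimension parameter `size`; the values here are placeholders). Word2Vec iterates over the corpus more than once, so pass a list or a restartable iterable rather than the generator itself.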
Linking vectors
• Pairs of (URI, annotations)
outlink vector [Capital_City, Germany, States_of_Germany, European_Union,
Spree, Havel, Berlin-Brandenburg_Metropolitan_Region, ... ... ]
inlink vector [Germany, Prussia, Berlin_Wall, Albert_Einstein, Kosmos_(Berlin),
Berlin_International_Film_Festival, ... ... ]
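Inlink and outlink vectors can be derived from the (URI, annotations) pairs in a single pass. The sketch assumes each annotation is a (surface text, target URI) tuple and that repeated links on one page count once; both are assumptions, and the sample pages are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Annotation = Tuple[str, str]  # (surface text, target URI)


def build_link_vectors(
    pages: Iterable[Tuple[str, List[Annotation]]]
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Build outlink and inlink lists from (page URI, annotations) pairs."""
    outlinks: Dict[str, List[str]] = {}
    inlinks: Dict[str, List[str]] = defaultdict(list)
    for uri, annotations in pages:
        # Distinct targets in order of first appearance on the page.
        targets = list(dict.fromkeys(target for _surface, target in annotations))
        outlinks[uri] = targets
        for target in targets:
            inlinks[target].append(uri)
    return outlinks, dict(inlinks)


# Hypothetical annotated pages
pages = [
    ("Berlin", [("capital", "Capital_city"), ("Germany", "Germany"), ("Germany", "Germany")]),
    ("Albert_Einstein", [("Berlin", "Berlin"), ("German", "Germany")]),
]
outlinks, inlinks = build_link_vectors(pages)
print(inlinks["Germany"])  # ['Berlin', 'Albert_Einstein']
```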
Wikipedia Popularity
• Aggregation of annotations
Surface text     Wiki entity      Popularity
United States    United_States    174338
World War II     World_War_II     106483
India            India            95966
France           France           94666
American         United_States    85976
Iran             Iran             83249
Australia        Australia        76655
Germany          Germany          76384
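A popularity table like this is an aggregation of annotation counts, which `collections.Counter` expresses directly. The sketch assumes each page is a (URI, annotations) pair and each annotation a (surface text, target URI) tuple; the sample pages are made up.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Annotation = Tuple[str, str]  # (surface text, target URI)


def anchor_popularity(pages: Iterable[Tuple[str, List[Annotation]]]) -> Counter:
    """Count how often each (surface text, entity) pair occurs as a link across all pages."""
    counts: Counter = Counter()
    for _uri, annotations in pages:
        counts.update(annotations)  # each (surface, target) tuple is one element
    return counts


def entity_popularity(anchor_counts: Counter) -> Counter:
    """Sum the anchor counts per target entity, over all surface texts."""
    totals: Counter = Counter()
    for (_surface, entity), n in anchor_counts.items():
        totals[entity] += n
    return totals


# Hypothetical annotated pages
pages = [
    ("Barack_Obama", [("United States", "United_States"), ("American", "United_States")]),
    ("Apple_Inc.", [("American", "United_States")]),
]
counts = anchor_popularity(pages)
print(counts.most_common(1))  # [(('American', 'United_States'), 2)]
```

Keeping counts per (surface text, entity) pair, rather than per entity only, preserves the information that "American" and "United States" both point to United_States.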
Overall System
[Figure: system architecture. Labeled components: User Content, Parser, Keyword Matching, Disambiguation, Relatedness calculation, Result, Entity Linking API, Keyword Database, Keyword Processing, Keyword to KB entities, Wiki Parser, Wiki Links, W2V Model, Wikipedia Popularity]
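The online path through the architecture can be sketched as a small pipeline with pluggable stages. The stage order (Parser, Keyword Matching, Disambiguation, Relatedness calculation) is read from the component labels and may not match the real system; the class and field names are hypothetical, and the toy stages in the usage example stand in for the real components.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Mention:
    surface: str              # keyword as found in the content
    candidates: List[str]     # candidate KB entities from the keyword dictionary
    entity: str = ""          # entity chosen by disambiguation
    relatedness: float = 0.0  # relatedness of the entity to the other linked entities


@dataclass
class EntityLinkingPipeline:
    parse: Callable[[str], List[str]]                      # Parser
    match: Callable[[List[str]], List[Mention]]            # Keyword Matching
    disambiguate: Callable[[Mention, List[Mention]], str]  # Disambiguation
    relatedness: Callable[[str, List[str]], float]         # Relatedness calculation

    def run(self, content: str) -> List[Mention]:
        """Pass user content through every stage and return the linked mentions."""
        mentions = self.match(self.parse(content))
        for mention in mentions:
            mention.entity = self.disambiguate(mention, mentions)
        linked = [m.entity for m in mentions]
        for mention in mentions:
            others = [e for e in linked if e != mention.entity]
            mention.relatedness = self.relatedness(mention.entity, others)
        return mentions


# Toy stand-ins for the real components (hypothetical)
DICTIONARY = {"berlin": ["Berlin"], "germany": ["Germany"]}
pipeline = EntityLinkingPipeline(
    parse=lambda text: text.lower().split(),
    match=lambda tokens: [Mention(t, DICTIONARY[t]) for t in tokens if t in DICTIONARY],
    disambiguate=lambda mention, _all: mention.candidates[0],  # first candidate wins
    relatedness=lambda entity, others: float(len(others)),     # placeholder score
)
for m in pipeline.run("Berlin is the capital of Germany"):
    print(m.surface, m.entity, m.relatedness)
```

Injecting each stage as a callable keeps the offline artifacts (keyword dictionary, W2V model, link statistics) out of the pipeline code, so each can be rebuilt independently.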
Resources
• https://github.com/piskvorky/gensim
• https://github.com/jodaiber/Annotated-WikiExtractor
• https://dumps.wikimedia.org/
• https://dumps.wikimedia.org/wikidatawiki/entities/
Thank you
Questions?
f.xu@searchmetrics.com
We are hiring
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 

Fang Xu - Enriching content with Knowledge Base by Search Keywords and Wikidata

  • 1. © Searchmetrics. All rights reserved. Do not distribute without permission. Enriching content with Knowledge Base by Search Keywords and Wikidata. Fang Xu, f.xu@searchmetrics.com, @allxufang
  • 2. Data Science@Searchmetrics: Data-driven search and content optimization marketing • Learning from keywords • Content optimization • Data visualization
  • 3. Looooots of Data • 120 Million Domains • 600 Million Keywords • 120 Billion Links • 25,000 Billion Social Signals • 25 PB raw data
  • 4. Authors submit content ✓ Rate the content’s effectiveness ✓ Feedback to optimize and enrich it. Content Production in Real-time
  • 5. Beyond keywords • Keyword • Typos • Ambiguous • Sparse • Entity • Augmented with metadata • Relations among entities
  • 6. Q64 Entity
  • 7. (image-only slide)
  • 8. http://brendangriffen.com/blog/gow-programming-languages Knowledge Base (KB)
  • 9. KB Timeline (figure of knowledge bases over time; surviving labels: 2001, 2005, 2008, 2012, 2012, 2014, Knowledge vaults)
  • 10. Free collaborative KB • Continuous evolution • Open multilingual data • Mapping to other KBs. Why Wikidata
  • 11. Link content to KB • Entity Linking: free text to entities • Blog posts • Tweets • Keywords • User-generated content • Entities from a knowledge base • Wikipedia • Wikidata • Domain-specific KBs
  • 12. Image from Milne and Witten (2008b). Learning to Link with Wikipedia. In CIKM 2008. Entity Linking
  • 13. Identify important keywords to link in the text • Link to the right entity. Main Problems
  • 14. Dictionary of keywords to KB entities. Search keyword mentions in text
  • 15. Keyword to wiki URIs in top SERP
  • 16. Not all keywords are useful. Keyword Cleaning: • Navigational or factual words • Non-frequent words • Non-Latin letters
  • 17. Keyword Filtering: • Starting or ending tokens • Stopwords • Part-of-speech tags • Wikipedia popularity: popular wiki URIs for one keyword • Search popularity: popular keywords for one wiki URI. Not all keywords are useful
  • 18. Search Popularity Filtering
      Keyword                 Search popularity (volume)
      germany                 268583
      germany facts           4291
      germany article         24
      german encyclopedia     23
      germany encyclopedia    19
      germany t               18
      ger many                16
  • 19. Parse the Wikidata dump & extract entities as JSON. Entity data
      {
        entity: "Berlin",
        Freebase Id: "/m/0156q",
        OpenStreetMap Relation identifier: 62422,
        alias: ["Berlin, Germany"],
        capital of: [
          "Germany", "Kingdom of Prussia", "Weimar Republic",
          "Brandenburg-Prussia", "Free State of Prussia", ...
        ],
        contains administrative territorial entity: [
          "Mitte", "Friedrichshain-Kreuzberg", "Pankow", "Charlottenburg-Wilmersdorf",
          "Spandau", "Steglitz-Zehlendorf", "Tempelhof-Schöneberg", "Neukölln",
          "Treptow-Köpenick", ...
        ],
        coordinate location: [
          { altitude: null, latitude: 52.516666666667, longitude: 13.383333333333, precision: 0.016666666666667 }
        ],
        country: "Germany",
        ...
      }
  • 20. Link to the right Wikipedia entity. Word Sense Disambiguation
  • 21. Link to Most Common Entities: Wikipedia Commonness (Milne and Witten 2008b)
      $\mathrm{Commonness}(e \mid w) = \frac{|L_{w,e}|}{\sum_{i} |L_{w,e_i}|}$, where $|L_{w,e}|$ is the number of links with surface text $w$ pointing to entity $e$.
      Example, surface text "tree":
      Entity                    Commonness
      Tree                      92.82%
      Tree (graph theory)       2.94%
      Tree (data structure)     2.57%
      Tree (set theory)         0.15%
      Phylogenetic tree         0.07%
      Christmas tree            0.07%
      Binary tree               0.04%
      Family tree               0.04%
      …
  • 22. https://en.wikipedia.org/wiki/Tree_data_structure https://en.wikipedia.org/wiki/Tree Disambiguation
  • 23. Disambiguation using context
  • 24. Build a Word2Vec model for Wikipedia entities • Calculate the Word2Vec similarity to contextual entities: $\mathrm{similarity}_{\mathrm{context}}(\mathrm{Tree\_data\_structure})$ vs. $\mathrm{similarity}_{\mathrm{context}}(\mathrm{Tree})$. Entity Disambiguation
  • 25. Relatedness between Entities
  • 26. Image from Milne and Witten (2008a). An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links. Entity Relatedness
  • 27. Jaccard similarity • Word2Vec similarity of entity to context
      $\mathrm{Relatedness}(e_1, e_2) = \frac{|\text{links to } e_1 \cap \text{links to } e_2|}{|\text{links to } e_1 \cup \text{links to } e_2|}$
      Relatedness Score
  • 28. Wikipedia Data Parsing
  • 29. Wikipedia Dump '''Berlin''' is the [[Capital city|capital]] of [[Germany]] and one of its 16 [[states of Germany|states]]. With a population of approximately 3.5 million people,<ref name="Population" /> Berlin is the second [[Largest cities of the European Union by population within city limits|most populous city proper]] and the seventh [[Largest urban areas of the European Union|most populous urban area]] in the [[European Union]].
  • 30. Wikipedia Article as JSON
  • 31. Word2Vec Training • Collection of plain article text ... ... can4linux ||open_source|| ||controller_area_network|| ||linux_kernel|| ||device_driver|| development started 1990s philips 82c200 controller stand chip 1995 version created bus linux laboratory automation project linux lab project ||freie_universität_berlin|| nxp sja1000 successor supported controller philips 82c200 intel 82527 development powerful ||microcontroller||s integrated controllers capable ... ...
  • 32. Linking vectors • Pairs of URI, annotations. outlink vector [Capital_City, Germany, States_of_Germany, European_Union, Spree, Havel, Berlin-Brandenburg_Metropolitan_Region, ...] inlink vector [Germany, Prussia, Berlin_Wall, Albert_Einstein, Kosmos_(Berlin), Berlin_International_Film_Festival, ...]
  • 33. Wikipedia Popularity • Aggregation of annotations
      Surface text    Wiki entity      Popularity
      United States   United_States    174338
      World War II    World_War_II     106483
      India           India            95966
      France          France           94666
      American        United_States    85976
      Iran            Iran             83249
      Australia       Australia        76655
      Germany         Germany          76384
  • 34. Overall System (architecture diagram; components shown: Keyword Database, Keyword Processing, Keyword to KB entities, Wiki Parser, Wiki Links, W2V Model, Wikipedia Popularity, User Content, Parser, Keyword Matching, Disambiguation, Relatedness calculation, Result, Entity Linking API)
  • 35. • https://github.com/piskvorky/gensim • https://github.com/jodaiber/Annotated-WikiExtractor • https://dumps.wikimedia.org/ • https://dumps.wikimedia.org/wikidatawiki/entities/ Resources
  • 36. Thank you
  • 37. Questions? f.xu@searchmetrics.com. We are hiring
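Slide 14 pairs a dictionary of keywords to KB entities with a search for keyword mentions in text. The sketch below shows one way the mention search could work. It assumes the dictionary is a plain dict from lower-cased keywords to candidate entity titles, and that a longer match ("berlin wall") should win over a shorter one ("berlin"). Both points are assumptions, not details from the talk, and `find_mentions` and `max_ngram` are hypothetical names.

```python
import re

def find_mentions(text, keyword_to_entities, max_ngram=4):
    """Greedy longest-match lookup of dictionary keywords in text.

    keyword_to_entities (hypothetical input) maps a lower-cased keyword, possibly
    multi-word, to its candidate entities. Returns (token_index, keyword, candidates).
    """
    tokens = re.findall(r"\w+", text.lower())
    mentions = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram first so multi-word keywords beat their prefixes.
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in keyword_to_entities:
                mentions.append((i, phrase, keyword_to_entities[phrase]))
                i += n
                break
        else:
            i += 1  # no keyword starts at this token
    return mentions
```

For example, with `{"berlin": ["Berlin"], "berlin wall": ["Berlin_Wall"], "germany": ["Germany"]}`, the text "The Berlin Wall divided Germany" yields one mention for "berlin wall" and one for "germany".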
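Slide 15 fills that dictionary with the Wikipedia URIs that appear in the top search results (SERP) for each keyword. The following is a minimal sketch. It assumes the SERP data is already available as a mapping from keyword to a ranked list of result URLs, and it keeps only English Wikipedia article URLs. The input format, the names `build_keyword_dictionary` and `top_n`, and the 10-result cut-off are all illustrative assumptions.

```python
from collections import defaultdict
from urllib.parse import unquote, urlparse

def wiki_title(url):
    """Return the article title for an en.wikipedia.org/wiki/ URL, else None."""
    parsed = urlparse(url)
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith("/wiki/"):
        return None
    return unquote(parsed.path[len("/wiki/"):])

def build_keyword_dictionary(serp_results, top_n=10):
    """serp_results (hypothetical input): keyword -> ranked list of result URLs."""
    dictionary = defaultdict(list)
    for keyword, urls in serp_results.items():
        for url in urls[:top_n]:  # only the top of the SERP is considered
            title = wiki_title(url)
            if title and title not in dictionary[keyword]:
                dictionary[keyword].append(title)
    return dict(dictionary)
```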
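Slide 16 lists three cleaning rules: drop navigational or factual queries, non-frequent keywords, and keywords with non-Latin letters. The sketch below applies the three rules to a keyword-to-volume mapping. The marker word set and the volume threshold of 20 are invented placeholders, since the slides do not state the actual values.

```python
import unicodedata

# Hypothetical marker words for navigational/factual queries (not from the talk).
NAVIGATIONAL_OR_FACTUAL = {"login", "www", "com", "how", "what", "when", "who"}

def is_latin(keyword):
    """True if every alphabetic character belongs to the Latin script."""
    return all("LATIN" in unicodedata.name(ch, "") for ch in keyword if ch.isalpha())

def clean_keywords(volumes, min_volume=20):
    """volumes maps keyword -> search volume; min_volume is an assumed cut-off."""
    kept = {}
    for keyword, volume in volumes.items():
        if volume < min_volume:
            continue  # non-frequent keyword
        if not is_latin(keyword):
            continue  # contains non-Latin letters
        if NAVIGATIONAL_OR_FACTUAL & set(keyword.lower().split()):
            continue  # navigational or factual query
        kept[keyword] = volume
    return kept
```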
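Slide 17 adds filters on starting and ending tokens, stopwords, part-of-speech tags, and "Wikipedia popularity" (the popular wiki URIs for one keyword). The sketch below covers the stopword-at-the-edge rule and a share-based popularity cut for URIs. Part-of-speech filtering is left out because it needs a tagger. The stopword list, the 2% share, and the link counts in the example are all assumptions.

```python
# Illustrative stopword list (an assumption; a real list would be longer).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "and", "or", "with"}

def has_clean_edges(keyword):
    """Reject keywords that start or end with a stopword, e.g. "the berlin"."""
    tokens = keyword.lower().split()
    return bool(tokens) and tokens[0] not in STOPWORDS and tokens[-1] not in STOPWORDS

def filter_keywords(keywords):
    return [k for k in keywords if has_clean_edges(k)]

def popular_uris(uri_counts, min_share=0.02):
    """Keep wiki URIs receiving at least min_share of one keyword's links, most popular first.

    uri_counts maps URI -> link count for a single keyword; min_share is an assumed cut-off.
    """
    total = sum(uri_counts.values())
    if total == 0:
        return []
    ranked = sorted(uri_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [uri for uri, count in ranked if count / total >= min_share]
```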
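Slide 18 filters by search popularity: among the keywords pointing at one wiki URI, rare variants such as "germany t" or "ger many" are dropped. One simple way to express this is a cut relative to the most searched keyword. The 1% ratio below is an assumption, and `popular_keywords` is a hypothetical name. The volumes are the ones shown on slide 18.

```python
def popular_keywords(volumes, min_ratio=0.01):
    """Keep keywords whose volume is at least min_ratio times the top volume.

    volumes maps keyword -> search volume for the keywords of one wiki URI.
    """
    if not volumes:
        return []
    top = max(volumes.values())
    ranked = sorted(volumes.items(), key=lambda kv: kv[1], reverse=True)
    return [kw for kw, vol in ranked if vol >= min_ratio * top]

germany = {"germany": 268583, "germany facts": 4291, "germany article": 24,
           "german encyclopedia": 23, "germany encyclopedia": 19,
           "germany t": 18, "ger many": 16}
print(popular_keywords(germany))  # ['germany', 'germany facts']
```

With a 1% ratio, the cut-off for this URI is about 2686 searches, so only "germany" and "germany facts" survive. The right ratio would have to be tuned on real data.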
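Slide 19 parses the Wikidata dump and extracts entities as JSON. The Wikidata JSON dumps (linked on slide 35) are one large JSON array with one entity per line and a trailing comma on each line, so they can be streamed line by line. The sketch below flattens an entity into a small record resembling the slide's. It relies on P646, Wikidata's Freebase ID property. The `summarize` output layout is an illustrative choice, not the talk's schema.

```python
import json

def iter_entities(lines):
    """Yield entity dicts from a Wikidata JSON dump, one entity per line."""
    for line in lines:
        line = line.strip().rstrip(",")  # each entity line ends with a comma
        if line in ("", "[", "]"):
            continue  # array brackets on their own lines
        yield json.loads(line)

def first_value(entity, prop):
    """Return the first plain value of property prop (e.g. "P646"), or None."""
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            return snak["datavalue"]["value"]
    return None

def summarize(entity, lang="en"):
    """Flatten an entity into a small record similar to the one on slide 19."""
    return {
        "id": entity["id"],
        "entity": entity.get("labels", {}).get(lang, {}).get("value"),
        "alias": [a["value"] for a in entity.get("aliases", {}).get(lang, [])],
        "freebase_id": first_value(entity, "P646"),  # P646 = Freebase ID
    }
```

On a compressed dump, something like `with gzip.open(path, "rt", encoding="utf-8") as f:` followed by `for entity in iter_entities(f):` would avoid loading the whole file. Resolving item-valued claims such as "capital of" to labels needs a second lookup that is not shown here.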
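Slide 21's commonness score and slide 33's aggregation of annotations fit together. Counting (surface text, entity) link pairs gives the popularity table, and normalizing each surface text's counts gives $\mathrm{Commonness}(e \mid w)$. The following is a minimal sketch. Lower-casing the surface text is an assumption, and the annotation counts in the example are made up.

```python
from collections import Counter, defaultdict

def link_counts(annotations):
    """Count (surface_text, entity) link pairs; surface text is lower-cased (an assumption)."""
    return Counter((surface.lower(), entity) for surface, entity in annotations)

def commonness(counts):
    """Commonness(e | w) = |L_{w,e}| / sum_i |L_{w,e_i}| for every surface text w."""
    totals = defaultdict(int)
    for (surface, _entity), n in counts.items():
        totals[surface] += n
    table = defaultdict(dict)
    for (surface, entity), n in counts.items():
        table[surface][entity] = n / totals[surface]
    return dict(table)
```

Linking each mention to its highest-commonness entity is the "most common entity" baseline on slide 21. Slides 23 and 24 then refine this choice with context.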
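Slide 24 disambiguates by comparing each candidate's Word2Vec similarity to the entities found in the surrounding text. The sketch below averages cosine similarity over the context entities and picks the best candidate. The averaging is an assumption, since the slide only says "similarity to contextual entities". The tiny 2-d vectors are toy data. In practice the vectors would come from a trained model, for example via gensim's `KeyedVectors.similarity`.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def context_similarity(candidate, context, vectors):
    """Mean cosine similarity between a candidate and the context entities that have vectors."""
    sims = [cosine(vectors[candidate], vectors[c]) for c in context if c in vectors]
    return sum(sims) / len(sims) if sims else 0.0

def disambiguate(candidates, context, vectors):
    """Pick the candidate entity most similar to the context (None if none has a vector)."""
    known = [c for c in candidates if c in vectors]
    return max(known, key=lambda c: context_similarity(c, context, vectors)) if known else None

# Toy vectors (hypothetical), standing in for a trained entity Word2Vec model.
vectors = {"Tree": [1.0, 0.0], "Tree_(data_structure)": [0.0, 1.0],
           "Binary_tree": [0.1, 0.9], "Linked_list": [0.0, 1.0], "Forest": [0.9, 0.1]}
print(disambiguate(["Tree", "Tree_(data_structure)"], ["Binary_tree", "Linked_list"], vectors))
```

In a programming context (Binary_tree, Linked_list), the data-structure sense wins. With a context like Forest, the plain "Tree" entity scores higher.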
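Slide 27 scores entity relatedness with the Jaccard similarity of the link sets of two entities. The function below is a direct transcription of that formula. The first list uses inlinks from slide 32. The second entity's inlinks are invented for the example.

```python
def jaccard_relatedness(links_a, links_b):
    """|A ∩ B| / |A ∪ B| over the sets of links to two entities (0.0 if both are empty)."""
    a, b = set(links_a), set(links_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

berlin_inlinks = ["Germany", "Prussia", "Berlin_Wall", "Albert_Einstein"]
potsdam_inlinks = ["Germany", "Prussia", "Brandenburg"]  # hypothetical
print(jaccard_relatedness(berlin_inlinks, potsdam_inlinks))  # 0.4
```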
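Slide 29 shows raw wikitext, where internal links look like `[[target]]` or `[[target|surface text]]`. Extracting those pairs yields the annotations used for commonness and popularity. The regex sketch below is a simplified stand-in for a full extractor such as the Annotated-WikiExtractor listed on slide 35. It does not handle templates, nested links, section anchors or namespaces like `File:`.

```python
import re

# [[target]] or [[target|surface text]]; templates and nested links are not handled.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")

def normalize_title(title):
    """Wikipedia-style URI: first letter upper-cased, spaces as underscores."""
    title = title.strip()
    return (title[:1].upper() + title[1:]).replace(" ", "_")

def extract_links(wikitext):
    """Return (surface_text, target_uri) pairs for every internal link."""
    return [((m.group(2) or m.group(1)).strip(), normalize_title(m.group(1)))
            for m in LINK.finditer(wikitext)]

def plain_text(wikitext):
    """Replace each internal link by its surface text; other markup is left as-is."""
    return LINK.sub(lambda m: (m.group(2) or m.group(1)).strip(), wikitext)
```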
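Slide 31's training text is lower-cased and mostly free of function words. Linked entities appear as single `||entity_name||` tokens, so the Word2Vec model learns vectors for entities as well as words. The sketch below produces that kind of token list from wikitext. The stopword list and the choice to drop link surface text in favour of the target are assumptions.

```python
import re

# A wiki link or a plain word; links become single ||entity|| tokens.
TOKEN = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]+)?\]\]|(\w+)")
# Illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "the", "of", "and", "is", "its", "in", "one", "with"}

def to_training_tokens(wikitext):
    """Lower-case tokens, stopwords dropped, each link replaced by ||entity||."""
    tokens = []
    for m in TOKEN.finditer(wikitext):
        if m.group(1) is not None:
            tokens.append("||" + m.group(1).strip().lower().replace(" ", "_") + "||")
        elif m.group(2).lower() not in STOPWORDS:
            tokens.append(m.group(2).lower())
    return tokens
```

Each article's token list could then serve as one training sentence for gensim (slide 35). In gensim 4.x that would look like `Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=5)`. These hyperparameter values are placeholders, not settings from the talk.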
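Slide 32 stores an outlink vector and an inlink vector for each article. Given each article's outgoing link targets (for instance from the wikitext extraction above slide 32 in the pipeline), the inlinks follow by inverting the mapping. The sketch below assumes a hypothetical input format (article URI to ordered list of link targets) and de-duplicates links while keeping their first-seen order.

```python
from collections import defaultdict

def link_vectors(pages):
    """pages (hypothetical input): article URI -> ordered list of outgoing link targets.

    Returns (outlinks, inlinks), both mapping a URI to a de-duplicated, ordered list.
    """
    outlinks = {uri: list(dict.fromkeys(targets)) for uri, targets in pages.items()}
    inlinks = defaultdict(list)
    for source, targets in outlinks.items():
        for target in targets:
            inlinks[target].append(source)  # source links to target
    return outlinks, dict(inlinks)
```

The inlink lists are the sets compared by the Jaccard relatedness score on slide 27.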