DBpedia Spotlight: Shedding Light on the Web of Documents
Pablo N. Mendes, Max Jakob, Andrés Garcia-Silva, Christian Bizer
pablo.mendes@fu-berlin.de
I-SEMANTICS, Graz, Austria, September 9th, 2011
Agenda
- What is text annotation?
- What can you build with it?
- Why is it difficult?
- How did we approach the challenge?
- How well did it work?
- What are the next steps?
Mendes, Jakob, Garcia-Silva, Bizer. DBpedia Spotlight: Shedding Light on the Web of Documents
What is it?
Text Annotation
From: (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
To: the same text, with "New York" linked to http://dbpedia.org/resource/New_York_City and "Apple Corps" linked to http://dbpedia.org/resource/Apple_Corps
Challenge: Term Ambiguity
...this apple on the palm of my hand...
...Apple tried to acquire Palm Inc....
...eating an apple sitting by a palm tree...
What do “apple” and “palm” mean in each case?
Our objective is to recognize entities and disambiguate their meaning, generating DBpedia annotations in text.
What can you do with annotations?
- Links to complementary information: “More about this”
- Faceted browsing of blog posts: show only posts with topics related to Sports
- Rich snippets on Google: search engines are starting to display information from annotations
- More expressive filtering of information streams: Twarql (entry at the I-SEMANTICS 2010 Challenge)
Rich Snippets
Search engines already benefit from some kinds of annotations: http://www.google.com/webmasters/tools/richsnippets
Twarql Example Use Case
Which competitors of my product are mentioned together with my product on Twitter? (comparative opinion!)
    PREFIX dbpedia: <http://dbpedia.org/resource/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX moat: <http://moat-project.org/ns#>
    SELECT ?competitor WHERE {
      dbpedia:IPad skos:subject ?category .
      ?competitor skos:subject ?category .
      ?tweet moat:taggedWith ?competitor .
      ?tweet moat:taggedWith dbpedia:IPad .
      FILTER (?competitor != dbpedia:IPad)
    }
Twarql Example Use Case (2)
[Figure: an incoming micropost ("@anonymized Lorem ipsum blabla this is an example tweet") is matched against background knowledge (e.g. DBpedia) using the pattern ?tweet moat:taggedWith ?competitor; ?competitor skos:subject ?category; dbpedia:IPad skos:subject ?category.]
Competition is modeled as two products in the same category in DBpedia.
Twarql Example Use Case (3)
[Figure: the same micropost pattern, with ?category bound to category:Wi-Fi and category:Touchscreen for dbpedia:IPad.]
Background knowledge is dynamically “brought into” microposts.
Twarql Example Use Case (4)
[Figure: the micropost pattern with dbpedia:IPad linked to category:Wi-Fi and category:Touchscreen.]
Trigger an action if the micropost matches the constraints.
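The Twarql pattern can be mimicked in a few lines of Python to show the "trigger action" step. This is a minimal sketch, not Twarql's implementation: triples are plain in-memory tuples instead of a SPARQL endpoint, and the names `BACKGROUND`, `competitors_mentioned` and `process_stream`, as well as the category memberships of dbpedia:Samsung_Galaxy_Tab and dbpedia:Apple_pie, are hypothetical.

```python
# Hypothetical background knowledge as (subject, predicate, object) triples.
# The category memberships below are illustrative, not taken from DBpedia.
BACKGROUND = {
    ("dbpedia:IPad", "skos:subject", "category:Touchscreen"),
    ("dbpedia:IPad", "skos:subject", "category:Wi-Fi"),
    ("dbpedia:Samsung_Galaxy_Tab", "skos:subject", "category:Touchscreen"),
    ("dbpedia:Apple_pie", "skos:subject", "category:Desserts"),
}


def categories(resource, triples):
    """All ?category values such that (resource skos:subject ?category)."""
    return {o for s, p, o in triples if s == resource and p == "skos:subject"}


def competitors_mentioned(tags, product, triples):
    """Tags of one micropost that share a category with `product`.

    Mirrors the query: the micropost must be tagged with the product itself,
    and the product is excluded from its own competitor list.
    """
    if product not in tags:
        return set()
    product_categories = categories(product, triples)
    return {t for t in tags
            if t != product and categories(t, triples) & product_categories}


def process_stream(microposts, product, triples, action):
    """Call action(text, competitors) for every (text, tags) micropost that matches."""
    for text, tags in microposts:
        competitors = competitors_mentioned(tags, product, triples)
        if competitors:
            action(text, competitors)
```

In a real deployment the tags would come from annotating each micropost, and the category lookups from a SPARQL endpoint rather than a Python set.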
DBpedia Spotlight
- DBpedia is a collection of entity descriptions extracted from Wikipedia and shared as Linked Data.
- DBpedia Spotlight uses data from DBpedia and text from the associated Wikipedia pages to learn how to recognize that a DBpedia resource was mentioned.
- Given plain text as input, it generates annotated text.
Why is it difficult?
Dataset overview
- Volume of Wikipedia: 56.9 GB of raw text data
- Occurrences of ambiguous terms in Wikipedia: 58.8%
- Sparsity: less data is available for some DBpedia resources
Histogram: URI occurrences
[Figure: histogram of URI occurrence counts; x-axis log(n(uri)).]
- Many “rare” URIs (few links on Wikipedia)
- Most previous work deals with these entities: People, Organization, Location
- Few “popular” URIs (lots of links on Wikipedia)
Histogram: Surface Form Ambiguity
[Figure: histogram of surface-form ambiguity; x-axis log(n(uri,sf)).]
- Many “unambiguous” surface forms
- Few very “ambiguous” surface forms
- Max: 1199 (log = 7.08), Min: 1, Mean: 1.328949
Ambiguity
What are the most ambiguous surface forms?
Name Variation
What are the URIs with many surface forms?
How did we approach the challenge?
A 4-stage approach
1. Spotting
2. Candidate Mapping
3. Disambiguation
4. Linking
Stage 1: Spotting
Find substrings that seem worthy of annotation.
Naïve implementation (impractical): all n-grams of length 1 to |text|.
Input: (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
Output: “Lennon”, “McCartney”, “New York”, “Apple Corps”
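To see why the naïve approach is impractical, here is a minimal Python sketch that enumerates every contiguous span, assuming whitespace tokenization (the slide does not say whether the n-grams are over tokens or characters). A text of n tokens yields n(n+1)/2 candidate spans, so even the one-sentence example produces 153.

```python
def all_ngrams(tokens):
    """Every contiguous token span of length 1..len(tokens): n(n+1)/2 candidates."""
    n = len(tokens)
    return [" ".join(tokens[i:j]) for i in range(n) for j in range(i + 1, n + 1)]


sentence = ("Upon their return, Lennon and McCartney went to New York "
            "to announce the formation of Apple Corps.")
tokens = sentence.split()            # whitespace tokens; punctuation stays attached
candidates = all_ngrams(tokens)
print(len(tokens), len(candidates))  # 17 153
```

Because punctuation stays attached, the span appears as "Apple Corps." with its period; a real spotter would need proper tokenization as well as a way to discard the vast majority of spans.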
Spotting in DBpedia Spotlight
Detect that the label (surface form) of a DBpedia resource was mentioned.
- Lexicalized: Aho-Corasick algorithm (LingPipe)
- Name variations from redirects, disambiguation pages and anchor texts
Advantages:
- Simple implementation; a well-studied problem
- Produces a reduced set of spots
- Relies on user-provided terms
Drawback:
- High memory requirements (~7 GB)
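A dictionary spotter of this kind can be sketched with a minimal character-level Aho-Corasick automaton. This is not LingPipe's implementation; the class `AhoCorasick`, the `spot` helper and the word-boundary rule are assumptions for illustration. The automaton finds all dictionary entries in a single pass over the text, but the whole trie must be held in memory, which is in line with the memory drawback listed above.

```python
from collections import deque


class AhoCorasick:
    """Minimal character-level Aho-Corasick automaton over a fixed dictionary."""

    def __init__(self, patterns):
        self.goto = [{}]   # goto[state][char] -> next state (trie edges)
        self.fail = [0]    # fail[state] -> state of the longest proper suffix in the trie
        self.out = [[]]    # out[state] -> dictionary entries ending at this state
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first, so each failure target is finalized before it is used.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def find(self, text):
        """Yield (start, end, pattern) for every dictionary entry found in text."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                yield i - len(pattern) + 1, i + 1, pattern


def spot(text, surface_forms):
    """Return sorted (start, surface_form) for matches that begin and end on word boundaries."""
    spots = []
    for start, end, sf in AhoCorasick(surface_forms).find(text):
        starts_word = start == 0 or not text[start - 1].isalnum()
        ends_word = end == len(text) or not text[end].isalnum()
        if starts_word and ends_word:
            spots.append((start, sf))
    return sorted(spots)
```

With a dictionary containing both "Apple Corps" and "Apple Corp", only "Apple Corps" survives on the example sentence, because "Apple Corp" ends in the middle of the word "Corps".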
Stage 2: Candidate Mapping
What are the possible senses of a given surface form (the candidate DBpedia resources)?
Input: “Lennon”, “McCartney”, “New York”, “Apple Corps”
Output:
- “Lennon”: { Lennon_(album), Lennon,_Michigan, … }
- “McCartney”: { McCartney_(surname), Paul_McCartney, … }
- “New York”: { New_York_State, New_York_City, … }
- “Apple Corps”: { Apple_Corps }
Candidate Mapping in DBpedia Spotlight
Sources of mappings between surface forms and DBpedia resources:
- Page titles offer “chosen names” for resources
- Redirects offer alternative spellings, aliases, etc.
- Disambiguation pages link a common term to many resources
Candidate Map: Disambiguation Pages
Collectively, disambiguation pages provide a list of ambiguous terms and the possible meanings of each.
Candidate Map: Redirects
Redirects to Apple_Inc: AAPL; Apple (Company); Apple (Computers); Apple (company); Apple (computer); Apple Company; Apple Computer; Apple Computer Co.; Apple Computer Inc.; Apple Computer Incorporated; Apple Computer, Inc; Apple Computer, Inc.; Apple Computers; Apple Inc; Apple Incorporate; Apple Incorporated; Apple India; Apple comp; Apple compputer; Apple computer; Apple computer Inc; Apple computers; Apple inc; Apple inc.; Apple incoporated; Apple incorporated; Apple pc; Apple's; Apple, Inc; Apple, Inc.; Apple,inc.; Apple.com; AppleComputer; Bowman Bank; Cripple Inc.; Inc. Apple Computer; Jobs and Wozniak; Option-Shift-K; Inc.
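The three sources can be merged into one surface-form-to-candidates map. This is a minimal sketch, assuming titles, redirects and disambiguation links have already been extracted into Python collections; the function name and the rule that strips a trailing "(...)" qualifier from titles are hypothetical choices, not necessarily what DBpedia Spotlight does.

```python
from collections import defaultdict


def build_candidate_map(titles, redirects, disambiguations):
    """Map each surface form to the set of DBpedia resources it may denote.

    titles: resource names; page titles double as "chosen names"
    redirects: dict alias -> target resource (alternative spellings, aliases)
    disambiguations: dict ambiguous term -> resources linked from its disambiguation page
    """
    candidates = defaultdict(set)
    for title in titles:
        # "Lennon_(album)" -> "Lennon (album)" -> "Lennon" (hypothetical qualifier rule)
        sf = title.replace("_", " ")
        if sf.endswith(")") and " (" in sf:
            sf = sf[: sf.rindex(" (")]
        candidates[sf].add(title)
    for alias, target in redirects.items():
        candidates[alias].add(target)
    for term, targets in disambiguations.items():
        candidates[term].update(targets)
    return dict(candidates)
```

A real map would also need the anchor texts mentioned under spotting, and some normalization (case, punctuation) so that variants such as "Apple inc" and "Apple Inc" are handled consistently.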
Stage 3: Disambiguation
Select the correct candidate DBpedia resource for a given surface form.
The decision is based on the context(1) in which the surface form was mentioned.
con·text (n.): 1. the parts of a discourse that surround a word or passage and can throw light on its meaning; 2. the circumstances in which an event occurs; a setting.
(1) http://mw1.merriam-webster.com/dictionary/context
Learning the Context for a Resource
Collect context for DBpedia resources from Wikipedia. Types of context:
- Wikipedia pages
- Definitions from disambiguation pages
- Paragraphs that link to resources, e.g. “(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.”
Disambiguation in DBpedia Spotlight
- Model DBpedia resources as vectors of terms found in Wikipedia text
- Define functions for term scoring and vector similarity (e.g. frequency and cosine)
- Rank candidate resource vectors by their similarity with the vector of the input text
- Choose the highest-ranking candidate
Example:
Lennon = {Beatles, McCartney, rock, guitar, ...}
Lennon = {tf(Beatles)=320, tf(McCartney)=100, ...}
cos(Input, Lennon) = 0.12
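The ranking step can be sketched with plain term-frequency vectors and cosine similarity. Only tf(Beatles)=320 and tf(McCartney)=100 come from the slide; the other counts, the candidate set and the regex tokenizer are hypothetical, so the resulting scores will not match the 0.12 shown above.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Bag-of-words term frequencies over lowercased word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(context, candidates):
    """Rank candidates by cosine(context vector, resource vector); return (best, scores)."""
    ctx = tf_vector(context)
    scores = {uri: cosine(ctx, vec) for uri, vec in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical resource vectors learned from Wikipedia text.
lennon_candidates = {
    "John_Lennon": Counter({"beatles": 320, "mccartney": 100, "rock": 50, "guitar": 40}),
    "Lennon,_Michigan": Counter({"michigan": 200, "village": 80, "county": 60}),
}
```

On the example paragraph, "mccartney" is the only term shared with any candidate vector, which is enough to rank John_Lennon above Lennon,_Michigan.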
Scoring Strategies
TF*IDF (Term Frequency * Inverse Document Frequency)
- TF: insight into the relevance of the term in the context of a DBpedia resource
- IDF: insight into the rarity of the term; co-occurrence of rare terms is more informative
ICF: Inverse Candidate Frequency
- IDF is the “rarity” of a word in the entire Wikipedia
- ICF is the rarity of a word with respect to the possible senses only
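ICF can be computed per surface form by treating the contexts of its candidate resources as the "document collection". This sketch assumes ICF(w) = log(|R_s| / n(w)), where R_s is the candidate set for surface form s and n(w) is the number of candidates whose context contains w; the term counts and function names are hypothetical. A term shared by every candidate (here "apple") gets weight 0 because it cannot help choose between senses, however rare it may be in Wikipedia as a whole.

```python
import math
from collections import Counter


def icf(term, candidate_contexts):
    """Inverse Candidate Frequency: log(|R_s| / n(term)).

    candidate_contexts maps each candidate URI of one surface form to a Counter
    of context term frequencies. Terms absent from every candidate get 0.0 here.
    """
    n = sum(1 for ctx in candidate_contexts.values() if ctx[term] > 0)
    return math.log(len(candidate_contexts) / n) if n else 0.0


def tf_icf(uri, candidate_contexts):
    """Weight each term of one candidate's context by TF * ICF."""
    return {t: tf * icf(t, candidate_contexts)
            for t, tf in candidate_contexts[uri].items()}


# Hypothetical context term counts for two candidates of the surface form "apple".
apple_candidates = {
    "Apple_Inc": Counter({"apple": 10, "iphone": 4, "computer": 3}),
    "Apple": Counter({"apple": 8, "fruit": 5, "tree": 3}),
}
```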
Context-Independent Strategies
NAÏVE: use the surface form to build the URI: “berlin” -> dbpedia:Berlin
PROMINENCE: P(u) = n(u) / N (what is the “popularity”/importance of this URI?)
- n(u): number of times URI u occurred
- N: total number of occurrences
- Intuition: URIs that have appeared a lot are more likely to appear again
DEFAULT SENSE: P(u|s) = n(u,s) / n(s)
- n(u,s): number of times URI u occurred with surface form s
- n(s): number of times surface form s occurred
- Intuition: some surface forms are strongly associated with specific URIs
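The two priors can be estimated by counting (surface form, URI) pairs, for example Wikipedia link anchors and their targets. A minimal sketch with hypothetical counts and illustrative function names; `naive_uri` simply capitalizes the first letter and replaces spaces, mirroring the "berlin" -> dbpedia:Berlin example.

```python
from collections import Counter


def estimate_priors(occurrences):
    """occurrences: (surface_form, uri) pairs.

    Returns PROMINENCE P(u) = n(u) / N and DEFAULT SENSE P(u|s) = n(u,s) / n(s).
    """
    n_u = Counter(uri for _, uri in occurrences)
    n_s = Counter(sf for sf, _ in occurrences)
    n_us = Counter(occurrences)
    total = len(occurrences)
    prominence = {u: c / total for u, c in n_u.items()}
    default_sense = {(s, u): c / n_s[s] for (s, u), c in n_us.items()}
    return prominence, default_sense


def default_sense_choice(surface_form, default_sense):
    """Context-independent choice: the URI most often linked from this surface form."""
    options = {u: p for (s, u), p in default_sense.items() if s == surface_form}
    return max(options, key=options.get) if options else None


def naive_uri(surface_form):
    """NAIVE strategy: build the URI directly from the surface form."""
    return "dbpedia:" + surface_form[:1].upper() + surface_form[1:].replace(" ", "_")


# Hypothetical link occurrences.
occurrences = ([("berlin", "Berlin")] * 3 + [("berlin", "Berlin,_New_Hampshire")]
               + [("paris", "Paris")] * 4)
prominence, default_sense = estimate_priors(occurrences)
```

With these counts, P(Berlin) = 3/8 and P(Berlin | "berlin") = 3/4: the default sense ignores the other surface forms, while prominence does not look at the surface form at all.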
Linking (Configuration)
Decide which spots to annotate with links to the disambiguated resources. Different use cases have different needs:
- Only annotate prominent resources?
- Only if you’re sure the disambiguation is correct?
- Only people?
- Only things related to Berlin?
Linking in DBpedia Spotlight
Can be configured based on:
- Thresholds: confidence; prominence (support)
- Whitelist or blacklist of types: hide all people, show only organizations
- Complex definition of a “type” through a SPARQL query
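The linking filters amount to a predicate applied to each disambiguated spot. This is a hypothetical sketch: the `Annotation` fields, the parameter names and the type labels are assumptions, not DBpedia Spotlight's actual configuration API, and SPARQL-defined types are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    surface_form: str
    uri: str
    confidence: float   # disambiguation score of the chosen URI
    support: int        # prominence, e.g. number of Wikipedia links to the URI
    types: frozenset = field(default_factory=frozenset)


def link(annotations, min_confidence=0.0, min_support=0,
         whitelist=None, blacklist=frozenset()):
    """Keep annotations that pass both thresholds and the type whitelist/blacklist."""
    kept = []
    for a in annotations:
        if a.confidence < min_confidence or a.support < min_support:
            continue
        if whitelist is not None and not (a.types & whitelist):
            continue  # "show only organizations"
        if a.types & blacklist:
            continue  # "hide all people"
        kept.append(a)
    return kept
```

Keeping the filter separate from disambiguation means one set of disambiguated spots can serve several use cases, each with its own thresholds and type constraints.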
How well did it work?
Evaluation: Disambiguation
- Used held-out (unseen) Wikipedia occurrences as test data
- Evaluates the accuracy of the disambiguation stage
Baselines:
- Random: performs well when ambiguity is low
- Default Sense: only prominence, without context
- Default Similarity (TF*IDF): Lucene implementation
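The disambiguation metric is plain accuracy over held-out occurrences. The second helper makes the remark about the Random baseline concrete: picking uniformly among k candidates is right with probability 1/k, so its expected accuracy is the mean of 1/k over the test occurrences, which approaches 1 when most surface forms have a single candidate. Both function names are hypothetical.

```python
def accuracy(predicted, gold):
    """Fraction of held-out occurrences whose predicted URI equals the gold URI."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def expected_random_accuracy(candidate_counts):
    """Expected accuracy of a uniform random pick among k candidates: mean of 1/k."""
    return sum(1 / k for k in candidate_counts) / len(candidate_counts)
```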
Disambiguation Evaluation Results
Evaluation: Annotation
- News text on different topics
- Examples hand-annotated by 4 annotators
- Gold standard derived from annotator agreement
- Evaluates precision and recall of annotations
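Precision and recall can be computed by comparing system annotations with the gold standard as sets. The slide does not say how agreement among the 4 annotators was turned into a gold standard; the `min_votes` threshold below is an assumption, as are the function names and the (offset, URI) annotation tuples.

```python
from collections import Counter


def agreement_gold(annotator_sets, min_votes):
    """Hypothetical gold standard: annotations chosen by at least min_votes annotators."""
    votes = Counter(a for annotations in annotator_sets for a in set(annotations))
    return {a for a, v in votes.items() if v >= min_votes}


def precision_recall_f1(system, gold):
    """Exact-match comparison of annotation tuples, e.g. (offset, uri)."""
    system, gold = set(system), set(gold)
    tp = len(system & gold)
    precision = tp / len(system) if system else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Exact matching on (offset, URI) counts a correct spot with the wrong sense as both a false positive and a false negative, which is the strict reading of "precision and recall of annotations".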
Annotation Evaluation Results (2)
Annotation Evaluation Results
Conclusions
- DBpedia Spotlight: a configurable annotation tool that supports a variety of use cases
- Very simple methods work surprisingly well for disambiguation
- More work is needed to alleviate sparsity
- The most challenging step is linking
- More evaluation on larger annotation datasets is needed
What are the next steps?
A preview of the next release
- CORS-enabled + jQuery client: one line to annotate any web page: $("div").annotate()
- A new demo interface based on the plugin
- Types: DBpedia 3.7, Freebase, Schema.org
- New configuration parameters, e.g. to perform smarter spotting
- Easier install: maven2, jar, Debian package

 
Dados Ligados (Linked Data) CONSEGI 2011
Dados Ligados (Linked Data) CONSEGI 2011Dados Ligados (Linked Data) CONSEGI 2011
Dados Ligados (Linked Data) CONSEGI 2011
Pablo Mendes
 
Cuebee Architecture
Cuebee ArchitectureCuebee Architecture
Cuebee Architecture
Pablo Mendes
 
Twarql Architecture - Streaming Annotated Tweets
Twarql Architecture - Streaming Annotated TweetsTwarql Architecture - Streaming Annotated Tweets
Twarql Architecture - Streaming Annotated Tweets
Pablo Mendes
 
Dynamic Associative Relationships on the Linked Open Data Web
Dynamic Associative Relationships on the Linked Open Data WebDynamic Associative Relationships on the Linked Open Data Web
Dynamic Associative Relationships on the Linked Open Data Web
Pablo Mendes
 

More from Pablo Mendes (9)

Entity Aware Click Graph
Entity Aware Click GraphEntity Aware Click Graph
Entity Aware Click Graph
 
WWW2012 Tutorial Visualizing SPARQL Queries
WWW2012 Tutorial Visualizing SPARQL QueriesWWW2012 Tutorial Visualizing SPARQL Queries
WWW2012 Tutorial Visualizing SPARQL Queries
 
Sieve - Data Quality and Fusion - LWDM2012
Sieve - Data Quality and Fusion - LWDM2012Sieve - Data Quality and Fusion - LWDM2012
Sieve - Data Quality and Fusion - LWDM2012
 
Ligado nos Políticos at ESWC'2011 Workshop
Ligado nos Políticos at ESWC'2011 WorkshopLigado nos Políticos at ESWC'2011 Workshop
Ligado nos Políticos at ESWC'2011 Workshop
 
SMWCon Fall 2011 Lightning Talk
SMWCon Fall 2011 Lightning TalkSMWCon Fall 2011 Lightning Talk
SMWCon Fall 2011 Lightning Talk
 
Dados Ligados (Linked Data) CONSEGI 2011
Dados Ligados (Linked Data) CONSEGI 2011Dados Ligados (Linked Data) CONSEGI 2011
Dados Ligados (Linked Data) CONSEGI 2011
 
Cuebee Architecture
Cuebee ArchitectureCuebee Architecture
Cuebee Architecture
 
Twarql Architecture - Streaming Annotated Tweets
Twarql Architecture - Streaming Annotated TweetsTwarql Architecture - Streaming Annotated Tweets
Twarql Architecture - Streaming Annotated Tweets
 
Dynamic Associative Relationships on the Linked Open Data Web
Dynamic Associative Relationships on the Linked Open Data WebDynamic Associative Relationships on the Linked Open Data Web
Dynamic Associative Relationships on the Linked Open Data Web
 

Recently uploaded

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
Globus
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 

Recently uploaded (20)

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 

DBpedia Spotlight at I-SEMANTICS 2011

  • 1. DBpedia Spotlight: Shedding Light on the Web of Documents. Pablo N. Mendes, Max Jakob, Andrés Garcia-Silva, Christian Bizer (pablo.mendes@fu-berlin.de). I-SEMANTICS, Graz, Austria, September 9th 2011
  • 2. Agenda: What is text annotation? What can you build with it? Why is it difficult? How did we approach the challenge? How well did it work? What are the next steps? (Mendes, Jakob, Garcia-Silva, Bizer. DBpedia Spotlight: Shedding Light on the Web of Documents)
  • 4. Text Annotation. From: “(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.” To: the same sentence, with “New York” annotated as http://dbpedia.org/resource/New_York_City and “Apple Corps” annotated as http://dbpedia.org/resource/Apple_Corps.
  • 5. Challenge: Term Ambiguity. “...this apple on the palm of my hand...”, “...Apple tried to acquire Palm Inc....”, “...eating an apple sitting by a palm tree...”. What do “apple” and “palm” mean in each case? Our objective is to recognize entities and disambiguate their meaning, generating DBpedia annotations in text.
  • 6. What can you do with annotations? Links to complementary information (“More about this”). Faceted browsing of blog posts (show only posts with topics related to Sports). Rich snippets on Google (search engines are starting to display information from annotations). More expressive filtering of information streams (Twarql, an entry at the I-SEMANTICS 2010 Challenge).
  • 7. Rich Snippets. Search engines already benefit from some kinds of annotations. http://www.google.com/webmasters/tools/richsnippets
  • 8. Twarql Example Use Case. Which competitors of my product are being mentioned together with my product on Twitter? (comparative opinion!) SELECT ?competitor WHERE { dbpedia:IPad skos:subject ?category . ?competitor skos:subject ?category . ?tweet moat:taggedWith ?competitor . ?tweet moat:taggedWith dbpedia:IPad . }
  • 9. Twarql Example Use Case (2). Diagram: incoming microposts (an example tweet by @anonymized) are matched against background knowledge (e.g. DBpedia) through the pattern ?tweet moat:taggedWith ?competitor; ?competitor skos:subject ?category; dbpedia:IPad skos:subject ?category. Competition is modeled as two products in the same category in DBpedia.
  • 10. Twarql Example Use Case (3). Diagram: for dbpedia:IPad, the background knowledge (e.g. DBpedia) supplies categories such as category:Wi-Fi and category:Touchscreen, which bind ?category in the pattern ?competitor skos:subject ?category; ?tweet moat:taggedWith ?competitor. Background knowledge is dynamically “brought into” microposts.
  • 11. Twarql Example Use Case (4). Diagram: the example tweet, enriched with category:Wi-Fi and category:Touchscreen via dbpedia:IPad from background knowledge (e.g. DBpedia), is matched against the pattern. Trigger an action if the micropost matches the constraints.
  • 12. DBpedia Spotlight. DBpedia is a collection of entity descriptions extracted from Wikipedia and shared as linked data. DBpedia Spotlight uses data from DBpedia and text from the associated Wikipedia pages to learn how to recognize that a DBpedia resource was mentioned. Given plain text as input, it generates annotated text.
  • 13. Why is it difficult?
  • 14. Dataset overview. Volume of Wikipedia: 56.9 GB of raw text data. Occurrences of ambiguous terms in Wikipedia: 58.8%. Sparsity: less data is available for some DBpedia resources.
  • 15. Histogram: URI occurrences (x-axis: log(n(uri))). Many “rare” URIs (few links on Wikipedia); most previous work deals with these entities: People, Organization, Location. Few “popular” URIs (lots of links on Wikipedia).
  • 16. Histogram: Surface Form Ambiguity (x-axis: log(n(uri,sf))). Many “unambiguous” surface forms; few very “ambiguous” surface forms. Max: 1199 (log = 7.08), Min: 1, Mean: 1.328949.
  • 17. Ambiguity: What are the most ambiguous surface forms?
  • 18. Name Variation: What are the URIs with many surface forms?
  • 19. How did we approach the challenge?
  • 20. A 4-stage approach: Spotting, Candidate Mapping, Disambiguation, Linking.
  • 21. Stage 1: Spotting. Find substrings that seem worthy of annotation. A naïve implementation (impractical) would consider all n-grams of length 1 to |text|. Input: “(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.” Output: “Lennon”, “McCartney”, “New York”, “Apple Corps”.
  • 22. Spotting in DBpedia Spotlight. Detect that the label (surface form) of a DBpedia resource was mentioned. Lexicalized, using the Aho-Corasick algorithm (LingPipe). Name variations come from redirects, disambiguation pages and anchor texts. Advantages: simple implementation, well-studied problem, produces a reduced set of spots, relies on user-provided terms. Drawback: high memory requirements (~7 GB).
  • 23. Stage 2: Candidate Mapping. What are the possible senses of a given surface form (the candidate DBpedia resources)? Input: “Lennon”, “McCartney”, “New York”, “Apple Corps”. Output: “Lennon”: { Lennon_(album), Lennon,_Michigan, … }; “McCartney”: { McCartney_(surname), Paul_McCartney, … }; “New York”: { New_York_State, New_York_City, … }; “Apple Corps”: { Apple_Corps }.
  • 24. Candidate Mapping in DBpedia Spotlight. Sources of mappings between surface forms and DBpedia resources: page titles offer “chosen names” for resources; redirects offer alternative spellings, aliases, etc.; disambiguation pages link a common term to many resources.
  • 25. Candidate Map: Disambiguation Pages. Collectively, they provide a list of ambiguous terms and the meanings of each.
  • 26. Candidate Map: Redirects. Surface forms that redirect to Apple_Inc: AAPL; Apple (Company); Apple (Computers); Apple (company); Apple (computer); Apple Company; Apple Computer; Apple Computer Co.; Apple Computer Inc.; Apple Computer Incorporated; Apple Computer, Inc; Apple Computer, Inc.; Apple Computers; Apple Inc; Apple Incorporate; Apple Incorporated; Apple India; Apple comp; Apple compputer; Apple computer; Apple computer Inc; Apple computers; Apple inc; Apple inc.; Apple incoporated; Apple incorporated; Apple pc; Apple's; Apple, Inc; Apple, Inc.; Apple,inc.; Apple.com; AppleComputer; Bowman Bank; Cripple Inc.; Inc. Apple Computer; Jobs and Wozniak; Option-Shift-K; Inc.
  • 27. Stage 3: Disambiguation. Select the correct candidate DBpedia resource for a given surface form. The decision is made based on the context in which the surface form was mentioned. con·text (noun): 1. the parts of a discourse that surround a word or passage and can throw light on its meaning; 2. the circumstances in which an event occurs; a setting. (Definition: http://mw1.merriam-webster.com/dictionary/context)
  • 28. Learning the Context for a resource. Collect context for DBpedia resources from Wikipedia. Types of context: Wikipedia pages; definitions from disambiguation pages; paragraphs that link to resources, e.g. “(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.”
  • 29. Disambiguation in DBpedia Spotlight. Model DBpedia resources as vectors of terms found in Wikipedia text. Define functions for term scoring and vector similarity (e.g. frequency and cosine). Rank candidate resource vectors by their similarity with the vector of the input text. Choose the highest-ranking candidate. Example: Lennon = {Beatles, McCartney, rock, guitar, ...}; Lennon = {tf(Beatles)=320, tf(McCartney)=100, ...}; cos(Input, Lennon) = 0.12.
  • 30. Scoring Strategies. TF*IDF (Term Frequency * Inverse Document Frequency): TF gives insight into the relevance of the term in the context of a DBpedia resource; IDF gives insight into the rarity of the term, since co-occurrence of rare terms is more informative. ICF (Inverse Candidate Frequency): whereas IDF is the “rarity” of a word in the entire Wikipedia, ICF is its rarity with respect to the possible senses only.
  • 31. Context-Independent Strategies. NAÏVE: use the surface form to build the URI: “berlin” -> dbpedia:Berlin. PROMINENCE: P(u) = n(u) / N (the ‘popularity’/importance of this URI), where n(u) is the number of times URI u occurred and N is the total number of occurrences. Intuition: URIs that have appeared a lot are more likely to appear again. DEFAULT SENSE: P(u|s) = n(u,s) / n(s), where n(u,s) is the number of times URI u occurred with surface form s and n(s) is the number of times s occurred. Intuition: some surface forms are strongly associated with specific URIs.
  • 32. Linking (Configuration). Decide which spots to annotate with links to the disambiguated resources. Different use cases have different needs: Only annotate prominent resources? Only if you’re sure the disambiguation is correct? Only people? Only things related to Berlin?
  • 33. Linking in DBpedia Spotlight. Can be configured based on: thresholds (confidence; prominence, i.e. support); a whitelist or blacklist of types (hide all people, show only organizations); a complex definition of a “type” through a SPARQL query.
  • 34. How well did it work?
  • 35. Evaluation: Disambiguation. Used held-out (unseen) Wikipedia occurrences as test data; evaluates the accuracy of the disambiguation stage. Baselines: Random (performs well with low ambiguity); Default Sense (only prominence, without context); Default Similarity (TF*IDF), using the Lucene implementation.
  • 36. Disambiguation Evaluation Results
  • 37. Evaluation: Annotation. News text on different topics, hand-annotated by 4 annotators; gold standard built from their agreement. Evaluates the precision and recall of annotations.
  • 38. Annotation Evaluation Results (2)
  • 39. Annotation Evaluation Results
  • 40. Conclusions. DBpedia Spotlight is a configurable annotation tool that supports a variety of use cases. Very simple methods work surprisingly well for disambiguation. More work is needed to alleviate sparsity. The most challenging step is linking. More evaluation on larger annotation datasets is needed.
  • 41. What are the next steps?
  • 42. A preview of the next release. CORS-enabled, plus a jQuery client: one line to annotate any web page, $("div").annotate(). A new demo interface based on the plugin. Types: DBpedia 3.7, Freebase, Schema.org. New configuration parameters (e.g. perform smarter spotting). Easier install: maven2, jar, Debian package.
  • 43. Preview: temporarily available for I-SEMANTICS 2011 at http://spotlight.dbpedia.org/dev/demo
  • 44. Future work. Internationalization (German, Spanish, ...); more sophisticated spotting; new disambiguation strategies; global disambiguation, where one disambiguation decision helps the other decisions; sparsity problems (try smoothing, dimensionality reduction, etc.); store user feedback and learn from mistakes.
  • 45. We are open. Tell us about your use cases. Hack something with us: Drupal/WordPress plugin, Semantic MediaWiki integration. Are you a good engineer? Help us make it faster and smaller! Are you a good researcher? Let’s collaborate on your/our ideas. Licensed as Apache v2.0 (business friendly).
  • 46. Thank you! On Twitter: @pablomendes. E-mail: pablo.mendes@fu-berlin.de. Web: http://pablomendes.com. Special thanks to Jo Daiber (working with us on the next release). Partially funded by LOD2.eu and Neofonie GmbH. http://spotlight.dbpedia.org
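The Twarql query on slide 8 joins a tweet's tags with DBpedia categories. The sketch below shows the same join evaluated in memory for a single tweet, purely as an illustration. It is not Twarql's implementation, and it assumes that tags and categories have already been fetched into Python sets. The function name and the `ex:` resources are hypothetical.

```python
def competitors_in_tweet(tweet_entities, product, categories):
    """Entities tagged on a tweet that share a category with `product`.

    tweet_entities: set of resources the tweet is tagged with (moat:taggedWith)
    categories:     dict resource -> set of categories (skos:subject)
    Mirrors the query's join: the tweet must mention `product`, and each
    competitor must share at least one category with it.
    """
    if product not in tweet_entities:
        return set()
    product_categories = categories.get(product, set())
    return {
        entity
        for entity in tweet_entities
        # Unlike the SPARQL query, the product itself is excluded here.
        if entity != product and categories.get(entity, set()) & product_categories
    }
```

In a streaming setting, a function like this would act as the trigger from slide 11: a non-empty result signals a match.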
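Slide 22 describes lexicalized spotting with the Aho-Corasick algorithm. DBpedia Spotlight used LingPipe for this, and the code below does not reproduce LingPipe's API. It is a self-contained pure-Python sketch of the algorithm. Three things in it are assumptions rather than details from the slides: the case-sensitive character matching, the word-boundary check, and the choice to keep the longest match when spots overlap. The toy surface-form list is also an assumption.

```python
from collections import deque


class AhoCorasick:
    """Minimal character-level Aho-Corasick automaton (illustrative sketch)."""

    def __init__(self, patterns):
        self.goto = [{}]   # goto[state][char] -> next state
        self.fail = [0]    # failure link for each state
        self.out = [[]]    # patterns recognized on reaching each state
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first, so a state's failure target is finished before it is used.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def find_all(self, text):
        """Yield (start, end, pattern) for every dictionary match in text."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                yield (i - len(pattern) + 1, i + 1, pattern)


def spot(text, surface_forms):
    """Return mentions aligned to word boundaries, preferring longer spots."""
    matches = []
    for start, end, sf in AhoCorasick(surface_forms).find_all(text):
        left_ok = start == 0 or not text[start - 1].isalnum()
        right_ok = end == len(text) or not text[end].isalnum()
        if left_ok and right_ok:
            matches.append((start, end, sf))
    matches.sort(key=lambda m: (-(m[1] - m[0]), m[0]))  # longest first
    chosen = []
    for m in matches:
        if all(m[1] <= c[0] or m[0] >= c[1] for c in chosen):
            chosen.append(m)
    return [sf for _, _, sf in sorted(chosen)]
```

For the example sentence, "York" and "Apple" also match but are discarded, because they overlap the longer spots "New York" and "Apple Corps". This reproduces the output shown on slide 21.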
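Slides 23 to 26 build the candidate map from three sources: page titles, redirects and disambiguation pages. The sketch below merges these sources into one dictionary from surface forms to resources. The input shapes are hypothetical, as is the rule that drops a trailing parenthetical qualifier such as "(album)" from a title. Real DBpedia dumps are RDF and would need to be parsed first.

```python
from collections import defaultdict


def build_candidate_map(titles, redirects, disambiguations):
    """Map each surface form to the set of DBpedia resources it may denote.

    titles:          iterable of resource names, e.g. "Apple_Corps"
    redirects:       dict alias -> target resource, e.g. "Apple Computer" -> "Apple_Inc"
    disambiguations: dict term -> resources listed on its disambiguation page
    """
    candidates = defaultdict(set)
    for resource in titles:
        # Titles are the "chosen names": underscores become spaces, and a
        # trailing qualifier like " (album)" is dropped (an assumed heuristic).
        name = resource.replace("_", " ")
        if name.endswith(")") and " (" in name:
            name = name[: name.rindex(" (")]
        candidates[name].add(resource)
    for alias, target in redirects.items():
        candidates[alias].add(target)
    for term, resources in disambiguations.items():
        candidates[term].update(resources)
    return candidates
```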
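Slide 29 ranks candidates by the cosine similarity between a candidate's term vector and the vector of the input text. The sketch below illustrates that ranking step. It assumes a few simplifications: vectors are plain `dict`s of term weights, the context uses raw term frequencies, and tokenization is a lowercase whitespace split with no punctuation handling. The candidate vectors are toy data.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors (dicts)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(context_text, candidate_vectors):
    """Return (uri, score) pairs sorted by similarity to the context, best first."""
    context = Counter(context_text.lower().split())  # raw term frequencies
    scored = [(uri, cosine(context, vec)) for uri, vec in candidate_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The top-ranked pair is the "highest ranking candidate" that the slide chooses. The term weights could equally come from a TF*IDF or TF*ICF scheme instead of raw counts.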
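Slide 30 contrasts IDF, which measures rarity across all of Wikipedia, with ICF, which measures rarity among the candidate senses only. The sketch below uses one natural formulation, ICF(w) = log(|R| / n(w)). Here R is the set of candidates for one surface form, and n(w) is the number of those candidates whose vector contains w. The formula as written is an assumption, since the slide gives no equation.

```python
import math


def tf_icf(candidate_vectors):
    """Reweight raw term frequencies by inverse candidate frequency.

    candidate_vectors: dict uri -> dict term -> raw frequency, holding only the
    candidate senses of ONE surface form.
    """
    num_candidates = len(candidate_vectors)
    containing = {}  # term -> number of candidates whose vector contains it
    for vec in candidate_vectors.values():
        for term in vec:
            containing[term] = containing.get(term, 0) + 1
    return {
        uri: {t: tf * math.log(num_candidates / containing[t]) for t, tf in vec.items()}
        for uri, vec in candidate_vectors.items()
    }
```

A term shared by every candidate sense gets weight 0, because it cannot discriminate between them. This holds even if the term is rare in Wikipedia as a whole.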
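The three context-independent strategies on slide 31 can be estimated directly from (surface form, URI) occurrence pairs, such as those collected from Wikipedia links. The sketch below assumes such a list is available. The function names and toy data are hypothetical, and the URI prefix is taken from the example URIs on slide 4.

```python
from collections import Counter


def naive_uri(surface_form):
    """NAIVE: build the URI directly from the surface form."""
    name = surface_form.strip().replace(" ", "_")
    return "http://dbpedia.org/resource/" + name[:1].upper() + name[1:]


def context_independent_scores(occurrences):
    """occurrences: list of (surface_form, uri) pairs.

    Returns (prominence, default_sense):
      prominence[u]         = P(u)   = n(u) / N
      default_sense[(u, s)] = P(u|s) = n(u, s) / n(s)
    """
    n_u = Counter(uri for _, uri in occurrences)
    n_s = Counter(sf for sf, _ in occurrences)
    n_us = Counter(occurrences)
    total = len(occurrences)
    prominence = {u: c / total for u, c in n_u.items()}
    default_sense = {(u, s): c / n_s[s] for (s, u), c in n_us.items()}
    return prominence, default_sense


def best_default_sense(surface_form, default_sense):
    """argmax over u of P(u|s) for the given surface form (None if unseen)."""
    options = {u: p for (u, s), p in default_sense.items() if s == surface_form}
    return max(options, key=options.get) if options else None
```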
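Slide 33 lists the linking filters: confidence and support thresholds, and a type whitelist or blacklist. The sketch below applies those filters to already disambiguated spots. The `Annotation` and `LinkingConfig` classes, their field names and the example values are hypothetical. The type labels are illustrative, and SPARQL-defined types are not covered.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    surface_form: str
    uri: str
    types: set          # e.g. {"Person"} or {"Organisation"}
    confidence: float   # disambiguation score in [0, 1]
    support: int        # prominence, e.g. number of occurrences in Wikipedia


@dataclass
class LinkingConfig:
    min_confidence: float = 0.0
    min_support: int = 0
    whitelist: set = field(default_factory=set)  # empty means "allow all types"
    blacklist: set = field(default_factory=set)

    def accepts(self, ann):
        if ann.confidence < self.min_confidence or ann.support < self.min_support:
            return False
        if self.blacklist & ann.types:
            return False
        return not self.whitelist or bool(self.whitelist & ann.types)


def link(annotations, config):
    """Keep only the annotations that the configuration accepts, in order."""
    return [a for a in annotations if config.accepts(a)]
```

Keeping the thresholds in a single configuration object reflects the slide's point that each use case needs different settings. For example, `LinkingConfig(blacklist={"Person"})` corresponds to "hide all people".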
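Slide 37 evaluates annotations by precision and recall against a gold standard. The sketch below computes both, plus F1, under two assumptions. First, each annotation is represented as a (start, end, uri) triple. Second, a system annotation counts as correct only if both its span and its URI match the gold standard exactly, although the slides do not state the matching criterion. The annotations in the usage example are toy values.

```python
def precision_recall_f1(predicted, gold):
    """Compare system annotations with a gold standard.

    Both arguments are sets of (start, end, uri) triples; an annotation is
    correct only if span and URI match exactly (strict matching).
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With a gold standard of 4 annotations and 3 predictions, 2 of which are correct, precision is 2/3 and recall is 1/2.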

Editor's Notes

  1-3. This use case requires merging streaming data with background knowledge (e.g. from DBpedia). Examples of ?category include category:Wi-Fi devices and category:Touchscreen portable media players, amongst others. As a result, without having to list every product of interest as a keyword to filter a stream, a user can leverage relationships in background knowledge to narrow the stream of tweets down to a subset of interest more effectively.
  4. $ gunzip -c MostCommon-surfaceForm.count.gz | grep -Pc "\\t1$"
     4258908
     $ gunzip -c MostCommon-surfaceForm.count.gz | wc -l
     7244289
     4258908 / 7244289 = 0.5879 (≈ 58.8%)
  5. Max = 200,474 (log = 12.2); Min = 1; Mean = 8.343878
  6. Lexicalized: uses a list of resource names. The names come from titles, redirects, disambiguation pages and anchor texts.
  7. The agreement between individual annotators is:
     Annotator 1 vs Annotator 2 (Kappa = 0.674)
     Annotator 1 vs Annotator 3 (Kappa = 0.606)
     Annotator 2 vs Annotator 3 (Kappa = 0.577)
     Annotator 2 vs Annotator 4 (Kappa = 0.528)
     Annotator 1 vs Annotator 4 (Kappa = 0.469)
     Annotator 3 vs Annotator 4 (Kappa = 0.385)
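Note 7 reports pairwise Kappa values between the four annotators but does not name the variant or the labels used. The sketch below assumes Cohen's kappa computed over per-item labels. One hypothetical labelling scheme would be whether each candidate spot should be linked. The formula is kappa = (p_o - p_e) / (1 - p_e). In it, p_o is the observed agreement, and p_e is the agreement expected by chance from each annotator's own label distribution.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    # If both annotators always use the same single label, p_e == 1 and kappa
    # is undefined; return 1.0 (perfect agreement) by convention.
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 1.0
```

For example, suppose two annotators agree on 3 of 4 items, and each annotator's label counts give p_e = 0.5. Then kappa = (0.75 - 0.5) / 0.5 = 0.5.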