DBpedia Spotlight at I-SEMANTICS 2011

DBpedia Spotlight: a configurable annotation tool to support a variety of use cases. Given input text in English, we extract DBpedia Resources and generate annotations according to user-provided configuration parameters. These parameters can include score thresholds, entity types, and even arbitrary "type" definitions through SPARQL queries.

This is the presentation from the best-paper award session at I-SEMANTICS 2011.

Speaker notes:
  • This use case requires merging streaming data with background knowledge (e.g. from DBpedia). Examples of ?category include category:Wi-Fi devices and category:Touchscreen portable media players, among others. As a result, without having to list all products of interest as keywords to filter a stream, a user can leverage relationships in the background knowledge to more effectively narrow the stream of tweets down to a subset of interest.
  • The 58.8% figure, derived from the surface-form counts:
    $ gunzip -c MostCommon-surfaceForm.count.gz | grep -Pc "\t1$"
    4258908
    $ gunzip -c MostCommon-surfaceForm.count.gz | wc -l
    7244289
    4258908 / 7244289 ≈ 0.588
  • URI occurrence counts: Max = 200,474 (log = 12.2); Min = 1; Mean = 8.34
  • Lexicalized: uses a list of resource names, which comes from titles, redirects, disambiguation pages, and anchor texts.
  • The agreement between individual annotators:
    Annotator 1 vs Annotator 2: Kappa = 0.674
    Annotator 1 vs Annotator 3: Kappa = 0.606
    Annotator 2 vs Annotator 3: Kappa = 0.577
    Annotator 2 vs Annotator 4: Kappa = 0.528
    Annotator 1 vs Annotator 4: Kappa = 0.469
    Annotator 3 vs Annotator 4: Kappa = 0.385
Transcript of "DBpedia Spotlight at I-SEMANTICS 2011"

    1. DBpedia Spotlight: Shedding Light on the Web of Documents
       Pablo N. Mendes, Max Jakob, Andrés Garcia-Silva, Christian Bizer
       pablo.mendes@fu-berlin.de
       I-SEMANTICS, Graz, Austria
       September 9th, 2011

    2. Agenda
       What is text annotation?
       What can you build with it?
       Why is it difficult?
       How did we approach the challenge?
       How well did it work?
       What are the next steps?

    3. What is it?

    4. Text Annotation
       From:
       (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
       To:
       (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
       http://dbpedia.org/resource/New_York_City
       http://dbpedia.org/resource/Apple_Corps
    5. Challenge: Term Ambiguity
       ...this apple on the palm of my hand...
       ...Apple tried to acquire Palm Inc....
       ...eating an apple sitting by a palm tree...
       What do "apple" and "palm" mean in each case?
       Our objective is to recognize entities and disambiguate their meaning, generating DBpedia annotations in text.
    6. What can you do with annotations?
       Links to complementary information: "More about this"
       Faceted browsing of blog posts: show only posts with topics related to Sports
       Rich snippets on Google: search engines are starting to display info from annotations
       More expressive filtering of information streams: Twarql (entry at the I-SEMANTICS 2010 Challenge)

    7. Rich Snippets
       Search engines already benefit from some kinds of annotations.
       http://www.google.com/webmasters/tools/richsnippets
    8. Twarql Example Use Case
       What competitors of my product are being mentioned with my product on Twitter? (comparative opinion!)

       SELECT ?competitor
       WHERE {
         dbpedia:IPad skos:subject ?category .
         ?competitor skos:subject ?category .
         ?tweet moat:taggedWith ?competitor .
       }

       ?tweet moat:taggedWith dbpedia:IPad .
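The category-overlap core of this query can be tried against the public DBpedia endpoint. Below is a hedged sketch using SPARQLWrapper: the endpoint URL and LIMIT are assumptions, the moat:taggedWith patterns are dropped because tweet triples live in Twarql's own store rather than in DBpedia, and note that newer DBpedia releases use dcterms:subject where this slide's era used skos:subject.

```python
# A hedged sketch, not Twarql itself: runs the category-overlap part of the
# query above against the public DBpedia endpoint. Endpoint URL and LIMIT
# are assumptions; only the skos:subject patterns come from the slide.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")  # assumed endpoint
sparql.setQuery("""
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?competitor WHERE {
  dbpedia:IPad skos:subject ?category .
  ?competitor skos:subject ?category .
} LIMIT 20
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["competitor"]["value"])
```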
    9. Twarql Example Use Case (2)
       Incoming microposts: @anonymized "Lorem ipsum blabla this is an example tweet"
       Background knowledge (e.g. DBpedia)
       [Graph diagram: ?tweet moat:taggedWith ?competitor . ?competitor skos:subject ?category . dbpedia:IPad skos:subject ?category .]
       Competition is modeled as two products in the same category in DBpedia.

    10. Twarql Example Use Case (3)
        Incoming microposts: @anonymized "Lorem ipsum blabla this is an example tweet"
        Background knowledge (e.g. DBpedia)
        [Graph diagram as above, with ?category bound to category:Wi-Fi and category:Touchscreen]
        Background knowledge is dynamically "brought into" microposts.

    11. Twarql Example Use Case (4)
        [Graph diagram as above]
        Trigger an action if the micropost matches the constraints.
    12. DBpedia Spotlight
        DBpedia is a collection of entity descriptions extracted from Wikipedia and shared as linked data.
        DBpedia Spotlight uses data from DBpedia and text from the associated Wikipedia pages.
        It learns how to recognize that a DBpedia resource was mentioned.
        Given plain text as input, it generates annotated text.
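To make "plain text in, annotated text out" concrete, here is a sketch of calling the hosted web service. The /rest/annotate endpoint with confidence/support parameters and the "@URI"/"@surfaceForm" JSON fields reflect the public demo service of this era; treat all of them as assumptions that may have changed since.

```python
# A sketch of calling the DBpedia Spotlight web service. Endpoint path,
# parameters, and JSON field names are assumptions based on the public
# demo service of this era.
import json
import urllib.parse
import urllib.request

def annotate(text, confidence=0.2, support=20):
    params = urllib.parse.urlencode(
        {"text": text, "confidence": confidence, "support": support})
    req = urllib.request.Request(
        "http://spotlight.dbpedia.org/rest/annotate?" + params,
        headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

for res in annotate("Lennon and McCartney went to New York.").get("Resources", []):
    print(res["@surfaceForm"], "->", res["@URI"])
```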
    13. Why is it difficult?
    14. Dataset overview
        Volume of Wikipedia: 56.9 GB of raw text data
        Occurrences of ambiguous terms in Wikipedia: 58.8%
        Sparsity: less data for some DBpedia resources
    15. Histogram: URI occurrences
        Many "rare" URIs (few links on Wikipedia).
        Most previous work deals with these entities: People, Organisations, Locations.
        Few "popular" URIs (lots of links on Wikipedia).
        [Histogram; x-axis: log(n(uri))]

    16. Histogram: Surface Form Ambiguity
        Many "unambiguous" surface forms.
        Max: 1199 (log = 7.08); Min: 1; Mean: 1.33
        Few very "ambiguous" surface forms.
        [Histogram; x-axis: log(n(uri, sf))]
    17. Ambiguity
        What are the most ambiguous surface forms?

    18. Name Variation
        What are the URIs with many surface forms?

    19. How did we approach the challenge?

    20. A 4-stage approach
        Spotting
        Candidate Mapping
        Disambiguation
        Linking
    21. Stage 1: Spotting
        Find substrings that seem worthy of annotation.
        Naïve implementation (impractical): all n-grams of length (1, |text|)
        Input:
        (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
        Output:
        "Lennon", "McCartney", "New York", "Apple Corps"
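The naive approach above, made concrete: enumerating every n-gram yields n*(n+1)/2 candidate spots for n tokens, which is why it is impractical. Illustrative code only, not part of Spotlight.

```python
# Enumerate every n-gram as a candidate spot (the impractical baseline).
def all_ngrams(text):
    tokens = text.split()
    return [" ".join(tokens[i:j])
            for i in range(len(tokens))
            for j in range(i + 1, len(tokens) + 1)]

spots = all_ngrams("Upon their return Lennon and McCartney went to New York")
print(len(spots))  # 10 tokens -> 55 candidate spots
```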
    22. Spotting in DBpedia Spotlight
        Detect that the label (surface form) of a DBpedia resource was mentioned.
        Lexicalized spotting: Aho-Corasick algorithm (LingPipe)
        Name variations from redirects, disambiguation pages, and anchor texts
        Advantages:
        Simple implementation, well-studied problem
        Produces a reduced set of spots
        Relies on user-provided terms
        Drawback:
        High memory requirements (~7 GB)
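A minimal sketch of lexicon-based spotting follows. Spotlight uses LingPipe's Aho-Corasick dictionary chunker; this greedy longest-match loop over a small surface-form set is a simplified stand-in, with illustrative names throughout, not the actual implementation.

```python
# Greedy longest-match spotting over a surface-form lexicon (simplified
# stand-in for the Aho-Corasick matcher named on the slide).
def spot(text, surface_forms, max_words=5):
    """Return (token offset, surface form) pairs found in `text`."""
    tokens = text.split()
    spots, i = [], 0
    while i < len(tokens):
        # Try the longest phrase starting at i first, then shrink.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).strip(".,;:!?")
            if phrase in surface_forms:
                spots.append((i, phrase))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1
    return spots

lexicon = {"Lennon", "McCartney", "New York", "Apple Corps"}
text = ("Upon their return, Lennon and McCartney went to New York "
        "to announce the formation of Apple Corps.")
print(spot(text, lexicon))
# [(3, 'Lennon'), (5, 'McCartney'), (8, 'New York'), (15, 'Apple Corps')]
```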
    23. Stage 2: Candidate Mapping
        What are the possible senses of a given surface form (the candidate DBpedia resources)?
        Input: "Lennon", "McCartney", "New York", "Apple Corps"
        Output:
        "Lennon": { Lennon_(album), Lennon,_Michigan, … }
        "McCartney": { McCartney_(surname), Paul_McCartney, … }
        "New York": { New_York_State, New_York_City, … }
        "Apple Corps": { Apple_Corps }
    24. Candidate Mapping in DBpedia Spotlight
        Sources of mappings between surface forms and DBpedia resources:
        Page titles offer "chosen names" for resources
        Redirects offer alternative spellings, aliases, etc.
        Disambiguation pages link a common term to many resources
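A minimal sketch of building a surface-form to candidate-URIs map from the three sources named above. The input formats are assumed (e.g. tuples parsed from DBpedia dumps); field names are illustrative, not the actual Spotlight data model.

```python
# Build surface form -> candidate URIs from titles, redirects, and
# disambiguation pages (input formats are illustrative assumptions).
from collections import defaultdict

def build_candidate_map(titles, redirects, disambiguations):
    candidates = defaultdict(set)
    for uri, title in titles:            # page titles: "chosen names"
        candidates[title].add(uri)
    for alias, target_uri in redirects:  # redirects: spellings, aliases
        candidates[alias].add(target_uri)
    for term, uris in disambiguations:   # disambiguation pages
        candidates[term].update(uris)
    return candidates

cmap = build_candidate_map(
    titles=[("Apple_Inc", "Apple Inc")],
    redirects=[("AAPL", "Apple_Inc"), ("Apple Computer", "Apple_Inc")],
    disambiguations=[("Apple", {"Apple_Inc", "Apple", "Apple_Corps"})],
)
print(sorted(cmap["Apple"]))  # ['Apple', 'Apple_Corps', 'Apple_Inc']
```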
    25. Candidate Map: Disambiguation Pages
        Collectively provide a list of ambiguous terms and the meanings of each.
    26. Candidate Map: Redirects
        All of the following surface forms redirect to Apple_Inc:
        AAPL
        Apple (Company)
        Apple (Computers)
        Apple (company)
        Apple (computer)
        Apple Company
        Apple Computer
        Apple Computer Co.
        Apple Computer Inc.
        Apple Computer Incorporated
        Apple Computer, Inc
        Apple Computer, Inc.
        Apple Computers
        Apple Inc
        Apple Incorporate
        Apple Incorporated
        Apple India
        Apple comp
        Apple compputer
        Apple computer
        Apple computer Inc
        Apple computers
        Apple inc
        Apple inc.
        Apple incoporated
        Apple incorporated
        Apple pc
        Apple's
        Apple, Inc
        Apple, Inc.
        Apple,inc.
        Apple.com
        AppleComputer
        Bowman Bank
        Cripple Inc.
        Inc. Apple Computer
        Jobs and Wozniak
        Option-Shift-K
        Inc.
    27. Stage 3: Disambiguation
        Select the correct candidate DBpedia resource for a given surface form.
        The decision is made based on the context in which the surface form was mentioned.
        con·text (kŏn′tĕkst), n.
        1. The parts of a discourse that surround a word or passage and can throw light on its meaning.
        2. The circumstances in which an event occurs; a setting.
        http://mw1.merriam-webster.com/dictionary/context
    28. Learning the Context for a Resource
        Collect context for DBpedia resources from Wikipedia.
        Types of context:
        Wikipedia pages
        Definitions from disambiguation pages
        Paragraphs that link to resources
        Example paragraph: (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
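A minimal sketch of context collection under the assumptions above: every Wikipedia paragraph that links to a resource contributes its terms to that resource's bag of words. The (uri, paragraph) input format is an assumption for illustration.

```python
# Aggregate linking paragraphs into per-resource term counts.
from collections import Counter, defaultdict

def build_context(paragraphs):
    """paragraphs: iterable of (linked_uri, paragraph_text) pairs."""
    context = defaultdict(Counter)
    for uri, text in paragraphs:
        context[uri].update(text.lower().split())
    return context

ctx = build_context([
    ("Apple_Corps", "Lennon and McCartney announced the formation of Apple Corps"),
    ("Apple_Inc", "Apple introduced a tablet computer in 2010"),
])
print(ctx["Apple_Corps"]["lennon"])  # 1
```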
    29. Disambiguation in DBpedia Spotlight
        Model DBpedia resources as vectors of terms found in Wikipedia text.
        Define functions for term scoring and vector similarity (e.g. frequency and cosine).
        Rank candidate resource vectors by their similarity to the vector of the input text.
        Choose the highest-ranking candidate.
        Lennon = {Beatles, McCartney, rock, guitar, ...}
        Lennon = {tf(Beatles)=320, tf(McCartney)=100, ...}
        cos(Input, Lennon) = 0.12
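A minimal sketch of this vector-space ranking, assuming term-frequency vectors per candidate (e.g. collected as on the previous slide). The real system is built on Lucene; this in-memory cosine ranking only illustrates the idea, and the candidate data is made up.

```python
# Rank candidate resources by cosine similarity to the input context.
import math
from collections import Counter

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def disambiguate(context_text, candidate_vectors):
    query = Counter(context_text.lower().split())
    return max(candidate_vectors,
               key=lambda uri: cosine(query, candidate_vectors[uri]))

vectors = {
    "John_Lennon":      Counter({"beatles": 320, "mccartney": 100, "guitar": 40}),
    "Lennon,_Michigan": Counter({"township": 12, "michigan": 30, "census": 9}),
}
print(disambiguate("Lennon and McCartney of the Beatles", vectors))
# John_Lennon
```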
    30. Scoring Strategies
        TF*IDF (Term Frequency * Inverse Document Frequency)
        TF: insight into the relevance of the term in the context of a DBpedia resource
        IDF: insight into the rarity of the term; co-occurrence of rare terms is more informative
        ICF: Inverse Candidate Frequency
        IDF is the "rarity" across the entire Wikipedia
        ICF is the rarity of a word relative to the possible senses only
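A sketch of Inverse Candidate Frequency under the assumption ICF(w) = log(|R_s| / n(w)), where R_s is the candidate set for a surface form and n(w) counts the candidates whose context contains the term w. This is the natural reading of "rarity relative to the possible senses"; treat the exact formula as an assumption here.

```python
# ICF: a term seen in few of the candidate senses discriminates well;
# a term seen in all of them carries no signal.
import math

def icf(term, candidate_vectors):
    n_with_term = sum(1 for vec in candidate_vectors.values() if term in vec)
    if n_with_term == 0:
        return 0.0
    return math.log(len(candidate_vectors) / n_with_term)

candidates = {
    "John_Lennon":      {"beatles": 320, "guitar": 40, "the": 500},
    "Lennon,_Michigan": {"michigan": 30, "census": 9, "the": 80},
}
print(icf("beatles", candidates))  # log(2/1) ~= 0.69: discriminates senses
print(icf("the", candidates))      # log(2/2) = 0.0: carries no signal
```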
    31. Context-Independent Strategies
        NAÏVE
        Use the surface form to build a URI: "berlin" -> dbpedia:Berlin
        PROMINENCE
        P(u) = n(u) / N (the 'popularity'/importance of this URI)
        n(u): number of times URI u occurred
        N: total number of occurrences
        Intuition: URIs that have appeared a lot are more likely to appear again
        DEFAULT SENSE
        P(u|s) = n(u,s) / n(s)
        n(u,s): number of times URI u occurred with surface form s
        Intuition: some surface forms are strongly associated with specific URIs
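A minimal sketch of the two count-based baselines above, assuming occurrence counts collected from Wikipedia link anchors. The data is made up for illustration; only the formulas come from the slide.

```python
# Prominence P(u) = n(u)/N and default sense P(u|s) = n(u,s)/n(s),
# computed from (surface form, linked URI) occurrence pairs.
from collections import Counter

occurrences = [  # toy anchor-text data, purely illustrative
    ("apple", "Apple_Inc"), ("apple", "Apple_Inc"), ("apple", "Apple"),
    ("berlin", "Berlin"),
]
n_uri = Counter(uri for _, uri in occurrences)
n_sf_uri = Counter(occurrences)
n_sf = Counter(sf for sf, _ in occurrences)
N = len(occurrences)

def prominence(uri):
    """P(u) = n(u) / N"""
    return n_uri[uri] / N

def default_sense(surface_form, uri):
    """P(u|s) = n(u, s) / n(s)"""
    return n_sf_uri[(surface_form, uri)] / n_sf[surface_form]

print(prominence("Apple_Inc"))              # 2/4 = 0.5
print(default_sense("apple", "Apple_Inc"))  # 2/3 ~= 0.67
```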
    32. Linking (Configuration)
        Decide which spots to annotate with links to the disambiguated resources.
        Different use cases have different needs:
        Only annotate prominent resources?
        Only if you're sure the disambiguation is correct?
        Only people?
        Only things related to Berlin?
    33. Linking in DBpedia Spotlight
        Can be configured based on:
        Thresholds: confidence, prominence (support)
        Whitelist or blacklist of types: hide all people, show only organisations
        Complex definition of a "type" through a SPARQL query
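A minimal sketch of threshold- and type-based linking filters. The configuration knobs (confidence, support, type lists) come from the slide; the Annotation data structure and field names are illustrative assumptions, not Spotlight's internal model.

```python
# Keep only annotations that clear the configured thresholds and type filter.
from dataclasses import dataclass, field

@dataclass
class Annotation:
    surface_form: str
    uri: str
    confidence: float
    support: int                  # prominence: number of inlinks
    types: set = field(default_factory=set)

def link(annotations, min_confidence=0.2, min_support=20, whitelist=None):
    keep = []
    for a in annotations:
        if a.confidence < min_confidence or a.support < min_support:
            continue
        if whitelist and not (a.types & whitelist):
            continue
        keep.append(a)
    return keep

spots = [
    Annotation("Apple", "Apple_Inc", 0.85, 5000, {"Organisation"}),
    Annotation("palm", "Arecaceae", 0.10, 900, {"Plant"}),
]
print([a.uri for a in link(spots, whitelist={"Organisation"})])  # ['Apple_Inc']
```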
    34. How well did it work?

    35. Evaluation: Disambiguation
        Used held-out (unseen) Wikipedia occurrences as test data.
        Evaluates the accuracy of the disambiguation stage.
        Baselines:
        Random: performs well with low ambiguity
        Default Sense: only prominence, without context
        Default Similarity (TF*IDF): Lucene implementation

    36. Disambiguation Evaluation Results
    37. Evaluation: Annotation
        News text on different topics.
        Examples hand-annotated by 4 annotators.
        Gold standard built from their agreement.
        Evaluates precision and recall of annotations.
    38. Annotation Evaluation Results (2)

    39. Annotation Evaluation Results
    40. Conclusions
        DBpedia Spotlight: a configurable annotation tool to support a variety of use cases
        Very simple methods work surprisingly well for disambiguation
        More work is needed to alleviate sparsity
        The most challenging step is linking
        More evaluation on larger annotation datasets is needed

    41. What are the next steps?
    42. A preview of the next release
        CORS-enabled + jQuery client
        One line to annotate any web page: $("div").annotate()
        A new demo interface based on the plugin
        Types: DBpedia 3.7, Freebase, Schema.org
        New configuration parameters, e.g. to perform smarter spotting
        Easier install: maven2, jar, Debian package
    43. Preview
        Temporarily available for I-SEMANTICS 2011:
        http://spotlight.dbpedia.org/dev/demo

    44. Future work
        Internationalization (German, Spanish, ...)
        More sophisticated spotting
        New disambiguation strategies
        Global disambiguation: one disambiguation decision helps the other decisions
        Sparsity problems: try smoothing, dimensionality reduction, etc.
        Store user feedback, learn from mistakes
    45. We are open
        Tell us about your use cases.
        Hack something with us: Drupal/WordPress plugin, Semantic MediaWiki integration
        Are you a good engineer? Help us make it faster, smaller!
        Are you a good researcher? Let's collaborate on your/our ideas.
        Licensed as Apache v2.0 (business friendly)
    46. Thank you!
        On Twitter: @pablomendes
        E-mail: pablo.mendes@fu-berlin.de
        Web: http://pablomendes.com
        http://spotlight.dbpedia.org
        Special thanks to Jo Daiber (working with us on the next release).
        Partially funded by LOD2.eu and Neofonie GmbH.