NLP&DBpedia2015 - Exposing Digital Content as Linked Data, and Linking them using StoryBlink

Digital publications host a large amount of data that currently is not harvested, due to its unstructured nature.
However, manually annotating these publications is tedious.
Current tools that automatically analyze unstructured text are too fine-grained for larger amounts of text such as books.
A workable machine-interpretable version of larger bodies of text is thus necessary.
In this paper, we suggest a workflow to automatically create and publish a machine-interpretable version of digital publications as linked data via DBpedia Spotlight.
Furthermore, we make use of the Everything is Connected Engine on top of this published linked data to link digital publications using a Web application dubbed "StoryBlink".
StoryBlink shows the added value of publishing machine-interpretable content of unstructured digital publications by finding relevant books that are connected to selected classic works.
Currently, the time to find a connecting path can be quite long, but this can be overcome by using caching mechanisms, and the relevancy of found paths can be improved by better denoising the DBpedia Spotlight results, or by using alternative disambiguation engines.
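
As a rough illustration of this workflow, the sketch below sends chunks of a book's plain text to a DBpedia Spotlight instance and collects the returned concept URIs. The endpoint URL, chunk size, confidence value, and the helper names annotate_chunk/annotate_book are assumptions for illustration, not values from the paper; only the public Spotlight web service's annotate interface with its text and confidence parameters is taken as given. In the paper's pipeline the EPUB's HTML is first split into chunks and converted to plain text (slide 11); this sketch starts from the plain text.

    # Minimal sketch of the annotation step. SPOTLIGHT_URL and CHUNK_SIZE are
    # hypothetical values; adjust them to your own Spotlight deployment.
    import requests

    SPOTLIGHT_URL = "http://localhost:2222/rest/annotate"  # assumed local instance
    CHUNK_SIZE = 5000                                       # characters, arbitrary

    def annotate_chunk(text, confidence=0.5):
        """Send one chunk of text to DBpedia Spotlight and return DBpedia URIs."""
        response = requests.post(
            SPOTLIGHT_URL,
            data={"text": text, "confidence": confidence},
            headers={"Accept": "application/json"},
        )
        response.raise_for_status()
        # "Resources" is absent when Spotlight finds no entities in the chunk.
        return [r["@URI"] for r in response.json().get("Resources", [])]

    def annotate_book(plain_text):
        """Split a book's plain text into chunks and collect all detected concepts."""
        concepts = []
        for start in range(0, len(plain_text), CHUNK_SIZE):
            concepts.extend(annotate_chunk(plain_text[start:start + CHUNK_SIZE]))
        return concepts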

  1. Exposing Digital Content as Linked Data, and Linking them using StoryBlink. Ben De Meester, Tom De Nies, Laurens De Vocht, Ruben Verborgh, Erik Mannens, and Rik Van de Walle. Ghent University – iMinds – Multimedia Lab. ben.demeester@ugent.be | @Ben__DM. NLP&DBpedia2015 @ ISWC | October 11th 2015 | Bethlehem, PA
  2. We live in a fast world with a lot of content to sift through http://blog.qmee.com/qmee-online-in-60-seconds/
  3. Book ≠ Fast
  4. Finding a good book in short time? Recommendations!
  5. Recommendations? Social recommendations: long tail. Metadata recommendations: manual?
  6. What do we want? Automatic content-based metadata to fuel future recommendation engines
  7. Content-based metadata: get the tags (DBpedia Spotlight)… use them to represent books’ content (EPUB CFI, NIF, ITS, …)… and link to other books… in a good way (TPF, EiCE). StoryBlink!
  8. Get the tags Find out what a book is about… Semantic tags! Using NER/NED! Extract all semantic concepts from the book
  9. AGDISTIS
  10. AGDISTIS Open source Local NER/NED/NEL
  11. From a book to a semantic book: split HTML into chunks, HTML to text, local Spotlight
  12. Represent a book by tags:
      @prefix schema: <http://schema.org/> .
      @prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
      @prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
      @prefix dbr: <http://dbpedia.org/resource/> .
      @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
      @prefix pg84: <http://www.gutenberg.org/ebooks/84.epub#> .

      pg84:book a schema:Book .
      pg84:epubcfi(/6/12!/4/2/4) itsrdf:taIdentRef dbr:Chamois ;
          nif:sourceUrl pg84:book .
      pg84:epubcfi(/6/2!/4/46[chap01]/16/42) itsrdf:taIdentRef dbr:Chamois ;
          nif:sourceUrl pg84:book .
      pg84:epubcfi(/6/12!/4/2/6) itsrdf:taIdentRef dbr:Desert ;
          nif:sourceUrl pg84:book .
      ...
  13. Represent a book by tags (same listing as the previous slide; an rdflib serialization sketch follows the transcript)
  14. Link to other books: open source, linked data path finding, multiple paths (a simplified linking sketch follows the transcript)
  15. Keeping all concepts… Not all mentioned concepts are useful. The path finding becomes really slow.
  16. Keeping all concepts… Not all mentioned concepts are useful. The path finding becomes really slow. What happens if we keep the top X%? (a filtering sketch follows the transcript)
  17. [Chart: path-finding time (s) and number of found paths against the percentage of considered concepts] Top 50% of found concepts gives similar paths, but a lot faster
  18. [Same chart, with the path-finding time-out indicated] Top 50% of found concepts gives similar paths, but a lot faster
  19. Optimized Results:
      @prefix schema: <http://schema.org/> .
      @prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
      @prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
      @prefix dbr: <http://dbpedia.org/resource/> .
      @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
      @prefix pg84: <http://www.gutenberg.org/ebooks/84.epub#> .

      pg84:book a schema:Book .
      pg84:book itsrdf:taIdentRef dbr:Chamois, dbr:Desert, ...
      http://uvdt.test.iminds.be/storyblinkdata/books
  20. StoryBlink: exploring the links between classic works. Choose two books, and…
  21. StoryBlink
  22. Next steps: scale; indirect paths (e.g. a book about WWI and a book about WWII); relevancy measures; knowledge base influence; filtering influence
  23. StoryBlink gives a semantic representation of important semantic concepts inside books, and uses those to connect books together content-wise. http://uvdt.test.iminds.be/storyblink Demo: booth 48
  24. Our project: The Publisher of the Future. Our pilot project partners:
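
The Turtle listing on slides 12 and 13 can be produced with any RDF library; the rdflib sketch below rebuilds it for two of the example annotations. The EPUB CFI fragment identifiers and DBpedia resources are copied from the slide; minting them as plain fragment strings on the pg84 namespace, and the use of rdflib itself, are assumptions for illustration rather than the paper's implementation.

    # Sketch: serialize per-mention annotations as NIF/ITS triples with rdflib.
    from rdflib import Graph, Namespace, URIRef, RDF

    SCHEMA = Namespace("http://schema.org/")
    NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")
    ITSRDF = Namespace("http://www.w3.org/2005/11/its/rdf#")
    DBR = Namespace("http://dbpedia.org/resource/")
    PG84 = Namespace("http://www.gutenberg.org/ebooks/84.epub#")

    g = Graph()
    for prefix, ns in [("schema", SCHEMA), ("nif", NIF), ("itsrdf", ITSRDF),
                       ("dbr", DBR), ("pg84", PG84)]:
        g.bind(prefix, ns)

    book = PG84["book"]
    g.add((book, RDF.type, SCHEMA.Book))

    # Two triples per detected mention: the EPUB CFI locates the mention in the
    # book, itsrdf:taIdentRef points to the disambiguated DBpedia concept.
    annotations = [
        ("epubcfi(/6/12!/4/2/4)", DBR.Chamois),
        ("epubcfi(/6/12!/4/2/6)", DBR.Desert),
    ]
    for cfi, concept in annotations:
        mention = PG84[cfi]
        g.add((mention, ITSRDF["taIdentRef"], concept))
        g.add((mention, NIF["sourceUrl"], book))

    print(g.serialize(format="turtle"))

The optimized, per-book summary on slide 19 drops the per-mention level and attaches the filtered concepts directly to pg84:book.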
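
Slides 15 to 18 motivate keeping only the top X% of detected concepts before path finding. The slides do not spell out the ranking criterion, so ranking by raw mention frequency is an assumption here (in line with the speaker note that plain frequency, not tf-idf, is wanted), and top_concepts is a hypothetical helper name; slide 17 reports that keeping the top 50% gives similar paths much faster.

    # Sketch: keep only the most frequently mentioned concepts of a book.
    # Ranking by mention frequency is an assumption; the slides only say
    # "top X% of found concepts".
    from collections import Counter

    def top_concepts(concepts, keep_fraction=0.5):
        """Return the top `keep_fraction` of concepts, ranked by mention count."""
        ranked = [concept for concept, _ in Counter(concepts).most_common()]
        cutoff = max(1, int(len(ranked) * keep_fraction))
        return ranked[:cutoff]

    # Example, e.g. fed with the output of annotate_book() above:
    # top_concepts(["dbr:Chamois", "dbr:Chamois", "dbr:Desert"])  ->  ["dbr:Chamois"]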
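
Finally, the paper links annotated books with the Everything is Connected Engine over the published linked data. As a much simpler stand-in, the sketch below only checks one-hop connections on the public DBpedia SPARQL endpoint: it asks whether any concept of one book is directly linked to any concept of another. The function names directly_connected and one_hop_links are hypothetical, and this is meant to illustrate the idea, not the engine's actual multi-hop path finding; the abstract names caching and better denoising of the Spotlight results as the ways to make the real path search faster and more relevant.

    # Simplified one-hop stand-in for the Everything is Connected Engine's
    # path finding, using the public DBpedia SPARQL endpoint.
    from SPARQLWrapper import SPARQLWrapper, JSON

    ENDPOINT = "http://dbpedia.org/sparql"

    def directly_connected(concept_a, concept_b):
        """True if DBpedia holds a direct triple between the two concepts."""
        sparql = SPARQLWrapper(ENDPOINT)
        sparql.setQuery("""
            ASK { { <%s> ?p <%s> } UNION { <%s> ?p <%s> } }
        """ % (concept_a, concept_b, concept_b, concept_a))
        sparql.setReturnFormat(JSON)
        return sparql.query().convert()["boolean"]

    def one_hop_links(concepts_a, concepts_b):
        """Yield concept pairs that connect two books via a single DBpedia link."""
        for a in concepts_a:
            for b in concepts_b:
                if directly_connected(a, b):
                    yield a, b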

Editor's Notes

  • I would like to talk about automatically reducing large bodies of text to a small set of representative tags, and how we can use that to find out how stories are related: namely, how we automatically exposed digital content as linked data and linked those semantic representations using an application we dubbed StoryBlink.
  • Not tf-idf, because in a book about paper, “paper” is important (tf-idf would down-weight such a term).
  • … so please come and check our booth 48 to play around with StoryBlink