Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Exposing Digital Content as Linked Data,
and Linking them using StoryBlink
Ben De Meester
Tom De Nies, Laurens De Vocht,
R...
We live in a fast world
with a lot of content to sift through
http://blog.qmee.com/qmee-online-in-60-seconds/
Book ≠ Fast
Finding a good book in short time?
Recommendations!
Recommendations?
Social recommendations
Long tail
Metadata recommendations
Manual?
What do we want?
Automatic content-based metadata
to fuel future recommendation-engines
Content-based metadata
Get the tags…
DBPedia Spotlight
... use them to represent books’ content …
EPUB CFI, NIF, ITS, …
… ...
Get the tags
Find out what a book is about…
Semantic tags!
Using NER/NED!
Extract all semantic concepts from the book
AGDISTIS
AGDISTIS
Open source
Local
NER/NED/NEL
From a book to a semantic book
… …
Split HTML
into chunks
HTML
to text
Local
Spotlight
Represent a book by tags
@prefix schema: <http://schema.org/> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/o...
Represent a book by tags
@prefix schema: <http://schema.org/> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/o...
Link to other books
Open Source
Linked data
path finding
Multiple paths
Keeping all concepts…
Not all mentioned concepts are useful.
The path finding becomes really slow.
Keeping all concepts…
Not all mentioned concepts are useful.
The path finding becomes really slow.
What happens if we keep...
0
10
20
30
40
50
60
0
2
4
6
8
10
12
14
0 10 20 30 40 50 60 70 80 90 100
Time (s)#paths
Amount of considered concepts (%)
T...
0
10
20
30
40
50
60
0
2
4
6
8
10
12
14
0 10 20 30 40 50 60 70 80 90 100
Time (s)#paths
Amount of considered concepts (%)
T...
Optimized Results
@prefix schema: <http://schema.org/> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologi...
Storyblink
Exploring the links between classic works
Choose two books, and…
Storyblink
Next steps
Scale
Indirect paths
e.g. book about WWI and book about WWII
Relevancy measures
Knowledge base influence
Filter...
Storyblink
gives a semantic representation
of important semantic concepts
inside books, and uses those to connect
books to...
Our project
The Publisher of the Future
Our pilot project partners:
NLP&DBpedia2015 - Exposing Digital Content as Linked Data, and Linking them using StoryBlink
NLP&DBpedia2015 - Exposing Digital Content as Linked Data, and Linking them using StoryBlink
NLP&DBpedia2015 - Exposing Digital Content as Linked Data, and Linking them using StoryBlink
NLP&DBpedia2015 - Exposing Digital Content as Linked Data, and Linking them using StoryBlink
NLP&DBpedia2015 - Exposing Digital Content as Linked Data, and Linking them using StoryBlink
Upcoming SlideShare
Loading in …5
×

NLP&DBpedia2015 - Exposing Digital Content as Linked Data, and Linking them using StoryBlink

666 views

Published on

Digital publications host a large amount of data that currently is not harvested, due to its unstructured nature.
However, manually annotating these publications is tedious.
Current tools that automatically analyze unstructured text are too fine-grained for larger amounts of text such as books.
A workable machine-interpretable version of larger bodies of text is thus necessary.
In this paper, we suggest a workflow to automatically create and publish a machine-interpretable version of digital publications as linked data via DBpedia Spotlight.
Furthermore, we make use of the Everything is Connected Engine on top of this published linked data to link digital publications using a Web application dubbed "StoryBlink".
StoryBlink shows the added value of publishing machine-interpretable content of unstructured digital publications by finding relevant books that are connected to selected classic works.
Currently, the time to find a connecting path can be quite long, but this can be overcome by using caching mechanisms, and the relevancy of found paths can be improved by better denoising the DBpedia Spotlight results, or by using alternative disambiguation engines.

Published in: Engineering
  • Be the first to comment

  • Be the first to like this

NLP&DBpedia2015 - Exposing Digital Content as Linked Data, and Linking them using StoryBlink

  1. 1. Exposing Digital Content as Linked Data, and Linking them using StoryBlink Ben De Meester Tom De Nies, Laurens De Vocht, Ruben Verborgh, Erik Mannens, and Rik Van de Walle University Ghent – iMinds – Multimedia Lab ben.demeester@ugent.be | @Ben__DM NLPDBpedia2015@ISWC | October 11th 2015 | Bethlehem, PA
  2. 2. We live in a fast world with a lot of content to sift through http://blog.qmee.com/qmee-online-in-60-seconds/
  3. 3. Book ≠ Fast
  4. 4. Finding a good book in short time? Recommendations!
  5. 5. Recommendations? Social recommendations Long tail Metadata recommendations Manual?
  6. 6. What do we want? Automatic content-based metadata to fuel future recommendation-engines
  7. 7. Content-based metadata Get the tags… DBPedia Spotlight ... use them to represent books’ content … EPUB CFI, NIF, ITS, … … and link to other books … in a good way. TPF, EiCE Storyblink!
  8. 8. Get the tags Find out what a book is about… Semantic tags! Using NER/NED! Extract all semantic concepts from the book
  9. 9. AGDISTIS
  10. 10. AGDISTIS Open source Local NER/NED/NEL
  11. 11. From a book to a semantic book … … Split HTML into chunks HTML to text Local Spotlight
  12. 12. Represent a book by tags @prefix schema: <http://schema.org/> . @prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> . @prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> . @prefix dbr: <http://dbpedia.org/resource/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix pg84: <http://www.gutenberg.org/ebooks/84.epub#> . pg84:book a schema:Book . pg84:epubcfi(/6/12!/4/2/4) itsrdf:taIdentRef dbr:Chamois ; nif:sourceUrl pg84:book . pg84:epubcfi(/6/2!/4/46[chap01]/16/42) itsrdf:taIdentRef dbr:Chamois ; nif:sourceUrl pg84:book . pg84:epubcfi(/6/12!/4/2/6) itsrdf:taIdentRef dbr:Desert ; nif:sourceUrl pg84:book . ...
  13. 13. Represent a book by tags @prefix schema: <http://schema.org/> . @prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> . @prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> . @prefix dbr: <http://dbpedia.org/resource/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix pg84: <http://www.gutenberg.org/ebooks/84.epub#> . pg84:book a schema:Book . pg84:epubcfi(/6/12!/4/2/4) itsrdf:taIdentRef dbr:Chamois ; nif:sourceUrl pg84:book . pg84:epubcfi(/6/2!/4/46[chap01]/16/42) itsrdf:taIdentRef dbr:Chamois ; nif:sourceUrl pg84:book . pg84:epubcfi(/6/12!/4/2/6) itsrdf:taIdentRef dbr:Desert ; nif:sourceUrl pg84:book . ...
  14. 14. Link to other books Open Source Linked data path finding Multiple paths
  15. 15. Keeping all concepts… Not all mentioned concepts are useful. The path finding becomes really slow.
  16. 16. Keeping all concepts… Not all mentioned concepts are useful. The path finding becomes really slow. What happens if we keep the top X%?
  17. 17. 0 10 20 30 40 50 60 0 2 4 6 8 10 12 14 0 10 20 30 40 50 60 70 80 90 100 Time (s)#paths Amount of considered concepts (%) Top 50% of found concepts gives similar paths, but a lot faster
  18. 18. 0 10 20 30 40 50 60 0 2 4 6 8 10 12 14 0 10 20 30 40 50 60 70 80 90 100 Time (s)#paths Amount of considered concepts (%) Top 50% of found concepts gives similar paths, but a lot faster Time-out
  19. 19. Optimized Results @prefix schema: <http://schema.org/> . @prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> @prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> . @prefix dbr: <http://dbpedia.org/resource/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix pg84: <http://www.gutenberg.org/ebooks/84.epub#> . pg84:book a schema:Book . pg84:book itsrdf:taIdentRef dbr:Chamois, dbr:Desert, ... http://uvdt.test.iminds.be/storyblinkdata/books
  20. 20. Storyblink Exploring the links between classic works Choose two books, and…
  21. 21. Storyblink
  22. 22. Next steps Scale Indirect paths e.g. book about WWI and book about WWII Relevancy measures Knowledge base influence Filtering influence
  23. 23. Storyblink gives a semantic representation of important semantic concepts inside books, and uses those to connect books together content-wise http://uvdt.test.iminds.be/storyblink Demo 48
  24. 24. Our project The Publisher of the Future Our pilot project partners:

×