DBpedia - An Interlinking-Hub in the Web of Data

Presentation by Georgi Kobilarov about DBpedia at the DC-2008 Wikimedia Workshop on User Generated Metadata

Transcript of "DBpedia - An Interlinking-Hub in the Web of Data"

  1. An Interlinking-Hub in the Web of Data
     Georgi Kobilarov, Chris Bizer, Sören Auer, Jens Lehmann
     Freie Universität Berlin, Universität Leipzig
     Georgi Kobilarov, DBpedia at Dublin Core 2008
  2. DBpedia
     • DBpedia.org is a community effort to
       • extract structured information from Wikipedia
       • make this information available on the Web under an open license
       • interlink the DBpedia dataset with other open datasets on the Web
     • Contributors
       • Freie Universität Berlin (Germany)
       • Universität Leipzig (Germany)
       • OpenLink Software (UK)
       • Linking Open Data Community (W3C SWEO)
  3. Extracting Structured Information from Wikipedia
     • Wikipedia consists of
       • 11.2 million articles (2.5 million in English)
       • in 264 languages
       • monthly growth-rate: 4%
     • Wikipedia articles contain structured information
       • infoboxes which use a template mechanism
       • categorization of the article
       • images depicting the article’s topic
       • links to external webpages
       • intra-wiki links to other articles
       • inter-language links to articles about the same topic in different languages
  4. [Figure: annotated Wikipedia article. Labels: Title, Images, Description, Languages, Infoboxes, Web Links, Categorization, Domain specific Data]
  5. Multi-Lingual Abstracts
     • The dataset contains a short and a long abstract for each concept.
     • Short abstracts
       • English: 2,490,000
       • German: 391,000
       • French: 383,000
       • Dutch: 284,000
       • Polish: 256,000
       • Italian: 286,000
       • Spanish: 226,000
       • Japanese: 199,000
       • Portuguese: 246,000
       • Swedish: 144,000
       • Chinese: 101,000
  6. Infobox Extraction
     dbpedia:BBC p:network_name "British Broadcasting Corporation (BBC)"
     dbpedia:BBC p:country dbpedia:United_Kingdom
     dbpedia:BBC p:key_people dbpedia:Michael_Lyons
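The three statements on the infobox slide can be read as (subject, predicate, object) triples built from the key/value pairs of one infobox. A minimal sketch of that idea follows; the function name `infobox_to_triples` and the dictionary input are assumptions made for illustration and are not taken from the DBpedia extraction code.

```python
def infobox_to_triples(subject, infobox):
    """Turn one infobox (template key -> value) into a list of triples.

    Hypothetical helper: the real extractor is not shown in the slides.
    """
    triples = []
    for key, value in infobox.items():
        # Assumption: the predicate is the template key in the "p:" namespace,
        # with spaces replaced by underscores.
        predicate = "p:" + key.strip().replace(" ", "_")
        triples.append((subject, predicate, value))
    return triples


# Values copied from the slide; the dict layout itself is illustrative.
bbc_infobox = {
    "network_name": '"British Broadcasting Corporation (BBC)"',
    "country": "dbpedia:United_Kingdom",
    "key_people": "dbpedia:Michael_Lyons",
}

for s, p, o in infobox_to_triples("dbpedia:BBC", bbc_infobox):
    print(s, p, o)
```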
  7. Accessing the DBpedia Dataset over the Web
     1. DB Dumps for Download
     2. SPARQL Endpoint
     3. Linked Data
  8. The DBpedia SPARQL Endpoint
     • http://dbpedia.org/sparql
     • hosted on an OpenLink Virtuoso server
     • can answer SPARQL queries like
       • Give me all sitcoms that are set in NYC?
       • All tennis players from Moscow?
       • All films by Quentin Tarantino?
       • All German musicians that were born in Berlin in the 19th century?
       • All soccer players with jersey number 11 who play for a club whose stadium has over 40,000 seats and who were born in a country with over 10 million inhabitants?
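One of the questions above ("All films by Quentin Tarantino?") could be sent to the endpoint as a SPARQL query in the `query` parameter of an HTTP GET request, as the SPARQL protocol allows. The sketch below only builds the request URL and does not contact the server. The property `p:director` and the resource name `Quentin_Tarantino` are assumptions chosen for illustration; the slides do not give the actual query or vocabulary.

```python
from urllib.parse import urlencode, urlparse, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint URL from the slide

# Hypothetical query: property and resource names are illustrative
# assumptions and may not match the terms used in the dataset.
QUERY = """
PREFIX p: <http://dbpedia.org/property/>
SELECT ?film WHERE {
  ?film p:director <http://dbpedia.org/resource/Quentin_Tarantino> .
}
"""


def build_query_url(endpoint, query):
    """Return a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

Building the URL separately from sending it keeps the example runnable offline; the resulting string can be passed to any HTTP client.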
  9. Linked Data
     • Use URIs as names for things.
     • Use HTTP URIs so that people can look up those names.
     • When someone looks up a URI, provide useful information.
     • Include links to other URIs so that they can discover more things.
  10. URIs
      • Wikipedia Article URI: http://en.wikipedia.org/wiki/BBC
      • DBpedia Resource URI: http://dbpedia.org/resource/BBC
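The BBC example suggests that a DBpedia resource URI reuses the article title from the Wikipedia URI. A minimal sketch of that mapping follows, assuming it holds for any English Wikipedia article and that the title carries over unchanged; the slide shows only this one pair, and `wikipedia_to_dbpedia` is a hypothetical helper name.

```python
from urllib.parse import urlparse

WIKIPEDIA_PATH_PREFIX = "/wiki/"
DBPEDIA_RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(article_uri):
    """Map a Wikipedia article URI to the matching DBpedia resource URI.

    Assumption: the article title is copied as-is into the resource URI.
    """
    path = urlparse(article_uri).path
    if not path.startswith(WIKIPEDIA_PATH_PREFIX):
        raise ValueError("not a Wikipedia article URI: " + article_uri)
    title = path[len(WIKIPEDIA_PATH_PREFIX):]
    return DBPEDIA_RESOURCE_PREFIX + title


print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/BBC"))
```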
  11. W3C Linking Open Data Project
      • Community effort to
        • publish existing open license datasets as Linked Data on the Web
        • interlink things between different data sources
  12. LOD Datasets on the Web: May 2007
      • Over 500 million RDF triples.
  13. LOD Datasets on the Web: April 2008
      • Over 2 billion RDF triples.
  14. LOD Datasets on the Web: September 2008
  15. Linking Enterprise Data
  16. Structuring Wikipedia's Knowledge
      • Currently under development
      • Building a class hierarchy / ontology
      • Mapping Wikipedia Templates to DBpedia classes
  17. Class Hierarchy
      • Built from scratch
      • 170 classes
      • 900 properties
      • Structuring actual data, not modeling the world
      • No AI terminology, no "living thing" or "agent"
  18. Template Mapping
      • Class: TV Episode (Work)
      • Wikipedia Templates:
        • Television Episode
        • UK Office Episode
        • Simpsons Episode
        • DoctorWhoBox
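The template mapping above amounts to a many-to-one lookup from Wikipedia template names to one class. A minimal sketch, assuming a plain dictionary is enough to express it; the names `TEMPLATE_TO_CLASS` and `class_for_template` are illustrative and not taken from the DBpedia code base.

```python
# Template names and the target class are copied from the slide.
TEMPLATE_TO_CLASS = {
    "Television Episode": "TV Episode",
    "UK Office Episode": "TV Episode",
    "Simpsons Episode": "TV Episode",
    "DoctorWhoBox": "TV Episode",
}


def class_for_template(template_name):
    """Return the mapped class for a template, or None if it is unmapped."""
    return TEMPLATE_TO_CLASS.get(template_name.strip())


print(class_for_template("DoctorWhoBox"))
```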
  19. Parsers
      • Handle template values specifically
      • Example: Property splitting
        Person  born  "1.1.1980, [[Berlin]]"
        => split to
        birthplace  Berlin
        birthdate   1980-01-01
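The property-splitting example can be sketched as a small parser that separates the date from the wiki link. This is an illustration only: the regular expression, the day.month.year reading of the date, and the function name `split_born` are assumptions, since the slide shows a single input and its result.

```python
import re
from datetime import date

# Assumed input shape: "D.M.YYYY, [[Place]]" (optionally "[[Place|label]]").
BORN_PATTERN = re.compile(
    r"^\s*(\d{1,2})\.(\d{1,2})\.(\d{4})\s*,\s*"
    r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]\s*$"
)


def split_born(value):
    """Split a 'born' value into birthdate (ISO 8601) and birthplace.

    Returns None when the value does not match the assumed shape.
    """
    match = BORN_PATTERN.match(value)
    if match is None:
        return None
    day, month, year, place = match.groups()
    return {
        "birthdate": date(int(year), int(month), int(day)).isoformat(),
        "birthplace": place.strip(),
    }


print(split_born("1.1.1980, [[Berlin]]"))
```

Returning None for unmatched values leaves the decision about fallbacks (for example keeping the raw string) to the caller.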
  20. Parsers
      • Example: Class Rules
        MusicalArtist
        If property "currentMembers" is set => Group
        Otherwise => Person
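The class rule for MusicalArtist can be written as a single conditional. The sketch below assumes that "is set" means the property is present with a non-empty value; `classify_musical_artist` is a hypothetical name.

```python
def classify_musical_artist(properties):
    """Apply the slide's rule: Group if 'currentMembers' is set, else Person.

    Assumption: an empty or missing value counts as "not set".
    """
    return "Group" if properties.get("currentMembers") else "Person"


print(classify_musical_artist({"currentMembers": "[[A]], [[B]]"}))
print(classify_musical_artist({"born": "1.1.1980, [[Berlin]]"}))
```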
  21. Parsers
      • Example: Range Validation
        Google  keypeople  "[[Eric Schmidt]] ([[CEO]], [[Chairman]]), [[Sergey Brin]], [[Larry Page]]"
        Company#keyperson  range  Person#Class
        Google  keyperson  Eric Schmidt
        Google  keyperson  Sergey Brin
        Google  keyperson  Larry Page
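Range validation, as shown in the Google example, keeps only those link targets whose class matches the range of the property. A minimal sketch under stated assumptions: the class of each link target comes from a lookup table (`KNOWN_CLASSES`, invented here for illustration), and targets without an entry are dropped.

```python
import re

# Matches [[Target]] and [[Target|label]], capturing only the target.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

# Hypothetical class lookup; in practice this would come from the dataset.
KNOWN_CLASSES = {
    "Eric Schmidt": "Person",
    "Sergey Brin": "Person",
    "Larry Page": "Person",
}


def validate_range(value, expected_class, classes):
    """Return the link targets in value whose class equals expected_class."""
    targets = [target.strip() for target in WIKILINK.findall(value)]
    return [t for t in targets if classes.get(t) == expected_class]


raw = "[[Eric Schmidt]] ([[CEO]], [[Chairman]]), [[Sergey Brin]], [[Larry Page]]"
for person in validate_range(raw, "Person", KNOWN_CLASSES):
    print("Google", "keyperson", person)
```

Here "CEO" and "Chairman" are filtered out because they have no Person entry in the lookup table, which mirrors the result on the slide.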
  22. Class Hierarchy
      • 200k people (70k athletes, 65k artists, 18k office holders)
      • 193k places (100k areas, 40k cities, 10k rivers)
      • 187k works (71k music albums, 24k singles, 31k films, 15k books)
      • 87k species
      • 70k organisations (20k educational institutions, 18k companies, 12k radio stations)
      • 22k buildings (8k airports, 5k stations, 2k stadiums, 1k bridges)
      • 12k planets
      • And more… (events, diseases, proteins, drugs, aircraft, automobiles, ships, astronauts, architects, scientists)
  23. Thanks
      http://dbpedia.org
      georgi.kobilarov@fu-berlin.de