Wikipedia as source of collaboratively created Knowledge Organization Systems

Presentation about Wikipedia, WikiWord, DBPedia and Semantic Tagging held at Fachhochschule Hannover

  1. 1. Digitale Bibliothek. Jakob Voß: Wikipedia as source of collaboratively created Knowledge Organization Systems. Fachhochschule Hannover, 25 June 2009
  2. 3. Wikipedia
       - de facto standard online reference
       - more than 13 million articles in more than 230 languages
       - run by Wikimedia, run with MediaWiki
       - open content (CC-BY-SA / GFDL)
       - it's a wiki!
           - dense hypertext
           - anyone can edit (but it's a medium of its own)
           - http://de.wikipedia.org/wiki/Portal:BID
  3. 4. Structure of Wikipedia
       - articles
       - internal and external links
       - redirects and disambiguation pages
       - lists, portals, and navigation templates
       - categories
       - infoboxes and geodata
       - (bibliographic) references
       - revisions, flags, featured content, ...
  4. 5. Articles
       - text, intro, substructure
       - specific structure for specific article types (years, people, etc.)
  5. 6. Links
       - [[target]] or [[target|label]]
       - connect on textual and conceptual level
       - structure of hyperlinks encodes relations
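The two link forms on this slide can be pulled out of wikitext with a regular expression. The following is only a minimal sketch: the pattern ignores nested links, templates and namespaces, and the function name `extract_links` and the sample sentence are made up for illustration.

```python
import re

# Matches [[target]] and [[target|label]]; a simplification of real wikitext.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")

def extract_links(wikitext):
    """Return (target, label) pairs; the label defaults to the target."""
    links = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        label = (match.group(2) or target).strip()
        links.append((target, label))
    return links

# Hypothetical sample sentence.
sample = "The [[Cologne Cathedral|Dom]] stands in [[Cologne]]."
print(extract_links(sample))
```

Each pair records the concept that is linked (target) and the term used for it in running text (label), which is the raw material for the relations mentioned above.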
  6. 7. External links
       - links to references
       - links to other structured knowledge bases
           - authority files (for instance PND)
           - MusicBrainz, IMDB, ...
       - interlanguage links to other Wikipedias
  7. 8. Redirects and disambiguations
       - control synonyms and homonyms
  8. 9. Redirects and disambiguations
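How redirects handle synonyms and disambiguation pages handle homonyms can be sketched with two small lookup tables. The data below is a hypothetical toy sample, not taken from a Wikipedia dump, and `resolve` is an illustrative helper name.

```python
# Hypothetical toy data: redirects map synonyms to one article,
# disambiguation pages list the candidates for an ambiguous title.
REDIRECTS = {"Quicksilver": "Mercury (element)"}
DISAMBIGUATIONS = {
    "Mercury": ["Mercury (planet)", "Mercury (element)", "Mercury (mythology)"],
}

def resolve(title):
    """Return the candidate articles for a title."""
    seen = set()
    # Follow redirect chains, guarding against accidental loops.
    while title in REDIRECTS and title not in seen:
        seen.add(title)
        title = REDIRECTS[title]
    # An ambiguous title yields several candidates, any other title itself.
    return DISAMBIGUATIONS.get(title, [title])

print(resolve("Quicksilver"))
print(resolve("Mercury"))
```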
  9. 10. Lists and Portals
       - list: lead section followed by a list of links to articles in a particular subject area, such as people or places, or a timeline of events
           - List of _, Outline of _, Glossary of _, Timeline of _, Index of _, ...
       - portal: intended to serve as a "Main Page" for a specific topic or area; may be associated with one or more WikiProjects
           - en: ~140 featured portals of ~600 total
           - http://en.wikipedia.org/wiki/Portal:Featured_portals
  10. 11. Navigation templates
       - grouping of links used in multiple related articles to facilitate navigation between those articles
  11. 12. Categories
       - Multi-hierarchy of categories (diagram labels): Nordrhein-Westfalen nach Ort, Ort als Thema, Rheinland, Köln, Kultur (Köln), Kölner Dom, Geschichte Kölns, Messe Köln
       - Tagged article (social tagging, set model)
       - Categories: Katholische Bischofskirche (Deutschland) | Kölner Dom | Weltkulturerbe in Deutschland | Geschütztes Kulturgut | Architekturikone | Gotisches Bauwerk | Historisches Bauwerk | Stadtbezirk Köln-Innenstadt | Kultbau
  12. 13. Categories
  13. 14. Infoboxes and Geodata
       - structured tables via MediaWiki templates, a simple field-value structure
       - used for cities, animals, bands, chemicals, ...
       - qualifiers are problematic: date, unit, source, ...
       - special and popular case: geographical coordinates
  14. 15. (This and the following slides are based on Georgi Kobilarov's presentation.)
  15. 16. Field values are not atomic
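The field-value structure of an infobox, and the problem that values are not atomic, can be illustrated with a deliberately naive parser. This is a sketch under assumptions: it handles only one non-nested template with one field per line, and the infobox content and the name `parse_infobox` are invented for the example.

```python
def parse_infobox(wikitext):
    """Parse '| field = value' lines of a single, non-nested template."""
    fields = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            key, _, value = line[1:].partition("=")
            fields[key.strip()] = value.strip()
    return fields

# Hypothetical infobox; the values mix numbers, units and qualifiers.
sample = """{{Infobox building
| name   = Example Tower
| height = 157 m (515 ft)
| opened = 1880, restored 1956
}}"""

info = parse_infobox(sample)
print(info["height"])
```

The parser returns `height` as one string that still contains two numbers and two units, so every field type needs further parsing before it can become clean data.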
  16. 17. References
       - vast amount of bibliographic data
       - Wikipedia cataloguing rules (sic!)
       - partly structured via templates
  17. 18. Examples without templates
  18. 19. Revisions and other metadata
       - information about articles
           - which user changed what at which time
           - flagged revisions
           - featured content
           - ...
       - interesting data available for wiki research
  19. 20. Summary
       - Wikipedia is not just a set of articles but a structured system of knowledge management
       - And all of it is available for further processing!
       - Use as Knowledge Organization System (KOS)
           - WikiWord
           - DBpedia
           - Semantic Tagging
           - ... it's up to you!
  20. 21. WikiWord
       - WikiWord builds a multilingual thesaurus by mining the link structure
           - Every page describes a concept
           - Link labels are terms referring to those concepts
           - Links and categories define relations
           - Multilingual by merging languages
       - thesis (in German) by Daniel Kinzler
           - http://brightbyte.de/page/WikiWord
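The first two ideas, pages as concepts and link labels as terms, can be sketched in a few lines. This is not WikiWord's actual implementation; the function `build_label_index` and the sample pages are assumptions made purely for illustration.

```python
from collections import defaultdict

def build_label_index(pages):
    """pages: {page title: list of (target, label) links found on that page}."""
    labels = defaultdict(set)  # concept -> set of terms used for it
    for title, links in pages.items():
        labels[title].add(title)        # the page title names its own concept
        for target, label in links:
            labels[target].add(label)   # link labels are further terms
    return labels

# Hypothetical mini-wiki with two pages.
pages = {
    "Pineapple": [("Bromeliaceae", "bromeliad family")],
    "Fruit salad": [("Pineapple", "ananas")],
}

index = build_label_index(pages)
print(sorted(index["Pineapple"]))
```

Running the same procedure per language and merging concepts via interlanguage links would be one way to get the multilingual view the slide mentions.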
  21. 22. WikiWord Thesaurus
       - English, German, French, Dutch, Norwegian
           - more than 20 million labels
           - more than 11 million concepts
           - more than 2 million definitions
           - more than 75 million related links
           - more than 11 million hierarchical links
       - available in SKOS/RDF
       - source code available to generate more
  22. 23. Resource Description Framework
       - RDF is URI + Unicode + Triples [+ Rules]
       - Diagram labels: predicate; literal objects written as "Object"@lang or "Object"^^type-URI
  23. 24. RDF example (this: SKOS)
       - Diagram label: "Ananas"@de
       - URI namespaces for abbreviation
           @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
           @prefix agro: <http://www.fao.org/aos/agrovoc> .
  24. 25. RDF formats: N3
           @prefix foaf: <http://xmlns.com/foaf/0.1/> .
           @prefix dc: <http://purl.org/dc/elements/1.1/> .
           <http://d-nb.info/96327841X>
               dc:title "Zettelwirtschaft" ;
               dc:creator <http://d-nb.info/gnd/13150794X> .
           <http://d-nb.info/gnd/13150794X>
               foaf:firstName "Markus" ;
               foaf:lastName "Krajewski" .
       - Graph: <http://d-nb.info/96327841X> has dc:title "Zettelwirtschaft" and dc:creator <http://d-nb.info/gnd/13150794X>, which in turn has foaf:firstName "Markus" and foaf:lastName "Krajewski"
  25. 26. RDF/XML format
           <?xml version="1.0" encoding="UTF-8"?>
           <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                    xmlns:foaf="http://xmlns.com/foaf/0.1/"
                    xmlns:dc="http://purl.org/dc/elements/1.1/">
             <rdf:Description rdf:about="http://d-nb.info/96327841X">
               <dc:title>Zettelwirtschaft</dc:title>
               <dc:creator rdf:resource="http://d-nb.info/gnd/13150794X"/>
             </rdf:Description>
             <rdf:Description rdf:about="http://d-nb.info/gnd/13150794X">
               <foaf:firstName>Markus</foaf:firstName>
               <foaf:lastName>Krajewski</foaf:lastName>
             </rdf:Description>
           </rdf:RDF>
  26. 27.
       - initiative to connect and publish open collections of data with RDF on the Web
       - one of the largest collections and the main hub:
           - DBpedia (http://dbpedia.org)
  27. 28. DBpedia Extraction framework
       - http://dbpedia.svn.sourceforge.net (Open Source)
       - Diagram labels: Wikipedia, Extraction, Triple Store
  28. 29. DBpedia Extraction framework
       - core ontology
       - people
       - places
       - organizations
       - events
       - works
           - specific infoboxes
           - parsers for each field
       - Diagram label: RDF Triples
  29. 30. Crowd Sourced Extraction
       - Diagram labels: Wikipedia, Extraction, Triple Store
  30. 32. Linked Data
       - Use URIs to identify things.
       - Use HTTP URIs so that things can be looked up.
       - When someone looks up a URI, provide useful information.
       - Provide links to other URIs so that further things can be looked up.
       - Tim Berners-Lee (2006): Linked Data Design Issues. http://www.w3.org/DesignIssues/LinkedData.html
  31. 33. Diagram labels: May 2007, September 2008, March 2009
  32. 35. More complex queries
       - examples
           - people born in 1965 who contributed music to films
           - books about these people
       - notation in SPARQL ("SQL for RDF")
       - one of several ways to access the Semantic Web
  33. 36. Example query
       - Films whose music was made by someone born in 1965?
       - Problem: the predicates "music" (made-the-music-for) and "born" (was-born-in-year) must be known and used consistently!
       - Diagram labels: "Dancer in the Dark", "Björk", 1965, ?
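The example query is a join over two triple patterns. The sketch below evaluates it over an in-memory list instead of a SPARQL endpoint; the predicate names "music" and "born" follow the slide, while the second film and composer are made-up filler data and the function name is hypothetical.

```python
# (subject, predicate, object) triples; only the first two come from the slide.
TRIPLES = [
    ("Dancer in the Dark", "music", "Björk"),
    ("Björk", "born", "1965"),
    ("Some Other Film", "music", "Some Composer"),   # hypothetical filler
    ("Some Composer", "born", "1950"),               # hypothetical filler
]

def films_with_music_by_people_born_in(year, triples):
    """Join: ?film music ?person . ?person born <year> ."""
    born = {s for s, p, o in triples if p == "born" and o == year}
    return [s for s, p, o in triples if p == "music" and o in born]

print(films_with_music_by_people_born_in("1965", TRIPLES))
```

The join only works because both data sets use exactly the predicate names the query expects, which is the problem the slide points out.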
  34. 37. Linking sources
       - Diagram labels: 1965, OPAC, DBpedia, dbpedia:birthYear, Dancer in the Dark, Björk, book about Björk, PND:119525054
  35. 38. Inference rules
       - if ... then also ...
       - a frbr:creator B => B rdf:type frbr:Work
       - danger of inference and discrimination
           - Bowker and Star (1999): Sorting Things Out. Classification and Its Consequences
           - Voss (2007): The Semantic Web and why Wikipedia should bother.
       - reality is fuzzy, data is not
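An "if ... then also ..." rule amounts to adding new triples whenever a pattern matches. The sketch below applies the slide's rule literally, exactly as written there; the identifiers `ex:a` and `ex:b` and the function `apply_creator_rule` are placeholders invented for the example.

```python
def apply_creator_rule(triples):
    """For every (a, frbr:creator, b), also assert (b, rdf:type, frbr:Work)."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate == "frbr:creator":
            inferred.add((obj, "rdf:type", "frbr:Work"))
    return inferred

# Hypothetical input with a single statement.
facts = [("ex:a", "frbr:creator", "ex:b")]
result = apply_creator_rule(facts)
print(sorted(result))
```

The rule fires on every matching triple without exception, so one wrongly used predicate silently produces a wrong type statement.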
  36. 39. Semantic Tagging
       - assign controlled concepts to resources
       - subject indexing reinvented
       - practised at the BBC (!) with DBpedia concepts
       - SKOS and CommonTags ontology
  37. 41. Open Issues
       - user interfaces for query and display
       - data quality (needs humans)
       - fuzzy concepts and mapping (e.g. languages)
       - versioning and changes
       - regularly underestimated
       - interesting research topics
  38. 42. References
       - Wikipedia itself (practise editing and discussion!)
       - Kinzler (2008): Automatischer Aufbau eines multilingualen Thesaurus durch Extraktion semantischer und lexikalischer Relationen aus der Wikipedia. <http://brightbyte.de/page/WikiWord>
       - Kobilarov, Bizer, Auer & Lehmann (2009): DBpedia - A Linked Data Hub and Data Source for Web and Enterprise Applications. <http://www2009.eprints.org/228/> <http://videolectures.net/www09_kobilarov_dbpldh/>
       - Voß (2006): Collaborative thesaurus tagging the Wikipedia way. <http://arxiv.org/abs/cs/0604036>
  39. 43. What do you think?
