Wikipedia as source of collaboratively created Knowledge Organization Systems


    Presentation Transcript

    1. Wikipedia as source of collaboratively created Knowledge Organization Systems. Jakob Voß, Digitale Bibliothek, Fachhochschule Hannover, 25 June 2009
    2.  
    3. Wikipedia
      • de facto standard online reference
      • > 13 million articles, > 230 languages
      • run by Wikimedia, run with MediaWiki
      • open content (CC-BY-SA / GFDL)
      • it's a wiki!
        • dense hypertext
        • anyone can edit (but it's a medium of its own)
        • http://de.wikipedia.org/wiki/Portal:BID
    4. Structure of Wikipedia
      • articles
      • internal and external links
      • redirects and disambiguation pages
      • lists, portals, and navigation templates
      • categories
      • infoboxes and geodata
      • (bibliographic) references
      • revisions, flags, featured content ....
    5. Articles
      • text, intro, substructure
      • specific structure for specific article types (years, people etc.)
    6. Links
      • [[target]] or [[target|label]]
      • connect on textual and conceptual level
      • structure of hyperlinks encodes relations
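The two link forms on this slide can be pulled out of wikitext with a small regular expression. The following is a minimal sketch, not MediaWiki's own parser: the function name `extract_links` and the sample sentence are illustrative assumptions, and nested constructs such as image links are not handled.

```python
import re

# Matches [[target]] and [[target|label]]. Nested or multi-pipe links
# (for example image links) are deliberately out of scope for this sketch.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]|]*))?\]\]")


def extract_links(wikitext):
    """Return (target, label) pairs; the label defaults to the target."""
    links = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        label = match.group(2)
        label = label.strip() if label else target
        links.append((target, label))
    return links


# Hypothetical sample text, written for this example.
sample = "The [[Cologne Cathedral|cathedral]] stands in [[Cologne]]."
links = extract_links(sample)
print(links)
```

Each pair connects a surface term (the label) to a concept (the target page), which is the relation the slide refers to.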
    7. External links
      • links to references
      • links to other structured knowledge bases
        • authority files (for instance PND)
        • MusicBrainz, IMDB ...
      • interlanguage links to other wikipedias
    8. Redirects and disambiguations
      • control synonyms and homonyms
    9. Redirects and disambiguations
    10. Lists and Portals
      • list: lead section followed by a list of links to articles in a particular subject area, such as people or places, or a timeline of events
        • List of _ , Outline of _, Glossary of _, Timeline of _, Index of _ ...
      • portal: intended to serve as “Main Pages” for specific topics or areas. May be associated with one or more WikiProjects.
        • en: ~140 featured portals of ~600 total
        • http://en.wikipedia.org/wiki/Portal:Featured_portals
    11. Navigation templates
      • grouping of links used in multiple related articles to facilitate navigation between those articles.
    12. Categories
      • multi-hierarchy of categories (nodes shown, from German Wikipedia): Nordrhein-Westfalen nach Ort, Ort als Thema, Rheinland, Köln, Kultur (Köln), Kölner Dom, Geschichte Kölns, Messe Köln
      • tagged article (social tagging, set model)
        • Kategorien: Katholische Bischofskirche (Deutschland) | Kölner Dom | Weltkulturerbe in Deutschland | Geschütztes Kulturgut | Architekturikone | Gotisches Bauwerk | Historisches Bauwerk | Stadtbezirk Köln-Innenstadt | Kultbau
    13. Categories
    14. Infoboxes and Geodata
      • structured tables via MediaWiki Templates, a simple field-value-structure
      • used for cities, animals, bands, chemicals ...
      • qualifiers problematic: date, unit, source...
      • special and popular case: geographical coordinates
        • this and following slides based on Georgi Kobilarov’s presentation.
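The field-value structure of an infobox can be read with a few lines of string handling. This is a rough sketch under stated assumptions: the template name, field names, and values below are invented for illustration, and the splitting breaks as soon as a value contains a nested template or a piped link.

```python
def parse_infobox(template_text):
    """Parse a flat '{{Name | key = value | ...}}' call into (name, fields)."""
    inner = template_text.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        raise ValueError("not a template call")
    parts = inner[2:-2].split("|")
    name = parts[0].strip()
    fields = {}
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return name, fields


# Hypothetical infobox: template name, fields, and values are made up.
sample = """{{Infobox city
| name = Example City
| population = 12,345 (2008, estimate)
| area = 100 km2
}}"""
name, fields = parse_infobox(sample)
print(name, fields)
```

The `population` value comes back as one string that still mixes number, date, and source note, which is the qualifier problem named on the slide.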
    15. Field values are not atomic
    16. References
      • vast amount of bibliographic data
      • Wikipedia cataloguing rules (sic!)
      • partly structured via templates:
    17. Examples without templates
    18. Revisions and other metadata
      • Information about articles
        • which user changed what at which time
        • flagged revisions
        • featured content
        • ...
      • Interesting data available for wiki research
      Summary
      • Wikipedia is not just articles but a structured system of knowledge management
      • And all of it is available for further processing!
      • Use as Knowledge Organization System (KOS)
        • WikiWord
        • DBPedia
        • Semantic Tagging
        • ...it's up to you!
    19. WikiWord
      • WikiWord builds a multilingual thesaurus by mining the link structure
        • Every page describes a concept
        • Link labels are terms referring to those concepts
        • Links and categories define relations
        • Multilingual by merging languages
      • German-language thesis by Daniel Kinzler
        • http://brightbyte.de/page/WikiWord
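The idea that pages are concepts and link labels are terms can be illustrated with a toy grouping step. This is only a sketch of the principle, not WikiWord's actual algorithm; the function name `build_thesaurus` and the link pairs are assumptions made for the example.

```python
from collections import defaultdict


def build_thesaurus(links):
    """Group link labels (terms) by the page they point to (concept)."""
    terms_by_concept = defaultdict(set)
    for target, label in links:
        terms_by_concept[target].add(target)  # the page title is a term too
        terms_by_concept[target].add(label)
    return {concept: sorted(terms) for concept, terms in terms_by_concept.items()}


# Hand-written (target, label) pairs, as they might be mined from link markup.
links = [
    ("Cologne Cathedral", "cathedral"),
    ("Cologne Cathedral", "Kölner Dom"),
    ("Cologne", "Cologne"),
]
thesaurus = build_thesaurus(links)
print(thesaurus)
```

A real system would also have to weigh how often a label is used and resolve redirects and disambiguation pages before trusting a term.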
    20. WikiWord Thesaurus
      • English, German, French, Dutch, Norwegian
        • >20 million labels
        • >11 million concepts
        • >2 million definitions
        • >75 million related links
        • >11 million hierarchical links
      • Available in SKOS/RDF
      • Source code available to generate more
      Resource Description Framework
      • RDF is URI + Unicode + Triples [ + Rules]
      • diagram labels: predicate; literal objects as "Object"@lang (language tag) or "Object"^^type-URI (datatype)
    21. RDF example (this: SKOS)
      • "Ananas"@de
      • URI namespaces for abbreviation
        • @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
        • @prefix agro: <http://www.fao.org/aos/agrovoc> .
    22. RDF formats
      • N3
          @prefix foaf: <http://xmlns.com/foaf/0.1/> .
          @prefix dc: <http://purl.org/dc/elements/1.1/> .
          <http://d-nb.info/96327841X>
              dc:title "Zettelwirtschaft" ;
              dc:creator <http://d-nb.info/gnd/13150794X> .
          <http://d-nb.info/gnd/13150794X>
              foaf:firstName "Markus" ;
              foaf:secondName "Krajewski" .
      • graph
        • http://d-nb.info/96327841X, dc:title, "Zettelwirtschaft"
        • http://d-nb.info/96327841X, dc:creator, http://d-nb.info/gnd/13150794X
        • http://d-nb.info/gnd/13150794X, foaf:firstName, "Markus"
        • http://d-nb.info/gnd/13150794X, foaf:secondName, "Krajewski"
    23. RDF/XML format
          <?xml version="1.0" encoding="UTF-8" ?>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                   xmlns:foaf="http://xmlns.com/foaf/0.1/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
            <rdf:Description rdf:about="http://d-nb.info/96327841X">
              <dc:title>Zettelwirtschaft</dc:title>
              <dc:creator rdf:resource="http://d-nb.info/gnd/13150794X" />
            </rdf:Description>
            <rdf:Description rdf:about="http://d-nb.info/gnd/13150794X">
              <foaf:firstName>Markus</foaf:firstName>
              <foaf:secondName>Krajewski</foaf:secondName>
            </rdf:Description>
          </rdf:RDF>
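To make the link between the XML serialization and the underlying triples concrete, the sketch below reads a cleaned-up copy of the slide's RDF/XML with Python's standard library and lists its triples. It assumes only the flat `rdf:Description` pattern used here and is not a general RDF/XML parser; the function name `triples_from_rdfxml` is an assumption of this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Cleaned-up copy of the slide's example (rdf:about / rdf:resource spelled out).
RDF_XML = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://d-nb.info/96327841X">
    <dc:title>Zettelwirtschaft</dc:title>
    <dc:creator rdf:resource="http://d-nb.info/gnd/13150794X"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://d-nb.info/gnd/13150794X">
    <foaf:firstName>Markus</foaf:firstName>
    <foaf:secondName>Krajewski</foaf:secondName>
  </rdf:Description>
</rdf:RDF>"""


def triples_from_rdfxml(text):
    """Return (subject, predicate, object) tuples for flat rdf:Description blocks."""
    root = ET.fromstring(text.encode("utf-8"))
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes tags as '{namespace}local'; join them into a URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "")
            triples.append((subject, predicate, obj))
    return triples


triples = triples_from_rdfxml(RDF_XML)
for triple in triples:
    print(triple)
```

The same four triples result regardless of whether the data is written as N3 or as RDF/XML, which is the point of having several serialization formats for one graph.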
      • initiative to connect and publish open collections of data with RDF on the Web
      • one of the largest collections and main hub:
        • DBpedia (http://dbpedia.org)
    24. DBPedia Extraction framework
      • http://dbpedia.svn.sourceforge.net (Open Source)
      • diagram: Wikipedia → Extraction → Triple Store
    25. DBPedia Extraction framework
      • core ontology
      • people
      • places
      • organizations
      • events
      • works
        • specific infoboxes
        • Parsers for each field
      RDF Triples
    26. Crowd Sourced Extraction
      • diagram: Wikipedia → Extraction → Triple Store
    27.  
    28. Linked Data
      • Use URIs to identify things.
      • Use HTTP URIs so that things can be looked up.
      • When someone looks up a URI, provide useful information.
      • Include links to other URIs so that more things can be looked up.
      Tim Berners-Lee (2006): Linked Data Design Issues. http://www.w3.org/DesignIssues/LinkedData.html
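Looking up an HTTP URI for machine-readable data usually means sending an `Accept` header that asks for an RDF format. The sketch below only prepares such a request and does not send it; the resource URI and the assumption that the server answers with RDF/XML are illustrative, not taken from the slides.

```python
import urllib.request


def build_rdf_request(uri):
    """Prepare (but do not send) an HTTP GET that asks for RDF/XML."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


# Illustrative URI; whether and how a server honours the header is an assumption.
request = build_rdf_request("http://dbpedia.org/resource/Cologne")
print(request.get_method(), request.full_url, request.get_header("Accept"))

# To perform the lookup one would call urllib.request.urlopen(request)
# and parse the response body as RDF.
```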
    29. May 2007, March 2009, September 2008
    30.  
    31. More complex queries
      • examples
        • people born in 1965 that contributed music in films
        • books about these people
      • notation in SPARQL (“SQL for RDF”)
      • one of several ways to access Semantic Web
    32. Example query
          • Films whose music was made by someone born in 1965?
          • diagram: two unknown nodes (?) linked to the year 1965; example match: "Dancer in the Dark" and "Björk"
      Problem: the predicates "music" (made-the-music-for) and "born" (was-born-in-year) must be known and used consistently!
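The example query is a join over two triple patterns: find people born in 1965, then find films whose music points to one of them. The sketch below evaluates that join over a hand-written triple list in plain Python rather than SPARQL; the predicate names "music" and "born" follow the slide, while the placeholder entries and the function name are assumptions.

```python
# Toy data written for this example; it is not DBpedia output.
TRIPLES = [
    ("Dancer in the Dark", "music", "Björk"),
    ("Björk", "born", "1965"),
    ("Example Film", "music", "Example Composer"),
    ("Example Composer", "born", "1970"),
]


def films_with_music_by_people_born_in(year, triples):
    """Join the patterns (?film music ?person) and (?person born year)."""
    born_in_year = {s for s, p, o in triples if p == "born" and o == year}
    return sorted(s for s, p, o in triples if p == "music" and o in born_in_year)


films = films_with_music_by_people_born_in("1965", TRIPLES)
print(films)
```

The join only works because both facts use exactly the expected predicate names; if one source wrote "birthYear" instead of "born", the film would silently drop out of the result.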
    33. Linking sources
      • diagram labels: OPAC, DBpedia, dbpedia:birthYear, 1965, Dancer in the Dark, Björk, book about Björk, PND:119525054
    34. Inference rules
      • if ... then also ...
      • a frbr:creator B => B rdf:type frbr:Work
      • danger of inference and discrimination
        • Bowker and Star (1999): Sorting Things out. Classification and its consequences
        • Voss (2007): The Semantic Web and why Wikipedia should bother.
      • reality is fuzzy, data is not
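An "if ... then also ..." rule of the kind shown on this slide can be applied mechanically. The sketch below implements that single rule shape over a set of triples; the identifiers `ex:a` and `ex:b` and the function name are placeholders invented for the example.

```python
def apply_rule(triples, predicate, inferred_class):
    """If (a, predicate, b) holds, also assert (b, rdf:type, inferred_class)."""
    inferred = set(triples)
    for subject, pred, obj in triples:
        if pred == predicate:
            inferred.add((obj, "rdf:type", inferred_class))
    return inferred


# Placeholder fact; ex:a and ex:b stand in for real resources.
facts = {("ex:a", "frbr:creator", "ex:b")}
closure = apply_rule(facts, "frbr:creator", "frbr:Work")
print(sorted(closure))
```

The rule fires for every matching triple without regard to whether the input was correct, so one mislabelled predicate produces a wrong type assertion that looks just as authoritative as the rest of the data.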
    35. Semantic Tagging
      • assign controlled concepts to resources
      • subject indexing reinvented
      • practised at BBC (!) with DBPedia concepts
      • SKOS and CommonTags ontology
    36.  
    37. Open Issues
      • user interfaces for query and display
      • data quality (needs humans)
      • fuzzy concepts and mapping (e.g. languages)
      • versioning and changes
      • underestimated regularly
      • interesting research topics
    38. References
      • Wikipedia itself (practise editing and discussion!)
      • Kinzler (2008): Automatischer Aufbau eines multilingualen Thesaurus durch Extraktion semantischer und lexikalischer Relationen aus der Wikipedia. <http://brightbyte.de/page/WikiWord>
      • Kobilarov, Bizer, Auer & Lehmann (2009): DBpedia - A Linked Data Hub and Data Source for Web and Enterprise Applications. <http://www2009.eprints.org/228/> <http://videolectures.net/www09_kobilarov_dbpldh/>
      • Voß (2006): Collaborative thesaurus tagging the Wikipedia way. <http://arxiv.org/abs/cs/0604036>
    39. What do you think?


    Presentation about Wikipedia, WikiWord, DBPedia and more


    CC Attribution-ShareAlike License
