Jakob Voss Wikipedia2007

  1. The Semantic Web and why Wikipedia should bother
     Jakob Voß, Wikimania 2007, Taipei, Taiwan, 2007-08-03
  2. Agenda
     (1) The Semantic Web
     (2) Wikipedia’s contribution
     (3) Examples and problems
     (4) Possible solutions
  3. The Semantic Web
     ● Everything can be linked via its URI
     ● All data is expressed as triples with typed links
     (Image taken from: Semantic Wikipedia (2006))
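A minimal Python sketch of “all data as triples with typed links”, assuming a toy in-memory graph: each statement is a (subject, predicate, object) tuple whose parts are URIs, so a typed link is simply a URI in the predicate position. The `http://example.org/` namespace and the `objects`/`subjects` helpers are hypothetical; real applications would use RDF with a triple store or a library such as rdflib.

```python
# Hypothetical namespace, used only for illustration
EX = "http://example.org/"

# A graph is a set of (subject, predicate, object) triples; every part is a URI
graph = {
    (EX + "Warsaw", EX + "place-in", EX + "Poland"),
    (EX + "Krakow", EX + "place-in", EX + "Poland"),
}


def objects(graph, subject, predicate):
    """Follow the typed link `predicate` from `subject` and return its targets."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def subjects(graph, predicate, obj):
    """Follow the typed link backwards: which subjects point at `obj`?"""
    return {s for (s, p, o) in graph if p == predicate and o == obj}


print(objects(graph, EX + "Warsaw", EX + "place-in"))
print(sorted(subjects(graph, EX + "place-in", EX + "Poland")))
```

Because links are typed, the same graph can be traversed in both directions without any schema beyond the predicate URIs.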
  4. The Semantic Web
     ● Ontologies define common structures and rules
     ● More data is generated by aggregation and reasoning on distributed data from several sources
     ● Software agents understand your commands, aggregate, reason, decide and act independently (at least in theory)
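Aggregation and reasoning can be sketched in a few lines, assuming triples are plain tuples and the ontology declares a single rule: the hypothetical `place-in` link is transitive (OWL offers `owl:TransitiveProperty` for this). Merging two sources is a set union; forward chaining then derives a statement that neither source contains. This is an illustration of the idea, not a real OWL reasoner.

```python
def infer_transitive(triples, predicate):
    """Forward-chain a transitivity rule for one predicate until nothing new appears."""
    inferred = set(triples)
    while True:
        links = [(s, o) for (s, p, o) in inferred if p == predicate]
        new = {
            (s, predicate, o2)
            for (s, o) in links
            for (s2, o2) in links
            if o == s2
        } - inferred
        if not new:
            return inferred
        inferred |= new


# Two hypothetical sources, aggregated by set union
source_a = {("Krakow", "place-in", "Lesser_Poland")}
source_b = {("Lesser_Poland", "place-in", "Poland")}

merged = source_a | source_b
result = infer_transitive(merged, "place-in")
print(("Krakow", "place-in", "Poland") in result)  # True
```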
  5. Wikipedia’s contribution
     ● Largest source of freely available non-specialized data
     ● Templates and categories contain structured data
       – Persondata
       – DBpedia.org
       – Geodata
       – ...
     ● Semantic MediaWiki adds typed links and attributes
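To illustrate how templates carry structured data, here is a simplified sketch that extracts the parameters of a Persondata-style template from wikitext. The sample article text and the person in it are invented, and the regular-expression parser is a deliberate simplification: it ignores nested templates, links containing `|`, and repeated template calls, all of which real wikitext needs a proper parser for.

```python
import re

# Invented sample wikitext; the person is a placeholder
wikitext = """
{{Persondata
|NAME=Kowalski, Jan
|SHORT DESCRIPTION=Polish writer
|PLACE OF BIRTH=Kraków
}}
"""


def parse_template(text, name):
    """Extract `|KEY=value` parameters of the first `{{name ...}}` call.

    Simplified: no nested templates, no '|' inside values.
    """
    match = re.search(r"\{\{\s*" + re.escape(name) + r"(.*?)\}\}", text, re.DOTALL)
    if match is None:
        return {}
    fields = {}
    for part in match.group(1).split("|")[1:]:
        key, sep, value = part.partition("=")
        if sep:  # skip parameters without '='
            fields[key.strip()] = value.strip()
    return fields


print(parse_template(wikitext, "Persondata"))
```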
  6. Aggregating and Reasoning
  7. Aggregating and Reasoning
     Which Polish authors are currently the most published in Germany?
  8. Aggregating and Reasoning
     Which Polish authors are currently the most published in Germany?
     ● Currently published in Germany
       – List of published books by book vendors or by the German National Library
  9. Aggregating and Reasoning
     Which Polish authors are currently the most published in Germany?
     ● Currently published in Germany
     ● Authors
       – The National Library catalogue contains the author and uniquely identifies them by PND-ID (Personennamendatei, the German name authority file)
  10. Aggregating and Reasoning
      Which Polish authors are currently the most published in Germany?
      ● Currently published in Germany
      ● Authors
      ● Polish authors
        – German Wikipedia contains PND-IDs => article
        – Article linked via interwiki => more articles
        – Biographical articles contain place of birth
        – Place of birth linked to country via category
  11. Aggregating and Reasoning
      subject       predicate      object
      Publication   published-in   Germany
      Publication   has-author     Person
      Person        born-in        Town
      Town          place-in       Poland
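The chain in the table can be read as a query that joins four triple patterns. A minimal sketch over invented data (books, authors and towns are placeholders, not real records) counts publications per Polish-born author. Every hop is a point where a wrong or ambiguous link silently changes the answer.

```python
from collections import Counter

# Invented data following the four patterns in the table above
triples = {
    ("Book1", "published-in", "Germany"),
    ("Book2", "published-in", "Germany"),
    ("Book3", "published-in", "Germany"),
    ("Book1", "has-author", "AuthorA"),
    ("Book2", "has-author", "AuthorA"),
    ("Book3", "has-author", "AuthorB"),
    ("AuthorA", "born-in", "TownX"),
    ("AuthorB", "born-in", "TownY"),
    ("TownX", "place-in", "Poland"),
    ("TownY", "place-in", "Lithuania"),
}


def objects(subject, predicate):
    """All objects linked from `subject` via `predicate`."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def polish_authors_published_in(country):
    """Join: publication published-in country, has-author person,
    person born-in town, town place-in Poland; count publications per person."""
    counts = Counter()
    for (pub, p, o) in triples:
        if p != "published-in" or o != country:
            continue
        for person in objects(pub, "has-author"):
            towns = objects(person, "born-in")
            if any("Poland" in objects(town, "place-in") for town in towns):
                counts[person] += 1
    return counts.most_common()


print(polish_authors_published_in("Germany"))  # [('AuthorA', 2)]
```

Replacing the last triple with `("TownY", "place-in", "Poland")` would add `("AuthorB", 1)` to the result: the ranking depends entirely on how each link was asserted.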
  12. Where is Poland?
  13. Where is Poland?
      Somewhere here
  14. Where is Poland?
      Somewhere here
      Or five times here: in Maine, Ohio, or NY
  15. Where is Poland?
      Somewhere here
      Or five times here: in Maine, Ohio, or NY
      Or did you mean Poland, Kiribati?
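The ambiguity of the label “Poland” can be made concrete with a sketch, assuming a hypothetical lookup table from labels to entity identifiers (the candidates come from the slides; the identifiers are invented and the US entries are not exhaustive). A naive resolver answers confidently; a more careful one returns every candidate unless context narrows them down.

```python
# Candidate entities sharing the label "Poland", taken from the slides above.
# Identifiers are hypothetical; the US entries are not exhaustive.
CANDIDATES = {
    "Poland": [
        "Poland_(country)",
        "Poland,_Maine",
        "Poland,_Ohio",
        "Poland,_New_York",
        "Poland,_Kiribati",
    ],
}


def resolve_naively(label):
    """Silently pick the first candidate, as a careless pipeline might."""
    return CANDIDATES[label][0]


def resolve(label, context=None):
    """Return all candidates, or only those whose identifier contains `context`."""
    candidates = CANDIDATES.get(label, [])
    if context is not None:
        candidates = [c for c in candidates if context in c]
    return candidates


print(resolve_naively("Poland"))          # Poland_(country)
print(resolve("Poland", context="Ohio"))  # ['Poland,_Ohio']
print(len(resolve("Poland")))             # 5
```

The »default« Poland is decided by whoever ordered the list, and even `Poland_(country)` covers different territories around 1619, in 1772–1795, and after 1945.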
  16. Poland around 1619
      Polish-Lithuanian Commonwealth
  17. Poland 1772, 1793, 1795 (partitions)
  18. Poland 1945–
  19. Where is Poland?
      ● Reality is complex, confusing, and fuzzy
      ● What’s the »default« Poland?
      ● Humans can look up context in Wikipedia
      ● The Semantic Web only consists of statements
  20. Example #2: Presidents of the United States
      ● Bill Clinton 1993-01-20 – 2001-01-20
  21. Example #2: Presidents of the United States
      ● Bill Clinton 1993-01-20 – 2001-01-20
      ● George W. Bush 2001-01-20 – 2009-01-20
  22. Example #2: Presidents of the United States
      ● Bill Clinton 1993-01-20 – 2001-01-20
      ● George W. Bush 2001-01-20 – 2009-01-20
      ● Barack Obama 2009-01-20 –
  23. Example #2: Presidents of the United States
      ● Bill Clinton 1993-01-20 – 2001-01-20
      ● George W. Bush 2001-01-20 – 2009-01-20
      ● Barack Obama 2009-01-20 – 2013-01-20
      ● A. Schwarzenegger 2013-01-20 –
  24. Presidents of the United States
      ● George W. Bush 2001-01-20 – 2002-06-29
      ● Dick Cheney 07:09 – 09:24 a.m.
      ● George W. Bush 2002-06-29 – 2007-07-21
      ● Dick Cheney 07:14 – 09:21 a.m.
      ● George W. Bush 2007-07-21 –
      Dick Cheney: twice (acting) President of the US (see 25th Amendment)
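The timeline on this slide can be modelled as intervals. A minimal sketch using the dates and times from the slide, treated as naive local times (an assumption; the slide gives no time zone): a day-level query returns two presidents for 2002-06-29, while a query at a precise moment returns one. The granularity of the query decides which answer you get.

```python
from datetime import date, datetime, timedelta

# Terms as listed on the slide. Times are naive local times (an assumption);
# the last term is open-ended, as it was at the time of the talk.
TERMS = [
    ("George W. Bush", datetime(2001, 1, 20), datetime(2002, 6, 29, 7, 9)),
    ("Dick Cheney", datetime(2002, 6, 29, 7, 9), datetime(2002, 6, 29, 9, 24)),
    ("George W. Bush", datetime(2002, 6, 29, 9, 24), datetime(2007, 7, 21, 7, 14)),
    ("Dick Cheney", datetime(2007, 7, 21, 7, 14), datetime(2007, 7, 21, 9, 21)),
    ("George W. Bush", datetime(2007, 7, 21, 9, 21), None),
]


def president_at(moment):
    """Who held the office at a precise moment (half-open intervals)."""
    return [name for (name, start, end) in TERMS
            if start <= moment and (end is None or moment < end)]


def presidents_on(day):
    """Everyone whose term overlaps the given calendar day."""
    day_start = datetime(day.year, day.month, day.day)
    day_end = day_start + timedelta(days=1)
    return sorted({name for (name, start, end) in TERMS
                   if start < day_end and (end is None or end > day_start)})


print(president_at(datetime(2002, 6, 29, 8, 0)))  # ['Dick Cheney']
print(presidents_on(date(2002, 6, 29)))           # ['Dick Cheney', 'George W. Bush']
```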
  25. Presidents of the United States
      ● The devil is in the details ;-)
      ● Automatic reasoning will give you inconvenient results
  26. Example #3: Finally a clear division
      女性 (female)   男性 (male)
      XX              XY
  27. So let’s formalize...
      owl:disjointWith
      “Classes may be stated to be disjoint from each other. For example, Man and Woman can be stated to be disjoint classes. [...] a reasoner can deduce that if A is an instance of Man, then A is not an instance of Woman.”
      (OWL Web Ontology Language Guide, http://www.w3.org/TR/owl-guide/)
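The deduction quoted from the OWL Guide can be sketched as a tiny consistency check, assuming class assertions are stored in a Python dict and disjointness axioms as pairs of class names (this is an illustration, not an OWL reasoner). It also shows the catch: once the data contains a case the axiom did not foresee, the checker can only report a contradiction; it cannot tell whether the data or the axiom is wrong.

```python
# Toy ontology: pairs of classes declared disjoint (cf. owl:disjointWith)
DISJOINT = {frozenset({"Man", "Woman"})}


def excluded_classes(classes):
    """Classes an individual can be deduced NOT to belong to."""
    return {c for pair in DISJOINT if pair & classes for c in pair - classes}


def inconsistencies(assertions):
    """List individuals typed with two classes that were declared disjoint.

    `assertions` maps each individual to the set of classes asserted for it.
    """
    return [(individual, tuple(sorted(pair)))
            for individual, classes in sorted(assertions.items())
            for pair in DISJOINT
            if pair <= classes]


# Hypothetical data: B is typed with both classes, e.g. because sources disagree
assertions = {"A": {"Man"}, "B": {"Man", "Woman"}}

print(excluded_classes(assertions["A"]))  # {'Woman'}
print(inconsistencies(assertions))        # [('B', ('Man', 'Woman'))]
```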
  28. A clear division?
      Other chromosomal sexes (karyotypes)
      ● Turner syndrome (X_), Trisomy X, ...
      ● Klinefelter syndrome (XXY), XYY syndrome, ...
  29. A clear division?
      Other chromosomal sexes (karyotypes)
      ● Turner syndrome (X_), Trisomy X, ...
      ● Klinefelter syndrome (XXY), XYY syndrome, ...
      Intersexuality, hermaphroditism
      ● Chromosomal sex inconsistent with phenotypic sex, or phenotype not simply male or female
  30. A clear division?
      Other chromosomal sexes (karyotypes)
      ● Turner syndrome (X_), Trisomy X, ...
      ● Klinefelter syndrome (XXY), XYY syndrome, ...
      Intersexuality, hermaphroditism
      ● Chromosomal sex inconsistent with phenotypic sex, or phenotype not simply male or female
      Gender identity
      ● The gender with which a person identifies, independent of biological sex
  31. A clear division?
      ● Reality is far more complicated
      ● Many kinds of exceptions
  32. Problems
      ● Clear divisions discriminate
      ● Discussion and context get lost
      ● Example #4:
        IF your name = X AND X is on a list of suspected terrorists THEN you have a problem
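Example #4 written out as code makes the flaw obvious: the rule matches a name, not a person. A minimal sketch with an invented watch list and invented passenger records; both people named John Smith are flagged, although the list entry was presumably about a single person.

```python
# Hypothetical watch list and passenger records, for illustration only
WATCH_LIST = {"John Smith"}

passengers = [
    {"name": "John Smith", "born": 1950},
    {"name": "John Smith", "born": 1985},
    {"name": "Jane Doe", "born": 1970},
]


def flagged(people):
    """IF your name = X AND X is on the list THEN you are flagged."""
    return [p for p in people if p["name"] in WATCH_LIST]


print(len(flagged(passengers)))  # 2
```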
  33. Not our problem?
      ● Wikipedia is already used as a source by millions of people
      ● People can think, judge and ask; computers cannot
      ● We create definitions that will be used in thousands of applications
      ● Statistics lie
      ● Aggregation/reasoning lies even better
  34. Possible Solutions
      ● More of everything (data, aggregation, reasoning)
      ● Less of everything
      ● Statements about statements
      ● Fuzzy logic
      ● Data provenance / data lineage
      ● Allow exceptions
      ● Teach people to be careful
      ● Do not expect or believe simple answers
      ● It’s just dirty data
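Several of these solutions (statements about statements, fuzzy logic, data provenance) can be combined in one data structure. A minimal sketch, assuming each claim is stored together with its source and a confidence score; RDF itself offers a reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) for statements about statements, which this plain Python class does not use. Sources, scores and the `answers` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """A triple plus statements about it (provenance and a confidence score)."""
    subject: str
    predicate: str
    obj: str
    source: str        # data provenance: where the claim comes from
    confidence: float  # hypothetical score in [0, 1], in the spirit of fuzzy logic


# Invented example: two sources disagree about a place of birth
STATEMENTS = [
    Statement("AuthorA", "born-in", "TownX", "source-A", 0.8),
    Statement("AuthorA", "born-in", "TownZ", "source-B", 0.6),
]


def answers(subject, predicate, min_confidence=0.0):
    """Return every candidate answer with its source, best-supported first,
    instead of silently collapsing conflicting claims into one answer."""
    hits = [s for s in STATEMENTS
            if s.subject == subject and s.predicate == predicate
            and s.confidence >= min_confidence]
    hits.sort(key=lambda s: s.confidence, reverse=True)
    return [(s.obj, s.source, s.confidence) for s in hits]


print(answers("AuthorA", "born-in"))
print(answers("AuthorA", "born-in", min_confidence=0.7))
```

Keeping the conflicting claims, rather than picking one, leaves the judgment to the person asking the question.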
  35. Summary
      ● The Semantic Web is great
      ● Reality is based on exceptions
      ● Simplification is useful but dangerous
      ● Data POV != NPOV (Wikipedia’s neutral point of view)
      ● We also bear responsibility for stupid use of Wikipedia data
      ● Never stop analyzing and thinking instead of relying on computers
  36. More to read
      ● Shadbolt, Berners-Lee, and Hall: The Semantic Web Revisited. IEEE Intelligent Systems 21(3), pp. 96–101, May/June 2006. http://eprints.ecs.soton.ac.uk/12614/01/Semantic_Web_Revisted.pdf
      ● Völkel, Krötzsch, Vrandečić, Haller, and Studer: Semantic Wikipedia. Proceedings of WWW2006. http://www.aifb.uni-karlsruhe.de/Publikationen/showPublikation_english?publ_id=
      ● Doctorow: Metacrap: Putting the torch to seven straw-men of the meta-utopia. August 2001. http://www.well.com/~doctorow/metacrap.htm
      ● Bowker and Star: Sorting Things Out: Classification and Its Consequences. MIT Press, 1999.
