Semantic Wiki, Great Candidate for Knowledge Acquisition


This is a high-level pitch deck on knowledge acquisition (KA) beyond the textual part. We have already decided that we need low-level, textual-entailment-based KA, while the high-level part, which involves more human computation, was partially ignored at the time of the presentation. This deck introduces the social semantic web and shows how it can help with our KA tasks.

Published in: Technology, Education


  1. From Text and Data to Knowledge: Via Semantic Wikis
     The Social Semantic Web in the Small
     Jesse Wang
  2. The Bottleneck of AI is Knowledge Acquisition
     [Figure labels: Human Intelligence, Computer Intelligence]
  4. Connecting both Information and People
     [Figure: timeline chart with axes "Connections between people" and "Connections between Information"]
     Decades: 1980 - 1990, 1990 - 2000, 2000 - 2010, 2010 - 2020, 2020 - 2030
     Eras: PC Era, Web 1.0, Web 2.0, Web 3.0, Web 4.0
     Other labels: The PC, The Internet, The Web, Social Web, Semantic Web, Intelligent Web, Web OS
     Technologies plotted: Email, Social Networking, Groupware, Javascript, Weblogs, Databases, File Systems, HTTP, Keyword Search, USENET, Wikis, Websites, Directory Portals, RSS, Widgets, PC’s, Office 2.0, XML, RDF, SPARQL, AJAX, FTP, IRC, SOAP, Mashups, File Servers, Social Media Sharing, Lightweight Collaboration, ATOM, Semantic Search, Semantic Databases, Distributed Search, Intelligent personal agents, Java, SaaS, Flash, OWL, HTML, SGML, SQL, Gopher, P2P, Windows, MacOS, SWRL, OpenID, BBS, MMO’s, VR
  5. At Multiple Levels of Understanding
     • Signal entity (Words)
     • Signal form (Syntax)
     • Signal semantics (Concepts)
     • Categories (taxonomy)
     • Statements
     • Models
     • Decision-making
  6. HOW DO WE CAPTURE ALL? At least, the semantics?
  7. Two Paths for Semantics (>> KB Construction)
     • “Bottom-Up”
       – Add semantic metadata to pages and databases all over the Web
         • Alternatively, train models to extract the above info (machine-assisted)
       – Every website becomes semantic
         • except for those not tagged, not trained on, or with errors
     • “Top-Down”
       – Experts build models and rules for semantics
       – Create services that provide this as an overlay to the non-semantic Web
       – Every website becomes semantic
         • except for those not covered
     -- Alex Iskold
  8. Five Approaches to Semantics
     • Tagging
     • Statistics
     • Linguistics
     • Semantic Web
     • Artificial Intelligence
  9. The Tagging Approach
     • Pros
       – Easy for users to add and read tags
       – Tags are just strings
       – No algorithms or ontologies to deal with
       – No technology to learn
     • Cons
       – Easy for users to add and read tags
       – Tags are just strings
       – No algorithms or ontologies to deal with
       – No technology to learn
     • Examples: Technorati, Flickr, Wikipedia, YouTube
  10. The Statistical Approach
     • Pros:
       – Pure mathematical algorithms
       – Massively scalable with good training data
       – Language independent
     • Cons:
       – No understanding of the content
       – Hard to craft good queries
       – Best for finding really popular things, not good at finding needles in haystacks
       – Limited by data (esp. quality training data)
       – Not great for sparse structured data with strong inherent semantics
     • Examples: Google, Lucene, Autonomy, Farecast (Bing Travel)
  11. The Linguistic Approach
     • Pros:
       – Almost-true language understanding
       – Extract knowledge from text
       – Best for searching for particular facts or relationships
       – More precise queries
     • Cons:
       – Computationally intensive
       – Difficult to scale
       – Lots of special cases and other errors
       – Language-dependent
     • Examples: Powerset, Hakia, Inxight, Attensity, and others…
  12. The Semantic Web Approach
     • Pros:
       – More precise queries
       – Smarter apps with less work
       – Not as computationally intensive
       – Share & link data between apps
       – Works for both unstructured and structured data
     • Cons:
       – Lack of tools
       – Difficult to scale
       – Who makes all the metadata?
     • Examples: Radar Networks, DBpedia Project, Metaweb (Freebase)
  13. The Artificial Intelligence Approach
     • Pros:
       – Smart in narrow domains
       – Answer questions intelligently
       – Reasoning and learning
     • Cons:
       – Computationally intensive
       – Difficult to scale
       – Extremely hard to program
       – Does not work well outside of narrow domains
       – Training takes a lot of work
     • Examples: Cycorp, AURA (Project Halo)
  14. The Approaches Compared
     [Figure: axes "Make the software smarter" and "Make the Data Smarter"; plotted approaches: Statistics, Linguistics, Semantic Web, A.I., Tagging]
  15. In Practice
     [Figure labels: Tagging, Semantic Web, Statistics, Linguistics, Artificial Intelligence]
  16. From Tagging to AI
     [Figure: axes "Data Structure" and "Intelligence"]
  17. The Semantic Web is a Key Enabler
     • Moves the “intelligence” out of applications, into the data
     • Data needs special structures and becomes self-describing; the meaning of data becomes part of the data
     • Apps can become smarter with less work, because the data carries knowledge about what it is and how to use it
     • Data can be shared and linked more easily
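To make "the meaning of data becomes part of the data" concrete, here is a minimal sketch in plain Python. It is not from the slides: the `Triple` type, the `describe` helper, and the vocabulary (`is_a`, `capital_of`, `domain`, `range`) are hypothetical names chosen for illustration. The point is that statements about a property sit in the same store as statements that use it.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement (hypothetical illustration type)."""
    subject: str
    predicate: str
    obj: str


# Illustrative facts only. The last two triples describe the property
# "capital_of" itself, so the schema travels together with the data.
FACTS = [
    Triple("Berlin", "is_a", "City"),
    Triple("Berlin", "capital_of", "Germany"),
    Triple("Germany", "is_a", "Country"),
    Triple("capital_of", "domain", "City"),
    Triple("capital_of", "range", "Country"),
]


def describe(subject, facts):
    """Return every (predicate, object) pair stated about a subject."""
    return [(t.predicate, t.obj) for t in facts if t.subject == subject]
```

An application that has never seen `capital_of` could call `describe("capital_of", FACTS)` to learn what kind of things the property connects, without that knowledge being hard-coded in the application.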
  18. The Semantic Web = Open Database Layer for the Web
     [Figure labels: User Profiles, Web Content, Data Records, Apps & Services, Ads & Listings; Open Data Mappings, Open Data Records, Open Rules, Open Ontologies, Open Query Interfaces]
  19. And The Web IS the Database!
     [Figure labels: Application A, Application B]
  22. In Every Part or Layer of the Semantic Web, We Need
  23. Now a Complete Web
  24. Crowd Wisdom To Best Map Human Knowledge for Human
  25. Clear Semantics for Machine to Understand Knowledge
  26. Semantic Wikis: the Social Semantic Web in Action!
  27. What is a Wiki?
     A Key Feature of Wikis is
     This distinguishes wikis from other publication tools
  28. Consensus in Wikis Comes from Collaboration
     – ~17 edits/page on average in Wikipedia (with high variance)
     – Wikipedia’s Neutral Point of View Convention
     – Users follow customs and conventions to engage with articles effectively
  29. Software Support Makes Wikis Successful
     • Trivial to edit by anyone
     • Tracking of all changes, one-step rollback
     • Every article has a “Talk” page for discussion
     • Notification facility allows anyone to “watch” an article
     • Sufficient security on pages, logins can be required
     • A hierarchy of administrators, gardeners, and editors
     • Software bots recognize certain kinds of vandalism and auto-revert, or recognize articles that need work and flag them for editors
  30. Success of Wikis
     [Figure caption: Actual number of articles on (thick blue line) compared with a Gompertz model that leads eventually to a maximum of about 4.4 million articles (thin green line)]
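For readers unfamiliar with the Gompertz model named in the caption, its standard form is sketched below. The symbols are not from the slide: $N(t)$ is the article count at time $t$, $K$ is the ceiling (about 4.4 million according to the caption), and $b$ and $c$ are positive fitted constants whose values the slide does not give.

```latex
N(t) = K \exp\!\left(-b\, e^{-c t}\right), \qquad b > 0,\; c > 0,
\qquad \lim_{t \to \infty} N(t) = K \approx 4.4 \times 10^{6}
```

Growth under this curve is fast early on and then slows as $N(t)$ approaches $K$, which is why the model predicts a plateau rather than unbounded growth.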
  31. Summary: What Wiki Is Really About
     • Quick and Easy – No download
     • Layered Community Authoring
     • Interlinked Hierarchical Content
     • Revision Control
     • Notification
  32. What is a Semantic Wiki
     • A wiki that has an underlying model of the knowledge described in its pages
     • To allow users to make their knowledge explicit and formal
     • Semantic Web Compatible
  33. Combining Human Knowledge and Data Structures
     • Wikis for Metadata
     • Metadata for Wikis
  34. Basics of Semantic Wikis
     • Still a wiki, with regular wiki features
       – E.g. Category/Tags, Namespaces, Title, Versioning, ...
     • Typed Content
       – E.g. Page/Card, Date, Number, URL/Email, String, …
     • Typed Links
       – E.g. “capital_of”, “contains”, “born_in”…
     • Querying Interface Support
       – E.g. “[[Category:Person]] [[Age::<30]]”
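The typed links and the query above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the implementation of any particular wiki engine: it assumes page text carries annotations in the same `[[Category:Name]]` and `[[property::value]]` style as the query on the slide, treats `<` as a strict numeric comparison, and uses hypothetical names (`parse_page`, `matches`, `ask`, `PAGES`).

```python
import re

# Matches [[Category:Name]] (single colon) and [[property::value]] (double colon).
ANNOTATION = re.compile(r"\[\[([^\[\]:]+)(::|:)([^\[\]]+)\]\]")


def parse_page(text):
    """Extract categories and typed property values from wiki text."""
    categories, properties = set(), {}
    for name, sep, value in ANNOTATION.findall(text):
        name, value = name.strip(), value.strip()
        if sep == "::":
            properties[name] = value
        elif name == "Category":
            categories.add(value)
    return {"categories": categories, "properties": properties}


def matches(page, query):
    """Check a parsed page against every condition in the query string."""
    for name, sep, value in ANNOTATION.findall(query):
        name, value = name.strip(), value.strip()
        if sep == ":":
            if name == "Category" and value not in page["categories"]:
                return False
        elif value.startswith("<"):
            # Assumption: "<" means strictly less than, compared numerically.
            actual = page["properties"].get(name)
            if actual is None or float(actual) >= float(value[1:]):
                return False
        elif page["properties"].get(name) != value:
            return False
    return True


def ask(query, pages):
    """Return the sorted titles of all pages that satisfy the query."""
    return sorted(
        title for title, text in pages.items() if matches(parse_page(text), query)
    )


# Hypothetical wiki content, for illustration only.
PAGES = {
    "Ada": "Ada is a [[Category:Person]] aged [[Age::27]], born in [[born_in::Vienna]].",
    "Bob": "[[Category:Person]] [[Age::41]]",
    "Vienna": "[[Category:City]] [[capital_of::Austria]]",
}
```

With these pages, `ask("[[Category:Person]] [[Age::<30]]", PAGES)` selects only the page "Ada". The same annotations that authors type into ordinary wiki text are what the query runs against, so no separate database entry step is needed.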
  35. Advanced Semantic Wiki Features
     • Semantic forms or templates
     • Auto-completion based on semantics
     • Powerful visualizations based on semantics/structures/types
     • Rules and reasoning support
     • Advanced search and queries (faceted search, SPARQL, etc.)
     • Semantic notifications (personalized information filtering)
     • Import and export of semantic data
     • Data integration: identification, disambiguation, merging, trust, security/privacy, …
  36. Characteristics of Semantic Wikis
  37. What is the Promise of Semantic Wikis?
     • Semantic Wikis facilitate Consensus over Data (Knowledge)
     • Combine low-expressivity data authorship with the best features of traditional wikis
     • User-governed, user-maintained, user-defined
     • Easy to use as an extension of text authoring
  38. One Key Helpful Feature of Semantic Wikis
     Semantic Wikis are “Schema-Last”
     Databases require DBAs and schema design; Semantic Wikis develop and maintain the schema in the wiki
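A rough way to picture "schema-last" is a store that accepts any annotation first and derives the schema from what users actually entered. The sketch below is an assumption-laden toy, not how any specific semantic wiki works: the class name `SchemaLastStore` and its methods are hypothetical, and "schema" is reduced to the set of value types seen per property.

```python
from collections import defaultdict


class SchemaLastStore:
    """Toy store: data is accepted first, the schema is inferred afterwards."""

    def __init__(self):
        self.pages = defaultdict(dict)   # page title -> {property: value}
        self.schema = defaultdict(set)   # property -> names of observed value types

    def annotate(self, page, prop, value):
        """Record a fact without requiring the property to be declared beforehand."""
        self.pages[page][prop] = value
        self.schema[prop].add(type(value).__name__)

    def conflicts(self):
        """List properties used with more than one value type, for editors to resolve."""
        return {p: sorted(types) for p, types in self.schema.items() if len(types) > 1}


# Hypothetical usage: nobody designed an "age" column up front.
store = SchemaLastStore()
store.annotate("Ada", "age", 27)
store.annotate("Bob", "age", "forty-one")   # inconsistent usage is allowed, then surfaced
store.annotate("Vienna", "capital_of", "Austria")
```

Here `store.conflicts()` reports that `age` has been used with both `int` and `str` values. In a wiki setting, resolving such a conflict would be a community editing task rather than a DBA's schema migration.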
  39. Great Candidate for Knowledge Acquisition
     • Combining both unstructured and semi-structured data
     • High connectivity on both information and social dimensions
     • Collaboration with sophisticated software support
     • Expected low cost for crowdsourcing
     • Evolving category and template systems
     • But…
  40. BUT – Plain Wikis Are Not Good Enough for Deep Knowledge Acquisition
     Knowledge is represented MOSTLY in unstructured and semi-structured ways
     • Plain text
     • Templates
     • Infoboxes
     • Tables
     • Section headers
     • Links
     • References
     • Redirects
     • …
  41. Software/Feature Enhancements Are Needed
     • Quick and easy way to view and edit schema
     • Machine assistance (NLP, auto-suggest…)
     • Better visualizations with structured data
     • More user layers for better KB construction
     • Better targeted (semantic) notifications
  42. Best Bet for Knowledge Acquisition?
     • K.A. is the well-known Artificial Intelligence problem
       – AI authoring is too expensive, too slow, not scalable
     • Three Possible Solutions
       – Automatic Machine Parsing (e.g. NELL, ReVerb)
         • Quality (depth) not good enough for textbook sentences
         • Error rates are too high
         • Still need humans in the loop for training data
       – Crowd-Sourced Authoring (e.g. AMT)
         • Biology and Knowledge Engineering expertise is difficult to get
         • Mechanical Turk uses individuals, but the knowledge entry tasks appear to require coordination, judgment, discussion, and working together
       – Social Authoring and Crowdsourcing with Intelligent Software Assistance
         • Wikipedia showed this could work for text
         • Semantic Wiki software R&D to make it work for more structured knowledge
  43. With All These Features…
     Effective Knowledge Acquisition via Semantic Wikis
     • Combine the strengths of humans and machines
     • Connecting humans and machines
     • High quality at low cost
  44. Conclusion: To Bridge Machine and Human Intelligence
  45. To Dive Into Social Semantic Web
  46. THANK YOU!
     Credits: some slides are originally from the following people, with little or no modification:
     • Nova Spivack
     • Denny Vrandecic
     • Mark Greaves
     • Bao Jie