1. Semantic Wikis Social Semantic Web In Action 2011-03-25 Specially Prepared for Tsinghua University Alumni in greater Seattle area for centennial celebration
4. What Does Vulcan Do? Vulcan Inc. was established in 1986 by investor and philanthropist Paul G. Allen, co-founder of Microsoft, to manage his business and philanthropic efforts. Allen is chairman of Vulcan, and his sister, Jody Allen, is president and CEO.
6. Now the Vision Continues as Project Halo: Automatic Question Answering System. Project Halo is a staged, long-range research effort by Vulcan Inc. toward the development of a "Digital Aristotle": a reasoning system capable of answering novel questions and solving advanced problems in a broad range of scientific disciplines and related human affairs. The project focuses on creating two primary functions: a tutor capable of instructing and assessing students in those subjects, and a research assistant with broad, interdisciplinary skills to help scientists and others in their work.
7. Project Halo’s Focus Areas: Knowledge Acquisition, plus other related semantic technologies and commercial efforts
8. Project Halo’s Goals: Address the core problems in knowledge bases (scale, brittleness). Have high impact. (Diagram labels: Now, Future)
11. Outline: Wikis and Semantic Wikis; Crowdsourcing and Consensus on Data; Semantic MediaWiki and its Extensions; Wiki-based Knowledge Management on a Larger Scale; Enterprise Knowledge Management; Semantic Encyclopedia; Evolving as a Web Application Development Platform; Examples: Semantic Football, Agile Project Management
12. A Key Feature of Wikis: Consensus. This distinguishes wikis from other publication tools.
13. Consensus in Wikis Comes from Collaboration: ~17 edits/page on average in Wikipedia (with high variance). Wikipedia’s Neutral Point of View convention. Users follow customs and conventions to engage with articles effectively.
14. Software Support Makes Wikis Successful: Trivial to edit by anyone. Tracking of all changes, one-step rollback. Every article has a “Talk” page for discussion. Notification facility allows anyone to “watch” an article. Sufficient security on pages; logins can be required. A hierarchy of administrators, gardeners, and editors. Software bots recognize certain kinds of vandalism and auto-revert, or recognize articles that need work and flag them for editors.
16. How Wikipedia Answers – List! http://en.wikipedia.org/wiki/List_of_fastest_cars_by_acceleration
23. Static Lists, Tables, …: Not Usable Enough. http://en.wikipedia.org/wiki/List_of_lists_about_Oregon
25. Sci-Fi movies made after year 2000 that cost less than $10M and gross more than $30M
26. A map showing where all Mercedes-Benz vehicles are manufactured
27. All skyscrapers in China (Japan, Thailand,…) of 50 (40/60/70) floors or more, and built in year 2000 (2001/2002) and after, sorted by built year, floors…, grouped by cities, regions…
28. And many more. We need structured data with clear and consistent semantics.
29. What Is a Semantic Wiki? A wiki that has an underlying model of the knowledge described in its pages. It allows users to make their knowledge explicit and formal, and it is Semantic Web compatible. Semantic Wiki: Hybrid ... Better Gas Mileage!
32. List of Semantic Wikis: AceWiki; ArtificialMemory; Wagn - Ruby on Rails-based; KiWi – Knowledge in a Wiki; Knoodl – Semantic Collaboration tool and application platform; Metaweb - the software that powers Freebase; OntoWiki; OpenRecord; PhpWiki; Semantic MediaWiki - an extension to MediaWiki that turns it into a semantic wiki; Swirrl - a spreadsheet-based semantic wiki application; TaOPis - has a semantic wiki subsystem based on Frame logic; TikiWiki CMS/Groupware - integrates Semantic links as a core feature; zAgile Wikidsmart - semantically enables Confluence
33. Basics of Semantic Wikis: Still a wiki, with regular wiki features (Category/Tags, Namespaces, Title, Versioning, ...). Typed content (built-ins + user created, e.g. categories): Page/Card, Date, Number, URL/Email, String, … Typed links (e.g. properties): “capital_of”, “contains”, “born_in”, … Querying interface support, e.g. “[[Category:Member]] [[Age::<30]]” (in SMW).
34. Short History of Semantic MediaWiki. Born at AIFB: typed links and types and more; export articles as RDF; maximally flexible for the wiki user. SMW 0.1 released by AIFB in Sept 2005: parser/storage support for typed links, [[type::link | label]]; FactBox for semantic relations at end of article; Special:SearchSemantic, with basic auto-completion for link types; simple query language (“ask”). Vulcan kicks off the Halo Extensions to SMW project in August 2007. SMW 1.0 released by AIFB in Dec 2007, with Ontoprise releasing Halo Extension 1.0 in parallel: “Property” instead of “Relation” and “Attribute”; many new datatypes, special pages, and UI features.
35. Semantic MediaWiki (SMW) Markup Syntax: [[Property::Value | Display]]. Example: Tsinghua is a university located in [[Has location::Beijing]], with [[Has population::27000|about 27 thousand]] students. In page "Property:Has location": [[Has type::Page]]. In page "Property:Has population": [[Has type::Number]].
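To make the markup on this slide concrete, here is a minimal Python sketch of how `[[Property::Value|Display]]` annotations could be pulled out of page text. It is not SMW's actual parser: the regular expression and the `parse_annotations` helper are assumptions made for illustration, and nested links or multi-value annotations are not handled.

```python
import re

# Illustrative pattern, not SMW's real grammar: [[Property::Value]] with an optional |Display part.
ANNOTATION = re.compile(r"\[\[([^\[\]|:]+)::([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def parse_annotations(wikitext):
    """Return (facts, rendered_text) for SMW-style annotations in wikitext."""
    facts = []

    def render(match):
        prop, value, display = (g.strip() if g else g for g in match.groups())
        facts.append((prop, value))          # the machine-readable fact
        return display if display else value  # what a reader sees on the page

    rendered = ANNOTATION.sub(render, wikitext)
    return facts, rendered


SLIDE_TEXT = (
    "Tsinghua is a university located in [[Has location::Beijing]], "
    "with [[Has population::27000|about 27 thousand]] students."
)
FACTS, RENDERED = parse_annotations(SLIDE_TEXT)
```

The point of the sketch is that a single piece of markup yields both readable prose and a list of (property, value) facts that can be queried later.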
36. Special Properties: “Has type” is a pre-defined “special” property for metadata. Example: [[Has type::String]]. “Allows value” is another special property: [[Allows value::Low]], [[Allows value::Medium]], [[Allows value::High]]. The Halo Extensions add domain and range support (RDFS expressivity). The Semantic Gardening extension also supports “Cardinality”.
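As a rough illustration of what "Has type" and "Allows value" declarations buy you, the Python sketch below checks raw values against a small table of property declarations. The `PROPERTY_DECLARATIONS` table, the "Has priority" property, and the `check_value` helper are hypothetical; SMW performs its own validation internally and differently.

```python
# Hypothetical declarations, mirroring what a Property: page might state.
PROPERTY_DECLARATIONS = {
    "Has population": {"type": "Number"},
    "Has priority": {"type": "String", "allowed": {"Low", "Medium", "High"}},
}


def check_value(prop, raw):
    """Return True if raw is acceptable for prop under the declarations above."""
    declaration = PROPERTY_DECLARATIONS.get(prop)
    if declaration is None:
        return True  # undeclared properties are accepted as-is in this sketch
    if declaration["type"] == "Number":
        try:
            float(raw.replace(",", ""))  # tolerate thousands separators
        except ValueError:
            return False
    allowed = declaration.get("allowed")
    return allowed is None or raw in allowed
```

For example, `check_value("Has priority", "Urgent")` is rejected because "Urgent" is not among the allowed values, while an undeclared property is let through.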
37. Define Classes. Beijing is a city in [[Has country::China]], with population [[Has population::2,200,000]]. [[Category:Cities]] Categories are used to define classes because they are better for class inheritance. The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in … [[Categories: 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore, Owings and Merrill buildings]] Category:Skyscrapers by country; Category:Skyscrapers in China
38. Database-style Query over Wiki Data. Example: skyscrapers in China higher than 50 stories, built before 2000. ASK/SPARQL query target: {{#ask: [[Category:Skyscrapers]] [[Located in::China]] [[Floor count::>50]] [[Year built::<2000]] … }} Data via DBpedia.
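The query on this slide is a conjunction of conditions over categories and property values. The Python sketch below evaluates such a conjunction over a few in-memory page records to show the idea. Everything in it is an assumption for illustration: the `parse_ask` and `ask` helpers, the record layout, and the pages "Tower A" to "Tower D", which are made-up placeholders rather than real buildings. It also treats `>` and `<` as strict comparisons, whereas an actual SMW installation may treat them as inclusive depending on its configuration.

```python
import operator
import re


def parse_ask(query):
    """Turn '[[Category:X]] [[Prop::>N]]' conditions into predicate functions."""
    predicates = []
    for cond in re.findall(r"\[\[(.+?)\]\]", query):
        if "::" in cond:
            prop, value = (s.strip() for s in cond.split("::", 1))
            if value[0] in "<>":
                compare = operator.gt if value[0] == ">" else operator.lt
                bound = float(value[1:])
                predicates.append(
                    lambda page, p=prop, c=compare, b=bound:
                        p in page["props"] and c(page["props"][p], b))
            else:
                predicates.append(
                    lambda page, p=prop, v=value: page["props"].get(p) == v)
        elif cond.startswith("Category:"):
            category = cond[len("Category:"):].strip()
            predicates.append(lambda page, c=category: c in page["categories"])
    return predicates


def ask(pages, query):
    """Return titles of pages that satisfy every condition in the query."""
    predicates = parse_ask(query)
    return [p["title"] for p in pages if all(pred(p) for pred in predicates)]


# Made-up placeholder pages, not real data.
PAGES = [
    {"title": "Tower A", "categories": {"Skyscrapers"},
     "props": {"Located in": "China", "Floor count": 88, "Year built": 1998}},
    {"title": "Tower B", "categories": {"Skyscrapers"},
     "props": {"Located in": "China", "Floor count": 45, "Year built": 1995}},
    {"title": "Tower C", "categories": {"Skyscrapers"},
     "props": {"Located in": "China", "Floor count": 60, "Year built": 2005}},
    {"title": "Tower D", "categories": {"Skyscrapers"},
     "props": {"Located in": "Japan", "Floor count": 70, "Year built": 1993}},
]

QUERY = ("[[Category:Skyscrapers]] [[Located in::China]] "
         "[[Floor count::>50]] [[Year built::<2000]]")
RESULT = ask(PAGES, QUERY)
```

Changing a bound in the query string (say, 40 floors instead of 50) changes the answer without touching any page, which is what the static lists shown earlier cannot do.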
40. Context-dependent adaptation and presentation: different domains have different ways of presenting content, personal preferences, etc.
43. Challenges on Data Consensus: Data modeling is (seemingly) a specialized skill. Finding disagreements in data is difficult. Consistently revising data schemas is difficult. Consistency of schema information (“Population”, “Pop”, “Number_of_inhabitants”, etc.). Consistency of types, units of measure, application of rules, … Semantics/interpretation of properties need explanation for humans. …
44. One Key Helpful Feature of Semantic Wikis: Semantic wikis are “Schema-Last”. Databases require DBAs and schema design up front; semantic wikis develop and maintain the schema in the wiki.
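One way to picture "schema-last" is that the schema is read off the data after users have entered it, instead of being fixed beforehand. The Python sketch below derives a property-to-type summary from already-collected facts and flags properties whose values disagree in type. The `infer_schema` and `conflicts` helpers and the sample facts are illustrative assumptions, not part of SMW.

```python
from collections import defaultdict


def infer_schema(facts):
    """Map each property to the set of value type names observed for it."""
    schema = defaultdict(set)
    for _page, prop, value in facts:
        schema[prop].add(type(value).__name__)
    return dict(schema)


def conflicts(schema):
    """List properties that were used with more than one value type."""
    return sorted(prop for prop, types in schema.items() if len(types) > 1)


# Illustrative facts as (page, property, value); the values are placeholders.
SAMPLE_FACTS = [
    ("Page A", "Has population", 2200000),
    ("Page B", "Has population", "about two million"),
    ("Page A", "Has country", "China"),
]
SCHEMA = infer_schema(SAMPLE_FACTS)
```

A report like `conflicts(SCHEMA)` is the kind of feedback that lets a wiki community notice and repair schema disagreements after the fact.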
45. Semantic MediaWiki Community: Open source (GPL). Well documented. Active mailing list. Commercial support available. World-wide community. Regular conferences; next SMWCon April 28-30, 2011, Arlington, VA. http://semantic-mediawiki.org/ Very stable SMW core. Mature while still growing, slowly but steadily.
47. Example: Ultrapedia – Semantic Wikipedia. Ultrapedia: an SMW demo built to explore general knowledge acquisition in a wiki. Wikipedia merged with the power of a database. Helps readers and writers be more productive. An analytical encyclopedia.
71. Template:Run Source Code
<noinclude>
This is the 'Run' template.
It should be called in the following format:
<pre>
{{Run
|Running Back=
|Run Direction Type=
|Yardage=
|Run of X Yards=
|Result of Run Type=
}}
</pre>
Edit the page to see the template text.
</noinclude>
<includeonly>
{| class="wikitable"
{{#if:{{{Running Back|}}}|
! Running Back
{{!}} {{#arraymap:{{{Running Back|}}}|,|x|[[Running Back::x]]}}
{{!}}-
}}
{{#if:{{{Run Direction Type|}}}|
! Run Direction Type
{{!}} {{#arraymap:{{{Run Direction Type|}}}|,|x|[[Run Direction Type::x]]}}
{{!}}-
}}
{{#if:{{{Yardage|}}}|
! Yardage
{{!}} {{#arraymap:{{{Yardage|}}}|,|x|[[Yardage::x]]}}
{{!}}-
}}
{{#if:{{{Run of X Yards|}}}|
! Run of X Yards
{{!}} [[Run of X Yards::{{{Run of X Yards|}}}]]
{{!}}-
}}
{{#if:{{{Result of Run Type|}}}|
! Result of Run Type
{{!}} {{#arraymap:{{{Result of Run Type|}}}|,|x|[[Result of Run Type::x]]}}
}}
|}
[[Category:Play]]
</includeonly>
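The `#arraymap` calls in this template split a comma-separated field and wrap each item in a property annotation. A small Python emulation may help readers unfamiliar with that parser function. The `arraymap` helper below is a simplified stand-in written for this note, the default output separator is an assumption, and the names "Smith" and "Jones" are placeholders.

```python
def arraymap(value, delimiter, var, formula, new_delimiter=", "):
    """Split value on delimiter and substitute each item for var in formula.

    Simplified: uses a plain string replace, so a formula containing the
    variable text elsewhere would be substituted there as well.
    """
    items = [item.strip() for item in value.split(delimiter) if item.strip()]
    return new_delimiter.join(formula.replace(var, item) for item in items)


EXPANDED = arraymap("Smith, Jones", ",", "x", "[[Running Back::x]]")
```

Here a field filled in as "Smith, Jones" expands to one `[[Running Back::...]]` annotation per name, so each running back becomes a separate queryable value.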
83. Showcase: RPI Map. http://map.rpi.edu A mash-up map application based on Semantic MediaWiki. Provides location-based information on the RPI campus. Integrates data from various external sources. Visualizes integrated data using Google Maps.
84. Social Semantic Web Applications: Omitting x examples, y pictures and z lines of text…
95. Vulcan Project Management Wiki (Task): Color codes indicate types and status. SVN integration automatically marks a task “Completed” and relates it to the repository.
97. Screenshot of a Sprint page: Data automatically generated via template queries on the page. http://wiking.vulcan.com/dev/index.php/Sprint_101020
98. Requirements for Wiki “Developers”. One need not: write code like a hardcore programmer; design or set up an RDBMS, or make frequent schema changes; possess the knowledge of a senior system admin. Instead one needs to: configure the wiki with desired extensions; design and evolve the data model (schema); design content; customize templates, forms, styles, skin, etc.
99. Effectiveness of SMW as a Platform Choice
SMW + Extensions: ☺ Still quick to program; ☺ Easy to customize; ☺ Low-moderate cost. Examples: Vulcan Project Wiki, B.L.S., RPI map.
Packaged Software: ☺ Very quick to obtain; ☹ Hard to customize; ☹ Expensive. Examples: Microsoft Project, VersionOne, Microsoft SharePoint.
Custom Development: ☹ Slow to develop; ☺ Extremely flexible; ☹ High cost to develop and maintain. Examples: .NET Framework, J2EE, …, Ruby on Rails.
100. Openness of SMW as a Platform: Open Source, Open Content, Open Metadata
101. Other SMW+ use? Collaboration applications were conceived as desktop apps. Then wikis made the web collaborative. Now the action is in mobile apps.
102. Potential to Build Many More Apps. Why are there 300,000+ apps in the iPhone App Store? UI limitations drive specificity in apps. People personalize their phones. But better browser technologies (HTML5, JavaScript, etc.) are shrinking the gap between native apps and web pages. SMW is a tool to build apps! Collaborative: social semantic in nature. Data flow and report driven. Cheap to customize and rapidly deployable. High signal-to-noise ratio for the users. Vulcan is investigating this concept.
103. Summary: Application Platform by SMW + Extensions. Semantic MediaWiki plus a wide range of extensions makes a potential application development platform for the social semantic web. SMW + extensions provide a choice that fits into the cost-effective sweet spot. SMW + extensions could become a great platform for social semantic web application development, with more extensions, widgets, and applications. There is an app for it!
104. Conclusions: Semantic MediaWiki is a Powerful Tool. Semantic MediaWiki+ (http://smwforum.ontoprise.com): open-source, growing semantic wiki software system; wiki-style text + semantic markup; collaborative, user-governed subject models and data curation; simple and extensible data models with easy import/export. SMW+ has many government and industry users, and people have built applications with it. Knowledge management via crowds can work: a way to leverage and exploit web-collected data; a lightweight collaborative knowledge management tool; a new platform for lightweight web application development. (Diagram labels: Now, Future)
107. Case Study: Battle-space Luminary System. Discover when new information represents a change in understanding of entities: discovery of explicit entity links and implicit relationships. Large volumes of data in various formats: unstructured news articles; tactical reports and field intelligence; structured database information. Use wiki pages to represent current knowledge about an entity (“what we know”). Use a domain ontology to represent the domain of information (“what we want to know”). Issue alerts when significant events occur: new information according to category; changing information on topics of interest. Need to send information to various devices: cell phones, email, etc.
108. System Design. Wiki configuration: Semantic MediaWiki (large developer community, active development, open source; Wikipedia uses MediaWiki, so scalability and performance are important); Semantic Result Formats (provides various rich media displays of semantic information, including graphs, timelines, maps); Semantic Forms (provides a convenient user interface for entering semantic data into the wiki, avoiding cumbersome wikitext); Semantic Notifications (enables sending of notifications when the results of a semantic query change). Domain ontology: created an OWL ontology for terrorism. Semantic parsing, extraction, reasoning: Java process using various open-source toolkits; rapid plugin of new technologies; multiple data sources supported.
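The notification idea on this slide (send a message when the results of a semantic query change) reduces to comparing two result sets. The Python sketch below shows that comparison. It is not the Semantic Notifications extension's implementation: `diff_results`, `notify_if_changed`, and the `send` callback are names assumed for illustration, and the page titles in the usage lines are placeholders.

```python
def diff_results(previous, current):
    """Compare two query result lists and report what was added or removed."""
    previous, current = set(previous), set(current)
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


def notify_if_changed(previous, current, send):
    """Call send(changes) only when the result set actually changed."""
    changes = diff_results(previous, current)
    if changes["added"] or changes["removed"]:
        send(changes)
        return True
    return False


# Usage with a list standing in for an email or SMS gateway.
OUTBOX = []
CHANGED = notify_if_changed(["Entity A", "Entity B"], ["Entity B", "Entity C"], OUTBOX.append)
```

In a deployment the `send` callback would hand the change summary to whatever delivers alerts to cell phones or email, as the case study requires.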
110. Wiki Content Design. Use templates to ensure a consistent look-and-feel: templates correspond to ontology classes; fields within templates correspond to properties within the ontology; rich content visualizations are derived in a consistent way. Hierarchical categories match the class hierarchy within the ontology: ensures validity for properties; a category is included on each template page to ensure consistency. Forms provide the ability for users to enter data directly into the wiki without knowing wikitext: each form corresponds to a template; fields within forms correspond to the fields/properties within the template; the GUI can include auto-completion; a created page is immediately linked semantically to the rest of the wiki.
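The class-to-template correspondence described here can be sketched as a small generator: given a class name and its properties, emit template wikitext with one annotated row per property and the class category at the end. The Python below is an illustrative assumption, not the project's tooling. `template_for_class` is a hypothetical helper, the "Organization" class with its two properties is a made-up example, and the `#if` guards used in the Run template shown earlier are left out for brevity.

```python
def template_for_class(class_name, properties):
    """Build template wikitext: one table row per property, plus the class category."""
    rows = []
    for prop in properties:
        # Header cell, then a data cell that annotates the template parameter.
        rows.append("! " + prop + "\n| [[" + prop + "::{{{" + prop + "|}}}]]")
    table = '{| class="wikitable"\n' + "\n|-\n".join(rows) + "\n|}"
    return ("<includeonly>\n" + table +
            "\n[[Category:" + class_name + "]]\n</includeonly>")


# Hypothetical ontology class and properties.
ORG_TEMPLATE = template_for_class("Organization", ["Name", "Located in"])
```

Because every page created through the template carries the same properties and category, queries and visualizations over that class stay consistent.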
111. Sample Visualizations: The UI enables notifications based on the results of a query; a message is sent when the visualization changes. Visualizations are created automatically without user edits (tables, timelines, maps, social networks, …).
----- Meeting Notes (3/24/11 15:29) ----- Vulcan is the mothership, providing funds and support. Paul Allen successful.
Of course, once you have data, Ultrapedia can support data visualizations. This is a simple Flash-based chart widget based on the same Porsche 996 data, and included in Ultrapedia’s Porsche 996 page. It shows us that while acceleration varies dramatically, top speed and peak engine power remain fairly constant across models. The chart was specified manually with a query. There are of course a huge number of possible ways to chart a set of data, and most of these ways are uninteresting. In the Ultrapedia concept, we rely on article authors to specify interesting charts for their readers that will support the particular points in the article.
But did you know that Uusikaupunki, Finland, is a major hub for Porsche manufacturing? Ultrapedia allows us to drill down to look at Finland’s contribution to Porsche production.
The problem we are going to solve is “find the 0-60 times of all Porsche cars in Wikipedia”. This is a sample Wikipedia page for the Porsche 996, showing its acceleration times in a performance data table. This table is manually built: all the table data exists as constants in the table.
This is a Wikipedia page showing 0-60 times for the Porsche Cayenne. If we have to manually go through every Porsche model to assemble the 0-60 data for each model and type, this is going to take a while. A better idea is to treat Wikipedia like a database, and simply query it. Enter Ultrapedia.
This is the Ultrapedia home page.
First, notice that Ultrapedia can leverage all the data it extracts from Wikipedia to support a much more helpful UI. For example, Ultrapedia adds a manufacturer-based navigation system on the side and shows explanatory popups. These kinds of UI tweaks aren’t possible with MediaWiki now, and are an important benefit of having the semantic data.
Remember that we want to find the 0-60 acceleration data for all Porsche models that Wikipedia knows about. Let’s start by looking at a query-generated table on the Ultrapedia Porsche 996 page. For comparison, Ultrapedia also includes the original performance table from Wikipedia (above).
This is Ultrapedia’s Porsche 996 performance table, built by a query to the Ultrapedia database of Wikipedia-extracted data. Notice that it has the same information as the original static table; this is because we scrape the data from the static table. This table is dynamically generated at each page load out of the extracted Wikipedia data, so it is always up to date. It is sortable and also accepts feedback and ratings on individual data items.
Now we can answer our question about 0-60 times across all Porsche models with simple queries in Ultrapedia. We can make this an Ultrapedia-only page: the page itself just has 5 queries on it (one for each acceleration range). We could also do this as one big table, but it’s easier to read as 5 smaller tables. All the data here flows from Wikipedia.
We can also use the data to dynamically link to other data sources. In this case we have configured the Ultrapedia Porsche 996 article to include a live eBay query to find out what the Porsche 996 sells for today… We access the eBay data through a web services interface. We can do this for arbitrary other web-service-accessible data sources, like Amazon or GeoNames. In a government or enterprise context, we would link articles to supporting data from appropriate systems of record.
I don’t think I’ll be buying one… I think I’d rather send my daughter to college.
Pictures automatically get metadata, so Ultrapedia can deliver an iPod-like “cover flow” browsing experience with images to augment the table data. We could also embed images or videos in the tables.
Since Ultrapedia includes some simple internal logic about time, we can generate simple browsable timelines and use them in articles. Here we see a timeline of VW models.