1. Semantic Wikis
Social Semantic Web In Action
2011-03-25
Specially Prepared for Tsinghua University Alumni
in greater Seattle area for centennial celebration
7. Project Halo’s Focus Areas
• AURA: Automated User-Centered Reasoning and Acquisition System
  – Textbook you can talk to
• SILK: Semantic Inference with Large Knowledge-base
  – Non-monotonic rule system / RIF
• SMW+: Semantic MediaWiki +
  – Knowledge authoring with SMEs
Plus other related semantic technologies and commercial efforts
8. Project Halo’s Goals
Address the core problems
in Knowledge Bases
– scale
– brittleness
Have high impact
[Chart: KB effort (cost, people, …) versus KB size (number of assertions, complexity, …), with labels Now, Vulcan, and Future]
10. Wiki as a Crowdsourcing Tool
This distinguishes wikis from other publication tools
11. Consensus in Wikis Comes from Collaboration
– ~17 edits/page on average in Wikipedia (with high variance)
– Wikipedia’s Neutral Point of View convention
– Users follow customs and conventions to engage with articles effectively
12. Software Support Makes Wikis Successful
Trivial to edit by anyone
Tracking of all changes, one-step rollback
Every article has a “Talk” page for discussion
Notification facility allows anyone to “watch” an article
Sufficient security on pages; logins can be required
A hierarchy of administrators, gardeners, and editors
Software bots recognize certain kinds of vandalism and auto-revert, or recognize articles that need work and flag them for editors
15. How About Hidden Goodies in the Wiki?
Wikipedia has articles about…
• … all cities
• … their populations
• … their mayors
• … the skyscrapers
So can I ask for a list of the world’s 5 largest cities with a female mayor?
Or skyscrapers in Shanghai with 50+ floors and built after 2000?
16. Enter Semantics…
To answer questions like:
• The female mayors of the top 10 cities, sorted by population, starting year, age…
• All skyscrapers in China (Japan, Thailand, …) of 50 (40/60/70) floors or more, built in year 2000 (2001/2002) and after, sorted by built year, floors…, grouped by cities, regions…
• Median (average) base annual salary of CEOs of Fortune 100 companies in America (Europe, Asia, …)
• All Porsche vehicles made in Germany that accelerate from 0-100 km/h in less than 4 seconds
• Sci-Fi movies made after year 2000 that cost less than $10M and gross more than $30M
• A map showing where all Mercedes-Benz vehicles are manufactured
• And many more
17. What is a Semantic Wiki
A wiki that has an underlying model of the
knowledge described in its pages.
To allow users to make their knowledge explicit and formal
Semantic Web Compatible
Semantic Wiki
20. List of Semantic Wikis
AceWiki
ArtificialMemory
Wagn - Ruby on Rails-based
KiWi – Knowledge in a Wiki
Knoodl – Semantic Collaboration tool and application platform
Metaweb - the software that powers Freebase
OntoWiki
OpenRecord
PhpWiki
Semantic MediaWiki - an extension to MediaWiki that turns it into a semantic wiki
Swirrl - a spreadsheet-based semantic wiki application
TaOPis - has a semantic wiki subsystem based on Frame logic
TikiWiki CMS/Groupware integrates Semantic links as a core feature
zAgile Wikidsmart - semantically enables Confluence
21. Basics of Semantic Wikis
Still a wiki, with regular wiki features
– Category/Tags, Namespaces, Title, Versioning, ...
Typed Content (built-ins + user created, e.g. categories)
– Page/Card, Date, Number, URL/Email, String, …
Typed Links (e.g. properties)
– “capital_of”, “contains”, “born_in”…
Querying Interface Support
– E.g. “[[Category:Member]] [[Age::<30]]” (in SMW)
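The idea of typed content can be illustrated outside the wiki. The sketch below is not SMW's implementation; the `PROPERTY_TYPES` table, the `coerce` function, and the property names in it are hypothetical, chosen only to show how a declared type could determine how a raw value is interpreted.

```python
from datetime import date

# Hypothetical property declarations: property name -> declared type.
PROPERTY_TYPES = {
    "Age": "Number",
    "Born in": "Page",
    "Birth date": "Date",
}

def coerce(prop, raw):
    """Convert a raw string to a Python value according to the property's declared type."""
    # Assumption of this sketch: undeclared properties are treated as plain strings.
    kind = PROPERTY_TYPES.get(prop, "String")
    if kind == "Number":
        return float(raw.replace(",", ""))
    if kind == "Date":
        return date.fromisoformat(raw)
    return raw.strip()  # Page and String values are kept as text

print(coerce("Age", "29"))
print(coerce("Birth date", "1982-03-25"))
```

Once values carry a type, a condition such as `Age < 30` can be evaluated numerically rather than as a string comparison.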
22. SMW Markup Syntax
Tsinghua is a university located in
[[Has location::Beijing]], with
[[Has population::27,000]] students.
In page "Property:Has location": [[Has type::Page]]
In page "Property:Has population": [[Has type::Number]]
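To make the markup concrete, here is a minimal sketch of how `[[Property::value]]` annotations could be pulled out of page text. It is an illustration only: the regular expression and the `extract_annotations` helper are assumptions of this example and do not reflect how SMW itself parses wikitext.

```python
import re

# Matches inline annotations of the form [[Property::value]].
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)\]\]")

def extract_annotations(wikitext):
    """Return (property, value) pairs found in the text, in order of appearance."""
    return [(p.strip(), v.strip()) for p, v in ANNOTATION.findall(wikitext)]

text = ("Tsinghua is a university located in [[Has location::Beijing]], "
        "with [[Has population::27,000]] students.")
print(extract_annotations(text))
```

An ordinary link such as `[[Beijing]]` has no `::` separator, so the sketch ignores it.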
23. Define Classes
On page Beijing:
Beijing is a city in [[Has country::China]], with population [[Has population::2,200,000]].
One possible solution:
– Beijing is a [[Is a::city]]
[[Category:Cities]]
Categories are used to define classes because they are better for class inheritance.
The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall
skyscraper in …
[[Categories: 1998 architecture | Skyscrapers in
Shanghai | Hotels in Shanghai | Skyscrapers over 350
meters | Visitor attractions in Shanghai | Landmarks in
Shanghai | Skidmore, Owings and Merrill buildings]]
Category:Skyscrapers in China Category: Skyscrapers by country
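Class inheritance through categories can be sketched as a walk up the subcategory links. This is a minimal illustration: the `PARENTS` table and both helper functions are hypothetical, and the link from "Skyscrapers in Shanghai" to "Skyscrapers in China" is assumed for the example.

```python
# Hypothetical subcategory links: child category -> parent categories.
PARENTS = {
    "Skyscrapers in Shanghai": ["Skyscrapers in China"],
    "Skyscrapers in China": ["Skyscrapers by country"],
}

def ancestors(category):
    """Return the category plus every category reachable through parent links."""
    seen, stack = set(), [category]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen

def is_member(page_categories, target):
    """A page belongs to `target` if any of its categories is `target` or a descendant of it."""
    return any(target in ancestors(c) for c in page_categories)

jin_mao = ["1998 architecture", "Skyscrapers in Shanghai", "Hotels in Shanghai"]
print(is_member(jin_mao, "Skyscrapers in China"))
```

A page tagged only with the most specific category is still found when querying a broader one, which is the inheritance behavior the slide refers to.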
24. Database-style Query over Wiki Data
Example: Skyscrapers in China
higher than 50 stories, built before
2000
ASK/SPARQL query target
{{#ask:
[[Category:Skyscrapers]]
[[Located in::China]]
[[Floor count::>50]]
[[Year built::<2000]]
…
}}
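The query above reads as a conjunction of conditions over annotated pages. The sketch below mimics that logic in plain Python. The sample pages and the `ask` function are hypothetical, and treating `>` and `<` as strict comparisons is an assumption of this example rather than a statement about SMW's comparator semantics.

```python
import operator

# Hypothetical sample pages; the values are illustrative, not taken from any wiki.
PAGES = {
    "Tower A": {"categories": {"Skyscrapers"}, "Located in": "China",
                "Floor count": 88, "Year built": 1998},
    "Tower B": {"categories": {"Skyscrapers"}, "Located in": "China",
                "Floor count": 45, "Year built": 1995},
    "Tower C": {"categories": {"Skyscrapers"}, "Located in": "Japan",
                "Floor count": 70, "Year built": 1993},
}

def ask(pages, category, conditions):
    """Return names of pages in `category` that satisfy every (property, op, value) condition."""
    results = []
    for name, page in pages.items():
        if category not in page["categories"]:
            continue
        if all(prop in page and op(page[prop], value) for prop, op, value in conditions):
            results.append(name)
    return sorted(results)

matches = ask(PAGES, "Skyscrapers", [
    ("Located in", operator.eq, "China"),
    ("Floor count", operator.gt, 50),
    ("Year built", operator.lt, 2000),
])
print(matches)
```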
25. What is the Promise of Semantic Wikis?
Semantic Wikis promise
Consensus over Data
Combine low-expressivity data authorship with the best features of traditional wikis
User-governed, user-maintained, user-defined
Easy to use as an extension of text authoring
26. One Key Helpful Feature of Semantic Wikis
Semantic Wikis are “Schema-Last”
Databases require DBAs and schema design;
Semantic Wikis develop and maintain the schema in the wiki
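The “schema-last” contrast can be sketched with a toy store in which facts are recorded first and the schema is simply whatever properties are in use. The `SchemaLastStore` class is hypothetical and exists only for this illustration.

```python
from collections import defaultdict

class SchemaLastStore:
    """Stores facts first; the set of known properties is derived from the data."""

    def __init__(self):
        self.facts = defaultdict(dict)  # page -> {property: value}

    def annotate(self, page, prop, value):
        # No table definition or migration is needed before a new property is used.
        self.facts[page][prop] = value

    def schema(self):
        """Return the properties currently in use, i.e. the schema as it has emerged so far."""
        return sorted({prop for props in self.facts.values() for prop in props})

store = SchemaLastStore()
store.annotate("Beijing", "Has country", "China")
print(store.schema())
store.annotate("Beijing", "Has population", "2,200,000")  # a new property appears later
print(store.schema())
```

In a schema-first database the second property would require a schema change before any data could be entered; here it appears as a side effect of authoring.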
27. Semantic MediaWiki in 2010
Open source (GPL)
Well documented
Active mailing list
Commercial support available
World-wide community
Regular Conferences
– Next SMWCon 4/28-30, 2011 Arlington, VA
Very stable SMW core
Mature while still growing, slowly but steadily
28. SMW Extensions
Data I/O
• Halo Extensions, Semantic Forms, Semantic Notification, …
Query and Browsing
• Semantic Toolbar, Semantic Drilldown, Enhanced Retrieval, Search…
Visualization
• Semantic Result Printers, Tree View, Exhibit, Flash charts…
Other useful extensions
• HaloACL, Deployment, Triplestore Connector, Simple Rules…
• Semantic WikiTags and Subversion Integration extensions
• Upcoming Linked Data Extension, with R2R and SILK from F.U.Berlin
29. Wikis Can Help Information Management
Research = Locate and Find Data?
Business Intelligence
Finding Expertise
Internal Encyclopedia
Documentation
Enterprise Search
Crowd Sourcing is a Great Solution!
30. Example I: KnowIT in Johnson & Johnson
Most Frequently Asked Questions: (J&J example)
– What are the directions between two J&J sites?
– What is the meaning of KOL? HLM? DRU?
– What data sources can we use to compare biological pathways?
– Can you give us a list of R&D applications, related servers and
stakeholders and send us an update every six months?
Capture Facts About Things
– Definitions, concepts, questions
– Locations
– Data sources
– Organizations and people
– Technologies and systems
32. Example II: Knowledge Encapsulation Framework
Allow modelers to exploit the “information resources” they have and discover new, potentially relevant material across new media types
KEF aims to provide:
– an effective method for storing, retrieving, reviewing and
annotating your documents
– an environment where you can share these materials with team
members and discuss
– a mechanism to discover new, related information from social and traditional media
– a means to link this material to model representations to aid
analysis and game-play
Achieved by a semantic wiki enabled with an NLP pipeline
35. Example III: Ultrapedia – An Analytical Semantic Wikipedia
Ultrapedia: An SMW demo built to explore general knowledge
acquisition in a wiki
Wikipedia merged with the power of a database
– Data extracted from Wikipedia Infobox and Table data; stored in RDF
– For Authors: tools to create more compelling articles
• Great visualizations: charts, tables, timelines, photos, analytics
• Always up-to-date across the Encyclopedia
• Encourage data consistency and find data errors
• Link in other web data sources
– For Readers:
• Enhanced articles and data interaction
• Faceted navigation
• Sophisticated queries (both standing and ad-hoc)
Maintenance via the Wikipedia update process
– Data is from the article text, with simple ways for article authors to maintain and
extend it.
– Authors and readers always in the loop for merging, updating, validating,
mapping
39. Video: Semantic Wikis for A New Problem
Increasing technical complexity →
← Increasing User Participation
Social tag-based characterization
– Keyword search over tag data
– Inconsistent semantics
– Easy to engineer
Semantic Entertainment Wiki
– Social database-style characterization
– Database search + wiki text search
– Semantic consistency via wiki mechanisms
– Easy to engineer
Algorithm-based object characterization
– Database-style search
– Consistent semantics
– Extremely difficult to engineer
42. Semantic Entertainment: Query Result Highlight Reel
Commercial Look/Feel
Play-by-play video search
Highlight reel generation
Search on crowd-defined patterns (“touchdowns with big hits”)
Tree-based navigation widget
Very favorable economics
43. The Inspiration
We started with a
We built a
We now have an
44. We CAN Build Applications (Fairly) Easily
With all the extensions of Semantic MediaWiki.
Data I/O
• Halo Extensions, Semantic Forms, Semantic Notification, …
Query and Browsing
• Semantic Toolbar, Semantic Drilldown, Enhanced Retrieval, Search…
Visualization
• Semantic Result Printers, Tree View, Exhibit, Flash charts…
Other useful extensions
• HaloACL, Deployment, Triplestore Connector, Simple Rules…
• Semantic WikiTags and SVN Integration extensions
• Upcoming Linked Data Extension, with R2R and SILK from FUB
46. Social Semantic Web Applications
Omitting x examples, y pictures and z lines of text…
47. Case Study 2 and Demo: Project Management with SMW+
Automatically populate tables
Just the data you want, at the level you want
Calendars and timelines
Workflows
Personal menus
Form-oriented inputs
Notifications via email/RSS
MS Office integration
SVN integration
51. Screenshot of a Sprint page
Data automatically generated via template queries on page
http://wiking.vulcan.com/dev/index.php/Sprint_101020
52. Requirements for Wiki “Developers”
One need not
– Write code like a hardcore programmer
– Design or set up an RDBMS, or make frequent schema changes
– Possess the knowledge of a senior system admin
Instead one needs to
– Configure the wiki with desired extensions
– Design and evolve the data model (schema)
– Design content
• Customize templates, forms, styles, skin, etc.
The bar is dramatically lowered to build applications
– “Source code” is part of the open content of wiki too!
53. Effectiveness of SMW as a Platform Choice
Packaged Software
☺ Very quick to obtain
☹ Hard to customize
☹ Expensive
Examples: Microsoft Project, Version One, Microsoft SharePoint
SMW + Extensions
☺ Still quick to program
☺ Easy to customize
☺ Low-moderate cost
Examples: Vulcan Project Wiki, B.L.S., RPI map
Custom Development
☹ Slow to develop
☺ Extremely flexible
☹ High cost to develop and maintain
Examples: .NET Framework, J2EE, …, Ruby on Rails
54. Conclusions
Semantic MediaWiki+ (http://smwforum.ontoprise.com)
– Open-source, growing semantic wiki software system
– Wiki-style text + semantic markups
– Collaborative, user-governed subject models and data curation
– Simple and extensible data models with easy import/export
SMW+ has many government and industry users
– People have built applications with it
Knowledge Management via crowds can work
– A way to leverage and exploit web-collected data
– A lightweight collaborative knowledge management tool
A new platform for lightweight web application development
[Chart: KB effort (cost, people, …) versus KB size (number of assertions, complexity, …), with labels Now, Vulcan, and Future]
57. Case Study: Battle-space Luminary System
Discover when New Information represents a change in understanding of entities
– Discovery of explicit entity links, implicit relationships
Large Volumes of Data in various formats
– Unstructured news articles
– Tactical Reports, Field Intelligence
– Structured Database Information
Use Wiki Pages to represent current knowledge about an entity – “what we know”
Domain Ontology to represent domain of information – “what we want to know”
Issue Alerts when Significant Events occur
– New information according to category
– Changing information on topics of interest
– Need to send information to various devices – cell phones, email, etc.
58. System Design
Wiki Configuration
– Semantic MediaWiki: Large developer community, active development, open
source. Wikipedia uses MediaWiki, so scalability and performance are
important.
– Semantic Result Formats: Provides various rich media displays of semantic
information, including graphs, timelines, maps
– Semantic Forms: Provides convenient user interface for entering semantic
data into wiki, avoiding cumbersome wikitext
– Semantic Notifications: Enables sending of notifications when results of
semantic query change.
Domain Ontology
– Created OWL Ontology for Terrorism
Semantic Parsing, Extraction, Reasoning
– Java Process using various Open-Source Toolkits
– Rapid plugin of new technologies
– Multiple Data Sources supported
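The notification behavior described under Wiki Configuration (send a notification when the results of a semantic query change) can be sketched as a comparison of two result sets. This is an illustration of the idea only, not the Semantic Notifications extension's code; the function names and the `send` callback are hypothetical.

```python
def diff_results(previous, current):
    """Compare two result sets of the same query and report what changed."""
    previous, current = set(previous), set(current)
    return {"added": sorted(current - previous), "removed": sorted(previous - current)}

def notify_if_changed(previous, current, send):
    """Call `send` with a message only when the query results differ."""
    changes = diff_results(previous, current)
    if changes["added"] or changes["removed"]:
        send(f"Query results changed: +{changes['added']} -{changes['removed']}")
        return True
    return False

outbox = []  # stands in for an email or RSS channel
notify_if_changed(["Entity A"], ["Entity A", "Entity B"], outbox.append)
print(outbox)
```

Passing the delivery channel in as a callback keeps the change detection independent of whether the alert goes to email, a cell phone, or another device.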
60. Wiki Content Design
Use Templates to Ensure Consistent Look-and-Feel
– Templates Correspond to Ontology Classes
– Fields within Templates correspond to Properties within Ontology
– Rich Content Visualizations derived in consistent way
Hierarchical Categories match Class Hierarchy within Ontology
– Ensures Validity for Properties
– Category included on each Template page to ensure consistency
Forms Provide ability for users to enter data directly into wiki without
knowing Wiki Text
– Each form corresponds to a Template
– Fields within forms correspond to the fields/properties within the Template
– GUI can include auto-completion
– Created Page immediately linked semantically to rest of Wiki
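The correspondence between forms, templates, and ontology classes can be sketched as follows. The `ONTOLOGY` table, its class and property names, and `render_template_call` are all hypothetical; the sketch only shows how form input could be checked against a class's properties and turned into a template call.

```python
# Hypothetical ontology fragment: class name -> properties allowed on that class.
ONTOLOGY = {
    "Person": ["Name", "Affiliation"],
    "Event": ["Date", "Location"],
}

def render_template_call(cls, form_values):
    """Build a wikitext template call from form input, rejecting fields the class does not define."""
    allowed = ONTOLOGY[cls]
    unknown = [field for field in form_values if field not in allowed]
    if unknown:
        raise ValueError(f"Fields not defined for {cls}: {unknown}")
    lines = ["{{" + cls]
    # Emit fields in the order the class declares them, for a consistent page layout.
    lines += [f"|{field}={form_values[field]}" for field in allowed if field in form_values]
    lines.append("}}")
    return "\n".join(lines)

print(render_template_call("Event", {"Location": "Seattle", "Date": "2011-03-25"}))
```

Because the template is the only place where fields map to properties, users filling in the form never need to write the annotations themselves.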
68. Dynamically-Generated Tables for Queries
Which Porsches accelerate fast?
Information Need: All Porsche models that accelerate 0-100kph in under 5, 6, and 7 seconds
The problem we are going to solve is “find the 0-60 times of all Porsche cars in Wikipedia”. This is a sample Wikipedia page for the Porsche 996, showing its acceleration times in a performance data table. This table is manually built – all the table data exists as constants in the table.
This is a Wikipedia page showing 0-60 times for the Porsche Cayenne. If we have to manually go through every Porsche model to assemble the 0-60 data for each model and type, this is going to take a while. A better idea is to treat Wikipedia like a database, and simply query it. Enter Ultrapedia.
This is the Ultrapedia home page.
First notice that Ultrapedia can leverage all the data it extracts from Wikipedia to support a much more helpful UI. For example, Ultrapedia adds a manufacturer-based navigation system on the side, and shows explanatory popups. These kinds of UI tweaks aren’t possible with MediaWiki now, and are an important benefit of having the semantic data.
Remember that we want to find the 0-60 acceleration data for all Porsche models that Wikipedia knows about. Let’s start by looking at a query-generated table on the Ultrapedia Porsche 996 page. For comparison, Ultrapedia also includes the original performance table from Wikipedia (above).
This is Ultrapedia’s Porsche 996 performance table, built by a query to the Ultrapedia database of Wikipedia-extracted data. Notice that it has the same information that the original static table has; this is because we scrape the data from the static table. This table is dynamically generated at each page load out of the extracted Wikipedia data, so it is always up to date. It is sortable and also accepts feedback and ratings on individual data items.
Now we can answer our question about 0-60 times across all Porsche models with one simple query in Ultrapedia. We can make this an Ultrapedia-only page – the page itself has just 5 queries on it (one for each acceleration range). We could also do this as one big table but it’s easier to read as 5 smaller tables. All the data here flows from Wikipedia.
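The “one query per acceleration range” idea can be sketched in a few lines. The records below are hypothetical stand-ins for data extracted from Wikipedia, and the thresholds of 5, 6, and 7 seconds are taken from the information need stated on the slide; the `models_under` helper is an assumption of this example.

```python
# Hypothetical records standing in for data extracted from infoboxes and tables.
MODELS = [
    {"model": "Model A", "accel_s": 4.2},
    {"model": "Model B", "accel_s": 5.6},
    {"model": "Model C", "accel_s": 6.9},
]

def models_under(records, limit_s):
    """Return models whose acceleration time is strictly below `limit_s` seconds, fastest first."""
    hits = [r for r in records if r["accel_s"] < limit_s]
    return [r["model"] for r in sorted(hits, key=lambda r: r["accel_s"])]

# One result table per threshold, mirroring one query per acceleration range.
tables = {limit: models_under(MODELS, limit) for limit in (5, 6, 7)}
for limit, names in tables.items():
    print(f"under {limit} s: {names}")
```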
Of course once you have data, Ultrapedia can support data visualizations. This is a simple Flash-based chart widget based on the same Porsche 996 data, and included in Ultrapedia’s Porsche 996 page. It shows us that while acceleration varies dramatically, top speed and peak engine power remain fairly constant across models. The chart was specified manually with a query. There are of course a huge number of possible ways to chart a set of data, and most of these ways are uninteresting. In the Ultrapedia concept, we rely on article authors to specify interesting charts for their readers that will support the particular points in the article.
We can also use the data to dynamically link to other data sources. In this case we have configured the Ultrapedia Porsche 996 article to include a live eBay query to find out what the Porsche 996 sells for today… We access the eBay data through a web services interface. We can do this for arbitrary other web-service-accessible data sources, like Amazon or GeoNames. In a government or enterprise context, we would link articles to supporting data from appropriate systems of record.
I don’t think I’ll be buying one… I think I’d rather send my daughter to college.
Pictures automatically get metadata, so Ultrapedia can deliver an iPod-like “cover flow” browsing experience with images to augment the table data. We could also embed images or videos in the tables.
Since Ultrapedia includes some simple internal logic about time, we can generate simple browsable timelines and use them in articles. Here we see a timeline of VW models.
But, did you know that Uusikaupunki, Finland, is a major hub for Porsche manufacturing? Ultrapedia allows us to drill down to look at Finland’s contribution to Porsche production.