Collaborative Ontology Building Project Presentation Transcript
Collaborative Ontology Building Project - a multiagent-based ontology editing and discovery environment
Jie Bao
Artificial Intelligence Research Laboratory
Dept of Computer Science, Iowa State University, Ames IA 50010
[email_address]
http://www.cs.iastate.edu/~baojie
Project homepage: http://boole.cs.iastate.edu:9090/COB/
A research proposal, Dec 02, 2003
Without SHOE how can you be a RACER ?
Without Sesame how can you make OIL ?
Semantic Web is a plan of good
But with no ontology it’s only a nil.
Everyone makes a small piece of brick
Not in one day can we make Rome real.
Let’s build ontology together and hard
Just like ants build their hill.
A tentative framework
What is the problem
The Semantic Web needs general, open ontology libraries, but ontology building is a time-consuming, knowledge-intensive process.
Domain experts are needed, and nobody has full knowledge
Also, intellectual property/copyright issues hinder the wide use of commercial ontologies (e.g. Cyc)
Automatic ontology discovery and mapping are still impossible in general
Existing ontology editing and discovery tools are standalone and too complex
Not suitable for team ontology generation.
Jargon is daunting for ordinary people who know little about ontology.
Data sources are distributed, heterogeneous, dynamic
New concepts appear every day: Election2004
Learning from distributed, heterogeneous, dynamic, multiple datasets
Concurrent version control and management
Open Source Issue (copyright vs. copyleft)
Knowledge sharing in group/project
Automatic knowledge aggregation
Design Philosophy (1) ----- about people
Teamwork is needed
Nobody can know everything
But everyone is an expert somehow
Everybody knows something: your dog, your department, your favorite TV show
You can build big things from small pieces
One expert can write several articles for an encyclopedia
And hundreds of experts can work together.
However, people always have different viewpoints
Conflict: does the 21st century begin in 2000 or 2001?
Redundancy: IraqWar, WarInIraq, GulfWarII
Design Philosophy (2) ----- about agent and software
Small pieces of ontologies are generated by agents
Those agents are domain experts or trained agents
Light-weight ontology editor which requires minimal user effort: browser-based
Automatic and controllable information collection by software robots.
Ontology repository is maintained by machine learning algorithms
Ontology mapping on controlled topics.
Detect and reduce redundancy and conflicts by inference
A Desirable Case -- Pop Music Ontology (1)
Suppose we want to build an ontology and knowledge base about pop music called PopOnt
Even kids know
John is a teenage student who knows nothing about ontology, but he knows a lot about pop music. He’d like to share his knowledge with PopOnt.
I’m willing to spend 5 minutes for you
There are millions of pop music fans like John, and their knowledge is complementary. Some of them may go to the PopOnt website and write one or two simple sentences, like [M. Jackson] [isn’t] a [country music artist]. They may also correct others’ mistakes.
A Desirable Case -- Pop Music Ontology (2)
You don’t even need to go to the website
There are also mailing lists, newsgroups, weblogs, p2p applications and websites about pop music, which can be used for validation or mining. For example, if [M. Jackson] rarely co-occurs with [country music] in those sources, it is more likely that [M. Jackson] [isn’t] a [country music artist] is true.
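The co-occurrence check described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function name `cooccurrence_rate`, the toy posts, the naive substring matching, and the 0.1 threshold are all assumptions made for the example.

```python
def cooccurrence_rate(documents, term_a, term_b):
    """Fraction of documents mentioning term_a that also mention term_b.

    Matching is a naive case-insensitive substring test (an assumption
    for this sketch; a real system would need proper text processing).
    """
    a, b = term_a.lower(), term_b.lower()
    with_a = [d for d in documents if a in d.lower()]
    if not with_a:
        return 0.0
    both = [d for d in with_a if b in d.lower()]
    return len(both) / len(with_a)


# Toy posts standing in for mailing-list or weblog text (hypothetical data).
posts = [
    "M. Jackson released a new pop single",
    "Top pop albums: M. Jackson again",
    "Best country music festivals this year",
    "M. Jackson tour dates announced",
]

rate = cooccurrence_rate(posts, "M. Jackson", "country music")
# A low rate lends some support to "[M. Jackson] [isn't] a [country music artist]".
supports_negative = rate < 0.1  # the threshold is an arbitrary assumption
```

A low rate is only weak evidence, so in the spirit of the slides it would be one input to cross-validation by people rather than a final verdict.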
Agents can be experts, too
It is even better if those articles have a subject, an abstract, or keywords, which can be used as labeled instances for machine learning. New concepts can be mined and cross-validated by people, too.
Finally, PopOnt is built in a couple of months and free to use for everyone.
A tentative framework
Key Difficulties 1 : Logic breakdown
How to make ontology editing as easy as writing a diary?
[Figure labels: Classes and Slots; Instances]
Can a complex ontology be broken down into a group of single sentences? In other words, how can a complex description logic statement be decomposed into very simple FOPL sentences? The inverse composition is also needed. Each single sentence is as simple as "A is B" or "A has B".
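As a rough illustration of the breakdown and its inverse, the sketch below flattens a simple class description into "A is B" / "A has B" sentences and regroups them. It covers only this flat case, not general description logic, and the dictionary layout and the names `decompose`, `compose`, and `PopArtist` are assumptions made for the example.

```python
from collections import defaultdict


def decompose(description):
    """Flatten one concept description into atomic (subject, relation, object) sentences."""
    name = description["name"]
    sentences = [(name, "is", parent) for parent in description.get("is", [])]
    sentences += [(name, "has", slot) for slot in description.get("has", [])]
    return sentences


def compose(sentences):
    """Inverse step: regroup atomic sentences into one description per subject.

    Only the relations "is" and "has" are supported in this sketch.
    """
    grouped = defaultdict(lambda: {"is": [], "has": []})
    for subject, relation, obj in sentences:
        grouped[subject][relation].append(obj)
    return [{"name": subject, **parts} for subject, parts in grouped.items()]


# Hypothetical concept description for the pop music example.
pop_artist = {"name": "PopArtist", "is": ["Artist"], "has": ["Album", "RecordLabel"]}
atoms = decompose(pop_artist)
rebuilt = compose(atoms)
```

Here `atoms` holds three one-line sentences that different contributors could have written independently, and `rebuilt` recovers the original description from them.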
Key Difficulties 2 : Ontology Evolution
How to refine an ontology through cooperation between experts and software agents?
People and agents are all error-prone. Interactive and iterative cross-validation is central.
People are “lazy” and “natural”. An ontology piece may first be written in short natural language and later be refined by other people or agents into a more formal and more complex piece.
Inference is needed to rule out conflicting information and to detect malicious or wrong information
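The simplest form of such a check is spotting direct contradictions between atomic sentences. The sketch below does only that (it performs no real logical inference), and the tuple layout, the function name `find_conflicts`, and the contributor names are assumptions made for the example.

```python
def find_conflicts(sentences):
    """Return pairs of atomic sentences that directly contradict each other.

    Each sentence is (subject, relation, object, author). An "is" and an
    "isn't" about the same subject and object count as a conflict.
    """
    seen = {}
    conflicts = []
    for sentence in sentences:
        subject, relation, obj, _author = sentence
        key = (subject, obj)
        for earlier in seen.get(key, []):
            if {earlier[1], relation} == {"is", "isn't"}:
                conflicts.append((earlier, sentence))
        seen.setdefault(key, []).append(sentence)
    return conflicts


# Hypothetical contributions from three users.
contributions = [
    ("M. Jackson", "isn't", "country music artist", "john"),
    ("M. Jackson", "is", "pop artist", "mary"),
    ("M. Jackson", "is", "country music artist", "troll42"),
]
conflicts = find_conflicts(contributions)
```

Each reported pair could then be handed back to people or other agents for cross-validation, for instance by weighing how many independent contributors support each side.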
Key Difficulties 3 : Ontology Mining
Where to collect source information?
Google search? No
Pull: agents search for and learn where the “good” sources are. This can be verified by whether a source is well cited (referenced) or not.
Push: information is automatically pushed to agents via credible channels.
Automatic extraction is still impossible
Depends on NLP
Article summaries/keywords are helpful, especially when the summary overlaps with an existing ontology.
Such summarized text can be used as labeled instances.
Simplified tasks are feasible
Is the keyword a consistent concept?
Are some keywords related?
Comparison: in content-based retrieval from video databases, automatic discovery of semantics based on image processing / pattern recognition has not proven very successful. Semantics from expert knowledge are needed in an MPEG-7 stream.
Key Difficulties 4 : Ontology Mapping
People often call the same thing by different names, or divide concepts into groups in multiple ways.
Automatic general ontology mapping is still hard.
Simplified mapping is more feasible while still useful
Check whether a pair of concepts (with instances) is the same or not
Detect redundancy and suggest merge.
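One simple way to approximate the pairwise check above is to compare the instance sets of two concepts and suggest a merge when they overlap heavily. The sketch below uses Jaccard similarity for this; the choice of measure, the 0.5 threshold, the function names, and the placeholder instance identifiers are all assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two instance collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_merges(concepts, threshold=0.5):
    """Return (name_a, name_b, score) for concept pairs whose instances overlap heavily."""
    suggestions = []
    for (name_a, inst_a), (name_b, inst_b) in combinations(sorted(concepts.items()), 2):
        score = jaccard(inst_a, inst_b)
        if score >= threshold:
            suggestions.append((name_a, name_b, score))
    return suggestions


# Placeholder instance ids (hypothetical data) for differently named concepts.
concepts = {
    "IraqWar": {"e1", "e2", "e3"},
    "WarInIraq": {"e1", "e2", "e3", "e4"},
    "PopMusic": {"s1", "s2"},
}
merges = suggest_merges(concepts)
```

With this toy data only the IraqWar / WarInIraq pair is suggested; whether to actually merge would still be left to a human editor or a further agent.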
A tentative framework
INDUS is a distributed learning system, while COB is a MAS learning system
Agents in different channels have different focus for learning
They work together for the same goal.
INDUS has a heavy-weight database mechanism while COB aims at a light-weight implementation
Ontology/KB are stored as atomic sentences
Interface for dummies, not for gurus.
Data sources are usually small but change quickly, and their number is huge.
Queries use the inference power of the ontology language.
Semantic Web meets MAS
COB is an application of MAS learning from data on the web
Learn new concepts from instances
Validate concepts proposed by other agents/humans
The learner can take any form: BayesNet, Neural Net, Decision Tree, KNN
Everything is about semantics
Agents share an ontology but also have dialect issues
Small pieces of semantics are carried by agents and aggregated in the “home”
Guess semantics from labeled instances.
An application that shows how to implement proof and trust on the Semantic Web
Dynamic knowledge sharing
RSS (RDF Site Summary): answering questions like "Who wrote this?", "When was this published?", and "What is/are the topic(s) of discussion?"
RSS is widely used for news aggregation and automatic news discovery.
Grid: distribute the computation task across the internet and combine the results.
Blog and Wiki: easy to use site building tools, instead of HTML editor. Topics are refined by the effort of a community.
A local repository can be shared with other peers
The other peer can be an agent in COB!
However, they all lack semantics to some degree. The unfiltered information may flood the user.
Collaborative Ontology Building Example FOAF
FOAF is an acronym for Friend of a Friend, an experimental project and vocabulary for the Semantic Web.
It is based on the idea of a machine-readable version of the current World Wide Web, with homepages, mailing lists, travel itineraries, calendars, address books and the like.
Everyone can join and add their own information
It’s RDF based
Collaborative Ontology Building Example wikipedia
170,000 concepts in English alone, more in other languages.
An open encyclopedia
Everyone can edit any page.
Based on the assumption that most people are nice
And it’s proven true!
Limitation: the relations between items are not formal, and it is readable by humans only (at least for now)
Collaborative Ontology Building Example Open Directory Project
http://www.dmoz.org/
60,000 editors, 460,000 concepts
Collaborative taxonomy building
Open to everyone
Limitation: Taxonomy only
A tentative framework
System design
[Diagram labels: Ontology Repository; OntoWiki; OWL-like syntax; Human Expert; Email list, Newsgroup, Forum, Blog, Wiki, P2P node; Semantic RSS-aware Channel (x3); Agents: Ontology Mining, Browser, Ontology Alignment]