Collaborative Ontology Building Project

Collaborative Ontology Building Project - a multiagent-based ontology editing and discovery environment
Jie Bao
Artificial Intelligence Research Laboratory
Dept of Computer Science, Iowa State University, Ames IA 50010
[email_address]
http://www.cs.iastate.edu/~baojie
Project homepage: http://boole.cs.iastate.edu:9090/COB/
A research proposal, Dec 02, 2003

COB
  Without SHOE how can you be a RACER?
  Without Sesame how can you make OIL?
  Semantic Web is a plan of good
  But with no ontology it’s only a nil.
  Everyone makes a small piece of brick
  Not in one day can we make Rome real.
  Let’s build ontology together and hard
  Just like ants build their hill.

Outline
  Objectives
  Key difficulties
  Background review
  A tentative framework

What is the problem
  The Semantic Web needs a general, open ontology library, but ontology building is a time-consuming, knowledge-intensive process.
    Domain experts are needed, and nobody has full knowledge.
    Intellectual property/copyright issues also hinder the wide use of commercial ontologies (e.g. Cyc).
  Automatic ontology discovery and mapping are still impossible in general.
  Existing ontology editing and discovery tools are standalone and too complex.
    They are not suitable for team ontology generation.
    The jargon is daunting for ordinary people who know little about ontology.
  Data sources are distributed, heterogeneous, and dynamic.
    New concepts appear every day: Election2004

Related problems
  Distributed learning
    Learning from multiple distributed, heterogeneous, dynamic datasets
  Software engineering
    Concurrent version control and management
    Open source issues (copyright vs. copyleft)
  Knowledge management
    Knowledge sharing within a group/project
    Automatic knowledge aggregation

Design Philosophy (1) ----- about people
  Teamwork is needed
    Nobody can know everything
    But everyone is an expert in something
    Everybody knows something: your dog, your department, your favorite TV show
  You can build big things from small pieces
    One expert can write several articles for an encyclopedia
    And hundreds of experts can work together.
  However, people always have different viewpoints
    Conflict: does the 21st century begin in 2000 or 2001?
    Redundancy: IraqWar, WarInIraq, GulfWarII

Design Philosophy (2) ----- about agents and software
  Small pieces of ontology are generated by agents
    Those agents are domain experts or trained agents
    A light-weight, browser-based ontology editor that requires minimal user effort
    Automatic and controllable information collection by software robots
  The ontology repository is maintained by machine learning algorithms
    Ontology mapping on controlled topics
    Detect and reduce redundancy and conflicts by inference

A Desirable Case -- Pop Music Ontology (1)
  Suppose we want to build an ontology and knowledge base about pop music, called PopOnt.
    (Callout: "Even kids know")
  John is a teenage student who knows nothing about ontology, but he knows a lot about pop music. He’d like to share his knowledge with PopOnt.
    (Callout: "I’m willing to spend 5 minutes for you")
  There are millions of pop music fans like John, and their knowledge is complementary.
  Some of them may go to the PopOnt website and write one or two simple sentences, like [M. Jackson] [isn’t] a [country music artist].
  They may also correct others’ mistakes.

A Desirable Case -- Pop Music Ontology (2)
    (Callout: "You don’t even need to go to the website")
  There are also mailing lists, newsgroups, weblogs, P2P applications, and websites about pop music, which can be used for validation or mining.
  For example, if [M. Jackson] rarely co-occurs with [country music], it is more likely that [M. Jackson] [isn’t] a [country music artist] is true.
    (Callout: "Agent can be expert, too.")
  It is more desirable if those articles have a subject, an abstract, or even keywords, which can be used as labeled instances for machine learning.
  New concepts can be mined and cross-validated by people, too.
  Finally, PopOnt is built in a couple of months and is free for everyone to use.
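
The validation idea on this slide, that seldom seeing [M. Jackson] together with [country music] makes the negative statement more plausible, can be sketched in Python. This is only an illustration under assumptions: the corpus is a made-up toy example, and `cooccurrence_rate` is a hypothetical helper, not part of COB.

```python
def cooccurrence_rate(docs, term_a, term_b):
    """Fraction of documents mentioning term_a that also mention term_b."""
    with_a = [d for d in docs if term_a.lower() in d.lower()]
    if not with_a:
        return 0.0
    with_both = [d for d in with_a if term_b.lower() in d.lower()]
    return len(with_both) / len(with_a)


# Toy corpus, invented for illustration only.
docs = [
    "M. Jackson released a new pop album",
    "M. Jackson tops the pop chart again",
    "A country music festival opens this weekend",
    "M. Jackson concert review: pop at its best",
]

country_rate = cooccurrence_rate(docs, "M. Jackson", "country music")
pop_rate = cooccurrence_rate(docs, "M. Jackson", "pop")
```

A low rate for a pair of terms is weak evidence only; on this slide's plan it would be one input to cross-validation by people and other agents, not a verdict.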

Key Difficulties 1 : Logic breakdown
  How to make ontology editing as easy as writing a diary?
  [Figure: an ontology of classes and slots (Class > SubClass > SubSubClass) and instances, broken down into a list of [subject][predicate][object] sentences]
  Can a complex ontology be broken down into a group of single sentences?
    In other words, how can a complex description logic statement be decomposed into very simple FOPL sentences?
    The inverse composition is also needed.
  Each single sentence is as simple as "A is B" or "A has B".
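
A minimal sketch of the breakdown for the simplest case, a class hierarchy, is given here in Python. It is an assumption-laden illustration: the nested-dict representation and the function names `decompose` and `compose` are hypothetical, and it covers only subclass relations, not general description logic statements.

```python
def decompose(tree, parent=None):
    """Flatten a nested {class: {subclass: ...}} dict into single triples."""
    triples = []
    for name, children in tree.items():
        if parent is not None:
            triples.append((name, "rdfs:subClassOf", parent))
        triples.extend(decompose(children, name))
    return triples


def compose(triples):
    """Rebuild the nested dict from subClassOf triples (inverse of decompose)."""
    nodes = {}
    has_parent = set()
    for child, predicate, parent in triples:
        if predicate != "rdfs:subClassOf":
            continue
        child_node = nodes.setdefault(child, {})
        nodes.setdefault(parent, {})[child] = child_node
        has_parent.add(child)
    # Roots are the classes that never appear as a child.
    return {name: sub for name, sub in nodes.items() if name not in has_parent}


tree = {
    "Class": {
        "SubClass1": {"SubSubClass1": {}, "SubSubClass2": {}},
        "SubClass2": {"SubSubClass3": {}},
    }
}
triples = decompose(tree)
rebuilt = compose(triples)
```

Each triple can be contributed or corrected independently, which is the point of the breakdown; the round trip shows that nothing is lost for this restricted case.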

Key Difficulties 2 : Ontology Evolution
  How to refine an ontology through the cooperation of experts and software agents?
  People and agents are all error-prone.
    Interactive and iterative cross-validation is central.
  People are “lazy” and “natural”.
    An ontology piece may first be written in short natural language and later be refined by other people or agents into a more formal and more complex piece.
  Inference is needed to rule out conflicting information and to detect malicious or wrong information.

Key Difficulties 3 : Ontology Mining
  Where to collect source information? Google search? No.
    Pull: agents search and learn where the “good” sources are. This can be verified by whether the source is well cited (referenced) or not.
    Push: information is automatically pushed to agents via credible channels.
  Automatic extraction is still impossible
    It depends on NLP.
    Article summaries/keywords are helpful, especially when the summary overlaps with the existing ontology. Such summarized text can be used as labeled instances.
  Simplified tasks are feasible
    Is the keyword a consistent concept?
    Are some keywords related?
  Comparison: in content-based retrieval from video databases, automatic discovery of semantics based on image processing / pattern recognition has proven not very successful. Semantics from expert knowledge are needed in the MPEG-7 stream.

Key Difficulties 4 : Ontology Mapping
  People always give the same thing different names, or divide concepts into groups in multiple ways.
  Automatic general ontology mapping is still hard.
  Simplified mapping is more feasible while still useful
    Check whether a pair of concepts (with instances) is the same or not
    Detect redundancy and suggest merges.

Beyond INDUS
  INDUS is a distributed learning system, while COB is a MAS learning system
    Agents in different channels have different focuses for learning
    They work together toward the same goal.
  INDUS has a heavy-weight database mechanism, while COB aims at a light-weight implementation
    The ontology/KB is stored as atomic sentences
    Interface for dummies, not for gurus.
  Data sources are usually small but change quickly, and their number is huge.
  Queries use the inference power of the ontology language.

Semantic Web meets MAS
  COB is an application of MAS learning from data on the web
    Learn new concepts from instances
    Validate the concepts of other agents/humans
    The learner can take any form: BayesNet, Neural Net, Decision Tree, KNN
  Everything is about semantics
    Agents share an ontology but also have dialect issues
    Small pieces of semantics are carried by agents and aggregated in the “home”
    Guess semantics from labeled instances.
  An application that shows how to implement proof and trust on the Semantic Web

Ready Techniques
  Dynamic knowledge sharing
    RSS (RDF Site Summary): answers questions like "Who wrote this?", "When was this published?", and "What is/are the topic(s) of discussion?"
    RSS is widely used for news aggregation and automatic news discovery.
  Grid/Social Computation
    Grid: distribute the computation task across the internet and compose the results together.
    Blog and Wiki: easy-to-use site building tools, instead of an HTML editor. Topics are refined by the effort of a community.
  Peer-to-peer communication
    A local repository can be shared with other peers
    The other peer can be an agent in COB!
  However, they all lack semantics to some degree. The unfiltered information may flood the user.

Collaborative Ontology Building Example
  FOAF  http://xml.mfd-consult.dk/foaf/explorer/
  FOAF is an acronym for Friend of a Friend, an experimental project and vocabulary for the Semantic Web.
  It is based on the idea of a machine-readable version of the current World Wide Web, with homepages, mailing lists, travel itineraries, calendars, address books and the like.
  Everyone can join and add their own information
  It’s RDF based

Collaborative Ontology Building Example
  Wikipedia
  170,000 concepts in English alone, more in other languages.
  An open encyclopedia
  Everyone can edit any page.
    Based on the assumption that most people are nice
    And it’s proven true!
  Limitation: the relations between items are not formal, and it is readable by humans only (at least for now)

Collaborative Ontology Building Example
  Open Directory Project  http://www.dmoz.org/
  60,000 editors
  460,000 concepts
  Collaborative taxonomy building
  Open to everyone
  Limitation: taxonomy only

System design
  [Figure: system architecture with four labeled parts A, B, C, D; components named in the figure are listed below]
  Human Expert, Browser
  OntoWiki, OWL-like syntax
  Sources: Email list, Newsgroup, Forum, Blog, Wiki, P2P node
  Semantic RSS-aware Channels
  Agents: Ontology Mining, Ontology Alignment
  Ontology Repository: Version Control, Redundancy Check, Conflict Check, Cross Validation

Part A (1): OntoWiki
  Everyone can edit any concept
  Version control is enabled
  Ontology-guided editing
  Should have an ontology visualizer

Part A (2): OWL-like syntax
  A subset of OWL is used
  Single statements are RDF-like triples: [subject] [predicate] [object]
  Namespaces are used: cob:instanceOf, owl:Class, rdfs:subClassOf
  The core COB language is defined in its own namespace (see the term list below)

  // COB terms
  cob:equals cob:documentation

  // OWL terms
  owl:AllDifferent owl:allValuesFrom owl:backwardCompatibleWith owl:cardinality owl:Class
  owl:complementOf owl:DatatypeProperty owl:DeprecatedClass owl:DeprecatedProperty
  owl:differentFrom owl:disjointWith owl:distinctMembers owl:equivalentClass
  owl:equivalentProperty owl:FunctionalProperty owl:hasValue owl:imports owl:incompatibleWith
  owl:intersectionOf owl:InverseFunctionalProperty owl:inverseOf owl:maxCardinality
  owl:minCardinality owl:Nothing owl:ObjectProperty owl:oneOf owl:onProperty owl:Ontology
  owl:priorVersion owl:Restriction owl:sameAs owl:someValuesFrom owl:SymmetricProperty
  owl:Thing owl:TransitiveProperty owl:unionOf owl:versionInfo
  rdf:List rdf:nil rdf:type
  rdfs:comment rdfs:Datatype rdfs:domain rdfs:label rdfs:Literal rdfs:range
  rdfs:subClassOf rdfs:subPropertyOf

Part A (3): Instance Example
  Source:
    # [cob:Instance]
    # [cob:instanceOf] [Student]
    # [cob:instanceOf] [Chinese]
    # [cob:equals] [鲍捷]
    # [hasSurname] Bao
    # [hasFirstname] Jie
    # [worksOn] [semanticWeb]
    # [worksOn] [MAS]
    # [worksOn] [complexSystem]
    # [advisedBy] [Honavar]
    # [memberOf] [aiLab]
    # [hasEmail] baojie@cs.iastate.edu
    # [hasHomepage] http://www.cs.iastate.edu/~baojie
    # [cob:documentation] Hi, I love cats

  Screen shows:
    BaoJie
    cob:Instance
    cob:instanceOf Student ?
    cob:instanceOf Chinese ?
    cob:equals 鲍捷
    hasSurname Bao
    hasFirstname Jie
    worksOn semanticWeb ?
    worksOn MAS ?
    worksOn complexSystem ?
    advisedBy Honavar ?
    memberOf aiLab ?
    hasEmail baojie@cs.iastate.edu
    hasHomepage http://www.cs.iastate.edu/~baojie
    cob:documentation Hi, I love cats
    Edit this page   More info...   Attach file...
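
To illustrate how such a page could be read by software, here is a small Python sketch that turns the source lines into triples. The parsing rules are assumptions inferred from this one example (a bracketed object is a concept reference, a bare object is a literal, a line with no object is a declaration); `parse_instance` is a hypothetical helper, not the project's parser.

```python
import re

# "# [predicate] rest-of-line", as in the source listing on this slide.
LINE = re.compile(r"^#\s*\[([^\]]+)\]\s*(.*)$")


def parse_instance(subject, source):
    """Turn wiki source lines into (subject, predicate, object, kind) tuples."""
    triples = []
    for raw in source.splitlines():
        match = LINE.match(raw.strip())
        if match is None:
            continue  # not a statement line
        predicate = match.group(1).strip()
        rest = match.group(2).strip()
        if rest.startswith("[") and rest.endswith("]"):
            triples.append((subject, predicate, rest[1:-1].strip(), "concept"))
        elif rest:
            triples.append((subject, predicate, rest, "literal"))
        else:
            # Assumption: a lone "[cob:Instance]" declares the page's kind.
            triples.append((subject, predicate, None, "declaration"))
    return triples


source = """\
# [cob:Instance]
# [cob:instanceOf] [Student]
# [hasSurname] Bao
# [worksOn] [semanticWeb]
# [cob:documentation] Hi, I love cats
"""
triples = parse_instance("BaoJie", source)
```

Separating concept references from literals matters for the rendering shown on the slide, where concept names appear followed by a "?" marker and literals appear as plain text.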

Part A (4): Name Space
  Java-like package naming, which shows the relatedness of concepts even when they don’t inherit from the same concept.
  Packages form a DAG
  Internationalization is enabled

  //cob:Thing.Country.US.Iowa.Ames.ISU
  //cob:Thing.Education.University.Iowa.ISU
  [cob:instanceOf] [PublicUniversity]
  [cob:instanceOf] [dmoz:University]
  [cob:equals] [Iowa State University]

  // cobZH:事物.美国大学.艾奥瓦州立大学   (in English: Thing.US universities.Iowa State University)
  [cob:language] zh // Chinese
  [cob:equals] [cob:Thing.Country.US.Iowa.Ames.ISU]

  //cob:Thing.Education.University.Idaho.ISU
  [cob:instanceOf] [PublicUniversity]
  [cob:instanceOf] [dmoz:University]
  [cob:equals] [Idaho State University]
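
One way to read "relatedness" off package names is the longest shared package prefix. The Python sketch below is an assumption about how that might be computed; `common_package` is a hypothetical helper and the slide does not specify a measure.

```python
def common_package(a, b):
    """Longest shared dotted prefix of two package-style concept names."""
    shared = []
    for part_a, part_b in zip(a.split("."), b.split(".")):
        if part_a != part_b:
            break
        shared.append(part_a)
    return ".".join(shared)


iowa_edu = "Thing.Education.University.Iowa.ISU"
idaho_edu = "Thing.Education.University.Idaho.ISU"
iowa_geo = "Thing.Country.US.Iowa.Ames.ISU"

edu_prefix = common_package(iowa_edu, idaho_edu)
geo_prefix = common_package(iowa_edu, iowa_geo)
```

The two ISU entries under Education share a deep prefix, so they are related as universities while remaining distinct concepts; the geographic and educational names of the same university share only the root, which is why the explicit [cob:equals] link is still needed.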

Part B: Semantic RSS
  RSS has no semantics
  We can use Dublin Core to enhance RSS
  Keywords are concepts or concept candidates in the ontology
  Agents listen to S-RSS channels and discover new concepts

  <channel rdf:about="http://boole.cs.iastate.edu:9090/COB/">
    <title>COB Project</title>
    <link>http://boole.cs.iastate.edu:9090/COB/</link>
    <description>AI Ontology</description>
    <language>en-us</language>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://boole.cs.iastate.edu:9090/COB/Wiki.jsp?page=Main" />
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://boole.cs.iastate.edu:9090/COB/Wiki.jsp?page=Main">
    <title>Main</title>
    <link>http://boole.cs.iastate.edu:9090/COB/Wiki.jsp?page=Main</link>
    <description>129.186.93.7 changed this page on Wed Dec 03 19:18:23 CST 2003:<br /><hr /><br /></description>
    <wiki:version>27</wiki:version>
    <wiki:diff>http://boole.cs.iastate.edu:9090/COB/Diff.jsp?page=Main&amp;r1=-1</wiki:diff>
    <dc:date>2003-12-04T01:18:23Z</dc:date>
    <dc:contributor>
      <rdf:Description>
        <rdf:value>129.186.93.7</rdf:value>
      </rdf:Description>
    </dc:contributor>
    <wiki:history>http://boole.cs.iastate.edu:9090/COB/PageInfo.jsp?page=Main</wiki:history>
  </item>
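
As a sketch of how an agent might listen to such a channel, the Python below reads Dublin Core keywords from an RSS 1.0 item and reports those not yet in the ontology. Several things here are assumptions: the use of `dc:subject` to carry keywords, the sample feed and its example.org URL, and the helper name `candidate_concepts`. Only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical feed: one item annotated with dc:subject keywords.
FEED = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/post/1">
    <title>Example post</title>
    <dc:subject>SemanticWeb</dc:subject>
    <dc:subject>Election2004</dc:subject>
  </item>
</rdf:RDF>
"""


def candidate_concepts(feed_xml, known_concepts):
    """Return keywords in the feed that are not yet known concepts."""
    root = ET.fromstring(feed_xml)
    candidates = []
    for item in root.findall("rss:item", NS):
        for subject in item.findall("dc:subject", NS):
            keyword = (subject.text or "").strip()
            if keyword and keyword not in known_concepts and keyword not in candidates:
                candidates.append(keyword)
    return candidates


new_concepts = candidate_concepts(FEED, {"SemanticWeb", "MAS"})
```

Keywords that already match the ontology could instead serve as labels for the item's text, which is the "labeled instance" use mentioned on the earlier slides.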

Part C (1): Agent
  Each agent does
    Trace back the information source and check its credibility.
    Do filtering and text normalization
    Extract new concepts from instances
    Extract possible general relationships (like [cob:alsoSee]) between concepts
  And they may differ
    They need not use the same learning algorithm
    Learning from email headers is different from learning from free text content
  Dialect
    Agent 1: I listen to the Idaho S.U. mailing list and know ISU = Idaho State University
    Agent 2: I watch a blog in Iowa and know ISU = Iowa State University
  Communication helps
    Agent 1: P([M. Jackson]^[CountryMusic])=0.1
    Agent 2: P([M. Jackson]^[CountryMusic])=0.03

Part C (2): Ontology Alignment
  Do mapping on restricted cases
    When an agent or expert doubts whether some concepts are the same, it asks OntologyAlignmenter, supplying the instance sets
  Merge detected duplicate concepts like IraqWar and WarInIraq
    Be careful: UniversityOfWashington and WashingtonUniversity are different. This can be learnt from instances.
  Manual alignment is enabled, too
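
A minimal sketch of an instance-based check follows, in Python. The slide does not say how OntologyAlignmenter compares instance sets; Jaccard overlap, the 0.5 threshold, the function names, and the placeholder document IDs are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Overlap of two instance sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_concept(instances_a, instances_b, threshold=0.5):
    """Suggest a merge when the instance sets overlap enough (assumed rule)."""
    return jaccard(instances_a, instances_b) >= threshold


# Placeholder instance IDs, e.g. documents tagged with each concept.
iraq_war = {"doc1", "doc2", "doc3"}
war_in_iraq = {"doc2", "doc3", "doc4"}
university_of_washington = {"doc5", "doc6"}
washington_university = {"doc7", "doc8"}

merge_wars = same_concept(iraq_war, war_in_iraq)
merge_universities = same_concept(university_of_washington, washington_university)
```

Similar names with disjoint instances are kept apart, while differently named concepts with shared instances are proposed for merging; a suggestion would still go through the manual alignment step.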

Part D : Ontology Repository
  Version control
    Keep versions for each concept, lock mature concepts, detect malicious changes
  Redundancy check
    [I.S.U] [cob:instanceOf] [University]
    [I.S.U] [cob:alsoSee] [Cyclone]
    [Iowa State University] [cob:instanceOf] [PublicUniversity]
    [Iowa State University] [cob:alsoSee] [Cyclone]
    [PublicUniversity] [cob:subClassOf] [University]
  Conflict check
    [ISU] [locatedIn] [Ames]
    [ISU] [locatedIn] [Des Moines]
  Cross validation
    Score each agent and expert for its credibility
    Check the soundness of an input against its peers’ inputs.
  Refactoring (rename, remove, merge)
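
The two checks on this slide can be sketched over the same triples in Python. The rules used are assumptions, not the repository's actual logic: a conflict is a predicate declared single-valued that has more than one object for a subject, and a merge candidate is a group of subjects sharing a [cob:alsoSee] target. `SINGLE_VALUED`, `find_conflicts`, and `find_merge_candidates` are hypothetical names.

```python
from collections import defaultdict

TRIPLES = [
    ("I.S.U", "cob:instanceOf", "University"),
    ("I.S.U", "cob:alsoSee", "Cyclone"),
    ("Iowa State University", "cob:instanceOf", "PublicUniversity"),
    ("Iowa State University", "cob:alsoSee", "Cyclone"),
    ("PublicUniversity", "cob:subClassOf", "University"),
    ("ISU", "locatedIn", "Ames"),
    ("ISU", "locatedIn", "Des Moines"),
]

# Assumption: predicates that may hold only one value per subject.
SINGLE_VALUED = {"locatedIn"}


def find_conflicts(triples):
    """Subjects with more than one value for a single-valued predicate."""
    values = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate in SINGLE_VALUED:
            values[(subject, predicate)].add(obj)
    return {key: sorted(objs) for key, objs in values.items() if len(objs) > 1}


def find_merge_candidates(triples):
    """Groups of subjects that point to the same cob:alsoSee target."""
    related = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "cob:alsoSee":
            related[obj].add(subject)
    return [sorted(group) for group in related.values() if len(group) > 1]


conflicts = find_conflicts(TRIPLES)
merge_candidates = find_merge_candidates(TRIPLES)
```

A fuller redundancy check would also use the subclass triple, since [PublicUniversity] being a subclass of [University] makes the two type statements compatible rather than contradictory.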

Summary
  What’s new
    Light-weight ontology editor for a community
    Collaborative, distributed ontology learning based on logic decomposition
    Semantic extension to RSS
    Multiagent ontology mining from trusted channels.
    Ontology management based on proof and trust
  What COB doesn’t try to do
    Solve ontology mapping in general
    Solve ontology extraction from free text in general
