Collaborative Ontology Building Project


Published in: Technology, Education

  1. 1. Collaborative Ontology Building Project - a multiagent-based ontology editing and discovery environment Jie Bao Artificial Intelligence Research Laboratory Dept of Computer Science Iowa State University Ames IA 50010 [email_address] Project homepage: A Research proposal Dec 02, 2003
  2. 2. COB <ul><li>Without SHOE how can you be a RACER ? </li></ul><ul><li>Without Sesame how can you make OIL ? </li></ul><ul><li>Semantic Web is a plan of good </li></ul><ul><li>But with no ontology it’s only a nil. </li></ul><ul><li>Everyone makes a small piece of brick </li></ul><ul><li>Not in one day can we make Rome real. </li></ul><ul><li>Let’s build ontology together and hard </li></ul><ul><li>Just like ants build their hill. </li></ul>
  3. 3. Outline <ul><li>Objectives </li></ul><ul><li>Key difficulties </li></ul><ul><li>Background review </li></ul><ul><li>A tentative framework </li></ul>
  4. 4. What is the problem <ul><li>The Semantic Web needs general, open ontology libraries, but ontology building is a time-consuming, knowledge-intensive process. </li></ul><ul><ul><li>Domain experts are needed, and nobody has full knowledge </li></ul></ul><ul><ul><li>Also, intellectual property and copyright issues hinder the wide use of commercial ontologies (e.g. Cyc) </li></ul></ul><ul><li>Automatic ontology discovery and mapping are still impossible in general </li></ul><ul><li>Existing ontology editing and discovery tools are standalone and too complex </li></ul><ul><ul><li>Not suitable for team ontology generation. </li></ul></ul><ul><ul><li>The jargon is daunting for ordinary people who know little about ontologies. </li></ul></ul><ul><li>Data sources are distributed, heterogeneous, and dynamic </li></ul><ul><ul><li>New concepts appear every day: Election2004 </li></ul></ul>
  5. 5. Related problems <ul><li>Distributed Learning </li></ul><ul><ul><li>Learning from multiple distributed, heterogeneous, dynamic datasets </li></ul></ul><ul><li>Software engineering </li></ul><ul><ul><li>Concurrent version control and management </li></ul></ul><ul><ul><li>Open source issues (copyright vs. copyleft) </li></ul></ul><ul><li>Knowledge Management </li></ul><ul><ul><li>Knowledge sharing in a group/project </li></ul></ul><ul><ul><li>Automatic knowledge aggregation </li></ul></ul>
  6. 6. Design Philosophy (1) ----- about people <ul><li>Teamwork is needed </li></ul><ul><ul><li>Nobody can know everything </li></ul></ul><ul><li>But everyone is an expert in something </li></ul><ul><ul><li>Everybody knows something: your dog, your department, your favorite TV show </li></ul></ul><ul><li>You can build big things from small pieces </li></ul><ul><ul><li>One expert can write several articles for an encyclopedia </li></ul></ul><ul><ul><li>And hundreds of experts can work together. </li></ul></ul><ul><li>However, people always have different viewpoints </li></ul><ul><ul><li>Conflict: does the 21st century begin in 2000 or 2001? </li></ul></ul><ul><ul><li>Redundancy: IraqWar, WarInIraq, GulfWarII </li></ul></ul>
  7. 7. Design Philosophy (2) ----- about agent and software <ul><li>Small pieces of ontologies are generated by agents </li></ul><ul><ul><li>Those agents are domain experts or trained agents </li></ul></ul><ul><ul><li>Light-weight ontology editor which requires minimal user effort: browser-based </li></ul></ul><ul><ul><li>Automatic and controllable information collection by software robots. </li></ul></ul><ul><li>Ontology repository is maintained by machine learning algorithms </li></ul><ul><ul><li>Ontology mapping on controlled topics. </li></ul></ul><ul><ul><li>Detect and reduce redundancy and conflicts by inference </li></ul></ul>
  8. 8. A Desirable Case -- Pop Music Ontology (1) <ul><li>Suppose we want to build an ontology and knowledge base about pop music called PopOnt </li></ul><ul><li>Even kids know: John is a teenage student who knows nothing about ontologies, but he knows a lot about pop music. He’d like to share his knowledge with PopOnt. </li></ul><ul><li>"I’m willing to spend 5 minutes for you": There are millions of pop music fans like John, and their knowledge is complementary. Some of them may go to the PopOnt website and write one or two simple sentences, like [M. Jackson] [isn’t] a [country music artist]. They may also correct others’ mistakes. </li></ul>
  9. 9. A Desirable Case -- Pop Music Ontology (2) <ul><li>You don’t even need to go to the website: There are also mailing lists, newsgroups, weblogs, p2p applications and websites about pop music, which can be used for validation or mining. For example, if [M. Jackson] rarely co-occurs with [country music], it is more likely that [M. Jackson] [isn’t] a [country music artist] is true. </li></ul><ul><li>Agents can be experts, too: It is even more desirable if those articles have a subject, abstract, or keywords, which can be used as labeled instances for machine learning. New concepts can be mined and cross-validated by people, too. </li></ul><ul><li>Finally, PopOnt is built in a couple of months and is free for everyone to use. </li></ul>
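The co-occurrence check described on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the substring matching, the sample posts, and the 0.05 threshold are all hypothetical and not part of the COB proposal.

```python
def cooccurrence_rate(documents, term_a, term_b):
    """Fraction of documents mentioning term_a that also mention term_b."""
    with_a = [d for d in documents if term_a.lower() in d.lower()]
    if not with_a:
        return None  # no evidence either way
    both = [d for d in with_a if term_b.lower() in d.lower()]
    return len(both) / len(with_a)


def supports_negation(documents, term_a, term_b, threshold=0.05):
    """True if term_a so rarely co-occurs with term_b that
    '[term_a] [isn't] a [term_b ...]' looks plausible.
    The threshold is an arbitrary, hypothetical cut-off."""
    rate = cooccurrence_rate(documents, term_a, term_b)
    return rate is not None and rate <= threshold


# Hypothetical mailing-list or weblog snippets.
POSTS = [
    "M. Jackson released a new pop album",
    "M. Jackson video on the pop charts",
    "country music charts this week",
    "M. Jackson tour dates announced",
]

if __name__ == "__main__":
    print(cooccurrence_rate(POSTS, "M. Jackson", "country music"))
    print(supports_negation(POSTS, "M. Jackson", "country music"))
```

A real agent would need proper entity matching instead of substring tests, but the idea is the same: weak co-occurrence is treated as evidence for the negative statement, which people can then cross-validate.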
  10. 10. Outline <ul><li>Objectives </li></ul><ul><li>Key difficulties </li></ul><ul><li>Background review </li></ul><ul><li>A tentative framework </li></ul>
  11. 11. Key Difficulties 1 : Logic breakdown <ul><li>How can ontology editing be made as easy as writing a diary? </li></ul>[Diagram: an ontology, consisting of classes and slots (the class tree below) plus instances, is broken down into single sentences of the form [subject][predicate][object].] <ul><li>Class </li></ul><ul><ul><li>SubClass </li></ul></ul><ul><ul><ul><li>SubSubClass </li></ul></ul></ul><ul><ul><ul><li>SubSubClass </li></ul></ul></ul><ul><ul><li>SubClass </li></ul></ul><ul><ul><ul><li>SubSubClass </li></ul></ul></ul><ul><ul><ul><li>SubSubClass </li></ul></ul></ul>Can a complex ontology be broken down into a group of single sentences? In other words, how can a complex description logic statement be decomposed into very simple FOPL sentences? The inverse composition is also needed. Each single sentence is as simple as "A is B" or "A has B".
  12. 12. Key Difficulties 2 : Ontology Evolution <ul><li>How can an ontology be refined through the cooperation of experts and software agents? </li></ul><ul><li>People and agents are all error-prone. Interactive and iterative cross-validation is central. </li></ul><ul><li>People are “lazy” and “natural”. An ontology piece may first be written in short natural language and later be refined by other people or agents into a more formal and more complex piece. </li></ul><ul><li>Inference is needed to rule out conflicting information and to detect malicious or wrong information </li></ul>
  13. 13. Key Difficulties 3 : Ontology Mining <ul><li>Where to collect source information? </li></ul><ul><ul><li>Google search? No </li></ul></ul><ul><ul><li>Pull: agents search and learn where the “good” sources are. This can be verified by whether the source is well cited (referenced) or not. </li></ul></ul><ul><ul><li>Push: information is automatically pushed to agents via credible channels. </li></ul></ul><ul><li>Automatic extraction is still impossible </li></ul><ul><ul><li>Depends on NLP </li></ul></ul><ul><ul><li>Article summaries/keywords are helpful, especially when the summary overlaps with the existing ontology. </li></ul></ul><ul><ul><li>Such summarized text can be used as a labeled instance. </li></ul></ul><ul><li>Simplified tasks are feasible </li></ul><ul><ul><li>Is the keyword a consistent concept? </li></ul></ul><ul><ul><li>Are some keywords related? </li></ul></ul>Comparison: In content-based retrieval from video databases, automatic discovery of semantics based on image processing / pattern recognition has not proven very successful. Semantics from expert knowledge are needed in the MPEG-7 stream.
  14. 14. Key Difficulties 4 : Ontology Mapping <ul><li>People often give the same thing different names, or divide concepts into groups in multiple ways. </li></ul><ul><li>Automatic general ontology mapping is still hard. </li></ul><ul><li>Simplified mapping is more feasible while still useful </li></ul><ul><ul><li>Check whether a pair of concepts (with instances) is the same or not </li></ul></ul><ul><ul><li>Detect redundancy and suggest merges. </li></ul></ul>
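As a minimal sketch of the decomposition and inverse composition asked for on this slide, the class tree can be flattened into [subject][predicate][object] sentences and rebuilt from them. The class names with A/B suffixes and the choice of `rdfs:subClassOf` as the only predicate are assumptions made for illustration; the slide does not prescribe an algorithm.

```python
from collections import defaultdict

# Hypothetical hierarchy mirroring the Class / SubClass / SubSubClass outline.
HIERARCHY = {
    "Class": {
        "SubClassA": {"SubSubClassA1": {}, "SubSubClassA2": {}},
        "SubClassB": {"SubSubClassB1": {}, "SubSubClassB2": {}},
    }
}


def decompose(tree, parent=None):
    """Flatten a nested hierarchy into (subject, predicate, object) sentences."""
    triples = []
    for name, children in tree.items():
        if parent is not None:
            triples.append((name, "rdfs:subClassOf", parent))
        triples.extend(decompose(children, name))
    return triples


def compose(triples):
    """Rebuild the nested hierarchy from subClassOf sentences (inverse step)."""
    children = defaultdict(list)
    subjects, objects = set(), set()
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            children[o].append(s)
            subjects.add(s)
            objects.add(o)
    roots = objects - subjects  # classes that are never a subclass

    def build(node):
        return {child: build(child) for child in children[node]}

    return {root: build(root) for root in sorted(roots)}


if __name__ == "__main__":
    for triple in decompose(HIERARCHY):
        print(triple)
```

Each emitted sentence has the "A is B" shape, so it could be entered or corrected independently by a different contributor. Richer description logic constructs (restrictions, cardinalities) would need more predicates than this sketch covers.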
  15. 15. Outline <ul><li>Objectives </li></ul><ul><li>Key difficulties </li></ul><ul><li>Background review </li></ul><ul><li>A tentative framework </li></ul>
  16. 16. Beyond INDUS <ul><li>INDUS is a distributed learning system, while COB is a MAS learning system </li></ul><ul><ul><li>Agents in different channels have different focuses for learning </li></ul></ul><ul><ul><li>They work together toward the same goal. </li></ul></ul><ul><li>INDUS has a heavy-weight database mechanism while COB aims at a light-weight implementation </li></ul><ul><ul><li>Ontology/KB are stored as atomic sentences </li></ul></ul><ul><ul><li>Interface for dummies, not for gurus. </li></ul></ul><ul><ul><li>Data sources are usually small but change quickly, and their number is huge. </li></ul></ul><ul><ul><li>Queries use the inference power of the ontology language. </li></ul></ul>
  17. 17. Semantic Web meets MAS <ul><li>COB is an application of MAS learning from data on the web </li></ul><ul><ul><li>Learn new concepts from instances </li></ul></ul><ul><ul><li>Validate concepts from other agents/humans </li></ul></ul><ul><ul><li>The learner can take any form: BayesNet, Neural Net, Decision Tree, KNN </li></ul></ul><ul><li>Everything is about semantics </li></ul><ul><ul><li>Agents share an ontology but also have dialect issues </li></ul></ul><ul><ul><li>Small pieces of semantics are carried by agents and aggregated in the “home” </li></ul></ul><ul><ul><li>Guess semantics from labeled instances. </li></ul></ul><ul><li>An application that shows how to implement proof and trust on the Semantic Web </li></ul>
  18. 18. Ready Techniques <ul><li>Dynamic knowledge sharing </li></ul><ul><ul><li>RSS (RDF Site Summary): answers questions like "Who wrote this?", "When was this published?", and "What is/are the topic(s) of discussion?" </li></ul></ul><ul><ul><li>RSS is widely used for news aggregation and automatic news discovery. </li></ul></ul><ul><li>Grid/Social Computation </li></ul><ul><ul><li>Grid: distribute the computation task across the internet and combine the results. </li></ul></ul><ul><ul><li>Blog and Wiki: easy-to-use site-building tools, instead of an HTML editor. Topics are refined by the effort of a community. </li></ul></ul><ul><li>Peer-to-peer communication </li></ul><ul><ul><li>A local repository can be shared with other peers </li></ul></ul><ul><ul><li>The other peer can be an agent in COB! </li></ul></ul><ul><li>However, they all lack semantics to some degree. The unfiltered information may flood the user. </li></ul>
  19. 19. Collaborative Ontology Building Example FOAF <ul><li> </li></ul><ul><li>FOAF is an acronym for Friend of a Friend, an experimental project and vocabulary for the Semantic Web. </li></ul><ul><li>It is based on the idea of a machine-readable version of the current World Wide Web, with homepages, mailing lists, travel itineraries, calendars, address books and the like. </li></ul><ul><li>Everyone can join and add their own information </li></ul><ul><li>It is RDF-based </li></ul>
  20. 20. Collaborative Ontology Building Example wikipedia <ul><li>170,000 concepts in English alone, more in other languages. </li></ul><ul><li>An open encyclopedia </li></ul><ul><li>Everyone can edit any page. </li></ul><ul><li>Based on the assumption that most people are nice </li></ul><ul><li>And it’s proven true! </li></ul><ul><li>Limitation: the relations between items are not formal, and it is human-readable only (at least for now) </li></ul>
  21. 21. Collaborative Ontology Building Example Open Directory Project <ul><li>http:// / </li></ul><ul><li>60,000 editors, 460,000 concepts </li></ul><ul><li>Collaborative taxonomy building </li></ul><ul><li>Open to everyone </li></ul><ul><li>Limitation: Taxonomy only </li></ul>
  22. 22. Outline <ul><li>Objectives </li></ul><ul><li>Key difficulties </li></ul><ul><li>Background review </li></ul><ul><li>A tentative framework </li></ul>
  23. 23. System design [Architecture diagram with four parts. A: human experts edit through OntoWiki using an OWL-like syntax. B: email lists, newsgroups, forums, blogs, wikis and P2P nodes feed Semantic RSS-aware channels. C: agents for ontology mining, browsing and ontology alignment. D: the ontology repository, which provides:] <ul><li>Version Control </li></ul><ul><li>Redundancy Check </li></ul><ul><li>Conflict Check </li></ul><ul><li>Cross Validation </li></ul>
  24. 24. Part A (1): OntoWiki <ul><li>Everyone can edit any concept </li></ul><ul><li>Version control is enabled </li></ul><ul><li>Ontology-guided editing </li></ul><ul><li>Should have an ontology visualizer </li></ul>
  25. 25. Part A (2): OWL-like syntax <ul><li>// COB terms </li></ul><ul><li>cob:equals </li></ul><ul><li>cob:documentation </li></ul><ul><li>// OWL terms </li></ul><ul><li>owl:AllDifferent </li></ul><ul><li>owl:allValuesFrom </li></ul><ul><li>owl:backwardCompatibleWith </li></ul><ul><li>owl:cardinality </li></ul><ul><li>owl:Class </li></ul><ul><li>owl:complementOf </li></ul><ul><li>owl:DatatypeProperty </li></ul><ul><li>owl:DeprecatedClass </li></ul><ul><li>owl:DeprecatedProperty </li></ul><ul><li>owl:differentFrom </li></ul><ul><li>owl:disjointWith </li></ul><ul><li>owl:distinctMembers </li></ul><ul><li>owl:equivalentClass </li></ul><ul><li>owl:equivalentProperty </li></ul><ul><li>owl:FunctionalProperty </li></ul><ul><li>owl:hasValue </li></ul><ul><li>owl:imports </li></ul><ul><li>owl:incompatibleWith </li></ul><ul><li>owl:intersectionOf </li></ul><ul><li>owl:InverseFunctionalProperty </li></ul><ul><li>owl:inverseOf </li></ul><ul><li>owl:maxCardinality </li></ul><ul><li>owl:minCardinality </li></ul><ul><li>owl:Nothing </li></ul><ul><li>owl:ObjectProperty </li></ul><ul><li>owl:oneOf </li></ul><ul><li>owl:onProperty </li></ul><ul><li>owl:Ontology </li></ul><ul><li>owl:priorVersion </li></ul><ul><li>owl:Restriction </li></ul><ul><li>owl:sameAs </li></ul><ul><li>owl:someValuesFrom </li></ul><ul><li>owl:SymmetricProperty </li></ul><ul><li>owl:Thing </li></ul><ul><li>owl:TransitiveProperty </li></ul><ul><li>owl:unionOf </li></ul><ul><li>owl:versionInfo </li></ul><ul><li>rdf:List </li></ul><ul><li>rdf:nil </li></ul><ul><li>rdf:type </li></ul><ul><li>rdfs:comment </li></ul><ul><li>rdfs:Datatype </li></ul><ul><li>rdfs:domain </li></ul><ul><li>rdfs:label </li></ul><ul><li>rdfs:Literal </li></ul><ul><li>rdfs:range </li></ul><ul><li>rdfs:subClassOf </li></ul><ul><li>rdfs:subPropertyOf </li></ul><ul><li>A subset of OWL is used </li></ul><ul><li>Each statement is an RDF-like triple: [subject] [predicate] [object] </li></ul><ul><li>Namespaces are used: cob:instanceOf, owl:Class, rdfs:subClassOf </li></ul><ul><li>The core COB language is defined in its own namespace (see the term list above) </li></ul>
  26. 26. Part A (3): Instance Example Source: <ul><li># [cob:Instance] </li></ul><ul><li># [cob:instanceOf] [Student] </li></ul><ul><li># [cob:instanceOf] [Chinese] </li></ul><ul><li># [cob:equals][ 鲍捷 ] </li></ul><ul><li># [hasSurname] Bao </li></ul><ul><li># [hasFirstname] Jie </li></ul><ul><li># [worksOn] [semanticWeb] </li></ul><ul><li># [worksOn] [MAS] </li></ul><ul><li># [worksOn] [complexSystem] </li></ul><ul><li># [advisedBy] [Honavar] </li></ul><ul><li># [memberOf] [aiLab] </li></ul><ul><li># [hasEmail] </li></ul><ul><li># [hasHomepage] </li></ul><ul><li># [cob:documentation] Hi, I love cats </li></ul>Screen shows: BaoJie cob:Instance cob:instanceOf Student ? cob:instanceOf Chinese ? cob:equals 鲍捷 hasSurname Bao hasFirstname Jie worksOn semanticWeb ? worksOn MAS ? worksOn complexSystem ? advisedBy Honavar ? memberOf aiLab ? hasEmail hasHomepage cob:documentation Hi, I love cats
  27. 27. Part A (4): Name Space <ul><li>Java-like package naming, which shows how concepts are related even when they don’t inherit from the same concept. </li></ul><ul><li>Packages form a DAG </li></ul><ul><li>Internationalization is enabled </li></ul>//cob:Thing.Country.US.Iowa.Ames.ISU //cob:Thing.Education.University.Iowa.ISU [cob:instanceOf] [PublicUniversity] [cob:instanceOf] [dmoz:University] [cob:equals] [Iowa State University] // cobZH: 事物 . 美国大学 . 艾奥瓦州立大学 [cob:language] zh // Chinese [cob:equals] [cob:Thing.Country.US.Iowa.Ames.ISU] //cob:Thing.Education.University.Idaho.ISU [cob:instanceOf] [PublicUniversity] [cob:instanceOf] [dmoz:University] [cob:equals] [Idaho State University]
  28. 28. Part B: Semantic RSS <ul><li>RSS has no semantics </li></ul><ul><li>We can use Dublin Core to enhance RSS </li></ul><ul><li>Keywords are concepts or concept candidates in the ontology </li></ul><ul><li>Agents listen to S-RSS channels and discover new concepts </li></ul><channel rdf:about=&quot;;> <title>COB Project</title> <link></link> <description>AI Ontology</description> <language>en-us</language> <items> <rdf:Seq> <rdf:li rdf:resource=&quot;; /> </rdf:Seq> </items> </channel> <item rdf:about=&quot;;> <title>Main</title> <link></link> <description> changed this page on Wed Dec 03 19:18:23 CST 2003:&lt;br />&lt;hr />&lt;br /></description> <wiki:version>27</wiki:version> <wiki:diff>;r1=-1</wiki:diff> <dc:date>2003-12-04T01:18:23Z</dc:date> <dc:contributor> <rdf:Description> <rdf:value></rdf:value> </rdf:Description> </dc:contributor> <wiki:history></wiki:history> </item>
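To illustrate how an agent might listen to such a channel, the sketch below pulls Dublin Core keywords out of an RSS 1.0 feed and splits them into known concepts and new concept candidates. It assumes that keywords are carried in `dc:subject` elements; the slide only says that Dublin Core can be used, so that choice, the sample feed, the example.org URLs and the function name are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF, RSS 1.0 and Dublin Core elements.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical feed fragment; not taken from the COB project.
SAMPLE_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/post/1">
    <title>New album review</title>
    <link>http://example.org/post/1</link>
    <dc:subject>PopMusic</dc:subject>
    <dc:subject>Election2004</dc:subject>
  </item>
</rdf:RDF>"""


def concept_candidates(feed_xml, known_concepts):
    """Split the dc:subject keywords of all items into (known, new) lists."""
    root = ET.fromstring(feed_xml)
    known, new = [], []
    for item in root.findall("rss:item", NS):
        for subject in item.findall("dc:subject", NS):
            keyword = (subject.text or "").strip()
            if not keyword:
                continue
            (known if keyword in known_concepts else new).append(keyword)
    return known, new


if __name__ == "__main__":
    print(concept_candidates(SAMPLE_FEED, {"PopMusic"}))
```

Keywords that already exist in the ontology can serve as labels for the item, while unseen ones become candidates to be cross-validated by people or other agents.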
  29. 29. Part C (1): Agent <ul><li>Each agent will </li></ul><ul><ul><li>Trace back the information source and check its credibility. </li></ul></ul><ul><ul><li>Do filtering and text normalization </li></ul></ul><ul><ul><li>Extract new concepts from instances </li></ul></ul><ul><ul><li>Extract possible general relationships (like [cob:alsoSee]) between concepts </li></ul></ul><ul><li>And they may differ </li></ul><ul><ul><li>They need not use the same learning algorithm </li></ul></ul><ul><ul><ul><li>Learning from email headers is different from learning from free-text content </li></ul></ul></ul><ul><ul><li>Dialect </li></ul></ul><ul><ul><ul><li>Agent 1: I listen to an Idaho S.U. mailing list and know ISU = Idaho State University </li></ul></ul></ul><ul><ul><ul><li>Agent 2: I watch a blog in Iowa and know ISU = Iowa State University </li></ul></ul></ul><ul><li>Communication helps </li></ul><ul><ul><li>Agent 1: P([M. Jackson]^[CountryMusic])=0.1 </li></ul></ul><ul><ul><li>Agent 2: P([M. Jackson]^[CountryMusic])=0.03 </li></ul></ul>
  30. 30. Part C (2): Ontology Alignment <ul><li>Do mapping on restricted cases </li></ul><ul><ul><li>When an agent or expert suspects that some concepts are the same, it asks the OntologyAlignmenter, supplying the instance sets </li></ul></ul><ul><ul><li>Merge detected duplicate concepts like IraqWar and WarInIraq </li></ul></ul><ul><ul><ul><li>Be careful: UniversityOfWashington and WashingtonUniversity are different. This can be learnt from instances. </li></ul></ul></ul><ul><li>Manual alignment is enabled, too </li></ul>
  31. 31. Part D : Ontology Repository <ul><li>Version control </li></ul><ul><ul><li>Keep versions for each concept, lock mature concepts, detect malicious changes </li></ul></ul><ul><li>Redundancy check </li></ul><ul><ul><li>[I.S.U] [cob:instanceOf] [University] [I.S.U] [cob:alsoSee] [Cyclone] </li></ul></ul><ul><ul><li>[Iowa State University] [cob:instanceOf] [PublicUniversity] [Iowa State University] [cob:alsoSee] [Cyclone] </li></ul></ul><ul><ul><li>[PublicUniversity] [cob:subClassOf][University] </li></ul></ul><ul><li>Conflict check </li></ul><ul><ul><li>[ISU] [locatedIn] [Ames] </li></ul></ul><ul><ul><li>[ISU] [locatedIn] [Des Moines] </li></ul></ul><ul><li>Cross validation </li></ul><ul><ul><li>Score each agent and expert for its credibility </li></ul></ul><ul><ul><li>Check the soundness of an input against its peers’ inputs. </li></ul></ul><ul><li>Refactoring (rename, remove, merge) </li></ul>
  32. 32. Summary <ul><li>What’s new </li></ul><ul><ul><li>Light-weight ontology editor for the community </li></ul></ul><ul><ul><li>Collaborative, distributed ontology learning based on logic decomposition </li></ul></ul><ul><ul><li>Semantic extension to RSS </li></ul></ul><ul><ul><li>Multiagent ontology mining from trusted channels. </li></ul></ul><ul><ul><li>Ontology management based on proof and trust </li></ul></ul><ul><li>COB doesn't want to </li></ul><ul><ul><li>Solve ontology mapping in general </li></ul></ul><ul><ul><li>Solve ontology extraction from free text in general </li></ul></ul>
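One simple way to compare two concepts by their instance sets is Jaccard overlap, sketched below. The slide does not say how the OntologyAlignmenter decides, so the measure, the 0.6 threshold, the function names and the sample instance names are all assumptions made for illustration.

```python
def jaccard(instances_a, instances_b):
    """Jaccard overlap of two instance sets (0.0 when both are empty)."""
    a, b = set(instances_a), set(instances_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_merge(instances_a, instances_b, threshold=0.6):
    """Suggest merging two concepts when their instances mostly coincide.
    The threshold is a hypothetical tuning parameter."""
    return jaccard(instances_a, instances_b) >= threshold


# Hypothetical instance sets attached to each concept.
IRAQ_WAR = {"event1", "event2", "event3"}
WAR_IN_IRAQ = {"event1", "event2", "event3", "event4"}
UNIVERSITY_OF_WASHINGTON = {"SeattleCampus", "uwStudent1"}
WASHINGTON_UNIVERSITY = {"StLouisCampus", "wuStudent1"}

if __name__ == "__main__":
    print(suggest_merge(IRAQ_WAR, WAR_IN_IRAQ))
    print(suggest_merge(UNIVERSITY_OF_WASHINGTON, WASHINGTON_UNIVERSITY))
```

Similar names alone would wrongly pair UniversityOfWashington with WashingtonUniversity; comparing instances keeps them apart while still flagging IraqWar and WarInIraq as merge candidates for a human or agent to confirm.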
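The redundancy and conflict examples on this slide can be checked mechanically over the stored triples, as in the rough sketch below. It rests on two assumptions that are not in the slides: `locatedIn` is treated as single-valued (functional), and two instances count as redundancy candidates when they share a `cob:alsoSee` target and one's class equals or subsumes the other's. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# The triples from the slide (with the university name spelled out).
TRIPLES = [
    ("I.S.U", "cob:instanceOf", "University"),
    ("I.S.U", "cob:alsoSee", "Cyclone"),
    ("Iowa State University", "cob:instanceOf", "PublicUniversity"),
    ("Iowa State University", "cob:alsoSee", "Cyclone"),
    ("PublicUniversity", "cob:subClassOf", "University"),
    ("ISU", "locatedIn", "Ames"),
    ("ISU", "locatedIn", "Des Moines"),
]

FUNCTIONAL = {"locatedIn"}  # assumption: at most one value per subject


def find_conflicts(triples, functional=FUNCTIONAL):
    """Report functional predicates that have more than one value."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p in functional:
            values[(s, p)].add(o)
    return {key: sorted(objs) for key, objs in values.items() if len(objs) > 1}


def superclasses(cls, triples):
    """Return cls plus every class reachable through cob:subClassOf."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "cob:subClassOf" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def find_redundancy_candidates(triples):
    """Pair instances that share an alsoSee target and compatible classes."""
    classes, links = defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        if p == "cob:instanceOf":
            classes[s].add(o)
        elif p == "cob:alsoSee":
            links[s].add(o)
    candidates = []
    for a, b in combinations(sorted(classes), 2):
        if not links[a] & links[b]:
            continue
        up_a = set().union(*(superclasses(c, triples) for c in classes[a]))
        up_b = set().union(*(superclasses(c, triples) for c in classes[b]))
        if classes[a] & up_b or classes[b] & up_a:
            candidates.append((a, b))
    return candidates


if __name__ == "__main__":
    print(find_conflicts(TRIPLES))
    print(find_redundancy_candidates(TRIPLES))
```

Both functions only raise candidates; deciding which location is right or whether to merge the two entries is left to cross validation by experts and agents.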