Developing the AIP Thesaurus: The Platform for an Ontology

Case study of the American Institute of Physics thesaurus. Presented by Mark Cassar of the American Institute of Physics and Jack Bruce of Access Innovations, Inc. at the 2012 Data Harmony User Group meeting on February 8, 2012 at the Access Innovations, Inc. offices.

Published in: Education, Technology

  1. Developing the AIP Thesaurus: The Platform for an Ontology
     Mark Cassar, American Institute of Physics
     Jack Bruce and Marjorie (Margie) M.K. Hlava, Access Innovations: 505-998-0800
  2. Background
     • Physics and Astronomy Classification Scheme (PACS)
     • Six-digit code schema used for indexing scholarly content
     • 10 digit based – domain headings with subcategories nested under each domain
     • Precoordinated system
       – Combine terms (concepts) at the time of indexing
  3. Why Change?
     • Improve searchability
     • Move to a post-coordinated system
       – Combine terms at time of search
     • Semantic enrichment
     • Flexible metadata for many applications
     • Naturalize the vocabulary
       – Represent concepts succinctly and concisely
       – Easily add new concepts based on new and emerging technologies and applications
       – Allow unlimited hierarchy levels and polyhierarchy
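
The difference between combining terms at indexing time and combining them at search time can be illustrated with a minimal Python sketch. It assumes a toy inverted index from single-concept terms to article IDs; the article IDs, the index contents, and the function name `post_coordinated_search` are hypothetical and are not taken from the AIP system.

```python
from typing import Dict, Set

# Hypothetical index: each article is tagged with simple, single-concept terms
# rather than with one long pre-combined heading.
INDEX: Dict[str, Set[str]] = {
    "Electron diffraction": {"a1", "a2", "a3"},
    "Condensed matter structure determination": {"a2", "a3", "a4"},
    "Thin films": {"a3", "a5"},
}


def post_coordinated_search(index: Dict[str, Set[str]], *terms: str) -> Set[str]:
    """Combine terms at search time: return articles indexed with all of them."""
    if not terms:
        return set()
    # Unknown terms contribute an empty posting set, so the result is empty.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


if __name__ == "__main__":
    hits = post_coordinated_search(INDEX, "Electron diffraction", "Thin films")
    print(sorted(hits))
```

In a pre-coordinated scheme the combination "electron diffraction of thin films" would need its own code assigned at indexing time; here the searcher builds the combination on demand.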
  4. Better ROI
     • Rules-assisted indexing
       – Provide end users with a swift indexing solution based on the Machine-Aided Indexer (M.A.I.) engine
       – Batch index a large corpus of scholarly content, as well as future content
     • Reduce costs
       – Automate a large portion of electronic indexing
       – Less overhead for indexing
  5. Roadmap of the AIP Thesaurus
     • Data Collection
       – Load PACS codes and terms
       – Incorporate search logs; add top searched concepts into the vocabulary
     • Analysis of Content
       – Test comparison of indexing against human-indexed articles
     • Thesaurus Construction
       – Separate, disambiguate, and migrate concepts; break up top domains
       – Apply thesaurus and taxonomy standardization to each term
       – Multiple reviews for each top section
     • Evaluation and Feedback
       – Send working draft back to AIP for review
       – Gather feedback from subject matter experts and incorporate the changes into the thesaurus
     • Finalization and Product Delivery
  6. Source Data
     • PACS 2009 ed.
     • 1999 ed. of AIP Thesaurus (out of date)
     • Terms added to INSPEC since 2000
     • Internal and external search logs
     • Cumulative journal indexes
       – Digital (2006 through 2009)
     • List of AIP divisions and their internal classifications
  7. Analysis of Content
     • Organizational warrant
       – PACS 2009 (2010)
       – www.aip.org
       – UniPHY
     • Literary warrant
       – Where we found the term used
     • Most frequent search terms loaded into thesaurus
  8. Thesaurus Creation Process
     • Load data (vocabulary) into Data Harmony MAIstro™
     • PACS
       – Restructure top domains
       – Separate into discrete concepts
       – Disambiguate terms
       – Remove parenthetical qualifiers
       – Create post-coordinated terms
       – Migrate separated terms into new/relevant categories
     • Sort flat lists (search logs) into the main categories determined
     • Use multiple reviewers for each physics domain
     • About 8181 preferred terms and 5217 synonyms
  9. PACS TERM:
     – Low-energy electron diffraction (LEED) and reflection high-energy electron diffraction (RHEED) (condensed matter structure determination)
     Becomes
     – BT Condensed matter structure determination
       • NT Low energy electron diffraction
         – Synonym LEED
       • NT Reflection high energy electron diffraction
         – Synonym RHEED
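
The transformation on this slide can be sketched in Python. This is only an illustration of the idea, assuming headings of the form "X (ABBR) and Y (ABBR) (qualifier)"; the function names `split_pacs_heading` and `clean` are hypothetical, and the slides describe this work as being done editorially with multiple reviewers, not by a script.

```python
import re
from typing import Dict, List

# Trailing parenthetical qualifier, e.g. "(condensed matter structure determination)".
QUALIFIER = re.compile(r"\s*\(([^()]+)\)\s*$")
# A single concept followed by its abbreviation, e.g. "... diffraction (LEED)".
PART = re.compile(r"^(.*?)\s*\(([^()]+)\)$")


def clean(label: str) -> str:
    """Drop hyphens and capitalize the first letter, as in the slide's result."""
    label = label.replace("-", " ").strip()
    return label[:1].upper() + label[1:]


def split_pacs_heading(heading: str) -> Dict[str, object]:
    """Split a compound heading into one broader term and its narrower terms."""
    match = QUALIFIER.search(heading)
    if match is None:
        raise ValueError("no trailing parenthetical qualifier found")
    broader = clean(match.group(1))
    body = heading[: match.start()]

    narrower: List[Dict[str, object]] = []
    for part in body.split(" and "):
        part = part.strip()
        part_match = PART.match(part)
        if part_match:
            # The abbreviation in parentheses becomes a synonym.
            narrower.append(
                {"term": clean(part_match.group(1)), "synonyms": [part_match.group(2)]}
            )
        else:
            narrower.append({"term": clean(part), "synonyms": []})
    return {"BT": broader, "NT": narrower}


if __name__ == "__main__":
    pacs = (
        "Low-energy electron diffraction (LEED) and reflection high-energy "
        "electron diffraction (RHEED) (condensed matter structure determination)"
    )
    print(split_pacs_heading(pacs))
```

Splitting on " and " is a deliberate simplification: a heading whose concept name itself contains "and" would be split incorrectly, which is one reason a human review step remains necessary.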
  10. Evaluation and Feedback
      • Weekly scheduled live demos of the thesaurus
      • Free web-hosted version of the thesaurus and periodic spreadsheet exports
      • Collect feedback based on SME suggestions and AIP PACS experts
        – Correspondence via email
      • Incorporate changes into thesaurus
  11. Available versions
      • Electronic copy of AIP thesaurus supplied in
        – XML
        – Excel
        – Web-based, read-only versions (Thesviewer)
        – MARC, SKOS, OWL, CSV, etc.
  12. [Screenshots: Taxonomy view; Thesaurus Term Record view]
  13. To make an ontology
      • Define additional Associative relationships
      • Define additional Hierarchical relationships
        – IsA, IsPartOf, HasA
      • Define additional Equivalence relationships
        – Multilingual options
        – Weights and measures
  14. Clearer disambiguation?
      [Diagram: the term "Mercury" connected by typed relationships (IsA, TypeOf, BrandOf) to Temperature, Planets, Roman god, Automobile, and Metallic element]
  15. Knowledge Organization Systems (ordered from not complex to highly complex)
      • Uncontrolled list
      • Name authority file
      • Synonym set/ring
      • Controlled vocabulary
      • Taxonomy
      • Thesaurus (AIP Thesaurus is here)
      • Ontology
      • Semantic network
  16. Lessons Learned
      • Learning the style for indexing
      • Tendency to revert to the PACS style of language and classification
      • SME feedback turnaround
        – Sit with them: 2 hours
        – Incorporate suggestions: 8 hours
        – 2117 terms added
        – 1354 terms changed or updated
        – 1333 terms deleted
        – 11259 other actions
  17. Where are we now?
      • Platform is established
      • OWL and other formats available
      • One kind of Associative relationship
        – Related terms
      • One kind of Hierarchical relationship
        – Broader Narrower / Parent Child
        – Multiple broader terms for interdisciplinary options
      • One kind of Equivalence relationship
        – Synonym non-preferred terms
      • Built using the Z39.19 standard: interoperable
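
The three relationship types listed on this slide can be modeled with a small data structure. The following Python sketch is an assumption-laden illustration, not the Data Harmony implementation: the class name `Thesaurus`, its methods, and the second parent "Electron diffraction" and the related-term pair are hypothetical and serve only to show polyhierarchy and reciprocal links.

```python
from collections import defaultdict
from typing import Dict, Set


class Thesaurus:
    """Toy thesaurus with broader/narrower, related, and synonym links."""

    def __init__(self) -> None:
        self.broader: Dict[str, Set[str]] = defaultdict(set)
        self.narrower: Dict[str, Set[str]] = defaultdict(set)
        self.related: Dict[str, Set[str]] = defaultdict(set)
        self.use: Dict[str, str] = {}  # non-preferred label -> preferred term

    def add_broader(self, term: str, parent: str) -> None:
        # Hierarchical links are reciprocal: BT on one side, NT on the other.
        # A term may have several parents (polyhierarchy).
        self.broader[term].add(parent)
        self.narrower[parent].add(term)

    def add_related(self, a: str, b: str) -> None:
        # The associative relationship is symmetric.
        self.related[a].add(b)
        self.related[b].add(a)

    def add_synonym(self, non_preferred: str, preferred: str) -> None:
        self.use[non_preferred.lower()] = preferred

    def resolve(self, label: str) -> str:
        """Map a non-preferred label to its preferred term, if one is known."""
        return self.use.get(label.lower(), label)


def build_example() -> Thesaurus:
    t = Thesaurus()
    leed = "Low energy electron diffraction"
    rheed = "Reflection high energy electron diffraction"
    t.add_broader(leed, "Condensed matter structure determination")
    t.add_broader(leed, "Electron diffraction")  # hypothetical second parent
    t.add_synonym("LEED", leed)
    t.add_related(leed, rheed)  # hypothetical related-term pair
    return t


if __name__ == "__main__":
    t = build_example()
    print(t.resolve("leed"))
    print(sorted(t.broader[t.resolve("LEED")]))
```

Keeping the reciprocal link inside `add_broader` means the broader and narrower views cannot drift apart, which matters once terms are allowed more than one parent.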
  18. To Review AIP Thes
      • Use a web browser
      • http://thesview.accessinn.com/aipThes/
      • Enter the username and password twice; in all cases both are aip
      • This starts a Java app in your browser that shows the thesaurus, starting from the top level of the hierarchy
      • Use the collaboration module to comment and discuss
  19. Thank you
      Marjorie Hlava
      mhlava@accessin.com
      505-998-0800
