ProQuest Taxonomy Boot Camp Presentation 2008


Paula McCoy of ProQuest talks about life there before and after implementing the taxonomy management software from Synaptica.

Published in: Technology

ProQuest Taxonomy Boot Camp Presentation 2008

  1. 1. Finding a Common Language: Bringing Complex and Disparate Vocabularies Together. Paula R. McCoy, Manager, Taxonomy Development, ProQuest
  2. 2. Part of Cambridge Information Group & CSA; headquartered in Ann Arbor, Michigan; editorial offices in Louisville, Kentucky
  3. 3. Access to over 125 billion digital pages of content from magazine, trade, & scholarly publications, current & historical newspapers, original materials such as annual reports & civil war pamphlets, and daily wire feeds. Subscription-based ProQuest® online information service available in academic and public libraries
  4. 4. Louisville editors abstract & index 4,000+ periodicals & newspapers. ProQuest Controlled Vocabulary used to index subjects; Authority Files used to index company, geographic, personal, product names. CV applied to non-periodical & third-party content via mapping, to allow cross-searching of multiple DBs with one vocabulary
  5. 5. Topics of Discussion: Description of ProQuest Controlled Vocabulary & Authority Files; Taxonomy Management -- Overview; Life Before Synaptica; Thesaurus Management System Purchase; Implementing Synaptica; Life With Synaptica; Q&A
  6. 6. PQ CV ProQuest Controlled Vocabulary: Natural language, hierarchical vocabulary complying with ANSI/NISO Standard Z39.19 (Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies); Created in 1970s for ABI/INFORM business database; Based on Library of Congress Subject Headings
  7. 7. PQ CV ProQuest Controlled Vocabulary: Merged with general reference vocabulary in 1980s; Major development effort in past 4 years to boost science, education & medical terms. Thesaurus subjects: Business, economics & trade – 4300 terms; Science, math & technology – 1600 terms; Medicine – 1150 terms; Humanities – 960 terms; Government & policy – 850 terms; Education – 400 terms
  8. 8. PQ CV ProQuest CV: Statistics. Preferred terms: 11,046; Non-preferred terms: 5631; Scope Notes: 3194 (29%); Cross-references (Broader, Narrower, Related terms): 67,700; Terms added in 2007: 77; Terms added in 2008: 58+
  9. 9. PQ CV Authority Files: Statistics. Corporate/Organization Names: 438,098 (names added in 2008: 5489); Personal Names: 416,239 (names added in 2008: 1526); Geographic (Location) Names: 34,331 (names added in 2008: 144); Product Names: 38,210 (names added in 2008: 54)
  10. 10. Taxonomy Management The Taxonomy Manager’s Job: Add subject terms as dictated by new concepts & new content to index; Maintain hierarchies & Scope Notes; Load updated Thesaurus to ProQuest interface; Manage authority files to maintain standards & control file size
  11. 11. Taxonomy Management The Taxonomy Manager’s Job OBJECTIVE: To ensure that indexers and searchers alike have access to a complete and accurate Thesaurus that they can use to maximize the discoverability of documents in ProQuest
  12. 12. Taxonomy Management Thesaurus on ProQuest®
  13. 13. Taxonomy Management Sample Subject Term
      Chronic obstructive pulmonary disease [Preferred, or main term]
      SN: Any lung disease, such as chronic bronchitis or emphysema, causing obstruction of bronchial airflow [Scope note defining term and how it is used]
      UF COPD [Non-preferred term: points to term used to index]
      BT Disease
      BT Respiratory diseases [Terms broader in nature to main term: COPD is a disease, and specifically, a respiratory disease]
      NT Asthma
      NT Bronchitis
      NT Emphysema [Terms narrower in nature to main term: these are chronic lung diseases]
      RT Airway management
      RT Lungs [Terms related to main term that might be used to narrow the search]
  14. 14. Before Synaptica Managing terms meant: Multiple files, Duplicate entries, Errors = less than ideal thesaurus management
  15. 15. Before Synaptica MS Word Document
  16. 16. Before Synaptica Vocabulary Documents in Word: ProQuest controlled vocabulary; French-language controlled vocabulary; German-language controlled vocabulary; Spanish-language controlled vocabulary; Combined PQ-CBCA controlled vocabulary; Ethnic database vocabulary, English; Ethnic database vocabulary, Spanish
  17. 17. Before Synaptica Foreign-Language Vocabularies French German Spanish
  18. 18. Before Synaptica Oracle Database Forms
  19. 19. Before Synaptica Authority Files in Oracle: Class codes (related to subjects); CORP names (391,665+ terms); NAIC codes (related to companies); GEOG names (32,000+ terms); PERS names (350,000+ terms); PROD names (38,000+ terms)
  20. 20. Before Synaptica Adding New Terms: 1. Enter full term hierarchy into new Word doc; 2. Copy term into main Word-based vocabulary & enter reciprocal relationships; 3. Enter term & relationships into Oracle; 4. Review next-day report on Oracle activity; 5. Send new term doc to editors via e-mail; 6. Print new vocabulary (at least every two years)
  21. 21. SN, BT, UF, NT, RT, Class Code [whew!]
  22. 22. TMS Purchase Thesaurus Management Systems Buying Criteria (Synaptica): Up to 40 admin & 100 read-only users in multiple locations; 1. Ability to interact in real time with editorial system; Ability to load vocabs from multiple Word docs & Oracle authority files; 2. Ability to accommodate authority files of 400,000+ names; Support for foreign-language vocabularies; Ability to add new vocabularies; Vendor onsite installation & training; Software upgrades & tech support
  23. 23. Implementing Synaptica: Contract signed and work begun in August 2004; PQ sent to Synaptica all the Word & Oracle files for analysis; Decision points: how to load & structure data, and how to handle “suspect” or erroneous relationships
  24. 24. Implementing Synaptica Synaptica Data Analysis. Relationship Validation Tests: Term Uniqueness; Use Violations; Self-Referencing Relationships; One Relationship per Term Pair; Relationship Unique; Relationship Reciprocates; Circular References. Exception Reports delivered to PQ; errors fixed before production
  25. 25. Implementing Synaptica Use Validation Error Marine resources
  26. 26. Implementing Synaptica Foreign-Language Errors: Terms with no language equivalent (LEQ), e.g., no translation; In all 3 languages, multiple English terms with the same translation, e.g. (English term = French term, with revised French term where given): Purchasing = Achats; Shopping = Achats, revised to Shopping; Buyers = Acheteurs; Purchasing agents = Acheteurs, revised to Agents d'achat
  27. 27. Implementing Synaptica Final Challenge. Issue: Different editorial systems = 2x data entry: once for Synaptica, once for Oracle. Solution: Overnight synchronization process to copy Synaptica work into Oracle every night. Synch process discontinued April 2008
  28. 28. Implementing Synaptica Putting Synaptica Into Production, Nov 2004: Train users (provide documentation & hands-on demonstrative training); Deal with people resistant to change; Encourage written feedback on system functionality; Send feedback to Synaptica (many of our suggestions implemented in later versions)
  29. 29. Life With Synaptica Life With Synaptica Terms Management Made Easy! Word – Old, Bad  Synaptica – New, Good 
  30. 30. Life With Synaptica Adding Terms Today: 3 Easy Steps. 1. Enter term and relationships into Synaptica “Item Details” window; 2. Export report of new terms into Word; 3. Send Word document to editors
  31. 31. Life With Synaptica Improving Thesaurus Management Categories Feature
  32. 32. Life With Synaptica Subject Term Categories
  33. 33. Life With Synaptica CORP Names – Categories & Website
  34. 34. Life With Synaptica Foreign-Language Vocabularies Language Equivalents
  35. 35. Life With Synaptica Foreign-Language Vocabularies
  36. 36. Life With Synaptica Foreign-Language Vocabularies: Alphabetical by language (Spanish, German, French)
  37. 37. Life With Synaptica Synaptica Updates: Synaptica version 6.0 released in early 2006; Synaptica version 7.0 is being implemented now: • Enhanced user interface • Semantic Web standardization (RDF, OWL, SKOS) and Web Services integration • Expanded Reporting functionality • Enhanced adding and editing of term relationships including “rapid-fire” simple drag-and-drop editing • Improved global term editing • Online help and user guides
  38. 38. Life With Synaptica Benefits of Synaptica. Greater awareness of thesaurus standards and terminology, e.g.: “preferred” and “non-preferred” instead of Use and Used For. Long-needed updating and improvement in term hierarchies; ability to provide thesaurus statistics. Increase in Company name NPTs, from 1935 to 8952 today. Immediate responsiveness to indexer needs: real-time term additions, esp. NPTs and SNs. Easier loading of updated Thesaurus on PQ interface
  39. 39. Questions? Thank you!
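
Slide 24 lists the relationship validation tests that were run on the Word and Oracle data before loading. The deck does not show how Synaptica implemented them, so the following is only a minimal sketch of four of those checks (self-referencing relationships, reciprocation, one relationship per term pair, circular references). It assumes relationships are stored as `(source, type, target)` triples; the function name `validate`, the `RECIPROCAL` table, and the sample data are all hypothetical. The Term Uniqueness and Use Violations tests are not covered.

```python
from collections import defaultdict

# Assumed reciprocal pairs for thesaurus relationship types.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "USE", "USE": "UF"}


def validate(relationships):
    """Return (test_name, detail) exceptions for (source, type, target) triples."""
    rels = set(relationships)
    errors = []

    for source, rel_type, target in sorted(rels):
        if source == target:
            # Self-referencing relationship: a term points at itself.
            errors.append(("self-reference", (source, rel_type, target)))
        elif (target, RECIPROCAL[rel_type], source) not in rels:
            # Reciprocation: BT must be mirrored by NT, RT by RT, UF by USE.
            errors.append(("missing reciprocal", (source, rel_type, target)))

    # One relationship per term pair: an ordered pair may carry only one type.
    types_by_pair = defaultdict(set)
    for source, rel_type, target in rels:
        types_by_pair[(source, target)].add(rel_type)
    for pair, types in sorted(types_by_pair.items()):
        if len(types) > 1:
            errors.append(("multiple relationships", (pair, sorted(types))))

    # Circular references: walk BT links upward; a term must never reach itself.
    broader = defaultdict(set)
    for source, rel_type, target in rels:
        if rel_type == "BT" and source != target:
            broader[source].add(target)
    for start in sorted(broader):
        seen, stack = set(), list(broader[start])
        while stack:
            term = stack.pop()
            if term == start:
                errors.append(("circular reference", start))
                break
            if term not in seen:
                seen.add(term)
                stack.extend(broader.get(term, ()))
    return errors


# Hypothetical sample data: one clean pair, one self-reference, one missing NT.
sample = [
    ("Emphysema", "BT", "Chronic obstructive pulmonary disease"),
    ("Chronic obstructive pulmonary disease", "NT", "Emphysema"),
    ("Lungs", "RT", "Lungs"),
    ("Asthma", "BT", "Respiratory diseases"),
]

for test_name, detail in validate(sample):
    print(test_name, detail)
```

Returning a list of exceptions instead of raising on the first problem mirrors the exception-report workflow described on the slide, where errors were reviewed and fixed before production.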
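
Slide 26 describes two foreign-language errors: English terms with no language equivalent (LEQ), and several English terms sharing one translation. A check for both might look roughly like the sketch below. It assumes a simple mapping from English term to translation (with `None` for a missing one); the function name `find_translation_problems` and the untranslated placeholder term are hypothetical, while the four French pairs come from the slide.

```python
from collections import defaultdict


def find_translation_problems(equivalents):
    """Return (terms missing a translation, translations shared by several terms)."""
    missing = sorted(term for term, leq in equivalents.items() if not leq)

    # Group English terms by translation, ignoring case differences.
    by_translation = defaultdict(list)
    for term, leq in equivalents.items():
        if leq:
            by_translation[leq.casefold()].append(term)

    shared = {
        leq: sorted(terms)
        for leq, terms in by_translation.items()
        if len(terms) > 1
    }
    return missing, shared


french = {
    "Purchasing": "Achats",
    "Shopping": "Achats",
    "Buyers": "Acheteurs",
    "Purchasing agents": "Acheteurs",
    "Example untranslated term": None,  # hypothetical placeholder, not from the deck
}

missing, shared = find_translation_problems(french)
print("No LEQ:", missing)
print("Shared translations:", shared)
```

Each shared translation is reported with all of its English terms, so an editor can decide which term keeps the translation and which needs a revised one, as in the "French term-revised" column on the slide.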