ProQuest Taxonomy Boot Camp Presentation 2008

Paula McCoy of ProQuest describes life there before and after implementing Synaptica's taxonomy management software.

Transcript

  • 1. Finding a Common Language: Bringing Complex and Disparate Vocabularies Together. Paula R. McCoy, Manager, Taxonomy Development, ProQuest, paula.mccoy@proquest.com
  • 2. Part of Cambridge Information Group & CSA. Headquartered in Ann Arbor, Michigan; editorial offices in Louisville, Kentucky.
  • 3. Access to over 125 billion digital pages of content from magazine, trade, and scholarly publications, current and historical newspapers, original materials such as annual reports and Civil War pamphlets, and daily wire feeds. Subscription-based ProQuest® online information service available in academic and public libraries.
  • 4. Louisville editors abstract and index 4,000+ periodicals and newspapers. The ProQuest Controlled Vocabulary (CV) is used to index subjects; Authority Files are used to index company, geographic, personal, and product names. The CV is applied to non-periodical and third-party content via mapping, to allow cross-searching of multiple databases with one vocabulary.
  • 5. Topics of Discussion: Description of ProQuest Controlled Vocabulary & Authority Files; Taxonomy Management: Overview; Life Before Synaptica; Thesaurus Management System Purchase; Implementing Synaptica; Life With Synaptica; Q&A
  • 6. PQ CV: ProQuest Controlled Vocabulary. Natural-language, hierarchical vocabulary complying with ANSI/NISO Standard Z39.19 (Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies). Created in the 1970s for the ABI/INFORM business database; based on Library of Congress Subject Headings.
  • 7. PQ CV: ProQuest Controlled Vocabulary. Merged with a general reference vocabulary in the 1980s. Major development effort in the past 4 years to boost science, education, and medical terms. Thesaurus subjects: business, economics & trade (4,300 terms); science, math & technology (1,600 terms); medicine (1,150 terms); humanities (960 terms); government & policy (850 terms); education (400 terms).
  • 8. PQ CV: ProQuest CV Statistics. Preferred terms: 11,046. Non-preferred terms: 5,631. Scope notes: 3,194 (29% of preferred terms). Cross-references (broader, narrower, and related terms): 67,700. Terms added in 2007: 77. Terms added in 2008: 58+.
  • 9. PQ CV: Authority Files Statistics. Corporate/organization names: 438,098 (5,489 added in 2008). Personal names: 416,239 (1,526 added in 2008). Geographic (location) names: 34,331 (144 added in 2008). Product names: 38,210 (54 added in 2008).
  • 10. Taxonomy Management: The Taxonomy Manager’s Job. Add subject terms as dictated by new concepts and new content to index; maintain hierarchies and scope notes; load the updated Thesaurus to the ProQuest interface; manage authority files to maintain standards and control file size.
  • 11. Taxonomy Management: The Taxonomy Manager’s Job. OBJECTIVE: To ensure that indexers and searchers alike have access to a complete and accurate Thesaurus that they can use to maximize the discoverability of documents in ProQuest.
  • 12. Taxonomy Management: Thesaurus on ProQuest®
  • 13. Taxonomy Management: Sample Subject Term
      Chronic obstructive pulmonary disease (the preferred, or main, term)
      SN Any lung disease, such as chronic bronchitis or emphysema, causing obstruction of bronchial airflow (scope note defining the term and how it is used)
      UF COPD (non-preferred term: points to the term used to index)
      BT Disease
      BT Respiratory diseases (terms broader in nature than the main term: COPD is a disease and, specifically, a respiratory disease)
      NT Asthma
      NT Bronchitis
      NT Emphysema (terms narrower in nature than the main term: these are chronic lung diseases)
      RT Airway management
      RT Lungs (terms related to the main term that might be used to narrow the search)
  • 14. Before Synaptica: Managing terms meant multiple files, duplicate entries, and errors, which added up to less-than-ideal thesaurus management.
  • 15. Before Synaptica: MS Word Document
  • 16. Before Synaptica: Vocabulary Documents in Word. ProQuest controlled vocabulary; French-language controlled vocabulary; German-language controlled vocabulary; Spanish-language controlled vocabulary; combined PQ-CBCA controlled vocabulary; ethnic database vocabulary, English; ethnic database vocabulary, Spanish.
  • 17. Before Synaptica: Foreign-Language Vocabularies (French, German, Spanish)
  • 18. Before Synaptica: Oracle Database Forms
  • 19. Before Synaptica: Authority Files in Oracle. Class codes (related to subjects); CORP names (391,665+ terms); NAIC codes (related to companies); GEOG names (32,000+ terms); PERS names (350,000+ terms); PROD names (38,000+ terms).
  • 20. Before Synaptica: Adding New Terms. (1) Enter the full term hierarchy into a new Word doc. (2) Copy the term into the main Word-based vocabulary and enter reciprocal relationships. (3) Enter the term and relationships into Oracle. (4) Review the next-day report on Oracle activity. (5) Send the new-term doc to editors via e-mail. (6) Print the new vocabulary (at least every two years).
  • 21. SN, BT, Class Code, UF, NT, RT [whew!]
  • 22. TMS Purchase: Thesaurus Management Systems Buying Criteria. Ability to interact in real time with the editorial system from multiple locations (Synaptica: up to 40 admin and 100 read-only users); ability to load vocabularies from multiple Word docs and Oracle authority files; ability to accommodate authority files of 400,000+ names; support for foreign-language vocabularies; ability to add new vocabularies; vendor onsite installation and training; software upgrades and tech support.
  • 23. Implementing Synaptica: Contract signed and work begun in August 2004. PQ sent Synaptica all the Word and Oracle files for analysis. Decision points: how to load and structure the data; how to handle “suspect” or erroneous relationships.
  • 24. Implementing Synaptica: Synaptica Data Analysis. Relationship validation tests: Term Uniqueness; Use Violations; Self-Referencing Relationships; One Relationship per Term Pair; Relationship Unique; Relationship Reciprocates; Circular References. Exception reports delivered to PQ; errors fixed before production.
  • 25. Implementing Synaptica: Use Validation Error (example: Marine resources)
  • 26. Implementing Synaptica: Foreign-Language Errors. Terms with no language equivalent (LEQ), i.e., no translation. In all 3 languages, multiple English terms with the same translation. French example (English term: original French term, revised French term): Purchasing: Achats (unchanged); Shopping: Achats, revised to Shopping; Buyers: Acheteurs (unchanged); Purchasing agents: Acheteurs, revised to Agents d'achat.
  • 27. Implementing Synaptica: Final Challenge. Issue: different editorial systems meant double data entry, once for Synaptica and once for Oracle. Solution: a nightly synchronization process copied Synaptica work into Oracle. Sync process discontinued April 2008.
  • 28. Implementing Synaptica: Putting Synaptica into Production, Nov 2004. Train users: provide documentation and hands-on demonstrative training. Deal with people resistant to change. Encourage written feedback on system functionality. Send feedback to Synaptica; many of our suggestions were implemented in later versions.
  • 29. Life With Synaptica: Terms Management Made Easy! Word: old, bad. Synaptica: new, good.
  • 30. Life With Synaptica: Adding Terms Today in 3 Easy Steps. (1) Enter the term and relationships into the Synaptica “Item Details” window. (2) Export a report of new terms into Word. (3) Send the Word document to editors.
  • 31. Life With Synaptica: Improving Thesaurus Management (Categories Feature)
  • 32. Life With Synaptica: Subject Term Categories
  • 33. Life With Synaptica: CORP Names (Categories & Website)
  • 34. Life With Synaptica: Foreign-Language Vocabularies (Language Equivalents)
  • 35. Life With Synaptica: Foreign-Language Vocabularies
  • 36. Life With Synaptica: Foreign-Language Vocabularies, Alphabetical by Language (Spanish, German, French)
  • 37. Life With Synaptica: Synaptica Updates. Synaptica version 6.0 released in early 2006. Synaptica version 7.0 is being implemented now: • Enhanced user interface • Semantic Web standardization (RDF, OWL, SKOS) and Web Services integration • Expanded reporting functionality • Enhanced adding and editing of term relationships, including “rapid-fire” simple drag-and-drop editing • Improved global term editing • Online help and user guides
  • 38. Life With Synaptica: Benefits of Synaptica. Greater awareness of thesaurus standards and terminology, e.g., “preferred” and “non-preferred” instead of Use and Used For; long-needed updating and improvement of term hierarchies; ability to provide thesaurus statistics; increase in company-name non-preferred terms (NPTs) from 1,935 to 8,952 today; immediate responsiveness to indexer needs through real-time term additions, especially NPTs and scope notes (SNs); easier loading of the updated Thesaurus on the PQ interface.
  • 39. Questions? Thank you!
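
The sample subject term on slide 13 is essentially a record: one preferred term, an optional scope note, and a list of terms per relationship tag (UF, BT, NT, RT). A minimal Python sketch of that structure follows; the `ThesaurusTerm` class and its field names are hypothetical illustrations, not ProQuest's or Synaptica's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusTerm:
    """Hypothetical record for one preferred term and its relationships."""
    label: str
    scope_note: str = ""
    used_for: list[str] = field(default_factory=list)  # UF: non-preferred terms
    broader: list[str] = field(default_factory=list)   # BT
    narrower: list[str] = field(default_factory=list)  # NT
    related: list[str] = field(default_factory=list)   # RT

    def display(self) -> str:
        """Render the term in the tagged layout shown on slide 13."""
        lines = [self.label]
        if self.scope_note:
            lines.append(f"SN {self.scope_note}")
        for tag, terms in (("UF", self.used_for), ("BT", self.broader),
                           ("NT", self.narrower), ("RT", self.related)):
            lines.extend(f"{tag} {t}" for t in terms)
        return "\n".join(lines)


copd = ThesaurusTerm(
    label="Chronic obstructive pulmonary disease",
    scope_note=("Any lung disease, such as chronic bronchitis or emphysema, "
                "causing obstruction of bronchial airflow"),
    used_for=["COPD"],
    broader=["Disease", "Respiratory diseases"],
    narrower=["Asthma", "Bronchitis", "Emphysema"],
    related=["Airway management", "Lungs"],
)
print(copd.display())
```

Keeping one list per tag mirrors the Z39.19 display and makes per-tag counts (like the cross-reference totals on slide 8) a simple sum of list lengths.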
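
Step 2 of slide 20 ("enter reciprocal relationships") is the kind of bookkeeping a thesaurus management system can automate: a BT in one direction implies an NT in the other, RT is symmetric, and UF implies USE. A minimal sketch, assuming a hypothetical in-memory `thesaurus` dictionary and `add_relationship` helper; USE is the standard Z39.19 inverse of UF, which the slides do not name explicitly.

```python
from collections import defaultdict

# Reciprocal of each relationship tag (USE is the inverse of UF).
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "USE", "USE": "UF"}


def add_relationship(thesaurus, term, tag, target):
    """Record term -tag-> target and, automatically, its reciprocal."""
    thesaurus[term][tag].add(target)
    thesaurus[target][RECIPROCAL[tag]].add(term)


# term -> tag -> set of related terms (hypothetical in-memory store)
thesaurus = defaultdict(lambda: defaultdict(set))
add_relationship(thesaurus, "Chronic obstructive pulmonary disease", "BT", "Respiratory diseases")
add_relationship(thesaurus, "Chronic obstructive pulmonary disease", "UF", "COPD")
add_relationship(thesaurus, "Chronic obstructive pulmonary disease", "RT", "Lungs")

print(dict(thesaurus["Respiratory diseases"]))  # {'NT': {'Chronic obstructive pulmonary disease'}}
print(dict(thesaurus["COPD"]))                  # {'USE': {'Chronic obstructive pulmonary disease'}}
```

Deriving the reciprocal at write time means an editor enters each relationship once, instead of once per direction in each Word and Oracle copy.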
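
Slide 24 names the relationship validation tests run before production, but not how Synaptica implements them. The sketch below is one plausible reading of four of them (self-referencing relationships, relationship reciprocates, one relationship per term pair, and circular references), applied to (term, tag, target) triples; the `validate` function, message wording, and sample data are hypothetical.

```python
from collections import defaultdict

# Reciprocal of each relationship tag (USE is the inverse of UF).
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "USE", "USE": "UF"}


def validate(triples):
    """Return exception messages for a list of (term, tag, target) triples."""
    errors = []
    links = set(triples)
    pair_tags = defaultdict(set)   # tags recorded for each ordered term pair
    broader = defaultdict(set)     # term -> its BT targets
    for term, tag, target in triples:
        if term == target:
            errors.append(f"self-reference: {term} {tag} {target}")
        if (target, RECIPROCAL[tag], term) not in links:
            errors.append(f"no reciprocal: {term} {tag} {target}")
        pair_tags[(term, target)].add(tag)
        if tag == "BT":
            broader[term].add(target)
    # One relationship per term pair.
    for (term, target), tags in pair_tags.items():
        if len(tags) > 1:
            errors.append(f"multiple relationships: {term} / {target}: {sorted(tags)}")
    # Circular references: a term that can reach itself by following BT links.
    for start in list(broader):
        stack, seen = list(broader[start]), set()
        while stack:
            node = stack.pop()
            if node == start:
                errors.append(f"circular BT chain through: {start}")
                break
            if node not in seen:
                seen.add(node)
                stack.extend(broader.get(node, ()))
    return errors


triples = [
    ("Bronchitis", "BT", "Respiratory diseases"),
    ("Respiratory diseases", "NT", "Bronchitis"),
    ("Emphysema", "BT", "Respiratory diseases"),   # reciprocal NT missing
    ("Lungs", "RT", "Lungs"),                      # self-reference
    ("Disease", "BT", "Respiratory diseases"),     # wrong-direction BT...
    ("Respiratory diseases", "NT", "Disease"),
    ("Respiratory diseases", "BT", "Disease"),     # ...which creates a BT cycle
    ("Disease", "NT", "Respiratory diseases"),
]
for message in validate(triples):
    print(message)
```

Each message corresponds to one line of an exception report; in the process described on slide 24, such reports went to PQ and the errors were fixed before the data went into production.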
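
Slide 26 describes two checks on the foreign-language vocabularies: English terms with no language equivalent (LEQ), and several English terms sharing one translation. A minimal sketch for French, assuming a hypothetical `check_equivalents` helper and a plain dictionary from English term to French LEQ. The Purchasing/Shopping/Buyers/Purchasing agents values come from slide 26; "Thesauri" is a made-up term standing in for an untranslated entry.

```python
from collections import defaultdict


def check_equivalents(english_terms, leq):
    """Return (terms with no equivalent, {translation: [English terms sharing it]})."""
    missing = [term for term in english_terms if not leq.get(term)]
    shared = defaultdict(list)
    for english, translation in leq.items():
        if translation:
            shared[translation].append(english)
    collisions = {tr: terms for tr, terms in shared.items() if len(terms) > 1}
    return missing, collisions


english_terms = ["Purchasing", "Shopping", "Buyers", "Purchasing agents",
                 "Thesauri"]  # "Thesauri" is a hypothetical untranslated term
french_before = {"Purchasing": "Achats", "Shopping": "Achats",
                 "Buyers": "Acheteurs", "Purchasing agents": "Acheteurs"}
french_after = {**french_before, "Shopping": "Shopping",
                "Purchasing agents": "Agents d'achat"}

print(check_equivalents(english_terms, french_before))
print(check_equivalents(english_terms, french_after))
```

With the original translations, "Achats" and "Acheteurs" each map back to two English terms; after the slide-26 revisions, no collisions remain and only the untranslated term is reported.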
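
Slide 37 lists SKOS among the Semantic Web standards Synaptica 7.0 was adopting. As an illustration only, the slide-13 term could be expressed as a SKOS concept in Turtle: UF maps naturally to `skos:altLabel`, SN to `skos:scopeNote`, and BT/NT/RT to `skos:broader`/`skos:narrower`/`skos:related`. The `pq:` namespace URI, the label-to-identifier rule, and the helper names are hypothetical; this is not Synaptica's export format.

```python
import re

# Standard SKOS namespace; the pq: namespace is a hypothetical placeholder.
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix pq: <http://example.org/pqcv/> .\n"
)


def concept_id(label: str) -> str:
    """Hypothetical local name: runs of non-word characters become '_'."""
    return re.sub(r"\W+", "_", label).strip("_")


def literal(text: str) -> str:
    """English-tagged Turtle string literal with backslashes and quotes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@en'


def to_turtle(label, scope_note="", alt=(), broader=(), narrower=(), related=()):
    """Serialize one thesaurus term as a skos:Concept in Turtle."""
    lines = [f"pq:{concept_id(label)} a skos:Concept ;",
             f"    skos:prefLabel {literal(label)} ;"]
    lines += [f"    skos:altLabel {literal(a)} ;" for a in alt]          # UF terms
    if scope_note:
        lines.append(f"    skos:scopeNote {literal(scope_note)} ;")     # SN
    for prop, targets in (("broader", broader), ("narrower", narrower),
                          ("related", related)):                       # BT, NT, RT
        lines += [f"    skos:{prop} pq:{concept_id(t)} ;" for t in targets]
    lines[-1] = lines[-1][:-1] + "."  # the final statement ends with '.', not ';'
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


print(to_turtle(
    "Chronic obstructive pulmonary disease",
    scope_note="Any lung disease, such as chronic bronchitis or emphysema, "
               "causing obstruction of bronchial airflow",
    alt=["COPD"],
    broader=["Disease", "Respiratory diseases"],
    narrower=["Asthma", "Bronchitis", "Emphysema"],
    related=["Airway management", "Lungs"],
))
```

Writing Turtle by hand keeps the sketch dependency-free; a production export would more likely use an RDF library that handles escaping and identifier minting.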