ProQuest Taxonomy Boot Camp Presentation 2008

Paula McCoy of ProQuest describes managing the company's controlled vocabularies before and after implementing Synaptica's taxonomy management software.

Published in: Technology

Transcript of "ProQuest Taxonomy Boot Camp Presentation 2008"

  1. 1. Finding a Common Language: Bringing Complex and Disparate Vocabularies Together. Paula R. McCoy, Manager, Taxonomy Development, ProQuest. paula.mccoy@proquest.com
  2. 2. Part of Cambridge Information Group & CSA. Headquartered in Ann Arbor, Michigan. Editorial offices in Louisville, Kentucky.
  3. 3. Access to over 125 billion digital pages of content from magazine, trade, & scholarly publications, current & historical newspapers, original materials such as annual reports & Civil War pamphlets, and daily wire feeds. Subscription-based ProQuest® online information service available in academic and public libraries.
  4. 4. Louisville editors abstract & index 4,000+ periodicals & newspapers. ProQuest Controlled Vocabulary used to index subjects; Authority Files used to index company, geographic, personal, product names. CV applied to non-periodical & third-party content via mapping, to allow cross-searching of multiple DBs with one vocabulary.
  5. 5. Topics of Discussion: Description of ProQuest Controlled Vocabulary & Authority Files; Taxonomy Management (Overview); Life Before Synaptica; Thesaurus Management System Purchase; Implementing Synaptica; Life With Synaptica; Q&A
  6. 6. PQ CV: ProQuest Controlled Vocabulary. Natural language, hierarchical vocabulary complying with ANSI/NISO Standard Z39.19 (Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies). Created in 1970s for ABI/INFORM business database. Based on Library of Congress Subject Headings.
  7. 7. PQ CV: ProQuest Controlled Vocabulary. Merged with general reference vocabulary in 1980s. Major development effort in past 4 years to boost science, education & medical terms. Thesaurus subjects: Business, economics & trade – 4300 terms; Science, math & technology – 1600 terms; Medicine – 1150 terms; Humanities – 960 terms; Government & policy – 850 terms; Education – 400 terms.
  8. 8. PQ CV: ProQuest CV: Statistics. Preferred terms: 11,046. Non-preferred terms: 5631. Scope Notes: 3194 (29%). Cross-references (Broader, Narrower, Related terms): 67,700. Terms added in 2007: 77. Terms added in 2008: 58+.
  9. 9. PQ CV: Authority Files: Statistics. Corporate/Organization Names: 438,098 (names added in 2008: 5489). Personal Names: 416,239 (names added in 2008: 1526). Geographic (Location) Names: 34,331 (names added in 2008: 144). Product Names: 38,210 (names added in 2008: 54).
  10. 10. Taxonomy Management: The Taxonomy Manager’s Job. Add subject terms as dictated by new concepts & new content to index; maintain hierarchies & Scope Notes; load updated Thesaurus to ProQuest interface; manage authority files to maintain standards & control file size.
  11. 11. Taxonomy Management The Taxonomy Manager’s Job OBJECTIVE: To ensure that indexers and searchers alike have access to a complete and accurate Thesaurus that they can use to maximize the discoverability of documents in ProQuest
  12. 12. Taxonomy Management Thesaurus on ProQuest®
  13. 13. Taxonomy Management: Sample Subject Term. Chronic obstructive pulmonary disease (preferred, or main term). SN: Any lung disease, such as chronic bronchitis or emphysema, causing obstruction of bronchial airflow (scope note defining term and how it is used). UF COPD (non-preferred term: points to term used to index). BT Disease; BT Respiratory diseases (terms broader in nature to main term: COPD is a disease, and specifically, a respiratory disease). NT Asthma; NT Bronchitis; NT Emphysema (terms narrower in nature to main term: these are chronic lung diseases). RT Airway management; RT Lungs (terms related to main term that might be used to narrow the search).
  14. 14. Before Synaptica: Managing terms meant: multiple files; duplicate entries; errors = less-than-ideal thesaurus management
  15. 15. Before Synaptica MS Word Document
  16. 16. Before Synaptica: Vocabulary Documents in Word. ProQuest controlled vocabulary; French-language controlled vocabulary; German-language controlled vocabulary; Spanish-language controlled vocabulary; Combined PQ-CBCA controlled vocabulary; Ethnic database vocabulary, English; Ethnic database vocabulary, Spanish.
  17. 17. Before Synaptica Foreign-Language Vocabularies French German Spanish
  18. 18. Before Synaptica Oracle Database Forms
  19. 19. Before Synaptica: Authority Files in Oracle. Class codes (related to subjects); CORP names (391,665+ terms); NAIC codes (related to companies); GEOG names (32,000+ terms); PERS names (350,000+ terms); PROD names (38,000+ terms).
  20. 20. Before Synaptica Adding New Terms 1. Enter full term hierarchy into new Word doc 2. Copy term into main Word-based vocabulary & enter reciprocal relationships 3. Enter term & relationships into Oracle 4. Review next-day report on Oracle activity 5. Send new term doc to editors via e-mail 6. Print new vocabulary (at least every two years)
  21. 21. SN, BT, Class Code, UF, NT, RT [whew!]
  22. 22. TMS Purchase: Thesaurus Management Systems. Buying Criteria (Synaptica): Up to 40 admin & 100 read-only users in multiple locations; 1. Ability to interact in real time with editorial system; Ability to load vocabs from multiple Word docs & Oracle authority files; 2. Ability to accommodate authority files of 400,000+ names; Support for foreign-language vocabularies; Ability to add new vocabularies; Vendor onsite installation & training; Software upgrades & tech support.
  23. 23. Implementing Synaptica: Contract signed and work begun in August 2004. PQ sent to Synaptica all the Word & Oracle files for analysis. Decision points: how to load & structure data; how to handle “suspect” or erroneous relationships.
  24. 24. Implementing Synaptica: Synaptica Data Analysis. Relationship Validation Tests: Term Uniqueness; Use Violations; Self-Referencing Relationships; One Relationship per Term Pair; Relationship Unique; Relationship Reciprocates; Circular References. Exception Reports delivered to PQ; errors fixed before production.
  25. 25. Implementing Synaptica Use Validation Error Marine resources
  26. 26. Implementing Synaptica: Foreign-Language Errors. Terms with no language equivalent (LEQ), e.g., no translation. In all 3 languages, multiple English terms with the same translation, e.g. (English term / French term / French term, revised): Purchasing / Achats / (unchanged); Shopping / Achats / Shopping; Buyers / Acheteurs / (unchanged); Purchasing agents / Acheteurs / Agents d'achat.
  27. 27. Implementing Synaptica: Final Challenge. Issue: Different editorial systems = 2x data entry: once for Synaptica, once for Oracle. Solution: Overnight synchronization process to copy Synaptica work into Oracle every night. Synch process discontinued April 2008.
  28. 28. Implementing Synaptica: Putting Synaptica Into Production, Nov 2004. Train users: provide documentation & hands-on demonstrative training. Deal with people resistant to change. Encourage written feedback on system functionality. Send feedback to Synaptica; many of our suggestions implemented in later versions.
  29. 29. Life With Synaptica: Terms Management Made Easy! Word – Old, Bad; Synaptica – New, Good
  30. 30. Life With Synaptica Adding Terms Today: 3 Easy Steps 1. Enter term and relationships into Synaptica “Item Details” window 2. Export report of new terms into Word 3. Send Word document to editors
  31. 31. Life With Synaptica Improving Thesaurus Management Categories Feature
  32. 32. Life With Synaptica Subject Term Categories
  33. 33. Life With Synaptica CORP Names – Categories & Website
  34. 34. Life With Synaptica Foreign-Language Vocabularies Language Equivalents
  35. 35. Life With Synaptica Foreign-Language Vocabularies
  36. 36. Life With Synaptica Foreign-Language Vocabularies Spanish Spanish Alphabetical by language German French
  37. 37. Life With Synaptica Synaptica Updates Synaptica version 6.0 released in early 2006 Synaptica version 7.0 is being implemented now: • Enhanced user interface • Semantic Web standardization (RDF, OWL, SKOS) and Web Services integration • Expanded Reporting functionality • Enhanced adding and editing of term relationships including “rapid-fire” simple drag-and-drop editing • Improved global term editing • Online help and user guides
  38. 38. Life With Synaptica: Benefits of Synaptica. Greater awareness of thesaurus standards and terminology, e.g.: “preferred” and “non-preferred” instead of Use and Used For. Long-needed updating and improvement in term hierarchies; ability to provide thesaurus statistics. Increase in Company name NPTs, from 1935 to 8952 today. Immediate responsiveness to indexer needs: real-time term additions, esp. NPTs and SNs. Easier loading of updated Thesaurus on PQ interface.
  39. 39. Questions? Thank you!
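
The Relationship Validation Tests named on slide 24 can be illustrated with a small sketch. This is not Synaptica's implementation, which the slides do not describe; it is a minimal Python example that assumes thesaurus relationships are stored as (source term, relationship, target term) triples. The function names and the triple layout are hypothetical. It covers four of the listed tests: Relationship Reciprocates, Self-Referencing Relationships, One Relationship per Term Pair, and Circular References.

```python
from collections import defaultdict

# Assumed layout: (source term, relationship type, target term).
# BT = broader term, NT = narrower term, RT = related term.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT"}


def missing_reciprocals(relations):
    """Relationship Reciprocates: triples whose inverse triple is absent."""
    present = set(relations)
    return sorted(
        (src, rel, dst)
        for src, rel, dst in present
        if (dst, RECIPROCAL[rel], src) not in present
    )


def self_references(relations):
    """Self-Referencing Relationships: a term related to itself."""
    return sorted(r for r in set(relations) if r[0] == r[2])


def multiple_relationships(relations):
    """One Relationship per Term Pair: pairs linked by more than one type."""
    kinds = defaultdict(set)
    for src, rel, dst in relations:
        kinds[(src, dst)].add(rel)
    return {pair: sorted(k) for pair, k in kinds.items() if len(k) > 1}


def has_circular_reference(relations):
    """Circular References: a term that is (indirectly) broader than itself."""
    broader = defaultdict(set)
    for src, rel, dst in relations:
        if rel == "BT":
            broader[src].add(dst)
    state = {}  # term -> "visiting" while on the current path, else "done"

    def visit(term):
        if state.get(term) == "visiting":
            return True  # reached a term already on the current path
        if state.get(term) == "done":
            return False
        state[term] = "visiting"
        found = any(visit(parent) for parent in broader.get(term, ()))
        state[term] = "done"
        return found

    return any(visit(term) for term in list(broader))


# Sample data loosely based on the subject term shown on slide 13.
COPD = "Chronic obstructive pulmonary disease"
sample = [
    (COPD, "BT", "Respiratory diseases"),
    ("Respiratory diseases", "NT", COPD),
    (COPD, "NT", "Emphysema"),  # the reciprocal BT from Emphysema is missing
    (COPD, "RT", "Lungs"),
    ("Lungs", "RT", COPD),
]

for problem in missing_reciprocals(sample):
    print("Missing reciprocal for:", problem)
```

Each check returns the offending entries rather than raising an error, which matches the idea of an exception report that editors review and fix before production.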