
Finding a Common Language: Bringing Complex and Disparate Vocabularies Together

From danielabarbosa, 1 month ago

This case study addresses the challenges ProQuest faced in managing multilingual controlled vocabularies kept in multiple Word documents, along with authority files maintained in an Oracle database. The speakers describe how implementing a thesaurus management tool helped ProQuest simplify and standardize its business semantic management. The tool gave ProQuest a common language and connected its disparate information assets. It also handled large and varied vocabularies and authority files, linked new and existing editorial systems, enabled hierarchical views, and automated thesaurus management tasks.

Slideshow Transcript

  1. Slide 1: Finding a Common Language: Bringing Complex and Disparate Vocabularies Together. Paula R. McCoy, Manager, Taxonomy Development, ProQuest (paula.mccoy@proquest.com)
  2. Slide 2: Part of Cambridge Information Group & CSA. Headquartered in Ann Arbor, Michigan. Editorial offices in Louisville, Kentucky.
  3. Slide 3: Access to over 125 billion digital pages of content from magazine, trade & scholarly publications, current & historical newspapers, original materials such as annual reports & Civil War pamphlets, and daily wire feeds. Subscription-based ProQuest® online information service, available in academic and public libraries.
  4. Slide 4: Louisville editors abstract & index 4,000+ periodicals & newspapers. The ProQuest Controlled Vocabulary (CV) is used to index subjects; Authority Files are used to index company, geographic, personal, and product names. The CV is applied to non-periodical & third-party content via mapping, which allows multiple databases to be cross-searched with one vocabulary.
  5. Slide 5: Topics of Discussion: Description of ProQuest Controlled Vocabulary & Authority Files; Taxonomy Management: Overview; Life Before Synaptica; Thesaurus Management System Purchase; Implementing Synaptica; Life With Synaptica; Q&A.
  6. Slide 6: PQ CV. ProQuest Controlled Vocabulary: a natural-language, hierarchical vocabulary complying with ANSI/NISO Standard Z39.19 (Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies). Created in the 1970s for the ABI/INFORM business database and based on Library of Congress Subject Headings.
  7. Slide 7: PQ CV. ProQuest Controlled Vocabulary: merged with a general reference vocabulary in the 1980s. A major development effort in the past 4 years has boosted science, education & medical terms. Thesaurus subjects: Business, economics & trade (4,300 terms); Science, math & technology (1,600 terms); Medicine (1,150 terms); Humanities (960 terms); Government & policy (850 terms); Education (400 terms).
  8. Slide 8: PQ CV. ProQuest CV statistics: preferred terms: 11,046; non-preferred terms: 5,631; scope notes: 3,194 (29%); cross-references (broader, narrower, related terms): 67,700; terms added in 2007: 77; terms added in 2008: 58+.
  9. Slide 9: PQ CV. Authority Files statistics: corporate/organization names: 438,098 (5,489 added in 2008); personal names: 416,239 (1,526 added in 2008); geographic (location) names: 34,331 (144 added in 2008); product names: 38,210 (54 added in 2008).
  10. Slide 10: Taxonomy Management. The Taxonomy Manager’s Job: add subject terms as dictated by new concepts & new content to index; maintain hierarchies & scope notes; load the updated Thesaurus to the ProQuest interface; manage authority files to maintain standards & control file size.
  11. Slide 11: Taxonomy Management. The Taxonomy Manager’s Job. OBJECTIVE: To ensure that indexers and searchers alike have access to a complete and accurate Thesaurus that they can use to maximize the discoverability of documents in ProQuest.
  12. Slide 12: Taxonomy Management. Thesaurus on ProQuest®
  13. Slide 13: Taxonomy Management. Sample subject term, with callouts explaining each part:
      Chronic obstructive pulmonary disease  (preferred, or main, term)
      SN: Any lung disease, such as chronic bronchitis or emphysema, causing obstruction of bronchial airflow  (scope note defining the term and how it is used)
      UF COPD  (non-preferred term: points to the term used to index)
      BT Disease; BT Respiratory diseases  (terms broader in nature than the main term: COPD is a disease, and specifically, a respiratory disease)
      NT Asthma; NT Bronchitis; NT Emphysema  (terms narrower in nature than the main term: these are chronic lung diseases)
      RT Airway management; RT Lungs  (terms related to the main term that might be used to narrow the search)
  14. Slide 14: Before Synaptica. Managing terms meant: multiple files + duplicate entries + errors = less than ideal thesaurus management.
  15. Slide 15: Before Synaptica. Excerpt from the MS Word document version (2004) of the ProQuest Controlled Vocabulary of Subject Terms, page 3 (originally a three-column page; entries listed here in alphabetical order):
      Academic degrees. SN: A title conferred on students upon graduating from a program of study at a college or university. UF: Associates degree; Bachelors degree; Doctoral degree; Masters degree. BT: Academic achievement. RT: Colleges & universities; Graduate studies; Graduation requirements; Higher education; MBA programs & graduates.
      Academic failure. SN: The failure of a student to meet academic standards, including failure to be promoted or to graduate. UF: Student failure. RT: Academic achievement; Academic grading; Academic probation; Academic underachievement; At risk students; Grade repetition; Graduation requirements; School dropouts; Social promotion.
      Academic freedom. SN: Educators’ freedom to teach and research what they choose. BT: Education. RT: Colleges & universities; Curricula; Research; Teachers; Teaching.
      Academic grading. UF: Grading of students. BT: Academic achievement. RT: Academic failure; Academic probation; Achievement tests; Cheating; Education portfolios; Educational evaluation.
      Academic guidance counseling. UF: Guidance counseling; Student counseling. BT: Counseling; Education. RT: Career preparation; Counselor client relationships; Counselor education; School counseling.
      Academic libraries. UF: College libraries; School libraries. BT: Libraries. RT: Librarians; Library resources.
      Academic marketing. SN: Efforts of educational institutions to attract students and funding. BT: Marketing. NT: Student recruitment. RT: Admissions policies; College admissions; College choice; Colleges & universities; Enrollment management; Enrollments.
      Academic probation. RT: Academic failure; Academic grading; Academic underachievement.
      Academic standards. SN: Standards for performance in defined academic areas set at the local, state, or federal levels. BT: Standards. RT: Academic achievement; Academic achievement gaps; Academic underachievement; Achievement tests; Core curriculum; Education policy; Educational evaluation; No Child Left Behind Act 2001-US; Quality of education; School effectiveness; Standardized tests; Tests.
      Academic underachievement. SN: Student performance that is below standards or below potential. RT: Academic achievement; Academic achievement gaps; Academic failure; Academic standards; At risk students; Grade repetition; Social promotion.
      Academy awards. UF: Oscars (Motion picture awards). BT: Awards & honors; Motion picture industry. RT: Actors.
      Acadian culture. UF: Cajuns. BT: Minority & ethnic groups.
      Accelerated cost recovery system. CC: 4210. UF: ACRS. BT: Cost recovery; Depreciation; Depreciation methods. NT: Modified accelerated cost recovery system. RT: Capital cost recovery allowances; Declining balance method; Depreciable assets; Tax basis.
      Accelerated death benefits. CC: 4220; CC: 8210. UF: Living benefits; Viatical settlement. BT: Death benefits. RT: Estate planning; Hardship distributions; Insurance policies; Life insurance; Riders; Terminal illnesses.
      Accelerated depreciation methods. USE: Depreciation methods.
      Key: SN = Scope note; CC = Classification code; UF = Use for; BT = Broader term; NT = Narrower term; RT = Related term.
  16. Slide 16: Before Synaptica. Vocabulary documents in Word: ProQuest controlled vocabulary; French-language controlled vocabulary; German-language controlled vocabulary; Spanish-language controlled vocabulary; combined PQ-CBCA controlled vocabulary; ethnic database vocabulary, English; ethnic database vocabulary, Spanish.
  17. Slide 17: Before Synaptica. Oracle database forms
  18. Slide 18: Before Synaptica. Authority files in Oracle: class codes (related to subjects); CORP names (391,665+ terms); NAIC codes (related to companies); GEOG names (32,000+ terms); PERS names (350,000+ terms); PROD names (38,000+ terms).
  19. Slide 19: Before Synaptica. Foreign-language vocabularies: French, German, Spanish.
  20. Slide 20: Before Synaptica. Adding new terms: (1) enter the full term hierarchy into a new Word doc; (2) copy the term into the main Word-based vocabulary & enter reciprocal relationships; (3) enter the term & relationships into Oracle; (4) review the next-day report on Oracle activity; (5) send the new-term doc to editors via e-mail; (6) print the new vocabulary (at least every two years).
  21. Slide 21: TMS Purchase. Thesaurus Management Systems
  22. Slide 22: TMS Purchase. Buying criteria: up to 40 admin & 100 read-only users in multiple locations; ability to load vocabularies from multiple Word docs & Oracle authority files; ability to add new vocabularies; support for foreign-language vocabularies; vendor onsite installation & training; software upgrades & tech support.
  23. Slide 23: TMS Purchase. Buying criteria: (1) ability to interact in real time with the editorial system; (2) ability to accommodate authority files of 400,000+ names.
  24. Slide 24: Implementing Synaptica. Contract signed and work begun in August 2004. PQ sent Synaptica all the Word & Oracle files for analysis. Decision points: how to load & structure the data; how to handle “suspect” or erroneous relationships.
  25. Slide 25: Implementing Synaptica. Synaptica data analysis, relationship validation tests: Term Uniqueness; Use Violations; Self-Referencing Relationships; One Relationship per Term Pair; Relationship Unique; Relationship Reciprocates; Circular References. Exception reports were delivered to PQ, and errors were fixed before production.
  26. Slide 26: Implementing Synaptica. Use validation error: Marine resources is a non-preferred term (USE: Underwater resources), yet it still appears as a related term in other entries:
      Marine ecology. SN: The ecology of the seas and oceans. UF: Benthic ecology. BT: Ecology. RT: Marine conservation; Marine pollution; Marine resources; Oceans.
      Marine pollution. BT: Pollution; Water pollution. RT: Marine conservation; Marine ecology; Marine resources; Ocean dumping.
      Marine resources. USE: Underwater resources.
      Underwater resources. UF: Marine resources. BT: Natural resources. RT: Marine conservation; Marine ecology; Marine pollution.
  27. Slide 27: Implementing Synaptica. Foreign-language errors: terms with no language equivalent (LEQ), i.e., no translation; and, in all 3 languages, multiple English terms with the same translation, e.g.:
      English term        French term    French term (revised)
      Purchasing          Achats
      Shopping            Achats         Shopping
      Buyers              Acheteurs
      Purchasing agents   Acheteurs      Agents d'achat
  28. Slide 28: Implementing Synaptica. Final challenge. Issue: different editorial systems meant 2x data entry, once for Synaptica and once for Oracle. Solution: an overnight synchronization process copied the day's Synaptica work into Oracle every night. The sync process was discontinued in April 2008.
  29. Slide 29: Implementing Synaptica. Putting Synaptica into production, Nov 2004: train users (provide documentation & hands-on demonstrative training); deal with people resistant to change; encourage written feedback on system functionality; send feedback to Synaptica (many of our suggestions were implemented in later versions).
  30. Slide 30: Life With Synaptica. Word: old, bad. Synaptica: new, good.
  31. Slide 31: Life With Synaptica. Adding terms today, in 3 easy steps: (1) enter the term and relationships into the Synaptica “Item Details” window; (2) export a report of new terms into Word; (3) send the Word document to editors.
  32. Slide 32: Life With Synaptica. Synaptica updates: version 6.0 released in early 2006; version 7.0 is being implemented now, with an enhanced user interface; Semantic Web standardization (RDF, OWL, SKOS) and Web Services integration; expanded reporting functionality; enhanced adding and editing of term relationships, including “rapid-fire” simple drag-and-drop editing; improved global term editing; online help and user guides.
  33. Slide 33: Life With Synaptica. Benefits of Synaptica: greater awareness of thesaurus standards and terminology, e.g., “preferred” and “non-preferred” instead of Use and Used For; long-needed updating and improvement of term hierarchies; ability to provide thesaurus statistics; an increase in company-name non-preferred terms (NPTs) from 1,935 to 8,952 today; immediate responsiveness to indexer needs through real-time term additions, especially NPTs and scope notes (SNs); easier loading of the updated Thesaurus on the PQ interface.
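
Slide 4 notes that the CV is applied to third-party content via mapping, so that several databases can be cross-searched with one vocabulary. The talk does not describe the mapping tables. The Python sketch below is a minimal illustration that assumes a plain lookup from normalized third-party terms to CV preferred terms. `CV_MAPPING` and `map_to_cv` are hypothetical names, and the stand-in data reuses UF/preferred pairs from Slides 13 and 15.

```python
# Hypothetical mapping from normalized third-party index terms to ProQuest CV
# preferred terms. The pairs reuse UF -> preferred relationships shown on
# Slides 13 and 15; a real mapping table would be far larger.
CV_MAPPING = {
    "copd": "Chronic obstructive pulmonary disease",
    "oscars": "Academy awards",
    "cajuns": "Acadian culture",
    "student failure": "Academic failure",
}


def map_to_cv(third_party_terms, mapping=CV_MAPPING):
    """Translate third-party subject terms into CV preferred terms.

    Returns (mapped, unmapped): CV terms in first-seen order with duplicates
    removed, and the source terms that had no mapping (for editorial review).
    """
    mapped, unmapped, seen = [], [], set()
    for term in third_party_terms:
        cv_term = mapping.get(term.strip().lower())
        if cv_term is None:
            unmapped.append(term)
        elif cv_term not in seen:
            seen.add(cv_term)
            mapped.append(cv_term)
    return mapped, unmapped


mapped, unmapped = map_to_cv(["COPD", "Oscars", "copd ", "Zymurgy"])
print(mapped)    # ['Chronic obstructive pulmonary disease', 'Academy awards']
print(unmapped)  # ['Zymurgy']
```

Returning the unmapped terms separately keeps gaps in the mapping visible to editors, so they are not silently dropped.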
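
The ratios quoted on Slides 8 and 33 follow directly from the counts given there. Scope-note coverage is scope notes divided by preferred terms. Dividing cross-references by preferred terms gives an average per preferred term. The company-name NPT growth is the ratio of the two NPT counts:

```latex
\[
\frac{3\,194}{11\,046} \approx 0.289 \approx 29\%,
\qquad
\frac{67\,700}{11\,046} \approx 6.1 \text{ cross-references per preferred term},
\qquad
\frac{8\,952}{1\,935} \approx 4.6
\]
```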
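
The sample record on Slide 13 maps naturally onto a small data structure. The Python sketch below is illustrative only: the `Term` class and its field names are hypothetical and do not describe how Synaptica or ProQuest actually store terms. It rebuilds the COPD record and prints it in the slide's SN/UF/BT/NT/RT layout.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A preferred term and its relationships (hypothetical field names)."""
    label: str
    scope_note: str = ""
    used_for: list[str] = field(default_factory=list)   # UF: non-preferred entry terms
    broader: list[str] = field(default_factory=list)    # BT
    narrower: list[str] = field(default_factory=list)   # NT
    related: list[str] = field(default_factory=list)    # RT

    def render(self) -> str:
        """Format the record in the SN/UF/BT/NT/RT order used on Slide 13."""
        lines = [self.label]
        if self.scope_note:
            lines.append(f"SN: {self.scope_note}")
        for tag, terms in (("UF", self.used_for), ("BT", self.broader),
                           ("NT", self.narrower), ("RT", self.related)):
            lines.extend(f"{tag} {t}" for t in sorted(terms))
        return "\n".join(lines)


copd = Term(
    label="Chronic obstructive pulmonary disease",
    scope_note=("Any lung disease, such as chronic bronchitis or emphysema, "
                "causing obstruction of bronchial airflow"),
    used_for=["COPD"],
    broader=["Disease", "Respiratory diseases"],
    narrower=["Asthma", "Bronchitis", "Emphysema"],
    related=["Airway management", "Lungs"],
)
print(copd.render())
```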
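
Step 2 on Slide 20, entering reciprocal relationships by hand, is exactly the kind of bookkeeping a thesaurus tool can automate. This sketch assumes the standard Z39.19 reciprocal pairs (BT/NT, RT/RT, USE/UF). The `add_with_reciprocals` helper and the term-to-set layout are hypothetical.

```python
from collections import defaultdict

# Each relationship type and its reciprocal (Z39.19-style pairs).
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


def add_with_reciprocals(relations, source, rel, target):
    """Record source -rel-> target together with target -reciprocal-> source.

    `relations` maps each term to a set of (relationship, other_term) pairs.
    """
    relations[source].add((rel, target))
    relations[target].add((RECIPROCAL[rel], source))


relations = defaultdict(set)
add_with_reciprocals(relations, "Chronic obstructive pulmonary disease", "BT", "Respiratory diseases")
add_with_reciprocals(relations, "Chronic obstructive pulmonary disease", "RT", "Lungs")
print(sorted(relations["Respiratory diseases"]))
# [('NT', 'Chronic obstructive pulmonary disease')]
```

Because one call writes both directions, the reciprocal can never be forgotten. A second manual step leaves exactly that gap.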
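
Slide 25 lists the relationship validation tests run before ProQuest's data went into production, but the slides do not say how the tests were implemented. The Python sketch below shows one plausible way to express six of them over (source, relationship, target) triples; Use Violations is omitted here. The triple representation, function names, and exception labels are assumptions.

```python
from collections import Counter, defaultdict

# Each relationship type and its reciprocal (Z39.19-style pairs).
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


def duplicate_terms(labels):
    """Term Uniqueness: labels that occur more than once, ignoring case."""
    counts = Counter(label.casefold() for label in labels)
    return sorted(label for label, n in counts.items() if n > 1)


def validate(triples):
    """Return (test_name, detail) exceptions for (source, rel, target) triples."""
    triples = list(triples)
    errors = []

    # Relationship Unique: the same triple must not be stored twice.
    for triple, n in Counter(triples).items():
        if n > 1:
            errors.append(("Relationship Unique", triple))

    unique = set(triples)
    pair_rels = defaultdict(set)
    for src, rel, tgt in unique:
        # Self-Referencing Relationships: a term must not point at itself.
        if src == tgt:
            errors.append(("Self-Referencing Relationship", (src, rel, tgt)))
        # Relationship Reciprocates: A BT B requires B NT A, and so on.
        if (tgt, RECIPROCAL[rel], src) not in unique:
            errors.append(("Relationship Reciprocates", (src, rel, tgt)))
        pair_rels[(src, tgt)].add(rel)

    # One Relationship per Term Pair: A cannot be, say, both BT and RT of B.
    for pair, rels in pair_rels.items():
        if len(rels) > 1:
            errors.append(("One Relationship per Term Pair", pair + (tuple(sorted(rels)),)))

    # Circular References: following BT links must never lead back to the start.
    broader = defaultdict(set)
    for src, rel, tgt in unique:
        if rel == "BT":
            broader[src].add(tgt)
    for start in list(broader):
        stack, seen = list(broader[start]), set()
        while stack:
            term = stack.pop()
            if term == start:
                errors.append(("Circular Reference", start))
                break
            if term not in seen:
                seen.add(term)
                stack.extend(broader.get(term, ()))
    return errors


sample = [
    ("Marine pollution", "BT", "Pollution"),
    ("Pollution", "NT", "Marine pollution"),
    ("Marine ecology", "RT", "Oceans"),  # reciprocal "Oceans RT Marine ecology" is missing
]
for name, detail in validate(sample):
    print(name, detail)
print(duplicate_terms(["Oceans", "oceans", "Ecology"]))  # ['oceans']
```

Each check returns exceptions rather than raising. That matches the slide's workflow, in which exception reports went back to PQ for editors to fix.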
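
The Use Validation Error on Slide 26 involves a non-preferred term that still takes part in BT/NT/RT relationships: Marine resources, which carries USE: Underwater resources. The following Python check is a minimal sketch. It assumes relationships are stored as (source, relationship, target) triples, which is a hypothetical representation, and `use_violations` is likewise a hypothetical name.

```python
def use_violations(relationships):
    """Flag BT/NT/RT relationships that involve a non-preferred term.

    A term is treated as non-preferred if it has a USE reference; only
    preferred terms should appear in hierarchical or associative links.
    """
    non_preferred = {src for src, rel, _ in relationships if rel == "USE"}
    return sorted(
        (src, rel, tgt)
        for src, rel, tgt in relationships
        if rel in {"BT", "NT", "RT"} and (src in non_preferred or tgt in non_preferred)
    )


# Relationships taken from the Slide 26 example.
marine = [
    ("Marine resources", "USE", "Underwater resources"),
    ("Underwater resources", "UF", "Marine resources"),
    ("Marine ecology", "RT", "Marine resources"),
    ("Marine pollution", "RT", "Marine resources"),
    ("Marine ecology", "RT", "Marine pollution"),
]
for violation in use_violations(marine):
    print(violation)
# ('Marine ecology', 'RT', 'Marine resources')
# ('Marine pollution', 'RT', 'Marine resources')
```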
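
The two foreign-language problems on Slide 27 can be detected mechanically. This sketch assumes each language's equivalents are held as a mapping from English preferred term to translation. `language_equivalent_errors` is a hypothetical name, and the untranslated term Academy awards is invented test data. The French pairs and their revisions are the ones shown on the slide.

```python
from collections import defaultdict


def language_equivalent_errors(equivalents, english_terms):
    """Check one target language's language equivalents (LEQs).

    equivalents:   English preferred term -> translated term
    english_terms: every English preferred term that should have an LEQ
    Returns (missing, shared): English terms with no translation, and a dict of
    translation -> English terms for translations used more than once.
    """
    missing = sorted(t for t in english_terms if not equivalents.get(t))
    by_translation = defaultdict(list)
    for english, translated in equivalents.items():
        if translated:
            by_translation[translated].append(english)
    shared = {tr: sorted(en) for tr, en in by_translation.items() if len(en) > 1}
    return missing, shared


# French pairs from Slide 27; "Academy awards" is hypothetical untranslated data.
french = {"Purchasing": "Achats", "Shopping": "Achats",
          "Buyers": "Acheteurs", "Purchasing agents": "Acheteurs"}
english = list(french) + ["Academy awards"]

print(language_equivalent_errors(french, english))
# (['Academy awards'], {'Achats': ['Purchasing', 'Shopping'], 'Acheteurs': ['Buyers', 'Purchasing agents']})

# Applying the revised French terms from the slide removes the shared translations.
french.update({"Shopping": "Shopping", "Purchasing agents": "Agents d'achat"})
print(language_equivalent_errors(french, english))
# (['Academy awards'], {})
```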
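
Slide 28 describes an overnight process that copied each day's Synaptica work into Oracle, but it does not say how changes were detected. One common approach is to diff the current export against the snapshot taken at the last sync and apply only the inserts, updates, and deletes. The Python sketch below illustrates that approach under stated assumptions: the snapshot format and `compute_sync_changes` are hypothetical, and the database writes are omitted.

```python
def compute_sync_changes(last_synced, current):
    """Diff two {term_id: record} snapshots into nightly sync changes.

    Returns (inserts, updates, deletes) as sorted lists of term IDs. A real job
    would then apply these changes to the target database.
    """
    inserts = sorted(current.keys() - last_synced.keys())
    deletes = sorted(last_synced.keys() - current.keys())
    updates = sorted(
        term_id
        for term_id in current.keys() & last_synced.keys()
        if current[term_id] != last_synced[term_id]
    )
    return inserts, updates, deletes


# Hypothetical snapshots: term ID -> (label, scope note).
yesterday = {
    101: ("Marine resources", ""),
    102: ("Marine ecology", "The ecology of the seas and oceans"),
}
today = {
    102: ("Marine ecology", "The ecology of the seas and oceans."),
    103: ("Underwater resources", ""),
}
print(compute_sync_changes(yesterday, today))  # ([103], [102], [101])
```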
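
Slide 32 lists SKOS among the Semantic Web standards in Synaptica 7.0. The following Python sketch serializes one thesaurus term as a SKOS concept in Turtle. It maps UF to `skos:altLabel`, SN to `skos:scopeNote`, and BT/NT/RT to `skos:broader`, `skos:narrower`, and `skos:related`. The `http://example.org/pqcv/` namespace and the slug-based concept IDs are hypothetical, and this is not Synaptica's export format.

```python
import re

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/pqcv/"  # hypothetical namespace for illustration


def slug(label):
    """Turn a term label into a URI-safe local name (hypothetical ID scheme)."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def literal(text, lang="en"):
    """Quote a string as a Turtle language-tagged literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_skos_turtle(label, scope_note="", alt_labels=(), broader=(), narrower=(), related=()):
    """Serialize one thesaurus term as a skos:Concept in Turtle."""
    props = [("skos:prefLabel", literal(label))]
    props += [("skos:altLabel", literal(a)) for a in alt_labels]  # UF terms
    if scope_note:
        props.append(("skos:scopeNote", literal(scope_note)))
    for prop, targets in (("skos:broader", broader), ("skos:narrower", narrower),
                          ("skos:related", related)):
        props += [(prop, f"ex:{slug(t)}") for t in targets]
    body = " ;\n".join(f"    {p} {o}" for p, o in props)
    return (f"@prefix skos: <{SKOS}> .\n@prefix ex: <{BASE}> .\n\n"
            f"ex:{slug(label)} a skos:Concept ;\n{body} .\n")


print(to_skos_turtle("Chronic obstructive pulmonary disease",
                     alt_labels=["COPD"],
                     broader=["Disease", "Respiratory diseases"],
                     related=["Airway management", "Lungs"]))
```

In SKOS, non-preferred terms become `skos:altLabel` values on the concept rather than separate resources. That mirrors how a UF entry points to its preferred term.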