Centralized Taxonomy Management for Enterprise Information Systems

From danielabarbosa, 1 month ago

Daniela Barbosa, Synaptica Business Development Manager, Dow Jones Client Solutions, Dow Jones & Company
Paula R McCoy, Manager, Taxonomy Development, ProQuest

Now that you have built your taxonomies, you need to manage and maintain them in a centralized environment that can be leveraged by all of your enterprise applications including search tools, portals, and CMS/DMS systems. This session will review some best practices in centralized taxonomy management and go through the implementation of a thesaurus management tool at ProQuest, which enabled them to create a common language to connect disparate information assets using large and varied vocabularies and authority files linked to new and existing editorial systems.

Slideshow Transcript

  1. Slide 1: Centralized Taxonomy Management for Enterprise Information Systems
     Enterprise Search Summit, Wednesday, September 24th, 2:00 pm – 2:30 pm
     - Dow Jones Client Solutions, Synaptica: daniela.barbosa@dowjones.com
     - ProQuest, Manager, Taxonomy Development: paula.mccoy@proquest.com
     © Copyright 2007 Dow Jones and Company, Inc.
  2. Slide 2: Dow Jones Taxonomy Solutions
     - Words: Dow Jones taxonomy licensing; other taxonomy licensing (Taxonomy Warehouse); taxonomy customization; taxonomy development
     - Expertise: Taxonomy Assessment; Taxonomy Consulting (Analysis, Recommendations, Implementation); Workshops
     - Tools: Synaptica: Taxonomy / Metadata Management Tool
  3. Slide 3: Some Definitions
     - A taxonomy is a hierarchical topic structure to which information can be assigned through the dual processes of classification (filing to a location) and categorisation (tagging with relevant metadata). A taxonomy provides browsable navigation and supports filtered searching.
     - A thesaurus is a controlled vocabulary linking an organisation’s common language to its taxonomy structure. It accommodates synonyms, acronyms, language variants and other near equivalences. It also signposts non-hierarchical linkages within and across the taxonomy facets. A thesaurus is usually employed to interpret and guide user search queries.
     - An ontology is the working model of entities and interactions in a particular domain of knowledge or content set. It is a set of concepts, such as things, events, and relations, that are specified in some way in order to create an agreed-upon vocabulary for exchanging information. An ontology is increasingly used to visualise (or map) a set of search results and discover new or hidden connections.
  4. Slide 4:
     - UP / DOWN: Classic taxonomy… groups things or concepts into families
     - SIDEWAYS: Traditional thesaurus… captures the different names of the family members and explores some more distant associations (cousins & close friends)
     - Multi-Directional: Emerging ontology… shows a network of multi-dimensional relationships and properties both within and outside the family groups
  5. Slide 5:
     - UP / DOWN: Telephones is a broader term than Mobile Phones
     - SIDEWAYS: Mobile Phones, AKA Cell Phones & Hand Phones, and similar to Hand Held Devices & PDAs
     - Multi-Directional: Mobile Phones are made by Phone Manufacturers and use the networks of Telecoms Service Providers
  6. Slide 6: Metadata’s Evolutionary Path
     - Stages, from least to most complex: Dictionaries & Flat Lists; Authority Files; Taxonomies; Thesauri; Ontologies
     - Diagram labels: Controlled Vocabulary; Hierarchical; Structured
     - Metadata is evolving organically: the less complex metadata elements form the building blocks for creating the more complex structures
  7. Slide 7: Practical Applications
     - Portal navigation and browsable website menus
     - Conceptual access to large databases
     - Records management and cataloging
     - e-Commerce online product catalogues
     - Inventory control and de-duplication
     - Auto-classification of internal documents and email
     - Multilingual search and browse
     - Metasearch of enterprise-wide resources
  8. Slide 8: Centralized Taxonomy and Metadata Management
     A centralized repository for multi-lingual semantic management that is:
     - Independent from systems like web-portal search and categorization systems
     - Scalable; capable of evolving with emerging corporate semantic standards
     [Diagram: multiple users working in a collaborative and compartmentalized space, subject to Permissions, feed the Synaptica® Centralized Taxonomy Management System, which outputs HTML, CSV, XML, ZThes, SKOS, OWL and Web Services to Portals, Categorizers, Search Engines and Content Portals]
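SKOS is one of the output formats named on Slide 8. As a rough illustration of what publishing a single concept in that format could involve, here is a minimal sketch that serialises one term to SKOS Turtle. The `to_skos_turtle` and `local_name` helpers, the `example.org` namespace and the dictionary layout are all assumptions made for this example; this is not Synaptica's export, and labels containing quotation marks are not escaped.

```python
import re

# Standard SKOS namespace plus a hypothetical namespace for our own concepts.
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/taxonomy/> .\n\n"
)


def local_name(label):
    """Turn a label into a simple local name, e.g. 'Mobile Phones' -> 'Mobile_Phones'."""
    return re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_")


def to_skos_turtle(concept):
    """Serialise one concept (a plain dict, assumed layout) as SKOS Turtle."""
    pref = concept["pref"]
    lines = [
        f"ex:{local_name(pref)} a skos:Concept",
        f'    skos:prefLabel "{pref}"@en',
    ]
    for alt in concept.get("alt", []):
        lines.append(f'    skos:altLabel "{alt}"@en')
    for prop in ("broader", "narrower", "related"):
        for target in concept.get(prop, []):
            lines.append(f"    skos:{prop} ex:{local_name(target)}")
    return PREFIXES + " ;\n".join(lines) + " .\n"


# Term data borrowed from the Mobile Phones example on Slide 5.
mobile = {
    "pref": "Mobile Phones",
    "alt": ["Cell Phones", "Hand Phones"],
    "broader": ["Telephones"],
    "related": ["Hand Held Devices"],
}
print(to_skos_turtle(mobile))
```

Keeping the vocabulary in one neutral structure and generating each output format from it is what lets the same terms serve several consuming systems.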
  9. Slide 9: Why Centralized?
     - Metadata can transcend information islands and data silos, but only if the enterprise is committed to common standards
     - A centralized system that supports both collaboration and compartmentalization allows common metadata to be shared while also allowing user communities the independence to manage specialized metadata files
  10. Slide 10: Why Independent?
      - Enterprises are increasingly making use of multiple proprietary and open source software tools for categorization, search and portal tasks
      - While many of these tools support some level of metadata management, the diversity of standards, data formats and business rules they support can actually exacerbate the data silo problem by creating metadata silos
  11. Slide 11: Where taxonomy fits with Search
      [Diagram, layers from top to bottom:]
      - Search Engine
      - Taxonomy & Metadata Platform
      - Information Processing, Management and Storage
      - Sources: DMS; CMS; Data; Shared Docs; News & Research
  12. Slide 12: 4 Good Reasons for Taxonomy
      - Search Relevancy: Knowledge Worker Productivity
      - Search Completeness: Effective Research/Risk Mitigation
      - Search Federation: Better & Faster Decisions
      - Search Visualisation: Discovery & Innovation
  13. Slide 13: 1. Improved Search Relevancy
      - Ambiguity of Language
      - Is a Blackberry a fruit or a handheld device?
      - By including this brand name in a taxonomy we can give context to the user search query
      - In a telecoms domain we can assume that the user means the latter and only return content tagged as such
      - Alternatively we can weight the results, promoting those documents about handheld devices above those that refer to the fruit
      - Either way the result is increased search precision, which translates into time savings
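The two options on Slide 13 (return only content tagged for the domain, or weight it above the rest) can be sketched as follows. The documents, tag names and the `boost` value are invented for illustration; a real search engine would combine such a boost with its own relevance score.

```python
# Hypothetical documents, each carrying taxonomy categories assigned at indexing time.
DOCS = [
    {"title": "Blackberry jam recipes", "tags": {"Fruit"}},
    {"title": "BlackBerry launches new handset", "tags": {"Hand Held Devices"}},
    {"title": "Blackberry sales figures", "tags": set()},
]


def search(query, domain_category, mode="filter", boost=2.0):
    """Match on title text, then use the domain category to filter or to weight."""
    matches = [d for d in DOCS if query.lower() in d["title"].lower()]
    if mode == "filter":
        # Option 1: only return content tagged with the domain's category.
        return [d["title"] for d in matches if domain_category in d["tags"]]
    # Option 2: keep everything, but promote content tagged with the category.
    scored = [(boost if domain_category in d["tags"] else 1.0, d["title"]) for d in matches]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [title for _, title in scored]


print(search("blackberry", "Hand Held Devices"))
print(search("blackberry", "Hand Held Devices", mode="weight"))
```

Filtering gives the highest precision but hides untagged content; weighting keeps it reachable further down the list.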
  14. Slide 14: 2. Improved Search Completeness
      - Synonymous and Related Term Relationships
      - Mobile Phone (PT) = Cell Phone (NPT) = Hand Phone (NPT)
      - Mobile Phone is related to Hand Held Device (RT)
      - User Search Query = “Cell Phones”
      - The taxonomy simultaneously broadens the search and prioritises the returned results, giving increased recall without compromising relevancy
      - Content tagged with the Mobile Phone category is promoted over content not tagged with it, using a weighting in the search algorithm
      - Content tagged with the Hand Held Device category may also receive a weighting
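The Slide 14 example can be sketched in a few lines: the query "Cell Phones" is resolved to its preferred term, the related term is added with a lower weight, and tagged content is ranked by those weights. The term data comes from the slide; the weights (1.0 and 0.5), the naive plural stripping and the sample documents are assumptions for illustration. For brevity this sketch returns only content that carries a weighted category.

```python
# Non-preferred and preferred forms mapped to the preferred term (PT), from the slide.
PREFERRED = {
    "mobile phone": "Mobile Phone",
    "cell phone": "Mobile Phone",
    "hand phone": "Mobile Phone",
}
# Related terms (RT) for each preferred term.
RELATED = {"Mobile Phone": ["Hand Held Device"]}


def normalise(query):
    """Lower-case and strip a trailing 's' (a deliberately naive plural rule)."""
    q = query.lower().strip()
    return q[:-1] if q.endswith("s") else q


def expand(query, pt_weight=1.0, rt_weight=0.5):
    """Return category weights for a query, or {} if the term is unknown."""
    pt = PREFERRED.get(normalise(query))
    if pt is None:
        return {}
    weights = {pt: pt_weight}
    for rt in RELATED.get(pt, []):
        weights[rt] = rt_weight
    return weights


def rank(docs, query):
    """Order (title, tags) pairs by the summed weight of their categories."""
    weights = expand(query)
    scored = []
    for title, tags in docs:
        score = sum(weights.get(tag, 0.0) for tag in tags)
        if score > 0:
            scored.append((score, title))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


docs = [
    ("PDA roundup", {"Hand Held Device"}),
    ("Handset market report", {"Mobile Phone"}),
    ("Fruit prices", {"Fruit"}),
]
print(expand("Cell Phones"))
print(rank(docs, "Cell Phones"))
```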
  15. Slide 15: 3. Search federation and data integration
      - A snapshot or dashboard is often more desirable than a list of document titles or snippets, especially when looking for information on a customer, supplier or competitor
      - Also, information will most likely reside in a number of internal repositories, each with their own levels of metadata structure
      - Taxonomy allows the combination of news, internal CI reports, price plans, coverage data, market share data, share price etc. in one consolidated view by providing mappings or cross-walks
      - This is essentially applying business intelligence discipline to the world of unstructured information
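A cross-walk of the kind Slide 15 describes is, at its simplest, a lookup from each repository's local label to a shared taxonomy term. The sketch below groups records from several repositories under that shared term; the repository names, labels and the company "Acme Telecom" are all invented for illustration.

```python
# Hypothetical cross-walk: (repository, local label) -> shared taxonomy term.
CROSSWALK = {
    ("news", "Acme Telecom PLC"): "Acme Telecom",
    ("ci_reports", "ACME TELECOM"): "Acme Telecom",
    ("pricing", "acme-tel"): "Acme Telecom",
}

# Hypothetical records as they might arrive from each repository.
RECORDS = [
    {"repo": "news", "label": "Acme Telecom PLC", "item": "Quarterly results story"},
    {"repo": "ci_reports", "label": "ACME TELECOM", "item": "Competitor profile"},
    {"repo": "pricing", "label": "acme-tel", "item": "Current price plans"},
    {"repo": "pricing", "label": "unknown-co", "item": "Unmapped price plan"},
]


def consolidated_view(records, crosswalk):
    """Group items by shared taxonomy term, then by source repository."""
    view = {}
    for record in records:
        term = crosswalk.get((record["repo"], record["label"]))
        if term is None:
            continue  # no mapping yet: a candidate for taxonomy maintenance
        view.setdefault(term, {}).setdefault(record["repo"], []).append(record["item"])
    return view


print(consolidated_view(RECORDS, CROSSWALK))
```

Records with no mapping are skipped here; in practice they are a useful signal of where the cross-walk needs extending.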
  16. Slide 16: 4. Search Visualisation
      - The previous three scenarios assume the user knows what they are looking for
      - But what about serendipitous discovery?
      - By being able to see across an aggregation of content and extract facts and relationships from deep within the information stores, true (and sometimes fortunate) discovery can take place
  17. Slide 17:
      - Back End (Information Structure): Filing & Storage; Metadata Tagging (Categorisation); Document, Content & Records Management. Audience: Records Managers; Content Managers; Librarians; Indexers
      - Centre (Synaptica® Vocabulary & Metadata Management): Taxonomies; Thesauri; Ontologies. Audience: CIOs; CTOs; IT Architects; Librarians; Taxonomists; Indexers; Knowledge & Information Managers
      - Front End (Information Intelligence): Search Engine; Navigation; Process Visualisation; Intranet / Portal User Interface. Audience: Information Creators; Information Users (the business; the public)
  18. Slide 18: Centralized Taxonomy Management for Enterprise Information Systems
      Paula R. McCoy, Manager, Taxonomy Development, ProQuest
      paula.mccoy@proquest.com
  19. Slide 19: Topics of Discussion
      - Description of ProQuest
      - Controlled Vocabulary & Authority Files
      - Taxonomy Management: Overview
      - Managing Terms Manually
      - Synaptica Thesaurus Management System
  20. Slide 20:
      - Access to over 125 billion digital pages of content from magazine, trade, & scholarly publications; current & historical newspapers; original materials such as annual reports & Civil War pamphlets; and daily wire feeds
      - Subscription-based ProQuest® online information service available in academic and public libraries
  21. Slide 21:
      - ProQuest Controlled Vocabulary (CV) used to index subjects; Authority Files used to index company, geographic, personal, and product names
      - CV applied to non-periodical & third-party content via mapping, to allow cross-searching of multiple DBs with one vocabulary
  22. Slide 22: ProQuest Controlled Vocabulary
      - Natural language, hierarchical vocabulary complying with ANSI/NISO Standard Z39.19 (Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies)
      - Created in 1970s for ABI/INFORM business database
      - Based on Library of Congress Subject Headings
  23. Slide 23: ProQuest Controlled Vocabulary
      - Merged with general reference vocabulary in 1980s
      - Major development effort in past 4 years to boost science, education & medical terms
      - Thesaurus subjects:
        - Business, economics & trade: 4300 terms
        - Science, math & technology: 1600 terms
        - Medicine: 1150 terms
        - Humanities: 960 terms
        - Government & policy: 850 terms
        - Education: 400 terms
  24. Slide 24: ProQuest CV: Statistics
      - Preferred terms: 11,046
      - Non-preferred terms: 5,631
      - Scope Notes: 3,194 (29%)
      - Cross-references (Broader, Narrower, Related terms): 67,700
      - Terms added in 2007: 77
      - Terms added in 2008: 58+
  25. Slide 25: Authority Files: Statistics
      - Corporate/Organization Names: 438,098 (names added in 2008: 5,489)
      - Personal Names: 416,239 (names added in 2008: 1,526)
      - Geographic (Location) Names: 34,331 (names added in 2008: 144)
      - Product Names: 38,210 (names added in 2008: 54)
  26. Slide 26: The Taxonomy Manager’s Job
      - Add subject terms as dictated by new concepts and new content to index
      - Maintain hierarchies & Scope Notes
      - Load updated Thesaurus to ProQuest interface
      - Manage authority files to maintain standards & control file size
  27. Slide 27: The Taxonomy Manager’s Job OBJECTIVE: To ensure that indexers and searchers alike have access to a complete and accurate Thesaurus that they can use to maximize the discoverability of documents in ProQuest
  28. Slide 28: Sample Subject Term
      Chronic obstructive pulmonary disease
      SN: Any lung disease, such as chronic bronchitis or emphysema, causing obstruction of bronchial airflow
      UF COPD
      BT Disease
      BT Respiratory diseases
      NT Asthma
      NT Bronchitis
      NT Emphysema
      RT Airway management
      RT Lungs
      Annotations:
      - Main entry: preferred, or main term
      - SN: scope note defining term and how it is used
      - UF: non-preferred term; points to term used to index
      - BT: terms broader in nature to main term (COPD is a disease, and specifically, a respiratory disease)
      - NT: terms narrower in nature to main term (these are chronic lung diseases)
      - RT: terms related to main term that might be used to narrow the search
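The sample entry on Slide 28 maps naturally onto a small record type. The sketch below is one hypothetical way to hold such a term and print it in the SN/UF/BT/NT/RT layout used on the slide; the `Term` class and its field names are assumptions for illustration, not ProQuest's or Synaptica's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One thesaurus entry: a preferred term plus its notes and relationships."""
    preferred: str
    scope_note: str = ""
    used_for: list = field(default_factory=list)   # UF: non-preferred terms
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT

    def display(self):
        """Render the entry in the conventional thesaurus display order."""
        lines = [self.preferred]
        if self.scope_note:
            lines.append(f"SN: {self.scope_note}")
        groups = (
            ("UF", self.used_for),
            ("BT", self.broader),
            ("NT", self.narrower),
            ("RT", self.related),
        )
        for code, values in groups:
            lines.extend(f"{code} {value}" for value in values)
        return "\n".join(lines)


copd = Term(
    preferred="Chronic obstructive pulmonary disease",
    scope_note=(
        "Any lung disease, such as chronic bronchitis or emphysema, "
        "causing obstruction of bronchial airflow"
    ),
    used_for=["COPD"],
    broader=["Disease", "Respiratory diseases"],
    narrower=["Asthma", "Bronchitis", "Emphysema"],
    related=["Airway management", "Lungs"],
)
print(copd.display())
```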
  29. Slide 29: Managing Terms Manually
      - Seven MS Word vocabulary documents, English and foreign language (French, German, Spanish), printed for internal use only
      - Six authority files & 3 vocabulary files in Oracle databases, requiring duplicate entry of subject terms in Word and Oracle
      - Legacy editorial system in process of being replaced
      - New scientific content requiring a huge enhancement to vocabulary
  30. Slide 30: Thesaurus Management System: Buying Criteria
      Requirements:
      - Eliminate double entry
      - Automate entry of reciprocal relationships
      - Improve editorial interface with vocabulary
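"Automate entry of reciprocal relationships" means that recording one side of a relationship (for example, COPD has broader term Respiratory diseases) should create the inverse automatically (Respiratory diseases has narrower term COPD). A minimal sketch of that idea follows; the `Thesaurus` class is hypothetical and says nothing about how Synaptica implements it. The BT/NT, RT/RT and UF/USE pairings follow common thesaurus convention.

```python
# Each relationship code paired with its reciprocal.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "USE", "USE": "UF"}


class Thesaurus:
    """Stores relationships and adds the reciprocal side on every entry."""

    def __init__(self):
        self.relations = {}  # term -> set of (code, target term)

    def add_relation(self, source, code, target):
        """Record 'source CODE target' and the matching inverse on the target."""
        if code not in RECIPROCAL:
            raise ValueError(f"unknown relationship code: {code}")
        self.relations.setdefault(source, set()).add((code, target))
        self.relations.setdefault(target, set()).add((RECIPROCAL[code], source))

    def relations_of(self, term):
        """Return a term's relationships as a sorted list of (code, target)."""
        return sorted(self.relations.get(term, set()))


thesaurus = Thesaurus()
copd_label = "Chronic obstructive pulmonary disease"
thesaurus.add_relation(copd_label, "BT", "Respiratory diseases")
thesaurus.add_relation(copd_label, "UF", "COPD")
thesaurus.add_relation(copd_label, "RT", "Lungs")

print(thesaurus.relations_of("Respiratory diseases"))
print(thesaurus.relations_of("COPD"))
```

Because both sides are written in one call, the editor enters each relationship once and the two records cannot drift apart.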
  31. Slide 31: Life With Synaptica
      - Word – Old, Bad
      - Synaptica – New, Good
  32. Slide 32: Adding Terms Today: 3 Easy Steps
      1. Enter term and relationships into Synaptica “Item Details” window
      2. Export report of new terms into Word
      3. Send Word document to editors
  33. Slide 33: Improving Thesaurus Management Categories Feature
  34. Slide 34: Subject Term Categories
  35. Slide 35: CORP Names – Categories & Website
  36. Slide 36: Foreign-Language Vocabularies Language Equivalents
  37. Slide 37: Life With Synaptica: Foreign-Language Vocabularies, alphabetical by language (Spanish, German, French)
  38. Slide 38: Synaptica Updates
      - Synaptica version 6.0 released in early 2006
      - Synaptica version 7.0 is being implemented now:
        - Enhanced user interface
        - Semantic Web standardization (RDF, OWL, SKOS) and Web Services integration
        - Expanded Reporting functionality
        - Enhanced adding and editing of term relationships including “rapid-fire” simple drag-and-drop editing
        - Improved global term editing
        - Online help and user guides
  39. Slide 39: Benefits of Synaptica
      - Greater awareness of thesaurus standards and terminology, e.g., “preferred” and “non-preferred” instead of Use and Used For
      - Long-needed updating and improvement in term hierarchies; ability to provide thesaurus statistics
      - Increase in company name NPTs from 1935 to 8952 today
      - Immediate responsiveness to indexer needs: real-time term additions, esp. NPTs and SNs
      - Easier loading of updated Thesaurus on PQ interface
  40. Slide 40: thank you!