Centralized Taxonomy Management for Enterprise Information Systems - Presentation Transcript
Centralized Taxonomy Management for Enterprise Information Systems Enterprise Search Summit Wednesday, September 24th, 2:00 pm – 2:30 pm Dow Jones Client Solutions ProQuest Synaptica Manager, Taxonomy Development [email_address] [email_address]
Dow Jones Taxonomy Solutions
Words
Dow Jones taxonomy licensing
Other taxonomy licensing (Taxonomy Warehouse)
Taxonomy customization
Taxonomy development
Expertise
Taxonomy Assessment
Taxonomy Consulting
Analysis
Recommendations
Implementation
Workshops
Tools
Synaptica:
Taxonomy / Metadata -- Management Tool
Some Definitions
A taxonomy is a hierarchical topic structure to which information can be assigned through the dual processes of classification (filing to a location) and categorisation (tagging with relevant metadata). A taxonomy provides browsable navigation and supports filtered searching.
A thesaurus is a controlled vocabulary linking an organisation’s common language to its taxonomy structure. It accommodates synonyms, acronyms, language variants and other near equivalences. It also signposts non-hierarchical linkages within and across the taxonomy facets. A thesaurus is usually employed to interpret and guide user search queries.
An ontology is the working model of entities and interactions in a particular domain of knowledge or content set. It is a set of concepts, such as things, events, and relations, that are specified in some way in order to create an agreed-upon vocabulary for exchanging information. An ontology is increasingly used to visualise (or map) a set of search results and discover new or hidden connections.
UP / DOWN: Classic taxonomy … groups things or concepts into families
SIDEWAYS: Traditional thesaurus … captures the different names of the family members and explores some more distant associations (cousins & close friends)
MULTI-DIRECTIONAL: Emerging ontology … shows a network of multi-dimensional relationships and properties both within and outside the family groups
UP / DOWN: Telephones is a broader term than Mobile Phones
SIDEWAYS: Mobile Phones, AKA Cell Phones & Hand Phones, and similar to Hand Held Devices & PDAs
MULTI-DIRECTIONAL: Mobile Phones are made by Phone Manufacturers and use the networks of Telecoms Service Providers
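The three directions in the telephone example can be sketched as typed relationships held in one small structure. This is only an illustration in Python: BT, UF and RT follow common thesaurus notation, while the property names `made_by` and `uses_network_of` and both helper functions are hypothetical labels invented for this sketch.

```python
from collections import defaultdict

# (subject, relation, object) triples taken from the telephone example.
# "made_by" and "uses_network_of" are hypothetical ontology property names.
TRIPLES = [
    ("Mobile Phones", "BT", "Telephones"),          # up / down: broader term
    ("Mobile Phones", "UF", "Cell Phones"),         # sideways: synonym
    ("Mobile Phones", "UF", "Hand Phones"),         # sideways: synonym
    ("Mobile Phones", "RT", "Hand Held Devices"),   # sideways: similar concept
    ("Mobile Phones", "RT", "PDAs"),                # sideways: similar concept
    ("Mobile Phones", "made_by", "Phone Manufacturers"),                 # multi-directional
    ("Mobile Phones", "uses_network_of", "Telecoms Service Providers"),  # multi-directional
]


def relations_of(term):
    """Group every outgoing relationship of a term by relation type."""
    grouped = defaultdict(list)
    for subject, relation, obj in TRIPLES:
        if subject == term:
            grouped[relation].append(obj)
    return dict(grouped)


def narrower_terms(term):
    """Walk the hierarchy downwards: terms whose broader term is `term`."""
    return [s for s, relation, o in TRIPLES if relation == "BT" and o == term]
```

A taxonomy tool would only need the BT rows, a thesaurus adds UF and RT, and an ontology adds arbitrary named properties on top of both.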
Metadata’s Evolutionary Path
Dictionaries & Flat Lists
Hierarchical Taxonomies
Controlled Vocabulary Thesauri
Ontologies
Structured Authority Files
Metadata is evolving organically: the less complex metadata elements form the building blocks for creating the more complex structures
Practical Applications
Portal navigation and browsable website menus
Conceptual access to large databases
Records management and cataloging
e-Commerce online product catalogues
Inventory control and de-duplication
Auto-classification of internal documents and email
Multilingual search and browse
Metasearch of enterprise-wide resources
Centralized Taxonomy and Metadata Management
As a centralized repository for multilingual semantic management that is:
- Independent from systems like web-portal search and categorization systems
- Scalable; capable of evolving with emerging corporate semantic standards
[Diagram: the Centralized Taxonomy Management System (Synaptica®), where multiple users work in a collaborative and compartmentalized space governed by permissions, connects to portals, categorizers, search engines and content systems through HTML, CSV, XML, ZThes, SKOS, OWL and Web Services]
Why Centralized?
Metadata can transcend information islands and data silos, but only if the enterprise is committed to common standards
A centralized system that supports both collaboration and compartmentalization allows common metadata to be shared while also giving user communities the independence to manage specialized metadata files
Why Independent?
Enterprises are increasingly making use of multiple proprietary and open source software tools for categorization, search and portal tasks
While many of these tools support some level of metadata management, the diversity of standards, data formats and business rules they support can actually exacerbate the data silo problem by creating metadata silos
Where taxonomy fits with Search
[Diagram labels: DMS; CMS; Shared Docs; News & Research Data; Search Engine; Taxonomy & Metadata Platform; Information Processing, Management and Storage]
By including this brand name in a taxonomy we can give context to the user search query
In a telecoms domain we can assume that the user means the latter and only return content tagged as such
Alternatively we can weight the results, promoting those documents about handheld devices above those that refer to the fruit
Either way the result is increased search precision which translates into time savings
2. Improved Search Completeness
Synonymous and Related Term Relationships
Mobile Phone (PT) = Cell Phone (NPT) = Hand Phone (NPT)
Mobile Phone is related to Hand Held Device (RT)
User Search Query = “Cell Phones”
The taxonomy simultaneously broadens the search and prioritises the returned results giving increased recall without compromising relevancy
Content tagged with the Mobile Phone category is promoted over content not tagged with it, using a weighting in the search algorithm
Content tagged with the Hand Held Device category may also receive a weighting
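The “Cell Phones” scenario above can be sketched as query expansion followed by weighted ranking. This is a minimal illustration in Python, not the algorithm of any particular search engine: the weight values, the naive plural handling and the function names are all assumptions made for the sketch.

```python
# Every known label (preferred or non-preferred) points at its preferred term (PT).
LOOKUP = {
    "mobile phone": "Mobile Phone",
    "cell phone": "Mobile Phone",   # NPT
    "hand phone": "Mobile Phone",   # NPT
}
# Related terms (RT) for each preferred term.
RELATED = {"Mobile Phone": ["Hand Held Device"]}

# Hypothetical weights: the slides only say that tagged content is promoted.
PT_WEIGHT = 2.0
RT_WEIGHT = 1.0


def normalise(query):
    """Lower-case the query and strip one trailing plural 's' (deliberately naive)."""
    q = query.strip().lower()
    return q[:-1] if q.endswith("s") else q


def expand(query):
    """Map a user query to {category: weight} via the thesaurus."""
    preferred = LOOKUP.get(normalise(query))
    if preferred is None:
        return {}
    weights = {preferred: PT_WEIGHT}
    for related in RELATED.get(preferred, []):
        weights[related] = RT_WEIGHT
    return weights


def rank(documents, query):
    """Return titles of documents with a matching tag, highest score first."""
    weights = expand(query)
    scored = []
    for doc in documents:
        score = sum(weights.get(tag, 0.0) for tag in doc["tags"])
        if score > 0:
            scored.append((score, doc["title"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]
```

For example, given documents tagged `Hand Held Device`, `Mobile Phone` and an unrelated category, `rank(docs, "Cell Phones")` returns the Mobile Phone document first, then the Hand Held Device document, and omits the unrelated one.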
3. Search federation and data integration
A snapshot or dashboard is often more desirable than a list of document titles or snippets, especially when looking for information on a customer, supplier or competitor
Also, information will most likely reside in a number of internal repositories, each with their own levels of metadata structure
Taxonomy allows the combination of news, internal CI reports, price plans, coverage data, market share data, share price etc. in one consolidated view by providing mappings or cross-walks
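A cross-walk of this kind can be sketched as a mapping from each repository's local label to one shared taxonomy term. The repository names, local labels, the company “Acme Telecom” and the sample items below are all invented for illustration; only the idea of mapping local metadata to a common term comes from the slide.

```python
# Hypothetical cross-walk: repository -> {local label -> shared taxonomy term}.
CROSSWALK = {
    "news": {"ACME TELECOM PLC": "Acme Telecom"},
    "ci_reports": {"Acme": "Acme Telecom"},
    "price_plans": {"ACM-T": "Acme Telecom"},
}

# Hypothetical records as (repository, local label, item description).
RECORDS = [
    ("news", "ACME TELECOM PLC", "Acme Telecom launches new handset"),
    ("ci_reports", "Acme", "Q3 competitor analysis"),
    ("price_plans", "ACM-T", "Pay-as-you-go tariff sheet"),
    ("news", "Other Co", "Unrelated story"),
]


def consolidated_view(term):
    """Collect items from every repository whose local label maps to `term`."""
    view = {}
    for repository, local_label, item in RECORDS:
        if CROSSWALK.get(repository, {}).get(local_label) == term:
            view.setdefault(repository, []).append(item)
    return view
```

Calling `consolidated_view("Acme Telecom")` gathers one item from each of the three repositories into a single dictionary, which is the raw material for a dashboard-style view.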
This is essentially applying business intelligence discipline to the world of unstructured information
4. Search Visualisation
The previous three scenarios assume the user knows what they are looking for
But what about serendipitous discovery?
By being able to see across an aggregation of content and extract facts and relationships from deep within the information stores, true (and sometimes fortunate) discovery can take place
[Diagram labels]
Components: Document, Content & Records Management; Synaptica® Vocabulary & Metadata Management (Taxonomies, Thesauri, Ontologies); Filing & Storage; Metadata Tagging (Categorisation) Process; Search Engine; Visualisation; Navigation; Intranet / Portal
Layers: User Interface; Back End Information Structure; Front End Information Intelligence
Roles: Librarians, Taxonomists, Indexers, Knowledge & Information Managers; Information Creators, Records Managers, Content Managers, Librarians, Indexers; Information Users (the business; the public); CIOs, CTOs, IT Architects
Paula R. McCoy Manager, Taxonomy Development ProQuest [email_address] Centralized Taxonomy Management for Enterprise Information Systems
Topics of Discussion
Description of ProQuest Controlled Vocabulary & Authority Files
Taxonomy Management -- Overview
Managing Terms Manually
Synaptica Thesaurus Management System
Subscription-based ProQuest® online information service available in academic and public libraries
Access to over 125 billion digital pages of content from magazine, trade, & scholarly publications, current & historical newspapers, original materials such as annual reports & civil war pamphlets, and daily wire feeds
The Taxonomy Manager’s Job
OBJECTIVE: To ensure that indexers and searchers alike have access to a complete and accurate Thesaurus that they can use to maximize the discoverability of documents in ProQuest
Sample Subject Term
Chronic obstructive pulmonary disease (Preferred, or main term)
SN: Any lung disease, such as chronic bronchitis or emphysema, causing obstruction of bronchial airflow (Scope note defining term and how it is used)
UF COPD (Non-preferred term: points to term used to index)
BT Disease
BT Respiratory diseases (Terms broader in nature to main term: COPD is a disease, and specifically, a respiratory disease)
NT Asthma
NT Bronchitis
NT Emphysema (Terms narrower in nature to main term: these are chronic lung diseases)
RT Airway management
RT Lungs (Terms related to main term that might be used to narrow the search)
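The sample subject term maps naturally onto a small record type. The sketch below is one possible in-memory representation in Python; the class name `Term`, its field names and the `display` method are hypothetical and do not reflect how Synaptica or ProQuest store terms.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One thesaurus entry; field names are illustrative only."""
    preferred: str                                  # preferred, or main term
    scope_note: str = ""                            # SN
    used_for: list = field(default_factory=list)    # UF: non-preferred terms
    broader: list = field(default_factory=list)     # BT
    narrower: list = field(default_factory=list)    # NT
    related: list = field(default_factory=list)     # RT

    def display(self):
        """Render the entry in the SN / UF / BT / NT / RT layout of the slide."""
        lines = [self.preferred]
        if self.scope_note:
            lines.append(f"SN: {self.scope_note}")
        for label, values in (
            ("UF", self.used_for),
            ("BT", self.broader),
            ("NT", self.narrower),
            ("RT", self.related),
        ):
            lines.extend(f"{label} {value}" for value in values)
        return "\n".join(lines)


copd = Term(
    preferred="Chronic obstructive pulmonary disease",
    scope_note=("Any lung disease, such as chronic bronchitis or emphysema, "
                "causing obstruction of bronchial airflow"),
    used_for=["COPD"],
    broader=["Disease", "Respiratory diseases"],
    narrower=["Asthma", "Bronchitis", "Emphysema"],
    related=["Airway management", "Lungs"],
)
```

`print(copd.display())` reproduces the entry line by line, starting with the preferred term and ending with the RT lines.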
Managing Terms Manually
New scientific content requiring a huge enhancement to vocabulary
Seven MS Word vocabulary documents, in English and foreign languages (French, German, Spanish), printed for internal use only
Six authority files & 3 vocabulary files in Oracle databases, requiring duplicate entry of subject terms in Word and Oracle
Legacy editorial system in process of being replaced
Thesaurus Management Systems Buying Criteria
Thesaurus Management System: Requirements
Eliminate double entry
Improve editorial interface with vocabulary
Automate entry of reciprocal relationships
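Automating reciprocal relationships means that entering one side of a link records the inverse as well, which is what removes a round of manual double entry. The following is a minimal Python sketch of that idea using standard thesaurus notation (BT/NT, RT, UF/USE); the `Thesaurus` class and its methods are hypothetical and are not Synaptica's API.

```python
from collections import defaultdict

# Each relation type and the inverse that must be recorded on the target term.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "USE", "USE": "UF"}


class Thesaurus:
    """Stores relationships and fills in the reciprocal side automatically."""

    def __init__(self):
        # (term, relation) -> set of target terms
        self.relations = defaultdict(set)

    def add(self, term, relation, target):
        """Record `term relation target` and the inverse on `target`."""
        self.relations[(term, relation)].add(target)
        self.relations[(target, RECIPROCAL[relation])].add(term)

    def get(self, term, relation):
        """Return the targets of one relation type, sorted alphabetically."""
        return sorted(self.relations.get((term, relation), ()))


thesaurus = Thesaurus()
main = "Chronic obstructive pulmonary disease"
thesaurus.add(main, "BT", "Respiratory diseases")
thesaurus.add(main, "NT", "Emphysema")
thesaurus.add(main, "RT", "Lungs")
thesaurus.add(main, "UF", "COPD")
```

After these four entries, looking up `Respiratory diseases` shows the main term as an NT, `Emphysema` shows it as a BT, `Lungs` shows it as an RT, and `COPD` carries a USE pointer back to it, all without a second entry step.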
Life With Synaptica
Word: Old, Bad
Synaptica: New, Good
Adding Terms Today: 3 Easy Steps
1. Enter term and relationships into Synaptica “Item Details” window
2. Export report of new terms into Word
3. Send Word document to editors
Improving Thesaurus Management Categories Feature
Subject Term Categories
CORP Names – Categories & Website
Foreign-Language Vocabularies Language Equivalents
Foreign-Language Vocabularies: Life With Synaptica
[Screenshot: Spanish, German and French vocabularies, alphabetical by language]
Synaptica Updates
Synaptica version 6.0 released in early 2006
Synaptica version 7.0 is being implemented now:
Enhanced user interface
Semantic Web standardization (RDF, OWL, SKOS) and Web Services integration
Expanded Reporting functionality
Enhanced adding and editing of term relationships, including “rapid-fire” simple drag-and-drop editing
Improved global term editing
Online help and user guides
Benefits of Synaptica
Greater awareness of thesaurus standards and terminology, e.g. “preferred” and “non-preferred” instead of Use and Used For
Long-needed updating and improvement in term hierarchies; ability to provide thesaurus statistics
Increase in Company name NPTs, from 1935 to 8952 today
Immediate responsiveness to indexer needs: real-time term additions, esp. NPTs and SNs
Easier loading of updated Thesaurus on PQ interface
Daniela Barbosa, Synaptica Business Development Manager, Dow Jones Client Solutions, Dow Jones & Company Paula R McCoy, Manager, Taxonomy Development, ProQuest
Now that you have built your taxonomies, you need to manage and maintain them in a centralized environment that can be leveraged by all of your enterprise applications including search tools, portals, and CMS/DMS systems. This session will review some best practices in centralized taxonomy management and go through the implementation of a thesaurus management tool at ProQuest, which enabled them to create a common language to connect disparate information assets using large and varied vocabularies and authority files linked to new and existing editorial systems.