Putting Controlled Vocabulary To Work I Davis 2008
Published in: Business, Technology

Transcript

  • 1. Putting Structured Business Vocabularies to Work
      • November 4, 2008
      • Data Management and Information Quality Conference, IRM UK
      • Ian Davis, Global Project Manager, Dow Jones & Company
      • © Copyright 2008 Dow Jones and Company, Inc.
  • 2. What we’ll cover today:
      • Understanding the challenges of controlled versus uncontrolled vocabularies
      • Developing a strategy to create and maintain controlled vocabularies
      • Identifying how you want to integrate your controlled vocabularies into your systems
      • Understanding the requirements of integrating controlled vocabularies into multiple applications
  • 3. Setting the Context
  • 4. Once upon a time…
      • Most of the business was IT enabled.
      • There was some degree of “sharing” of information and content; there were even some large, well-structured document repositories.
      • Yet, no one could find anything.
      • Actually, they found things,
          • but not what they wanted when they wanted it
          • and they were never sure they found the “best” or “saw it all”.
  • 5. Once upon a time…
      • The C-level executives were a bit irritated.
          • They’d spent lots on the technology
          • and people really weren’t much more efficient,
          • the pinch point in the workflow had simply moved further downstream.
      • So, what happened next?
  • 6. Once upon a time…
      • They SPENT <more> MONEY and bought the best-in-class search utilities.
      • Yet, no one could find anything.
      • Actually, they found things,
          • but not what they wanted when they wanted it
          • and they were never sure they found the “best” or “saw it all”.
  • 7. Once upon a time…
      • The C-level executives became a bit more irritated.
      • Everyone was a bit frustrated.
      • What was missing?
  • 8. Optimized?
      • Is the search utility optimized using all the bells and whistles it came with?
          • Relevancy rankings
          • “Thesaurus” files (synonym lists)
          • Multilingual capabilities
          • Common searches saved and presented to users
          • Logs reviewed to understand user issues
  • 9. Usable?
      • Is the user interface considerate to users?
      • Was it designed with YOUR users in mind?
          • Designed for occasional users?
          • Designed for power users?
      • Was it designed with YOUR business in mind?
          • Task-based views for context-sensitive searches
          • Results presented in a format readily used within workflows
  • 10. Metadata?
      • Are there required metadata fields within the CMS?
          • Author, Title, Language, Topic, Product/Service, etc.
      • Are the entry values to those fields controlled?
          • Lookups against authority files, taxonomies, thesauri
      • Does the search utility support fielded searches?
      • Does the search utility weight terms within metadata fields higher than free text?
  • 11. Metadata?
      • For example:
          • If a financial analyst enters the query term “stock” within the company’s knowledge base,
          • will he get back results with the documents specifically discussing “stock” as a financial instrument listed first?
          • Or will he have to look through hundreds of documents, both those relevant to him and every document whose body text mentions:
              • soup stock (food industry),
              • cows (livestock industry),
              • or stock car racing (professional sports industry)?
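The “stock” example can be made concrete with a small ranking sketch. This is only an illustration of weighting a controlled metadata field above free text, not how any particular search utility works; the `Document` class, the `topics` field, and both weight values are assumptions introduced here.

```python
from dataclasses import dataclass, field

# Hypothetical weights: a hit in a controlled metadata field counts for more
# than a hit in the free-text body. The values are illustrative only.
TOPIC_WEIGHT = 10.0
BODY_WEIGHT = 1.0


@dataclass
class Document:
    title: str
    body: str
    topics: set[str] = field(default_factory=set)  # controlled-vocabulary terms


def score(doc: Document, term: str) -> float:
    """Score one document for a single query term."""
    term = term.lower()
    total = 0.0
    # A match in the controlled field is a strong signal of "aboutness".
    if term in {t.lower() for t in doc.topics}:
        total += TOPIC_WEIGHT
    # Free-text occurrences add a small amount each.
    total += BODY_WEIGHT * doc.body.lower().split().count(term)
    return total


def search(docs: list[Document], term: str) -> list[Document]:
    """Return documents with a non-zero score, highest score first."""
    scored = [(score(d, term), d) for d in docs]
    hits = [(s, d) for s, d in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in hits]


docs = [
    Document("Soup basics", "simmer the stock then strain the stock", {"cooking"}),
    Document("Equity outlook", "analysts expect the stock to rise", {"stock"}),
    Document("Race report", "the stock car field was crowded", {"motor sports"}),
]
results = search(docs, "stock")
```

With these assumed weights, the document tagged with the controlled term ranks first even though the soup document mentions the word more often in its body.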
  • 12. Metadata?
      • Precise and comprehensive searches
          • Only if controlled vocabularies have been used to populate metadata fields AND
              • The search utility takes advantage of that by giving priority to query term occurrence within controlled value metadata fields OR
              • Fielded searches are enabled
                  • e.g. <Author = Smith> + <Service = Consulting> + <Industry = Automotive> + <Date = January 2006> + <Content Type = Proposal>
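A fielded search like the one in the slide's example amounts to an exact-match filter across several metadata fields at once. The sketch below assumes content records are plain dictionaries and that `+` means logical AND; the sample records and the `fielded_search` helper are hypothetical.

```python
# Hypothetical content records; field names follow the slide's example.
records = [
    {"Author": "Smith", "Service": "Consulting", "Industry": "Automotive",
     "Date": "January 2006", "Content Type": "Proposal"},
    {"Author": "Smith", "Service": "Consulting", "Industry": "Retail",
     "Date": "January 2006", "Content Type": "Proposal"},
    {"Author": "Jones", "Service": "Audit", "Industry": "Automotive",
     "Date": "March 2006", "Content Type": "Report"},
]


def fielded_search(records, criteria):
    """Return records whose fields equal every value in criteria (logical AND)."""
    return [
        r for r in records
        if all(r.get(name) == value for name, value in criteria.items())
    ]


query = {
    "Author": "Smith",
    "Service": "Consulting",
    "Industry": "Automotive",
    "Date": "January 2006",
    "Content Type": "Proposal",
}
matches = fielded_search(records, query)
```

Exact matching only works this cleanly because the field values are controlled: if one author entered “Automotive” and another “Car industry”, the filter would silently miss content.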
  • 13. Challenges: Controlled versus Uncontrolled
  • 14. Controlled Vocabularies Explained
      • Authority files
          • e.g. Company’s active directory, ISO standard for Languages
          • Typically a flat list of allowed values
      • Taxonomies
          • e.g. Linnaean Classification (kingdom, phylum, class, order, family, genus, and species)
          • Typically includes only hierarchical relationships between terms
      • Thesauri
          • e.g. NASA Thesaurus (http://www.sti.nasa.gov/thesfrm1.htm)
          • Includes full set of semantic relationships defined between terms (hierarchical, associative, equivalence)
  • 15. NASA Thesaurus – Sample Entry
  • 16. Semantic Relationships
      • Hierarchical
          • Superordination (representing a class or a whole) and subordination (referring to members or parts)
          • e.g. mammals and vertebrates
          • e.g. cherry pie and cherry pie slices
      • Equivalence
          • One concept expressed by two or more terms
          • e.g. dogs and canines
      • Associative
          • Terms that are conceptually linked, but not through hierarchy or equivalence
          • e.g. accounting and accountant
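The three relationship types map naturally onto a small data structure. The following is a minimal sketch, not the schema of any vocabulary management tool; the `Term` and `Thesaurus` classes and their field names (`broader`, `narrower`, `used_for`, `related`) are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    broader: set[str] = field(default_factory=set)   # hierarchical, upward
    narrower: set[str] = field(default_factory=set)  # hierarchical, downward
    used_for: set[str] = field(default_factory=set)  # equivalence (variant labels)
    related: set[str] = field(default_factory=set)   # associative


class Thesaurus:
    def __init__(self):
        self.terms: dict[str, Term] = {}

    def term(self, label: str) -> Term:
        """Fetch a term, creating it on first use."""
        if label not in self.terms:
            self.terms[label] = Term(label)
        return self.terms[label]

    def add_hierarchy(self, broader: str, narrower: str) -> None:
        # Hierarchical links are recorded in both directions.
        self.term(broader).narrower.add(narrower)
        self.term(narrower).broader.add(broader)

    def add_equivalent(self, preferred: str, variant: str) -> None:
        # The variant is only a label on the preferred term, not a term itself.
        self.term(preferred).used_for.add(variant)

    def add_related(self, a: str, b: str) -> None:
        # Associative links are symmetric.
        self.term(a).related.add(b)
        self.term(b).related.add(a)


thesaurus = Thesaurus()
thesaurus.add_hierarchy("vertebrates", "mammals")
thesaurus.add_equivalent("dogs", "canines")
thesaurus.add_related("accounting", "accountant")
```

An authority file would use only the labels, and a taxonomy only `broader` and `narrower`; a thesaurus populates all four sets.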
  • 17. Challenges – Uncontrolled Vocabularies
      • Uncontrolled vocabularies are:
          • Comprehensive but noisy
              • Only comprehensive if synonym lists are used
          • Limited in their precision and relevancy
              • Time lost scanning through hundreds of “miss” hits
              • Reduced effectiveness of cross-repository searches
              • Limited ways to disambiguate ‘soup stock’ from ‘stock car’
  • 18. Challenges - Controlled Vocabularies
      • Controlled vocabularies can bring:
          • Potentially significant overhead effort (manual and technical)
          • Organizational politics that can add YEARS to establishing an initial set of controlled vocabularies
          • A lack of basic understanding of what the controlled vocabularies are and how they work, which impedes effective development and utilization
  • 19. Challenges - Controlled Vocabularies
      • Controlled vocabularies:
          • Richness and power come from a full set of semantic relationships, not just hierarchical ones
              • Hierarchy supports the ability to narrow and broaden search queries
              • Association supports “did you mean” and “you might also want to look at”
              • Equivalence lets users retrieve content in their own familiar language, even when the content is conceptually on target but never uses the user’s term
                  • e.g. user enters dog and search utility expands query to include “canine, k-9, puppy”
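The equivalence example (dog expanded to canine, k-9, puppy) can be sketched as a lookup that runs before matching. This is an assumed, simplified design: the `EQUIVALENTS` table and helper names are hypothetical, and matching is naive substring matching, so “dog” would also hit “dogma”; a real search utility would tokenize.

```python
# Hypothetical equivalence table: preferred term -> variant labels.
EQUIVALENTS = {"dog": ["canine", "k-9", "puppy"]}


def build_index(equivalents):
    """Map every label (preferred or variant) to its full equivalence group."""
    index = {}
    for preferred, variants in equivalents.items():
        group = [preferred, *variants]
        for label in group:
            index[label.lower()] = group
    return index


INDEX = build_index(EQUIVALENTS)


def expand(term):
    """Return the term plus its equivalents, or just the term if none are known."""
    return list(INDEX.get(term.lower(), [term]))


def document_matches(text, term):
    """True if the text contains the term or any of its equivalents."""
    text = text.lower()
    return any(label in text for label in expand(term))
```

For instance, `document_matches("Training a K-9 unit", "dog")` succeeds even though the text never uses the word the user typed.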
  • 20. Challenges - Controlled Vocabularies
      • Controlled vocabularies:
          • Richness and power come at the cost of added complexity of development, implementation, integration and maintenance
          • Utilization of controlled vocabularies can produce performance issues
              • During search index creation
              • During query run time
  • 21. Tackling the Challenges
  • 22. Strategy – Creation and Maintenance
      • State the business case clearly
          • Benefits
              • Reduced time for knowledge discovery
              • Increased richness of knowledge discovery
              • Decreased risk to firm of making business decisions with partial information
          • Scope
              • One business unit or enterprise-wide?
          • Resource requirements
              • Skill sets (IS, IT, business knowledge)
              • Time commitment
  • 23. Strategy – Creation and Maintenance
      • Tackle organizational politics head-on
          • Gain credibility and ensure usability by establishing a cross-functional working committee that will become the Review Committee
          • Include all major stakeholder groups and any interested parties (even the non-supporters)
          • Establish methods of broadly soliciting end-user input that will become a source of change requests during maintenance phases
  • 24. Strategy – Creation and Maintenance
      • Additional considerations before you start:
          • How rigorous does it need to be?
              • What external standards should be adopted?
                  • ANSI/NISO Z39.19-2005
                  • British Standard – BS 8723
              • What internal standards should be developed?
                  • Editorial Guidelines
                  • Usage Guidelines
          • How extensive will it be?
              • Depth and breadth within and across facets
          • What about adaptability and flexibility?
              • Will there be a need for local extensions?
  • 25. Strategy – Creation and Maintenance
      • Additional considerations before you start:
          • Projected frequency of revisions
              • How quickly does the content base change with respect to concepts; is there significant content drift?
              • How volatile is the language?
                  • Management consulting vs. accounting
          • Vocabulary Management Software
              • DON’T spend money just to spend money
              • However, you CAN’T manage controlled vocabularies in a spreadsheet
              • Buy the tool you need based on your documented functional requirements
  • 26. Strategy – Integration Choices
      • Performance trade-offs
          • Store UIDs within content, then use look-up table at query run time
          • Store full text of a term, then touch all content when taxonomy value changes (must re-assign new term value)
      • Version control
          • Use static versions of controlled vocabularies within CMS and search utilities, releasing new versions periodically
          • Use dynamic version of controlled vocabularies with continuous revisions occurring
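The UID versus full-text trade-off is easiest to see when a term is renamed. The sketch below is illustrative only: the UID `T001`, the labels, and the in-memory lists stand in for whatever the CMS and search index actually store.

```python
# Option 1: content stores a stable UID; the label lives in one look-up table.
labels = {"T001": "Motor vehicles"}  # hypothetical UID and label
docs_by_uid = [
    {"id": 1, "topic_uid": "T001"},
    {"id": 2, "topic_uid": "T001"},
]


def rename_uid(labels, uid, new_label):
    """Rename a term by updating the look-up table; no content is touched."""
    labels[uid] = new_label
    return 0  # number of content records rewritten


def display_label(doc, labels):
    """The cost of option 1: one extra look-up at query run time."""
    return labels[doc["topic_uid"]]


# Option 2: content stores the term text itself.
docs_by_text = [
    {"id": 1, "topic": "Motor vehicles"},
    {"id": 2, "topic": "Motor vehicles"},
]


def rename_text(docs, old_label, new_label):
    """Rename a term by rewriting every content record that carries it."""
    touched = 0
    for doc in docs:
        if doc["topic"] == old_label:
            doc["topic"] = new_label
            touched += 1
    return touched


uid_touched = rename_uid(labels, "T001", "Automobiles")
text_touched = rename_text(docs_by_text, "Motor vehicles", "Automobiles")
```

Option 1 pays a small cost on every query; option 2 pays a cost proportional to the amount of tagged content on every vocabulary change, which is why the expected revision frequency matters to the choice.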
  • 27. Strategy – Integration Choices
      • Utilizing semantic relationships
          • Store full set (term values or UIDs) within content record OR
          • Store single UID and have search utility use reference tables to determine related terms
      • Display of semantic relationships
          • User interface considerations for effective presentation of non-hierarchically related terms
  • 28. Strategy – Integration Choices
      • [Figure: search interface mock-up with labelled regions]
          • Query entry (including ability to broaden or narrow current search results)
          • Previous query statement user entered plus any auto-expansion done by engine
          • Related topics (defined through Associative relationships)
          • Browse navigation options
          • Query results listing
  • 29. Strategy – Multiple Applications
      • Expanding the adoption and use of controlled vocabularies
          • Know the business objectives of the applications
          • In conjunction with the search utility, does the controlled vocabulary enable this objective?
          • Are there metadata fields available within current application for the controlled vocabulary?
          • Does the business have resources to assign the controlled vocabulary?
          • What format does the controlled vocabulary need to be in to be integrated with the application?
  • 30. Strategy – Multiple Applications
      • Additional considerations
          • Will there be conflicting version management needs?
          • How does search currently index these applications and will that change with the use of controlled vocabularies?
  • 31. Five Key Points
      1. Controlled vocabularies are a lever to improve precision and comprehensiveness
      2. Controlled vocabularies are never finished – they are always a work in process
      3. Search utilities can only be tweaked so far
      4. Tapping into the richness of the semantic relationships between terms can be extremely powerful
      5. There are lots of options for implementing and integrating controlled vocabularies
  • 32. Thank you for your attention! Ian Davis ian.davis@dowjones.com
