Synaptica Proquest Talk Taxonomy Boot Camp 2009

PowerPoint presentation given by Dave Clarke, CEO, Synaptica, LLC, and Paula McCoy of ProQuest at Taxonomy Boot Camp 2009 in San Jose, California.

Published in: Technology, Business

  1. Taxonomies: Tools or People?
     TBC; Taxonomies: Tools or People? By Dave Clarke & Paula McCoy Copyright © Synaptica, LLC, 2009 www.synapticasoftware.com 12/09/09
     When would one favor human indexing over machine indexing? An example of the human indexing effort is presented, along with tools that can help with the process. An example of autocategorization is illustrated with a discussion of the reciprocal flow of information between the taxonomy management tool and the autocategorization tool. The speakers then discuss how structured vocabularies help refine categorizers and how feedback from the categorizer tool to the human editorial team contributes to the continual improvement of the vocabularies.
  2. HUMAN VS. MACHINE & THE HUMAN OPTION
     Dave Clarke, CEO, Synaptica, LLC, [email_address]
  3. Humans will invent almost anything to save time
  4. Human or machine indexing – depends on the data and the user
     Humans:
       • subtle & abstract concepts
       • non-textual, e.g. images, sounds
       • mission-critical precision & recall
     Machines:
       • highly structured
       • very high volume
       • homogeneous topics
       • noisy or incomplete results tolerable
       • very quick turnaround
     Factors: size, time-sensitivity, generalist users, machine-readability, conceptual abstraction, expert users, data structure, homogeneity
  5. Human indexing – the process
     1. Review the content
     2. Consult the vocabularies
     3. Either tag the content item or build an index table
     Diagram labels: Data Set, Controlled Vocabularies, Index Table
  6. Human indexing – a wish list of time-saving tools
     • Minimize switching between screens: integrate vocabulary search / browse with the content interface
     • Filter specific metadata elements to restrict lookup to relevant vocabularies or subsets of vocabularies
     • Search-as-you-type access to controlled vocabularies
     • Tree-browse as an alternative to search
     • Redirect queries at any time by exploring semantic relationships
     • Inline definitional and indexer notes
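
The search-as-you-type item in the wish list above can be illustrated with a small sketch. This is not how Synaptica's tools are implemented; it is only one plausible approach, a binary search over a sorted list of terms, and the vocabulary and the function names `build_index` and `suggest` are hypothetical.

```python
from bisect import bisect_left

def build_index(terms):
    """Sort terms by lowercase form so that prefix matches are contiguous."""
    return sorted((term.lower(), term) for term in terms)

def suggest(index, prefix, limit=5):
    """Return up to `limit` vocabulary terms that start with `prefix` (case-insensitive)."""
    key = prefix.lower()
    start = bisect_left(index, (key,))  # first entry whose lowercase form is >= key
    matches = []
    for lowered, original in index[start:]:
        if len(matches) == limit or not lowered.startswith(key):
            break
        matches.append(original)
    return matches

# Hypothetical vocabulary, for illustration only.
vocabulary = ["Taxonomies", "Taxation", "Thesauri", "Tax law", "Indexing"]
index = build_index(vocabulary)
print(suggest(index, "tax"))
```

Each keystroke would call `suggest` with the text typed so far; the sorted index keeps every lookup cheap even for a large vocabulary.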
  7. Human indexing – a wish list of time-saving tools (continued)
     • Self-correcting substitution of variants with their preferred terms
     • Optional pre-population of possible target terms based on text matches
     • In-line submission of candidate terms where no appropriate term is identified
     • Optional automatic expansion of the tag-set to include variants, parents, children, associations, language equivalents and crosswalk schema equivalents
     • Profile templates to save user- and content-based indexing preferences
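
Two items in the wish list above, substituting variants with preferred terms and expanding a tag-set, lend themselves to a short sketch. The `Term` structure, the function names, and the sample data are assumptions made for illustration and do not describe the Synaptica data model.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    preferred: str
    variants: set = field(default_factory=set)  # non-preferred ("Use For") forms
    parents: set = field(default_factory=set)
    children: set = field(default_factory=set)

def build_variant_map(terms):
    """Map every preferred term and variant (lowercased) to its preferred term."""
    lookup = {}
    for term in terms:
        lookup[term.preferred.lower()] = term.preferred
        for variant in term.variants:
            lookup[variant.lower()] = term.preferred
    return lookup

def normalize(tag, lookup):
    """Return the preferred term for `tag`, or None if it is a candidate term."""
    return lookup.get(tag.strip().lower())

def expand(preferred, terms_by_name, include=("variants", "parents")):
    """Return the tag plus the related terms named in `include`."""
    term = terms_by_name[preferred]
    expanded = {preferred}
    for relation in include:
        expanded |= getattr(term, relation)
    return expanded

# Hypothetical thesaurus entry, for illustration only.
terms = [Term("Automobiles", variants={"Cars", "Autos"}, parents={"Vehicles"})]
lookup = build_variant_map(terms)
by_name = {term.preferred: term for term in terms}
print(normalize("cars", lookup))
print(sorted(expand("Automobiles", by_name)))
```

A tag that `normalize` cannot resolve returns `None`, which is the point at which an indexing tool could offer in-line submission of a candidate term.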
  8. Human indexing – Synaptica’s “IMS” Toolbox
     • Minimize switching between screens: integrate vocabulary search / browse with the content interface
     • Filter specific metadata elements to restrict lookup to relevant vocabularies
     • Search-as-you-type access to controlled vocabularies
     • Tree-browse and drop-down pick-list alternatives to search
     • Redirect queries at any time by exploring semantic relationships
     • Inline definitional and indexer notes
     • Self-correcting substitution of variants with their preferred terms
     • Optional pre-population of possible target terms based on text matches
     • In-line submission of candidate terms where no appropriate term is identified
     • Optional automatic expansion of the tag-set to include variants, parents, children, associations, language equivalents and crosswalk schema equivalents
     • Profile templates to save user- and content-based indexing preferences
  9. Human indexing – IMS Workflow Detail
  10. Human indexing – profile setup screenshot
  11. Human indexing – examples
      • A national library could use IMS to human-index digital images and multimedia assets against a set of authority files.
      • A professional services corporation could use IMS to human-index mission-critical legal documents against a taxonomy of compliance terminology.
      • A multinational electronics company could use IMS to human-index product data according to product lines and families, hardware assets and other product-based keyword groups.
  12. Human indexing – conclusions
      • Like everything else in life, if we can possibly pass the task on to machines, we’d like to.
      • There are some situations where machines are the only solution, and there are others where human indexing is required (non-machine-readable data sets, subtle/abstract concepts, mission-critical precision-recall requirements, etc.).
      • If human indexing is required, there are tools that can help speed up the process and help attain indexing consistency.
      • The Synaptica “wish list” represents the time-saving tools requested by our user base over the past ten years.
  13. AUTOCATEGORIZATION: A CASE STUDY USING SYNAPTICA
      Paula McCoy, Manager, Taxonomy Development, ProQuest, [email_address]
      TBC; Taxonomies: Tools or People? By Dave Clarke & Paula McCoy Copyright © Proquest, Inc., 2009 www.proquest.com 12/09/09
  14. ProQuest
      • Information aggregator and database producer, with content ranging from newspapers to academic/scholarly publications, in topics spanning business and management, STM (scientific, technical, medical), humanities, social science, and general reference
      • Abstracts and indexes more than 6,000 periodicals and newspapers
      • Daily ingest of more than 60,000 new newspaper and newswire articles
      • Customer base: public and academic libraries
      • End users: academic and student researchers
  15. The Mandate: To promote discovery of all content relevant to the user’s search query
      The Solution: Index and abstract as much content as possible in order to maximize the number of “entry points” to an article.
        • Indexing provided for different parts of an article: SUBJECTS, COMPANIES, PEOPLE, LOCATIONS
        • Abstracts provided for all articles of minimum length
      Slide label: ProQuest Search Interface
  16. A Growing Challenge: How to abstract and index (A&I) hundreds of thousands of new articles every day?
      The Only Answer: Autocategorization, or auto-indexing: machine-based application of index terms to a document or other object.
      Slide label: ProQuest Search Interface
  17. The Autocategorization Solution
      Basic Tenets of Autocategorization:
      • Must have a controlled vocabulary in place
      • Must have other controlled lists if you want to index companies, people, locations, etc.
      • Must have a way to manage your vocabularies
      • Must have a way to manage the results of the autocat; no automated indexing method is perfect
      • Autocat success rests upon the existence of a strong controlled vocabulary with a history of usage from which the automation software can learn.
  18. The ProQuest Approach
      • Implement the Synaptica thesaurus management solution to manage an 11,300+ term subject thesaurus and authority files for companies, people, and locations
      • Purchase Nstein Technologies’ Text Mining Engine (TME) solution to automate abstracting and indexing of subject and other terms
      • Train the TME to understand the usage of ProQuest thesaurus terms (3-month collaborative process)
      • Implement Nstein’s Knowledge Base Manager (TME Manager) to manage the subject terms rules base
      Diagram labels: Synaptica Taxonomy Manager, Nstein
  19. Thesaurus and Autocat Management
      Synaptica Thesaurus Management:
      • New terms added, hierarchies revised, Scope Notes added/revised
      • Use For (non-preferred) terms added frequently to reflect variant usages in the indexed literature and provide additional cross-references
      Nstein Autocat Management:
      • Nstein TME Manager tool used to manage the indexing rules base for all thesaurus terms
      • Autocat rules supplement and complement the underlying concept training
      • Autocat rules can be added, deleted, revised
      • Autocat rules enable autocat indexing to keep up with changes in term usages, so that new variants can be added and rules created based on current topics in the literature or in the news
  20. Synaptica-TME Interaction
      • Thesaurus management informs 2 levels of indexing: manual and automated
      • The thesaurus as represented in Synaptica must display all cross-references (mainly Use refs) required by manual indexers
      • The thesaurus as represented in Nstein must contain rules reflecting those Use references
      • Term updates made in Synaptica are duplicated in Nstein via indexing rules
      • Use references in Synaptica point human indexers to the right term
      • Use references in the Nstein rules base point the automated indexer to the right term
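
The idea that Use references in the thesaurus are mirrored as indexing rules can be sketched in a few lines. This is not the Nstein TME rule format or API, which the slides do not show; it is a simplified stand-in that uses plain regular expressions, and the thesaurus entries and function names are hypothetical.

```python
import re

def build_rules(thesaurus):
    """Compile one whole-word, case-insensitive pattern per preferred term and per Use reference."""
    rules = []
    for preferred, use_for in thesaurus.items():
        for surface_form in [preferred, *use_for]:
            pattern = re.compile(r"\b" + re.escape(surface_form) + r"\b", re.IGNORECASE)
            rules.append((pattern, preferred))
    return rules

def auto_index(text, rules):
    """Return the sorted preferred terms whose rules match the text."""
    return sorted({preferred for pattern, preferred in rules if pattern.search(text)})

# Hypothetical thesaurus: preferred term -> Use For (non-preferred) variants.
thesaurus = {
    "Mergers & acquisitions": ["takeovers", "M&A"],
    "Interest rates": ["lending rates"],
}
rules = build_rules(thesaurus)
print(auto_index("Analysts expect more takeovers as lending rates fall.", rules))

# A term update in the thesaurus is carried over by regenerating the rules.
thesaurus["Mergers & acquisitions"].append("buyouts")
rules = build_rules(thesaurus)
print(auto_index("Leveraged buyouts rose.", rules))
```

Regenerating the rules from the thesaurus after every update keeps the human-facing cross-references and the machine-facing rules from drifting apart, which is the point of the slide above.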
  21. Synaptica & Autocat: Benefits
      • A semantic-based autocat solution indexes as well as it’s been trained → that training is most successful if based on years of manual indexing using a controlled subject vocabulary → combined with a rules base, autocat can produce intelligent and informed indexing
      • Reviewing the results of good autocat leads to comparison with ongoing manual indexing → questions about term usages rise to the surface → human indexing can improve by becoming more flexible and adaptable to changes in terminology → revised term usages are reflected in Synaptica
      • Human indexers raise issues of new term variants and the need for new terms → Synaptica is updated → the rules base is updated to allow autocat to capture terms better
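
The comparison of autocat results with manual indexing mentioned above can be made concrete with precision and recall, the measures named on slide 4. The sketch below assumes that the manual tags are treated as the reference set, which the slides do not state; the function name and the sample tags are hypothetical.

```python
def compare_indexing(manual, auto):
    """Compare automated tags against manual tags for one document."""
    manual, auto = set(manual), set(auto)
    agreed = manual & auto
    return {
        # Share of automated tags that the human indexer also applied.
        "precision": len(agreed) / len(auto) if auto else 0.0,
        # Share of human-applied tags that the automated indexer found.
        "recall": len(agreed) / len(manual) if manual else 0.0,
        # Terms to review: possible false hits, or terms the indexer overlooked.
        "auto_only": sorted(auto - manual),
        # Terms to review: possible gaps in the rules base or missing variants.
        "manual_only": sorted(manual - auto),
    }

# Hypothetical tags for a single article.
manual_tags = {"Banking", "Interest rates", "Inflation"}
auto_tags = {"Banking", "Inflation", "Monetary policy", "Economics"}
report = compare_indexing(manual_tags, auto_tags)
print(report)
```

The `auto_only` and `manual_only` lists are where the questions about term usage surface: each entry is a prompt either to update the thesaurus or to adjust the rules base.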
  22. Benefits for Synaptica Thesaurus Control
      • Day-to-day review of automated indexing highlights correct and incorrect term usages, leading to greater discipline in Synaptica thesaurus management to ensure human indexers remain aware of terms and their proper usage.
      • The need for precision in subject terms means terms must be exact and descriptive; automated indexing will not work with vague, ambiguous terms or one-word terms with multiple meanings, like “Apologies,” “Affect,” “Articulation.” The result is a more robust and controlled subject vocabulary.
      • Automated indexing will use terms in the thesaurus that human indexers may have forgotten about, leading again to revised hierarchies in Synaptica, new Scope Notes, and instant feedback to indexers.
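
One way to act on the warning about one-word terms is to list them for editorial review. The sketch below is a crude proxy and an assumption of this transcript, not a method described in the talk: it flags single-word terms that have no Scope Note, on the guess that those are the most likely to be ambiguous. The sample notes are invented.

```python
def flag_for_review(scope_notes):
    """Return single-word preferred terms that have no Scope Note, sorted alphabetically.

    `scope_notes` maps each preferred term to its Scope Note ("" if none).
    """
    return sorted(
        term
        for term, note in scope_notes.items()
        if len(term.split()) == 1 and not note.strip()
    )

# Hypothetical thesaurus extract; the note text is invented for illustration.
scope_notes = {
    "Affect": "",
    "Apologies": "",
    "Articulation": "Use for speech production, not for joints.",
    "Interest rates": "",
}
print(flag_for_review(scope_notes))
```

A human editor would still decide whether each flagged term needs a qualifier, a Scope Note, or replacement with a more descriptive multi-word term.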
  23. Questions?
      [email_address] [email_address]
