ROI & Impact - Quantitative & Qualitative Measures for Taxonomies

A presentation by Dr. Jay Ven Eman, CEO of Access Innovations, Inc., on measuring the financial benefits of taxonomies. First presented at the 2009 Data Harmony Users Group meeting.

Published in: Education, Technology

  1. 1. ROI & Impact: Quantitative & Qualitative Measures for Taxonomies Wednesday, 11 February 2009 12:00 – 12:30 PM MST Presented by Jay Ven Eman, Ph.D., CEO Access Innovations, Inc. / Data Harmony 505.998.0800 / www.accessinn.com / www.dataharmony.com j_ven_eman@accessinn.com DHUG 2009
  2. 2. First, some questions
     • Do you know what a taxonomy is?
     • Does your boss’s boss know? Care?
     • What are YOU trying to accomplish?
     • What are your objectives?
     • What isn’t working? What is?
     • How badly?
     • How much?
     • Who? Where?
     Copyright © 2007 Access Innovations, Inc.
  3. 3. First, some questions - 2
     • Who are your searchers?
     • Internal? Intranet?
     • External? Web? Fee based (commercial)?
     • How many?
     • What do they do? How do they do it?
     • What are they seeking?
     • Why?
  4. 4. First, some questions - 3
     • Where are they looking?
     • How many searching environments?
     • Physical?
     • Internal resources?
     • External resources?
     • Search interfaces?
     • And so on…
  5. 5. “Meaning” starts with a knowledge organization system (KOS)
     • Uncontrolled list
     • Name authority file
     • Synonym set/ring
     • Controlled vocabulary
     • Taxonomy
     • Thesaurus
     Scale: Not complex - $ to Highly complex - $$$$. LOTS OF OVERLAP!
     Also shown: Topic Map, Ontology, SKOS
  6. 6. The Pain of Search
     Mission critical | Percent | Number of employees (of 1,000) | Search & use time per week (hours) | Time searching per week (hours) | Time analysing per week (hours) | Average loaded salary ($ per hour) | Annual cost of looking | After 10% search time reduction | Difference
     High | 10 | 100 | 14 | 8.4 | 5.6 | 200 | 8,736,000 | 7,862,400 | 873,600
     Medium | 80 | 800 | 12 | 7.2 | 4.8 | 150 | 44,928,000 | 40,435,200 | 4,492,800
     Low | 10 | 100 | 10 | 6 | 4 | 100 | 3,120,000 | 2,808,000 | 312,000
     Total | | | | | | | $56,784,000 | $51,105,600 | $5,678,400
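The figures on slide 6 can be reproduced with a short calculation. This is a minimal sketch, not the presenter's own spreadsheet: the 52-week year is an assumption inferred from the annual totals, and the function and variable names are illustrative.

```python
WEEKS_PER_YEAR = 52  # assumption: inferred from the slide's annual totals
SEARCH_TIME_REDUCTION = 0.10  # the 10% reduction stated on the slide

# (tier, employees, hours spent searching per week, loaded salary in $ per hour)
tiers = [
    ("High", 100, 8.4, 200),
    ("Medium", 800, 7.2, 150),
    ("Low", 100, 6.0, 100),
]


def annual_search_cost(employees, hours_per_week, hourly_rate):
    """Yearly cost of the time a group of employees spends searching."""
    return employees * hours_per_week * hourly_rate * WEEKS_PER_YEAR


costs = {name: annual_search_cost(n, h, r) for name, n, h, r in tiers}
total_cost = sum(costs.values())
total_savings = total_cost * SEARCH_TIME_REDUCTION

for name, cost in costs.items():
    print(f"{name}: ${cost:,.0f} per year, ${cost * SEARCH_TIME_REDUCTION:,.0f} saved")
print(f"Total: ${total_cost:,.0f} per year, ${total_savings:,.0f} saved")
```

Only the time spent searching enters the cost; the time spent analysing is listed on the slide but is not part of the "annual cost of looking".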
  7. 7. ROI - Segments
     • Cost of taxonomy system
     • Indexing costs
     • Cost of getting system ready
     • Ongoing maintenance
     • Increased efficiency
     • Increased quality of retrieval
     • Cost of legacy system maintenance
  8. 8. Taxonomy construction (Copyright © 2005 Access Innovations, Inc.)
     Process | Terms/hr | # of terms | Cost/hr | Cost
     From scratch | 4 | 5000 | $75 | $93,750
     License | | | | 0 - 100K
     License & customize | 6 | 5000 | 75 | 62,500 + 5,000
     Auto-generate/cleanup + tool | 6 | 5000 | 75 | 62,500 + 100,000
     Mapping | 8 | 5000 | 75 | 46,875
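The labor figures on slide 8 follow from dividing the number of terms by the hourly throughput and multiplying by the hourly rate. A minimal sketch, assuming that the "+ 5,000" and "+ 100,000" entries are fixed license or tool fees added to the labor cost; the names are illustrative.

```python
def construction_cost(num_terms, terms_per_hour, hourly_rate, fixed_fee=0):
    """Labor cost of building the taxonomy plus any fixed license or tool fee."""
    labor = num_terms * hourly_rate / terms_per_hour
    return labor + fixed_fee


NUM_TERMS = 5000
HOURLY_RATE = 75

options = {
    "From scratch": construction_cost(NUM_TERMS, 4, HOURLY_RATE),
    # assumption: 5,000 is a license fee added on top of customization labor
    "License & customize": construction_cost(NUM_TERMS, 6, HOURLY_RATE, fixed_fee=5_000),
    # assumption: 100,000 is the cost of the auto-generation tool
    "Auto-generate/cleanup + tool": construction_cost(NUM_TERMS, 6, HOURLY_RATE, fixed_fee=100_000),
    "Mapping": construction_cost(NUM_TERMS, 8, HOURLY_RATE),
}

for process, cost in options.items():
    print(f"{process}: ${cost:,.0f}")
```

The "License & customize" total of $67,500 is the taxonomy cost that reappears on the ROI slide.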
  9. 9. Indexing & Search Metrics
     • Hit, Miss, Noise
     • Subjective
        • Relevance
        • Aboutness
     • Statistical
        • Precision
        • Recall
        • Level of effort
  10. 10. Hit, Miss, Noise
     • Hit – exactly what a human indexer would use
     • Miss – human indexer would use but system did not assign
     • Noise – system assigned but human did not
        • Relevant noise – could have been assigned
        • Irrelevant noise – just plain wrong
  11. 11. Subjective
     • Relevance
        • Reflects how akin it is to the user's request
     • Aboutness
        • Reflects the topical match between the document content and the term
        • How well the topic describes what the document is about
        • Varies with level of conceptual terms vs. factual terms in the thesaurus
  12. 12. Subjective
     • “There is now a 92% accuracy rating on accounting and regulatory document search based on hit, miss and noise or relevance, precision and recall statistics…Access Innovations.” USGAO
     • “IEEE had their system up and running in three days, in full production in less than two weeks.” Institute of Electrical and Electronics Engineers (IEEE)
     • “The American Economic Association said its editors think using it is fun and makes time fly!” American Economic Association (AEA)
     • “ProQuest CSA have achieved a 7-fold increase in productivity – thus they have four licenses.” ProQuest CSA
     • “Weather Channel finds things 50% faster using Data Harmony. A significant saving in time.” The Weather Channel
  13. 13. Statistical
     • Precision
        • Correct retrieval / Total retrieval
        • Hits / (Hits + Noise)
     • Recall
        • Correct retrieval / Total correct in system
        • Hits / (Hits + Misses)
     • Level of effort
        • Hits / (Hits + Misses + Noise)
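The three statistics on slide 13 can be computed directly from hit, miss, and noise counts. A minimal sketch: the function names and the sample counts below are hypothetical and chosen only for illustration, not taken from the presentation.

```python
def precision(hits, noise):
    """Share of system-assigned terms that a human indexer would also assign."""
    return hits / (hits + noise)


def recall(hits, misses):
    """Share of human-assigned terms that the system also assigned."""
    return hits / (hits + misses)


def level_of_effort(hits, misses, noise):
    """Hits as a share of everything an editor has to review or correct."""
    return hits / (hits + misses + noise)


# Hypothetical counts from comparing system output against a human indexer.
hits, misses, noise = 85, 10, 15

print(f"Precision:       {precision(hits, noise):.2%}")
print(f"Recall:          {recall(hits, misses):.2%}")
print(f"Level of effort: {level_of_effort(hits, misses, noise):.2%}")
```

Level of effort is never higher than either precision or recall, because its denominator counts both kinds of error.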
  14. 14. Cost Goals
     • Cost Savings
        • Software/hardware
        • More efficient delivery systems
        • Retirement of legacy systems
     • Cost Avoidance
        • Additional staff not needed to scale
        • Lower training costs
  15. 15. Productivity Goals
     • Productivity gains
     • Employee productivity – fourfold
     • Get up to speed faster
     • Learn vocabulary faster
     • Able to capture people's knowledge in the rule base
     • Staff savings / redeployment
     • Elimination of new hires
  16. 16. Additional Benefits
     • Revenue Generation
     • Higher hit rates
     • More purchases off the site
     • Competitive advantage
     • Shorter product / sales cycles
     • Faster implementation
     • Better search experience
     • Ability to meet regulatory requirements
  17. 17. Go – No Go
     • Reach 85% precision to launch for productivity - assisted
     • Reach 85% for filtering or categorization
     • Sorting for production
     • Level of effort to get to 85%
     • Integration into the workflow is efficient
  18. 18. Benchmarks
     • 15 – 20% irrelevant returns / noise
     • Amount of work needed to achieve 85% level
     • How good is good enough?
     • Satisfice = satisfaction + suffice
     • How much error can you put up with?
  19. 19. Example ROI Calculation
     • Assume – 5,000 term thesaurus
     • 1.5 synonyms per term
     • 7,500 terms total
     • Assume 85% accuracy
     • Use assisted for indexing
     • Use automatically for filtering
     • Assume $75 per hour for staff
     • Assume 10,000 records for test batch
  20. 20. Indexing costs with Data Harmony
     • 80% of rules built automatically
        • 7,500 x .8 = 6,000
     • 20% require complex rules
        • Average rule takes 5 minutes
        • (Actually MUCH faster using M.A.I. GUI)
        • 5 x 1,500 = 7,500 minutes
        • 125 hours x $75 = $9,375
  21. 21. Indexing Costs
     • Base cost of MAIstro EE - $60,000
     • Cost of getting system ready
        • Programming support and integration: estimated at 2 weeks programming at $125 / hour = $10,000
        • Rule building: estimated at 125 hours at $75 / hour = $9,375
        • Possible need to re-run training set several times
     • Ongoing maintenance
        • Estimated at 15% of purchase price for license = $9,000
        • Rule building for new terms: 50 terms per quarter
           • 200 terms per year x .8 = 160 automatic
           • 40 at 5 minutes per term = 200 minutes / 60 = 3.33 hours x $75 = $250
     • Targeted initial accuracy at 85%
  22. 22. Indexing costs
     • Year one
        • $60,000 + $10,000 + $9,375 = $79,375
     • Years thereafter
        • $9,000 + $250 = $9,250
     • 85% accuracy
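The arithmetic on slides 20 to 22 can be sketched as follows. The 40-hour work week is an assumption implied by "2 weeks programming" costing $10,000 at $125 per hour, and all names are illustrative rather than part of any Data Harmony product.

```python
STAFF_RATE = 75          # $ per hour, from slide 19
PROGRAMMER_RATE = 125    # $ per hour, from slide 21
HOURS_PER_WEEK = 40      # assumption: implied by 2 weeks at $125/hour = $10,000
LICENSE_COST = 60_000    # base cost of MAIstro EE, from slide 21
MAINTENANCE_SHARE = 0.15
MANUAL_RULE_SHARE = 0.20  # 20% of terms require complex, hand-built rules
MINUTES_PER_RULE = 5


def rule_building_cost(num_terms):
    """Staff cost of hand-building rules for the terms not handled automatically."""
    manual_rules = num_terms * MANUAL_RULE_SHARE
    hours = manual_rules * MINUTES_PER_RULE / 60
    return hours * STAFF_RATE


initial_terms = 7_500           # 5,000 terms plus 1.5 synonyms per term
new_terms_per_year = 50 * 4     # 50 new terms per quarter

programming_cost = 2 * HOURS_PER_WEEK * PROGRAMMER_RATE
year_one = LICENSE_COST + programming_cost + rule_building_cost(initial_terms)
later_years = LICENSE_COST * MAINTENANCE_SHARE + rule_building_cost(new_terms_per_year)

print(f"Year one:         ${year_one:,.0f}")
print(f"Years thereafter: ${later_years:,.0f}")
```

Nearly all of the recurring cost is license maintenance; rule building for new terms adds only a few hours of staff time per year.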
  23. 23. ROI
     • Taxonomy costs = $67,500
     • Indexing costs = $79,375
     • Pain of search – difference = $5,678,400
     • If off by factor of 4, then a positive ROI of 241%
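Slide 23 does not state the formula behind its ROI figure. The sketch below assumes the common definition, net benefit divided by cost, and treats "off by a factor of 4" as dividing the estimated savings by 4; both are assumptions, and the names are illustrative.

```python
TAXONOMY_COST = 67_500    # slide 23 (license & customize: 62,500 + 5,000)
INDEXING_COST = 79_375    # slide 23 (year-one indexing costs)
SEARCH_SAVINGS = 5_678_400  # slide 23 (pain of search difference)


def net_roi(savings, cost, overestimate_factor=1):
    """Net benefit over cost, after shrinking the savings by a safety factor."""
    adjusted_savings = savings / overestimate_factor
    return (adjusted_savings - cost) / cost


def benefit_cost_ratio(savings, cost, overestimate_factor=1):
    """Adjusted savings divided by cost (no subtraction of the cost)."""
    return savings / overestimate_factor / cost


total_cost = TAXONOMY_COST + INDEXING_COST

print(f"Total first-year cost: ${total_cost:,.0f}")
print(f"Net ROI, savings / 4:  {net_roi(SEARCH_SAVINGS, total_cost, 4):.0%}")
print(f"Benefit/cost, savings / 16: {benefit_cost_ratio(SEARCH_SAVINGS, total_cost, 16):.2f}")
```

Under these assumptions the net ROI with a factor of 4 comes out well above the slide's 241%. The slide's figure is close to the benefit-to-cost ratio obtained when the savings are divided by 16, so the presenter may have applied a different adjustment or definition.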
  24. 24. Thank you!
