Using MAI™ to Filter News Data


Published on

An overview of NewsIndexer, a data filtering and tagging solution using Access Innovations, Inc.'s Data Harmony software suite.

Published in: Education, Technology

  1. 1. Using MAI to Filter News Data
  2. 2. NewsIndexer – a case study in filtering
      • Filters / categorizes / tags news content
      • Manages massive information flow
      • Based on Thesaurus Master and M.A.I.
          - Specialized thesaurus
          - Specialized rulebase
  3. 3. NewsIndexer’s vocabulary
      • Broad and general subject matter
      • Reflects coverage of typical news publications
      • Over 5200 terms, nine levels deep
          - Six top level categories
          - Geographic terms
      • Starter vocabulary
      • Easily adapted and customized
  4. 4. NewsIndexer’s brain
      • M.A.I. rulebase customized for news topics
      • Words in text trigger M.A.I. rules
      • Conditions in rules determine precise taxonomy term(s) to apply
          - Rules capture human knowledge and analysis
          - Rules use context to distinguish between homographs
      • Examples: Chicago Bears; Bear market; Bears in the woods
  5. 5. Why filter?
      • Reduce noise to enhance retrieval precision
      • Disambiguate homographs to increase accuracy
      • Limit unnecessary detail to reduce data flow
      • Direct data to targeted recipients
  6. 6. Filter to cut noise
      • M.A.I. suggests terms as directed by rules
      • Index with most specific appropriate terms
      • Result: precision and accuracy in retrieval
  7. 7. Filter to disambiguate
      • Common words used with very different meanings in different contexts
          - Utilities – electricity / water / sewer? utility software?
          - Architecture – of buildings? of computer systems?
      • M.A.I. rule conditions differentiate concepts
          - Information Architect doesn’t want to retrieve building blueprints
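
The rule behaviour described on slides 4 and 7 (a word in the text triggers a rule, and conditions on the surrounding context pick the taxonomy term) can be sketched in Python. This is not M.A.I. rule syntax, which the slides do not show. The `RULES` table, the context words, the term names, and the `suggest_terms` function are all hypothetical and serve only to illustrate the idea.

```python
import re

# Hypothetical rulebase: trigger word -> list of (context words, taxonomy term).
# A condition is met when the trigger and at least one context word both occur.
RULES = {
    "bears": [
        ({"chicago", "nfl", "quarterback"}, "Football"),
        ({"market", "stocks", "investors"}, "Stock markets"),
        ({"woods", "forest", "wildlife"}, "Wildlife"),
    ],
    "architecture": [
        ({"building", "buildings", "blueprints"}, "Building design"),
        ({"software", "information", "systems"}, "Computer architecture"),
    ],
}


def suggest_terms(text):
    """Return every taxonomy term whose (assumed) rule conditions the text meets."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    suggested = []
    for trigger, conditions in RULES.items():
        if trigger not in words:
            continue  # the rule only fires when its trigger word is present
        for context, term in conditions:
            if words & context and term not in suggested:
                suggested.append(term)
    return suggested


print(suggest_terms("The Chicago Bears signed a new quarterback"))
print(suggest_terms("Bears were seen in the woods near the campsite"))
```

With these made-up rules the first sentence is tagged `Football` and the second `Wildlife`, even though both contain the same trigger word. A real rulebase would also need conditions such as proximity or capitalization, which this sketch leaves out.
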
  8. 8. I want it ALL!
      • Rulebase filters data, yields ALL terms that meet conditions of M.A.I. rules
      • Editor can select, reject and add terms
      • Most specific appropriate term – as chosen by editor – is saved with the document
          - Subject metadata
          - XML format
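
The editorial step on slide 8 (suggested terms are selected, rejected, or supplemented by an editor, and the result is saved as XML subject metadata) might look like the following sketch. The slides do not give the XML schema, so the element names `document`, `subjects`, and `subject`, the `id` attribute, and the function `subject_metadata_xml` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def subject_metadata_xml(doc_id, suggested, rejected=(), added=()):
    """Apply the editor's choices and serialize the kept terms as XML.

    The element and attribute names are hypothetical, not a Data Harmony format.
    """
    rejected = set(rejected)
    kept = [term for term in suggested if term not in rejected]
    kept += [term for term in added if term not in kept]

    root = ET.Element("document", id=doc_id)
    subjects = ET.SubElement(root, "subjects")
    for term in kept:
        ET.SubElement(subjects, "subject").text = term
    return ET.tostring(root, encoding="unicode")


xml_text = subject_metadata_xml(
    "story-1",
    suggested=["Baseball", "Crime"],
    rejected=["Crime"],
    added=["Major League Baseball"],
)
print(xml_text)
```

Here the editor rejects one suggested term and adds a more specific one, so the saved record carries `Baseball` and `Major League Baseball` only.
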
  9. 9. (Diagram: suggested terms) Red Sox; Crime; Baseball; Elections; Pharmaceuticals; Gun control; Health sciences; Medicine; Law; Antibiotics; Major League Baseball; Penicillin; Campaign finance; Politics
  10. 10. Taxonomy (diagram)
      • Top Term: Health sciences
      • 2nd level: Health conditions; Medicine; Medical facilities
      • 3rd level: Pharmaceuticals
      • 4th level: Antibiotics
      • 5th level: Penicillin
  11. 11. Filter to limit detail
      • Want all terms or a select few?
      • Roll up terms to the first, second, or third level in your taxonomy
      • Up-posting
      • Good for automatic indexing
      • Programmers can set filter to reduce detail
  12. 12. (Diagram: terms) Pharmaceuticals; Health sciences; Medicine; Antibiotics; Penicillin
  13. 13. (Diagram) Query: Pharmaceuticals AND Antibiotics AND Penicillin. Terms shown: Health sciences; Medicine; Antibiotics; Penicillin
  14. 14. Taxonomy (diagram)
      • Column headings: Top Term; 2nd level; 3rd, 4th, and 5th levels
      • Terms shown: Health sciences; Health conditions; Medicine; Medical facilities; Pharmaceuticals; Antibiotics; Penicillin
      • Callouts: Up-post to third level; Narrower terms go in Medicine bucket
  15. 15. No details – just the big picture
      • Index comprehensively and retain details BUT
      • Display only general terms for end user
      • (Diagram) Health sciences; Medicine; Pharmaceuticals; Antibiotics; Penicillin. Callouts: Display higher level term; Index with most specific
  16. 16. (Diagram) Query: Health sciences AND Medicine AND Pharmaceuticals AND Antibiotics AND Penicillin. Terms shown: Pharmaceuticals; Medicine; Antibiotics; Penicillin
  17. 17. (Diagram) Penicillin; Antibiotics; Pharmaceuticals; Medicine; Health sciences. Callouts: Up-post to top level; Narrower terms go in Health sciences bucket
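
The up-posting walk-through on slides 11 to 17 can be sketched as a roll-up along the taxonomy branch from slide 10. The parent links below follow that slide; the `PARENT` dictionary, the function names, and the convention that the top term is level 1 are assumptions for illustration, not the product's actual filter settings.

```python
# Parent links for the branch shown on the taxonomy slide (assumed data layout).
PARENT = {
    "Penicillin": "Antibiotics",
    "Antibiotics": "Pharmaceuticals",
    "Pharmaceuticals": "Medicine",
    "Medicine": "Health sciences",
    "Health conditions": "Health sciences",
    "Medical facilities": "Health sciences",
    "Health sciences": None,  # top term
}


def path_from_top(term):
    """Return the chain of terms from the top term down to `term`."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path[::-1]


def up_post(term, max_level):
    """Roll `term` up so it sits no deeper than `max_level` (top term = 1)."""
    if max_level < 1:
        raise ValueError("max_level must be at least 1")
    path = path_from_top(term)
    return path[min(max_level, len(path)) - 1]


print(up_post("Penicillin", 1))  # rolls up to the top term
print(up_post("Penicillin", 3))  # rolls up to the third level of this branch
print(up_post("Medicine", 3))    # already above the cut-off, so unchanged
```

Keeping the full path alongside the rolled-up term is what allows the approach on slide 15: index with the most specific term, then display only the higher level term to the end user.
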
  18. 18. Filter to direct data
      • User expresses interest in general topics
          - e.g., Technology, Environment, Law
      • Materials indexed with those topics or any of their Narrower Terms are forwarded
      • Applications: user profiles, interest groups, specific departments
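
The routing described on slide 18 amounts to matching each document's index terms, together with their broader terms, against the topics a user has registered. A minimal sketch follows. Only the topic names Technology, Environment, and Law come from the slide; the narrower terms, the profile names, and the functions are invented for illustration.

```python
# Hypothetical broader-term links; the narrower terms are made up.
BROADER = {
    "Software": "Technology",
    "Operating systems": "Software",
    "Air pollution": "Environment",
    "Copyright law": "Law",
}

# Hypothetical interest profiles keyed by recipient.
PROFILES = {
    "alice": {"Technology"},
    "legal_dept": {"Law", "Environment"},
}


def term_and_broader(term):
    """Return `term` plus every broader term above it."""
    seen = set()
    while term is not None and term not in seen:
        seen.add(term)
        term = BROADER.get(term)
    return seen


def recipients(doc_terms):
    """List the profiles whose topics cover any of the document's index terms."""
    topics = set()
    for term in doc_terms:
        topics |= term_and_broader(term)
    return sorted(name for name, interests in PROFILES.items() if interests & topics)


print(recipients(["Operating systems"]))
print(recipients(["Software", "Copyright law"]))
```

A document indexed only with the narrow term `Operating systems` still reaches the profile that registered `Technology`, because the match is made against the whole chain of broader terms.
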
  19. 19. Specialized filtering – NewsIndexer and IPTC
      • International Press Telecommunications Council (IPTC) proposal for NewsCodes
      • Part of News Industry Text Format (NITF)
      • ~1300 terms describe topics of news articles
      • Broad coverage (heavy on sports)
      • NewsIndexer rulebase can apply detailed NewsIndexer terms and/or IPTC NewsCodes
      • Comply with growing news standards
      • Achieve greater detail for news indexing
  20. 20. (Flow diagram)
      • Thesaurus Master manages custom vocab
      • News feed
      • M.A.I. adds metadata (vocab in TM)
      • RESULT: ALL terms that meet M.A.I. rule conditions (cut noise, disambiguate)
      • Up-post to limit returns
      • RESULT: Higher level categories, reduced data stream, for portal, targeted users, and other purposes
  21. 21. Filtering advantages
      • For the End User
          - Simpler, more manageable presentation of concepts
          - Consistent with typical user’s search strategy
          - Differentiated concepts associated with homographs
          - Targeted information according to user profile
      • For the Internal User
          - Documents retain subject metadata reflecting granular indexing
          - Precision search gets precision results
  22. 22. For more information and a live demo, visit