Using MAI™ to Filter News Data

An overview of NewsIndexer, a data filtering and tagging solution using Access Innovations, Inc.'s Data Harmony software suite.

Published in: Education, Technology

Transcript

  • 1. Using MAI to Filter News Data
  • 2. NewsIndexer: a case study in filtering. Filters, categorizes, and tags news content. Manages a massive information flow. Based on Thesaurus Master and M.A.I.: a specialized thesaurus and a specialized rulebase.
  • 3. NewsIndexer’s vocabulary. Broad, general subject matter reflecting the coverage of typical news publications. Over 5,200 terms, nine levels deep: six top-level categories, plus geographic terms. A starter vocabulary, easily adapted and customized.
  • 4. NewsIndexer’s brain. An M.A.I. rulebase customized for news topics. Words in the text trigger M.A.I. rules, and conditions in the rules determine the precise taxonomy term(s) to apply. Rules capture human knowledge and analysis, and they use context to distinguish between homographs: Chicago Bears, bear market, bears in the woods.
  • 5. Why filter? Reduce noise to enhance retrieval precision. Disambiguate homographs to increase accuracy. Limit unnecessary detail to reduce data flow. Direct data to targeted recipients.
  • 6. Filter to cut noise. M.A.I. suggests terms as directed by rules. Index with the most specific appropriate terms. Result: precision and accuracy in retrieval.
  • 7. Filter to disambiguate. Common words carry very different meanings in different contexts. Utilities: electricity, water, sewer? Utility software? Architecture: of buildings? Of computer systems? M.A.I. rule conditions differentiate the concepts; an information architect doesn’t want to retrieve building blueprints.
  • 8. I want it ALL! The rulebase filters data and yields ALL terms that meet the conditions of the M.A.I. rules. The editor can select, reject, and add terms. The most specific appropriate term, as chosen by the editor, is saved with the document as subject metadata in XML format.
  • 9. [Diagram: sample index terms, including Red Sox, Major League Baseball, Baseball, Crime, Gun control, Law, Elections, Campaign finance, Politics, Health sciences, Medicine, Pharmaceuticals, Antibiotics, and Penicillin]
  • 10. [Diagram: taxonomy levels from Top Term through 5th level: Health sciences > Medicine > Pharmaceuticals > Antibiotics > Penicillin. Health conditions and Medical facilities appear alongside Medicine.]
  • 11. Filter to limit detail. Want all terms, or a select few? Roll up terms to the first, second, or third level of your taxonomy (up-posting). Good for automatic indexing: programmers can set the filter to reduce detail.
  • 12. [Diagram: the hierarchy Health sciences > Medicine > Pharmaceuticals > Antibiotics > Penicillin]
  • 13. [Diagram: Pharmaceuticals AND Antibiotics AND Penicillin, shown against the hierarchy Health sciences > Medicine > Pharmaceuticals > Antibiotics > Penicillin]
  • 14. [Diagram: up-posting to a cutoff level. Columns: Top Term | 2nd level | 3rd, 4th, and 5th levels. Health sciences at the top; Health conditions, Medicine, and Medical facilities at the second level; Pharmaceuticals, Antibiotics, and Penicillin below Medicine. Callouts: "Up-post Penicillin to third level"; "Narrower terms go in Medicine bucket".]
  • 15. No details, just the big picture. Index comprehensively and retain the details, BUT display only general terms for the end user. [Diagram: index with the most specific term (Penicillin); display a higher-level term from the hierarchy Health sciences > Medicine > Pharmaceuticals > Antibiotics > Penicillin.]
  • 16. [Diagram: Boolean combinations of Health sciences, Medicine, Pharmaceuticals, Antibiotics, and Penicillin, shown against the same hierarchy]
  • 17. [Diagram: up-post to the top level. Penicillin, Antibiotics, Pharmaceuticals, and Medicine roll up to Health sciences; narrower terms go in the Health sciences bucket.]
  • 18. Filter to direct data. The user expresses interest in general topics, e.g., Technology, Environment, Law. Materials indexed with those topics or any of their narrower terms are forwarded. Applications: user profiles, interest groups, specific departments.
  • 19. Specialized filtering: NewsIndexer and IPTC. The International Press Telecommunications Council (IPTC) proposal for NewsCodes, part of the News Industry Text Format (NITF): ~1,300 terms describe the topics of news articles, with broad coverage (heavy on sports). The NewsIndexer rulebase can apply detailed NewsIndexer terms and/or IPTC NewsCodes, to comply with growing news standards and achieve greater detail in news indexing.
  • 20. [Diagram: workflow. Thesaurus Master manages the custom vocabulary. News feed → M.A.I. adds metadata, with rule conditions drawn from the vocabulary in TM, to cut noise and disambiguate. RESULT: ALL categories that meet M.A.I. rule conditions. Up-posting then limits returns. RESULT: higher-level terms and a reduced data stream for a portal, targeted users, and other purposes.]
  • 21. Filtering advantages. For the end user: a simpler, more manageable presentation of concepts, consistent with the typical user’s search strategy; differentiated concepts for homographs; targeted information according to the user profile. For the internal user: documents retain subject metadata reflecting granular indexing, so a precision search gets precision results.
  • 22. For more information and a live demo, visit www.newsindexer.com
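
Slides 4 and 7 describe rules that fire on words in the text and then test context conditions to choose the right taxonomy term for a homograph. The Python sketch below illustrates that trigger-plus-condition idea under stated assumptions: the `Rule` class, the `suggest_terms` function, and the terms Football, Stock markets, and Wildlife are hypothetical and do not reflect M.A.I.'s actual rule syntax or the NewsIndexer vocabulary. Matching is plain lowercase word overlap, with no phrase handling or stemming.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """Hypothetical trigger/condition rule (illustrative only; not M.A.I.'s rule syntax)."""
    triggers: tuple      # any of these words in the text fires the rule
    term: str            # taxonomy term suggested when the rule fires
    context: tuple = ()  # if non-empty, at least one of these words must also appear

def words_in(text):
    # Lowercased word set; a real system would also handle phrases and stemming.
    return set(re.findall(r"[a-z]+", text.lower()))

def suggest_terms(text, rules):
    """Return taxonomy terms whose trigger and context conditions are met, in rule order."""
    words = words_in(text)
    suggested = []
    for rule in rules:
        if not words & set(rule.triggers):
            continue  # no trigger word present
        if rule.context and not words & set(rule.context):
            continue  # trigger present, but the context condition fails
        if rule.term not in suggested:
            suggested.append(rule.term)
    return suggested

# Hypothetical rules separating the three "bear" senses from slide 4.
BEAR_RULES = [
    Rule(("bear", "bears"), "Football", context=("chicago", "nfl", "quarterback")),
    Rule(("bear", "bears"), "Stock markets", context=("market", "stocks", "investors")),
    Rule(("bear", "bears"), "Wildlife", context=("woods", "forest", "hikers")),
]

print(suggest_terms("The Chicago Bears won on Sunday", BEAR_RULES))  # ['Football']
```

Each rule carries its own context condition, so covering another homograph (such as Utilities or Architecture from slide 7) means adding rules rather than changing the matching code.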
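
Slide 8 says the rulebase yields every qualifying term, the editor selects, rejects, and adds terms, and the result is saved with the document as XML subject metadata. A minimal sketch of that last step, assuming a made-up record layout: the `subject_metadata` function and the `document`, `subjects`, and `subject` element names are hypothetical, not NewsIndexer's actual schema. It uses only Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

def subject_metadata(doc_id, suggested, rejected=(), added=()):
    """Apply hypothetical editor decisions to rule-suggested terms and serialize them as XML."""
    kept = [t for t in suggested if t not in rejected]   # editor rejects some suggestions
    final = kept + [t for t in added if t not in kept]   # editor adds terms the rules missed
    root = ET.Element("document", {"id": doc_id})
    subjects = ET.SubElement(root, "subjects")
    for term in final:
        ET.SubElement(subjects, "subject").text = term
    return ET.tostring(root, encoding="unicode")

# The editor keeps the most specific baseball term, drops a false hit, and adds a team.
xml_record = subject_metadata(
    "story-001",
    suggested=["Baseball", "Major League Baseball", "Crime"],
    rejected=["Baseball", "Crime"],
    added=["Red Sox"],
)
print(xml_record)
```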
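
Slides 11 through 17 describe up-posting: rolling index terms up to a chosen level so that narrower terms land in a broader "bucket", while the granular terms stay in the index. The sketch below assumes the Health sciences > Medicine > Pharmaceuticals > Antibiotics > Penicillin chain from slide 10, stored as a hypothetical `BROADER` map, and numbers levels with the top term as level 1 (an assumption; the slides' own level labels may count differently). The function names are also hypothetical.

```python
# Hypothetical broader-term (BT) links for the hierarchy shown on slide 10.
BROADER = {
    "Medicine": "Health sciences",
    "Pharmaceuticals": "Medicine",
    "Antibiotics": "Pharmaceuticals",
    "Penicillin": "Antibiotics",
}

def path_from_top(term, broader=BROADER):
    """Return [top term, ..., term] by following broader-term links upward."""
    path = [term]
    while path[-1] in broader:
        path.append(broader[path[-1]])
    return path[::-1]

def up_post(term, level, broader=BROADER):
    """Roll a term up to the given level (1 = top term); shallower terms are unchanged."""
    if level < 1:
        raise ValueError("level must be at least 1")
    path = path_from_top(term, broader)
    return path[min(level, len(path)) - 1]

def up_post_all(terms, level, broader=BROADER):
    """Up-post every term, dropping duplicates created by the roll-up."""
    rolled = []
    for term in terms:
        t = up_post(term, level, broader)
        if t not in rolled:
            rolled.append(t)
    return rolled

stored = ["Pharmaceuticals", "Antibiotics", "Penicillin"]  # granular index terms are kept
print(up_post_all(stored, 2))  # ['Medicine']  (the "Medicine bucket")
print(up_post_all(stored, 1))  # ['Health sciences']
```

Because `up_post_all` returns a new list, the stored terms stay untouched, which mirrors slide 15: index with the most specific term, but display only the general one.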
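
Slide 18 describes routing: a user names general topics, and any material indexed with those topics or their narrower terms is forwarded. A sketch assuming hypothetical data: the `NARROWER` map, the document and profile dictionaries, and the `with_narrower_terms` and `route` functions are illustrative only and are not part of NewsIndexer.

```python
# Hypothetical narrower-term (NT) links; a real vocabulary would come from the thesaurus.
NARROWER = {
    "Law": ["Crime", "Gun control"],
    "Crime": ["Fraud"],
    "Technology": ["Software"],
}

def with_narrower_terms(topic, narrower=NARROWER):
    """Return the topic plus every term below it, at any depth."""
    found, stack = set(), [topic]
    while stack:
        term = stack.pop()
        if term not in found:
            found.add(term)
            stack.extend(narrower.get(term, []))
    return found

def route(docs, profiles, narrower=NARROWER):
    """Map each user to the sorted IDs of documents matching their topics or narrower terms.

    docs:     {doc_id: set of index terms}
    profiles: {user: set of general topics}
    """
    deliveries = {}
    for user, topics in profiles.items():
        wanted = set()
        for topic in topics:
            wanted |= with_narrower_terms(topic, narrower)
        deliveries[user] = sorted(doc_id for doc_id, terms in docs.items() if terms & wanted)
    return deliveries

docs = {
    "story-1": {"Fraud"},
    "story-2": {"Software", "Elections"},
    "story-3": {"Baseball"},
}
profiles = {"legal-dept": {"Law"}, "it-group": {"Technology"}}
print(route(docs, profiles))  # {'legal-dept': ['story-1'], 'it-group': ['story-2']}
```

Expanding each profile topic once, before scanning documents, keeps the per-document check to a single set intersection.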
