Lexalytics Text Analytics Workshop: Perfect Text Analytics

Presentation by Seth Redmore, VP Product Management, at the Text Analytics Summit 2010

Published in: Technology, Education

  1. Perfect Text Analytics
     Seth Redmore
     VP, Product Management
  2. Perfect
     per·fect [adj., n. pur-fikt; v. per-fekt]
     1. conforming absolutely to the description or definition of an ideal type: a perfect sphere; a perfect gentleman.
     2. excellent or complete beyond practical or theoretical improvement: There is no perfect legal code. The proportions of this temple are almost perfect.
     All rights reserved © 2010 Lexalytics Inc.
  3. Text Analytics
     The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources. (Wikipedia)
     In other words, enhancing the value of text content by extracting entities, features, context, relationships and emotion.
  4. Perfect is Fast
     Average human reading speed: 250 wpm
     Conservative computer reading speed: 6000 wpm/core (our speed on a moderate single core)
     Each core is equivalent to the reading bandwidth of 12 people.
     Modern machines have 8 cores.
     That’s just about 100 people in a box.
     Nice.
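As a rough check of the arithmetic on the "Perfect is Fast" slide, the sketch below divides the stated machine speed by the stated human speed. The constant and function names are illustrative and do not come from the deck. Taken at face value, the stated speeds work out to 24 readers per core, so the slide's figure of 12 per core (and roughly 100 per box) appears to include a further margin that the deck does not spell out.

```python
# Figures quoted on the slide; the names are illustrative assumptions.
HUMAN_WPM = 250        # average human reading speed
CORE_WPM = 6000        # conservative per-core machine reading speed
CORES_PER_BOX = 8      # cores in a "modern machine" per the slide


def readers_equivalent(machine_wpm: float, human_wpm: float = HUMAN_WPM) -> float:
    """Number of human readers needed to match a given machine reading speed."""
    return machine_wpm / human_wpm


per_core = readers_equivalent(CORE_WPM)
per_box = per_core * CORES_PER_BOX

print(f"Readers per core: {per_core:.0f}")
print(f"Readers per {CORES_PER_BOX}-core box: {per_box:.0f}")
```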
  5. Perfect is Useable
     “I don’t like the results” is not the same as “the results are incorrect”
     Understanding the behavior is key to usefulness
     Can you make better decisions?
     Can you make more money or save money?
     What is the most controversial area of text analytics?
     Thomson Reuters trading with sentiment analysis increased alpha (profit over market) by 80 basis points
  6. Useable: How much can you differ?
     “In my shop, that up until now has relied exclusively on human coding, we consider anything below 90% to be unacceptably inaccurate…. There is no doubt that automated sentiment is getting much much better, but to suggest that people should be okay with 20% of their data being wrong is just absurd.” Katie Delahaye Paine
     Why is 10% “wrong” so much less absurd than 20% “wrong”?
     (Figure labels: 20% Error; 10% Error)
  7. Perfect is Consistent
     Same results for same content, every time
     University of Pittsburgh “Multi-Perspective Question Answering” Corpus: 535 documents, 11k+ sentences.
     40 hours of training for each rater
     ~80% inter-rater agreement
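To make the inter-rater agreement figure on the "Perfect is Consistent" slide concrete, here is a minimal sketch of two common agreement measures: raw percent agreement and Cohen's kappa, which corrects for agreement expected by chance. The slide does not say which statistic its ~80% refers to, and the two label lists below are invented for illustration; they are not taken from the MPQA corpus.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which two raters chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("rater lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(counts_a[l] * counts_b[l] for l in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical sentence-level sentiment labels from two trained raters.
rater_a = ["pos", "pos", "neg", "neu", "neg", "pos", "neu", "neg", "pos", "neu"]
rater_b = ["pos", "neg", "neg", "neu", "neg", "pos", "pos", "neg", "pos", "neu"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(f"Percent agreement: {agreement:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```

With these invented labels the raters agree on 8 of 10 sentences, yet kappa comes out lower because some of that agreement would occur by chance.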
  8. Perfect is (new) Knowledge
     Discover the stuff you don’t know
     Text Analytics is really, really great at telling you the who, the what, and the where. Sometimes the “how”.
     You have to supply the “why” – but that question is way easier to answer when you know the other “w’s and the h”
  9. Perfect Includes Everything
     Running our top-of-the-line software flat out across one year will cost you about $0.002 per document analyzed (news-article-sized content, assuming 3 docs/core-second on an 8-core machine)
     The more data, the better, and the more your text analytics is worth
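The per-document cost on the "Perfect Includes Everything" slide can be related to yearly volume with a little arithmetic. The sketch below uses only the three figures the slide states (3 docs per core-second, 8 cores, about $0.002 per document) and assumes a 365-day year of continuous running. The deck does not give an annual price, so the total printed here is only what those figures imply, not a quoted cost.

```python
# Figures from the slide; names and the 365-day year are assumptions.
DOCS_PER_CORE_SECOND = 3
CORES = 8
COST_PER_DOC_USD = 0.002
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def docs_per_year(docs_per_core_second: int, cores: int) -> int:
    """Documents processed in a year of running flat out."""
    return docs_per_core_second * cores * SECONDS_PER_YEAR


yearly_docs = docs_per_year(DOCS_PER_CORE_SECOND, CORES)
implied_yearly_cost = yearly_docs * COST_PER_DOC_USD

print(f"Documents per year: {yearly_docs:,}")
print(f"Implied yearly cost at ${COST_PER_DOC_USD}/doc: ${implied_yearly_cost:,.0f}")
```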
  10. Perfect is Trainable
      Can you solve YOUR business problem with it?
      Can you optimize to suit different kinds of content and roll those results up into a single reporting system?
  11. Perfect Text Analytics
      Fast
      Useable
      Consistent
      Knowledge
      (that is)
      Inclusive
      Trainable
  12. Customer Snapshots
      (or, “rubber, meet road”)
  13. Reputation Management
  14. Politics
  15. Market Intelligence
      (Architecture diagram. Labels: Client Employee; Client Company; Suppliers; User Authentication; Single Sign-on; External Content Providers; SinglePoint; Web 2.0 Collaboration; Search Results; Secondary Research; MI Analyst; Text Analytics; Integrated Index; News & Journals; NL Search Engine; Firewall; Internal Document Repository; Optional Document Repository; Financial analyst reports; Internal research; Content Processing; Custom Web Crawls & Gov. Databases; Trash can; crawl, FTP or CD)
  16. Hospitality
  17. Financial Services
      Turns news into numbers for automatic trading systems
      • Company stocks + commodities
      • Resilient server product
      (Diagram labels: Financial data; Indicators; RNSE Server; Indicators; Algorithmic Trading (QED firm); Buy/Sell)
      • Ultimate customers are financial institutions
      • QED (Quantitative and Event-Driven Trading): banks, hedge funds
      • JPMorgan, SocGen, Alpha Equities…and others
  18. ROI – Retrieving Organized Information
      RTI CONSULTING SERVICES
      REPEATABLE
      EVOLVING
      DESIGNS
      BALANCED METHODOLOGY
      • Business Assessment
      • User Interviews
      • Taxonomy Design and Recommendation
      • Content Governance / Analysis
      DEPLOYMENT / SUPPORT
      • Solution Alternatives
      • Integration & Deployment
      • Testing, Tuning, and Evaluation
      THOUGHT LEADERSHIP
      • Strategy Consultation
      • Roadmaps – Evolution and Growth
      PROF. TED SULLIVAN
  19. Pharma
  20. The Next Year…
  21. Opinion Mining
      Who said what about whom?
  22. Sarcasm, Twitter
      Model trained to detect sarcasm
      Once detected, you can decide what to do with it – because actually determining the sentiment is going to be unreliable
      New model trained on Twitter content
      Moving towards a concept of text analytics driven by business logic
  23. Thesaurus-based Theme Rollup
      Machine-generated conceptual taxonomy
      Gas/Electric Hybrid and EV might roll up to EV
      Fewer themes, but very useful to detect patterns across content
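The rollup idea on the "Thesaurus-based Theme Rollup" slide can be sketched as a lookup from specific themes to a broader parent theme, followed by counting. The slide describes a machine-generated taxonomy; the hand-written dictionary below is only a stand-in for it, and the theme strings other than the slide's own "Gas/Electric Hybrid" and "EV" example are invented.

```python
from collections import Counter

# Hypothetical stand-in for a machine-generated thesaurus: child theme -> parent.
THESAURUS = {
    "gas/electric hybrid": "ev",
    "ev": "ev",
}


def roll_up(themes, thesaurus):
    """Count themes after mapping each one to its parent theme, if it has one."""
    counts = Counter()
    for theme in themes:
        key = theme.strip().lower()
        # Themes missing from the thesaurus are kept as their own category.
        counts[thesaurus.get(key, key)] += 1
    return counts


extracted = ["Gas/Electric Hybrid", "EV", "gas/electric hybrid", "Fuel Economy"]
rolled = roll_up(extracted, THESAURUS)
for theme, count in rolled.most_common():
    print(f"{theme}: {count}")
```

Three raw theme mentions collapse into one "ev" bucket here, which is the "fewer themes" effect the slide describes.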
  24. Foreign Language Support
      French is first, followed by other Romance languages
      New stemmer
      New summarization algorithm
      New part-of-speech tagger
      Automatic language detection
      New sentiment/entity extraction algorithms
      Also applicable to vertical-specific content
      Confidence scoring by algorithm
      Use business logic to meld the results
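One way to read "confidence scoring by algorithm" plus "use business logic to meld the results" on the "Foreign Language Support" slide is: each algorithm reports a label with a confidence, and a rule picks among them. The sketch below is an assumption about what such a rule could look like, not a description of the Lexalytics implementation; the `Scored` type, the algorithm names, and the 0.6 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Scored:
    algorithm: str     # which algorithm produced the result (hypothetical names)
    label: str         # e.g. a sentiment label
    confidence: float  # algorithm's own confidence, assumed to lie in [0, 1]


def meld(results: Iterable[Scored], min_confidence: float = 0.6) -> Optional[Scored]:
    """Pick the most confident result, or None if none clears the threshold."""
    usable = [r for r in results if r.confidence >= min_confidence]
    if not usable:
        return None
    return max(usable, key=lambda r: r.confidence)


candidates = [
    Scored("phrase_based", "positive", 0.55),
    Scored("trained_model", "negative", 0.82),
]
chosen = meld(candidates)
print(chosen)
```

Returning `None` when nothing is confident enough leaves the final decision to downstream logic rather than forcing a label.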
  25. Trainable Entity Sentiment
      New technique for entity sentiment
      Initial results from testing in English extremely promising
      Average human scoring overlap of >> 90% for scored sentences
      Initially used only for French
  26. Tool Enhancements
      Eventually use on English content:
      • Twitter
      • Customer Satisfaction
      • Others…
      Entity Management Toolkit
      • Part of Speech Tagset training
      • Using to train Salience on French
      Sentiment Toolkit
      • Build your own entity sentiment models: French (first)
      New Sentiment Toolkit + Maximum Entropy model builder allows new Entity and Sentiment modules
      New EMT helps us build a new French PoS tagger
      (Diagram labels: Doc; POS Tagger; Fully Tagged Document; Entity Extraction & Sentiment Models; Themes & Summaries)
  27. Business Logic + TA Algorithms
      (Diagram labels: Content Source; Search; Business Logic; Other TA System; Sarcasm; Route On; Sports; Finance; Unknown; $; ?; A; B; C; D; Entity: Cisco; Probability Scores; Cisco : Positive)
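The labels on the "Business Logic + TA Algorithms" slide suggest content being routed by category (Sports, Finance, Unknown), with sarcasm handled separately. The sketch below is one hedged interpretation of that routing. The document fields, the handler names, and the threshold are hypothetical; the deck does not define what its A, B, C, and D boxes stand for.

```python
# Hypothetical mapping from detected category to a downstream handler.
ROUTES = {
    "sports": "sports_model",
    "finance": "finance_model",
    "unknown": "general_model",
}


def route(doc: dict, sarcasm_threshold: float = 0.5) -> str:
    """Choose a handler for a document using simple business rules."""
    # Sarcastic content is set aside first, since its sentiment is unreliable.
    if doc.get("sarcasm_probability", 0.0) >= sarcasm_threshold:
        return "sentiment_unreliable"
    category = doc.get("category", "unknown")
    # Unrecognized categories fall back to the general handler.
    return ROUTES.get(category, ROUTES["unknown"])


docs = [
    {"text": "Cisco beats estimates", "category": "finance", "sarcasm_probability": 0.05},
    {"text": "Oh great, another outage", "category": "unknown", "sarcasm_probability": 0.9},
    {"text": "Late goal wins the match", "category": "sports"},
]
decisions = [route(d) for d in docs]
for d, decision in zip(docs, decisions):
    print(f"{d['text']!r} -> {decision}")
```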
  28. Summary
      Lots of people are making money with text analytics
      In lots of different verticals
      The next 12 months bring online a whole host of features to make our software even more flexible
      Check out tas.lexalytics.com
      Check out www.lexalytics.com/lexascope
