Lexalytics Text Analytics Workshop: Perfect Text Analytics
Presentation by Seth Redmore, VP Product Management at the Text Analytics Summit 2010

Transcript

  • 1. Perfect Text Analytics
    Seth Redmore
    VP, Product Management
  • 2. Perfect
    per·fect [adj., n. pur-fikt; v. per-fekt]
    1. conforming absolutely to the description or definition of an ideal type: a perfect sphere; a perfect gentleman.
    2. excellent or complete beyond practical or theoretical improvement: There is no perfect legal code. The proportions of this temple are almost perfect.
    All rights reserved © 2010 Lexalytics Inc.
  • 3. Text Analytics
    The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources. (Wikipedia)
    In other words, enhancing the value of text content by extracting entities, features, context, relationships and emotion.
  • 4. Perfect is Fast
    Average human reading speed: 250 wpm
    Conservative computer reading speed: 6,000 wpm per core (our speed on a moderate single core)
    Each core is equivalent to the reading bandwidth of 12 people.
    Modern machines have 8 cores.
    That’s just about 100 people in a box.
    Nice.
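The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch that takes the two stated speeds at face value; the constant and function names are illustrative. With 250 wpm and 6,000 wpm per core the ratio works out to 24 readers per core, so the slide's figure of 12 (and roughly 100 per 8-core box) presumably rests on a more conservative assumption that the transcript does not state.

```python
# Figures taken from the slide; names are illustrative, not from any product.
HUMAN_WPM = 250              # average human reading speed
MACHINE_WPM_PER_CORE = 6000  # stated "conservative" machine speed per core
CORES = 8                    # cores in a "modern machine" (2010)


def readers_per_core(machine_wpm: float, human_wpm: float) -> float:
    """How many human readers one core replaces, by raw words per minute."""
    return machine_wpm / human_wpm


per_core = readers_per_core(MACHINE_WPM_PER_CORE, HUMAN_WPM)
per_box = per_core * CORES

print(f"Readers per core: {per_core:.0f}")
print(f"Readers per {CORES}-core box: {per_box:.0f}")
```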
  • 5. Perfect is Useable
    “I don’t like the results” is not the same as “the results are incorrect”
    Understanding the behavior is key to usefulness
    Can you make better decisions?
    Can you make more money or save money?
    What is the most controversial area of text analytics?
    Thomson Reuters: trading with sentiment analysis increased alpha (profit over market) by 80 basis points
  • 6. Useable: How much can you differ?
    “In my shop, that up until now has relied exclusively on human coding, we consider anything below 90% to be unacceptably inaccurate…. There is no doubt that automated sentiment is getting much much better, but to suggest that people should be okay with 20% of their data being wrong is just absurd.” Katie Delahaye Paine
    Why is 10% “wrong” so much less absurd than 20% “wrong”?
    [Charts: 20% Error vs. 10% Error]
  • 7. Perfect is Consistent
    Same results for same content, every time
    University of Pittsburgh “Multi-Perspective Question Answering” Corpus: 535 documents, 11k+ sentences.
    40 hours of training for each rater
    ~80% inter-rater agreement
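To make the "~80% inter-rater agreement" figure concrete, here is a minimal sketch of how agreement between two human raters can be measured. The labels below are invented for illustration and are not from the MPQA corpus. Cohen's kappa is included as a common chance-corrected companion to raw agreement; the slide itself does not say which measure was used.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which two raters gave the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("raters must label the same non-empty set of items")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[l] * count_b[l] for l in set(a) | set(b)) / (n * n)
    if expected == 1.0:  # both raters used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical sentence-level sentiment labels from two trained raters.
rater_1 = ["pos", "pos", "neg", "neu", "neg", "pos", "neu", "neg", "pos", "neg"]
rater_2 = ["pos", "neg", "neg", "neu", "neg", "pos", "pos", "neg", "pos", "neg"]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
print(f"Raw agreement: {agreement:.0%}, Cohen's kappa: {kappa:.2f}")
```

An automated system, by contrast, returns the same label for the same input on every run, which is the slide's point about consistency.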
  • 8. Perfect is (new) Knowledge
    Discover the stuff you don’t know
    Text Analytics is really, really great at telling you the who, the what, and the where. Sometimes the “how”
    You have to supply the “why”, but that question is way easier to answer when you know the other “w’s and the h”
  • 9. Perfect Includes Everything
    Running our top-of-the-line software flat out for one year will cost you about $0.002 per document analyzed (news-article-sized content, assuming 3 docs/core-second on an 8-core machine)
    The more data you analyze, the better, and the more your text analytics is worth
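The throughput behind the per-document figure can be worked out as follows. This is a back-of-the-envelope sketch that assumes a 365-day year and uninterrupted running; the implied annual total is simply the slide's per-document cost multiplied back out, not a price quoted anywhere in the deck.

```python
# Inputs stated on the slide.
DOCS_PER_CORE_SECOND = 3
CORES = 8
COST_PER_DOC_USD = 0.002

# Assumption: a 365-day year, running flat out with no downtime.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

docs_per_year = DOCS_PER_CORE_SECOND * CORES * SECONDS_PER_YEAR
implied_annual_cost = docs_per_year * COST_PER_DOC_USD

print(f"Documents per year: {docs_per_year:,}")
print(f"Implied annual cost at ${COST_PER_DOC_USD}/doc: ${implied_annual_cost:,.0f}")
```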
  • 10. Perfect is Trainable
    Can you solve YOUR business problem with it?
    Can you optimize to suit different kinds of content and roll those results up into a single reporting system?
  • 11. Perfect Text Analytics
    Fast
    Useable
    Consistent
    Knowledge
    (that is)
    Inclusive
    Trainable
  • 12. Customer Snapshots
    (or, “rubber, meet road”)
  • 13. Reputation Management
  • 14. Politics
  • 15. Market Intelligence
    [Architecture diagram. Labels: Client Employee; Client Company; User Authentication; Single Sign-on; External Content Providers; SinglePoint; Web 2.0 Collaboration; Search Results; Secondary Research; Suppliers; MI Analyst; Text Analytics; Integrated Index; News & Journals; NL Search Engine; Firewall; Internal Document Repository; Optional Document Repository; Financial analyst reports; Internal research; Content Processing; Custom Web Crawls & Gov. Databases; Trash can; crawl, FTP or CD]
  • 16. Hospitality
  • 17. Financial Services
    Turns news into numbers for automatic trading systems
      • Company stocks + commodities
      • Resilient server product
    [Diagram labels: Algorithmic Trading (QED firm); Financial data; Indicators; Buy/Sell; RNSE Server]
      • Ultimate customers are financial institutions
      • QED (Quantitative and Event-Driven Trading): banks, hedge funds
      • JPMorgan, SocGen, Alpha Equities…and others
  • 18. ROI – Retrieving Organized Information
    RTI CONSULTING SERVICES
    REPEATABLE EVOLVING DESIGNS
    BALANCED METHODOLOGY: Business Assessment; User Interviews; Taxonomy Design and Recommendation; Content Governance / Analysis
    DEPLOYMENT / SUPPORT: Solution Alternatives; Integration & Deployment; Testing, Tuning, and Evaluation
    THOUGHT LEADERSHIP: Strategy Consultation; Roadmaps – Evolution and Growth
    PROF. TED SULLIVAN
  • 21. Pharma
  • 22. The Next Year…
  • 23. Opinion Mining
    Who said what about whom?
  • 24. Sarcasm, Twitter
    Model trained to detect sarcasm
    Once sarcasm is detected, you can decide what to do with the content, because the sentiment score itself is going to be unreliable
    New model trained on Twitter content
    Moving towards a concept of text analytics driven by business logic
  • 25. Thesaurus-based Theme Rollup
    Machine-generated conceptual taxonomy
    Gas/Electric Hybrid and EV might roll up to EV
    Fewer themes, but very useful to detect patterns across content
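The rollup idea can be sketched with a simple lookup table. This is only an illustration: the slide describes a machine-generated taxonomy, whereas the `ROLLUP` dictionary below is hand-written and hypothetical, as are the function names and sample themes.

```python
from collections import Counter

# Hypothetical thesaurus: each known theme maps to its parent concept.
# In the product described on the slide this mapping is machine generated.
ROLLUP = {
    "gas/electric hybrid": "ev",
    "electric vehicle": "ev",
    "ev": "ev",
}


def roll_up(theme, thesaurus=ROLLUP):
    """Return the parent concept for a theme, or the theme itself if unknown."""
    key = theme.strip().lower()
    return thesaurus.get(key, key)


def rolled_up_counts(themes):
    """Count themes after rollup, so related themes are tallied together."""
    return Counter(roll_up(t) for t in themes)


# Themes as they might be extracted from several documents.
themes = ["Gas/Electric Hybrid", "EV", "fuel economy", "ev", "Fuel Economy"]
counts = rolled_up_counts(themes)
print(counts.most_common())
```

Five extracted themes collapse to two concepts here, which is the "fewer themes, clearer patterns" effect the slide describes.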
  • 26. Foreign Language Support
    French is first, followed by other Romance languages
    New stemmer
    New summarization algorithm
    New part-of-speech tagger
    Automatic language detection
    New sentiment/entity extraction algorithms
    Also applicable to vertical-specific content
    Confidence scoring by algorithm
    Use business logic to meld the results
  • 27. Trainable Entity Sentiment
    New technique for entity sentiment
    Initial results from testing in English extremely promising
    Average human scoring overlap of >> 90% for scored sentences
    Initially used only for French
  • 28. Tool Enhancements
    Eventually use on English content: Twitter; Customer Satisfaction; Others…
    Entity Management Toolkit
    Part of Speech Tagset training
    Using to train Salience on French
    Sentiment Toolkit
    Build your own entity sentiment models: French (first)
    New Sentiment Toolkit + Maximum Entropy model builder allows new Entity and Sentiment modules
    New EMT helps us build a new French PoS tagger
    [Diagram labels: Entity Extraction & Sentiment Models; Fully Tagged Document; Doc; POS Tagger; Themes & Summaries]
  • 29. Business Logic + TA Algorithms
    [Diagram labels: Content Source; Search; Business Logic; Other TA System; Sarcasm; Route On: Sports, Finance, Unknown; $; ?; A, B, C, D; Entity: Cisco; Probability Scores; Cisco : Positive]
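One way to read this diagram is as a routing step followed by a step that melds per-algorithm probability scores into a single entity sentiment. The sketch below is an interpretation under that assumption, not the product's actual design: the `Result` type, the pipeline names, the `needs_review` and `uncertain` outcomes, and the 0.6 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    algorithm: str      # which algorithm produced the score (A, B, C, D)
    label: str          # e.g. "positive" or "negative"
    probability: float  # that algorithm's confidence in its label


def route(category):
    """Pick a processing pipeline from the detected content category."""
    routes = {"sports": "sports_pipeline", "finance": "finance_pipeline"}
    return routes.get(category.lower(), "unknown_pipeline")


def meld(results, sarcasm_detected=False, min_share=0.6):
    """Combine per-algorithm scores into one label using simple business rules."""
    if sarcasm_detected:
        return "needs_review"  # sentiment is unreliable on sarcastic content
    totals = {}
    for r in results:
        totals[r.label] = totals.get(r.label, 0.0) + r.probability
    total = sum(totals.values())
    if total <= 0:
        return "uncertain"
    best = max(totals, key=totals.get)
    return best if totals[best] / total >= min_share else "uncertain"


# Hypothetical scores from four algorithms for the entity "Cisco".
cisco = [
    Result("A", "positive", 0.9),
    Result("B", "positive", 0.7),
    Result("C", "negative", 0.4),
    Result("D", "positive", 0.6),
]
print(route("Finance"), "->", "Cisco :", meld(cisco))
```

Summing probabilities per label is the simplest melding rule; a real deployment might weight algorithms differently by content type, which is where the business logic comes in.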
  • 30. Summary
    Lots of people are making money with text analytics
    In lots of different verticals
    The next 12 months bring online a whole host of features to make our software even more flexible
    Check out tas.lexalytics.com
    Check out www.lexalytics.com/lexascope
