Text, Content, and Social Analytics: BI for the New World

Presentation by Seth Grimes to the TDWI Washington DC chapter, July 15, 2011

Published in: Technology
1 Comment
  • I like the idea of semantics uniting all sorts of data. I would think that might be an important consideration that all content is analyzed and filtered using the same methodology. So, whether it's a tweet or a set of customer service chats the same modeling approach is used.

Text, Content, and Social Analytics: BI for the New World

  1. 1. Text, Content, and Social Analytics: BI for the New World
        Seth Grimes
        Alta Plana Corporation
        @sethgrimes
        TDWI – Washington DC
        July 15, 2011
  2. 2. Table of Contents:
        Principles.
        Perspectives.
        Semantics.
        Text/content analytics.
        Social.
        BI for the New World.
  3. 3. Imperatives for the 2010s:
        Do more with more.
        “It’s Not Information Overload. It’s Filter Failure”: Clay Shirky, 2008.
        • More sources & types of data.
        • Greater data volumes.
        • New hardware and methods.
        Automate more, more intelligently.
        • Analytics.
        • Semantics.
        Engage. Socialize.
  7. 7. I see three categories of data:
        Quantities, whether measured, observed, or computed.
        Content, which I’ll characterize as non-quantitative information.
        Metadata (semantic & structural) describing quantities and content.
        • Our concern is content, analytics & fusion.
        • Structured/unstructured is a false dichotomy.
        • Where do relationships fit?
        DW & BI relate numbers...
        ...but by-the-numbers BI doesn’t explain.
  10. 10. Questions for business (& government):
          What are people saying? What’s hot/trending?
          What are they saying about {topic|person|product} X?
          ... about X versus {topic|person|product} Y?
          How has opinion about X and Y evolved?
          How has opinion correlated with {our|competitors’|general} {news|marketing|sales|events}?
          What’s behind opinion, the root causes?
          Who are opinion leaders?
          How does sentiment propagate across multiple channels?
  11. 11. The answers are here...
          But how do you get at them?
  12. 12. “In this example, you can quickly see that the Drooling Dog Bar B Q has gotten lots of positive reviews, and if you want to see what other people have said about the restaurant, clicking this result is a good choice.”
          -- http://googleblog.blogspot.com/2009/05/more-search-options-and-other-updates.html
          “In the recap of [Searchology] from Google’s Matt Cutts, he tells us that: ‘If you sort by reviews, Google will perform sentiment analysis and highlight interesting comments.’”
          -- Bill Slawski, “Google's New Review Search Option and Sentiment Analysis,” http://www.seobythesea.com/?p=1488
  13. 13. Text Analytics!
          More generally...
  14. 14. Analytics is a collection of tools and techniques that extract insights from data.
          Apply or embed analytics within business contexts – collect data and information about customers, markets, suppliers, and business processes – use results to inform, drive, and optimize business decision making – and you harness analytics as a core BI asset.
  15. 15. Analytics seeks structure in “unstructured” sources.
          x(t) = t
          y(t) = \frac{a}{2} (e^{t/a} + e^{-t/a}) = a \cosh(t/a)
          http://www.tropicalisland.de/NYC_New_York_Brooklyn_Bridge_from_World_Trade_Center_b.jpg
          http://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg
  16. 16. Text analytics models text.
          “Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences.”
          -- H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
          http://wordle.net
  17. 17. Document input and processing
          Knowledge handling is key
          Desk Set (1957): Computer engineer Richard Sumner (Spencer Tracy) and television network librarian Bunny Watson (Katharine Hepburn) and the "electronic brain" EMERAC.
          Hans Peter Luhn
          “A Business Intelligence System”
          IBM Journal, October 1958
  18. 18. “This rather unsophisticated argument on ‘significance’ avoids such linguistic implications as grammar and syntax... No attention is paid to the logical and semantic relationships the author has established.”
          -- H.P. Luhn
  19. 19. My 2009 text-analytics market survey asked, [What information] do you need (or expect to need) to extract or analyze:
          Text Analytics 2009: User Perspectives on Solutions and Providers
  20. 20.
  21. 21. From document to DB; an IBM example:
          “The standard features are stored in the STANDARD_KW table, keywords with their occurrences in the KEYWORD_KW_OCC table, and the text list features in the TEXTLIST_TEXT table. Every feature table contains the DOC_ID as a reference to the DOCUMENT table.”
  22. 22. Welcome to the New World.
          The Far Side by Gary Larson
          Ken Jennings, IBM Watson, and Brad Rutter play Jeopardy!
          https://secure.wikimedia.org/wikipedia/en/wiki/File:Watson_Jeopardy.jpg
  23. 23. In a sense, text analytics, by generating semantics, bridges search and BI to turn Information Retrieval into Information Access.
          [Diagram labels: Information Access; Search; BI; Text Analytics; Integrated analytics; Semantic search]
  24. 24. Have we arrived?
          2001: A Space Odyssey, Stanley Kubrick
  25. 25. En route.
          http://www.businessweek.com/magazine/content/04_19/b3882029_mz072.htm
  26. 26. Intelligent computing involves:
          Big (and little) Data.
          • Quantities.
          • Content.
          • Metadata.
          Analytics.
          Semantics.
          Integration.
          Inference.
  29. 29. Semantics enables better content production, management & use.
          Semantics captures –
          • Meaning
          • Relationships
          • Context
          • Understanding
          – the sense of “unstructured” online, social, and enterprise information, for content consumers and publishers.
          Semantics unites data of all types.
  30. 30. Content, composites, connections.
  31. 31. Content, composites, connections, 2.
  32. 32. Content, composites, connections, 3.
  33. 33. From connections to influence: What’s wrong with these pictures? (Radian6, Sysomos, Klout)
  34. 34. Social analytics:
          Use social data in analyses (alongside enterprise & online information).
          • Content.
          • Connections.
          Bring BI to social analyses.
          3rd & 4th senses of social analytics:
          Adopt agile, collaborative methods.
          Share your data.
          A challenge: Enterprise-social-online data integration.
  36. 36. Text, Content, and Social Analytics: BI for the New World
          Seth Grimes
          Alta Plana Corporation
          @sethgrimes
          TDWI – Washington DC
          July 15, 2011
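
The Luhn quotation in the deck describes scoring significance from word frequency, "first for individual words and then for sentences." The Python sketch below illustrates that two-step idea in a simplified form. It is not Luhn's exact procedure (his paper scores clusters of significant words); here a sentence is scored by the average frequency of its words. The stop-word list, function names, and sample text are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in", "for", "it", "on"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def word_significance(text):
    """Step 1: count how often each non-stop word occurs in the whole text."""
    return Counter(w for w in tokenize(text) if w not in STOP_WORDS)


def rank_sentences(text):
    """Step 2: score each sentence by the average significance of its words.

    Returns (score, sentence) pairs, highest score first. Stop words
    contribute zero because they are absent from the Counter.
    """
    significance = word_significance(text)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for sentence in sentences:
        words = tokenize(sentence)
        score = sum(significance[w] for w in words) / len(words) if words else 0.0
        scored.append((score, sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up sample text for illustration only.
sample = (
    "Text analytics models text. "
    "Analytics extracts insights from text and data. "
    "The weather is nice."
)
ranking = rank_sentences(sample)
top_sentence = ranking[0][1]
print(top_sentence)
```

As the second Luhn quotation in the deck notes, a frequency-only measure like this ignores grammar, syntax, and semantic relationships, which is the gap later text-analytics methods try to close.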
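
The "From document to DB" slide quotes an IBM description in which every feature table carries a DOC_ID that refers back to a DOCUMENT table. The sketch below shows what that layout might look like, using Python's built-in sqlite3 module as a stand-in for the database. Only the table names DOCUMENT and KEYWORD_KW_OCC and the DOC_ID column come from the quotation; the other columns (TITLE, KEYWORD, OCCURRENCES) and the sample rows are assumptions made up for this example.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the quoted IBM product is not SQLite.
conn = sqlite3.connect(":memory:")

# DOCUMENT, KEYWORD_KW_OCC, and DOC_ID are named in the quotation.
# TITLE, KEYWORD, and OCCURRENCES are hypothetical columns.
conn.executescript(
    """
    CREATE TABLE DOCUMENT (
        DOC_ID INTEGER PRIMARY KEY,
        TITLE  TEXT
    );
    CREATE TABLE KEYWORD_KW_OCC (
        DOC_ID      INTEGER REFERENCES DOCUMENT(DOC_ID),
        KEYWORD     TEXT,
        OCCURRENCES INTEGER
    );
    """
)

# Made-up sample data: one document and two extracted keywords.
conn.execute("INSERT INTO DOCUMENT VALUES (1, 'Service chat log')")
conn.executemany(
    "INSERT INTO KEYWORD_KW_OCC VALUES (?, ?, ?)",
    [(1, "refund", 3), (1, "delay", 1)],
)

# Once text features sit in tables, ordinary SQL joins relate them to documents.
rows = conn.execute(
    """
    SELECT d.TITLE, k.KEYWORD, k.OCCURRENCES
    FROM KEYWORD_KW_OCC AS k
    JOIN DOCUMENT AS d ON d.DOC_ID = k.DOC_ID
    ORDER BY k.OCCURRENCES DESC
    """
).fetchall()
print(rows)
```

The point of the layout is that extracted text features become rows a conventional BI tool can query alongside numeric data.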
