What Business Innovators Need to Know about Content Analytics


Presentation by Jeff Fried, CTO of BA-Insight, at Smart Content: The Content Analytics Conference, October 19, 2010.

Published in: Business, Technology, Education

  1. 1. Smart Content: What Business Innovators Need to Know about Content Analytics. Jeff Fried, CTO, BA-Insight
  2. 2. Examples from: Opinions from:
  3. 3. Three Views of Content Analytics. Business Strategist: It’s about money, business models, advertising, and money. End User: It’s about finding things, having fun, and getting stuff done. Research Scientist: It’s about fast algorithms, massive scales, and machine learning.
  4. 4. Content is Exploding: “If you think the information doesn’t exist, you’re not looking hard enough”
  6. 6. Traffic, Ads and Information Mash-Ups are becoming part of emerging ecosystems: cloud platforms, content platforms, ad platforms, services
  7. 7. Rethinking the Data Warehouse
  8. 8. Use of Unstructured Data in Information Analysis Applications (Analyst: Mark Beyer)
  10. 10. Smart Content is Streaming
  11. 11. Different multimedia applications: Education, Entertainment, Archives
  12. 12. Social Video Sharing System: Users create, produce, upload, manage and share video within one system
  13. 13. Query by example: Music, Image, Face
  14. 14. Smart Content is Mobile: Location and Form Factors
  15. 15. Bing Twitter Maps
  16. 16. Smart Content is Social: Many layers of social media
  17. 17. //twitterviz
  18. 18. Publication Platforms. Axes: Publication (Public, Community, Private) and Access (Private, Community, Public). Platforms shown: Facebook, Email, Answers, Web, Twitter
  19. 19. Social Graph: Naturally connected community vs. spam marketing campaign. Spammy communities are highly visible – don’t be part of one!
  20. 20. Permission
  21. 21. Social Search Needs
      • Relevance – Filtering the document web
      • Social Media Content – Filtering the social web
      • Trends / Group Insight – Tapping Community Knowledge
      • Answers – Trusted Advisor Recommendation
      Example queries:
      • “Java” (coffee, island, or language?)
      • “compliance”
      • “What should I do in New York?”
      • Where are my friends now?
      • Why did power go out in Palo Alto?
      • How does adoption work?
      • (on FB update) anybody give their babies baby Benadryl for travel/jet lag? Want to hear from parents whether they have or not and how it went
  22. 22. Recommendations: “Personalized” to “Social” (chart axes: Complexity and Value)
      1. Content or “Related item” Recommendations (items to item): Enables users to ‘browse sideways’ from any item
      2. Personalized Recommendations (items to user): Enables 1:1 relevance based on user profile
      3. Social Recommendations (users to users): Enables connections between like users; drives service stickiness
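The first recommendation type on slide 22 (items to item) can be sketched as a co-occurrence ranking. This is a minimal illustration, not the method from the talk: the `views` log, the item names, and the choice of cosine similarity over viewer sets are all assumptions made for the example.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical interaction log: user -> set of items that user viewed.
views = {
    "ann": {"slides_a", "slides_b", "video_c"},
    "bob": {"slides_a", "slides_b"},
    "cat": {"slides_b", "video_c"},
    "dan": {"video_d"},
}

def item_users(views):
    """Invert the log: item -> set of users who viewed it."""
    inverted = defaultdict(set)
    for user, items in views.items():
        for item in items:
            inverted[item].add(user)
    return inverted

def related_items(item, views, top_n=3):
    """Rank other items by cosine similarity of their viewer sets."""
    inverted = item_users(views)
    target = inverted[item]
    scores = []
    for other, users in inverted.items():
        if other == item:
            continue
        overlap = len(target & users)
        if overlap:
            scores.append((overlap / sqrt(len(target) * len(users)), other))
    return [name for _, name in sorted(scores, reverse=True)[:top_n]]

print(related_items("slides_a", views))
```

Personalized (items to user) and social (users to users) recommendations would need a user profile or a user-to-user similarity on top of the same log, which is one way to read the rising complexity on the slide.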
  23. 23. Virtuous Cycles: Create new patterns with positive reinforcement
  25. 25. The Long Tail of Online Business: 70% / 30%; query traffic +70% Y/Y
  26. 26. Virtuous Cycles in Findability. Relevance: tuned experience; social behavior affects relevance. Social: socially driven feedback loop; people and expertise location are the key ‘lens’. Refinement: structure drives exploration; aligned with taxonomy and tags.
  27. 27. Text Analytics Isn’t Perfect: Realistic Expectations for Powerful Technology
  28. 28. Analytics! Semantics! Machine Learning!
  30. 30. Grab-Bag of Related Technologies
      • Problem: linguistic variations in concept expression
        – Technology: natural language processing (NLP)
      • Problem: huge numbers of documents that are the same or versions of the same
        – Technologies: text mining, text analytics, normalizing & de-duping
      • Problem: amount of content exceeds amount of human expertise to analyze & categorize
        – Technologies: entity extraction, contextual analysis, auto-categorization
      • Problem: understanding trends and relative values expressed in content
        – Technology: sentiment analysis
      • Problem: retrieving & federating contextually related and relevant content
        – Technologies: all of the above
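The "normalizing & de-duping" entry on slide 30 can be illustrated with word shingles and Jaccard similarity. This is a minimal sketch under stated assumptions: the shingle size, the 0.6 threshold, and the sample documents are hypothetical, and the slide does not say which de-duplication technique the speaker had in mind.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles after lowercasing and dropping punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def dedupe(docs, threshold=0.6):
    """Keep the first document of each near-duplicate group."""
    kept = []
    for doc in docs:
        signature = shingles(doc)
        if all(jaccard(signature, shingles(other)) < threshold for other in kept):
            kept.append(doc)
    return kept

# Hypothetical documents: d2 differs only in case and punctuation, d4 is a longer version of d1.
d1 = "The quarterly report shows revenue grew ten percent."
d2 = "the quarterly report shows revenue grew ten percent!"
d3 = "Customer churn fell sharply after the pricing change."
d4 = "The quarterly report shows revenue grew ten percent this year."

print(dedupe([d1, d2, d4, d3]))
```

Comparing every document against every kept document is quadratic, so at the "huge numbers of documents" scale the slide mentions, some form of hashing or indexing of the shingle sets would be needed.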
  31. 31. Dynamic Information Applications (Entire contents © 2006 Forrester Research, Inc. All rights reserved.) Tiers: Presentation tier, Middle tier, Repositories. Access paths: Unstructured Information Access, Structured Data Access. Components shown: BPM, Service Orchestration, Workflow; Content, Search, Integration, & Composition technologies; Visualization; Portals, AJAX, Mash-ups, RSS, widgets, gadgets; Enterprise Content Management; ERP, CRM, PIM, PLM, SCM, HCM; Productivity Apps (mail, IM, office tools); Collaboration Tools; EAI, EII, ESB; Business Intelligence; Databases; ETL, Data Cleansing, Data Quality; Identity; MDM, Data Warehouses; File systems; File filters; Connectors; Taxonomy; Text Mining; Desktop Search; Federated Search; Video/audio; Enterprise Search
  32. 32. Linguistics, Statistics, & Gymnastics
      • Lexicon Base: Language-specific Common Words; Inflection Dictionaries; Part-of-speech Dictionaries; Synonymy Dictionaries; Subject-specific ontologies; Spellcheck dictionaries; Geographical and people’s names; Special terminology lexica
      • Basic Linguistic Algorithms: Pattern extraction; Stemming / Lemmatization; Part-of-speech Tagging; Language normalization; Vectorization
      • Applications: Data Cleansing; Categorization; Entity Extraction; Suggest Synonyms; Find similar; Stop word elimination; Spell checking; Machine Translation; Relationship Extraction
  33. 33. From Entity Extraction: annotation types shown are Acronym, Person, Location, End of sentence, End of paragraph, Date (Base = 2002-03-XX)
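The Date annotation on slide 33 carries a normalized base value (2002-03-XX) in which the unknown day is left as XX. A minimal sketch of that kind of normalization follows; the regular expression, the `extract_dates` name, and the sample sentence are assumptions for illustration, and a real extractor would cover many more date formats.

```python
import re

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Matches "March 2002" or "5 June 2003"; the day is optional.
DATE_RE = re.compile(
    r"\b(?:(\d{1,2})\s+)?(" + "|".join(MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE)

def extract_dates(text):
    """Return (matched text, normalized base) pairs; a missing day becomes XX."""
    found = []
    for match in DATE_RE.finditer(text):
        day, month, year = match.groups()
        day_part = f"{int(day):02d}" if day else "XX"
        base = f"{year}-{MONTHS[month.lower()]:02d}-{day_part}"
        found.append((match.group(0), base))
    return found

print(extract_dates("Drilling began in March 2002 and ended on 5 June 2003."))
```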
  34. 34. To Fact Extraction.... Source sentence: “The Red Valley property lies within the Qilian fold belt which is host to gold deposits.” Extracted entities: Substance (Base=“Gold”, Class=“Element”, Number=79, Symbol=Au) and Location (Base=“Qilian”, Country=“China”, Region=“Asia”, Subregion=“East”). The phrase “is host to” indicates a gold location. Extracted Fact (Substances x Locations): Qilian is location of gold, so the Substance record gains Location=“Qilian” and the Location record gains Substance=“Gold”.
  35. 35. Intelligent Answers from Text: Internal/external text sources
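The step from entities to facts on slide 34 can be sketched as a rule that links a location to a substance when a cue phrase sits between them. The attribute values below are the ones on the slide; the gazetteer dictionaries, the function names, and the single hard-coded cue are assumptions for illustration.

```python
import re

# Hypothetical gazetteers; the attribute values are the ones shown on the slide.
SUBSTANCES = {"gold": {"Base": "Gold", "Class": "Element", "Number": 79, "Symbol": "Au"}}
LOCATIONS = {"qilian": {"Base": "Qilian", "Country": "China", "Region": "Asia", "Subregion": "East"}}

def tokenize(sentence):
    return re.findall(r"\w+", sentence.lower())

def find_entities(tokens, gazetteer):
    """Return (token position, entity record) for every gazetteer hit."""
    return [(i, gazetteer[t]) for i, t in enumerate(tokens) if t in gazetteer]

def extract_facts(sentence, cue="is host to"):
    """Link a location to a substance when the cue phrase occurs between them."""
    tokens = tokenize(sentence)
    cue_tokens = cue.split()
    width = len(cue_tokens)
    cue_positions = [i for i in range(len(tokens) - width + 1)
                     if tokens[i:i + width] == cue_tokens]
    facts = []
    for loc_pos, location in find_entities(tokens, LOCATIONS):
        for sub_pos, substance in find_entities(tokens, SUBSTANCES):
            if any(loc_pos < c < sub_pos for c in cue_positions):
                facts.append({"Substance": substance["Base"], "Location": location["Base"]})
    return facts

sentence = ("The Red Valley property lies within the Qilian fold belt "
            "which is host to gold deposits.")
print(extract_facts(sentence))
```

A sentence that mentions both entities without the cue, such as "Gold prices rose in Qilian.", yields no fact under this rule.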
  36. 36. LookingGlass
  37. 37. Semantics means what? Beware of overhype; seek pragmatic use of semantic tech
  38. 38. Solving the Knife problem. Objective: Automatically insert an advertisement that matches the content best. Example article: “Man Allegedly Attacked Wife With Knife. A Tyler man is awaiting arraignment this afternoon after allegedly attacking his wife with a knife, said Tyler police. The 41-year-old man will face aggravated assault and aggravated robbery charges, said Don Martin, the department's spokesman. Officers took the man in custody near Garden Valley and Loop 323. He ran from his residence after "assaulting his wife with a knife and taking her purse at knifepoint," said information released by Martin. The woman refused medical treatment and did not appear to be seriously injured, the statement said.” Ad matched to it: “Excellent Knives!!!” Mere frequency counting of key words can lead to undesired results... ...understanding relationships between words can reveal the true topic of the document.
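The failure mode on slide 38 is easy to reproduce. The sketch below first matches ads by keyword frequency, then adds a crude proximity veto as a stand-in for "understanding relationships between words". The ad inventory, the cue-word list, and the window size are all hypothetical, and the veto is far simpler than real relationship analysis.

```python
import re
from collections import Counter

# Hypothetical ad inventory: ad text -> trigger keywords.
ADS = {
    "Excellent Knives!!!": {"knife", "knives"},
    "Cookware Sale": {"kitchen", "cooking", "cookware"},
}
# Hypothetical cue words that signal a crime report.
CRIME_CUES = {"attacked", "attacking", "assault", "assaulting", "robbery", "police"}

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def match_by_frequency(text, ads=ADS):
    """Pick the ad whose keywords occur most often; None if nothing matches."""
    counts = Counter(tokens(text))
    scores = {ad: sum(counts[w] for w in keywords) for ad, keywords in ads.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def match_with_veto(text, ads=ADS, cues=CRIME_CUES, window=6):
    """Same counting, but ignore keyword hits within `window` tokens of a cue word."""
    toks = tokens(text)
    cue_positions = [i for i, t in enumerate(toks) if t in cues]
    scores = Counter()
    for i, t in enumerate(toks):
        near_cue = any(abs(i - c) <= window for c in cue_positions)
        for ad, keywords in ads.items():
            if t in keywords and not near_cue:
                scores[ad] += 1
    return scores.most_common(1)[0][0] if scores else None

article = ("A Tyler man is awaiting arraignment this afternoon after allegedly "
           "attacking his wife with a knife, said Tyler police.")
review = "This chef knife stays sharp and the knife block looks great in my kitchen."

print(match_by_frequency(article), match_with_veto(article))
print(match_by_frequency(review), match_with_veto(review))
```

On the crime article the frequency matcher selects the knife ad while the veto version selects nothing; on the product review both select the knife ad.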
  39. 39. Pairwise values between concept types (upper triangle of the matrix):
                    Actor  Director  Movie  TV Show  Adventure  Comedy  Face Image
      Actor             0       0.6      1        1          1       1         0.9
      Director                    0      1        1          1       1         0.3
      Movie                              0      0.6          1       1          -1
      TV Show                                     0          1       1          -1
      Adventure                                              0    0.14          -1
      Comedy                                                         0          -1
      Face Image                                                                 0
  40. 40. Cyc Knowledge Base. Cyc contains: 17,000 Predicates; 400,000 Concepts; 5,000,000 Assertions. Represented in: First Order Logic; Higher Order Logic; Modal Logic; Context Logic; Micro-theories. Diagram labels, from general to specific: Thing Intangible Thing Individual Temporal Thing Spatial Thing Partially Tangible Thing Paths Sets Relations Logic Math Human Artifacts Social Relations, Culture Human Anatomy & Physiology Emotion Perception Belief Human Behavior & Actions Products Devices Conceptual Works Vehicles Buildings Weapons Mechanical & Electrical Devices Software Literature Works of Art Language Agent Organizations Organizational Actions Organizational Plans Types of Organizations Human Organizations Nations Governments Geo-Politics Business, Military Organizations Law Business & Commerce Politics Warfare Professions Occupations Purchasing Shopping Travel Communication Transportation & Logistics Social Activities Everyday Living Sports Recreation Entertainment Artifacts Movement State Change Dynamics Materials Parts Statics Physical Agents Borders Geometry Events Scripts Spatial Paths Actors Actions Plans Goals Time Agents Space Physical Objects Human Beings Organization Human Activities Living Things Social Behavior Life Forms Animals Plants Ecology Natural Geography Earth & Solar System Political Geography Weather; General Knowledge about Various Domains; Specific data, facts, and observations
  41. 41. Machine Learning Techniques. Flow: Create Examples, Model Trainer, Good enough? (yes: Deploy; no: iterate). Example of a learned rule: “Let the occurrence of the term ‘is host to’ between a location and a substance increase the probability that this is a location x substance relation by 10%, because we have seen it more often in positive than in negative examples.”
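The train, evaluate, deploy loop on slide 41 can be sketched with a very small counting model. This is an illustration only: the labeled examples, the cue list, add-one smoothing, and the 0.75 "good enough" bar are all assumptions, and the slide does not specify the actual trainer.

```python
def train(examples, cues):
    """For each cue, estimate P(relation | cue present) with add-one smoothing."""
    model = {}
    for cue in cues:
        pos = sum(1 for text, label in examples if cue in text and label)
        neg = sum(1 for text, label in examples if cue in text and not label)
        model[cue] = (pos + 1) / (pos + neg + 2)
    return model

def predict(model, text, prior=0.5):
    """Return the most decisive cue estimate found in the text, else the prior."""
    hits = [p for cue, p in model.items() if cue in text]
    return max(hits, key=lambda p: abs(p - 0.5)) if hits else prior

def accuracy(model, examples):
    """Share of examples where thresholding at 0.5 reproduces the label."""
    correct = sum((predict(model, text) > 0.5) == label for text, label in examples)
    return correct / len(examples)

# Hypothetical labeled examples: the text found between a location and a
# substance, and whether a human marked the pair as a true relation.
examples = [
    ("fold belt which is host to", True),
    ("district is host to", True),
    ("is host to", False),
    ("imports", False),
    ("imports refined", False),
]

model = train(examples, cues=["is host to", "imports"])
good_enough = accuracy(model, examples) >= 0.75  # hypothetical acceptance bar
print(model, good_enough)
```

With this toy data the cue "is host to" was seen in two positive and one negative example, so its estimate rises from the 0.5 prior to 0.6, which mirrors the kind of adjustment the slide describes. Evaluating on the training examples keeps the sketch short; a held-out set would be the usual choice.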
  42. 42. Example: The Semantic Associative Search Method (MMM: The Mathematical Model of Meaning). A, B, C: image data vectors (retrieval candidate image data). A: a sunny image; B: a silent image; C: a shady image. Semantic space: 2,000-dimensional space (presently). In the full semantic space: $\|A\|_2 = \|B\|_2 = \|C\|_2$. After semantic projection onto the semantic subspace for the impression words (as a context) “light, bright”: $\|A\|_w > \|C\|_w > \|B\|_w$. After semantic projection for the impression words (as a context) “dark, black”: $\|C\|_w > \|B\|_w > \|A\|_w$. USP: 6,138,116. Yasushi Kiyoki, 2009
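The ranking behavior on slide 42 can be mimicked with a toy projection: vectors of roughly equal length in the full space rank differently once only the axes tied to the context words are kept. Everything here is an assumption for illustration: the five named axes, the vector values, and the shortcut of treating context words as axis names. The published method selects the subspace in a more involved way.

```python
from math import sqrt

# Hypothetical 5-axis semantic space (the slide mentions about 2,000 dimensions).
AXES = ["light", "bright", "dark", "black", "quiet"]

# Hypothetical image vectors, one weight per axis, each of roughly unit length.
IMAGES = {
    "A": [0.8, 0.6, 0.0, 0.0, 0.0],   # a sunny image
    "B": [0.1, 0.1, 0.3, 0.2, 0.92],  # a silent image
    "C": [0.3, 0.1, 0.8, 0.5, 0.1],   # a shady image
}

def subspace_norm(vector, context_words):
    """Norm of the vector restricted to the axes named by the context words."""
    selected = [AXES.index(word) for word in context_words]
    return sqrt(sum(vector[i] ** 2 for i in selected))

def rank(images, context_words):
    """Order image names by decreasing norm in the context subspace."""
    return sorted(images,
                  key=lambda name: subspace_norm(images[name], context_words),
                  reverse=True)

print(rank(IMAGES, ["light", "bright"]))
print(rank(IMAGES, ["dark", "black"]))
```

With these values the "light, bright" context ranks A, C, B and the "dark, black" context ranks C, B, A, the same orderings as the inequalities on the slide.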
  43. 43. Context Matters: Information Overload; Relevancy Overload; What’s important to me right now
  44. 44. Audience-specific search experiences. Context types: User context, Information context, Application context, Social context. Example users: Renee Lo, Engineering, Contoso Consulting: “What should I know about implementing ERP?”; Alan Brewer, Sales Manager, Contoso Consulting: “What should I know about selling ERP consulting?” Context signals: Username & Group Memberships; Location; Languages; Business Unit; Department; Team; Time of Day; Preferred Sites; SharePoint Audiences; Interests & Current Projects; Context of Current Task
  46. 46. Time is Money
  47. 47. Data = Metadata; Content = Connections
  48. 48. Image by Richard Cyganiak and Anja Jentzsch
  49. 49. Smart Content needs Gardeners
  50. 50. From Documents to Knowledge (axis: Value)
      • Document Search: Finds documents containing terms
      • Relationship Extraction: Finds relationships within documents
      • Assertion Clustering: Finds assertions and the evidence for them
      • Profiling: Summarizes different kinds of information
      • Join: Creates indirect correlations and connections
  51. 51. Knowledge Management Framework (based on Organizing Knowledge: Taxonomies, Knowledge and Organizational Effectiveness, Patrick Lambe; not exact reproduction). Diagram labels: Social, Individual, History, Event (transaction), Mapping, Content management, Standardization, Findability, Common ground (practices, values, belief), Typologies, Sense-making, Category busting, Discovery, Coordination, Culture, Collaboration, Expertise and learning, Information, Communities of Practice
  52. 52. Summary
      Content Analytics involves
      • Gardeners
      • Context
      • Virtuous Cycles
      • Lots of cool, imperfect technology
      Smart Content is
      • Social
      • Mobile
      • Streaming
      • Exploding
  53. 53. Q&A