Social media mining: HICSS 46, part 2
HICSS 46 Tutorial on Social Media Mining and Analysis - Part 2 -- Delivered on 1/7/13


Published in: Technology


Transcript

  • 1. Mining and Analyzing Social Media: Part 2. Dave King, January 7, 2013
  • 2. Agenda: Part 2
      • Sentiment Analysis & Opinion Mining
          – Defined
          – Business Interest & Software Packages
          – Levels of Analysis
          – Automated Classification
      • Social Network Analysis
          – Defined
          – History
          – Basic techniques and measures
          – Ego and Social-Centric Analysis
    Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 3. Sentiment Analysis and Opinion Mining: Interchangeable Terms
      Computational study of opinions, sentiments, subjectivity, evaluations, attitudes, appraisals, affects, views, emotions, etc., expressed in text. (Liu, 2012)
  • 4. Sentiment Analysis and Opinion Mining: Business Interests
      (Diagram labels, in extraction order: Service Products Marketing Response Issues and Focus Message Company)
  • 5. Opinion Mining and Sentiment Analysis: Some Sample Questions of Interest
      • Is the sentiment towards my X primarily positive, neutral, or negative? How does it compare to my key competitors? Has it changed over time?
      • What factors are positively and negatively influencing my X’s image?
      • Are there opportunities and needs my customers are identifying for me through their conversations?
  • 6. Opinion Mining and Sentiment Analysis: An Offshoot of Social CRM
      Social CRM:
      • Social media services, techniques, and technology for engaging customers
      • Sometimes synonymous with Social Media Monitoring
      • Gartner’s Magic Quadrant has a minimum of $10M in revenue
  • 7. Opinion Mining and Sentiment Analysis: An Offshoot of Social CRM
  • 8. Opinion Mining and Sentiment Analysis: An Offshoot of Social CRM
  • 9. Sentiment Analysis and Opinion Mining: Commercial Products – General Operation
  • 10. Sentiment Analysis and Opinion Mining: Some Examples – What do you see?
  • 11. Sentiment Analysis and Opinion Mining: What do you see?
      • Opinion holders: persons who hold the opinions
      • Opinion targets: entities and their features/aspects
      • Sentiments: positive and negative
      • Time: when opinions are expressed
  • 12. Sentiment Analysis and Opinion Mining: Opinion Defined
      Simply a positive or negative sentiment, view, attitude, emotion, or appraisal about an entity or an aspect of the entity from an opinion holder at a particular point in time.
      • Opinion is a quintuple (e_j, a_jk, so_ijkl, h_i, t_l)
      • Sentiment orientation (“so”): +, -, or possibly neutral
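The opinion quintuple above maps naturally onto a small record type. The following is a minimal sketch, not something from the slides: the class name `Opinion`, its field names, and the sample values are all hypothetical, and the orientation is encoded as +1, -1, or 0 as an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple (e_j, a_jk, so_ijkl, h_i, t_l)."""
    entity: str       # e_j: the thing being evaluated
    aspect: str       # a_jk: the feature/aspect of the entity
    orientation: int  # so_ijkl: +1 positive, -1 negative, 0 neutral (assumed encoding)
    holder: str       # h_i: who holds the opinion
    time: date        # t_l: when the opinion was expressed

    def __post_init__(self):
        # Reject orientations outside the assumed +1 / -1 / 0 encoding.
        if self.orientation not in (-1, 0, 1):
            raise ValueError("orientation must be -1, 0, or +1")


# Hypothetical example values, for illustration only.
op = Opinion("PhoneX", "battery life", -1, "reviewer_42", date(2013, 1, 7))
```

Document-level classification, as described on the slides that follow, amounts to filling in only the `orientation` field and treating the rest as given.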
  • 13. Sentiment Analysis and Opinion Mining: Level of Analysis
      • Document Level: +, -, or 0*
      • Sentence Level: +, -, or 0*
      • Entity and Feature/Aspect Level: +, -, or 0*
      0* ~ possibly neutral
  • 14. Sentiment Analysis and Opinion Mining: Example – Document Sentiment Classification
      • Basically a Text Classification Problem
      • Assumptions
          – Each document written by single person
          – About single entity
          – Goal: discover (_,_,so,_,_)
  • 15. Sentiment Analysis and Opinion Mining: Example – Document Sentiment Classification
      (Flow diagram: Collection of Text Docs; Doc-Term Matrix*; Automated Classification ???, Unsupervised or Supervised; Small Set of Predetermined Sentiment Categories, + and -)
  • 16. Sentiment Analysis and Opinion Mining: Example – Document Sentiment Classification
      (Process diagram; recoverable labels, in extraction order: Real-World; Reviews with known sentiment; Training Process; Reviews with known Classification; Document Consolidation; Train, Test, Validate; Establish the Corpus; Classification Algorithm??; Corpus Refinement (Token, Stem, Stop…); Feature Selection & Weighting; - and +; Doc-Term Matrix)
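The corpus refinement and doc-term matrix steps named on this slide can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the stop-word list, the two sample documents, and the function names `tokenize` and `doc_term_matrix` are hypothetical, stemming is omitted, and raw term counts stand in for whatever feature weighting the original pipeline used.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not from the slides).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "it", "this", "to", "of"}


def tokenize(text):
    """Lowercase, split on non-letters, and drop stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]


def doc_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in doc i."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


# Hypothetical two-document corpus.
docs = ["The service was great", "Poor service and a poor product"]
vocab, matrix = doc_term_matrix(docs)
for row in matrix:
    print(dict(zip(vocab, row)))
```

Each row of `matrix` is the feature vector that a supervised classifier would be trained on.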
  • 17. Sentiment Analysis and Opinion Mining: Example – Document Sentiment Classification
      Supervised Classification Algorithms:
      • Naïve Bayes
      • Support Vector Machine
      • Decision Trees
      • Nearest Neighbor (k-NN)
      • Neural Nets (e.g. SOM)
      • …
  • 18. Sentiment Analysis: Doing Simple Sentiment Analysis (Thomas Bayes)
      P(H|D) = P(D|H) * P(H) / P(D)
      • H is the hypothesis and D is the data.
      • P(H) is the prior probability of H: the probability that H is correct before the data D are seen.
      • P(D|H) is the conditional probability of seeing the data D given that the hypothesis H is true. This conditional probability is called the likelihood.
      • P(D) is the marginal probability of D.
      • P(H|D) is the posterior probability: the probability that the hypothesis is true, given the data and the previous state of belief about the hypothesis.
  • 19. Sentiment Analysis: Doing Simple Sentiment Analysis
      Training set: P(Positive | Tweet) compared to P(Negative | Tweet). The denominator P(W1) is the same for both classes, so the worked values drop it and are unnormalized scores.
      • P(Pos | W1) = P(Pos) * P(W1|Pos) / P(W1)
      • P(Pos | fail) ∝ P(Pos) * P(fail|Pos) = (2/5) * (1/2) = .2
      • P(Neg | W1) = P(Neg) * P(W1|Neg) / P(W1)
      • P(Neg | fail) ∝ P(Neg) * P(fail|Neg) = (3/5) * (2/3) = .4
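The two scores on this slide, (2/5) * (1/2) and (3/5) * (2/3), leave out the shared denominator of Bayes' rule. Dividing each score by their sum recovers proper posterior probabilities. A minimal sketch, where the helper name `normalize` is hypothetical:

```python
def normalize(scores):
    """Turn unnormalized class scores (prior * likelihood) into posteriors."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all scores are zero; posteriors are undefined")
    return {label: s / total for label, s in scores.items()}


# Prior * likelihood for the word "fail", using the slide's numbers.
scores = {"Pos": (2 / 5) * (1 / 2), "Neg": (3 / 5) * (2 / 3)}
posteriors = normalize(scores)
print(posteriors)
```

Normalizing does not change which class wins, which is why the comparison on the slide can skip it.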
  • 20. Sentiment Analysis: Doing Simple Sentiment Analysis
      Training set: P(Positive | Tweet) compared to P(Negative | Tweet). As before, the shared denominator is dropped, so the values are unnormalized scores.
      • P(Pos | Words) ∝ P(Pos) * P(W1|Pos) * P(W2|Pos) * ...
      • P(Pos | poor & fail) ∝ P(Pos) * P(poor|Pos) * P(fail|Pos) = .4 * 0 * .5 = 0
      • P(Neg | Words) ∝ P(Neg) * P(W1|Neg) * P(W2|Neg) * ...
      • P(Neg | poor & fail) ∝ P(Neg) * P(poor|Neg) * P(fail|Neg) = .6 * .67 * .67 = .27
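The multi-word calculation above can be reproduced with a very small naive Bayes scorer. The slide's training set was an image and is not in the transcript, so the five tweets below are a hypothetical stand-in chosen only to match the slide's probabilities (2 positive and 3 negative tweets; "fail" in 1 of 2 positive and 2 of 3 negative; "poor" in 0 of 2 positive and 2 of 3 negative). The sketch also assumes P(word | class) is the fraction of that class's tweets containing the word, and applies no smoothing.

```python
from collections import defaultdict

# Hypothetical training tweets, constructed to match the slide's probabilities.
TRAINING = [
    ("great game", "Pos"),
    ("great effort despite the fail", "Pos"),
    ("poor show what a fail", "Neg"),
    ("poor service", "Neg"),
    ("epic fail", "Neg"),
]


def train(examples):
    """Estimate class priors and per-class word presence probabilities."""
    class_docs = defaultdict(int)
    word_docs = defaultdict(lambda: defaultdict(int))
    for text, label in examples:
        class_docs[label] += 1
        for word in set(text.lower().split()):  # count each word once per tweet
            word_docs[label][word] += 1
    total = len(examples)
    priors = {c: n / total for c, n in class_docs.items()}
    likelihoods = {
        c: {w: k / class_docs[c] for w, k in words.items()}
        for c, words in word_docs.items()
    }
    return priors, likelihoods


def score(words, priors, likelihoods):
    """Unnormalized naive Bayes score: prior times product of word likelihoods."""
    scores = {}
    for label, prior in priors.items():
        s = prior
        for w in words:
            s *= likelihoods[label].get(w, 0.0)  # unseen word -> probability 0
        scores[label] = s
    return scores


priors, likelihoods = train(TRAINING)
scores = score(["poor", "fail"], priors, likelihoods)
print(scores, "->", max(scores, key=scores.get))
```

Note how a single unseen word ("poor" among the positive tweets) drives the whole positive score to zero. Add-one (Laplace) smoothing is the usual remedy, though the slide's arithmetic does not use it.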
  • 21. Sentiment Analysis: Doing Simple Sentiment Analysis
      How do you know if your model works? Depends on your goal. Confusion matrix measures:
      • Accuracy = (TP + TN) / N
      • Recall = TP / (TP + FN)
      • Precision = TP / (TP + FP)
      • Error = (FP + FN) / N
      • F1 = 2 * Recall * Precision / (Recall + Precision)
      where N = TP + FP + FN + TN
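The confusion-matrix measures listed on this slide translate directly into code. A minimal sketch: the function name `classification_metrics` and the example counts are hypothetical, and the function does not guard against empty classes (a zero denominator raises `ZeroDivisionError`).

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the slide's measures from the four confusion-matrix counts."""
    n = tp + fp + fn + tn
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / n,
        "error": (fp + fn) / n,
        "recall": recall,
        "precision": precision,
        "f1": 2 * recall * precision / (recall + precision),
    }


# Hypothetical counts for a positive/negative sentiment classifier.
m = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Which measure matters depends on the goal: recall if missing a negative review is costly, precision if false alarms are.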
  • 22. Sentiment Analysis: Summary
      • From one type to the next (classification, features, comparisons), it becomes more complex to extract the information.
      • Once extracted, standard text mining techniques can be used to classify and compare the opinions.
      • Simple techniques (like naïve Bayesian) often produce strong results (e.g. 80+% accuracy).
  • 23. Sentiment Analysis: Comparing Techniques
      “Baselines and Bigrams: Simple, Good Sentiment and Topic Classification” (Wang and Manning)
  • 24. What do they have in common?
  • 25. Here’s a hint
  • 26. …and the answer is their Erdos-Bacon Number (figure values as extracted: 3+3, 5+1, 4+2; 4.65 + 2.957)
  • 27. Suppose I started with this. What would you have guessed?
  • 28. Six Degrees of Separation
      • Frigyes Karinthy (1929)
      • Stanley Milgram (1967)
      • John Guare (1990)
      • Duncan Watts (1998)
  • 29. Six Degrees of Separation
      “A fascinating game grew out of this discussion. One of us suggested performing the following experiment to prove that the population of the Earth is closer together now than they have ever been before. We should select any person from the 1.5 billion inhabitants of the Earth—anyone, anywhere at all. He bet us that, using no more than five individuals, one of whom is a personal acquaintance, he could contact the selected individual using nothing except the network of personal acquaintances.” (Frigyes Karinthy, Chains, 1929)
      (Diagram: chain A, 1, 2, 3, 4, 5)
      Degrees of separation ~ average path length ~ distance
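Since the slide equates degrees of separation with distance and average path length, both can be computed with a breadth-first search. The sketch below assumes an unweighted, undirected, connected network stored as an adjacency dictionary. The function names and the six-node chain (mirroring the slide's A, 1, 2, 3, 4, 5 diagram) are illustrative assumptions.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distance (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def average_path_length(adj):
    """Mean distance over all ordered pairs of distinct, mutually reachable nodes."""
    total = pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


# Hypothetical chain of acquaintances: A - 1 - 2 - 3 - 4 - 5.
chain = ["A", "1", "2", "3", "4", "5"]
adj = {v: [] for v in chain}
for a, b in zip(chain, chain[1:]):
    adj[a].append(b)
    adj[b].append(a)

print(bfs_distances(adj, "A")["5"])  # distance from A to the end of the chain
print(average_path_length(adj))
```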
  • 30. Social Network Analysis: Definitions
      • Network – Collection of things and their relationships to one another.
      • Social Network – Collection of humans, roles, groups, and/or institutions and their relationships with one another.
      • Social Network Analysis (SNA) – Application of Graph Theory or Network Science to the study of social relationships and connections.
  • 31. Social Network Analysis: Main Purpose
      Detecting and interpreting patterns of social ties among actors. A pattern is meaningful if it expresses:
      • Choices by social actors
      • Impact of the social system on actors’ behaviors and attitudes
  • 32. Social Network Analysis: Brief Highlights
  • 33. Social Network Analysis: Early Efforts
  • 34. Social Network Analysis: Visualization/Analysis Libraries (© 1973)
  • 35. Social Network Analysis: Growing Interest
      “Ten years ago, the field of Social Network Analysis was a scientific backwater. We were the misfits, rejected from both mainstream sociology and mainstream computer science… The advent of the Social Internet changed everything.”
  • 36. Social Network Analysis: Growing Interest
  • 37. Social Network Analysis: Growing Interest
      “… the availability of massive amounts of data in an online setting has given a new impetus towards a scientifically and statistically robust study of the field of social networks”
  • 38. Social Network Analysis: Growing Interest
      • Dining Table Partners: N=26
      • World Trade: N=80
      • US Political Blogs: N~1400
      • Russian LiveJournal: N~3.5K
      • Egyptian Revolution: N~90K
      • Mobile Phones: N=20M
      • Facebook Friends: N=721M
  • 39. Social Network Analysis: Types of Structural Analysis
      • Social Influence Analysis
      • Expert Discovery
      • Node Classification
      • Link Prediction
      • Community, Subgroup & Clique Detection in Social Networks
      • Evolution in dynamic Social Networks
      • Statistical Analysis and Comparison – Small Worlds, Weak Ties, and Random Models
      • Visualization
  • 40. Social Network Analysis: Introduction
  • 41. Social Network Analysis: Key Elements
      • Graph or Network: The set of vertices/nodes, edges/links and the relationship/function connecting them, written [V, E, f].
      • Vertices or Nodes: The “things”
      • Edges or Links: The “relationships”
      (Diagram: graph with nodes A, B, C, D, with one Edge (Link) and one Vertex (Node) labeled)
  • 42. Social Network Analysis: Alternative Representations
      (Figure panels: Graph; Edge List; Adjacency Matrix)
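The edge list and adjacency matrix named on this slide are two encodings of the same graph, and converting between them is mechanical. A minimal sketch for an undirected, unweighted graph follows. The four-node example and the function names are hypothetical, since the slide's actual figure is not in the transcript.

```python
def adjacency_matrix(nodes, edges):
    """Build a symmetric 0/1 matrix from an undirected edge list."""
    index = {v: i for i, v in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1  # undirected: mirror the entry
    return matrix


def edge_list(nodes, matrix):
    """Recover the undirected edge list from the upper triangle of the matrix."""
    n = len(nodes)
    return [
        (nodes[i], nodes[j])
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j]
    ]


# Hypothetical graph on the node labels A, B, C, D.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
matrix = adjacency_matrix(nodes, edges)
for label, row in zip(nodes, matrix):
    print(label, row)
```

Edge lists are compact for sparse networks, while the matrix form makes measures such as eigenvector centrality easier to express.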
  • 43. Social Network Analysis: Types of Edges or Links
      • Undirected, Unweighted: Facebook Friends
      • Directed, Unweighted: Twitter Followers
      • Undirected, Weighted: Facebook Friends
      • Directed, Weighted: Email Network
      (Each panel shows nodes A, B, C; edge weights appearing in the weighted panels: 100, 60, 5, 70, 20, 10)
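The four edge types on this slide differ only in whether an edge is mirrored and whether it carries a number. One way to hold all of them is a dictionary of dictionaries, sketched below. The helper `add_edge` is hypothetical, and the weights 60 and 5 are borrowed from the slide's figure without knowing which edges they belonged to.

```python
def add_edge(graph, source, target, weight=1, directed=False):
    """Store an edge as graph[source][target] = weight; mirror it if undirected."""
    graph.setdefault(source, {})[target] = weight
    graph.setdefault(target, {})
    if not directed:
        graph[target][source] = weight


# Undirected, unweighted: a Facebook-style friendship is mutual.
friends = {}
add_edge(friends, "A", "B")

# Directed, unweighted: A follows B on Twitter, but B need not follow A.
followers = {}
add_edge(followers, "A", "B", directed=True)

# Directed, weighted: e.g. number of emails sent in each direction
# (illustrative assignment of weights).
email = {}
add_edge(email, "A", "B", weight=60, directed=True)
add_edge(email, "B", "A", weight=5, directed=True)

print(friends, followers, email, sep="\n")
```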
  • 44. Social Network Analysis: Alternative Representations
      (Figure panels: Graph; Edge List; Adjacency Matrix)
  • 45. Social Network Analysis: Types of Networks & Approaches
      • Ego-Centered Approach (“Ego-Network”): Vertex (Ego), Neighbors (Alters) & all lines among the Neighbors
      • Socio-centered Approach (“Whole” Network)
      (Diagrams: nodes P1 through P6, with the Ego and its Alters marked in the ego-network panel)
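Following the definition on this slide (the ego, its alters, and all lines among them), an ego network can be cut out of a whole network in one step. This is a sketch under assumptions: the whole network below is a hypothetical arrangement of the labels P1 through P6, not the slide's actual figure, and `ego_network` is an illustrative name.

```python
def ego_network(adj, ego):
    """Keep the ego, its direct neighbors (alters), and every tie among them."""
    members = {ego} | set(adj[ego])
    return {
        v: sorted(w for w in adj[v] if w in members)
        for v in sorted(members)
    }


# Hypothetical whole network (undirected, stored as adjacency lists).
whole = {
    "P1": ["P2", "P3", "P4"],
    "P2": ["P1", "P3"],
    "P3": ["P1", "P2", "P5"],
    "P4": ["P1", "P6"],
    "P5": ["P3", "P6"],
    "P6": ["P4", "P5"],
}

ego_net = ego_network(whole, "P1")
print(ego_net)
```

In this example P5 and P6 drop out because they are two steps from the ego, while the P2-P3 tie stays because both ends are alters.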
  • 46. Social Network Analysis: Centrality – Who is key?
      • Degree
          – Definition: Number of edges or links. In-degree: links in. Out-degree: links out.
          – Interpretation: How connected is a node? How many people can this person reach directly?
          – Reasoning: Higher probability of receiving and transmitting information flows in the network. Nodes are considered to have influence over a larger number of nodes and/or are capable of communicating quickly with the nodes in their neighborhood.
      • Betweenness
          – Definition: Number of times a node or vertex lies on the shortest path between 2 nodes, divided by the number of all the shortest paths.
          – Interpretation: How important is a node in terms of connecting other nodes? How likely is this person to be the most direct route between two people in the network?
          – Reasoning: Degree to which a node controls the flow of information in the network. Those with high betweenness function as brokers. Useful where a network is vulnerable.
      • Closeness
          – Definition: 1 over the average distance between a node and every other node in the network.
          – Interpretation: How easily can a node reach other nodes? How fast can this person reach everyone in the network?
          – Reasoning: Measure of reach. Importance is based on how close a node is located with respect to every other node in the network. Nodes able to reach, or be reached by, most other nodes in the network through geodesic paths.
      • Eigenvector
          – Definition: Proportional to the sum of the eigenvector centralities of all the nodes directly connected to it.
          – Interpretation: How important, central, or influential are a node’s neighbors? How well is this person connected to other well-connected people?
          – Reasoning: Evaluates a player’s popularity. Identifies centers of large cliques. A node with more connections to higher scoring nodes is more important.
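Three of the four centrality measures in the table above can be computed with nothing more than breadth-first search. The sketch below assumes an undirected, unweighted, connected graph. The five-node example is hypothetical, betweenness is left unnormalized (a count of node pairs, weighted by the share of their shortest paths through the node) and uses Brandes' accumulation scheme, and eigenvector centrality is omitted for brevity.

```python
from collections import deque


def degree(adj):
    """Number of links attached to each node."""
    return {v: len(neighbors) for v, neighbors in adj.items()}


def closeness(adj):
    """1 over the average shortest-path distance to every other node."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        result[s] = (len(adj) - 1) / sum(dist.values())
    return result


def betweenness(adj):
    """Sum over node pairs of the fraction of shortest paths through each node."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):      # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each undirected pair counted twice


# Hypothetical network: B is a hub, D bridges to E.
adj = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B", "E"], "E": ["D"]}
print(degree(adj))
print(closeness(adj))
print(betweenness(adj))
```

In this example B ranks first on all three measures, while D has only moderate degree but notable betweenness because every path to E runs through it.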
  • 47. Social Network Analysis: Centrality – Who is most important?
      (Figure: network of nodes A through S, with callouts labeled Eigen, Betw, Close, and Deg)
  • 48. Social Network Analysis: Cohesion – Overall structure
  • 49. Social Network Analysis: Cohesion – How well connected?
      (Figure: network of nodes A through S)
  • 50. Social Network Analysis: Ego Centered – Simple Example
      http://apps.facebook.com/touchgraph/
  • 51. Social Network Analysis: Ego Centered – Another Example
  • 52. Social Network Analysis: Ego-Centered – Simple Example
  • 53. Social Network Analysis: Ego Analysis – Simple Example
      Netvizz7.0 .gdf (GUESS) file format
  • 54. Social Network Analysis: Some Analytical Alternatives
      Visual/Analytical Packages: GUESS, Gephi, NodeXL, NetViz, Pajek, SocVNet, UCINet/NetDraw, Visone
  • 55. Social Network Analysis: Some Analytical Alternatives
      Visual/Analytical Libraries/Modules:
      • igraph (R, Python, C): Creating and manipulating graphs
      • libSNA (Python): Open-source library for social network analysis (2008 last update)
      • NetworkX (Python): Package for complex networks
      • SNA (R): Social Network Analysis tools
  • 56. Social Network Analysis: Some Analytical Alternatives
  • 57. Social Network Analysis: Ego Analysis – Simple Example
  • 58. Social Network Analysis: Ego Analysis – Simple Example
  • 59. Social Network Analysis: Ego Analysis – Simple Example
      N = 67, L = 235
  • 60. Social Network Analysis: Ego Analysis – Simple Example
      • Figure annotations, as extracted: Skewed Furthest Distance Power Law? Males=76% Strong Weak Within .11 Clusters Overall
      • Possible links: PL = N(N-1)/2 = 2211
      • Ego density = L/PL = 235/2211 = .11
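The density arithmetic on this slide (PL = N(N-1)/2 and density = L/PL) can be checked with two short functions. The function names are illustrative, and the formula assumes an undirected network without self-loops.

```python
def possible_links(n):
    """Maximum number of undirected links among n nodes: n(n-1)/2."""
    return n * (n - 1) // 2


def density(n, links):
    """Share of possible links that are actually present."""
    return links / possible_links(n)


# The ego network from the slides: N = 67 nodes, L = 235 links.
print(possible_links(67))
print(round(density(67, 235), 2))
```

A density of about .11 means roughly one in nine possible friend-to-friend ties exists.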
  • 61. Social Network Analysis: Large Scale Networks
      “The emergence of online social networking services over the past decade has revolutionized how social scientists study the structure of human … previously invisible social structures are being captured at tremendous scale and with unprecedented detail.”
      • Accessed within 28 days of May ’11
      • At least one friend
      • Over 13 years of age
  • 62. Social Network Analysis: Large Scale Networks
      (Figure annotations: 14% for 100; Assortativity; P(F|F) = .52; P(F|M) = .51)
  • 63. Social Network Analysis: Is it a Small World after all?
      • Averages shown: 4.7 and 4.3
      • World: 92%, 99.6%
      • US: 96%, 99.7%
  • 64. Social Network Analysis: Second Example
      • Single day snapshot of a Snowball Sample of Political Blogs (N=1490)
      • Manually assigned as Liberal or Conservative
      • Focus on Blogrolls and front page citations
      • Primary question: Cyber-balkanization?
  • 65. Social Network Analysis: Balkanization (regular kind)
      Process of fragmentation or division of a region or state into smaller regions or states that are often hostile or non-cooperative with each other.
  • 66. Social Network Analysis: Cyber-balkanization?
      Proliferation of specialized online news sources allows people with different political leanings to be exposed only to information in agreement with their previously held views.
  • 67. Social Network Analysis: Cyber-balkanization?
      • Whole network: N=1490, Edges = 16715
      • Liberals: N=758, Edges = 7301
      • Conservatives: N=732, Edges = 7839
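One simple way to quantify cyber-balkanization is the share of links that stay inside a political camp. The sketch below shows the computation on a hypothetical toy edge list, then applies the same arithmetic to the slide's edge counts. The second step assumes that the 7301 and 7839 edges are the links internal to the liberal and conservative groups and that every remaining edge crosses between the two. The transcript does not state this explicitly, so treat the result as illustrative.

```python
def within_group_share(edges, group):
    """Fraction of edges whose two endpoints carry the same group label."""
    within = sum(1 for a, b in edges if group[a] == group[b])
    return within / len(edges)


# Hypothetical toy network: two liberal and two conservative blogs.
group = {"a": "L", "b": "L", "c": "C", "d": "C"}
edges = [("a", "b"), ("c", "d"), ("b", "c"), ("d", "c")]
print(within_group_share(edges, group))

# Same idea applied to the slide's counts (assumed to partition the edges).
total_edges = 16715
within_liberal = 7301
within_conservative = 7839
cross_edges = total_edges - within_liberal - within_conservative
within_share = (within_liberal + within_conservative) / total_edges
print(cross_edges, round(within_share, 3))
```

Under that assumption only about one link in ten crosses the political divide.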
  • 68. Social Network Analysis: Political Blogs – Cyberbalkanization?
  • 69. Social Networks: Political Blogs - Metrics
  • 70. Social Network Analysis: Statistical Network Models
      A Statistical Network Model:
      • Assumes that part of the structure of an observed network is random
      • Is a mathematical description of a collection of possible networks and a probability distribution on this set
      • Informs us about which network characteristics to expect if lines are assigned to pairs of vertices at random
  • 71. Social Network Analysis: Statistical Network Models
      • Classic Bernoulli
      • Conditional Uniform
      • Small-World
      • Preferential Attachment
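The classic Bernoulli model is the easiest of these to simulate: every pair of vertices receives a line independently with the same probability p. A minimal sketch, where the function name `bernoulli_graph` and the parameter values are illustrative assumptions:

```python
import random
from itertools import combinations


def bernoulli_graph(n, p, seed=None):
    """Return the edge list of a random graph on nodes 0..n-1.

    Each of the n(n-1)/2 possible undirected lines is included
    independently with probability p.
    """
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


# Illustrative parameters: 67 nodes with p set to the observed density of .11,
# so the expected number of lines is p * 2211, roughly 243.
edges = bernoulli_graph(67, 0.11, seed=46)
print(len(edges))
```

Comparing an observed network's clustering or path lengths against many such random graphs shows which of its features would be expected by chance alone.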
  • 72. Social Network Analysis: Political Blogs - Comparisons