SlideShare a Scribd company logo
1 of 86
Mining and Analyzing
Social Media: Part 1
Dave King
HICSS 47 - January 2014
Agenda: Part 1
• Introduction
–
–
–
–

Bio
Some definitions
Growing Interest
Resources

• Social Homophily as an Example
– Meaning and Import
– Political Cyberbalkanization
• Social Network Analysis
• Text Mining

2
Agenda: Part 2

• Introduction to Social Network
Analysis Metrics
– Degrees of Separation
– Hooray for Bollywood

• Standard Measures
– Centrality
– Cohesion

• Levels of Social Network Analysis
– Facebook & Egocentric Analysis
– Searching for Cohesive Subgroups

3
Dave King
Biography

University
Professor

„70
App Stats
Math Modeling

Dir R&D
Execucom

„80

SVP & CTO
Comshare

„90

DSS
Optimization
App Stats
Nat Lang for
DB Query
DSS Interface
HICSS (83)

EVP R&D
JDA Software

„00
DSS
Multi-Dim DB
EIS & BI

„10

Merch Planning
SCP Planning
Optimization

ECom
HICSS
Track Intern
& Dig Econ

Sentiment
Analysis

HICSS
Tutorial
Web Mining &
Analysis

4
Social Media Defined

5
Social Media
View from the Pinwheel
Social Media
View from the Word Clouds

7
Defined

is the media we use to
be social. That’s it.
8
Mining and Analyzing Social Media
Two Approaches

Content

Connections
9
Social Media
Content
Emails

Tweets

SMS

Tags

IM Chat
Photos

Bookmarks
FAQs

Video

Discussion

Music

Presentations

Conversations

Ratings

Wiki Entries

Reviews

Lifestreams

Articles
Recommendations
Profiles

News Stories

RSS
Feeds

Location
Blogs
Social Media
Social Connections
Family

Friends

Groups

Followers

Senders

Contacts

Recipients

Authors

Participants

Readers

Answerers
Collaborators

Subscribers

Collaborators
11
Social Media
Connections

People

Objects

Places
Types of Mining and Analysis

13
Mining and Analyzing Social Media
The Content

Data Mining
Discovering meaningful patterns
from large data sets using
pattern recognition technologies

Web Mining
Data Mining focused on the
analysis of Web Usage, Structure
& Content

Text Mining
Using natural language processing
& data mining to discover patterns
in a collection of ―documents
Mining and Analyzing Social Media
The Connections

Social Network Analysis
[SNA]
The mapping and measuring
of relationships and flows
between people, groups,
organizations, computers,
URLs, and other connected
information/knowledge
entities.

Network science
Study of the theoretical
foundations of network
structure and behavior and
the application of networks
to many subfields including
SNA, collaboration networks,
emergent systems, and
physical and life science
systems

15
Interest
You are not alone or maybe you are?

16
Interest
Search - Google Trends
Web Mining
Text

Social Network Analysis

Web Mining

Network Science

Data Mining

Social Media Mining

Big Data

Sentiment Analysis

Data Analytics

Mathematical Statistics

17
Interest
Book Titles – Google Ngram Viewer
Web Mining

Social Network Analysis

Text Mining

Network Science

Data Mining

Social Media**

Big Data

Sentiment Analysis

Data Analytics

Mathematical Statistics

18
Resources
Customers who spent a lot of money
on these items also…
19
Web & Text Mining Resources
General Coverage and Programming

20
Text Mining & Social Network Analysis
Computational Models and Programming

21
Social Network Analysis
General Coverage of SNA
Social Network Analysis
Mathematical Models and SNA Tools

UCI

Pajek

UCI

UCI

ORA

UCI,Pajek
& ORA

NodeXL
Social Network Analysis
Online Tutorials and Classes
An Example
Social Homophily from Two Perspectives

25
Social Homophily
aka Assortative Mixing

Sound bites
Love of the same
Birds of a feather flock together
Likes associate with likes
Likes copy likes
Likes think alike

26
Social Homophily
Some things that are homogenous

27
Social Network Analysis
Key Elements
Vertices or Nodes
The “things”

Graph
[ V,E, f ]
A

Edges or Links
The “relationships”
Graph or Network
The set of
vertices/nodes,
edges/links and the
relationship/function
connecting them.

B
Edge
(Link)

C
Vertex
(Node)

D
Social Network Analysis
Types of Vertices
• Social Entities
– People or social structures such as
workgroups, teams, organizations,
institutions, states, or even countries.

• Content

A

– Web pages, keyword tags, or videos.

What does a
Vertex
represent?

• Locations

B

– Physical or virtual locations or events.

• Primary building blocks of social
media
– Friends in social networking sites,
posts or authors in blogs, or pages in
wikis.

C
D

29
Social Network Analysis
Types of Relations
• Similarities
– Location (same spatial and temporal space)
– Participation (same club, same event, …)
– Attributes (Age, gender, same attitudes, …)

A

• Relational Roles

What does
an Edge
represent?

– Kinship (mother of, sibling of, …)
– Other Roles (friend of, boss of, …)

B

• Relational Cognition
– Affective (Likes, Hates…)
– Perceptual (Knows, Knows of, …)

• Relational Events

C
D

– Interactions (Sold to, talked to, helped, …)
– Flows (Information, beliefs, money, …)
30
Social Network Analysis
Types of Edges or Links
Undirected,
Unweighted

A

Facebook
Friends

Directed,
Unweighted

B

A

B

Twitter
Followers

C
Undirected,
Weighted

A
Width of
lines

C
Directed, Weighted

Collaboration
Network

100

B

A
5
10

C

60
Email 70
Network

B
20

C
31
Social Network Analysis
Alternative Representations
Edge List

Graph

Adjacency Matrix

A
B

C
D

Adjacency List

XML

32
Social Network Analysis
Multimodal networks
Bipartite or Bimodal Network
Linking individuals to events
(participation, membership, topic, tag,
post, …)
Examples:
Authors‐to‐papers (they authored)
Actors‐to‐Movies (they appeared in)
Users‐to‐Movies (they rated)
―Folded‖ networks:
Author collaboration networks
Movie co‐rating networks

E1

E2

E3

E4

I1

I2

I3

I4

I1

2
2

I2

I3

I4
M * MT

E1
1

1

1

1

E2
1

2
E3

2

E4

MT * M
33
Social Network Analysis
One of the founding fathers

34
Social Homophily
Any guesses what this is and what it shows?

35
Social Homophily
Any guesses what this is and what it shows?

36
Social Homophily
Gender homophily in adolescent school group

Density or Distinction? The Roles of Data Structure and Group Detection Methods in Describing Adolescent Peer Groups
Gest, S. et al. Journal of Social Structure: Vol. 6, http://www.cmu.edu/joss/content/articles/volume8/GestMoody/
37
Social Networks
Role of Attributes
• Add insights to visualizations
and analysis
• May include:
– Demographic characteristics
(age, gender, race)
– Other characteristics (income,
education, location)
– Use of the system (number of
logins, blogs posted, edits
made)
– Network Metrics (centralization)

A
B

C

D

• Often designated by color,
shape, size and/or opacity of
the vertices

38
Social Homophily
Any guesses what this is and what it shows?

39
Social Homophily
The beauty of social networks

ted.com/talks/nicholas_christakis_the_hidden_influence_of_social_networks.html

40
Social Homophily
Regardless of cause the resulting pattern is

Clusters of connections
emerge throughout the social
space such that there is
dense intra-connectivity
within clusters
and sparse inter-connectivity
between clusters
41
Social Homophily
What produces a pattern of homophily?

Cause: Similarity breeds connection
Effect:
Connection breeds similarity
Spurious: Some other factor is @ work

42
Social Homophily
In the world of U.S. politics?

Discourse
Viewing
Voting
Purchasing
…

Antagonistic Polarization?
43
Social Homophily
Balkanization

Process of fragmentation or division of a region or state
into smaller regions or states that are often hostile or
non-cooperative with each other
44
Social Homophily
Balkanization of the US Congress (Data Sets)
senate.gov/
reference/
Index/Votes.htm

clerk.house.gov/
legislative/
legvotes.aspx

Before
Threshold
After
Threshold

govtrack.us/
developers
"Senate Voting Patterns," Morris, F.
analyticalvisions.blogspot.com/2006/04/senate-votingpatterns-part-2.html
45
Social Homophily
Balkanization of US Senate (1989-2013) - Votes

U.S. Senate More Divided
Than Ever Data Shows,
Swallow,E. ,Forbes, Oct. 17,
2013, forbes.com/sites/
ericaswallow/2013/11/17/sen
ate-voting-relationships-data/
Partisan voting patterns –
1989 to present
Interactive data visualization
static.davidchouinard.com/c
ongress/
46
Social Homophily
Balkanization of US Senate (1975-2012) - Votes

Portrait of Political Party Polarization, Moody J. and P. Mucha, April 2013, Network Science,
journals.cambridge.org/action/displayAbstract?fromPage=online&aid=8888768
47
Social Homophily
Are political books of “a feather” purchased together?
The Social Life of Books,Krebs V.,
orgnet.com/divided.html

Aug 2008

Oct 2008
Social Homophily
Cyberbalkanization of the Political Blogsphere
• Single day snapshot of a
Snowball Sample of
Political Blogs (N=1490)
• Manually assigned as
Liberal or Conservative
• Focus on blogrolls and
front page citations
• Are we witnessing an era
of ―Cyberbalkinization?‖

49
Social Homophily
An aside – what’s a (We)blog?
• Information or discussion site
– Focused on a particular subject area
– Discrete entries posted in reverse
chronological order
– Primarily text
– Trend toward multiple author entries

• Some stats?
– Wordpress ~ 60M blogs with 100M
pageviews/day
– Tumblr ~ 150M blogs with 90M posts/day
– Blogger ~ 46M unique visitors/month

50
Social Homophily
Snowball Sample
Political Blog Directories

eTalkingHead

BlogCatalog

Blogarama

CampaignLine

Individual
Political Blogrolls

51
Social Homophily
Social Network Packages

Pajek

UCI

UCI

UCI

UCI,Pajek
& ORA

ORA

NodeXL

52
Social Homophily
Social Network Analysis of Blogs with Pajek

53
Social Homophily
Social Network Analysis of Blogs with Pajek
.NET Network File
(Vertex descriptions followed
By Edge descriptions)

•
•
•

Colors and bold simply for illustration
Liberal Blogs (nos. 1-758)
Conservative Blogs (nos. 759 – 1490)

.CLU Partition File
(0-Lib; 1 – Cons
One entry per vertex)

*vertices 1490
1 100monkeystyping 0.0 0.0
2 12thharmonic 0.0 0.0
...
757 yoder 0.0 0.0
758 younglibs 0.0 0.0
...
759 84rules 0.0 0.0
760 a100wwe 0.0 0.0
...
1489 zeke01 0.0 0.0
1490 zeph1z 0.0 0.0
*edges
1 23 1.0
1 55 1.0
...
757 65 1.0
760 854 1.0
760 855 1.0
...
1489 1431 1.0
1490 802 1.0

*vertices 1490
0
0
0
…
0
1
1
1
…
1

54
Social Homophily
Social Network Analysis of Blogs with Pajek
F-R 2D Layout w/o labels
partitioned by pol. position

Circular Layout w. labels
Draw
Network

Fructerman-Reingold
2D Layout with labels

Draw
Partition

F-R 2D Layout w/o labels

55
Social Homophily
Cyberbalkanization of the Political Blogsphere?

Citation Network – link
implies that a political
blog has a URL on it’s
front page referencing
another blog

Proliferation of specialized online political news sources allows
people with different political leanings to be exposed only to
information in agreement with their previously held views?
56
Social Homophily
Cyberbalkanization of the Political Blogsphere?
N=1490
Edges = 16715

N=758
Edges = 7301

Liberals

N=732
Edges = 7839

Conservatives

57
Social Homophily
Cyberbalkanization of the Political Blogsphere
Blogrolls

Page Citation
90%
N = 1151
(82%)

N=758
Avg. Links=13.2

L

L

N=Top 20
2 MnthPosts = 12470
Citations =1398

67%
74%
N=1490

Pol
Blog

8%

10%

N = 312
(13%)

N = 247
(18%)

82%
84%

N=732
Avg. Links = 15.1

C
92%

C

N=Top 20
2 MnthPosts = 10414
Citations =2422

N = 2110
(87%)

58
Social Homophily
Polarization on Twitter
• Twitter communications
six weeks prior to 2010
mid-term elections (9/1411/1)
• Examined 250,000
―politically relevant‖
tweets (specific hashtags)

• Focused on two networks
– retweets and mentions
using ―streaming API‖

59
Social Homophily
An Aside – What is Twitter?
• Each tweet <= 140 characters
(avg. 10-15 words/message)
• Heavy presence of non-alpha
symbols, abbrevs, misspellings
and slang
• Tweets often include
– Retweets (original tweet
repeated)
– Mentions (user mentions
another user in a tweet)
– Hashtags ―#‖ (metadata
annotation denoting topic or
audience)
http://www.forbes.com/sites/marketshare/2013/01/23/
the-number-one-mistake-retail-brands-make-when-it-comes-to-twitter/
60
Social Homophily
Polarization on Twitter – Social Networks (in study)
Network 1:
N=23,766
A

B
retweets

Network 2:
A
mentions

N=10,142

B

Both representing potential pathways
for information to flow from A to B

61
Social Homophily
Polarization on Twitter – Social Networks

Gephi Representation
GraphicXML
Retweet Network

Source:
http://carl.cs.indiana.edu/data/
icwsm/icwsm_polarization.zip

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<!-- Created by igraph -->
<key id="cluster" for="node" attr.name="cluster" attr.type="string"/>
<key id="tags" for="edge" attr.name="tags" attr.type="string"/>
<key id="type" for="edge" attr.name="type" attr.type="string"/>
<key id="urls" for="edge" attr.name="urls" attr.type="double"/>
<key id="time" for="edge" attr.name="time" attr.type="string"/>
<graph id="G" edgedefault="directed">
<node id="n0">
<data key="cluster">right</data>
</node>
<node id="n2">
<data key="cluster">left</data>
</node>
...
<edge source="n12464" target="n7349">
<data key="tags">[&apos;#tcot&apos;, &apos;#tlot&apos;]</data>
<data key="type">retweet</data>
<data key="urls">1</data>
<data key="time">1286901355</data>
</edge>
...
</graph>
</graphml>

62
Social Homophily
Polarization on Twitter – Gephi Analysis

Forced
Directed
Layout (Force
Atlas 2)

Nodes colored
by Cluster
(Political
Ideology)

63
Social Homophily
Contributions of Twitter Study
•

Cluster analysis of networks derived
from corpus shows that the networks of
retweets exhibits clear segregation
while the mentioned network is
dominated by single cluster.

•

Manual classification of Twitter user by
political alignment, demonstrating that
the retweet network clusters
corresponding to political left and right.
They also show the mention network is
political.

•

Interpretation of the observed
community structures based on
injection of partisan content into
ideologically opposed hashtag
information streams

Retweets

Mentions

64
Social Homophily
Polarization – The other side of the coin

Oh, buy the way…

• In addition to citations, Adamic
and Glance also considered the
similarity in the textual content of
the blogs

• Identified 498 ―bigrams‖ across
the Top 40 blogs
• Computed cosine similarity
measure between TD-IDF scores
for all pairs of blogs

• Average similarity between
opposites was .10, among
conservatives was .54 and
liberals was .57.

65
Social Homophily
Mining Text Content
• ―Scholars have long recognized that
much of politics is expressed in
words.‖
• ―To understand what politics is about
we need to know what political actors
are saying and writing.‖

• ―…The assumption is that ideology
dominates the language used in the
text.‖
• Ergo, another way to study the
polarization of political blogs is to
compare what they say.

66
Social Homophily
Mining Text Content - Political Blogs Example
Conservative or Liberal?
Republican nightmare begins: Obamacare is 'a godsend' for people getting
coverage
U.S. President Barack Obama delivers remarks alongside Human Services
Secretary Kathleen Sebelius (R) and other Americans the White House says
will benefit from the opening of health insurance marketplaces under the
Affordable Care Act, in the Rose Garden.
The health care exchanges—federal and state—are now functioning and not
sucking up all the oxygen around the implementation of Obamacare. Finally,
the "good news" news stories are finally being told, like this one in The New
York Times.
…
That's the kind of story that makes Republicans quake in their shoes. For more
of them, go below the fold.

67
Social Homophily
Mining Text Content – Political Blogs Example
Conservative or Liberal?
The Colorado #obamacare exchange is a disaster.
I’m sorry, but there’s no other way to describe these enrollment numbers. Which are, by the way, from
a state exchange – meaning that the problems of healthcare.gov should theoretically be irrelevant to
the conversation:
Enrollment have grown to 15,074, marketplace officials announced Monday. Marketplace officials set
a goal of 136,000 people covered on exchanged-based plans by the end of 2014, but so far the
exchange has failed to reach even worst-case enrollment projections.
The Democratic-controlled Coloradan state government is blaming this mess on people supposedly
not having to replace their insurance now and bad press about the national exchange. Which is easier
than blaming the Democrats now running the state exchange, particularly since the head of the
exchange wanted a merit raise for all her work on it. It’s certainly easier than simply admitting that
they, maybe there wasn’t all that much of a burning need for Obamacare in the first place.
But surely that’s crazy talk.
Yes, I am sorry. Bad numbers = people not being insured next year. What I am not is responsible for
any of this.

68
Social Homophily
Mining Text Content – Political Blogs Example

Are they separated into 2 distinct clusters
based on their content?

Sample of 50 blog entries about
―Obamacare‖ from popular Liberal &
Conservative Political Blogs

69
Social Homophily
Mining Text Content vs. Data Mining
Real-World
Data

Data Consolidation
Business
Understanding

Data
Understanding

Data
Preparation

Data Cleaning

Deployment
Modeling

Evaluation

Cross-Industry Standard Process for Data Mining

Data Transformation

Data Reduction

Well-Formed
Data
70
70
Social Homophily
Mining Text Content vs. Data Mining
Some Basic DM Tasks
• Anomaly Detection
• Association Rule
Learning
• Clustering
• Classification
• Regression
• Summarization

71
Social Homophily
Mining Text Content vs. Data Mining

DM Data is usually
• Structured
• Transformed
• Well-formed

72
Social Homophily
Mining Text Content – The Process
Unstructured Data
• No specified format

• Variable length
• Variable spelling
• Punctuation and
non-alphanumeric
characters
• Contents are not
predefined and no
predefined set of
values
73
Social Homophily
Mining Text Content – The Process

NLP

The overarching goal is, essentially, to turn
text into data for analysis, via application
of natural language processing (NLP) and
analytical methods.
74
Social Homophily
Mining Text Content – The Process

NLP

―The most consequential, and shocking, step we will take
is to discard the order in which words occur in documents.
We will assume documents are a bag of words, where
order does not inform our analyses… If this assumption is
unpalatable, we can retain some word order by including
bigrams (word pairs) or trigrams..‖
75
Social Homophily
Mining Text Content – The Process
Real-World
Text Data

Business
Understanding

Document
Understanding

Document
Preparation
Deployment

Document
Consolidation

Establish the
Corpus

Documents
Modeling

Evaluation

CRISP-Like Processes

Corpus Refinement
(Token, Stem, Stop…)

Feature Selection
& Weighting

Doc-Term
Matrix*
76
Social Homophily
Mining Text Content – The Process
Common representation of tokens within and between documents
Tokenization

Normalize

Eliminate
Stop Words

Stemming

•

Tokenization —Parse the text to generate terms. Sophisticated analyzers can also
extract phrases from the text.

•

Normalize — Convert them to lowercase.

•

Eliminate stop words — Eliminate terms that appear very often (e.g. the, and, …).

•

Stemming — Convert the terms into their stemmed form—remove plurals and
different word forms (e.g. achieve, achieves, achieved – achiev) [note: word about
synonyms – WordNet Synset]

77
Social Homophily
Mining Text Content – The Process
Documents

Four score and seven years
ago our fathers brought forth
on this continent, a new
nation, conceived in Liberty,
and dedicated to the
proposition that all mean are
created equal.
Now we are engaged in a
great civil war, testing
whether that nation, or…

Token Sets

Feature
Extraction

nation – 5
civil – 1
war – 2
men – 2
died – 4
people – 5
liberty – 1
god – 1
…

Feature Extraction & Weighting

“Bag of Words, Terms
or Tokens”

Doc/Token Matrix:
Vectors of Words, Terms or Tokens by Doc

“Bag of Words” (BOW) or
Vector Space Model (VSM):
Words or Tokens are
attributes and documents
are examples
78
Social Homophily
Mining Text Content – Political Blog Example
Analysis
from
Python
NLTK
Program

For each
sentence

Tokens: ['The', 'Colorado', '#', 'obamacare', 'exchange', 'is', 'a', 'disaster', '.']
Remove Stopwords: ['Colorado', '#', 'obamacare', 'exchange', 'disaster', '.']
Lower case: ['colorado', '#', 'obamacare', 'exchange', 'disaster', '.']
Alpha only: ['colorado', 'obamacare', 'exchange', 'disaster']
…
Stems: ['colorado', 'obamacar', 'exchang', 'disast']
Vector of stems for all sentences in blog posting
{'one-year': 1, 'offici': 2, 'merit': 1, 'particularli': 1, 'coloradan': 1, 'explain': 2, 'theoret': 1, 'certainli': 1, 'easier': 2, 'mayb': 1,
'far': 1, 'nation': 1, 'disast': 1, 'govern': 1, 'press': 1, 'democrat': 1, 'bad': 2, 'mid-rang': 1, 'mean': 1, 'set': 1, 'replac': 1,
'respons': 1, 'fail': 1, 'even': 1, 'decre': 2, 'state': 3, 'estim': 1, 'democratic-control': 1, 'run': 1, 'obamacar': 2, 'burn': 1,
'blame': 2, 'let': 1, 'enrol': 6, 'worst-cas': 2, 'could': 1, 'unenforc': 1, 'admit': 1, 'place': 1, 'presid': 1, 'opinion': 1, 'first': 1,
'simpli': 1, 'colorado': 2, 'one-tim': 1, 'exchang': 5, 'number': 2, 'cancel': 1, 'announc': 1, 'supposedli': 1, 'describ': 1, 'would':
1, 'pictur': 1, 'hey': 1, 'next': 1, 'much': 1, 'way': 2, 'head': 1, 'offer': 1, 'peopl': 4, 'compani': 1, 'exchanged-bas': 1, 'loom': 1,
'whether': 1, 'work': 2, 'project': 2, 'sinc': 1, 'problem': 1, 'aw': 1, 'site': 1, 'crazi': 1, 'want': 1, 'need': 1, 'grown': 1, 'irrelev': 1,
'goal': 1, 'renew': 1, 'marketplac': 2, 'sure': 1, 'insur': 3, 'monday': 1, 'mess': 1, 'reach': 1, 'rais': 1, 'plan': 1, 'date': 1, 'end': 1,
'mid-novemb': 1, 'scenario': 1, 'cover': 1, 'talk': 1}
79
Social Homophily
Mining Text Content – Political Blog Example
DocStem
Matrix
(from
Python
NLTK
Program)

Stem Distribution from Obamacare Postings
50 Most Frequently Used out of 792 Stems by Political Ideology

80
Social Homophily
Mining Text Content – Political Blog Example
Stem Distribution from Obamacare Postings
For Stems whose proportions are between .1 and .7 by Political Ideology

Differences in Stem Frequencies by Political Ideology

81
Social Homophily
Mining Text Content – The Process
Transforming Frequencies and Weights
•
•
•
•

Binary Frequencies: tf =1 for tf>0; otherwise 0
Term Frequencies: tf(i,j)/Sum of tf(i,j) in Doc K
Log Frequencies: 1 + log(tf) for tf>0; otherwise 0
Normalized Frequencies: Divide each frequency by
SQRT of Sum of Squares of the frequencies within the
vector (column)
• Term Frequency–Inverse Document Frequency
– TF: Freq of term for given doc/ sum of words for given doc
– Inverse Document Frequency: log(N/(1+D)) where N is total
number of docs and D is number with term
– TF * IDF

82
Social Homophily
Mining Text Content – The Process

83
Social Homophily
Mining Text Content – Political Blog Example
Average
Cosine Similarities
Cons = .25
Lib = .27
Cons-Lib = .26

Hierarchical
Clustering
Based on TD-IDF
Of Stems from
Obamacare Postings
By Political Position
Social Homophily
Mining Text Content – Retweets and Mentions
Content Homogenity
•

An interesting question, therefore,
is whether it has any significance in
terms of the actual content of the
discussions involved.

•

To address this issue we associate
each user with a profile vector
containing all the hashtags in her
tweets, weighted by their
frequencies.

•

We can then compute the cosine
similarities between each pair of
user profiles, separately for users
in the same cluster and users in
different clusters.

85
Social Homophily
Mining Text Content – Retweets and Mentions

Cosine
Similarities

Average Similarities

86

More Related Content

What's hot

Searching The Social Web
Searching The Social WebSearching The Social Web
Searching The Social WebOfer Egozi
 
You're Hired: Examining Acceptance of Social Media Screening of Job Applicants
You're Hired: Examining Acceptance of Social Media Screening of Job ApplicantsYou're Hired: Examining Acceptance of Social Media Screening of Job Applicants
You're Hired: Examining Acceptance of Social Media Screening of Job ApplicantsToronto Metropolitan University
 
Social Network Analysis for Competitive Intelligence
Social Network Analysis for Competitive IntelligenceSocial Network Analysis for Competitive Intelligence
Social Network Analysis for Competitive IntelligenceAugust Jackson
 
Altmetrics: Listening & Giving Voice to Ideas with Social Media Data
Altmetrics: Listening & Giving Voice to Ideas with Social Media DataAltmetrics: Listening & Giving Voice to Ideas with Social Media Data
Altmetrics: Listening & Giving Voice to Ideas with Social Media DataToronto Metropolitan University
 
Understanding Public Sentiment: Conducting a Related-Tags Content Network E...
Understanding Public Sentiment: Conducting a Related-Tags Content Network E...Understanding Public Sentiment: Conducting a Related-Tags Content Network E...
Understanding Public Sentiment: Conducting a Related-Tags Content Network E...learjk
 
Recommending Communities at the Regional and City Level
Recommending Communities at the Regional and City LevelRecommending Communities at the Regional and City Level
Recommending Communities at the Regional and City LevelJohn Verostek
 
Social Network Analysis - an Introduction (minus the Maths)
Social Network Analysis - an Introduction (minus the Maths)Social Network Analysis - an Introduction (minus the Maths)
Social Network Analysis - an Introduction (minus the Maths)Katy Jordan
 
Data Privacy and Anonymization
Data Privacy and AnonymizationData Privacy and Anonymization
Data Privacy and AnonymizationJeffrey Wang
 
Social Networks and Social Capital
Social Networks and Social CapitalSocial Networks and Social Capital
Social Networks and Social CapitalGiorgos Cheliotis
 
Social Network Analysis & an Introduction to Tools
Social Network Analysis & an Introduction to ToolsSocial Network Analysis & an Introduction to Tools
Social Network Analysis & an Introduction to ToolsPatti Anklam
 
An Introduction to NodeXL for Social Scientists
An Introduction to NodeXL for Social ScientistsAn Introduction to NodeXL for Social Scientists
An Introduction to NodeXL for Social ScientistsDr Wasim Ahmed
 
Social Network Analysis in Two Parts
Social Network Analysis in Two PartsSocial Network Analysis in Two Parts
Social Network Analysis in Two PartsPatti Anklam
 
Data Journalism and the Remaking of Data Infrastructures
Data Journalism and the Remaking of Data InfrastructuresData Journalism and the Remaking of Data Infrastructures
Data Journalism and the Remaking of Data InfrastructuresLiliana Bounegru
 
Approaches of Data Analysis: Networks generated through Social Media
Approaches of Data Analysis: Networks generated through Social MediaApproaches of Data Analysis: Networks generated through Social Media
Approaches of Data Analysis: Networks generated through Social MediaJanna Joceli Omena
 
Towards a More Holistic Approach on Online Abuse and Antisemitism
Towards a More Holistic Approach on Online Abuse and AntisemitismTowards a More Holistic Approach on Online Abuse and Antisemitism
Towards a More Holistic Approach on Online Abuse and AntisemitismIIIT Hyderabad
 
Social Network Theory & Analysis
Social Network Theory & Analysis Social Network Theory & Analysis
Social Network Theory & Analysis Susan Chesley Fant
 
Homophily and influence in social networks
Homophily and influence in social networksHomophily and influence in social networks
Homophily and influence in social networksNicola Barbieri
 
Introduction to the Responsible Use of Social Media Monitoring and SOCMINT Tools
Introduction to the Responsible Use of Social Media Monitoring and SOCMINT ToolsIntroduction to the Responsible Use of Social Media Monitoring and SOCMINT Tools
Introduction to the Responsible Use of Social Media Monitoring and SOCMINT ToolsMike Kujawski
 

What's hot (20)

Searching The Social Web
Searching The Social WebSearching The Social Web
Searching The Social Web
 
You're Hired: Examining Acceptance of Social Media Screening of Job Applicants
You're Hired: Examining Acceptance of Social Media Screening of Job ApplicantsYou're Hired: Examining Acceptance of Social Media Screening of Job Applicants
You're Hired: Examining Acceptance of Social Media Screening of Job Applicants
 
Social Network Analysis for Competitive Intelligence
Social Network Analysis for Competitive IntelligenceSocial Network Analysis for Competitive Intelligence
Social Network Analysis for Competitive Intelligence
 
Altmetrics: Listening & Giving Voice to Ideas with Social Media Data
Altmetrics: Listening & Giving Voice to Ideas with Social Media DataAltmetrics: Listening & Giving Voice to Ideas with Social Media Data
Altmetrics: Listening & Giving Voice to Ideas with Social Media Data
 
Understanding Public Sentiment: Conducting a Related-Tags Content Network E...
Understanding Public Sentiment: Conducting a Related-Tags Content Network E...Understanding Public Sentiment: Conducting a Related-Tags Content Network E...
Understanding Public Sentiment: Conducting a Related-Tags Content Network E...
 
Recommending Communities at the Regional and City Level
Recommending Communities at the Regional and City LevelRecommending Communities at the Regional and City Level
Recommending Communities at the Regional and City Level
 
Unpacking Digital Methods
Unpacking Digital MethodsUnpacking Digital Methods
Unpacking Digital Methods
 
Social Network Analysis - an Introduction (minus the Maths)
Social Network Analysis - an Introduction (minus the Maths)Social Network Analysis - an Introduction (minus the Maths)
Social Network Analysis - an Introduction (minus the Maths)
 
Data Privacy and Anonymization
Data Privacy and AnonymizationData Privacy and Anonymization
Data Privacy and Anonymization
 
Social Networks and Social Capital
Social Networks and Social CapitalSocial Networks and Social Capital
Social Networks and Social Capital
 
Social Network Analysis & an Introduction to Tools
Social Network Analysis & an Introduction to ToolsSocial Network Analysis & an Introduction to Tools
Social Network Analysis & an Introduction to Tools
 
An Introduction to NodeXL for Social Scientists
An Introduction to NodeXL for Social ScientistsAn Introduction to NodeXL for Social Scientists
An Introduction to NodeXL for Social Scientists
 
Social Network Analysis in Two Parts
Social Network Analysis in Two PartsSocial Network Analysis in Two Parts
Social Network Analysis in Two Parts
 
Data Journalism and the Remaking of Data Infrastructures
Data Journalism and the Remaking of Data InfrastructuresData Journalism and the Remaking of Data Infrastructures
Data Journalism and the Remaking of Data Infrastructures
 
Approaches of Data Analysis: Networks generated through Social Media
Approaches of Data Analysis: Networks generated through Social MediaApproaches of Data Analysis: Networks generated through Social Media
Approaches of Data Analysis: Networks generated through Social Media
 
Towards a More Holistic Approach on Online Abuse and Antisemitism
Towards a More Holistic Approach on Online Abuse and AntisemitismTowards a More Holistic Approach on Online Abuse and Antisemitism
Towards a More Holistic Approach on Online Abuse and Antisemitism
 
Asymmetric polarization
Asymmetric polarizationAsymmetric polarization
Asymmetric polarization
 
Social Network Theory & Analysis
Social Network Theory & Analysis Social Network Theory & Analysis
Social Network Theory & Analysis
 
Homophily and influence in social networks
Homophily and influence in social networksHomophily and influence in social networks
Homophily and influence in social networks
 
Introduction to the Responsible Use of Social Media Monitoring and SOCMINT Tools
Introduction to the Responsible Use of Social Media Monitoring and SOCMINT ToolsIntroduction to the Responsible Use of Social Media Monitoring and SOCMINT Tools
Introduction to the Responsible Use of Social Media Monitoring and SOCMINT Tools
 

Viewers also liked

Social Media Mining: An Introduction
Social Media Mining: An IntroductionSocial Media Mining: An Introduction
Social Media Mining: An IntroductionAli Abbasi
 
Ghazal Badalti Tehzeeb
Ghazal Badalti TehzeebGhazal Badalti Tehzeeb
Ghazal Badalti TehzeebMool Chand
 
Intro To The Valuation Council Final
Intro To The Valuation Council FinalIntro To The Valuation Council Final
Intro To The Valuation Council FinalRICS Americas
 
стратегическая цель гос политики ( измен)
стратегическая цель гос политики ( измен)стратегическая цель гос политики ( измен)
стратегическая цель гос политики ( измен)farcrys
 
Summerhill
SummerhillSummerhill
Summerhillbertafv
 
20120608 tsigos.v trip_social_media
20120608 tsigos.v trip_social_media20120608 tsigos.v trip_social_media
20120608 tsigos.v trip_social_mediaDimitris Tsingos
 
Ccm Smim2l
Ccm Smim2lCcm Smim2l
Ccm Smim2lMIMCCMS
 
садовые цветы
садовые  цветысадовые  цветы
садовые цветыfarcrys
 
Quick Start: Rails
Quick Start: RailsQuick Start: Rails
Quick Start: RailsDavid Keener
 
Gp E Getting Started Din
Gp E Getting Started DinGp E Getting Started Din
Gp E Getting Started Dinguestbe3d76d
 
Building Community for Insight and WOM
Building Community for Insight and WOMBuilding Community for Insight and WOM
Building Community for Insight and WOMSheSpeaks Inc.
 
Summerhill
SummerhillSummerhill
Summerhillbertafv
 
Mondi virtuali, numeri e prospettive
Mondi virtuali, numeri e prospettiveMondi virtuali, numeri e prospettive
Mondi virtuali, numeri e prospettiveLuca Spoldi
 
What I learned from...
What I learned from...What I learned from...
What I learned from...laurenepfa
 

Viewers also liked (20)

Social Media Mining: An Introduction
Social Media Mining: An IntroductionSocial Media Mining: An Introduction
Social Media Mining: An Introduction
 
Fisokuhle Profile
Fisokuhle ProfileFisokuhle Profile
Fisokuhle Profile
 
Ghazal Badalti Tehzeeb
Ghazal Badalti TehzeebGhazal Badalti Tehzeeb
Ghazal Badalti Tehzeeb
 
Intro To The Valuation Council Final
Intro To The Valuation Council FinalIntro To The Valuation Council Final
Intro To The Valuation Council Final
 
POSOBIE
POSOBIEPOSOBIE
POSOBIE
 
стратегическая цель гос политики ( измен)
стратегическая цель гос политики ( измен)стратегическая цель гос политики ( измен)
стратегическая цель гос политики ( измен)
 
Summerhill
SummerhillSummerhill
Summerhill
 
20120608 tsigos.v trip_social_media
20120608 tsigos.v trip_social_media20120608 tsigos.v trip_social_media
20120608 tsigos.v trip_social_media
 
Stricklin Mini Portfolio
Stricklin Mini PortfolioStricklin Mini Portfolio
Stricklin Mini Portfolio
 
Ccm Smim2l
Ccm Smim2lCcm Smim2l
Ccm Smim2l
 
садовые цветы
садовые  цветысадовые  цветы
садовые цветы
 
E Safety
E SafetyE Safety
E Safety
 
Quick Start: Rails
Quick Start: RailsQuick Start: Rails
Quick Start: Rails
 
Gp E Getting Started Din
Gp E Getting Started DinGp E Getting Started Din
Gp E Getting Started Din
 
Building Community for Insight and WOM
Building Community for Insight and WOMBuilding Community for Insight and WOM
Building Community for Insight and WOM
 
5 60 Kumarasiri VP and seedling tea
5 60 Kumarasiri VP and seedling tea5 60 Kumarasiri VP and seedling tea
5 60 Kumarasiri VP and seedling tea
 
Summerhill
SummerhillSummerhill
Summerhill
 
Mondi virtuali, numeri e prospettive
Mondi virtuali, numeri e prospettiveMondi virtuali, numeri e prospettive
Mondi virtuali, numeri e prospettive
 
Prueba 1
Prueba 1Prueba 1
Prueba 1
 
What I learned from...
What I learned from...What I learned from...
What I learned from...
 

Similar to Mining and analyzing social media part 1 - hicss47 tutorial - dave king

2010 Catalyst Conference - Trends in Social Network Analysis
2010 Catalyst Conference - Trends in Social Network Analysis2010 Catalyst Conference - Trends in Social Network Analysis
2010 Catalyst Conference - Trends in Social Network AnalysisMarc Smith
 
Seams2016 presentation calikli_et_al
Seams2016 presentation calikli_et_alSeams2016 presentation calikli_et_al
Seams2016 presentation calikli_et_alGul Calikli
 
SocialCom09-tutorial.pdf
SocialCom09-tutorial.pdfSocialCom09-tutorial.pdf
SocialCom09-tutorial.pdfBalasundaramSr
 
20151001 charles university prague - marc smith - node xl-picturing political...
20151001 charles university prague - marc smith - node xl-picturing political...20151001 charles university prague - marc smith - node xl-picturing political...
20151001 charles university prague - marc smith - node xl-picturing political...Marc Smith
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisFarida Vis
 
Social Information & Browsing March 6
Social Information & Browsing   March 6Social Information & Browsing   March 6
Social Information & Browsing March 6sritikumar
 
2015 #MMeasure-Marc Smith-NodeXL Mapping social media using social network ma...
2015 #MMeasure-Marc Smith-NodeXL Mapping social media using social network ma...2015 #MMeasure-Marc Smith-NodeXL Mapping social media using social network ma...
2015 #MMeasure-Marc Smith-NodeXL Mapping social media using social network ma...Marc Smith
 
Making More Sense Out of Social Data
Making More Sense Out of Social DataMaking More Sense Out of Social Data
Making More Sense Out of Social DataThe Open University
 
Tutorial on Relationship Mining In Online Social Networks
Tutorial on Relationship Mining In Online Social NetworksTutorial on Relationship Mining In Online Social Networks
Tutorial on Relationship Mining In Online Social Networkspjing2
 
Characterizing Data and Software for Social Science Research
Characterizing Data and Software for Social Science ResearchCharacterizing Data and Software for Social Science Research
Characterizing Data and Software for Social Science ResearchMicah Altman
 
Social media presentation
Social media presentationSocial media presentation
Social media presentationmhabibian
 
Social media presentation (1)
Social media presentation (1)Social media presentation (1)
Social media presentation (1)mhabibian
 
Social media presentation
Social media presentationSocial media presentation
Social media presentationBobbyS95
 
Social media presentation - fixed
Social media presentation - fixedSocial media presentation - fixed
Social media presentation - fixedBobbyS95
 
Social media presentation
Social media presentation Social media presentation
Social media presentation mhabibian
 
Social media presentation
Social media presentationSocial media presentation
Social media presentationJelani9
 
Social media presentation
Social media presentationSocial media presentation
Social media presentationdrewfay1
 

Similar to Mining and analyzing social media part 1 - hicss47 tutorial - dave king (20)

2010 Catalyst Conference - Trends in Social Network Analysis
2010 Catalyst Conference - Trends in Social Network Analysis2010 Catalyst Conference - Trends in Social Network Analysis
2010 Catalyst Conference - Trends in Social Network Analysis
 
Seams2016 presentation calikli_et_al
Seams2016 presentation calikli_et_alSeams2016 presentation calikli_et_al
Seams2016 presentation calikli_et_al
 
SocialCom09-tutorial.pdf
SocialCom09-tutorial.pdfSocialCom09-tutorial.pdf
SocialCom09-tutorial.pdf
 
20151001 charles university prague - marc smith - node xl-picturing political...
20151001 charles university prague - marc smith - node xl-picturing political...20151001 charles university prague - marc smith - node xl-picturing political...
20151001 charles university prague - marc smith - node xl-picturing political...
 
SSRI_pt1.ppt
SSRI_pt1.pptSSRI_pt1.ppt
SSRI_pt1.ppt
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media Analysis
 
Social Information & Browsing March 6
Social Information & Browsing   March 6Social Information & Browsing   March 6
Social Information & Browsing March 6
 
2015 #MMeasure-Marc Smith-NodeXL Mapping social media using social network ma...
2015 #MMeasure-Marc Smith-NodeXL Mapping social media using social network ma...2015 #MMeasure-Marc Smith-NodeXL Mapping social media using social network ma...
2015 #MMeasure-Marc Smith-NodeXL Mapping social media using social network ma...
 
Making More Sense Out of Social Data
Making More Sense Out of Social DataMaking More Sense Out of Social Data
Making More Sense Out of Social Data
 
Digital Methods by Richard Rogers
Digital Methods by Richard RogersDigital Methods by Richard Rogers
Digital Methods by Richard Rogers
 
Tutorial on Relationship Mining In Online Social Networks
Tutorial on Relationship Mining In Online Social NetworksTutorial on Relationship Mining In Online Social Networks
Tutorial on Relationship Mining In Online Social Networks
 
Characterizing Data and Software for Social Science Research
Characterizing Data and Software for Social Science ResearchCharacterizing Data and Software for Social Science Research
Characterizing Data and Software for Social Science Research
 
Social media presentation
Social media presentationSocial media presentation
Social media presentation
 
Social media presentation (1)
Social media presentation (1)Social media presentation (1)
Social media presentation (1)
 
Social media presentation
Social media presentationSocial media presentation
Social media presentation
 
Social media presentation - fixed
Social media presentation - fixedSocial media presentation - fixed
Social media presentation - fixed
 
Social media presentation
Social media presentation Social media presentation
Social media presentation
 
Social media presentation
Social media presentationSocial media presentation
Social media presentation
 
Social media presentation
Social media presentationSocial media presentation
Social media presentation
 
01 Network Data Collection (2017)
01 Network Data Collection (2017)01 Network Data Collection (2017)
01 Network Data Collection (2017)
 

More from Dave King

Mining and analyzing social media part 2 - hicss47 tutorial - dave king
Mining and analyzing social media   part 2 - hicss47 tutorial - dave kingMining and analyzing social media   part 2 - hicss47 tutorial - dave king
Mining and analyzing social media part 2 - hicss47 tutorial - dave kingDave King
 
Mining and analyzing social media facebook w gephi - hicss47 tutorial - dav...
Mining and analyzing social media   facebook w gephi - hicss47 tutorial - dav...Mining and analyzing social media   facebook w gephi - hicss47 tutorial - dav...
Mining and analyzing social media facebook w gephi - hicss47 tutorial - dav...Dave King
 
Mining and analyzing social media bollywood w pajek - hicss47 tutorial - da...
Mining and analyzing social media   bollywood w pajek - hicss47 tutorial - da...Mining and analyzing social media   bollywood w pajek - hicss47 tutorial - da...
Mining and analyzing social media bollywood w pajek - hicss47 tutorial - da...Dave King
 
Mining and analyzing social media sample network w ora - hicss47 tutorial -...
Mining and analyzing social media   sample network w ora - hicss47 tutorial -...Mining and analyzing social media   sample network w ora - hicss47 tutorial -...
Mining and analyzing social media sample network w ora - hicss47 tutorial -...Dave King
 
Social media mining hicss 46 part 2
Social media mining   hicss 46 part 2Social media mining   hicss 46 part 2
Social media mining hicss 46 part 2Dave King
 
Social media mining hicss 46 part 1
Social media mining   hicss 46 part 1Social media mining   hicss 46 part 1
Social media mining hicss 46 part 1Dave King
 
Mining and analyzing social media hicss 45 tutorial – part 2
Mining and analyzing social media hicss 45 tutorial – part 2Mining and analyzing social media hicss 45 tutorial – part 2
Mining and analyzing social media hicss 45 tutorial – part 2Dave King
 
Mining and analyzing social media hicss 45 tutorial – part 1
Mining and analyzing social media hicss 45 tutorial – part 1Mining and analyzing social media hicss 45 tutorial – part 1
Mining and analyzing social media hicss 45 tutorial – part 1Dave King
 
Text mining and analytics v6 - p1
Text mining and analytics   v6 - p1Text mining and analytics   v6 - p1
Text mining and analytics v6 - p1Dave King
 
Text mining and analytics v6 - p2
Text mining and analytics   v6 - p2Text mining and analytics   v6 - p2
Text mining and analytics v6 - p2Dave King
 
Digital Trails Dave King 1 5 10 Part 2 D3
Digital Trails   Dave King   1 5 10   Part 2   D3Digital Trails   Dave King   1 5 10   Part 2   D3
Digital Trails Dave King 1 5 10 Part 2 D3Dave King
 
Digital Trails Dave King 1 5 10 Part 1 D3
Digital Trails   Dave King   1 5 10   Part 1 D3Digital Trails   Dave King   1 5 10   Part 1 D3
Digital Trails Dave King 1 5 10 Part 1 D3Dave King
 

More from Dave King (12)

Mining and analyzing social media part 2 - hicss47 tutorial - dave king
Mining and analyzing social media   part 2 - hicss47 tutorial - dave kingMining and analyzing social media   part 2 - hicss47 tutorial - dave king
Mining and analyzing social media part 2 - hicss47 tutorial - dave king
 
Mining and analyzing social media facebook w gephi - hicss47 tutorial - dav...
Mining and analyzing social media   facebook w gephi - hicss47 tutorial - dav...Mining and analyzing social media   facebook w gephi - hicss47 tutorial - dav...
Mining and analyzing social media facebook w gephi - hicss47 tutorial - dav...
 
Mining and analyzing social media bollywood w pajek - hicss47 tutorial - da...
Mining and analyzing social media   bollywood w pajek - hicss47 tutorial - da...Mining and analyzing social media   bollywood w pajek - hicss47 tutorial - da...
Mining and analyzing social media bollywood w pajek - hicss47 tutorial - da...
 
Mining and analyzing social media sample network w ora - hicss47 tutorial -...
Mining and analyzing social media   sample network w ora - hicss47 tutorial -...Mining and analyzing social media   sample network w ora - hicss47 tutorial -...
Mining and analyzing social media sample network w ora - hicss47 tutorial -...
 
Social media mining hicss 46 part 2
Social media mining   hicss 46 part 2Social media mining   hicss 46 part 2
Social media mining hicss 46 part 2
 
Social media mining hicss 46 part 1
Social media mining   hicss 46 part 1Social media mining   hicss 46 part 1
Social media mining hicss 46 part 1
 
Mining and analyzing social media hicss 45 tutorial – part 2
Mining and analyzing social media hicss 45 tutorial – part 2Mining and analyzing social media hicss 45 tutorial – part 2
Mining and analyzing social media hicss 45 tutorial – part 2
 
Mining and analyzing social media hicss 45 tutorial – part 1
Mining and analyzing social media hicss 45 tutorial – part 1Mining and analyzing social media hicss 45 tutorial – part 1
Mining and analyzing social media hicss 45 tutorial – part 1
 
Text mining and analytics v6 - p1
Text mining and analytics   v6 - p1Text mining and analytics   v6 - p1
Text mining and analytics v6 - p1
 
Text mining and analytics v6 - p2
Text mining and analytics   v6 - p2Text mining and analytics   v6 - p2
Text mining and analytics v6 - p2
 
Digital Trails Dave King 1 5 10 Part 2 D3
Digital Trails   Dave King   1 5 10   Part 2   D3Digital Trails   Dave King   1 5 10   Part 2   D3
Digital Trails Dave King 1 5 10 Part 2 D3
 
Digital Trails Dave King 1 5 10 Part 1 D3
Digital Trails   Dave King   1 5 10   Part 1 D3Digital Trails   Dave King   1 5 10   Part 1 D3
Digital Trails Dave King 1 5 10 Part 1 D3
 

Recently uploaded

Call^ Girls Delhi Independent girls Chanakyapuri 9711199012 Call Me
Call^ Girls Delhi Independent girls Chanakyapuri 9711199012 Call MeCall^ Girls Delhi Independent girls Chanakyapuri 9711199012 Call Me
Call^ Girls Delhi Independent girls Chanakyapuri 9711199012 Call MeMs Riya
 
Website research Powerpoint for Bauer magazine
Website research Powerpoint for Bauer magazineWebsite research Powerpoint for Bauer magazine
Website research Powerpoint for Bauer magazinesamuelcoulson30
 
Night 7k Call Girls Noida Sector 120 Call Me: 8448380779
Night 7k Call Girls Noida Sector 120 Call Me: 8448380779Night 7k Call Girls Noida Sector 120 Call Me: 8448380779
Night 7k Call Girls Noida Sector 120 Call Me: 8448380779Delhi Call girls
 
CALL ON ➥8923113531 🔝Call Girls Ashiyana Colony Lucknow best sexual service O...
CALL ON ➥8923113531 🔝Call Girls Ashiyana Colony Lucknow best sexual service O...CALL ON ➥8923113531 🔝Call Girls Ashiyana Colony Lucknow best sexual service O...
CALL ON ➥8923113531 🔝Call Girls Ashiyana Colony Lucknow best sexual service O...anilsa9823
 
Call Girls In Noida Mall Of Noida O9654467111 Escorts Serviec
Call Girls In Noida Mall Of Noida O9654467111 Escorts ServiecCall Girls In Noida Mall Of Noida O9654467111 Escorts Serviec
Call Girls In Noida Mall Of Noida O9654467111 Escorts ServiecSapana Sha
 
Angela Killian | Operations Director | Dallas
Angela Killian | Operations Director | DallasAngela Killian | Operations Director | Dallas
Angela Killian | Operations Director | DallasAngela Killian
 
GREAT OPORTUNITY Russian Call Girls Kirti Nagar 9711199012 Independent Escort...
GREAT OPORTUNITY Russian Call Girls Kirti Nagar 9711199012 Independent Escort...GREAT OPORTUNITY Russian Call Girls Kirti Nagar 9711199012 Independent Escort...
GREAT OPORTUNITY Russian Call Girls Kirti Nagar 9711199012 Independent Escort...Mona Rathore
 
Add more information to your upload Tip: Better titles and descriptions lead ...
Add more information to your upload Tip: Better titles and descriptions lead ...Add more information to your upload Tip: Better titles and descriptions lead ...
Add more information to your upload Tip: Better titles and descriptions lead ...SejarahLokal
 
Impact Of Educational Resources on Students' Academic Performance in Economic...
Impact Of Educational Resources on Students' Academic Performance in Economic...Impact Of Educational Resources on Students' Academic Performance in Economic...
Impact Of Educational Resources on Students' Academic Performance in Economic...AJHSSR Journal
 
Unlock Your Social Media Potential with IndianLikes - IndianLikes.com
Unlock Your Social Media Potential with IndianLikes - IndianLikes.comUnlock Your Social Media Potential with IndianLikes - IndianLikes.com
Unlock Your Social Media Potential with IndianLikes - IndianLikes.comSagar Sinha
 
Call Girls In Patel Nagar Delhi 9654467111 Escorts Service
Call Girls In Patel Nagar Delhi 9654467111 Escorts ServiceCall Girls In Patel Nagar Delhi 9654467111 Escorts Service
Call Girls In Patel Nagar Delhi 9654467111 Escorts ServiceSapana Sha
 
Call Girls In Gurgaon Dlf pHACE 2 Women Delhi ncr
Call Girls In Gurgaon Dlf pHACE 2 Women Delhi ncrCall Girls In Gurgaon Dlf pHACE 2 Women Delhi ncr
Call Girls In Gurgaon Dlf pHACE 2 Women Delhi ncrSapana Sha
 
Top Call Girls In Telibagh ( Lucknow ) 🔝 8923113531 🔝 Cash Payment
Top Call Girls In Telibagh ( Lucknow  ) 🔝 8923113531 🔝  Cash PaymentTop Call Girls In Telibagh ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment
Top Call Girls In Telibagh ( Lucknow ) 🔝 8923113531 🔝 Cash Paymentanilsa9823
 
Night 7k Call Girls Noida New Ashok Nagar Escorts Call Me: 8448380779
Night 7k Call Girls Noida New Ashok Nagar Escorts Call Me: 8448380779Night 7k Call Girls Noida New Ashok Nagar Escorts Call Me: 8448380779
Night 7k Call Girls Noida New Ashok Nagar Escorts Call Me: 8448380779Delhi Call girls
 
Elite Class ➥8448380779▻ Call Girls In New Friends Colony Delhi NCR
Elite Class ➥8448380779▻ Call Girls In New Friends Colony Delhi NCRElite Class ➥8448380779▻ Call Girls In New Friends Colony Delhi NCR
Elite Class ➥8448380779▻ Call Girls In New Friends Colony Delhi NCRDelhi Call girls
 
Call Girls In South Ex. Delhi O9654467111 Women Seeking Men
Call Girls In South Ex. Delhi O9654467111 Women Seeking MenCall Girls In South Ex. Delhi O9654467111 Women Seeking Men
Call Girls In South Ex. Delhi O9654467111 Women Seeking MenSapana Sha
 
IMPACT OF FISCAL POLICY AND MONETARY POLICY ON THE ECONOMIC GROWTH OF NIGERIA...
IMPACT OF FISCAL POLICY AND MONETARY POLICY ON THE ECONOMIC GROWTH OF NIGERIA...IMPACT OF FISCAL POLICY AND MONETARY POLICY ON THE ECONOMIC GROWTH OF NIGERIA...
IMPACT OF FISCAL POLICY AND MONETARY POLICY ON THE ECONOMIC GROWTH OF NIGERIA...AJHSSR Journal
 
Call Girls In Andheri East Call 9167673311 Book Hot And Sexy Girls
Call Girls In Andheri East Call 9167673311 Book Hot And Sexy GirlsCall Girls In Andheri East Call 9167673311 Book Hot And Sexy Girls
Call Girls In Andheri East Call 9167673311 Book Hot And Sexy GirlsPooja Nehwal
 

Recently uploaded (20)

Call^ Girls Delhi Independent girls Chanakyapuri 9711199012 Call Me
Call^ Girls Delhi Independent girls Chanakyapuri 9711199012 Call MeCall^ Girls Delhi Independent girls Chanakyapuri 9711199012 Call Me
Call^ Girls Delhi Independent girls Chanakyapuri 9711199012 Call Me
 
Website research Powerpoint for Bauer magazine
Website research Powerpoint for Bauer magazineWebsite research Powerpoint for Bauer magazine
Website research Powerpoint for Bauer magazine
 
Night 7k Call Girls Noida Sector 120 Call Me: 8448380779
Night 7k Call Girls Noida Sector 120 Call Me: 8448380779Night 7k Call Girls Noida Sector 120 Call Me: 8448380779
Night 7k Call Girls Noida Sector 120 Call Me: 8448380779
 
CALL ON ➥8923113531 🔝Call Girls Ashiyana Colony Lucknow best sexual service O...
CALL ON ➥8923113531 🔝Call Girls Ashiyana Colony Lucknow best sexual service O...CALL ON ➥8923113531 🔝Call Girls Ashiyana Colony Lucknow best sexual service O...
CALL ON ➥8923113531 🔝Call Girls Ashiyana Colony Lucknow best sexual service O...
 
Call Girls In Noida Mall Of Noida O9654467111 Escorts Serviec
Call Girls In Noida Mall Of Noida O9654467111 Escorts ServiecCall Girls In Noida Mall Of Noida O9654467111 Escorts Serviec
Call Girls In Noida Mall Of Noida O9654467111 Escorts Serviec
 
Angela Killian | Operations Director | Dallas
Angela Killian | Operations Director | DallasAngela Killian | Operations Director | Dallas
Angela Killian | Operations Director | Dallas
 
GREAT OPORTUNITY Russian Call Girls Kirti Nagar 9711199012 Independent Escort...
GREAT OPORTUNITY Russian Call Girls Kirti Nagar 9711199012 Independent Escort...GREAT OPORTUNITY Russian Call Girls Kirti Nagar 9711199012 Independent Escort...
GREAT OPORTUNITY Russian Call Girls Kirti Nagar 9711199012 Independent Escort...
 
Add more information to your upload Tip: Better titles and descriptions lead ...
Add more information to your upload Tip: Better titles and descriptions lead ...Add more information to your upload Tip: Better titles and descriptions lead ...
Add more information to your upload Tip: Better titles and descriptions lead ...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Masudpur
Delhi  99530 vip 56974  Genuine Escort Service Call Girls in MasudpurDelhi  99530 vip 56974  Genuine Escort Service Call Girls in Masudpur
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Masudpur
 
Impact Of Educational Resources on Students' Academic Performance in Economic...
Impact Of Educational Resources on Students' Academic Performance in Economic...Impact Of Educational Resources on Students' Academic Performance in Economic...
Impact Of Educational Resources on Students' Academic Performance in Economic...
 
Unlock Your Social Media Potential with IndianLikes - IndianLikes.com
Unlock Your Social Media Potential with IndianLikes - IndianLikes.comUnlock Your Social Media Potential with IndianLikes - IndianLikes.com
Unlock Your Social Media Potential with IndianLikes - IndianLikes.com
 
Call Girls In Patel Nagar Delhi 9654467111 Escorts Service
Call Girls In Patel Nagar Delhi 9654467111 Escorts ServiceCall Girls In Patel Nagar Delhi 9654467111 Escorts Service
Call Girls In Patel Nagar Delhi 9654467111 Escorts Service
 
Call Girls In Gurgaon Dlf pHACE 2 Women Delhi ncr
Call Girls In Gurgaon Dlf pHACE 2 Women Delhi ncrCall Girls In Gurgaon Dlf pHACE 2 Women Delhi ncr
Call Girls In Gurgaon Dlf pHACE 2 Women Delhi ncr
 
Top Call Girls In Telibagh ( Lucknow ) 🔝 8923113531 🔝 Cash Payment
Top Call Girls In Telibagh ( Lucknow  ) 🔝 8923113531 🔝  Cash PaymentTop Call Girls In Telibagh ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment
Top Call Girls In Telibagh ( Lucknow ) 🔝 8923113531 🔝 Cash Payment
 
Russian Call Girls Rohini Sector 35 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Rohini Sector 35 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Rohini Sector 35 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Rohini Sector 35 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
Night 7k Call Girls Noida New Ashok Nagar Escorts Call Me: 8448380779
Night 7k Call Girls Noida New Ashok Nagar Escorts Call Me: 8448380779Night 7k Call Girls Noida New Ashok Nagar Escorts Call Me: 8448380779
Night 7k Call Girls Noida New Ashok Nagar Escorts Call Me: 8448380779
 
Elite Class ➥8448380779▻ Call Girls In New Friends Colony Delhi NCR
Elite Class ➥8448380779▻ Call Girls In New Friends Colony Delhi NCRElite Class ➥8448380779▻ Call Girls In New Friends Colony Delhi NCR
Elite Class ➥8448380779▻ Call Girls In New Friends Colony Delhi NCR
 
Call Girls In South Ex. Delhi O9654467111 Women Seeking Men
Call Girls In South Ex. Delhi O9654467111 Women Seeking MenCall Girls In South Ex. Delhi O9654467111 Women Seeking Men
Call Girls In South Ex. Delhi O9654467111 Women Seeking Men
 
IMPACT OF FISCAL POLICY AND MONETARY POLICY ON THE ECONOMIC GROWTH OF NIGERIA...
IMPACT OF FISCAL POLICY AND MONETARY POLICY ON THE ECONOMIC GROWTH OF NIGERIA...IMPACT OF FISCAL POLICY AND MONETARY POLICY ON THE ECONOMIC GROWTH OF NIGERIA...
IMPACT OF FISCAL POLICY AND MONETARY POLICY ON THE ECONOMIC GROWTH OF NIGERIA...
 
Call Girls In Andheri East Call 9167673311 Book Hot And Sexy Girls
Call Girls In Andheri East Call 9167673311 Book Hot And Sexy GirlsCall Girls In Andheri East Call 9167673311 Book Hot And Sexy Girls
Call Girls In Andheri East Call 9167673311 Book Hot And Sexy Girls
 

Mining and analyzing social media part 1 - hicss47 tutorial - dave king

  • 1. Mining and Analyzing Social Media: Part 1 Dave King HICSS 47 - January 2014
  • 2. Agenda: Part 1 • Introduction – – – – Bio Some definitions Growing Interest Resources • Social Homophily as an Example – Meaning and Import – Political Cyberbalkanization • Social Network Analysis • Text Mining 2
  • 3. Agenda: Part 2 • Introduction to Social Network Analysis Metrics – Degrees of Separation – Hooray for Bollywood • Standard Measures – Centrality – Cohesion • Levels of Social Network Analysis – Facebook & Egocentric Analysis – Searching for Cohesive Subgroups 3
  • 4. Dave King Biography University Professor „70 App Stats Math Modeling Dir R&D Execucom „80 SVP & CTO Comshare „90 DSS Optimization App Stats Nat Lang for DB Query DSS Interface HICSS (83) EVP R&D JDA Software „00 DSS Multi-Dim DB EIS & BI „10 Merch Planning SCP Planning Optimization ECom HICSS Track Intern & Dig Econ Sentiment Analysis HICSS Tutorial Web Mining & Analysis 4
  • 6. Social Media View from the Pinwheel
  • 7. Social Media View from the Word Clouds 7
  • 8. Defined is the media we use to be social. That’s it. 8
  • 9. Mining and Analyzing Social Media Two Approaches Content Connections 9
  • 10. Social Media Content Emails Tweets SMS Tags IM Chat Photos Bookmarks FAQs Video Discussion Music Presentations Conversations Ratings Wiki Entries Reviews Lifestreams Articles Recommendations Profiles News Stories RSS Feeds Location Blogs
  • 13. Types of Mining and Analysis 13
  • 14. Mining and Analyzing Social Media The Content Data Mining Discovering meaningful patterns from large data sets using pattern recognition technologies Web Mining Data Mining focused on the analysis of Web Usage, Structure & Content Text Mining Using natural language processing & data mining to discover patterns in a collection of ―documents
  • 15. Mining and Analyzing Social Media The Connections Social Network Analysis [SNA] The mapping and measuring of relationships and flows between people, groups, organizations, computers, URLs, and other connected information/knowledge entities. Network science Study of the theoretical foundations of network structure and behavior and the application of networks to many subfields including SNA, collaboration networks, emergent systems, and physical and life science systems 15
  • 16. Interest You are not alone or maybe you are? 16
  • 17. Interest Search - Google Trends Web Mining Text Social Network Analysis Web Mining Network Science Data Mining Social Media Mining Big Data Sentiment Analysis Data Analytics Mathematical Statistics 17
  • 18. Interest Book Titles – Google Ngram Viewer Web Mining Social Network Analysis Text Mining Network Science Data Mining Social Media** Big Data Sentiment Analysis Data Analytics Mathematical Statistics 18
  • 19. Resources Customers who spent a lot of money on these items also… 19
  • 20. Web & Text Mining Resources General Coverage and Programming 20
  • 21. Text Mining & Social Network Analysis Computational Models and Programming 21
  • 23. Social Network Analysis Mathematical Models and SNA Tools UCI Pajek UCI UCI ORA UCI,Pajek & ORA NodeXL
  • 24. Social Network Analysis Online Tutorials and Classes
  • 25. An Example Social Homophily from Two Perspectives 25
  • 26. Social Homophily aka Assortative Mixing Sound bites Love of the same Birds of a feather flock together Likes associate with likes Likes copy likes Likes think alike 26
  • 27. Social Homophily Some things that are homogenous 27
  • 28. Social Network Analysis Key Elements Vertices or Nodes The “things” Graph [ V,E, f ] A Edges or Links The “relationships” Graph or Network The set of vertices/nodes, edges/links and the relationship/function connecting them. B Edge (Link) C Vertex (Node) D
  • 29. Social Network Analysis Types of Vertices • Social Entities – People or social structures such as workgroups, teams, organizations, institutions, states, or even countries. • Content A – Web pages, keyword tags, or videos. What does a Vertex represent? • Locations B – Physical or virtual locations or events. • Primary building blocks of social media – Friends in social networking sites, posts or authors in blogs, or pages in wikis. C D 29
  • 30. Social Network Analysis Types of Relations • Similarities – Location (same spatial and temporal space) – Participation (same club, same event, …) – Attributes (Age, gender, same attitudes, …) A • Relational Roles What does an Edge represent? – Kinship (mother of, sibling of, …) – Other Roles (friend of, boss of, …) B • Relational Cognition – Affective (Likes, Hates…) – Perceptual (Knows, Knows of, …) • Relational Events C D – Interactions (Sold to, talked to, helped, …) – Flows (Information, beliefs, money, …) 30
  • 31. Social Network Analysis Types of Edges or Links Undirected, Unweighted A Facebook Friends Directed, Unweighted B A B Twitter Followers C Undirected, Weighted A Width of lines C Directed, Weighted Collaboration Network 100 B A 5 10 C 60 Email 70 Network B 20 C 31
  • 32. Social Network Analysis Alternative Representations Edge List Graph Adjacency Matrix A B C D Adjacency List XML 32
  • 33. Social Network Analysis Multimodal networks Bipartite or Bimodal Network Linking individuals to events (participation, membership, topic, tag, post, …) Examples: Authors‐to‐papers (they authored) Actors‐to‐Movies (they appeared in) Users‐to‐Movies (they rated) ―Folded‖ networks: Author collaboration networks Movie co‐rating networks E1 E2 E3 E4 I1 I2 I3 I4 I1 2 2 I2 I3 I4 M * MT E1 1 1 1 1 E2 1 2 E3 2 E4 MT * M 33
  • 34. Social Network Analysis One of the founding fathers 34
  • 35. Social Homophily Any guesses what this is and what it shows? 35
  • 36. Social Homophily Any guesses what this is and what it shows? 36
  • 37. Social Homophily Gender homophily in adolescent school group Density or Distinction? The Roles of Data Structure and Group Detection Methods in Describing Adolescent Peer Groups Gest, S. et al. Journal of Social Structure: Vol. 6, http://www.cmu.edu/joss/content/articles/volume8/GestMoody/ 37
  • 38. Social Networks Role of Attributes • Add insights to visualizations and analysis • May include: – Demographic characteristics (age, gender, race) – Other characteristics (income, education, location) – Use of the system (number of logins, blogs posted, edits made) – Network Metrics (centralization) A B C D • Often designated by color, shape, size and/or opacity of the vertices 38
  • 39. Social Homophily Any guesses what this is and what it shows? 39
  • 40. Social Homophily The beauty of social networks ted.com/talks/nicholas_christakis_the_hidden_influence_of_social_networks.html 40
  • 41. Social Homophily Regardless of cause the resulting pattern is Clusters of connections emerge throughout the social space such that there is dense intra-connectivity within clusters and sparse inter-connectivity between clusters 41
  • 42. Social Homophily What produces a pattern of homophily? Cause: Similarity breeds connection Effect: Connection breeds similarity Spurious: Some other factor is @ work 42
  • 43. Social Homophily In the world of U.S. politics? Discourse Viewing Voting Purchasing … Antagonistic Polarization? 43
  • 44. Social Homophily Balkanization Process of fragmentation or division of a region or state into smaller regions or states that are often hostile or non-cooperative with each other 44
  • 45. Social Homophily Balkanization of the US Congress (Data Sets) senate.gov/ reference/ Index/Votes.htm clerk.house.gov/ legislative/ legvotes.aspx Before Threshold After Threshold govtrack.us/ developers "Senate Voting Patterns," Morris, F. analyticalvisions.blogspot.com/2006/04/senate-votingpatterns-part-2.html 45
  • 46. Social Homophily Balkanization of US Senate (1989-2013) - Votes U.S. Senate More Divided Than Ever Data Shows, Swallow,E. ,Forbes, Oct. 17, 2013, forbes.com/sites/ ericaswallow/2013/11/17/sen ate-voting-relationships-data/ Partisan voting patterns – 1989 to present Interactive data visualization static.davidchouinard.com/c ongress/ 46
  • 47. Social Homophily Balkanization of US Senate (1975-2012) - Votes Portrait of Political Party Polarization, Moody J. and P. Mucha, April 2013, Network Science, journals.cambridge.org/action/displayAbstract?fromPage=online&aid=8888768 47
  • 48. Social Homophily Are political books of “a feather” purchased together? The Social Life of Books,Krebs V., orgnet.com/divided.html Aug 2008 Oct 2008
  • 49. Social Homophily Cyberbalkanization of the Political Blogsphere • Single day snapshot of a Snowball Sample of Political Blogs (N=1490) • Manually assigned as Liberal or Conservative • Focus on blogrolls and front page citations • Are we witnessing an era of ―Cyberbalkinization?‖ 49
  • 50. Social Homophily An aside – what’s a (We)blog? • Information or discussion site – Focused on a particular subject area – Discrete entries posted in reverse chronological order – Primarily text – Trend toward multiple author entries • Some stats? – Wordpress ~ 60M blogs with 100M pageviews/day – Tumblr ~ 150M blogs with 90M posts/day – Blogger ~ 46M unique visitors/month 50
  • 51. Social Homophily Snowball Sample Political Blog Directories eTalkingHead BlogCatalog Blogarama CampaignLine Individual Political Blogrolls 51
  • 52. Social Homophily Social Network Packages Pajek UCI UCI UCI UCI,Pajek & ORA ORA NodeXL 52
  • 53. Social Homophily Social Network Analysis of Blogs with Pajek 53
  • 54. Social Homophily Social Network Analysis of Blogs with Pajek .NET Network File (Vertex descriptions followed By Edge descriptions) • • • Colors and bold simply for illustration Liberal Blogs (nos. 1-758) Conservative Blogs (nos. 759 – 1490) .CLU Partition File (0-Lib; 1 – Cons One entry per vertex) *vertices 1490 1 100monkeystyping 0.0 0.0 2 12thharmonic 0.0 0.0 ... 757 yoder 0.0 0.0 758 younglibs 0.0 0.0 ... 759 84rules 0.0 0.0 760 a100wwe 0.0 0.0 ... 1489 zeke01 0.0 0.0 1490 zeph1z 0.0 0.0 *edges 1 23 1.0 1 55 1.0 ... 757 65 1.0 760 854 1.0 760 855 1.0 ... 1489 1431 1.0 1490 802 1.0 *vertices 1490 0 0 0 … 0 1 1 1 … 1 54
  • 55. Social Homophily Social Network Analysis of Blogs with Pajek F-R 2D Layout w/o labels partitioned by pol. position Circular Layout w. labels Draw Network Fructerman-Reingold 2D Layout with labels Draw Partition F-R 2D Layout w/o labels 55
  • 56. Social Homophily Cyberbalkanization of the Political Blogsphere? Citation Network – link implies that a political blog has a URL on it’s front page referencing another blog Proliferation of specialized online political news sources allows people with different political leanings to be exposed only to information in agreement with their previously held views? 56
  • 57. Social Homophily Cyberbalkanization of the Political Blogsphere? N=1490 Edges = 16715 N=758 Edges = 7301 Liberals N=732 Edges = 7839 Conservatives 57
  • 58. Social Homophily Cyberbalkanization of the Political Blogsphere Blogrolls Page Citation 90% N = 1151 (82%) N=758 Avg. Links=13.2 L L N=Top 20 2 MnthPosts = 12470 Citations =1398 67% 74% N=1490 Pol Blog 8% 10% N = 312 (13%) N = 247 (18%) 82% 84% N=732 Avg. Links = 15.1 C 92% C N=Top 20 2 MnthPosts = 10414 Citations =2422 N = 2110 (87%) 58
  • 59. Social Homophily Polarization on Twitter • Twitter communications six weeks prior to 2010 mid-term elections (9/1411/1) • Examined 250,000 ―politically relevant‖ tweets (specific hashtags) • Focused on two networks – retweets and mentions using ―streaming API‖ 59
  • 60. Social Homophily An Aside – What is Twitter? • Each tweet <= 140 characters (avg. 10-15 words/message) • Heavy presence of non-alpha symbols, abbrevs, misspellings and slang • Tweets often include – Retweets (original tweet repeated) – Mentions (user mentions another user in a tweet) – Hashtags ―#‖ (metadata annotation denoting topic or audience) http://www.forbes.com/sites/marketshare/2013/01/23/ the-number-one-mistake-retail-brands-make-when-it-comes-to-twitter/ 60
  • 61. Social Homophily Polarization on Twitter – Social Networks (in study) Network 1: N=23,766 A B retweets Network 2: A mentions N=10,142 B Both representing potential pathways for information to flow from A to B 61
  • 62. Social Homophily Polarization on Twitter – Social Networks Gephi Representation GraphicXML Retweet Network Source: http://carl.cs.indiana.edu/data/ icwsm/icwsm_polarization.zip <?xml version="1.0" encoding="UTF-8"?> <graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd"> <!-- Created by igraph --> <key id="cluster" for="node" attr.name="cluster" attr.type="string"/> <key id="tags" for="edge" attr.name="tags" attr.type="string"/> <key id="type" for="edge" attr.name="type" attr.type="string"/> <key id="urls" for="edge" attr.name="urls" attr.type="double"/> <key id="time" for="edge" attr.name="time" attr.type="string"/> <graph id="G" edgedefault="directed"> <node id="n0"> <data key="cluster">right</data> </node> <node id="n2"> <data key="cluster">left</data> </node> ... <edge source="n12464" target="n7349"> <data key="tags">[&apos;#tcot&apos;, &apos;#tlot&apos;]</data> <data key="type">retweet</data> <data key="urls">1</data> <data key="time">1286901355</data> </edge> ... </graph> </graphml> 62
  • 63. Social Homophily Polarization on Twitter – Gephi Analysis Forced Directed Layout (Force Atlas 2) Nodes colored by Cluster (Political Ideology) 63
  • 64. Social Homophily Contributions of Twitter Study • Cluster analysis of networks derived from corpus shows that the networks of retweets exhibits clear segregation while the mentioned network is dominated by single cluster. • Manual classification of Twitter user by political alignment, demonstrating that the retweet network clusters corresponding to political left and right. They also show the mention network is political. • Interpretation of the observed community structures based on injection of partisan content into ideologically opposed hashtag information streams Retweets Mentions 64
  • 65. Social Homophily Polarization – The other side of the coin Oh, buy the way… • In addition to citations, Adamic and Glance also considered the similarity in the textual content of the blogs • Identified 498 ―bigrams‖ across the Top 40 blogs • Computed cosine similarity measure between TD-IDF scores for all pairs of blogs • Average similarity between opposites was .10, among conservatives was .54 and liberals was .57. 65
  • 66. Social Homophily Mining Text Content • ―Scholars have long recognized that much of politics is expressed in words.‖ • ―To understand what politics is about we need to know what political actors are saying and writing.‖ • ―…The assumption is that ideology dominates the language used in the text.‖ • Ergo, another way to study the polarization of political blogs is to compare what they say. 66
  • 67. Social Homophily Mining Text Content - Political Blogs Example Conservative or Liberal? Republican nightmare begins: Obamacare is 'a godsend' for people getting coverage U.S. President Barack Obama delivers remarks alongside Human Services Secretary Kathleen Sebelius (R) and other Americans the White House says will benefit from the opening of health insurance marketplaces under the Affordable Care Act, in the Rose Garden. The health care exchanges—federal and state—are now functioning and not sucking up all the oxygen around the implementation of Obamacare. Finally, the "good news" news stories are finally being told, like this one in The New York Times. … That's the kind of story that makes Republicans quake in their shoes. For more of them, go below the fold. 67
  • 68. Social Homophily Mining Text Content – Political Blogs Example Conservative or Liberal? The Colorado #obamacare exchange is a disaster. I’m sorry, but there’s no other way to describe these enrollment numbers. Which are, by the way, from a state exchange – meaning that the problems of healthcare.gov should theoretically be irrelevant to the conversation: Enrollment have grown to 15,074, marketplace officials announced Monday. Marketplace officials set a goal of 136,000 people covered on exchanged-based plans by the end of 2014, but so far the exchange has failed to reach even worst-case enrollment projections. The Democratic-controlled Coloradan state government is blaming this mess on people supposedly not having to replace their insurance now and bad press about the national exchange. Which is easier than blaming the Democrats now running the state exchange, particularly since the head of the exchange wanted a merit raise for all her work on it. It’s certainly easier than simply admitting that they, maybe there wasn’t all that much of a burning need for Obamacare in the first place. But surely that’s crazy talk. Yes, I am sorry. Bad numbers = people not being insured next year. What I am not is responsible for any of this. 68
  • 69. Social Homophily Mining Text Content – Political Blogs Example Are they separated into 2 distinct clusters based on their content? Sample of 50 blog entries about ―Obamacare‖ from popular Liberal & Conservative Political Blogs 69
  • 70. Social Homophily Mining Text Content vs. Data Mining Real-World Data Data Consolidation Business Understanding Data Understanding Data Preparation Data Cleaning Deployment Modeling Evaluation Cross-Industry Standard Process for Data Mining Data Transformation Data Reduction Well-Formed Data 70 70
  • 71. Social Homophily Mining Text Content vs. Data Mining Some Basic DM Tasks • Anomaly Detection • Association Rule Learning • Clustering • Classification • Regression • Summarization 71
  • 72. Social Homophily Mining Text Content vs. Data Mining DM Data is usually • Structured • Transformed • Well-formed 72
  • 73. Social Homophily Mining Text Content – The Process Unstructured Data • No specified format • Variable length • Variable spelling • Punctuation and non-alphanumeric characters • Contents are not predefined and no predefined set of values 73
  • 74. Social Homophily Mining Text Content – The Process NLP The overarching goal is, essentially, to turn text into data for analysis, via application of natural language processing (NLP) and analytical methods. 74
  • 75. Social Homophily Mining Text Content – The Process NLP ―The most consequential, and shocking, step we will take is to discard the order in which words occur in documents. We will assume documents are a bag of words, where order does not inform our analyses… If this assumption is unpalatable, we can retain some word order by including bigrams (word pairs) or trigrams..‖ 75
  • 76. Social Homophily Mining Text Content – The Process Real-World Text Data Business Understanding Document Understanding Document Preparation Deployment Document Consolidation Establish the Corpus Documents Modeling Evaluation CRISP-Like Processes Corpus Refinement (Token, Stem, Stop…) Feature Selection & Weighting Doc-Term Matrix* 76
  • 77. Social Homophily Mining Text Content – The Process Common representation of tokens within and between documents Tokenization Normalize Eliminate Stop Words Stemming • Tokenization —Parse the text to generate terms. Sophisticated analyzers can also extract phrases from the text. • Normalize — Convert them to lowercase. • Eliminate stop words — Eliminate terms that appear very often (e.g. the, and, …). • Stemming — Convert the terms into their stemmed form—remove plurals and different word forms (e.g. achieve, achieves, achieved – achiev) [note: word about synonyms – WordNet Synset] 77
  • 78. Social Homophily Mining Text Content – The Process Documents Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all mean are created equal. Now we are engaged in a great civil war, testing whether that nation, or… Token Sets Feature Extraction nation – 5 civil – 1 war – 2 men – 2 died – 4 people – 5 liberty – 1 god – 1 … Feature Extraction & Weighting “Bag of Words, Terms or Tokens” Doc/Token Matrix: Vectors of Words, Terms or Tokens by Doc “Bag of Words” (BOW) or Vector Space Model (VSM): Words or Tokens are attributes and documents are examples 78
  • 79. Social Homophily Mining Text Content – Political Blog Example Analysis from Python NLTK Program For each sentence Tokens: ['The', 'Colorado', '#', 'obamacare', 'exchange', 'is', 'a', 'disaster', '.'] Remove Stopwords: ['Colorado', '#', 'obamacare', 'exchange', 'disaster', '.'] Lower case: ['colorado', '#', 'obamacare', 'exchange', 'disaster', '.'] Alpha only: ['colorado', 'obamacare', 'exchange', 'disaster'] … Stems: ['colorado', 'obamacar', 'exchang', 'disast'] Vector of stems for all sentences in blog posting {'one-year': 1, 'offici': 2, 'merit': 1, 'particularli': 1, 'coloradan': 1, 'explain': 2, 'theoret': 1, 'certainli': 1, 'easier': 2, 'mayb': 1, 'far': 1, 'nation': 1, 'disast': 1, 'govern': 1, 'press': 1, 'democrat': 1, 'bad': 2, 'mid-rang': 1, 'mean': 1, 'set': 1, 'replac': 1, 'respons': 1, 'fail': 1, 'even': 1, 'decre': 2, 'state': 3, 'estim': 1, 'democratic-control': 1, 'run': 1, 'obamacar': 2, 'burn': 1, 'blame': 2, 'let': 1, 'enrol': 6, 'worst-cas': 2, 'could': 1, 'unenforc': 1, 'admit': 1, 'place': 1, 'presid': 1, 'opinion': 1, 'first': 1, 'simpli': 1, 'colorado': 2, 'one-tim': 1, 'exchang': 5, 'number': 2, 'cancel': 1, 'announc': 1, 'supposedli': 1, 'describ': 1, 'would': 1, 'pictur': 1, 'hey': 1, 'next': 1, 'much': 1, 'way': 2, 'head': 1, 'offer': 1, 'peopl': 4, 'compani': 1, 'exchanged-bas': 1, 'loom': 1, 'whether': 1, 'work': 2, 'project': 2, 'sinc': 1, 'problem': 1, 'aw': 1, 'site': 1, 'crazi': 1, 'want': 1, 'need': 1, 'grown': 1, 'irrelev': 1, 'goal': 1, 'renew': 1, 'marketplac': 2, 'sure': 1, 'insur': 3, 'monday': 1, 'mess': 1, 'reach': 1, 'rais': 1, 'plan': 1, 'date': 1, 'end': 1, 'mid-novemb': 1, 'scenario': 1, 'cover': 1, 'talk': 1} 79
  • 80. Social Homophily Mining Text Content – Political Blog Example DocStem Matrix (from Python NLTK Program) Stem Distribution from Obamacare Postings 50 Most Frequently Used out of 792 Stems by Political Ideology 80
  • 81. Social Homophily Mining Text Content – Political Blog Example Stem Distribution from Obamacare Postings For Stems whose proportions are between .1 and .7 by Political Ideology Differences in Stem Frequencies by Political Ideology 81
  • 82. Social Homophily Mining Text Content – The Process Transforming Frequencies and Weights • • • • Binary Frequencies: tf =1 for tf>0; otherwise 0 Term Frequencies: tf(i,j)/Sum of tf(i,j) in Doc K Log Frequencies: 1 + log(tf) for tf>0; otherwise 0 Normalized Frequencies: Divide each frequency by SQRT of Sum of Squares of the frequencies within the vector (column) • Term Frequency–Inverse Document Frequency – TF: Freq of term for given doc/ sum of words for given doc – Inverse Document Frequency: log(N/(1+D)) where N is total number of docs and D is number with term – TF * IDF 82
  • 83. Social Homophily Mining Text Content – The Process 83
  • 84. Social Homophily Mining Text Content – Political Blog Example Average Cosine Similarities Cons = .25 Lib = .27 Cons-Lib = .26 Hierarchical Clustering Based on TD-IDF Of Stems from Obamacare Postings By Political Position
  • 85. Social Homophily Mining Text Content – Retweets and Mentions Content Homogenity • An interesting question, therefore, is whether it has any significance in terms of the actual content of the discussions involved. • To address this issue we associate each user with a profile vector containing all the hashtags in her tweets, weighted by their frequencies. • We can then compute the cosine similarities between each pair of user profiles, separately for users in the same cluster and users in different clusters. 85
  • 86. Social Homophily Mining Text Content – Retweets and Mentions Cosine Similarities Average Similarities 86

Editor's Notes

  1. First step in textual data preparation is to systematically collect samples of text, i.e. the documents related to the context being studiedRange of possibilities: word documents, PDFs, emails, IM chat, Web pages, RSS Feeds, Blogs, Tweets, Open ended surveys, Transcripts of Helpline calls …Convert into organized set of texts – called a corpus – standardized and prepared for the purpose of knowledge discovery