Introduction and Overview
APAM E4990
Modeling Social Data
Jake Hofman & Chris Wiggins
Columbia University
January 23, 2013
Hofman & Wiggins (Columbia University) Introduction and Overview January 23, 2013 1 / 52
Housekeeping
• Who are we:
• Jake: PhD, now at Microsoft Research NYC
• Chris: APMA faculty, also NYT Chief Data Scientist
• E-Dean Fung: Cornell Undergrad, now APAM student
• Grading:
• 3 homeworks, each 20 percent of final grade
• 1 final project, 30 percent of final grade
• Class participation, 10 percent of final grade
Course overview
Modeling social data requires an understanding of:
1 How to obtain data produced by online human interactions,
2 What questions we typically ask about human-generated data,
3 How to reframe these questions as mathematical models, and
4 How to interpret the results of these models in ways that
address our questions.
Topics we will cover
Figure: modelingsocialdata.org
Mise-en-place
(Things you will need in your setup)
1 Bash shell
2 Git
3 Account on GitHub
Computational social science
An emerging discipline at the intersection of the social sciences,
statistics, and computer science
• Social sciences: motivating questions
• Statistics: fitting large, potentially sparse models
• Computer science: parallel processing for filtering and aggregating data
Questions
Many long-standing questions in the social sciences are notoriously
difficult to answer, e.g.:
• “Who says what to whom in what channel with what effect?”
(Lasswell, 1948)
• How do ideas and technology spread through cultures?
(Rogers, 1962)
• How do new forms of communication affect society?
(Singer, 1970)
• . . .
Large-scale data
Recently available electronic data provide an unprecedented
opportunity to address these questions at scale
[Figure: three kinds of data: demographic, behavioral, and network]
The clean real story
“We have a habit in writing articles published in
scientific journals to make the work as finished as
possible, to cover all the tracks, to not worry about the
blind alleys or to describe how you had the wrong idea
first, and so on. So there isn’t any place to publish, in
a dignified manner, what you actually did in order to
get to do the work ...”
-Richard Feynman, Nobel Lecture, 1965
http://bit.ly/feynmannobel
Examples!
Now some examples of social data we’ve worked with and
will be discussing
But first: questions?
Topics
Exploratory Data Analysis
Regression
Classification
Networks
Exploratory Data Analysis
(a.k.a. counting and plotting things)
Demographic diversity on the Web
with Irmak Sirer and Sharad Goel (ICWSM 2012)
[Figure: daily per-capita pageviews (0 to 70) by income (over/under $25k), race (Black & Hispanic, White), education (no college, some college), age (over/under 65), and sex (female, male)]
Motivation
Previous work is largely survey-based and focuses on group-level
differences in online access
“As of January 1997, we estimate that 5.2 million
African Americans and 40.8 million whites have ever used
the Web, and that 1.4 million African Americans and
20.3 million whites used the Web in the past week.”
-Hoffman & Novak (1998)
Focus on activity instead of access
How diverse is the Web?
To what extent do online experiences vary across demographic
groups?
Data
• Representative sample of 265,000 individuals in the US, paid
via the Nielsen MegaPanel (special thanks to Mainak Mazumdar)
• Log of anonymized, complete browsing activity from June
2009 through May 2010 (URLs viewed, timestamps, etc.)
• Detailed individual and household demographic information
(age, education, income, race, sex, etc.)
# ls -alh nielsen_megapanel.tar
-rw-r--r-- 100G Jul 17 13:00 nielsen_megapanel.tar
• Normalize pageviews to at most three domain levels, sans www,
e.g. www.yahoo.com → yahoo.com,
us.mg2.mail.yahoo.com/neo/launch → mail.yahoo.com
• Restrict to top 100k (out of 9M+ total) most popular sites
(by unique visitors)
• Aggregate activity at the site, group, and user levels
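The normalization and ranking steps on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the function names `normalize_domain` and `top_sites_by_unique_visitors`, and the `(user_id, url)` input format, are assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_domain(url: str, max_levels: int = 3) -> str:
    """Reduce a URL to at most `max_levels` domain levels, dropping a leading www."""
    # urlsplit only recognizes the host when a scheme or '//' precedes it.
    if "//" not in url:
        url = "//" + url
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]
    return ".".join(labels[-max_levels:])


def top_sites_by_unique_visitors(pageviews, k):
    """Rank normalized sites by number of distinct users and keep the top k.

    `pageviews` is an iterable of (user_id, url) pairs (hypothetical format).
    """
    visitors = defaultdict(set)
    for user_id, url in pageviews:
        visitors[normalize_domain(url)].add(user_id)
    # Sort by descending visitor count, breaking ties alphabetically.
    ranked = sorted(visitors, key=lambda site: (-len(visitors[site]), site))
    return ranked[:k]


print(normalize_domain("www.yahoo.com"))                     # yahoo.com
print(normalize_domain("us.mg2.mail.yahoo.com/neo/launch"))  # mail.yahoo.com
```

Counting distinct users per site, rather than raw pageviews, matches the "by unique visitors" criterion on the slide; at the scale of the real data this set-based approach would likely need to be replaced by something distributed.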
Aggregate usage patterns
How do users distribute their time across different categories?
[Figure: fraction of total pageviews (0.05 to 0.25) by category: Social Media, E-mail, Games, Portals, Search]
All groups spend the majority of their time in the top five most
popular categories
[Figure: fraction of pageviews in category (0.05 to 0.30) versus user rank by daily activity (10% to 90%), for Social Media, E-mail, Games, Portals, Search]
Highly active users devote nearly twice as much of their time to
social media relative to typical individuals
Group-level activity
How does browsing activity vary at the group level?
[Figure: daily per-capita pageviews (0 to 70) by income (over/under $25k), race (Black & Hispanic, White), education (no college, some college), age (over/under 65), and sex (female, male)]
Large differences exist even at the aggregate level
(e.g. women on average generate 40% more pageviews than men)
Younger and more educated individuals are both more likely to
access the Web and more active once they do
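A group-level quantity such as daily per-capita pageviews can be computed along the following lines. This is a sketch under stated assumptions: the function name `daily_per_capita_pageviews`, the dictionary inputs, and all numbers are made up for illustration and are not taken from the study.

```python
from collections import defaultdict


def daily_per_capita_pageviews(pageview_counts, group_of, n_days):
    """Average pageviews per person per day for each group.

    pageview_counts: user -> total pageviews over the period (hypothetical format)
    group_of:        user -> group label, listing every panelist in the group
    n_days:          length of the observation period in days
    """
    totals = defaultdict(int)
    members = defaultdict(int)
    for user, group in group_of.items():
        members[group] += 1
        # Panelists with no recorded activity still count in the denominator.
        totals[group] += pageview_counts.get(user, 0)
    return {g: totals[g] / (members[g] * n_days) for g in members}


# Made-up example data, chosen only to show a 40% gap between two groups.
counts = {"a": 140, "b": 280, "c": 100, "d": 200}
groups = {"a": "female", "b": "female", "c": "male", "d": "male"}
print(daily_per_capita_pageviews(counts, groups, n_days=10))
# {'female': 21.0, 'male': 15.0}
```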
Group-level activity
All demographic groups spend the majority of their time in the
same categories
[Figure: fraction of total pageviews by category (Social Media, E-mail, Games, Portals, Search) across age (5 to 80), education (Grammar School to Post-Graduate Degree), sex (Female, Male), income ($0-25k to $150k+), and race (Other, Hispanic, Black, White, Asian)]
Older, more educated, male, wealthier, and Asian Internet users
spend a smaller fraction of their time on social media
Lower social media use by these groups is often accompanied by
higher e-mail volume
Revisiting the digital divide
How does usage of news, health, and reference vary with
demographics?
[Figure: average pageviews per month (0 to 12) on News, Health, and Reference sites across education (Grammar School to Post-Graduate Degree), sex (Female, Male), income ($0-25k to $150k+), and race (Other, Hispanic, Black, White, Asian)]
Post-graduates spend three times as much time on health sites
as adults with only some high school education
Asians spend more than 50% more time browsing online news than
do other race groups
Even when less educated and less wealthy groups gain access to
the Web, they utilize these resources relatively infrequently
[Figure: average pageviews per month (0 to 12) on News, Health, and Reference sites by education level (High School Graduate to Post-Graduate Degree), for Asian, Black, Hispanic, and White users]
Controlling for other variables, effects of race and gender largely
disappear, while education continues to have a large effect
\[
p_i = \sum_j \alpha_j x_{ij} + \sum_j \sum_k \beta_{jk} x_{ij} x_{ik} + \sum_j \gamma_j x_{ij}^2 + \epsilon_i
\]
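One way to fit a model of this form, with linear, pairwise-interaction, and squared terms, is ordinary least squares on an expanded design matrix. The sketch below assumes NumPy, restricts the interaction terms to pairs j < k to avoid duplicate columns, and uses synthetic data; the slide does not specify the fitting procedure or the exact range of the interaction sum, so these are assumptions, as are the function names.

```python
import numpy as np


def design_matrix(X):
    """Build columns for x_j, x_j * x_k (j < k), and x_j^2 from an (n, d) array."""
    n, d = X.shape
    linear = [X[:, j] for j in range(d)]
    interactions = [X[:, j] * X[:, k] for j in range(d) for k in range(j + 1, d)]
    squares = [X[:, j] ** 2 for j in range(d)]
    return np.column_stack(linear + interactions + squares)


def fit_quadratic_model(X, p):
    """Least-squares estimate of the coefficients, ordered as in design_matrix."""
    A = design_matrix(X)
    coef, *_ = np.linalg.lstsq(A, p, rcond=None)
    return coef


# Synthetic, noise-free example: the fit should recover the chosen coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.arange(1, 10, dtype=float)  # 3 linear + 3 interaction + 3 squared
p = design_matrix(X) @ true_coef
print(np.round(fit_quadratic_model(X, p), 6))
```

With real, noisy data the recovered coefficients would only approximate the underlying ones, and categorical demographics would first need to be encoded numerically.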
[Figure: average pageviews per month (0 to 12) on Health sites by education level (High School Graduate to Post-Graduate Degree), for female and male users]
[Figure: distribution of monthly pageviews on health sites (20 to 100), female versus male users]
However, women spend considerably more time on health sites
compared to men, although means can be misleading
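The caveat that means can be misleading is easy to see with skewed data, where a few heavy users dominate the average. The numbers below are made up for illustration and are not from the study; the helper name `summarize` is likewise an assumption.

```python
from statistics import mean, median


def summarize(pageviews):
    """Return the mean and median of a list of per-user pageview counts."""
    return {"mean": mean(pageviews), "median": median(pageviews)}


# Made-up monthly pageview counts: group_a has one very heavy user.
group_a = [0, 0, 1, 1, 2, 2, 3, 120]
group_b = [4, 5, 5, 6, 6, 7, 7, 8]

print(summarize(group_a))  # {'mean': 16.125, 'median': 1.5}
print(summarize(group_b))  # {'mean': 6, 'median': 6.0}
```

Here group_a has the higher mean but the lower median, so a comparison of means alone would suggest the opposite of what a typical user in each group does.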
Summary
• Highly active users spend disproportionately more of their
time on social media and less on e-mail relative to the overall
population
• Access to research, news, and healthcare is strongly related to
education, not as closely to ethnicity
• User demographics can be inferred from browsing activity with
reasonable accuracy
• “Who Does What on the Web”, Goel, Hofman & Sirer,
ICWSM 2012
Classification
(a.k.a. modeling discrete things)
example:
millions of views per hour
interpretable supervised learning
supercoolstuff
Regression
(a.k.a. modeling continuous things)
Predicting consumer activity with Web search
with Sharad Goel, Sébastien Lahaie, David Pennock, Duncan Watts
[Figure: weekly rank (10 to 40) of "Right Round", March to August 2009, Billboard versus Search]
Search predictions
Summary
• Relative performance and value of search varies across
domains
• Search provides a fast, convenient, and flexible signal across
domains
• “Predicting consumer activity with Web search”
Goel, Hofman, Lahaie, Pennock & Watts, PNAS 2010
Networks
(a.k.a. counting complicated things)
The structural virality of online diffusion
with Ashton Anderson, Sharad Goel, Duncan Watts (Management Science 2015)
Information diffusion
Summary
• Most cascades fail, resulting in fewer than two adoptions, on
average
• Of the hits that do succeed, we observe a wide range of
diverse diffusion structures
• It’s difficult to say how something spread given only its
popularity
• “The structural virality of online diffusion”, Anderson, Goel,
Hofman & Watts (Under review.)
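The point that popularity alone does not reveal how something spread can be illustrated with two cascades of equal size. The cited paper measures structural virality as the average shortest-path distance between all pairs of nodes in a diffusion tree; the sketch below assumes that definition, and the function name `mean_pairwise_distance` and the two toy cascades are made up for illustration.

```python
from collections import deque


def mean_pairwise_distance(edges):
    """Average shortest-path distance over all node pairs of a connected tree.

    `edges` is a list of undirected (u, v) pairs with at least two nodes.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    for src in adj:
        # Breadth-first search gives distances from src to every other node.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    n = len(adj)
    # total counts every ordered pair once, so divide by n * (n - 1).
    return total / (n * (n - 1))


# Two toy cascades with five adopters each.
broadcast = [(0, 1), (0, 2), (0, 3), (0, 4)]  # one source reaches everyone directly
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]      # each adopter passes it to one other

print(mean_pairwise_distance(broadcast))  # 1.6
print(mean_pairwise_distance(chain))      # 2.0
```

Both cascades have the same popularity (five adopters), yet the person-to-person chain scores higher than the broadcast, which is the distinction that size alone cannot capture.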
Modeling Social Data, Lecture 6: Classification with Naive Bayes
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scale
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
 
NYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social ScienceNYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social Science
 
Computational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: ClassificationComputational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: Classification
 
Computational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online ExperimentsComputational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online Experiments
 
Computational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data WranglingComputational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data Wrangling
 
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part IIComputational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
 
Computational Social Science, Lecture 07: Counting Fast, Part I
Computational Social Science, Lecture 07: Counting Fast, Part IComputational Social Science, Lecture 07: Counting Fast, Part I
Computational Social Science, Lecture 07: Counting Fast, Part I
 
Computational Social Science, Lecture 06: Networks, Part II
Computational Social Science, Lecture 06: Networks, Part IIComputational Social Science, Lecture 06: Networks, Part II
Computational Social Science, Lecture 06: Networks, Part II
 
Computational Social Science, Lecture 05: Networks, Part I
Computational Social Science, Lecture 05: Networks, Part IComputational Social Science, Lecture 05: Networks, Part I
Computational Social Science, Lecture 05: Networks, Part I
 
Computational Social Science, Lecture 04: Counting at Scale, Part II
Computational Social Science, Lecture 04: Counting at Scale, Part IIComputational Social Science, Lecture 04: Counting at Scale, Part II
Computational Social Science, Lecture 04: Counting at Scale, Part II
 

Modeling Social Data, Lecture 1: Case Studies

  • 1. Introduction and Overview APAM E4990 Modeling Social Data Jake Hofman & Chris Wiggins Columbia University January 23, 2013 Hofman & Wiggins (Columbia University) Introduction and Overview January 23, 2013 1 / 52
  • 2. Housekeeping • Who are we: • Jake: PhD, now at Microsoft Research NYC • Chris: APMA faculty, also NYT Chief Data Scientist • E-Dean Fung: Cornell Undergrad, now APAM student • Grading: • 3 homeworks, each 20 percent of final grade • 1 final project, 30 percent of final grade • Class participation, 10 percent of final grade
  • 3. Course overview Modeling social data requires an understanding of: 1 How to obtain data produced by online human interactions, 2 What questions we typically ask about human-generated data, 3 How to reframe these questions as mathematical models, and 4 How to interpret the results of these models in ways that address our questions.
  • 4. Topics we will cover Figure: modelingsocialdata.org
  • 5. Mise-en-place (Things you will need in your setup) 1 Bash shell 2 Git 3 Account on GitHub
  • 6. Computational social science An emerging discipline at the intersection of the social sciences, statistics, and computer science
  • 7. Computational social science An emerging discipline at the intersection of the social sciences, statistics, and computer science (motivating questions)
  • 8. Computational social science An emerging discipline at the intersection of the social sciences, statistics, and computer science (fitting large, potentially sparse models)
  • 9. Computational social science An emerging discipline at the intersection of the social sciences, statistics, and computer science (parallel processing for filtering and aggregating data)
  • 10. Questions Many long-standing questions in the social sciences are notoriously difficult to answer, e.g.: • “Who says what to whom in what channel with what effect”? (Lasswell, 1948) • How do ideas and technology spread through cultures? (Rogers, 1962) • How do new forms of communication affect society? (Singer, 1970) • . . .
  • 11. Large-scale data Recently available electronic data provide an unprecedented opportunity to address these questions at scale Demographic Behavioral Network
  • 12. The clean real story “We have a habit in writing articles published in scientific journals to make the work as finished as possible, to cover all the tracks, to not worry about the blind alleys or to describe how you had the wrong idea first, and so on. So there isn’t any place to publish, in a dignified manner, what you actually did in order to get to do the work ...” -Richard Feynman, Nobel Lecture, 1965 (http://bit.ly/feynmannobel)
  • 13. Examples! Now some examples of social data we’ve worked with and will be discussing But first: questions?
  • 14. Topics Exploratory Data Analysis Regression Classification Networks
  • 15. Exploratory Data Analysis (a.k.a. counting and plotting things)
  • 16. Demographic diversity on the Web with Irmak Sirer and Sharad Goel (ICWSM 2012) [Figure: daily per-capita pageviews by income, race, education, age, and sex]
  • 17. Motivation Previous work is largely survey-based and focuses on group-level differences in online access
  • 18. Motivation “As of January 1997, we estimate that 5.2 million African Americans and 40.8 million whites have ever used the Web, and that 1.4 million African Americans and 20.3 million whites used the Web in the past week.” -Hoffman & Novak (1998)
  • 19. Motivation Focus on activity instead of access How diverse is the Web? To what extent do online experiences vary across demographic groups?
  • 20. Data • Representative sample of 265,000 individuals in the US, paid via the Nielsen MegaPanel (special thanks to Mainak Mazumdar) • Log of anonymized, complete browsing activity from June 2009 through May 2010 (URLs viewed, timestamps, etc.) • Detailed individual and household demographic information (age, education, income, race, sex, etc.)
  • 21-24. Data # ls -alh nielsen_megapanel.tar -rw-r--r-- 100G Jul 17 13:00 nielsen_megapanel.tar • Normalize pageviews to at most three domain levels, sans www, e.g. www.yahoo.com → yahoo.com, us.mg2.mail.yahoo.com/neo/launch → mail.yahoo.com • Restrict to top 100k (out of 9M+ total) most popular sites (by unique visitors) • Aggregate activity at the site, group, and user levels
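
The normalization and filtering steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names `normalize_domain` and `top_sites_by_unique_visitors`, the `(user_id, url)` record layout, and the sample records are all hypothetical.

```python
from collections import defaultdict


def normalize_domain(url, max_levels=3):
    """Reduce a URL to at most `max_levels` trailing domain labels, without a leading 'www'.

    Hypothetical helper; the slides give only the two examples, not the code.
    """
    # Drop an optional scheme, then the path and any port.
    host = url.split("://", 1)[-1].split("/", 1)[0].split(":", 1)[0].lower()
    labels = [label for label in host.split(".") if label]
    # Remove a leading 'www' before truncating to the last few labels.
    if len(labels) > 1 and labels[0] == "www":
        labels = labels[1:]
    return ".".join(labels[-max_levels:])


def top_sites_by_unique_visitors(pageviews, k):
    """Rank normalized sites by number of distinct users and keep the top k.

    `pageviews` is an iterable of (user_id, url) pairs (an assumed layout).
    """
    visitors = defaultdict(set)
    for user_id, url in pageviews:
        visitors[normalize_domain(url)].add(user_id)
    # Sort by descending unique-visitor count, breaking ties alphabetically.
    ranked = sorted(visitors, key=lambda site: (-len(visitors[site]), site))
    return ranked[:k]


# Invented sample records, for illustration only.
sample = [
    ("u1", "www.yahoo.com"),
    ("u2", "http://us.mg2.mail.yahoo.com/neo/launch"),
    ("u1", "mail.yahoo.com/inbox"),
    ("u3", "www.yahoo.com/news"),
    ("u2", "yahoo.com"),
    ("u3", "example.org"),
]
top_two = top_sites_by_unique_visitors(sample, 2)
```

On the real data the same idea would be applied with k = 100,000, presumably in a streaming or parallel fashion given the 100G archive.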
  • 25. Aggregate usage patterns How do users distribute their time across different categories? [Figure: fraction of total pageviews for Social Media, E-mail, Games, Portals, and Search] All groups spend the majority of their time in the top five most popular categories
  • 26. Aggregate usage patterns How do users distribute their time across different categories? [Figure: fraction of pageviews in each category (Social Media, E-mail, Games, Portals, Search) by user rank by daily activity, 10% to 90%] Highly active users devote nearly twice as much of their time to social media relative to typical individuals
  • 27. Group-level activity How does browsing activity vary at the group level? [Figure: daily per-capita pageviews by income, race, education, age, and sex] Large differences exist even at the aggregate level (e.g. women on average generate 40% more pageviews than men)
  • 28. Group-level activity How does browsing activity vary at the group level? [Figure: daily per-capita pageviews by income, race, education, age, and sex] Younger and more educated individuals are both more likely to access the Web and more active once they do
  • 29. Group-level activity All demographic groups spend the majority of their time in the same categories [Figure: fraction of total pageviews by category (Social Media, E-mail, Games, Portals, Search) across age, education, sex, income, and race]
  • 30. Group-level activity Older, more educated, male, wealthier, and Asian Internet users spend a smaller fraction of their time on social media [Figure: fraction of total pageviews by category (Social Media, E-mail, Games, Portals, Search) across age, education, sex, income, and race]
  • 31. Group-level activity Lower social media use by these groups is often accompanied by higher e-mail volume [Figure: fraction of total pageviews by category (Social Media, E-mail, Games, Portals, Search) across age, education, sex, income, and race]
  • 32. Revisiting the digital divide How does usage of news, health, and reference vary with demographics? [Figure: average pageviews per month on News, Health, and Reference sites by education, sex, income, and race] Post-graduates spend three times as much time on health sites as adults with only some high school education
  • 33. Revisiting the digital divide How does usage of news, health, and reference vary with demographics? [Figure: average pageviews per month on News, Health, and Reference sites by education, sex, income, and race] Asians spend more than 50% more time browsing online news than do other race groups
  • 34. Revisiting the digital divide How does usage of news, health, and reference vary with demographics? [Figure: average pageviews per month on News, Health, and Reference sites by education, sex, income, and race] Even when less educated and less wealthy groups gain access to the Web, they utilize these resources relatively infrequently
  • 35. Revisiting the digital divide How does usage of news, health, and reference vary with demographics? [Figure: average pageviews per month on News, Health, and Reference sites by education level, for Asian, Black, Hispanic, and White users] Controlling for other variables, effects of race and gender largely disappear, while education continues to have a large effect $p_i = \sum_j \alpha_j x_{ij} + \sum_j \sum_k \beta_{jk} x_{ij} x_{ik} + \sum_j \gamma_j x_{ij}^2 + \epsilon_i$
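
The model on this slide combines a linear term, a pairwise-product term, and a squared term for each covariate. A minimal sketch of that feature expansion follows; the function names `expand_features` and `predict` are hypothetical, the weights are placeholders rather than fitted values, and the choice to include each pair only once (j < k) is an assumption, since the slide does not say how the double sum is indexed.

```python
from itertools import combinations


def expand_features(x):
    """Return main effects, pairwise products (j < k), and squares of a covariate vector."""
    main = list(x)
    interactions = [x[j] * x[k] for j, k in combinations(range(len(x)), 2)]
    squares = [value * value for value in x]
    return main + interactions + squares


def predict(x, weights):
    """Linear prediction over the expanded features (noise term omitted)."""
    features = expand_features(x)
    if len(features) != len(weights):
        raise ValueError("expected one weight per expanded feature")
    return sum(w * f for w, f in zip(weights, features))


# Two covariates expand to 2 main effects + 1 interaction + 2 squares.
example_features = expand_features([2, 3])
example_prediction = predict([2, 3], [1, 1, 1, 1, 1])
```

Fitting the weights would then be an ordinary least-squares problem on the expanded features.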
  • 36. Revisiting the digital divide How does usage of news, health, and reference vary with demographics? [Figure: average pageviews per month on health sites by education level, for female and male users] However, women spend considerably more time on health sites compared to men
  • 37. Revisiting the digital divide How does usage of news, health, and reference vary with demographics? [Figure: monthly pageviews on health sites, female vs. male] However, women spend considerably more time on health sites compared to men, although means can be misleading
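
To see why means can be misleading for activity data, consider a small sketch in which one very heavy user dominates a group's average. The numbers below are invented purely for illustration and are not taken from the Nielsen data.

```python
from statistics import mean, median

# Hypothetical monthly health-site pageviews per user (invented values).
group_a = [0, 0, 1, 1, 2, 2, 3, 100]  # one extremely active user
group_b = [1, 2, 2, 3, 3, 4, 4, 5]    # no outliers

mean_a, mean_b = mean(group_a), mean(group_b)
median_a, median_b = median(group_a), median(group_b)

# The mean ranks group A above group B, while the median ranks it below.
print(f"mean:   A={mean_a}, B={mean_b}")
print(f"median: A={median_a}, B={median_b}")
```

Comparing full distributions, or at least medians and quantiles, gives a more faithful picture than a single average when usage is heavy-tailed.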
  • 38. Summary • Highly active users spend disproportionately more of their time on social media and less on e-mail relative to the overall population • Access to research, news, and healthcare is strongly related to education, not as closely to ethnicity • User demographics can be inferred from browsing activity with reasonable accuracy • “Who Does What on the Web”, Goel, Hofman & Sirer, ICWSM 2012
  • 39. Classification (a.k.a. modeling discrete things)
  • 44. Regression (a.k.a. modeling continuous things)
  • 45. Predicting consumer activity with Web search with Sharad Goel, Sébastien Lahaie, David Pennock, Duncan Watts [Figure: weekly rank of "Right Round", March to August 2009, Billboard vs. search]
  • 46. Search predictions Summary • Relative performance and value of search varies across domains • Search provides a fast, convenient, and flexible signal across domains • “Predicting consumer activity with Web search”, Goel, Hofman, Lahaie, Pennock & Watts, PNAS 2010
  • 47. Networks (a.k.a. counting complicated things)
  • 48. The structural virality of online diffusion with Ashton Anderson, Sharad Goel, Duncan Watts (Management Science 2015)
  • 49. Information diffusion Summary • Most cascades fail, resulting in fewer than two adoptions, on average • Of the hits that do succeed, we observe a wide range of diverse diffusion structures • It’s difficult to say how something spread given only its popularity • “The structural virality of online diffusion”, Anderson, Goel, Hofman & Watts (Under review.)