Learning from Web Activity
Jake Hofman
Yahoo! Research
November 18, 2010
Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 1 / 33
Outline
1 Agenda: Just enough philosophy
2 Case study: Demographic diversity on the Web
3 Conclusion: Lessons learned
Agenda
Size (only kind of) matters
Big Data
Lots of data means lots to learn (from)
But the “big” part isn’t intrinsically interesting
(although large sample sizes are always good)
Regardless of size, it’s really about “data jeopardy”
(To what question are these data the answer?)
Agenda
Tools
Data tools:
• Shell scripting & Python
Munging, Glue
• R
Modeling, Visualization
Big Data tools:
• Hadoop & Pig
Filtering, Aggregating
• Shell scripting & Python
Munging, Glue
• R
Modeling, Visualization
Agenda
The clean real story
“We have a habit in writing articles published in
scientific journals to make the work as finished as
possible, to cover all the tracks, to not worry about the
blind alleys or to describe how you had the wrong idea
first, and so on. So there isn’t any place to publish, in
a dignified manner, what you actually did in order to
get to do the work ...”
-Richard Feynman
Nobel Lecture, 1965 (http://bit.ly/feynmannobel)
Demographic diversity on the Web
The clean story
(covering our tracks)
Demographic diversity on the Web
with Irmak Sirer and Sharad Goel
How diverse is the Web?
To what extent do online experiences vary across demographic
groups?
Diversity of the Web
Data
• Representative sample of 265,000 individuals in the US, paid
via the Nielsen MegaPanel (http://bit.ly/nielsenonline)
• Log of anonymized, complete browsing activity from June
2009 through May 2010 (URLs viewed, timestamps, etc.)
• Detailed individual and household demographic information
(age, education, income, race, sex, etc.)
Diversity of the Web
Data
• Transform all demographic attributes to binary variables
e.g., Age → Over/Under 25, Race → White/Non-White,
Sex → Female/Male
• Normalize pageviews to at most three domain levels, sans www
e.g. www.yahoo.com → yahoo.com,
us.mg2.mail.yahoo.com/neo/launch → mail.yahoo.com
• Restrict to top 100k most popular sites
• Aggregate activity at the site, group, and user levels
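The domain normalization step above can be sketched roughly as follows. This is a minimal illustration, not the code used in the study: the function name `normalize_site` and the choice to truncate to three levels before dropping a leading `www` are assumptions made here.

```python
from urllib.parse import urlsplit

def normalize_site(url, max_levels=3):
    """Reduce a URL to at most `max_levels` domain levels, without a leading www.

    Hypothetical helper: the name and the order of the two steps are assumptions.
    """
    # urlsplit only recognizes the host when a scheme or '//' prefix is present.
    if "://" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    # Keep the rightmost labels, e.g. us.mg2.mail.yahoo.com -> mail.yahoo.com.
    labels = host.split(".")[-max_levels:]
    # Drop a leading "www" label, e.g. www.yahoo.com -> yahoo.com.
    if len(labels) > 1 and labels[0] == "www":
        labels = labels[1:]
    return ".".join(labels)

print(normalize_site("www.yahoo.com"))
print(normalize_site("us.mg2.mail.yahoo.com/neo/launch"))
```

The two calls reproduce the examples on the slide. Paths, ports, and schemes are discarded, so pageviews are counted per site rather than per page.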
Diversity of the Web
Pig to the rescue
Diversity of the Web
Site-level skew
How diverse are site audiences?
• For each site and attribute,
calculate the skew in visitors
(e.g., 93% of pageviews on
foxnews.com are by White
users)
• For each attribute, plot the
distribution of visitor skew
across all sites
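As a rough sketch of the two steps above, the per-site skew is the fraction of a site's pageviews that come from users in a given group, and the distribution is a histogram of those fractions across sites. The function names and the toy records below are hypothetical; the actual aggregation was presumably run at scale in Pig rather than in memory.

```python
from collections import defaultdict

def site_skew(pageviews):
    """pageviews: iterable of (site, in_group) pairs, one pair per pageview.

    Returns {site: fraction of that site's pageviews made by in-group users}.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [in-group views, all views]
    for site, in_group in pageviews:
        counts[site][0] += int(in_group)
        counts[site][1] += 1
    return {site: k / n for site, (k, n) in counts.items()}

def histogram(values, bins=10):
    """Count values in equal-width bins on [0, 1]; 1.0 falls in the last bin."""
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1
    return hist

# Toy records with made-up sites and counts (not data from the study).
views = [("a.example", True)] * 9 + [("a.example", False)]
views += [("b.example", True), ("b.example", False)]
skew = site_skew(views)
hist = histogram(skew.values())
```

Plotting `hist` (or a smoothed density of the skew values) for each attribute gives the kind of distribution shown on the slide.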
[Figure: density of the proportion of White visitors across sites; x-axis from 0.0 to 1.0]
Diversity of the Web
Site-level skew
[Figure: five density plots across sites, each with an x-axis from 0.0 to 1.0: Proportion Female Visitors, Proportion White Visitors, Proportion College Educated Visitors, Proportion Adult Visitors, and Proportion of Visitors With Household Incomes Under $50,000]
Diversity of the Web
Site-level skew
Many sites have skew close to the average, but there are also popular,
highly-skewed sites

Table 1: A selection of popular sites that are homogeneous along various demographic dimensions.

Attribute                        Greater Than 90%                       Less Than 10%
Female                           youravon.com, collectionsetc.com       coveritlive.com, needlive.com
White                            foxnews.com, wunderground.com          blackplanet.com, mediatakeout.com
College Educated                 news.google.com, nytimes.com           slumz.boxden.com, sythe.com
Over 25 Years Old                mail.yahoo.com, apps.facebook.com      nanowrimo.org, cbox.ws
Household Income Under $50,000   scarleteen.com, boards.adultswim.com   opentable.com, marketwatch.com

[...] visually apparent from Figure 5, there are significant differences
in how groups distribute their time on the web. These differences, which,
as mentioned above, hold for highly frequented sites such as Facebook and
YouTube, are in some cases even more pronounced for lower traffic sites.
For instance, the gaming site pogo.com accounts for less than 1% of
pageviews among both low and high income users, but low income users
spend almost twice as much of their time there.
This skew persists even when we restrict attention to the top 10k
or 1k sites
Diversity of the Web
Sites vs. ZIPs
How does the diversity of the online and offline worlds compare?
[Figure: densities of Proportion Female, Proportion White, and Proportion College Educated (x-axes from 0.0 to 1.0), each comparing Sites with ZIPs]
As expected, neighborhoods are more gender-balanced than sites
But sites typically have more racially diverse audiences than
neighborhoods have residents
Skew by education is comparable, with online showing a bias
towards higher education
Diversity of the Web
Group-level activity
How does browsing activity vary at the group level?
[Figure: daily per-capita pageviews (y-axis from 0 to 70) by group, in four panels: Race (Non-White, White), Education (No College, College), Sex (Male, Female), Age (Under 25, Over 25)]
Large differences exist even at the aggregate level
(e.g. women on average generate 40% more pageviews than men)
Diversity of the Web
Group-level activity
All groups spend more than a third of their time on a handful of
email, search, and social networking sites
[Figure: percent of total time spent on site (log scale, 0.1% to 10%), female vs. male, for the most popular sites: facebook.com, mail.yahoo.com, google.com, apps.facebook.com, mail.google.com, mail.live.com, youtube.com, webmail.aol.com, mwfb.zynga.com, channel.facebook.com, viewmorepics.myspace.com, search.yahoo.com, myspace.com, msn.com, amazon.com, shop.ebay.com, yahoo.com, images.google.com, home.myspace.com, mail.comcast.net, bing.com, www.yahoo.com, cgi.ebay.com, espn.go.com, messaging.myspace.com, twitter.com, cim.meebo.com, my.ebay.com, en.wikipedia.org, login.yahoo.com, facebook.mafiawars.com, my.yahoo.com, game3.pogo.com, friends.myspace.com, tagged.com, worldwinner.com, meebo.com, login.live.com, mypoints.com, maps.google.com, aol.com, pogo.com, mwms.zynga.com, news.yahoo.com, winster.com, netflix.com, fantasysports.yahoo.com, search.aol.com, comcast.net, alotmetrics.com]
But different groups distribute their time differently, both on
universally popular and on more niche sites
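The quantity plotted on these slides, the share of a group's total time spent on each site, can be sketched as below. The record layout `(group, site, seconds)`, the function name, and the toy numbers are all assumptions for illustration, not values from the panel.

```python
from collections import defaultdict

def time_share_by_group(records):
    """records: iterable of (group, site, seconds) tuples.

    Returns {group: {site: fraction of that group's total time on the site}}.
    """
    totals = defaultdict(float)
    per_site = defaultdict(lambda: defaultdict(float))
    for group, site, seconds in records:
        totals[group] += seconds
        per_site[group][site] += seconds
    return {
        group: {site: t / totals[group] for site, t in sites.items()}
        for group, sites in per_site.items()
    }

# Made-up toy records (hypothetical sites and durations).
records = [
    ("female", "social.example", 30), ("female", "games.example", 10),
    ("female", "search.example", 60),
    ("male", "social.example", 20), ("male", "games.example", 5),
    ("male", "search.example", 75),
]
shares = time_share_by_group(records)
```

Normalizing within each group is what allows groups with very different total activity to be compared site by site.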
[Figure: percent of total time spent on site (log scale, 0.1% to 10%) for the same popular sites, white vs. non-white]
[Figure: percent of total time spent on site (log scale, 0.1% to 10%) for the same popular sites, one panel each for Age, Sex, Race, Education, and Income]
Diversity of the Web
Individual-level prediction
How well can one predict an individual’s demographics from their
browsing activity?
• Represent each user by the set of sites visited
• Fit linear models to predict majority/minority for each
attribute on 80% of users
• Tune model parameters using a 10% validation set
• Evaluate final performance on held-out 10% test set
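The protocol above (set-of-sites features, an 80/10/10 split, tuning on validation, one final test evaluation) might look roughly like this. It is a sketch under stated assumptions: the slides say only "linear models", so the logistic regression fit by SGD, the L2 penalty as the tuned parameter, and the synthetic users are all choices made here for illustration.

```python
import math
import random

def split_users(user_ids, seed=0):
    """Shuffle users, then split 80% / 10% / 10% into train, validation, test."""
    ids = sorted(user_ids)
    random.Random(seed).shuffle(ids)
    n_train, n_val = int(0.8 * len(ids)), int(0.1 * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

def score(model, sites):
    """Linear score: bias plus the weights of the sites this user visited."""
    weights, bias = model
    return bias + sum(weights.get(s, 0.0) for s in sites)

def fit(users, labels, ids, l2, epochs=20, lr=0.1, seed=0):
    """Logistic regression on binary site indicators, fit by plain SGD."""
    weights, bias = {}, 0.0
    order, rng = list(ids), random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(order)
        for u in order:
            # Clamp the score so math.exp cannot overflow.
            z = max(-30.0, min(30.0, score((weights, bias), users[u])))
            grad = 1.0 / (1.0 + math.exp(-z)) - labels[u]
            bias -= lr * grad
            for s in users[u]:
                w = weights.get(s, 0.0)
                weights[s] = w - lr * (grad + l2 * w)
    return weights, bias

def accuracy(model, users, labels, ids):
    """Fraction of users whose predicted class (score > 0) matches the label."""
    hits = sum((score(model, users[u]) > 0) == (labels[u] == 1) for u in ids)
    return hits / len(ids)

# Synthetic, perfectly separable toy users (not the Nielsen panel).
labels = {u: u % 2 for u in range(200)}
users = {u: {"shared.example", "a.example" if labels[u] else "b.example"}
         for u in labels}

train, val, test = split_users(users)
# Tune the L2 penalty on the validation set, then evaluate once on the test set.
candidates = [fit(users, labels, train, l2) for l2 in (0.0, 0.01, 0.1)]
best = max(candidates, key=lambda m: accuracy(m, users, labels, val))
test_accuracy = accuracy(best, users, labels, test)
```

Keeping the test users out of both fitting and tuning is the point of the three-way split: the final number is then an honest estimate for unseen users.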
Diversity of the Web
GNU-fu
Diversity of the Web
Individual-level prediction
• Reasonable (∼70-85%)
accuracy and AUC across all
attributes
• Similar performance even
when restricted to top 1k
sites
• Can achieve substantially
better performance when
restricted to “stereotypical”
users (∼80-90%)
[Figure: AUC and Accuracy (x-axes from .5 to 1) for each classification task: College/No College, Under/Over $50,000 Household Income, White/Non-White, Female/Male, Over/Under 25 Years Old]
Diversity of the Web
Individual-level prediction
Highly-weighted sites under the fitted models

Table 2: A selection of the most predictive (i.e., most highly weighted) sites for each classification task.

Attribute                        Large positive weight              Large negative weight
Female                           winster.com, lancome-usa.com       sports.yahoo.com, espn.go.com
White                            marlboro.com, cmt.com              mediatakeout.com, bet.com
College Educated                 news.yahoo.com, linkedin.com       youtube.com, myspace.com
Over 25 Years Old                evite.com, classmates.com          addictinggames.com, youtube.com
Household Income Under $50,000   eharmony.com, tracfone.com         rownine.com, matrixdirect.com

[...] Figure 7, a measure that effectively re-normalizes the majority and
minority classes to have equal size. Intuitively, AUC is the probability
that a model scores a randomly selected positive example higher than a
randomly selected negative one (e.g., the probability that the model
correctly distinguishes between a randomly selected female and male).
Though an uninformative rule would correctly discriminate [...]
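The pairwise reading of AUC given above translates directly into code. The sketch below compares every positive score with every negative score; counting ties as half a win is a common convention assumed here, not something the slide states.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (an assumed convention).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Three of the four positive/negative pairs are ordered correctly.
example = auc([0.9, 0.4], [0.5, 0.1])
```

Because only the ordering of scores matters, the value is unaffected by how large the majority class is, unlike raw accuracy. The quadratic loop is fine for illustration; a rank-based computation would be preferable for large samples.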
Diversity of the Web
Individual-level prediction
Proof of concept browser demo
Diversity of the Web
The real story
(what we actually did)
Diversity on the Web
The real story
• Got several hundred GBs of MegaPanel data from Nielsen
(special thanks to Mainak Mazumdar)
• Discussed possible projects
• Predict user demographics (e.g. real-valued age) from a few
minutes of browsing activity for ad-targeting?
• Infer the number of individuals using the same browser or
behind the same IP?
• Determine the number of actual uniques advertisers are receiving?
• ...
Diversity on the Web
The real story (cont’d)
• Started with predicting real-valued age
• Worked on this for an embarrassingly long time
(various methods, feature selection, etc.)
• Turns out to be difficult to do better than within 10 years of
true age, on average
• Settled for classification on binary outcomes (e.g.,
adult/non-adult) over entire history
• Classification worked reasonably well for age and other
attributes
Diversity on the Web
The real story (cont’d)
• Became curious about why classification worked well
compared to regression
• Generated descriptive statistics across all attributes at the site
and group levels
• Compared site statistics to ZIP code data from the US Census
• Compared time distribution across groups
• Realized that we now had the largest comprehensive study of
demographic diversity on the web
Amarillo College Creery TLC Library Instruction PowerPoint Fall 2015Amarillo College Creery TLC Library Instruction PowerPoint Fall 2015
Amarillo College Creery TLC Library Instruction PowerPoint Fall 2015
 
AprilMay
AprilMayAprilMay
AprilMay
 
Facebook- Beyond Friends, Family and FarmVille
Facebook- Beyond Friends, Family and FarmVilleFacebook- Beyond Friends, Family and FarmVille
Facebook- Beyond Friends, Family and FarmVille
 
Prof. Hendrik Speck - Social Network Analysis
Prof. Hendrik Speck - Social Network AnalysisProf. Hendrik Speck - Social Network Analysis
Prof. Hendrik Speck - Social Network Analysis
 

More from jakehofman

Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
jakehofman
 
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
jakehofman
 
Modeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: NetworksModeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: Networks
jakehofman
 
Modeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: ClassificationModeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: Classification
jakehofman
 
Modeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationModeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalization
jakehofman
 
Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1
jakehofman
 
Modeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at ScaleModeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at Scale
jakehofman
 
Modeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in RModeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in R
jakehofman
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
jakehofman
 
Modeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: OverviewModeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: Overview
jakehofman
 
Modeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation SystemsModeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation Systems
jakehofman
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayes
jakehofman
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scale
jakehofman
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
jakehofman
 
Modeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case StudiesModeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case Studies
jakehofman
 
NYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social ScienceNYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social Science
jakehofman
 
Computational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: ClassificationComputational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: Classificationjakehofman
 
Computational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: RegressionComputational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: Regressionjakehofman
 
Computational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online ExperimentsComputational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online Experimentsjakehofman
 
Computational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data WranglingComputational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data Wrangling
jakehofman
 

More from jakehofman (20)

Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
 
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
 
Modeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: NetworksModeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: Networks
 
Modeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: ClassificationModeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: Classification
 
Modeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationModeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalization
 
Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1
 
Modeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at ScaleModeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at Scale
 
Modeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in RModeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in R
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
 
Modeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: OverviewModeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: Overview
 
Modeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation SystemsModeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation Systems
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayes
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scale
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
 
Modeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case StudiesModeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case Studies
 
NYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social ScienceNYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social Science
 
Computational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: ClassificationComputational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: Classification
 
Computational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: RegressionComputational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: Regression
 
Computational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online ExperimentsComputational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online Experiments
 
Computational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data WranglingComputational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data Wrangling
 

Recently uploaded

Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 

Recently uploaded (20)

Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 

Learning from Web Activity

  • 1. Learning from Web Activity Jake Hofman Yahoo! Research November 18, 2010 Jake Hofman (@jakehofman) Learning from Web Activity TimesOpen, 2010.11.18 1 / 33
  • 2. Outline: (1) Agenda: Just enough philosophy; (2) Case study: Demographic diversity on the Web; (3) Conclusion: Lessons learned
  • 3. Agenda: Size (only kind of) matters. Big Data
  • 4. Agenda: Size (only kind of) matters. Big Data: Lots of data means lots to learn (from)
  • 5. Agenda: Size (only kind of) matters. Big Data: But the “big” part isn’t intrinsically interesting (although large sample sizes are always good)
  • 6. Agenda: Size (only kind of) matters. Big Data: Regardless of size, it’s really about “data jeopardy” (To what question are these data the answer?)
  • 7. Agenda: Tools. Data tools: Shell scripting & Python (munging, glue); R (modeling, visualization)
  • 8. Agenda: Tools. Big Data tools: Hadoop & Pig (filtering, aggregating); Shell scripting & Python (munging, glue); R (modeling, visualization)
  • 9. Agenda: The clean real story. “We have a habit in writing articles published in scientific journals to make the work as finished as possible, to cover all the tracks, to not worry about the blind alleys or to describe how you had the wrong idea first, and so on. So there isn’t any place to publish, in a dignified manner, what you actually did in order to get to do the work ...” -Richard Feynman, Nobel Lecture, 1965 (http://bit.ly/feynmannobel)
  • 10. Outline: (1) Agenda: Just enough philosophy; (2) Case study: Demographic diversity on the Web; (3) Conclusion: Lessons learned
  • 11. Demographic diversity on the Web: The clean story (covering our tracks)
  • 12. Demographic diversity on the Web, with Irmak Sirer and Sharad Goel. How diverse is the Web? To what extent do online experiences vary across demographic groups?
  • 13. Diversity of the Web: Data
      • Representative sample of 265,000 individuals in the US, paid via the Nielsen MegaPanel (http://bit.ly/nielsenonline)
      • Log of anonymized, complete browsing activity from June 2009 through May 2010 (URLs viewed, timestamps, etc.)
      • Detailed individual and household demographic information (age, education, income, race, sex, etc.)
  • 14. Diversity of the Web: Data
      • Transform all demographic attributes to binary variables, e.g., Age → Over/Under 25, Race → White/Non-White, Sex → Female/Male
      • Normalize pageviews to at most three domain levels, sans www, e.g., www.yahoo.com → yahoo.com, us.mg2.mail.yahoo.com/neo/launch → mail.yahoo.com
      • Restrict to top 100k most popular sites
      • Aggregate activity at the site, group, and user levels
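
The normalization rule on this slide can be sketched in a few lines of Python. This is only one reading of "at most three domain levels, sans www", checked against the two examples given on the slide; the function name `normalize_site` and the `max_levels` parameter are assumptions made for illustration, not the authors' actual pipeline (which the talk suggests ran in Pig).

```python
from urllib.parse import urlsplit


def normalize_site(url: str, max_levels: int = 3) -> str:
    """Reduce a URL to its host, drop a leading 'www.', keep the last few labels.

    Hypothetical helper: an interpretation of the slide's rule, not the
    original implementation.
    """
    # urlsplit only fills in the host when a scheme or a leading '//' is present.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    host = urlsplit(url).hostname or ""  # hostname is lowercased, port removed
    if host.startswith("www."):
        host = host[len("www."):]
    labels = host.split(".")
    return ".".join(labels[-max_levels:])


# The two examples from the slide.
print(normalize_site("www.yahoo.com"))
print(normalize_site("us.mg2.mail.yahoo.com/neo/launch"))
```

Stripping `www.` before truncating keeps the two slide examples consistent; other orderings are possible and would differ only on unusual hostnames.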
  • 15. Diversity of the Web: Pig to the rescue
  • 16. Diversity of the Web: Site-level skew. How diverse are site audiences?
      • For each site and attribute, calculate the skew in visitors (e.g., 93% of pageviews on foxnews.com are by White users)
      • For each attribute, plot the distribution of visitor skew across all sites
      [Figure: density of the proportion of White visitors across sites; x-axis from 0.0 to 1.0]
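
As a rough illustration of the skew calculation described on this slide, the sketch below computes, for each site, the share of pageviews that come from users with a given binary attribute. The record layout, the function name `site_skew`, and the toy sites and counts are all made up for the example; the slide does not show the actual data format or code.

```python
from collections import defaultdict


def site_skew(pageviews, attribute):
    """Fraction of each site's pageviews generated by users with `attribute`.

    `pageviews` is an iterable of (site, user_attributes, count) records,
    a hypothetical layout chosen for this sketch.
    """
    total = defaultdict(int)
    with_attribute = defaultdict(int)
    for site, user_attributes, count in pageviews:
        total[site] += count
        if user_attributes[attribute]:
            with_attribute[site] += count
    return {site: with_attribute[site] / total[site] for site in total}


# Toy records (invented numbers, not from the Nielsen panel).
records = [
    ("news.example.com", {"female": True}, 30),
    ("news.example.com", {"female": False}, 10),
    ("games.example.com", {"female": False}, 5),
]
skew = site_skew(records, "female")
print(skew)
```

Plotting a histogram or density of the resulting per-site values, one attribute at a time, would give the kind of distribution the slide refers to.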
  • 17. Diversity of the Web: Site-level skew. [Figure: density across sites of visitor skew (x-axis from 0.0 to 1.0), one panel per attribute: Proportion Female Visitors; Proportion White Visitors; Proportion College Educated Visitors; Proportion Adult Visitors; Proportion of Visitors With Household Incomes Under $50,000]
  • 18. Diversity of the Web: Site-level skew. Many sites have skew close to the average, but there are also popular, highly skewed sites
      Table 1: A selection of popular sites that are homogeneous along various demographic dimensions.
      Attribute | Greater Than 90% | Less Than 10%
      Female | youravon.com, collectionsetc.com | coveritlive.com, needlive.com
      White | foxnews.com, wunderground.com | blackplanet.com, mediatakeout.com
      College Educated | news.google.com, nytimes.com | slumz.boxden.com, sythe.com
      Over 25 Years Old | mail.yahoo.com, apps.facebook.com | nanowrimo.org, cbox.ws
      Household Income Under $50,000 | scarleteen.com, boards.adultswim.com | opentable.com, marketwatch.com
      Surrounding text visible on the slide: [...] visually apparent from Figure 5, there are significant differences in how groups distribute their time on the web. These differences, which, as mentioned above, hold for highly frequented sites such as Facebook and YouTube, are in some cases even more pronounced for lower traffic sites. For instance, the gaming site pogo.com accounts for less than 1% of pageviews among both low and high income users, but low income users spend almost twice as much of their time there.
  • 19. Diversity of the Web: Site-level skew. Many sites have skew close to the average, but there are also popular, highly skewed sites (same Table 1 as on slide 18). This skew persists even when we restrict attention to the top 10k or 1k sites
  • 20. Diversity of the Web: Sites vs. ZIPs. How does the diversity of the online and offline worlds compare? [Figure: densities for Sites and ZIPs (x-axis from 0.0 to 1.0), three panels: Proportion Female; Proportion White; Proportion College Educated]
  • 21. Diversity of the Web: Sites vs. ZIPs (same figure as slide 20). As expected, neighborhoods are more gender-balanced than sites
  • 22. Diversity of the Web: Sites vs. ZIPs (same figure as slide 20). But sites typically have more racially diverse audiences than neighborhoods have residents
  • 23. Diversity of the Web: Sites vs. ZIPs (same figure as slide 20). Skew by education is comparable, with online showing a bias towards higher education
  • 24. Diversity of the Web: Group-level activity
    How does browsing activity vary at the group level?
    [Figure: daily per-capita pageviews (0 to 70) by race, education, sex, and age; groups: Non-White/White, No College/College, Male/Female, Under 25/Over 25]
  • 25. Diversity of the Web: Group-level activity
    How does browsing activity vary at the group level?
    [Figure: daily per-capita pageviews (0 to 70) by race, education, sex, and age; groups: Non-White/White, No College/College, Male/Female, Under 25/Over 25]
    Large differences exist even at the aggregate level (e.g., women on average generate 40% more pageviews than men).
  • 26. Diversity of the Web: Group-level activity
    All groups spend more than a third of their time on a handful of email, search, and social networking sites.
    [Figure: percent of total time spent on site (log scale, 0.1% to 10%) for sites including facebook.com, mail.yahoo.com, google.com, youtube.com, and pogo.com; series: female, male]
  • 27. Diversity of the Web: Group-level activity
    But different groups distribute their time differently, both on universally popular and on more niche sites.
    [Figure: percent of total time spent on site (log scale, 0.1% to 10%) for sites including facebook.com, mail.yahoo.com, google.com, youtube.com, and pogo.com; series: female, male]
  • 28. Diversity of the Web: Group-level activity
    But different groups distribute their time differently, both on universally popular and on more niche sites.
    [Figure: percent of total time spent on site (log scale, 0.1% to 10%) for sites including facebook.com, mail.yahoo.com, google.com, youtube.com, and pogo.com; series: white, non-white]
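The quantity on the vertical axis of these plots, the percent of a group's total time spent on each site, can be sketched as follows. This is an illustration only: the record layout `(user_id, site, seconds)`, the `group_of` mapping, and the function name are assumptions, not the format of the panel data.

```python
from collections import defaultdict


def time_share(records, group_of):
    """Return {group: {site: fraction of that group's total time on the site}}.

    records:  iterable of (user_id, site, seconds) tuples
    group_of: dict mapping user_id -> group label (e.g. "female" / "male")
    """
    totals = defaultdict(lambda: defaultdict(float))
    for user_id, site, seconds in records:
        totals[group_of[user_id]][site] += seconds

    shares = {}
    for group, per_site in totals.items():
        group_total = sum(per_site.values())
        # Normalize within the group so each group's shares sum to 1.
        shares[group] = {site: t / group_total for site, t in per_site.items()}
    return shares
```

Because the shares are normalized within each group, two groups can be compared on the same site even when one group spends far more total time online than the other.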
  • 29. [Figure: percent of total time spent on site (log scale, 0.1% to 10%) for sites including facebook.com, mail.yahoo.com, google.com, youtube.com, and pogo.com, in five panels: Age, Sex, Race, Education, Income]
  • 30. Diversity of the Web: Individual-level prediction
    How well can one predict an individual's demographics from their browsing activity?
    • Represent each user by the set of sites visited
    • Fit linear models to predict majority/minority for each attribute on 80% of users
    • Tune model parameters using a 10% validation set
    • Evaluate final performance on held-out 10% test set
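The four steps above can be sketched in plain Python. This is a rough illustration under stated assumptions, not the code behind the talk: the slides say only "linear models", so L2-regularized logistic regression fitted by stochastic gradient descent is a stand-in, and the function names, the regularization grid, and the learning-rate settings are all hypothetical.

```python
import math
import random


def split_users(user_ids, seed=0):
    """Shuffle users and split them 80/10/10 into train/validation/test."""
    ids = sorted(user_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(0.8 * len(ids))
    n_val = int(0.1 * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def score(weights, bias, sites):
    """Linear score over binary features: one weight per visited site."""
    return bias + sum(weights.get(s, 0.0) for s in sites)


def fit_logistic(examples, l2=0.01, epochs=20, lr=0.1, seed=0):
    """Fit logistic regression by SGD (an assumed choice of linear model).

    examples: list of (set_of_sites, label) with label in {0, 1}
    """
    rng = random.Random(seed)
    examples = list(examples)
    weights, bias = {}, 0.0
    for _ in range(epochs):
        rng.shuffle(examples)
        for sites, label in examples:
            z = max(-30.0, min(30.0, score(weights, bias, sites)))  # avoid overflow
            grad = 1.0 / (1.0 + math.exp(-z)) - label
            bias -= lr * grad
            for s in sites:
                w = weights.get(s, 0.0)
                # L2 shrinkage is applied only to weights of visited sites.
                weights[s] = w - lr * (grad + l2 * w)
    return weights, bias


def accuracy(weights, bias, examples):
    """Fraction of examples whose predicted class matches the label."""
    hits = sum((score(weights, bias, sites) > 0) == bool(label)
               for sites, label in examples)
    return hits / len(examples)


def tune_l2(train, val, grid=(0.0, 0.001, 0.01, 0.1)):
    """Pick the regularization strength with the best validation accuracy."""
    return max(grid, key=lambda l2: accuracy(*fit_logistic(train, l2=l2), val))
```

A typical run would call `split_users`, choose `l2` with `tune_l2` on the training and validation sets, refit on the training set with that value, and report `accuracy` once on the test set, so the test users never influence any modeling choice.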
  • 31. Diversity of the Web: GNU-fu
  • 32. Diversity of the Web: Individual-level prediction
    • Reasonable (~70-85%) accuracy and AUC across all attributes
    • Similar performance even when restricted to top 1k sites
    • Can achieve substantially better performance when restricted to "stereotypical" users (~80-90%)
    [Figure: AUC and accuracy (axes from .5 to 1) for five classification tasks: College/No College, Under/Over $50,000 Household Income, White/Non-White, Female/Male, Over/Under 25 Years Old]
  • 33. Diversity of the Web: Individual-level prediction
    Highly weighted sites under the fitted models
    Table 2: A selection of the most predictive (i.e., most highly weighted) sites for each classification task.
    Attribute | Large positive weight | Large negative weight
    Female | winster.com, lancome-usa.com | sports.yahoo.com, espn.go.com
    White | marlboro.com, cmt.com | mediatakeout.com, bet.com
    College Educated | news.yahoo.com, linkedin.com | youtube.com, myspace.com
    Over 25 Years Old | evite.com, classmates.com | addictinggames.com, youtube.com
    Household Income Under $50,000 | eharmony.com, tracfone.com | rownine.com, matrixdirect.com
    "... Figure 7, a measure that effectively re-normalizes the majority and minority classes to have equal size. Intuitively, AUC is the probability that a model scores a randomly selected positive example higher than a randomly selected negative one (e.g., the probability that the model correctly distinguishes between a randomly selected female and male). Though an uninformative rule would correctly discriminate ..."
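The pairwise reading of AUC quoted above translates directly into code. The sketch below compares every positive example against every negative one and counts ties as half a win, a common convention that the excerpt does not itself state; the function name and the input layout are illustrative.

```python
def auc(scores, labels):
    """Probability that a random positive example outscores a random negative one.

    scores: list of model scores, one per example
    labels: list of 0/1 labels aligned with `scores`
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Compare all positive/negative pairs; a tie counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` gives 0.75, since three of the four positive/negative pairs are ordered correctly. A model that assigns every example the same score gets 0.5 no matter how unbalanced the classes are, which is the sense in which AUC re-normalizes class sizes. The all-pairs loop is quadratic, so a rank-based formulation would be preferable for large test sets.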
  • 34. Diversity of the Web: Individual-level prediction
    Proof of concept browser demo
  • 36. Diversity of the Web: The real story (what we actually did)
  • 37. Diversity on the Web: The real story
    • Got several hundred GBs of MegaPanel data from Nielsen [3]
    [3] Special thanks to Mainak Mazumdar
  • 38. Diversity on the Web: The real story
    • Got several hundred GBs of MegaPanel data from Nielsen [3]
    • Discussed possible projects
        • Predict user demographics (e.g., real-valued age) from a few minutes of browsing activity for ad targeting?
        • Infer the number of individuals using the same browser or behind the same IP?
        • Determine the number of actual unique users advertisers are receiving?
        • ...
    [3] Special thanks to Mainak Mazumdar
  • 39. Diversity on the Web: The real story (cont'd)
    • Started with predicting real-valued age
    • Worked on this for an embarrassingly long time (various methods, feature selection, etc.)
    • Turns out to be difficult to do better than within 10 years of true age, on average
    • Settled for classification on binary outcomes (e.g., adult/non-adult) over entire history
    • Classification worked reasonably well for age and other attributes
  • 40. Diversity on the Web: The real story (cont'd)
    • Became curious about why classification worked well compared to regression
    • Generated descriptive statistics across all attributes at the site and group levels
    • Compared site statistics to ZIP code data from the US Census
    • Compared time distribution across groups
    • Realized that we now had the largest comprehensive study of demographic diversity on the web