Ed H. Chi, Palo Alto Research Center
Large-Scale Social Analytics in Wikipedia, Delicious, and Twitter
Abstract
We will illustrate an analytical research approach in social computing. Our research in Augmented Social Cognition is aimed at enhancing the ability of a group of people to remember, think, and reason. The drive to build models and theories for social computing research should further our understanding of how network science, behavioral economics, and evolutionary theories could explain how social systems work. Here we will summarize the published research we conducted on large-scale social analytics in Wikipedia, Delicious, and Twitter, and point out how social analytics can help us understand the intricacies of large social systems.
About the Speaker
Ed H. Chi is area manager and principal scientist at Palo Alto Research Center's Augmented Social Cognition Group. He leads the group in understanding how Web 2.0 and Social Computing systems help groups of people to remember, think, and reason. Ed completed his three degrees (B.S., M.S., and Ph.D.) in 6.5 years at the University of Minnesota, and has been doing research on user interface software systems since 1993. He has been featured and quoted in the press, including the Economist, Time Magazine, LA Times, and the Associated Press. With 20 patents and over 70 research articles, he has won awards for both teaching and research. In his spare time, Ed is an avid Taekwondo martial artist, photographer, and snowboarder.
Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented at IBM NPUC 2010)
1. Image from: http://www.flickr.com/photos/ourcommon/480538715/
Ed H. Chi, Principal Scientist and Area Manager
Peter Pirolli, Lichan Hong, Bongwon Suh, Les Nelson, Gregorio Convertino, Sharoda Paul
Interns: Sanjay Kairam, Jilin Chen, Brent Hecht, Michael Bernstein
Alumni: Raluca Budiu, Bryan Pendleton, Niki Kittur, Todd Mytkowicz, Terrell Russell, Brynn Evans, Bryan Chan, KMRC students
Augmented Social Cognition Area
Palo Alto Research Center
2. 2010-10-22 IBM NPUC 2010 2
To: chi@acm.org
From: Brad Barrish <brad@…removed.for.privacy….com>
Subject: Pancreatic cancer
Date: Thu, 1 Feb 2007 21:37:55 PST
Hey Ed. I'm a fellow del.icio.us user and noticed you bookmark a lot of pancreatic cancer stuff. I'm at home with my dad who was diagnosed a little over a year ago and is now at the tail end of things. I've learned a lot through his treatments and about what's out there. I dunno if it's something you or a family member has, but just wanted to drop you an email. Be well.
Brad
3. Cognition: the ability to remember, think, and reason; the faculty of knowing.
Social Cognition: the ability of a group to remember, think, and reason; the construction of knowledge structures by a group.
– (not quite the same as in the branch of psychology that studies the cognitive processes involved in social interaction, though included)
Augmented Social Cognition: Supported by systems, the enhancement of the ability of a group to remember, think, and reason; the system-supported construction of knowledge structures by a group.
Citation: Chi, IEEE Computer, Sept 2008
9. Spreading Activation in a bi-graph
Computation over a very large data set
– 150 Million+ bookmarks
[Diagram: bipartite graph linking Tags and URLs, labeled P(URL|Tag) and P(Tag|URL)]
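The slide names the technique but not its details, so the following is only a minimal sketch of spreading activation over a tag/URL bipartite graph. It assumes that P(URL|Tag) and P(Tag|URL) are estimated as normalized co-occurrence counts from (tag, url) bookmark pairs; the function names, the in-memory dictionaries, and the `rounds` parameter are illustrative assumptions, not the pipeline used on the 150 Million+ bookmarks.

```python
from collections import Counter, defaultdict


def normalize(counts):
    """Turn a Counter of raw counts into a probability distribution."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def build_conditionals(bookmarks):
    """Estimate P(URL|Tag) and P(Tag|URL) from (tag, url) pairs.

    Assumption: plain relative frequencies, with no smoothing.
    """
    urls_per_tag = defaultdict(Counter)
    tags_per_url = defaultdict(Counter)
    for tag, url in bookmarks:
        urls_per_tag[tag][url] += 1
        tags_per_url[url][tag] += 1
    p_url_given_tag = {t: normalize(c) for t, c in urls_per_tag.items()}
    p_tag_given_url = {u: normalize(c) for u, c in tags_per_url.items()}
    return p_url_given_tag, p_tag_given_url


def spread(activation, transition):
    """Push activation one hop across the bipartite graph."""
    result = defaultdict(float)
    for node, amount in activation.items():
        for neighbor, prob in transition.get(node, {}).items():
            result[neighbor] += amount * prob
    return dict(result)


def spreading_activation(seed_tags, p_url_given_tag, p_tag_given_url, rounds=1):
    """Alternate tag -> URL and URL -> tag hops for a fixed number of rounds."""
    tags = dict(seed_tags)
    urls = {}
    for _ in range(rounds):
        urls = spread(tags, p_url_given_tag)
        tags = spread(urls, p_tag_given_url)
    return tags, urls


# Toy data (hypothetical bookmarks, not from the Delicious data set).
bookmarks = [("python", "a"), ("python", "b"), ("code", "a")]
p_ut, p_tu = build_conditionals(bookmarks)
tags, urls = spreading_activation({"python": 1.0}, p_ut, p_tu, rounds=1)
```

Starting from a single seed tag, the URL activations can be read as a ranking of pages for that tag, and the tag activations as related tags. At the scale mentioned on the slide, the same two hops would presumably be expressed as distributed joins rather than in-memory dictionaries.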
12. What drives contributions to Wikipedia?
Conflicts drive most of the contributions to Wikipedia.
– How do we measure conflicts?
Conflicts cause coordination costs to go up.
– Measuring coordination costs
21. Preferential Attachment: Edits beget edits
– The more previous edits, the more new edits
Growth rate depends on:
N = current population
r = growth rate of the population
$$\frac{dN}{dt} = r \cdot N$$
(dN/dt: growth rate of population; N: current population)
$$N(t) = N_0 \cdot e^{rt}$$
22. Ecological population growth model
– Also depends on environmental conditions
– K, carrying capacity (due to resource limitation)
$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)$$
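To make the difference between the unconstrained and the resource-limited model concrete, here is a small sketch that evaluates both. It uses the standard closed-form solution of the logistic equation, N(t) = K / (1 + ((K - N0) / N0) e^{-rt}), and checks it against a simple Euler integration of dN/dt = rN(1 - N/K). The parameter values are arbitrary illustrations and are not fitted to Wikipedia data.

```python
import math


def exponential(n0, r, t):
    """Solution of dN/dt = r*N: unconstrained growth."""
    return n0 * math.exp(r * t)


def logistic(n0, r, k, t):
    """Closed-form solution of dN/dt = r*N*(1 - N/K)."""
    return k / (1.0 + ((k - n0) / n0) * math.exp(-r * t))


def logistic_euler(n0, r, k, t_end, dt=0.001):
    """Numerically integrate the logistic equation with forward Euler steps."""
    n = n0
    for _ in range(int(round(t_end / dt))):
        n += dt * r * n * (1.0 - n / k)
    return n


# Illustrative parameters (assumptions, not estimates from the talk).
N0, R, K = 10.0, 0.5, 1000.0
for t in (0, 5, 10, 20, 40):
    print(t, round(exponential(N0, R, t), 1), round(logistic(N0, R, K, t), 1))
```

Early on the two curves are nearly identical; once N becomes a noticeable fraction of K, the factor (1 - N/K) slows growth and the logistic curve levels off at the carrying capacity while the exponential keeps climbing.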
23. Follows a logistic growth curve
[Plot label: New Article]
24. Biological system
– Competition increases as the population hits the limits of the ecology
– Advantage goes to members of the population that have competitive dominance over others
Analogy
– Limited opportunities to make novel contributions
– Increased patterns of conflict and dominance
33. wherever justin wants me to be
User ID 71097545
User ID 77503970
Justin Biebers heart!
User ID 134222427
Jonasbieberland3
Bieber Island
User ID 91705969
34. n = 10,000 users with 5 or more tweets
All Twitter Users
35. n = 2,965 users with 5 or more tweets
Users w/ Informative Location in the United States
43. Which tweet features are associated with retweets?
Retweet Model
– # Retweet ~ function(f1, f2, …, fn), where fi are simple features extracted from a tweet
74M tweets from Twitter Stream API
– Characterization
– 2~3% sample
– Hadoop / HBase / MapReduce
44. Two Types of Features
Contextual Features
– # Followees: 395
– # Followers: 1,400
– # Favorite: 1,657
– # Day: (since June 17, 2008)
– # Past tweets: 21,000
Content Features
– URL
– Hashtag
– Mention
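As an illustration of what "simple features extracted from a tweet" could look like, the sketch below derives the three content features (URL, hashtag, mention) with regular expressions and collects the contextual features from an author record. The regular expressions, the dictionary keys, and the `author` record layout are assumptions made for this example; the slides do not specify how the features were computed.

```python
import re
from datetime import date

# Deliberately simple patterns (assumptions, not Twitter's own entity parser).
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"(?<!\w)#\w+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")


def content_features(text):
    """Features that depend only on the tweet text."""
    # Remove URLs first so that a fragment such as "/#top" is not counted as a hashtag.
    stripped = URL_RE.sub(" ", text)
    return {
        "has_url": bool(URL_RE.search(text)),
        "n_hashtags": len(HASHTAG_RE.findall(stripped)),
        "n_mentions": len(MENTION_RE.findall(stripped)),
    }


def contextual_features(author, today):
    """Features that describe the author rather than the tweet text."""
    return {
        "n_followees": author["followees"],
        "n_followers": author["followers"],
        "n_favorites": author["favorites"],
        "account_age_days": (today - author["joined"]).days,
        "n_past_tweets": author["past_tweets"],
    }


def feature_vector(text, author, today):
    """Merge both feature groups into one record f1..fn for a retweet model."""
    features = content_features(text)
    features.update(contextual_features(author, today))
    return features


# Example values taken from the slide; the tweet text itself is made up.
author = {"followees": 395, "followers": 1400, "favorites": 1657,
          "joined": date(2008, 6, 17), "past_tweets": 21000}
tweet = "RT @edchi: great talk http://example.com/#a #hci #cscw"
vector = feature_vector(tweet, author, today=date(2010, 10, 22))
```

A record like `vector` is the kind of input a regression or classification model of retweet counts would consume; which model family was actually used is not stated on these slides.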
47. Recommendation Algorithm: Combining Sources and Models
[Diagram labels]
– URL sources: My Friends’ URLs, Popular URLs
– Topic Relevance Model: My Tweets, My Friends’ Tweets
– Social Ranking Model: My Friends’ Network and Tweeting Pattern
– Output: Recommendations
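The diagram shows candidate URLs being scored by a topic relevance model and a social ranking model, but not how the two are computed or combined. The sketch below is therefore a hypothetical reading: topic relevance is taken as cosine similarity between a bag-of-words profile of the user's tweets and the text associated with a URL, social rank as the fraction of friends who posted the URL, and the two are mixed with an invented weight `alpha`. None of these choices is confirmed by the slides.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topic_relevance(profile_texts, url_text):
    """Similarity between a user's tweets and the text around a URL."""
    profile = Counter(w for t in profile_texts for w in t.lower().split())
    return cosine(profile, Counter(url_text.lower().split()))


def social_rank(url, friends_urls):
    """Fraction of friends who posted the URL (a simple voting assumption)."""
    if not friends_urls:
        return 0.0
    votes = sum(1 for posted in friends_urls.values() if url in posted)
    return votes / len(friends_urls)


def recommend(candidates, my_tweets, friends_urls, alpha=0.5, top_k=3):
    """Rank candidate URLs by a weighted mix of both scores.

    alpha is a hypothetical mixing weight, not a value from the talk.
    """
    scored = []
    for url, text in candidates.items():
        score = (alpha * topic_relevance(my_tweets, text)
                 + (1.0 - alpha) * social_rank(url, friends_urls))
        scored.append((score, url))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in scored[:top_k]]


# Made-up toy data for illustration only.
candidates = {"u1": "wikipedia edit conflict study", "u2": "snowboard gear review"}
my_tweets = ["reading about wikipedia conflict", "wikipedia edit wars"]
friends_urls = {"alice": {"u1"}, "bob": {"u1", "u2"}}
ranking = recommend(candidates, my_tweets, friends_urls)
```

Swapping `my_tweets` for friends' tweets, or restricting `candidates` to friends' URLs versus popular URLs, gives the different source and model combinations the slide title refers to.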
48. Hadoop Compute Cluster
– 50 nodes, depending on project requirement
– ~40TB storage capacity
– Experience with HBase, Pig, Interaction with Lucene, MySQL
Large-scale crawling and analytics experience with
– Wikipedia (all edits up to 2009)
– Delicious data set (200M bookmarks)
– Twitter (70M+ Tweets)
Experience with Large Scale Social Analytics
– Example 1: Visual analytics in Wikipedia (wikidashboard.com)
– Example 2: Search engines for social bookmarks (mrtaggy.com)
– Example 3: Recommenders for Twitter news (zerozero88.com)
50. Image from: http://www.flickr.com/photos/ourcommon/480538715/
Research Vision: Understand how social computing systems can enhance the ability of a group of people to remember, think, and reason.
Understand and support Collective Intelligence by modeling social group behaviors and testing prototype tools in Living Labs
http://asc-parc.blogspot.com
http://www.edchi.net
echi@parc.com