Relative Trends in Scientific Terms on Twitter

  1. 1. Relative Trends in Scientific Terms on Twitter
 Victoria Uren, Aba-Sah Dadzie
 The OAK Group, Dept. of Computer Science, The University of Sheffield

  2. 2. Introduction
 •  scientific research traditionally disseminated via journals, books, scientific conferences
 •  new form of discourse – online social media
 –  suitable forum for disseminating scientific research?
 –  do scientists engage with online social media?
 –  are there sufficient amounts of information on scientific topics?
 •  are there suitable metrics for measuring scientific impact online?
 –  between scientists?
 –  for public engagement?
 •  are these new measures comparable to formal metrics?
altmetrics11: Tracking scholarly impact on the social Web

  3. 3. Outline
 •  Aims/Introduction
 •  Related Work
 •  Experiment
 –  Data
 –  Analysis & Results
 •  Conclusions
 •  Next Steps
 •  Acknowledgements

  5. 5. Related Work
 •  Garfield, E. (from 1950s)
 –  father of scientometrics
 •  Priem et al. (2010)
 –  Scientometrics 2.0 as a new metric for measuring scholarly impact on social web
 •  Lane (2010)
 –  need to improve metrics used to measure scientific impact
 •  Michel et al. (2011)
 –  Google NGrams to analyse culture
 –  among other findings, recognised fame for scientists low…
 •  Cheong et al. (2009)
 –  H1N1 spike (trend) detected on Twitter during flu pandemic (May 2009)
 •  Rowe et al. (2011)
 –  influence of content and author features on prediction of active, long term discussions on social web
 •  Kinsella et al. (2011)
 –  using hyperlinked metadata to aid categorisation of topics discussed in online social media

  7. 7. Experiment
 •  exploratory experiment
 –  to determine frequency of occurrence of scientific term usage in online social media
 •  data set
 –  three sets of (scientific) terms selected from UNESCO thesaurus
 –  Google Books NGrams corpus used as a baseline
 –  300 tweets collected in each sample, using Twitter API, for selected terms
 •  frequency/usage analysis
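
The frequency/usage analysis named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `term_shares`, the case-insensitive whole-word matching rule and the four-tweet mini-sample are assumptions made for the example.

```python
import re
from collections import Counter

# Three of the UNESCO thesaurus terms discussed in the deck.
TERMS = ["permafrost", "alkalinity", "phosphorus"]

def term_shares(tweets, terms=TERMS):
    """Return each term's share of all term mentions in a tweet sample.

    Assumption: a tweet counts once for a term if the term appears as a
    whole word, ignoring case.
    """
    counts = Counter()
    for text in tweets:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for term in terms:
            if term in words:
                counts[term] += 1
    total = sum(counts.values())
    return {t: (counts[t] / total if total else 0.0) for t in terms}

# Hypothetical mini-sample, shortened from tweets quoted in the deck.
sample = [
    "Fire and Ice: Permafrost Melt Spews Combustible Methane",
    "The proper total alkalinity for your pool is 100 ppm.",
    "Experts Monitor Methane Release from Permafrost",
    "Greater Phosphorus Efficiency #agriculture",
]
shares = term_shares(sample)
print(shares)
```

The same shares computed over the Google Books NGrams counts would give the baseline distribution to compare against.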

  9. 9. UNESCO Thesaurus 1Gram Terms
 Topic             | Terms
 Physical Sciences | Ionization, Electromagnetism, Crystallography
 Chemical Sciences | Phosphorus, Alkalinity, Microchemistry
 Earth Sciences    | Permafrost, Lithosphere, Glaciology
 •  selection criteria
 –  minimisation of noise due to polysemy
 –  avoidance of scientific terms with other common/colloquial usage
 –  terms unique to a particular topic
 –  words with a single stem
 –  1Grams only

  10. 10. Baseline Dataset – Google 1Grams
 •  obtained from Google Books NGrams corpus [1]
 •  total NGrams by year for three sets of terms
 –  2006 – 116,029
 –  2007 – 126,206
 –  2008 – 111,417
 •  annual variation by topic (of total NGrams baseline dataset)
 –  Chemical Sciences 50-60%
 –  Physical Sciences 30-40%
 –  Earth Sciences ~10%
 •  [1] http://ngrams.googlelabs.com/datasets
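
As a worked example, the percentage ranges above can be turned into approximate 1Gram counts per topic and year. The sketch only multiplies the reported yearly totals by the reported ranges; exact per-topic counts are not given on the slide, and the names `TOTALS`, `SHARE_RANGES` and `implied_counts` are illustrative. Earth Sciences is treated as a point estimate of 10% because the slide gives it only as "~10%".

```python
# Totals per year as reported on the slide.
TOTALS = {2006: 116_029, 2007: 126_206, 2008: 111_417}

# Share ranges as reported (fractions of the baseline total).
SHARE_RANGES = {
    "Chemical Sciences": (0.50, 0.60),
    "Physical Sciences": (0.30, 0.40),
    "Earth Sciences": (0.10, 0.10),  # "~10%", treated as a point estimate
}

def implied_counts(total, share_range):
    """Return the (low, high) counts implied by a share range of a total."""
    low, high = share_range
    return round(total * low), round(total * high)

for year, total in TOTALS.items():
    for topic, share_range in SHARE_RANGES.items():
        low, high = implied_counts(total, share_range)
        print(f"{year} {topic}: {low:,} to {high:,}")
```

These are bounds implied by the slide's rounded percentages, not measured values.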

  11. 11. Baseline Dataset – Google 1Grams

  12. 12. Twitter Dataset
 Sample ID | Collection Period                                            | Elapsed Time (h)
 T-300-1   | Tue Mar 01 20:56:43 GMT 2011 – Thu Mar 03 14:22:18 GMT 2011 | 41
 T-300-2   | Fri Mar 04 02:35:55 GMT 2011 – Sun Mar 06 18:38:05 GMT 2011 | 64
 T-300-3   | Mon Mar 07 20:31:11 GMT 2011 – Wed Mar 09 16:21:36 GMT 2011 | 44
 •  three samples collected, containing 300 consecutive tweets each
 •  ~0.003% of total tweets over collection period
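
The elapsed times in the table can be checked directly from the collection timestamps. The sketch below assumes the timestamp layout shown on the slide and the default English locale for day and month names; `FMT`, `PERIODS` and `elapsed_hours` are names chosen for this example.

```python
from datetime import datetime

# Timestamp layout used on the slides, e.g. "Tue Mar 01 20:56:43 GMT 2011".
# Assumes English day and month abbreviations (the default C locale).
FMT = "%a %b %d %H:%M:%S GMT %Y"

PERIODS = {
    "T-300-1": ("Tue Mar 01 20:56:43 GMT 2011", "Thu Mar 03 14:22:18 GMT 2011"),
    "T-300-2": ("Fri Mar 04 02:35:55 GMT 2011", "Sun Mar 06 18:38:05 GMT 2011"),
    "T-300-3": ("Mon Mar 07 20:31:11 GMT 2011", "Wed Mar 09 16:21:36 GMT 2011"),
}

def elapsed_hours(start, end):
    """Hours between two timestamps given in the slide's format."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 3600

# Rounded to the nearest hour, as the table appears to be.
hours = {sid: round(elapsed_hours(*period)) for sid, period in PERIODS.items()}
print(hours)  # {'T-300-1': 41, 'T-300-2': 64, 'T-300-3': 44}
```

The rounded values match the Elapsed Time column, which suggests the slide reports hours rounded to the nearest integer.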

  14. 14. Twitter c.f. Google NGrams

  15. 15. Twitter c.f. Google NGrams
 •  higher variation in distribution for Twitter sample
 –  however largely in line with Google NGrams
 •  can Google NGrams serve as a suitable baseline?
 –  need to more closely examine variation…
 •  notable peaks in Twitter sample for three terms
 –  Permafrost (Earth Sciences)
 –  Alkalinity (Chemical Sciences)
 –  Phosphorus (Chemical Sciences)
 •  are these potential trends?

  16. 16. Twitter c.f. Google NGrams

  17. 17. Twitter c.f. Google NGrams
 •  Permafrost
 –  17% and 15% in Twitter samples (T-300-1 & 2) – c.f. 5% in G-2006-2008
 –  41 out of 113 tweets (36%) used in scientific context
 –  large number of tweets referred to
 •  online game server [1]
 •  designer case for iPhone
 •  Alkalinity
 –  none found to have scientific content
 –  mostly used in pseudo-scientific health advice
 –  peak in T-300-2 (31 out of 60 tweets – ~50%)
 •  dominated by pH measures in swimming pools & fish tanks
 •  influence probably due to collection period – weekend – engagement in leisure activities
 •  [1] http://www.everquest2.com/Permafrost
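
The figures on this slide are simple proportions, and one way to express a "relative trend" is the ratio of a term's Twitter share to its baseline share. The sketch below reproduces the arithmetic; the ratio itself is an assumed illustration rather than a metric defined in the slides, and the function names are hypothetical.

```python
def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole

def relative_trend(twitter_pct, baseline_pct):
    """Ratio of a term's Twitter share to its baseline share (assumed measure)."""
    return twitter_pct / baseline_pct

# Figures reported on the slide.
permafrost_scientific = percent(41, 113)  # Permafrost tweets used in a scientific context
alkalinity_peak = percent(31, 60)         # the "31 out of 60" figure for T-300-2

# Permafrost in T-300-1 and T-300-2 against 5% in G-2006-2008.
ratios = [relative_trend(p, 5.0) for p in (17.0, 15.0)]

print(round(permafrost_scientific), round(alkalinity_peak), ratios)
```

Under this reading Permafrost appears roughly three times as often in the Twitter samples as in the baseline, although only about a third of those tweets were judged scientific.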

  18. 18. Example Tweets – Permafrost
 •  advert/chat
 –  @HDNinjacp go to Permafrost Its never full : Fri Mar 04 05:21:02 GMT 2011
 –  @Riffy8888 hey Could you Come to my Party birthday Party on CP March 13 Server Permafrost Dock 6:00PST : Sun Mar 06 04:37:13 GMT 2011
 –  Party Server Permafrost Dock Please Go Its An Early Birthday Party For me : Thu Mar 03 01:38:28 GMT 2011
 •  cold
 –  36 inches of permafrost still, I want to stake my bird condo b4 the squirrals knock it down again..bastids..all ofm : Sat Mar 05 01:51:48 GMT 2011
 •  science
 –  Fire and Ice: Permafrost Melt Spews Combustible Methane http://tiny.ly/be8q : Fri Mar 04 16:43:10 GMT 2011
 –  (retweeted) - Experts Monitor Methane Release from Permafrost: Over the past few years, methane levels around the world have b... http://bit.ly/hvVEJX : Wed Mar 02 12:27:25 GMT 2011
 –  RT @NetNewsBuzz: Permafrost Melt Soon Irreversible Without Major Fossil Fuel Cuts http://tinyurl.com/5w8w2oh #oil #climate #CO2 #fossilfuels : Thu Mar 03 02:57:48 GMT 2011

  19. 19. Example Tweets – Alkalinity T-300-2
 •  Chemistry Help Needed! pH, concentration of carbonate species and alkalinity... just got published: http://bit.ly/hUCpz7
 –  URL points to the question on “My Chemistry Tutor” – homework?
 •  retweeted
 –  The proper total alkalinity for your pool is 100 ppm. http://su.pr/8hrxCE : Fri Mar 04 19:02:20 GMT 2011
 –  If the Total Alkalinity in your swimming pool is low, your pH will be low. http://su.pr/8hrxCE : Fri Mar 04 20:34:11 GMT 2011
 •  spam/adverts (including retweets)
 –  @Poet_Carl_Watts: some foods create acidity or alkalinity after they're metabolized...http://ping.fm/GQTvA #KnowledgeIsPower! : Sat Mar 05 02:38:55 GMT 2011
 –  RT @CourtneyPool: Green juice, oh Liquid Emerald Elixir of Life and Alkalinity! Course through my BODY! #juicing : Sun Mar 06 18:34:29 GMT 2011

  20. 20. Twitter c.f. Google NGrams: Phosphorus
 Sample ID | Total | Legislation | Nutrition | Other Sciences | Industry | White Phosphorus
 T-300-1   | 129   | 46          | 16        | 29             | 4        | 5
 T-300-2   | 119   | 4           | 26        | 35             | 9        | 5
 T-300-3   | 171   | 12          | 23        | 37             | 42       | 19
 •  Twitter trends for Phosphorus in sample T-300-3
 –  Industry
 •  takeover of a Brazilian company by the Indian firm United Phosphorus
 –  White Phosphorus
 •  17 retweets of an emotive message (relation to Middle East wars)
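
The category counts for Phosphorus can be summarised per sample to see which category dominates. The sketch below assumes the column order Legislation, Nutrition, Other Sciences, Industry, White Phosphorus (reconstructed from the flattened table) and assumes the categories do not overlap, so that the remainder is the number of tweets outside the five categories. The names `TABLE` and `summarise` are illustrative.

```python
# Category counts transcribed from the slide (column order is an assumption).
CATEGORIES = ["Legislation", "Nutrition", "Other Sciences", "Industry", "White Phosphorus"]
TABLE = {
    "T-300-1": (129, [46, 16, 29, 4, 5]),
    "T-300-2": (119, [4, 26, 35, 9, 5]),
    "T-300-3": (171, [12, 23, 37, 42, 19]),
}

def summarise(total, counts):
    """Return the largest category, all category shares, and the remainder."""
    shares = {c: n / total for c, n in zip(CATEGORIES, counts)}
    top = max(shares, key=shares.get)
    uncategorised = total - sum(counts)  # assumes non-overlapping categories
    return top, shares, uncategorised

summary = {sid: summarise(total, counts) for sid, (total, counts) in TABLE.items()}
for sid, (top, shares, rest) in summary.items():
    print(f"{sid}: top={top} ({shares[top]:.0%}), outside the five categories={rest}")
```

Under these assumptions the Industry category only leads in T-300-3, which is consistent with the takeover news the slide identifies as a trend in that sample.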

  21. 21. Twitter c.f. Google NGrams: Phosphorus
 •  usage largely with scientific content
 –  with relationships, among others, to legal, nutritional & economic context
 –  five main categories identified
 •  Legislation
 –  limits to use in fertiliser, soap
 •  Nutrition
 –  phosphorus content
 •  Other Science
 –  peak phosphorus, pollution
 –  discovery of arsenic replacing phosphorus in a microbe
 –  tweets about new paper on Redfield ratio in organisms
 •  Industry
 –  mergers, prices of Phosphorus-containing goods
 •  White Phosphorus
 –  use in Middle East wars

  22. 22. Example Tweets – Phosphorus
 •  Legislation
 –  RT @YarnPlayCafe: The fact that he wants to repeal the phosphorus ban and kill the Madison lakes is, by itself, enough to #killthisbill ... : Tue Mar 08 02:04:49 GMT 2011
 •  Nutrition
 –  Big, wet snowflakes drift over the farm. To warm up, I try some Horlicks, a wheat/barley/whey drink with lots of calcium & phosphorus. Mmmm. : Tue Mar 01 20:56:43 GMT 2011
 –  Vitamin D acts as an hormone and plays a controlling role in the metabolism of calcium and phosphorus : Sun Mar 06 12:12:36 GMT 2011
 •  Other Science
 –  [java] 129 : Greater Phosphorus Efficiency http://bit.ly/iehsmK #agriculture : Wed Mar 02 14:36:21 GMT 2011
 •  Industry
 –  #stocks #bse #nse Buy United Phosphorus - positive move to tap largest Latin American market; Edelweiss http://dlvr.it/JdSpV : Tue Mar 08 17:22:55 GMT 2011
 –  Enshi : Wugang develops technique to handle high-phosphorus iron ore - Steel Business Briefing (subscri http://uxp.in/30538045 : Tue Mar 08 09:33:05 GMT 2011
 •  White Phosphorus
 –  Dear America, your white phosphorus and depleted uranium can not stop the growth of Iraqs future. Iraq Will Rise. : Wed Mar 02 07:49:21 GMT 2011
 –  @Remroum so first they steal our land, now they want our "tactics" i.e. poetry? i guess the white phosphorus just isnt cutting it anymore. : Sat Mar 05 03:42:44 GMT 2011
 •  ???
 –  @p_kojo - Phosphorus Potassium - Pinocchio , Im so glad we found each other nw we can hav lots of fun :) : Sun Mar 06 13:43:10 GMT 2011

  24. 24. Conclusions – Experiment
 •  recognised challenges
 –  baseline corpus for online social media difficult to obtain
 •  very small (relatively) samples found in Twitter stream
 •  difficult to obtain representative samples
    more effective methods required to extract lower frequency terms
 –  difficulty reproducing experiments
 –  reliability, ethical & privacy issues – due to user-created content
 •  what is a suitable, publicly available baseline corpus?
 –  Google NGrams?
 •  different information collection methods from online social media
 –  coverage of topics may see large variation between corpora
 –  any others?
 •  Wikipedia/DBpedia? TREC?

  25. 25. Engagement with the Web?
•  why do scientists not tweet (or engage much in other social media)?
 –  is the web not seen to enforce sufficient scientific rigour?
 –  do scientists not view the web as a potential audience?
•  is the web audience a suitable peer reviewer?
•  why do scientists hesitate to disseminate information online?
 –  potential for ideas to be stolen?
 –  trust – how to differentiate between valid science and pseudo-science, spam and adverts?
•  social media largely driven by personal interest, sentiment, opinion
 –  may explain low scientific content
 –  more colloquial use of what is traditionally scientific terminology

  26. 26. Implications for Altmetrics
 •  however – some level of scientific discourse on Twitter
 –  e.g., Phosphorus identified as a potential Twitter trend
 •  online social media may still have potential to serve as an altmetric for measuring impact of science
 •  starting from scientometrics – which looks at author features, e.g.,
 –  co-citation
 –  affiliation – relationship to reputation
 •  corresponding features in online social media
 –  followers
 –  retweets – relationship to trust?

  28. 28. Next Steps
•  replicate experiments with larger samples over longer period
 –  more detailed analysis
 •  e.g., hashtag analysis; urls within tweets
 •  focus on terms with more trending potential, e.g., nanostructures, nanosilver
•  consider specific tweets
 –  from scientific media and journals
 –  posted during scientific conferences, congresses
•  comparison with other independent baseline data sets
•  compare Twitter use within different disciplines
 –  influence of interdisciplinary collaboration on use of online social media?
•  create new benchmarks data & experiments
  define alt-metric for scientific term usage in online social media
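
The hashtag and URL analysis proposed above could start from something as simple as the sketch below. It is an assumption about how such an analysis might begin, not a method from the slides: the regular expressions are deliberately loose, and `extract` is a hypothetical helper. The two sample tweets are taken from the example slides.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")

def extract(tweets):
    """Count hashtags (lower-cased) and collect URLs across a list of tweets."""
    tags, urls = Counter(), []
    for text in tweets:
        tags.update(tag.lower() for tag in HASHTAG.findall(text))
        urls.extend(URL.findall(text))
    return tags, urls

# Two tweets quoted in the example slides.
tweets = [
    "RT @NetNewsBuzz: Permafrost Melt Soon Irreversible Without Major Fossil "
    "Fuel Cuts http://tinyurl.com/5w8w2oh #oil #climate #CO2 #fossilfuels",
    "#stocks #bse #nse Buy United Phosphorus - positive move to tap largest "
    "Latin American market; Edelweiss http://dlvr.it/JdSpV",
]
tags, urls = extract(tweets)
print(tags.most_common(), urls)
```

Hashtags such as #climate or #stocks give a cheap signal of the context in which a term is used; shortened URLs would still need resolving before their targets could be classified.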

  29. 29. Acknowledgements
 •  Elizabeth Cano for discussions on collection and use of data from Twitter streams
 •  V.S. Uren & A.-S. Dadzie funded by:
 –  European Commission 7th Framework Programme project SmartProducts (grant no. 231204)

  30. 30. References
 •  Garfield bib - http://garfield.library.upenn.edu/pub.html
 •  Matthew Rowe, Sofia Angeletou and Harith Alani. (2011) Predicting Discussions on the Social Semantic Web, Proc., ESWC (2) 2011: 405-420
 •  Sheila Kinsella, Mengjiao Wang, John Breslin and Conor Hayes. (2011) Improving Categorisation in Social Media using Hyperlinks to Structured Data Sources, Proc., ESWC (2) 2011: 390-404
 •  others in paper references – see http://altmetrics.org/altmetrics11/uren-v0

