11. “The distinguishing thing about
the paranoid style is not that its
exponents see conspiracies or
plots here and there in history,
but that they regard a 'vast' or
'gigantic' conspiracy as the
"motive force" in historical
events... The paranoid spokesman
sees the fate of this conspiracy in
apocalyptic terms--he traffics in
the birth and death of whole
worlds, whole political orders,
whole systems of human values.
He is always manning the
barricades of civilization.”
19. Emergent properties found in very well-read texts,
such as the character type “extremist agent of the law”
35. The NRC emotion lexicon is a list of
words and their associations with eight
emotions (anger, fear, anticipation,
trust, surprise, sadness, joy, and
disgust) and two sentiments (negative
and positive).
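A word-to-category lexicon like this is typically applied by counting how many words in a text are associated with each emotion or sentiment. The sketch below illustrates that idea only. The `TOY_LEXICON` entries and the `emotion_counts` helper are hypothetical stand-ins, not taken from the NRC lexicon file, and the tokenization is deliberately naive.

```python
from collections import Counter

# Toy stand-in for the lexicon: word -> set of associated categories.
# These entries are illustrative assumptions, NOT copied from the NRC file.
TOY_LEXICON = {
    "afraid": {"fear", "negative"},
    "happy": {"joy", "positive"},
    "betray": {"anger", "disgust", "negative"},
}


def emotion_counts(text, lexicon):
    """Count lexicon category hits for each whitespace-separated word in text."""
    counts = Counter()
    for token in text.lower().split():
        # Strip common punctuation so "afraid," matches "afraid".
        word = token.strip(".,!?;:\"'")
        for category in lexicon.get(word, ()):
            counts[category] += 1
    return counts


result = emotion_counts("I am afraid, so afraid. Happy?", TOY_LEXICON)
```

Words missing from the lexicon simply contribute nothing, so the counts reflect only the vocabulary the lexicon covers.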
40. “This is really the biggest paradigm shift in innovation since the Industrial Revolution”
- MIT Professor Eric von Hippel
Crowdsourcing brings widely distributed
wisdom to the process of text analysis
61. Version 1 Gnip PowerTrack Rule
i (hate OR fear OR loathe OR despise OR
dislike OR abhor OR aversion OR afraid OR
scared OR dismay OR dread OR horror OR
alarm OR frightened OR frightful OR
horrified OR terrified) "american politics"
lang:en -is:retweet
First Person Tweet Collection
62. Version 2 Gnip PowerTrack Rule
(hate OR fear OR loathe OR despise OR
dislike OR abhor OR aversion OR afraid OR
scared OR dismay OR dread OR horror OR
alarm OR frightened OR frightful OR
horrified OR terrified) "american politics"
lang:en -is:retweet
First Person Tweet Collection
63. First Person Tweet Collection
Version 3 Gnip PowerTrack Rule
("i fear" OR "i am afraid" OR "i'm scared"
OR "i am scared" OR "i am worried" OR
"i'm worried" OR "i dread" OR "i am
horrified" OR "i worry" OR "i feel afraid"
OR "i feel scared" OR "i am terrified" OR "i
feel terrified" OR "i feel worried" OR
"worries me" OR "scares me" OR
"frightens me" OR "horrifies me" OR
"terrifies me") (trump) lang:en -is:retweet
64. Version 4 Gnip PowerTrack Rule
("i fear" OR "i am afraid" OR "i'm scared" OR
"i am scared" OR "i am worried" OR "i'm
worried" OR "i dread" OR "i am horrified"
OR "i worry" OR "i feel afraid" OR "i feel
scared" OR "i am terrified" OR "i feel
terrified" OR "i feel worried" OR "worries
me" OR "scares me" OR "frightens me" OR
"horrifies me" OR "terrifies me") (libtard OR
democrat OR liberal) lang:en -is:retweet
First Person Tweet Collection
65. Version 5 Gnip PowerTrack Rule
("i fear" OR "i am afraid" OR "i'm scared" OR
"i am scared" OR "i am worried" OR "i'm
worried" OR "i dread" OR "i am horrified" OR
"i worry" OR "i feel afraid" OR "i feel scared"
OR "i am terrified" OR "i feel terrified" OR "i
feel worried" OR "worries me" OR "scares
me" OR "frightens me" OR "horrifies me" OR
"terrifies me") (libtards OR democrats OR
liberals) lang:en -is:retweet
First Person Tweet Collection
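Versions 3 through 5 share one shape: a group of first-person fear phrases joined with OR, a group of target terms joined with OR, then `lang:en -is:retweet`. A minimal sketch of assembling such a rule string from lists follows. The `build_rule` helper is hypothetical, it uses only the operators that appear in the rules above, and it does not validate rule length or syntax against the Gnip service.

```python
def build_rule(phrases, targets, lang="en", exclude_retweets=True):
    """Assemble a rule string shaped like the Version 3-5 rules above."""

    def quote(term):
        # Multi-word phrases are quoted; single words are left bare,
        # mirroring how the rules above are written.
        return f'"{term}"' if " " in term else term

    phrase_clause = "(" + " OR ".join(quote(p) for p in phrases) + ")"
    target_clause = "(" + " OR ".join(quote(t) for t in targets) + ")"
    parts = [phrase_clause, target_clause, f"lang:{lang}"]
    if exclude_retweets:
        parts.append("-is:retweet")
    return " ".join(parts)


rule = build_rule(["i fear", "scares me"], ["trump"])
```

Keeping the phrase list and the target list separate makes it easy to swap targets between versions while holding the first-person phrasing constant.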
76. Checking Inter-Rater Reliability
• We conducted four reliability checks
• Datasets were 200, 200, 100, and 200 items
• We used between 6 and 12 coders
• Fleiss’ kappa = .76, .91, .80, and .85
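For reference, Fleiss' kappa compares the observed agreement among multiple coders with the agreement expected by chance. The sketch below implements the standard formula in plain Python. It assumes every item was coded by the same number of coders, and the small tables are made-up illustrations, not the data behind the kappa values reported above.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a table where table[i][j] is the number of
    coders who assigned item i to category j.

    Assumes every row sums to the same number of coders n (n >= 2).
    """
    n_items = len(table)
    n_coders = sum(table[0])
    n_categories = len(table[0])
    if any(sum(row) != n_coders for row in table):
        raise ValueError("every item must be rated by the same number of coders")

    # Share of all assignments that fell into each category.
    p_j = [
        sum(row[j] for row in table) / (n_items * n_coders)
        for j in range(n_categories)
    ]
    # Per-item agreement: fraction of coder pairs that agree on the item.
    p_i = [
        (sum(c * c for c in row) - n_coders) / (n_coders * (n_coders - 1))
        for row in table
    ]
    p_bar = sum(p_i) / n_items      # mean observed agreement
    p_e = sum(p * p for p in p_j)   # agreement expected by chance
    # Note: undefined (division by zero) if all ratings fall in one category.
    return (p_bar - p_e) / (1 - p_e)


# Made-up example: 2 items, 3 coders, 2 categories, perfect agreement.
perfect = fleiss_kappa([[3, 0], [0, 3]])
# Made-up example: coders split 2-1 on both items.
split = fleiss_kappa([[2, 1], [1, 2]])
```

A value of 1 means complete agreement, while values at or below 0 mean agreement no better than chance.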
77. Checking Validity
• We conducted regular validity checks
• Thousands of observations were validated
• Very few invalid observations overall
• Invalid observations not used for training
• Better quality training data
• The “gold standard”
• Better understanding in the 50+ page codebook
78. • We rank coders all the time.
• CoderRank is the notion that for any
annotation task, simple to complex,
there is a range of human aptitude.
• A small number of coders are fantastic.
• Surprisingly small at times.
• A larger number are awful.
• Especially for hard tasks!
• Most are average.
• ~65-85% valid.
CoderRank℠
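One simple way to picture ranking coders is to order them by the share of their annotations that passed validation. The sketch below does only that. It is an illustrative assumption, not the CoderRank method itself, which the slides do not describe. The `rank_coders` helper and the sample records are hypothetical.

```python
from collections import defaultdict


def rank_coders(validations):
    """Rank coders by the fraction of their annotations marked valid.

    validations: iterable of (coder_id, is_valid) pairs.
    Returns a list of (coder_id, valid_rate), best coder first.
    """
    totals = defaultdict(int)
    valid = defaultdict(int)
    for coder, is_valid in validations:
        totals[coder] += 1
        if is_valid:
            valid[coder] += 1
    rates = {coder: valid[coder] / totals[coder] for coder in totals}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical validation records, for illustration only.
ranking = rank_coders([
    ("a", True), ("a", True),
    ("b", True), ("b", False),
    ("c", False),
])
```

A raw valid rate ignores how many items each coder handled and how hard the task was, so a real ranking would likely need to account for both.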
88. Dr. Stuart W. Shulman
Founder & CEO, Texifter, LLC
Editor Emeritus, Journal of Information Technology & Politics
stu@texifter.com
@stuartwshulman
Thank you for having me!