SMAC LAB, LSU

OCT 26, 2018
SMAC Talks
Telling Stories from Social Media Text 2
Instructor: Dr. Ke (Jenny) Jiang
Telling Stories from Social Media Text 1
Collect Text Data: Python, TCAT, Crimson Hexagon…
Clean Text: remove stop words; stemming; replace “/”, “@” and “|” with a space; convert the text to lower case; remove punctuation
Text Analysis: frequency analysis, sentiment analysis, entity detection, topic modeling
Visualization: word clouds, semantic network analysis (R, Gephi)
Tell a Story
Step 1: Collect Text Data Using Crimson Hexagon
Export 10,000 posts
Step 2: Create/Load a text file
a. Set Working Directory — Directory for trump.csv
b. Get a Sample
Step 3: Create/Load a Text File
Output
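The load-and-sample step can be sketched in Python, one of the tools named earlier in the deck. This is a minimal sketch, not the code shown on the slides: the function name load_sample, the sample size of 1,000 and the fixed seed are assumptions made for illustration, and only the file name trump.csv comes from the slides.

```python
import csv
import random


def load_sample(path, n, seed=42):
    """Read every row of a CSV file and return a reproducible random sample of n rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    rng = random.Random(seed)  # fixed seed so the same sample is drawn on every run
    return rng.sample(rows, min(n, len(rows)))


# Example call, assuming trump.csv sits in the current working directory:
# sample = load_sample("trump.csv", 1000)
```

Each sampled row is a dictionary keyed by the column names in the header of the exported file, so the text column can be picked out by name once the export format is known.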
Step 4: Install and load the Required Packages
Create a text Corpus
Step 5: Load the Data as a Corpus
Output
Step 6: Clean Text
Replace “/”, “@” and “|” with a space
Convert the text to lower case
Remove numbers
Remove common English stop words
Remove your own stop words
Remove punctuation
Eliminate extra white space
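The cleaning operations in Step 6 can be chained into one function. The sketch below is in Python and is not the code shown on the slides; the stop-word lists are tiny illustrative stand-ins (a real run would use a full English list), and the custom stop word "rt" is a hypothetical example. Punctuation is stripped before stop words here so that a word followed by a comma or period still matches the list.

```python
import re
import string

# Tiny illustrative list; an assumption, not a complete English stop-word list.
STOP_WORDS = {"the", "a", "an", "and", "are", "is", "of", "to", "in", "for", "on", "with"}
# Hypothetical example of "your own" stop words.
CUSTOM_STOP_WORDS = {"rt"}


def clean(text, custom_stop_words=CUSTOM_STOP_WORDS):
    text = re.sub(r"[/@|]", " ", text)      # replace "/", "@" and "|" with a space
    text = text.lower()                     # convert to lower case
    text = re.sub(r"\d+", "", text)         # remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove ASCII punctuation
    words = [w for w in text.split()        # split() also collapses extra white space
             if w not in STOP_WORDS and w not in custom_stop_words]
    return " ".join(words)


cleaned = clean("RT @user: Social Media Giants are silencing 2 million people/users!")
```

Stemming is left out of this sketch; it would be applied to the word list before the final join.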
ConText Tutorials
Parse CSV Files
Stemming
Frequency
Bigram
Topic Modeling
Sentiment Analysis
Word Clouds of Topics and Sentiment
Entity Detection
Results
Semantic Network Analysis
Theoretical Foundation
Words are hierarchically clustered in memory (Collins & Quillian, 1972), and thus spatial models that illustrate the relations among words are representative of meaning (see Barnett & Woelfel, 1988).
Theoretical Foundation
Complex Associations
Salient Concepts
Jiang, K., Benefield, G., Yang, J. F., & Barnett, G. A. (2017). Mapping Articles on China in Wikipedia: An Inter-Language Semantic Network Analysis. In Proceedings of the 50th Hawaii International Conference on System Sciences (HICSS-50).
Characteristic
The extraction of the semantic network should be restricted to the manifest content, which consists of a fixed vocabulary.
Latent meanings of the manifest content can be inferred by interpreting patterns of concept associations in the research context.
Choice of Vocabulary
The most frequent words
Entities
Entities + Sentiment
Co-occurrence Matrices
Concept links in a semantic network are produced by measuring concept co-occurrence.
e.g., Semantic Network of the Co-mention of Countries in News about Trump in 2017
Author: Ke Jiang
* Label size: frequency with which a country name appeared in titles of Trump news in the NYT in 2017
* Link weight: number of times two countries were co-mentioned in titles of Trump news in the NYT in 2017
“Trump Calls for Closer Relationship Between U.S. and Russia”
Often, every concept pair within a given unit of analysis is assigned the same connection weight, regardless of the distance between the two concepts.
The unit of analysis can be a sentence, a paragraph, an article (the syntactical units typical of content analysis), or a sliding window of 5 or 7 words.
Co-occurrence Matrices
“Social Media Giants are silencing millions of people.”
social media giant silence million people
         social  media  giant  silence  million  people
social        0      1      1        1        1       1
media         1      0      1        1        1       1
giant         1      1      0        1        1       1
silence       1      1      1        0        1       1
million       1      1      1        1        0       1
people        1      1      1        1        1       0
Co-occurrence Matrices
“Social Media Giants are silencing millions of people.”
social media giant silence million people
         social  media  giant  silence  million  people
social        0      1      1        1        1       0
media         1      0      1        1        1       1
giant         1      1      0        1        1       1
silence       1      1      1        0        1       1
million       1      1      1        1        0       1
people        0      1      1        1        1       0
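The two matrices for the example sentence appear consistent with two units of analysis: the whole sentence (every pair linked) and a 5-word sliding window (social and people never share a window). That reading is an interpretation of the slides, not something they state. A minimal Python sketch of both variants, with the hypothetical function name cooccurrence:

```python
from itertools import combinations


def cooccurrence(tokens, window=None):
    """Binary co-occurrence matrix as a dict of dicts.

    window=None treats the whole token list as one unit of analysis;
    otherwise two words are linked if they fall inside the same sliding
    window of `window` consecutive tokens.
    """
    vocab = list(dict.fromkeys(tokens))  # unique words, original order kept
    matrix = {a: {b: 0 for b in vocab} for a in vocab}
    size = len(tokens) if window is None else window
    for start in range(max(1, len(tokens) - size + 1)):
        unit = tokens[start:start + size]
        for a, b in combinations(unit, 2):
            if a != b:
                matrix[a][b] = 1
                matrix[b][a] = 1
    return matrix


tokens = ["social", "media", "giant", "silence", "million", "people"]
sentence_level = cooccurrence(tokens)           # every pair co-occurs
window_level = cooccurrence(tokens, window=5)   # social and people are too far apart
```

Summing such matrices over all units in a corpus, instead of keeping 0/1 values, would give weighted links.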
Concept Centralities
The salience of a concept can be measured through the analysis of concept centrality, which reflects the location and importance of a concept in relation to other concepts in the network (Freeman, 1979; Wasserman & Faust, 1994).
Degree: the total number of direct links.
Betweenness: the extent to which a node lies on the shortest paths connecting other nodes in the network (Freeman, 1979).
Closeness: the average distance from a node to all other nodes in the network (Freeman, 1979).
Eigenvector centrality: an indicator of a node’s overall centrality in a network (Bonacich, 1972).
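Degree and closeness can be computed directly from a list of links. The Python sketch below uses a hypothetical toy network and assumes the network is undirected and connected; closeness is reported as the average distance to all other nodes, matching the definition above, so a smaller value means a more central concept. Betweenness and eigenvector centrality are omitted to keep the sketch short.

```python
from collections import deque

# Hypothetical toy network; the links are invented for illustration.
edges = [("social", "media"), ("media", "giant"), ("giant", "silence"), ("media", "people")]

graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def degree(node):
    """Number of direct links attached to the node."""
    return len(graph[node])


def average_distance(node):
    """Mean shortest-path length from the node to every other node (breadth-first search)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    others = [d for n, d in dist.items() if n != node]
    return sum(others) / len(others)
```

In this toy network "media" has the highest degree and the smallest average distance, so both measures mark it as the most salient concept.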
Jiang, K., Anderton, B. N., Ronald, P. C., & Barnett, G. A. (2018). Semantic Network Analysis Reveals Opposing Online Representations of the Search Term “GMO”.
Global Challenges, 2.
Concept Centralities
InDegree: the number of ties that a semantic object receives.
OutDegree: the number of ties that a semantic subject initiates.
The semantic subject is the actor or object primarily doing or causing something, while the semantic object is the actor or object to which it is done (Dixon, 1991).
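In a directional network each tie runs from a semantic subject to a semantic object, so the two degree measures are simple counts over (subject, object) pairs. A minimal Python sketch, with hypothetical ties labelled A, B and C:

```python
from collections import Counter

# Hypothetical directed ties: (semantic subject, semantic object).
ties = [("A", "B"), ("A", "C"), ("C", "A")]

out_degree = Counter(subject for subject, _ in ties)  # ties initiated
in_degree = Counter(obj for _, obj in ties)           # ties received
```

Here A initiates two ties and receives one, while B only receives.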
Directional Semantic Network Example
Jiang, K., Barnett, G. A., Taylor, L. D., & Feng, B. (2018). Dynamic Co-evolutions of Peace Frames in the United States, Mainland China, and Hong Kong: A Semantic Network Analysis. In B. Cook (Ed.), Handbook of Research on Examining Global Peacemaking in the Digital Age (pp. 145-168). Hershey, Pennsylvania: IGI Global.
Concept Association
In semantic networks, the association between two concepts, A and B, can be defined as the chance of reading about A given that one reads about B in a random unit.
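Read as a conditional probability, the association of A with B is the share of units mentioning B that also mention A. A minimal Python sketch under that reading; the function name association and the example units are hypothetical:

```python
def association(units, a, b):
    """Estimate P(a | b): the share of units containing b that also contain a."""
    with_b = [unit for unit in units if b in unit]
    if not with_b:
        return 0.0
    return sum(1 for unit in with_b if a in unit) / len(with_b)


# Hypothetical units of analysis (e.g. sentences), each reduced to its set of concepts.
units = [
    {"gmo", "food", "safety"},
    {"gmo", "label"},
    {"gmo", "food"},
    {"gmo", "risk"},
]

food_given_gmo = association(units, "food", "gmo")
gmo_given_food = association(units, "gmo", "food")
```

The measure is not symmetric: in this toy data every unit about food also mentions gmo, but only half of the units about gmo mention food.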
Semantic network analysis focuses on examining concept associations by looking at the frequency with which concepts co-occur or appear in close proximity.
Jiang, K., Anderton, B. N., Ronald, P. C., & Barnett, G. A. (2018). Semantic Network Analysis Reveals Opposing Online Representations of the Search Term “GMO”. Global Challenges, 2.
Co-evolution of Texts
Based on the results of semantic network analysis, the dynamic evolution and co-evolution of the semantic content can be tracked through the analysis of semantic networks at different points in time.
Scholars can explore the correlation between semantic networks at different times by conducting QAP correlation analysis.
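QAP (quadratic assignment procedure) correlation compares two networks over the same set of concepts: it correlates their off-diagonal cells, then permutes the rows and columns of one matrix together many times to see how often a correlation that large arises by chance. The Python sketch below is a minimal illustration of that idea, not the software used in the cited studies; the two matrices, the number of permutations and the seed are hypothetical.

```python
import random


def off_diagonal(matrix, order):
    """Off-diagonal cells of the matrix after relabelling its nodes by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def qap(m1, m2, permutations=1000, seed=0):
    """Return the observed correlation and the share of permutations that match or exceed it."""
    rng = random.Random(seed)
    identity = list(range(len(m1)))
    y = off_diagonal(m2, identity)
    observed = pearson(off_diagonal(m1, identity), y)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # same permutation applied to rows and columns
        if pearson(off_diagonal(m1, order), y) >= observed:
            hits += 1
    return observed, hits / permutations


# Hypothetical co-occurrence matrices for the same four concepts at two points in time.
time1 = [[0, 3, 1, 0], [3, 0, 2, 1], [1, 2, 0, 4], [0, 1, 4, 0]]
time2 = [[0, 2, 1, 0], [2, 0, 2, 1], [1, 2, 0, 3], [0, 1, 3, 0]]

observed, p_value = qap(time1, time2)
```

Permuting rows and columns together keeps each network's structure intact while breaking the node-by-node correspondence between the two networks, which is why ordinary significance tests on the cell values are not used here.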
[Figure: correlations between the semantic networks of U.S. & Hong Kong (u&h), China & Hong Kong (c&h), and U.S. & China (u&c) news, by year, 1995 to 2014; vertical axis from 0.1 to 1. Annotations: Iraq War; Annapolis Conference; China becomes the world's second-largest economy; the convergence of U.S. and China's news; Hong Kong news' flexibility and independence.]
Jiang, K., Barnett, G. A., Taylor, L. D., & Feng, B. (2018). Dynamic Co-evolutions of Peace Frames in the United States, Mainland China, and Hong Kong: A Semantic Network Analysis. In B. Cook (Ed.), Handbook of Research on Examining Global Peacemaking in the Digital Age (pp. 145-168). Hershey, Pennsylvania: IGI Global.
Word Co-Occurrence Network
ConText Tutorials
Codebook
The most frequent words
Entities
Entities + Sentiment
Codebook1: The most frequent words
Codebook2: Location
Codebook3: Entities + Sentiment
codebook3_nodel_attributes: Entities + Sentiment
Word Co-Occurrence Network
ConText Tutorials
Visualization
Top 50 + Topics
Location
Entities + Sentiments
Entities + Topic Model
