Social Cohesion and Emotion Analysis of Social Media
During 2020 Wildfires: A Case Study
INFORMS 2021
Alexander Gilgur
Jose Emmanuel Ramirez-Marquez
The research performed by Jose E. Ramirez Marquez leading to these results has received funding from the National Science Foundation, CRISP Type 2 /
Collaborative Research: Resilience Analytics: A Data-Driven Approach for Enhanced Interdependent Network Resilience, Award number 1541165.
Scenario Background
Wildfires have been a fact of life in California for many
years, including the 2018, 2019, and 2020 fire seasons. Comparing
these years provides a way to establish a baseline and to
tease out the interaction of wildfires with other events.
The SF Bay Area is usually not affected by wildfires, which tend
to ravage the Santa Rosa / Napa / Sonoma areas, as well
as Southern California.
In 2020, the SF Bay Area was hit by a rare combination of
wildfires, triggered by a series of dry lightning storms,
which set fire to the hills surrounding the Bay (Santa Cruz
Mountains, Coastal Mountains, the range of populated hills
stretching from East San Jose to Pleasanton), in addition
to the “usual” danger zones.
Scenario
2020
SF Bay Area:
● COVID
● Protests
● Wildfires
● Emotions
● Cohesion
Can we predict Cohesion from
Sentiment & Emotions?
Data Sources
https://www.fire.ca.gov/stats-events/
Meltwater
The Timeline
Reference: https://en.wikipedia.org/wiki/August_2020_California_lightning_wildfires
Fires started, by date:
● August 11, 2020: 1 fire started
● August 12, 2020: 1 fire started
● August 15, 2020: 2 fires started
● August 16, 2020: 7 fires started
● August 17, 2020: 5 fires started
● August 18, 2020: 3 fires started
● August 19, 2020: 2 fires started
● August 20, 2020: 1 fire started
Containment:
● September 22, 2020: Most major fires contained
● January 5, 2021: All August wildfires contained
Measuring Social Cohesion
Statistical Analysis
● Z-score
● Inverse CV
Cohesion
Emotion &
Sentiment
Analysis
Social
Network
Analysis
Echo-Chamber Effect
● Amplification
Social Network Metrics
● Tie Strength
● Centrality
syuzhet
sentimentr
nltk.vader
Social Network Analysis: Degree Distribution
Tweets mentioning SF Bay
Area cities & counties and
wildfires - before, during,
and after major wildfires
Background:
Degree Centrality (degree) of
a node (user) is the number
of connections (edges) it has.
(source)
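As a minimal sketch of the definition above, the following counts each node's edges in a small undirected edge list. The user names and edges are made up for illustration; in the study itself, a user's connections are counted as followers plus followed users.

```python
from collections import Counter

def degree_centrality(edges):
    """Return {node: degree} for an undirected edge list of (u, v) pairs."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1   # each edge adds one connection to both endpoints
        degree[v] += 1
    return dict(degree)

# Hypothetical toy network: four users, four connections.
edges = [("ann", "bob"), ("ann", "cal"), ("ann", "dee"), ("bob", "cal")]
degrees = degree_centrality(edges)
```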
Before Bay Area Wildfires (plot: number of nodes vs. degree)
111 low-degree (<= 100K) nodes.
6 high-degree (> 100K) nodes.
Max Degree Centrality = 900K.
During Bay Area Wildfires (plot: number of nodes vs. degree)
22.6K low-degree (<= 100K) nodes.
333 high-degree (> 100K) nodes.
Max Degree Centrality = 47M.
After Containment of Major Bay Area Wildfires (plot: number of nodes vs. degree)
1080 low-degree (<= 100K) nodes.
16 high-degree (> 100K) nodes.
Max Degree Centrality = 4.2M.
Structural Cohesion
Structural Cohesion is defined as the minimal
number of actors in a social network that need to be
removed to disconnect the group
(source)
Logically, removal of higher-degree nodes in a social
network (degree outliers) would more likely result in
network disconnection.
=> count of degree outliers in a social network can
be used as a measurable proxy for structural
cohesion
Before Wildfires: SC = 5
During Wildfires: SC = 196
After Wildfires: SC = 25
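The proxy described above can be sketched as a count of degree outliers. The slides do not state how an outlier is defined, so the upper Tukey fence (Q3 + 1.5 * IQR) used here is an assumption, as are the function name and the toy degrees.

```python
import numpy as np

def count_degree_outliers(degrees, k=1.5):
    """Count nodes whose degree exceeds the upper Tukey fence Q3 + k * IQR.

    The fence rule is an assumed outlier definition, not the authors'.
    """
    d = np.asarray(degrees, dtype=float)
    q1, q3 = np.percentile(d, [25, 75])
    upper_fence = q3 + k * (q3 - q1)
    return int(np.sum(d > upper_fence))

# Hypothetical degrees: eight ordinary users and two very well-connected accounts.
toy_degrees = [10, 12, 15, 18, 20, 22, 25, 30, 5_000, 90_000]
sc_proxy = count_degree_outliers(toy_degrees)
```

A different outlier rule (for example, a fixed degree cutoff) would give a different count, so the proxy is only comparable across periods when the same rule is used throughout.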
Sentiment Cohesion Metric
Sentiment
Analysis
Tools
Sentiment Cohesion
Sentiment Cohesion: Absolute Inverse CV: Benchmark
Benchmark:
Non-Specific Bay Area Sentiment
Absolute Inverse CV is the signal to
noise ratio that can be used as a
measure of cohesiveness in positive
or negative sentiment.
Coefficient of Variation: CV = σ / μ (standard deviation over mean); Absolute Inverse CV = |μ| / σ
Abs-Inverse-CV spikes:
Positive Sentiment:
● 2020-04-20: “4/20”
● 2020-10-26: final
presidential debate
Negative Sentiment:
● 2020-06-08: protests
● 2020-08-17: wildfires
● 2020-10-26: final
presidential debate
● 2020-11-23: lockdown
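A minimal sketch of the Absolute Inverse CV as a weekly cohesion measure, assuming a datetime-indexed series of per-tweet sentiment scores. The function name, the weekly frequency alias, and the toy scores are illustrative assumptions; the slides apply the measure to positive and negative sentiment separately.

```python
import pandas as pd

def abs_inverse_cv(scores: pd.Series, freq: str = "W") -> pd.Series:
    """Per-period |mean| / std of a datetime-indexed score series."""
    grouped = scores.resample(freq)
    return grouped.mean().abs() / grouped.std()

# Hypothetical daily scores for two weeks starting Monday 2020-08-10.
days = pd.date_range("2020-08-10", periods=14, freq="D")
toy_scores = pd.Series(
    [-0.80, -0.70, -0.90, -0.80, -0.75, -0.85, -0.80,   # week 1: uniformly negative
     -0.90, 0.80, -0.10, 0.50, -0.60, 0.30, 0.00],      # week 2: mixed
    index=days,
)
weekly = abs_inverse_cv(toy_scores)
```

In this toy data the first week scores high because everyone is similarly negative, while the second week scores near zero because the scores disagree and average out.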
Absolute Inverse CV: SF Bay Area Wildfires
Wildfires:
Bay Area Sentiment
Inverse-CV spikes:
Positive Sentiment:
● 2020-08-24: no new fires
Negative Sentiment:
● 2020-08-10: wildfires
● 2020-08-24: wildfires; air
quality dangerous
● 2020-09-21: largest local
fires contained
Absolute Inverse CV is the signal to
noise ratio that can be used as a
measure of cohesiveness in positive
or negative sentiment.
Coefficient of Variation: CV = σ / μ (standard deviation over mean); Absolute Inverse CV = |μ| / σ
Emotion Analysis
Emotion
Analysis
Tools
Emotion Timeline During CA Wildfires
[Figure: emotion word clouds with polarity scores along a timeline marked 2020-08-01, 2020-08-17, 2020-08-24, 2020-09-14, 2020-09-21, and 2020-09-28. Emotions shown: fear, trust, anger, surprise, sadness, joy, anticipation. Polarity values shown: -68, -31, -784, -6400, -774, 342, 3451, 114, 98, -580.]
Weights & values of the 6 emotions:
Fear was the dominant emotion.
Anger effect on polarity was negative.
Surprise was rare. Its effect was uncertain.
Trust and Anticipation effects were positive.
Joy was rare. Its effect on polarity was positive.
Emotion and Cohesion Correlation Analysis
Linear Correlations
Many Features (Sentiment & Emotions)
are cross-correlated => need PCA
Structural Cohesion is:
Most strongly positively correlated with:
● Fear
● Sadness
Weakly positively correlated with:
● Anger
● Disgust
● Trust
Weakly negatively correlated with:
● Sentiment Cohesion
Most strongly negatively correlated with:
● Negative-Sentiment Cohesion
Cohesion Metrics are all intercorrelated
=> need PCA
Nonlinear Monotonic Correlations
Accepting nonlinearity makes things very
structured: we can group strongly correlated
emotions:
X1 = (anticipation, disgust, joy, sadness, trust)
X2 = (anger, fear, surprise)
We can also roll them all into one metric.
Then we can model
C = f(X1, X2)
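The slides do not name the statistic behind the nonlinear monotonic correlations; a common choice is Spearman's rank correlation, which is assumed in the sketch below. It is computed as the Pearson correlation of the ranks, and the toy columns are synthetic stand-ins for emotion scores.

```python
import numpy as np
import pandas as pd

def spearman_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Spearman correlation matrix: Pearson correlation computed on ranks."""
    return df.rank().corr(method="pearson")

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 3.0, size=200)
toy = pd.DataFrame({
    "fear": x,
    "anger": np.exp(x),               # monotonic but strongly nonlinear in fear
    "joy": rng.uniform(size=200),     # unrelated to the other two
})
rho = spearman_matrix(toy)            # rank (monotonic) correlation
pearson = toy.corr(method="pearson")  # linear correlation, for comparison
```

Features whose pairwise rank correlation is close to 1 in absolute value are candidates for the same group, which is how a grouping such as X1 and X2 could be formed.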
Principal Component Analysis (PCA)
PCA finds the linear combinations (principal components, or PCs) of the original variables that maximize
the variances of the principal components.
This results in the covariance between PCs being 0 => the principal components are uncorrelated (linearly independent).
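A minimal sketch of this property, using a plain NumPy SVD in place of whichever PCA implementation the authors used (the slides do not say). The three cross-correlated features are synthetic.

```python
import numpy as np

def pca(X):
    """Return (scores, explained_variance_ratio) for the rows of X via SVD."""
    Xc = X - X.mean(axis=0)                  # center each column
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T                       # principal components (PCs)
    var = s**2 / (len(X) - 1)                # variance of each PC
    return scores, var / var.sum()

rng = np.random.default_rng(1)
base = rng.normal(size=(300, 1))
# Three hypothetical, strongly cross-correlated features.
X = np.hstack([base + 0.1 * rng.normal(size=(300, 1)) for _ in range(3)])
scores, ratio = pca(X)
pc_cov = np.cov(scores, rowvar=False)        # off-diagonal entries are ~0
```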
PCA for Cohesion
For the 4 Cohesion-related metrics, PCA has returned 4 Principal Components (PCs)
The PCs explain:
● pc_0: 51.1 % of the variance
● pc_1: 28.1 % of the variance
● pc_2: 17.6 % of the variance
● pc_3: 3.2 % of the variance
Total:
● 100 % of the variance is explained
Dropping pc_3 would discard only 3.2% of the variance
PCA-Derived Cohesion Metric
The end result is a PCA-derived metric based on the 4 proxies we
defined for social cohesion:
● Structural
● Sentiment:
○ Negative
○ Positive
○ Overall (Compound)
The stepwise changes are due to weekly aggregations used in
deriving the proxies.
The new metric is computed as the length of the
vector built on the Principal Components (PCs).
The PCs are orthogonal; the vector length is the
square root of the sum of squares of the PCs
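The vector-length construction can be sketched as follows. The input matrix stands in for weekly values of the 4 cohesion proxies; its values, the function name, and the choice to center without rescaling are assumptions, since the slides do not describe the preprocessing.

```python
import numpy as np

def pca_length_metric(X, n_components=None):
    """Euclidean length of each row's vector of PC scores."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T
    if n_components is not None:
        scores = scores[:, :n_components]    # keep only the leading PCs
    return np.sqrt((scores**2).sum(axis=1))

rng = np.random.default_rng(4)
proxies = rng.normal(size=(12, 4))           # hypothetical: 12 weeks x 4 proxies
c_full = pca_length_metric(proxies)          # all 4 PCs
c_top3 = pca_length_metric(proxies, n_components=3)
```

With all PCs retained, the metric equals each observation's distance from the mean of the centered data, because the PCA rotation preserves lengths; dropping the last PC can only shorten the vector.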
Linear Correlations for Cpca
Many Features (Sentiment & Emotions)
are cross-correlated => need PCA or RFR
The new cohesion metric Cpca is negatively
correlated with emotions and compound
sentiment: the stronger the emotions and
sentiment, the less cohesive the community.
Disgust, Sadness, and Trust are the
strongest linear correlates of Cpca, followed
by Anticipation and Fear.
Overall Sentiment, Anger, and Surprise are
more weakly correlated with Cpca than the other 5.
Nonlinear Monotonic Correlations for Cpca
Many Features (Sentiment & Emotions)
are cross-correlated => need PCA
The new cohesion metric Cpca is negatively
correlated with emotions and compound
sentiment: the stronger the emotions and
sentiment, the less cohesive the community.
Anticipation, Disgust, Joy, Sadness, and
Trust are the strongest nonlinear negative
correlates of Cpca.
Overall Sentiment, Anger, and Surprise are
more weakly correlated with Cpca than the other 5.
PCA for Sentiment and Emotions
The PCs explain:
● 61.1 % of the variance
● 23.9 % of the variance
● 10.8 % of the variance
● 4.1 % of the variance
● 0.0 % of the variance
● 0.0 % of the variance
● 0.0 % of the variance
● 0.0 % of the variance
● 0.0 % of the variance
Total:
● 100 % of the variance is explained
We should be fine with only 4 PCs
PCA for Sentiment and Emotions
Like Cpca, this new metric (Epca) is computed as the
length of the vector built on the Principal
Components (PCs). The PCs are orthogonal; the
vector length is the square root of the sum of
squares of the PCs.
This PCA-derived metric is based on the 8 dimensions of
Emotions:
● 'Anger',
● 'Anticipation',
● 'Disgust',
● 'Fear',
● 'Joy',
● 'Sadness',
● 'Surprise',
● 'Trust'
and 1 dimension for Sentiment - a combination of Positive and
Negative Sentiment derived with the vader package
Modeling
[Figure annotation: nonlinearities]
The nonlinear effects at low values of Epca and Cpca speak strongly for a nonlinear model
Linear Model
Cpca = a_0 * pc_0 + a_1 * pc_1 + a_2 * pc_2 + a_3 * pc_3
R^2 = 0.388
Steering away from combining the PCs into one metric (Epca) and instead using linear regression on the PCs did not result in a
good model. A nonlinear model is more appropriate.
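A minimal sketch of a linear fit on PCs, using ordinary least squares on synthetic data. The intercept term is an assumption (the slide's equation shows none), and the targets are constructed only to show how a linear model scores on a linear versus a nonlinear relationship; they are not the study's data.

```python
import numpy as np

def fit_linear(pcs, target):
    """Ordinary least squares of target on PCs; returns (coefficients, R^2)."""
    A = np.column_stack([pcs, np.ones(len(pcs))])    # last column: intercept
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    residual = target - A @ coef
    r2 = 1.0 - residual.var() / target.var()
    return coef, r2

rng = np.random.default_rng(2)
pcs = rng.normal(size=(500, 4))                      # hypothetical PC scores
nonlinear_target = np.sqrt((pcs**2).sum(axis=1))     # symmetric in every PC
linear_target = pcs @ np.array([1.0, -2.0, 0.5, 0.0]) + 3.0

_, r2_nonlinear = fit_linear(pcs, nonlinear_target)
_, r2_linear = fit_linear(pcs, linear_target)
```

A low R^2 from such a fit does not mean the features are uninformative; it only means the relationship is not well described by a weighted sum.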
Random Forest Regression (RFR)
● We do not know whether the model can be written as a closed-form equation => Random Forest Regression works well in this situation.
● RFR does not need features to be orthogonal => interpretable results.
● RFR computes feature importances as the features' relative contributions to the variance of the dependent variable.
● RFR does not tell us whether an increment of a feature will result in an increase or a decrease of the dependent variable
=> Sensitivity Analysis, LIME, or SHAP follow-up is needed.
R^2 = 0.959
Feature Importances (Contributions To Variance, or CTV):
● trust: 0.673
● anger: 0.147
● disgust: 0.109
● surprise: 0.030
● joy: 0.019
● anticipation: 0.016
● sadness: 0.004
● fear: 0.003
● sentiment: 0.000
Identified important features (CTV cutoff = 0.02)
● trust: 0.673
● anger: 0.147
● disgust: 0.109
● surprise: 0.030
Trust, Anger, Disgust, and Surprise are sufficient to predict the PCA-transformed Community Cohesion Metric (Cpca) with a good
fit (R^2 = 0.959). Adding Joy, Anticipation, Sadness, Fear, and Sentiment will make the fit slightly better.
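The feature-importance workflow above can be sketched with scikit-learn's `RandomForestRegressor`, assuming that is a reasonable stand-in for the authors' tooling (the slides do not name a library). The data are synthetic, the hyperparameters are arbitrary, and the R^2 below is computed in-sample; the slides do not say whether theirs was in-sample or held-out. In scikit-learn, `feature_importances_` are impurity-based and normalized to sum to 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

FEATURES = ["trust", "anger", "disgust", "surprise", "joy"]
CTV_CUTOFF = 0.02                          # same cutoff as on the slide

rng = np.random.default_rng(3)
X = rng.uniform(size=(400, len(FEATURES)))
# Hypothetical nonlinear target driven mostly by trust, then anger.
y = np.sin(3 * X[:, 0]) + 0.3 * X[:, 1] ** 2 + 0.01 * rng.normal(size=400)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
r2 = model.score(X, y)                     # in-sample R^2
ctv = dict(zip(FEATURES, model.feature_importances_))
important = [name for name, weight in ctv.items() if weight >= CTV_CUTOFF]
```

The importances rank the features but carry no sign, which is why a follow-up such as sensitivity analysis, LIME, or SHAP is needed to learn the direction of each effect.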
Conclusions
Using off-the-shelf sentiment and emotion analysis tools and relying on statistical analysis of their outputs, we:
● Derived measurable proxy metrics of social cohesion in two dimensions - structural and sentiment-based - using the data
and metadata available from social-media interactions (tweets) within a loosely-defined community.
● Used Principal Component Analysis (PCA) to build a statistically sound metric of social cohesion.
● Used PCA to reduce 1 compound 'Sentiment' metric and the 8 basic measurable emotions into 1 statistically sound linear
combination of these metrics.
● Demonstrated that the relationship between PCA-transformed Cohesion Metric and the PCA-transformed Sentiment and
Emotions is linear and strong (Pearson correlation = 0.985).
● Showed, through Feature Importance Analysis of Random Forest Regression (RFR), that Anger, Trust, Disgust, and Surprise, in
a nonlinear combination, are the emotions important for social cohesion.
● Applied Random Forest Regression (RFR) to predict the PCA-transformed Cohesion metric Cpca as a function of Sentiment
and the 8 basic emotions. The resulting R^2 = 0.959 means that 95.9% of the variance in Cpca is explained by the RFR model.
○ It can be used to accurately predict social cohesion during and after disturbances.
○ Combining this with forecasts of trends of prevailing emotions can help in determining time to loss of cohesion.
Further Work
● Apply the unified metric to other communities, topics & events (e.g., COVID-19, protests, Presidential
elections, etc.)
● Perform Sensitivity Analysis of the RFR model.
● Model Community Resilience process with Cohesion as the metric of interest.
Cohesion = F(t, S, E)
S = Sentiment
E = Emotion
Thank you!
References
1. https://flowingdata.com/2020/09/10/a-timeline-of-california-wildfires/
2. https://psycnet.apa.org/record/2000-12222-004
3. https://aisel.aisnet.org/icis2009/112/
4. https://www.sciencedirect.com/science/article/abs/pii/0378873394002478
5. https://www.jstor.org/stable/3088904
6. https://doi.org/10.1016/0378-8733(94)00247-8
7. https://www.sciencedirect.com/topics/computer-science/degree-centrality
8. Our INFORMS 2020 presentation
Appendix
Social Network Cohesiveness
Tweets mentioning SF Bay Area cities & counties and
wildfires - before, during, and after major wildfires.
Degree Centrality (degree) of a node (user) is the number of
connections (edges) it has. (source)
Before the wildfires, the network of Twitter users concerned about wildfires in the SF Bay Area had only 117 users. Only 6 of them
had more than 100K connections (followers + followed users). No users with more than 882.5K connections were identified.
During the wildfires, the network of Twitter users concerned about wildfires in the SF Bay Area grew to 10.7K users. 333 of them
had more than 100K connections (followers + followed users). On 3 occasions, ‘@nytimes’ had more than 47M connections.
As the major Bay Area wildfires were contained, the network of Twitter users concerned about wildfires in the SF Bay Area shrank
to 1.1K users. 16 of them had more than 100K connections (followers + followed users). On 1 occasion, ‘@USATODAY’ had
more than 4.2M connections.
Making Variables Independent: Principal Component Analysis
PCA finds the linear combinations (principal components, or PCs) of the original variables that maximize
the variances of the principal components.
This results in the covariance between PCs being 0 => the principal components are uncorrelated (linearly independent).
Principal Component Analysis and Dimensionality Reduction
Problem - thresholds are arbitrary
EV Threshold = 0.01
25 PCs 13 PCs
EV Threshold = 0.05
5 PCs
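A minimal sketch of how the number of retained PCs depends on the threshold, assuming "EV Threshold" means the minimum explained-variance ratio a PC must reach to be kept (the slide does not define it). The function name and the synthetic data are illustrative.

```python
import numpy as np

def n_components_above(X, ev_threshold):
    """Number of PCs whose explained-variance ratio is >= ev_threshold."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values only
    ratio = s**2 / np.sum(s**2)
    return int(np.sum(ratio >= ev_threshold))

rng = np.random.default_rng(5)
# Hypothetical data: five independent columns with very different spreads.
X = rng.normal(size=(2000, 5)) * np.array([10.0, 5.0, 2.0, 0.5, 0.1])
kept_at_001 = n_components_above(X, 0.01)
kept_at_005 = n_components_above(X, 0.05)
```

Raising the threshold can only keep the same number of PCs or fewer, so the retained dimensionality is a direct consequence of an arbitrary cutoff.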

More Related Content

Recently uploaded

Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
Leonel Morgado
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdfwaterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
LengamoLAppostilic
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Texas Alliance of Groundwater Districts
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
University of Hertfordshire
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
kejapriya1
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
Abdul Wali Khan University Mardan,kP,Pakistan
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills MN
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
Sérgio Sacani
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
HongcNguyn6
 
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Leonel Morgado
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
Hitesh Sikarwar
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
AbdullaAlAsif1
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
terusbelajar5
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
Leonel Morgado
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
Texas Alliance of Groundwater Districts
 

Recently uploaded (20)

Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdfwaterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
 

Featured

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
Marius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
Expeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
Pixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
marketingartwork
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
Skeleton Technologies
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
SpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Lily Ray
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
Rajiv Jayarajah, MAppComm, ACC
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Christy Abraham Joy
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
Vit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
MindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
RachelPearson36
 

Featured (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024

INFORMS 2021 Social cohesion and emotion analysis of media during 2020 wildfires a case study

  • 1. Social Cohesion and Emotion Analysis of Social Media During 2020 Wildfires: A Case Study. INFORMS 2021. Alexander Gilgur, Jose Emmanuel Ramirez-Marquez. The research performed by Jose E. Ramirez Marquez leading to these results has received funding from the National Science Foundation, CRISP Type 2 / Collaborative Research: Resilience Analytics: A Data-Driven Approach for Enhanced Interdependent Network Resilience, Award number 1541165.
  • 2. Scenario Background. Wildfires in California have been a fact of life for many years, including the 2018, 2019, and 2020 wildfires. This comparison provides a way to analyze the baseline and to tease out the interaction of wildfires with the other events. Usually, the SF Bay Area is not affected by wildfires, which tend to ravage the Santa Rosa / Napa / Sonoma areas, as well as Southern California. In 2020, the SF Bay Area got hit by a rare combination of wildfires, triggered by a series of dry lightning storms, which set afire the hills surrounding the Bay (Santa Cruz Mountains, Coastal Mountains, the range of populated hills stretching from East San Jose to Pleasanton), in addition to the “usual” danger zones.
  • 3. Scenario. 2020, SF Bay Area: ● COVID ● Protests ● Wildfires. Measured: ● Emotions ● Cohesion. Can we predict Cohesion from Sentiment & Emotions?
  • 5. The Timeline. Reference: https://en.wikipedia.org/wiki/August_2020_California_lightning_wildfires ● August 11, 2020: 1 fire started ● August 12, 2020: 1 fire started ● August 15, 2020: 2 fires started ● August 16, 2020: 7 fires started ● August 17, 2020: 5 fires started ● August 18, 2020: 3 fires started ● August 19, 2020: 2 fires started ● August 20, 2020: 1 fire started ● September 22, 2020: Most major fires contained ● January 5, 2021: All August wildfires contained
  • 6. Measuring Social Cohesion. Cohesion is measured through: Statistical Analysis ● Z-score ● Inverse CV; Emotion & Sentiment Analysis ● syuzhet ● sentimentr ● nltk.vader; Social Network Analysis: Echo-Chamber Effect ● Amplification; Social Network Metrics ● Tie Strength ● Centrality
  • 7. Social Network Analysis: Degree Distribution. Tweets mentioning SF Bay Area cities & counties and wildfires - before, during, and after major wildfires. Background: Degree Centrality (degree) of a node (user) is the number of connections (edges) it has. (source) [Figures: histograms of node counts by degree for each period.] Before Bay Area Wildfires: 111 low-degree (<= 100K) nodes; 6 high-degree (> 100K) nodes; Max Degree Centrality = 900K. During Bay Area Wildfires: 22.6K low-degree (<= 100K) nodes; 333 high-degree (> 100K) nodes; Max Degree Centrality = 47M. After Containment of Major Bay Area Wildfires: 1080 low-degree nodes; 16 high-degree (> 100K) nodes; Max Degree Centrality = 4.2M.
  • 8. Structural Cohesion. Structural Cohesion is defined as the minimal number of actors in a social network that need to be removed to disconnect the group. (source) Logically, removal of higher-degree nodes in a social network (degree outliers) would more likely result in network disconnection. => The count of degree outliers in a social network can be used as a measurable proxy for structural cohesion. Before Wildfires: SC = 5. During Wildfires: SC = 196. After Wildfires: SC = 25.
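The proxy described on this slide (counting degree outliers) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides do not state how outliers were identified, so the z-score rule, the `z_threshold` value, the function name `count_degree_outliers`, and the toy degree list are all assumptions made for the example.

```python
import statistics

def count_degree_outliers(degrees, z_threshold=3.0):
    """Count nodes whose degree z-score exceeds z_threshold.

    Assumption: an "outlier" is a node whose degree lies more than
    z_threshold population standard deviations above the mean degree.
    """
    mean = statistics.fmean(degrees)
    std = statistics.pstdev(degrees)
    if std == 0:
        return 0  # all degrees are equal, so there are no outliers
    return sum(1 for d in degrees if (d - mean) / std > z_threshold)

# Hypothetical toy network: 50 ordinary users plus two very large accounts.
toy_degrees = [120, 300, 80, 950, 400, 60, 210, 500, 150, 330] * 5
toy_degrees += [900_000, 4_200_000]

# Proxy for structural cohesion under the assumptions above.
structural_cohesion_proxy = count_degree_outliers(toy_degrees)
print(structural_cohesion_proxy)
```

With heavy-tailed degree distributions, one extreme account inflates the standard deviation and can mask other large accounts, so a robust rule (for example, one based on medians or on log-degrees) may be a better choice in practice.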
  • 10. Sentiment Cohesion: Absolute Inverse CV: Benchmark. Benchmark: Non-Specific Bay Area Sentiment. Absolute Inverse CV is the signal-to-noise ratio that can be used as a measure of cohesiveness in positive or negative sentiment. Coefficient of Variation: CV = σ / μ (standard deviation over mean), so Absolute Inverse CV = |μ| / σ. Abs-Inverse-CV spikes: Positive Sentiment: ● 2020-04-20: “4/20” ● 2020-10-26: final presidential debate. Negative Sentiment: ● 2020-06-08: protests ● 2020-08-17: wildfires ● 2020-10-26: final presidential debate ● 2020-11-23: lockdown
  • 11. Absolute Inverse CV: SF Bay Area Wildfires. Wildfires: Bay Area Sentiment. Absolute Inverse CV is the signal-to-noise ratio that can be used as a measure of cohesiveness in positive or negative sentiment. Coefficient of Variation: CV = σ / μ (standard deviation over mean), so Absolute Inverse CV = |μ| / σ. Inverse-CV spikes: Positive Sentiment: ● 2020-08-24: no new fires. Negative Sentiment: ● 2020-08-10: wildfires ● 2020-08-24: wildfires; air quality dangerous ● 2020-09-21: largest local fires contained
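As a rough illustration of the signal-to-noise idea above, the sketch below computes |mean| / standard deviation for a batch of sentiment scores. The function name `abs_inverse_cv`, the use of the sample standard deviation, and the two toy weeks of scores are assumptions for the example; the slides do not specify these details.

```python
import math
import statistics

def abs_inverse_cv(scores):
    """Absolute inverse coefficient of variation: |mean| / std.

    Assumption: the sample standard deviation is used, so at least
    two scores are required.
    """
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)
    if std == 0:
        # No spread at all: perfectly cohesive unless the mean is also 0.
        return math.inf if mean != 0 else 0.0
    return abs(mean) / std

# Hypothetical weekly sentiment scores in [-1, 1].
cohesive_week = [-0.6, -0.5, -0.7, -0.6]   # everyone is similarly negative
divided_week = [-0.9, 0.8, -0.2, 0.5]      # opinions are scattered

print(abs_inverse_cv(cohesive_week))
print(abs_inverse_cv(divided_week))
```

A week in which the scores agree yields a high value, while a week with scattered scores yields a value near zero, which is why spikes in this quantity can be read as moments of shared sentiment.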
  • 13. Emotion Timeline During CA Wildfires. [Figure: timeline annotated with the prevailing emotions (fear, trust, anger, surprise, sadness, joy, anticipation) and polarity values for the periods 2020-08-01, 2020-08-17, 2020-08-24, 2020-09-14, 2020-09-21, and 2020-09-28. Polarity labels in extraction order: -68, -31, -784, -6400, -774, 342, 3451, 114, 98, -580.] Weights & values of the 6 emotions: Fear was the dominant emotion. Anger's effect on polarity was negative. Surprise was rare; its effect was uncertain. Trust and Anticipation effects were positive. Joy was rare; its effect on polarity was positive.
  • 14. Emotion and Cohesion Correlation Analysis
  • 15. Linear Correlations. Many Features (Sentiment & Emotions) are cross-correlated => need PCA. Structural Cohesion is: Most strongly positively correlated with: ● Fear ● Sadness. Weakly positively correlated with: ● Anger ● Disgust ● Trust. Weakly negatively correlated with: ● Sentiment Cohesion. Most strongly negatively correlated with: ● Negative-Sentiment Cohesion. Cohesion Metrics are all intercorrelated => need PCA.
  • 16. Nonlinear Monotonic Correlations. Accepting nonlinearity makes things very structured: we can group strongly correlated emotions: X1 = (anticipation, disgust, joy, sadness, trust); X2 = (anger, fear, surprise). We can also roll them all into one metric. Then we can model C = f(X1, X2).
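To illustrate why a monotonic (rank-based) correlation can reveal structure that a linear one understates, here is a small sketch comparing Pearson and Spearman coefficients on synthetic data. The slides do not name the coefficient they used, so Spearman's rank correlation is an assumption, as are the helper names and the toy data.

```python
import numpy as np

def pearson(x, y):
    """Linear (Pearson) correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def ranks(values):
    """Rank transform; valid when the values contain no ties."""
    return np.argsort(np.argsort(values))

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Synthetic, strictly increasing but strongly nonlinear relationship.
x = np.linspace(0.0, 5.0, 50)
y = np.exp(x)

print(pearson(x, y))   # noticeably below 1: the relationship is not linear
print(spearman(x, y))  # essentially 1: the relationship is perfectly monotonic
```

For real data with tied values, a library routine that handles ties (such as `scipy.stats.spearmanr`) would be the safer choice.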
  • 17. Principal Component Analysis (PCA). PCA finds the linear combinations (principal components, or PCs) of the original variables that successively maximize variance, each one orthogonal to those before it. This results in the covariance between PCs being 0 => the principal components are uncorrelated (linearly independent of one another).
  • 18. PCA for Cohesion. For the 4 Cohesion-related metrics, PCA returned 4 Principal Components (PCs). The PCs explain: ● pc_0: 51.1 % of the variance ● pc_1: 28.1 % of the variance ● pc_2: 17.6 % of the variance ● pc_3: 3.2 % of the variance. Total: ● 100 % of the variance is explained. Discounting pc_3 will only add 3.2% to the noise.
  • 19. PCA-Derived Cohesion Metric. The end result is a PCA-derived metric based on the 4 proxies we defined for social cohesion: ● Structural ● Sentiment: ○ Negative ○ Positive ○ Overall (Compound). The stepwise changes are due to weekly aggregations used in deriving the proxies. The new metric is computed as the length of the vector built on the Principal Components (PCs). The PCs are orthogonal; the vector length is the square root of the sum of squares of the PCs.
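The construction described above (project onto the PCs, then take the vector length) can be sketched with plain NumPy. This is an illustration under stated assumptions, not the authors' code: the standardization step, the helper name `pca_scores`, and the random stand-in data for the four proxies are all hypothetical.

```python
import numpy as np

def pca_scores(data):
    """Return principal-component scores and explained-variance ratios.

    Assumption: each column is standardized first, since the cohesion
    proxies may live on very different scales.
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Rows of vt are the principal axes of the standardized data.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt.T                    # observations projected onto the PCs
    explained = s**2 / np.sum(s**2)      # share of variance per PC
    return scores, explained

rng = np.random.default_rng(0)
# Hypothetical stand-in for 30 weekly observations of the 4 cohesion proxies.
proxies = rng.normal(size=(30, 4))
proxies[:, 1] += 0.8 * proxies[:, 0]     # make two proxies correlated

scores, explained = pca_scores(proxies)

# PCA-derived metric: Euclidean length of each observation's PC vector.
c_pca = np.sqrt(np.sum(scores**2, axis=1))

# Covariance matrix of the PC scores: off-diagonal entries are ~0.
pc_cov = np.cov(scores, rowvar=False)
print(explained)
print(c_pca[:5])
```

One consequence worth noting: if all PCs are kept, the rotation preserves lengths, so the metric equals the length of the standardized observation vector; dropping low-variance PCs (such as pc_3 on the previous slide) is what makes it differ.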
  • 20. Linear Correlations for Cpca. Many Features (Sentiment & Emotions) are cross-correlated => need PCA or RFR. The new cohesion metric Cpca is negatively correlated with emotions and compound sentiment: the stronger the emotions and sentiment, the less cohesive the community. Disgust, Sadness, and Trust are the strongest linear correlates of Cpca, followed by Anticipation and Fear. Overall Sentiment, Anger, and Surprise are more weakly correlated with Cpca than the other 5.
  • 21. Nonlinear Monotonic Correlations for Cpca. Many Features (Sentiment & Emotions) are cross-correlated => need PCA. The new cohesion metric Cpca is negatively correlated with emotions and compound sentiment: the stronger the emotions and sentiment, the less cohesive the community. Anticipation, Disgust, Joy, Sadness, and Trust are the strongest nonlinear negative correlates of Cpca. Overall Sentiment, Anger, and Surprise are more weakly correlated with Cpca than the other 5.
  • 22. PCA for Sentiment and Emotions. The PCs explain: ● 61.1 % of the variance ● 23.9 % of the variance ● 10.8 % of the variance ● 4.1 % of the variance ● 0.0 % of the variance ● 0.0 % of the variance ● 0.0 % of the variance ● 0.0 % of the variance ● 0.0 % of the variance. Total: ● 100 % of the variance is explained. We should be fine with only 4 PCs.
  • 23. PCA for Sentiment and Emotions. Like Cpca, this new metric (Epca) is computed as the length of the vector built on the Principal Components (PCs). The PCs are orthogonal; the vector length is the square root of the sum of squares of the PCs. This PCA-derived metric is based on the 8 dimensions of Emotions: ● 'Anger' ● 'Anticipation' ● 'Disgust' ● 'Fear' ● 'Joy' ● 'Sadness' ● 'Surprise' ● 'Trust', and 1 dimension for Sentiment - a combination of Positive and Negative Sentiment derived in the vader package.
  • 24. Modeling. [Figure annotation: nonlinearities.] The nonlinear effects at low values of Epca and Cpca speak strongly for a nonlinear model.
  • 25. Linear Model. Cpca = a0 * pc0 + a1 * pc1 + a2 * pc2 + a3 * pc3; R2 = 0.388. Using linear regression on the individual PCs, instead of combining them into one metric (Epca), did not result in a good model. A nonlinear model is more appropriate.
  • 26. Random Forest Regression (RFR) ● We do not know if the model can be written as a closed-form equation => Random Forest Regression works well in this situation. ● RFR does not need features to be orthogonal => interpretable results. ● RFR computes feature importance as each feature's relative contribution to the variance of the dependent variable. ● RFR does not tell us whether an increment of a feature will result in an increase or decrease of the dependent variable => Sensitivity Analysis, LIME, or SHAP follow-up is needed. R2 = 0.959. Feature Importances (Contributions To Variance, or CTV): ● trust: 0.673 ● anger: 0.147 ● disgust: 0.109 ● surprise: 0.030 ● joy: 0.019 ● anticipation: 0.016 ● sadness: 0.004 ● fear: 0.003 ● sentiment: 0.000. Identified important features (CTV cutoff = 0.02): ● trust: 0.673 ● anger: 0.147 ● disgust: 0.109 ● surprise: 0.030. Trust, Anger, Disgust, and Surprise are sufficient to predict the PCA-transformed Community Cohesion Metric (Cpca) with a good fit (R2 = 0.959). Adding Joy, Anticipation, Sadness, Fear, and Sentiment will make the fit slightly better.
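The workflow on this slide (fit a random forest, read off feature importances, keep features above a cutoff) might look roughly like the sketch below, which uses scikit-learn's `RandomForestRegressor`. The data are synthetic and the feature names are borrowed from the slide only as labels; the target formula, hyperparameters, and variable names are assumptions, and the numbers it prints have no relation to the study's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
feature_names = ["trust", "anger", "disgust", "surprise"]

# Synthetic features and a synthetic nonlinear target (illustration only).
X = rng.uniform(0.0, 1.0, size=(300, 4))
y = 5.0 * X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=300)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# Impurity-based importances: each feature's share of the variance
# reduction achieved by the forest's splits; they sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))

CTV_CUTOFF = 0.02  # same cutoff value as quoted on the slide
important = {name: imp for name, imp in importances.items() if imp >= CTV_CUTOFF}

r2_in_sample = model.score(X, y)  # R^2 on the training data
print(importances)
print(important)
print(r2_in_sample)
```

An R2 computed on the training data tends to be optimistic for random forests, so a held-out split or cross-validation would give a more conservative estimate; the slides do not say which kind of R2 is reported.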
  • 27. Conclusions. Using off-the-shelf sentiment and emotion analysis tools and relying on statistical analysis of their outputs, we: ● Derived measurable proxy metrics of social cohesion in two dimensions - structural and sentiment-based - using the data and metadata available from social-media interactions (tweets) within a loosely-defined community. ● Used Principal Component Analysis (PCA) to build a statistically sound metric of social cohesion. ● Used PCA to reduce 1 compound 'Sentiment' metric and the 8 basic measurable emotions into 1 statistically sound linear combination of these metrics. ● Demonstrated that the relationship between the PCA-transformed Cohesion Metric and the PCA-transformed Sentiment and Emotions is linear and strong (Pearson correlation = 0.985). ● Showed, through Feature Importance Analysis of Random Forest Regression (RFR), that Anger, Trust, Disgust, and Surprise, in a nonlinear combination, are the emotions important for social cohesion. ● Applied Random Forest Regression (RFR) to predict the PCA-transformed Cohesion metric Cpca as a function of Sentiment and the 8 basic emotions. Resulting R2 = 0.959, i.e., 95.9% of the variance in Cpca is explained by the RFR model. ○ It can be used to accurately predict social cohesion during and after disturbances. ○ Combining this with forecasts of trends of prevailing emotions can help in determining time to loss of cohesion.
  • 28. Further Work ● Apply the unified metric to other communities, topics & events (e.g., COVID-19, protests, Presidential elections, etc.) ● Perform Sensitivity Analysis of the RFR model. ● Model Community Resilience process with Cohesion as the metric of interest: Cohesion = F(t, S, E), where S = Sentiment and E = Emotion.
  • 31. References: 1. https://flowingdata.com/2020/09/10/a-timeline-of-california-wildfires/ 2. https://psycnet.apa.org/record/2000-12222-004 3. https://aisel.aisnet.org/icis2009/112/ 4. https://www.sciencedirect.com/science/article/abs/pii/0378873394002478 5. https://www.jstor.org/stable/3088904 6. https://doi.org/10.1016/0378-8733(94)00247-8 7. https://www.sciencedirect.com/topics/computer-science/degree-centrality 8. Our INFORMS 2020 presentation
  • 33. Social Network Cohesiveness. Tweets mentioning SF Bay Area cities & counties and wildfires - before, during, and after major wildfires. Degree Centrality (degree) of a node (user) is the number of connections (edges) it has. (source) Before the wildfires, the network of Twitter users concerned about wildfires in SF Bay Area only had 117 users. Only 6 of them had more than 100K connections (followers + followed users). No users with more than 882.5K connections were identified. During the wildfires, the network of Twitter users concerned about wildfires in SF Bay Area grew to 10.7K users. 333 of them had more than 100K connections (followers + followed users). On 3 occasions, ‘@nytimes’ had more than 47M connections. As the major Bay Area wildfires were contained, the network of Twitter users concerned about wildfires in SF Bay Area shrank to 1.1K users. 16 of them had more than 100K connections (followers + followed users). On 1 occasion, ‘@USATODAY’ had more than 4.2M connections.
  • 34. Making Variables Independent: Principal Component Analysis. PCA finds the linear combinations (principal components, or PCs) of the original variables that successively maximize variance, each one orthogonal to those before it. This results in the covariance between PCs being 0 => the principal components are uncorrelated (linearly independent of one another).
  • 35. Principal Component Analysis and Dimensionality Reduction. Problem - thresholds are arbitrary. [Figure labels, in extraction order: EV Threshold = 0.01; 25 PCs; 13 PCs; EV Threshold = 0.05; 5 PCs.]