WHAT MAKES
CONTENT
GO VIRAL?
SPOILER ALERT
It’s not this!
(P.S. OUR INSPIRATION)
1 SOCIAL CURRENCY
People talk about what
makes them look good.
2 TRIGGERS
People talk about content
that reminds them of something.
3 EMOTIONS
When we care,
we share.
4 PUBLIC
If others are talking about it,
people are more likely to share.
5 PRACTICAL VALUE
People will share anything
that has practical value.
6 STORIES
People will share content
that has a great narrative.
OUR
FOCUS?
EMOTIONS!
OUR
OBJECTIVES?
WHAT ARE THE OBJECTIVES?
1. Determine the factors that lead to
an article going viral on social
media.
2. Predict the popularity of a given
article on social media.
3. Determine the causal effect of
emotions on an article’s popularity.
LET’S DEFINE VIRAL FIRST
A BRIEF INTRODUCTION
WHY DO PEOPLE SHARE ANYWAY?
WHY?
AWE
LAUGHTER
AMUSEMENT
JOY
ANGER
EMPATHY
SURPRISE
SADNESS
“Must. Share. Now.” *clicks*
“ROFLMAOOOOOOO”
“Wow, this is quite something.”
“I’m feeling good!”
“NO WAAAAAAY!”
“Oh no! :(”
“Damn, that took me by surprise!”
“Just chill. We’re all gonna die anyways.” *sigh*
OUR
METHODOLOGY?
LET’S OUTLINE OUR METHODOLOGY
HOW DID WE SOLVE IT?
1. SCRAPING OF ARTICLES FROM MEDIUM
2. DATA CLEANING & PRE-PROCESSING
3. FEATURE ENGINEERING & LIWC
4. MODEL BUILDING
5. CAUSALITY ANALYSIS (RCT)
OUR DATA SOURCE
WHAT DATA DID WE USE?
NUMBER OF ARTICLES: 5011
CATEGORIES: Social Media, Entrepreneurship, Culture, Technology, Self, Politics, Media
DATE RANGE: October 25th, 2013 - March 30th, 2018
DATA SCRAPED
WHAT DATA DID WE USE?
1. Title
2. Author
3. Followed By
4. Tags
5. Date Published
6. Paragraphs
7. Images
8. Bullets
9. Links
10. Claps (DV)
11. Author Followed By
12. Author Following
LINGUISTIC INQUIRY & WORD COUNT (LIWC)
WHAT DATA DID WE USE?
Explains how the words
we use in everyday
language reveal our
thoughts, feelings,
personality, and
motivations.
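A dictionary-based word count of this kind can be sketched in a few lines. This is only an illustration: the category names and word lists below are invented for the example and are not the LIWC dictionary, and `category_percentages` is a hypothetical helper.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary: these word lists are invented for
# illustration and are NOT the actual LIWC dictionary.
CATEGORIES = {
    "posemo": {"good", "great", "love", "happy"},
    "negemo": {"bad", "hate", "angry", "sad"},
    "we": {"we", "us", "our"},
}


def category_percentages(text):
    """Return, per category, the percentage of words in `text` that match it."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for name, vocab in CATEGORIES.items():
            if word in vocab:
                counts[name] += 1
    total = len(words) or 1  # avoid division by zero on empty text
    return {name: 100.0 * counts[name] / total for name in CATEGORIES}
```

Calling `category_percentages` once on a title and once on the article body would give one score per category for each, which is roughly the shape of the Title and Content features in this deck.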
FEATURE ENGINEERING
WHAT DATA DID WE USE?
Article
1. Images Exist
2. Number of Images
3. Bullets Exist
4. Number of Bullets
5. Links Exist
6. Number of Links
7. Tags
8. Number of Tags
Time
1. Date Published
2. Days Since Published
3. Day
4. Month
5. Year
Content
1. Title Sentiment
2. Title Word Count
3. Title Tone
4. Title Analytic
5. Title Authentic
6. Title Tone
7. Title Six Letter Words
8. Title I
9. Title We
10. Title You
11. Title She/He
12. Title They
13. Title Compare
14. Title Interrogation
15. Title Number
16. Title Positive Emotions
17. Title Negative Emotions
18. Title Anxiety
19. Title Anger
20. Title Sad
21. Title Cause
22. Title Sexual
23. Title Power
24. Title Risk
25. Title Focus Past
26. Title Focus Present
27. Title Focus Future
28. Title Religion
29. Title Swear
30. Title Net Speak
31. Title QMark
32. Content Word Count
33. Content Analytic
34. Content Authentic
35. Content Tone
36. Content Six Letter Words
37. Content I
38. Content We
39. Content You
40. Content She/He
41. Content They
42. Content Compare
43. Content Interrogation
44. Content Number
45. Content Positive Emotion
46. Content Negative Emotion
47. Content Anxiety
48. Content Anger
49. Content Sad
50. Content Cause
51. Content Sexual
52. Content Power
53. Content Risk
54. Content Focus Past
55. Content Focus Present
56. Content Focus Future
57. Content Religion
58. Content Swear
59. Content Net Speak
60. Content QMark
61. Content Words Per Sentence
62. Concept Count
63. Classification Label
64. Classification Confidence
Social Media
1. Facebook Shares
2. Facebook Comments
3. Facebook Reactions
4. LinkedIn Shares
5. Pinterest Shares
Author
1. Author
2. Author Followers
3. Author Following
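A minimal sketch of how a few of the article and time features could be derived from one scraped record. The record layout (`num_images`, `num_bullets`, `num_links`, `tags`, `date_published`) and the `derive_features` helper are assumptions made for this example, and "Day" is assumed to mean day of the month.

```python
from datetime import date


def derive_features(article, today):
    """Derive a few article and time features from one scraped record.

    `article` is a hypothetical dict; its keys are assumptions for this sketch.
    """
    published = article["date_published"]
    return {
        "images_exist": article["num_images"] > 0,
        "bullets_exist": article["num_bullets"] > 0,
        "links_exist": article["num_links"] > 0,
        "num_tags": len(article["tags"]),
        "days_since_published": (today - published).days,
        "day": published.day,  # assumed: day of the month
        "month": published.month,
        "year": published.year,
    }


example = {
    "num_images": 3,
    "num_bullets": 0,
    "num_links": 5,
    "tags": ["Technology", "Media"],
    "date_published": date(2018, 3, 30),
}
features = derive_features(example, today=date(2018, 4, 9))
```

Passing `today` explicitly keeps "Days Since Published" reproducible for a fixed scrape date.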
HOW ARE THE CLAPS CATEGORISED?
WHAT DATA DID WE USE?
LOW: < 1,000 (3,893)
MEDIUM: >= 1,000 & < 3,000 (604)
HIGH: >= 3,000 & < 10,000 (273)
VIRAL: >= 10,000 (62)
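The thresholds above translate directly into a small labelling function. The name `clap_category` is hypothetical; only the cut-offs come from the slide.

```python
def clap_category(claps):
    """Map a clap count to the popularity class defined on the slide."""
    if claps < 1_000:
        return "LOW"
    if claps < 3_000:
        return "MEDIUM"
    if claps < 10_000:
        return "HIGH"
    return "VIRAL"
```

Checking the bounds in ascending order means each branch only needs the upper limit of its class.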
OUR
MODELS?
THE MODELS!
WHICH MODELS DID WE USE?
1. RANDOM FOREST
2. SUPPORT VECTOR MACHINES (SVM)
WHAT ARE THE RESULTS?
FOR THE STATISTICIANS
RANDOM FOREST: 84.6%
SUPPORT VECTOR MACHINES: 84.2%
VARIABLE IMPORTANCE PLOT
FOR THE STATISTICIANS
WHAT SHOULD YOU FOCUS ON?
FOR THE PUBLISHERS
AUTHOR
LONG FORM
MORE IMAGES
CONTENT TYPE
CONTENT POWER
NEGATIVE EMOTIONS
OUR
TIPS?
1 AUTHOR
Authors with more social media
presence tend to produce viral content.
2 LONG FORM
Long form content engages users and
hence is more likely to be shared.
3 IMAGES
A picture speaks a thousand words.
More images lead to more shares.
4 CONTENT TYPE
The content topic is crucial, and
readers need to relate to it.
5 POWER CONTENT
Content with references to social
hierarchies and society does better.
6 EMOTIONS
Content that generates negative
emotions tends to be shared more.
ECONOMETRIC
MODELS?
WHAT DID WE TRY?
ECONOMETRICS! ECONOMETRICS!
1
2
3
4
TWO-STAGE LEAST SQUARES
THREE-STAGE LEAST SQUARES
SEEMINGLY UNRELATED REGRESSIONS (SUR)
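For readers unfamiliar with two-stage least squares, here is a minimal sketch with one endogenous regressor `x`, one instrument `z`, and an outcome `y`. The variable roles, the helper names, and the toy numbers are assumptions for illustration; the slides do not give the actual model specification.

```python
def ols(x, y):
    """Simple OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(z, x, y):
    """2SLS with a single instrument z for a single endogenous regressor x."""
    # Stage 1: regress x on the instrument and keep the fitted values.
    a0, a1 = ols(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    # Stage 2: regress y on the fitted values from stage 1.
    return ols(x_hat, y)


# Toy data (invented): u is an unobserved confounder, uncorrelated with z.
z = [0, 1, 2, 3, 4]
u = [1, -2, 2, -2, 1]
x = [zi + ui for zi, ui in zip(z, u)]
y = [1 + 2 * xi + 3 * ui for xi, ui in zip(x, u)]  # true slope on x is 2

naive_intercept, naive_slope = ols(x, y)
iv_intercept, iv_slope = two_stage_least_squares(z, x, y)
```

On this toy data the naive OLS slope is pulled away from 2 by the confounder, while the 2SLS slope recovers it, because the instrument is constructed to be uncorrelated with `u`.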
WHAT DID WE GO WITH?
ECONOMETRICS! ECONOMETRICS!
1
2
3
4
TWO-STAGE LEAST SQUARES
THREE-STAGE LEAST SQUARES
SEEMINGLY UNRELATED REGRESSIONS (SUR)
WHAT DID WE GET?
ECONOMETRICS! ECONOMETRICS!
EQUATION 1 EQUATION 2
WALD TEST
(INSTRUMENT RELEVANCE)
CAUSALITY ANALYSIS OF EMOTIONS
RANDOMISED CONTROL TRIALS
ORIGINAL ARTICLE EDITED ARTICLE
THE RESULTS?
RANDOMISED CONTROL TRIALS
1. ARTICLE CHOICE OF WORDS AFFECTS EMOTIONS. USING
NEGATIVE WORDS MAKES READERS MORE AGITATED.
2. EDITED ARTICLE GETS MORE LIKES - ANGER POTENTIALLY
TRIGGERS MORE THAN INSPIRATION.
3. EDITED ARTICLE GETS MORE COMMENTS THAN ORIGINAL.
ANGER PROVOKES REACTION.
4. EDITED ARTICLE GOT FEWER SHARES. :(
