Predicting Business Confidence using News and Social Media
John Cai, University of Cambridge
Why Predict Business Confidence?
Lead Indicator: The OECD Business Confidence Index (BCI) is a key lead indicator of GDP growth, as shown in Fig 1. BCI measures expectations using surveys.
Infrequency: The OECD Business Confidence Index is published monthly. We do not currently have daily or weekly estimates of business confidence.
Fig 1: Business Confidence vs GDP Growth (annotated periods: Financial Crisis, Trump Presidency)
Note: Y-axis is the standardized value of the variables
Why use News Media and Social Media data?
High Frequency: Sentiments in news media and social media change in real time. We are able to construct estimates of daily business confidence.
Broad-based: Traditional estimates of daily sentiments are from financial markets. News and social media capture sentiments in the broader real economy.
Data Collection
News Data
- 8000+ CNN articles containing “US Economy” scraped
- 6000+ NYT articles from the Economy section scraped
Twitter Data
- 92000+ tweets containing “US Economy” obtained
- 32000+ tweets obtained from @realDonaldTrump
Financial Data
- VIX index (uses option prices to measure volatility)
- S&P500 daily returns and change in daily returns
1. Financial Crisis: The drop in business confidence preceded the drop in US GDP growth by 1 month.
2. News: Scraped using Selenium, Beautiful Soup and Newspaper in Python.
3. Tweets: Obtained using Python’s Twitterscraper.
Building a Prediction Model using NLP and ML
1. NLTK: developed at the University of Pennsylvania; the Naïve Bayes classifier is trained on movie reviews. VADER: developed at Georgia Tech for social media text such as tweets.
2. Cross-validation using h-step-ahead forecasts: the MSPE is the mean of the squared differences between the actual value and the model's forecast.
3. Plot of squared prediction error over time, showing how the relative performance of the two models varies over the test period.
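The error measures in notes 2 and 3 can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names and the toy numbers are assumptions, not values from the poster.

```python
def squared_errors(actual, forecast):
    """Per-period squared prediction errors, (actual - forecast) ** 2."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return [(a - f) ** 2 for a, f in zip(actual, forecast)]


def mspe(actual, forecast):
    """Mean squared prediction error over the evaluation period."""
    errors = squared_errors(actual, forecast)
    return sum(errors) / len(errors)


# Hypothetical standardized BCI values and forecasts from two candidate models.
actual = [0.0, 1.0, 2.0, 1.0]
model_a = [0.0, 1.5, 2.0, 0.5]
model_b = [1.0, 1.0, 1.0, 1.0]

print(squared_errors(actual, model_a))  # error path over time, as plotted in Fig 2
print(mspe(actual, model_a), mspe(actual, model_b))
```

Plotting the per-period errors rather than only their mean shows when one model beats the other, which a single MSPE figure hides.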
NLP
1. Data Pre-processing
Use TextBlob to tokenize the textual data at the word level and the sentence level. Remove stop-words and create n-grams.
2. Sentiment Analysis
Employ VADER’s lexicon and rule-based classifier and NLTK’s Naïve Bayes classifier to obtain polarity and/or subjectivity.
3. Feature Engineering
Compute the average and cross-sectional variance of sentiments over every month. Omit features with unit roots.
ML
4. Data Preparation
Perform feature scaling with scikit-learn by standardizing all features. Set aside 20% of the data for testing (2016-2018) and another 20% for validation.
5. Training the Model
Employ cross-validated LASSO with a rolling forecast origin and fixed window (adapted for time series). LASSO selects features with high predictive value.
Fig 2: Test-Set Squared Error (y-axis: 0 to 0.4; x-axis: Oct-16 to Oct-18; series: Less Parsimonious, More Parsimonious)
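The feature engineering step (monthly average and cross-sectional variance of sentiment) can be illustrated with a small standard-library sketch. The function name, the ISO date strings, the toy polarity scores, and the use of population variance are all assumptions made for illustration; the poster does not say how the variance was computed.

```python
from collections import defaultdict
from statistics import mean, pvariance


def monthly_sentiment_features(scored_docs):
    """Group (date, polarity) pairs by month and return {month: (mean, variance)}."""
    by_month = defaultdict(list)
    for date, polarity in scored_docs:
        by_month[date[:7]].append(polarity)  # "YYYY-MM-DD" -> "YYYY-MM"
    return {
        month: (mean(scores), pvariance(scores))
        for month, scores in sorted(by_month.items())
    }


# Hypothetical polarity scores in [-1, 1], one per tweet or article.
scored = [
    ("2018-01-03", 0.5),
    ("2018-01-17", -0.5),
    ("2018-02-02", 0.25),
    ("2018-02-20", 0.75),
]
features = monthly_sentiment_features(scored)
print(features)
```

In this toy data both months are mildly different in mean, but January has the larger variance: opinions are split, which is the kind of disagreement the variance feature is meant to capture.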
6. Model Selection
λ_1se, which corresponds to a more parsimonious model, is preferred as the model has a smaller MSPE for out-of-sample predictions (shown in Fig 2).
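To see why a larger penalty yields a more parsimonious model, the following is a minimal pure-Python sketch of LASSO fitted by cyclic coordinate descent. It is not the author's implementation: the function names, the toy data, and the two penalty values are assumptions, and the sketch assumes standardized features and a centered target so that no intercept is needed.

```python
def soft_threshold(z, penalty):
    """Shrink z toward zero by penalty; values within the penalty become 0."""
    if z > penalty:
        return z - penalty
    if z < -penalty:
        return z + penalty
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n) * ||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            norm = 0.0
            for i in range(n):
                # Partial residual: remove every feature's contribution except j's.
                pred_others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta


def selected(beta):
    """Indices of features with a non-zero coefficient."""
    return [j for j, b in enumerate(beta) if b != 0.0]


# Toy data: y depends strongly on feature 0 and weakly on feature 1.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.25, -1.75, 1.75, -2.25]

dense = lasso_coordinate_descent(X, y, lam=0.125)  # smaller penalty
sparse = lasso_coordinate_descent(X, y, lam=0.5)   # larger penalty
print(selected(dense), selected(sparse))
```

With the smaller penalty both features survive; with the larger penalty the weak feature's coefficient is set exactly to zero, leaving the more parsimonious model.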
Cross Validation Loop (diagram: the Training Set is split into a training subset and a Validation Set; the Test Set is held out)
- Tunes hyper-parameters by minimizing the MSPE from h-step-ahead forecasting over the Validation Set.
- Gives penalty terms λ_opt and λ_1se, which allow us to select features.
- Get the Mean Squared Prediction Error (MSPE) over the Test Set.
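A rolling forecast origin with a fixed window can be sketched as follows. This is an illustration under stated assumptions: the function names and toy series are hypothetical, and two naive forecasters stand in for the candidate penalty values that the poster tunes. The candidate with the lowest validation MSPE would be the one kept.

```python
def rolling_origin_splits(n_obs, window, horizon=1):
    """Yield (train_indices, test_index) pairs for a fixed-size rolling window.

    Each split trains on `window` consecutive observations and evaluates the
    forecast made `horizon` steps after the end of that window.
    """
    origin = window
    while origin + horizon - 1 < n_obs:
        yield list(range(origin - window, origin)), origin + horizon - 1
        origin += 1


def rolling_mspe(series, window, horizon, forecaster):
    """MSPE of h-step-ahead forecasts over all rolling-origin splits."""
    errors = []
    for train_idx, test_idx in rolling_origin_splits(len(series), window, horizon):
        history = [series[i] for i in train_idx]
        errors.append((series[test_idx] - forecaster(history)) ** 2)
    return sum(errors) / len(errors)


def last_value(history):
    """Placeholder forecaster: repeat the last observed value."""
    return history[-1]


def window_mean(history):
    """Placeholder forecaster: average of the training window."""
    return sum(history) / len(history)


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
splits = list(rolling_origin_splits(len(series), window=3, horizon=1))
scores = {
    "last_value": rolling_mspe(series, 3, 1, last_value),
    "window_mean": rolling_mspe(series, 3, 1, window_mean),
}
best = min(scores, key=scores.get)
print(splits, scores, best)
```

Unlike shuffled k-fold cross-validation, every split here trains only on observations that precede the one being forecast, so no future information leaks into the fit.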
Analyzing and Evaluating Results from the Prediction Model
1. LASSO selected the VADER Score rather than the NLTK Score, likely because VADER is designed for social media text.
2. Uncertainty is reflected in sentiment variance. In recessions, uncertainty and sentiment variance increase (Bloom, 2018).
3. Markov Switching Models would account for the structural breaks expected during recessions (Hamilton, 2010).
Fig 3: Best Prediction Model (λ_1se)
Fig 4: Model Evaluation (panels annotate Neutral, Bullish and Bearish periods with error levels ranging from Very Low to High)
Results and Implications
1. Twitter sentiments are most informative of BCI
Twitter has the highest predictive value compared with NYT, CNN and Trump’s tweets.
2. Results are consistent with economic theory
Variance of sentiments predicts business confidence because it is counter-cyclical (negative coefficient).
Features Selected (sign of coefficient)
1. Twitter Polarity Mean (+)
2. Twitter Polarity Variance (-)
3. VIX Index (-)
4. Returns (+)
5. Lagged Returns (+)
6. Lagged BCI (+)
Limitations and Extensions
3. Model works best during neutral and bullish periods
As shown in Fig 4, the squared prediction error is much higher for bearish periods.
4. Possible extension: Markov Switching Models
Performance in bearish periods could improve if Markov chains are used to model structural breaks.
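For readers unfamiliar with the Markov switching extension mentioned above, a common two-regime specification is sketched below. It is a textbook form given for orientation, not a model estimated in the poster; the symbols ($y_t$ for BCI, $s_t$ for the unobserved regime, $\mu$, $\phi$, $\sigma$, $p_{ij}$) are assumptions introduced here.

```latex
y_t = \mu_{s_t} + \phi \, y_{t-1} + \varepsilon_t,
\qquad \varepsilon_t \sim N\!\left(0, \sigma_{s_t}^2\right),
\qquad s_t \in \{1, 2\}

P(s_t = j \mid s_{t-1} = i) = p_{ij},
\qquad \sum_{j=1}^{2} p_{ij} = 1
```

Here one regime could represent bearish periods and the other neutral or bullish periods, so the mean and error variance are allowed to differ between them instead of being forced to share one set of coefficients.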
