Predicting Business Confidence using News and Social Media
John Cai, University of Cambridge
Why Predict Business Confidence?
Lead Indicator
The OECD Business Confidence Index (BCI) is a key lead indicator of GDP growth, as shown in Fig 1. BCI measures expectations using surveys.
Infrequency
The OECD Business Confidence Index is published monthly. We do not currently have daily or weekly estimates of business confidence.
Fig 1: Business Confidence vs GDP Growth (annotated periods: Financial Crisis, Trump Presidency)
Note: Y-axis is the standardized value of the variables.
Why use News Media and Social Media data?
High Frequency
Sentiments in news media and social media change in real time. We are able to construct estimates of daily business confidence.
Broad-based
Traditional estimates of daily sentiments are from financial markets. News and social media capture sentiments in the broader real economy.
Data Collection
News Data
- 8000+ CNN articles containing “US Economy” scraped
- 6000+ NYT articles from the Economy section scraped
Twitter Data
- 92000+ tweets containing “US Economy” obtained
- 32000+ tweets obtained from @realDonaldTrump
Financial Data
- VIX index (uses option prices to measure volatility)
- S&P500 daily returns and change in daily returns
1. Financial Crisis: The drop in business confidence preceded US GDP growth data by 1 month.
2. News: Scraped using Selenium, Beautiful Soup and Newspaper in Python.
3. Tweets: Obtained using Python’s Twitterscraper.
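The financial features (S&P500 daily returns and the change in daily returns) and the standardization mentioned in the Fig 1 note could be computed along the following lines. This is a minimal sketch rather than the author's code: it assumes simple returns, reads "change in daily returns" as the first difference of returns, uses z-scores with the population standard deviation, and the closing prices are made-up values for illustration only.

```python
def daily_returns(prices):
    """Simple returns: r_t = p_t / p_(t-1) - 1 (assumed definition)."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def first_difference(values):
    """Change between consecutive observations: d_t = v_t - v_(t-1)."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


def standardize(values):
    """Z-scores using the population standard deviation (assumed convention)."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    if std == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mean) / std for v in values]


# Hypothetical closing prices, for illustration only.
closes = [100.0, 110.0, 99.0, 99.0]
returns = daily_returns(closes)              # three daily returns
return_changes = first_difference(returns)   # two changes in daily returns
z_returns = standardize(returns)             # same scale as other standardized series
print(returns, return_changes, z_returns)
```

Standardizing each series this way is what makes variables with different units (an index level and a growth rate, for example) comparable on one axis.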
Building a Prediction Model using NLP and ML
1. NLTK: developed at the University of Pennsylvania; the Naïve Bayes classifier used here is trained on movie reviews. VADER: developed at Georgia Tech; a lexicon and rule-based model built for social media text such as tweets.
2. Cross-validation using h-step ahead forecasts: MSPE is the mean of the squared differences between the actual values and the model's forecasts.
3. Plot of squared prediction error over time, showing how the relative performance of the two models varies over the test period.
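The MSPE used for cross-validation and the per-period squared prediction errors plotted in Fig 2 can be written down in a few lines. The sketch below is illustrative only; the function names and the numbers are hypothetical and are not taken from the poster.

```python
def squared_errors(actual, forecast):
    """Per-period squared prediction errors: (actual - forecast) ** 2."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return [(a - f) ** 2 for a, f in zip(actual, forecast)]


def mspe(actual, forecast):
    """Mean Squared Prediction Error: the average of the squared errors."""
    errors = squared_errors(actual, forecast)
    return sum(errors) / len(errors)


# Made-up values, for illustration only.
actual = [1.0, 2.0, 3.0]
forecast = [1.0, 2.0, 5.0]
errors_over_time = squared_errors(actual, forecast)  # series to plot over time
print(errors_over_time, mspe(actual, forecast))
```

Keeping the per-period errors separate from their mean allows two models to be compared both overall (MSPE) and period by period.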
NLP
1. Data Pre-processing
Use TextBlob to tokenize the textual data at the word level and the sentence level. Remove stop words and create n-grams.
2. Sentiment Analysis
Employ VADER’s lexicon and rule-based classifier and NLTK’s Naïve Bayes classifier to obtain polarity and/or subjectivity.
3. Feature Engineering
Compute the average and cross-sectional variance of sentiments for every month. Omit features with unit roots.
ML
4. Data Preparation
Perform feature scaling with scikit-learn by standardizing all features. Set aside 20% of the data for testing (2016-2018) and another 20% for validation.
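The Feature Engineering and Data Preparation steps can be sketched as follows. The poster uses scikit-learn for scaling; this sketch uses only the Python standard library so that it stays self-contained. Several choices are assumptions made for illustration: the population variance as the cross-sectional variance, a purely chronological 60/20/20 split, fitting the scaler on the training portion only, and all data values. The unit-root screening is not covered here.

```python
from collections import defaultdict
from statistics import mean, pvariance


def monthly_sentiment_features(scored_docs):
    """Map 'YYYY-MM' -> (mean polarity, cross-sectional variance) per month.

    scored_docs: iterable of (ISO date string, polarity score) pairs.
    """
    by_month = defaultdict(list)
    for date, polarity in scored_docs:
        by_month[date[:7]].append(polarity)
    return {m: (mean(s), pvariance(s)) for m, s in sorted(by_month.items())}


def chronological_split(rows, val_frac=0.2, test_frac=0.2):
    """Split time-ordered rows into train / validation / test without shuffling."""
    n_test = round(len(rows) * test_frac)
    n_val = round(len(rows) * val_frac)
    n_train = len(rows) - n_val - n_test
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def fit_standardizer(train_values):
    """Return a function that z-scores values using training-set statistics."""
    mu = mean(train_values)
    sigma = pvariance(train_values) ** 0.5  # assumes a non-constant series
    return lambda v: (v - mu) / sigma


# Hypothetical scored documents and monthly series, for illustration only.
docs = [("2016-01-03", 0.2), ("2016-01-20", 0.4), ("2016-02-01", -0.1)]
features = monthly_sentiment_features(docs)

series = [0.1, 0.3, 0.2, 0.4, 0.6, 0.5, 0.7, 0.9, 0.8, 1.0]
train, val, test = chronological_split(series)
scale = fit_standardizer(train)
train_scaled = [scale(v) for v in train]
val_scaled = [scale(v) for v in val]
```

Fitting the scaler on the training portion and reusing it for validation and test data keeps information from later periods out of the training features.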
5. Training the Model
Employ cross-validated LASSO with a rolling forecast origin and a fixed window (adapted for time series). LASSO selects features with high predictive value.
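A rolling forecast origin with a fixed window can be sketched independently of the model being tuned. In the example below the splitter is the point of interest; the forecaster is a deliberately simple placeholder (the mean of the training window) standing in for the LASSO, and the function names, the toy series, and the window length are all hypothetical.

```python
def rolling_origin_splits(n_obs, window, horizon=1):
    """Yield (train_indices, test_index) pairs for h-step ahead evaluation.

    Each training set is the `window` observations just before the forecast
    origin; the origin then rolls forward by one period.
    """
    for origin in range(window, n_obs - horizon + 1):
        yield list(range(origin - window, origin)), origin + horizon - 1


def rolling_mspe(y, window, horizon, forecaster):
    """Average squared h-step ahead forecast error over all rolling origins."""
    errors = []
    for train_idx, test_idx in rolling_origin_splits(len(y), window, horizon):
        history = [y[i] for i in train_idx]
        prediction = forecaster(history, horizon)
        errors.append((y[test_idx] - prediction) ** 2)
    return sum(errors) / len(errors)


def window_mean_forecast(history, horizon):
    """Placeholder model: forecast the mean of the training window."""
    return sum(history) / len(history)


# Toy series, for illustration only.
toy_series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
splits = list(rolling_origin_splits(len(toy_series), window=3, horizon=1))
cv_error = rolling_mspe(toy_series, window=3, horizon=1,
                        forecaster=window_mean_forecast)
print(splits, cv_error)
```

Unlike ordinary k-fold cross-validation, every training window here ends before the observation being forecast, so the model never sees the future.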
6. Model Selection
λ_1se, which corresponds to a more parsimonious model, is preferred because the model has a smaller MSPE for out-of-sample predictions (shown in Fig 2).
Fig 2: Test-Set Squared Error (y-axis: squared error, 0 to 0.4; x-axis: Oct-16 to Oct-18; series: Less Parsimonious, More Parsimonious)
Cross Validation Loop (Training subset, Validation): Tunes hyper-parameters by minimizing the MSPE from h-step ahead forecasting over the validation set. Gives penalty terms λ_min and λ_1se, which allow us to select features.
Training Set, Test: Get the Mean Squared Prediction Error (MSPE) over the test set.
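Assuming the two penalty terms are the usual λ_min (the penalty with the lowest cross-validated error) and λ_1se (the largest penalty whose error is within one standard error of that minimum, as in the glmnet convention), they can be picked from a table of cross-validation errors as sketched below. The error values are invented for illustration and the function name is hypothetical.

```python
from statistics import mean, stdev


def select_lambdas(cv_errors):
    """Return (lambda_min, lambda_1se) from per-fold CV errors.

    cv_errors: dict mapping each candidate penalty to a list of fold errors
    (at least two folds per penalty, so a standard error can be computed).
    """
    summary = {
        lam: (mean(errs), stdev(errs) / len(errs) ** 0.5)
        for lam, errs in cv_errors.items()
    }
    # lambda_min: the penalty with the smallest mean CV error.
    lam_min = min(summary, key=lambda lam: summary[lam][0])
    # One-standard-error rule: the largest penalty whose mean error is
    # still within one standard error of the minimum.
    threshold = summary[lam_min][0] + summary[lam_min][1]
    lam_1se = max(lam for lam, (avg, _) in summary.items() if avg <= threshold)
    return lam_min, lam_1se


# Invented CV errors for four candidate penalties, for illustration only.
cv_errors = {
    0.01: [0.10, 0.12, 0.14],
    0.1: [0.09, 0.10, 0.11],
    1.0: [0.10, 0.105, 0.11],
    10.0: [0.30, 0.32, 0.34],
}
lam_min, lam_1se = select_lambdas(cv_errors)
print(lam_min, lam_1se)
```

Because a larger LASSO penalty shrinks more coefficients to zero, λ_1se never keeps more features than λ_min, which is why it corresponds to the more parsimonious model.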
Analyzing and Evaluating Results from the Prediction Model
Fig 3: Best Prediction Model (λ_1se)
Fig 4: Model Evaluation (period labels: Bearish, Neutral, Bullish; error labels: High Error, Very Low Error, Low Error)
Results and Implications
1. Twitter sentiments are most informative of BCI
Twitter has the highest value in prediction compared to NYT, CNN and Trump’s Tweets.
2. Results are consistent with economic theory
Variance of sentiments predicts business confidence because it is counter-cyclical (negative coefficient).
Features Selected (sign of coefficient)
1. Twitter Polarity Mean (+)
2. Twitter Polarity Variance (-)
3. VIX Index (-)
4. Returns (+)
5. Lagged Returns (+)
6. Lagged BCI (+)
Limitations and Extensions
3. Model works best during neutral and bullish periods
As shown in Fig 4, the squared prediction error is much higher for bearish periods.
4. Possible extension: Markov Switching Models
Performance in bearish periods could improve if Markov chains are used for structural breaks.
1. LASSO selected the VADER score rather than the NLTK score, likely because VADER is built for social media data.
2. Uncertainty is reflected in sentiment variance. In recessions, uncertainty and sentiment variance increase (Bloom, 2018).
3. Markov Switching Models would account for the structural breaks expected during recessions (Hamilton, 2010).
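The comparison behind Fig 4 (prediction error by market period) amounts to grouping squared prediction errors by a period label. The sketch below shows one way to do that; how each month is labeled bearish, neutral, or bullish is not stated on the poster, so the labels, the function name, and all numbers here are hypothetical and do not reproduce the reported results.

```python
from collections import defaultdict


def mspe_by_regime(actual, forecast, regimes):
    """Mean squared prediction error computed separately for each regime label."""
    grouped = defaultdict(list)
    for a, f, regime in zip(actual, forecast, regimes):
        grouped[regime].append((a - f) ** 2)
    return {regime: sum(errs) / len(errs) for regime, errs in grouped.items()}


# Made-up values and labels, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
forecast = [1.0, 1.0, 3.0, 2.0]
regimes = ["bullish", "bearish", "bullish", "bearish"]
regime_errors = mspe_by_regime(actual, forecast, regimes)
print(regime_errors)
```

A large gap between regimes in such a table is the kind of evidence that would motivate a regime-dependent model such as the Markov switching extension.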
