Predicting Stock Prices using News data
Group L - July 31, 2020
Motivation
To make YOU rich.
DJIA Index
Problem & High Level Conclusions
01 Is it nowadays possible to anticipate the market behavior based on its evolution during the past years? NO
02 Do news articles reveal more insight about future DJIA indices than the historical DJIA indices? NO
General Problem:
How do we predict the stock market using historical stock prices and news articles?
Overview of Approach
1. Pre-process
● Convert to lowercase
● Lemmatize
● Remove rare words
● Remove non-alphanumeric characters
● Clean up
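The pre-processing steps can be sketched roughly as follows. This is a minimal illustration, not the group's actual code: the slides do not name a lemmatizer or a rare-word threshold, so `TOY_LEMMAS`, `min_count`, and the sample headlines are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, tiny lemma table used only for illustration; a real pipeline
# would use a proper lemmatizer (the slides do not say which one was used).
TOY_LEMMAS = {"stocks": "stock", "rises": "rise", "rose": "rise", "markets": "market"}


def clean(text):
    """Lowercase, replace non-alphanumeric characters with spaces, lemmatize tokens."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return [TOY_LEMMAS.get(tok, tok) for tok in text.split()]


def preprocess(headlines, min_count=2):
    """Clean every headline, then drop words seen fewer than min_count times overall."""
    tokenized = [clean(h) for h in headlines]
    counts = Counter(tok for toks in tokenized for tok in toks)
    return [[t for t in toks if counts[t] >= min_count] for toks in tokenized]


# Made-up headlines, not taken from the project's news data.
headlines = [
    "Stocks rise as markets rally!",
    "Market rises; tech stocks lead.",
    "Oil slips, stock market flat",
]
processed = preprocess(headlines)
print(processed)
```

The rare-word filter runs after lemmatization so that inflected forms of the same word are counted together.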
2. Text Vectorizations & Word Embeddings
Text Vectorizations
● Word count
● Binary
● TF-IDF
● Frequency
Word Embeddings
● GloVe vectors
● Co-occurrence based word embeddings
3. Design Experiments and Train Models
● Time Series Analysis
● SVM Classifiers
● Neural Networks
● LSTM
4. Evaluate
Baselines
● B1: The predictor that always predicts up
● B2: The predictor that always predicts the previous day's trend
Confidence interval of the true error w.r.t. baselines.
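The two baselines and a confidence interval on the error can be sketched as below. The slides do not say how the interval was computed, so the normal approximation, the 95% level, the choice to predict "up" on the first day for B2, and the toy trend data are all assumptions.

```python
import math


def baseline_always_up(trends):
    """B1: predict 'up' (1) for every day."""
    return [1] * len(trends)


def baseline_previous_day(trends):
    """B2: predict that each day repeats the previous day's trend.

    The first day has no predecessor, so it is predicted 'up' (an assumption).
    """
    return [1] + trends[:-1]


def error_rate(predictions, trends):
    """Fraction of days on which the prediction differs from the actual trend."""
    wrong = sum(p != t for p, t in zip(predictions, trends))
    return wrong / len(trends)


def error_confidence_interval(err, n, z=1.96):
    """Normal-approximation interval for the true error; z=1.96 gives roughly 95%."""
    half_width = z * math.sqrt(err * (1 - err) / n)
    return max(0.0, err - half_width), min(1.0, err + half_width)


# Toy daily trends: 1 = index went up, 0 = index went down (made-up data).
trends = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
for name, predict in [("B1", baseline_always_up), ("B2", baseline_previous_day)]:
    err = error_rate(predict(trends), trends)
    low, high = error_confidence_interval(err, len(trends))
    print(f"{name}: error={err:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

A model is only convincingly better than a baseline when its interval sits below the baseline's error; with few test days the intervals are wide.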
SVM - Approach
SVM parameters
● C value
● Gamma value
● Kernels:
○ Radial basis function (rbf)
○ Polynomial
○ Sigmoid
○ Linear
Text vectorization techniques
● Binary occurrence vectorization (binary)
● Word count vectorization (count)
● Term frequency–inverse document frequency vectorization (tfidf)
● Frequency vectorization (freq)
Dimensionality reduction techniques
● Truncated singular value decomposition (SVD)
● Principal component analysis (PCA)
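To make the role of gamma concrete, here is a small sketch of the four kernels in their common textbook form (the same form scikit-learn's SVC documents). The `degree` and `coef0` parameters and the sample vectors are not from the slides; they are included only so the formulas are complete. C does not appear in the kernels: it weights the penalty on misclassified training points.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """K(x, y) = <x, y>"""
    return dot(x, y)


def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(x, y, gamma, degree=3, coef0=0.0):
    """K(x, y) = (gamma * <x, y> + coef0) ** degree"""
    return (gamma * dot(x, y) + coef0) ** degree


def sigmoid_kernel(x, y, gamma, coef0=0.0):
    """K(x, y) = tanh(gamma * <x, y> + coef0)"""
    return math.tanh(gamma * dot(x, y) + coef0)


# Two made-up feature vectors (e.g. reduced document vectors).
x, y = [1.0, 0.0, 2.0], [0.0, 1.0, 2.0]
for gamma in (0.1, 1.0):
    print(gamma, rbf_kernel(x, y, gamma), polynomial_kernel(x, y, gamma), sigmoid_kernel(x, y, gamma))
```

A larger gamma makes the rbf similarity fall off faster with distance, so each training point influences a smaller neighbourhood.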
SVM - Results & Discussion
Neural Networks - Approach
Text vectorization techniques
● Binary occurrence vectorization (binary)
● Word count vectorization (count)
● Term frequency–inverse document frequency vectorization (tfidf)
● Frequency vectorization (freq)
Fine-tune Neural Network Parameters
● Hidden layers were varied from 1 to 3
● The number of neurons in the hidden layers was varied
● Loss function: Binary cross-entropy
● Optimizer: Adam
● Activation: ReLU
● Output layer activation: Sigmoid
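The network shape described above (ReLU hidden layers, a sigmoid output, binary cross-entropy loss) can be sketched as a plain forward pass. This is only an illustration: the layer sizes, the random initialisation, and the input vector are assumptions, and training with Adam is left out.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer: weights has one row per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an illustrative initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, hidden_layers, output_layer):
    """ReLU on every hidden layer, sigmoid on the single output neuron."""
    activation = features
    for weights, biases in hidden_layers:
        activation = relu(dense(activation, weights, biases))
    weights, biases = output_layer
    return sigmoid(dense(activation, weights, biases)[0])


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    y_prob = min(max(y_prob, eps), 1.0 - eps)  # avoid log(0)
    return -(y_true * math.log(y_prob) + (1 - y_true) * math.log(1.0 - y_prob))


rng = random.Random(0)
n_features, hidden_sizes = 6, [8, 4]  # two hidden layers; sizes are illustrative
sizes = [n_features] + hidden_sizes
hidden = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
output = init_layer(sizes[-1], 1, rng)

features = [1, 0, 1, 1, 0, 0]  # e.g. a binary occurrence vector
prob_up = forward(features, hidden, output)
print(prob_up, binary_cross_entropy(1, prob_up))
```

Changing `hidden_sizes` to one or three entries gives the other depths listed above.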
Neural Network - Results & Discussion
LSTM - Overview
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
LSTM - Results
LSTM - Discussion
Hyper-parameter | Parameter Value #1 | Parameter Value #2
Word embedding | Traditional co-occurrence based word embeddings | GloVe vectors
Word embedding dimension | 50 | 75
Direction | Bi-directional | Uni-directional
Optimization algorithm | SGD | Adam
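With two values for each of the four hyper-parameters, a full grid would contain 16 configurations. The sketch below enumerates them; whether the group actually ran every combination is not stated in the slides, and the dictionary keys are hypothetical names chosen for this example.

```python
from itertools import product

# Values taken from the hyper-parameter table; key names are illustrative.
search_space = {
    "word_embedding": ["co-occurrence", "glove"],
    "embedding_dim": [50, 75],
    "direction": ["bidirectional", "unidirectional"],
    "optimizer": ["sgd", "adam"],
}

# One dict per combination of hyper-parameter values.
configs = [dict(zip(search_space, values)) for values in product(*search_space.values())]

print(len(configs))
for config in configs[:3]:
    print(config)
```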
Time Series - Approach
Time Series Approach
● Models: AutoRegressive Integrated Moving-Average (ARIMA), SVM
● Tasks: Regression, Classification
● Forecasting: In-sample, Out-of-sample
● Other diagram labels: All, Binary
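A rough sketch of how an index series might be prepared for these experiments: first-order differencing (the "Integrated" part of ARIMA), binary up/down labels for classification, lagged windows as features, and a chronological split so that evaluation is out-of-sample. The lag count, split fraction, and closing values are assumptions for illustration, not the group's settings.

```python
def difference(series):
    """First-order differencing, the 'Integrated' step of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]


def binary_trend(series):
    """Classification target: 1 if the value rose versus the previous day, else 0."""
    return [1 if d > 0 else 0 for d in difference(series)]


def lagged_windows(values, n_lags):
    """Pair each value with the n_lags values that precede it."""
    rows = [values[i - n_lags:i] for i in range(n_lags, len(values))]
    targets = values[n_lags:]
    return rows, targets


def chronological_split(rows, targets, train_fraction=0.8):
    """Keep time order: fit on the earliest part, evaluate out-of-sample on the rest."""
    cut = int(len(rows) * train_fraction)
    return (rows[:cut], targets[:cut]), (rows[cut:], targets[cut:])


closes = [100.0, 101.5, 101.0, 102.0, 103.5, 103.0, 104.0, 103.5]  # made-up values
trend = binary_trend(closes)
rows, targets = lagged_windows(trend, n_lags=3)
train, test = chronological_split(rows, targets)
print(trend)
print(train, test)
```

Shuffling before splitting would leak future information into training, which is why the split keeps the original order.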
Time Series - Results
[Figures not reproduced: ARIMA results, panels (a) and (b); SVM results]
Time Series - Discussion
The parameter tuning is not straightforward...
Time Series - Discussion
Conclusion - Research Questions
01 Is it nowadays possible to anticipate the market behavior based on its evolution during the past years? NO
02 Do news articles reveal more information on the short term (day-ahead) evolution of the DJIA index than the past evolution of this market index itself? NO
Unfortunately, none of our
models can make you rich.
Appendix 1: Randomness
Appendix 2: the DJIA index
Text Vectorizations
1. Word Count Vectorization: A text encoding technique that converts a collection of text documents into a matrix of token counts.
2. Frequency Vectorization: A text vectorization technique that uses the frequency of occurrence of each word in a text.
3. Term Frequency–Inverse Document Frequency (TF–IDF) Vectorization: A text encoding technique that converts a collection of text documents into a matrix of TF-IDF features. TF-IDF measures how relevant a particular word is to a particular document, down-weighting words that appear in many documents. This method is more convenient than frequency vectorization.
4. Binary Occurrence Vectorization: A variant of word count vectorization that records whether a word occurs instead of how many times it occurs. In some cases binary occurrence vectorization might offer better features.
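The four vectorizations can be compared on a toy corpus with a few lines of plain Python. This is a sketch under assumptions: the documents are made up, and the TF-IDF here is the basic unsmoothed form (term frequency times log(N / document frequency)); libraries often use smoothed variants that give slightly different numbers.

```python
import math
from collections import Counter


def build_vocabulary(documents):
    """Sorted list of every distinct word in the corpus."""
    return sorted({word for doc in documents for word in doc})


def count_vector(doc, vocab):
    """Word count: how many times each vocabulary word occurs in the document."""
    counts = Counter(doc)
    return [counts[w] for w in vocab]


def binary_vector(doc, vocab):
    """Binary occurrence: 1 if the word occurs at all, else 0."""
    return [1 if c > 0 else 0 for c in count_vector(doc, vocab)]


def frequency_vector(doc, vocab):
    """Frequency: word count divided by document length."""
    return [c / len(doc) for c in count_vector(doc, vocab)]


def tfidf_vectors(documents, vocab):
    """Unsmoothed TF-IDF: frequency * log(n_docs / number of docs containing the word)."""
    n_docs = len(documents)
    doc_freq = {w: sum(w in doc for doc in documents) for w in vocab}
    idf = {w: math.log(n_docs / doc_freq[w]) for w in vocab}
    return [[tf * idf[w] for tf, w in zip(frequency_vector(doc, vocab), vocab)] for doc in documents]


# Made-up, already pre-processed documents.
docs = [
    ["stock", "market", "rise", "stock"],
    ["market", "fall"],
    ["oil", "market", "rise"],
]
vocab = build_vocabulary(docs)
tfidf = tfidf_vectors(docs, vocab)
print(vocab)
print(count_vector(docs[0], vocab), binary_vector(docs[0], vocab), frequency_vector(docs[0], vocab))
print(tfidf[0])
```

In this corpus "market" appears in every document, so its TF-IDF weight is zero everywhere, while its count, binary, and frequency values are not.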
