EXPLORING CORRELATION BETWEEN SENTIMENT OF ENVIRONMENTAL TWEETS AND THE STOCK MARKET
Yi-Shan Shir
Instructor: Dr. Nam P. Nguyen
Department of Computer and Information Science
Towson University
Overview
• Motivation
• Research Approaches
• Tools
• Data Collection
• Sentiment Analysis
• Analysis of Correlation between Sentiments and Stock Price
• Conclusion
Motivation
• Hiroko Okajima and Barin Nag, Department of e-Business and
Technology Management, Towson University
• Previous studies:
Sentiment on social media can predict stock market fluctuations
• Question:
What about specific terms?
-- Environmental tweets over 5 years.
Tools
• Environment: Ubuntu 16.04
• Language: Python, SQL
• Database: MySQL
• Approaches:
1. Natural Language Processing
-- Sentiment Analysis
2. Machine Learning
Locating Target Enterprises
• Target set 1:
Top 100 from the Fortune 500 list
• Target set 2:
Enterprises with a significant (notorious) reputation on environmental issues
-- accounts: tweets > 30K or top 50%
Category                    Company/Brand
IT (renewable energy)       Amazon, Samsung, Google
Oil                         Shell, BP, Exxon
Palm Oil (Deforestation)    Nestle, JNJ, Unilever
Wastes                      Starbucks, CocaCola, PepsiCo
Fast Food (Deforestation)   McDonalds, BurgerKing, KFC, TacoBell
Data Collection(1): Twitter API
• Twitter API
• Python implementation: Tweepy
• Cons: only allows collecting tweets from the most recent week
Data Collection(2): Advanced Search
• Scraping tweets from the search results of Twitter's advanced search
• Source code: Jefferson Henrique
https://github.com/Jefferson-Henrique/GetOldTweets-python
• Cons: adjustments have to be made whenever Twitter changes something.
Data Storage
• Raw data:
-- tweets: 5,818,254
-- accounts: 158
• Database schema:
1. Raw data 2. Filtered data 3. Stock tickers
Data Filtering
• 1. Filter with Python:
-- Filtering through a list of keywords
-- Pros: fast, keep as much data as possible
-- Cons: lower accuracy
-- e.g. “He has a lot of energy.”
• 2. Filter with SQL
-- Filtering inside the DB
-- Pros: higher accuracy
-- Cons: slow, may leave out tweets
-- e.g. “energy efficiency” vs “energy with efficiency”
• Examining the data after filtering: not practical for a large dataset.
-- e.g. Google: “we recycle gmail accounts.” (matches “recycle” but is not an environmental tweet)
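The trade-off of filtering inside the database can be sketched as follows. This is a minimal illustration, not the project's actual query: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name `tweets`, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO tweets (text) VALUES (?)",
    [
        ("We improved energy efficiency in our data centers.",),
        ("He has a lot of energy.",),
        ("Using energy with efficiency matters.",),
    ],
)

# Phrase match: precise, but it misses the reworded third tweet.
phrase_rows = conn.execute(
    "SELECT text FROM tweets WHERE text LIKE ? ORDER BY id",
    ("%energy efficiency%",),
).fetchall()
phrase_matches = [row[0] for row in phrase_rows]

# Single-keyword match: keeps every tweet, including the off-topic second one.
keyword_rows = conn.execute(
    "SELECT text FROM tweets WHERE text LIKE ? ORDER BY id",
    ("%energy%",),
).fetchall()
keyword_matches = [row[0] for row in keyword_rows]
conn.close()

print(len(phrase_matches), len(keyword_matches))
```

The phrase query returns only the first tweet, while the single-keyword query returns all three, which mirrors the accuracy versus coverage trade-off listed above.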
Data Filtering
• Keywords:
# emission
# renewable
# climate
# recycle
# waste
# resource
# pollution
# deforestation
# environmental
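A keyword filter over this list could look like the sketch below. The keywords come from the slide; the function name `matches_keywords`, the substring-matching rule, and the sample tweets are assumptions made for illustration.

```python
# Keywords taken from the slide; everything else here is illustrative.
KEYWORDS = [
    "emission", "renewable", "climate", "recycle", "waste",
    "resource", "pollution", "deforestation", "environmental",
]


def matches_keywords(text, keywords=KEYWORDS):
    """Return True if any keyword occurs as a substring of the lowercased text."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


sample_tweets = [
    "Our new plant cuts CO2 emissions by 30%.",  # on topic
    "We recycle gmail accounts.",                # matches, but off topic
    "Happy Friday from the whole team!",         # no keyword
]

kept = [tweet for tweet in sample_tweets if matches_keywords(tweet)]
print(kept)
```

Substring matching keeps plural and inflected forms such as "emissions", at the cost of letting through off-topic tweets like the second sample.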
Data After Filtering
• Tweets: 68,655
• Accounts: 154
• Distribution:
Stock Price Collection: Quandl
• Financial Database
• Quandl API
• Python implementation: quandl
• Source: WIKI Prices DB from Quandl
Sentiment Analysis
• 2 approaches:
• 1. VADER: a sentiment analysis tool in Python's NLTK library
-- does all the NLP work for you!
-- claims to achieve 96% accuracy on tweets
• 2. Scikit-Learn: a machine learning library
-- input data has to be preprocessed
-- various choices of models
Sentiment Analysis: Vader
• Pro: easy to use, fast to run
• Steps: cleaning text, weighting by booster words, assigning a sentiment score according to a lexicon.
• Output:
1. normalized compound polarity score: from -1 to 1
2. positive, neutral, negative
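How a raw lexicon score ends up in the range -1 to 1, and how a compound score can be turned into a label, can be sketched as below. This assumes the normalization x / sqrt(x^2 + alpha) with alpha = 15 used in the reference VADER implementation and the conventional cut-offs of +0.05 and -0.05 suggested in VADER's documentation; neither value is stated on the slide, and the function names are hypothetical.

```python
import math


def normalize(raw_score, alpha=15.0):
    """Squash an unbounded summed valence score into the range -1 to 1."""
    return raw_score / math.sqrt(raw_score * raw_score + alpha)


def label(compound, threshold=0.05):
    """Map a compound score to a class using symmetric cut-offs (assumed)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Raw scores here are made-up sums of word valences, not real VADER output.
for raw in (3.0, 0.0, -2.0):
    compound = normalize(raw)
    print(raw, round(compound, 3), label(compound))
```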
Sentiment Analysis: Data Preprocessing (1)
• Text Cleaning:
-- converting cases
-- removing additional white space, repeated characters
-- replacing URL, @, # with stopwords
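The cleaning steps listed above can be sketched with regular expressions. This is an illustration rather than the project's code: the placeholder tokens `URL` and `AT_USER`, the choice to keep a hashtag's word while dropping the "#" sign, and the rule of collapsing three or more repeated characters to two are all assumptions.

```python
import re


def clean_tweet(text):
    """Apply the cleaning steps to a single tweet and return the result."""
    text = text.lower()                                   # convert case
    text = re.sub(r"https?://\S+|www\.\S+", "URL", text)  # links -> placeholder
    text = re.sub(r"@\w+", "AT_USER", text)               # mentions -> placeholder
    text = re.sub(r"#(\w+)", r"\1", text)                 # drop "#", keep the word
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)            # "sooooo" -> "soo"
    text = re.sub(r"\s+", " ", text).strip()              # collapse white space
    return text


print(clean_tweet("Sooooo   happy!!! @Google #Recycle https://t.co/abc"))
```

Lowercasing runs first so that the uppercase placeholders remain easy to recognize and remove in a later stopword step.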
Sentiment Analysis: Data Preprocessing (2)
• Feature Extraction:
-- removing stopwords
-- mapping contractions to their original forms
-- appending cleaned words to the feature vector
• Feature Vector
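A feature vector for one tweet could be built roughly as follows. The stopword and contraction lists are deliberately tiny and hypothetical, and `AT_USER` and `URL` are assumed placeholder tokens left behind by text cleaning; a real pipeline would use much fuller lists.

```python
import re

# Tiny illustrative lists; AT_USER and URL are assumed placeholder tokens.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "AT_USER", "URL"}
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "it's": "it is",
    "don't": "do not",
}


def extract_features(cleaned_text):
    """Return the list of words kept as features for one cleaned tweet."""
    features = []
    for token in cleaned_text.split():
        token = token.strip("'\"?,.!:;")           # trim surrounding punctuation
        expanded = CONTRACTIONS.get(token, token)  # map contraction to full form
        for word in expanded.split():
            if word in STOPWORDS:
                continue                           # drop stopwords and placeholders
            if not re.match(r"^[a-z][a-z0-9]*$", word):
                continue                           # skip tokens that are not words
            features.append(word)
    return features


print(extract_features("we don't waste the energy AT_USER URL"))
```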
Sentiment Analysis: Scikit-Learn(1)
• Training Data:
-- tweets: 1,615,343
-- source:
1. Sentiment140
2. Crowdflower's Data for Everyone library
• Feature Extraction methods:
1. Bag of Words
2. TF-IDF (Term Frequency - Inverse Document Frequency)
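The two feature extraction methods can be compared on a toy corpus. The sketch below computes both by hand with the textbook definition idf = log(N / df); the documents are made up. scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2 normalization by default, so its numbers would differ from these.

```python
import math
from collections import Counter

# Three made-up, already tokenized tweets.
docs = [
    ["renewable", "energy", "is", "great"],
    ["waste", "is", "bad"],
    ["renewable", "waste", "renewable"],
]

# Fixed column order shared by both representations.
vocabulary = sorted({word for doc in docs for word in doc})


def bag_of_words(doc):
    """Raw count of each vocabulary word in the document."""
    counts = Counter(doc)
    return [counts[word] for word in vocabulary]


def tf_idf(doc):
    """Term frequency times log(N / document frequency) for each word."""
    counts = Counter(doc)
    vector = []
    for word in vocabulary:
        tf = counts[word] / len(doc)
        df = sum(1 for other in docs if word in other)
        vector.append(tf * math.log(len(docs) / df))
    return vector


print(vocabulary)
print(bag_of_words(docs[2]))
print([round(value, 3) for value in tf_idf(docs[2])])
```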
Sentiment Analysis: Scikit-Learn(2)
• Models
1. Multinomial Naïve Bayes
2. Logistic Regression
3. SVM
• Accuracy:
• (2 feature extraction methods) x (3 models) = 6 results
• For each feature set, take the mode of the 3 models' results
                          Bag of Words   TF-IDF
Multinomial Naïve Bayes   0.767          0.761
Logistic Regression       0.777          0.779
SVM                       0.769          0.772
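Taking the mode of the three models' outputs amounts to a per-tweet majority vote. A minimal sketch follows, with hypothetical predictions; the tie-breaking rule (the label seen first wins) is an assumption, since the slides do not say how ties were handled.

```python
from collections import Counter

# Hypothetical per-tweet predictions from the three models.
predictions = {
    "Multinomial Naive Bayes": ["positive", "negative", "positive"],
    "Logistic Regression": ["positive", "negative", "negative"],
    "SVM": ["negative", "negative", "positive"],
}


def majority_label(labels):
    """Return the most frequent label; on a tie, the one seen first wins."""
    return Counter(labels).most_common(1)[0][0]


# zip(*...) regroups the lists so each tuple holds the three votes for one tweet.
combined = [majority_label(votes) for votes in zip(*predictions.values())]
print(combined)
```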
Sentiment Analysis
               positive   negative   neutral
Bag of words   45,915     21,865     874
TF-IDF         46,442     21,933     279
VADER          48,851     11,386     8,417
Data Integration
• Inputs: sentiment data, stock price data, and a Twitter username to stock ticker mapping
• 1. Merge the sentiment data with the stock ticker data on username
• 2. For each date delta from 1 to 7, merge the result of step 1 with the stock price data on date and ticker.
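The two merge steps can be sketched in plain Python. All records below are made up, the field names are hypothetical, and the sketch assumes that "date delta" means the number of days after the tweet date; days without a price (weekends, holidays) simply produce no row.

```python
from datetime import date, timedelta

# Made-up sample records; values are not real prices or scores.
sentiments = [{"username": "Google", "date": date(2017, 3, 1), "compound": 0.6}]
tickers = {"Google": "GOOGL"}
prices = {
    ("GOOGL", date(2017, 3, 2)): 100.0,
    ("GOOGL", date(2017, 3, 3)): 101.5,
    ("GOOGL", date(2017, 3, 6)): 99.0,
}


def integrate(sentiments, tickers, prices, max_delta=7):
    """Join each tweet with the closing prices 1..max_delta days later."""
    rows = []
    for tweet in sentiments:
        ticker = tickers.get(tweet["username"])  # step 1: username -> ticker
        if ticker is None:
            continue
        for delta in range(1, max_delta + 1):    # step 2: one join per delta
            close = prices.get((ticker, tweet["date"] + timedelta(days=delta)))
            if close is None:
                continue                         # no trading on that day
            rows.append({
                "ticker": ticker,
                "delta": delta,
                "compound": tweet["compound"],
                "close": close,
            })
    return rows


rows = integrate(sentiments, tickers, prices)
print([(row["delta"], row["close"]) for row in rows])
```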
Linear Regression Analysis
• Sentiment data is highly sparse
--> time series analysis is not applicable
• Dealing with sparseness:
-- goal: joining all sentiment data into one dataset
-- method: normalizing all stock prices before integration
For t = delta of date, Y = closing price, D = normalized closing price
-- output:
Linear Regression Analysis
• Variance score = 0
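What a variance score of 0 means can be shown with a small worked example. The sketch assumes the score is the coefficient of determination R^2, as reported in scikit-learn's regression examples; the data points are made up so that sentiment and price change are uncorrelated.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data: sentiment scores and normalized price changes.
sentiment = [-0.6, -0.2, 0.2, 0.6]
price_change = [-0.01, 0.01, 0.01, -0.01]

slope, intercept = fit_line(sentiment, price_change)
score = r_squared(sentiment, price_change, slope, intercept)
print(round(slope, 6), round(score, 6))
```

A score near 0 means the fitted line predicts no better than always guessing the mean price change, so sentiment explains essentially none of the variation in this toy data.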
Plotting the Result (1)
• Google
Plotting the Result (2)
• Johnson & Johnson
Plotting the Result (3)
• McDonald’s
Plotting the Result (4)
• PepsiCo
Plotting the Result (5)
• Exxon
Conclusion
• A correlation between the sentiment of environmental tweets and drops in stock price might exist in some cases.
• Issues:
1. Most tweets tweeted by official accounts are positive.
2. Different types of enterprises might focus on different aspects of their corporate images.
• Future Work:
1. Improving filtering strategy
2. Exploring other analysis models and plotting strategies
3. Adding tweets mentioning these companies by other users
4. Analyzing tweets by environmental NGOs
5. Incorporating other Social Network Analysis approaches

 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Exploring Correlation Between Sentiment of Environmental Tweets and the Stock Market

  • 1. Yi-Shan Shir Instructor: Dr. Nam P. Nguyen Department of Computer and Information Science Towson University EXPLORING CORRELATION BETWEEN SENTIMENT OF ENVIRONMENTAL TWEETS AND THE STOCK MARKET
  • 2. Overview • Motivation • Research Approaches • Tools • Data Collection • Sentiment Analysis • Analysis of Correlation between Sentiments and Stock Price • Conclusion
  • 3. Motivation • Hiroko Okajima and Barin Nag, Department of e-Business and Technology Management, Towson University • Previous studies: Sentiment on social media can predict stock market fluctuations • Question: What about specific terms? -- Environmental tweets over 5 years.
  • 4. Tools • Environment: Ubuntu 16.04 • Language: Python, SQL • Database: MySQL • Approaches: 1. Natural Language Processing -- Sentiment Analysis 2. Machine Learning
  • 5. Locating Target Enterprises • PHOTOGRAPH BY KAREN DUCEY, GETTY IMAGES
  • 6. Locating Target Enterprises • Target set 1: Top 100 from the Fortune 500 list • Target set 2: Enterprises with a significant (notorious) reputation on environmental issues -- accounts: tweets > 30K or top 50%
      Category                     Company/Brand
      IT (renewable energy)        Amazon, Samsung, Google
      Oil                          Shell, BP, Exxon
      Palm Oil (Deforestation)     Nestle, JNJ, Unilever
      Wastes                       Starbucks, CocaCola, PepsiCo
      Fast Food (Deforestation)    McDonalds, BurgerKing, KFC, TacoBell
  • 7. Data Collection(1): Twitter API • Twitter API • Python implementation: Tweepy • Cons: only allows data collection for the most recent week
  • 9. Data Collection(2): Advanced Search • Scraping tweets from the search results of Twitter advanced search • Source code: Jefferson Henrique https://github.com/Jefferson-Henrique/GetOldTweets-python • Cons: adjustments have to be made whenever Twitter changes something.
  • 10. Data Storage • Raw data: -- tweets: 5,818,254 tweets -- account: 158 • Database schema: 1. Raw data 2. Filtered data 3. Stock tickers
  • 12. Data Filtering • 1. Filter with Python: -- Filtering through a list of keywords -- Pros: fast, keeps as much data as possible -- Cons: lower accuracy -- e.g. “He has a lot of energy.” • 2. Filter with SQL: -- Filtering inside the DB -- Pros: higher accuracy -- Cons: slow, may leave out tweets -- e.g. “energy efficiency” vs “energy with efficiency” • Examining the data after filtering: not practical for a large dataset. -- e.g. Google: “we recycle gmail accounts.”
  • 13. Data Filtering • Key words: # emission # renewable # climate # recycle # waste # resource # pollution # deforestation # environmental
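The Python-side keyword filter from slides 12 and 13 could look roughly like the sketch below. It is a minimal illustration, not the author's code: the keyword list is copied from the slide, while the function name `matches_keywords` and the rule that a keyword may be followed by further letters (so "emissions" also matches) are assumptions. The second sample tweet shows the accuracy problem the slide mentions: it passes the filter although it has nothing to do with the environment.

```python
import re

# Keywords as listed on the slide.
KEYWORDS = ["emission", "renewable", "climate", "recycle", "waste",
            "resource", "pollution", "deforestation", "environmental"]

# Assumed matching rule: a keyword counts when it starts a word, so
# "emissions" and "recycled" match too. The slides do not give the exact rule.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, KEYWORDS)) + r")\w*",
    re.IGNORECASE,
)

def matches_keywords(tweet):
    """Return the keyword hits found in a tweet (empty list if none)."""
    return [m.group(0).lower() for m in _PATTERN.finditer(tweet)]

tweets = [
    "We cut CO2 emissions by 20% with renewable power.",
    "We recycle Gmail accounts.",      # off-topic, but still passes the filter
    "New phone launches next week!",
]
kept = [t for t in tweets if matches_keywords(t)]
```

Here `kept` holds the first two tweets, which is why a later manual or SQL-side check is still useful.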
  • 14. Data After Filtering • Tweets: 68,655 • Accounts: 154 • Distribution:
  • 15. Stock Price Collection: Quandl • Financial Database • Quandl API • Python implementation: quandl • Source: WIKI Prices DB from Quandl
  • 16. Sentiment Analysis • 2 approaches: • 1. Vader: a sentiment analysis package in the Python NLTK library -- does all the NLP work for you! -- claims to achieve 96% accuracy on tweets • 2. Scikit-Learn: a machine learning library -- input data has to be preprocessed -- various choices of models
  • 17. Sentiment Analysis: Vader • Pro: easy to use, fast to run • Cleaning text, Weighting by booster words, Assigning sentiment score according to a lexicon. • Output: 1. normalized compound polarity score: -1 ~ 1 2. positive, neutral, negative
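As a sketch of how the compound polarity score can be turned into the three output labels: the slide does not say which cut-offs were used, so the ±0.05 threshold below (the convention commonly suggested for VADER) and the function name `label_from_compound` are assumptions. With NLTK, the score itself would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, which needs the `vader_lexicon` resource to be downloaded first.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a normalized compound score in [-1, 1] to a sentiment label.

    The threshold is an assumed default, not a value taken from the slides.
    """
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must lie in [-1, 1]")
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

labels = [label_from_compound(s) for s in (0.6, -0.3, 0.0)]
```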
  • 18. Sentiment Analysis: Data preprocessing (1) • Text Cleaning: -- converting cases -- removing additional white space, repeated characters -- replacing URL, @, # with stopwords
  • 19. Sentiment Analysis: Data preprocessing (2) • Feature Extraction: -- removing stopwords -- mapping contractions to their original forms -- appending cleaned words to the feature vector • Feature Vector
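The text-cleaning steps on this slide might be sketched as follows. This is an illustration under assumptions rather than the author's implementation: the placeholder tokens `URL`, `AT_USER` and `HASHTAG`, the rule of squeezing three or more repeated letters down to two, and the function name `clean_tweet` are all hypothetical choices.

```python
import re

def clean_tweet(text):
    """Lowercase a tweet, replace URLs/mentions/hashtags, squeeze repeats."""
    text = text.lower()
    # Replace URLs, @mentions and #hashtags with placeholder tokens that can
    # later be dropped together with ordinary stopwords (assumed token names).
    text = re.sub(r"https?://\S+|www\.\S+", "URL", text)
    text = re.sub(r"@\w+", "AT_USER", text)
    text = re.sub(r"#\w+", "HASHTAG", text)
    # Squeeze letters repeated three or more times ("sooooo" -> "soo").
    text = re.sub(r"([a-z])\1{2,}", r"\1\1", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()

cleaned = clean_tweet("Sooooo   happy!!! Check https://example.com @Shell #Recycle")
```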
  • 20. Sentiment Analysis: Scikit-Learn(1) • Training Data: -- tweets: 1,615,343 -- sources: 1. Sentiment140 2. Crowdflower's Data for Everyone library • Feature Extraction methods: 1. Bag of Words 2. TF-IDF (Term Frequency - Inverse Document Frequency)
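To make the difference between the two feature extraction methods concrete, here is a small standard-library sketch. It uses the textbook weighting tf × log(N / df), which is an assumption for illustration; scikit-learn's `TfidfVectorizer` applies a smoothed idf and L2 normalization by default, so its numbers would differ. The function names are hypothetical.

```python
import math
from collections import Counter

def bag_of_words(tokens):
    """Bag of Words: raw count of each token in one document."""
    return Counter(tokens)

def tf_idf(docs):
    """Return one {term: weight} dict per document (textbook TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [["clean", "energy", "good"], ["dirty", "energy", "bad"]]
weights = tf_idf(docs)
```

A term that appears in every document ("energy" here) gets weight 0 under TF-IDF, whereas Bag of Words would still count it once per tweet.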
  • 21. Sentiment Analysis: Scikit-Learn(2) • Models 1. Multinomial Naïve Bayes 2. Logistic Regression 3. SVM • Accuracy: (2 feature extraction methods) x (3 models) = 6 results • For each feature set, take the mode of the 3 results
      Model                      Bag of Words   TF-IDF
      Multinomial Naïve Bayes    0.767          0.761
      Logistic Regression        0.777          0.779
      SVM                        0.769          0.772
  • 22. Sentiment Analysis
      Method          Positive   Negative   Neutral
      Bag of words    45,915     21,865     874
      TF-IDF          46,442     21,933     279
      Vader           48,851     11,386     8,417
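Taking the mode of the three model outputs for one tweet can be sketched as a simple majority vote. The slides do not say how a three-way disagreement was resolved, so the `tie_label` parameter and the function name `majority_label` are assumptions.

```python
from collections import Counter

def majority_label(predictions, tie_label=None):
    """Return the most common label, or tie_label if there is no clear mode."""
    if not predictions:
        raise ValueError("need at least one prediction")
    ranked = Counter(predictions).most_common()
    # No unique mode when the two most frequent labels occur equally often.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_label
    return ranked[0][0]

# One label per model: Naive Bayes, Logistic Regression, SVM.
vote = majority_label(["positive", "negative", "positive"])
```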
  • 23. Data Integration • Sentiment data, stock price data, Twitter username vs. stock ticker data • 1. Merge sentiment data with stock ticker data on username • 2. For date delta from 1 to 7, merge (1) with stock price data on date and ticker.
  • 24. Linear Regression Analysis • Sentiment data is highly sparse --> time series analysis is not applicable • Dealing with sparseness: -- goal: joining all sentiment data into one dataset -- method: normalizing all stock prices before integration For t = delta of date, Y = closing price, D = normalized closing price -- output:
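The two merge steps could be sketched with plain dictionaries as below. This assumes that "date delta" means the price is looked up `delta` days after the tweet date, which the slide does not state explicitly; the record layout, the function name `join_with_prices`, and the sample account, ticker and prices are all made up for illustration.

```python
from datetime import date, timedelta

def join_with_prices(sentiments, ticker_of, prices, max_delta=7):
    """Attach closing prices 1..max_delta days after each scored tweet."""
    rows = []
    for s in sentiments:
        # Step 1: username -> stock ticker.
        ticker = ticker_of.get(s["username"])
        if ticker is None:
            continue
        # Step 2: look up the price for each date delta.
        for delta in range(1, max_delta + 1):
            day = s["date"] + timedelta(days=delta)
            close = prices.get((ticker, day))
            if close is not None:  # no price on weekends and holidays
                rows.append({"ticker": ticker, "tweet_date": s["date"],
                             "delta": delta, "score": s["score"],
                             "close": close})
    return rows

# Made-up sample: a tweet on a Friday, prices on the following Mon and Tue.
sentiments = [{"username": "example_co", "date": date(2017, 3, 3), "score": 0.4}]
ticker_of = {"example_co": "ABC"}
prices = {("ABC", date(2017, 3, 6)): 100.0, ("ABC", date(2017, 3, 7)): 99.0}
rows = join_with_prices(sentiments, ticker_of, prices)
```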
  • 25. Linear Regression Analysis • Variance score = 0
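If "variance score" here means the coefficient of determination R² (as in scikit-learn's `r2_score`, which is an assumption), a score of 0 says the fitted line predicts no better than always predicting the mean. The sketch below fits a least-squares line with the standard library and computes R² on made-up points chosen so that x and y are uncorrelated; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(ys, preds):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up, uncorrelated points: sentiment score (x) vs. normalized price (y).
xs = [-1.0, 1.0, -1.0, 1.0]
ys = [1.0, 1.0, -1.0, -1.0]
slope, intercept = fit_line(xs, ys)
score = r_squared(ys, [slope * x + intercept for x in xs])
```

For these points the fitted slope is 0 and `score` is 0, the same situation the slide reports.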
  • 26. Plotting the Result (1) • Google
  • 27. Plotting the Result (2) • Johnson & Johnson
  • 28. Plotting the Result (3) • McDonald’s
  • 29. Plotting the Result (4) • PepsiCo
  • 30. Plotting the Result (5) • Exxon
  • 31. Conclusion • A correlation between the sentiment of environmental tweets and drops in stock price might exist in some cases. • Issues: 1. Most tweets posted by official accounts are positive. 2. Different types of enterprises might focus on different aspects of their corporate images. • Future Work: 1. Improving the filtering strategy 2. Exploring other analysis models and plotting strategies 3. Adding tweets by other users that mention these companies 4. Analyzing tweets by environmental NGOs 5. Incorporating other Social Network Analysis approaches