Financial Performance Prediction Project (FP3):
Analyzing IPO Stories to Predict Stock Price Trajectories
Outline
1. Introduction
• Background
• Purpose
• Audience
• Hypothesis Statement
• Dataset
• Architecture
2. Methodology
• Data scraping & wrangling
• Feature selection
• NLP of headlines
• Modeling and prediction
• Testing and validation
• FP3 Product Demonstration
3. Product Assessment
• What works
• What doesn’t
• Steps to improve it
• Conclusions
Outline
1. Introduction
• Background
• Purpose
• Audience
• Hypothesis Statement
• Dataset
• Architecture
2. Methodology & Data Product
• Data scraping & wrangling
• Headline sentiment analysis, NLP(s)
• Feature selection
• Modeling and prediction
• Demonstration
3. Product Assessment
• What works
• Challenges
• Back-burner tasks
Introduction
• Intro to IPOs
• Hypothesis Statement
• Dataset
• Architecture
Project Hypothesis: Sentiment analysis of media headlines about an IPO can be used to predict the trajectory of the stock price over the first three months.
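One way to make the hypothesis testable is to turn "trajectory" into a class label. The sketch below is an assumption, not the project's confirmed labeling rule: it compares the last closing price within 90 calendar days of the IPO against the first-day close. The function name `label_trajectory`, the input shape, and the sample prices are all hypothetical.

```python
from datetime import date, timedelta


def label_trajectory(closes, ipo_date, horizon_days=90):
    """Return 'up' or 'down' for the window starting at the IPO date.

    closes: dict mapping datetime.date -> closing price (assumed input shape).
    The label compares the last close inside the window with the first one.
    """
    cutoff = ipo_date + timedelta(days=horizon_days)
    window = sorted(d for d in closes if ipo_date <= d <= cutoff)
    if len(window) < 2:
        raise ValueError("need at least two closing prices inside the window")
    first_close = closes[window[0]]
    last_close = closes[window[-1]]
    return "up" if last_close > first_close else "down"


# Made-up prices for illustration only (not real market data).
sample = {
    date(2020, 1, 1): 45.0,
    date(2020, 2, 3): 43.5,
    date(2020, 3, 30): 60.0,
    date(2020, 5, 1): 80.0,  # falls outside the 90-day window and is ignored
}
result = label_trajectory(sample, date(2020, 1, 1))
print(result)
```

A two-class label keeps the example small; a real pipeline might prefer a percentage return or more than two classes.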
Dataset
5,395 headlines, 16 IPOs, 4 industries
Company         Ticker  Industry    IPO Date    Headlines
Alibaba         BABA    Technology  2014-09-19  544
Blue Apron      APRN    Food/Drink  2017-06-29  125
Etsy            ETSY    Retail      2015-04-16  104
Facebook        FB      Technology  2012-05-18  1977
Ferrari         RACE    Automotive  2015-10-21  46
Fitbit          FIT     Technology  2015-06-18  114
General Motors  GM      Automotive  2010-11-18  368
GoPro           GPRO    Technology  2014-06-26  91
Groupon         GRPN    Retail      2011-11-04  426
LinkedIn        LNKD    Technology  2011-05-19  193
Shake Shack     SHAK    Food/Drink  2015-01-30  103
Snapchat        SNAP    Technology  2017-03-02  185
Stitch Fix      SFIX    Retail      2017-11-17  33
Tesla           TSLA    Automotive  2010-06-29  94
Twitter         TWTR    Technology  2013-11-07  902
Workday         WDAY    Technology  2012-10-12  90
Methodology and Data Product
• Data scraping & wrangling
• Headline sentiment, NLP
• Feature selection
• Modeling and prediction
Data scraping & wrangling
3 data sources:
• Dow Jones Factiva
• www.iposcoop.com
• Morningstar API
Processing:
• Sources merged into one dataframe with an inner join
• Data normalized prior to subsequent processing
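The merge-and-normalize step can be sketched with pandas. Everything specific here is an assumption: the column names, the toy rows, joining on a `ticker` key, and reading "normalized" as min-max scaling (the slide does not say which scheme was used).

```python
import pandas as pd

# Hypothetical extracts standing in for the three sources; columns are assumed.
headlines = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "CCC"],
    "headline": ["AAA prices IPO", "AAA soars on debut", "BBB slips", "CCC files"],
})
ipo_info = pd.DataFrame({"ticker": ["AAA", "BBB"], "offer_price": [20.0, 40.0]})
prices = pd.DataFrame({"ticker": ["AAA", "BBB"], "close_day_90": [30.0, 36.0]})

# Inner joins keep only tickers present in every source, so CCC drops out.
merged = (
    headlines
    .merge(ipo_info, on="ticker", how="inner")
    .merge(prices, on="ticker", how="inner")
)

# Min-max scale each numeric column to [0, 1]; one of several possible schemes.
for col in ["offer_price", "close_day_90"]:
    lo, hi = merged[col].min(), merged[col].max()
    merged[col + "_norm"] = (merged[col] - lo) / (hi - lo) if hi > lo else 0.0

print(merged)
```

An inner join silently discards rows that are missing from any source, so it is worth comparing row counts before and after the merge.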
Headline sentiment, NLP
Sentiment analysis: Empath vs. OpinionFinder
● Both tools rely on a built-in lexicon
● Both were included in the model
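Lexicon-based scoring, and the choice to keep both scores as features, can be illustrated without either tool. The sketch below does not use the Empath or OpinionFinder APIs: the two word lists, the function names, and the scoring rule (summed polarity divided by token count) are hypothetical stand-ins.

```python
import re

# Two tiny hand-made lexicons standing in for the real tools (hypothetical).
LEXICON_A = {"soars": 1, "surges": 1, "strong": 1, "slips": -1, "plunges": -1, "weak": -1}
LEXICON_B = {"soars": 1, "record": 1, "upbeat": 1, "slips": -1, "lawsuit": -1, "glitch": -1}


def lexicon_score(headline, lexicon):
    """Sum the polarity of every known word, divided by the token count."""
    tokens = re.findall(r"[a-z']+", headline.lower())
    if not tokens:
        return 0.0
    return sum(lexicon.get(token, 0) for token in tokens) / len(tokens)


def sentiment_features(headline):
    """Keep both scores as separate features rather than picking one lexicon."""
    return {
        "score_a": lexicon_score(headline, LEXICON_A),
        "score_b": lexicon_score(headline, LEXICON_B),
    }


features = sentiment_features("Stock soars after record debut")
print(features)
```

Keeping the two scores separate lets the downstream model weigh them, which is useful when the lexicons disagree on a headline.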
Feature selection:
● Early results with 32–52 intuitive features were poor
● Count vectorization of headline text → about 4K features
● Principal Component Analysis (k = 0.95) → about 2K features
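The vectorize-then-reduce step might look like the following with scikit-learn. This is a sketch under assumptions: the slides do not name the library, the headlines are invented, and reading k = 0.95 as "keep enough components to explain 95% of the variance" is an interpretation, not something the deck states.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import CountVectorizer

# Invented headlines for illustration only.
headlines = [
    "Shares soar on first day of trading",
    "IPO priced above expected range",
    "Stock slips below offer price",
    "Analysts upbeat after strong debut",
    "Trading glitch mars market debut",
    "Investors wary as shares slide",
]

# Bag-of-words counts: one column per distinct token in the corpus.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(headlines)

# A float n_components asks PCA for the smallest number of components whose
# cumulative explained variance exceeds that fraction.
pca = PCA(n_components=0.95, svd_solver="full")
reduced = pca.fit_transform(counts.toarray())

print(counts.shape, "->", reduced.shape)
```

PCA here works on a dense array, which is fine at a few thousand features; for much larger vocabularies `TruncatedSVD` operates on sparse input directly.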
Modeling and prediction
Models tested:
• LinearSVC (Support Vector Machine)
• NuSVC
• SVC
• KNeighborsClassifier
• SGDClassifier (Stochastic Gradient Descent)
• LogisticRegression
• LogisticRegressionCV
• BaggingClassifier
• ExtraTreesClassifier
• RandomForestClassifier
• MultinomialNB (Naive Bayes)
Model selected: LinearSVC
• Best at predicting price trajectory (both positive and negative) over the 90-day period
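A model bake-off of this kind can be sketched with scikit-learn cross-validation. The data below is synthetic, only three of the listed classifiers are shown, and the 5-fold macro-F1 criterion is an assumption; the slides do not say how the models were scored.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Synthetic two-class data standing in for the reduced headline features.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

# A subset of the classifiers named on the slide.
candidates = {
    "LinearSVC": LinearSVC(dual=False),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNeighborsClassifier": KNeighborsClassifier(),
}

# Mean 5-fold macro-F1 per model; macro averaging weighs both classes equally.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1_macro").mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
print("best:", best)
```

Macro-F1 is one reasonable choice when the model must do well on both the positive and the negative trajectories, since plain accuracy can hide a weak minority class.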
Demonstration
Product Assessment
● What works: scraping, wrangling, sentiment analysis (eventually), predictor
● Challenges: understanding the sentiment analysis algorithms and selecting features for the best modeling results
● Back-burner improvements: end-to-end automation, GUI
GitHub repo: https://github.com/georgetown-analytics/MFP3
