Company Search Project
When searching for investment opportunities, finding a company
to research is the first critical step. Current online tools often
neglect this important aspect of investment research.
CompanySearch helps users find publicly traded companies
based on financial filters and user-defined keywords.
Information Need
● With thousands of companies to choose from, it can be difficult for an investor to
find publicly traded companies that fit their individual investment needs.
● Casual investors need a way to transform broad ideas (e.g. “I am interested in the
machine learning revolution”) into tangible investment prospects (e.g. “I should
invest in Baidu”)
● CompanySearch addresses this information need. In the above
example, an investor might enter the category “information technology” along
with the keywords “machine”, “learning”, “artificial”, and “intelligence”.
CompanySearch will return relevant companies like Baidu!
Input/Output Good Example
● Categories: Large Cap, Value
● Keywords: Airlines, Travel, Overseas
● Top Result: United Airlines
● Analysis: This result matched our query because the company is a large-cap value
stock and its description matched several keywords. Additionally, LDA found the
company description to be topically similar to the keywords.
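
A minimal sketch of how a result like this could be scored and explained, under stated assumptions: the weights (0.7 for keywords, 0.3 for categories), the `score` function, and the two toy companies are hypothetical and are not taken from the project. The sketch only shows keyword similarity being weighted above categorical similarity, with the matched terms returned so the user can see why a result was shown. It omits the LDA component.

```python
def score(company, categories, keywords, w_keyword=0.7, w_category=0.3):
    """Return (score, reasons) for one company; weights are assumed values."""
    company_cats = {c.lower() for c in company["categories"]}
    words = set(company["description"].lower().split())

    matched_cats = sorted(c for c in categories if c.lower() in company_cats)
    matched_kws = sorted(k for k in keywords if k.lower() in words)

    # Fraction of requested categories / keywords that the company matches.
    cat_sim = len(matched_cats) / len(categories) if categories else 0.0
    kw_sim = len(matched_kws) / len(keywords) if keywords else 0.0

    total = w_keyword * kw_sim + w_category * cat_sim
    return total, {"categories": matched_cats, "keywords": matched_kws}


# Hypothetical toy data, not the project's data set.
companies = [
    {"name": "ExampleFuel", "categories": ["Large Cap", "Value"],
     "description": "fuel supplier to airlines and shipping fleets"},
    {"name": "ExampleAir", "categories": ["Large Cap", "Value"],
     "description": "a passenger airline offering overseas travel"},
]
query_categories = ["Large Cap", "Value"]
query_keywords = ["Airlines", "Travel", "Overseas"]

ranked = sorted(
    ((score(c, query_categories, query_keywords), c["name"]) for c in companies),
    key=lambda item: -item[0][0],
)
for (total, reasons), name in ranked:
    print(f"{name}: {total:.2f} matched {reasons}")
```

Exact word matching is deliberately naive here: "Airlines" does not match "airline", which is the kind of gap that stemming or topic modeling is meant to close.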
Input/Output Bad Example
● Categories: Information Technology
● Keywords: Social, Mobile, Video, Snapchat
● Top Result: Tegna Inc.
● Analysis: Tegna was selected because, by its own company description, it plays a
heavy role in social and mobile platforms. Even though it has nothing to do with
Snapchat, our system does not penalize for that. A more accurate result might
have been SNAP, which more closely matches the query but whose company
description is very vague and does not match many of the keywords. In general,
our system does not perform well on very specific queries with a specific company
in mind. Then again, that was never our intended use case.
Live Demo
companysearcherv3.herokuapp.com
Changes Since First Prototype
● Many UI improvements based on user feedback emphasizing readability
● Integration of machine learning, specifically LDA and k-means clustering
● Twitter data is now cached, addressing limitations of the Twitter API mentioned
in peer review
● Users are now given feedback as to why results are shown, addressing the need for
transparency brought up in peer review
● Improved ranking metric, weighting keyword similarity above categorical
similarity
● Removed query expansion (it was too buggy and gave odd answers, e.g. the
“machine” in “machine learning” expanded to “car” and “auto”)
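
The Twitter caching mentioned above could look roughly like the following time-based cache. This is a sketch under assumptions: the deck does not say how the cache was built, so `TTLCache`, the 15-minute lifetime, and `fake_fetch_tweets` (a stand-in for a real API call) are all hypothetical.

```python
import time


class TTLCache:
    """Cache fetch results per key and refetch only after they expire."""

    def __init__(self, fetch, ttl_seconds=900, clock=time.monotonic):
        self._fetch = fetch          # function called on a cache miss
        self._ttl = ttl_seconds      # assumed lifetime of a cached entry
        self._clock = clock
        self._store = {}             # key -> (expiry_time, value)

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: no API call
        value = self._fetch(key)
        self._store[key] = (now + self._ttl, value)
        return value


calls = []


def fake_fetch_tweets(ticker):
    """Hypothetical stand-in for a rate-limited Twitter API request."""
    calls.append(ticker)
    return [f"placeholder tweet about {ticker}"]


cache = TTLCache(fake_fetch_tweets, ttl_seconds=900)
first = cache.get("XYZ")
second = cache.get("XYZ")   # served from the cache
print(len(calls))
```

Repeated searches for the same company then cost one API request per cache lifetime, which is one way to stay inside a rate limit at the price of slightly stale tweets.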
Qualitative Evaluation
● Categories: Large Cap, Value
● Keywords: Airlines, Travel, Overseas
● Prototype 2 Output: World Fuel Services
● Final Project First Output: United Airlines
● Analysis: Our final project gives a much better answer than our second prototype.
Using topic modeling, our system can better home in on airlines and travel
agencies. Additionally, removing thesaurus-based query expansion resulted in
much more predictable results!
Class Concepts
● Cosine similarity
● Jaccard similarity
● TF-IDF matrix
● Term-document matrix
● Pseudo-relevance feedback
● LDA
● Sentiment analysis
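
To make a few of these concepts concrete, here is a small pure-Python sketch of Jaccard similarity, TF-IDF weighting, and cosine similarity. It is illustrative only: the helper names, the toy descriptions, and the IDF variant used (`log(N / df) + 1`, one common choice) are assumptions, not the project's code.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two token sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def tfidf_vectors(docs):
    """Turn token lists into {term: tf * idf} dicts; also return the idf table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy descriptions, not the project's data set.
names = ["ExampleAir", "ExampleFuel"]
docs = ["airline offering overseas travel".split(),
        "fuel supplier for airline fleets".split()]
query = "overseas travel airline".split()

vectors, idf = tfidf_vectors(docs)
# Weight query terms with the corpus idf; unseen terms are dropped.
query_vec = {t: c * idf[t] for t, c in Counter(query).items() if t in idf}

for name, doc, vec in zip(names, docs, vectors):
    print(name, round(cosine(query_vec, vec), 3), round(jaccard(query, doc), 3))
```

Each row of a term-document (or TF-IDF) matrix corresponds to one of these sparse dicts; a dict representation just avoids storing the zeros.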
Trial and Error / Challenges
● There was a significant tradeoff between the number of topics used in LDA and the
runtime of our application. In general, balancing runtime against result quality
was the largest challenge we dealt with
● We implemented pseudo-Rocchio query expansion, thesaurus-based query
expansion, and reverse stemming, but found that all three led to worse query results :(
● We tried to make the graph in our results interactive, but it proved to be very
time-consuming and resource-intensive
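
For context, Rocchio-style query expansion with pseudo-relevance feedback can be sketched as follows. This assumes that "pseudo-Rocchio" means the standard Rocchio update applied to the top-ranked results without a negative-feedback term; the `alpha` and `beta` values, the function names, and the toy vectors are hypothetical.

```python
from collections import defaultdict


def rocchio(query_vec, top_docs, alpha=1.0, beta=0.75):
    """Move the query toward the centroid of the top-ranked documents.

    The top documents are assumed relevant (pseudo-relevance feedback),
    so no non-relevant centroid is subtracted.
    """
    updated = defaultdict(float)
    for term, weight in query_vec.items():
        updated[term] += alpha * weight
    for doc in top_docs:
        for term, weight in doc.items():
            updated[term] += beta * weight / len(top_docs)
    return dict(updated)


def expansion_terms(query_vec, top_docs, n_terms=2):
    """Pick the highest-weighted new terms to add to the query."""
    updated = rocchio(query_vec, top_docs)
    candidates = [t for t in updated if t not in query_vec]
    candidates.sort(key=lambda t: (-updated[t], t))
    return candidates[:n_terms]


# Hypothetical term-weight vectors for a query and its top two results.
query_vec = {"machine": 1.0, "learning": 1.0}
top_docs = [
    {"machine": 1.0, "learning": 1.0, "search": 2.0},
    {"machine": 1.0, "vision": 1.0, "search": 1.0},
]

updated = rocchio(query_vec, top_docs)
print(expansion_terms(query_vec, top_docs, n_terms=1))
```

Because the expansion terms come from whatever ranked highest in the first pass, a poor initial ranking feeds its own mistakes back into the query, which is one plausible reason this kind of expansion can make results worse.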
Improvements
● Better integration with currently available investment tools. Specifically,
integrating Bloomberg services into our application could make it a “one-stop
shop” from start to finish for investors
● Using the enterprise Twitter API would allow for more current tweets and thus more
relevant results for users
Known Issues
● Not every company in our data set has a description (although the majority do)
● Graphical results are pulled from CNN, which on occasion has issues preventing
the images from downloading
