Scraping the Web with Scrapinghub
For Finance
We turn web content into useful data
About Scrapinghub
Scrapinghub specializes in data extraction. Our platform is
used to scrape over 4 billion web pages a month.
We offer:
● Professional Services to handle the web scraping for you
● Off-the-shelf datasets so you can get data hassle free
● A cloud-based platform that makes scraping a breeze
Founded in 2010, we are the largest 100% remote company based outside of the US.
We’re 134 teammates in 48 countries.
“Getting information off the
Internet is like taking a drink
from a fire hydrant.”
– Mitchell Kapor
Scrapy
Scrapy is a web scraping framework that
gets the dirty work related to web crawling
out of your way.
Benefits
● No platform lock-in: Open Source
● Very popular (13k+ ★)
● Battle tested
● Highly extensible
● Great documentation
Portia
Portia is a Visual Scraping tool that lets you
get data without needing to write code.
Benefits
● No platform lock-in: Open Source
● JavaScript dynamic content generation
● Ideal for non-developers
● Extensible
● It’s as easy as annotating a page
Large Scale Infrastructure
Meet Scrapy Cloud, our PaaS for web crawlers:
● Scalable: Crawlers run on EC2 instances or dedicated servers
● Crawlera add-on
● Control your spiders: Command line, API or web UI
● Machine learning integration: BigML, MonkeyLearn
● No lock-in: scrapyd to run Scrapy spiders on your own infrastructure
Broad Crawls
Frontera allows us to build large scale web crawlers in Python:
● Scrapy support out of the box
● Distribute and scale custom web crawlers across servers
● Crawl Frontier Framework: large scale URL prioritization logic
● Aduana to prioritize URLs based on link analysis (PageRank, HITS)
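The link-analysis prioritization mentioned above can be illustrated with a minimal sketch. This is not the Frontera or Aduana API: the `pagerank` and `prioritize` functions and the toy link graph are hypothetical, and the code uses plain power-iteration PageRank with only the Python standard library.

```python
import heapq


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {url: [outgoing urls]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


def prioritize(links):
    """Return URLs ordered from highest to lowest PageRank score."""
    scores = pagerank(links)
    # heapq is a min-heap, so push negated scores to pop the best first.
    heap = [(-score, url) for url, score in scores.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


# Hypothetical toy link graph (assumption, for illustration only).
graph = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}
crawl_order = prioritize(graph)
```

In a real broad crawl the graph would be far too large for an in-memory dict, which is the kind of scaling problem a crawl frontier framework is meant to handle.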
Web Scraping Use Cases
Competitive Pricing
Companies use web scraping to monitor the
pricing and the ratings of competitors:
● Scrape online retailers
● Structure the data in a search engine or DB
● Create an interface to search for products
● Sentiment analysis for product rankings
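The "structure the data, then search it" steps above could look roughly like the following sketch. The records, the `offers` table, and the `search` helper are all hypothetical; it assumes the scraping has already happened and uses an in-memory SQLite database from the standard library.

```python
import sqlite3

# Hypothetical scraped records: (retailer, product, price, rating).
scraped = [
    ("shop-a.example", "Laptop X1", 999.00, 4.5),
    ("shop-b.example", "Laptop X1", 949.50, 4.2),
    ("shop-a.example", "Phone Z", 599.00, 4.0),
]

# Structure the data in a database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE offers (retailer TEXT, product TEXT, price REAL, rating REAL)"
)
conn.executemany("INSERT INTO offers VALUES (?, ?, ?, ?)", scraped)


def search(conn, term):
    """Return offers whose product name contains `term`, cheapest first."""
    cur = conn.execute(
        "SELECT retailer, product, price, rating FROM offers "
        "WHERE product LIKE ? ORDER BY price ASC",
        (f"%{term}%",),
    )
    return cur.fetchall()


results = search(conn, "laptop")
```

SQLite's `LIKE` is case-insensitive for ASCII characters by default, so searching for "laptop" matches "Laptop X1".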
Monitor Resellers
We help a leading IT manufacturer monitor the activities of their
resellers:
● Tracking and watching out for stolen goods
● Pricing agreement violations
● Customer support responses on complaints
● Product line quality checks
Lead Generation
Mine scraped data to identify who to target in a company for your
outbound sales campaigns:
● Locate possible leads in your target market
● Identify the right contacts within each one
● Augment the information you already have on them
Real Estate
Crawl property websites and use the data obtained in order to:
● Estimate house prices and rental values
● Track housing stock movements
● Give insight into real estate agents and homeowners
Fraud Detection
Monitor for sellers that offer products violating the ToS of credit card
companies including:
● Drugs
● Weapons
● Gambling
Identify stolen cards and IDs on the Dark Web
● Forums where hackers share ID numbers / PINs
Company Reputation
Sentiment analysis of a company or product through newsletters, social
networks and other natural language data sources.
● NLP to create an associated sentiment indicator.
● Tracking the relevant news that supports the indicator can lead to
market insights for long-term trends.
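As a rough illustration of a sentiment indicator, here is a minimal lexicon-based sketch. The word lists and the `score` and `indicator` functions are hypothetical; a production system would rely on a trained NLP model rather than a handful of keywords.

```python
import re

# Tiny hypothetical lexicon (assumption, for illustration only).
POSITIVE = {"growth", "profit", "strong", "beat", "upgrade"}
NEGATIVE = {"loss", "lawsuit", "weak", "miss", "downgrade"}


def score(text):
    """Score one text in [-1, 1]: +1 all positive, -1 all negative, 0 neutral."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def indicator(headlines):
    """Average the per-headline scores into a single indicator."""
    if not headlines:
        return 0.0
    return sum(score(h) for h in headlines) / len(headlines)
```

Computing `indicator` over the headlines collected each day would give a time series that can be compared against the news items behind it.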
Consumer Behavior
Extract data from forums and websites like Reddit to evaluate consumer
reviews and commentary:
● Volume of comments across brands
● Topics of discussion
● Comparisons with other brands and products
● Evaluate product launches and marketing tactics
Tracking Legislation
Monitor bills and regulations that are being discussed in Congress. Access
court judgments and opinions in order to:
● Follow discussions
● Try to forecast legislative outcomes
● Track regulations that impact different economic sectors
Hiring
Crawl and extract data from job boards and other
sources in order to:
● Understand hiring trends in different sectors or regions
● Find candidates for jobs, or future leaders
● Spot and rescue employees who are shopping
for a new job
Monitoring Corruption
Journalists and analysts can create Open Data by extracting information
from difficult-to-access government websites:
● Track the activities of lobbyists
● Identify patterns in the behavior of government officials
● Identify disruptions in the economy due to corruption allegations
Thank you!
scrapinghub.com


Using Web Data for Finance
