SlideShare a Scribd company logo
Scraping the Web with Scrapinghub
For Finance
We turn web content into useful data
About Scrapinghub
Scrapinghub specializes in data extraction. Our platform is
used to scrape over 4 billion web pages a month.
We offer:
● Professional Services to handle the web scraping for you
● Off-the-shelf datasets so you can get data hassle free
● A cloud-based platform that makes scraping a breeze
Founded in 2010, largest 100% remote company based outside of the US
We’re 134 teammates in 48 countries
“Getting information off the
Internet is like taking a drink
from a fire hydrant.”
– Mitchell Kapor
Scrapy
Scrapy is a web scraping framework that
gets the dirty work related to web crawling
out of your way.
Benefits
● No platform lock-in: Open Source
● Very popular (13k+ ★)
● Battle tested
● Highly extensible
● Great documentation
Portia
Portia is a Visual Scraping tool that lets you
get data without needing to write code.
Benefits
● No platform lock-in: Open Source
● JavaScript dynamic content generation
● Ideal for non-developers
● Extensible
● It’s as easy as annotating a page
Portia
Large Scale Infrastructure
Meet Scrapy Cloud , our PaaS for web crawlers:
● Scalable: Crawlers run on EC2 instances or dedicated servers
● Crawlera add-on
● Control your spiders: Command line, API or web UI
● Machine learning integration: BigML, MonkeyLearn
● No lock-in: scrapyd to run Scrapy spiders on your own infrastructure
Broad Crawls
Frontera allows us to build large scale web crawlers in Python:
● Scrapy support out of the box
● Distribute and scale custom web crawlers across servers
● Crawl Frontier Framework: large scale URL prioritization logic
● Aduana to prioritize URLs based on link analysis (PageRank, HITS)
Web Scraping Use Cases
Competitive Pricing
Companies use web scraping to monitor the
pricing and the ratings of competitors:
● Scrape online retailers
● Structure the data in a search engine or DB
● Create an interface to search for products
● Sentiment analysis for product rankings
We help a leading IT manufacturer monitor the activities of their
resellers:
● Tracking and watching out for stolen goods
● Pricing agreement violations
● Customer support responses on complaints
● Product line quality checks
Monitor Resellers
Lead Generation
Mine scraped data to identify who to target in a company for your
outbound sales campaigns:
● Locate possible leads in your target market
● Identify the right contacts within each one
● Augment the information you already have on them
Real Estate
Crawl property websites and use the data obtained in order to:
● Estimate house prices
● Rental values
● Housing stock movements
● Give insight into real estate agents and homeowners
Fraud Detection
Monitor for sellers that offer products violating the ToS of credit card
companies including:
● Drugs
● Weapons
● Gambling
Identify stolen cards and IDs on the Dark Web
● Forums where hackers share ID numbers / pins
Company Reputation
Sentiment analysis of a company or product through newsletters, social
networks and other natural language data sources.
● NLP to create an associated sentiment indicator.
● Track the relevant news supporting the indicator can lead to market
insights for long-term trends.
Consumer Behavior
Extract data from forums and websites like Reddit to evaluate consumer
reviews and commentary:
● Volume of comments across brands
● Topics of discussion
● Comparisons with other brands and products
● Evaluate product launches and marketing tactics
Tracking Legislation
Monitor bills and regulations that are being discussed in Congress. Access
court judgments and opinions in order to:
● Follow discussions
● Try to forecast legislative outcomes
● Track regulations that impact different economic sectors
Hiring
Crawl and extract data from job boards and other
sources in order to understand:
● Hiring trends in different sectors or regions
● Find candidates for jobs, or future leaders
● Spot and rescue employees that are shopping
for a new job
Monitoring Corruption
Journalists and analysts can create Open Data by extracting information
from difficult to access government websites:
● Track the activities of lobbyists
● Patterns in the behavior of government officials
● Disruptions in the economy due to corruption allegations
Thank you!
scrapinghub.com
Thank you!

More Related Content

What's hot

Webinar: MongoDB and Hadoop - Working Together to provide Business Insights
Webinar: MongoDB and Hadoop - Working Together to provide Business InsightsWebinar: MongoDB and Hadoop - Working Together to provide Business Insights
Webinar: MongoDB and Hadoop - Working Together to provide Business Insights
MongoDB
 
Scraping data from the web and documents
Scraping data from the web and documentsScraping data from the web and documents
Scraping data from the web and documents
Tommy Tavenner
 
Capacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB ClusterCapacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB Cluster
MongoDB
 
When to Use MongoDB
When to Use MongoDBWhen to Use MongoDB
When to Use MongoDB
MongoDB
 
How to use NoSQL in Enterprise Java Applications - NoSQL Roadshow Zurich
How to use NoSQL in Enterprise Java Applications - NoSQL Roadshow ZurichHow to use NoSQL in Enterprise Java Applications - NoSQL Roadshow Zurich
How to use NoSQL in Enterprise Java Applications - NoSQL Roadshow Zurich
Patrick Baumgartner
 
Fluentd - Unified logging layer
Fluentd -  Unified logging layerFluentd -  Unified logging layer
Fluentd - Unified logging layer
Treasure Data, Inc.
 
Webinar: Elevate Your Enterprise Architecture with In-Memory Computing
Webinar: Elevate Your Enterprise Architecture with In-Memory ComputingWebinar: Elevate Your Enterprise Architecture with In-Memory Computing
Webinar: Elevate Your Enterprise Architecture with In-Memory Computing
MongoDB
 
Augmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataAugmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure data
Treasure Data, Inc.
 
SoftNews-lowres
SoftNews-lowresSoftNews-lowres
SoftNews-lowres
Marco Masetti
 
Jinchao demo
Jinchao demoJinchao demo
Jinchao demo
Jinchao Lin
 
Log File Analysis: The most powerful tool in your SEO toolkit
Log File Analysis: The most powerful tool in your SEO toolkitLog File Analysis: The most powerful tool in your SEO toolkit
Log File Analysis: The most powerful tool in your SEO toolkit
Tom Bennet
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
MongoDB
 
What is web scraping?
What is web scraping?What is web scraping?
What is web scraping?
Brijesh Prajapati
 
Semantic framework for web scraping.
Semantic framework for web scraping.Semantic framework for web scraping.
Semantic framework for web scraping.
Shyjal Raazi
 
MongoDB et Hadoop
MongoDB et HadoopMongoDB et Hadoop
MongoDB et Hadoop
MongoDB
 
MongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDBMongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDB
MongoDB
 
Generic Crawler
Generic CrawlerGeneric Crawler
Generic Crawler
Frans Siswanto
 
DomainTools Fingerprinting Threat Actors with Web Assets
DomainTools Fingerprinting Threat Actors with Web AssetsDomainTools Fingerprinting Threat Actors with Web Assets
DomainTools Fingerprinting Threat Actors with Web Assets
DomainTools
 
Mindtalk Tech - Behind the scenes
Mindtalk Tech - Behind the scenesMindtalk Tech - Behind the scenes
Mindtalk Tech - Behind the scenes
robin_sy
 
What is Web-scraping?
What is Web-scraping?What is Web-scraping?
What is Web-scraping?
Yu-Chang Ho
 

What's hot (20)

Webinar: MongoDB and Hadoop - Working Together to provide Business Insights
Webinar: MongoDB and Hadoop - Working Together to provide Business InsightsWebinar: MongoDB and Hadoop - Working Together to provide Business Insights
Webinar: MongoDB and Hadoop - Working Together to provide Business Insights
 
Scraping data from the web and documents
Scraping data from the web and documentsScraping data from the web and documents
Scraping data from the web and documents
 
Capacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB ClusterCapacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB Cluster
 
When to Use MongoDB
When to Use MongoDBWhen to Use MongoDB
When to Use MongoDB
 
How to use NoSQL in Enterprise Java Applications - NoSQL Roadshow Zurich
How to use NoSQL in Enterprise Java Applications - NoSQL Roadshow ZurichHow to use NoSQL in Enterprise Java Applications - NoSQL Roadshow Zurich
How to use NoSQL in Enterprise Java Applications - NoSQL Roadshow Zurich
 
Fluentd - Unified logging layer
Fluentd -  Unified logging layerFluentd -  Unified logging layer
Fluentd - Unified logging layer
 
Webinar: Elevate Your Enterprise Architecture with In-Memory Computing
Webinar: Elevate Your Enterprise Architecture with In-Memory ComputingWebinar: Elevate Your Enterprise Architecture with In-Memory Computing
Webinar: Elevate Your Enterprise Architecture with In-Memory Computing
 
Augmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataAugmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure data
 
SoftNews-lowres
SoftNews-lowresSoftNews-lowres
SoftNews-lowres
 
Jinchao demo
Jinchao demoJinchao demo
Jinchao demo
 
Log File Analysis: The most powerful tool in your SEO toolkit
Log File Analysis: The most powerful tool in your SEO toolkitLog File Analysis: The most powerful tool in your SEO toolkit
Log File Analysis: The most powerful tool in your SEO toolkit
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
 
What is web scraping?
What is web scraping?What is web scraping?
What is web scraping?
 
Semantic framework for web scraping.
Semantic framework for web scraping.Semantic framework for web scraping.
Semantic framework for web scraping.
 
MongoDB et Hadoop
MongoDB et HadoopMongoDB et Hadoop
MongoDB et Hadoop
 
MongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDBMongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDB
 
Generic Crawler
Generic CrawlerGeneric Crawler
Generic Crawler
 
DomainTools Fingerprinting Threat Actors with Web Assets
DomainTools Fingerprinting Threat Actors with Web AssetsDomainTools Fingerprinting Threat Actors with Web Assets
DomainTools Fingerprinting Threat Actors with Web Assets
 
Mindtalk Tech - Behind the scenes
Mindtalk Tech - Behind the scenesMindtalk Tech - Behind the scenes
Mindtalk Tech - Behind the scenes
 
What is Web-scraping?
What is Web-scraping?What is Web-scraping?
What is Web-scraping?
 

Similar to Using Web Data for Finance

SEMrush product training- Killer Features
SEMrush product training- Killer FeaturesSEMrush product training- Killer Features
SEMrush product training- Killer Features
Yulia Aslamova
 
Know Your Market - Know Your Customer: What Web Data Reveals if You Know Wher...
Know Your Market - Know Your Customer: What Web Data Reveals if You Know Wher...Know Your Market - Know Your Customer: What Web Data Reveals if You Know Wher...
Know Your Market - Know Your Customer: What Web Data Reveals if You Know Wher...
Connotate
 
SEO Reporting 1ON1 Presentation for Meetup
SEO Reporting 1ON1 Presentation for MeetupSEO Reporting 1ON1 Presentation for Meetup
SEO Reporting 1ON1 Presentation for Meetup
Bruce Jones
 
How to leverage market insights for winning Digital Strategies
How to leverage market insights for winning Digital StrategiesHow to leverage market insights for winning Digital Strategies
How to leverage market insights for winning Digital Strategies
Mel Tomas
 
Top Web Scraping Service Provider For The Retail Data
Top Web Scraping Service Provider For The Retail DataTop Web Scraping Service Provider For The Retail Data
Top Web Scraping Service Provider For The Retail Data
retailgators
 
Website Parameters.pptx
Website Parameters.pptxWebsite Parameters.pptx
Website Parameters.pptx
ASHAVI2
 
Uncovering the DNA of Affiliate Programs: Insights from Market Research - Fer...
Uncovering the DNA of Affiliate Programs: Insights from Market Research - Fer...Uncovering the DNA of Affiliate Programs: Insights from Market Research - Fer...
Uncovering the DNA of Affiliate Programs: Insights from Market Research - Fer...
DigiMarCon - Digital Marketing, Media and Advertising Conferences & Exhibitions
 
Google Analytics For Business - A Beginners Guide
Google Analytics For Business - A Beginners GuideGoogle Analytics For Business - A Beginners Guide
Google Analytics For Business - A Beginners Guide
Indulge Media
 
Google analytics traning for beginner ( part 1)
Google analytics traning for beginner ( part 1) Google analytics traning for beginner ( part 1)
Google analytics traning for beginner ( part 1)
Magenest
 
Tuning out-the-static
Tuning out-the-staticTuning out-the-static
Tuning out-the-static
Plaudit Design
 
Build a High-Impact SEO Strategy in 2022
Build a High-Impact SEO Strategy in 2022Build a High-Impact SEO Strategy in 2022
Build a High-Impact SEO Strategy in 2022
ALPSMarketing
 
Sources of data collection for business applications
Sources of data collection for business applicationsSources of data collection for business applications
Sources of data collection for business applications
PromptCloud
 
HacktoberFestPune - DSC MESCOE x DSC PVGCOET
HacktoberFestPune - DSC MESCOE x DSC PVGCOETHacktoberFestPune - DSC MESCOE x DSC PVGCOET
HacktoberFestPune - DSC MESCOE x DSC PVGCOET
TanyaRaina3
 
How to setup Big Data Company in India or data analytics Company
How to setup Big Data Company in India or data analytics  CompanyHow to setup Big Data Company in India or data analytics  Company
How to setup Big Data Company in India or data analytics Company
startupscratch
 
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data HubSFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
South Tyrol Free Software Conference
 
What Is The Best Tool To Scrape LinkedIn Businesses Data.pdf
What Is The Best Tool To Scrape LinkedIn Businesses Data.pdfWhat Is The Best Tool To Scrape LinkedIn Businesses Data.pdf
What Is The Best Tool To Scrape LinkedIn Businesses Data.pdf
AqsaBatool21
 
A Tale of Two Case Studies: Using LLMs in Production
A Tale of Two Case Studies: Using LLMs in ProductionA Tale of Two Case Studies: Using LLMs in Production
A Tale of Two Case Studies: Using LLMs in Production
Aggregage
 
Google analytics overview
Google analytics overviewGoogle analytics overview
Google analytics overview
Toby Eborn
 
AI-Powered SEO with Botify: Automation in Prevention, Execution, and Implemen...
AI-Powered SEO with Botify: Automation in Prevention, Execution, and Implemen...AI-Powered SEO with Botify: Automation in Prevention, Execution, and Implemen...
AI-Powered SEO with Botify: Automation in Prevention, Execution, and Implemen...
In Marketing We Trust
 
Web Analytics & Online Monitoring Tools Training Seminar - Vorian Agency
Web Analytics & Online Monitoring Tools Training Seminar - Vorian AgencyWeb Analytics & Online Monitoring Tools Training Seminar - Vorian Agency
Web Analytics & Online Monitoring Tools Training Seminar - Vorian Agency
Vorian Agency
 

Similar to Using Web Data for Finance (20)

SEMrush product training- Killer Features
SEMrush product training- Killer FeaturesSEMrush product training- Killer Features
SEMrush product training- Killer Features
 
Know Your Market - Know Your Customer: What Web Data Reveals if You Know Wher...
Know Your Market - Know Your Customer: What Web Data Reveals if You Know Wher...Know Your Market - Know Your Customer: What Web Data Reveals if You Know Wher...
Know Your Market - Know Your Customer: What Web Data Reveals if You Know Wher...
 
SEO Reporting 1ON1 Presentation for Meetup
SEO Reporting 1ON1 Presentation for MeetupSEO Reporting 1ON1 Presentation for Meetup
SEO Reporting 1ON1 Presentation for Meetup
 
How to leverage market insights for winning Digital Strategies
How to leverage market insights for winning Digital StrategiesHow to leverage market insights for winning Digital Strategies
How to leverage market insights for winning Digital Strategies
 
Top Web Scraping Service Provider For The Retail Data
Top Web Scraping Service Provider For The Retail DataTop Web Scraping Service Provider For The Retail Data
Top Web Scraping Service Provider For The Retail Data
 
Website Parameters.pptx
Website Parameters.pptxWebsite Parameters.pptx
Website Parameters.pptx
 
Uncovering the DNA of Affiliate Programs: Insights from Market Research - Fer...
Uncovering the DNA of Affiliate Programs: Insights from Market Research - Fer...Uncovering the DNA of Affiliate Programs: Insights from Market Research - Fer...
Uncovering the DNA of Affiliate Programs: Insights from Market Research - Fer...
 
Google Analytics For Business - A Beginners Guide
Google Analytics For Business - A Beginners GuideGoogle Analytics For Business - A Beginners Guide
Google Analytics For Business - A Beginners Guide
 
Google analytics traning for beginner ( part 1)
Google analytics traning for beginner ( part 1) Google analytics traning for beginner ( part 1)
Google analytics traning for beginner ( part 1)
 
Tuning out-the-static
Tuning out-the-staticTuning out-the-static
Tuning out-the-static
 
Build a High-Impact SEO Strategy in 2022
Build a High-Impact SEO Strategy in 2022Build a High-Impact SEO Strategy in 2022
Build a High-Impact SEO Strategy in 2022
 
Sources of data collection for business applications
Sources of data collection for business applicationsSources of data collection for business applications
Sources of data collection for business applications
 
HacktoberFestPune - DSC MESCOE x DSC PVGCOET
HacktoberFestPune - DSC MESCOE x DSC PVGCOETHacktoberFestPune - DSC MESCOE x DSC PVGCOET
HacktoberFestPune - DSC MESCOE x DSC PVGCOET
 
How to setup Big Data Company in India or data analytics Company
How to setup Big Data Company in India or data analytics  CompanyHow to setup Big Data Company in India or data analytics  Company
How to setup Big Data Company in India or data analytics Company
 
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data HubSFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
 
What Is The Best Tool To Scrape LinkedIn Businesses Data.pdf
What Is The Best Tool To Scrape LinkedIn Businesses Data.pdfWhat Is The Best Tool To Scrape LinkedIn Businesses Data.pdf
What Is The Best Tool To Scrape LinkedIn Businesses Data.pdf
 
A Tale of Two Case Studies: Using LLMs in Production
A Tale of Two Case Studies: Using LLMs in ProductionA Tale of Two Case Studies: Using LLMs in Production
A Tale of Two Case Studies: Using LLMs in Production
 
Google analytics overview
Google analytics overviewGoogle analytics overview
Google analytics overview
 
AI-Powered SEO with Botify: Automation in Prevention, Execution, and Implemen...
AI-Powered SEO with Botify: Automation in Prevention, Execution, and Implemen...AI-Powered SEO with Botify: Automation in Prevention, Execution, and Implemen...
AI-Powered SEO with Botify: Automation in Prevention, Execution, and Implemen...
 
Web Analytics & Online Monitoring Tools Training Seminar - Vorian Agency
Web Analytics & Online Monitoring Tools Training Seminar - Vorian AgencyWeb Analytics & Online Monitoring Tools Training Seminar - Vorian Agency
Web Analytics & Online Monitoring Tools Training Seminar - Vorian Agency
 

Recently uploaded

Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 

Recently uploaded (20)

Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 

Using Web Data for Finance

  • 1. Scraping the Web with Scrapinghub For Finance
  • 2. We turn web content into useful data
  • 3. About Scrapinghub Scrapinghub specializes in data extraction. Our platform is used to scrape over 4 billion web pages a month. We offer: ● Professional Services to handle the web scraping for you ● Off-the-shelf datasets so you can get data hassle free ● A cloud-based platform that makes scraping a breeze
  • 4. Founded in 2010, largest 100% remote company based outside of the US We’re 134 teammates in 48 countries
  • 5. “Getting information off the Internet is like taking a drink from a fire hydrant.” – Mitchell Kapor
  • 6. Scrapy Scrapy is a web scraping framework that gets the dirty work related to web crawling out of your way. Benefits ● No platform lock-in: Open Source ● Very popular (13k+ ★) ● Battle tested ● Highly extensible ● Great documentation
  • 7. Portia Portia is a Visual Scraping tool that lets you get data without needing to write code. Benefits ● No platform lock-in: Open Source ● JavaScript dynamic content generation ● Ideal for non-developers ● Extensible ● It’s as easy as annotating a page
  • 9. Large Scale Infrastructure Meet Scrapy Cloud , our PaaS for web crawlers: ● Scalable: Crawlers run on EC2 instances or dedicated servers ● Crawlera add-on ● Control your spiders: Command line, API or web UI ● Machine learning integration: BigML, MonkeyLearn ● No lock-in: scrapyd to run Scrapy spiders on your own infrastructure
  • 10. Broad Crawls Frontera allows us to build large scale web crawlers in Python: ● Scrapy support out of the box ● Distribute and scale custom web crawlers across servers ● Crawl Frontier Framework: large scale URL prioritization logic ● Aduana to prioritize URLs based on link analysis (PageRank, HITS)
  • 12. Competitive Pricing Companies use web scraping to monitor the pricing and the ratings of competitors: ● Scrape online retailers ● Structure the data in a search engine or DB ● Create an interface to search for products ● Sentiment analysis for product rankings
  • 13. We help a leading IT manufacturer monitor the activities of their resellers: ● Tracking and watching out for stolen goods ● Pricing agreement violations ● Customer support responses on complaints ● Product line quality checks Monitor Resellers
  • 14. Lead Generation Mine scraped data to identify who to target in a company for your outbound sales campaigns: ● Locate possible leads in your target market ● Identify the right contacts within each one ● Augment the information you already have on them
  • 15. Real Estate Crawl property websites and use the data obtained in order to: ● Estimate house prices ● Rental values ● Housing stock movements ● Give insight into real estate agents and homeowners
  • 16. Fraud Detection Monitor for sellers that offer products violating the ToS of credit card companies including: ● Drugs ● Weapons ● Gambling Identify stolen cards and IDs on the Dark Web ● Forums where hackers share ID numbers / pins
  • 17. Company Reputation Sentiment analysis of a company or product through newsletters, social networks and other natural language data sources. ● NLP to create an associated sentiment indicator. ● Track the relevant news supporting the indicator can lead to market insights for long-term trends.
  • 18. Consumer Behavior Extract data from forums and websites like Reddit to evaluate consumer reviews and commentary: ● Volume of comments across brands ● Topics of discussion ● Comparisons with other brands and products ● Evaluate product launches and marketing tactics
  • 19. Tracking Legislation Monitor bills and regulations that are being discussed in Congress. Access court judgments and opinions in order to: ● Follow discussions ● Try to forecast legislative outcomes ● Track regulations that impact different economic sectors
  • 20. Hiring Crawl and extract data from job boards and other sources in order to understand: ● Hiring trends in different sectors or regions ● Find candidates for jobs, or future leaders ● Spot and rescue employees that are shopping for a new job
  • 21. Monitoring Corruption Journalists and analysts can create Open Data by extracting information from difficult to access government websites: ● Track the activities of lobbyists ● Patterns in the behavior of government officials ● Disruptions in the economy due to corruption allegations