
Using Web Data for Finance



We help you get web data hassle-free. This deck introduces the use cases that are most beneficial to finance companies and to those looking to scale revenue with web data.

Published in: Data & Analytics

  1. Scraping the Web with Scrapinghub for Finance
  2. We turn web content into useful data
  3. About Scrapinghub
     Scrapinghub specializes in data extraction. Our platform is used to scrape over 4 billion web pages a month. We offer:
     ● Professional Services to handle the web scraping for you
     ● Off-the-shelf datasets so you can get data hassle-free
     ● A cloud-based platform that makes scraping a breeze
  4. Founded in 2010, we are the largest 100% remote company based outside of the US.
     We’re 134 teammates in 48 countries.
  5. “Getting information off the Internet is like taking a drink from a fire hydrant.” – Mitchell Kapor
  6. Scrapy
     Scrapy is a web scraping framework that gets the dirty work of web crawling out of your way. Benefits:
     ● No platform lock-in: open source
     ● Very popular (13k+ GitHub stars)
     ● Battle-tested
     ● Highly extensible
     ● Great documentation
  7. Portia
     Portia is a visual scraping tool that lets you get data without writing code. Benefits:
     ● No platform lock-in: open source
     ● Handles dynamic content generated by JavaScript
     ● Ideal for non-developers
     ● Extensible
     ● It’s as easy as annotating a page
  8. Portia
  9. Large-Scale Infrastructure
     Meet Scrapy Cloud, our PaaS for web crawlers:
     ● Scalable: crawlers run on EC2 instances or dedicated servers
     ● Crawlera add-on
     ● Control your spiders from the command line, the API, or the web UI
     ● Machine learning integration: BigML, MonkeyLearn
     ● No lock-in: use scrapyd to run Scrapy spiders on your own infrastructure
  10. Broad Crawls
      Frontera allows us to build large-scale web crawlers in Python:
      ● Scrapy support out of the box
      ● Distribute and scale custom web crawlers across servers
      ● Crawl Frontier Framework: large-scale URL prioritization logic
      ● Aduana to prioritize URLs based on link analysis (PageRank, HITS)
  11. Web Scraping Use Cases
  12. Competitive Pricing
      Companies use web scraping to monitor competitors’ pricing and ratings:
      ● Scrape online retailers
      ● Structure the data in a search engine or database
      ● Create an interface to search for products
      ● Sentiment analysis for product rankings
  13. Monitor Resellers
      We help a leading IT manufacturer monitor the activities of its resellers:
      ● Tracking and watching out for stolen goods
      ● Pricing agreement violations
      ● Customer support responses to complaints
      ● Product line quality checks
  14. Lead Generation
      Mine scraped data to identify who to target in a company for your outbound sales campaigns:
      ● Locate possible leads in your target market
      ● Identify the right contacts within each one
      ● Augment the information you already have on them
  15. Real Estate
      Crawl property websites and use the data to:
      ● Estimate house prices and rental values
      ● Track housing stock movements
      ● Give insight into real estate agents and homeowners
  16. Fraud Detection
      Monitor sellers offering products that violate credit card companies’ terms of service, including:
      ● Drugs
      ● Weapons
      ● Gambling
      Identify stolen cards and IDs on the Dark Web:
      ● Forums where hackers share ID numbers / PINs
  17. Company Reputation
      Sentiment analysis of a company or product through newsletters, social networks, and other natural language data sources:
      ● Use NLP to create an associated sentiment indicator
      ● Tracking the relevant news behind the indicator can yield market insights into long-term trends
  18. Consumer Behavior
      Extract data from forums and websites like Reddit to evaluate consumer reviews and commentary:
      ● Volume of comments across brands
      ● Topics of discussion
      ● Comparisons with other brands and products
      ● Evaluate product launches and marketing tactics
  19. Tracking Legislation
      Monitor bills and regulations that are being discussed in Congress. Access court judgments and opinions in order to:
      ● Follow discussions
      ● Try to forecast legislative outcomes
      ● Track regulations that impact different economic sectors
  20. Hiring
      Crawl and extract data from job boards and other sources in order to:
      ● Understand hiring trends in different sectors or regions
      ● Find candidates for jobs, or future leaders
      ● Spot and retain employees who are shopping for a new job
  21. Monitoring Corruption
      Journalists and analysts can create Open Data by extracting information from hard-to-access government websites:
      ● Track the activities of lobbyists
      ● Patterns in the behavior of government officials
      ● Disruptions in the economy due to corruption allegations
  22. Thank you!
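The Broad Crawls slide mentions prioritizing URLs with link analysis (PageRank, HITS). A minimal sketch of that idea, assuming the link graph fits in memory as a hypothetical `{url: [outgoing urls]}` dict: plain power-iteration PageRank, then the crawl frontier sorted by score. The function names are illustrative; this is not Aduana's or Frontera's actual API.

```python
# Generic PageRank sketch for ordering a crawl frontier.
# Hypothetical helper names; not Aduana's or Frontera's API.


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a link graph {url: [outgoing urls]}."""
    # Every URL seen as a source or a target becomes a node.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Teleport share: every page gets (1 - damping) / n.
        new = {u: (1.0 - damping) / n for u in nodes}
        # Pages with no outlinks ("dangling") spread their rank evenly over all pages.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        for u in nodes:
            new[u] += damping * dangling / n
        # Each page splits its damped rank equally among its outlinks.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        rank = new
    return rank


def prioritize(links, frontier):
    """Order frontier URLs by descending PageRank; unknown URLs score 0."""
    scores = pagerank(links)
    return sorted(frontier, key=lambda u: scores.get(u, 0.0), reverse=True)


graph = {
    "a.com": ["b.com", "c.com"],
    "b.com": ["c.com"],
    "c.com": ["a.com"],
    "d.com": ["c.com"],
}
print(prioritize(graph, ["a.com", "b.com", "c.com", "d.com"]))
# → ['c.com', 'a.com', 'b.com', 'd.com']
```

Here c.com is linked from three pages, so it is crawled first. A production frontier would update scores incrementally as new links arrive instead of recomputing from scratch.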
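The Competitive Pricing slide describes scraping retailers, structuring the data in a database, and searching it. A minimal stdlib sketch of those three steps, assuming a hypothetical page layout where each product has `<span class="name">` and `<span class="price">` elements with US-style prices. Real retailer markup varies, and in practice a Scrapy spider would do the extraction. The `ProductParser`, `store`, and `search` names and the `prices` table are illustrative.

```python
import sqlite3
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects (name, price) pairs from hypothetical markup such as
    <span class="name">Widget</span><span class="price">$9.99</span>."""

    def __init__(self):
        super().__init__()
        self.products = []   # list of (name, price) tuples
        self._field = None   # "name" or "price" while inside a matching <span>
        self._current = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field is None:
            return
        text = data.strip()
        if self._field == "name":
            self._current["name"] = text
        else:
            # Assumes US-style prices: drop "$" and thousands separators.
            self._current["price"] = float(text.lstrip("$").replace(",", ""))
        # Emit a record once both fields of a product have been seen.
        if len(self._current) == 2:
            self.products.append((self._current["name"], self._current["price"]))
            self._current = {}


def store(conn, retailer, products):
    """Append one retailer's products to a simple (hypothetical) prices table."""
    conn.execute("CREATE TABLE IF NOT EXISTS prices (retailer TEXT, name TEXT, price REAL)")
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)",
                     [(retailer, name, price) for name, price in products])


def search(conn, term):
    """Substring search on product name (LIKE is case-insensitive for ASCII), cheapest first."""
    return conn.execute(
        "SELECT retailer, name, price FROM prices WHERE name LIKE ? ORDER BY price",
        (f"%{term}%",),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    pages = {
        "shop-a": '<div><span class="name">USB-C Cable</span><span class="price">$9.99</span></div>',
        "shop-b": '<div><span class="name">USB-C cable 2m</span><span class="price">$7.49</span></div>',
    }
    for retailer, html in pages.items():
        parser = ProductParser()
        parser.feed(html)
        parser.close()
        store(conn, retailer, parser.products)
    print(search(conn, "usb-c"))  # shop-b's cheaper cable is listed first
```

SQLite keeps the sketch self-contained. The slide's "search engine" option would replace `search` with a full-text index.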
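The Monitor Resellers slide lists pricing agreement violations. One simple way to flag them, assuming the agreement is a hypothetical per-SKU minimum price and reseller prices have already been scraped: compare each listing against that floor, with an optional tolerance. `Listing` and `find_violations` are illustrative names, and the SKUs and prices are made-up sample data.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    reseller: str
    sku: str
    price: float


def find_violations(listings, min_prices, tolerance=0.0):
    """Return listings priced below the agreed minimum for their SKU.

    min_prices maps SKU -> agreed minimum price (hypothetical agreement data).
    tolerance is an allowed fractional discount, e.g. 0.02 for 2%.
    SKUs without an agreed minimum are skipped.
    """
    violations = []
    for item in listings:
        floor = min_prices.get(item.sku)
        if floor is not None and item.price < floor * (1 - tolerance):
            violations.append(item)
    return violations


listings = [
    Listing("reseller-1", "X100", 499.0),
    Listing("reseller-2", "X100", 449.0),
    Listing("reseller-3", "Z9", 10.0),
]
for v in find_violations(listings, {"X100": 479.0}):
    print(f"{v.reseller} lists {v.sku} at {v.price:.2f}")
# → reseller-2 lists X100 at 449.00
```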
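The Lead Generation slide ends with augmenting the information you already have on your contacts. A minimal sketch, assuming contacts and scraped records are dicts that share a hypothetical `email` field: match records on the normalized key and fill only the fields that are currently empty. `augment` and the field names are illustrative. Real record linkage usually needs fuzzier matching than an exact key.

```python
def _norm(value):
    """Normalize a match key: trim whitespace and lower-case; None becomes ""."""
    return (value or "").strip().lower()


def augment(existing, scraped, key="email"):
    """Fill gaps in existing contact records with fields from scraped records.

    Records are dicts matched on a normalized key (hypothetical "email" field by default).
    Existing non-empty values are never overwritten; unmatched scraped records are ignored.
    Returns new dicts and leaves the inputs unmodified.
    """
    index = {}
    for rec in scraped:
        k = _norm(rec.get(key))
        if k:
            index.setdefault(k, rec)  # first scraped record wins on duplicate keys
    merged = []
    for rec in existing:
        out = dict(rec)
        k = _norm(rec.get(key))
        match = index.get(k) if k else None
        if match:
            for field, value in match.items():
                if value and not out.get(field):
                    out[field] = value
        merged.append(out)
    return merged


crm = [{"name": "Jane Doe", "email": "Jane@Example.com", "title": ""}]
web = [{"email": "jane@example.com", "title": "CFO", "company": "Example Corp"}]
print(augment(crm, web))
```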
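The Real Estate slide mentions estimating house prices from crawled listings. A simple comparables-style sketch, assuming each scraped listing reduces to a hypothetical `(neighborhood, price, floor_area_sqm)` tuple: take the median price per square meter in the neighborhood and multiply it by the floor area. This is one basic estimator chosen for illustration, not a valuation model. The sample figures are invented.

```python
from collections import defaultdict
from statistics import median


def price_per_sqm_by_area(listings):
    """Median asking price per square meter for each neighborhood.

    listings: iterable of (neighborhood, price, floor_area_sqm) tuples,
    a hypothetical shape for scraped property listings.
    """
    rates = defaultdict(list)
    for area, price, sqm in listings:
        if sqm > 0:  # skip listings with a missing or zero floor area
            rates[area].append(price / sqm)
    return {area: median(values) for area, values in rates.items()}


def estimate_price(listings, area, sqm):
    """Comparable-based estimate: the area's median price per m² times the floor area.

    Returns None when there are no comparable listings in that area.
    """
    rates = price_per_sqm_by_area(listings)
    if area not in rates:
        return None
    return rates[area] * sqm


listings = [
    ("north", 300_000, 100),
    ("north", 330_000, 110),
    ("north", 280_000, 80),
    ("south", 200_000, 100),
]
print(estimate_price(listings, "north", 90))  # → 270000.0
```

The median is used rather than the mean so that a single mislabeled or luxury listing does not skew the estimate.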
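The Fraud Detection slide describes monitoring sellers whose products fall into prohibited categories (drugs, weapons, gambling). A first-pass keyword filter could look like the sketch below. The keyword lists are tiny hypothetical placeholders, and real monitoring would need curated vocabularies and human review. Whole-word, case-insensitive regular expressions keep "casino" from matching inside longer words.

```python
import re

# Hypothetical, deliberately tiny keyword lists for illustration only.
PROHIBITED = {
    "drugs": ["oxycodone", "cocaine", "mdma"],
    "weapons": ["handgun", "ammunition", "silencer"],
    "gambling": ["casino", "sports betting", "poker chips"],
}

# One compiled pattern per category; \b anchors keywords to word boundaries.
PATTERNS = {
    cat: re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
    for cat, words in PROHIBITED.items()
}


def flag_listing(text):
    """Return the sorted list of prohibited categories whose keywords appear in text."""
    return sorted(cat for cat, pat in PATTERNS.items() if pat.search(text))


print(flag_listing("Discreet shipping: handgun ammunition and online CASINO credits"))
# → ['gambling', 'weapons']
```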
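The Company Reputation slide talks about using NLP to build a sentiment indicator from news and social data. The simplest version is a lexicon count averaged per day, sketched below under that assumption. The word lists are hypothetical placeholders, and a production system would more likely use a trained model or a full sentiment lexicon. Each text scores (positive − negative) / (positive + negative), so scores fall in [−1, 1].

```python
import re
from collections import defaultdict

# Tiny hypothetical lexicon for illustration only.
POSITIVE = {"beat", "growth", "strong", "upgrade", "record"}
NEGATIVE = {"miss", "lawsuit", "weak", "downgrade", "recall"}

TOKEN = re.compile(r"[a-z']+")


def score(text):
    """Score in [-1, 1]: (positive hits - negative hits) / total hits, or 0.0 if none."""
    tokens = TOKEN.findall(text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


def daily_indicator(items):
    """Average score per day from (date, text) pairs, returned in date order."""
    by_day = defaultdict(list)
    for day, text in items:
        by_day[day].append(score(text))
    return {day: sum(s) / len(s) for day, s in sorted(by_day.items())}


headlines = [
    ("2016-01-04", "Record growth at the company"),
    ("2016-01-04", "Weak guidance spooks investors"),
    ("2016-01-05", "Analysts upgrade the stock"),
]
print(daily_indicator(headlines))  # → {'2016-01-04': 0.0, '2016-01-05': 1.0}
```

Keeping the per-day headlines alongside the score makes it possible to trace which news items drove a move in the indicator, which the slide suggests is where the long-term insight lies.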
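The Consumer Behavior slide lists the volume of comments across brands and comparisons between brands. Both can be computed from scraped comments with simple counting, sketched below. The function counts comments mentioning each brand and brand pairs mentioned in the same comment. The brand names are fictional examples, and whole-word matching assumes brand names begin and end with letters or digits.

```python
import re
from collections import Counter
from itertools import combinations


def mention_stats(comments, brands):
    """Count comments mentioning each brand, and brand pairs mentioned together.

    Matching is case-insensitive on whole words; each comment counts at most once per brand.
    """
    patterns = {b: re.compile(r"\b" + re.escape(b) + r"\b", re.IGNORECASE) for b in brands}
    volume = Counter()
    co_mentions = Counter()
    for text in comments:
        found = sorted(b for b, p in patterns.items() if p.search(text))
        volume.update(found)
        # Sorted pairs so ("A", "B") and ("B", "A") are counted as the same comparison.
        co_mentions.update(combinations(found, 2))
    return volume, co_mentions


comments = [
    "Switched from Acme to Globex, no regrets",
    "acme support is slow",
    "Initech vs Globex?",
]
volume, pairs = mention_stats(comments, ["Acme", "Globex", "Initech"])
print(volume.most_common())
print(pairs.most_common())
```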
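The Hiring slide mentions understanding hiring trends by sector or region from job-board data. A minimal sketch, assuming each scraped posting reduces to a hypothetical `(segment, ISO date)` pair, where the segment is a sector or a region: count postings per segment and month, then compute the month-over-month change. The function names and sample postings are illustrative.

```python
from collections import Counter


def postings_by_month(postings):
    """Count postings per (segment, 'YYYY-MM') from (segment, 'YYYY-MM-DD') pairs."""
    return Counter((segment, date[:7]) for segment, date in postings)


def month_over_month(counts, segment, prev_month, month):
    """Fractional change in posting volume between two months; None if no baseline."""
    before = counts.get((segment, prev_month), 0)
    after = counts.get((segment, month), 0)
    if before == 0:
        return None
    return (after - before) / before


postings = [
    ("fintech", "2016-03-02"), ("fintech", "2016-03-15"),
    ("fintech", "2016-04-01"), ("fintech", "2016-04-09"), ("fintech", "2016-04-20"),
    ("retail", "2016-03-05"),
]
counts = postings_by_month(postings)
print(month_over_month(counts, "fintech", "2016-03", "2016-04"))  # → 0.5
```

Returning `None` instead of infinity when a segment had no postings in the earlier month keeps newly appearing segments from dominating a ranking of growth rates.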