Innovative Big-Data Web Scraping Tech Company
HIGHLIGHTS
What is WebRobot?
The Problem
How We Can Solve It
Team
Track Record
Business Model
Trends & Opportunities
Main Competitors
Target
SWOT Analysis
Some Numbers (sales, profit, clients)
Investment Plan
1. THE PROJECT
Description
WebRobot Ltd is a London-based company operating in the web scraping and web mining industry,
where it aims to become the market leader.
At WebRobot, we are building a highly scalable data-acquisition infrastructure that
customers can use as a web service. It combines cloud computing and big-data technologies
with data-extraction and information-extraction algorithms.
WebRobot aims to be a valuable partner for any company that needs to acquire heterogeneous
information from the web while reducing its internal management costs. WebRobot's
services are intended to become a strategic resource for its clients' business success.
The problem
Every company that wants to achieve, maintain, and improve its business success needs information (data)
about its market, customers, and competitors, but obtaining it is challenging.
The company must obtain good, reliable, and well-organized data, and then manage it properly.
The World Wide Web consists of a huge amount of semi-structured and unstructured data,
and its structure changes constantly.
Collecting all of this data is often very expensive.
For all these reasons, we need robust and scalable algorithms that reduce the onerous work of
maintaining extractors as websites change.
How We Can Solve the Problem
We guarantee algorithmic and structural scalability, together with automatic extraction features.
We offer a powerful solution in the form of a web service.
We integrate cloud computing with big-data technologies, applied in the broader web mining
context.
We provide visual support tools and an SDK for connecting to our stack.
WebRobot’s goal is to become a complete ETL service involving data extraction,
web mining, machine learning, and big-data analytics.
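The extract-transform-load flow described above can be illustrated with a minimal, single-machine sketch using only the Python standard library. The HTML snippet, the `ProductParser` class, and the `extract`, `transform`, and `load` functions are hypothetical examples, not WebRobot's API; a real service would fetch pages from the web at scale and load results into distributed storage rather than an in-memory SQLite database.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical input: in a real service this HTML would be fetched from the web.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Widget</span><span class="price">€19.90</span></li>
  <li class="product"><span class="name">Gadget</span><span class="price">€5.00</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Extract step: collect (name, raw_price) pairs from <span> elements."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # class of the <span> we are currently inside
        self._current = {}

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            cls = dict(attrs).get("class")
            if cls in ("name", "price"):
                self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            self.records.append((self._current.get("name"), self._current.get("price")))
            self._current = {}

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()


def extract(html):
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.records


def transform(records):
    """Transform step: normalize prices from strings like '€19.90' to floats."""
    return [(name, float(price.lstrip("€"))) for name, price in records]


def load(rows, conn):
    """Load step: store the cleaned rows in a relational table."""
    conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SAMPLE_HTML)), conn)
print(conn.execute("SELECT name, price FROM products ORDER BY price").fetchall())
# [('Gadget', 5.0), ('Widget', 19.9)]
```

Keeping the three steps as separate functions mirrors the ETL idea: each stage can be swapped (for example, a learned extractor instead of a hand-written parser) without touching the others.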
2. THE TEAM
Roger Giuffrè, CEO and CTO: 71% of equity
Denis Giuffrè, CCO and CMO: 4% of equity
Antonio Censabella, CFO
Mediterraneo Capital Ltd: 25% of equity
3. TRACK RECORD
We are finalizing the first version of the web service, which will include a serverless version based on
AWS Lambda as well as a version running on Amazon EMR.
We still need to integrate the wrapper induction algorithms directly into the Spark context. This will also
help us refine them in line with the latest academic findings.
The API implementation is essentially complete. We still have to complete the usability studies of the
current interface.
We need to complete the dashboard, which will be released under an open-source license.
We have to design visual tools to support the generation of ETL pipelines.
We also need to define a new grammar for the query language.
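Wrapper induction learns extraction rules from a few labeled example pages instead of hand-coding them. The sketch below implements a simplified version of the classic LR (left-right) wrapper, where the rule is a pair of delimiter strings learned from the examples. It is an illustration of the general technique, not WebRobot's algorithm; the function names and sample pages are hypothetical, and it runs on a single machine rather than inside a Spark context.

```python
import os


def common_suffix(strings):
    """Longest string that every input ends with."""
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def learn_lr_wrapper(examples):
    """Learn (left, right) delimiters from (page, value) training pairs.

    Simplified LR wrapper: the left delimiter is the longest common suffix of
    the text before each labeled value, the right delimiter the longest common
    prefix of the text after it. Assumes each value's first occurrence in its
    page is the labeled one.
    """
    befores, afters = [], []
    for page, value in examples:
        start = page.index(value)
        befores.append(page[:start])
        afters.append(page[start + len(value):])
    left, right = common_suffix(befores), os.path.commonprefix(afters)
    if not left or not right:
        raise ValueError("examples share no common delimiters")
    return left, right


def apply_wrapper(wrapper, page):
    """Return every substring found between the learned delimiters."""
    left, right = wrapper
    results, pos = [], 0
    while (start := page.find(left, pos)) != -1:
        start += len(left)
        end = page.find(right, start)
        if end == -1:
            break
        results.append(page[start:end])
        pos = end + len(right)
    return results


# Hypothetical labeled training pages.
examples = [
    ('<div class="item"><span class="price">19.90</span></div>', "19.90"),
    ('<div class="item new"><span class="price">5.00</span></div>', "5.00"),
]
wrapper = learn_lr_wrapper(examples)
page = ('<div class="item sale"><span class="price">7.50</span></div>'
        '<div class="item"><span class="price">3.20</span></div>')
print(apply_wrapper(wrapper, page))  # ['7.50', '3.20']
```

Because the delimiters are learned from the page markup, a layout change on the target site can break them, and the wrapper then has to be relearned from fresh examples. That relearning step is exactly the kind of maintenance work that more robust induction algorithms aim to reduce.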
4. THE BUSINESS MODEL
The Strategy
We will release the service on the Amazon marketplace, available in three commercial packages:
Entry-Level, Professional, and Enterprise.
Our average selling price is expected to be around €0.0008 per scraped page, but we will price static
pages differently from dynamic pages, which require more complex algorithms.
We have verified that execution costs, both in a serverless environment and on an EMR cluster, allow us a
margin of at least 50%. This minimum margin acts as a cost constraint in our pricing policy.
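The pricing figures above translate into simple arithmetic: daily revenue per customer, and the highest execution cost per page that still respects the 50% margin floor. The sketch below uses the €0.0008 average price and the 50% floor from the plan; the function names and the sample cost values in the checks are hypothetical, and the static/dynamic price distinction is not modeled. `Decimal` is used so that sub-cent amounts are represented exactly.

```python
from decimal import Decimal

# Average selling price per scraped page, from the business model (EUR).
PRICE_PER_PAGE = Decimal("0.0008")
# Minimum gross margin the pricing policy must preserve.
MIN_MARGIN = Decimal("0.50")


def daily_revenue(pages_per_day, price=PRICE_PER_PAGE):
    """Revenue in EUR for a customer scraping pages_per_day pages."""
    return pages_per_day * price


def max_cost_per_page(price=PRICE_PER_PAGE, margin=MIN_MARGIN):
    """Highest execution cost per page that still leaves the required margin."""
    return price * (1 - margin)


def meets_margin(cost_per_page, price=PRICE_PER_PAGE, margin=MIN_MARGIN):
    """True if the given execution cost keeps the margin at or above the floor."""
    return (price - cost_per_page) / price >= margin


print(daily_revenue(1_000_000))  # 800.0000 (EUR per day for 1 million pages)
print(max_cost_per_page())       # 0.000400 (EUR per page)
```

In other words, a 50% margin on a €0.0008 page means execution on Lambda or EMR must cost no more than €0.0004 per page.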
In the future, we will integrate a marketplace for web agents and adopt a B2B2C model to close the gap
with end users and get closer to actual use cases.
5. THE MARKET AND COMPETITORS
Trends and opportunities
Markets: Web Scraping, Web Mining, Data Analytics.
Size: an estimated value of $2 billion in 2020 alone.
Growth: based on market research, we expect further growth in the coming years, driven by
(1) the ever-greater centrality of data across the entire business process, and (2) companies'
increasing tendency to outsource these activities.
Main Competitors
Diffbot: a data-extraction API that uses machine-learning heuristics and features to crawl
pages. However, its results are not 100% precise.
Scrapinghub: a cloud service focused on the Scrapy framework. It sells each service separately and
offers automatic extraction functions that are still in beta. However, the results do not always meet
requirements.
Import.io: visual tools that customers can use to configure extractors. However, it is particularly
expensive.
6. TARGET
E-commerce companies that require algorithmic pricing and competition monitoring.
Large companies that produce press reviews or carry out social media analysis, opinion mining, and
sentiment analysis.
Hedge funds and financial institutions, for which information such as financial data and sentiment
indicators is extremely important.
Marketing agencies that need web scraping for SEO and web marketing automation purposes.
Established and startup companies that run or are developing any kind of vertical search engine.
Startups and small businesses that can benefit from building dedicated applications on our stack.
7. SWOT ANALYSIS
STRENGTHS
Scalability.
Self-service, fast big-data extraction solution.
WEAKNESSES
We need PhD-level researchers to reinforce the algorithmic extraction.
A highly specialized high-tech service that requires effort to make it user-friendly for non-technical users.
OPPORTUNITIES
Global market with large expansion opportunities.
Profitable niche with low competition.
RISKS
Restrictive regulations on the use of personal data (in Europe), on data collection (in Asia), and on
data referring to minors (worldwide).
8. THE NUMBERS
Our projections assume a medium or large customer requiring at least 1 million pages per day
at a price of €800.00 per day (global potential demand is estimated at 100 billion pages per day).
                               2021     2022     2023     2024
Sales (EUR thousands)         2,880    7,200   13,248   20,160
Gross margin (EUR thousands)  1,440    3,600    6,624   10,080
Net margin (EUR thousands)    1,440    3,600    6,624   10,080
Number of customers              10       25       46       70
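The sales row can be reproduced from the customer counts and the €800 daily price, which makes the table easy to check or extend. This sketch assumes every customer buys exactly the base volume of 1 million pages per day; the 360-day billing year is an assumption that happens to reproduce the table and is not stated in the plan, and the 50% gross margin matches the pricing constraint. The function names are illustrative.

```python
# From the plan: 1 million pages/day at EUR 0.0008/page = EUR 800 per day.
DAILY_PRICE_EUR = 800
# Assumption (not stated in the plan): a 360-day billing year, the value
# that reproduces the table's sales figures.
BILLING_DAYS = 360
# Gross margin of 50%, matching the minimum margin in the pricing policy.
GROSS_MARGIN_PCT = 50

CUSTOMERS = {2021: 10, 2022: 25, 2023: 46, 2024: 70}


def projected_sales_keur(n_customers):
    """Yearly sales in thousands of EUR for n customers at the base volume."""
    return n_customers * DAILY_PRICE_EUR * BILLING_DAYS // 1000


def projected_gross_margin_keur(n_customers):
    """Gross margin in thousands of EUR at the fixed margin percentage."""
    return projected_sales_keur(n_customers) * GROSS_MARGIN_PCT // 100


for year, n in CUSTOMERS.items():
    print(year, projected_sales_keur(n), projected_gross_margin_keur(n))
```

Under these assumptions each customer is worth €288,000 per year (800 × 360), so sales scale linearly with the customer count.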
9. INVESTMENT PLAN
The investment strategy
First round: 9% of equity for €300k, at a pre-money valuation of €3 million.
Second round: 9% of equity for €2 million.
Third round: 9% of equity for €10 million.
We eventually plan to go public on the stock exchange.
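The round terms imply post-money valuations and a dilution path for the current shareholders. The sketch below is a rough model under stated assumptions: each round issues new shares so the new investors own exactly the stated stake, all existing holders are diluted pro rata, and there is no option pool. The first-round stake is computed exactly from €300k on a €3 million pre-money (1/11, about 9.09%, which the plan rounds to 9%); the starting cap table comes from the team section. Function names and investor labels are hypothetical.

```python
from fractions import Fraction

# (investment in EUR, equity sold) per round. Round 1 is derived exactly from
# the EUR 3M pre-money; rounds 2 and 3 use the plan's 9%.
ROUNDS = [
    (300_000, Fraction(300_000, 3_000_000 + 300_000)),  # 1/11, about 9.09%
    (2_000_000, Fraction(9, 100)),
    (10_000_000, Fraction(9, 100)),
]

# Current cap table from the team section.
FOUNDERS = {
    "Roger Giuffrè": Fraction(71, 100),
    "Mediterraneo Capital Ltd": Fraction(25, 100),
    "Denis Giuffrè": Fraction(4, 100),
}


def implied_post_money(investment, stake):
    """Post-money valuation implied by selling `stake` for `investment`."""
    return investment / stake


def apply_round(cap_table, stake, investor):
    """Issue new shares so `investor` owns `stake`; others are diluted pro rata."""
    diluted = {holder: share * (1 - stake) for holder, share in cap_table.items()}
    diluted[investor] = stake
    return diluted


table = dict(FOUNDERS)
for i, (amount, stake) in enumerate(ROUNDS, start=1):
    post = implied_post_money(amount, stake)
    print(f"Round {i}: implied post-money EUR {float(post):,.0f}")
    table = apply_round(table, stake, f"Round {i} investors")

for holder, share in table.items():
    print(f"{holder}: {float(share):.2%}")
```

Under these assumptions the implied post-money valuations are €3.3 million, about €22.2 million, and about €111.1 million, and the current shareholders together keep roughly 75% of the company after the third round (10/11 × 0.91² ≈ 0.753).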