eZmL
The Old
An ecosystem for machine learning. EZML
will allow anyone of any background to
upload data and receive a solution for their
machine learning problem.
Size of Opportunity:
SAM: 10 Billion (startups with 10~100 employees)
Target Market: 8 Billion (high tech)
The New
Helping startups increase user engagement and
retention through crowdsourced
recommendations and analytics. Engineers
without any data science background can
upload data and seamlessly integrate a
machine learning solution into their product.
Size of Opportunity:
SAM: 10 Billion (startups with 10~100 employees)
Target Market: 4 Billion (consumer internet)
# of interviews: 103
THE TEAM
Jim Cai
MSCS
Technical Lead LinkedIn
Engineering
William Song
MSCS
Self driving Car
Google[x]; Facebook
Engineering
Billy Jun
MSCS
Robotics at MDA
Engineering
Sam Yu
MSx GSB
SAIF Partners, GE
Strategy + Bizdev
Jordan Segall
MS in MS&E, BSCS
PM RelateIQ; FDE Palantir
Product + Bizdev
AT THE BEGINNING:
ORIGINAL HYPOTHESES
WHAT: Model selection is time consuming and can be
automated
WHO: Data scientists within companies who want to automate
the “annoying” parts of their job
HOW: We can provide the user with the entire model, which can
be embedded within their system at the end of the process
WHAT DID WE DO?
Talked primarily to data scientists at tech companies and
data science consulting firms
Business Model
Canvas #1:
1/6/15
Where are the
data scientists at
companies?
LESSONS LEARNED: PART 1
PREVIOUSLY
Model selection is time consuming and
can be automated
Automated Model
Recommendation
Data Scientist Model
Contribution
On-the-fly update
API Access
Saves time performing model
selection and money hiring data
scientists
Obtain models that improve user
engagement and
product quality
Purchased model continues to
improve automatically
Automated model selection system produces
the best model with little to no data science
knowledge
Save the customers time and money by
running everything in
the cloud
“Typically less than 20% of our time in data
science consulting is model selection. We’ll get
it to working ‘decently enough’ then move
on...we don’t aim for the best...we would ONLY
use EZML if the feature extraction part of the
project is trivial. Feature extraction is critically
important.”
- Jay Hack, Founder 205 Consulting, Palantir
Engineer
“Model selection is the easy problem; feature
extraction is the hard part.”
- Nick Gorski, Machine Learning Lead at
TellApart
LESSONS LEARNED: PART 1
HYPOTHESIS:
Data scientists may be the first target
customer to pursue
Photozeen - one of the first companies we
interviewed that recognized their deficiency in
machine learning, and needed help to improve
user retention and engagement.
LESSONS LEARNED: PART 1
HYPOTHESIS:
We can pitch the product
“We don’t understand what the **** you guys
are doing.”
- Steve Weinstein
GOING FORWARD
Is there a need?
Who is our customer
archetype?
Interview more companies
Develop MVP
Hone pitch
Articulate product more clearly
UPDATED HYPOTHESES
WHAT: Our tool will still provide machine learning as a
service, but our target customers are companies, not data
scientists or consulting firms.
WHO: Anyone within the firm, from engineers to marketing,
can use our tool. Our target contact is the founder.
WHAT WE DID
Reached out to 200+ companies
● 51 from Y Combinator
● 49 from Crunchbase
● 35 from Tec Club
● 34 from Techstars
● and more
LESSONS LEARNED: PART 2
There is a need for companies lacking
data science solutions to use our tool
While these companies ultimately
did not work out...
...the reasons why they didn’t
were key to our learning
RelateIQ
Too large with enough data scientists on board;
would need fully working EZML
Verge Campus
Recommendation is an ancillary feature
Photozeen
Good problem for us, but not enough data
LESSONS LEARNED: PART 2
Privacy would not be a concern for
companies that use our tool; they
would reveal their data features publicly
for better results, Kaggle style.
“You need to properly address privacy and
security policies before we share data...we
don’t want to reveal our secret sauce.”
Olga Mack, Assistant General Counsel, Zoosk
“I’m uncomfortable sending you my data since our business is based on selling data - privacy is
already a touchy subject for us.”
Andy Bromberg, CEO Sidewire
LESSONS LEARNED: PART 2
Our customer archetypes are CEOs and
CTOs
We would go to CEOs, who would redirect us
to their CTOs.
CTOs tended to have a greater understanding
of the value add to their business, and how
they were unable to implement machine
learning without a tool like EZML.
CUSTOMER ARCHETYPE
Startup with <10M in funding
CTO or head data scientist/engineer
Enough data for machine learning to have an effect
No intense privacy concerns
Clear problems (e.g., user engagement, user churn)
BUSINESS MODEL CANVAS
Business Model
Canvas: 2/10/15
UPDATED HYPOTHESES
SELF-SERVICE: Since it is a self-service tool, the
sales process won’t be difficult
CONSULTING: We can apply a consulting model
for early customers; in exchange for manually
working on problems, we can learn from real
data and create our platform
NEW MVP
Keep Customers
ACTIVATE
ACQUIRE
UP-SELL
NEXT-SELL
CROSS-SELL
REFERRALS
Perfect User Experience:
Ease of use, Accuracy
Automatic model upgrade
with continual data upload
Free access to machine learning
models for analytics
Low charge for API calls
Focused sectors (consumer internet)
and algorithms (recommendation)
Expand features / algorithms to tackle
new problems (Upsell)
Invest in sales force, data scientist
models and training data sets
Provide consulting and customization
DIAGRAM OF PAYMENT MODELS:
Amazon (compute/storage resources for eZmL):
- Amazon EC2 (model parameters)
  - CPU server: $14.17/month/company
  - GPU server: $25.73/month/company
- Amazon S3 (dataset storage): $0.0295/GB/month
Data Scientist:
- Models
- Compensation: %/selected model
Startups:
- Upload data ($0.030/GB/month)
- Train model (variable cost after 10 iterations)
- Evaluation Report
- Subscribe ($200/model/month)
- API Call / Prediction
- Continual Improvement of Model ($50/model/month)
- Free Bandwidth
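The prices in the diagram can be combined into a rough monthly estimate. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the charges are assumed to simply add up, each company is assumed to use either one CPU share or one GPU share, and the items the diagram does not price (API calls, training beyond 10 iterations, data scientist compensation) are left out.

```python
# Prices copied from the payment-model diagram (USD).
UPLOAD_PER_GB_MONTH = 0.030          # startup pays to upload/store data
SUBSCRIBE_PER_MODEL_MONTH = 200.0    # startup subscribes to a model
IMPROVE_PER_MODEL_MONTH = 50.0       # continual improvement of a model
S3_PER_GB_MONTH = 0.0295             # eZmL pays Amazon S3 for dataset storage
EC2_CPU_PER_COMPANY_MONTH = 14.17    # eZmL pays Amazon EC2, CPU server share
EC2_GPU_PER_COMPANY_MONTH = 25.73    # eZmL pays Amazon EC2, GPU server share


def startup_monthly_bill(data_gb, models, improving_models=0):
    """Estimate what one startup pays per month (assumption: charges add up)."""
    if improving_models > models:
        raise ValueError("cannot improve more models than are subscribed")
    return (data_gb * UPLOAD_PER_GB_MONTH
            + models * SUBSCRIBE_PER_MODEL_MONTH
            + improving_models * IMPROVE_PER_MODEL_MONTH)


def ezml_monthly_infra_cost(data_gb, use_gpu=False):
    """Estimate eZmL's Amazon cost for one company (assumption: CPU or GPU, not both)."""
    server = EC2_GPU_PER_COMPANY_MONTH if use_gpu else EC2_CPU_PER_COMPANY_MONTH
    return server + data_gb * S3_PER_GB_MONTH


if __name__ == "__main__":
    bill = startup_monthly_bill(data_gb=100, models=1, improving_models=1)
    cost = ezml_monthly_infra_cost(data_gb=100)
    print(f"bill ${bill:.2f}, infra ${cost:.2f}, difference ${bill - cost:.2f}")
```

Under these assumptions the subscription fee dominates the bill, while storage is nearly a pass-through ($0.030 charged against $0.0295 paid per GB).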
Hive
SO WHAT NOW?
WE LEARNED...
● What type of company to target
● Who in the company to sell to
● We need a highly trained sales team
● Provide consulting for early clients to get data
● Tiered pricing is better than selling the model outright or charging per API call
● Companies are most interested in recommendation types of problems
● Crowdsourcing is a second priority at the moment
● The costs of storage, servers, employees, office space, and more
● We can reduce costs (user churn), not just increase revenue
● Feature extraction is a key differentiator
● Privacy is a concern we must tend to
● Dumping of data for continued training is highly important
● And many, many more
WE THINK...
Is this a viable business?
Do we want to pursue it after this class?
BUSINESS MODEL CANVAS
BUSINESS CANVAS
BUSINESS CANVAS
THANK YOU
Steve Blank, Jeff Epstein, Steve Weinstein
Nick O’Connor + Sunil Nagaraj
Classmates + TAs

Ezml Stanford 2015


Editor's Notes

  • #3 Need to add TAM and stuff
  • #4 Need to update with info
  • #6 We focused on talking to data scientists at companies but didn’t even have them in customer segment!
  • #8 From one of our earliest presentations - our focus on value prop was on the model selection
  • #17 Created MVP to assist with customer interviews
  • #18 Reached out to every Techstars company in California, every Y Combinator company in California, and more: hundreds of customers
  • #31 New MVP
  • #33 EC2 price is based on the Oregon regional price. Compute Optimized: c3.8xlarge (640 GB SSD, 60 GB RAM, 32 Xeon CPUs). GPU instances: g2.2xlarge (8 GPU cards). For CPUs, we assume that 2 threads are needed per company; for GPUs, we assume that one card is sufficient per company.
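
The per-company sharing assumption in note #33 can be sketched as simple arithmetic. This is an illustration only, not the team's actual calculation: the function names are hypothetical, the note's "32 Xeon CPUs" are read as 32 threads, and the instance-level figure is back-derived from the per-company prices on the payment slide rather than taken from any Amazon price list.

```python
def companies_per_instance(units_per_instance, units_per_company):
    """How many companies fit on one instance (threads for CPU, cards for GPU)."""
    return units_per_instance // units_per_company


def implied_instance_price(per_company_price, units_per_instance, units_per_company):
    """Monthly instance price implied by a per-company price and the sharing assumption."""
    shared_by = companies_per_instance(units_per_instance, units_per_company)
    return per_company_price * shared_by


# CPU: 32 threads per instance, 2 threads per company (from the note).
cpu_instance = implied_instance_price(14.17, units_per_instance=32, units_per_company=2)
# GPU: 8 cards per instance, 1 card per company (from the note).
gpu_instance = implied_instance_price(25.73, units_per_instance=8, units_per_company=1)

print(f"implied CPU instance: ${cpu_instance:.2f}/month")
print(f"implied GPU instance: ${gpu_instance:.2f}/month")
```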