Dataiku - google cloud platform roadshow - october 2013

Published in: Technology, Education

Transcript of "Dataiku - google cloud platform roadshow - october 2013"

  1. Data Science Studio. 19 customers. Founded in January 2013. Data Science For Everyone.
  2. (big) data(s) + machine learning + for practical applications = Data Science
  3. The Project. (c) Dataiku 2013 - Confidential. Hal Alowne, BI Manager, Dim’s Private Showroom. Dim Sum, CEO & Founder, Dim’s Private Showroom. Medium-size e-commerce: $100M revenue, 1 Data Analyst. Big Guys: $10B+ revenue, 100+ Data Scientists. "Hey Hal! We need a big data platform, like the big guys! Let’s just do as they do!"
  4. Hal Wish #1: Global Customer Value Funnel. SEO, Newsletter, Display, Retargeting, Display, AdWords Marketplace, Direct Sales. Delivery, View, Basket, Support, Returns. $ $ $ $ Orders.
  5. Hal Wish #2: Why do people drop their basket? (9/30/13) Basket. Payment refused. Credit refused. Cheaper elsewhere? Delivery costs? Wait for Xmas? ACTION.
  6. Hal Wish #3: What product to put on top? Original: Most Popular on top. Better: Machine Learning Score (age/discount/margin…). Advanced: Machine Learning Score + Personalization.
  7. Why is it so complicated?
  8. Partner Data Spaghetti: Mailing Partner, DMP, Partnerz, Mail Optimizer, Retargeter, Market Data Providers, Social Networks
  9. Databases Are Full. 1 TB BI Database. 20 TB BI Database. Any new computing job takes > 1 day. NEED FOR SCALE.
  10. Architecture Bingo. BI, Real-Time, Batch, Real Real-Time. Simple Queries, Statistics, Machine Learning. Hive, Pig, Spark, MongoDB, ElasticSearch, Cascading, R.
  11. Hadoop, Ceph, Sphere, Cassandra, Spark, Scikit-Learn, Mahout, WEKA, MLBase, RapidMiner, Pandas, D3, Crossfilter, InfiniDB, LucidDB, Impala, Elastic Search, SOLR, MongoDB, Riak, Membase, Pig, Hive, Cascading, Talend. Machine Learning Mystery Land! Scalability Central! NoSQL-Slavia! SQL Columnar Republic! Visualization County! Data Cleanup Wasteland! Statistician Old House! R
  12. Hal’s Bingo! HADOOP, Google Cloud Platform, Dataiku
  13. Step 1: Get your own data. Dataiku Open Source Web Tracker (WT1): Apache License; JavaScript & IO; writes directly to Google Cloud Storage; full Java, easy to deploy. Silent at night, autoscales during the summer and winter sales.
  14. Step 2: Mix All Your Data. 4 VMs on GCE. Tracking Data, Internal Data, Partner Data. Data Science Studio, Pig, Hive, HADOOP, auto-sync to BigQuery.
  15. Step 3: Mine Your Data. Built-in Predictive Models. Advanced Ad Hoc Models (R or Python). Shared Web-Based Data Mining Platform.
  16. Typical Project Calendar. January: choose partner / set up the architecture. February: initial deployment (4 TB); replace BI. May: new applications (SEO, …). September: scale deployment to 15 TB; integrate all channels.
  17. Some Successes for the Project. Enhanced daily report availability: between H+17 and H+26 (!) on the previous architecture, between H+3 and H+7 with Hadoop on GCE. +21% Email Channel Optimization. SEO plan optimization. And a dozen BI-style "apps".
  18. Thank you! Follow us on Twitter: @dataiku. Ask any big data question: florian.douetteau@dataiku.com