Dataiku - Google Cloud Platform Roadshow - October 2013

Transcript

  • 1. Data Science Studio. Data Science For Everyone. Founded in January 2013. 19 customers.
  • 2. (big) data(s) + machine learning + practical applications = Data Science
  • 3. The Project. Dim’s Private Showroom: medium-size e-commerce, $100M revenue, 1 Data Analyst. Big Guys: $10B+ revenue, 100+ Data Scientists. Dim Sum (CEO & Founder, Dim’s Private Showroom) and Hal Alowne (BI Manager, Dim’s Private Showroom). “Hey Hal! We need a big data platform, like the big guys! Let’s just do as they do!” (c) Dataiku 2013 - Confidential
  • 4. Hal Wish #1: Global Customer Value Funnel. [Diagram labels: SEO, Newsletter, Display Retargeting, Display, AdWords, Marketplace, Direct, Sales, Delivery, View, Basket, Support, Returns, Orders]
  • 5. Hal Wish #2: Why do people drop their basket? Basket: Payment refused, Credit refused, Cheaper elsewhere?, Delivery costs?, Wait for Xmas? ACTION. 9/30/13
  • 6. Hal Wish #3: What product to put on top? Options: Original; Most Popular on top; Better Machine Learning Score (age/discount/margin…); Advanced Machine Learning Score + Personalization.
  • 7. Why is it so complicated?
  • 8. Partner Data Spaghetti: Mailing Partner, DMP Partner, Mail Optimizer, Retargeter, Market Data Providers, Social Networks
  • 9. Databases are Full: 1 TB BI Database; 20 TB BI Database. Any new computing job takes > 1 day. NEED FOR SCALE
  • 10. Architecture Bingo: BI, Real-Time, Batch, Real Real-Time; Simple Queries, Statistics, Machine Learning; Hive, Pig, Spark, MongoDB, ElasticSearch, Cascading, R
  • 11. Regions: Machine Learning Mystery Land! Scalability Central! NoSQL-Slavia! SQL Columnar Republic! Visualization County! Data Cleanup Wasteland! Statistician Old House! Tools: Hadoop, Ceph, Sphere, Cassandra, Spark, Scikit-Learn, Mahout, WEKA, MLBase, RapidMiner, Pandas, D3, Crossfilter, InfiniDB, LucidDB, Impala, ElasticSearch, SOLR, MongoDB, Riak, Membase, Pig, Hive, Cascading, Talend, R
  • 12. Hal’s Bingo! HADOOP, Google Cloud Platform, Dataiku
  • 13. Step 1: Get your own data. Dataiku Open Source Web Tracker (WT1): Apache License; JavaScript & IO; writes directly to Google Cloud Storage; full Java, easy to deploy. Silent at night; autoscales during the summer and winter sales.
  • 14. Step 2: Mix All Your Data. Tracking Data, Internal Data, Partner Data; Data Science Studio; Pig, Hive, HADOOP; 4 VMs on GCE; auto-sync to BigQuery.
  • 15. Step 3: Mine your Data. Built-in Predictive Models; Advanced Ad Hoc Models (R or Python); Shared Web-Based Data Mining Platform.
  • 16. Typical Project Calendar. January: choose partner / set up the architecture. February: initial deployment (4TB); replace BI. May: new applications (SEO, …). September: scale deployment to 15TB; integrate all channels.
  • 17. Some Success For the Project. Enhanced daily report availability: previous architecture, between H+17 and H+26 (!); Hadoop on GCE, between H+3 and H+7. +21% email channel optimization. SEO plan optimization. And a dozen BI-style “apps”.
  • 18. Thank you! Follow us on Twitter: @dataiku. Ask any big data question: florian.douetteau@dataiku.com