Dataiku - Google Cloud Platform Roadshow - October 2013

Transcript

  • 1. Data Science Studio: Data Science for Everyone. Founded in January 2013. 19 customers.
  • 2. (Big) data + machine learning + practical applications = Data Science
  • 3. The Project (c) Dataiku 2013 - Confidential
    Hal Alowne, BI Manager, Dim’s Private Showroom
    Dim Sum, CEO & Founder, Dim’s Private Showroom
    Medium-size e-commerce: $100M revenue, 1 data analyst
    Big guys: $10B+ revenue, 100+ data scientists
    "Hey Hal! We need a big data platform, like the big guys! Let’s just do as they do!"
  • 4. Hal’s Wish #1: Global Customer Value Funnel
    Diagram labels: SEO, Newsletter, Display, Retargeting, Display, AdWords, Marketplace, Direct Sales, Delivery, View, Basket, Support, Returns, Orders ($)
  • 5. Hal’s Wish #2: Why do people drop their basket? (9/30/13)
    Basket: payment refused, credit refused, cheaper elsewhere?, delivery costs?, waiting for Xmas?
    ACTION
  • 6. Hal’s Wish #3: Which product to put on top?
    Original: most popular on top
    Better: machine learning score (age/discount/margin…)
    Advanced: machine learning score + personalization
  • 7. Why is it so complicated?
  • 8. Partner Data Spaghetti: Mailing Partner, DMP Partner, Mail Optimizer, Retargeter, Market Data Providers, Social Networks
  • 9. Databases Are Full
    1 TB BI database, 20 TB BI database
    Any new computing job takes > 1 day
    NEED FOR SCALE
  • 10. Architecture Bingo
    Grid labels: BI, Real-Time, Batch, Real Real-Time, Simple Queries, Statistics, Machine Learning
    Tools: Hive, Pig, Spark, MongoDB, ElasticSearch, Cascading, R
  • 11. Tools: Hadoop, Ceph, Sphere, Cassandra, Spark, Scikit-Learn, Mahout, WEKA, MLBase, RapidMiner, Pandas, D3, Crossfilter, InfiniDB, LucidDB, Impala, Elastic Search, SOLR, MongoDB, Riak, Membase, Pig, Hive, Cascading, Talend, R
    Regions: Machine Learning Mystery Land! Scalability Central! NoSQL-Slavia! SQL Columnar Republic! Visualization County! Data Cleanup Wasteland! Statistician Old House!
  • 12. Hal’s Bingo! HADOOP, Google Cloud Platform, Dataiku
  • 13. Step 1: Get your own data
    Dataiku Open Source Web Tracker (WT1)
    •  Apache License
    •  JavaScript & IO
    •  Writes directly to Google Cloud Storage
    •  Full Java, easy to deploy
    Silent at night; autoscales during the summer and winter sales
  • 14. Step 2: Mix all your data
    Tracking data, internal data, partner data
    4 VMs on GCE
    Data Science Studio, Pig, Hive, HADOOP
    Auto-sync to BigQuery
  • 15. Step 3: Mine your data
    •  Built-in predictive models
    •  Advanced ad hoc models (R or Python)
    •  Shared web-based data mining platform
  • 16. Typical Project Calendar
    •  January
       ◦  Choose partner / set up the architecture
    •  February
       ◦  Initial deployment: 4 TB
       ◦  Replace BI
    •  May
       ◦  New applications (SEO, …)
    •  September
       ◦  Scale deployment to 15 TB
       ◦  Integrate all channels
  • 17. Some Successes for the Project
    •  Enhanced daily report availability
       ◦  Previous architecture: between H+17 and H+26 (!)
       ◦  Hadoop on GCE: between H+3 and H+7
    •  +21% email channel optimization
    •  SEO plan optimization
    •  And a dozen BI-style “apps”
  • 18. Thank you!
    Follow us on Twitter: @dataiku
    Ask any big data question: florian.douetteau@dataiku.com