Dataiku - Google Cloud Platform Roadshow - October 2013

Usage Rights

© All Rights Reserved

Presentation Transcript

  • Data Science Studio. Data Science For Everyone. Founded in January 2013. 19 customers.
  • (Big) data + machine learning + practical applications = Data Science
  • The Project. Hal Alowne, BI Manager, Dim’s Private Showroom. Dim Sum, CEO & Founder, Dim’s Private Showroom. Medium-size e-commerce: $100M revenue, 1 data analyst. Big guys: $10B+ revenue, 100+ data scientists. “Hey Hal! We need a big data platform, like the big guys! Let’s just do as they do!” (c) Dataiku 2013 - Confidential
  • Hal Wish #1: Global Customer Value. [Funnel diagram. Labels: SEO, NewsLetter, Display, Retargeting, Display, AdWords, Marketplace, Direct, Sales, Delivery, View, Basket, Support, Returns, Orders, $.]
  • Hal Wish #2: Why do people drop their basket? Basket: payment refused; credit refused; cheaper elsewhere? delivery costs? waiting for Xmas? Then: ACTION.
  • Hal Wish #3: What product to put on top? Original; Most Popular on top; Better Machine Learning Score (age/discount/margin…); Advanced Machine Learning Score + Personalization.
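As a rough illustration of a score built from age, discount, and margin, the sketch below ranks products by a weighted sum. The weights, the field names, the sample catalog, and the linear form are all assumptions made for illustration; the slides do not say how the score is computed.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    age_days: int    # days since the product was listed (hypothetical field)
    discount: float  # fraction off list price, 0.0 to 1.0 (hypothetical field)
    margin: float    # gross margin fraction, 0.0 to 1.0 (hypothetical field)


# Hypothetical weights; a real model would learn these from sales data.
WEIGHTS = {"freshness": 0.2, "discount": 0.5, "margin": 0.3}


def score(p: Product, max_age_days: int = 365) -> float:
    """Weighted sum of freshness, discount, and margin."""
    freshness = max(0.0, 1.0 - p.age_days / max_age_days)
    return (WEIGHTS["freshness"] * freshness
            + WEIGHTS["discount"] * p.discount
            + WEIGHTS["margin"] * p.margin)


def rank(products):
    """Return products sorted from highest to lowest score."""
    return sorted(products, key=score, reverse=True)


# Invented sample catalog.
catalog = [
    Product("scarf", age_days=300, discount=0.10, margin=0.40),
    Product("boots", age_days=10, discount=0.50, margin=0.20),
    Product("watch", age_days=100, discount=0.00, margin=0.60),
]
ranked_names = [p.name for p in rank(catalog)]
```

Personalization, the last option on the slide, would need per-customer features on top of this and is not sketched here.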
  • Why is it so complicated?
  • Partner Data Spaghetti: Mailing Partner, DMP Partners, Mail Optimizer, Retargeter, Market Data Providers, Social Networks
  • Databases are Full. 1 TB BI database; 20 TB BI database. Any new computing job takes > 1 day. NEED FOR SCALE.
  • Architecture Bingo. [Grid labels: BI, Real-Time, Batch, Real Real-Time; Simple Queries, Statistics, Machine Learning; Hive, Pig, Spark, MongoDB, ElasticSearch, Cascading, R.]
  • Tools: Hadoop, Ceph, Sphere, Cassandra, Spark, Scikit-Learn, Mahout, WEKA, MLBase, RapidMiner, Panda, D3, Crossfilter, InfiniDB, LucidDB, Impala, Elastic Search, SOLR, MongoDB, Riak, Membase, Pig, Hive, Cascading, Talend, R. Regions: Machine Learning Mystery Land! Scalability Central! NoSQL-Slavia! SQL Columnar Republic! Visualization County! Data Cleanup Wasteland! Statistician Old House!
  • Hal’s Bingo! HADOOP, Google Cloud Platform, Dataiku
  • Step 1: Get your own data. Dataiku Open Source Web Tracker (WT1)
    ◦ Apache License
    ◦ Javascript & IO
    ◦ Write directly to Google Cloud Storage
    ◦ Full Java, easy to deploy
    ◦ Silent at night; autoscale during summer and winter sales
  • Step 2: Mix All Your Data. [Diagram labels: 4 VMs on GCE; Tracking Data; Internal Data; Partner Data; Data Science Studio; Pig; Hive; HADOOP; auto-sync to BigQuery.]
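One way to picture "mixing" the three sources is a join on a shared customer key. The sketch below does this in plain Python on in-memory records. The field names and sample rows are invented for illustration; on the stack named on the slide, such a join would more plausibly be written in Pig or Hive.

```python
from collections import defaultdict

# Invented sample rows for the three sources named on the slide.
tracking = [
    {"customer_id": "c1", "page": "/basket"},
    {"customer_id": "c1", "page": "/product/42"},
    {"customer_id": "c2", "page": "/home"},
]
internal = [
    {"customer_id": "c1", "orders": 3},
    {"customer_id": "c2", "orders": 0},
]
partner = [{"customer_id": "c1", "segment": "fashion"}]


def mix(tracking, internal, partner):
    """Build one profile per customer from the three sources."""
    profiles = defaultdict(
        lambda: {"page_views": 0, "orders": 0, "segment": None}
    )
    for event in tracking:   # count tracked page views per customer
        profiles[event["customer_id"]]["page_views"] += 1
    for row in internal:     # attach internal order counts
        profiles[row["customer_id"]]["orders"] = row["orders"]
    for row in partner:      # attach partner-supplied segment, if any
        profiles[row["customer_id"]]["segment"] = row["segment"]
    return dict(profiles)


profiles = mix(tracking, internal, partner)
```

Customers missing from a source keep the default value for that field, which is the behavior of a left outer join on the customer key.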
  • Step 3: Mine your Data. Built-in predictive models; advanced ad hoc models (R or Python); shared web-based data mining platform.
  • Typical Project Calendar
    ◦ January: choose partner, set up the architecture
    ◦ February: initial deployment (4 TB); replace BI
    ◦ May: new applications (SEO, …)
    ◦ September: scale deployment to 15 TB; integrate all channels
  • Some Success For the Project
    ◦ Enhance daily report availability: previous architecture, between H+17 and H+26 (!); Hadoop on GCE, between H+3 and H+7
    ◦ +21% email channel optimization
    ◦ SEO plan optimization
    ◦ and a dozen BI-style “apps”
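The report availability figures imply the reductions in delay computed below. The sketch only restates the slide's numbers, read as hours after a reference time H (the slide does not define H), and takes the differences; the variable names are illustrative.

```python
# (earliest, latest) report availability, in hours after H, from the slide.
previous = (17, 26)
hadoop_gce = (3, 7)

# Hours saved when reports arrive at their earliest and at their latest.
best_case_gain = previous[0] - hadoop_gce[0]
worst_case_gain = previous[1] - hadoop_gce[1]

# How many times shorter the worst-case delay became.
worst_case_ratio = previous[1] / hadoop_gce[1]

print(f"Earliest report: {best_case_gain} hours sooner")
print(f"Latest report: {worst_case_gain} hours sooner")
```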
  • Thank you! Follow us on Twitter: @dataiku. Ask any big data question: florian.douetteau@dataiku.com