How to Build a Successful Data Team - Florian Douetteau @ PAPIs Connect

As you walk into your office on Monday morning, before you've even had a chance to grab a cup of coffee, your CEO asks to see you. He's worried: both customer churn and fraudulent transactions have increased over the past 6 months. As Data Manager, you have 6 months to solve that.

As Data Manager, you know the challenges ahead:

Multitudes of technology choices to make
Building a team and solving the skill-set disconnect
Data can be deceiving...
Figuring out what the successful data product must be
The goal of this talk is to provide some perspective on these topics.

Florian has worked in the “data” field since ’01, back when it was not yet big. He has worked at successful startups in the search engine, advertising and gaming industries, holding various data and CTO roles. He started Dataiku in 2013, his first venture as a CEO, with the goal of alleviating the daily pains of data enthusiasts and letting them express their creativity.

Published in: Technology

  1. 1. Hi! I’m FLORIAN DOUETTEAU, CEO of Dataiku. It’s Me!! It’s our software!!
  2. 2. …and our software is The most complete Data Science platform Deployment
  3. 3. Dataiku - Data Tuesday Meet Hal Alowne Big Guys • 10B$+ Revenue • 100M+ customers • 100+ Data Scientists Hal Alowne BI Manager Dim’s Private Showroom “Hey Hal! We need a big data platform like the big guys. Let’s just do as they do!” Average E-commerce Web site • 100M$ Revenue • 1 Million customers • 1 Data Analyst (Hal Himself) Dim Sum CEO & Founder Dim’s Private Showroom Big Data Copy Cat Project
  4. 4. Technology Disconnect
  5. 5. Welcome to Technoslavia !
  6. 6. LOL PLATFORM ANTI-PATTERN Test and Invest in Infrastructure == Skilled People or Go For Cloud / Packaged Infrastructure Your Brand New Hadoop Cluster is perceived as slow, not so used and not reliable
  7. 7. TECHNO MISMATCH ANTI-PATTERN Assume Being Polyglot or Be a Dictator VS VS The Python Clan The R Tribe The Old Elephant Fraternity The New Elephant Club
  8. 8. PREDICTIVE ANALYTICS DEPLOYMENT STRATEGY Website 2000’ winners Companies that were able to release fast "Artificial Intelligence with Data for Internet of Things" 2010’ winners Companies able to put intelligence in production ? Design a way to put “PREDICTIVE MODELS” IN PRODUCTION
  9. 9. PEOPLE DISCONNECT
  10. 10. Classic Business Intelligence Team Organization Business Leader Data Consumer Line-of-business Data Consumer Business Project Sponsor BI Solution Architect Model Designer ETL Developer Dashboard / Report Designer DBA / IT Data Owner Specs
  11. 11. Data Science Team Organization Business Leader Data Consumer Line-of-business Data Consumer Business Project Sponsor Data Team Manager Data Engineer Data Analyst Data System Engineer / Data Architect Specs Data Scientist
  12. 12. Built From Scratch Business Leader Data Consumer Line-of-business Data Consumer Business Project Sponsor DBA / IT Data Owner Specs DATA SCIENTISTS EVERYWHERE
  13. 13. Built From Engineering Business Leader Data Consumer Line-of-business Data Consumer Business Project Sponsor Specs DATA ENGINEERS DATA ANALYSTS
  14. 14. Built From Analysts Business Leader Data Consumer Line-of-business Data Consumer Business Project Sponsor Specs
  15. 15. Manage Expectations Data Plumber Data Engineer Data Scientist Data Waiter Data Cleaner Data Analyst REAL JOB DREAM JOB
  16. 16. Perfectly Natural Hidden thoughts Business Project Sponsor Data Team Manager Data Engineer Data Analyst Data Scientist
  17. 17. Managing Extreme Personalities Data SCIENTIST Highly Creative Passionate Hard to hire ? Hard to manage ? Want to take your job ? Ambitious
  18. 18. Paired for Data Data Analyst Discover Patterns Data Engineer Make things work Fight data entropy and tech entropy
  19. 19. What do you prefer? One Analyst, One Engineer and One Data Scientist that work together? Or four data scientists?
  20. 20. Data Disconnect
  21. 21. What is the main reason for data projects to fail? DATA NOT AVAILABLE
  22. 22. BUT FOR ONLY INCREMENTAL GAIN [Chart: contribution to the overall project performance, on a 0% to 100% axis; values 50, 30, 20; categories: Business Goal Definition and Data, Feature Engineering, Algorithm]
  23. 23. How to Get Data if you don’t have it THE GRASSHOPPER THE SPIDER THE FOX
  24. 24. The Cicada : Optimistic and Opportunistic Data THE CICADA As a startup As a group inside a company - Build a new product using open data - Benefit from the data sharing initiative within your company - Wait for data to be available in your data lake
  25. 25. The Spider: Power of the Network THE SPIDER As a startup As a group inside a company - Create a network of (web trackers | sensors) - Make it available for free - Build your service on people’s collected data - Make a web service available to collect data - Promote it internally so that people use it
  26. 26. The Fox: Hunt for the Big Money first THE FOX As a startup As a group inside a company - Hunt for a Business Group within a large company with a problem - Build a SaaS solution using their data - Replicate to competitors - Take charge of a critical problem as per the CEO’s request - Build your own integrated tech team to solve it - Use those resources to reset data services internally
  27. 27. PRODUCT DISCONNECT
  28. 28. What is Big Data about ?
  29. 29. The Age Of Distributed Intelligence Global, Personalised and Real Time Data Driven Services
  30. 30. Data to Visualize or Data to Automate ? 2013 2014 2015 2016 2017 2018 Automated Decision Visualize To Decide Moving to a world of automated decision making
  31. 31. Involve product team Product Feature Personalised Item Ranking Product Feature Notify User Only when Needed Product Feature: Historical Data For Path Optimisation Have Product Management Deeply Involved In the Data Team
  32. 32. Where is your added value ? Is the problem at the Core of my Business Process? Is it a common problem / with shared data ? Go for Best of Breed SaaS Solution Can I Solve it on my own ? Really ? Built by the data team Built by the data team ? Built by the data team Hire Consultants and Learn Yes Yes No I can’t Ok, I can try Yes! No! No
  33. 33. Be aware of the comfort zone Mission Critical Small Structured Large Diverse Sheer Curiosity Reporting for Finance in Any Industry Analyze Each Tweet Web Navigation For E-Merchant Ticket Data For Discounts in Retail Phone Call Logs for Security RTB Data For Advertising Customer Consumption For Anti-Churn in Utilities Optimization Filings For Fraud in Insurance Not Enough Data To Learn From ? Not Enough “Hard" Examples So that you can learn
  34. 34. Create an "API" Culture Do not share • Random Piece of Code • Flat File Do share • Reproducible documented workflows • Clean, documented APIs
  35. 35. Food for thought www.dataiku.com/blog Free Data Science Software www.dataiku.com/dss THANK YOU ! Data Science Is no longer a science
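
As an illustration of the "API" culture slide (share clean, documented APIs rather than random pieces of code or flat files), here is a minimal Python sketch. Everything in it is hypothetical and not taken from the talk: the `Customer` record, the `churn_risk` rule, its `inactive_threshold` parameter and the `at_risk_ids` helper are assumptions chosen only to show what a small, documented, reusable interface might look like.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Customer:
    """One customer record exposed by a (hypothetical) data team API."""
    customer_id: str
    orders_last_6_months: int


def churn_risk(customer: Customer, inactive_threshold: int = 1) -> bool:
    """Return True when the customer looks at risk of churning.

    Illustrative rule only (an assumption, not the talk's method): a customer
    with fewer than `inactive_threshold` orders in the last 6 months is flagged.
    """
    return customer.orders_last_6_months < inactive_threshold


def at_risk_ids(customers: Iterable[Customer]) -> List[str]:
    """Return the sorted IDs of the customers flagged by `churn_risk`."""
    return sorted(c.customer_id for c in customers if churn_risk(c))
```

A consumer of this interface calls `at_risk_ids(customers)` and relies on the docstrings, instead of receiving an undocumented script or a CSV export whose columns have to be guessed.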
