Building & Scaling Data Teams


Published on

A talk with Florian Douetteau on the main difficulties of building and scaling data teams, and how to overcome them.

- Technological issues: What stack should they choose for the company’s architecture? And with big data technologies, should they accept being a polyglot or act as a ruthless dictator?

- HR issues: Who should they hire? Should they build their data team as an extension of the BI team? Or should they build it from scratch?

- Data issues: How are they supposed to get data into their data lake? Which strategy should they adopt: the cicada, the spider or the fox?

- Product issues: What is big data really about? And ultimately, what do they want to do with all this data?

The talk aims to show how tough it can be to build and scale a data department, and to give some insight into the strategy Florian thinks they should adopt.

Published in: Business


  1. Data Science Software Platform. Building A Data Team. @outreachdigit
  2. Meet HAL: Hal Alowne, BI Manager, Dim’s Private Showroom
  3. Meet Hal’s Boss, DIM: “Hey Hal! We need a big data platform like the big guys. Just do what they’re doing!” Big Data Copy Cat Project
  4. TECHNOLOGY Disconnect: “What technologies should I use?”
  5. Welcome to Technoslavia!
  6. TOY PLATFORM ANTI-PATTERN. Your Brand New Hadoop Cluster is perceived as slow, little used and unreliable. Test and Invest in Infrastructure == Skilled People, or Go For Cloud / Packaged Infrastructure
  7. TECHNO MISMATCH ANTI-PATTERN. Assume Being Polyglot or Be a Dictator. VS: The Python Clan, The R Tribe, The Old Elephant Fraternity, The New Elephant Club
  8. PREDICTIVE ANALYTICS DEPLOYMENT STRATEGY. 2000s winners (Website): companies that were able to release fast. 2010s winners ("Artificial Intelligence with Data for Internet of Things"): companies able to put intelligence in production? Design a way to put “PREDICTIVE MODELS” IN PRODUCTION
  9. PEOPLE Disconnect: “Who should I hire?”
  10. Classic Business Intelligence Team Organization: Business Leader Data Consumer; Line-of-business Data Consumer; Business Project Sponsor; BI Solution Architect; Model Designer; ETL Developer; Dashboard / Report Designer; Specs; Dim, Big Boss
  11. Data Science Team Organization: Business Leader Data Consumer; Line-of-business Data Consumer; Business Project Sponsor; Data Engineer; Data Analyst; System Engineer / Data Architect; Data Scientist; Business Needs; IT Constraints; I.T.
  12. Manage Expectations. REAL JOB: Data Plumber, Data Waiter, Data Cleaner. DREAM JOB: Data Engineer, Data Scientist, Data Analyst
  13. Managing Extreme Personalities. Data Scientist: Highly Creative, Passionate, Ambitious. Hard to hire? Hard to manage? Hard to retain? Want to take Hal’s job?
  14. Paired for Data. Data Analyst: Discover Patterns, Fight data entropy. Data Engineer: Make things work, Fight tech entropy
  15. What do you prefer? One Analyst, One Engineer, One Data Scientist OR Four data scientists
  16. Two Mindsets Can Coexist: CLICKERS, CODERS
  17. DATA Disconnect: “What about data?”
  18. What is the main reason for data projects to fail? > DATA NOT AVAILABLE
  19. BUT FOR ONLY INCREMENTAL GAIN. Contribution to the overall project performance (axis from 0% to 100%). Values: 20%, 30%, 50%. Categories: Business Goal Definition and Data, Feature Engineering, Algorithm
  20. How to Get Data if you don’t have it: THE GRASSHOPPER, THE SPIDER, THE FOX
  22. The Cicada: Optimistic and Opportunistic Data. As a startup: build a new product using open data. As a group inside a company: benefit from the data sharing initiative within your company; wait for data to be available in your data lake
  23. The Spider: Power of the Network. As a startup: create a network of (web trackers | sensors); make it available for free; build your service on people’s collected data. As a group inside a company: make a web service available to collect data; promote it internally so that people use it
  24. The Fox: Hunt for the Big Money first. As a startup: hunt for a Business Group within a large company with a problem; build a SaaS solution using their data; replicate to competitors. As a group inside a company: take charge of a critical problem at the CEO’s request; build your own integrated tech team to solve it; use those resources to reset data services internally
  25. PRODUCT Disconnect
  26. What is Big Data about?
  27. The Age Of Distributed Intelligence: Global, Personalised and Real Time Data Driven Services
  28. Data to Visualize or Data to Automate? Timeline 2013 to 2018: moving to a world of automated decision making. DATA FOR MORE INSIGHTS, DATA FOR AUTOMATED DECISIONS
  29. Involve Product Team. Product Feature: Personalised Item Ranking. Product Feature: Notify User Only when Needed. Product Feature: Historical Data For Path Optimisation. Have Product Management Deeply Involved In the Data Team
  30. Focus on your added value. Questions: Is the problem at the Core of my Business Process? Is it a common problem / with shared data? Can I solve it on my own? Really? Outcomes (Yes / No branches): Built by the Data Team; Hire Consultant and Learn; Go for Best of Breed SaaS Solution; Built by the Data Team?
  31. Create an API culture. Do not share: Random Piece of Code, Flat File, Email. Do share: Reproducible documented workflows; Clean, documented APIs
  32. Did Hal find his solutions? Technology: Polyglot on top of open source. People: Find a way to make clickers and coders work together. Product: Create an API culture and involve the product teams. Data: Hunt for Big Problems and Convince the CEO. “Is this the end?” Hal Alowne, BI Manager, Dim’s Private Showroom
  34. DATA OR SLAVERY?
  35. Objective Alignment. Autonomous Vehicles Need Experimental Ethics: Are We Ready for Utilitarian Cars?
  36. Data-Driven Artificial Sales & Marketing? ARTIFICIAL SUPERVISOR: “Please Call The customers”, “Please Call again”, “Could you add a JOKE at the end of this email”, “I need you to ATTEND A physical meeting”, “Here is the BRIEF”. Continuously analyzing prospect behavior on social networks, applications and websites
  37. I don’t know the answer, but here is free software for the data addicts in your company: data scientists and engineers, clickers and coders. By the numbers: 25; 3000 lovely users; 80 customers
  38. Food for thought. THANK YOU! FREE (as in Beer) Software
  39. Car Sharing Worldwide Leader; Flash Sales Worldwide Leader; 3,700 Hotels Worldwide. One Mission: Never leave Hal ALONE. By the numbers: 2500 lovely users, 70 customers
  40. My nerdy background: Type Systems, Automated Proving, Abstract Program Interpretation, Functional Programming, Garbage Collection and VMs, Graph Analytics, Chess AI, Natural Language Processing. 80% Emacs / 20% VIM
  41. So to sum it up… I (USED TO) GUIs
  42. …and our software is: A supercharged Visual IDE for Data Teams Deployment
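
Slide 31's advice (share clean, documented APIs rather than flat files, emails or random pieces of code) can be illustrated with a minimal Python sketch. Everything named here (`CustomerRevenue`, `revenue_by_customer`, `to_json`, and the order data shape) is hypothetical and not taken from the talk; it only shows what a small, documented, reusable entry point might look like in place of an emailed CSV.

```python
"""Hypothetical sketch: a small documented API instead of a shared flat file."""
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CustomerRevenue:
    """One row of the result: total revenue for a single customer."""
    customer_id: str
    revenue: float


def revenue_by_customer(orders):
    """Aggregate order amounts per customer.

    orders: iterable of (customer_id, amount) pairs.
    Returns a list of CustomerRevenue, sorted by revenue (descending),
    with ties broken by customer_id (ascending).
    """
    totals = {}
    for customer_id, amount in orders:
        totals[customer_id] = totals.get(customer_id, 0.0) + float(amount)
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [CustomerRevenue(cid, total) for cid, total in ranked]


def to_json(rows):
    """Serialize the result so other teams can consume it over any transport."""
    return json.dumps([asdict(row) for row in rows])


if __name__ == "__main__":
    sample_orders = [("a", 10), ("b", 5), ("a", 2.5)]
    print(to_json(revenue_by_customer(sample_orders)))
```

The design choice is that consumers depend on a documented function and a stable output shape, so the underlying storage can change without breaking them.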