Data Science and Culture


Hiring data scientists and deploying Hadoop is not enough. Your company needs a data-driven culture, based on values such as honesty, democracy, creativity and strategy. Your company also needs good data engineering and good experimentation practices.

Published in: Data & Analytics

  1. 1. Data Science & Culture (Or how to stop worrying and love data driven culture) Ícaro Medeiros Data Science Forum São Paulo, Jun 2017
  2. 2. Inspired by (not limited to) refs
  3. 3. Big Data ✦ Fundamental blocks: advances in CS, e.g. distributed systems, databases, massive AI, etc. ✦ Fuzzy, ill-defined concept ✦ Popularized by Gartner (hype-fueled consulting firm)
  4. 4. ✦ Big Data no longer considered an emerging technology (pervasive in industry) ✦ Entered Trough of Disillusionment in 2013
  5. 5. Chronology of antecedents
  6. 6. Data science ✦ Statistics (late 19th century) ✦ Computer Science (1950s) ✦ Machine Learning (1950s) ✦ Data Mining (1990s) ✦ Data Science (2010s) yet another hyped term
  7. 7. Beware: controversy ✦ Data science is not all science ✴ It’s getting more and more engineering-like, a practice ✴ Data storytelling is a creative endeavor ✦ Hyper-inflated expectations, misunderstood concepts and a rush to get value: a dangerous recipe
  8. 8. A new hope or hype: machine learning, big data
  9. 9. Hype: not that bad ✦ Haters gonna hate, i.e. don’t fully hate the hype ✴ More practitioners = faster evolution of tech and processes ✴ Highly skilled professionals and innovation ✦ Academics sometimes look for difficult, unwanted problems ✴ Industry is more pragmatic, especially in tech
  10. 10. What we need… ✦ Forget about Big Data pokémons ✴ OH so in Big Data we don’t need people to think about schemas? ✦ Forget about misunderstood business expectations ✴ OH in deep learning we don’t need people to train models? ✦ You need PEOPLE ✴ Collaborating with shared values ✴ Awesome in tech but more importantly: CREATIVE
  11. 11. Culture: shared values and practices
  12. 12. Good people ✦ People are more important than ideas ✴ A mediocre team will screw up a good idea ✴ Give a mediocre idea to a great team: they will fix it or rethink it ✦ A good lab: different kinds of autonomous thinkers ✴ Why hire smart people if they can't fix what’s broken? ✦ Prefer a heterogeneous and complementary team instead of looking for unicorns
  13. 13. The mythical 10x professional
  14. 14. Good communication ✦ Honesty, excellence, originality and self-criticism (values) ✦ Communication structure ≠ organizational structure ✦ Be ready to hear the truth ✴ Sincerity is only valuable if people are open and willing to give up on ideas that will not work ✦ Braintrust: leave ego and Jobs outside the door
  15. 15. Power to the people! ✦ Product quality is everyone’s responsibility ✴ Don’t ask permission to take responsibility ✦ Passion and excellence versus autonomy ✦ Good things might overshadow the bad ✴ People hesitate to explore bad things to avoid being called “complainers”
  16. 16. Rebels
  17. 17. Destroy data silos! ✦ Without information about data there is no science ✦ Software and data should be collective property within the company ✦ Knowledge management matters ✦ Communication between areas must be enforced
  18. 18. Data portals ✦ Self-service platforms to publish datasets ✴ Descriptions, schemas, samples, relations between datasets, etc ✦ Open Data initiatives, mostly governments ✦ OSS platforms: CKAN, Airbnb’s Dataportal ✦ Examples:,, etc
  19. 19. “When it comes to creative inspiration, job titles and hierarchy are meaningless”
  20. 20. Data storytelling ✦ Explain what the numbers tell in clear, layman's terms ✦ Make hidden premises clear ✴ Outside data insights ✦ Convince others about actions ✴ Decreases insights-to-value interval ✦ From data to knowledge
  21. 21. What is creativity ✦ Unexpected connections of concepts and ideas ✦ It's a marathon, it needs rhythm ✦ Creativity must start somewhere and there’s power in healthy feedback in an iterative process
  22. 22. Visual communication ✦ Clean straightforward graphs > visually appealing ✴ Choose dataviz libs wisely ✦ “Don’t make me think” ✦ The right graph for the right audience ✴ Prefer a language everyone understands
  23. 23. Visual communication 101
  24. 24. Stats are not enough
  25. 25. Stats are not enough
  26. 26. Strategy
  27. 27. Avoid egotrip data science ✦ “OH my cluster has 10 Petabytes, I’m awesome” ✦ Fancy ML algorithms are not the goal ✦ The most important V in Big Data is value
  28. 28. KPI versus HiPPO ✦ Tech adoption per se is meaningless ✴ Slide-driven Big Data ✴ KPIs should grow from Big Data and data insights initiatives ✦ Poorly defined goals -> bad decisions ✦ Define viable but ambitious goals ✦ Data beats opinion
  29. 29. Set goal, plan and GO! ✦ Business questions can't be like “OH we want to detect things related to millennials” ✦ Clear goals must be set, with actionable metrics ✦ Balance perfect models versus time-to-market ✦ Brad Bird: “Sometimes, as a director, you’re guiding. Sometimes you’re letting the car drive”
  30. 30. The process ✦ The process is not the goal ✴ It has no agenda or taste, it’s just a tool ✦ Quality is the best business plan ✦ Agile is a mindset: not only kanbans or scrum ✦ If the model will become operational, mix scientists and engineers from the start
  31. 31. Build vs Buy ✦ If you buy and your core business is not techie, you can be illiterate in tech ✴ Benchmark before buying ✴ Accelerate results and boost internal knowledge ✦ If you build and have a good-enough techie culture, you’re more or less good to go ✴ Assess pros and cons consciously ✦ If you surf the tech hype AND build good systems you’re awesome
  32. 32. When data goes to vendors…
  33. 33.
  35. 35. Big Data vs Great Data ✦ If your logical models do not make sense ✦ If your most frequent queries are slow ✦ If you have string-only databases ✦ If you have unused expensive data ✦ Maybe your data lake is a swamp
  36. 36. “The data is a mess” ✦ First step: accelerate human understanding of data ✴ Metadata, context, hidden assumptions ✦ Datasets might serve multiple purposes ✴ Define rationale and context ✴ Data portals and understandable datasets > Dashboards
  37. 37. Data lost in translation ✦ Heterogeneous and siloed databases (and people) ✦ Rethink ESB (microservices network) ✦ State-of-the-art: data workflow ✴ Luigi, Airflow (open source), almost every big tech vendor ✴ Transparency, reusability, reproducibility, traceability ✴ Automation and monitoring all the way!
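
To make the "data workflow" idea concrete, here is a minimal sketch in plain Python. It does not use the Luigi or Airflow APIs; the task names (`extract`, `transform`, `load`), the sample rows and the `run_workflow` helper are all hypothetical, chosen only to show tasks declared with explicit dependencies, run in dependency order, and logged for traceability.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def extract():
    # Hypothetical source data; a real task would read from a database or file.
    return [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 5}]


def transform(rows):
    # Aggregate the raw rows into a single metric.
    return sum(row["clicks"] for row in rows)


def load(total):
    # Stand-in for writing the result to a report or table.
    return f"total_clicks={total}"


# Each task is declared with the upstream tasks it depends on (assumed layout).
TASKS = {
    "extract": (extract, []),
    "transform": (transform, ["extract"]),
    "load": (load, ["transform"]),
}


def run_workflow(tasks):
    """Run tasks in dependency order and keep an execution log."""
    graph = {name: set(deps) for name, (_, deps) in tasks.items()}
    results, log = {}, []
    for name in TopologicalSorter(graph).static_order():
        func, deps = tasks[name]
        # Feed each task the outputs of its upstream tasks.
        results[name] = func(*(results[dep] for dep in deps))
        log.append(name)
    return results, log


results, log = run_workflow(TASKS)
```

Declaring dependencies as data rather than hard-coding the call order is what lets a workflow tool reorder, rerun, monitor and trace individual steps; tools such as Luigi and Airflow build scheduling, retries and dashboards on top of the same basic idea.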
  38. 38. Beyond relational models ✦ Not all data problems fit well in traditional SQL or DW models ✴ Key-value, columnar, graph-based, inverted index, etc ✦ Models are a framework for problem-solving ✴ Not the ultimate answer ✴ There’s no one-size-fits-all model
  39. 39. Do not forget fluency ✦ Check the company lingua franca ✦ Make it easy for critical decision-makers ✴ Ad hoc SQL queries? ✴ Dashboards? ✴ Reports?
  41. 41. Experiments ✦ Missions to discover facts towards understanding ✴ They don’t fail, any result produces new information ✴ If the initial theory was wrong: good ✴ With new facts you can reformulate the question ✦ Get more modeling questions asked more often ✦ Iterative data science
  42. 42. Product experimentation (A/B) ✦ Product experimentation should be hypothesis-driven (not feature-driven) ✦ Define the proper exposed population ✴ No new users, no heavy users only, no early adopters ✦ Understanding effect is essential
  43. 43. 5 stages of A/B tests
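
As one possible way to "understand the effect" of an A/B experiment, the sketch below runs a two-sided two-proportion z-test using only the Python standard library. This is a common textbook technique, not necessarily the method the slides have in mind; the function name `two_proportion_z_test` and the conversion counts are made up for illustration.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and variant (B).

    Returns (absolute lift, z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return p_b - p_a, z, p_value


# Hypothetical counts: 100/1000 conversions in control, 130/1000 in variant.
lift, z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"lift={lift:.3f} z={z:.2f} p={p_value:.3f}")
```

The hypothesis ("the variant raises conversion") and the exposed population should be fixed before looking at the numbers; the test only quantifies how surprising the observed lift would be if there were no real effect.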
  44. 44. Some other quick tips ✦ Focus on outcomes (not algorithms or methods) ✦ Design the right metric and evaluation ✦ Good experiments don't produce obvious insights ✦ Mix of data and intuition
  45. 45. Being data driven ✦ Be BAYESIAN - uncertainty is everywhere ✦ Be CURIOUS - keep learning ✦ Be AGILE - Fail fast, not too fast: evidence comes first
  46. 46. Being data driven ✦ Be TRUTHFUL - don’t torture data to please opinions ✦ Be HELPFUL - work across silos, support democracy ✦ Be WISE - know when to be analytical or intuitive
  47. 47. Take-away message: With the right people, Democracy, Creativity, Strategy, Big Great Data™ and Experiments there's a good chance to do great SCIENCE
  48. 48. Ícaro Medeiros Data Scientist icaromedeiros