You Don't Have to Be a Data Scientist to Do Data Science

Measurecamp March 2017 deck

Published in: Data & Analytics

  1. You don’t have to be a Data Scientist to do Data Science. @carmenmardiros (not a data scientist)
  2. “Sexiest job of the 21st century.” Why do I, a mere analyst, care?
  3. The appeal of Data Science (for me as an analyst)
      Increase confidence: my own and others’ confidence in my analyses, as the complexity of the data and of the business ecosystem increases.
      Become more productive: speed up the analysis cycle from exploration to hypothesis to experimentation.
      Add value in new ways as the business and technology landscape changes: operationalise analysis outcomes as data products.
  4. “It’s just not for me...” “I don’t have a degree in statistics or programming.”
  5. Predictive Analytics Summit 2013: no confidence to attend the sessions. Worried I would not understand the content. Worried I’d be spotted as a fraud.
      Predictive Analytics Summit 2016 (3m into my data science foray): understood much of the content and terminology. Had mentally thought of the questions others asked. I knew more than I thought I did.
  6. Myth vs. truth
      Myth: doing data science requires a PhD or going back to school. Truth: doing data science requires enthusiasm and confidence in ourselves.
      Myth: we can’t do data science until we can write an algorithm. Truth: we can and should do data science once we’ve conceptually understood how and why the algorithm works.
      Myth: bottom-up is the only way. Truth: top-down works. Provide value, learn as you go.
  7. Adapt. Grow. Stay relevant.
  8. Digital Analytics is changing fast
      Increasingly scientific approaches: essential as we move towards prescriptive analytics at speed.
      Become familiar with the data science toolkit: we will be key to bridging the gap between PhDs, machines and management, and may even use it ourselves in our day-to-day work.
      Future-proof ourselves: MS Office for Machine Learning is coming soon to a cloud near you.
  9. 3 Transformative Data Science techniques
  10. #1 Resampling
  11. The Bootstrap
      Number of observations: 100. The sample is representative (to the best of our knowledge).
      Observed mean: 17.54 months.
  12. The Bootstrap
      Draw 100 resamples, each of 100 observations picked at random with replacement from the original sample, and calculate the mean of each: [17.61, 16.21, 17.13, 14.08, 19.58 … ] (100 means).
      Plot all the means, their 2.5th and 97.5th percentiles (the bounds of a 95% confidence interval) and the original observed mean.
      The bootstrap is extremely versatile:
      ● Fewer assumptions than parametric methods.
      ● Can be used on any statistic.
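The percentile bootstrap from slide 12 can be sketched in Python with NumPy as shown below. This is a minimal sketch under some assumptions. The deck's data is not available, so the 100 observations are hypothetical values drawn from a gamma distribution, and the output will not reproduce the 17.54-month mean. `bootstrap_ci` is an illustrative helper and not a library function. Plotting is left out to keep the dependencies small.

```python
import numpy as np


def bootstrap_ci(data, stat=np.mean, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: return (observed stat, lower bound, upper bound, all resampled stats)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    # Each row holds the indices of one resample: n draws with replacement from the original sample.
    idx = rng.integers(0, n, size=(n_resamples, n))
    # Apply the statistic to every resample; any function of a 1-D array works here.
    stats = np.array([stat(data[row]) for row in idx])
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return stat(data), lower, upper, stats


# Hypothetical data: 100 customer tenures in months (illustrative, not the deck's data).
tenure = np.random.default_rng(42).gamma(shape=2.0, scale=9.0, size=100)

observed, lower, upper, means = bootstrap_ci(tenure)
print(f"observed mean {observed:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")

# The same helper works for statistics without a simple textbook formula, e.g. the median.
observed_median, med_lower, med_upper, _ = bootstrap_ci(tenure, stat=np.median)
print(f"observed median {observed_median:.2f}, 95% CI [{med_lower:.2f}, {med_upper:.2f}]")
```

The slide lists 100 means. Passing `n_resamples=100` mirrors that. More resamples give more stable percentile estimates, which is why the sketch defaults to 1,000.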
  13. Simulations & Sensitivity Analysis
      Simple simulation: given the existing distribution of order values and a range of possible conversion rates, how much £££ would we make if we doubled the traffic to our website?
      Sensitivity analysis (or how to open up black boxes): given a predictive model, randomly generate new data points for each input based on its observed distribution, create predictions using the model and interpret the distribution of outcome scenarios.
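Both ideas on slide 13 can be sketched as small Monte Carlo loops, as shown below. Treat this as one possible reading and not the author's method. The following are all assumptions for illustration:
- the conversion rate is drawn uniformly from a range;
- orders are binomial given the number of sessions;
- order values are resampled from past orders.

The helper names, the order values, the session counts and `toy_model` are all hypothetical. The sensitivity helper samples each input independently, so it ignores any correlation between inputs.

```python
import numpy as np


def simulate_revenue(order_values, sessions, cr_low, cr_high, n_runs=5000, seed=0):
    """Monte Carlo revenue scenarios for a given number of sessions."""
    rng = np.random.default_rng(seed)
    order_values = np.asarray(order_values, dtype=float)
    revenues = np.empty(n_runs)
    for i in range(n_runs):
        cr = rng.uniform(cr_low, cr_high)          # a plausible conversion rate for this run
        n_orders = rng.binomial(sessions, cr)      # how many sessions convert at that rate
        # Order values are resampled from the observed distribution, not from a fitted curve.
        revenues[i] = rng.choice(order_values, size=n_orders, replace=True).sum()
    return revenues


def sensitivity_outcomes(model, observed_inputs, n_draws=5000, seed=0):
    """Feed a model random inputs drawn from each input's observed values."""
    rng = np.random.default_rng(seed)
    # Assumption: inputs are sampled independently, which ignores correlations between them.
    draws = {name: rng.choice(np.asarray(values), size=n_draws, replace=True)
             for name, values in observed_inputs.items()}
    return model(**draws)


# Hypothetical inputs: past order values (£) and current monthly sessions.
past_orders = np.random.default_rng(1).lognormal(mean=3.5, sigma=0.6, size=500)
current_sessions = 20_000

doubled = simulate_revenue(past_orders, 2 * current_sessions, cr_low=0.015, cr_high=0.025)
p5, p50, p95 = np.percentile(doubled, [5, 50, 95])
print(f"doubled traffic: median £{p50:,.0f}, 90% of scenarios between £{p5:,.0f} and £{p95:,.0f}")


# Hypothetical "black box": a stand-in for any trained model's predict function.
def toy_model(visits, discount):
    return 5.0 + 0.8 * visits - 2.0 * discount


outcomes = sensitivity_outcomes(toy_model, {"visits": [1, 2, 3, 5, 8], "discount": [0, 0, 5, 10]})
print(np.percentile(outcomes, [5, 50, 95]))
```

The output is a distribution of scenarios and not a single forecast. Reading off its percentiles is the "interpret distribution of outcome scenarios" step.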
  14. Cross Validation
      Iteration   Fold 1  Fold 2  Fold 3  Fold 4  Fold 5
      1           Train   Train   Train   Train   Test
      2           Train   Train   Train   Test    Train
      3           Train   Train   Test    Train   Train
      4           Train   Test    Train   Train   Train
      5           Test    Train   Train   Train   Train
      Assesses how well a predictive model generalises to unseen data.
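The 5-fold scheme on slide 14 can be sketched with NumPy alone, as shown below. Using ordinary least squares as the model is an assumption made to keep the example short. The data and helper names are hypothetical. In practice, scikit-learn's `KFold` and `cross_val_score` in `sklearn.model_selection` provide ready-made versions. Here the folds are shuffled once, and each fold serves as the test set exactly once.

```python
import numpy as np


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    indices = np.random.default_rng(seed).permutation(n_samples)  # shuffle before splitting
    folds = np.array_split(indices, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


def cross_val_mse(X, y, k=5, seed=0):
    """Mean squared error of an ordinary least squares fit on each held-out fold."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k, seed):
        # Fit on the training folds only (intercept column + features)...
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # ...then score on the fold the model has never seen.
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        scores.append(np.mean((A_test @ coef - y[test_idx]) ** 2))
    return np.array(scores)


# Hypothetical data: a noisy linear relationship.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 3 * x + 2 + rng.normal(0, 1, size=100)

scores = cross_val_mse(x, y)
print(scores.round(2), "mean:", scores.mean().round(2))
```

Similar scores across folds suggest the model generalises. One fold scoring much worse than the rest is a hint that performance depends on which data it happened to see.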
  15. Resampling
      Protects you from unsound inference: acknowledges and mitigates the effects of variance and noise in the data. You already do this when you use confidence intervals. Quantify uncertainty more often.
      Paints possible future scenarios: leverages randomness and probability to give you glimpses of possible future outcomes. Embrace randomness. It’s your ally on the way to prescriptive analytics.
  16. #2 Faceted visualisation
  17. Segmented view, side-by-side (repeated on slides 18 and 19)
      Outstanding tools for exploratory data analysis: Seaborn in Python and ggplot in R.
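A faceted "segmented view, side-by-side" can be sketched with Seaborn's `relplot`, where the `col` argument draws one panel per segment on shared axes. This is a minimal sketch and does not recreate the deck's charts. The sessions data and the column names `device`, `pages_viewed` and `order_value` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
# Hypothetical sessions data: one row per session.
df = pd.DataFrame({
    "device": rng.choice(["desktop", "mobile", "tablet"], size=600),
    "pages_viewed": rng.poisson(4, size=600) + 1,
    "order_value": rng.gamma(2.0, 25.0, size=600),
})

# One panel per device segment; shared axes keep the panels directly comparable.
g = sns.relplot(data=df, x="pages_viewed", y="order_value", col="device",
                col_order=["desktop", "mobile", "tablet"], kind="scatter", height=3)
```

Adding `row=` facets by a second segment. `g.savefig("by_device.png")` writes the whole grid to a file.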
  20. #3 Feature Engineering. What?!
  21. #3 Feature Engineering. Or: #3 Calculated Metrics or Content Groupings? Back on familiar territory.
  22. Feature Engineering Examples
      Unique content views per user by content type: # politics content views, # business content views, # short/long-form content views.
      Distribution of content seen per user: % politics content views in total content viewed, adjusted for the uncertainty of small samples.
      Result: a fat user-level table of attributes and behaviour for analysis and modelling.
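The user-level features on slide 22 can be sketched in pandas as shown below. The input is assumed to be a hypothetical table with one row per content view and columns `user_id`, `content_id` and `content_type`. The deck does not say how it adjusts for small samples. Shrinking each user's politics share towards the site-wide share (additive smoothing with a hypothetical `prior_weight`) is one common option, swapped in here as an assumption.

```python
import pandas as pd


def content_mix_features(views, prior_weight=10.0):
    """Build one row per user from a table with one row per content view.

    Expects columns user_id, content_id and content_type (hypothetical names).
    """
    # Count each piece of content once per user: "unique content views".
    unique = views.drop_duplicates(["user_id", "content_id"])
    features = pd.crosstab(unique["user_id"], unique["content_type"])
    features.columns = [f"n_{c}_views" for c in features.columns]
    features["n_total_views"] = features.sum(axis=1)
    # Assumed small-sample adjustment: shrink each user's politics share towards the
    # site-wide share, so users with few views stay near the average.
    site_share = features["n_politics_views"].sum() / features["n_total_views"].sum()
    features["politics_share_adj"] = (
        (features["n_politics_views"] + prior_weight * site_share)
        / (features["n_total_views"] + prior_weight)
    )
    return features


# Hypothetical view log.
views = pd.DataFrame({
    "user_id":      ["a", "a", "a", "b", "c", "c"],
    "content_id":   [1, 1, 2, 3, 1, 4],
    "content_type": ["politics", "politics", "business", "politics", "politics", "business"],
})
print(content_mix_features(views))
```

User "b" has a single politics view. The raw share of 100% becomes roughly 64% after shrinkage, which is the kind of caution a one-view user deserves.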
  23. Feature Engineering Examples
      Infer trading calendar activities from the data (for time series analysis):
      # new marketing campaigns (first date with sessions)
      # new brands launched (first date with pageviews)
      # voucher codes at peak redemption rate (date with the most redemptions)
      # AB tests started (date with first events tracked)
      # VIPs active on each date, etc.
      Result: a fat date-level table of leading KPIs and activities (model the ecosystem).
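The "first date with …" pattern on slide 23 maps naturally onto SQL, as shown below. The steps are: take `MIN(date)` per entity, then count how many entities start on each date. The sketch runs the SQL through Python's built-in `sqlite3` module so that it needs no database server. The tables `sessions` and `pageviews` and their columns are hypothetical. The same shape of query would apply to voucher codes or AB tests.

```python
import sqlite3

DATE_FEATURES_SQL = """
WITH first_campaign AS (          -- first date each campaign brought sessions
    SELECT campaign, MIN(session_date) AS first_date
    FROM sessions GROUP BY campaign
),
first_brand AS (                  -- first date each brand received pageviews
    SELECT brand, MIN(view_date) AS first_date
    FROM pageviews GROUP BY brand
),
calendar AS (                     -- every date seen in either table
    SELECT session_date AS d FROM sessions
    UNION
    SELECT view_date FROM pageviews
)
SELECT c.d AS activity_date,
       (SELECT COUNT(*) FROM first_campaign f WHERE f.first_date = c.d) AS new_campaigns,
       (SELECT COUNT(*) FROM first_brand b WHERE b.first_date = c.d) AS new_brands
FROM calendar c
ORDER BY c.d
"""


def date_level_features(conn):
    """Return one row per date: (activity_date, new_campaigns, new_brands)."""
    return conn.execute(DATE_FEATURES_SQL).fetchall()


# Hypothetical raw data, loaded into an in-memory SQLite database for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (session_date TEXT, campaign TEXT)")
conn.execute("CREATE TABLE pageviews (view_date TEXT, brand TEXT)")
conn.executemany("INSERT INTO sessions VALUES (?, ?)", [
    ("2017-03-01", "spring_sale"), ("2017-03-01", "spring_sale"),
    ("2017-03-02", "newsletter"), ("2017-03-03", "spring_sale"),
])
conn.executemany("INSERT INTO pageviews VALUES (?, ?)", [
    ("2017-03-01", "brand_a"), ("2017-03-03", "brand_b"), ("2017-03-03", "brand_a"),
])

for row in date_level_features(conn):
    print(row)
```

Each extra activity becomes one more CTE plus one more column, which is how the date-level table grows "fat".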
  24. Feature Engineering
      New ways of capturing underlying phenomena. Seasoned data scientists say feature engineering often yields higher rewards than pushing the latest algorithms.
      You likely already do this, probably in Excel. It’s painful and limiting. Your analytical creativity needs better tools.
      SQL: the single most valuable tool in our toolkit. With it, we become self-sufficient analysts.
  25. Resources
  26. Inspired?
      Learn Python: start learning Python for data science right now (no setup!).
      Learn Machine Learning: understand how algorithms work using spreadsheets. Top-down approach, no programming required.
      Learn SQL.