Citizen Data Science: A Practical Guide

Meet the newest analytics superstar: the citizen data scientist. Who is this person, and what is citizen data science anyways? Find out what all the fuss is about, and how to get yourself on track to break analytic barriers in your own organization.

Giuseppe Cascone, Solutions Engineer - Alteryx

Published in: Data & Analytics

  1. #ALTERYX19 CITIZEN DATA SCIENCE: A PRACTICAL GUIDE. Presented by GIUSEPPE CASCONE, Sales Engineer, Alteryx. gcascone@alteryx.com
  2. FORWARD-LOOKING STATEMENTS. This presentation includes “forward-looking statements” within the meaning of the Private Securities Litigation Reform Act of 1995. These forward-looking statements may be identified by the use of terminology such as “believe,” “may,” “will,” “intend,” “expect,” “plan,” “anticipate,” “estimate,” “potential,” or “continue,” or other comparable terminology. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product availability, growth and financial metrics and any statements regarding product roadmaps, strategies, plans or use cases. Although Alteryx believes that the expectations reflected in any of these forward-looking statements are reasonable, these expectations or any of the forward-looking statements could prove to be incorrect, and actual results or outcomes could differ materially from those projected or assumed in the forward-looking statements. Alteryx’s future financial condition and results of operations, as well as any forward-looking statements, are subject to risks and uncertainties, including but not limited to the factors set forth in Alteryx’s press releases, public statements and/or filings with the Securities and Exchange Commission, especially the “Risk Factors” sections of Alteryx’s Quarterly Report on Form 10-Q. These documents and others containing important disclosures are available at www.sec.gov or in the “Investors” section of Alteryx’s website at www.alteryx.com. All forward-looking statements are made as of the date of this presentation and Alteryx assumes no obligation to update any such forward-looking statements. Any unreleased services or features referenced in this or other presentations, press releases or public statements are only intended to outline Alteryx’s general product direction. They are intended for information purposes only, and may not be incorporated into any contract. This is not a commitment to deliver any material, code, or functionality (which may not be released on time or at all) and customers should not rely upon this presentation or any such statements to make purchasing decisions. The development, release, and timing of any features or functionality described for Alteryx’s products remains at the sole discretion of Alteryx.
  3. ATTENTION: COMPLETE SESSION SURVEYS. We want your feedback! Be sure to complete session surveys within the mobile app after this session. Surveys are anonymous, and we rely on your opinion for improvement. Citizen Data Science: A Practical Guide
  4. GIUSEPPE CASCONE, Sales Engineer @ Alteryx, ALTERYX USER SINCE 2017. With Alteryx, I can… help people to take their job to the next level! When I use Alteryx, I feel…
  5. Source: https://thispersondoesnotexist.com/
  6. MOO... MUUU… MOU!? MUH! Source: http://news.bbc.co.uk/1/hi/5277090.stm
  7. THROUGH 2021, THE NUMBER OF CITIZEN DATA SCIENTISTS WILL GROW FIVE TIMES FASTER THAN THE NUMBER OF HIGHLY SKILLED DATA SCIENTISTS. Source: Gartner
  8. GIUSEPPE CASCONE, Sales Engineer @ Alteryx, ALTERYX USER SINCE 2017
  9. TRAITS OF A CITIZEN DATA SCIENTIST. Source: Adapted from Gartner, https://blogs.gartner.com/carlie-idoine/2018/05/13/citizen-data-scientists-and-why-they-matter/
  10. TODAY’S AGENDA: 1. Why Citizen Data Science? 2. What is Machine Learning in a business environment? 3. How do I get started? Skills? Technology? Frameworks?
  11. TALENT FACTOR: DATA SCIENTIST SCARCITY. TODAY: Business Analysts (ADVANCED SPREADSHEET + SQL USERS), Data Scientists. TOMORROW: Business Analysts, Data Scientists
  12. THE EDGE CASES
  13. JOEY, SR BUSINESS ANALYST. Highly experienced business analyst within the financial services sector. An adaptable, enthusiastic and result-driven individual with a track record of successfully delivering complex cross divisional digital transformation projects. …see more. Skills: Business Analysis, Financial Services, Change Management, Leadership, Critical Thinking, Business Intelligence, Data Analytics, Microsoft Excel, SQL
  14. A TYPICAL DAY. LAST MONTH: • Focus on reporting • 80% of time spent on manual, repetitive tasks • Hard to meet deadlines. Timeline (monthly reporting cycle): Day 0, ~Day 15, ~Day 30, ~Day 35; customer due for payment; action taken; customer settles; further actions. TODAY: • Continuous reporting: automated report runs daily • Focus on business process improvement. Timeline (automated reporting): Day 0, customer due for payment; Day 1, action taken. Automation is key to free up time and resources to focus on business process improvements
  15. THE CHALLENGE • Business rules? • Ask the Data Science team? • Machine learning! • Wait, “machine learning”?!
  16. MACHINE LEARNING IS THE ITERATIVE PROCESS A COMPUTER FOLLOWS WHEN IT IS ASKED BY A HUMAN TO IDENTIFY PATTERNS IN A DATASET GIVEN SPECIFIC CONSTRAINTS
  17. USE CASES. Source: Adapted from DHL, Artificial Intelligence in Logistics, 2018 (PDF, 45 pp.)
  18. SUPERVISED LEARNING. Diagram labels: HISTORICAL DATA, NEW DATA, TRAIN MODEL, TRAINED MODEL, PREDICTIONS
  19. HOW TO RECOGNISE PROBLEMS THAT ML CAN SOLVE • Large number of columns in your dataset (10s – 100s) • Explicit business rules are hard/impossible to figure out • Structured and unstructured data (numerical, textual, visual, audio, …) • Non-ML solutions have failed
  20. LET’S GET STARTED
  21. LET’S GET STARTED: “THERE’S NO FREE LUNCH IN DATA SCIENCE…” Source: https://en.wikipedia.org/wiki/No_free_lunch_theorem/
  22. THE PROBLEM SOLVING FRAMEWORK. Source: https://community.alteryx.com/t5/Data-Science-Blog/The-Data-Science-Lifecycle/ba-p/408625
  23. 1. Identify business question: Which? How much/many? 2. Actionability of the answers: Is impact of actions measurable? 3. Define business success criteria: Have a baseline to compare with. What is the incremental business value of better predictions?
  24. • DB queries, web scraping, requests to IT/data engineers… • Data Catalogues • Historical data with known outcomes
  26. LOAN_NOT_REPAID | CONTRACT_TYPE | INCOME | CREDIT | YEARS_EMPLOYED | EDUCATION
      1 | Cash loans | £ 20,500.00 | £ 15,000,000.00 | 0 | Secondary
      0 | Cash loans | £ 54,000.00 | £ 10,000.00 | 12 | Higher education
      0 | Cash loans | £ 75,000.00 | £ 5,000.00 | 5 | Higher education
      0 | Cash loans | £ 40,000.00 | £ 3,500.00 | 32 | Secondary
      1 | Cash loans | £ 35,000.00 | £ 2,500.00 | 1 | Graduate
      (blank) | Revolving loans | £ 41,000.00 | £ 4,000.00 | 5 | Secondary
      0 | Cash loans | £ 108,000.00 | £ 5,000.00 | 7 | Secondary
      0 | Cash loans | £ 150,000.00 | £ 6,500.00 | 8 | Secondary
      1 | Cash loans | £ 80,000.00 | £ 12,000.00 | 1 | Secondary
      0 | Cash loans | £ 70,000.00 | £ 7,500.00 | 6 | Secondary
      0 | Cash loans | £ 82,000.00 | £ 6,500.00 | 2 | Higher education
      0 | Cash loans | £ 216,000.00 | £ 4,850.00 | 4 | Higher education
      0 | Revolving loans | £ 3,500,000.00 | £ 6,500.00 | 25 | Secondary
      0 | Cash loans | £ 112,500.00 | £ 4,750.00 | -3 | Secondary
      Callouts: Target variable; Leading spaces; Outlier; Error (negative number); Too few occurrences; Missing value; Error (value out of range); Past application
  28. SUPERVISED LEARNING. Diagram labels: HISTORICAL DATA, NEW DATA, TRAIN MODEL, TRAINED MODEL, PREDICTIONS
  29. TRAINING AND EVALUATING THE MODEL. Diagram labels: HISTORICAL DATA, ESTIMATION DATASET, VALIDATION DATASET, TRAIN MODEL, TRAINED MODEL, MODEL VALIDATION
  30. PREDICTIVE MODELS BY TARGET VARIABLE. Categorical, Binary: Logistic Regression, Boosted Model, Decision Tree, Forest Model, Naïve Bayes Classifier, Neural Network. Categorical, Multinomial: Spline Model, Decision Tree, Forest Model, Boosted Model, Neural Network, Naïve Bayes Classifier. Numeric, Count: Count Regression, Spline Model, Decision Tree, Forest Model, Boosted Model, Neural Network. Numeric, Continuous: Linear Regression, Gamma Model, Spline Model, Decision Tree, Forest Model, Boosted Model, Neural Network. Source: Adapted from https://community.alteryx.com/t5/Data-Science-Blog/Predictive-Process-Step-1-Finding-Your-Target-Variable/ba-p/401639
  31. MODEL SELECTION. Diagram labels: VALIDATION DATASET; MODEL 1, MODEL 2, …, MODEL N; COMPARISON & SELECTION
  32. MODEL COMPARISON. Further readings: https://towardsdatascience.com/beyond-accuracy-precision-and-recall-3da06bea9f6c
  33. 1. Estimation (or training) set – 70%, Validation (or test) set – 30% 2. Train and test all appropriate models 3. Select the best performing model(s)
  34. 1. Accuracy vs Interpretability 2. How do business users consume the results of a model? Read more: https://towardsdatascience.com/interpretable-machine-learning-1dec0f2f3e6b
  35. INTERPRETATION. Y(X) = A*X + B. Variables shown: AVG # OF TICKETS, INDUSTRY, # EMPLOYEES, LOCATION. Chart axes: NUMBER OF EMPLOYEES, AVERAGE NUMBER OF TICKETS SOLD
  36. A MODERN APPROACH: DEPLOY MORE PREDICTIVE MODELS, FASTER. DATA SCIENTISTS: Build custom analytic models in R or Python. CITIZEN DATA SCIENTISTS: Leverage models via analytic apps and Assisted Modeling. DEPLOY, MANAGE, MONITOR. YOUR APPLICATION, YOUR CONSUMERS, YOUR BUSINESS. REAL TIME, SCHEDULED PROCESSES, ON DEMAND
  37. ENTHUSIAST V SKEPTIC • 75% prediction accuracy • Early customer notifications led to avg. 15 day payment improvement • +10% customers acquired
  38. LEARNING RESOURCES: - Alteryx Community – Data Science Blog - Predictive Training by Udacity & Alteryx - Kaggle - Towardsdatascience. FIVE KEY POINTS: • Automate • Start with business (not data!) • Framework is your light • Communicate results • Take action!
  39. THANK YOU. GIUSEPPE CASCONE, Sales Engineer, Alteryx. gcascone@alteryx.com www.alteryx.com
  40. ATTENTION: BEFORE YOU LEAVE… please take a moment to complete your evaluation survey within the mobile app. Surveys are anonymous, and we rely on your opinion for improvement. Citizen Data Science: A Practical Guide
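
Slide 26 marks up a loan table with typical data quality problems: a missing value, a negative number, an outlier, leading spaces and a category with too few occurrences. The Python sketch below shows roughly how such checks could be automated. It is an illustration under stated assumptions, not the method used in the talk: the rows are loosely modelled on the slide (the CREDIT column is left out), the leading space in the last row is added here because the transcript does not show which cell had it, and the `audit` function and both thresholds are hypothetical.

```python
from collections import Counter
from statistics import median

# Rows loosely modelled on the slide's table. None marks a missing value.
# The leading space in the last CONTRACT_TYPE is an assumption for illustration.
ROWS = [
    {"LOAN_NOT_REPAID": 1, "CONTRACT_TYPE": "Cash loans", "INCOME": 35000.0, "YEARS_EMPLOYED": 1, "EDUCATION": "Graduate"},
    {"LOAN_NOT_REPAID": None, "CONTRACT_TYPE": "Revolving loans", "INCOME": 41000.0, "YEARS_EMPLOYED": 5, "EDUCATION": "Secondary"},
    {"LOAN_NOT_REPAID": 0, "CONTRACT_TYPE": "Cash loans", "INCOME": 108000.0, "YEARS_EMPLOYED": 7, "EDUCATION": "Secondary"},
    {"LOAN_NOT_REPAID": 0, "CONTRACT_TYPE": "Revolving loans", "INCOME": 3500000.0, "YEARS_EMPLOYED": 25, "EDUCATION": "Secondary"},
    {"LOAN_NOT_REPAID": 0, "CONTRACT_TYPE": "Cash loans", "INCOME": 112500.0, "YEARS_EMPLOYED": -3, "EDUCATION": "Secondary"},
    {"LOAN_NOT_REPAID": 0, "CONTRACT_TYPE": " Cash loans", "INCOME": 70000.0, "YEARS_EMPLOYED": 6, "EDUCATION": "Secondary"},
]


def audit(rows, outlier_factor=10, min_count=2):
    """Return (row index, column, issue) tuples for simple data quality checks.

    outlier_factor and min_count are arbitrary thresholds chosen for this sketch.
    """
    issues = []
    income_median = median(r["INCOME"] for r in rows if r["INCOME"] is not None)
    education_counts = Counter(r["EDUCATION"] for r in rows if r["EDUCATION"] is not None)

    for i, row in enumerate(rows):
        # Checks that apply to every column.
        for column, value in row.items():
            if value is None:
                issues.append((i, column, "missing value"))
            elif isinstance(value, str) and value != value.strip():
                issues.append((i, column, "leading or trailing spaces"))

        # Column-specific checks.
        years = row["YEARS_EMPLOYED"]
        if years is not None and years < 0:
            issues.append((i, "YEARS_EMPLOYED", "negative number"))

        income = row["INCOME"]
        if income is not None and income > outlier_factor * income_median:
            issues.append((i, "INCOME", "possible outlier"))

        education = row["EDUCATION"]
        if education is not None and education_counts[education] < min_count:
            issues.append((i, "EDUCATION", "too few occurrences"))
    return issues


for issue in audit(ROWS):
    print(issue)
```

Whether a flagged row is fixed, dropped or kept is a business decision; the sketch only surfaces candidates for review.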

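
Slides 29, 31 and 33 describe splitting historical data into an estimation set (70%) and a validation set (30%), training the candidate models, and selecting the best performer. A minimal Python sketch of that loop follows. The data are made up for illustration, and the two "models" (a majority-class baseline and a single threshold rule) are simple stand-ins chosen to keep the example dependency-free; they are not the Alteryx predictive tools listed on slide 30.

```python
import random

# Made-up (years_employed, loan_not_repaid) pairs, for illustration only.
DATA = [
    (0, 1), (0, 1), (1, 1), (1, 1), (1, 0), (2, 0), (2, 0), (3, 1), (4, 0), (5, 0),
    (5, 0), (6, 0), (6, 0), (7, 0), (8, 0), (9, 0), (10, 0), (12, 0), (25, 0), (32, 0),
]


def split(records, estimation_share=0.7, seed=42):
    """Shuffle a copy of the records and cut it into estimation and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * estimation_share)
    return shuffled[:cut], shuffled[cut:]


def accuracy(model, records):
    """Share of records whose predicted label matches the known outcome."""
    hits = sum(1 for x, y in records if model(x) == y)
    return hits / len(records)


def train_majority(records):
    """Baseline: always predict the most common outcome in the estimation set."""
    ones = sum(y for _, y in records)
    label = 1 if ones * 2 > len(records) else 0
    return lambda x: label


def train_threshold(records):
    """One-rule model: predict 1 when x <= t, with t chosen on the estimation set."""
    candidates = sorted({x for x, _ in records})
    best_t = max(candidates, key=lambda t: accuracy(lambda x: 1 if x <= t else 0, records))
    return lambda x: 1 if x <= best_t else 0


# Train on the estimation set only, then compare on data the models have not seen.
estimation, validation = split(DATA)
models = {
    "majority baseline": train_majority(estimation),
    "threshold rule": train_threshold(estimation),
}
scores = {name: accuracy(model, validation) for name, model in models.items()}
best = max(scores, key=scores.get)
print(scores)
print("selected:", best)
```

Keeping a baseline in the comparison ties back to slide 23: a candidate model is only worth deploying if it beats what the business could already do without it.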
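
Slide 35 illustrates interpretability with a straight line, Y(X) = A*X + B. As a sketch of why such a model is easy to explain, the Python snippet below fits A and B by ordinary least squares and reads A as "extra tickets per additional employee". The numbers are invented, and treating number of employees as X and average tickets sold as Y is an assumption about the slide's chart, which the transcript does not make explicit.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx            # slope: change in y per unit of x
    b = mean_y - a * mean_x  # intercept: predicted y when x is 0
    return a, b


# Invented figures for illustration only.
EMPLOYEES = [10, 20, 30, 40, 50]
TICKETS = [150, 250, 350, 450, 550]

a, b = fit_line(EMPLOYEES, TICKETS)
print(f"Each additional employee is associated with about {a:.1f} more tickets sold")
print(f"Predicted tickets for 35 employees: {a * 35 + b:.0f}")
```

A business user can check the slope against their own experience, which is much harder to do with a boosted model or a neural network; that is the accuracy versus interpretability trade-off raised on slide 34.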