
Opinionated Analysis Development -- EARL SF Keynote


Keynote presentation at the EARL San Francisco 2017 Conference.

https://earlconf.com/sanfrancisco/



  1. OPINIONATED ANALYSIS DEVELOPMENT HILARY PARKER @HSPTER
  2. Not So Standard Deviations
  3. Reproducible Accurate Collaborative
  4. SOFTWARE DEVELOPER SOFTWARE ENGINEER
  5. NARRATIVE
  6. • What models are you going to use? • What argument are you going to make? • How are you going to convince your audience of something?
  7. TECHNICAL ARTIFACT
  8. • What tools are you going to use to make the deliverable? • What technical and coding choices will you make for producing the deliverable?
  9. mastering the statistical tooling :: mastering a musical instrument
  10. mastering the statistical tooling :: mastering a musical instrument statistical theory :: musical theory
  11. mastering the statistical tooling :: mastering a musical instrument statistical theory :: musical theory full analysis :: piece of music
  12. disclaimer: Karl is extremely nice, helpful & non-judgmental
  13. BAD ANALYST? :(
  15. WHY?
  16. don’t want to “limit creativity” • embarrassed by personal process
  17. WHY DOES THAT MATTER?
  18. There are known, common problems when developing an analysis. Making these problems easy to avoid frees up time for the creativity of developing the narrative.
  19. • You re-run the analysis and get different results. • Someone else can’t repeat the analysis. • You can’t re-run the analysis on different data. • An external library you’re using is updated, and you can’t recreate the results. • You change code but don’t re-run downstream code, so the results aren’t consistently updated. • You change code but don’t re-execute it, so the results are out of date. • You update your analysis and can’t compare it to previous results. • You can’t point to the code changes that resulted in different analysis results. • A second analyst can’t understand your code. • You can’t re-use logic in different parts of the analysis. • You change the logic used in the analysis, but only in some of the places where it’s implemented. • Your code is not performing as expected, but you don’t notice. • Your data becomes corrupted, but you don’t notice. • You use a new dataset that renders your statistical methods invalid, but you don’t notice. • You make a mistake in your code. • You use inefficient code. • A second analyst wants to contribute code to the analysis, but can’t do so. • Two analysts want to combine code but cannot. • You aren’t able to track and communicate known next steps in your analysis. • Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
  20. The same list of problems, grouped into three categories: Reproducible, Accurate, Collaborative.
  21. OPERATIONS
  22. Switch from “blaming” a person to blaming the process that they used.
  23. The person did not want to make an error, and acted in a way that she thought wouldn’t create an error. The current system failed the person with good intentions.
  24. AN EXAMPLE OF A SUCCESSFUL(ISH) POST-MORTEM
  25. “…” — Hilary Parker, college essay
  26. THE INCIDENT: BROTHER LOSES HIS CD BINDER ON A PLANE
  27. “…” — Hilary Parker, college essay
  28. “…” — Hilary Parker, college essay
  29. • Identify the incident, record what happened • Change the system to make it hard for a person acting in good faith to have the incident occur • Use outside resources and best practices to inform decisions • Make trade-offs
  30. BLAMEFUL PROCESS
  31. “ALEX, KEEP BETTER TRACK OF YOUR CDS NEXT TIME!” — Mom or Dad
  32. “CHECK YOUR ANALYSIS MORE CAREFULLY NEXT TIME!” — Boss
  33. “An engineer who thinks they’re going to be reprimanded is disincentivized to give the details necessary to get an understanding of the mechanism, pathology, and operation of the failure.” — John Allspaw, https://codeascraft.com/2012/05/22/blameless-postmortems/
  34. The full list of common problems, revisited.
  35. Incident happens → blameless postmortem (discuss system changes to prevent the error; weigh the cost of implementing these changes) → adopt new process
  36. • Executable analysis scripts: You re-run the analysis and get different results. Someone else can’t repeat the analysis. You can’t re-run the analysis on different data. • Defined dependencies: An external library you’re using is updated, and you can’t recreate the results. You change code but don’t re-run downstream code, so the results aren’t consistently updated. • Watchers for changed code and data: You change code but don’t re-execute it, so the results are out of date. • Version control (individual): You update your analysis and can’t compare it to previous results. You can’t point to the code changes that resulted in different analysis results. • Code review: A second analyst can’t understand your code. • Modular, tested code: You can’t re-use logic in different parts of the analysis. You change the logic used in the analysis, but only in some of the places where it’s implemented. Your code is not performing as expected, but you don’t notice. • Assertive testing of data, assumptions and results: Your data becomes corrupted, but you don’t notice. You use a new dataset that renders your statistical methods invalid, but you don’t notice. • Code review: You make a mistake in your code. You use inefficient code. • Version control (collaborative): A second analyst wants to contribute code to the analysis, but can’t do so. Two analysts want to combine code but cannot. • Issue tracking: You aren’t able to track and communicate known next steps in your analysis. Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
  37. The same mapping, with the solutions labeled as “Opinions”.
  38. EXECUTABLE ANALYSIS SCRIPTS
  39. You re-run the analysis and get different results. Someone else can’t repeat the analysis. You can’t re-run the analysis on different data.
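
     A minimal sketch of this opinion, in Python rather than the R tooling the talk has in mind; the file names and the `value` column are invented for illustration. The whole analysis runs from a single command, so anyone can repeat it, and pointing it at a different input file re-runs it on different data:

        # analysis.py -- hypothetical end-to-end analysis script.
        # Usage: python analysis.py data/raw.csv out/report.txt
        import csv
        import statistics
        import sys

        def run(in_path: str, out_path: str) -> None:
            # Read the raw data; no manual steps, no state left in a console.
            with open(in_path, newline="") as f:
                values = [float(row["value"]) for row in csv.DictReader(f)]
            # Write the deliverable from scratch on every run.
            with open(out_path, "w") as f:
                f.write(f"n = {len(values)}\n")
                f.write(f"mean = {statistics.mean(values):.3f}\n")
                f.write(f"sd = {statistics.stdev(values):.3f}\n")

        if __name__ == "__main__":
            run(sys.argv[1], sys.argv[2])
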
  40. DEFINED DEPENDENCIES
  41. An external library you’re using is updated, and you can’t recreate the results. You change code but don’t re-run downstream code, so the results aren’t consistently updated.
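
     One way to act on this opinion, sketched in Python; the package names and version numbers are illustrative, not a recommendation. Record the exact library versions the analysis was developed against and fail loudly if the environment drifts:

        # check_env.py -- hypothetical guard that pins the analysis to the
        # library versions it was developed against, so a silent upgrade of
        # a dependency can't change results without being noticed.
        from importlib.metadata import version

        PINNED = {"pandas": "2.2.2", "numpy": "1.26.4"}  # illustrative versions

        for pkg, expected in PINNED.items():
            found = version(pkg)
            if found != expected:
                raise RuntimeError(
                    f"{pkg} is {found}, but this analysis expects {expected}"
                )

     In practice a lockfile does the same job with less ceremony, e.g. a pinned requirements.txt on the Python side or packrat on the R side.
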
  42. WATCHERS FOR CHANGED DATA AND CODE
  43. You change code but don’t re-execute it, so the results are out of date.
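
     A sketch of the idea in Python; the dependency list and file names are invented for illustration. Fingerprint the code and data that the results depend on, and flag the results as stale when either changes. Build tools such as make implement the same idea more thoroughly:

        # watch.py -- minimal staleness check for an analysis pipeline.
        import hashlib
        import json
        import pathlib

        DEPS = ["analysis.py", "data/raw.csv"]      # inputs the results depend on
        STAMP = pathlib.Path("fingerprints.json")   # hashes from the last run

        def fingerprint(paths):
            # Hash each file's contents, so any edit to code or data is visible.
            return {p: hashlib.sha256(pathlib.Path(p).read_bytes()).hexdigest()
                    for p in paths}

        current = fingerprint(DEPS)
        previous = json.loads(STAMP.read_text()) if STAMP.exists() else {}
        if current != previous:
            print("code or data changed since the last run: results are stale")
            # ... re-run the analysis here, then record the new fingerprints:
            STAMP.write_text(json.dumps(current))
        else:
            print("results are up to date")
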
  44. MODULAR, TESTED CODE
  45. You can’t re-use logic in different parts of the analysis. You change the logic used in analysis, but only in some places where it’s implemented. Your code is not performing as expected, but you don’t notice.
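
     A small sketch in Python of what the opinion buys you; the function and test are invented for illustration. The logic lives in one importable, tested function instead of being copy-pasted across scripts, so a fix lands everywhere at once and misbehavior is caught by a test instead of going unnoticed:

        # metrics.py -- shared analysis logic lives in one place.
        def trimmed_mean(values, trim=0.1):
            """Mean after dropping the top and bottom `trim` fraction of values."""
            if not 0 <= trim < 0.5:
                raise ValueError("trim must be in [0, 0.5)")
            xs = sorted(values)
            k = int(len(xs) * trim)
            kept = xs[k:len(xs) - k]
            return sum(kept) / len(kept)

        # test_metrics.py -- run with pytest; a failing test surfaces code
        # that is "not performing as expected" before it reaches the results.
        from metrics import trimmed_mean

        def test_trimmed_mean_drops_outliers():
            assert trimmed_mean([1, 2, 3, 4, 100], trim=0.2) == 3.0
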
  46. VERSION CONTROL
  47. You update your analysis and can’t compare it to previous results. You can’t point to the code changes that resulted in different analysis results. Two analysts want to combine code but cannot.
  48. ASSERTIVE TESTING OF DATA, ASSUMPTIONS AND RESULTS
  49. Your data becomes corrupted, but you don’t notice. You use a new dataset that renders your statistical methods invalid, but you don’t notice.
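
     A minimal sketch in Python; the column names and bounds are invented for illustration. The checks run every time the analysis does, so corrupted data or a broken assumption halts the run instead of silently flowing into the results. On the R side, packages such as assertr exist for exactly this pattern:

        # checks.py -- hypothetical data and result assertions for an analysis.
        def check_inputs(rows):
            # Fail fast on corrupted or implausible data.
            assert rows, "dataset is empty"
            for i, row in enumerate(rows):
                assert row.get("age") is not None, f"row {i}: missing age"
                assert 0 <= row["age"] <= 120, f"row {i}: implausible age {row['age']}"

        def check_results(r_squared):
            # Sanity-check model output; a value outside [0, 1] means a bug upstream.
            assert 0.0 <= r_squared <= 1.0, f"R^2 out of range: {r_squared}"

        # Example: these run as part of the analysis script itself.
        check_inputs([{"age": 34}, {"age": 57}])
        check_results(0.42)
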
  50. CODE REVIEW
  51. You make a mistake in your code. You use inefficient code.
  52. ISSUE TRACKING
  53. You aren’t able to track and communicate known next steps in your analysis. Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
  54. WHY DID YOU SPEND MOST OF YOUR KEYNOTE TIME ON PROCESS?
  55. WITHOUT GOOD PROCESS, OPINIONS CAN CREATE TOXIC CULTURE
  56. 30+ YEARS OF BEING OPINIONATED
  57. “HILARY, I NEVER FIGHT WITH ANYONE ELSE EXCEPT FOR YOU” — a friend at his birthday party
  58. “A funny thing happens when engineers make mistakes and feel safe when giving details about it: they are not only willing to be held accountable, they are also enthusiastic in helping the rest of the company avoid the same error in the future. They are, after all, the most expert in their own error. They ought to be heavily involved in coming up with remediation items.” — John Allspaw, https://codeascraft.com/2012/05/22/blameless-postmortems/
  59. “You should be using GitHub.” — helpful (but misguided) coworker
  60. “You should be using GitHub.” → “If this is a project where you want to compare past and future results, you might want to use version control. GitHub is a good option, and RStudio integrates well with it.” — actually helpful coworker
  61. “Excel is the worst.” — helpful (but misguided) coworker
  62. “Excel is the worst.” → “Excel does make it hard to create scripts. If you want to redo your analysis frequently, writing a script will take time now but save you lots of time later.” — actually helpful coworker
  63. Shift from “I know better than you” to “lots of people have run into this problem before, and here’s software that makes a solution easy”
  64. WHY “OPINIONATED”?
  65. “Opinionated Software is a software product that believes a certain way of approaching a business process is inherently better and provides software crafted around that approach.” — Stuart Eccles, https://medium.com/@stueccles/the-rise-of-opinionated-software-ca1ba0140d5b#.by7aq2c3c
  66. “It’s a strong disagreement with the conventional wisdom that everything should be configurable, that the framework should be impartial and objective. In my mind, that’s the same as saying that everything should be equally hard.” — David Heinemeier Hansson, creator of Ruby on Rails
  67. This is bowling. There are rules.
  68. This is analysis. There are rules.
  69. This is analysis development. There are rules.
  70. DEFINE THE PROCESS: ANALYSIS DEVELOPMENT
  71. CREATE AND ENCOURAGE A BLAMELESS CULTURE
  72. FRAME DISCUSSIONS AROUND INCIDENTS. DISCUSS OPINIONS, NOT SOFTWARE CHOICES
  73. ENCOURAGE SOFTWARE MAKERS TO CREATE OPINIONATED ANALYSIS SOFTWARE
  74. THANKS!
