Better than a coin toss

So you’re a big data and distributed systems “expert”. You’ve collected 500 billion data points, thrown them into sci-lib-of-the-week on Hadoop, backed by those cool AWS GPU instances, and let it grind away for days. Now it’s spat out the answer to life, the universe and everything. But is it really better than a coin toss?

How do you validate whether your data analysis algorithm works? Are you learning a solution to your problems or just the data you already have? What problems can you encounter when analysing your data? How do you solve them, and what can you do easily under the time pressures of a business environment?

  1. ARE YOU BETTER THAN A COIN TOSS? BY JOHN OLIVER AND RICHARD WARBURTON
  2. WHO ARE WE?
  3. Why you should care. The Fundamentals. Practical Problems. Applying the Theory.
  4. "EXPERTS" AREN'T VERY GOOD
  5. BIG DATA SOLVES ALL KNOWN PROBLEMS
  6. BIG DATA SOLVES ALL KNOWN PROBLEMS ... HELPS
  7. VALIDATION = TESTS FOR DATA
  8. PART 1: FUNDAMENTALS
  9. NULL HYPOTHESIS: Until proven otherwise, there is no relationship between phenomena
  10. WHEN YOU HEAR "WOLF!" THERE IS A WOLF NEARBY. Wolf nearby: cry "Wolf!" = OK, stay quiet = false negative. It's really a chicken: cry "Wolf!" = false positive, stay quiet = OK.
  11. WHY IS THIS IMPORTANT?
  12. It is better that ten guilty persons escape than that one innocent suffer -William Blackstone
  13. STATIC ANALYSIS
  14. COST BENEFIT ANALYSIS: Costs a lot to jail an innocent man. Costs very little to show someone an inappropriate house. Credibility, Liberty, Morality are also costs.
  15. CHOOSE THE RIGHT MEASUREMENT: There's more than one concept of accuracy
  16. RECALL: number of true positives / number of actually true values
  17. PRECISION: number of true positives / number of predicted true values
  18. F MEASURE
  19. CASE STUDY: MEMORY LEAKS. About 10% of our dataset had memory leaks. Predict "never leaks memory" ~= 0.9 accuracy, but F1 = 0. Our algorithm ~= 0.9 accuracy and F1 ~= 0.9.
  20. PROBLEM: RELIABILITY OF MEASUREMENT
  21. RULE OF THUMB: If it looks like random noise, it probably is random noise.
  22. SOLUTION: CHECK YOUR DATA. Low Standard Deviation. Coefficient of Variation = Standard Deviation / Mean
  23. CAVEAT: NON-NORMAL DISTRIBUTIONS
  24. SOLUTION: GO MAD
  25. MEDIAN ABSOLUTE DEVIATION
  26. PROBLEM: EXPERIMENTAL FLUKES
  27. IS YOUR A/B TEST A HEISEN TEST?
  28. SOLUTION: P-VALUE
  29. SCIENCE WORKS - B****ES!
  30. PART 2: PRACTICAL PROBLEMS
  31. PROBLEM: FALSE PROPHETS
  32. I'M AN EXPERT, LISTEN TO ME!
  33. SOLUTION: ESTABLISH GOALS AND HYPOTHESIS, THEN TEST SOLUTIONS
  34. PROBLEM: CODE QUALITY. The math works :-) the code does not :-( -@headinthebox
  35. GROWTH IN A TIME OF DEBT
  36. SOLUTION: SOFTWARE ENGINEERING PRACTICES
  37. Everyone Lies -House
  38. SOLUTION: UNDERSTAND BIASES AND DESIGN AROUND THEM
  39. "Gay couples should have an equal right to get married, not just to have civil partnerships" (Populus: 65% vs 27%). "Marriage should continue to be defined as a life-long exclusive commitment between a man and a woman" (Comres + Catholic Voices: 22% vs 70%).
  40. ACQUIESCENCE BIAS: Answer yes if there’s a positive connotation
  41. REMOVAL OF PARTICULAR ADVERTISING AND SPONSORSHIP BANS. FOR: 1045, AGAINST: 731, ABSTAIN: 121. Motion Carried. MAINTAINING AN ETHICAL UNION BY REAFFIRMING ADVERTISING AND SPONSORSHIP BANS. FOR: 858, AGAINST: 755, ABSTAIN: 166. Motion Carried.
  42. SOLUTION: PHRASE QUESTIONS NEUTRALLY. And only have one question.
  43. SOCIAL DESIRABILITY: Poor people overestimate their income, rich people underestimate it.
  44. SOLUTIONS: Anonymisation. Confidentiality. Randomized Response. Bogus Pipeline.
  45. BIAS TOWARDS THE FIRST ANSWER OF A QUESTION: Make sure to randomise the order of answers
  46. WHAT WILL THE NEXT CRISIS IN WASHINGTON BE? Fight over the debt ceiling. Difficulty averting automatic cuts to the Pentagon. Failure to pass basic budget bills. All of the above. http://www.foxnews.com/politics/elections/2012/you-decide/what-will-next-crisis-washington-be
  47. PROBLEM: CORRELATION DOESN’T IMPLY CAUSALITY
  48. DATABASE AND NETWORK ACTIVITY CORRELATING. Performance Diagnosis: was actually a GC Problem.
  49. SOLUTION: DOMAIN KNOWLEDGE
  50. SOLUTIONS: Use domain knowledge - ask pilots. Stratified sample sets. Measure outcomes - are planes surviving more?
  51. BE RIGOROUS
  52. PART 3: APPLYING THE THEORY
  53. CORRELATION: A MEASURE OF THE STRENGTH OF DEPENDENCE BETWEEN TWO VARIABLES
  54. PEARSON CORRELATION: Err... Just look it up (Assumes linear relationship)
  55. Range / Strength: <0.4 = Weak/No Correlation; <0.7 = Some Correlation; >0.7 = Strong Correlation
  56. CASE STUDY: PERFORMANCE PROBLEM WITH HIGH SYSTEM TIME. Hypothesis: caused by Disk I/O
  57. Correlation Strength: 0.78453
  58. MACHINE LEARNING: Application of statistics to learn a relationship
  59. HOW MANY CLUSTERS?
  60. WHERE'S THE ELBOW?
  61. FITTING
  62. FITTING
  63. SOLUTION: CROSS VALIDATION
  64. CHOOSE CROSS VALIDATION DATA WISELY
  65. SELF VALIDATING: Ensemble methods - Train lots of weak classifiers and merge
  66. RANDOM FOREST AND BAGGING: Divide the data into bootstrap sets. Use the rest for calculating error.
  67. LEARNING CURVES
  68. HOW MUCH IS TOO MUCH?
  69. MONITOR PRODUCTION DATA...IT CHANGES: Does it look like the same data that you learnt with?
  70. A/B TEST NEW SYSTEMS: Satisfaction/Profit/Traffic...
  71. COMMON THREADS: Training set errors are misleading. Cross Validation, Production Monitored Values are the ones that really matter. Visualise and compare these errors.
  72. CONCLUSION: Analytics are increasingly important. Wide variety of statistical and practical tips to get them right. Have fun and best of luck!
  73. @johno_oliver @RichardWarburto QUESTIONS? http://insightfullogic.com
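
The recall, precision and F measure slides, together with the memory leak case study, can be sketched in a few lines of Python. This is a minimal illustration and not the speakers' code: the 100-item dataset is hypothetical, the function names are made up for this example, and treating an undefined precision or recall (division by zero) as 0 is an assumption.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    return sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)


def precision_recall_f1(actual, predicted):
    """Return (precision, recall, F1) for two equal-length lists of booleans."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    # Assumption: report 0.0 when a ratio is undefined (nothing predicted true,
    # or nothing actually true).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical dataset: 100 programs, 10 of which leak memory.
actual = [True] * 10 + [False] * 90
# The trivial classifier from the case study: always predict "never leaks".
never_leaks = [False] * 100

baseline_accuracy = accuracy(actual, never_leaks)
baseline_f1 = precision_recall_f1(actual, never_leaks)[2]
```

On this made-up data the trivial classifier reaches 0.9 accuracy while its F1 is 0, which is the gap the case study points at: accuracy alone cannot tell a useful detector from one that never detects anything.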
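
The "check your data" and "go MAD" slides name two spread measures: the coefficient of variation and the median absolute deviation. A rough sketch using only Python's standard `statistics` module follows. The latency samples are invented for illustration, the function names are not from the talk, and using the sample standard deviation (`stdev`) instead of the population one is an assumption.

```python
import statistics


def coefficient_of_variation(samples):
    """Standard deviation divided by the mean, as defined on the slide."""
    return statistics.stdev(samples) / statistics.mean(samples)


def median_absolute_deviation(samples):
    """Median of the absolute distances from the median."""
    centre = statistics.median(samples)
    return statistics.median([abs(x - centre) for x in samples])


# Hypothetical response times in milliseconds, with one extreme outlier.
latencies = [10, 11, 9, 10, 12, 10, 11, 200]

cv = coefficient_of_variation(latencies)
mad = median_absolute_deviation(latencies)
```

With the single 200 ms outlier, the coefficient of variation comes out above 1 while the MAD stays at 0.5 ms. That is one way to read the non-normal distributions caveat: a measure built on the mean is dragged around by the tail, a measure built on the median is not.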
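
For the Pearson correlation slides, the following is a small hand-rolled sketch, not the speakers' implementation. The sample data is hypothetical, the strength bands come from the range table in the transcript, and two choices are assumptions: the bands are applied to the absolute value of the coefficient, and a value of exactly 0.7 is counted as strong.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


def strength(r):
    """Map a coefficient onto the bands from the range table."""
    magnitude = abs(r)  # assumption: negative correlations are banded by size
    if magnitude < 0.4:
        return "weak/no correlation"
    if magnitude < 0.7:
        return "some correlation"
    return "strong correlation"  # assumption: exactly 0.7 counts as strong


# Hypothetical, perfectly linear data: ys is always twice xs.
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
case_study_band = strength(0.78453)
```

The value 0.78453 from the case study lands in the strong band. As the transcript warns, that only measures a linear relationship and says nothing about cause, so domain knowledge is still needed to interpret it.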
