Not Waving But Drowning
Understanding Data
Andrew Hingston
switchsolutions.com.au
Not Waving But Drowning
by Stevie Smith

Nobody heard him, the dead man,
But still he lay moaning:
I was much further out than you thought
And not waving but drowning.

Poor chap, he always loved larking
And now he's dead
It must have been too cold for him his heart gave way,
They said.

Oh, no no no, it was too cold always
(Still the dead one lay moaning)
I was much too far out all my life
And not waving but drowning.

Introductions
  Name and what you do
  Hobby others don’t know
  Waving or drowning in data?
Why understand data?
Sometimes we are like this
… and other times like this!
BIAS
  Memorability
  Anchoring and adjustment
  Status quo
  Self-serving
  Negative comparisons
  Framing
Sources of POWER
  Legitimate power
  Referent power
  Expert power
  Reward power
  Coercive power
  French and Raven (1959), “The bases of social power”
  See Wikipedia: “Power (philosophy)”
Persuasion
  Reciprocity
  Consistency
  Social proof
  Authority
  Liking
  Scarcity
  Robert Cialdini (2001), “Influence: Science and practice”
  See Wikipedia: “Robert Cialdini”
Busting jargon
Steps …
  Specify problem
  Propose answers
  Identify the right tools
  Obtain your data
  Visualise it
  Crunch numbers
  Interpret, persuade, apply
Course
  1. Understanding data
  2. Monitoring processes
  3. Exploring relationships
Today
  1. Visualising data
  2. Measuring middle
  3. Measuring spread
  4. Histograms
  5. Box plots
  6. Bell-shaped curves
  7. Exercises using R
1. Visualising
Why visualise your data?
  For you: Fast understanding; Build solid ‘foundation’; Flags problems
  For others: Easier to follow; Memorable; Less info overload; More convincing
GOOD CHARTS / BAD CHARTS
Lots of charts …
Bar charts
  Most revenue is still from ads
Column charts
  And that story goes back a while
Line charts
  Go Android!
  Source: StatCounter Global Stats
Scatterplot
  When NASDAQ moves, so does Google!
GUI Shoot Out
  [Chart: Form and Functionality scores out of 10 for iOS, Win Phone 7, Symbian, Android and BlackBerry OS]
Bubble chart
Pie chart
Stacked column chart
  Check out our profits!
  [Chart segments: Tax; General Admin; Sales & Marketing; R&D; Traffic acquisition & other]
Radar charts
Compound charts (e.g. stock chart)
  1. Visualising data with charts
PRESENTATIONS!
  5 second rule
  Simplicity
  Colour
  Font size
  Handouts
  Jargon
  Pictures
2. Middle
Mode = 1
Median = 3
Mean = 4
Trimmed Mean = 3
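The four measures of the middle can be checked with a few lines of code. This is a minimal sketch in Python's standard library; the data set is hypothetical, chosen only because it reproduces the figures on the slide, and `trimmed_mean` is an illustrative helper that drops one value from each end.

```python
from statistics import mean, median, mode

# Hypothetical data, chosen only so the results match the slide.
data = [1, 1, 3, 5, 10]


def trimmed_mean(values, trim=1):
    """Mean after dropping the `trim` lowest and `trim` highest values."""
    ordered = sorted(values)
    return mean(ordered[trim:len(ordered) - trim])


summary = {
    "mode": mode(data),                  # most common value
    "median": median(data),              # middle value once sorted
    "mean": mean(data),                  # sum divided by count
    "trimmed mean": trimmed_mean(data),  # mean without the two extremes
}

for name, value in summary.items():
    print(f"{name} = {value}")
```

The single large value (10) pulls the mean above the median, while the trimmed mean ignores it.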
Weighted average
  = 0.25 × 0.04 + 0.50 × (−0.08) + 0.25 × 0.04 = −0.02 = −2%
  Not the simple mean, which is ZERO!
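The arithmetic can be reproduced as follows. A small sketch using the weights and returns from the slide; the variable names are illustrative.

```python
# Weights and returns taken from the slide.
weights = [0.25, 0.50, 0.25]
returns = [0.04, -0.08, 0.04]

# Weighted average: each return counts in proportion to its weight.
weighted_average = sum(w * r for w, r in zip(weights, returns))

# Simple mean: every return counts equally.
simple_mean = sum(returns) / len(returns)

print(f"weighted average = {weighted_average:.1%}")
print(f"simple mean      = {simple_mean:.1%}")
```

The heavier weight on the negative return is what pushes the weighted average below the simple mean.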
3. Spread
4. Histograms
LOOK
  Shape
  Peak
  Multi-peak
  Lumpy
  Long tail
Freedman-Diaconis equation
  Bin width = 2 × IQR / ∛n
  IQR = interquartile range, n = number of data points
  Suggests good bin width
  Very robust
  Use common sense though!
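A possible implementation of the Freedman-Diaconis rule, bin width = 2 × IQR / ∛n, is sketched below. The function name and the sample data are hypothetical, and the quartiles use the default convention of Python's `statistics.quantiles`, so other packages may give slightly different widths.

```python
from statistics import quantiles


def freedman_diaconis_width(values):
    """Suggested histogram bin width: 2 * IQR / cube root of n."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1                       # interquartile range
    return 2 * iqr / len(values) ** (1 / 3)


# Hypothetical data: the whole numbers 1 to 27.
data = list(range(1, 28))
width = freedman_diaconis_width(data)
print(f"suggested bin width = {width:.2f}")
```

Treat the result as a starting point and round it to something readable, as the slide advises.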
What is the story?
  Double-Peaked
  Bell-Shaped
  Comb
  Plateau
  Skewed
  Truncated
  Edge-Peaked
  Isolated-Peaked
HISTOGRAMS
  GOOD: Visualise spread; Easy to interpret; Indicates skewness; Indicates multiple peaks
  BAD: Data points? Width of bins? Sample size?
5. Box plots
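[Box plot diagram, axis from 20 to 50. Labels from top to bottom:]
  Outliers
  Upper fence (Quartile 3 + 1.5 × IQR)
  Highest data inside fence
  Quartile 3
  Box = IQR = middle 50%
  Mean
  Median
  Quartile 1
  Lowest data inside fence
  Lower fence (Quartile 1 − 1.5 × IQR)
  99.3% of data lies between the fences if normally distributed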
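The parts of the box plot can be computed directly. The sketch below uses hypothetical data and an illustrative helper, `box_plot_summary`; quartile conventions differ between packages, so a stats package may place the fences slightly differently. The last few lines check the 99.3% figure against the standard Normal distribution.

```python
from statistics import NormalDist, median, quantiles


def box_plot_summary(values):
    """Quartiles, fences, whisker ends and outliers for a box plot."""
    ordered = sorted(values)
    q1, _, q3 = quantiles(ordered, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median(ordered),
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "lowest_inside": inside[0],    # end of the lower whisker
        "highest_inside": inside[-1],  # end of the upper whisker
        "outliers": [v for v in ordered if v < lower_fence or v > upper_fence],
    }


# Hypothetical data with one unusually large value.
data = [22, 25, 27, 28, 30, 31, 33, 34, 36, 38, 50]
summary = box_plot_summary(data)
for name, value in summary.items():
    print(f"{name}: {value}")

# Share of a Normal distribution that falls between the fences.
z = NormalDist()
q3_z = z.inv_cdf(0.75)
fence_z = q3_z + 1.5 * (2 * q3_z)  # IQR of the standard Normal is 2 * q3_z
coverage = z.cdf(fence_z) - z.cdf(-fence_z)
print(f"inside the fences if Normal: {coverage:.1%}")
```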
BOX PLOTS
  GOOD: Mean and median; Spread; Symmetry; Outliers
  BAD: Not intuitive; Need stats package; Bad for presentations
Interpreting shape
  Right-Skewed
  Left-Skewed
  Symmetric
  [Each box plot is labelled Q1, Median, Q3; * = Mean]
Side-by-side boxplots
  If the boxes don’t overlap, the difference between groups is ‘statistically significant’
  BUT THIS IS A PRETTY ROUGH TEST!
  [Diagram: boxes DON’T overlap]
Recap
  Why?
  Visualise
  Middle
  Spread
  Histograms
  Boxplots
6. Bell-shaped curves
WARNING!
NORMAL distribution
  68.2% chance of being within 1 stddev of the mean
  95.4% chance of being within 2 stddev
  99.7% chance of being within 3 stddev
  Bell shaped; mean = median = mode; symmetrical
  Goes from −∞ to +∞
  Area under curve = 1
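These percentages can be checked numerically. A minimal sketch using `NormalDist` from Python's standard library; the dictionary name is illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, stddev 1

# Chance of landing within k standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, chance in within.items():
    print(f"within {k} stddev: {chance:.2%}")
```

The slide's figures are these values cut to one decimal place.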
Mean return 10%
Std deviation 10%
Likelihood of −50%?
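One way to answer the question, assuming returns really do follow a Normal distribution and reading "likelihood of −50%" as the chance of a return of −50% or worse:

```python
from statistics import NormalDist

mean_return = 0.10
std_dev = 0.10
crash = -0.50

# How many standard deviations below the mean is a -50% return?
z_score = (crash - mean_return) / std_dev

# Chance of a return of -50% or worse under the Normal assumption.
probability = NormalDist(mean_return, std_dev).cdf(crash)

print(f"Z = {z_score:.1f}")
print(f"P(return <= -50%) = {probability:.2e}")
```

A result this tiny says more about the Normal assumption than about real markets, which is the point of the warning slide.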
Different means but both still Normal
  mean = −1
  mean = +1
Different standard deviations but both still Normal
  Stddev = 1
  Stddev = 2
Calculating probability
  mean = 0, stddev = 1, X = 1
  How many stddev from the mean? This is called Z: Z = (X − μ) / σ
  Put Z into spreadsheet: = NORMSDIST ( 1 ) which gives 84%
  P.S. μ is mean, σ is stddev
… is the same as …
  mean = 2, stddev = 1, X = 3
  How many stddev from the mean? This is called Z: Z = (X − μ) / σ
  Put Z into spreadsheet: = NORMSDIST ( 1 ) which gives 84%
  P.S. μ is mean, σ is stddev
… or take a shortcut!
  mean = 2, stddev = 1, X = 3
  In OpenOffice Calc:
  = NORMSDIST ( 1 ) which gives 84%
  = NORMDIST ( 3 ; 2 ; 1 ; 1 ) also gives 84%
  P.S. μ is mean, σ is stddev
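The two spreadsheet routes can be mirrored outside a spreadsheet. In this sketch, `NormalDist().cdf(z)` plays the role of NORMSDIST and `NormalDist(mean, stddev).cdf(x)` the role of NORMDIST with the cumulative flag set; the variable names are illustrative.

```python
from statistics import NormalDist

mean, std_dev, x = 2, 1, 3

# Long way: convert X to a Z-score, then use the standard Normal.
z_score = (x - mean) / std_dev
via_z = NormalDist().cdf(z_score)

# Shortcut: ask the Normal with this mean and stddev directly.
direct = NormalDist(mean, std_dev).cdf(x)

print(f"Z = {z_score}")
print(f"P(X <= {x}) via Z  = {via_z:.0%}")
print(f"P(X <= {x}) direct = {direct:.0%}")
```

Both routes give the probability of the region at or below X, not of the single point X.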
NEVER CALCULATE THE PROBABILITY OF ONE POINT
  YOU CAN ONLY CALCULATE PROBABILITY OF A REGION
SPEED challenge
WARNING!
But data might not be Normal
  Double-Peaked
  Bell-Shaped
  Comb
  Plateau
  Skewed
  Truncated
  Edge-Peaked
  Isolated-Peaked
Central Limit Theorem
  The shape of the data doesn’t matter …
  if you take a large enough sample ( n > 30 )
  the means of such samples
  will follow an approximately Normal distribution
  around the mean of the underlying data
  DEMO
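A small simulation can stand in for the demo. This is a sketch under stated assumptions: the underlying data is a strongly skewed exponential distribution with mean 1, each sample has 40 points, and the seed, sample size and number of samples are arbitrary choices.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable
SAMPLE_SIZE = 40         # "large enough" by the slide's rule of thumb
NUM_SAMPLES = 2000


def draw_sample_mean():
    """Mean of one sample from a skewed (exponential) population, mean 1."""
    return mean([rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)])


sample_means = [draw_sample_mean() for _ in range(NUM_SAMPLES)]

centre = mean(sample_means)   # should sit near the population mean of 1
spread = stdev(sample_means)  # should sit near 1 / sqrt(40), about 0.16

# A Normal shape puts roughly 68% of values within 1 stddev of the centre.
share_within_one = sum(
    abs(m - centre) <= spread for m in sample_means
) / NUM_SAMPLES

print(f"centre of sample means = {centre:.3f}")
print(f"spread of sample means = {spread:.3f}")
print(f"share within 1 stddev  = {share_within_one:.1%}")
```

Plotting a histogram of `sample_means` should show a bell shape even though the underlying data is far from bell-shaped.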
Use Z-score formula if your data follows a Normal distribution
  Z = (X − μ) / σ
Use modified Z-score formula for the probability that the mean of a sample takes on certain values
  Z = (x̄ − μ) / (σ / √n)
  μ is mean, σ is stddev, n is sample size
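The modified formula, Z = (x̄ − μ) / (σ / √n), can be sketched as a small function. The function name and the example figures (mean 10, stddev 10, sample of 25, sample mean of 12) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_below(x_bar, mu, sigma, n):
    """Chance that the mean of a sample of size n is at or below x_bar."""
    z_score = (x_bar - mu) / (sigma / sqrt(n))  # modified Z-score
    return NormalDist().cdf(z_score)


# Hypothetical example: population mean 10, stddev 10, sample of 25.
probability = prob_sample_mean_below(x_bar=12, mu=10, sigma=10, n=25)
print(f"P(sample mean <= 12) = {probability:.0%}")
```

Dividing σ by √n is what makes sample means less spread out than individual data points.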
7. Exercises
PROJECT
Exercises in R
  Test Data
  Exercise 4 Employee Expenses
  Exercise 5 Toll Booth
  Exercise 6 Cycle World
  Exercise 9 Chan’s Linen
  Exercise 10 Stock returns
  For the brave!
  Exercise 11 Social Insight Test
  Exercise 13 Quality
  Exercise 16 Car production
THANKS
  Feedback please!
