ANALYTICS
AND WHERE IT FITS
A LITTLE ABOUT YOURS TRULY
MY NAME IS RUSSELL TIBBALLS
I HAVE BEEN WORKING WITH DATA, PROGRAMMING, AND PERFORMING VARIOUS LEVELS OF ANALYSIS FOR OVER 40 YEARS. I JOINED THE CUSTOMS
STATISTICS TEAM AFTER LEAVING HIGH SCHOOL IN NOVEMBER 1974.
I AM THE CANBERRA CHAIR OF IAPA
I ALSO CHAIR THE ADVISORY COMMITTEE TO THE SCIENCES AT USQ
I HAVE A MASTERS IN SOCIAL RESEARCH METHODS (ANU), FOCUSING ON SURVEY ANALYSIS, INTERNATIONAL MIGRATION, AND ANALYSIS OF WEB
PRESENCE USING SOCIAL NETWORK ANALYSIS (SNA).
ACS CERTIFIED PROFESSIONAL
TDWI CERTIFIED BUSINESS PROFESSIONAL – DATA ANALYSIS
SAS ADVANCED PROGRAMMER
ETC. ETC.
MY MAJOR INTERESTS ARE:
• MY FAMILY AND WHAT IS HAPPENING ON MY ACREAGE AND THE MOLONGLO RIVER (IT ADJOINS)
• ANY APPLICATIONS OF ANALYTICS IN THE HARD AND SOFT SCIENCES. YES I AM A TRAGIC WHO READS NATURE AND JSTOR PAPERS WHENEVER I
CAN.
• INTERNATIONAL MIGRATION
THE IMPRESSION
• THE IMPRESSION IS THAT ANYONE GIVEN ACCESS TO THE RIGHT INFORMATION CAN
ANALYZE AND COME UP WITH A SOLUTION FOR ANY PROBLEM IN MOMENTS.
• THE VIEW HAS BECOME INCREASINGLY PERVASIVE. SEE – ‘ARE WE COOL YET?: A
LONGITUDINAL CONTENT ANALYSIS OF NERD AND GEEK REPRESENTATIONS IN
POPULAR TELEVISION’ (2012 – CARDIEL C L)
• HOLLYWOOD HAS MOVED FROM THE MAD SCIENTIST WHO CAN WHIP UP A WORLD-
BEATING GADGET IN SECONDS (THINK DEXTER'S LAB), THROUGH THE NERDY FRIEND OF
THE HERO AS HACKER (MARKY MARK IN DATE NIGHT), TO THE ANALYST WHO
CAN LOG INTO THE INTERNET, FIND THE PETABYTES OF DATA YOU NEED TO ANALYZE, AND
STOP THE END OF THE WORLD IN MOMENTS; OCCASIONALLY SECONDS.
THE MAD SCIENTIST
THE HACKER
THE ANALYST
ANALYSTS DO ANALYTICS
A FEW DEFINITIONS
• FROM EVAN STUBBS “THE VALUE OF BUSINESS ANALYTICS”
• ANALYTICS – ANY DATA DRIVEN PROCESS THAT PROVIDES INSIGHT
• COMMON FORMS ARE:
• REPORTING – THE ORGANISATION OF HISTORICAL DATA
• TREND ANALYSIS – THE IDENTIFICATION OF PATTERNS IN TIME SERIES DATA
• SEGMENTATION – IDENTIFICATION OF SIMILARITIES WITHIN DATA
• PREDICTIVE MODELING – PREDICTION OF FUTURE EVENTS USING HISTORICAL DATA
CONTINUING FROM EVAN STUBBS
• ALL APPLICATIONS OF ANALYTICS HAVE A NUMBER OF COMMON CHARACTERISTICS:
• THEY ARE BASED ON DATA
• THEY APPLY VARIOUS MATHEMATICAL TECHNIQUES TO TRANSFORM AND SUMMARIZE THE RAW DATA
• THEY ADD VALUE TO THE ORIGINAL DATA AND TRANSFORM IT INTO “KNOWLEDGE”
• ADVANCED ANALYTICS HOWEVER AIMS TO IDENTIFY:
• WHY THINGS ARE HAPPENING
• WHAT WILL HAPPEN NEXT
• WHAT IS THE POSSIBLE COURSE OF ACTION
• THE BUSINESS OUTCOME DRIVERS FOR THE USE OF “BUSINESS ANALYTICS” ARE:
• BUSINESS RELEVANCY
• ACTIONABLE INSIGHT
• PERFORMANCE MEASUREMENT AND VALUE MEASUREMENT
• KNOWLEDGE IS A FAMILIARITY, AWARENESS OR UNDERSTANDING OF SOMEONE OR SOMETHING, SUCH AS FACTS,
INFORMATION, DESCRIPTIONS, OR SKILLS, WHICH IS ACQUIRED THROUGH EXPERIENCE OR EDUCATION BY
PERCEIVING, DISCOVERING, OR LEARNING.
• IN GOVERNMENT AN AGENCY’S RELEVANCY IS MEASURED IN TERMS OF POLICY ALIGNMENT
THE PRIVATE SECTOR PERSPECTIVE
• TO INCREASE THE EFFICIENCY OF DELIVERY AND VALUE TO THE CUSTOMER
• DRIVERS BEING INCREASING MARKET SHARE AND PROFITS
• AND MOST IMPORTANTLY TO DELIVER BENEFIT TO THE SHAREHOLDER
• ANY PRIVATE OR PUBLIC ENTERPRISE HAS TWO MAIN DRIVERS:
• THE WISHES OF ITS OWNERS, SHAREHOLDERS, OR GOVERNMENT.
• THE ONGOING RELEVANCY OF THE ORGANIZATION
SOME EXAMPLES OF ANALYTICS.
SOME ARE INTERESTING!
SOME ARE NOT AS EXCITING BUT STILL
INTERESTING
THE HEDGEHOG AND THE FOX
THE HEDGEHOG AND THE FOX IS AN ESSAY BY PHILOSOPHER ISAIAH BERLIN. IT
WAS ONE OF BERLIN'S MOST POPULAR ESSAYS WITH THE GENERAL PUBLIC.
BERLIN EXPANDS UPON THIS IDEA TO DIVIDE WRITERS AND THINKERS INTO TWO
CATEGORIES: HEDGEHOGS, WHO VIEW THE WORLD THROUGH THE LENS OF A
SINGLE DEFINING IDEA, AND FOXES WHO DRAW ON A WIDE VARIETY OF
EXPERIENCES AND FOR WHOM THE WORLD CANNOT BE BOILED DOWN TO A
SINGLE IDEA.
IN HIS 2012 NEW YORK TIMES BEST-SELLING BOOK THE SIGNAL AND THE NOISE,
FORECASTER NATE SILVER URGES READERS TO BE "MORE FOXY" AFTER
SUMMARIZING BERLIN'S DISTINCTION.
A BRIEF DETOUR ON THE VENDOR VIEW OF A
DATA SCIENTIST/ANALYST (ADVANCED
ANALYTICS)
THE DATA ANALYST
THE UNICORN
DATA ANALYTICS
• ANALYTICS – ANY DATA DRIVEN PROCESS THAT PROVIDES INSIGHT
• COMMON FORMS ARE:
• REPORTING – THE ORGANISATION OF HISTORICAL DATA
• TREND ANALYSIS – THE IDENTIFICATION OF PATTERNS IN TIME SERIES DATA
• SEGMENTATION – IDENTIFICATION OF SIMILARITIES WITHIN DATA
• PREDICTIVE MODELING – PREDICTION OF FUTURE EVENTS USING HISTORICAL DATA
REPORTING
• VERB - MAKE A FORMAL STATEMENT OR COMPLAINT ABOUT (SOMEONE OR
SOMETHING) TO THE NECESSARY AUTHORITY.
• NOUN
- AN ACCOUNT GIVEN OF A PARTICULAR MATTER, ESPECIALLY IN THE
FORM OF AN OFFICIAL DOCUMENT, AFTER THOROUGH INVESTIGATION OR
CONSIDERATION BY AN APPOINTED PERSON OR BODY. E.G., "THE CHAIRMAN'S
ANNUAL REPORT”. A SPOKEN OR WRITTEN DESCRIPTION OF AN EVENT OR
SITUATION, ESPECIALLY ONE INTENDED FOR PUBLICATION OR BROADCASTING
IN THE MEDIA.
TREND ANALYSIS
• TREND ANALYSIS IS THE PRACTICE OF COLLECTING INFORMATION AND ATTEMPTING TO SPOT
A PATTERN, OR TREND, IN THE INFORMATION. IN SOME FIELDS OF STUDY, THE TERM "TREND
ANALYSIS" HAS MORE FORMALLY DEFINED MEANINGS.[1][2][3]
• ALTHOUGH TREND ANALYSIS IS OFTEN USED TO PREDICT FUTURE EVENTS, IT COULD BE USED
TO ESTIMATE UNCERTAIN EVENTS IN THE PAST, SUCH AS HOW MANY ANCIENT KINGS
PROBABLY RULED BETWEEN TWO DATES, BASED ON DATA SUCH AS THE AVERAGE YEARS
WHICH OTHER KNOWN KINGS REIGNED.
• IN STATISTICS, TREND ANALYSIS OFTEN REFERS TO TECHNIQUES FOR EXTRACTING AN
UNDERLYING PATTERN OF BEHAVIOR IN A TIME SERIES WHICH WOULD OTHERWISE BE PARTLY
OR NEARLY COMPLETELY HIDDEN BY NOISE. A SIMPLE DESCRIPTION OF THESE TECHNIQUES IS
TREND ESTIMATION, WHICH CAN BE UNDERTAKEN WITHIN A FORMAL REGRESSION ANALYSIS.
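AS A MINIMAL SKETCH OF TREND ESTIMATION BY REGRESSION, THE PYTHON BELOW FITS A STRAIGHT LINE TO A SHORT TIME SERIES USING ORDINARY LEAST SQUARES. THE SERIES AND THE FUNCTION NAME fit_linear_trend ARE ILLUSTRATIVE ASSUMPTIONS, NOT PART OF THE ORIGINAL SLIDES.

```python
def fit_linear_trend(values):
    """Fit an ordinary least-squares line through the points (t, values[t])."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical series: an underlying rise of 2 per period plus alternating noise.
series = [10 + 2 * t + (1 if t % 2 == 0 else -1) for t in range(12)]
slope, intercept = fit_linear_trend(series)
print(f"estimated trend: {slope:.2f} per period, starting near {intercept:.2f}")
```

THE FITTED SLOPE RECOVERS THE UNDERLYING RISE APPROXIMATELY, EVEN THOUGH EACH INDIVIDUAL OBSERVATION IS NOISY.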
SEGMENTATION
• MARKET SEGMENTATION IS A MARKETING STRATEGY WHICH INVOLVES DIVIDING A
BROAD TARGET MARKET INTO SUBSETS OF CONSUMERS, BUSINESSES, OR COUNTRIES
THAT HAVE, OR ARE PERCEIVED TO HAVE, COMMON NEEDS, INTERESTS, AND
PRIORITIES, AND THEN DESIGNING AND IMPLEMENTING STRATEGIES TO TARGET
THEM. MARKET SEGMENTATION STRATEGIES ARE GENERALLY USED TO IDENTIFY
AND FURTHER DEFINE THE TARGET CUSTOMERS, AND PROVIDE SUPPORTING DATA
FOR MARKETING PLAN ELEMENTS SUCH AS POSITIONING TO ACHIEVE CERTAIN
MARKETING PLAN OBJECTIVES. BUSINESSES MAY DEVELOP PRODUCT
DIFFERENTIATION STRATEGIES, OR AN UNDIFFERENTIATED APPROACH, INVOLVING
SPECIFIC PRODUCTS OR PRODUCT LINES DEPENDING ON THE SPECIFIC DEMAND AND
ATTRIBUTES OF THE TARGET SEGMENT.
PREDICTIVE MODELLING
• PREDICTIVE MODELING USES STATISTICS TO PREDICT OUTCOMES.[1] MOST OFTEN THE EVENT ONE WANTS TO
PREDICT IS IN THE FUTURE, BUT PREDICTIVE MODELLING CAN BE APPLIED TO ANY TYPE OF UNKNOWN EVENT,
REGARDLESS OF WHEN IT OCCURRED. FOR EXAMPLE, PREDICTIVE MODELS ARE OFTEN USED TO DETECT CRIMES
AND IDENTIFY SUSPECTS, AFTER THE CRIME HAS TAKEN PLACE.[2]
• IN MANY CASES THE MODEL IS CHOSEN ON THE BASIS OF DETECTION THEORY TO TRY TO GUESS THE PROBABILITY
OF AN OUTCOME GIVEN A SET AMOUNT OF INPUT DATA, FOR EXAMPLE GIVEN AN EMAIL DETERMINING HOW LIKELY
IT IS TO BE SPAM.
• MODELS CAN USE ONE OR MORE CLASSIFIERS IN TRYING TO DETERMINE THE PROBABILITY OF A SET OF DATA
BELONGING TO ANOTHER SET, SAY SPAM OR 'HAM'.
• DEPENDING ON DEFINITIONAL BOUNDARIES, PREDICTIVE MODELLING IS SYNONYMOUS WITH, OR LARGELY
OVERLAPPING WITH, THE FIELD OF MACHINE LEARNING, AS IT IS MORE COMMONLY REFERRED TO IN ACADEMIC OR
RESEARCH AND DEVELOPMENT CONTEXTS. WHEN DEPLOYED COMMERCIALLY, PREDICTIVE MODELLING IS OFTEN
REFERRED TO AS PREDICTIVE ANALYTICS.
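THE SPAM OR 'HAM' EXAMPLE CAN BE SKETCHED WITH A TOY NAIVE BAYES CLASSIFIER. THIS IS ONE POSSIBLE CLASSIFIER, CHOSEN HERE FOR BREVITY; THE TRAINING MESSAGES AND THE FUNCTION NAMES train AND spam_probability ARE HYPOTHETICAL AND NOT FROM THE ORIGINAL SLIDES.

```python
import math
from collections import Counter


def train(messages):
    """Count words per label. messages is a list of (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def spam_probability(text, word_counts, label_counts):
    """Naive Bayes estimate of P(spam | text) with add-one smoothing."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total = sum(label_counts.values())
    log_score = {}
    for label in ("spam", "ham"):
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)  # prior for the label
        for word in text.lower().split():
            # Smoothed likelihood of the word under this label.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        log_score[label] = score
    # Normalise the two log scores into a probability.
    top = max(log_score.values())
    weights = {k: math.exp(v - top) for k, v in log_score.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])


# Hypothetical training data, far too small for real use.
training = [
    ("win cash prize now", "spam"),
    ("cheap prize win win", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
counts, labels = train(training)
p_spam = spam_probability("win a prize", counts, labels)
p_ham_like = spam_probability("monday meeting", counts, labels)
print(f"'win a prize' -> {p_spam:.2f}, 'monday meeting' -> {p_ham_like:.2f}")
```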
WHERE DOES ANALYTICS FIT
ANYWHERE YOU NEED TO MAKE A
DECISION!
TODAY, STATISTICAL METHODS ARE
APPLIED IN ALL FIELDS THAT INVOLVE
DECISION MAKING, FOR MAKING
ACCURATE INFERENCES FROM A COLLATED
BODY OF DATA AND FOR MAKING
DECISIONS IN THE FACE OF UNCERTAINTY
BASED ON STATISTICAL METHODOLOGY.
ANALYTICS HAS MANY LEVELS OF
COMPLEXITY
ESTIMATION
THE USE OF STATISTICAL METHODS DATES BACK AT LEAST TO THE 5TH CENTURY BCE.
THE HISTORIAN THUCYDIDES IN HIS HISTORY OF THE PELOPONNESIAN WAR [2]
DESCRIBES HOW THE ATHENIANS CALCULATED THE HEIGHT OF THE WALL OF PLATAEA
BY COUNTING THE NUMBER OF BRICKS IN AN UNPLASTERED SECTION OF THE WALL
SUFFICIENTLY NEAR THEM TO BE ABLE TO COUNT THEM. THE COUNT WAS REPEATED
SEVERAL TIMES BY A NUMBER OF SOLDIERS. THE MOST FREQUENT VALUE (IN MODERN
TERMINOLOGY - THE MODE ) SO DETERMINED WAS TAKEN TO BE THE MOST LIKELY
VALUE OF THE NUMBER OF BRICKS. MULTIPLYING THIS VALUE BY THE HEIGHT OF THE
BRICKS USED IN THE WALL ALLOWED THE ATHENIANS TO DETERMINE THE HEIGHT OF
THE LADDERS NECESSARY TO SCALE THE WALLS.
HOW TO MEASURE ANYTHING: FINDING THE VALUE OF INTANGIBLES BY DOUGLAS W
HUBBARD
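THE BRICK-COUNTING ESTIMATE CAN BE SKETCHED IN A FEW LINES OF PYTHON: TAKE THE MOST FREQUENT COUNT AND MULTIPLY BY THE BRICK HEIGHT. THE COUNTS AND THE BRICK HEIGHT BELOW ARE HYPOTHETICAL VALUES FOR ILLUSTRATION, NOT FIGURES FROM THUCYDIDES.

```python
from statistics import mode

# Hypothetical counts of brick layers reported by several soldiers.
counts = [41, 42, 42, 40, 42, 43, 42, 41]
# Hypothetical height of one brick layer, in metres.
brick_height_m = 0.25

most_frequent = mode(counts)              # the value reported most often
wall_height_m = most_frequent * brick_height_m
print(f"layers: {most_frequent}, estimated wall height: {wall_height_m} m")
```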
THE CENSUS
THE BIBLICAL STORY OF THE BIRTH OF JESUS WAS SET IN THE CONTEXT OF THE
CENSUS. IN 6 CE PUBLIUS SULPICIUS QUIRINIUS (51 BCE-21 CE), A DISTINGUISHED
SOLDIER AND FORMER CONSUL, WAS APPOINTED IMPERIAL LEGATE (GOVERNOR) OF
THE PROVINCE OF ROMAN SYRIA. IN THE SAME YEAR JUDEA WAS DECLARED A ROMAN
PROVINCE, AND QUIRINIUS WAS TASKED TO CARRY OUT A CENSUS OF THE NEW
TERRITORY FOR TAX PURPOSES.
‘IN THOSE DAYS A DECREE WENT OUT FROM EMPEROR AUGUSTUS THAT ALL THE
WORLD SHOULD BE REGISTERED. THIS WAS THE FIRST REGISTRATION AND WAS TAKEN
WHILE QUIRINIUS WAS GOVERNOR OF SYRIA. ALL WENT TO THEIR OWN TOWNS TO BE
REGISTERED. JOSEPH ALSO WENT FROM THE TOWN OF NAZARETH IN GALILEE TO
JUDEA, TO THE CITY OF DAVID CALLED BETHLEHEM, BECAUSE HE WAS DESCENDED
FROM THE HOUSE AND FAMILY OF DAVID. HE WENT TO BE REGISTERED WITH MARY,
TO WHOM HE WAS ENGAGED AND WHO WAS EXPECTING A CHILD.’ (LUKE 2:1–7)
SAMPLING
THE TRIAL OF THE PYX IS A TEST OF THE PURITY OF THE COINAGE OF THE ROYAL
MINT WHICH HAS BEEN HELD ON A REGULAR BASIS SINCE THE 12TH CENTURY.
THE TRIAL ITSELF IS BASED ON STATISTICAL SAMPLING METHODS. AFTER MINTING
A SERIES OF COINS - ORIGINALLY FROM TEN POUNDS OF SILVER - A SINGLE COIN
WAS PLACED IN THE PYX - A BOX IN WESTMINSTER ABBEY. AFTER A GIVEN PERIOD
- NOW ONCE A YEAR - THE COINS ARE REMOVED AND WEIGHED. A SAMPLE OF
COINS REMOVED FROM THE BOX IS THEN TESTED FOR PURITY.
THE MEAN AND MEDIAN
• THE ARITHMETIC MEAN, ALTHOUGH A CONCEPT KNOWN TO THE GREEKS, WAS
NOT GENERALIZED TO MORE THAN TWO VALUES UNTIL THE 16TH CENTURY.
THE INVENTION OF THE DECIMAL SYSTEM BY SIMON STEVIN IN 1585 SEEMS
LIKELY TO HAVE FACILITATED THESE CALCULATIONS. THIS METHOD WAS FIRST
ADOPTED IN ASTRONOMY BY TYCHO BRAHE WHO WAS ATTEMPTING TO REDUCE
THE ERRORS IN HIS ESTIMATES OF THE LOCATIONS OF VARIOUS CELESTIAL
BODIES.
• THE IDEA OF THE MEDIAN ORIGINATED IN EDWARD WRIGHT'S BOOK ON
NAVIGATION (CERTAINE ERRORS IN NAVIGATION) IN 1599 IN A SECTION
CONCERNING THE DETERMINATION OF LOCATION WITH A COMPASS. WRIGHT
FELT THAT THIS VALUE WAS THE MOST LIKELY TO BE THE CORRECT VALUE IN A
SERIES OF OBSERVATIONS.
DEMOGRAPHY
GAIN UNDERSTANDING OF COMPLEX SOCIAL PHENOMENA
THE BIRTH OF STATISTICS IS OFTEN DATED TO 1662, WHEN JOHN GRAUNT, ALONG
WITH WILLIAM PETTY, DEVELOPED EARLY HUMAN STATISTICAL AND CENSUS METHODS
THAT PROVIDED A FRAMEWORK FOR MODERN DEMOGRAPHY. HE PRODUCED THE
FIRST LIFE TABLE, GIVING PROBABILITIES OF SURVIVAL TO EACH AGE. HIS BOOK
NATURAL AND POLITICAL OBSERVATIONS MADE UPON THE BILLS OF MORTALITY USED
ANALYSIS OF THE MORTALITY ROLLS TO MAKE THE FIRST STATISTICALLY BASED
ESTIMATION OF THE POPULATION OF LONDON. HE KNEW THAT THERE WERE AROUND
13,000 FUNERALS PER YEAR IN LONDON AND THAT THREE PEOPLE DIED PER ELEVEN
FAMILIES PER YEAR. HE ESTIMATED FROM THE PARISH RECORDS THAT THE AVERAGE
FAMILY SIZE WAS 8 AND CALCULATED THAT THE POPULATION OF LONDON WAS
ABOUT 384,000.
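GRAUNT'S ARITHMETIC CAN BE REPLAYED AS A SHORT PYTHON SKETCH. TAKEN LITERALLY, THE FIGURES ABOVE GIVE ABOUT 381,000; THE SKETCH ASSUMES THE NUMBER OF FAMILIES WAS ROUNDED TO THE NEAREST THOUSAND BEFORE MULTIPLYING, WHICH REPRODUCES 384,000. THAT ROUNDING STEP IS AN ASSUMPTION AND IS NOT STATED ON THE SLIDE.

```python
burials_per_year = 13_000
deaths, families_per_group = 3, 11     # three deaths per eleven families per year
average_family_size = 8

# Families implied by the burial count: 13,000 * 11 / 3, about 47,667.
families = burials_per_year * families_per_group / deaths

# Assumption: round the family count to the nearest thousand (48,000).
families_rounded = round(families, -3)

population = families_rounded * average_family_size
unrounded_population = families * average_family_size

print(f"families: {families:,.0f} (rounded {families_rounded:,.0f})")
print(f"population: {population:,.0f} (unrounded {unrounded_population:,.0f})")
```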
IN 1802 LAPLACE ESTIMATED THE POPULATION OF FRANCE TO BE 28,328,612.[11] HE
CALCULATED THIS FIGURE USING THE NUMBER OF BIRTHS IN THE PREVIOUS YEAR AND
CENSUS DATA FOR THREE COMMUNITIES. THE CENSUS DATA OF THESE COMMUNITIES
SHOWED THAT THEY HAD 2,037,615 PERSONS AND THAT THE NUMBER OF BIRTHS
WAS 71,866. ASSUMING THAT THESE SAMPLES WERE REPRESENTATIVE OF FRANCE,
LAPLACE PRODUCED HIS ESTIMATE FOR THE ENTIRE POPULATION.
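THE METHOD DESCRIBED IS A RATIO ESTIMATE: PERSONS PER BIRTH IN THE SAMPLE, MULTIPLIED BY THE NUMBER OF BIRTHS IN THE WHOLE COUNTRY. A MINIMAL PYTHON SKETCH FOLLOWS. THE NATIONAL BIRTH COUNT IS NOT GIVEN ON THE SLIDE, SO hypothetical_births IS A PLACEHOLDER AND THE RESULT IS NOT EXPECTED TO MATCH LAPLACE'S PUBLISHED FIGURE.

```python
sample_population = 2_037_615   # persons in the sampled communities
sample_births = 71_866          # births in the same communities

# Persons per birth observed in the sample.
persons_per_birth = sample_population / sample_births


def estimate_population(national_births):
    """Scale a national birth count by the sample's persons-per-birth ratio."""
    return national_births * persons_per_birth


# Placeholder value for illustration only; not taken from the slide.
hypothetical_births = 1_000_000
estimate = estimate_population(hypothetical_births)
print(f"persons per birth: {persons_per_birth:.3f}")
print(f"estimate for {hypothetical_births:,} births: {estimate:,.0f}")
```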
PREDICT ORBIT OF PLANETS
THE METHOD OF LEAST SQUARES, WHICH WAS USED TO MINIMIZE ERRORS IN DATA
MEASUREMENT, WAS PUBLISHED INDEPENDENTLY BY ADRIEN-MARIE LEGENDRE (1805),
ROBERT ADRAIN (1808), AND CARL FRIEDRICH GAUSS (1809). GAUSS HAD USED THE
METHOD IN HIS FAMOUS 1801 PREDICTION OF THE LOCATION OF THE DWARF PLANET
CERES. THE OBSERVATIONS THAT GAUSS BASED HIS CALCULATIONS ON WERE MADE BY
THE ITALIAN MONK PIAZZI.
A DETAILED ACCOUNT OF THE METHOD USED CAN BE FOUND AT
HTTP://SCIENCE.LAROUCHEPAC.COM/GAUSS/CERES/INTERIMII/ASTRONOMY/KEPLERPROBLEM.HTML
WISDOM OF CROWDS
FRANCIS GALTON IS CREDITED AS ONE OF THE PRINCIPAL FOUNDERS OF
STATISTICAL THEORY. HIS CONTRIBUTIONS TO THE FIELD INCLUDED
INTRODUCING THE CONCEPTS OF STANDARD DEVIATION, CORRELATION,
REGRESSION AND THE APPLICATION OF THESE METHODS TO THE STUDY OF THE
VARIETY OF HUMAN CHARACTERISTICS - HEIGHT, WEIGHT, EYELASH LENGTH
AMONG OTHERS. HE FOUND THAT MANY OF THESE COULD BE FITTED TO A
NORMAL CURVE DISTRIBUTION.[19]
GALTON SUBMITTED A PAPER TO NATURE IN 1907 ON THE USEFULNESS OF THE
MEDIAN.[20] HE EXAMINED THE ACCURACY OF 787 GUESSES OF THE WEIGHT OF AN
OX AT A COUNTRY FAIR. THE ACTUAL WEIGHT WAS 1208 POUNDS: THE MEDIAN
GUESS WAS 1198. THE GUESSES WERE MARKEDLY NON-NORMALLY DISTRIBUTED.
AGRICULTURE
THE SECOND WAVE OF MATHEMATICAL STATISTICS WAS PIONEERED BY RONALD
FISHER WHO WROTE TWO TEXTBOOKS, STATISTICAL METHODS FOR RESEARCH
WORKERS, PUBLISHED IN 1925 AND THE DESIGN OF EXPERIMENTS IN 1935, THAT
WERE TO DEFINE THE ACADEMIC DISCIPLINE IN UNIVERSITIES AROUND THE
WORLD. HE ALSO SYSTEMATIZED PREVIOUS RESULTS, PUTTING THEM ON A FIRM
MATHEMATICAL FOOTING. HIS SEMINAL 1918 PAPER, THE CORRELATION
BETWEEN RELATIVES ON THE SUPPOSITION OF MENDELIAN INHERITANCE, WAS THE
FIRST TO USE THE STATISTICAL TERM VARIANCE. IN 1919, AT ROTHAMSTED
EXPERIMENTAL STATION HE STARTED A MAJOR STUDY OF THE EXTENSIVE
COLLECTIONS OF DATA RECORDED OVER MANY YEARS. THIS RESULTED IN A
SERIES OF REPORTS UNDER THE GENERAL TITLE STUDIES IN CROP VARIATION. IN
1930 HE PUBLISHED THE GENETICAL THEORY OF NATURAL SELECTION WHERE HE
APPLIED STATISTICS TO EVOLUTION.
MEDICINE, RELIABILITY, AND JURISPRUDENCE
• THE TERM BAYESIAN REFERS TO THOMAS BAYES (1702–1761), WHO PROVED A SPECIAL CASE
OF WHAT IS NOW CALLED BAYES' THEOREM. HOWEVER IT WAS PIERRE-SIMON LAPLACE (1749–
1827) WHO INTRODUCED A GENERAL VERSION OF THE THEOREM AND APPLIED IT TO
CELESTIAL MECHANICS, MEDICAL STATISTICS, RELIABILITY, AND JURISPRUDENCE.[52].
• AN INTERESTING READ - HTTP://BLOGS.SCIENTIFICAMERICAN.COM/CROSS-CHECK/ARE-BRAINS-BAYESIAN/
A QUICK HISTORY
Time | Contributor | Contribution
Ancient Greece | Philosophers | Ideas - no quantitative analyses
17th Century | Graunt, Petty | studied affairs of state, vital statistics of populations
17th Century | Pascal, Bernoulli | studied probability through games of chance, gambling
18th Century | Laplace, Gauss | normal curve, regression through study of astronomy
19th Century | Quetelet | astronomer who first applied statistical analyses to human biology
19th Century | Galton | studied genetic variation in humans (used regression and correlation)
20th Century (early) | Pearson | studied natural selection using correlation, formed first academic department of statistics, Biometrika journal, helped develop the Chi Square analysis
20th Century (early) | Gosset (Student) | studied process of brewing, alerted the statistics community about problems with small sample sizes, developed Student's t-test
20th Century (early) | Fisher | evolutionary biologist - developed ANOVA, stressed the importance of experimental design
20th Century (later) | Wilcoxon | biochemist who studied pesticides, non-parametric equivalent of two-samples test
20th Century (later) | Kruskal, Wallis | economists who developed the non-parametric equivalent of the ANOVA
20th Century (later) | Spearman | psychologist who developed a non-parametric equivalent of the correlation coefficient
20th Century (later) | Kendall | statistician who developed another non-parametric equivalent of the correlation coefficient
20th Century (later) | Tukey | statistician who developed multiple comparisons procedure
20th Century (later) | Dunnett | biochemist who studied pesticides, developed multiple comparisons procedure for control groups
20th Century (later) | Keuls | agronomist who developed multiple comparisons procedure
20th Century (later) | Computer Technology | provided many advantages over calculations by hand or by calculator, stimulated the growth of investigation into new techniques
Source: http://www.anselm.edu/homepage/jpitocch/biostatstime.html
SO HOW DO YOU DECIDE WHICH
ANALYTIC TOOLS TO USE?
WELL, ACTUALLY THAT IS THE WRONG QUESTION.
PROCESS IS MUCH MORE IMPORTANT THAN THE TOOLS. THE TOOL/S SHOULD
SUPPORT THE PROCESS
TO GAIN BUSINESS UNDERSTANDING/SCOPE
AND PLANNING/ PLAN
THERE IS NO TOOL FOR THIS:
YOU NEED TO RESEARCH:
• UNDERSTAND THE CONTEXT OF YOUR INVESTIGATION
• UNDERSTAND WHAT IS IMPORTANT TO THE BUSINESS/AGENCY/ORG.
• WHAT HAS GONE BEFORE
• WHAT MIGHT BE DONE DIFFERENTLY
• WAS THE INFORMATION YOU HAD ACCESS TO VALID INPUT?
DATA COLLECTION
DATA COLLECTION CONTINUED
DATA UNDERSTANDING/DISCOVERY
THERE ARE SEVERAL TOOLS THAT CAN HELP YOU HERE:
MOST SITES THESE DAYS HAVE REPORTS AND BUSINESS INTELLIGENCE DASHBOARDS THAT WILL
GIVE YOU AN INSIGHT INTO HOW A BUSINESS/AGENCY/ORG SEES ITSELF. GAIN AS MUCH INSIGHT
AS YOU CAN FROM THESE EXISTING PRODUCTS. DON’T ACCEPT THAT THEY ARE THE FULL STORY
– THEY ARE NOT.
EXCEL: USE PIVOT TABLES AND CHARTING TO GAIN A BASIC UNDERSTANDING.
OTHER COMMON TOOLS ARE:
• SAS/VA
• TABLEAU
• QLIK
• SPSS
• SQL
• STATISTICA
• MATLAB
• ETC
MODELLING
USE THE APPROPRIATE TOOL FOR YOUR
INVESTIGATION.
• USE THE APPROPRIATE DATA
• USE AN APPROPRIATE METHOD
• ITERATE AND CHECK THAT YOUR RESULTS MAKE SENSE IN THE CONTEXT OF THE COLLECTION, AND
THE QUESTION YOU ARE LOOKING TO ANSWER
• AS THE SAYING GOES – TO A CARPENTER WITH ONLY A HAMMER, EVERY PROBLEM LOOKS LIKE A NAIL.
• ALL ANALYSTS HAVE THEIR BENT TOWARDS PARTICULAR TOOLS – MY BENT IS TOWARD THE
MODELING TECHNIQUES USED IN THE SOCIAL SCIENCES BECAUSE THAT IS WHAT I STUDIED. BE
AWARE OF THE LIMITS OF YOUR FAVOURITE TOOLS AND BE WILLING TO LEARN NEW TRICKS.
EVALUATING/CHECK/VALIDATION
HAVE A PROCESS AND STANDARDS FOR YOUR ENVIRONMENT THAT LAY OUT THE
RULES FOR EVALUATING YOUR MODEL. THE STANDARD WILL DEPEND ON THE TOOLS
THAT YOU USE. MOST TOOLS, SUCH AS CORRELATION, ANOVA, AND REGRESSION,
HAVE WELL UNDERSTOOD METHODS OF EVALUATION. HOWEVER, CHECK THE WHOLE
PROCESS AND, IF YOU WANT TO USE THE MODEL, HAVE YOUR TEAM REVIEW IT AS WELL.
WE ALL LIKE TO THINK WE NEVER MAKE MISTAKES; UNFORTUNATELY THAT IS NEVER
TRUE.
MAKE SURE THAT THE PROCESS INCLUDES SOME SANITY CHECK METHODS, E.G. THAT
THE NUMBER OF ROWS/OBSERVATIONS THAT WERE READ IS WHAT YOU EXPECTED.
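A ROW-COUNT SANITY CHECK OF THIS KIND MIGHT LOOK LIKE THE PYTHON SKETCH BELOW. THE FUNCTION NAME read_rows, THE COLUMN NAMES, AND THE IN-MEMORY EXTRACT ARE ALL HYPOTHETICAL; IN PRACTICE THE EXPECTED COUNT WOULD COME FROM THE SOURCE SYSTEM OR A CONTROL TOTAL.

```python
import csv
import io


def read_rows(handle, expected_rows):
    """Read CSV rows and fail loudly if the count is not what was expected."""
    rows = list(csv.DictReader(handle))
    if len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, read {len(rows)}")
    return rows


# Hypothetical extract standing in for a real source file.
extract = io.StringIO("id,value\n1,10\n2,20\n3,30\n")
rows = read_rows(extract, expected_rows=3)
print(f"read {len(rows)} rows as expected")
```

FAILING WITH AN ERROR, RATHER THAN LOGGING AND CARRYING ON, KEEPS A SHORT OR DUPLICATED EXTRACT FROM FLOWING SILENTLY INTO THE MODEL.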
DEPLOYMENT/ACT
AFTER EVALUATING AND VALIDATING YOUR MODEL IT IS OFTEN 'DEPLOYED' TO
OPERATIONAL SYSTEMS AND REPORTS.
SCORING: OFTEN THE OUTPUT OF THE MODEL WILL BE A SCORE THAT IS USED AS INPUT TO
OPERATIONAL SYSTEMS, E.G. ESTIMATES OF FINANCIAL RISK, TRAVEL TIME, FUEL
CONSUMPTION, RESOURCE REQUIREMENT, AND MEDICAL OUTCOMES.
PARAMETER TO REPORTING: INTEGRATION INTO BUSINESS INTELLIGENCE
DASHBOARDS, AND REGULAR MANAGEMENT INFORMATION SYSTEM REPORTS.
OPERATIONAL SYSTEMS: MODELS PROVIDE INPUT TO ALL MANNER OF OPERATIONAL
SYSTEMS RANGING FROM PRODUCTION CONTROL PROCESSING, LOGISTICS, FRAUD
DETECTION, SYSTEMS MANAGEMENT, AND TRAFFIC CONTROL.
BUSINESS UNDERSTANDING/REPORT
ALL ANALYTICS IS UNDERTAKEN WITHIN A GIVEN CONTEXT. IN A RESEARCH
CONTEXT A PAPER WILL BE THE OUTCOME WITH AN ABSTRACT, BACKGROUND,
METHODS, RESULTS, CONCLUSION. IN A COMMERCIAL SETTING ANY FINDINGS (IN
MY LIMITED EXPERIENCE) ARE REPORTED IN A VERY SIMILAR MANNER.
REGARDLESS, THE OUTCOMES OF THE ANALYTICS PROCESS SHOULD BE
DOCUMENTED AND ADDED TO THE COLLECTIVE STORE OF BUSINESS KNOWLEDGE
AT YOUR SITE.
PS DON’T OVERCOMPLICATE THINGS
MONITOR/REVIEW/REPEAT
KNOWLEDGE IS NOT STATIC. THERE ARE THE THINGS YOU KNOW ARE GOING TO
HAPPEN AND
THERE ARE NEW FACTORS THAT YOU WILL NOT HAVE THOUGHT OF.
GEORGE BOX: ‘FOR SUCH A MODEL THERE IS NO NEED TO ASK THE QUESTION "IS THE MODEL
TRUE?". IF "TRUTH" IS TO BE THE "WHOLE TRUTH" THE ANSWER MUST BE "NO". THE
ONLY QUESTION OF INTEREST IS "IS THE MODEL ILLUMINATING AND USEFUL?"’
IN SHORT: ‘ALL MODELS ARE WRONG, SOME ARE USEFUL’
MONITOR: COMPARE THE ACTUAL PERFORMANCE OF THE MODELS YOU PRODUCE
AGAINST EXPECTED/PLANNED PERFORMANCE. BE PREPARED TO PROCEED WITH A
PROCESS OF CONTINUAL IMPROVEMENT.
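ONE SIMPLE WAY TO COMPARE ACTUAL AGAINST PLANNED PERFORMANCE IS TO TRACK AN ERROR MEASURE AND FLAG THE MODEL FOR REVIEW WHEN IT EXCEEDS A TOLERANCE. THE PYTHON SKETCH BELOW USES MEAN ABSOLUTE ERROR; THE CHOICE OF MEASURE, THE FIGURES, AND THE TOLERANCE ARE ASSUMPTIONS FOR ILLUSTRATION.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed values and model predictions."""
    if len(actual) != len(predicted):
        raise ValueError("series must be the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def needs_review(actual, predicted, planned_error):
    """True when the observed error exceeds the error planned for at deployment."""
    return mean_absolute_error(actual, predicted) > planned_error


# Hypothetical monthly outcomes, model predictions, and planned tolerance.
actual = [102, 98, 110, 125]
predicted = [100, 100, 105, 108]
observed_error = mean_absolute_error(actual, predicted)
flag = needs_review(actual, predicted, planned_error=5.0)
print(f"observed error: {observed_error}, review needed: {flag}")
```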
CONCLUSION
WHAT IS ANALYTICS - ANY DATA DRIVEN PROCESS THAT PROVIDES INSIGHT
WHERE DOES IT (ANALYTICS) FIT – ANYWHERE YOU NEED TO MAKE A DECISION
A FEW ANALYTICS TOOLS
• THE 40 DATA SCIENCE TECHNIQUES
1 LINEAR REGRESSION
2 LOGISTIC REGRESSION
3 JACKKNIFE REGRESSION *
4 DENSITY ESTIMATION
5 CONFIDENCE INTERVAL
6 TEST OF HYPOTHESES
7 PATTERN RECOGNITION
8 CLUSTERING - (AKA UNSUPERVISED LEARNING)
9 SUPERVISED LEARNING
10 TIME SERIES
11 DECISION TREES
12 RANDOM NUMBERS
13 MONTE-CARLO SIMULATION
14 BAYESIAN STATISTICS
15 NAIVE BAYES
16 PRINCIPAL COMPONENT ANALYSIS - (PCA)
17 ENSEMBLES
18 NEURAL NETWORKS
19 SUPPORT VECTOR MACHINE - (SVM)
20 NEAREST NEIGHBORS - (K-NN)
21 FEATURE SELECTION - (AKA VARIABLE REDUCTION)
22 INDEXATION / CATALOGUING *
23 (GEO-) SPATIAL MODELLING
24 RECOMMENDATION ENGINE *
25 SEARCH ENGINE *
26 ATTRIBUTION MODELLING *
27 COLLABORATIVE FILTERING *
28 RULE SYSTEM
29 LINKAGE ANALYSIS
30 ASSOCIATION RULES
31 SCORING ENGINE
32 SEGMENTATION
33 PREDICTIVE MODELLING
34 GRAPHS
35 DEEP LEARNING
36 GAME THEORY
37 IMPUTATION
38 SURVIVAL ANALYSIS
39 ARBITRAGE
40 LIFT MODELLING
41 YIELD OPTIMIZATION
42 CROSS-VALIDATION
43 MODEL FITTING
SOME THINGS TO CHECK OUT
INFORMATIVE
HTTP://WWW.KDNUGGETS.COM
HTTP://WWW.PREDICTIVEANALYTICSTODAY.COM/DEPLOYMENT-PREDICTIVE-MODELS
BOOKS
HTTP://SHOP.OREILLY.COM/CATEGORY/EBOOKS.DO
COOL
HTTPS://RAPIDMINER.COM
XPATH CAPABILITIES FOR WEB SCRAPING USING GOOGLE DOCS
HTTP://NODEXL.CODEPLEX.COM
HTTPS://D3JS.ORG
HTTP://WWW.FACULTY.UCR.EDU/~HANNEMAN/NETTEXT/ (SOCIAL NETWORK ANALYSIS)
HTTPS://WWW.KAGGLE.COM
EDUCATION – CHEAP AND AT WHATEVER PACE YOU WANT TO TAKE
HTTPS://WWW.UDEMY.COM
ACADEMIC EDUCATION
IN CANBERRA BOTH THE ANU AND CU HAVE GOOD COURSES
AND
USQ HAS EXCELLENT COURSES AS WELL – SO DO A LOT OF OTHERS
I WAS ASKED WHO TO FOLLOW ON TWITTER. JUST TRY SEARCHING FOR DATA SCIENCE AND ANALYTICS
AND CHOOSE WHO TO FOLLOW. ALSO FOLLOW THE JOURNALS, NATURE, AND OTHERS.

PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
 
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
 
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
 
Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
 
Senior Software Profiles Backend Sample - Sheet1.pdf
Senior Software Profiles  Backend Sample - Sheet1.pdfSenior Software Profiles  Backend Sample - Sheet1.pdf
Senior Software Profiles Backend Sample - Sheet1.pdf
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
 
一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理
 
Salesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - CanariasSalesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - Canarias
 

Analytics and Where it Fits - ACS DAMA SIG

  • 2. A LITTLE ABOUT YOURS TRULY MY NAME IS RUSSELL TIBBALLS. I HAVE BEEN WORKING WITH DATA, PROGRAMMING, AND PERFORMING VARIOUS LEVELS OF ANALYSIS FOR OVER 40 YEARS. I JOINED THE CUSTOMS STATISTICS TEAM AFTER LEAVING HIGH SCHOOL IN NOVEMBER 1974. I AM THE CANBERRA CHAIR OF IAPA. I ALSO CHAIR THE ADVISORY COMMITTEE TO THE SCIENCES AT USQ. I HAVE A MASTERS IN SOCIAL RESEARCH METHODS (ANU), FOCUSED ON SURVEY ANALYSIS, INTERNATIONAL MIGRATION, AND ANALYSIS OF WEB PRESENCE USING SOCIAL NETWORK ANALYSIS (SNA). ACS CERTIFIED PROFESSIONAL; TDWI CERTIFIED BUSINESS PROFESSIONAL – DATA ANALYSIS; SAS ADVANCED PROGRAMMER; ETC. MY MAJOR INTERESTS ARE: • MY FAMILY AND WHAT IS HAPPENING ON MY ACREAGE AND THE MOLONGLO RIVER (IT ADJOINS) • ANY APPLICATIONS OF ANALYTICS IN THE HARD AND SOFT SCIENCES. YES, I AM A TRAGIC WHO READS NATURE AND JSTOR PAPERS WHENEVER I CAN. • INTERNATIONAL MIGRATION
  • 3. THE IMPRESSION • THE IMPRESSION IS THAT ANYONE GIVEN ACCESS TO THE RIGHT INFORMATION CAN ANALYZE AND COME UP WITH A SOLUTION FOR ANY PROBLEM IN MOMENTS. • THE VIEW HAS BECOME INCREASINGLY PERVASIVE. SEE – ‘ARE WE COOL YET?: A LONGITUDINAL CONTENT ANALYSIS OF NERD AND GEEK REPRESENTATIONS IN POPULAR TELEVISION’ (2012 – CARDIEL C L) • HOLLYWOOD HAS MOVED FROM THE MAD SCIENTIST WHO CAN WHIP UP A WORLD-BEATING GADGET IN SECONDS (THINK DEXTER'S LAB), THROUGH THE NERDY FRIEND OF THE HERO AS HACKER (MARKY MARK IN DATE NIGHT), TO THE ANALYST WHO CAN LOG INTO THE INTERNET, FIND THE PETABYTES OF DATA YOU NEED TO ANALYZE, AND STOP THE END OF THE WORLD IN MOMENTS; OCCASIONALLY SECONDS.
  • 8. A FEW DEFINITIONS • FROM EVAN STUBBS “THE VALUE OF BUSINESS ANALYTICS” • ANALYTICS – ANY DATA-DRIVEN PROCESS THAT PROVIDES INSIGHT • COMMON FORMS ARE: • REPORTING – THE ORGANISATION OF HISTORICAL DATA • TREND ANALYSIS – THE IDENTIFICATION OF PATTERNS IN TIME SERIES DATA • SEGMENTATION – IDENTIFICATION OF SIMILARITIES WITHIN DATA • PREDICTIVE MODELING – PREDICTION OF FUTURE EVENTS USING HISTORICAL DATA
  • 9. CONTINUING FROM EVAN STUBBS • ALL APPLICATIONS OF ANALYTICS HAVE A NUMBER OF COMMON CHARACTERISTICS: • THEY ARE BASED ON DATA • THEY APPLY VARIOUS MATHEMATICAL TECHNIQUES TO TRANSFORM AND SUMMARIZE THE RAW DATA • THEY ADD VALUE TO THE ORIGINAL DATA AND TRANSFORM IT INTO “KNOWLEDGE” • ADVANCED ANALYTICS HOWEVER AIMS TO IDENTIFY: • WHY THINGS ARE HAPPENING • WHAT WILL HAPPEN NEXT • WHAT IS THE POSSIBLE COURSE OF ACTION • THE BUSINESS OUTCOME DRIVERS FOR THE USE OF “BUSINESS ANALYTICS” ARE: • BUSINESS RELEVANCY • ACTIONABLE INSIGHT • PERFORMANCE MEASUREMENT AND VALUE MEASUREMENT • KNOWLEDGE IS A FAMILIARITY, AWARENESS OR UNDERSTANDING OF SOMEONE OR SOMETHING, SUCH AS FACTS, INFORMATION, DESCRIPTIONS, OR SKILLS, WHICH IS ACQUIRED THROUGH EXPERIENCE OR EDUCATION BY PERCEIVING, DISCOVERING, OR LEARNING. • IN GOVERNMENT AN AGENCY’S RELEVANCY IS MEASURED IN TERMS OF POLICY ALIGNMENT
  • 10. THE PRIVATE SECTOR PERSPECTIVE • TO INCREASE THE EFFICIENCY OF DELIVERY AND VALUE TO THE CUSTOMER • DRIVERS BEING INCREASING MARKET SHARE AND PROFITS • AND MOST IMPORTANTLY TO DELIVER BENEFIT TO THE SHAREHOLDER • ANY PRIVATE OR PUBLIC ENTERPRISE HAS TWO MAIN DRIVERS: • THE WISHES OF ITS OWNERS, SHAREHOLDERS, OR GOVERNMENT. • THE ONGOING RELEVANCY OF THE ORGANIZATION
  • 11. SOME EXAMPLES OF ANALYTICS. SOME ARE INTERESTING!
  • 12. SOME NOT AS EXCITING BUT STILL INTERESTING
  • 13. THE HEDGEHOG AND THE FOX THE HEDGEHOG AND THE FOX IS AN ESSAY BY PHILOSOPHER ISAIAH BERLIN. IT WAS ONE OF BERLIN'S MOST POPULAR ESSAYS WITH THE GENERAL PUBLIC. BERLIN EXPANDS UPON THIS IDEA TO DIVIDE WRITERS AND THINKERS INTO TWO CATEGORIES: HEDGEHOGS, WHO VIEW THE WORLD THROUGH THE LENS OF A SINGLE DEFINING IDEA, AND FOXES WHO DRAW ON A WIDE VARIETY OF EXPERIENCES AND FOR WHOM THE WORLD CANNOT BE BOILED DOWN TO A SINGLE IDEA. IN HIS 2012 NEW YORK TIMES BEST-SELLING BOOK THE SIGNAL AND THE NOISE, FORECASTER NATE SILVER URGES READERS TO BE "MORE FOXY" AFTER SUMMARIZING BERLIN'S DISTINCTION.
  • 14. A BRIEF DETOUR ON THE VENDOR VIEW OF A DATA SCIENTIST/ANALYST (ADVANCED ANALYTICS)
  • 17. DATA ANALYTICS • ANALYTICS – ANY DATA-DRIVEN PROCESS THAT PROVIDES INSIGHT • COMMON FORMS ARE: • REPORTING – THE ORGANISATION OF HISTORICAL DATA • TREND ANALYSIS – THE IDENTIFICATION OF PATTERNS IN TIME SERIES DATA • SEGMENTATION – IDENTIFICATION OF SIMILARITIES WITHIN DATA • PREDICTIVE MODELING – PREDICTION OF FUTURE EVENTS USING HISTORICAL DATA
  • 18. REPORTING • VERB - MAKE A FORMAL STATEMENT OR COMPLAINT ABOUT (SOMEONE OR SOMETHING) TO THE NECESSARY AUTHORITY. • NOUN - AN ACCOUNT GIVEN OF A PARTICULAR MATTER, ESPECIALLY IN THE FORM OF AN OFFICIAL DOCUMENT, AFTER THOROUGH INVESTIGATION OR CONSIDERATION BY AN APPOINTED PERSON OR BODY. E.G., "THE CHAIRMAN'S ANNUAL REPORT". A SPOKEN OR WRITTEN DESCRIPTION OF AN EVENT OR SITUATION, ESPECIALLY ONE INTENDED FOR PUBLICATION OR BROADCASTING IN THE MEDIA.
  • 19. TREND ANALYSIS • TREND ANALYSIS IS THE PRACTICE OF COLLECTING INFORMATION AND ATTEMPTING TO SPOT A PATTERN, OR TREND, IN THE INFORMATION. IN SOME FIELDS OF STUDY, THE TERM "TREND ANALYSIS" HAS MORE FORMALLY DEFINED MEANINGS.[1][2][3] • ALTHOUGH TREND ANALYSIS IS OFTEN USED TO PREDICT FUTURE EVENTS, IT COULD BE USED TO ESTIMATE UNCERTAIN EVENTS IN THE PAST, SUCH AS HOW MANY ANCIENT KINGS PROBABLY RULED BETWEEN TWO DATES, BASED ON DATA SUCH AS THE AVERAGE YEARS WHICH OTHER KNOWN KINGS REIGNED. • IN STATISTICS, TREND ANALYSIS OFTEN REFERS TO TECHNIQUES FOR EXTRACTING AN UNDERLYING PATTERN OF BEHAVIOR IN A TIME SERIES WHICH WOULD OTHERWISE BE PARTLY OR NEARLY COMPLETELY HIDDEN BY NOISE. A SIMPLE DESCRIPTION OF THESE TECHNIQUES IS TREND ESTIMATION, WHICH CAN BE UNDERTAKEN WITHIN A FORMAL REGRESSION ANALYSIS.
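As a rough illustration of pulling an underlying pattern out of a noisy time series, the sketch below applies a simple moving average. The moving average is just one smoothing technique chosen for brevity (the slide itself mentions regression-based trend estimation), and the series values and the `moving_average` helper are hypothetical.

```python
def moving_average(values, window):
    """Return the mean of each consecutive `window`-sized slice of `values`."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical noisy observations (illustrative only, not from the slides).
series = [10, 14, 11, 15, 13, 17, 14, 18]

# The raw series zig-zags; the smoothed series rises steadily.
smoothed = moving_average(series, window=4)
print(smoothed)
```

A wider window hides more noise but also reacts more slowly to genuine changes, so the window size is a judgement call for each dataset.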
  • 20. SEGMENTATION • MARKET SEGMENTATION IS A MARKETING STRATEGY WHICH INVOLVES DIVIDING A BROAD TARGET MARKET INTO SUBSETS OF CONSUMERS, BUSINESSES, OR COUNTRIES THAT HAVE, OR ARE PERCEIVED TO HAVE, COMMON NEEDS, INTERESTS, AND PRIORITIES, AND THEN DESIGNING AND IMPLEMENTING STRATEGIES TO TARGET THEM. MARKET SEGMENTATION STRATEGIES ARE GENERALLY USED TO IDENTIFY AND FURTHER DEFINE THE TARGET CUSTOMERS, AND PROVIDE SUPPORTING DATA FOR MARKETING PLAN ELEMENTS SUCH AS POSITIONING TO ACHIEVE CERTAIN MARKETING PLAN OBJECTIVES. BUSINESSES MAY DEVELOP PRODUCT DIFFERENTIATION STRATEGIES, OR AN UNDIFFERENTIATED APPROACH, INVOLVING SPECIFIC PRODUCTS OR PRODUCT LINES DEPENDING ON THE SPECIFIC DEMAND AND ATTRIBUTES OF THE TARGET SEGMENT.
  • 21. PREDICTIVE MODELLING • PREDICTIVE MODELING USES STATISTICS TO PREDICT OUTCOMES.[1] MOST OFTEN THE EVENT ONE WANTS TO PREDICT IS IN THE FUTURE, BUT PREDICTIVE MODELLING CAN BE APPLIED TO ANY TYPE OF UNKNOWN EVENT, REGARDLESS OF WHEN IT OCCURRED. FOR EXAMPLE, PREDICTIVE MODELS ARE OFTEN USED TO DETECT CRIMES AND IDENTIFY SUSPECTS, AFTER THE CRIME HAS TAKEN PLACE.[2] • IN MANY CASES THE MODEL IS CHOSEN ON THE BASIS OF DETECTION THEORY TO TRY TO GUESS THE PROBABILITY OF AN OUTCOME GIVEN A SET AMOUNT OF INPUT DATA, FOR EXAMPLE GIVEN AN EMAIL DETERMINING HOW LIKELY THAT IT IS SPAM. • MODELS CAN USE ONE OR MORE CLASSIFIERS IN TRYING TO DETERMINE THE PROBABILITY OF A SET OF DATA BELONGING TO ANOTHER SET, SAY SPAM OR 'HAM'. • DEPENDING ON DEFINITIONAL BOUNDARIES, PREDICTIVE MODELLING IS SYNONYMOUS WITH, OR LARGELY OVERLAPPING WITH, THE FIELD OF MACHINE LEARNING, AS IT IS MORE COMMONLY REFERRED TO IN ACADEMIC OR RESEARCH AND DEVELOPMENT CONTEXTS. WHEN DEPLOYED COMMERCIALLY, PREDICTIVE MODELLING IS OFTEN REFERRED TO AS PREDICTIVE ANALYTICS.
  • 22. WHERE DOES ANALYTICS FIT? ANYWHERE YOU NEED TO MAKE A DECISION! TODAY, STATISTICAL METHODS ARE APPLIED IN ALL FIELDS THAT INVOLVE DECISION MAKING, FOR MAKING ACCURATE INFERENCES FROM A COLLATED BODY OF DATA AND FOR MAKING DECISIONS IN THE FACE OF UNCERTAINTY BASED ON STATISTICAL METHODOLOGY.
  • 23. ANALYTICS HAS MANY LEVELS OF COMPLEXITY: ESTIMATION. THE USE OF STATISTICAL METHODS DATES BACK AT LEAST TO THE 5TH CENTURY BCE. THE HISTORIAN THUCYDIDES IN HIS HISTORY OF THE PELOPONNESIAN WAR [2] DESCRIBES HOW THE ATHENIANS CALCULATED THE HEIGHT OF THE WALL OF PLATAEA BY COUNTING THE NUMBER OF BRICKS IN AN UNPLASTERED SECTION OF THE WALL SUFFICIENTLY NEAR THEM TO BE ABLE TO COUNT THEM. THE COUNT WAS REPEATED SEVERAL TIMES BY A NUMBER OF SOLDIERS. THE MOST FREQUENT VALUE (IN MODERN TERMINOLOGY, THE MODE) SO DETERMINED WAS TAKEN TO BE THE MOST LIKELY VALUE OF THE NUMBER OF BRICKS. MULTIPLYING THIS VALUE BY THE HEIGHT OF THE BRICKS USED IN THE WALL ALLOWED THE ATHENIANS TO DETERMINE THE HEIGHT OF THE LADDERS NECESSARY TO SCALE THE WALLS. SEE HOW TO MEASURE ANYTHING: FINDING THE VALUE OF INTANGIBLES BY DOUGLAS W. HUBBARD
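The estimation procedure described above (repeat the count, take the mode, multiply by brick height) can be sketched in a few lines. The brick counts and the brick height below are invented purely for illustration; the slide gives no actual figures.

```python
from statistics import mode

# Hypothetical brick counts reported by different soldiers (illustrative only).
brick_counts = [48, 50, 49, 50, 51, 50, 49, 50]

# Hypothetical height of one brick course in metres (not given in the source).
BRICK_HEIGHT_M = 0.15

# The most frequent count is taken as the most likely true count.
most_frequent_count = mode(brick_counts)

# Wall height, and hence the minimum ladder length, follows by multiplication.
wall_height_m = most_frequent_count * BRICK_HEIGHT_M
print(most_frequent_count, round(wall_height_m, 2))
```

Using the mode rather than the mean means a single soldier who badly miscounts has no effect on the estimate.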
  • 24. THE CENSUS THE BIBLICAL STORY OF THE BIRTH OF JESUS WAS SET IN THE CONTEXT OF THE CENSUS. IN 6 CE PUBLIUS SULPICIUS QUIRINIUS (51 BCE-21 CE), A DISTINGUISHED SOLDIER AND FORMER CONSUL, WAS APPOINTED IMPERIAL LEGATE (GOVERNOR) OF THE PROVINCE OF ROMAN SYRIA. IN THE SAME YEAR JUDEA WAS DECLARED A ROMAN PROVINCE, AND QUIRINIUS WAS TASKED TO CARRY OUT A CENSUS OF THE NEW TERRITORY FOR TAX PURPOSES. ‘IN THOSE DAYS A DECREE WENT OUT FROM EMPEROR AUGUSTUS THAT ALL THE WORLD SHOULD BE REGISTERED. THIS WAS THE FIRST REGISTRATION AND WAS TAKEN WHILE QUIRINIUS WAS GOVERNOR OF SYRIA. ALL WENT TO THEIR OWN TOWNS TO BE REGISTERED. JOSEPH ALSO WENT FROM THE TOWN OF NAZARETH IN GALILEE TO JUDEA, TO THE CITY OF DAVID CALLED BETHLEHEM, BECAUSE HE WAS DESCENDED FROM THE HOUSE AND FAMILY OF DAVID. HE WENT TO BE REGISTERED WITH MARY, TO WHOM HE WAS ENGAGED AND WHO WAS EXPECTING A CHILD.’ (LUKE 2:1–7)
  • 25. SAMPLING THE TRIAL OF THE PYX IS A TEST OF THE PURITY OF THE COINAGE OF THE ROYAL MINT WHICH HAS BEEN HELD ON A REGULAR BASIS SINCE THE 12TH CENTURY. THE TRIAL ITSELF IS BASED ON STATISTICAL SAMPLING METHODS. AFTER MINTING A SERIES OF COINS - ORIGINALLY FROM TEN POUNDS OF SILVER - A SINGLE COIN WAS PLACED IN THE PYX - A BOX IN WESTMINSTER ABBEY. AFTER A GIVEN PERIOD - NOW ONCE A YEAR - THE COINS ARE REMOVED AND WEIGHED. A SAMPLE OF COINS REMOVED FROM THE BOX IS THEN TESTED FOR PURITY.
  • 26. THE MEAN AND MEDIAN • THE ARITHMETIC MEAN, ALTHOUGH A CONCEPT KNOWN TO THE GREEKS, WAS NOT GENERALIZED TO MORE THAN TWO VALUES UNTIL THE 16TH CENTURY. THE INVENTION OF THE DECIMAL SYSTEM BY SIMON STEVIN IN 1585 SEEMS LIKELY TO HAVE FACILITATED THESE CALCULATIONS. THIS METHOD WAS FIRST ADOPTED IN ASTRONOMY BY TYCHO BRAHE WHO WAS ATTEMPTING TO REDUCE THE ERRORS IN HIS ESTIMATES OF THE LOCATIONS OF VARIOUS CELESTIAL BODIES. • THE IDEA OF THE MEDIAN ORIGINATED IN EDWARD WRIGHT'S BOOK ON NAVIGATION (CERTAINE ERRORS IN NAVIGATION) IN 1599 IN A SECTION CONCERNING THE DETERMINATION OF LOCATION WITH A COMPASS. WRIGHT FELT THAT THIS VALUE WAS THE MOST LIKELY TO BE THE CORRECT VALUE IN A SERIES OF OBSERVATIONS.
  • 27. DEMOGRAPHY: GAIN UNDERSTANDING OF COMPLEX SOCIAL PHENOMENA. THE BIRTH OF STATISTICS IS OFTEN DATED TO 1662, WHEN JOHN GRAUNT, ALONG WITH WILLIAM PETTY, DEVELOPED EARLY HUMAN STATISTICAL AND CENSUS METHODS THAT PROVIDED A FRAMEWORK FOR MODERN DEMOGRAPHY. HE PRODUCED THE FIRST LIFE TABLE, GIVING PROBABILITIES OF SURVIVAL TO EACH AGE. HIS BOOK NATURAL AND POLITICAL OBSERVATIONS MADE UPON THE BILLS OF MORTALITY USED ANALYSIS OF THE MORTALITY ROLLS TO MAKE THE FIRST STATISTICALLY BASED ESTIMATION OF THE POPULATION OF LONDON. HE KNEW THAT THERE WERE AROUND 13,000 FUNERALS PER YEAR IN LONDON AND THAT THREE PEOPLE DIED PER ELEVEN FAMILIES PER YEAR. HE ESTIMATED FROM THE PARISH RECORDS THAT THE AVERAGE FAMILY SIZE WAS 8 AND CALCULATED THAT THE POPULATION OF LONDON WAS ABOUT 384,000. IN 1802 LAPLACE ESTIMATED THE POPULATION OF FRANCE TO BE 28,328,612.[11] HE CALCULATED THIS FIGURE USING THE NUMBER OF BIRTHS IN THE PREVIOUS YEAR AND CENSUS DATA FOR THREE COMMUNITIES. THE CENSUS DATA OF THESE COMMUNITIES SHOWED THAT THEY HAD 2,037,615 PERSONS AND THAT THE NUMBER OF BIRTHS WAS 71,866. ASSUMING THAT THESE SAMPLES WERE REPRESENTATIVE OF FRANCE, LAPLACE PRODUCED HIS ESTIMATE FOR THE ENTIRE POPULATION.
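The arithmetic behind both estimates can be retraced from the figures quoted on the slide. This is only a sketch of the reasoning, not a reconstruction of either author's working: the variable names and the `laplace_estimate` helper are hypothetical, and the national birth count Laplace used is not given here, so it is left as a parameter.

```python
# Graunt's London estimate, using only the figures quoted on the slide.
funerals_per_year = 13_000
deaths_per_11_families = 3
average_family_size = 8

# 3 deaths per 11 families per year implies 11/3 families per funeral.
families = funerals_per_year * 11 / deaths_per_11_families
london_population = families * average_family_size
# Roughly 381,000; rounding the families up to 48,000 gives exactly 384,000.

# Laplace's ratio: persons per birth in the sampled communities.
sample_persons = 2_037_615
sample_births = 71_866
persons_per_birth = sample_persons / sample_births


def laplace_estimate(national_births):
    """Scale a national birth count by the sample ratio (hypothetical helper)."""
    return national_births * persons_per_birth


print(round(families), round(london_population), round(persons_per_birth, 2))
```

Both estimates rest on the same assumption: that a ratio observed in a sample (deaths per family, persons per birth) holds for the whole population.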
  • 28. PREDICT ORBIT OF PLANETS THE METHOD OF LEAST SQUARES, WHICH WAS USED TO MINIMIZE ERRORS IN DATA MEASUREMENT, WAS PUBLISHED INDEPENDENTLY BY ADRIEN-MARIE LEGENDRE (1805), ROBERT ADRAIN (1808), AND CARL FRIEDRICH GAUSS (1809). GAUSS HAD USED THE METHOD IN HIS FAMOUS 1801 PREDICTION OF THE LOCATION OF THE DWARF PLANET CERES. THE OBSERVATIONS THAT GAUSS BASED HIS CALCULATIONS ON WERE MADE BY THE ITALIAN MONK PIAZZI. A DETAILED ACCOUNT OF THE METHOD USED CAN BE FOUND AT HTTP://SCIENCE.LAROUCHEPAC.COM/GAUSS/CERES/INTERIMII/ASTRONOMY/KEPLERPROBLEM.HTML
  • 29. WISDOM OF CROWDS FRANCIS GALTON IS CREDITED AS ONE OF THE PRINCIPAL FOUNDERS OF STATISTICAL THEORY. HIS CONTRIBUTIONS TO THE FIELD INCLUDED INTRODUCING THE CONCEPTS OF STANDARD DEVIATION, CORRELATION, REGRESSION AND THE APPLICATION OF THESE METHODS TO THE STUDY OF THE VARIETY OF HUMAN CHARACTERISTICS - HEIGHT, WEIGHT, EYELASH LENGTH AMONG OTHERS. HE FOUND THAT MANY OF THESE COULD BE FITTED TO A NORMAL CURVE DISTRIBUTION.[19] GALTON SUBMITTED A PAPER TO NATURE IN 1907 ON THE USEFULNESS OF THE MEDIAN.[20] HE EXAMINED THE ACCURACY OF 787 GUESSES OF THE WEIGHT OF AN OX AT A COUNTRY FAIR. THE ACTUAL WEIGHT WAS 1208 POUNDS: THE MEDIAN GUESS WAS 1198. THE GUESSES WERE MARKEDLY NON-NORMALLY DISTRIBUTED.
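A short sketch of why the median served Galton well. The first part uses only the two figures quoted on the slide; the list of guesses is hypothetical (it is not Galton's data) and is included to show how a skewed, non-normal set of guesses pulls the mean but not the median.

```python
from statistics import mean, median

# Figures quoted on the slide.
actual_weight_lb = 1208
median_guess_lb = 1198
relative_error = abs(median_guess_lb - actual_weight_lb) / actual_weight_lb

# Hypothetical skewed guesses (illustrative only): one wild over-estimate.
guesses = [900, 1100, 1150, 1200, 1210, 1250, 2500]

print(f"median error: {relative_error:.2%}")
print("median of guesses:", median(guesses))
print("mean of guesses:", mean(guesses))
```

In the hypothetical list the single extreme guess drags the mean well above the bulk of the guesses, while the median stays in the middle of them.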
  • 30. AGRICULTURE THE SECOND WAVE OF MATHEMATICAL STATISTICS WAS PIONEERED BY RONALD FISHER WHO WROTE TWO TEXTBOOKS, STATISTICAL METHODS FOR RESEARCH WORKERS, PUBLISHED IN 1925 AND THE DESIGN OF EXPERIMENTS IN 1935, THAT WERE TO DEFINE THE ACADEMIC DISCIPLINE IN UNIVERSITIES AROUND THE WORLD. HE ALSO SYSTEMATIZED PREVIOUS RESULTS, PUTTING THEM ON A FIRM MATHEMATICAL FOOTING. HIS SEMINAL 1918 PAPER THE CORRELATION BETWEEN RELATIVES ON THE SUPPOSITION OF MENDELIAN INHERITANCE WAS THE FIRST TO USE THE STATISTICAL TERM VARIANCE. IN 1919, AT ROTHAMSTED EXPERIMENTAL STATION HE STARTED A MAJOR STUDY OF THE EXTENSIVE COLLECTIONS OF DATA RECORDED OVER MANY YEARS. THIS RESULTED IN A SERIES OF REPORTS UNDER THE GENERAL TITLE STUDIES IN CROP VARIATION. IN 1930 HE PUBLISHED THE GENETICAL THEORY OF NATURAL SELECTION WHERE HE APPLIED STATISTICS TO EVOLUTION.
  • 31. MEDICINE, RELIABILITY, AND JURISPRUDENCE • THE TERM BAYESIAN REFERS TO THOMAS BAYES (1702–1761), WHO PROVED A SPECIAL CASE OF WHAT IS NOW CALLED BAYES' THEOREM. HOWEVER IT WAS PIERRE-SIMON LAPLACE (1749–1827) WHO INTRODUCED A GENERAL VERSION OF THE THEOREM AND APPLIED IT TO CELESTIAL MECHANICS, MEDICAL STATISTICS, RELIABILITY, AND JURISPRUDENCE.[52] • AN INTERESTING READ - HTTP://BLOGS.SCIENTIFICAMERICAN.COM/CROSS-CHECK/ARE-BRAINS-BAYESIAN/
  • 32. A QUICK HISTORY
    Time | Contributor | Contribution
    Ancient Greece | Philosophers | Ideas - no quantitative analyses
    17th Century | Graunt, Petty | studied affairs of state, vital statistics of populations
    17th Century | Pascal, Bernoulli | studied probability through games of chance, gambling
    18th Century | Laplace, Gauss | normal curve, regression through study of astronomy
    19th Century | Quetelet | astronomer who first applied statistical analyses to human biology
    19th Century | Galton | studied genetic variation in humans (used regression and correlation)
    20th Century (early) | Pearson | studied natural selection using correlation, formed first academic department of statistics, Biometrika journal, helped develop the Chi Square analysis
    20th Century (early) | Gossett (Student) | studied process of brewing, alerted the statistics community about problems with small sample sizes, developed Student's test
    20th Century (early) | Fisher | evolutionary biologist - developed ANOVA, stressed the importance of experimental design
    20th Century (later) | Wilcoxon | biochemist who studied pesticides, non-parametric equivalent of two-samples test
    20th Century (later) | Kruskal, Wallis | economists who developed the non-parametric equivalent of the ANOVA
    20th Century (later) | Spearman | psychologist who developed a non-parametric equivalent of the correlation coefficient
    20th Century (later) | Kendall | statistician who developed another non-parametric equivalent of the correlation coefficient
    20th Century (later) | Tukey | statistician who developed multiple comparisons procedure
    20th Century (later) | Dunnett | biochemist who studied pesticides, developed multiple comparisons procedure for control groups
    20th Century (later) | Keuls | agronomist who developed multiple comparisons procedure
    20th Century (later) | Computer Technology | provided many advantages over calculations by hand or by calculator, stimulated the growth of investigation into new techniques
    Source: http://www.anselm.edu/homepage/jpitocch/biostatstime.html
  • 33. SO HOW DO YOU DECIDE WHICH ANALYTIC TOOLS TO USE? WELL, ACTUALLY THAT IS THE WRONG QUESTION. PROCESS IS MUCH MORE IMPORTANT THAN THE TOOLS. THE TOOL/S SHOULD SUPPORT THE PROCESS
  • 34. TO GAIN BUSINESS UNDERSTANDING/SCOPE AND PLANNING/ PLAN THERE IS NO TOOL FOR THIS: YOU NEED TO RESEARCH: • UNDERSTAND THE CONTEXT OF YOUR INVESTIGATION • UNDERSTAND WHAT IS IMPORTANT TO THE BUSINESS/AGENCY/ORG. • WHAT HAS GONE BEFORE • WHAT MIGHT BE DONE DIFFERENTLY • WAS THE INFORMATION YOU HAD ACCESS TO VALID INPUT?
  • 37. DATA UNDERSTANDING/DISCOVERY THERE ARE SEVERAL TOOLS THAT CAN HELP YOU HERE: MOST SITES THESE DAYS HAVE REPORTS AND BUSINESS INTELLIGENCE DASHBOARDS THAT WILL GIVE YOU AN INSIGHT INTO HOW A BUSINESS/AGENCY/ORG SEES ITSELF. GAIN AS MUCH INSIGHT AS YOU CAN FROM THESE EXISTING PRODUCTS. DON’T ACCEPT THAT THEY ARE THE FULL STORY – THEY ARE NOT. EXCEL: USE PIVOT TABLES AND CHARTING TO GAIN A BASIC UNDERSTANDING. OTHER COMMON TOOLS ARE: • SAS/VA • TABLEAU • QLIK • SPSS • SQL • STATISTICA • MATLAB • ETC
  • 38. MODELLING USE THE APPROPRIATE TOOL FOR YOUR INVESTIGATION. • USE THE APPROPRIATE DATA • USE AN APPROPRIATE METHOD • ITERATE AND CHECK THAT YOUR RESULTS MAKE SENSE IN THE CONTEXT OF THE COLLECTION, AND THE QUESTION YOU ARE LOOKING TO ANSWER • SAYING – TO A CARPENTER THE SOLUTION TO EVERYTHING LOOKS LIKE A NAIL. • ALL ANALYSTS HAVE THEIR BENT TOWARDS PARTICULAR TOOLS – MY BENT IS TOWARD THE MODELING TECHNIQUES USED IN THE SOCIAL SCIENCES BECAUSE THAT IS WHAT I STUDIED. BE AWARE OF THE LIMITS OF YOUR FAVOURITE TOOLS AND BE WILLING TO LEARN NEW TRICKS.
  • 39. EVALUATING/CHECK/VALIDATION HAVE A PROCESS AND STANDARDS FOR YOUR ENVIRONMENT THAT LAY OUT THE RULES FOR EVALUATING YOUR MODEL. THE STANDARD WILL DEPEND ON THE TOOLS THAT YOU USE. MOST TOOLS, SUCH AS CORRELATION, ANOVA, REGRESSION, ETC., HAVE WELL UNDERSTOOD METHODS OF EVALUATION. HOWEVER, CHECK THE WHOLE PROCESS, AND IF YOU WANT TO USE THIS MODEL HAVE YOUR TEAM REVIEW IT AS WELL. WE ALL LIKE TO THINK WE NEVER MAKE MISTAKES; UNFORTUNATELY THAT IS NEVER TRUE. MAKE SURE THAT THE PROCESS INCLUDES SOME SANITY CHECK METHODS, E.G. THAT THE NUMBER OF ROWS/OBSERVATIONS THAT WERE READ IS WHAT YOU EXPECTED.
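The row-count sanity check mentioned above might look something like the sketch below. The `check_row_count` helper, its `tolerance` parameter, and the inline sample data are all hypothetical; a real process would read from the actual source and take the expected count from a control total or the upstream system.

```python
import csv
import io


def check_row_count(rows, expected, tolerance=0):
    """Raise ValueError if the number of rows read is not what was expected."""
    actual = len(rows)
    if abs(actual - expected) > tolerance:
        raise ValueError(f"expected {expected} rows, read {actual}")
    return actual


# Hypothetical extract standing in for a real source file.
raw = "id,amount\n1,10.5\n2,7.0\n3,12.25\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Passes silently when the count matches; fails loudly when it does not.
rows_read = check_row_count(rows, expected=3)
print(rows_read)
```

Failing with an exception, rather than logging a warning, stops a short or duplicated extract from flowing quietly into the model.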
  • 40. DEPLOYMENT/ACT AFTER EVALUATING AND VALIDATING YOUR MODEL IT IS OFTEN ‘DEPLOYED’ TO OPERATIONAL SYSTEMS AND REPORTS. SCORING: OFTEN THE OUTPUT OF THE MODEL WILL BE A SCORE THAT IS USED AS INPUT TO OPERATIONAL SYSTEMS, E.G. ESTIMATES OF FINANCIAL RISK, TRAVEL TIME, FUEL CONSUMPTION, RESOURCE REQUIREMENTS, AND MEDICAL OUTCOMES. PARAMETER TO REPORTING: INTEGRATION INTO BUSINESS INTELLIGENCE DASHBOARDS, AND REGULAR MANAGEMENT INFORMATION SYSTEM REPORTS. OPERATIONAL SYSTEMS: MODELS PROVIDE INPUT TO ALL MANNER OF OPERATIONAL SYSTEMS RANGING FROM PRODUCTION CONTROL PROCESSING, LOGISTICS, FRAUD DETECTION, SYSTEMS MANAGEMENT, AND TRAFFIC CONTROL.
  • 41. BUSINESS UNDERSTANDING/REPORT ALL ANALYTICS IS UNDERTAKEN WITHIN A GIVEN CONTEXT. IN A RESEARCH CONTEXT A PAPER WILL BE THE OUTCOME WITH AN ABSTRACT, BACKGROUND, METHODS, RESULTS, CONCLUSION. IN A COMMERCIAL SETTING ANY FINDINGS (IN MY LIMITED EXPERIENCE) ARE REPORTED IN A VERY SIMILAR MANNER. REGARDLESS, THE OUTCOMES OF THE ANALYTICS PROCESS SHOULD BE DOCUMENTED AND ADDED TO THE COLLECTIVE STORE OF BUSINESS KNOWLEDGE AT YOUR SITE.
  • 42. PS DON’T OVERCOMPLICATE THINGS
  • 43. MONITOR/REVIEW/REPEAT KNOWLEDGE IS NOT STATIC. THERE ARE THE THINGS YOU KNOW ARE GOING TO HAPPEN AND THERE ARE NEW FACTORS THAT YOU WILL NOT HAVE THOUGHT OF. GEORGE BOX: ‘FOR SUCH A MODEL THERE IS NO NEED TO ASK THE QUESTION "IS THE MODEL TRUE?". IF "TRUTH" IS TO BE THE "WHOLE TRUTH" THE ANSWER MUST BE "NO". THE ONLY QUESTION OF INTEREST IS "IS THE MODEL ILLUMINATING AND USEFUL?"’ IN SHORT: ‘ALL MODELS ARE WRONG, SOME ARE USEFUL’. MONITOR: COMPARE THE ACTUAL PERFORMANCE OF THE MODELS YOU PRODUCE AGAINST EXPECTED/PLANNED PERFORMANCE. BE PREPARED TO PROCEED WITH A PROCESS OF CONTINUAL IMPROVEMENT.
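One way the monitoring step could be sketched: compute the error a deployed model is actually making and compare it with the error that was planned for. The mean-absolute-error metric, the `needs_review` helper, the 25% slack factor, and the sample values are all assumptions made for illustration; the slide does not prescribe a metric or threshold.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def needs_review(actual, predicted, expected_mae, slack=1.25):
    """Flag the model when observed error exceeds planned error by the slack factor."""
    observed = mean_absolute_error(actual, predicted)
    return observed > expected_mae * slack, observed


# Hypothetical outcomes and model predictions (illustrative only).
actual = [100, 120, 90, 110]
predicted = [98, 125, 80, 118]

flag, observed_mae = needs_review(actual, predicted, expected_mae=4.0)
print(flag, observed_mae)
```

Running a check like this on a schedule turns "monitor" into a concrete trigger for the review and repeat steps.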
  • 44. CONCLUSION WHAT IS ANALYTICS? ANY DATA-DRIVEN PROCESS THAT PROVIDES INSIGHT. WHERE DOES IT (ANALYTICS) FIT? ANYWHERE YOU NEED TO MAKE A DECISION
  • 45. A FEW ANALYTICS TOOLS • THE 40 DATA SCIENCE TECHNIQUES 1 LINEAR REGRESSION 2 LOGISTIC REGRESSION 3 JACKKNIFE REGRESSION * 4 DENSITY ESTIMATION 5 CONFIDENCE INTERVAL 6 TEST OF HYPOTHESES 7 PATTERN RECOGNITION 8 CLUSTERING - (AKA UNSUPERVISED LEARNING) 9 SUPERVISED LEARNING 10 TIME SERIES 11 DECISION TREES 12 RANDOM NUMBERS 13 MONTE-CARLO SIMULATION 14 BAYESIAN STATISTICS 15 NAIVE BAYES 16 PRINCIPAL COMPONENT ANALYSIS - (PCA) 17 ENSEMBLES 18 NEURAL NETWORKS 19 SUPPORT VECTOR MACHINE - (SVM) 20 NEAREST NEIGHBORS - (K-NN) 21 FEATURE SELECTION - (AKA VARIABLE REDUCTION) 22 INDEXATION / CATALOGUING * 23 (GEO-) SPATIAL MODELLING 24 RECOMMENDATION ENGINE * 25 SEARCH ENGINE * 26 ATTRIBUTION MODELLING * 27 COLLABORATIVE FILTERING * 28 RULE SYSTEM 29 LINKAGE ANALYSIS 30 ASSOCIATION RULES 31 SCORING ENGINE 32 SEGMENTATION 33 PREDICTIVE MODELLING 34 GRAPHS 35 DEEP LEARNING 36 GAME THEORY 37 IMPUTATION 38 SURVIVAL ANALYSIS 39 ARBITRAGE 40 LIFT MODELLING 41 YIELD OPTIMIZATION 42 CROSS-VALIDATION 43 MODEL FITTING
  • 46. SOME THINGS TO CHECK OUT INFORMATIVE HTTP://WWW.KDNUGGETS.COM HTTP://WWW.PREDICTIVEANALYTICSTODAY.COM/DEPLOYMENT-PREDICTIVE-MODELS BOOKS HTTP://SHOP.OREILLY.COM/CATEGORY/EBOOKS.DO COOL HTTPS://RAPIDMINER.COM XPATH CAPABILITIES FOR WEB SCRAPING USING GOOGLE DOCS HTTP://NODEXL.CODEPLEX.COM HTTPS://D3JS.ORG HTTP://WWW.FACULTY.UCR.EDU/~HANNEMAN/NETTEXT/ (SOCIAL NETWORK ANALYSIS) HTTPS://WWW.KAGGLE.COM EDUCATION – CHEAP AND AT WHATEVER PACE YOU WANT TO TAKE HTTPS://WWW.UDEMY.COM ACADEMIC EDUCATION IN CANBERRA: BOTH THE ANU AND CU HAVE GOOD COURSES AND USQ HAS EXCELLENT COURSES AS WELL – SO DO A LOT OF OTHERS. I WAS ASKED WHO TO FOLLOW ON TWITTER: JUST SEARCH FOR DATA SCIENCE AND ANALYTICS AND CHOOSE WHO TO FOLLOW. ALSO FOLLOW THE JOURNALS, NATURE, AND OTHERS.

Editor's Notes

  1. Hackers and analysts have a lot in common: they are both curious and looking for patterns. The hacker can be lawful or unlawful. The unlawful aspects draw more interest. Not sure quite where WikiLeaks sits in that. Use your own judgement.
  2. Moneyball. You know analytics is mainstream when Brad Pitt plays the part of an analyst.
  3. Analysts find patterns. PS D3 is cool.
  4. Knowledge increases. Our understanding of everything changes over time. Therefore it is imperative for the good analyst to show how they came to the “knowledge” outcome so that the knowledge can be improved over time.
  5. This is one level of analysis. So far I think the world was going to end in 2012, 2015, in May this year, then June, and August.
  6. If they are talking about Planet X. Well its orbit will pass well beyond Pluto so no need for a bunker just yet. The closest thing is Asteroid 2016 HO3. http://neo.jpl.nasa.gov/news/news192.html
  7. The Hedgehog and the Fox is an essay by philosopher Isaiah Berlin. It was one of Berlin's most popular essays with the general public. Berlin himself said of the essay: "I never meant it very seriously. I meant it as a kind of enjoyable intellectual game, but it was taken seriously. Every classification throws light on something."[1]   Berlin expands upon this idea to divide writers and thinkers into two categories: hedgehogs, who view the world through the lens of a single defining idea (examples given include Plato, Lucretius, Dante, Pascal, Hegel, Dostoevsky, Nietzsche, Ibsen, Proust, and Fernand Braudel) and foxes who draw on a wide variety of experiences and for whom the world cannot be boiled down to a single idea (examples given include Herodotus, Aristotle, Erasmus, Shakespeare, Montaigne, Molière, Goethe, Pushkin, Balzac, Joyce, Anderson).     In his 2012 New York Times best-selling book The Signal and the Noise, forecaster Nate Silver urges readers to be "more foxy" after summarizing Berlin's distinction. He cites the work of Philip Tetlock on the accuracy of political forecasts in the United States during the Cold War while he was a professor of political science at the University of California, Berkeley. Silver's news website fivethirtyeight.com, when launched in March 2014, also adopted the fox as its logo "as an allusion to" Archilochus' original work.[7]
  8. The mythical Data Scientist is considered a blend of many talents: Math, Stats, Algorithms; Software Engineering; and Data Communications. The term Data Analyst is the blending of Data Communications (communicating results – in simpler terms, research) and Math, Stats, etc. I am qualified, industry certified and experienced in all these skills: Math, Stats, Algorithms, Research (Data Communications). Still, I would not call myself a Data Scientist.
  9. This is what a Data Analyst job specification will look like.
  10. The bulk of what I have been asked to do has been reporting; some segmentation, a little bit of trend analysis, and even less predictive modelling.
  11. In Canberra this is the usual domain of the Data Analyst. This is the primary domain for the Agency reporting, the ABS and AIHW
  12. Trend analysis is generally an outcome of time series analysis.
  13. Segmentation is usually a term used in association with marketing; however, the same techniques can be used to segment population groups that are targeted by social policy.
  14. Detection modelling: signal detection theory is a means to quantify the ability to discern between information-bearing patterns (called stimulus in living organisms, signal in machines) and random patterns that distract from the information (called noise, consisting of background stimuli and random activity of the detection machine and of the nervous system of the operator). In the field of electronics, the separation of such patterns from a disguising background is referred to as signal recovery.[1] Machine learning is a subfield of computer science[1] (more particularly soft computing) that evolved from the study of pattern recognition and computational learning theory in artificial intelligence.[1] In 1959, Arthur Samuel defined machine learning as a "Field of study that gives computers the ability to learn without being explicitly programmed".[2] Machine learning explores the study and construction of algorithms that can learn from and make predictions on data.[3] Such algorithms operate by building a model from an example training set of input observations in order to make data-driven predictions or decisions expressed as outputs,[4]:2 rather than following strictly static program instructions. Predictive analytics encompasses a variety of statistical techniques from predictive modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future or otherwise unknown events.[1][2] Predictive analytics is used in actuarial science,[4] marketing,[5] financial services,[6] insurance, telecommunications,[7] retail,[8] travel,[9] healthcare,[10] child protection,[11][12] pharmaceuticals,[13] capacity planning and other fields. One of the most well known applications is credit scoring,[1] which is used throughout financial services.
Scoring models process a customer's credit history, loan application, customer data, etc., in order to rank-order individuals by their likelihood of making future credit payments on time.
  15. .
  16. The simplest measure used in analytics is a count. Why do you need to count? To know who to tax and how much tax you should expect (Caesar, William the Conqueror, and the Incas). These days counts are used in many different ways: planning for education, health, transport, and financial expectations.
  17. Mean is the average: one of the most useful and misused measures of all. Median is the midpoint. Often in Australia we talk about the median house price of, say, $500K in Canberra. This median price means half the houses were less expensive and half were more. The median is a very powerful tool.
  18. Adolphe Quetelet (1796–1874), another important founder of statistics, introduced the notion of the "average man" (l'homme moyen) as a means of understanding complex social phenomena such as crime rates, marriage rates, and suicide rates.[14]
  19. A detailed account of the method is at http://science.larouchepac.com/gauss/ceres/InterimII/Astronomy/KeplerProblem.html
The method of least squares is the basis of simple linear regression. Legendre: “Of all the principles which can be proposed for [making estimates from a sample], I think there is none more general, more exact, and more easy of application, than that of which we have made use… which consists of rendering the sum of the squares of the errors a minimum.”
Non-linear regression: the basis of the method is to approximate the model by a linear one and to refine the parameters by successive iterations.
In raw score form the regression equation is: Y = a + B1X1 + B2X2 + ... + BkXk + e
Regression for fitting a "true relationship": in standard regression analysis, which leads to fitting by least squares, there is an implicit assumption that errors in the independent variable are zero or strictly controlled so as to be negligible. When errors in the independent variable are non-negligible, models of measurement error can be used; such methods can lead to parameter estimates, hypothesis testing and confidence intervals that take into account the presence of observation errors in the independent variables.
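For the single-predictor case (Y = a + B1X1 + e), the least squares estimates have a closed form, sketched below in plain Python. The data points are made up for illustration, and the function names are my own rather than from any library.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) minimising the sum of squared errors of y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = y_bar - b * x_bar
    return a, b


def sum_squared_errors(a, b, xs, ys):
    """The quantity Legendre's principle asks us to render a minimum."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_fit(xs, ys)
best = sum_squared_errors(a, b, xs, ys)
```

Any other choice of intercept or slope gives a larger sum of squared errors on the same data, which is exactly what "least squares" means.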
  20. Nate Silver took all the public polls, weighted the results based on previous bias, and successfully called the previous presidential election well before anyone else. Note: at this time the betting markets have Donald Trump with only a 26% chance of winning.
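The general idea of correcting each poll for its past bias and then weighting it can be sketched as follows. This is not Nate Silver's actual model, which is far more elaborate; the poll figures, bias values, and weights are all invented for illustration.

```python
def adjusted_poll_average(polls):
    """Weighted average of polls after removing each pollster's past bias.

    polls: list of (share_pct, past_bias_pct, weight) tuples, where
    past_bias_pct is how far the pollster has historically overstated the
    candidate, and weight reflects how much we trust the poll.
    """
    total_weight = sum(weight for _, _, weight in polls)
    return sum((share - bias) * weight for share, bias, weight in polls) / total_weight


# Hypothetical polls: (reported share %, historical bias %, weight).
polls = [
    (52.0, 1.0, 2.0),   # reliable pollster that tends to overstate by 1 point
    (49.0, -1.0, 1.0),  # pollster that tends to understate by 1 point
    (51.0, 0.0, 1.0),   # pollster with no known bias
]
estimate = adjusted_poll_average(polls)
```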
  21. Business Analysis: To investigate business systems, taking a holistic view of the situation. This may include examining elements of the organisation structures and staff development issues as well as current processes and IT systems. To evaluate actions to improve the operation of a business system. Again, this may require an examination of organisational structure and staff development needs, to ensure that they are in line with any proposed process redesign and IT system development. To document the business requirements for the IT system support using appropriate documentation standards.
Research: understand the context of your research through a “Literature Review”, i.e. what do we already know about the subject matter of interest or the question? Does the available literature stand up; i.e. does the question make sense, was the data used valid for the purpose, was the method valid and appropriate, does the conclusion or finding stand up, is it repeatable? That last one is an important one.
Much of what medical researchers conclude in their studies is misleading, exaggerated, or flat-out wrong. So why are doctors, to a striking extent, still drawing upon misinformation in their everyday practice? Dr. John Ioannidis has spent his career challenging his peers by exposing their bad science. These findings were found to be repeatable across the whole spectrum of sciences. This is almost certainly true of analytics in the workplace.
A number of times I have been asked to work out why a researcher was getting different numbers to the organisation; my first question to the researcher was: where did you get your data from? Invariably they would point me to some dataset with a name like FY9697A112. OK, where did that come from and how was it derived? In all cases we would quickly reach a point where they would give up. They could not show the provenance of their data. Therefore the process they had gone through had no meaning or value.
Also beware the Business Owner who comes to you saying they want to prove ‘XXXX’, or ‘we don’t like those numbers’.
  22. In all the cases I have worked on in the last 30 years I have been accessing existing administrative data.
  23. Almost always the bulk of this already exists. If you are told it doesn’t exist, start building; however, keep an eye open, because it almost certainly does and you don’t want to rebuild all this if it does exist. Always use existing data if possible. Recognise that all collections have issues and are a work in progress. However, it is almost always better to fix the existing than to start anew. 90% of any analytics is, or should be, establishing a well understood and documented datastore where the build of any measure can be achieved from first principles. If you can’t build from first principles then don’t start any analysis.
  24. Data understanding has been known by many names: data discovery, data exploration. In the SAS world this normally means a Proc Freq and a Proc Summary. In R: Describe. In SPSS: FREQUENCIES, DESCRIPTIVES, EXAMINE.
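For readers without SAS, R, or SPSS, the same first look at a dataset can be sketched in Python with the standard library. This is a rough analogue of a frequency table and a summary, not a reproduction of what those procedures output; the records and function names are hypothetical.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical records standing in for an administrative dataset.
records = [
    {"state": "ACT", "amount": 120.0},
    {"state": "NSW", "amount": 80.0},
    {"state": "ACT", "amount": 100.0},
    {"state": "VIC", "amount": 60.0},
    {"state": "NSW", "amount": 90.0},
]


def frequency_table(rows, field):
    """Count and percentage for each value of a categorical field."""
    counts = Counter(row[field] for row in rows)
    total = len(rows)
    return {value: (n, 100.0 * n / total) for value, n in counts.items()}


def summary(rows, field):
    """Basic descriptive statistics for a numeric field."""
    values = [row[field] for row in rows]
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


state_freq = frequency_table(records, "state")
amount_summary = summary(records, "amount")
```

Running these two checks on every field before any modelling is a cheap way to spot unexpected categories, missing values, and implausible ranges.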
  25. As always, document your review and add to your standards if required; however, keep it brief. Reviews are not a place where bullying should be tolerated. People learn best in an environment where they can make mistakes and recover. A review is a means for learning, for imparting and gaining knowledge, and the process is shared by the team to build the team’s coherence.
  26. No-one can be good at all these methods. I know a few.