A presentation I made to the ACS and DAMA SIG in Canberra at the Mantra on Tuesday evening.
The intention is to give the layman, or someone interested in taking up this career, a broad-brush view of analytics and how it can be used, with a little history added in for good measure.
A large percentage of the contents are plagiarised, and the ideas and materials are mainly sourced from well-known sources such as Wikipedia and Google Images. If you object to any material that you believe you have copyright to being included, please contact me with proof of ownership and I will remove said content.
InCites for publishers Frankfurt Book Fair 2015 (Ian Potter)
An overview of the Thomson Reuters InCites Benchmarking & Analytics package, with a range of indicators in addition to the journal impact factor, and its use by publishers to analyse journal and list performance.
This document summarizes Chapter 1 of Dale Jorgenson's 1967 book "The Theory of Investment Behavior". It discusses the relationship between empirical and theoretical research on business investment behavior. While empirical and theoretical research were often conducted in isolation, the author argues they are best pursued together within a common theoretical framework. This allows empirical studies to be properly directed and avoids premature assumptions. The goal of the paper is to make the implicit theoretical framework behind investment models explicit, to provide a basis for evaluating evidence on what drives investment behavior.
What is a data scientist - a presentation I made to the Canberra IAPA (Russell Tibballs)
This document discusses definitions of a data scientist and proposes criteria for what qualifies someone for the job title. It argues that a data scientist should be considered a scientist since they apply scientific methods to data. A data scientist requires tertiary education in a relevant field like math, statistics or computer science. Their work involves autonomously applying expertise to analyze data and solve problems using scientific approaches. Definitions should recognize various specializations within data science and that technical skills alone don't make someone a data scientist. Formal education requirements and certification from professional bodies would give the field greater credibility and clarity.
Research is a systematic process of gathering, interpreting, and analyzing information to resolve a specific problem. It involves searching again for available information to make sense of it. Quantitative research uses statistical data from large sample sizes to generalize findings but does not explain why or how, while qualitative research relies on narratives from participatory research or observation to understand perspectives without oversimplifying but from too few respondents to generalize. Both have advantages and disadvantages.
Quantitative and qualitative analysis of data (Nisha M S)
This document provides an overview of several qualitative and quantitative research methods and analysis techniques. It discusses interpretative phenomenological analysis (IPA) for qualitative analysis, which aims to explore participants' experiences and perspectives while acknowledging the researcher's own biases. It also reviews grounded theory methodology, discourse analysis techniques, and narrative analysis approaches. For quantitative analysis, it outlines organizing data, visual presentation methods, measures of central tendency, measures of variation, and common statistical tests. The document presents steps and considerations for applying these diverse analytical methods to research.
Schemas are mental structures that help organize and interpret information. They influence what we notice and how we remember things. When we encounter new information, we either assimilate it into our existing schemas or accommodate our schemas based on the new information. Accommodation involves modifying or replacing schemas, while assimilation uses schemas to understand new concepts. Piaget's theory of cognitive development emphasized how children actively construct knowledge through assimilation and accommodation of information into their schemas.
The document discusses different ways to classify research based on data type, purpose, and method. Research can be classified as either quantitative or qualitative based on whether it uses numerical data that can be measured or qualitative data such as opinions. Quantitative research aims to quantify data while qualitative research provides descriptive details. Research can also be classified as fundamental, applied, or action-oriented based on its purpose. The methods used to conduct research can include historical, philosophical, experimental, or descriptive survey approaches.
This document provides an overview of the research process. It discusses the key stages in research including identifying a research problem, reviewing relevant literature, formulating research questions and hypotheses, choosing a study design, deciding on sampling methods, collecting and analyzing data, and writing a research report. The document emphasizes that research aims to systematically investigate issues to contribute to generalizable knowledge. It also highlights the importance of research in advancing scientific understanding and improving health.
This document provides information and guidelines on writing a survey report. It defines a survey report as a formal piece of writing based on research that summarizes the results of a survey in percentages and proportions. The purpose of a survey report is to thoroughly study a research topic and summarize existing studies in an organized manner. The document outlines the typical sections of a survey report and provides tips for writing an effective report, including keeping questions short and simple and avoiding biased questions.
Fully Exploiting Qualitative and Mixed Methods Data from Online Surveys (Shalin Hai-Jew)
A wide range of contemporary research uses online surveys. This presentation provides an overview of ways to exploit survey-captured data for analysis. There will be a summary of basic survey and item analysis that may be achieved with survey data results. There will also be a range of tips for extracting, cleaning, structuring, and presenting both quantitative and qualitative data for data-consumer sense-making. The platform that will be used as an exemplar will be the Qualtrics survey platform, and two supporting tools used for analysis are Excel 2013 and NVivo 10. Real-world projects are used to demo these approaches—with principal investigator (PI) permission.
This document outlines various research methods and concepts related to conducting research projects. It discusses qualitative and quantitative research approaches and covers key parts of research papers like the introduction, methods, results, and discussion sections. Different types of data collection methods are examined, including acceptability judgements, elicited imitation, and processing research techniques. Ethical considerations around informed consent and working with human subjects are also addressed.
Practical Research (Introduction to Research) (jamaltasarra21)
This document provides an overview of research concepts and processes. It defines research as a systematic investigation that involves collecting, analyzing, and interpreting data to contribute to generalizable knowledge. The document discusses different types of research such as qualitative vs. quantitative, exploratory vs. explanatory, and basic vs. applied research. It also covers research methodology, design, and important considerations like validity and ethics. The overall purpose is to introduce learners to the inquisitive world of research and how it impacts society.
RESEARCH PROCESS
SELECTION OF RESEARCH PROBLEM
REVIEW LITERATURE
MAKING HYPOTHESIS
PREPARING THE RESEARCH DESIGN
SAMPLING
DATA COLLECTION
DATA ANALYSIS
HYPOTHESIS TESTING
GENERALIZATION AND INTERPRETATION
CONCLUSION
PREPARATION OF REPORT
This document outlines a syllabus for a course on business analytics. It covers topics like introduction to analytics, statistics for business analytics, advanced Excel, R, data mining techniques like decision trees and clustering in R, time series forecasting, predictive modeling with logistic regression in R, and an overview of big data and Hadoop. It also defines key concepts like data analysis, data analytics, data mining. Descriptive, predictive, and prescriptive analytics techniques are discussed. Applications of business analytics in various domains like finance, marketing, HR, CRM, manufacturing, and credit cards are provided.
Cultural Contradictions of Scanning in an Evidence-based Policy Environment (Wendy Schultz)
Dr. Wendy L. Schultz discusses horizon scanning as an essential tool for foresight activities that identifies emerging issues and changes. However, scanning faces challenges in an evidence-based policy environment due to contradictions between the subjective, tentative nature of scanning and political and scientific desires for objective, authoritative conclusions. Various techniques like causal layered analysis, integral futures, and spiral dynamics can help overcome biases and validate scan findings from diverse sources to better identify surprises and alternatives for policymaking.
This document provides an introduction to statistics, including its definition, history, and scope. It discusses how statistics originated as a way to collect administrative data for governments and has evolved into a discipline. Key figures who contributed to the development of modern statistics and probability theory are mentioned. The document also defines key statistical concepts like population, sample, and different sampling methods. It outlines how statistics is applied in economics, management, and quality control in industry.
This document provides guidance on creating compelling research manuscripts. It discusses identifying quality research publications and constructing the structure of a research article. It outlines the typical sections of a research paper like introduction, literature review, methodology, results, and discussion. It also addresses title, abstract, conclusions, and future research. Instructional methods include lectures and demonstrations. The document recommends resources like reference managers and academic databases. It emphasizes writing clearly and concisely while avoiding plagiarism.
This document provides an overview of the differences between qualitative and quantitative research methods. It discusses that qualitative research aims to understand social interactions through smaller, non-randomly selected groups, using open-ended responses and interviews. Quantitative research aims to test hypotheses and make predictions through larger, randomly selected groups and validated quantitative data collection instruments. The researcher's role and biases are known in qualitative research but hidden in quantitative research. Qualitative findings are less generalizable while quantitative findings can be more widely applied.
2011 SLA Annual Conference & INFO-EXPO: Novel Applications for TD Bank's... (jmkurtz)
After attending the SLA 2011 Annual Conference in Philadelphia, I developed this presentation to share the innovative ideas and technologies I learned about with my department at the Information Research Center.
This document discusses information literacy and its components. It defines information literacy as using information effectively and ethically to achieve objectives. The five components of information literacy are: identifying information needs, finding information, evaluating information, applying information, and acknowledging information sources. The document also covers ethical use of information, different citation styles, and how students, teachers, medical practitioners and others practice information literacy in their work.
INDUSTRIALISATION AND GLOBAL CLIMATE CHANGE (Ian De Mellow)
This document discusses the scientific method and its application to understanding industrialization and climate change. It provides historical examples of conflicts between science and religion from Galileo's trial in 1616 to debates about Darwin's theory of evolution. The document defines the scientific method and explores how it has developed over time. It also examines abuses of the scientific method, such as appeals to authority instead of evidence. Overall, the document analyzes the scientific method as a means for verifying knowledge through observation and testing of hypotheses.
This document discusses the use of geographical information systems (GIS) in public health. It provides background on GIS, including Dr. John Snow's use of maps to study the 1854 cholera outbreak in London. The document outlines key GIS concepts like geocoding, layers, and thematic mapping. It describes GIS functions such as data acquisition, storage, analysis, and presentation. Examples are given of how GIS can be used for tasks like calculating rates, measuring distances, and cluster analysis. Commonly used GIS software and advantages of GIS for public health are also summarized.
Scanning to Manage Disruption and Controversy PACITA 2015 (Wendy Schultz)
An overview of horizon scanning for change management that reviews the results of previous scanning projects and presents some innovative software platforms to support futures and foresight research.
21st Century Reading: Text and Data Mining Skills for Scotland (CILIPScotland)
This document summarizes a series of workshops hosted by the National Library of Scotland on text and data mining skills. The workshops aimed to explore awareness and abilities around text and data mining in Scotland, investigate how cultural heritage organizations can expose text collections to researchers, and discuss tools, skills, and audiences related to distant reading. Key topics included defining text, levels of skills, analyzing tools, prioritizing digitization resources, the four waves of text mining approaches, demonstrating the value of text and data mining, and next steps such as establishing an interdisciplinary network.
This document provides an overview of statistics presented by five students. It defines statistics as the practice of collecting and analyzing numerical data. Descriptive statistics summarize data through parameters like the mean, while inferential statistics interpret descriptive statistics to draw conclusions. The document discusses examples of statistics, different types of charts and graphs, descriptive versus inferential statistics, and the importance and applications of statistics in fields like business, economics, and social sciences. It also covers topics like sampling methods, characteristics of sampling, probability versus non-probability sampling, and differences between the two.
This document discusses quantitative research. It states that quantitative research is objective, uses clearly defined questions, and has large sample sizes. This allows the numerical data to be analyzed quickly and easily using statistical models to generalize findings to the target population. However, quantitative research is also costly and difficult, gathering large amounts of data. It concludes by explaining an activity where students will discuss how quantitative research is used across different fields in small groups.
06-18-2024-Princeton Meetup-Introduction to Milvus (Timothy Spann)
06-18-2024-Princeton Meetup-Introduction to Milvus
tim.spann@zilliz.com
https://www.linkedin.com/in/timothyspann/
https://x.com/paasdev
https://github.com/tspannhw
https://github.com/milvus-io/milvus
Get Milvused!
https://milvus.io/
Read my Newsletter every week!
https://github.com/tspannhw/FLiPStackWeekly/blob/main/142-17June2024.md
For more cool Unstructured Data, AI and Vector Database videos check out the Milvus vector database videos here
https://www.youtube.com/@MilvusVectorDatabase/videos
Unstructured Data Meetups -
https://www.meetup.com/unstructured-data-meetup-new-york/
https://lu.ma/calendar/manage/cal-VNT79trvj0jS8S7
https://www.meetup.com/pro/unstructureddata/
https://zilliz.com/community/unstructured-data-meetup
https://zilliz.com/event
Twitter/X: https://x.com/milvusio https://x.com/paasdev
LinkedIn: https://www.linkedin.com/company/zilliz/ https://www.linkedin.com/in/timothyspann/
GitHub: https://github.com/milvus-io/milvus https://github.com/tspannhw
Invitation to join Discord: https://discord.com/invite/FjCMmaJng6
Blogs: https://milvusio.medium.com/ https://www.opensourcevectordb.cloud/ https://medium.com/@tspann
Expand LLMs' knowledge by incorporating external data sources into LLMs and your AI applications.
Similar to Analytics and Where it Fits - ACS DAMA SIG (20)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
To honor ten years of PyData London, join Dr. Rebecca Bilbro as she takes us back in time to reflect on a little over ten years working as a data scientist. One of the many renegade PhDs who joined the fledgling field of data science in the 2010s, Rebecca will share lessons learned the hard way, often from watching data science projects go sideways and learning to fix broken things. Through the lens of these canon events, she'll identify some of the anti-patterns and red flags she's learned to steer around.
Discover the cutting-edge telemetry solution implemented for Alan Wake 2 by Remedy Entertainment in collaboration with AWS. This comprehensive presentation dives into our objectives, detailing how we utilized advanced analytics to drive gameplay improvements and player engagement.
Key highlights include:
Primary Goals: Implementing gameplay and technical telemetry to capture detailed player behavior and game performance data, fostering data-driven decision-making.
Tech Stack: Leveraging AWS services such as EKS for hosting, WAF for security, Karpenter for instance optimization, S3 for data storage, and OpenTelemetry Collector for data collection. EventBridge and Lambda were used for data compression, while Glue ETL and Athena facilitated data transformation and preparation.
Data Utilization: Transforming raw data into actionable insights with technologies like Glue ETL (PySpark scripts), Glue Crawler, and Athena, culminating in detailed visualizations with Tableau.
Achievements: Successfully managing 700 million to 1 billion events per month at a cost-effective rate, with significant savings compared to commercial solutions. This approach has enabled simplified scaling and substantial improvements in game design, reducing player churn through targeted adjustments.
Community Engagement: Enhanced ability to engage with player communities by leveraging precise data insights, despite having a small community management team.
This presentation is an invaluable resource for professionals in game development, data analytics, and cloud computing, offering insights into how telemetry and analytics can revolutionize player experience and game performance optimization.
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of May 2024.
Starting from January 2024, the full weekly and monthly reports will only be available for free to VCOSA members. To access the complete weekly report with figures, charts, and detailed analysis of the cotton fiber market in the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
2. A LITTLE ABOUT YOURS TRULY
MY NAME IS RUSSELL TIBBALLS
I HAVE BEEN WORKING WITH DATA, PROGRAMMING, AND PERFORMING VARIOUS LEVELS OF ANALYSIS FOR OVER 40 YEARS. I JOINED THE CUSTOMS STATISTICS TEAM AFTER LEAVING HIGH SCHOOL IN NOVEMBER 1974.
I AM THE CANBERRA CHAIR OF IAPA
I ALSO CHAIR THE ADVISORY COMMITTEE TO THE SCIENCES AT USQ
I HAVE A MASTERS IN SOCIAL RESEARCH METHODS (ANU), FOCUSED ON SURVEY ANALYSIS, INTERNATIONAL MIGRATION, AND ANALYSIS OF WEB PRESENCE USING SOCIAL NETWORK ANALYSIS (SNA).
ACS CERTIFIED PROFESSIONAL
TDWI CERTIFIED BUSINESS PROFESSIONAL – DATA ANALYSIS
SAS ADVANCED PROGRAMMER
ETC. ETC.
MY MAJOR INTERESTS ARE:
• MY FAMILY AND WHAT IS HAPPENING ON MY ACREAGE AND THE MOLONGLO RIVER (WHICH IT ADJOINS)
• ANY APPLICATIONS OF ANALYTICS IN THE HARD AND SOFT SCIENCES. YES, I AM A TRAGIC WHO READS NATURE AND JSTOR PAPERS WHENEVER I CAN.
• INTERNATIONAL MIGRATION
3. THE IMPRESSION
• THE IMPRESSION IS THAT ANYONE GIVEN ACCESS TO THE RIGHT INFORMATION CAN ANALYZE AND COME UP WITH A SOLUTION FOR ANY PROBLEM IN MOMENTS.
• THE VIEW HAS BECOME INCREASINGLY PERVASIVE. SEE – ‘ARE WE COOL YET?: A LONGITUDINAL CONTENT ANALYSIS OF NERD AND GEEK REPRESENTATIONS IN POPULAR TELEVISION’ (2012 – CARDIEL C L)
• HOLLYWOOD HAS MOVED FROM THE MAD SCIENTIST WHO CAN WHIP UP A WORLD-BEATING GADGET IN SECONDS (THINK DEXTER’S LAB), TAKEN THE NERDY HACKER FRIEND OF THE HERO (MARKY MARK IN DATE NIGHT), AND ARRIVED AT THE ANALYST WHO CAN LOG INTO THE INTERNET, FIND THE PETABYTES OF DATA YOU NEED TO ANALYZE, AND STOP THE END OF THE WORLD IN MOMENTS; OCCASIONALLY SECONDS.
8. A FEW DEFINITIONS
• FROM EVAN STUBBS “THE VALUE OF BUSINESS ANALYTICS”
• ANALYTICS – ANY DATA DRIVEN PROCESS THAT PROVIDES INSIGHT
• COMMON FORMS ARE:
• REPORTING – THE ORGANISATION OF HISTORICAL DATA
• TREND ANALYSIS – THE IDENTIFICATION OF PATTERNS IN TIME SERIES DATA
• SEGMENTATION – IDENTIFICATION OF SIMILARITIES WITHIN DATA
• PREDICTIVE MODELING – PREDICTION OF FUTURE EVENTS USING HISTORICAL DATA
9. CONTINUING FROM EVAN STUBBS
• ALL APPLICATIONS OF ANALYTICS HAVE A NUMBER OF COMMON CHARACTERISTICS:
• THEY ARE BASED ON DATA
• THEY APPLY VARIOUS MATHEMATICAL TECHNIQUES TO TRANSFORM AND SUMMARIZE THE RAW DATA
• THEY ADD VALUE TO THE ORIGINAL DATA AND TRANSFORM IT INTO “KNOWLEDGE”
• ADVANCED ANALYTICS, HOWEVER, AIMS TO IDENTIFY:
• WHY THINGS ARE HAPPENING
• WHAT WILL HAPPEN NEXT
• WHAT THE POSSIBLE COURSES OF ACTION ARE
• THE BUSINESS OUTCOME DRIVERS FOR THE USE OF “BUSINESS ANALYTICS” ARE:
• BUSINESS RELEVANCY
• ACTIONABLE INSIGHT
• PERFORMANCE MEASUREMENT AND VALUE MEASUREMENT
• KNOWLEDGE IS A FAMILIARITY, AWARENESS OR UNDERSTANDING OF SOMEONE OR SOMETHING, SUCH AS FACTS,
INFORMATION, DESCRIPTIONS, OR SKILLS, WHICH IS ACQUIRED THROUGH EXPERIENCE OR EDUCATION BY
PERCEIVING, DISCOVERING, OR LEARNING.
• IN GOVERNMENT AN AGENCY’S RELEVANCY IS MEASURED IN TERMS OF POLICY ALIGNMENT
10. THE PRIVATE SECTOR PERSPECTIVE
• TO INCREASE THE EFFICIENCY OF DELIVERY AND VALUE TO THE CUSTOMER
• DRIVERS BEING INCREASING MARKET SHARE AND PROFITS
• AND MOST IMPORTANTLY TO DELIVER BENEFIT TO THE SHAREHOLDER
• ANY PRIVATE OR PUBLIC ENTERPRISE HAS TWO MAIN DRIVERS:
• THE WISHES OF ITS OWNERS, SHAREHOLDERS, OR GOVERNMENT.
• THE ONGOING RELEVANCY OF THE ORGANIZATION
13. THE HEDGEHOG AND THE FOX
THE HEDGEHOG AND THE FOX IS AN ESSAY BY PHILOSOPHER ISAIAH BERLIN. IT
WAS ONE OF BERLIN'S MOST POPULAR ESSAYS WITH THE GENERAL PUBLIC.
BERLIN EXPANDS UPON THIS IDEA TO DIVIDE WRITERS AND THINKERS INTO TWO
CATEGORIES: HEDGEHOGS, WHO VIEW THE WORLD THROUGH THE LENS OF A
SINGLE DEFINING IDEA, AND FOXES WHO DRAW ON A WIDE VARIETY OF
EXPERIENCES AND FOR WHOM THE WORLD CANNOT BE BOILED DOWN TO A
SINGLE IDEA.
IN HIS 2012 NEW YORK TIMES BEST-SELLING BOOK THE SIGNAL AND THE NOISE,
FORECASTER NATE SILVER URGES READERS TO BE "MORE FOXY" AFTER
SUMMARIZING BERLIN'S DISTINCTION.
14. A BRIEF DETOUR ON THE VENDOR VIEW OF A DATA SCIENTIST/ANALYST (ADVANCED ANALYTICS)
17. DATA ANALYTICS
• ANALYTICS – ANY DATA DRIVEN PROCESS THAT PROVIDES INSIGHT
• COMMON FORMS ARE:
• REPORTING – THE ORGANISATION OF HISTORICAL DATA
• TREND ANALYSIS – THE IDENTIFICATION OF PATTERNS IN TIME SERIES DATA
• SEGMENTATION – IDENTIFICATION OF SIMILARITIES WITHIN DATA
• PREDICTIVE MODELING – PREDICTION OF FUTURE EVENTS USING HISTORICAL DATA
18. REPORTING
• VERB - MAKE A FORMAL STATEMENT OR COMPLAINT ABOUT (SOMEONE OR
SOMETHING) TO THE NECESSARY AUTHORITY.
• NOUN - AN ACCOUNT GIVEN OF A PARTICULAR MATTER, ESPECIALLY IN THE
FORM OF AN OFFICIAL DOCUMENT, AFTER THOROUGH INVESTIGATION OR
CONSIDERATION BY AN APPOINTED PERSON OR BODY. E.G., "THE CHAIRMAN'S
ANNUAL REPORT”. A SPOKEN OR WRITTEN DESCRIPTION OF AN EVENT OR
SITUATION, ESPECIALLY ONE INTENDED FOR PUBLICATION OR BROADCASTING
IN THE MEDIA.
19. TREND ANALYSIS
• TREND ANALYSIS IS THE PRACTICE OF COLLECTING INFORMATION AND ATTEMPTING TO SPOT A PATTERN, OR TREND, IN THE INFORMATION. IN SOME FIELDS OF STUDY, THE TERM "TREND ANALYSIS" HAS MORE FORMALLY DEFINED MEANINGS.[1][2][3]
• ALTHOUGH TREND ANALYSIS IS OFTEN USED TO PREDICT FUTURE EVENTS, IT CAN ALSO BE USED TO ESTIMATE UNCERTAIN EVENTS IN THE PAST, SUCH AS HOW MANY ANCIENT KINGS PROBABLY RULED BETWEEN TWO DATES, BASED ON DATA SUCH AS THE AVERAGE NUMBER OF YEARS THAT OTHER KNOWN KINGS REIGNED.
• IN STATISTICS, TREND ANALYSIS OFTEN REFERS TO TECHNIQUES FOR EXTRACTING AN UNDERLYING PATTERN OF BEHAVIOR IN A TIME SERIES WHICH WOULD OTHERWISE BE PARTLY OR NEARLY COMPLETELY HIDDEN BY NOISE. A SIMPLE DESCRIPTION OF THESE TECHNIQUES IS TREND ESTIMATION, WHICH CAN BE UNDERTAKEN WITHIN A FORMAL REGRESSION ANALYSIS.
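To make trend estimation concrete, here is a minimal Python sketch that fits a straight-line trend to a noisy synthetic series with numpy; the data and the true slope are invented for illustration:

```python
# Trend estimation by ordinary least squares on a synthetic monthly series.
# The true trend (2.5 per period) is buried in noise and then recovered.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(48)                               # 48 monthly observations
y = 2.5 * t + 10 + rng.normal(0, 15, t.size)    # underlying pattern plus noise

slope, intercept = np.polyfit(t, y, deg=1)      # fit y = slope*t + intercept
print(f"estimated slope per period: {slope:.2f}")  # close to the true 2.5
```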
20. SEGMENTATION
• MARKET SEGMENTATION IS A MARKETING STRATEGY WHICH INVOLVES DIVIDING A
BROAD TARGET MARKET INTO SUBSETS OF CONSUMERS, BUSINESSES, OR COUNTRIES
THAT HAVE, OR ARE PERCEIVED TO HAVE, COMMON NEEDS, INTERESTS, AND
PRIORITIES, AND THEN DESIGNING AND IMPLEMENTING STRATEGIES TO TARGET
THEM. MARKET SEGMENTATION STRATEGIES ARE GENERALLY USED TO IDENTIFY
AND FURTHER DEFINE THE TARGET CUSTOMERS, AND PROVIDE SUPPORTING DATA
FOR MARKETING PLAN ELEMENTS SUCH AS POSITIONING TO ACHIEVE CERTAIN
MARKETING PLAN OBJECTIVES. BUSINESSES MAY DEVELOP PRODUCT
DIFFERENTIATION STRATEGIES, OR AN UNDIFFERENTIATED APPROACH, INVOLVING
SPECIFIC PRODUCTS OR PRODUCT LINES DEPENDING ON THE SPECIFIC DEMAND AND
ATTRIBUTES OF THE TARGET SEGMENT.
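As one hedged illustration of the idea, the sketch below segments invented customers into groups with "common needs" using k-means clustering from scikit-learn; the attributes, figures, and number of segments are assumptions, not a prescription:

```python
# Segmentation via k-means: group customers by two illustrative attributes.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
spend  = np.concatenate([rng.normal(200, 30, 50), rng.normal(800, 80, 50)])
visits = np.concatenate([rng.normal(4, 1, 50),   rng.normal(20, 3, 50)])
X = np.column_stack([spend, visits])            # one row per customer

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(model.labels_[:10])                       # segment membership, first 10 customers
print(model.cluster_centers_)                   # the profile of each segment
```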
21. PREDICTIVE MODELLING
• PREDICTIVE MODELING USES STATISTICS TO PREDICT OUTCOMES.[1] MOST OFTEN THE EVENT ONE WANTS TO PREDICT IS IN THE FUTURE, BUT PREDICTIVE MODELLING CAN BE APPLIED TO ANY TYPE OF UNKNOWN EVENT, REGARDLESS OF WHEN IT OCCURRED. FOR EXAMPLE, PREDICTIVE MODELS ARE OFTEN USED TO DETECT CRIMES AND IDENTIFY SUSPECTS AFTER THE CRIME HAS TAKEN PLACE.[2]
• IN MANY CASES THE MODEL IS CHOSEN ON THE BASIS OF DETECTION THEORY TO TRY TO GUESS THE PROBABILITY OF AN OUTCOME GIVEN A SET AMOUNT OF INPUT DATA; FOR EXAMPLE, GIVEN AN EMAIL, DETERMINING HOW LIKELY IT IS TO BE SPAM.
• MODELS CAN USE ONE OR MORE CLASSIFIERS IN TRYING TO DETERMINE THE PROBABILITY OF A SET OF DATA BELONGING TO ANOTHER SET, SAY SPAM OR 'HAM'.
• DEPENDING ON DEFINITIONAL BOUNDARIES, PREDICTIVE MODELLING IS SYNONYMOUS WITH, OR LARGELY OVERLAPPING WITH, THE FIELD OF MACHINE LEARNING, AS IT IS MORE COMMONLY REFERRED TO IN ACADEMIC OR RESEARCH AND DEVELOPMENT CONTEXTS. WHEN DEPLOYED COMMERCIALLY, PREDICTIVE MODELLING IS OFTEN REFERRED TO AS PREDICTIVE ANALYTICS.
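Taking the spam/'ham' example above, here is a toy classifier sketch in Python: a Naive Bayes model that estimates the probability an email is spam given its words. The training emails are invented and far too few for a real model:

```python
# A toy spam/'ham' predictive model: word counts feed a Naive Bayes classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win money now", "meeting agenda attached",
          "cheap pills win prize", "lunch tomorrow?"]
labels = ["spam", "ham", "spam", "ham"]

vec = CountVectorizer()
X = vec.fit_transform(emails)                   # bag-of-words features
clf = MultinomialNB().fit(X, labels)

test = vec.transform(["win a free prize"])
print(clf.predict(test))                        # predicted class
print(clf.predict_proba(test))                  # probability of each class
```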
22. WHERE DOES ANALYTICS FIT?
ANYWHERE YOU NEED TO MAKE A DECISION!
TODAY, STATISTICAL METHODS ARE APPLIED IN ALL FIELDS THAT INVOLVE DECISION MAKING, FOR MAKING ACCURATE INFERENCES FROM A COLLATED BODY OF DATA AND FOR MAKING DECISIONS IN THE FACE OF UNCERTAINTY BASED ON STATISTICAL METHODOLOGY.
23. ANALYTICS HAS MANY LEVELS OF COMPLEXITY
ESTIMATION
THE USE OF STATISTICAL METHODS DATES BACK AT LEAST TO THE 5TH CENTURY BCE. THE HISTORIAN THUCYDIDES, IN HIS HISTORY OF THE PELOPONNESIAN WAR [2], DESCRIBES HOW THE ATHENIANS CALCULATED THE HEIGHT OF THE WALL OF PLATAEA BY COUNTING THE NUMBER OF BRICKS IN AN UNPLASTERED SECTION OF THE WALL SUFFICIENTLY NEAR THEM TO BE ABLE TO COUNT THEM. THE COUNT WAS REPEATED SEVERAL TIMES BY A NUMBER OF SOLDIERS. THE MOST FREQUENT VALUE (IN MODERN TERMINOLOGY, THE MODE) SO DETERMINED WAS TAKEN TO BE THE MOST LIKELY VALUE OF THE NUMBER OF BRICKS. MULTIPLYING THIS VALUE BY THE HEIGHT OF THE BRICKS USED IN THE WALL ALLOWED THE ATHENIANS TO DETERMINE THE HEIGHT OF THE LADDERS NECESSARY TO SCALE THE WALLS.
SEE HOW TO MEASURE ANYTHING: FINDING THE VALUE OF INTANGIBLES BY DOUGLAS W. HUBBARD
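In modern terms the Plataea estimate is a one-liner: take the mode of the soldiers' repeated counts and multiply by the brick height. The tallies and brick height below are invented, since Thucydides records neither:

```python
# The Plataea calculation: mode of repeated counts times brick height.
from statistics import mode

counts = [112, 115, 114, 115, 113, 115, 116, 115]   # hypothetical soldier tallies
brick_height_m = 0.09                               # assumed height per brick course

wall_height = mode(counts) * brick_height_m         # mode(counts) == 115
print(f"estimated wall height: {wall_height:.2f} m")
```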
24. THE CENSUS
THE BIBLICAL STORY OF THE BIRTH OF JESUS WAS SET IN THE CONTEXT OF THE
CENSUS. IN 6 CE PUBLIUS SULPICIUS QUIRINIUS (51 BCE-21 CE), A DISTINGUISHED
SOLDIER AND FORMER CONSUL, WAS APPOINTED IMPERIAL LEGATE (GOVERNOR) OF
THE PROVINCE OF ROMAN SYRIA. IN THE SAME YEAR JUDEA WAS DECLARED A ROMAN
PROVINCE, AND QUIRINIUS WAS TASKED TO CARRY OUT A CENSUS OF THE NEW
TERRITORY FOR TAX PURPOSES.
‘IN THOSE DAYS A DECREE WENT OUT FROM EMPEROR AUGUSTUS THAT ALL THE WORLD SHOULD BE REGISTERED. THIS WAS THE FIRST REGISTRATION AND WAS TAKEN WHILE QUIRINIUS WAS GOVERNOR OF SYRIA. ALL WENT TO THEIR OWN TOWNS TO BE REGISTERED. JOSEPH ALSO WENT FROM THE TOWN OF NAZARETH IN GALILEE TO JUDEA, TO THE CITY OF DAVID CALLED BETHLEHEM, BECAUSE HE WAS DESCENDED FROM THE HOUSE AND FAMILY OF DAVID. HE WENT TO BE REGISTERED WITH MARY, TO WHOM HE WAS ENGAGED AND WHO WAS EXPECTING A CHILD.’ (LUKE 2:1–7)
25. SAMPLING
THE TRIAL OF THE PYX IS A TEST OF THE PURITY OF THE COINAGE OF THE ROYAL
MINT WHICH HAS BEEN HELD ON A REGULAR BASIS SINCE THE 12TH CENTURY.
THE TRIAL ITSELF IS BASED ON STATISTICAL SAMPLING METHODS. AFTER MINTING
A SERIES OF COINS - ORIGINALLY FROM TEN POUNDS OF SILVER - A SINGLE COIN
WAS PLACED IN THE PYX - A BOX IN WESTMINSTER ABBEY. AFTER A GIVEN PERIOD
- NOW ONCE A YEAR - THE COINS ARE REMOVED AND WEIGHED. A SAMPLE OF COINS REMOVED FROM THE BOX IS THEN TESTED FOR PURITY.
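A Trial-of-the-Pyx-style check translates naturally into a one-sample test: draw a sample of coins and ask whether their mean weight departs from the standard. The weights, standard, and sample size below are invented for illustration:

```python
# Acceptance sampling in the spirit of the Trial of the Pyx.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
production_run = rng.normal(loc=28.28, scale=0.05, size=10_000)  # coin weights, grams
pyx_sample = rng.choice(production_run, size=100, replace=False) # coins set aside

t_stat, p_value = stats.ttest_1samp(pyx_sample, popmean=28.28)   # the standard weight
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")   # a large p-value: no sign of debasement
```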
26. THE MEAN AND MEDIAN
• THE ARITHMETIC MEAN, ALTHOUGH A CONCEPT KNOWN TO THE GREEKS, WAS
NOT GENERALIZED TO MORE THAN TWO VALUES UNTIL THE 16TH CENTURY.
THE INVENTION OF THE DECIMAL SYSTEM BY SIMON STEVIN IN 1585 SEEMS
LIKELY TO HAVE FACILITATED THESE CALCULATIONS. THIS METHOD WAS FIRST
ADOPTED IN ASTRONOMY BY TYCHO BRAHE WHO WAS ATTEMPTING TO REDUCE
THE ERRORS IN HIS ESTIMATES OF THE LOCATIONS OF VARIOUS CELESTIAL
BODIES.
• THE IDEA OF THE MEDIAN ORIGINATED IN EDWARD WRIGHT'S BOOK ON
NAVIGATION (CERTAINE ERRORS IN NAVIGATION) IN 1599 IN A SECTION
CONCERNING THE DETERMINATION OF LOCATION WITH A COMPASS. WRIGHT
FELT THAT THIS VALUE WAS THE MOST LIKELY TO BE THE CORRECT VALUE IN A
SERIES OF OBSERVATIONS.
27. DEMOGRAPHY
GAINING AN UNDERSTANDING OF COMPLEX SOCIAL PHENOMENA
THE BIRTH OF STATISTICS IS OFTEN DATED TO 1662, WHEN JOHN GRAUNT, ALONG
WITH WILLIAM PETTY, DEVELOPED EARLY HUMAN STATISTICAL AND CENSUS METHODS
THAT PROVIDED A FRAMEWORK FOR MODERN DEMOGRAPHY. HE PRODUCED THE
FIRST LIFE TABLE, GIVING PROBABILITIES OF SURVIVAL TO EACH AGE. HIS BOOK
NATURAL AND POLITICAL OBSERVATIONS MADE UPON THE BILLS OF MORTALITY USED
ANALYSIS OF THE MORTALITY ROLLS TO MAKE THE FIRST STATISTICALLY BASED
ESTIMATION OF THE POPULATION OF LONDON. HE KNEW THAT THERE WERE AROUND
13,000 FUNERALS PER YEAR IN LONDON AND THAT THREE PEOPLE DIED PER ELEVEN
FAMILIES PER YEAR. HE ESTIMATED FROM THE PARISH RECORDS THAT THE AVERAGE
FAMILY SIZE WAS 8 AND CALCULATED THAT THE POPULATION OF LONDON WAS
ABOUT 384,000.
IN 1802 LAPLACE ESTIMATED THE POPULATION OF FRANCE TO BE 28,328,612.[11] HE
CALCULATED THIS FIGURE USING THE NUMBER OF BIRTHS IN THE PREVIOUS YEAR AND
CENSUS DATA FOR THREE COMMUNITIES. THE CENSUS DATA OF THESE COMMUNITIES
SHOWED THAT THEY HAD 2,037,615 PERSONS AND THAT THE NUMBER OF BIRTHS
WERE 71,866. ASSUMING THAT THESE SAMPLES WERE REPRESENTATIVE OF FRANCE,
LAPLACE PRODUCED HIS ESTIMATE FOR THE ENTIRE POPULATION.
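Both estimates rest on the same ratio arithmetic, reproduced below with the figures quoted above. Note that Laplace's national annual birth count is not given in the text, so a round one million is assumed here purely for illustration:

```python
# Graunt's and Laplace's population estimates as ratio arithmetic.
funerals_per_year  = 13_000
deaths_per_family  = 3 / 11        # three deaths per eleven families per year
persons_per_family = 8

families = funerals_per_year / deaths_per_family
print(f"Graunt's London: about {families * persons_per_family:,.0f} people")

persons_per_annual_birth = 2_037_615 / 71_866   # ratio in the sampled communities
assumed_national_births = 1_000_000             # assumption; not quoted in the text
print(f"Laplace-style France: about "
      f"{assumed_national_births * persons_per_annual_birth:,.0f} people")
```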
28. PREDICT ORBIT OF PLANETS
THE METHOD OF LEAST SQUARES, WHICH WAS USED TO MINIMIZE ERRORS IN DATA
MEASUREMENT, WAS PUBLISHED INDEPENDENTLY BY ADRIEN-MARIE LEGENDRE (1805),
ROBERT ADRAIN (1808), AND CARL FRIEDRICH GAUSS (1809). GAUSS HAD USED THE
METHOD IN HIS FAMOUS 1801 PREDICTION OF THE LOCATION OF THE DWARF PLANET
CERES. THE OBSERVATIONS THAT GAUSS BASED HIS CALCULATIONS ON WERE MADE BY
THE ITALIAN MONK PIAZZI.
A DETAILED ACCOUNT OF THE METHOD USED CAN BE FOUND AT http://science.larouchepac.com/gauss/ceres/interimii/astronomy/keplerproblem.html
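The principle Legendre and Gauss published can be shown in a few lines: choose the line that minimises the sum of squared errors. This sketch solves the least-squares problem on synthetic data with numpy:

```python
# Least squares: fit a line to noisy observations via numpy's solver.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 25)
y = 3.0 * x + 2.0 + rng.normal(0, 1.5, x.size)   # true line plus measurement error

A = np.column_stack([x, np.ones_like(x)])        # design matrix [x, 1]
coef, *_ = np.linalg.lstsq(A, y, rcond=None)     # minimises sum of squared residuals
print(f"slope ~ {coef[0]:.2f}, intercept ~ {coef[1]:.2f}")
```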
29. WISDOM OF CROWDS
FRANCIS GALTON IS CREDITED AS ONE OF THE PRINCIPAL FOUNDERS OF
STATISTICAL THEORY. HIS CONTRIBUTIONS TO THE FIELD INCLUDED
INTRODUCING THE CONCEPTS OF STANDARD DEVIATION, CORRELATION,
REGRESSION AND THE APPLICATION OF THESE METHODS TO THE STUDY OF THE
VARIETY OF HUMAN CHARACTERISTICS - HEIGHT, WEIGHT, EYELASH LENGTH
AMONG OTHERS. HE FOUND THAT MANY OF THESE COULD BE FITTED TO A
NORMAL CURVE DISTRIBUTION.[19]
GALTON SUBMITTED A PAPER TO NATURE IN 1907 ON THE USEFULNESS OF THE
MEDIAN.[20] HE EXAMINED THE ACCURACY OF 787 GUESSES OF THE WEIGHT OF AN
OX AT A COUNTRY FAIR. THE ACTUAL WEIGHT WAS 1208 POUNDS: THE MEDIAN
GUESS WAS 1198. THE GUESSES WERE MARKEDLY NON-NORMALLY DISTRIBUTED.
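Galton's result is easy to replay in miniature: simulate many noisy, skewed guesses around a true weight and take the median. The error distribution below is an invented stand-in for the fairgoers' guesses:

```python
# Wisdom of crowds: the median of many skewed guesses lands near the truth.
import numpy as np

rng = np.random.default_rng(7)
true_weight = 1208                                        # pounds, as in Galton's data
guesses = true_weight + rng.gumbel(-30, 40, size=787)     # markedly non-normal errors

print(f"median of 787 guesses: {np.median(guesses):.0f}") # close to 1208 (Galton: 1198)
```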
30. AGRICULTURE
THE SECOND WAVE OF MATHEMATICAL STATISTICS WAS PIONEERED BY RONALD
FISHER WHO WROTE TWO TEXTBOOKS, STATISTICAL METHODS FOR RESEARCH
WORKERS, PUBLISHED IN 1925 AND THE DESIGN OF EXPERIMENTS IN 1935, THAT
WERE TO DEFINE THE ACADEMIC DISCIPLINE IN UNIVERSITIES AROUND THE
WORLD. HE ALSO SYSTEMATIZED PREVIOUS RESULTS, PUTTING THEM ON A FIRM
MATHEMATICAL FOOTING. HIS SEMINAL 1918 PAPER, THE CORRELATION BETWEEN RELATIVES ON THE SUPPOSITION OF MENDELIAN INHERITANCE, MADE THE FIRST USE OF THE STATISTICAL TERM VARIANCE. IN 1919, AT ROTHAMSTED
EXPERIMENTAL STATION HE STARTED A MAJOR STUDY OF THE EXTENSIVE
COLLECTIONS OF DATA RECORDED OVER MANY YEARS. THIS RESULTED IN A
SERIES OF REPORTS UNDER THE GENERAL TITLE STUDIES IN CROP VARIATION. IN
1930 HE PUBLISHED THE GENETICAL THEORY OF NATURAL SELECTION WHERE HE
APPLIED STATISTICS TO EVOLUTION.
31. MEDICINE, RELIABILITY, AND JURISPRUDENCE
• THE TERM BAYESIAN REFERS TO THOMAS BAYES (1702–1761), WHO PROVED A SPECIAL CASE
OF WHAT IS NOW CALLED BAYES' THEOREM. HOWEVER IT WAS PIERRE-SIMON LAPLACE (1749–
1827) WHO INTRODUCED A GENERAL VERSION OF THE THEOREM AND APPLIED IT TO
CELESTIAL MECHANICS, MEDICAL STATISTICS, RELIABILITY, AND JURISPRUDENCE.[52].
• AN INTERESTING READ: http://blogs.scientificamerican.com/cross-check/are-brains-bayesian/
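Bayes' theorem itself fits in a few lines. Tying it back to the spam example from the earlier slides, this sketch computes P(spam | word) from invented base rates:

```python
# Bayes' theorem: P(spam | word) = P(word | spam) * P(spam) / P(word).
p_spam      = 0.20    # prior: share of email that is spam (invented)
p_word_spam = 0.60    # P("prize" appears | spam)          (invented)
p_word_ham  = 0.05    # P("prize" appears | ham)           (invented)

p_word = p_word_spam * p_spam + p_word_ham * (1 - p_spam)   # total probability
p_spam_given_word = p_word_spam * p_spam / p_word
print(f"P(spam | 'prize') = {p_spam_given_word:.2f}")        # 0.75 with these numbers
```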
32. A QUICK HISTORY
Time                 | Contributor         | Contribution
Ancient Greece       | Philosophers        | ideas - no quantitative analyses
17th Century         | Graunt, Petty       | studied affairs of state, vital statistics of populations
17th Century         | Pascal, Bernoulli   | studied probability through games of chance, gambling
18th Century         | Laplace, Gauss      | normal curve, regression through study of astronomy
19th Century         | Quetelet            | astronomer who first applied statistical analyses to human biology
19th Century         | Galton              | studied genetic variation in humans (used regression and correlation)
20th Century (early) | Pearson             | studied natural selection using correlation, formed first academic department of statistics, Biometrika journal, helped develop the Chi Square analysis
20th Century (early) | Gossett (Student)   | studied the process of brewing, alerted the statistics community about problems with small sample sizes, developed Student's test
20th Century (early) | Fisher              | evolutionary biologist - developed ANOVA, stressed the importance of experimental design
20th Century (later) | Wilcoxon            | biochemist who studied pesticides, developed the non-parametric equivalent of the two-sample test
20th Century (later) | Kruskal, Wallis     | economists who developed the non-parametric equivalent of the ANOVA
20th Century (later) | Spearman            | psychologist who developed a non-parametric equivalent of the correlation coefficient
20th Century (later) | Kendall             | statistician who developed another non-parametric equivalent of the correlation coefficient
20th Century (later) | Tukey               | statistician who developed a multiple comparisons procedure
20th Century (later) | Dunnett             | biochemist who studied pesticides, developed a multiple comparisons procedure for control groups
20th Century (later) | Keuls               | agronomist who developed a multiple comparisons procedure
20th Century (later) | Computer Technology | provided many advantages over calculations by hand or by calculator, stimulated the growth of investigation into new techniques
http://www.anselm.edu/homepage/jpitocch/biostatstime.html
33. SO HOW DO YOU DECIDE WHICH ANALYTIC TOOLS TO USE?
WELL, ACTUALLY, THAT IS THE WRONG QUESTION.
PROCESS IS MUCH MORE IMPORTANT THAN THE TOOLS. THE TOOL/S SHOULD SUPPORT THE PROCESS.
34. TO GAIN BUSINESS UNDERSTANDING / SCOPE AND PLANNING / PLAN
THERE IS NO TOOL FOR THIS:
YOU NEED TO RESEARCH:
• UNDERSTAND THE CONTEXT OF YOUR INVESTIGATION
• UNDERSTAND WHAT IS IMPORTANT TO THE BUSINESS/AGENCY/ORG.
• WHAT HAS GONE BEFORE
• WHAT MIGHT BE DONE DIFFERENTLY
• WAS THE INFORMATION YOU HAD ACCESS TO VALID INPUT?
37. DATA UNDERSTANDING/DISCOVERY
THERE ARE SEVERAL TOOLS THAT CAN HELP YOU HERE:
MOST SITES THESE DAYS HAVE REPORTS AND BUSINESS INTELLIGENCE DASHBOARDS THAT WILL GIVE YOU AN INSIGHT INTO HOW A BUSINESS/AGENCY/ORG SEES ITSELF. GAIN AS MUCH INSIGHT AS YOU CAN FROM THESE EXISTING PRODUCTS, BUT DON’T ACCEPT THAT THEY ARE THE FULL STORY – THEY ARE NOT.
EXCEL: USE PIVOT TABLES AND CHARTING TO GAIN A BASIC UNDERSTANDING (SEE THE PANDAS SKETCH AFTER THE TOOL LIST BELOW).
OTHER COMMON TOOLS ARE:
• SAS/VA
• TABLEAU
• QLIK
• SPSS
• SQL
• STATISTICA
• MATLAB
• ETC
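For those working outside Excel, here is the same pivot-and-describe step in pandas; the file name and column names are hypothetical:

```python
# A basic data-understanding pass: pivot a table and eyeball the distribution.
import pandas as pd

df = pd.read_csv("sales.csv")        # assumed columns: region, month, amount

summary = df.pivot_table(index="region", columns="month",
                         values="amount", aggfunc="sum")
print(summary)                       # the Excel pivot, in code
print(df["amount"].describe())       # quick check of the value distribution
```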
38. MODELLING
USE THE APPROPRIATE TOOL FOR YOUR INVESTIGATION.
• USE THE APPROPRIATE DATA
• USE AN APPROPRIATE METHOD
• ITERATE AND CHECK THAT YOUR RESULTS MAKE SENSE IN THE CONTEXT OF THE COLLECTION, AND THE QUESTION YOU ARE LOOKING TO ANSWER
• AS THE SAYING GOES – TO A CARPENTER THE SOLUTION TO EVERYTHING LOOKS LIKE A NAIL.
• ALL ANALYSTS HAVE THEIR BENT TOWARDS PARTICULAR TOOLS – MY BENT IS TOWARD THE MODELING TECHNIQUES USED IN THE SOCIAL SCIENCES BECAUSE THAT IS WHAT I STUDIED. BE AWARE OF THE LIMITS OF YOUR FAVOURITE TOOLS AND BE WILLING TO LEARN NEW TRICKS.
39. EVALUATING/CHECK/VALIDATION
HAVE A PROCESS AND STANDARDS FOR YOUR ENVIRONMENT THAT LAY OUT THE RULES FOR EVALUATING YOUR MODEL. THE STANDARD WILL DEPEND ON THE TOOLS THAT YOU USE. MOST TECHNIQUES, SUCH AS CORRELATION, ANOVA, REGRESSION, ETC., HAVE WELL-UNDERSTOOD METHODS OF EVALUATION. HOWEVER, CHECK THE WHOLE PROCESS, AND IF YOU WANT TO USE THE MODEL, HAVE YOUR TEAM REVIEW IT AS WELL. WE ALL LIKE TO THINK WE NEVER MAKE MISTAKES; UNFORTUNATELY THAT IS NEVER TRUE.
MAKE SURE THAT THE PROCESS INCLUDES SOME SANITY-CHECK METHODS, E.G. THAT THE NUMBER OF ROWS/OBSERVATIONS THAT WERE READ IS WHAT YOU EXPECTED.
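One way to make those sanity checks part of the process is to fail loudly in the pipeline itself. The file, columns, and expected counts below are illustrative:

```python
# Sanity checks built into a model pipeline: stop if the data are not as expected.
import pandas as pd

df = pd.read_csv("model_input.csv")  # hypothetical model input extract

EXPECTED_ROWS = 250_000
assert len(df) == EXPECTED_ROWS, f"read {len(df):,} rows, expected {EXPECTED_ROWS:,}"
assert df["customer_id"].is_unique, "duplicate customers in model input"
assert df["amount"].ge(0).all(), "negative amounts should not reach the model"
```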
40. DEPLOYMENT/ACT
AFTER EVALUATING AND VALIDATING YOUR MODEL, IT IS OFTEN ‘DEPLOYED’ TO OPERATIONAL SYSTEMS AND REPORTS.
SCORING: OFTEN THE OUTPUT OF THE MODEL WILL BE A SCORE THAT IS USED AS INPUT TO OPERATIONAL SYSTEMS, E.G. ESTIMATES OF FINANCIAL RISK, TRAVEL TIME, FUEL CONSUMPTION, RESOURCE REQUIREMENTS, AND MEDICAL OUTCOMES.
PARAMETER TO REPORTING: INTEGRATION INTO BUSINESS INTELLIGENCE DASHBOARDS AND REGULAR MANAGEMENT INFORMATION SYSTEM REPORTS.
OPERATIONAL SYSTEMS: MODELS PROVIDE INPUT TO ALL MANNER OF OPERATIONAL SYSTEMS, RANGING FROM PRODUCTION CONTROL PROCESSING TO LOGISTICS, FRAUD DETECTION, SYSTEMS MANAGEMENT, AND TRAFFIC CONTROL.
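A minimal deployment sketch, assuming a scikit-learn-style model serialised after validation; the file name, feature names, and record layout are all assumptions:

```python
# Scoring in an operational system: load a validated model, score one record.
import pickle

with open("risk_model.pkl", "rb") as f:     # model saved after evaluation/validation
    model = pickle.load(f)

def score(record: dict) -> float:
    """Return the model's risk score for one operational record."""
    features = [[record["age"], record["balance"], record["tenure_months"]]]
    return float(model.predict_proba(features)[0, 1])   # probability of the event

# The score then feeds a dashboard, a report parameter, or a workflow rule.
print(score({"age": 42, "balance": 13_500.0, "tenure_months": 18}))
```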
41. BUSINESS UNDERSTANDING/REPORT
ALL ANALYTICS IS UNDERTAKEN WITHIN A GIVEN CONTEXT. IN A RESEARCH CONTEXT A PAPER WILL BE THE OUTCOME, WITH AN ABSTRACT, BACKGROUND, METHODS, RESULTS, AND CONCLUSION. IN A COMMERCIAL SETTING ANY FINDINGS (IN MY LIMITED EXPERIENCE) ARE REPORTED IN A VERY SIMILAR MANNER. REGARDLESS, THE OUTCOMES OF THE ANALYTICS PROCESS SHOULD BE DOCUMENTED AND ADDED TO THE COLLECTIVE STORE OF BUSINESS KNOWLEDGE AT YOUR SITE.
43. MONITOR/REVIEW/REPEAT
KNOWLEDGE IS NOT STATIC. THERE ARE THE THINGS YOU KNOW ARE GOING TO HAPPEN, AND THERE ARE NEW FACTORS THAT YOU WILL NOT HAVE THOUGHT OF.
AS GEORGE BOX PUT IT: ‘FOR SUCH A MODEL THERE IS NO NEED TO ASK THE QUESTION "IS THE MODEL TRUE?". IF "TRUTH" IS TO BE THE "WHOLE TRUTH" THE ANSWER MUST BE "NO". THE ONLY QUESTION OF INTEREST IS "IS THE MODEL ILLUMINATING AND USEFUL?"’
IN SHORT: ‘ALL MODELS ARE WRONG, SOME ARE USEFUL.’
MONITOR: COMPARE THE ACTUAL PERFORMANCE OF THE MODELS YOU PRODUCE AGAINST EXPECTED/PLANNED PERFORMANCE. BE PREPARED TO PROCEED WITH A PROCESS OF CONTINUAL IMPROVEMENT.
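In that spirit, a monitoring step can be as simple as comparing live performance with the validation benchmark and flagging drift for review; the threshold and figures here are illustrative:

```python
# Monitor a deployed model: flag it for review when performance drifts.
def check_drift(live_accuracy: float, validated_accuracy: float,
                tolerance: float = 0.05) -> str:
    """Compare actual performance against expected/planned performance."""
    drift = validated_accuracy - live_accuracy
    if drift > tolerance:
        return f"REVIEW: accuracy down {drift:.1%} versus the validation benchmark"
    return "OK: model performing within tolerance"

print(check_drift(live_accuracy=0.78, validated_accuracy=0.86))  # flags a review
```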
44. CONCLUSION
WHAT IS ANALYTICS? ANY DATA-DRIVEN PROCESS THAT PROVIDES INSIGHT.
WHERE DOES IT (ANALYTICS) FIT? ANYWHERE YOU NEED TO MAKE A DECISION.
45. A FEW ANALYTICS TOOLS
• THE 40 DATA SCIENCE TECHNIQUES
1 Linear Regression
2 Logistic Regression
3 Jackknife Regression *
4 Density Estimation
5 Confidence Interval
6 Test of Hypotheses
7 Pattern Recognition
8 Clustering (aka Unsupervised Learning)
9 Supervised Learning
10 Time Series
11 Decision Trees
12 Random Numbers
13 Monte-Carlo Simulation
14 Bayesian Statistics
15 Naive Bayes
16 Principal Component Analysis (PCA)
17 Ensembles
18 Neural Networks
19 Support Vector Machine (SVM)
20 Nearest Neighbors (k-NN)
21 Feature Selection (aka Variable Reduction)
22 Indexation / Cataloguing *
23 (Geo-) Spatial Modelling
24 Recommendation Engine *
25 Search Engine *
26 Attribution Modelling *
27 Collaborative Filtering *
28 Rule System
29 Linkage Analysis
30 Association Rules
31 Scoring Engine
32 Segmentation
33 Predictive Modelling
34 Graphs
35 Deep Learning
36 Game Theory
37 Imputation
38 Survival Analysis
39 Arbitrage
40 Lift Modelling
41 Yield Optimization
42 Cross-Validation
43 Model Fitting
46. SOME THINGS TO CHECK OUT
INFORMATIVE
HTTP://WWW.KDNUGGETS.COM
HTTP://WWW.PREDICTIVEANALYTICSTODAY.COM/DEPLOYMENT-PREDICTIVE-MODELS
BOOKS
HTTP://SHOP.OREILLY.COM/CATEGORY/EBOOKS.DO
COOL
HTTPS://RAPIDMINER.COM
XPATH CAPABILITIES FOR WEB SCRAPING USING GOOGLE DOCS
HTTP://NODEXL.CODEPLEX.COM
HTTPS://D3JS.ORG
HTTP://WWW.FACULTY.UCR.EDU/~HANNEMAN/NETTEXT/ (SOCIAL NETWORK ANALYSIS)
HTTPS://WWW.KAGGLE.COM
EDUCATION – CHEAP AND AT WHATEVER PACE YOU WANT TO TAKE
HTTPS://WWW.UDEMY.COM
ACADEMIC EDUCATION
IN CANBERRA BOTH THE ANU AND CU HAVE GOOD COURSES
AND
USQ HAS EXCELLENT COURSES – AS DO A LOT OF OTHERS
I WAS ASKED WHO TO FOLLOW ON TWITTER. TRY SEARCHING FOR DATA SCIENCE AND ANALYTICS
AND CHOOSE WHO TO FOLLOW. ALSO FOLLOW THE JOURNALS: NATURE AND OTHERS.
Editor's Notes
Hackers and Analysts have a lot in common: they are both curious and looking for patterns. Hackers can be lawful or unlawful; the unlawful aspects draw more interest. Not sure quite where WikiLeaks sits in that. Use your own judgement.
Moneyball. You know analytics is mainstream when Brad Pitt plays the part of an analyst.
Analysts find patterns. PS D3 is cool.
Knowledge increases.
Our understanding of everything changes over time. Therefore it is imperative for the good analyst to show how they came to the “knowledge” outcome so that the knowledge can be improved over time.
This is one level of analysis. So far the world was supposed to end in 2012, 2015, in May this year, then June, and August.
If they are talking about Planet X: its orbit will pass well beyond Pluto, so no need for a bunker just yet. The closest thing is asteroid 2016 HO3. http://neo.jpl.nasa.gov/news/news192.html
The Hedgehog and the Fox is an essay by philosopher Isaiah Berlin. It was one of Berlin's most popular essays with the general public. Berlin himself said of the essay: "I never meant it very seriously. I meant it as a kind of enjoyable intellectual game, but it was taken seriously. Every classification throws light on something."[1]
Berlin expands upon this idea to divide writers and thinkers into two categories: hedgehogs, who view the world through the lens of a single defining idea (examples given include Plato, Lucretius, Dante, Pascal, Hegel, Dostoevsky, Nietzsche, Ibsen, Proust, and Fernand Braudel) and foxes who draw on a wide variety of experiences and for whom the world cannot be boiled down to a single idea (examples given include Herodotus, Aristotle, Erasmus, Shakespeare, Montaigne, Molière, Goethe, Pushkin, Balzac, Joyce, Anderson).
In his 2012 New York Times best-selling book The Signal and the Noise, forecaster Nate Silver urges readers to be "more foxy" after summarizing Berlin's distinction. He cites the work of Philip Tetlock on the accuracy of political forecasts in the United States during the Cold War while he was a professor of political science at the University of California, Berkeley. Silver's news website fivethirtyeight.com, when launched in March 2014, also adopted the fox as its logo "as an allusion to" Archilochus' original work.[7]
The mythical Data Scientist is considered a blend of many talents:
Math, Stats, Algorithms;
Software Engineering; and
Data Communications
The term Data Analyst is a blend of Data Communications (communicating results – in simpler terms, research) and Maths, Stats, etc.
I am qualified, industry certified, and experienced in all of these skills: Math, Stats, Algorithms, Research (Data Communications). Still I would not call myself a Data Scientist.
This is what a Data Analyst Job Specification will look like.
The bulk of what I have been asked to do has been Reporting; some Segmentation, a little bit of trend analysis, and even less predictive modelling.
In Canberra this is the usual domain of the Data Analyst.
This is the primary domain of agency reporting: the ABS and the AIHW.
Trend Analysis is generally an outcome of Time Series analysis.
Segmentation is usually a term used in association with marketing; however, the same techniques can be used to segment population groups that are targeted by social policy.
Detection modelling: signal detection theory is a means to quantify the ability to discern between information-bearing patterns (called stimulus in living organisms, signal in machines) and random patterns that distract from the information (called noise, consisting of background stimuli and random activity of the detection machine and of the nervous system of the operator). In the field of electronics, the separation of such patterns from a disguising background is referred to as signal recovery.[1]
Machine learning is a subfield of computer science[1] (more particularly soft computing) that evolved from the study of pattern recognition and computational learning theory in artificial intelligence.[1] In 1959, Arthur Samuel defined machine learning as a "Field of study that gives computers the ability to learn without being explicitly programmed".[2] Machine learning explores the study and construction of algorithms that can learn from and make predictions on data.[3] Such algorithms operate by building a model from an example training set of input observations in order to make data-driven predictions or decisions expressed as outputs,[4]:2 rather than following strictly static program instructions.
Predictive analytics encompasses a variety of statistical techniques from predictive modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future or otherwise unknown events.[1][2]
Predictive analytics is used in actuarial science,[4] marketing,[5] financial services,[6] insurance, telecommunications,[7] retail,[8] travel,[9] healthcare,[10] child protection,[11][12] pharmaceuticals,[13] capacity planning[citation needed] and other fields.
One of the most well known applications is credit scoring,[1] which is used throughout financial services. Scoring models process a customer's credit history, loan application, customer data, etc., in order to rank-order individuals by their likelihood of making future credit payments on time.
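A minimal sketch in R of the rank-ordering a scoring model performs, using logistic regression; the data frame and variables are hypothetical:

  # Rank-order applicants by modelled likelihood of paying on time.
  fit <- glm(paid_on_time ~ income + credit_history_len + prior_defaults,
             data = applications, family = binomial)
  applications$score <- predict(fit, type = "response")
  head(applications[order(-applications$score), ])   # best risks first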
The simplest measure used in analytics is a count.
Why do you need to count?
To know who, and how much tax, you should expect (Caesar, William the Conqueror, and the Incas).
These days the counts are used in many different ways:
Planning for education, health, transport, and financial expectations.
The mean is the average, one of the most useful and misused measures of all.
The median is the midpoint. Often in Australia we talk about the median house price of, say, $500K in Canberra. This median price means half the houses were less expensive and half were more. The median is a very powerful tool.
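A tiny illustration in R of why the median is often the better summary for house prices (the figures are made up):

  prices <- c(420, 450, 500, 530, 2400)   # hypothetical sale prices, $K
  mean(prices)     # 860 - dragged up by the single expensive sale
  median(prices)   # 500 - half sold for less, half for more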
Adolphe Quetelet (1796–1874), another important founder of statistics, introduced the notion of the "average man" (l'homme moyen) as a means of understanding complex social phenomena such as crime rates, marriage rates, and suicide rates.[14]
A detailed account of the method is at http://science.larouchepac.com/gauss/ceres/InterimII/Astronomy/KeplerProblem.html
The Method of Least Squares is a simple linear regression.
Legendre:
“Of all the principles which can be proposed for [making estimates from a sample], I think there is none more general, more exact, and more easy of application, than that of which we have made use… which consists of rendering the sum of the squares of the errors a minimum.”
Non-linear Regression – the basis of the method is to approximate the model by a linear one and to refine the parameters by successive iterations.
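A minimal sketch of that iterative refinement in R using nls(), which by default refines the parameters by Gauss-Newton iterations; the exponential model and data are illustrative only:

  set.seed(1)
  x <- 1:20
  y <- 5 * exp(0.2 * x) + rnorm(20)            # simulated observations
  fit <- nls(y ~ a * exp(b * x),               # non-linear model
             start = list(a = 2, b = 0.15))    # starting guesses matter
  coef(fit)                                    # refined estimates of a and b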
In raw score form the regression equation is:
Y = a + B1X1 + B2X2 + ... + BkXk + e
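A minimal sketch of fitting that raw-score equation by least squares in R, with two predictors; 'mydata' and its columns are hypothetical:

  # Fit Y = a + B1*X1 + B2*X2 + e by ordinary least squares.
  fit <- lm(Y ~ X1 + X2, data = mydata)
  summary(fit)   # a is the intercept; B1 and B2 are the slope coefficients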
Regression for fitting a "true relationship": in standard regression analysis that leads to fitting by least squares, there is an implicit assumption that errors in the independent variable are zero or strictly controlled so as to be negligible. When errors in the independent variable are non-negligible, models of measurement error can be used; such methods can lead to parameter estimates, hypothesis testing and confidence intervals that take into account the presence of observation errors in the independent variables.
Nate Silver took all the public polls, weighted the results based on previous bias, and successfully called the previous presidential election well before anyone else.
Note: at this time the betting markets have Donald Trump at only a 26% chance of winning.
Business Analysis
To investigate business systems, taking a holistic view of the situation. This may include examining elements of the organisation structures and staff development issues as well as current processes and IT systems.
To evaluate actions to improve the operation of a business system. Again, this may require an examination of organisational structure and staff development needs, to ensure that they are in line with any proposed process redesign and IT system development.
To document the business requirements for the IT system support using appropriate documentation standards.
Research: understand the context of your research through a “Literature Review”.
I.e. what do we already know about the subject matter of interest/question?
Does the available literature stand up? I.e. does the question make sense; was the data used valid for the purpose; was the method valid and appropriate; does the conclusion/finding stand up; is it repeatable? That last one is important. Much of what medical researchers conclude in their studies is misleading, exaggerated, or flat-out wrong. So why are doctors, to a striking extent, still drawing upon misinformation in their everyday practice? Dr. John Ioannidis has spent his career challenging his peers by exposing their bad science, and his findings were found to be repeatable across the whole spectrum of the sciences. This is almost certainly true of analytics in the workplace. A number of times I have been asked to work out why a researcher was getting different numbers to the organisation. My first question to the researcher was: where did you get your data from? Invariably they would point me to some dataset with a name like FY9697A112. OK, where did that come from and how was it derived? In all cases we would quickly reach a point where they would give up. They could not show the provenance of their data; therefore the process they had gone through had no meaning or value.
Also beware the Business Owner who comes to you saying they want to prove ‘XXXX’, or ‘we don’t like those numbers’.
In all the cases I have worked on in the last 30 years, I have been accessing existing administrative data.
Almost always the bulk of this already exists. If you are told it doesn’t exist, start building; however, keep an eye open, because it almost certainly does and you don’t want to rebuild all this if it does exist. Always use existing data if possible. Recognise that all collections have issues and are a work in progress. However, it is almost always better to fix the existing than to start anew.
90% of any analytics is, or should be, establishing a well-understood and documented datastore where any measure can be built from first principles. If you can’t build from first principles, then don’t start any analysis.
Data understanding has been known by many names: data discovery, data exploration.
In the SAS world this normally means a PROC FREQ and a PROC SUMMARY.
In R: describe() (e.g. from the Hmisc package) or base summary().
In SPSS: FREQUENCIES, DESCRIPTIVES, EXAMINE.
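A quick sketch of that first exploration pass in R; 'dat' and its columns are hypothetical:

  summary(dat)          # ranges, means, medians, and NA counts per column
  table(dat$state)      # frequency table for a categorical column
  aggregate(amount ~ state, data = dat, FUN = mean)   # grouped summary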
As always, document your review and add to your standards if required; however, keep it brief. Reviews are not a place where bullying should be tolerated. People learn best in an environment where they can make mistakes and recover. A review is a means for learning, for imparting and gaining knowledge, and the process is shared by the team to build the team’s coherence.
No-one can be good at all these methods. I know a few.