A data analytics portfolio demonstrates applied technical skills, a thorough grasp of statistical analysis, and strong data visualization abilities.
Statistics can be used to describe patterns but need context to avoid being misleading. While averages, measures of spread, and probabilities help summarize data, graphs are better to show trends over time. Pie charts, bar graphs, and maps can effectively visualize data geographically or by category when formatted properly. Advanced statistical software and websites provide cutting-edge tools for analysis and interactive graphics but improper use can result in poor statistical reasoning.
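As a minimal sketch of how an average alone can mislead without context, here is a stdlib-only Python example with invented salary figures: one outlier pulls the mean far above what a typical value looks like, while the median resists it.

```python
import statistics

# Hypothetical salaries: one executive outlier pulls the mean far above
# what a "typical" employee earns, while the median resists it.
salaries = [42_000, 45_000, 47_000, 48_000, 50_000, 52_000, 400_000]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
spread = statistics.stdev(salaries)  # sample standard deviation

print(f"mean={mean:,.0f}  median={median:,.0f}  stdev={spread:,.0f}")
```

Reporting only the mean here would suggest a typical salary near double what most people in the list actually earn, which is exactly the kind of context-free summary the paragraph above warns about.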
BIG DATA | How to explain it & how to use it for your career? (Tuan Yang)
If you ask people what BIG DATA is, they often say it is about a lot of data. But the world has ALWAYS had a lot of data. It is about datafication – a word so new that even spellcheckers don't recognize it as a real word!
Learn more about:
» How BIG DATA changes the career paths of even the most unsuspecting
» How BIG DATA changes the way business decisions are made
» How BIG DATA changes who makes those decisions & the reshuffle of the balance of power it causes
» What BIG DATA skills you can bring to the office tomorrow to increase your value to the firm
A presentation on the uses & misuses of data, with illustrations & examples, given at the Numis Securities Media Conference in London in April 2011
The Art of Storytelling Using Data Science (Gramener)
Gramener's VP - Sales, APAC Region, Vijayam Sirikonda interacted with the students of IIM Raipur and talked about the importance of data storytelling for business users.
This document outlines an analysis of health insurance rate data from Healthcare.gov to identify key factors that influence individual rates. The analysis included downloading nationwide data from Healthcare.gov, selecting Delaware data, cleaning the data, and performing various analyses including decision trees, partial least squares, and neural networks. The analysis found that age, insurance plan version number (whether a plan was marked up or down), and insurance issuer were the most significant factors in determining individual health insurance rates in Delaware.
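The study's decision-tree step can be illustrated in miniature. The sketch below is not the study's actual pipeline: it is a single-split regression "stump" on invented toy records (the column names and values are placeholders) that picks whichever feature best explains the rate, the same criterion a full tree applies recursively.

```python
# Illustrative decision-tree stump: pick the feature whose best single
# split most reduces squared error in the target. Toy data only.

def sse(ys):
    """Sum of squared errors around the mean."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(rows, target, feature):
    """Lowest post-split SSE over all thresholds of `feature`."""
    best = float("inf")
    values = sorted({r[feature] for r in rows})
    for t in values[:-1]:
        left = [r[target] for r in rows if r[feature] <= t]
        right = [r[target] for r in rows if r[feature] > t]
        best = min(best, sse(left) + sse(right))
    return best

# Invented records: the rate rises mainly with age, weakly with plan version.
rows = [
    {"age": 25, "plan_version": 1, "rate": 210},
    {"age": 35, "plan_version": 2, "rate": 260},
    {"age": 45, "plan_version": 1, "rate": 330},
    {"age": 55, "plan_version": 2, "rate": 420},
    {"age": 64, "plan_version": 1, "rate": 510},
]

scores = {f: best_split(rows, "rate", f) for f in ("age", "plan_version")}
most_significant = min(scores, key=scores.get)
print(most_significant)  # a real tree would repeat this split recursively
```

On this toy data the stump selects age, mirroring the study's finding that age dominated the rate; a production analysis would of course use a full tree library and the real Delaware data.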
Hedge Fund case study solution - Credit default swaps execution system and Gr... (Naveen Kumar)
I designed the entire end-to-end trading architecture of a hedge fund.
This included the execution system integrating the fund with credit default swap capabilities, and a solution to the hedge fund's liquidity constraint in moving funds across countries.
Graphic Representation Grading Guide COMTM541 Version 22.docx (whittemorelucilla)
The document provides a grading guide for an assignment that requires students to create two graphs or tables from the same data set that tell different stories. The guide outlines the criteria that will be used to evaluate the students' work, including choosing an appropriate data source, creating graphs that illustrate opposing stories, comparing the results, and addressing ethical implications. It also provides guidelines for formatting, citations, grammar, and other writing conventions. The overall purpose is for students to think critically about how data can be represented in a way that tells different or misleading stories, and to analyze graphs by considering the motivation and perspective of their creators.
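The assignment's core idea, that the same numbers can tell opposing stories, can be shown numerically. In this hedged sketch (invented figures), the only thing that changes between the "two graphs" is where the y-axis starts:

```python
# Same two data points; only the y-axis baseline differs between "charts".
old, new = 100.0, 105.0

def visual_ratio(a, b, axis_min):
    """Apparent bar-height ratio when the y-axis starts at axis_min."""
    return (b - axis_min) / (a - axis_min)

honest = visual_ratio(old, new, axis_min=0)      # bars look nearly equal
truncated = visual_ratio(old, new, axis_min=95)  # same 5% gain looks doubled
print(f"full axis: {honest:.2f}x  truncated axis: {truncated:.2f}x")
```

A 5% increase drawn on a full axis looks like a 1.05x bar; drawn on an axis starting at 95 it looks like a 2x bar, which is one concrete way a chart's creator's motivation shows up in its design.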
Heavy, messy, misleading. Why Big Data is a human problem, not a technology one. (Francesco D'Orazio)
"Big data" has been around for a few years now but for every hundred people talking about it there’s probably only one actually doing it. As a result Big Data has become the preferred vehicle for inflated expectations and misguided strategy.
As always, language holds the key and the seed of the issue is reflected in the expression itself. "Big Data" is not so much about a quality of the data or the tools to mine it, it’s about a new approach to product, policy or business strategy design. And that’s way harder and trickier to implement than any new technology stack.
In this talk I look at where Big Data is going, what the real opportunities, limitations and dangers are, and what we can do to stop talking about it and start doing it today.
Statistics is a mathematical science comprising methods of collecting, organizing, and analyzing data in such a way that meaningful conclusions can be drawn from them. In general, its investigations and analyses fall into two broad categories: descriptive and inferential statistics.
AMES 2016 - The Human Side of Analytics (Stephen Tracy)
The document provides 10 tips for analytics success. It discusses the importance of asking good questions to gain insights, thinking long-term about building an analytics program, starting with investing in people over technology, seeking truth over validating preconceptions from data, understanding data limitations, ensuring ownership of the analytics function, investing in storytellers to communicate insights, finding meaningful ways to visualize data, and transforming data into actionable insights.
This document discusses various visualizations and analyses created in TIBCO Spotfire using different public datasets. Some key points:
- Maps showing park bench locations in Rostock, Germany and combining layers from different WMS resources.
- Unemployment trends in Germany by gender with Holt-Winters forecasting.
- How Germans voted in the 2014 European elections.
- K-means clustering distinguishing east and west German states using census housing data.
- Hierarchical clustering of BI product usage patterns from a survey.
- Scraping privacy data from a website directly into Spotfire using import.io.
- Double WMS layer map showing German population density overlaid on rivers.
- Text
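The K-means step in the list above can be sketched with the standard library alone. This 1-D toy with invented values is only a stand-in for the multivariate census housing data the Spotfire analysis actually clustered:

```python
# Minimal 1-D k-means (k = 2) with deterministic initialization.
def kmeans_1d(points, iters=20):
    centers = [min(points), max(points)]  # the two extremes as starting centers
    clusters = [[], []]
    for _ in range(iters):
        clusters = [[], []]
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Two synthetic groups, loosely standing in for "east" and "west" states.
values = [0.31, 0.35, 0.38, 0.40, 0.71, 0.74, 0.78, 0.82]
centers, clusters = kmeans_1d(values)
print(sorted(round(c, 2) for c in centers))
```

The algorithm alternately assigns each point to its nearest center and moves each center to its cluster's mean, which is exactly what a tool like Spotfire does under the hood, just in more dimensions.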
Cutting Edge Predictive Analytics with Eric Siegel (Databricks)
Apache Spark empowers predictive analytics and machine learning by increasing the reach and potential. But, before jumping to new deployments, it’s critical we 1) get the analytics right and 2) not overlook less conspicuous business opportunities. In this keynote, Predictive Analytics World founder and “Predictive Analytics” author Eric Siegel ramps you up on a dangerous pitfall and a critical value proposition:
– PITFALL: Avoiding BS predictive insights, i.e., “bad science,” spurious discoveries
– OPPORTUNITY: Optimizing marketing persuasion by predicting the *influence* of marketing treatments, i.e., uplift modeling
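Uplift modeling, the opportunity named above, is often introduced via the "two-model" approach: score each segment by the difference between treated and control response rates, then target only segments the treatment actually persuades. A hedged sketch with invented counts:

```python
# Two-model uplift sketch: estimated lift = treated rate - control rate.
segments = {
    # segment: (treated_buyers, treated_total, control_buyers, control_total)
    "persuadables": (120, 1000, 40, 1000),
    "sure_things": (300, 1000, 295, 1000),
    "do_not_disturb": (50, 1000, 90, 1000),
}

def uplift(tb, tn, cb, cn):
    """Estimated lift in response rate caused by the treatment."""
    return tb / tn - cb / cn

scores = {seg: uplift(*counts) for seg, counts in segments.items()}
targets = [seg for seg, s in scores.items() if s > 0]
print(scores, targets)
```

The point of uplift modeling is visible even in this toy: a raw response model would happily target the "sure things" (30% buy rate), but they would have bought anyway, while the "do not disturb" segment responds *worse* when treated.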
Big data provides an unprecedented opportunity to predict consumer behavior through the longitudinal and cross-sectional analysis of vast time series data. However, the inherent randomness of human behavior poses a limiting factor, and while marginal gains can be made through big data, breakthroughs may remain elusive as long as human behavior stays inconsistent, impulsive, and dynamic. The biggest impact of big data will be creating new areas like personalized medicine, improved customer service, and powering artificial intelligence through vast data analysis to understand and anticipate human behavior.
This document provides an overview of course materials for QNT/275 Statistics for Decision Making. It includes prompts for weekly assignments that involve defining statistics, distinguishing quantitative and qualitative data, describing data measurement levels and the role of statistics in business decision making. It also includes a business scenario and data set for analysis, as well as sample quiz questions covering topics like data types, measures of central tendency, probability, and random variables.
Data surrounds us. In business and personal life. At work and on the go. But how do we make sense of it, or more specifically, how do we allow others to make sense of it? Learn how to deliver data ... reports.
Storyfying your Data: How to go from Data to Insights to Stories (Gramener)
Gramener's Director - Client success, Shravan Kumar A, delivered an online session to the students of Praxis Business School.
In his session he talked about how converting data into stories can benefit businesses and enable quick decision making. Furthermore, he shared approaches to create data stories along with some use cases and case studies we solved at Gramener to benefit our clients.
Check out our initiative to teach data storytelling to data scientists and analysts so that they can think out of the box and create wonderful data stories for their stakeholders: https://gramener.com/data-storytelling-workshop
This document discusses the role of statistics in business decision making. It describes descriptive statistics, which presents data in a way that is easier to understand through charts and graphs. Descriptive statistics measures central tendency and the spread of data using metrics like mean, median, mode, range, and standard deviation. The document also covers inferential statistics, which analyzes data samples to estimate parameters and test hypotheses. Examples are given of how statistics are used in various business contexts like Wall Street analysis and clothing design to draw conclusions from raw data and inform future decisions.
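The descriptive/inferential split described above can be captured in one short stdlib sketch: first summarize the sample we have, then infer a rough 95% confidence interval for the unseen population mean (normal approximation, invented sample values).

```python
import statistics

# Illustrative sample, e.g. measurements of some business metric.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]

# Descriptive: summarize the data we have.
mean = statistics.mean(sample)
sd = statistics.stdev(sample)

# Inferential: estimate the unseen population mean from the sample.
n = len(sample)
margin = 1.96 * sd / n ** 0.5  # z-based margin; a t-multiplier is stricter
ci = (mean - margin, mean + margin)
print(f"mean={mean:.2f}, 95% CI ~ ({ci[0]:.2f}, {ci[1]:.2f})")
```

The first two lines of computation are descriptive statistics; the interval is the inferential step, generalizing from the sample to a parameter we cannot observe directly.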
WUD2008 - The Numbers Revolution and its Effect on the Web (Rich Miller)
The document discusses how the "numbers revolution" is affecting the web and user experience design through increased data collection and analysis. It covers how more data availability and analysis tools are enabling new types of applications for decision support, personalization, prediction and visualization. This is changing how people access and think about information by augmenting human cognition with computer analysis. The document provides many examples of current and emerging applications that utilize these approaches in areas like business, health, sports and media.
This document discusses the relevance and implications of forecasting retail deposits. Forecasting retail deposits involves analyzing macroeconomic data to build models that can accurately predict future deposit levels given economic conditions. Accurately forecasting deposits is important for banks to inform strategic planning and decisions around operations, technology, and infrastructure needs. The implications of deposit forecasting are discussed from social and philosophical perspectives, including how forecasting stems from humans' innate desire to understand and prepare for an uncertain future.
The most profitable insurance organizations will outperform competitors in key areas such as personalized customer service, claims processing, subrogation recovery, fraud detection and product innovation. This requires thinking beyond the traditional data warehouse to the data fabric, an emerging data management architecture.
In this webinar Andy Sohn, Senior Advisor at NewVantage Partners, and Bob Parker, Senior Director for Insurance at Cambridge Semantics, explore the role of the data discovery and integration layer in an enterprise data fabric for the Insurance industry. These are their slides.
Page 579 Assess the Constituent Data. What is included Omi.docx (bunyansaturnina)
Page 579
Assess the Constituent Data. What is included? Omitted? What are the data based on?
What assumptions are being made? Different retirement calculators give widely different
estimates of how much savings is needed for retirement because of factors they include or omit
(such as entertainment) and assumptions they make (such as inflation rate or healthiness of
annuities and mutual funds) in the calculations.
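As a quick numeric illustration of how much those assumptions matter, here is a hedged sketch (all figures invented) of the same savings plan projected with and without an inflation assumption:

```python
# Why calculators disagree: identical plan, different assumption sets.
def projected_balance(annual_saving, years, nominal_return, inflation=0.0):
    """Future balance in today's dollars, compounding yearly contributions."""
    real_rate = (1 + nominal_return) / (1 + inflation) - 1
    balance = 0.0
    for _ in range(years):
        balance = (balance + annual_saving) * (1 + real_rate)
    return balance

optimistic = projected_balance(10_000, 30, nominal_return=0.07)  # no inflation
conservative = projected_balance(10_000, 30, nominal_return=0.07, inflation=0.03)
print(f"{optimistic:,.0f} vs {conservative:,.0f}")
```

Changing a single assumption, 3% inflation versus none, cuts the projected purchasing power of the same plan by well over a third, which is why two reputable calculators can give such different answers.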
Two reputable sources can give different figures because they take their data from different
places. Suppose you wanted to know employment figures. The Labor Department’s monthly
estimate of nonfarm payroll jobs is the most popular, but some economists like Automatic Data
Processing’s monthly estimate, which is based on the roughly 20 million paychecks it processes
for clients. Both survey approximately 400,000 workplaces, but the Labor Department selects
employers to mirror the U.S. economy, while ADP’s sample is skewed, with too many
construction firms and too few of the largest employers. On the other hand, the government has
trouble counting jobs at businesses that are opening and closing, and some employers do not
return the survey. (Both organizations do attempt to adjust their numbers to compensate
accurately.)7
Check the Currency of the Data. Population figures should be from the 2010 census, not
the 2000 one. Technology figures in particular need to be current. Do remember, however, that
some large data sets are one to two years behind in being analyzed. Such is the case for some
government figures, also. If you are doing a report in 2014 that requires national education data
from the Department of Education, for instance, 2013 data may not even be fully collected. And
even the 2012 data may not be fully analyzed, so indeed the 2011 data may be the most current
available.
Hard to Quantify Sports Participation
How many people participate in sports, and which sports do they choose?
Governments and equipment makers want to know, but the data are fuzzy. Multiple
questions contribute to the lack of clarity.
What is a sport? One survey includes bird-watching.
Who should be counted? Do young children count?
How often do you have to participate in a sport to be counted? Is once a year enough?
How was the count made? Because younger and more active people tend to have only cell phones, a survey
made through landlines probably won’t be accurate.
In case you are curious, the National Sporting Goods Association survey says hiking is the most popular
participation sport in the United States, with over 40 million people.
Adapted from Carl Bialik, “Sports Results that Leave Final Score Unclear,” Wall Street Journal, June 9, 2012, A2.
Choosing the Best Data
Sometimes even good sources and authorities can differ on the numbers they offer, or on the
About Your Signature Assignment This signature assignment is de.docx (ransayo)
About Your Signature Assignment
This signature assignment is designed to align with specific program student learning outcome(s) in your program. Program Student Learning Outcomes are broad statements that describe what students should know and be able to do upon completion of their degree. The signature assignments might be graded with an automated rubric that allows the University to collect data that can be aggregated across a location or college/school and used for program improvements.
Purpose of Assignment
The purpose of this assignment is for students to synthesize the concepts learned throughout the course. This assignment will provide students an opportunity to build critical thinking skills, develop businesses and organizations, and solve problems requiring data by compiling all pertinent information into one report.
Assignment Steps
Resources: Microsoft Excel®, Signature Assignment Databases, Signature Assignment Options, Part 3: Inferential Statistics
Scenario: Upon successful completion of the MBA program, say you work in the analytics department for a consulting company. Your assignment is to analyze one of the following databases:
· Manufacturing
· Hospital
· Consumer Food
· Financial
Select one of the databases based on the information in the Signature Assignment Options.
Provide a statistical report of 12 paragraphs, each of 150 words, including the following:
· Explain the context of the case
· Provide a research foundation for the topic
· Present graphs
· Explain outliers
· Prepare calculations
· Conduct hypothesis tests
· Discuss inferences you have made from the results
This assignment is broken down into four parts:
· Part 1 - Preliminary Analysis
· Part 2 - Examination of Descriptive Statistics
· Part 3 - Examination of Inferential Statistics
· Part 4 - Conclusion/Recommendations
Part 1 - Preliminary Analysis (3-4 paragraphs)
Generally, as a statistics consultant, you will be given a problem and data. At times, you may have to gather additional data. For this assignment, assume all the data is already gathered for you.
State the objective:
· What are the questions you are trying to address?
Describe the population in the study clearly and in sufficient detail:
· What is the sample?
Discuss the types of data and variables:
· Are the data quantitative or qualitative?
· What are levels of measurement for the data?
Part 2 - Descriptive Statistics (3-4 paragraphs)
Examine the given data.
Present the descriptive statistics (mean, median, mode, range, standard deviation, variance, CV, and five-number summary).
Identify any outliers in the data.
Present any graphs or charts you think are appropriate for the data.
Note: Ideally, we want to assess the conditions of normality too. However, for the purpose of this exercise, assume data is drawn from normal populations.
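The Part 2 deliverables can all be produced with Python's standard library. The sketch below uses placeholder data, not one of the assignment databases, and includes the common 1.5*IQR convention for flagging outliers:

```python
import statistics

# Placeholder data standing in for one variable from the chosen database.
data = [23, 25, 25, 28, 30, 31, 34, 36, 40, 70]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
rng = max(data) - min(data)
stdev = statistics.stdev(data)        # sample standard deviation
variance = statistics.variance(data)  # sample variance
cv = stdev / mean                     # coefficient of variation
q1, q2, q3 = statistics.quantiles(data, n=4)  # 'exclusive' method by default
five_number = (min(data), q1, q2, q3, max(data))

# 1.5*IQR fences, one common convention for identifying outliers.
iqr = q3 - q1
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
print(five_number, outliers)
```

Note that `statistics.quantiles` supports both the "exclusive" and "inclusive" quartile conventions; textbooks differ, so state which one the report uses.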
Part 3 - Inferential Statistics (2-3 paragraphs)
Use the Part 3: Inferential Statistics document.
· Create (formulate) hypotheses
· Run formal hypothesis tests
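A formal two-sample test of the kind Part 3 asks for can be built from the standard library. This sketch computes Welch's t statistic on invented placeholder samples; compare |t| to the critical value from a t table to accept or reject the null hypothesis:

```python
import statistics

# Invented samples, e.g. a metric measured under two conditions.
group_a = [82, 85, 88, 90, 91, 86, 84, 89]
group_b = [75, 78, 80, 74, 79, 77, 81, 76]

def welch_t(a, b):
    """t statistic for H0: the two population means are equal."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5  # standard error of the difference
    return (ma - mb) / se

t = welch_t(group_a, group_b)
# With roughly 13 degrees of freedom, the two-sided 5% critical value is
# about 2.16, so a |t| this large rejects H0.
print(f"t = {t:.2f}")
```

In practice a library routine (e.g. a Welch-corrected t-test) would also report the p-value; the hand computation above just makes the formula visible.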
Data Storytelling - Game changer for Analytics (Gramener)
This document discusses the importance of data storytelling and provides recommendations for data leaders. It argues that data stories are more memorable and impactful than raw data or facts. The document outlines four patterns for telling data stories and provides examples. It recommends that organizations embed design skills, automate storytelling, and embrace storytelling as part of the data insights process. Telling data stories can help people really understand data intuitively and aid decision making.
Introduction to Descriptive & Predictive Analytics (Dilum Bandara)
This document provides an introduction to descriptive and predictive analytics. It discusses key concepts including descriptive analytics which uses data aggregation and mining to provide insights into past data, predictive analytics which uses statistical models and forecasts to understand the future, and prescriptive analytics which uses optimization and simulation to advise on possible outcomes. The document also reviews basic statistical concepts such as measures of location, dispersion, shape, and association that are important for data analytics. These concepts include mean, median, standard deviation, skewness, kurtosis, and correlation.
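The shape and association measures the slides list can be computed directly from their moment definitions. This stdlib-only sketch (invented data values) shows sample skewness, excess kurtosis, and Pearson correlation:

```python
import statistics

def moments(xs):
    """Skewness and excess kurtosis from the standardized moments."""
    n = len(xs)
    m = statistics.mean(xs)
    sd = statistics.pstdev(xs)  # population sd, as the moment formulas use
    skew = sum((x - m) ** 3 for x in xs) / (n * sd ** 3)
    kurt = sum((x - m) ** 4 for x in xs) / (n * sd ** 4) - 3  # excess kurtosis
    return skew, kurt

def pearson(xs, ys):
    """Pearson's r: covariance scaled by both standard deviations."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sum((x - mx) ** 2 for x in xs) ** 0.5
                  * sum((y - my) ** 2 for y in ys) ** 0.5)

skew, kurt = moments([1, 2, 2, 3, 3, 3, 4, 10])  # long right tail -> skew > 0
r = pearson([2, 4, 5, 6, 8], [1, 3, 4, 5, 7])    # ys moves with xs -> r near 1
print(f"skew={skew:.2f}  excess_kurtosis={kurt:.2f}  r={r:.2f}")
```

Positive skew flags the long right tail, positive excess kurtosis flags the heavy tail relative to a normal distribution, and r near 1 flags a strong linear association, the three "shape and association" checks the deck describes.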
This document discusses the four main types of analytics: descriptive, diagnostic, predictive, and prescriptive. Descriptive analytics answers the question "What happened?" by summarizing past data. Diagnostic analytics answers "Why did this happen?" by analyzing data to determine causes of trends. Predictive analytics answers "What might happen in the future?" by using statistics and modeling to predict outcomes. Prescriptive analytics answers "What should we do next?" by recommending actions based on predictive analytics. The document provides examples of each type.
Early Lessons Learned in Applying Big Data To TV Advertising (Jeff Storan)
This document discusses how Simulmedia is applying big data techniques to television advertising. It summarizes that Simulmedia has assembled a large set of television viewing data through partnerships. It uses this data and data science techniques to sell targeted television ads, gaining insights into audience fragmentation and how to better reach audiences. It also discusses some challenges in working with television data and lessons learned around quality control, the value of more data, and showing addressable TV ads can be effective.
Early Lessons Learned in Applying Big Data To TV Advertising (Jeffrey Storan)
- Simulmedia is a startup that uses big data to target TV advertising. They have assembled the world's largest set of actionable television data through partnerships with major data providers.
- Their data set includes over 200 terabytes of information on 113 million daily events and 400,000 weekly ads, which they use advanced statistical techniques and machine learning to analyze.
- Their analysis shows that while audiences are fragmenting across many channels, simple algorithms applied at large scale using their extensive data can better predict audience movement and interests than other existing tools.
Open Source Contributions to Postgres: The Basics POSETTE 2024 (ElizabethGarrettChri)
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
Page 579Assess the Constituent Data. What is included Omi.docxbunyansaturnina
Page 579
Assess the Constituent Data. What is included? Omitted? What are the data based on?
What assumptions are being made? Different retirement calculators give widely different
estimates of how much savings is needed for retirement because of factors they include or omit
(such as entertainment) and assumptions they make (such as inflation rate or healthiness of
annuities and mutual funds) in the calculations.
Two reputable sources can give different figures because they take their data from different
places. Suppose you wanted to know employment figures. The Labor Department’s monthly
estimate of nonfarm payroll jobs is the most popular, but some economists like Automatic Data
Processing’s monthly estimate, which is based on the roughly 20 million paychecks it processes
for clients. Both survey approximately 400,000 workplaces, but the Labor Department selects
employers to mirror the U.S. economy, while ADP’s sample is skewed, with too many
construction firms and too few of the largest employers. On the other hand, the government has
trouble counting jobs at businesses that are opening and closing, and some employers do not
return the survey. (Both organizations do attempt to adjust their numbers to compensate
accurately.)7
Check the Currency of the Data. Population figures should be from the 2010 census, not
the 2000 one. Technology figures in particular need to be current. Do remember, however, that
some large data sets are one to two years behind in being analyzed. Such is the case for some
government figures, also. If you are doing a report in 2014 that requires national education data
from the Department of Education, for instance, 2013 data may not even be fully collected. And
even the 2012 data may not be fully analyzed, so indeed the 2011 data may be the most current
available.
Hard to Quantify Sports Participation
How many people participate in sports, and which sports do they choose?
Governments and equipment makers want to know, but the data are fuzzy. Multiple
questions contribute to the lack of clarity.
What is a sport? One survey includes bird-watching.
Who should be counted? Do young children count?
How often do you have to participate in a sport to be counted? Is once a year enough?
How was the count made? Because younger and more active people tend to have only cell phones, a survey
made through landlines probably won’t be accurate.
In case you are curious, the National Sporting Goods Association survey says hiking is the most popular
participation sport in the United States, with over 40 million people.
Adapted from Carl Bialik, “Sports Results that Leave Final Score Unclear,” Wall Street Journal, June 9, 2012, A2.
Choosing the Best Data
Sometimes even good sources and authorities can differ on the numbers they offer, or on the
About Your Signature Assignment (ransayo)
About Your Signature Assignment
This signature assignment is designed to align with specific program student learning outcome(s) in your program. Program Student Learning Outcomes are broad statements that describe what students should know and be able to do upon completion of their degree. The signature assignments might be graded with an automated rubric that allows the University to collect data that can be aggregated across a location or college/school and used for program improvements.
Purpose of Assignment
The purpose of this assignment is for students to synthesize the concepts learned throughout the course. This assignment will provide students an opportunity to build critical thinking skills, develop businesses and organizations, and solve problems requiring data by compiling all pertinent information into one report.
Assignment Steps
Resources: Microsoft Excel®, Signature Assignment Databases, Signature Assignment Options, Part 3: Inferential Statistics
Scenario: Upon successful completion of the MBA program, say you work in the analytics department for a consulting company. Your assignment is to analyze one of the following databases:
· Manufacturing
· Hospital
· Consumer Food
· Financial
Select one of the databases based on the information in the Signature Assignment Options.
Provide a statistical report of 12 paragraphs, each of about 150 words, including the following:
· Explain the context of the case
· Provide a research foundation for the topic
· Present graphs
· Explain outliers
· Prepare calculations
· Conduct hypotheses tests
· Discuss inferences you have made from the results
This assignment is broken down into four parts:
· Part 1 - Preliminary Analysis
· Part 2 - Examination of Descriptive Statistics
· Part 3 - Examination of Inferential Statistics
· Part 4 - Conclusion/Recommendations
Part 1 - Preliminary Analysis (3-4 paragraphs)
Generally, as a statistics consultant, you will be given a problem and data. At times, you may have to gather additional data. For this assignment, assume all the data is already gathered for you.
State the objective:
· What are the questions you are trying to address?
Describe the population in the study clearly and in sufficient detail:
· What is the sample?
Discuss the types of data and variables:
· Are the data quantitative or qualitative?
· What are levels of measurement for the data?
Part 2 - Descriptive Statistics (3-4 paragraphs)
Examine the given data.
Present the descriptive statistics (mean, median, mode, range, standard deviation, variance, CV, and five-number summary).
Identify any outliers in the data.
Present any graphs or charts you think are appropriate for the data.
Note: Ideally, we want to assess the conditions of normality too. However, for the purpose of this exercise, assume data is drawn from normal populations.
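As a sketch of the Part 2 computations, the descriptive statistics listed above can be produced with Python's standard library alone. The sample values here are made up for illustration, not taken from the Signature Assignment databases:

```python
import statistics

# Hypothetical sample of ten observations (e.g. monthly spend values)
data = [12, 15, 15, 18, 21, 24, 30, 35, 41, 55]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
data_range = max(data) - min(data)
stdev = statistics.stdev(data)        # sample standard deviation
variance = statistics.variance(data)  # sample variance
cv = stdev / mean                     # coefficient of variation

# Five-number summary: min, Q1, median, Q3, max
q1, q2, q3 = statistics.quantiles(data, n=4)
five_number = (min(data), q1, q2, q3, max(data))
```

Outliers can then be flagged with the usual 1.5 × IQR fence around Q1 and Q3.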
Part 3 - Inferential Statistics (2-3 paragraphs)
Use the Part 3: Inferential Statistics document.
· Create (formulate) hypotheses
· Run formal hyp.
Data Storytelling - Game changer for Analytics Gramener
This document discusses the importance of data storytelling and provides recommendations for data leaders. It argues that data stories are more memorable and impactful than raw data or facts. The document outlines four patterns for telling data stories and provides examples. It recommends that organizations embed design skills, automate storytelling, and embrace storytelling as part of the data insights process. Telling data stories can help people really understand data intuitively and aid decision making.
Introduction to Descriptive & Predictive Analytics (Dilum Bandara)
This document provides an introduction to descriptive and predictive analytics. It discusses key concepts including descriptive analytics which uses data aggregation and mining to provide insights into past data, predictive analytics which uses statistical models and forecasts to understand the future, and prescriptive analytics which uses optimization and simulation to advise on possible outcomes. The document also reviews basic statistical concepts such as measures of location, dispersion, shape, and association that are important for data analytics. These concepts include mean, median, standard deviation, skewness, kurtosis, and correlation.
This document discusses the four main types of analytics: descriptive, diagnostic, predictive, and prescriptive. Descriptive analytics answers the question "What happened?" by summarizing past data. Diagnostic analytics answers "Why did this happen?" by analyzing data to determine causes of trends. Predictive analytics answers "What might happen in the future?" by using statistics and modeling to predict outcomes. Prescriptive analytics answers "What should we do next?" by recommending actions based on predictive analytics. The document provides examples of each type.
Early Lessons Learned in Applying Big Data To TV Advertising (Jeff Storan)
This document discusses how Simulmedia is applying big data techniques to television advertising. It summarizes that Simulmedia has assembled a large set of television viewing data through partnerships. It uses this data and data science techniques to sell targeted television ads, gaining insights into audience fragmentation and how to better reach audiences. It also discusses some challenges in working with television data and lessons learned around quality control, the value of more data, and showing addressable TV ads can be effective.
Early Lessons Learned in Applying Big Data To TV Advertising (Jeffrey Storan)
- Simulmedia is a startup that uses big data to target TV advertising. They have assembled the world's largest set of actionable television data through partnerships with major data providers.
- Their data set includes over 200 terabytes of information on 113 million daily events and 400,000 weekly ads, which they use advanced statistical techniques and machine learning to analyze.
- Their analysis shows that while audiences are fragmenting across many channels, simple algorithms applied at large scale using their extensive data can better predict audience movement and interests than other existing tools.
Open Source Contributions to Postgres: The Basics, POSETTE 2024 (ElizabethGarrettChri)
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
Build applications with generative AI on Google CloudMárton Kodok
We will explore Vertex AI - Model Garden powered experiences, we are going to learn more about the integration of these generative AI APIs. We are going to see in action what the Gemini family of generative models are for developers to build and deploy AI-driven applications. Vertex AI includes a suite of foundation models, these are referred to as the PaLM and Gemini family of generative ai models, and they come in different versions. We are going to cover how to use via API to: - execute prompts in text and chat - cover multimodal use cases with image prompts. - finetune and distill to improve knowledge domains - run function calls with foundation models to optimize them for specific tasks. At the end of the session, developers will understand how to innovate with generative AI and develop apps using the generative ai industry trends.
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
To honor ten years of PyData London, join Dr. Rebecca Bilbro as she takes us back in time to reflect on a little over ten years working as a data scientist. One of the many renegade PhDs who joined the fledgling field of data science of the 2010's, Rebecca will share lessons learned the hard way, often from watching data science projects go sideways and learning to fix broken things. Through the lens of these canon events, she'll identify some of the anti-patterns and red flags she's learned to steer around.
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of March 2024.
Starting from January 2024, the full weekly and monthly reports will only be available for free to VCOSA members. To access the complete weekly report with figures, charts, and detailed analysis of the cotton fiber market in the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
06-20-2024 AI Camp Meetup: Unstructured Data and Vector Databases (Timothy Spann)
Tech Talk: Unstructured Data and Vector Databases
Speaker: Tim Spann (Zilliz)
Abstract: In this session, I will discuss unstructured data and the world of vector databases, and we will see how they differ from traditional databases: in which cases you need one, and in which you probably don't. I will also go over similarity search and where you get vectors from, show an example of a vector database architecture, and wrap up with an overview of Milvus.
Introduction
Unstructured data, vector databases, traditional databases, similarity search
Vectors
Where, What, How, Why Vectors? We’ll cover a Vector Database Architecture
Introducing Milvus
What drives Milvus' Emergence as the most widely adopted vector database
Hi Unstructured Data Friends!
I hope this video had all the unstructured data processing, AI and Vector Database demo you needed for now. If not, there’s a ton more linked below.
My source code is available here
https://github.com/tspannhw/
Let me know in the comments if you liked what you saw, how I can improve, and what I should show next. Thanks, hope to see you soon at a Meetup in Princeton, Philadelphia, New York City or here in the Youtube Matrix.
Get Milvused!
https://milvus.io/
Read my Newsletter every week!
https://github.com/tspannhw/FLiPStackWeekly/blob/main/141-10June2024.md
For more cool Unstructured Data, AI and Vector Database videos check out the Milvus vector database videos here
https://www.youtube.com/@MilvusVectorDatabase/videos
Unstructured Data Meetups -
https://www.meetup.com/unstructured-data-meetup-new-york/
https://lu.ma/calendar/manage/cal-VNT79trvj0jS8S7
https://www.meetup.com/pro/unstructureddata/
https://zilliz.com/community/unstructured-data-meetup
https://zilliz.com/event
Twitter/X: https://x.com/milvusio https://x.com/paasdev
LinkedIn: https://www.linkedin.com/company/zilliz/ https://www.linkedin.com/in/timothyspann/
GitHub: https://github.com/milvus-io/milvus https://github.com/tspannhw
Invitation to join Discord: https://discord.com/invite/FjCMmaJng6
Blogs: https://milvusio.medium.com/ https://www.opensourcevectordb.cloud/ https://medium.com/@tspann
https://www.meetup.com/unstructured-data-meetup-new-york/events/301383476/?slug=unstructured-data-meetup-new-york&eventId=301383476
https://www.aicamp.ai/event/eventdetails/W2024062014
Discover the cutting-edge telemetry solution implemented for Alan Wake 2 by Remedy Entertainment in collaboration with AWS. This comprehensive presentation dives into our objectives, detailing how we utilized advanced analytics to drive gameplay improvements and player engagement.
Key highlights include:
Primary Goals: Implementing gameplay and technical telemetry to capture detailed player behavior and game performance data, fostering data-driven decision-making.
Tech Stack: Leveraging AWS services such as EKS for hosting, WAF for security, Karpenter for instance optimization, S3 for data storage, and OpenTelemetry Collector for data collection. EventBridge and Lambda were used for data compression, while Glue ETL and Athena facilitated data transformation and preparation.
Data Utilization: Transforming raw data into actionable insights with technologies like Glue ETL (PySpark scripts), Glue Crawler, and Athena, culminating in detailed visualizations with Tableau.
Achievements: Successfully managing 700 million to 1 billion events per month at a cost-effective rate, with significant savings compared to commercial solutions. This approach has enabled simplified scaling and substantial improvements in game design, reducing player churn through targeted adjustments.
Community Engagement: Enhanced ability to engage with player communities by leveraging precise data insights, despite having a small community management team.
This presentation is an invaluable resource for professionals in game development, data analytics, and cloud computing, offering insights into how telemetry and analytics can revolutionize player experience and game performance optimization.
Generative Classifiers: Classifying with Bayesian decision theory, Bayes’ rule, Naïve Bayes classifier.
Discriminative Classifiers: Logistic Regression, Decision Trees: Training and Visualizing a Decision Tree, Making Predictions, Estimating Class Probabilities, The CART Training Algorithm, Attribute selection measures- Gini impurity; Entropy, Regularization Hyperparameters, Regression Trees, Linear Support vector machines.
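The attribute selection measures named above, Gini impurity and entropy, can be sketched in a few lines of Python; the labels and the candidate split below are toy values, not from any course dataset:

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy: -sum(p * log2 p) over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Toy parent node of 10 samples split into two children
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

# CART greedily picks the split minimizing the weighted child impurity
weighted = (len(left) / len(parent)) * gini(left) \
         + (len(right) / len(parent)) * gini(right)
```

Here the split lowers the Gini impurity from 0.5 at the parent to 0.32 weighted across the children, which is why CART would prefer it over no split.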
2. Hello!
My name is Iryna Smologonova
I am a data analyst with a background in product
development, audit and customer service.
With my curiosity and tenacity, I make connections
between different data sets, translate data into
actionable insights and communicate the ideas to
stakeholders.
I am excited to expand my data analysis and
critical-thinking skills, and to uncover solutions
that not only solve problems from the customer's
perspective but also drive business growth.
2
Technical skills
Excel, Tableau, SQL, Python, Project management, Big data processing
Soft skills
Problem-solving, Collaboration, Leadership, Business acumen, Storytelling, Curiosity
3. Projects
3
Rockbuster Stealth
Analyzing online movie rental transactions to answer business questions
Flu Season
Analyzing regional and seasonal trends of influenza in the US
GameCo
Analyzing global video game sales
Instacart
Analyzing historical grocery order data to generate insights for Marketing
strategy
4. 4
Rockbuster Stealth – a global movie rental company
Objective
To assist with the launching strategy for
the new online video service
Perform an analysis of historical
data to identify sales trends,
customer behavior, rental duration
Develop insights and
recommendations for Rockbuster
(fictitious company)
Tools
PostgreSQL
Power Point
Tableau
Data
Rockbuster dataset
Source: PostgreSQL Tutorial
Skills
Creating data dictionary
Database querying
Joining tables
Subqueries
Common table expressions
Workflow: ERD visualization → Database querying → Summarizing & cleaning data in SQL → Filtering & grouping → Answering business questions → Data visualization → Recommending strategies
5. 5
Rockbuster Stealth SQL functions:
Aggregating, ranking, joining & grouping
SQL queries available here
Business questions
Which movies contributed the most to
revenue gain?
Which genres are the most popular?
Answers
The top 20 movies account for 6% of total revenue and 2% of the number of movies.
The top 3 genres by revenue are Sports, Sci-Fi and Animation.
Comedy, New and Sports produce more revenue per film.
6. 6
Rockbuster Stealth Business questions:
Do sales figures vary between geographic regions?
Which ratings yield the most revenue?
The top 3 countries (India, China and the United States) account for
25% of the total number of customers and of company revenue.
Ratings PG-13 & NC-17 generate the most revenue
Data visualizations created in Tableau, available here
7. 7
Rockbuster Stealth Project deliverables:
Project report
GitHub Repo
Data Dictionary
Key learning experience:
Common Table Expressions (CTEs) are more readable than
subqueries and can be reused. However, subqueries
and CTEs each have pros & cons, and the choice between them
should be made on a case-by-case basis.
SQL ranking functions allowed me to identify the top 20
movies with the highest revenue in a simple way and
made my query more readable. The RANK()/DENSE_RANK()
functions are great for sequencing and comparing data
across various factors.
A bubble chart is a solution for visualizing three metrics:
number of transactions, revenue and average revenue
per transaction. It allowed the addition of a third
dimension as bubble size/color to
emphasize the most popular genres.
Recommendations:
Focus on:
Adding to the inventory the movies that generate the most
revenue: ratings PG-13 & NC-17 and genres Sports,
Sci-Fi and Animation.
Comedy, New and Sports, as higher-generating
genres that produce more revenue per film, could be
beneficial for a pilot project.
The top 3 countries (India, China and the United States)
account for 25% of the total number of customers and of
company revenue. Therefore, I would recommend
starting the streaming service by piloting in these
countries.
Click links to check the project
8. Flu season
8
Workflow: Sourcing the proper data → Data profiling & integrity → Data quality measures → Data transformation & integration → Conducting statistical analysis → Consolidating analytical insights → Statistical hypothesis testing
Objective
To assist in the preparation of a
staffing plan in the United
States for the upcoming influenza
season:
Analyze death trends
Prioritize states with
vulnerable populations
Tools
Excel
Tableau
Data
• Influenza deaths by geography, time,
age, and gender
Source: CDC
Population data by geography
Source: US Census Bureau
Influenza lab test results by state
Source: CDC (Fluview)
Skills
Translating business requirements
Data cleaning
Data integration & transformation
Statistical hypothesis testing
Visual analysis
Forecasting
Storytelling in Tableau
Presenting results
Scenario: To help a medical staffing agency prepare for the upcoming flu season by examining trends in
influenza and how they can be used to proactively plan for clinic and hospital staffing needs across the country
Data viz & storytelling
9. 9
Flu Season
Data transformation by using pivot tables and VLOOKUP functions
Combining different data sets by utilizing common state and year/month variables
Normalizing the flu deaths data according to state populations by deriving new variables representing flu
deaths as a percentage of state population
Examining the data variability by calculating the variance and the standard deviation
The correlation coefficient between the death rate of the 65+ population and the death rate of those
below 65 was 0.79, which quantified a strong relationship: the higher the share of the vulnerable 65+
population in a state, the higher the death rate.
Null hypothesis: the flu death rate of the population 65+ years old is less than or equal to that of
people under 65 years old.
Alternative hypothesis: the flu death rate of the population 65+ years old is higher than that of
people under 65 years old.
The p-value is much less than the significance level of 0.05, so the null hypothesis is rejected.
With 95% confidence (alpha 0.05) we can say that there is a significant difference between the flu death
rate of the 65+ group and other groups.
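The correlation and one-tailed test described above can be sketched as follows; the per-state rates are invented stand-ins, not the real CDC/Census figures:

```python
import math
import statistics

# Hypothetical flu death rates per 100K residents for six states
rate_65_plus = [50.2, 61.5, 48.0, 72.3, 55.1, 66.8]
rate_under_65 = [4.1, 5.0, 3.8, 6.2, 4.5, 5.6]

# Pearson correlation between the two rates
m1 = statistics.mean(rate_65_plus)
m2 = statistics.mean(rate_under_65)
cov = sum((a - m1) * (b - m2) for a, b in zip(rate_65_plus, rate_under_65))
corr = cov / math.sqrt(
    sum((a - m1) ** 2 for a in rate_65_plus)
    * sum((b - m2) ** 2 for b in rate_under_65)
)

# Welch's t statistic for H1: mean rate of 65+ exceeds mean rate of under-65
v1 = statistics.variance(rate_65_plus)
v2 = statistics.variance(rate_under_65)
n1, n2 = len(rate_65_plus), len(rate_under_65)
t = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)

# One-tailed critical value at alpha = 0.05 is about 2.015 for ~5 degrees
# of freedom; in practice scipy.stats would give an exact p-value.
reject_h0 = t > 2.015
```

Rejecting the null here mirrors the slide's conclusion: the 65+ rate is significantly higher at the 0.05 level.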
Transformation & integration
Statistical Analysis
Project management plan
Interim report
Click links to check the project
10. 10
Flu Season
Charts: Influenza death rate among population 65+ years old; Number of influenza deaths by state
One-year forecast of influenza deaths by state
The flu season starts in December and ends in March, with a peak in January.
The death rate forecast for the upcoming flu season is roughly the same as
the historical data.
Less populous states have the highest death rates among elderly populations
per 100K: Alaska, Hawaii, Wyoming, District of Columbia, South and North
Dakota, Vermont.
More populous states have the highest numbers of deaths: California,
New York, Texas, Pennsylvania and Florida.
Key questions:
When does the flu season start and end?
Where to focus during the flu season?
11. 11
Flu Season Project deliverables: Storytelling in Tableau – an interactive slide deck
GitHub Repo
Key learning experience:
It is important to consider data limitations and assess their
impact on the analysis and the interpretation of results.
As the analysis progresses, data limitations may become
apparent and should be added to the analysis plan.
Data mapping helps to match variables between
different data sets. Death data and population data by
state were mapped using an age variable in 10-year
ranges starting from age 5.
Normalization is a part of data preparation and allows
data in different units to be compared using the same
units. The flu deaths data was normalized according to
state populations by deriving new variables representing
flu deaths as a percentage of state population.
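The normalization step can be illustrated with a couple of hypothetical state figures (the real CDC and Census values are not reproduced here):

```python
# Invented stand-in figures for two states of very different size
flu_deaths = {"Alaska": 120, "California": 6100}
population = {"Alaska": 731_000, "California": 39_500_000}

# Normalize to deaths per 100,000 residents so states of different
# sizes can be compared in the same units
deaths_per_100k = {
    state: flu_deaths[state] / population[state] * 100_000
    for state in flu_deaths
}
```

With these made-up numbers the small state ends up with the higher rate despite far fewer raw deaths, which is exactly the pattern the slide reports for the less populous states.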
Recommendations:
Focus on:
6 top states with the highest death rate:
Alaska, Hawaii, Wyoming, District of Columbia,
South and North Dakota, Vermont
5 top states with the highest elderly
populations:
California, Texas, Florida, New York and
Pennsylvania
the influenza season months:
Dec-Mar with a peak in Jan
Click links to check the project
12. GameCo – an online rental video game company
12
Workflow: Data exploration → Data cleaning → Grouping data → Descriptive analytics → Developing insights → Proposal report → Viz data insights
Objective
Perform a descriptive analysis of
an online video game sales data
set to foster a better
understanding of how GameCo’s
(fictitious company) new games
might fare in the market.
Compare historical regional sales
assumptions with the reality of
current market conditions.
Tools
Excel
Power Point
Data
VGSales
Skills
Grouping data
Summarizing data
Descriptive analysis
Visualizing results in Excel
Presenting results
13. GameCo
13
Line graph: Regional Sales as a Percentage of Global Sales by year (NA sales, EU sales, JP sales)
Key question: How have their sales figures varied between geographic regions over time?
Column chart: Sales by region, 2008 vs 2016. North America: 52% → 32%; Europe: 27% → 38%; Japan: 9% → 19%.
Assumption:
The initial understanding of the business was that
video game sales had remained consistent across
regions over time.
Testing:
• Plotting each region as a percentage of global
sales in a line graph reveals that sales in Europe
overtook North America in 2016.
• The column chart compares regional sales
in 2008 vs 2016: the share of North
America's sales dropped by 20%, while in Europe and Japan
it increased by 11% and 10% respectively.
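The share-of-global-sales calculation behind these charts can be sketched in a few lines; the regional figures below are the slide's own percentages standing in for the underlying VGSales data:

```python
# Yearly regional sales (here in $ millions, chosen so each year totals 100)
sales = {
    2008: {"NA": 52.0, "EU": 27.0, "JP": 9.0, "Other": 12.0},
    2016: {"NA": 32.0, "EU": 38.0, "JP": 19.0, "Other": 11.0},
}

def shares(year):
    """Each region's sales as a percentage of that year's global sales."""
    total = sum(sales[year].values())
    return {region: 100 * v / total for region, v in sales[year].items()}

# Change in each region's share of global sales, 2008 -> 2016
change = {r: shares(2016)[r] - shares(2008)[r] for r in sales[2008]}
```

The computed changes reproduce the slide's comparison: North America down 20 points, Europe up 11, Japan up 10.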
14. 14
Pie chart: New games by Genre 2015-2016. Action 54%, Role-Playing 17%, Sports 14%, Shooter 10%, Fighting 5%.
GameCo Key question: Are certain types of games more popular than others?
Clustered bar chart: Regional Sales by Genre 2015-2016, Sales in Millions ($), for the genres Shooter, Action, Sports, Role-Playing, Fighting, Misc and Racing (NA, EU and JP sales).
The clustered bar chart was created in Excel to
represent the regional sales across different video game
genres.
The pie chart shows the percentage breakdown of new
game sales for the last two years.
Insights:
• Shooter dominates the North American market
• Action is popular in all regional markets and has
the highest number of new games
• Role-Playing is the second leader in Japan
15. 15
GameCo Project deliverables:
Market: North American. Goal: refocus the budget allocation to stabilize sales, preventing further decline. Target audience: current and former customers, using the large historical customer base. Actions: launch direct marketing campaigns promoting new games in the Shooter, Action and Sports genres.
Market: European. Goal: support the sales with slightly increased budget resources to keep the growing trend over time. Target audience: loyal customers, plus acquiring new customers. Actions: promote new games intensively via marketing campaigns such as BOGO (buy one game in the Shooter/Sports genre and get one in the Action/Role-Playing genre at a certain discount).
Market: Japanese. Goal: allocate additional resources to promotion and attracting new customers, to continue the growing trend started last year. Target audience: current customers, plus attracting new ones. Actions: advance promotions using last year's approach, keeping the main accent on new Action and Role-Playing games.
Proposal report
GitHub Repo
Key learning experience:
It is important to test
the assumptions that go into the
analysis. This allows one to determine whether
conclusions are correctly drawn
from the results of the analysis.
The goal of the project, the
regional customers, sales over
time and the best-selling genres were
taken into consideration to
develop a regional approach to
setting goals, focusing on a target
audience and developing
recommendations for marketing
activities.
Recommendations:
Click links to check the project
16. 16
Instacart – an online grocery store that operates via an app
Objective
To assist with identifying sales
patterns for better segmentation:
Explore historical data to
define buying trends and
customer behavior
Select sub-groups of
customers and analyze their
ordering habits
Tools
Python
Jupyter Notebook
Pandas & NumPy libraries
Matplotlib, seaborn & pyplot
Excel
Data
Skills
Data cleaning &
wrangling
Data merging
Deriving new variables
Aggregating
Population flows
Workflow: Data wrangling & subsetting → Data consistency check → Combining & exporting data → Deriving new variables → Grouping & aggregating → Excel report → Data viz with Python
Datasets: Orders, Products, Departments, Customers
17. 17
Instacart Population flows
Merging the datasets
The project started with cleaning,
organising and merging the data before
conducting the analysis.
Data wrangling procedure:
- dropping and renaming columns in the “orders” dataset
- renaming columns and changing data types in the “products” and “customers” datasets
- transposing the “departments” dataset
- merging the data together
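A minimal sketch of the merge-and-check step, assuming pandas and invented column names rather than the real Instacart schema:

```python
import pandas as pd

# Tiny stand-ins for the "orders" and "products" files
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "product_id": [10, 10, 20],
    "order_day_of_week": [0, 3, 6],
})
products = pd.DataFrame({
    "product_id": [10, 20],
    "product_name": ["Bananas", "Milk"],
    "prices": [0.5, 3.2],
})

# indicator=True adds a _merge column, a quick consistency check
# that every order row found a matching product
merged = orders.merge(products, on="product_id", how="left", indicator=True)
full_match = (merged["_merge"] == "both").all()
```

Any rows flagged "left_only" would indicate orphaned product IDs to investigate before the analysis proceeds.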
18. 18
Instacart Deriving variables & crosstabs
Flag creation:
A spending flag was defined by spending amount: less than $8 vs. $8 or more.
The product price range was divided into 4 categories: High-range products over $15, Mid-
high range between $10 & $15, Mid-low range between $5 & $10, and Low-range
products equal to or less than $5.
Crosstab calculation:
To display the number of orders made for
every day of the week, a flag ‘busiest days’ and
a variable ‘order_day_of_week’ were used.
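The flag and crosstab logic above can be sketched with pandas; the prices are made-up examples and the labels follow the four bands described:

```python
import pandas as pd

prices = pd.Series([2.0, 7.5, 12.0, 19.0])

# Price-range flag matching the four bands: <=5, 5-10, 10-15, >15
price_range = pd.cut(
    prices,
    bins=[0, 5, 10, 15, float("inf")],
    labels=["Low-range", "Mid-low range", "Mid-high range", "High-range"],
)

# Spending flag: under $8 vs $8 and over
spending_flag = prices.apply(
    lambda p: "Low spender" if p < 8 else "High spender"
)

# Crosstab counts orders in each price-range / spending-flag combination
counts = pd.crosstab(price_range, spending_flag)
```

The same pattern with ‘busiest days’ against ‘order_day_of_week’ yields the day-of-week order counts described on the slide.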
19. 19
Instacart Visualization:
Matplotlib
Seaborn
Business question
What differences can you find in
ordering habits of different
customer profiles?
Answer
High-income customers
prefer products from the alcohol
and pets departments.
Affluent customers prefer
products from the meat & seafood
and canned goods departments.
There is no clear preference for
middle-income customers.
Low-income customers prefer
products from the snacks,
beverages and breakfast
departments.
Business question
What different classifications does the
demographic information suggest?
Answer
Middle-income and affluent customers across
all age groups make the majority
of orders.
Young and middle-aged customers
with a middle income level are the
core of the customer base.
20. 20
Instacart
Key learning experience:
Running out of RAM during code execution can be
substantially mitigated by converting data
types, sampling the data or restarting kernels.
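The data-type conversion mentioned above can be sketched with pandas (the column names are illustrative, not the real Instacart schema):

```python
import pandas as pd

df = pd.DataFrame({
    "order_id": range(1000),
    "reordered": [0, 1] * 500,
})

before = df.memory_usage(deep=True).sum()

# Downcast: small-valued integer columns don't need 64 bits
df["order_id"] = pd.to_numeric(df["order_id"], downcast="unsigned")
df["reordered"] = df["reordered"].astype("int8")

after = df.memory_usage(deep=True).sum()
```

On the full Instacart files, with tens of millions of rows, this kind of downcasting cuts memory use severalfold and is often the difference between a crashing kernel and a completed run.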
A vast collection of libraries helps with exploring,
cleaning large data sets and creating
visualizations in a simple way
Deriving new variables and creating crosstabs
open up many opportunities to find insights and
communicate them via visualizations, using
different features to create informative,
customized and appealing plots that present data
in the most simple and effective way.
Recommendations:
The least busy days are the middle of the week: Tuesday
and Wednesday. Ads should run Monday-
Wednesday with the target of increasing the number of orders on
Tuesday and Wednesday.
To boost sales, focus on middle-income customers with the profiles
‘young single’ and ‘single parent’. Target younger customers
with a low income level with lower-priced products from the snacks,
beverages and breakfast departments.
Promote to high-income and affluent customers high-range
products from the meat & seafood, canned goods, alcohol and pets
departments, since they have more potential capability to buy a
group of products.
Project deliverables:
GitHub Repo
Excel report
Click links to check the project