This document discusses key concepts in economic statistics including:
1. The stages of the research process include problem identification, generating hypotheses, conducting research, statistical analysis, and drawing conclusions.
2. Descriptive statistics summarize and describe data while inferential statistics make inferences about a population based on a sample.
3. Data can be qualitative, quantitative, cross-sectional, or time-series. Common descriptive statistics include the mean, median, mode, standard deviation, and range.
2. 2
Stages of Research Process:
1. Problem Identification
2. Generating Hypotheses
3. Conducting the Research
4. Statistical Analysis (Descriptive and Inferential Statistics)
5. Drawing Conclusions
3. 3
Descriptive statistics    Inferential statistics
Mean                      t-test
Median                    Analysis of variance (ANOVA)
Mode                      Correlation
Standard deviation        Multiple regression
Variance                  Factor analysis
Range                     Discriminant analysis
                          Chi-square
                          Repeated measures ANOVA
5. Economic Statistics
Statistics
- is a collection of theory and methods applied for the purpose of understanding data.
- is the art and science of collecting, analyzing, presenting, and interpreting data.
6. Why Study Econometrics?
Economic theory makes statements or hypotheses.
Theories do not provide
- the necessary measure of the strength of a relationship (a numerical estimate of the relationship), or
- the proper functional relationship between variables.
Example: Law of Demand
A reduction in the price of a commodity is expected to increase the quantity demanded of that commodity.
Econometrics is studied to provide empirical verification of theories.
7. 7
Economic Statistics
Data, Data Set, Elements, Variables and Observations
Data are facts and figures that are collected, analyzed, and summarized for presentation and interpretation.
A data set refers to all the data collected in a particular study.
Elements are the entities on which data are collected.
A variable is a characteristic of interest for the elements.
An observation is the set of measurements obtained for a particular element.
8. 8
Economic Statistics
Qualitative, Quantitative, Cross-sectional and Time Series Data
Qualitative data are labels or names used to identify an attribute of each element.
Quantitative data are numeric values that indicate how much or how many.
A qualitative variable is a variable with qualitative data.
A quantitative variable is a variable with quantitative data.
9. 9
Economic Statistics
Cross-sectional data are data collected at the same or approximately the same point in time.
Time series data are data collected over several time periods.
Pooled data are data with elements of both cross-sectional and time series data.
Panel data are data in which the same cross-sectional unit, say a family or firm, is surveyed over time.
10. 10
Economic Statistics
ITEM                     1990
PHILIPPINES           9266287
CAR                    165585
ILOCOS                 847691
CAGAYAN VALLEY        1164758
CENTRAL LUZON         1910930
S. TAGALOG - A         904297
BICOL                  686998
WESTERN VISAYAS        886732
CENTRAL VISAYAS        182940
EASTERN VISAYAS        337459
WESTERN MINDANAO       350313
NORTHERN MINDANAO      306069
SOUTHERN MINDANAO      649812
CENTRAL MINDANAO       443068
ARMM                   203718
CARAGA                 225917
Cross-sectional Data: Volume of Palay Production (000MT), Philippines, 1990.
13. 13
Economic Statistics
Scales of Measurement
The nominal scale has no mathematical value. It is also called a categorical scale. Numbers are assigned to categories of nominal data/variables only to facilitate data processing.
An ordinal scale is a measure in which data or categories of a variable are ordered or ranked into two or more levels or degrees, such as from low to high or least to most.
An interval scale has the characteristics of an ordinal scale, but in addition, the distance between adjacent points on the scale is equal.
A ratio scale is like the interval scale, except that the ratio scale has a real zero point.
14. 14
Economic Statistics
Scale      Description                                     Example
Nominal    Categories do not have mathematical values.     Sex: male, female
           One is not higher or lower than the other.      Color: red, white, yellow
                                                           Civil status: single, married
Ordinal    Categories can be ranked. The difference        Degree of malnutrition: 1st degree, 2nd degree, 3rd degree
           between the first and the second rank is not    Honor roll: 1st, 2nd, 3rd
           the same as the difference between the          Level of anger: not angry, very angry
           second and the third ranks.
Interval   The data have numerical value. The distance     Body temperature in Fahrenheit: 30 degrees, 40 degrees, 50 degrees
           between two points is the same, but there is    Business capital (PhP): 1m, 2m, 3m
           no zero point or it may be arbitrary.
Ratio      The same as interval data but the zero point    No. of children: 0, 1, 2, 3, 4
           is fixed.                                       Hrs. spent in studying: 0, 5, 10
Descriptions and Examples of the Four Scales of measurement
15. 15
Economic Statistics
Data
  Qualitative Data
    Tabular Methods
      Frequency Distribution
      Relative Frequency Distribution
      Percent Frequency Distribution
    Graphical Methods
      Bar Graph
      Pie Chart
  Quantitative Data
    Tabular Methods
      Frequency Distribution
      Relative Frequency Distribution
      Percent Frequency Distribution
      Cumulative Frequency Distribution
      Cumulative Relative Frequency Distribution
      Cumulative Percent Frequency Distribution
    Graphical Methods
      Histogram
      Scatter Diagram
16. 16
Economic Statistics
Frequency Distribution: Qualitative Data
A Frequency Distribution is a tabular
summary of data showing the number
(frequency) of items in each of several
nonoverlapping classes
19. 19
Economic Statistics
Relative Frequency Distribution
A relative frequency distribution is a tabular summary of data showing the relative frequency for each class.
\[ \text{Relative frequency} = \frac{\text{Frequency of the class}}{n} \]
n = number of observations
Percent Frequency Distribution
A percent frequency distribution is a tabular summary of data showing the percent frequency for each class. The percent frequency of a class is its relative frequency multiplied by 100.
20. 20
Economic Statistics
Relative and Percent Frequency Distributions of Soft Drink Purchases
Soft Drink      Relative Frequency   Percent Frequency
Coke Classic    0.38                 38
Diet Coke       0.16                 16
Dr. Pepper      0.10                 10
Pepsi-Cola      0.26                 26
Sprite          0.10                 10
Total           1.00                 100
n = 50
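As a minimal sketch, the relative and percent frequencies in the soft drink table can be reproduced in Python. The raw counts are not shown on the slide, so the counts below are an assumption derived from the table (relative frequency times n = 50); the variable names are illustrative only.

```python
# Counts are assumed: each one is the slide's relative frequency multiplied by n = 50.
counts = {
    "Coke Classic": 19,
    "Diet Coke": 8,
    "Dr. Pepper": 5,
    "Pepsi-Cola": 13,
    "Sprite": 5,
}

n = sum(counts.values())  # number of observations

# Relative frequency = frequency of the class / n
relative = {drink: freq / n for drink, freq in counts.items()}

# Percent frequency = relative frequency * 100
percent = {drink: 100 * freq / n for drink, freq in counts.items()}

for drink in counts:
    print(f"{drink:<13} {counts[drink]:>3} {relative[drink]:>6.2f} {percent[drink]:>5.0f}")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any such table.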
21. 21
Economic Statistics
A bar graph is a graphical device depicting data that
have been summarized in a frequency, relative
frequency, or percent frequency distribution.
The pie chart is a graphical device for presenting
relative frequency and percent frequency
distributions.
24. 24
Economic Statistics
Sex       Number   Percent
Male      45       39.13
Female    70       60.87
Total     115      100
Frequency Distribution of Students According to Sex
25. 25
Economic Statistics
Nutritional status          Number   Percent
Normal                      30       40
1st degree malnourished     20       26.7
2nd degree malnourished     15       20
3rd degree malnourished     10       13.3
Total                       75       100
Frequency Distribution of Children by Nutritional Status
26. 26
Economic Statistics
Frequency Distribution: Quantitative Data
1. Determine the number of nonoverlapping
classes.
2. Determine the width of each class.
3. Determine the class limits.
27. 27
Economic Statistics
Number of Classes: Five or six classes
Width of the Classes
\[ \text{Approximate class width} = \frac{\text{Largest data value} - \text{Smallest data value}}{\text{Number of classes}} \]
Class Limits:
The lower class limit identifies the smallest possible data value assigned to the class.
The upper class limit identifies the largest possible data value assigned to the class.
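The class-width formula can be sketched in a few lines of Python. The extremes 12 and 33 are taken from the audit-time data used elsewhere in these slides; the function name is hypothetical, and rounding the width up to a whole number is an assumption made here for convenience, not a rule stated on the slide.

```python
import math


def approximate_class_width(smallest, largest, number_of_classes):
    """Approximate class width = (largest - smallest) / number of classes."""
    return (largest - smallest) / number_of_classes


# Smallest and largest audit times (in days) from the audit-time data.
smallest, largest = 12, 33

width = approximate_class_width(smallest, largest, 5)

# Assumption: round up so that the classes cover the whole range of the data.
rounded_width = math.ceil(width)

print(width, rounded_width)
```

A width of 5 days is consistent with the classes 10-14, 15-19, 20-24, 25-29 and 30-34 used in the audit-time tables.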
31. 31
Economic Statistics
Audit Time (days)   Relative Frequency   Percent Frequency
10-14               .20                  20
15-19               .40                  40
20-24               .25                  25
25-29               .10                  10
30-34               .05                  5
Total               1.00                 100
Relative and Percent Frequency Distributions for the Audit-Time Data
n = 20
32. 32
Economic Statistics
Cumulative Frequency Distribution shows the number
of data items with values less than or equal to the
upper class limit of each class.
Cumulative Relative Frequency distribution shows the
proportion of data items with values less than or
equal to the upper class limit of each class.
Cumulative Percent Frequency distribution shows the
percentage of data items with values less than or
equal to the upper class limit of each class.
33. 33
Economic Statistics
Cumulative Frequency Distribution
Audit Time (days)           Cumulative   Cumulative Relative   Cumulative Percent
                            Frequency    Frequency             Frequency
Less than or equal to 14    4            0.20                  20
Less than or equal to 19    12           0.60                  60
Less than or equal to 24    17           0.85                  85
Less than or equal to 29    19           0.95                  95
Less than or equal to 34    20           1.00                  100
Cumulative Frequency, Cumulative Relative Frequency, and Cumulative
Percent Frequency Distributions for the Audit-Time Data
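The cumulative columns are running totals of the class frequencies. A short Python sketch, assuming the audit-time class frequencies 4, 8, 5, 2 and 1 given in the frequency distribution for these data (variable names are illustrative):

```python
from itertools import accumulate

# Upper class limits and class frequencies for the audit-time data.
upper_limits = [14, 19, 24, 29, 34]
frequencies = [4, 8, 5, 2, 1]

n = sum(frequencies)

# Running total of the frequencies up to and including each class.
cumulative = list(accumulate(frequencies))
cumulative_relative = [c / n for c in cumulative]
cumulative_percent = [100 * c / n for c in cumulative]

for limit, c, r, p in zip(upper_limits, cumulative, cumulative_relative, cumulative_percent):
    print(f"Less than or equal to {limit}: {c:>2} {r:.2f} {p:.0f}")
```

The last cumulative frequency always equals n, and the last cumulative relative frequency always equals 1.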
38. Summation Notation
Σ = sum of; X is a variable such as family income.
Then total family income across N observations is
\[ \sum_{i=1}^{N} X_i = X_1 + X_2 + \dots + X_N \]
39. Summation Notation
Summation of a constant times a variable is equal to the constant times the summation of that variable:
\[ \sum_{i=1}^{N} kX_i = kX_1 + kX_2 + \dots + kX_N = k \sum_{i=1}^{N} X_i \]
40. Summation Notation
Summation of the sum of observations on two variables is equal to the sum of their summations:
\[ \sum_{i=1}^{N} (X_i + Y_i) = \sum_{i=1}^{N} X_i + \sum_{i=1}^{N} Y_i \]
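The summation rules can be checked numerically. The sketch below uses made-up values for X, Y and the constant k (they are hypothetical and do not come from the slides) and compares both sides of each rule.

```python
# Hypothetical observations on two variables and a hypothetical constant.
X = [3, 5, 7, 9]
Y = [2, 4, 6, 8]
k = 10

# Rule 1: the sum of k * X_i equals k times the sum of X_i.
left_constant = sum(k * x for x in X)
right_constant = k * sum(X)

# Rule 2: the sum of (X_i + Y_i) equals the sum of X_i plus the sum of Y_i.
left_two_vars = sum(x + y for x, y in zip(X, Y))
right_two_vars = sum(X) + sum(Y)

print(left_constant, right_constant)    # both sides of rule 1
print(left_two_vars, right_two_vars)    # both sides of rule 2
```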
42. 42
Economic Statistics
Measures of Central Tendency: Mean, Median and
Mode
The mean is the average of all values. It is useful in analyzing interval and ratio data. The mean is derived by adding all the values and dividing the sum by the number of cases.
Example: Achievement can be measured by a score in a 100-item test. Scores of 15 students in the test:
82 83 85 87 87 88 90 91 93 93 94 95 95 95 96
Mean = (82 + 83 + 85 + 87 + ... + 96)/15 = 1354/15 = 90.27
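A quick way to check the arithmetic of the mean is to compute it directly. This sketch uses the 15 test scores listed on the slide; summing them gives 1354, so the mean is 1354/15, about 90.27.

```python
from statistics import mean

# The 15 test scores from the example.
scores = [82, 83, 85, 87, 87, 88, 90, 91, 93, 93, 94, 95, 95, 95, 96]

total = sum(scores)          # sum of all values
n = len(scores)              # number of cases
average = total / n          # mean = sum / number of cases

# statistics.mean gives the same result.
assert abs(average - mean(scores)) < 1e-9

print(total, n, round(average, 2))
```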
43. 43
Economic Statistics
The median is the value in the middle when the data are arranged in order (from lowest to highest or from highest to lowest).
For example:
Scores: 82 83 85 87 87 88 90 91 93 93 94 95 95 95 96
With 15 scores, the median is the 8th value: 91.
Note: For an odd number of observations, the median is the middle value. For an even number of observations, the median is the average of the two middle values.
Scores: 82 83 85 87 87 88 90 91 93 93 94 95 95 95 96 98
With 16 scores, the median is the average of the 8th and 9th values: (91 + 93)/2 = 92.
44. 44
Economic Statistics
The mode is the value that occurs with the greatest frequency in a set of figures.
Example: 82 83 85 87 87 88 90 90 90 91 93 93 96 97 97
Here the mode is 90, which occurs three times.
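The median and mode examples can be reproduced with Python's standard `statistics` module. The three lists below are the score sets shown on the slides; the variable names are illustrative.

```python
from statistics import median, mode

# Odd number of observations (15 scores): the median is the middle value.
scores_odd = [82, 83, 85, 87, 87, 88, 90, 91, 93, 93, 94, 95, 95, 95, 96]

# Even number of observations (16 scores): the median is the average
# of the two middle values.
scores_even = scores_odd + [98]

# Scores used in the mode example.
scores_mode = [82, 83, 85, 87, 87, 88, 90, 90, 90, 91, 93, 93, 96, 97, 97]

median_odd = median(scores_odd)
median_even = median(scores_even)
most_frequent = mode(scores_mode)

print(median_odd, median_even, most_frequent)
```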
45. 45
Economic Statistics
Describing the Variation in the Data (Univariate)
The range is a simple measure of variation calculated as the highest value in a distribution minus the lowest value.
Example: 82 83 85 87 87 88 90 90 90 91 93 93 96 97 97
Range = Highest value - Lowest value
= 97 - 82 = 15
46. 46
Economic Statistics
Variance
The variance is a measure of variability that utilizes all the data. The variance is based on the difference between the value of each observation (x_i) and the mean. The difference between each x_i and the mean (\bar{x} for a sample, \mu for a population) is called a deviation about the mean.
48. 48
Economic Statistics
Number of Students   Mean Class Size   Deviation About the Mean   Squared Deviation About the Mean
in Class (x_i)       (\bar{x})         (x_i - \bar{x})            (x_i - \bar{x})^2
46                   44                2                          4
54                   44                10                         100
42                   44                -2                         4
46                   44                2                          4
32                   44                -12                        144
Total                                  0                          256
Computation of Deviations and Squared Deviations About the Mean for the Class-Size Data
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{256}{4} = 64 \]
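The sample variance for the class-size data can be computed step by step, following the columns of the table. This is a minimal sketch; the variable names are illustrative, and the result is cross-checked against `statistics.variance`, which also divides by n - 1.

```python
from statistics import variance

# Number of students in each of the five classes.
class_sizes = [46, 54, 42, 46, 32]

n = len(class_sizes)
x_bar = sum(class_sizes) / n                        # sample mean

deviations = [x - x_bar for x in class_sizes]       # deviations about the mean
squared_deviations = [d ** 2 for d in deviations]   # squared deviations

# Sample variance: sum of squared deviations divided by n - 1.
s_squared = sum(squared_deviations) / (n - 1)

# The standard library gives the same sample variance.
assert abs(s_squared - variance(class_sizes)) < 1e-9

print(x_bar, sum(deviations), sum(squared_deviations), s_squared)
```

The deviations about the mean always sum to zero, which is why they are squared before being averaged.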
49. 49
Economic Statistics
Standard Deviation
The standard deviation is defined as the positive square root of the variance. The standard deviation is easier to interpret than the variance because it is measured in the same units as the data.
Sample Standard Deviation
\[ s = \sqrt{s^2} \]
Population Standard Deviation
\[ \sigma = \sqrt{\sigma^2} \]
50. 50
Economic Statistics
For the class-size data, the sample variance is
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{256}{4} = 64 \]
so the sample standard deviation is
\[ s = \sqrt{64} = 8 \]
51. 51
Economic Statistics
The coefficient of variation is a relative measure of variability; it measures the standard deviation relative to the mean. It is computed as follows:
\[ \frac{\text{Standard deviation}}{\text{Mean}} \times 100 = \frac{s}{\bar{x}} \times 100 \]
52. 52
Economic Statistics
For the class-size data, with s = 8 and \bar{x} = 44, the coefficient of variation is
\[ \frac{s}{\bar{x}} \times 100 = \frac{8}{44} \times 100 = 18.2 \]
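As a sketch, the standard deviation and coefficient of variation for the class-size data can be obtained with the standard `statistics` module. The data are the five class sizes from the slides; the variable names are illustrative.

```python
from statistics import mean, stdev

# Number of students in each of the five classes.
class_sizes = [46, 54, 42, 46, 32]

x_bar = mean(class_sizes)   # sample mean
s = stdev(class_sizes)      # sample standard deviation (divides by n - 1)

# Coefficient of variation: standard deviation relative to the mean, in percent.
cv = s / x_bar * 100

print(x_bar, s, round(cv, 1))
```

A coefficient of variation of about 18.2 means the standard deviation is roughly 18.2 percent of the mean.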
53. 53
Economic Statistics
The z-score is often called the standardized value. The standardized value or z-score, z_i, can be interpreted as the number of standard deviations x_i is from the mean \bar{x}. The z-score for any observation can be interpreted as a measure of the relative location of the observation in a data set.
\[ z_i = \frac{x_i - \bar{x}}{s} \]
54. 54
Economic Statistics
Z-Scores for the Class-Size Data
Number of Students in Class   Deviation About the Mean   z-score = (x_i - \bar{x})/s
46                            2                          2/8 = 0.25
54                            10                         10/8 = 1.25
42                            -2                         -2/8 = -0.25
46                            2                          2/8 = 0.25
32                            -12                        -12/8 = -1.50
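The z-scores in the table follow directly from the definition. A minimal Python sketch using the class-size data (mean 44, sample standard deviation 8); the variable names are illustrative.

```python
from statistics import mean, stdev

# Number of students in each of the five classes.
class_sizes = [46, 54, 42, 46, 32]

x_bar = mean(class_sizes)   # 44
s = stdev(class_sizes)      # sample standard deviation, 8

# z-score: number of standard deviations each observation lies from the mean.
z_scores = [(x - x_bar) / s for x in class_sizes]

for x, z in zip(class_sizes, z_scores):
    print(f"{x:>3} {z:>6.2f}")
```

A positive z-score marks an observation above the mean and a negative one an observation below it.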
55. Economic Statistics
12 14 19 18
15 15 18 17
20 27 22 23
22 21 33 28
14 18 16 13
Audit Times (In Days)
$$\bar{x} = \frac{\sum x_i}{n} = 19.3$$
56. Economic Statistics
Audit Time Frequency
10-14 4
15-19 8
20-24 5
25-29 2
30-34 1
Total 20
Frequency Distribution for the Audit-time Data
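The frequency distribution can be rebuilt from the 20 raw audit times. The sketch below assumes the same five classes of width 5 days used in the table.

```python
# Audit times in days, as listed in the raw data.
audit_times = [12, 14, 19, 18,
               15, 15, 18, 17,
               20, 27, 22, 23,
               22, 21, 33, 28,
               14, 18, 16, 13]

# Class limits (inclusive) taken from the frequency table.
classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]

# Count how many observations fall inside each class.
frequency = {
    f"{low}-{high}": sum(low <= t <= high for t in audit_times)
    for low, high in classes
}

for label, count in frequency.items():
    print(label, count)
print("Total", sum(frequency.values()))
```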
57. Economic Statistics
Sample Mean for Grouped Data
$$\bar{x} = \frac{\sum f_i M_i}{n}$$
where
$M_i$ = the midpoint for class i
$f_i$ = the frequency for class i
$n = \sum f_i$ = the sample size
58. Economic Statistics
Audit Time   Class Midpoint   Frequency
(Days)       M_i              f_i         f_i M_i
10-14        12               4            48
15-19        17               8           136
20-24        22               5           110
25-29        27               2            54
30-34        32               1            32
Total                         20          380
$$\bar{x} = \frac{\sum f_i M_i}{n} = \frac{380}{20} = 19 \text{ days}$$
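The grouped-data mean can be sketched directly from the midpoints and frequencies. Each observation is treated as if it sat at its class midpoint, which is why the result (19 days) differs slightly from the mean of the raw data.

```python
# Class midpoints M_i and frequencies f_i from the audit-time table.
midpoints = [12, 17, 22, 27, 32]
frequencies = [4, 8, 5, 2, 1]

n = sum(frequencies)                                          # sample size
weighted_sum = sum(f * m for f, m in zip(frequencies, midpoints))

grouped_mean = weighted_sum / n

print(n, weighted_sum, grouped_mean)
# 20 380 19.0
```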
60. Economic Statistics
Among the measures of central tendency discussed, the
mean is by far the most widely used.
The mean is not appropriate for highly skewed distributions
and is less efficient than other measures of central tendency
when extreme scores are possible.
The geometric mean is a viable alternative if all the scores
are positive and the distribution has a positive skew.
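A small illustration of the point about skew, using hypothetical positive scores with one extreme value (the data below are invented for this sketch and do not come from the slides):

```python
import statistics

# Hypothetical positively skewed scores; 64 is the extreme value.
scores = [1, 2, 4, 8, 64]

arithmetic_mean = statistics.mean(scores)            # pulled toward the extreme score
median = statistics.median(scores)
geometric_mean = statistics.geometric_mean(scores)   # requires all scores > 0

print(arithmetic_mean, median, round(geometric_mean, 2))
```

Here the arithmetic mean is dragged far above most of the scores by the single extreme value, while the geometric mean stays much closer to the median.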
61. Economic Statistics
A distribution is skewed if one of its tails is longer than
the other.
A distribution with a long tail in the positive direction has a
positive skew. Distributions with positive skew are sometimes
called "skewed to the right".
62. Economic Statistics
A distribution with a long tail in the negative direction has a
negative skew, so it is "skewed to the left".
72. 72
Economic Statistics
[Figure: Scatter Diagram for the Stereo and Sound Equipment Store. Horizontal axis: No. of Commercials (0 to 6). Vertical axis: Sales Volume (0 to 70). The plot is divided into quadrants I, II, III and IV by reference lines labeled 3 and 51.]
77. Economic Statistics
Spearman rho (ρ)
Applicable to research studies in which the data consist of
ranks or the raw scores can be converted to rankings. Spearman
rho is a special case of the Pearson r: it is the Pearson r
computed on ranked (ordinal) data.
$$\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
where
n = number of paired ranks
d = difference between the paired ranks
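A minimal sketch of the Spearman formula, assuming the data are already ranks with no ties (the shortcut formula is exact only in that case). The two rankings below are hypothetical and serve only to illustrate the arithmetic.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman rho from paired ranks: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    if len(rank_x) != len(rank_y):
        raise ValueError("rank lists must have the same length")
    n = len(rank_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical rankings of five subjects on two variables.
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]

rho = spearman_rho(rank_x, rank_y)
print(round(rho, 3))
# 0.8
```

Here the differences d are -1, 1, -1, 1, 0, so the sum of squared differences is 4 and rho = 1 - 24/120 = 0.8.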
79. Economic Statistics
Size of Correlation Interpretation
0.90 to 1.00 (-0.90 to -1.00) Very high positive (negative) correlation
0.70 to 0.90 (-0.70 to -0.90) High positive (negative) correlation
0.50 to 0.70 (-0.50 to -0.70) Moderate positive (negative) correlation
0.30 to 0.50 (-0.30 to -0.50) Low positive (negative) correlation
0.00 to 0.30 (-0.00 to -0.30) Little if any correlation
Rule of Thumb for Interpreting the Size of a Correlation Coefficient
A correlation coefficient can take on values between –1.0 and +1.0,
inclusive. The sign indicates the direction of the relationship. A plus sign
indicates that the relationship is positive; a minus sign indicates that the
relationship is negative. The absolute value of the coefficient indicates
the magnitude of the relationship.
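The rule of thumb can be written as a small lookup. This is only a sketch: the table's ranges share their end points (0.90 appears in two rows), so the function below assumes a boundary value belongs to the higher category.

```python
def interpret_correlation(r):
    """Label a correlation coefficient using the rule-of-thumb table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1.0, +1.0]")
    size = abs(r)                      # magnitude of the relationship
    if size >= 0.90:
        label = "Very high"
    elif size >= 0.70:
        label = "High"
    elif size >= 0.50:
        label = "Moderate"
    elif size >= 0.30:
        label = "Low"
    else:
        return "Little if any correlation"
    direction = "positive" if r > 0 else "negative"   # sign gives the direction
    return f"{label} {direction} correlation"


print(interpret_correlation(0.95))    # Very high positive correlation
print(interpret_correlation(-0.35))   # Low negative correlation
print(interpret_correlation(0.10))    # Little if any correlation
```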
80. Economic Statistics
Coefficient      Variable X                               Variable Y
Pearson r        Interval/Ratio                           Interval/Ratio
                 (Number of Commercials; Salary)          (Sales; Years of Schooling)
Spearman (ρ)     Ordinal (Ranking)                        Ordinal (Ranking)
Point-Biserial   Nominal (Dichotomous)                    Interval/Ratio
                 (Gender)                                 (Test Scores)
Phi (Φ)          Nominal (Dichotomous)                    Nominal (Dichotomous)
                 (Gender; Gender)                         (Political Party Affiliation; Issues)
Rank-Biserial    Nominal (Dichotomous)                    Ordinal
                 (Marital Status)                         (Socio-economic Status)
Lambda (λ)       Nominal (more than two                   Nominal (more than two
                 classification levels)                   classification levels)
                 (Level of Education)                     (Occupational Choice)
Matrix Showing Correlation Coefficients Appropriate for Scales of Measurement for
Variable X and Variable Y
82. Economic Statistics
Subject   Item Score (X)   Test Score (Y)
A         1                10
B         1                12
C         1                16
D         1                10
E         1                11
F         0                 7
G         0                 6
H         0                11
I         0                 8
J         0                 5
Total     5                96
X = nominal data with two classification levels (a dichotomous variable). Assignment of value 1 to correct
response to item 1 of the 20-item test and value 0 to an incorrect response.
Y = data on the total test scores for ten students
Need to correlate success on one item of a test (the dichotomy—either right or wrong) with total score on
the test.
Data for Calculating the Point-Biserial Correlation Coefficient
83. Economic Statistics
The point-biserial correlation coefficient
$$r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_y} \sqrt{pq}$$
where
$\bar{Y}_1$ = mean of the Y scores for those individuals with X scores equal to 1
$\bar{Y}_0$ = mean of the Y scores for those individuals with X scores equal to 0
$s_y$ = standard deviation of all Y scores
p = proportion of individuals with an X score of 1
q = proportion of individuals with an X score of 0
The resulting correlation coefficient is the index of the relationship between
performance on one test item and performance on the test as a whole.
84. Economic Statistics
$$r_{pb} = \frac{11.80 - 7.40}{3.07} \sqrt{(0.50)(0.50)} = 0.716$$
Subjects scoring high on the total test tended to answer item 1 correctly and
those with lower scores tended to answer item 1 incorrectly.
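The point-biserial computation can be sketched from the ten subjects' data. One assumption is made explicit here: the value 3.07 used for the standard deviation of Y matches the form that divides by n, so `statistics.pstdev` is used; the n - 1 form would give a slightly different result.

```python
import math
import statistics

item_score = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]        # X: 1 = correct, 0 = incorrect
test_score = [10, 12, 16, 10, 11, 7, 6, 11, 8, 5]  # Y: total test score

y1 = [y for x, y in zip(item_score, test_score) if x == 1]
y0 = [y for x, y in zip(item_score, test_score) if x == 0]

mean_y1 = statistics.mean(y1)            # 11.8
mean_y0 = statistics.mean(y0)            # 7.4
s_y = statistics.pstdev(test_score)      # assumption: divisor n, about 3.07
p = len(y1) / len(test_score)            # proportion with X = 1
q = 1 - p                                # proportion with X = 0

r_pb = (mean_y1 - mean_y0) / s_y * math.sqrt(p * q)

print(round(s_y, 2), round(r_pb, 3))
# 3.07 0.716
```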
85. Economic Statistics
Person   Gender (X)   Political Affiliation (Y)
A        1            1
B        1            1
C        1            0
D        1            1
E        1            1
F        0            0
G        0            1
H        0            1
I        0            0
J        0            0
Total    5            6
X: 1 = FEMALE, 0 = MALE
Y: 1 = PRO-ADMIN, 0 = ANTI-ADMIN
Data for Calculating the Phi (Φ) Coefficient
X and Y are nominal dichotomous variables
86. Economic Statistics
                                        Gender
                                        Male (0)   Female (1)   Totals
Political affiliation   Pro-Admin (1)   2          4            6
                        Anti-Admin (0)  3          1            4
                        Totals          5          5            10

                        Variable X
                        0          1          Totals
Variable Y   1          A          B          A + B
             0          C          D          C + D
             Totals     A + C      B + D      N

Phi (Φ) coefficient
$$\phi = \frac{BC - AD}{\sqrt{(A + B)(C + D)(A + C)(B + D)}}$$
2x2 Contingency Table for Computing the Phi (Φ) Coefficient
87. Economic Statistics
$$\phi = \frac{(4)(3) - (2)(1)}{\sqrt{(2 + 4)(3 + 1)(2 + 3)(4 + 1)}} = 0.408$$
This coefficient indicates that there is a low positive relationship between
gender and political affiliation. Females tend to be pro-admin and males tend
to be anti-admin.
This direction is evidenced by the positive correlation, which indicates that
scores of 1 tend to be associated with scores of 1 (1 = female, pro-admin) and
scores of 0 with scores of 0 (0 = male, anti-admin).
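The phi coefficient can be sketched by first tallying the 2x2 table from the ten persons' data and then applying the formula. The cell names A, B, C and D follow the layout of the contingency table (rows are Y = 1 then Y = 0, columns are X = 0 then X = 1).

```python
import math

gender = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]        # X: 1 = female, 0 = male
affiliation = [1, 1, 0, 1, 1, 0, 1, 1, 0, 0]   # Y: 1 = pro-admin, 0 = anti-admin

pairs = list(zip(gender, affiliation))
A = pairs.count((0, 1))   # male, pro-admin
B = pairs.count((1, 1))   # female, pro-admin
C = pairs.count((0, 0))   # male, anti-admin
D = pairs.count((1, 0))   # female, anti-admin

phi = (B * C - A * D) / math.sqrt((A + B) * (C + D) * (A + C) * (B + D))

print(A, B, C, D, round(phi, 3))
# 2 4 3 1 0.408
```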
88. Economic Statistics
                          Less      HS         Some      College    Graduate   Total
                          than HS   Graduate   College   Graduate   Degree
Laborer/Farmers           347       128         84        37          5         601
Skilled Crafts            164       277        103        43         36         623
Sales/Clerical             30        77        217       147         80         551
Professional/Managerial     2        34         82       198        267         583
Total                     543       516        486       425        388        2358
Data for Determining the Relationship Between Level of Education and Occupational Choice
Lambda (λ) coefficient
$$\lambda = \frac{\sum_{j=1}^{J} n_{mj} + \sum_{i=1}^{I} n_{im} - n_{m+} - n_{+m}}{2n - n_{m+} - n_{+m}}$$
$n_{mj}$ = largest frequency in the jth column
$n_{im}$ = largest frequency in the ith row
$n_{m+}$ = largest marginal row total
$n_{+m}$ = largest marginal column total
n = number of observations
89. Economic Statistics
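$$\sum_{j=1}^{J} n_{mj} = 347 + 277 + 217 + 198 + 267 = 1306$$
$$\sum_{i=1}^{I} n_{im} = 347 + 277 + 217 + 267 = 1108$$
$n_{m+}$ = 623
$n_{+m}$ = 543
n = 2358
$$\lambda = \frac{1306 + 1108 - 623 - 543}{2(2358) - 623 - 543} = \frac{1248}{3550} = 0.352$$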
There is a moderate relationship between level of education and occupational
choice. Based on the data, those individuals with more education tend to have
sales/clerical or professional/managerial positions, whereas those with less
education tend to have laborer/farmer or skilled-crafts positions.
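The symmetric lambda computation can be sketched from the cross-tabulation. With the sums listed above (1306, 1108, 623, 543 and n = 2358) the expression evaluates to 1248/3550, or about 0.352. The variable names are illustrative.

```python
# Rows: occupation; columns: level of education (marginal totals excluded).
table = [
    [347, 128, 84, 37, 5],     # Laborer/Farmers
    [164, 277, 103, 43, 36],   # Skilled Crafts
    [30, 77, 217, 147, 80],    # Sales/Clerical
    [2, 34, 82, 198, 267],     # Professional/Managerial
]
columns = list(zip(*table))

n = sum(sum(row) for row in table)                   # 2358
sum_col_max = sum(max(col) for col in columns)       # largest cell per column
sum_row_max = sum(max(row) for row in table)         # largest cell per row
max_row_total = max(sum(row) for row in table)       # largest marginal row total
max_col_total = max(sum(col) for col in columns)     # largest marginal column total

lam = (sum_col_max + sum_row_max - max_row_total - max_col_total) / (
    2 * n - max_row_total - max_col_total
)

print(sum_col_max, sum_row_max, max_row_total, max_col_total, round(lam, 3))
# 1306 1108 623 543 0.352
```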
90. Economic Statistics
Person   Immigrating Generation (X)   Rank of Socio-economic Status (Y)
A 1 1
B 1 2
C 1 3
D 0 4
E 0 5
F 1 6
G 1 7
H 0 8
I 1 9
J 0 10
K 0 11
L 0 12
Data for Calculating the Rank-Biserial Correlation Coefficient
Need to know the relationship between the fact that an individual is at least a
second-generation American (X) and socio-economic status (Y).
The X variable (immigration status) is considered a nominal dichotomy (0 = less
than second generation; 1 = second generation or greater). The data for the Y
variable (socio-economic status) are ranked with 1 = highest status; 2 = next highest
status; and so on.
91. Economic Statistics
Rank-Biserial Correlation Coefficient
$$r_{rb} = \frac{2}{n} (\bar{Y}_1 - \bar{Y}_0)$$
n = number of observations
$\bar{Y}_1$ = mean rank for individuals with X scores equal to 1
$\bar{Y}_0$ = mean rank for individuals with X scores equal to 0
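A minimal sketch applying the rank-biserial formula to the twelve persons in the data table. Because rank 1 is the highest status, a negative result here means the X = 1 group holds the better (numerically smaller) ranks on average.

```python
generation = [1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0]   # X for persons A..L
ses_rank = list(range(1, 13))                        # Y: ranks 1..12, 1 = highest

ranks_1 = [y for x, y in zip(generation, ses_rank) if x == 1]
ranks_0 = [y for x, y in zip(generation, ses_rank) if x == 0]

n = len(ses_rank)
mean_rank_1 = sum(ranks_1) / len(ranks_1)   # 28 / 6
mean_rank_0 = sum(ranks_0) / len(ranks_0)   # 50 / 6

r_rb = (2 / n) * (mean_rank_1 - mean_rank_0)

print(round(mean_rank_1, 2), round(mean_rank_0, 2), round(r_rb, 3))
# 4.67 8.33 -0.611
```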
93. Economic Statistics
Aside from the Spearman rank correlation, there are other correlations that are
applied to two ordinal variables. These correlation coefficients are
distribution-free and are usually applied to the ranks of the two variables.
Examples are the Goodman and Kruskal Gamma and Kendall's Tau.
94. Economic Statistics
Goodman and Kruskal Gamma
The Gamma is a simple symmetric correlation. It does not correct for tied ranks.
It is one of many indicators of monotonicity that may be applied. Monotonicity
is measured by the proportion of concordant changes from one value in one
variable to paired values in the other variable.
Concordance (C)--when the change in one variable is positive and the
corresponding change in the other variable is also positive.
Discordance (D) --when the change in one variable is positive and the
corresponding change in the other variable is negative.
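Counting concordant and discordant pairs can be sketched as below. The slide does not state the Gamma formula; the sketch assumes the usual definition, (C - D) / (C + D), in which pairs tied on either variable are simply left out. The data are hypothetical.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Gamma = (C - D) / (C + D), ignoring pairs tied on x or y."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x2 - x1) * (y2 - y1)
        if product > 0:          # both variables change in the same direction
            concordant += 1
        elif product < 0:        # the variables change in opposite directions
            discordant += 1
        # product == 0 is a tie and is not counted
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical ordinal scores with some ties.
x = [1, 2, 2, 3, 4]
y = [2, 1, 3, 3, 5]

print(goodman_kruskal_gamma(x, y))
# 0.75
```

Of the 10 pairs, 7 are concordant, 1 is discordant and 2 are tied, so gamma = (7 - 1) / (7 + 1) = 0.75.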
95. Economic Statistics
Kendall's Tau a
The number of concordances minus the number of discordances is compared
to the total number of pairs, n(n-1)/2.
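Written as a formula, the description above corresponds to tau-a = (C - D) / (n(n - 1)/2). A minimal sketch with hypothetical rankings (no ties) follows; the function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Tau-a = (concordant - discordant) / (n * (n - 1) / 2)."""
    n = len(x)
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x2 - x1) * (y2 - y1)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total_pairs = n * (n - 1) / 2     # tied pairs still count in the denominator
    return (concordant - discordant) / total_pairs


# Hypothetical rankings of five subjects on two variables.
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]

print(kendall_tau_a(rank_x, rank_y))
# 0.6
```

Here 8 of the 10 pairs are concordant and 2 are discordant, giving (8 - 2) / 10 = 0.6.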