1. The document discusses the nature of regression analysis, which involves studying the dependence of a dependent variable on one or more explanatory variables, with the goal of estimating or predicting the average value of the dependent variable based on the explanatory variables.
2. It provides examples of regression analysis, such as studying how crop yield depends on factors like temperature, rainfall, and fertilizer. It also distinguishes between statistical and deterministic relationships, and notes that regression analysis indicates dependence but does not necessarily imply causation.
3. Regression analysis differs from correlation analysis in that it treats the dependent and explanatory variables asymmetrically, with the goal of prediction rather than just measuring the strength of the linear association between variables.
This presentation is tailored for readers who want an overview of Econometrics: what it means, how it works, and the methodology it follows.
The presentation aims to explain the meaning of ECONOMETRICS and why this subject is studied as a separate discipline.
The reference is based on the book "BASIC ECONOMETRICS" by Damodar N. Gujarati.
For further explanation, see the YouTube video:
https://youtu.be/S3SUDiVpUGU
We can define heteroscedasticity as the condition in which the variance of the error term (the residual term) in a regression model is not constant across observations. As the diagram above shows, under homoscedasticity the data points are evenly scattered around the fitted line, while under heteroscedasticity the spread of the points changes as the explanatory variable changes.
Two Conditions:
1] Known Variance
2] Unknown Variance
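Since the error variance is rarely known in practice, a quick diagnostic is useful. The Python sketch below is not from the original slides: it simulates a regression whose error spread grows with x, fits OLS by hand, and applies a Goldfeld–Quandt-style comparison of residual variances; the data and the cutoff are invented for illustration.

```python
import random

random.seed(0)

# Simulate heteroscedastic data: the error's standard deviation grows with x.
n = 200
xs = [random.uniform(1, 10) for _ in range(n)]
ys = [2.0 + 0.5 * x + random.gauss(0, 0.3 * x) for x in xs]

# Ordinary least squares by hand: slope and intercept.
mx = sum(xs) / n
my = sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Goldfeld-Quandt-style check: sort by x and compare the residual
# variance in the low-x and high-x thirds. A ratio far above 1
# suggests the error variance is not constant.
order = sorted(range(n), key=lambda i: xs[i])
third = n // 3
low = [resid[i] for i in order[:third]]
high = [resid[i] for i in order[-third:]]
var = lambda r: sum(e * e for e in r) / len(r)
ratio = var(high) / var(low)
print(f"variance ratio (high x / low x): {ratio:.2f}")
```

With homoscedastic errors the ratio would hover near 1; here it comes out far larger.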
Ramsey–Cass–Koopmans model and its application in Ethiopia - Molla Derbe
Economists of different schools of thought have long debated macroeconomic questions. Ramsey, a neoclassical economist, disagreed with the Solow model on certain points. What makes his model differ from the Solow model is that it explicitly models the choice of consumption at a point in time and so makes the savings rate endogenous. Twenty-first-century research in Ethiopia (Seid Nuru, 2012, pp. 6-7) found that the outcome of optimizing the dynamic model is that long-run growth depends on the rate of technological change and the rate of change of rainfall variability, in terms of both amplitude and frequency.
Correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense "correlation" may indicate any type of association, in statistics it normally refers to the degree to which a pair of variables are linearly related. Familiar examples of dependent phenomena include the correlation between the height of parents and their offspring, and the correlation between the price of a good and the quantity consumers are willing to purchase, as depicted in the so-called demand curve.
Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example, there is a causal relationship, because extreme weather causes people to use more electricity for heating or cooling. However, in general, the presence of a correlation is not sufficient to infer the presence of a causal relationship (i.e., correlation does not imply causation).
Formally, random variables are dependent if they do not satisfy a mathematical property of probabilistic independence. In informal parlance, correlation is synonymous with dependence. However, when used in a technical sense, correlation refers to any of several specific types of mathematical operations between the tested variables and their respective expected values. Essentially, correlation is the measure of how two or more variables are related to one another. There are several correlation coefficients, often denoted
ρ (rho) or r, measuring the degree of correlation. The most common of these is the Pearson correlation coefficient, which is sensitive only to a linear relationship between two variables (which may be present even when one variable is a nonlinear function of the other). Other correlation coefficients – such as Spearman's rank correlation – have been developed to be more robust than Pearson's, that is, more sensitive to nonlinear relationships.[1][2][3] Mutual information can also be applied to measure dependence between two variables.
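The Pearson/Spearman contrast can be made concrete with a small self-contained Python sketch (not part of the original article); the data are invented so that the relationship is perfectly monotonic but nonlinear.

```python
import math

def pearson(xs, ys):
    # Pearson correlation: covariance scaled by the two standard deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(v):
    # Rank of each value, 1 = smallest (ties ignored: fine for distinct data).
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    # Spearman's rho is just Pearson's r computed on the ranks.
    return pearson(ranks(xs), ranks(ys))

xs = list(range(1, 11))
ys = [x ** 3 for x in xs]            # monotonic but strongly nonlinear

print(round(pearson(xs, ys), 3))     # below 1: the relationship is not linear
print(round(spearman(xs, ys), 3))    # 1.0: the relationship is perfectly monotonic
```

Pearson penalizes the curvature; Spearman, working on ranks, sees a perfect monotonic association.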
The presentation was prepared to provide a brief overview of regression analysis: what it means and how it differs from related statistical measures such as correlation. The last few slides briefly discuss the different types of data used in Econometrics.
chapter 10
Linear Regression
Learning Objectives
After reading this chapter, you will be able to. . .
1. explain the relationship between correlation and regression.
2. understand the importance of prediction using regression.
3. identify the two values that determine the regression line in least-squares regression.
4. predict the value of a criterion based on a predictor value.
5. describe the extension of bivariate regression to multiple regression.
6. report and interpret results of multiple regression in APA format.
Regression is a powerful analytical tool that involves using relationships between variables to predict one from a measure of the other. Generally speaking, people make regression-like predictions often. The presence of clouds in the morning sky prompts us to take an umbrella to work, for example, or a phone call from unexpected guests prompts us to prepare extra food for a meal. This chapter follows the same thinking, except that the predictions are mathematical.
Regression topics represent a bit of a puzzle. Sometimes readers shy away from regression topics, thinking that they are difficult to understand or at least difficult to master. However, regression, as the reader will see, is neither difficult to understand nor to execute. Social scientists rely on regression in virtually every advanced statistical procedure. Most of those high-end statistical techniques—such as multivariate analysis of variance, discriminant function analysis, and structural equations modeling—are beyond the scope of an introductory text, but regression analysis is an essential part of the preparation for each of them. In the meantime, we will learn to use regression for its own purposes, to detect the effect of significant independent variables, known as predictors, on a dependent outcome variable known as the criterion.
10.1 Regression and Correlation
In Chapter 9, we made the point that when two variables are correlated, it is because they both contain some of the same information. If intelligence and reading comprehension
are correlated, it is because to some degree they both measure the same characteristic. The
more highly they are correlated, the greater the quantity of whatever is measured that the
two characteristics have in common.
Recall that the coefficient of determination (rxy²) indicates the proportion of one variable, the x variable, for example, that can be explained by the other, which is designated y. If intelligence (x) and reading comprehension (y) are correlated rxy = .8, then rxy² = .64. The coefficient of determination tells us that 64% of whatever reading comprehension measures can be explained by differences in intelligence. Another way to say this is that the coefficient of determination indicates how much information two correlated variables ...
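The arithmetic in the example above takes only a couple of lines of Python (illustrative only):

```python
# If intelligence (x) and reading comprehension (y) correlate r_xy = .80,
# the coefficient of determination is r squared.
r_xy = 0.80
r_squared = r_xy ** 2
print(f"r^2 = {r_squared:.2f} -> {r_squared:.0%} of the variance is shared")
# r^2 = 0.64 -> 64% of the variance is shared
```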
Statistics is an important tool in pharmacological research that is used to summarize experimental data (descriptive statistics) in terms of central tendency (mean or median) and variability (standard deviation, standard error of the mean, confidence interval, or range).
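These descriptive measures can be computed directly with Python's standard library. The sample values below are invented for illustration, and the 95% interval uses a normal approximation rather than the t-distribution a pharmacologist would normally prefer at this sample size.

```python
import math
import statistics as st

data = [4.1, 4.8, 5.0, 5.3, 5.9, 6.2, 6.8]   # hypothetical assay measurements

mean = st.mean(data)
median = st.median(data)
sd = st.stdev(data)                # sample standard deviation
sem = sd / math.sqrt(len(data))    # standard error of the mean
rng = max(data) - min(data)        # range
ci95 = (mean - 1.96 * sem, mean + 1.96 * sem)   # normal approximation

print(f"mean={mean:.2f} median={median} sd={sd:.2f} sem={sem:.2f} "
      f"range={rng:.2f} 95% CI=({ci95[0]:.2f}, {ci95[1]:.2f})")
```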
CHAPTER NINE
Steady States and Transitions
THE STEADY-STATE STRATEGY
Collecting Repeated Measures
Comparing States of Responding
The Risk of Extraneous Variables
Summary
STEADY STATES
Definition
Uses
Evaluates Measurement Decisions
Reveals the Influence of Conditions
Evaluates Experimental Control
Facilitates Experimental Comparisons
Identification
Trends
Range
Cycles
Criteria
Uses
Statistical
Graphical
Nondata
Establishing Stable Responding
TRANSITIONS
Transition States
Transitory States
Identification
Making Phase Change Decisions
Manipulation of new variables will often produce changes, but in order to describe the changes, we must be able to specify the baseline from which they occurred.
-Murray Sidman
THE STEADY-STATE STRATEGY
Collecting Repeated Measures
Let us suppose that we have defined a response class, selected a dimensional quantity, set up observation procedures, and are ready to start collecting data under a baseline (control) condition. The first graphed data point summarizing responding during a session will tell us something we never knew before, but it will only make it obvious that one data point does not tell us very much about what responding looks like under this condition. In particular, we would not know whether this value is typical of what we should expect under this baseline condition.
The only way to answer this question is to observe for another session. What
we are likely to find is that our second data point is not the same as the first.
Our question then becomes: "Which of these two values is more representative
of the impact of this phase?" Again, there is no way to settle this issue except to
observe for another session.
We should not be surprised if the third value is at least somewhat different
from the other two. However, if the three values are not wildly different,
they may begin to tell us something about responding in this phase. Still, it
would be easy to admit that we do not yet have a very complete picture of
what responding is like in this phase. After all, our participant has had only
limited exposure to this condition, and we know it can take a bit of time for
responding to adapt to a new set of influences. In other words, there is good
reason to anticipate that the initial impact of our baseline condition may not
be a very good prediction of how responding might change with increasing
experience.
As we keep collecting data from one session to the next, our graph will
gradually draw an increasingly comprehensive picture of responding. With
some luck, we may find that responding under this initial condition is relatively
stable. This means that responding is neither generally increasing nor decreasing and that the variability from one value to another is not excessive and is fairly consistent. We may even begin to feel some confidence in answering the original ...
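One way to make such a stability judgment operational is a quantitative criterion. The sketch below is a hypothetical rule, not Sidman's or this chapter's: responding counts as stable when the trend-line slope over the most recent sessions is near zero and the range of those values stays within a small fraction of their mean. The session values and thresholds are invented.

```python
def is_stable(values, k=6, max_slope=0.5, max_range_frac=0.15):
    # Stable if (a) the least-squares slope over the last k sessions is
    # near zero and (b) the range is a small fraction of the mean level.
    recent = values[-k:]
    n = len(recent)
    xs = range(n)
    mx = (n - 1) / 2
    my = sum(recent) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, recent)) / \
            sum((x - mx) ** 2 for x in xs)
    spread = (max(recent) - min(recent)) / my
    return abs(slope) <= max_slope and spread <= max_range_frac

# Session-by-session response rates under a baseline condition.
adapting = [12, 18, 25, 31, 36, 40, 43, 45]   # still trending upward
settled  = [44, 46, 45, 44, 46, 45, 44, 45]   # flat, tight range
print(is_stable(adapting), is_stable(settled))   # False True
```

Real stability criteria combine graphical judgment with rules like this; the point is only that "no trend, limited variability" can be stated precisely.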
USE OF PLS COMPONENTS TO IMPROVE CLASSIFICATION ON BUSINESS DECISION MAKING - IJDKP
This paper presents a methodology that eliminates multicollinearity of the predictor variables in supervised classification by transforming the predictor variables into orthogonal components obtained from the application of Partial Least Squares (PLS) Logistic Regression. PLS logistic regression was developed by Bastien, Esposito-Vinzi, and Tenenhaus [1]. We apply the techniques of supervised classification to data based on the original variables and to data based on the PLS components. The error rates are calculated and the results compared. The implementation of the classification methodology rests upon computer programs written in the R language that make possible the calculation of PLS components and classification error rates. The impact of this research will be disseminated, based on evidence that the methodology of Partial Least Squares Logistic Regression is fundamental when working in supervised classification with data of many predictor variables.
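The core idea, collapsing correlated predictors into components driven by their covariance with the class label, can be sketched compactly. The Python below is an illustrative simplification, not the authors' R implementation: it extracts only the first PLS-style component and classifies by thresholding it, and the data are synthetic.

```python
def center(col):
    m = sum(col) / len(col)
    return [v - m for v in col]

def first_pls_component(X_cols, y):
    # Weight each centered predictor by its covariance with y, then form
    # the component t = sum_j w_j * x_j with normalized weights.
    yc = center(y)
    cols = [center(c) for c in X_cols]
    w = [sum(a * b for a, b in zip(c, yc)) for c in cols]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    n = len(y)
    return [sum(w[j] * cols[j][i] for j in range(len(cols))) for i in range(n)]

# Two highly collinear predictors (x2 nearly equals x1) and a 0/1 label.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.1, 2.1, 2.9, 4.2, 5.1, 5.9, 7.2, 8.1]
y  = [0,   0,   0,   0,   1,   1,   1,   1]

t = first_pls_component([x1, x2], y)
# "Classifier": above-average component score -> class 1.
pred = [1 if v > 0 else 0 for v in t]
print(pred)   # [0, 0, 0, 0, 1, 1, 1, 1], matching y on this toy data
```

The full method in the paper extracts several orthogonal components and fits a logistic regression on them; this sketch only shows why one component can already carry the information shared by collinear predictors.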
BUS 308 Week 5 Lecture 3
A Different View: Effect Sizes
Expected Outcomes
After reading this lecture, the student should be familiar with:
1. What effect size measures exist for different statistical tests.
2. How to interpret an effect size measure.
3. How to calculate an effect size measure for different tests.
Overview
While confidence intervals can give us a sense of how much variation is in our decisions,
effect size measures help us understand the practical significance of our decision to reject the
null hypothesis. Not all statistically significant results are of the same importance in decision
making. A difference in means of 25 cents is more important with means around a dollar than
with means in the millions of dollars, yet with the right sample size both groups can have this
difference be statistically significant.
Effect size measures help us understand the practical importance of our decision to reject the null hypothesis.
Excel has limited functions available for us to use on Effect Size measures. We generally
need to take the output from the other functions and generate our Effect Size values.
Effect Sizes
One issue many have with statistical significance is the influence of sample size on the
decision to reject the null hypothesis. If the average difference in preference for a soft drink was
found to be ½ of 1%; most of us would not expect this to be statistically significant. And,
indeed, with typical sample sizes (even up to 100), a statistical test is unlikely to find any
significant difference. However, if the sample size were much larger; for example, 100,000; we
would suddenly find this miniscule difference to be significant!
Statistical significance is not the same as practical significance. If for example, our
sample of 100,000 was 1% more in favor of an expensive product change, would it really be
worthwhile making the change? Regardless of how large the sample was, it does not seem
reasonable to base a business decision on such a small difference.
Enter the idea of Effect Size. The name is descriptive but at the same time not very
illuminating on what this measure does. We will get to specific measures shortly, but for now,
let’s look at how an Effect Size measure can help us understand our findings. First, the name:
Effect Size. What effect? What size? In very general terms, the effect we are monitoring is the
effect that occurs when we change one of the variables. For example, is there an effect on the
average compa-ratio when we change from male to female. Certainly, but not all that much, as
we found no significant difference between the average male and female compa-ratios. Is there
an effect when we change from male to female on the average salary? Definitely. And it is
much larger than what we observed on the compa-ratio means. We found a significant difference in the average salary between males and females – around $14,000.
The Effect Size ...
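For a difference in two group means, the usual effect size measure is Cohen's d. The numbers below are invented salary-like samples, not the course data set; the point is only the calculation.

```python
import math

def cohens_d(a, b):
    # Cohen's d: mean difference scaled by the pooled standard deviation.
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled

# Hypothetical salaries in thousands of dollars.
group_a = [78, 85, 92, 88, 95, 81, 90, 86]
group_b = [64, 70, 75, 68, 72, 66, 74, 71]
d = cohens_d(group_a, group_b)
print(round(d, 2))   # about 3.5: a very large practical effect
```

Unlike a p-value, d does not grow with sample size, which is exactly why it complements the significance test.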
1. 405: ECONOMETRICS
Chapter # 1: THE NATURE OF REGRESSION ANALYSIS
By: Damodar N. Gujarati
Prof. M. El-Sakka
Dept of Economics: Kuwait University
2. THE MODERN INTERPRETATION OF REGRESSION
• Regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the (population) mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter.
Examples
1. Consider Galton's law of universal regression. Our concern is finding out how the average height of sons changes, given the fathers' height. To see how this can be done, consider Figure 1.1, which is a scatter diagram, or scattergram.
3. [Figure 1.1: scatter diagram of sons' and fathers' heights]
4. 2. Consider the scattergram in Figure 1.2, which gives the distribution in a hypothetical population of heights of boys measured at fixed ages.
5. 3. Studying the dependence of personal consumption expenditure on after-tax or disposable real personal income. Such an analysis may be helpful in estimating the marginal propensity to consume (MPC).
4. A monopolist who can fix the price or output (but not both) may want to find out the response of the demand for a product to changes in price. Such an experiment may enable the estimation of the price elasticity of the demand for the product and may help determine the most profitable price.
5. We may want to study the rate of change of money wages in relation to the unemployment rate. The curve in Figure 1.3 is an example of the Phillips curve. Such a scattergram may enable the labor economist to predict the average change in money wages given a certain unemployment rate.
6. [Figure 1.3: the Phillips curve]
7. 6. The higher the rate of inflation π, the lower the proportion (k) of their income that people would want to hold in the form of money. Figure 1.4.
8. • 7. The marketing director of a company may want to know how the demand for the company's product is related to, say, advertising expenditure. Such a study will be of considerable help in finding out the elasticity of demand with respect to advertising expenditure. This knowledge may be helpful in determining the "optimum" advertising budget.
• 8. Finally, an agronomist may be interested in studying the dependence of crop yield, say, of wheat, on temperature, rainfall, amount of sunshine, and fertilizer. Such a dependence analysis may enable the prediction of the average crop yield, given information about the explanatory variables.
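Example 8 can be mimicked numerically. The sketch below uses invented data and Python rather than anything in the slides: it regresses yield on rainfall and fertilizer through the normal equations (X'X)b = X'y, solved with a tiny Gaussian elimination, then predicts an average yield.

```python
def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting for a small system.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * d for a, d in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Invented agronomic data: yield (t/ha) vs rainfall (cm) and fertilizer (kg/ha).
rain = [20, 25, 30, 35, 40, 45]
fert = [50, 60, 55, 70, 65, 80]
crop = [2.1, 2.6, 2.8, 3.4, 3.5, 4.1]

X = [[1.0, r, f] for r, f in zip(rain, fert)]          # column of 1s = intercept
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xty = [sum(row[i] * y for row, y in zip(X, crop)) for i in range(3)]
b0, b_rain, b_fert = solve(XtX, Xty)

pred = b0 + b_rain * 38 + b_fert * 72
print(f"predicted average yield at 38 cm rain, 72 kg/ha fertilizer: {pred:.2f} t/ha")
```

The fitted equation predicts the conditional mean of yield given the explanatory variables, which is exactly the sense of "average crop yield" in the slide.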
9. 1.3 STATISTICAL VERSUS DETERMINISTIC RELATIONSHIPS
• In statistical relationships among variables we essentially deal with random or stochastic variables, that is, variables that have probability distributions. In functional or deterministic dependency, on the other hand, we also deal with variables, but these variables are not random or stochastic.
• The dependence of crop yield on temperature, rainfall, sunshine, and fertilizer, for example, is statistical in nature.
• In deterministic phenomena, we deal with relationships of the type, say, exhibited by Newton's law of gravity, which states: Every particle in the universe attracts every other particle with a force directly proportional to the product of their masses and inversely proportional to the square of the distance between them. Symbolically, F = k(m1m2/r²), where F = force, m1 and m2 are the masses of the two particles, r = distance, and k = constant of proportionality. We are not concerned with such deterministic relationships.
10. 1.4 REGRESSION VERSUS CAUSATION
• Although regression analysis deals with the dependence of one variable on other variables, it does not necessarily imply causation. In the crop-yield example cited previously, there is no statistical reason to assume that rainfall does not depend on crop yield. The fact that we treat crop yield as dependent on rainfall (among other things) is due to non-statistical considerations: Common sense suggests that the relationship cannot be reversed, for we cannot control rainfall by varying crop yield. A statistical relationship in itself cannot logically imply causation. To ascribe causality, one must appeal to a priori or theoretical considerations.
11. 1.5 REGRESSION VERSUS CORRELATION
• In correlation analysis, the primary objective is to measure the strength or degree of linear association between two variables. For example, smoking and lung cancer, scores on statistics and mathematics examinations, and so on. In regression analysis, we try to estimate or predict the average value of one variable on the basis of the fixed values of other variables.
• Regression and correlation have some fundamental differences. In regression analysis there is an asymmetry in the way the dependent and explanatory variables are treated.
• In correlation analysis, we treat any (two) variables symmetrically; there is no distinction between the dependent and explanatory variables. The correlation between scores on mathematics and statistics examinations is the same as that between scores on statistics and mathematics examinations. Moreover, both variables are assumed to be random, whereas most of the regression theory to be dealt with here is conditional upon the assumption that the dependent variable is stochastic but the explanatory variables are fixed or nonstochastic.
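The symmetry/asymmetry point can be checked numerically. In the Python sketch below (hypothetical exam scores, not from the slides), correlation is identical in both directions, while the two regression slopes differ and satisfy the identity b_yx * b_xy = r².

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def corr(a, b):
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

def slope(y, x):
    # Slope of the regression of y on x.
    return cov(x, y) / cov(x, x)

stats_scores = [55, 62, 70, 74, 81, 90]
math_scores  = [60, 58, 72, 80, 85, 94]

# Correlation is symmetric in its two arguments...
print(corr(stats_scores, math_scores) == corr(math_scores, stats_scores))  # True
# ...but the two regression slopes differ, their product being r squared.
b_yx = slope(math_scores, stats_scores)
b_xy = slope(stats_scores, math_scores)
print(abs(b_yx * b_xy - corr(stats_scores, math_scores) ** 2) < 1e-9)      # True
```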
12. 1.6 TERMINOLOGY AND NOTATION
• In the literature the terms dependent variable and explanatory variable are described variously. A representative list is:
13. • We will use the dependent variable/explanatory variable or the more neutral regressand and regressor terminology.
• The term random is a synonym for the term stochastic. A random or stochastic variable is a variable that can take on any set of values, positive or negative, with a given probability.
14. 1.7 THE NATURE AND SOURCES OF DATA FOR ECONOMIC ANALYSIS
• Types of Data
• There are three types of data: time series, cross-section, and pooled (i.e., combination of time series and cross-section) data.
• A time series is a set of observations on the values that a variable takes at different times. It is collected at regular time intervals, such as daily, weekly, monthly, quarterly, annually, quinquennially, that is, every 5 years (e.g., the census of manufactures), or decennially (e.g., the census of population).
• Most empirical work based on time series data assumes that the underlying time series is stationary. Loosely speaking, a time series is stationary if its mean and variance do not vary systematically over time.
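A quick informal check of that "loosely speaking" definition, in illustrative Python (not from the slides): a roughly stationary noise series has about the same mean in its first and second halves, while a trending series does not. This is purely a sketch, not a formal stationarity test.

```python
import random

random.seed(1)

def half_means(series):
    # Mean of the first and second halves of a series.
    h = len(series) // 2
    first, second = series[:h], series[h:]
    return sum(first) / h, sum(second) / len(second)

noise = [random.gauss(0, 1) for _ in range(400)]               # roughly stationary
trended = [0.05 * t + random.gauss(0, 1) for t in range(400)]  # mean drifts upward

nm1, nm2 = half_means(noise)
tm1, tm2 = half_means(trended)
print(f"noise half-mean shift:   {abs(nm2 - nm1):.2f}")   # small
print(f"trended half-mean shift: {abs(tm2 - tm1):.2f}")   # about 10 = 0.05 * 200
```

Formal practice uses unit-root tests (e.g., augmented Dickey-Fuller) rather than this split-sample eyeballing.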
15. [Table 1.1: egg production and egg prices in the 50 states]
16. • Cross-Section Data. Cross-section data are data on one or more variables
collected at the same point in time, such as the census of population
conducted by the Census Bureau every 10 years. An example of cross-sectional
data is given in Table 1.1. For each year the data on the 50 states are cross-
sectional data. Just as time series data have their own problems (because of
the stationarity issue), cross-sectional data too have their own problems,
specifically the problem of heterogeneity.
• From Table 1.1 we see that we have some states that produce huge amounts
of eggs (e.g., Pennsylvania) and some that produce very little (e.g., Alaska).
When we include such heterogeneous units in a statistical analysis, the size
or scale effect must be taken into account. To see this clearly, we plot in
Figure 1.6 the data on eggs produced and their prices in 50 states for the
year 1990. This figure shows how widely scattered the observations are.
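One common way to damp the size or scale effect mentioned above is a log transform, which pulls units of very different magnitudes onto a comparable scale. The state figures below are illustrative placeholders, not the actual numbers from Table 1.1.

```python
import math

# Hypothetical egg output (millions of eggs) for three states of very
# different sizes; values are made up for illustration.
output = {"Pennsylvania": 6500.0, "Ohio": 4200.0, "Alaska": 0.7}

# Raw outputs differ by four orders of magnitude, so large states would
# dominate any analysis. Taking logs compresses the scale while
# preserving the ordering of the units.
log_output = {state: math.log(q) for state, q in output.items()}

for state, lq in log_output.items():
    print(f"{state:12s} raw={output[state]:8.1f} log={lq:6.2f}")
```

After the transform the spread between the largest and smallest producer shrinks from a factor of roughly 9000 to a difference of about 9 log units, which is why log specifications are common with heterogeneous cross-sectional units.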
19. • Pooled Data. Pooled, or combined, data contain elements of both time
series and cross-section data. The data in Table 1.1 are an example of pooled
data. For each year we have 50 cross-sectional observations, and for each
state we have two time series observations on prices and output of eggs, a
total of 100 pooled (or combined) observations.
• Panel, Longitudinal, or Micropanel Data. This is a special type of pooled
data in which the same cross-sectional unit (say, a family or a firm) is
surveyed over time.
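The pooled/panel layout described above can be sketched as records that pair a cross-sectional unit (state) with a time period (year): 50 states times 2 years gives the 100 pooled observations. The state names and data values below are placeholders, not Table 1.1's actual figures.

```python
# Hypothetical unit names and placeholder values, for illustration only.
states = [f"State_{i:02d}" for i in range(1, 51)]
years = [1990, 1991]

# One record per (state, year) pair: 50 * 2 = 100 pooled observations.
pooled = [
    {"state": s, "year": y, "eggs": 0.0, "price": 0.0}  # placeholder data
    for s in states
    for y in years
]

print(len(pooled))  # 100 pooled (combined) observations

# It is a *panel* because the same unit appears in every period:
state01_years = [row["year"] for row in pooled if row["state"] == "State_01"]
print(state01_years)  # [1990, 1991] -> the same state is observed over time
```

The defining feature of panel data is exactly this: filtering on any single unit yields a short time series for that unit, while filtering on any single year yields a full cross-section.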
• The Sources of Data
• The data used in empirical analysis may be collected by a governmental
agency (e.g., the Department of Commerce), an international agency (e.g.,
the International Monetary Fund (IMF) or the World Bank), a private
organization (e.g., the Standard & Poor’s Corporation), or an individual.
Literally, there are thousands of such agencies collecting data for one
purpose or another.
20. • The Accuracy of Data
• The quality of the data is often not that good. Some reasons for this are:
• First, as noted, most social science data are nonexperimental in nature.
Therefore, there is the possibility of observational errors, either of omission
or commission.
• Second, even in experimentally collected data, errors of measurement arise
from approximations and roundoffs.
• Third, in questionnaire-type surveys, the problem of nonresponse can be
serious; a researcher is lucky to get a 40% response to a questionnaire.
• Fourth, the sampling methods used in obtaining the data may vary so
widely that it is often difficult to compare the results obtained from the
various samples.
• Fifth, economic data are generally available only at a highly aggregate
level. For example, most macrodata (e.g., GNP, inflation, unemployment)
refer to the economy as a whole rather than to individual units.
• The researcher should always keep in mind that the results of research are
only as good as the quality of the data.
21. A Note on the Measurement Scales of Variables.
• The variables that we will generally encounter fall into four broad
categories: ratio scale, interval scale, ordinal scale, and nominal scale. It is
important that we understand each.
• Ratio Scale. For a variable X taking two values, X1 and X2, the ratio X1/X2
and the distance (X2 − X1) are meaningful quantities. Comparisons such as
X2 ≤ X1 or X2 ≥ X1 are also meaningful.
• Interval Scale. The distance between two time periods, say (2000–1995), is
meaningful, but the ratio of two time periods (2000/1995) is not.
• Ordinal Scale. Examples are grading systems (A, B, C grades) or income
class (upper, middle, lower). For these variables the ordering exists, but the
distances between the categories cannot be quantified.
• Nominal Scale. Variables such as gender and marital status simply denote
categories. Such variables cannot be expressed on the ratio, interval, or
ordinal scales.
• Econometric techniques that may be suitable for ratio scale variables may
not be suitable for nominal scale variables. Therefore, it is important to
bear in mind the distinctions among the four types.
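The four scales above can be summarized by which operations are meaningful on each. The variable names and sample values below are made up for demonstration.

```python
# Ratio scale: income. Both ratios and differences are meaningful.
income_a, income_b = 60000, 30000
print(income_a / income_b)   # 2.0 -> "A earns twice B" makes sense

# Interval scale: calendar years. Differences are meaningful, ratios are not.
print(2000 - 1995)           # 5 years apart: meaningful
# 2000 / 1995 is computable but has no interpretation.

# Ordinal scale: letter grades. Order exists, but the distances between
# categories are not quantified, so only comparisons are meaningful.
grade_rank = {"A": 3, "B": 2, "C": 1}
print(grade_rank["A"] > grade_rank["B"])  # True: ordering is meaningful

# Nominal scale: marital status. Only equality/inequality checks make sense.
print("married" == "single")              # False: pure categories
```

This is why, for instance, a nominal variable like gender enters a regression as a dummy (0/1) indicator rather than as a number to be averaged or divided.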