2. Ilembo, B.2
Data Versus Information
Data usually refers to raw numbers, such as the
figures reported by the TMA in weather
forecasts or the quarterly records of commodity
prices reported by the NBS.
Information usually refers to data that have been
analyzed in some way to provide some meaning.
Thus, processed data should yield meaningful
information, that is, an interpretation.
Classification and Presentation
of Statistical Data
Various sampling techniques and methods of data
collection can be applied to obtain the data of interest.
In its original form, such a data set is a mere aggregate of
numbers and thus not very helpful for extracting information.
We need to summarize and display the information in a
readily digestible form,
for example by ordering the data according to their magnitude,
compiling them into tables, or graphing them to form a visual
image (serving those with no taste for numbers).
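A minimal Python sketch of the first two steps (ordering and tabulating). The values are made up for illustration and are not real records:

```python
from collections import Counter

# Hypothetical raw data: number of items bought by 10 customers
raw = [3, 1, 2, 3, 5, 1, 3, 2, 2, 3]

# Step 1: order the data according to their magnitude
ordered = sorted(raw)

# Step 2: compile a frequency table as (value, count) pairs
table = sorted(Counter(raw).items())

print(ordered)
print("value | count")
for value, count in table:
    print(f"{value:>5} | {count}")
```

Even this small table shows at a glance that 3 is the most common value, which is hard to see in the raw list.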
Scales of Measurement
Nominal scale: assigns numbers as a way to label or identify characteristics
The numbers assigned have no quantitative meaning beyond indicating
the presence or absence of the characteristic under investigation.
The numbers are not obtained as a result of a counting or measurement
process.
For example, we can record the gender of respondents as 0 and 1, where 0
stands for male and 1 stands for female
The numbers we assign for the various categories are purely arbitrary, and
any arithmetic operation applied to these numbers is meaningless
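The point about arbitrary codes can be checked with a short Python sketch. The responses below are hypothetical, and the second coding (1 and 2 instead of 0 and 1) is just one of many equally valid alternatives:

```python
from collections import Counter
from statistics import mean

# Hypothetical responses coded 0 = male, 1 = female
gender = [0, 1, 1, 0, 1, 1, 0, 1]

# An equally valid recoding: 1 = male, 2 = female
recoded = [g + 1 for g in gender]

# Frequencies do not depend on which codes were chosen
counts = Counter(gender)            # 3 males, 5 females
counts_recoded = Counter(recoded)   # still 3 males, 5 females

# The "average gender" changes with the arbitrary codes, so it carries no meaning
mean_original = mean(gender)        # 0.625
mean_recoded = mean(recoded)        # 1.625

print(counts, counts_recoded)
print(mean_original, mean_recoded)
```

Counting how often each category occurs is the kind of summary that survives any relabelling; the mean does not.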
Ordinal Scale
The next higher level of measurement precision.
It ensures that the possible categories can be placed in a specific order
(rank) or in some ‘natural’ way
Again, the numbers are not obtained as a result of a counting or
measurement process; consequently, arithmetic operations on them
(such as adding or averaging the codes) are not meaningful
For example, responses for Mzumbe Restaurant service provision can be
coded as 1, 2, 3 and 4: 1 for poor – 2 for moderate – 3 for good – 4 for
excellent. It is quite obvious that there is some natural ordering: the
category 'excellent' (which is coded as 4) indicates a better restaurant
service provision than the category 'moderate' (which is coded as 2) and,
thus, order relations are meaningful
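As a rough illustration, the restaurant coding can be sketched in Python. The list of ratings is hypothetical; only the four codes come from the example above:

```python
from statistics import median_low

# Codes from the restaurant example
LABELS = {1: "poor", 2: "moderate", 3: "good", 4: "excellent"}

# Hypothetical ratings given by seven customers
ratings = [3, 4, 2, 3, 1, 4, 3]

# Order comparisons are meaningful: 'excellent' (4) ranks above 'moderate' (2)
excellent_beats_moderate = 4 > 2

# Ranking the responses and taking the middle one uses only the ordering
ranked = sorted(ratings)
middle = median_low(ratings)   # always returns one of the observed codes

print([LABELS[r] for r in ranked])
print("middle response:", LABELS[middle])
```

`median_low` is used here because it returns an actual category code rather than an average of two codes, which would not be meaningful on an ordinal scale.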
Interval Scale
Unlike the nominal and ordinal scales of measurement, the numbers
in an interval scale are obtained as a result of a measurement process
and have some units of measurement
Also, differences between points are meaningful: equal differences
anywhere on the scale represent equal amounts. However, because the
zero point is arbitrary, a point cannot be considered to be a multiple
of another; that is, ratios have no meaningful interpretation.
For example, the Celsius temperature scale, which subdivides the distance
between the freezing and boiling points of water into 100 equally spaced
parts, is an interval scale.
There is a meaningful difference between 30 degrees Celsius and 12
degrees Celsius. However, a temperature of 20 degrees Celsius cannot
be interpreted as twice as hot as a temperature of 10 degrees Celsius.
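One way to see why the ratio is not meaningful is to express the same temperatures in Fahrenheit, another interval scale with a different arbitrary zero. A small Python sketch (the helper name `c_to_f` is ours):

```python
import math

def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# The ratio of the same two temperatures depends on the scale used
ratio_celsius = 20 / 10                        # 2.0
ratio_fahrenheit = c_to_f(20) / c_to_f(10)     # 68 / 50 = 1.36

# Differences, by contrast, convert consistently (1 Celsius degree = 1.8 Fahrenheit degrees)
diff_celsius = 30 - 12                         # 18
diff_fahrenheit = c_to_f(30) - c_to_f(12)      # about 32.4 = 18 * 1.8

print(ratio_celsius, ratio_fahrenheit)
print(diff_celsius, diff_fahrenheit)
```

The "twice as hot" statement disappears as soon as the unit changes, while the 18-degree difference simply rescales by a fixed factor.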
Ratio Scale
The ratio scale represents the highest form of measurement
precision. In addition to the properties of all lower scales of
measurement, it has a true zero point, so ratios have a
meaningful interpretation.
Furthermore, there is no restriction on the kind of statistics that can
be computed for ratio-scaled data.
For example, the height of individuals (in centimeters), the annual
profit of firms (in Tshs.) and plot elevation (in meters) represent
ratio scales. The statement ‘the annual profit of Firm X is twice as
large as that of Firm Y’ has a meaningful interpretation.
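A short Python sketch of why such a statement is safe on a ratio scale: changing the unit of measurement leaves the ratio unchanged. The profit figures are hypothetical:

```python
import math

# Hypothetical annual profits in Tshs.
profit_x = 500_000_000
profit_y = 250_000_000

ratio_in_tshs = profit_x / profit_y                        # 2.0

# The same profits expressed in thousands of Tshs.
ratio_in_thousands = (profit_x / 1000) / (profit_y / 1000)

print(ratio_in_tshs, ratio_in_thousands)
```

Because zero profit means the same thing in every unit, "twice as large" holds whether profits are reported in Tshs. or in thousands of Tshs.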
Importance of the level of measurement
First, knowing the level of measurement helps you
decide how to interpret the data.
For example, if you know that a measure is nominal,
then you know that the numerical values are just short
codes for the longer names.
Second, knowing the level of measurement helps you
decide what statistical analysis is appropriate on the
values that were assigned
If a measure is nominal, for instance, then you know
that you would never average the data values.
Variable
The word “Variable” is often used in the study of statistics, so it is
important to understand its meaning.
A variable is a characteristic that can take more than one value
and to which a numerical measure can be assigned.
It varies from one unit of investigation to another; examples are sex,
age, amount of income, region or country of birth, grades obtained
at school and mode of transportation to work.
There are broadly three types of data that can be employed in
quantitative analysis:
time series data, cross-sectional data, and panel data
Time Series data
As the name suggests, time series data are data that have been
collected over a period of time on one or more variables.
Time series data have associated with them a particular
frequency of observation or collection of data points.
The frequency is simply a measure of the interval over, or the
regularity with which, the data are collected or recorded.
Examples: student enrolment in higher learning institutions
over the past 20 years, a firm’s quarterly sales over the past 5
years, etc.
Cross-sectional data
Cross-sectional data are data on one or more variables
collected at a single point in time. Such data do not have
a meaningful sequence. For example, the data might be
on:
– Sales of 30 companies
– Productivity of each sales division
– Farmers’ production and productivity in a particular
farming season, etc.
Panel Data
Panel data have the dimensions of both time series and cross-sections,
e.g. the daily prices of a number of rubber shoes in Mbaji store over two
years.
Note:
For time series data, it is usual to denote the individual observation
numbers using the index t, and the total number of observations
available for analysis by T
For cross-sectional data, the individual observation numbers are
indicated using the index i , and the total number of observations
available for analysis by N
In contrast to the time series case, there is no natural ordering of the
observations in a cross-sectional sample. For example, the observations
i might be on the prices of bonds of different firms at a particular point
in time, ordered alphabetically by company name. On the other hand, in
a time series context, the ordering of the data is relevant since the data
are usually ordered chronologically.
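The notation can be made concrete with a small Python sketch. All figures, firm names and shoe labels below are hypothetical, and the dictionary layout is only one possible way to hold such data:

```python
# Time series: one variable observed at t = 1, ..., T (hypothetical quarterly sales)
sales = [120, 135, 128, 150]
T = len(sales)

# Period-to-period changes rely on the chronological order being kept
changes = [sales[t] - sales[t - 1] for t in range(1, T)]

# Cross-section: N firms observed at a single point in time (hypothetical bond prices)
bond_prices = {"Firm A": 98.5, "Firm C": 101.2, "Firm B": 99.0}
N = len(bond_prices)

# Any ordering of a cross-section holds the same observations
by_name = sorted(bond_prices.items())
by_price = sorted(bond_prices.items(), key=lambda item: item[1])

# Panel: each observation is indexed by both unit i and time t
panel = {
    ("shoe 1", 1): 9000, ("shoe 1", 2): 9000, ("shoe 1", 3): 9500,
    ("shoe 2", 1): 7000, ("shoe 2", 2): 7200, ("shoe 2", 3): 7200,
}

print(T, changes)
print(N, by_name, by_price)
print(len(panel))  # 2 units observed over 3 periods
```

Sorting the cross-section by name or by price changes nothing of substance, whereas shuffling `sales` would make `changes` meaningless.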
Continuous and discrete variables
A quantitative variable that has a ‘connected’ string of possible values
at all points along the number line, with no gaps between them, is
called a continuous variable
In other words, a variable is said to be continuous if it can assume any
real value within a certain range, so it has infinitely many possible
values and is not confined to specific numbers. The values of such
variables are often obtained by measuring.
Examples of continuous variables are distance, age and daily revenue.
The measurement of a continuous variable is restricted by the methods
used, or by the accuracy of the measuring instruments. For example,
the height of a student is a continuous variable because a student may
be 1.6321748755... meters tall
However, when the height of a person is measured, it is usually
measured to the nearest centimeter. Thus, this student's height would
be recorded as 1.63 m
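The recording step can be sketched in Python, taking the first ten decimals of the height above as the "true" value (itself only an approximation):

```python
# Height from the example, truncated to ten decimals for illustration
true_height_m = 1.6321748755

# Measuring to the nearest centimeter means keeping two decimals in meters
recorded_m = round(true_height_m, 2)       # 1.63

# The same record expressed in whole centimeters
recorded_cm = round(true_height_m * 100)   # 163

print(recorded_m, recorded_cm)
```

The variable remains continuous; only the recorded values are limited by the precision of the measurement.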
Continuous and discrete variables (cont.)
A quantitative variable that has ‘separate’ values at specific
points along the number line, with gaps between them, is called
a discrete variable
Such variables can only take on certain values, which are
usually integers (whole numbers), and are often defined to be
count numbers (i.e., obtained by counting).
The number of people in a particular shopping mall per hour or
the number of shares traded during a day are examples of
discrete variables. These can take on values such as 0, 1, 2, 3 ...
In these cases, having 86.3 people in the mall or 585.7 shares
traded would not make sense.
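A minimal Python check for whether a value is a possible count. The function name `is_valid_count` is ours and is only an illustration:

```python
def is_valid_count(value):
    """Return True if value is a non-negative whole number (0, 1, 2, 3, ...)."""
    return value >= 0 and float(value).is_integer()

print(is_valid_count(86))     # a possible number of people in the mall
print(is_valid_count(86.3))   # not a possible count
print(is_valid_count(585.7))  # not a possible count
```

The gaps between the admissible values (0, 1, 2, 3 ...) are what make the variable discrete.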