Data
Any bit of information that is expressed as a value or a number
is data. For example, the marks you scored in your Math exam are
data, and the number of cars that pass over a bridge in a day is
also data. Data is basically a collection of information,
measurements, or observations.
Raw data is an initial collection of information. This information
has not yet been organized. After the very first step of data
collection, you will get raw data. For example, we go around and
ask a group of five friends their favourite colour. The answers are
Blue, Green, Blue, Red, and Red. This collection of information
is the raw data.
What are the Types of Data in Statistics?
Qualitative or Categorical Data
Qualitative data, also known as categorical data,
describes characteristics that fit into categories.
Qualitative data is not numerical. Categorical
variables describe features such as a person’s
gender, home town, hair colour, religion, etc.
Sometimes categorical data can hold numerical values,
but those values do not have a mathematical meaning
(adding or averaging them tells us nothing). Examples of
categorical data are birthdate, favourite sport, and
school postcode.
Nominal Data
Nominal data is a type of qualitative data that labels
variables without assigning them a numerical value.
Nominal data is also called the nominal scale. It cannot be ordered
or measured.
Example
Students of a university are classified by the school in which they are
enrolled using a nonnumeric label such as Business, Humanities,
Education, and so on.
Alternatively, a numeric code could be used for the school variable
(e.g. 1 denotes Business, 2 denotes Humanities, 3 denotes Education,
and so on).
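A minimal Python sketch of the coding idea above. The code-to-label mapping follows the example; the list of enrolment records is made up purely for illustration.

```python
from collections import Counter

# Numeric codes for the school variable, as in the example above.
SCHOOL_CODES = {1: "Business", 2: "Humanities", 3: "Education"}

# Hypothetical enrolment records stored as codes (illustrative values only).
enrolled = [1, 3, 2, 1, 1]

# Decoding turns each code back into its label.
labels = [SCHOOL_CODES[code] for code in enrolled]

# Counting per category is meaningful for nominal data.
counts = Counter(labels)
print(counts["Business"])  # 3

# Arithmetic on the codes is not meaningful: the codes are only labels,
# so an "average school" of sum(enrolled) / len(enrolled) says nothing.
```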
Ordinal Data
Ordinal data is a kind of qualitative data that groups
variables into ordered categories, which have a natural
order or rank based on some hierarchical scale, such as
from high to low. However, the intervals between the
categories are not distinctly defined.
We use ordinal data to observe customer feedback,
satisfaction, economic status, education level, etc.
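A short Python sketch of how ordered categories can be handled. The four-level satisfaction scale and the responses are hypothetical, chosen only to illustrate ranking.

```python
# Hypothetical satisfaction scale, listed from low to high.
SCALE = ["Poor", "Fair", "Good", "Excellent"]
RANK = {label: position for position, label in enumerate(SCALE)}

# Made-up survey responses (illustrative values only).
responses = ["Good", "Poor", "Excellent", "Fair", "Good"]

# Sorting by rank is meaningful because the categories have a natural order.
ordered = sorted(responses, key=lambda label: RANK[label])

# The middle response can be found by rank; the gap between
# "Poor" and "Fair" is not assumed to equal the gap between
# "Good" and "Excellent".
middle = ordered[len(ordered) // 2]
print(ordered)
print(middle)  # Good
```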
Quantitative or Numerical Data
Quantitative data, also known as numerical data, represents a
numerical value (i.e., how much, how often, how many).
Numerical data gives information about the quantity of a specific
thing. Some examples of numerical data are height, length, size,
weight, and so on.
Quantitative data can be classified into two types, depending on the
values it can take: discrete data and continuous data.
Discrete Data
Discrete data can take only distinct, separate values. It typically has
a finite (or at least countable) number of possible values, and those values
cannot be subdivided meaningfully. Here, things are counted in whole numbers.
Example: Number of students in the class
Continuous Data
Continuous data is data that cannot be counted; it is measured. It can
take any of an infinite number of possible values within a given
range.
Example: Temperature range
Tabulation of data
The systematic presentation of numerical data in rows and columns
is known as Tabulation.
It is designed to make presentation simpler and analysis easier. This
type of presentation facilitates comparison by putting relevant
information close to one another, and it helps in further statistical
analysis and interpretation.
In statistics as well as mathematics, tabulation is a method of storing
classified data in tabular form. A table may be simple, double, or
complex, depending on the type of categorization.
The purpose of tabulation is to display a large volume of
complex information in a systematic fashion that enables
viewers to draw reasonable conclusions and interpretations from it.
Parts of a Table in Tabulation
Table Number: This is the first section of a table and is presented on top of
any table to facilitate straightforward identification and for further reference.
Title of the Table: One of the most important parts of any table is its title.
The title describes the table’s contents. It should be short, crisp, and
exactly worded so that it defines those contents efficiently.
Column Headings or Captions: Captions are the headings at the top of each
column that explain the figures under that column.
Row Headings: Row headings are the titles of the horizontal rows.
Body of a Table: This is the part that includes the numeric information
collected from examined facts. The data in the body is displayed in rows
which are read horizontally starting from left to right and the data in the
columns are read vertically from top to bottom.
Objectives of Tabulation
1. To Simplify Complex Data: Data presented in a table is less bulky, i.e., raw data is reduced
to a simpler and more exact form that a common person can interpret easily and in less time.
2. To Highlight Important Information: Representing data in rows and columns makes it possible
to highlight the relevant information by presenting facts clearly and precisely without
textual explanation. Crucial data can therefore be picked out without difficulty.
3. To Enable Easy Comparison: When data is displayed in an orderly manner in rows and
columns, it becomes easier to compare quantities across several parameters. For example,
it is more straightforward to determine the month in which a country experienced the
highest rainfall if the information is presented in a table.
Otherwise, there is always room for error in processing the data.
4. To Facilitate Statistical Analysis: Tables serve as the most reliable source of classified data
for statistical analysis. The task of computing percentage, distribution, correlation, etc., becomes
more manageable if data is presented in the form of a table.
5. To Save Space: A table presents facts more compactly than running text. Hence,
it saves space without losing the quality and quantity of data.
Features of a Good Table
Title: The table must have a title at the top, and the title should be
appealing.
Manageable Size: The table shouldn’t be too big or too small. The size of the table
should be in accordance with its objectives and the characteristics of the data. It
should completely cover all significant characteristics of data.
Attractive: A table should have an appearance that appeals to both the
sight and the mind so that the reader can grasp it easily without any strain.
Special Emphasis: The data to be compared should be placed in the left-hand
corner of columns, with their titles in bold letters.
Fit with the Objective: The table should reflect the objective of the statistical
investigation.
Simplicity: To make the table easily understandable, it should be simple and
compact.
Data Comparison: The data to be compared must be placed closely in the columns.
Numbered Columns and Rows: When there are several rows and
columns in a table, they must be numbered for reference.
Clarity: A table should be prepared so that even a layman can draw
conclusions from it. The table should contain all necessary information and
it must be self-explanatory.
Units: The unit designations should be written at the top of the table,
below the title. For example, Height in cm, Weight in kg, Price in ₹, etc.
However, if different items have different units, then they should be
mentioned in the respective rows and columns.
Suitably Approximated: If the figures are large, then they should be
rounded or approximated.
Scientifically Prepared: The preparation of the table should be done in a
systematic and logical manner and should be free from any kind of
ambiguity and overlapping.
Merits of Tabular Presentation of Data
Brief and Simple Presentation: Tabular presentation is possibly the simplest
method of data presentation. As a result, information is simple to understand. A
significant amount of statistical data is also presented in a very brief manner.
Facilitates Comparison: By grouping the data into different classes, tabulation
facilitates data comparison.
Simple Analysis: Analysing data from tables is quite simple. One can
determine the data’s central tendency, dispersion, and correlation by organising
the data as a table.
Highlights Characteristics of the Data: Tabulation highlights characteristics
of the data. As a result of this, it is simple to remember the statistical facts.
Cost-effective: Tabular presentation is a very cost-effective way to convey data.
It saves time and space.
Provides Reference: As the data provided in a tabular presentation can be used
for other studies and research, it acts as a source of reference.
Definition of Frequency Distribution
• A frequency distribution is a tabular summary of
data showing the number (frequency) of
observations in each of several non-overlapping
categories or classes.
• The objective is to provide insights about the
data that cannot be quickly obtained by looking
only at the original data.
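As a small illustration, the favourite-colour answers from the raw data example (Blue, Green, Blue, Red, Red) can be summarised as a frequency distribution. This is a minimal Python sketch using the standard library; the variable names are illustrative.

```python
from collections import Counter

# Raw data: favourite colours of five friends (from the raw data example).
raw_data = ["Blue", "Green", "Blue", "Red", "Red"]

# Count how many observations fall into each non-overlapping category.
frequency = Counter(raw_data)

# Print the distribution as a simple two-column table.
print(f"{'Colour':<8}Frequency")
for colour, count in frequency.items():
    print(f"{colour:<8}{count}")
```

The frequencies add up to the number of observations (5), which is a quick check that every answer was placed in exactly one class.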
Bar Chart
A bar chart is a graphical display for depicting qualitative
data.
On one axis (usually the horizontal axis), we specify the
labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency scale
can be used for the other axis (usually the vertical axis).
Using a bar of fixed width drawn above each class label,
we extend the height appropriately.
The bars are separated to emphasize the fact that each
class is a separate category.
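A rough plain-text sketch of a bar chart in Python, using the favourite-colour answers from the raw data example. For simplicity the bars are drawn sideways (class labels down the left, frequency as bar length), which is a layout choice of this sketch rather than the usual orientation described above.

```python
from collections import Counter

# Favourite-colour answers from the raw data example.
answers = ["Blue", "Green", "Blue", "Red", "Red"]
frequency = Counter(answers)

# One bar per class: fixed thickness (one text line), length equal to the
# frequency. Each class gets its own line, so the bars stay separated.
lines = [f"{label:<6}| {'#' * count} ({count})"
         for label, count in frequency.items()]
print("\n".join(lines))
```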
Pie Chart
The pie chart is a commonly used graphical display for
presenting relative frequency and percent frequency
distributions for categorical data.
First draw a circle; then use the relative frequencies to
subdivide the circle into sectors that correspond to the
relative frequency for each class.
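The relative frequency of a class is the fraction or proportion
of the total number of data items belonging to the class.
Relative Frequency of a Class = (Frequency of the Class)/n
where n is the total number of data items.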
The percent frequency of a class is the relative frequency
multiplied by 100.
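The definitions above can be sketched in Python with the favourite-colour answers from the raw data example. The sector angle (relative frequency times 360 degrees) is how a circle is usually subdivided for a pie chart; the variable names are illustrative.

```python
from collections import Counter

# Favourite-colour answers from the raw data example.
answers = ["Blue", "Green", "Blue", "Red", "Red"]
frequency = Counter(answers)
n = len(answers)  # total number of data items

# Relative frequency: share of all data items that belong to the class.
relative = {label: count / n for label, count in frequency.items()}

# Percent frequency: relative frequency multiplied by 100.
percent = {label: 100 * count / n for label, count in frequency.items()}

# Pie chart sector angle: the class's share of the full 360-degree circle.
angle = {label: 360 * count / n for label, count in frequency.items()}

for label in frequency:
    print(f"{label:<6} {relative[label]:.2f} {percent[label]:.0f}% "
          f"{angle[label]:.0f} degrees")
```

For example, Blue appears 2 times out of 5, so its relative frequency is 0.40, its percent frequency is 40, and its sector spans 144 degrees.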
Frequency Distribution
Example: Hudson Auto Repair
The manager of Hudson Auto would like to gain a better understanding of the
cost of parts used in the engine tune-ups performed in the shop. She examines
50 customer invoices for tune-ups. The costs of parts, rounded to the nearest
dollar, are listed on the next slide.
The three steps necessary to define the classes for a frequency
distribution with quantitative data are:
1. Determine the number of non-overlapping classes.
2. Determine the width of each class.
3. Determine the class limits.
Guidelines for Determining the Number of Classes
Use between 5 and 20 classes.
Data sets with a larger number of elements usually require a
larger number of classes.
Smaller data sets usually require fewer classes.
The goal is to use enough classes to show the variation in the
data, but not so many classes that some contain only a few data
items.
Guidelines for Determining the Width of Each Class
Use classes of equal width.
Approximate Class Width = (Largest Data Value - Smallest Data Value)/(Number of Classes)
Making the classes the same width reduces the chance of
inappropriate interpretations.
Example: Hudson Auto Repair
If we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5, rounded up to 10
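The class-width calculation and the resulting class limits can be sketched in Python. The smallest value (52), largest value (109), and the choice of six classes come from the example; rounding up with `math.ceil` and starting the first class at 50 are assumptions of this sketch, picked so that the classes have convenient limits and cover every value.

```python
import math

smallest, largest = 52, 109  # smallest and largest parts cost in the example
num_classes = 6              # number of classes chosen in the example

# Approximate class width = (largest - smallest) / number of classes.
approx_width = (largest - smallest) / num_classes  # 9.5

# Round up to a convenient whole number so the classes cover the full range.
width = math.ceil(approx_width)  # 10

# Assumption: start the first class at 50, a round number at or below 52.
first_lower_limit = 50

# Costs are rounded to the nearest dollar, so each class runs from its
# lower limit to (next lower limit - 1) and the classes do not overlap.
classes = [
    (first_lower_limit + i * width, first_lower_limit + (i + 1) * width - 1)
    for i in range(num_classes)
]

for lower, upper in classes:
    print(f"{lower}-{upper}")
```

With these choices the classes run from 50-59 up to 100-109, so both the smallest value (52) and the largest value (109) fall inside a class.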
Histogram
Another common graphical display of quantitative data is a
histogram.
The variable of interest is placed on the horizontal axis.
A rectangle is drawn above each class interval with its height
corresponding to the interval’s frequency.
Unlike a bar graph, a histogram has no natural separation
between rectangles of adjacent classes.