TYPES OF DATA
AND
GRAPHICAL / TABULAR
REPRESENTATION
DR. REENA TITORIA
INTRODUCTION
• Statistics may be defined as the science that deals
with the collection, presentation, analysis and
interpretation of numerical data
DATA INFORMATION PRESENTATION
DESCRIPTIVE STATISTICS INFERENTIAL STATISTICS
A set of values recorded on one or more
observational units is called data
Data should be processed
Data depiction, data summarization and
data transformation
INFORMATION
Collected data should be
• Accurate (i.e. measures the true value of what is
under study)
• Valid (i.e. measures only what it is supposed to
measure)
• Precise (i.e. gives adequate detail of the
measurement)
• Reliable (i.e. gives repeatable results)
Types of DATA
• Qualitative/ Quantitative
• Discrete/ Continuous/ Interval/ Ratio
• Primary/ Secondary
• Nominal/ Ordinal
Quantitative data:
• Also called measurement data
• Can be expressed as a number, with
or without a unit of measurement
• Eg: Height in cm, Hb in gm%, BP in
mm of Hg, Weight in kg
Qualitative data:
• Represents a particular quality or
attribute
• Summarized as counts, i.e. numbers without a unit
of measurement
• Eg: Religion, Sex, Blood group etc.
Discrete data:
• Here we always get a whole number (a count).
• Ex: Number of beds in a hospital,
number of malaria cases
Continuous data:
• It can take any value within a range, so
fractions are possible
• Ex: Hb level, height (Ht), weight (Wt)
WHAT IS IMPORTANT???
Interval:
• Has equal intervals between values, so differences
are meaningful, but the zero point is arbitrary. For example,
a thermometer might be marked in intervals of ten degrees
• Ex: Celsius temperature, IQ (intelligence scale)
Ratio:
• Exactly the same as the interval scale except that
zero on the scale means the quantity does not exist
(a true zero), so ratios are meaningful
• Ex: Age, Weight, Height
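The difference between the interval and ratio scales can be checked with a little arithmetic. The sketch below is only an illustration: the two temperature readings are made up, and the helper name `celsius_to_kelvin` is an assumption introduced here, not something from the slides.

```python
# Interval vs ratio scale: a ratio is meaningful only when zero means "none".

def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale) to kelvin (ratio scale, true zero)."""
    return celsius + 273.15

t_low_c, t_high_c = 10.0, 20.0  # illustrative readings, not from the slides

# Differences are valid on an interval scale.
difference_c = t_high_c - t_low_c

# The naive Celsius ratio suggests "twice as hot", which is misleading
# because 0 degrees Celsius is an arbitrary zero.
naive_ratio = t_high_c / t_low_c

# On the kelvin scale, which has a true zero, the ratio is close to 1.
true_ratio = celsius_to_kelvin(t_high_c) / celsius_to_kelvin(t_low_c)

print(f"difference: {difference_c} degrees")
print(f"Celsius ratio: {naive_ratio:.2f}, kelvin ratio: {true_ratio:.3f}")
```

For ratio variables such as weight, the same division is legitimate directly: 80 kg really is twice 40 kg.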
Primary data:
• Data collected by the investigator himself/ herself for a
specific purpose
• Ex: Data collected by a student for his/her thesis or
research project
• Advantages:
– The investigator collects data specific to the problem
under study.
– There is no doubt about the quality of the data collected
(for the investigator).
– If required, it may be possible to obtain additional data
during the study period.
Secondary data:
• Data collected by someone else for some other purpose
(but being utilized by the investigator for another
purpose)
• Ex: Census data being used to analyze the impact of
education on career choice and earnings
• Advantages of using Secondary data:
– The data are already there: no hassle of data collection
– It is less expensive
– The investigator is not personally responsible for the
quality of data (“I didn’t do it”)
Nominal data:
• The information or data fits into one of the
categories, but the categories cannot be ordered
• Categories without order
• Ex: Colour of eyes, Race, Gender
Ordinal data:
• A rank or order
• Here the categories can be ordered, but the
space or class interval between two categories
may not be the same
• Ex: Rank in a class or exam, socioeconomic status (SES)
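One practical consequence of the nominal/ordinal distinction is which summaries make sense. The following is a minimal sketch with made-up samples; the category list `EDUCATION_ORDER` and both samples are assumptions for illustration only.

```python
from collections import Counter
import statistics

# Hypothetical ordinal scale: the order is meaningful, the spacing is not.
EDUCATION_ORDER = ["None", "Primary", "Secondary", "Graduate"]
rank = {level: i for i, level in enumerate(EDUCATION_ORDER)}

education = ["Primary", "Graduate", "Secondary", "Primary", "None"]  # made-up sample
eye_colour = ["Brown", "Blue", "Brown", "Green", "Brown"]            # made-up sample

# Nominal data: categories can only be counted, so the mode is the usual summary.
eye_mode = Counter(eye_colour).most_common(1)[0][0]

# Ordinal data: categories can also be sorted, so a median category exists.
sorted_education = sorted(education, key=lambda level: rank[level])
median_rank = statistics.median_low(rank[level] for level in education)
median_education = EDUCATION_ORDER[median_rank]

print("Most common eye colour:", eye_mode)
print("Education, lowest to highest:", sorted_education)
print("Median education level:", median_education)
```

A mean is deliberately not computed for either variable, since the gaps between ordinal categories need not be equal.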
QUESTION
A person's highest educational level is which type of variable?
• Continuous
• Discrete
• Ordinal
• Nominal
The number of motor-vehicle accidents on a particular stretch
of the national highway in a week is which type of variable?
• Continuous
• Discrete
• Nominal
• Ordinal
DATA
Quantitative: Discrete, Continuous, Interval, Ratio
Qualitative: Nominal, Ordinal
REPRESENTATION OF DATA
• Tabular
• Graphic
• Numeric
When to use Tables
• When you wish to show how a single category of
information varies when measured at different points
• When the dataset contains relatively few numbers
• When the precise value is crucial to your argument and
a graph would not convey the same level of precision
• For example: when it is important that the reader
knows that the result was 2.48 and not 2.45
• When you don’t wish the presence of one or two very
high or low numbers to detract from the message
contained in the rest of the dataset
Tabular Presentation
1. The table must be numbered
2. A brief and self-explanatory title must be given to each table
3. The headings of columns and rows must be clear, sufficient, concise and
fully defined
4. The data must be presented according to size or importance,
chronologically, alphabetically or geographically
5. The table should not be too large
6. The classes should be fully defined and should not lead to any ambiguity
7. The classes should be exhaustive, i.e. should include all the given values
8. The classes should be mutually exclusive and non-overlapping
9. The classes should be of equal width, i.e. the class interval should be the same
10. The number of classes should be neither too large nor too small
Normal Range  18.5 ≤ x < 25
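The rules on classes (fully defined, exhaustive, mutually exclusive, equal width) can be checked mechanically. The sketch below assumes half-open classes of the form lower ≤ x < upper, in the same style as the normal-range example; the function names `make_classes` and `class_of` are hypothetical helpers, not part of the slides.

```python
def make_classes(lowest, highest, width):
    """Return equal-width classes (lower, upper), read as lower <= x < upper,
    that together cover every value from lowest to highest."""
    classes = []
    lower = lowest
    while lower <= highest:
        classes.append((lower, lower + width))
        lower += width
    return classes


def class_of(value, classes):
    """Return the single class containing value.

    Exhaustive classes give at least one match; mutually exclusive
    classes give at most one. Anything else is reported as an error.
    """
    matches = [(lo, hi) for lo, hi in classes if lo <= value < hi]
    if len(matches) != 1:
        raise ValueError(f"{value} falls in {len(matches)} classes")
    return matches[0]


classes = make_classes(120, 189, 10)  # illustrative range and width
print(classes)
print(class_of(130, classes))  # a boundary value lands in exactly one class
```

Writing the bounds as lower ≤ x < upper is one way to satisfy rules 6 and 8 at the same time, because a value on a boundary can belong to only one class.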
Frequency distribution table with
quantitative data:
Table 1: Fasting blood glucose level in diabetics
at the time of diagnosis (n=78)

Fasting Glucose     n
120-129            12
130-139             8
140-149            10
150-159            10
160-169            15
170-179            18
180-189             5
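As a quick check on Table 1, the class counts can be totalled and turned into relative frequencies. This is a sketch only: the counts are copied from the table, while the percentage column is an addition made here for illustration and does not appear on the slide.

```python
# Class labels and counts copied from Table 1 (fasting blood glucose, n=78).
glucose_table = [
    ("120-129", 12), ("130-139", 8), ("140-149", 10), ("150-159", 10),
    ("160-169", 15), ("170-179", 18), ("180-189", 5),
]

total = sum(count for _, count in glucose_table)

# Relative frequency: each class count as a percentage of the total.
relative = [(label, count, 100 * count / total) for label, count in glucose_table]

print(f"{'Fasting glucose':<16}{'n':>4}{'%':>8}")
for label, count, percent in relative:
    print(f"{label:<16}{count:>4}{percent:>8.1f}")
print(f"{'Total':<16}{total:>4}{100.0:>8.1f}")
```

The total should agree with the n stated in the table title, which is a useful consistency check before drawing any graph from the table.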
Cross- Tabulation
• Table 2: Fasting blood glucose level in
diabetics at the time of diagnosis (n=78)
Frequency distribution table with
qualitative data:
Table 1: Cases of malaria in adults and children in the months of
June and July 2010 in Nair Hospital (n=389)
EXAMPLE
This is a poor example because:
• The table lacks a title
• The source of the information is
not provided
• Row titles overlap two lines
• The alphabetical listing of regions
results in a non-numerical ordering
of data down the columns
EXAMPLE
This is a better example
because:
• The table has a title
• The source of the information
is provided
• Row titles do not run onto two lines
• The alphabetical listing of
regions results in a numerical
ordering of data down the
columns
• Numbers are aligned
Graphical Presentation
• A graphical representation is a visual display of
data and statistical results. It is often more
effective than presenting data in tabular form
• Graphical representation helps to quantify, sort
and present data in a way that is
understandable to a wide variety of audiences
• Graphs also help in studying both time
series and frequency distributions, as they give
a clear account and picture of the problem
• Graphs are also easy to understand and
eye-catching
General Principles of Graphic
Presentation
• In a graph there are two lines called coordinate axes
• One is vertical, known as the Y axis, and the other is
horizontal, called the X axis
• These two lines are perpendicular to each other.
The point where they intersect is called ‘0’
or the Origin
• On the X axis, distances to the right of the origin have
a positive value and distances to the left of the origin have
a negative value
• On the Y axis, distances above the origin have a positive
value and distances below the origin have a negative value
• The graph should have a title, a legend and labels
VARIOUS CHARTS AND DIAGRAMS
• Bar Diagram
• Histogram
• Frequency polygon
• Cumulative frequency curve/ Ogive
• Scatter diagram
• Line diagram
• Pie diagram
• Pictogram
• Stem and Leaf Plot
BAR DIAGRAM
• Bar charts are used for qualitative variables. The categories of the variable
are placed along the X-axis (horizontal), one bar per category, and the
height of each bar equals the frequency or percentage, which is
plotted along the Y-axis (vertical).
• The width of the bars is kept constant for all the categories
• The space between the bars also remains constant throughout.
• The number of subjects, with the percentage in brackets, is written on
top of each bar
• Types:
– Simple
– Compound
– Component
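The labelling rule for a simple bar diagram (number of subjects with the percentage in brackets) can be sketched in a few lines. The blood-group counts below are hypothetical, and the bars are printed sideways as text purely for convenience; a drawn chart would place the categories on the X-axis as described above.

```python
# Hypothetical counts for one qualitative variable (blood group); not from the slides.
counts = {"A": 11, "B": 15, "AB": 4, "O": 20}
total = sum(counts.values())

# Label for each bar: number of subjects with the percentage in brackets.
labels = {group: f"{n} ({100 * n / total:.1f}%)" for group, n in counts.items()}

# Text rendering: one bar per category, every bar built from the same symbol,
# with the bar length given by the frequency.
for group, n in counts.items():
    print(f"{group:<3}{'#' * n} {labels[group]}")
```

Because every bar uses the same symbol and spacing, only the bar length differs between categories, which mirrors the constant-width and constant-gap rules.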
SIMPLE BAR CHART
• A bar chart drawn for only one
variable or a single group is called a simple
bar chart
COMPOUND BAR CHART
• When two variables or two groups are compared, the chart is called a multiple or
compound bar chart
• In a multiple bar chart the bars representing the two variables are drawn
adjacent to each other, and equal bar width is maintained
COMPONENT BAR CHART
• A bar chart for two qualitative variables, in which each bar is further
divided into different categories or components, is called a component
bar chart
• Here the total height of the bar corresponding to one variable is
sub-divided into the components or categories of the other variable
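To make the sub-division of a component bar concrete, the sketch below converts each component into a segment of a bar whose total height is 100%. The two communities and their energy figures are invented for illustration, and `stack_segments` is a hypothetical helper name.

```python
# Hypothetical daily energy intake (kcal) by source for two communities.
intake = {
    "Community A": {"Protein": 300, "Fat": 500, "Carbohydrate": 1200},
    "Community B": {"Protein": 200, "Fat": 300, "Carbohydrate": 1500},
}


def stack_segments(components):
    """Return (name, bottom, top) percentages for one component bar of height 100."""
    total = sum(components.values())
    segments = []
    bottom = 0.0
    for name, value in components.items():
        top = bottom + 100 * value / total  # each segment starts where the last ended
        segments.append((name, bottom, top))
        bottom = top
    return segments


bars = {community: stack_segments(parts) for community, parts in intake.items()}

for community, segments in bars.items():
    print(community)
    for name, bottom, top in segments:
        print(f"  {name:<13}{bottom:5.1f} to {top:5.1f}")
```

Scaling every bar to 100 makes the bars comparable even when the groups differ in their totals; plotting raw totals instead is an equally valid design when the absolute amounts matter.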
HISTOGRAM
• A histogram is used for quantitative continuous
data: the class intervals are plotted on the X-axis
and the frequencies on the Y-axis
• It is very similar to the bar chart, with the
difference that the rectangles or bars are
adjacent (without gaps)
• It is used for presenting a class frequency table
(continuous data)
• It is a diagram consisting of rectangles whose area is
proportional to the frequency of each class and
whose width is equal to the class interval
Distribution of the subjects by Cholesterol level

Serum Cholesterol (mg/dl)   No. of Subjects   Percentage (%)
175-200                            3                30
200-225                            3                30
225-250                            2                20
250-275                            1                10
275-300                            1                10
Total                             10               100
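The cholesterol table can be used to check the "area proportional to frequency" property of a histogram. In the sketch below the counts are copied from the table; reading each class as lower ≤ x < upper is an assumption (the table's class limits overlap as printed), and the `density` column is an illustrative addition.

```python
# Classes and counts copied from the cholesterol table.
# Assumption: each class is read as lower <= x < upper, so 200 falls in 200-225.
cholesterol = [((175, 200), 3), ((200, 225), 3), ((225, 250), 2),
               ((250, 275), 1), ((275, 300), 1)]

total = sum(count for _, count in cholesterol)

rows = []
for (lower, upper), count in cholesterol:
    width = upper - lower
    percent = 100 * count / total
    density = count / width  # bar height for which area equals frequency
    rows.append((lower, upper, count, percent, density))

for lower, upper, count, percent, density in rows:
    bar = "#" * count  # crude text bar; widths are equal, so length tracks frequency
    print(f"{lower}-{upper}  {bar:<4} n={count} ({percent:.0f}%) density={density:.2f}")

# Area check: density multiplied by class width recovers each class frequency.
areas = [density * (upper - lower) for lower, upper, _, _, density in rows]
```

With equal class widths, as here, plotting the raw frequencies gives the same shape as plotting densities; the distinction matters only when class widths differ.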
EXERCISE
FREQUENCY POLYGON AND CURVE
• Plot the variable along the X-axis and the
frequencies along the Y-axis
• It is derived from a histogram by connecting the
midpoints of the tops of the rectangles in the
histogram
• The line connecting these midpoints
is called the frequency polygon
• If a smooth freehand curve is drawn
through these points, the curve is
known as the frequency curve
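The points of a frequency polygon are simply (class midpoint, frequency) pairs. The sketch below uses the cholesterol classes from the histogram table; closing the polygon with zero-frequency points one class width beyond each end is a common drawing convention assumed here, not something stated on the slide.

```python
# Classes and counts from the cholesterol table (class width 25 mg/dl).
cholesterol = [((175, 200), 3), ((200, 225), 3), ((225, 250), 2),
               ((250, 275), 1), ((275, 300), 1)]
width = 25

# Each polygon vertex sits at the midpoint of the top of a histogram rectangle.
midpoints = [(lower + upper) / 2 for (lower, upper), _ in cholesterol]
counts = [count for _, count in cholesterol]
polygon = list(zip(midpoints, counts))

# Assumed convention: bring the polygon down to the X-axis at both ends
# by adding a zero-frequency point one class width beyond the data.
closed_polygon = (
    [(midpoints[0] - width, 0)] + polygon + [(midpoints[-1] + width, 0)]
)

for x, y in closed_polygon:
    print(f"x = {x:6.1f}, frequency = {y}")
```

Joining these points with straight lines gives the polygon; smoothing through the same points by hand gives the frequency curve.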
(n=37)
CUMULATIVE FREQUENCY DIAGRAM
One can tell the number of patients that lie above or below a certain level
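As an illustration of reading "above or below a certain level", the sketch below accumulates the class counts of the fasting blood glucose table (Table 1). The helper `patients_below` is a hypothetical name, and it is only exact when the chosen level falls on a class boundary.

```python
from itertools import accumulate

# (lower limit, upper limit, count) copied from Table 1, fasting blood glucose (n=78).
classes = [
    (120, 129, 12), (130, 139, 8), (140, 149, 10), (150, 159, 10),
    (160, 169, 15), (170, 179, 18), (180, 189, 5),
]

counts = [count for _, _, count in classes]
total = sum(counts)

# "Less than" cumulative frequency: running total up to the end of each class.
cumulative = list(accumulate(counts))


def patients_below(level):
    """Count patients in classes lying entirely below level (use a class boundary)."""
    return sum(count for _, upper, count in classes if upper < level)


below_150 = patients_below(150)
at_or_above_150 = total - below_150

for (lower, upper, _), running in zip(classes, cumulative):
    print(f"up to {upper}: {running}")
print(f"below 150: {below_150}, 150 or above: {at_or_above_150}")
```

Plotting the running totals against the upper class limits gives the cumulative frequency curve (ogive) listed among the chart types.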
Exercise
Types of data and graphical representation

Editor's Notes

  • #9 BP? Go by the entity and not the tool with which you are measuring the entity
  • #32 For example, two communities are compared by the proportion of energy they obtain from various foodstuffs. Each bar represents the energy intake of one community and has a height of 100. The bar is divided horizontally into 3 components of the diet (protein, fat and carbohydrate), and each component is shown in a different colour or shading.