2. Data Analysis
Learning Objectives
• List out the common responsibilities of a data analyst
• Distinguish between descriptive statistics and inferential statistics
• Explain the types of inferences
3. Data Analysis
Data Analyst – a facilitator between an organization and its stakeholders who summarizes,
analyses, and presents data in a more meaningful manner.
Reasons for Data Analysis
• Describe the data set using the results of analysis
• Make inferences from the results of analysis.
• Estimate an unknown quantity from the data set
• Test hypotheses about an unknown quantity
• Make decisions based on statistical results
4. Classification of Statistical Analysis
Methods and Tools for Data Analysis
• Descriptive Statistics
• Inferential Statistics
Descriptive Statistics:
• Deals with:
Data graphs
Tables
Summary measures, such as measures of central tendency (mean, median, mode) and measures
of dispersion (variance, standard deviation)
• Numeric values are computed and used to describe the values in the data set
• Descriptive statistics provides a general insight about a problem or situation at hand by describing the data
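The summary measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch: the `ages` list is a hypothetical sample invented for illustration, not data from these slides.

```python
import statistics

# Hypothetical sample: ages of ten employees (illustrative values only).
ages = [34, 38, 38, 42, 45, 47, 47, 47, 52, 60]

# Measures of central tendency
mean_age = statistics.mean(ages)      # arithmetic average
median_age = statistics.median(ages)  # middle value of the sorted data
mode_age = statistics.mode(ages)      # most frequent value

# Measures of dispersion
variance = statistics.variance(ages)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(ages)      # sample standard deviation

print(mean_age, median_age, mode_age, round(variance, 2), round(std_dev, 2))
```

`statistics.variance` and `statistics.stdev` treat the data as a sample; `statistics.pvariance` and `statistics.pstdev` are the counterparts for a full population.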
5. Classification of Statistical Analysis Cont’d
Inferential Statistics
• Draws conclusions or makes predictions about a population from sample data
• Let's define two terms:
Population – the set of all elements under study
Sample – a subset drawn from the population
• The two types of inference are:
Estimation – inferring the value of an unknown population parameter (for example, the population mean) from sample data
Hypothesis testing – using sample data to decide whether a claim about a population parameter is supported
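The two kinds of inference can be sketched on one sample. The example below assumes a hypothetical sample and a hypothesized population mean of 50, and it uses a normal (z) approximation for simplicity; with a sample this small, a t-based interval and test would usually be preferred.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample drawn from a larger population (illustrative values only).
sample = [48, 52, 55, 47, 51, 53, 58, 49, 50, 54,
          56, 52, 45, 57, 53, 51, 49, 55, 52, 54]

n = len(sample)
sample_mean = statistics.mean(sample)
std_error = statistics.stdev(sample) / math.sqrt(n)

# Estimation: point estimate plus an approximate 95% confidence interval
# for the population mean.
z_crit = NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z_crit * std_error
ci_high = sample_mean + z_crit * std_error

# Hypothesis testing: is the sample consistent with a population mean of 50?
hypothesized_mean = 50  # assumed value, for illustration only
z = (sample_mean - hypothesized_mean) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
reject_null = p_value < 0.05

print(f"estimate={sample_mean:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
print(f"z={z:.2f}, p={p_value:.4f}, reject H0: {reject_null}")
```

At the 5% level, the test rejects the hypothesized mean exactly when that value lies outside the 95% confidence interval, which shows how closely estimation and hypothesis testing are related.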
6. Types of Data
Data
• Data are the raw facts/characteristics about an entity
Broad Classification of Data
Qualitative
A qualitative (or categorical) variable is a
variable that records a quality.
Quantitative
A quantitative variable is one that can be
described by a number for which arithmetic
operations, such as averaging, make sense.
7. Measurement Scales
The scales of measurements are ways in which variables or numbers are defined and
categorized.
Nominal Scale
• numbers serve as tags or
labels only, to identify or
classify an object.
• deals only with non-numeric
(qualitative) variables, or
variables whose numbers have
no numerical value.
Ordinal Scale
• data elements may be
ordered according to their
relative size or quality
Interval Scale
• attributes composing
variables are measured on
specific numerical scores or
values and there are equal
distances between
attributes.
• The distance between any
two adjacent attributes is
called an interval, and
intervals are always equal.
• the value of zero is assigned
arbitrarily. For example, the
mile or kilometre markers along
a major highway measure distance
from an arbitrarily chosen
starting point.
Ratio Scale
• quantitative scale where
there is a true zero and
equal intervals between
neighboring points
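The difference between an interval scale and a ratio scale can be illustrated with temperature, an example that does not appear in the slides: Celsius has an arbitrary zero (interval scale), while Kelvin has a true zero (ratio scale). In this sketch, differences agree on both scales but ratios are only meaningful in Kelvin.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15

c_low, c_high = 10.0, 20.0

# Differences are meaningful on an interval scale: the gap is the same
# whether it is measured in Celsius or in Kelvin.
diff_c = c_high - c_low
diff_k = celsius_to_kelvin(c_high) - celsius_to_kelvin(c_low)

# Ratios need a true zero: 20 C is not "twice as hot" as 10 C.
ratio_c = c_high / c_low                                        # 2.0, not meaningful
ratio_k = celsius_to_kelvin(c_high) / celsius_to_kelvin(c_low)  # meaningful ratio

print(diff_c, round(diff_k, 2), ratio_c, round(ratio_k, 3))
```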
8. Quantitative Variables
The two types of quantitative variable are discrete and continuous variables
Discrete Variables
• Variables that can assume
only a finite or countable
number of values.
• Example: the number of
customers that come to a
shop
Continuous Variables
• Variables that can take any
value within a range.
• Example: the distance
covered
10. Visual Representation of Data
A large data set is hard to interpret until it is represented visually. Data visualization is the
representation of data and information using visual elements such as:
• Tables - arrange data in terms of rows and columns
• Charts – visual display of information
• Graphs - a visual representation of the relations between certain quantities, represented as
points, plotted with reference to a set of axes
• Maps
11. Visual Representation of Data – Tables
There are two types of tables used to represent data
Frequency Tables
• An effective method of presenting data that is essentially quantitative is to present it as a
frequency table, where the data is grouped into various categories or classes.
• A frequency table lists the number of observations of a variable that fall into each
category. For example:
Age      Frequency
<35      1
36-40    3
41-45    2
46-50    7
51-55    5
56-60    3
61-65    1
66-70    3
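A frequency table like the one above can be built by assigning each observation to a class and counting. This is a minimal sketch: the `ages` list and the `age_class` helper are hypothetical, and the classes are assumed to be five years wide, starting at 36.

```python
from collections import Counter

# Hypothetical raw observations (illustrative values only).
ages = [37, 38, 40, 41, 44, 46, 47, 47, 49, 50, 52, 58]

def age_class(age, start=36, width=5):
    """Return the label of the class that contains `age`, e.g. '36-40'."""
    lower = start + ((age - start) // width) * width
    return f"{lower}-{lower + width - 1}"

# Count how many observations fall into each class.
frequency = Counter(age_class(age) for age in ages)

print("Age      Frequency")
for label in sorted(frequency):
    print(f"{label:<8} {frequency[label]}")
```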
12. Visual Representation of Data – Tables
Pivot Tables
• One of the most powerful tools in Excel for analysing data is a pivot table.
• Pivot tables let us summarize data in a variety of ways in order to investigate relationships
in the data. Tables generated using this technique are often called contingency tables or cross tabs.
Age           Male     Female   Grand Total
36-40         19.2%    12.6%    31.8%
41-45         22.2%    15.1%    37.3%
46-50         13.3%    7.2%     20.5%
51-55         5.1%     3.8%     8.9%
56-60         1.0%     0.5%     1.5%
Grand Total   60.5%    39.5%    100.0%
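Outside Excel, the same kind of contingency table can be sketched with the Python standard library by counting (age class, gender) pairs and expressing each cell as a percentage of the grand total. The `records` list is hypothetical and does not reproduce the figures in the table above.

```python
from collections import Counter

# Hypothetical records: one (age class, gender) pair per person.
records = [
    ("36-40", "Male"), ("36-40", "Male"), ("36-40", "Female"),
    ("41-45", "Male"), ("41-45", "Male"), ("41-45", "Female"),
    ("41-45", "Female"), ("46-50", "Male"), ("46-50", "Female"),
    ("51-55", "Male"),
]

counts = Counter(records)
total = len(records)
age_classes = sorted({age for age, _ in records})
genders = sorted({gender for _, gender in records})

# Each cell is that combination's share of all records, in percent.
table = {
    age: {g: 100 * counts[(age, g)] / total for g in genders}
    for age in age_classes
}
row_totals = {age: sum(table[age].values()) for age in age_classes}
col_totals = {g: sum(table[age][g] for age in age_classes) for g in genders}

for age in age_classes:
    cells = "  ".join(f"{table[age][g]:5.1f}%" for g in genders)
    print(f"{age:<12} {cells}  {row_totals[age]:5.1f}%")
```

The row totals and the column totals each add up to 100%, which is a quick consistency check for any cross tab expressed as percentages of the grand total.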
13. Visual Representation of Data – Charts
Histogram
• When data is grouped into categories or classes, we could plot a frequency distribution of the data.
• Visually, a histogram is a chart made up of bars of different heights.
• The height of each bar represents the frequency of values in the class represented by the bar.
• Adjacent bars share sides
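As a rough illustration, the age frequency table from slide 11 can be drawn as a text histogram in which the length of each bar equals the class frequency. This is only a sketch; a plotting library would normally be used to draw a real chart.

```python
# Class labels and frequencies taken from the age frequency table on slide 11.
frequency_table = [
    ("<35", 1), ("36-40", 3), ("41-45", 2), ("46-50", 7),
    ("51-55", 5), ("56-60", 3), ("61-65", 1), ("66-70", 3),
]

# One bar per class; the number of '#' characters is the frequency.
lines = [f"{label:>6} | {'#' * count}" for label, count in frequency_table]
print("\n".join(lines))
```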
Bar Chart
• Bar charts use horizontal or vertical rectangles to display categorical data when there is no emphasis on
the percentage of a total represented by each category.
• The scale of measurement is nominal or ordinal.
• A bar chart is a good way to show how different categories stack up against one another.
14. Visual Representation of Data – Charts
Pie Chart
• Pie charts are used to show the proportion of different parts that make up the whole.
[Figure: pie chart with slices for 1st Qtr, 2nd Qtr, 3rd Qtr and 4th Qtr]
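Each slice of a pie chart is sized by its share of the whole. The sketch below computes those shares and the corresponding slice angles; the `quarterly_sales` values are hypothetical and are not read from the chart on this slide.

```python
# Hypothetical sales per quarter (illustrative values only).
quarterly_sales = {"1st Qtr": 8.2, "2nd Qtr": 3.2, "3rd Qtr": 1.4, "4th Qtr": 1.2}

total = sum(quarterly_sales.values())

# Share of the whole (in percent) and slice angle (in degrees) per category.
shares = {q: 100 * v / total for q, v in quarterly_sales.items()}
angles = {q: 360 * v / total for q, v in quarterly_sales.items()}

for quarter in quarterly_sales:
    print(f"{quarter}: {shares[quarter]:.1f}% of total, {angles[quarter]:.1f} degrees")
```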
15. Visual Representation of Data – Graphs
Scatter Plot
• Used to investigate the relationship between two variables.
[Figure: scatter plot of Y-Values with a linear trendline, "Linear (Y-Values)"]
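The linear trendline drawn through a scatter plot is typically a least-squares line. The sketch below fits one by hand and also computes the Pearson correlation coefficient; the `x` and `y` values are hypothetical and are not read from the chart on this slide.

```python
import math

# Hypothetical paired observations (illustrative values only).
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [0.7, 1.3, 1.6, 2.4, 2.7, 3.3]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least-squares trendline: y = intercept + slope * x
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Pearson correlation: strength and direction of the linear relationship.
r = s_xy / math.sqrt(s_xx * s_yy)

# Residuals: vertical distances between the points and the trendline.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"y = {intercept:.3f} + {slope:.3f}x, r = {r:.3f}")
```

A value of `r` close to 1 or -1 indicates that the points lie close to a straight line, while a value near 0 indicates little linear relationship.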
16. Visual Representation of Data – Graphs
Line Graphs
• Line graphs are an effective way to represent the relationship between two variables
particularly when time is involved.