MSC III_Research Methodology and Statistics_FINAL.pptx

Research
Methodology and
Statistics
Dr. Suchita Rawat
MPhil, NET, PhD

Course Objectives
To provide an understanding of research methodology
To enable students to apply research methodology to the field of
forensic science
Course Outcomes
After the successful completion of the course, the student will
be able to:
CO1: Recall and recognize the objectives, motivations, and types
of research
CO2: Appraise the methods of sampling and research design
CO3: Develop and execute primary and secondary data collection
CO4: Test and validate descriptive and inferential statistics on
continuous and categorical data

Unit / Topic; No. of Hours; Teaching Methodology; Time of Completion
Unit 1: Introduction to Research Methodology
   No. of hours: 15
   Mapped to Ms. Aditi
Unit 2: Research and Sampling Design (Steps in Sampling Design; Criteria of
Selecting a Sampling Procedure; 10 Characteristics of a Good Sample Design;
Types of Sample Designs; Hypothesis formulation and testing)
   No. of hours: 10
   Co-shared with Ms. Aditi
   Teaching methodology: Participatory TL: Interactive Lecture, Guided library work, Technical presentation
   Time of completion: Aug 14 - Aug 19, 2023
Unit 3: Data Collection
   No. of hours: 8
   Teaching methodology: Participatory TL: Interactive Lecture, Guided library work, Technical presentation
   Time of completion: Sep 4 - Sep 9, 2023
Unit 4: Descriptive Statistics and Inferential Statistics
   No. of hours: 12
   Teaching methodology: Participatory TL: Interactive Lecture, Guided library work, Technical presentation; Experiential TL: Workshop
   Time of completion: Oct 30 - Oct 31, 2023

Unit 2: Research and Sampling Design
Sampling Design; Criteria of Selecting a Sampling Procedure; Characteristics of a
Good Sample Design; Types of Sample Designs; Hypothesis formulation and testing.
Unit 3: Data Collection
Sampling Design; Criteria of Selecting a Sampling Procedure; 10 Characteristics
of a Good Sample Design; Types of Sample Designs; Hypothesis formulation and
testing.
Unit 4: Descriptive Statistics and Inferential Statistics
Statistics in research; Measures of Central Tendency; Measures of Dispersion;
Measures of Asymmetry; Measures of Relationship; Simple Regression Analysis;
Multiple Correlation and Regression; t-test; Chi-square test; ANOVA;
Introduction to Statistical Package for Social Sciences (SPSS)

Sampling techniques are methods used to select a
representative subset (sample) from a larger population
for the purpose of conducting research, analysis, or
making inferences about the entire population.

Some of the key advantages of sampling include:
Time and Cost Efficiency
Feasibility
Accuracy (if the sample is representative of the population)
Ethical Considerations (reduced potential for harm or invasion of
privacy)
Practicality for Data Analysis
Generalization (when the sample is representative and unbiased)
Accessibility

Important terminology
Population
• The entire group or set of individuals, items, or elements from which the
sample is drawn, and the results are generalized.
Sampling Frame
• A list or representation of all the elements in the population from which
the sample is drawn. It is the actual source used to select the sample.
Sample
• A subset of the population that is selected for study or analysis. The
sample represents the larger population, and conclusions drawn from the
sample are extrapolated to the population.
Sampling Unit
• The individual element or item in the population that can be selected in
the sampling process. It can be a person, household, product, or any other
discrete entity.

Criteria of Selecting a Sampling Procedure
Representativeness: the sample is a true reflection of the
population's characteristics.
Randomness: each individual in the population has an
equal chance of being selected for the sample, which avoids bias.
Precision: how close the sample's results are to the true
population values.
Feasibility: the chosen sampling procedure should be
feasible in terms of time, budget, and resources.
Research Objectives
Accessibility: availability of a sampling frame
Homogeneity
Ethical Considerations

‘Stratified Random Sampling’ and ‘Stratified Proportional Sampling’.
Lottery and Tippett’s Table
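The lottery method mentioned above can be illustrated with a short sketch. This is only an illustration under stated assumptions: the sampling frame of 50 numbered units, the sample size, and the function name `lottery_sample` are all hypothetical and do not come from the slides. Python's standard `random.sample` plays the role of drawing slips without replacement.

```python
import random


def lottery_sample(sampling_frame, sample_size, seed=None):
    """Draw a simple random sample without replacement (the 'lottery' idea)."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(sampling_frame, sample_size)


# Hypothetical sampling frame: ID numbers of 50 population units.
frame = list(range(1, 51))
sample = lottery_sample(frame, sample_size=10, seed=42)
print(sample)
```

Every unit in the frame has the same chance of being drawn, which is the defining property of simple random sampling.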

Sampling technique: Simple Random Sampling
Advantages:
1. It is a hassle-free method of sampling when the population is homogeneous.
2. There is no chance of the researcher's personal bias influencing the selection.
3. It requires no computation of any sort.
Disadvantages:
1. It cannot be used in a heterogeneous population.
2. It cannot be used where the researcher wants to conduct a mini-comparison
within the universe by studying the sample in divisions.
3. It requires basic knowledge of the universe, in order to make a list to
choose from.
Sampling technique: Systematic Sampling
Advantages:
1. This method is easy to understand and use.
2. This method involves the least number of steps.
3. There is the least chance of influence of the researcher's personal bias.
4. No knowledge of the universe is required before sampling.
Disadvantages:
1. Every unit in the universe does not have an equal chance of being selected
in the sample, as the selection depends on the number ‘n’ chosen.
2. It is not an effective sampling method in the case of a heterogeneous
population.
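A minimal sketch of systematic sampling, assuming a hypothetical frame of 100 numbered units and a sample of 10: a random starting point is chosen within the first interval, and every k-th unit is taken after that. The function name and the data are illustrative only.

```python
import random


def systematic_sample(sampling_frame, sample_size, seed=None):
    """Select every k-th unit after a random start within the first interval."""
    interval = len(sampling_frame) // sample_size  # sampling interval k
    if interval < 1:
        raise ValueError("sample_size cannot exceed the size of the frame")
    start = random.Random(seed).randrange(interval)  # random start in [0, k)
    return [sampling_frame[start + i * interval] for i in range(sample_size)]


# Hypothetical sampling frame: units numbered 1 to 100.
frame = list(range(1, 101))
sample = systematic_sample(frame, sample_size=10, seed=1)
print(sample)
```

Once the start is fixed, the rest of the sample is fully determined, which is why the selection depends on the chosen interval.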

Sampling technique: Stratified Sampling
Advantages:
1. There is better representation of the different characteristics of the
population.
2. The researcher can use results from different strata to compare results
within the universe.
Disadvantages:
1. It involves more time, as samples have to be taken from each stratum to
form the final sample.
Sampling technique: Cluster Sampling
Advantages:
1. It is useful where the population is divisible into clusters, even
heterogeneous clusters.
2. It is useful in large geographical areas.
3. As the division into clusters does not depend on their being homogeneous,
more than one characteristic can be studied in one cluster.
4. There is no need to have prior knowledge of the population.
Disadvantages:
1. The clusters are not equal in size, so the final sample may not represent
the population proportionately. Even if the study is conducted in a
multi-phase manner, the clusters do not offer a comparative analysis.
2. There is a possibility that the same person may form part of more than one
cluster. This will lead to over-representation.
3. There is a possibility that some clusters may be homogeneous while others
may be heterogeneous.
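Stratified proportional sampling can be sketched as follows. The population of 60 "urban" and 40 "rural" units is hypothetical, as are the function and variable names; the idea shown is only that each stratum contributes to the sample in proportion to its share of the population.

```python
import random
from collections import defaultdict


def stratified_proportional_sample(units, stratum_of, sample_size, seed=None):
    """Sample each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # Rounding means the total can differ slightly from sample_size
        # when the proportions do not divide evenly.
        share = round(sample_size * len(members) / len(units))
        sample.extend(rng.sample(members, share))
    return sample


# Hypothetical population: 60 urban and 40 rural units, stored as (id, stratum).
population = [(i, "urban") for i in range(60)] + [(i, "rural") for i in range(60, 100)]
sample = stratified_proportional_sample(population, lambda u: u[1], sample_size=10, seed=7)
print(sample)
```

With these numbers the sample holds 6 urban and 4 rural units, mirroring the 60:40 split in the population.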

Sampling technique: Convenience Sampling
Advantages:
1. It is suited to research that is preliminary or a pilot project, and
that will be supplemented with further probability sampling research.
Disadvantages:
1. Low Diversity: it tends to attract participants who are readily accessible
or willing to participate, leading to a sample that lacks diversity in terms
of demographics, opinions, or experiences.
Sampling technique: Purposive Sampling
Advantages:
1. It is easy on the pocket, as the researcher chooses the units
himself/herself. There is no cost involved in selecting units for the sample.
2. No prior knowledge of the universe is required.
Disadvantages:
1. Representativeness of the sample is questionable.
2. It is not useful in cases of a heterogeneous population.
3. Sampling may be influenced by the personal bias of the researcher.

Sampling technique: Quota Sampling
Advantages:
1. The advantage of quota sampling is its cost and time efficiency.
2. It is one of the most effective sampling methods, for small-scale as well
as large-scale sampling.
Disadvantages:
1. Lack of Representativeness
2. Determining the appropriate quotas can be challenging, especially if the
characteristics being targeted for quota setting are interconnected or
difficult to define.
Sampling technique: Snowball Sampling
Advantages:
1. Access to Hard-to-Reach Populations
2. Cost-Effective
3. Quick Data Collection
Disadvantages:
1. Limited Control: researchers have limited control over the sampling
process, as it relies heavily on participants' referrals.

Principles and Precautions
of Sampling
The universe must be clearly defined.
The sampling units must be distinct and independent of
each other.
A clearly chalked out sampling design ensures
predetermined steps, and also encompasses planning for
contingencies.
Sampling must be done in an unbiased, objective and
systematic manner.
The objective of the research must be kept in mind while
sampling.
Arbitrary alterations must be avoided during sampling.
Sample size must be chosen in accordance with the nature of
study, i.e. qualitative or quantitative, and taking into
consideration the size of the universe.

Principles and Precautions
of Sampling
• The cost and time factor is an important influencing
factor in research. It is advisable not to see these factors
as an impediment to research, but to utilise them in the
most efficient way possible.
• Ease of contacting the respondents is another
important factor that is to be taken into consideration
while sampling.
• Even with the advent of technology, care must be taken
by the researcher that the selected respondents are
a source of objective, unbiased answers.
• It should also be ensured, to the maximum possible extent,
that the potential respondents are not being forced to
participate in the research.
• Sampling errors (in sample size, proportions) must be
avoided as much as possible.

Characteristics of a Good Sample
Design
Representativeness
Randomization
Adequate Sample Size
Sampling Frame: a clear and accurate sampling frame, which is a list of all
the potential individuals or units in the population.
Sampling Method: should align with the research objective.
Sample Variability: considers the variability within the population.
Avoidance of Bias: non-response bias, selection bias, or measurement bias.
Ethical Considerations: participants' rights, informed consent, privacy, and
confidentiality.
Clear Sampling Plan
Pilot Testing: helps identify any potential issues or areas for improvement
in the sampling process.

HYPOTHESIS
A hypothesis is a specific, testable, and
falsifiable statement or proposition that
predicts a relationship between variables or
anticipates an outcome in a research study.
It serves as a tentative explanation that
researchers aim to confirm or reject through
empirical observation and analysis.
Testable
Specific (clear and precise)
Falsifiable (capable of being proven wrong through evidence)
Predictive (relationship or effect between variables)
Empirical (observations, existing theories, or logical reasoning)
Verifiable (observable and measurable results)

Hypothesis Formulation
1. Identify the Research Problem
2. Literature Review (understand the gap in the literature)
3. Formulate the Hypothesis: null and alternative hypotheses
4. Specify Variables: clearly define the independent variable(s) and the
dependent variable(s)
5. Directional vs. Non-Directional Hypotheses
Research Question: Does the new drug lead to a decrease in blood pressure?
Example of a Directional Hypothesis:
The new drug leads to a decrease in blood pressure.
Example of a Non-Directional Hypothesis:
There is a relationship between the new drug and blood pressure.

Types of Hypothesis
• Simple hypothesis: This type of hypothesis suggests
that there is a relationship between one independent
variable and one dependent variable.
E.g. "Students who eat breakfast will perform
better on a math exam than students who do not
eat breakfast."
• Complex hypothesis: This type of hypothesis suggests a
relationship between three or more variables, such as
two independent variables and a dependent variable.
E.g. "People with high-sugar diets and sedentary
activity levels are more likely to develop
depression."

• Null hypothesis: This hypothesis suggests
no relationship exists between two or
more variables.
E.g. "Children who receive a new
reading intervention will show no
difference in scores compared with
children who do not receive it."
• Alternative hypothesis: This hypothesis
states the opposite of the null hypothesis.
E.g. "Children who receive a new
reading intervention will perform
better than children who do not
receive the intervention."

• Statistical hypothesis: This hypothesis uses statistical
analysis to evaluate a representative sample of the
population and then generalizes the findings to the larger
group.
E.g. "There is a correlation between students'
study hours and their exam scores."
• Logical hypothesis: This hypothesis assumes a
relationship between variables without collecting data
or evidence (it is based on logic).
E.g. "If a plant is deprived of sunlight, it is
expected that its growth will be negatively
affected compared to a plant that receives
adequate sunlight."


To write a hypothesis:
Identify what the problem is.
Make an educated guess as to the direction of the
relationship or difference.
Identify the major variables.
The format for writing a hypothesis is:
If (variables),
then (predict the outcome of the experiment using the
dependent variable).
E.g. Observation: Chocolate may cause acne.
A scientific hypothesis statement that is measurable: If a
person's frequency of acne is related to the amount of
chocolate a person consumes, then the frequency of acne
will be 25% higher when subjects consume large amounts
of chocolate (5 chocolate bars per day) than when subjects
consume little or no chocolate.

As a group, create hypotheses based
on sample observations/general
hypotheses.
A few sample items from which to develop scientific
hypotheses are:
1. Salt in soil may affect plant growth.
2. Temperature may cause leaves to change color.
3. Sunlight causes fruit to ripen more quickly.
4. Plant growth may be affected by the color of the
light.
5. Bacterial growth may be affected by temperature.
6. Ultraviolet light may cause skin cancer.

Testing of hypothesis
1. Formulate a Hypothesis (null and alternative hypotheses)
2. Choose a Significance Level (represents the probability of making a Type I
error, i.e. rejecting the null hypothesis when it is actually true)
3. Collect Data (guided by the hypothesis and research design)
4. Perform Statistical Analysis
5. Calculate Test Statistic (e.g., t-statistic, z-score, F-statistic)
6. Draw a Conclusion
7. Consider Limitations (potential sources of error, and the generalizability
of findings to the population)
8. Report Results (research papers, presentations, or other appropriate
channels)
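The testing steps can be traced in a small sketch based on the blood-pressure research question used earlier. Everything numeric here is hypothetical: the ten observed changes, the assumed known population standard deviation `sigma`, and the 0.05 significance level. A one-sample z-test is used only because it needs nothing beyond the Python standard library; with an unknown standard deviation and a small sample, a t-test would normally be the more appropriate choice.

```python
from math import sqrt
from statistics import NormalDist, mean

# Step 1: H0: mean change in blood pressure = 0
#         H1: mean change < 0 (directional: the drug lowers blood pressure)
# Step 2: choose the significance level (probability of a Type I error).
alpha = 0.05

# Step 3: hypothetical data, change in systolic blood pressure (mmHg).
changes = [-8, -5, -12, -3, -7, -10, -1, -6, -9, -4]
sigma = 5.0  # assumed known population standard deviation (hypothetical)

# Steps 4-5: compute the z test statistic and its lower-tail p-value.
z = (mean(changes) - 0) / (sigma / sqrt(len(changes)))
p_value = NormalDist().cdf(z)

# Step 6: draw a conclusion.
reject_null = p_value < alpha
print(f"z = {z:.3f}, p = {p_value:.5f}, reject H0: {reject_null}")
```

Because the hypothesis is directional, only the lower tail of the normal distribution is used for the p-value; a non-directional hypothesis would use both tails.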

Data collection
Data collection for research refers to the systematic process
of gathering relevant and accurate information or data
from various sources or participants in order to address
research questions, test hypotheses, and draw meaningful
conclusions.

Types of data collection
PRIMARY DATA COLLECTION
Definition: collecting new and original data directly from the source or
participants for a specific research project.
Advantages:
• Relevance
• Control (data quality and consistency)
• Specificity
• Freshness
Disadvantages:
• Time and Resources
• Complexity
• Bias (researcher bias or unintentional influence)
Methods: Surveys, Interviews, Observation, Experiments, Focus groups,
Case studies, Ethnography
SECONDARY DATA COLLECTION
Definition: using existing data that has been collected by other researchers,
organizations, or sources for purposes other than the current research project.
Advantages:
• Time and Cost Savings
• Convenience
• Historical Analysis (trends, changes over time)
• Large-scale data
Disadvantages:
• Relevance
• Quality
• Limited Control
• Availability

Primary data collection
Surveys
Interview
Observation
Experiments
Focus group
Case studies
Ethnography

Surveys
Surveys involve presenting a set of structured questions to participants,
usually in written or digital format.
Surveys can be administered through various channels, such as
paper forms, online platforms, emails, or mobile apps.

Interview
An interview is a direct interaction between the researcher and the participants to gather in-depth information,
insights, opinions, and perspectives on a specific research topic.
Interviews allow researchers to delve into the thoughts, experiences, and viewpoints of participants,
providing rich qualitative data.

STEPS
1. Define Objectives
2. Select Participants
• Demographics, expertise, or roles
3. Choose Interview Type
• Structured Interviews (pre-defined set of questions)
• Semi-Structured Interviews (pre-determined questions with open-ended
follow-up questions)
• Unstructured Interviews (more conversational style)
4. Develop Interview Guide
• Questions you plan to ask, along with prompts or probes to encourage
elaboration
5. Preparation
• Interview guide, topic, and participants' backgrounds
6. Schedule and Conduct Interviews
• Mode of interview (in-person, phone, video)
7. Recording and Transcription
8. Data Analysis and Coding
9. Interpretation
10. Quoting and Reporting
• Use direct quotes from participants to illustrate key points and support
your findings

Methodology
In 2017, we interviewed six Mumbai residents aged 60 and over who had been victims of cybercrime,
and five of their family members. The victims interviewed were all males, aged from 62 to 77 years,
graduates and used the internet for an average of an hour a day before their victimization. They lost
between 30K and 2000K rupees, through cell phone, bank card hacking or social engineering fraud.
We also interviewed seven experts: two from banking (a bank chairman and a chief technology
officer), one police inspector, one lawyer, one private investigator specializing in cybercrime and two
older people’s welfare workers from a third sector organization.

From victims
Semi-structured, qualitative interviews, in English or
Hindi.
The interview topic guide:
• how the cybercrime took place
• experiences of reporting the crime
• impacts of the crime upon victims
• their emotional and practical responses
• how the crime could be prevented, or better managed.

From experts
Semi-structured, qualitative interviews, in English or
Hindi.
The interview topic guide:
• how and why older people might be particularly
vulnerable to cybercrime
• what they consider to be protective factors
• the institutional response to cybercrime
experienced by older people in Mumbai
• how it might be prevented or better managed.

The theoretical basis was the crime triangle approach.
Themes: Unresponsive institutions; lack of data protection and privacy
safeguards
“In foreign countries if anything happens the bank will take the responsibility
and refund the money, here bank will not take any cognisance of it.” (Wife of
a victim)
Themes: Lack of proximal family support;
relative affluence
“Senior citizens don’t understand that they
have to protect their password very
carefully. Since it is very difficult, maybe it is
difficult for them to remember the password,
they write it somewhere … They generally
write it on a paper and keep it with the credit
card or the debit card. So if their wallet is
stolen, then all the information is gone.”
(Police inspector)
“The criminals are organised. If some amount
is stolen or deposited by some victim in an
account here in Bombay, the amount is
withdrawn from different ATMs all over India.
And it has to be an organised racket to work in
this fashion. Moreover, it is a very low-risk
crime. Even if the criminal is caught, there is no
proper investigation to collect the evidence and
to take the case to its logical end. That’s why
the criminals have become more and more
bold to commit these type of crimes.” (Police
inspector)

Focus group discussion
A focus group is a small group of six to
ten people led through an open
discussion by a skilled moderator.
The ideal amount of time to set aside
for a focus group is anywhere from 45
to 90 minutes.
A focus group is not:
• A debate
• Group therapy
• A conflict resolution session
• A problem solving session
• An opportunity to collaborate
• A promotional opportunity
• An educational session

Designing focus group
questions
Twelve is the maximum number of questions for any one
group. Ten is better, and eight is ideal.
Questions should be:
• Short and to the point
• Focused on one dimension each
• Unambiguously worded
• Open-ended or sentence completion types
• Non-threatening and non-embarrassing
• Worded in a way that they cannot be answered with a
simple “yes” or “no” answer (use “why” and “how”
instead)
Engagement questions
Exploration questions
Exit question

Recruiting and preparing
for participants
Ideally, all the participants are very comfortable with each other
but none of them know each other.
Homogeneity is key to maximizing disclosure among focus
group participants. Consider the following in establishing
selection criteria for individual groups:
• Gender: Will both men and women feel comfortable
discussing the topic in a mixed gender group?
• Age: How intimidating would it be for a young person
to be included in a group of older adults? Or vice versa?
• Power: Would a teacher be likely to make candid
remarks in a group where his/her principal is also a
participant?
• Cliques: How influential might three cheerleaders be in
a group of high school peers?

Recruiting and preparing for participants
Participant inclusion/exclusion criteria should be established upfront and based on the purpose of the study.
Use the criteria as a basis to screen all potential applicants.
Participant recruitment methods (over-invite, offer incentives, reduce barriers to attending):
• Nomination
• Random selection
• All members of the same group
• Same role/job title
• Volunteers
Once a group of viable recruits has been established, call each one to confirm interest and availability. Give
them times and locations of the focus groups and secure verbal/email confirmation.

Conducting the focus
group
Moderator (listen/think/group
participation/knowledge of
topic/handle group
challenges)
Assistant moderator
(tape recording/notes/body
language and cues)

Secure approval from a Human Subjects
Committee.

Analyzing the data
Using NVivo

Strengths of Focus Group
Discussions
Rich Qualitative Data
Group Dynamics
Exploration of Complex Issues
Rapid Data Collection
Understanding Diversity
Contextual Insights

Weaknesses of Focus Group
Discussions
Group Influence
Lack of Anonymity
Limited Generalizability
Moderator Skill Dependency
Time and Resource Intensive
Analysis Complexity
Limited Privacy

Observation methods
Primary data collection through observation
involves systematically observing and
recording behaviors, events, or phenomena
in their natural setting.
This method is particularly useful when
researchers aim to gain a direct understanding
of how people or objects behave and interact in
real-world situations. Here's how primary data
collection by observation method works:

Define Objectives
Choose the Setting: a public space, a workplace, a classroom, a natural
habitat, or any location relevant to your research.
Select Participants: demographics, roles, or characteristics.
Decide on Observation Type: Structured
Observation/Unstructured Observation/Participant Observation
Create Observation Plan: location, duration, and frequency of
observation sessions.
Training and Pilot: if there are multiple observers, provide training to ensure
consistency in observations.

Ethnography
Conducting ethnography involves immersing
yourself in a particular culture or social
setting to gain a deep understanding of its
practices, behaviors, beliefs, and values.
Ethnography is commonly used in
anthropology and sociology to study human
societies. Here's a step-by-step guide on how
to conduct ethnography:

1. Select a Research Topic: lives, practices, or beliefs.
2. Research and Background Study: culture or community, history, language,
traditions, and any existing research.
3. Establish Relationships
4. Participant Observation: observe and take notes on their behaviors,
interactions, rituals, and practices.
5. Field Notes and Journals: keep field notes and journals during your
observations.
6. Conduct Interviews
7. Collect Artifacts
8. Document Visuals: use photographs, videos, or sketches to document visual
elements such as architecture, clothing, and rituals.
9. Triangulation: use multiple sources of data to validate your findings.
10. Constant Comparative Analysis
11. Data Analysis and Writing Ethnographic Reports
12. Feedback and Validation

Secondary data collection refers to the process of
gathering and using existing data that has been
previously collected by someone else or for a different
purpose.
research studies
government reports
academic literature
online databases

Internal Secondary Data: data that an organization or individual
has collected for their own purposes, and that is being reused
for a new analysis.
External Secondary Data: data collected by external
sources, such as government agencies, research
institutions, or other organizations, and made
available for public use.

GOAL OF LITERATURE
SEARCH
• SEARCH COMPREHENSIVELY
(TERMINOLOGY/DATABASE)
• SEARCH FOR GREY
LITERATURE
• SEARCH FOR UNPUBLISHED
STUDIES

Sources of Information
Retrieval
• Subject-specific journals
• Citations from most relevant
articles (Crossref)
• Electronic databases
• Grey literature
(dissertations/theses, government reports, research
reports, organization websites)
• Experts in the field

AI tools for
Literature
searches

CASE STUDY
A case study is a research method that
involves an in-depth and detailed
examination of a specific subject, often
within its real-world context.
This method is commonly used in various
fields, including psychology, sociology,
business, education, and medicine, among
others.

The primary characteristics of a case study
Focus on a Specific Subject (single individual, group, organization,
event, or phenomenon)
In-Depth Investigation (interviews, observations, surveys, or
document analysis)
Contextual Analysis (circumstances, conditions, and factors that
influence or are influenced by the case)
Holistic Perspective (psychological, social, cultural, economic,
historical etc)

The primary characteristics of a case study
Qualitative and/or Quantitative Data
Analytical Approach (identify patterns, themes,
relationships, and key insights)
Narrative Presentation (description of the case,
analysis of the data, conclusions, and often
recommendations or implications)
Unique and Contextualized Findings

1. Select Your Case
2. Define Your Research Questions or Objectives
3. Choose Your Case Study Type (Single/Multiple/Longitudinal)
4. Gather Data (Primary/Secondary/Triangulation)
5. Data Collection Methods (Interviews/Observations/Surveys/Document
Analysis/Archival Research)
6. Data Analysis (qualitative/quantitative/cross reference)
7. Develop a Case Study Narrative (description of the case, its context, key
events, and the people involved; use quotations to support analysis)
8. Draw Conclusions and Interpretations
9. Provide Recommendations or Implications
10. Write the Case Study Report (introduction, literature review (if
applicable), methodology, case description, analysis, conclusions, and
recommendations)

SUMMARY: students will analyse
cases in order to explore and combat
gender stereotypes and homophobia

CASE STUDY 1
John has recently moved to a different town with his parents, because they found better jobs. As he hadn’t
had the chance to meet people and make friends yet, he decided to find an extracurricular activity to do after
school. He searched for lessons or activities available and he found an incredible offer about some ballet
classes. On Wednesday, after school, he went to enrol in the lessons. When he entered the class, the girls who
were already there were really surprised and staring at him. After he explained that he wanted to attend the
lessons, the girls started pointing at him, laughing. The teacher did not react at all and looked really surprised.
John ran out of the class, crying.
Questions:
Do you think there are stereotypes in the case of John?
Could you mention some?
How do you think John feels about this situation?
Do you think that something like that could happen in real life?
How do you think you would react if something like that happened in your school?

CASE STUDY 2
Laura is running for president of the class. She is really happy that she will have a
chance to contribute to the exercise of students’ rights and she has made a plan on what
she wants to change. One day, five of her male classmates approached her and said ‘You
can’t be the president of our class! You’re a girl! Girls cannot be the leaders!’. Laura was
devastated, since she has been trying really hard to find ways in order for all students to
be represented by her plan.
Questions:
Do you think there are stereotypes in the case of Laura?
How do you think Laura feels about this situation?

CASE STUDY 3
Paul is a 17-year-old boy who has been struggling to accept himself for the past few
years. He understood that he was into boys two years back, but he has been trying to ‘fix
himself’ as he was told. He dated girls, hung out with male classmates and did what other
boys his age did. One day, his girlfriend, Sarah, wanted their relationship to go further, but
Paul was not into it. Sarah started mocking him and told everyone in their class that he was
gay. After one day, the whole school started calling Paul names, telling him that he was a
‘weirdo’ and that this was not normal.
Questions:
Do you think there are stereotypes in the case of Paul?
How do you think Paul feels about this situation?

Statistics deals with the collection, presentation,
analysis and interpretation of
quantitative/qualitative information.


The facts, observations and all the relevant
information that have been collected from research
and investigations are known as data.

Graphs
Graphs are visual representations of data that make it easier to
understand and save time. They present frequency
distributions so that the shape of the distribution can be seen easily.

Bar Graphs
Bar graphs are a graphical representation of data based on statistics and numerical figures. A bar graph
uses two axes, the x-axis and the y-axis, to plot rectangular bars.
Types of Bar Graph
•Horizontal bar graph
•Vertical bar graph
•Double bar graph (Grouped bar graph)
•Multiple bar graph (Grouped bar graph)
•Stacked bar graph
•Bar line graph

Horizontal bar graph
When the Y-axis represents the observation to be
compared, and the x-axis represents the magnitude of the
observations, then the bars run horizontally along the x-axis
up to the point of value proportional to the observation.

Vertical Bar Graph
Vertical bar graphs are just the opposite of horizontal bar
graphs. Vertical bar graphs are preferred more than
horizontal bar graphs. When the x-axis represents the
observation to be compared, and the y-axis represents
the magnitude of the observations, the bars run vertically
up to a height proportional to the observation.

Grouped Bar Graph
Double Bar Graph
A double bar graph makes a comparison among various observations or
categories using two parameters. However, those two
parameters should be measured in similar quantities,
which means that they should be of the same unit.

Multiple Bar Graph
A multiple bar graph makes a comparison among various observations on the basis
of multiple parameters. You can include as many
parameters as you wish; however, each parameter should
have the same unit of measurement.

Stacked Bar Graph
A stacked bar graph also represents various parameters in
a single graph. The difference is that in a stacked bar
graph all the parameters are represented in a single bar.
So you can say that there are segments of a total in a
single bar.

Pie chart
A pie chart is a type of graph that uses a circle to display data. The
size of each piece corresponds to that group's percentage of the
total. In other words, the size of each slice of the
pie is proportional to the size of the group it represents. The
entire "pie" represents 100% of a total, while the "slices"
represent parts of the whole.
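The proportionality described above can be shown with a short calculation. The category counts below are hypothetical and the function name `pie_slices` is only illustrative; the sketch converts each count into a percentage of the total and the matching slice angle out of 360 degrees.

```python
def pie_slices(counts):
    """Return (percentage of total, slice angle in degrees) for each category."""
    total = sum(counts.values())
    return {
        label: (100 * value / total, 360 * value / total)
        for label, value in counts.items()
    }


# Hypothetical counts of data collection methods used in 40 studies.
counts = {"Surveys": 20, "Interviews": 10, "Observation": 10}
slices = pie_slices(counts)
for label, (percent, angle) in slices.items():
    print(f"{label}: {percent:.1f}% of the pie, {angle:.1f} degrees")
```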

Histogram
The X-axis represents the variable (grouped into class intervals), and the
corresponding frequencies are represented on the Y-axis, which determines the
height of each rectangle.
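Before a histogram is drawn, the frequencies for each class interval have to be counted. The sketch below does this for the ten test marks used in the arithmetic mean example of this deck; the class width of 10 and the starting value of 50 are assumptions made for illustration, as is the function name.

```python
from collections import Counter


def class_frequencies(values, width, start):
    """Count how many values fall in each class interval [lower, upper)."""
    counts = Counter((value - start) // width for value in values)
    return {
        (start + k * width, start + (k + 1) * width): counts[k]
        for k in sorted(counts)
    }


marks = [61, 81, 87, 78, 54, 56, 67, 65, 68, 69]
frequencies = class_frequencies(marks, width=10, start=50)
for (lower, upper), frequency in frequencies.items():
    # A row of asterisks stands in for the height of each rectangle.
    print(f"{lower}-{upper}: {'*' * frequency}")
```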

Boxplot
It is a graphical representation of dispersions and
extreme scores. This graph represents minimum, maximum
and quartile scores in the form of a box with whiskers.
The box includes the range of scores falling into the
middle 50% of the distribution (Inter Quartile Range =
75th percentile - 25th percentile) and the whiskers are
lines extended to the minimum and maximum scores in the
distribution.
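The values a boxplot displays can be computed directly. The sketch below uses the ten test marks from the arithmetic mean example in this deck. Note that textbooks and software use slightly different conventions for quartiles; the "inclusive" method of Python's `statistics.quantiles` is one such convention, chosen here as an assumption.

```python
from statistics import quantiles


def five_number_summary(scores):
    """Minimum, quartiles, maximum and inter-quartile range of the scores."""
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(scores),
        "iqr": q3 - q1,  # width of the box: 75th minus 25th percentile
    }


marks = [61, 81, 87, 78, 54, 56, 67, 65, 68, 69]
summary = five_number_summary(marks)
print(summary)
```

The box spans `q1` to `q3`, the line inside it marks the median, and the whiskers extend to `min` and `max`.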

Scatterplot
Scatterplots are also known as scattergrams and scatter
charts.
•X-axis representing values of a continuous variable. By
custom, this is the independent/ Exposure variable
•Y-axis representing values of a continuous variable.
Traditionally, this is the dependent/ Outcome variable
•Symbols plotted at the (X, Y) coordinates of your data.

Measures of Central
Tendency
A measure of central tendency is a statistical tool that
condenses and simplifies data to make it easier to
understand. It is a single value that attempts to
describe a set of data by identifying the central
position within that set. The common measures are:
1. Arithmetic Mean
2. Median
3. Mode
4. Geometric Mean
5. Harmonic Mean

Arithmetic Mean
The arithmetic mean, commonly called the average, is the
most common measure of central tendency. It is obtained
by adding all the observations and then dividing the sum
by the number of observations.
Arithmetic Mean of Ungrouped Data:
Example 1: Find the arithmetic mean of marks
obtained by 10 students in a test. The marks are as
follows: 61, 81, 87, 78, 54, 56, 67, 65, 68, 69.
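Example 1 can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
import statistics

# Marks obtained by 10 students (Example 1)
marks = [61, 81, 87, 78, 54, 56, 67, 65, 68, 69]

total = sum(marks)           # sum of all observations
n = len(marks)               # number of observations
mean_by_hand = total / n     # definition of the arithmetic mean

print("Sum:", total, "n:", n)
print("Arithmetic mean:", mean_by_hand)
print("statistics.mean gives:", statistics.mean(marks))
```

The marks add up to 686, so the mean is 686 / 10 = 68.6.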

Arithmetic Mean
Arithmetic Mean of Grouped Data:
Suppose we have observations X1, X2, …, Xn with
corresponding frequencies f1, f2, …, fn. The arithmetic
mean will be
X̄ = (f1X1 + f2X2 + … + fnXn) / (f1 + f2 + … + fn) = ∑fX / ∑f
Example 3:- Calculate the average number of children
per family from the following data.
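A minimal sketch of the grouped-data calculation follows. The frequency table in it is hypothetical and is not the data of Example 3; it only illustrates weighting each value by its frequency.

```python
# HYPOTHETICAL frequency table (not the data of Example 3):
# number of children per family -> number of families
children = [0, 1, 2, 3, 4]
families = [5, 10, 20, 10, 5]

# Sum of f * X divided by the sum of f
sum_fx = sum(f * x for f, x in zip(families, children))
sum_f = sum(families)
grouped_mean = sum_fx / sum_f

print("Sum of f*X:", sum_fx)
print("Sum of f:", sum_f)
print("Average children per family:", grouped_mean)
```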

Median
The median is the value of the variable that divides the
group into two parts, one part comprising all the values
greater than the median and the other all the values less
than it. For ungrouped data, first arrange the
observations in ascending or descending order. When the
number of observations is odd, the median is the middle
value; when it is even, the median is the mean of the two
middle values.

Determine the median for the following
data sets
1) 132, 139, 131, 138, 132, 139, 133, 137,
139
2) 25, 10, 16, 25, 12, 22, 20, 23, 13, 10
3) 56, 23, 48, 78, 94, 35, 88, 69, 44, 53, 27
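The three exercises can be checked with the sketch below. It assumes the usual convention that an even-sized data set takes the mean of its two middle values, which is what Python's statistics.median does.

```python
import statistics

data_sets = [
    [132, 139, 131, 138, 132, 139, 133, 137, 139],      # 9 values (odd)
    [25, 10, 16, 25, 12, 22, 20, 23, 13, 10],           # 10 values (even)
    [56, 23, 48, 78, 94, 35, 88, 69, 44, 53, 27],       # 11 values (odd)
]

# statistics.median sorts the data and picks (or averages) the middle
medians = [statistics.median(d) for d in data_sets]

for number, (data, med) in enumerate(zip(data_sets, medians), start=1):
    print(f"Set {number}: sorted = {sorted(data)}, median = {med}")
```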

Mode
The mode is the value that occurs most frequently. It is
the value around which the observations are clustered in
a given distribution.
Determine the mode for the following data sets
1) 132, 139, 131, 138, 132, 139, 133, 137, 139
2) 3, 3, 3, 5, 5, 5, 3, 6, 4, 8, 5, 4, 2, 4, 3, 5
3) 56, 23, 48, 78, 94, 35, 88, 69, 44, 53, 27
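A short sketch for the three exercises. It uses statistics.multimode so that ties are reported rather than hidden; treating a set in which every value occurs once as having no mode is a common convention and is assumed here.

```python
import statistics

data_sets = [
    [132, 139, 131, 138, 132, 139, 133, 137, 139],
    [3, 3, 3, 5, 5, 5, 3, 6, 4, 8, 5, 4, 2, 4, 3, 5],
    [56, 23, 48, 78, 94, 35, 88, 69, 44, 53, 27],
]

# multimode returns every value that shares the highest frequency
modes = [statistics.multimode(d) for d in data_sets]

for number, (data, found) in enumerate(zip(data_sets, modes), start=1):
    if len(found) == len(set(data)):
        # every distinct value is tied, so no value stands out
        print(f"Set {number}: no mode (all values equally frequent)")
    else:
        print(f"Set {number}: mode(s) = {found}")
```

Set 2 has two modes (3 and 5 each occur five times), and set 3 has no repeated value.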

Mode
In a frequency distribution, the mode is the value of the
variable that has the highest frequency. In a continuous
frequency distribution, the value of the mode is
computed using the interpolation formula:
Mode = l + [(f1 − f0) / (2f1 − f0 − f2)] × h
Where,
l = lower limit of the modal class,
f1 = frequency of the modal class,
f0 = frequency of the class preceding the modal class,
f2 = frequency of the class succeeding the modal class,
h = width of the modal class.

The modal class is the class corresponding to
the maximum frequency.
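The interpolation formula for the mode, Mode = l + [(f1 − f0) / (2f1 − f0 − f2)] × h, can be sketched as below. The class intervals and frequencies are hypothetical, and the function name grouped_mode is illustrative. When the modal class is the first or last class, the missing neighbouring frequency is taken as 0, which is an assumption.

```python
def grouped_mode(lower_limits, frequencies, width):
    """Mode of a continuous frequency distribution by interpolation."""
    # The modal class is the class with the maximum frequency
    i = frequencies.index(max(frequencies))
    l = lower_limits[i]
    f1 = frequencies[i]
    f0 = frequencies[i - 1] if i > 0 else 0                     # preceding class
    f2 = frequencies[i + 1] if i < len(frequencies) - 1 else 0  # succeeding class
    return l + (f1 - f0) / (2 * f1 - f0 - f2) * width

# HYPOTHETICAL classes 0-10, 10-20, 20-30, 30-40, 40-50
lower_limits = [0, 10, 20, 30, 40]
frequencies = [5, 8, 12, 7, 3]

mode_value = grouped_mode(lower_limits, frequencies, width=10)
print("Modal class starts at 20; mode =", round(mode_value, 2))
```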

When to use Mean, Median & Mode?

Dispersion in Statistics
Dispersion in statistics describes how spread out a
set of data is.
Measures of Dispersion
•Absolute Measures of Dispersion (one data set)
•Relative Measures of Dispersion (two or more
datasets)
Absolute measures of dispersion are expressed in the
same unit as the quantity being measured (or its square,
in the case of variance). Common measures of dispersion are:
1.Range
2.Variance
3.Standard Deviation
4.IQR

Range:
The range is the difference between the largest and
smallest values in the data. It is the simplest measure
of dispersion.
Range = Highest value – Lowest value
Example: 1, 2, 3, 4, 5, 6, 7
Range = 7 – 1 = 6

Variance (σ²)
In simple terms, the variance is calculated by summing
the squared distance of each term in the distribution
from the mean, and then dividing this sum by the total
number of terms in the distribution.
σ² = ∑(X − μ)² / N
X = each observation
N = number of observations
μ = mean

Standard Deviation
The standard deviation is the square root of the
variance, so to find the standard deviation of any data
you need to find the variance first. Standard deviation
is often considered the best measure of dispersion.
Formula:
Standard Deviation (σ) = √(σ²) = √( ∑(X − μ)² / N )
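A small worked sketch of the variance and the standard deviation, using the data 1, 2, 3, 4, 5, 6, 7 from the Range example. It divides by N (the population form given above); statistics.pvariance and statistics.pstdev use the same convention.

```python
import math
import statistics

data = [1, 2, 3, 4, 5, 6, 7]

n = len(data)
mu = sum(data) / n                                # mean
squared_dev = [(x - mu) ** 2 for x in data]       # (X - mu)^2 for each term
variance = sum(squared_dev) / n                   # divide by N
std_dev = math.sqrt(variance)                     # square root of the variance

print("Mean:", mu)
print("Variance:", variance)
print("Standard deviation:", std_dev)
print("Library check:", statistics.pvariance(data), statistics.pstdev(data))
```

Here the mean is 4, the squared deviations sum to 28, the variance is 28 / 7 = 4 and the standard deviation is 2.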

Quartile Deviation
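Quartile deviation is based on the difference between
the upper quartile (Q3) and the lower quartile (Q1). That
difference is the interquartile range (IQR); the quartile
deviation, also called the semi-interquartile range, is
half of it.
Formula:
Interquartile Range = Q3 – Q1
Quartile Deviation = (Q3 – Q1) / 2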

Relative Measures of
Dispersion
Relative measures of dispersion are values without
units. A relative measure of dispersion is used to
compare the spread of two or more datasets.

Co-efficient of Range:
It is calculated as the ratio of the difference between
the largest and smallest terms of the distribution to
the sum of the largest and smallest terms of the
distribution.
Formula:
(L – S) / (L + S)
where L = largest value
S = smallest value

Co-efficient of Variation:
The coefficient of variation is used to compare two
datasets with respect to homogeneity or consistency. The
dataset with the smaller coefficient is the more
consistent one.
Formula:
C.V. = (σ / X̄) × 100
σ = standard deviation
X̄ = mean
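A sketch comparing the consistency of two datasets with the coefficient of variation, computed as (standard deviation / mean) × 100. Both datasets are hypothetical, and the population standard deviation (divide by N) is assumed.

```python
import statistics

def coefficient_of_variation(data):
    """C.V. in percent: (standard deviation / mean) * 100."""
    return statistics.pstdev(data) / statistics.mean(data) * 100

# HYPOTHETICAL datasets with the same mean (50) but different spread
dataset_a = [48, 50, 52]
dataset_b = [30, 50, 70]

cv_a = coefficient_of_variation(dataset_a)
cv_b = coefficient_of_variation(dataset_b)

print(f"C.V. of A: {cv_a:.2f}%")
print(f"C.V. of B: {cv_b:.2f}%")
print("More consistent dataset:", "A" if cv_a < cv_b else "B")
```

Because the coefficient has no unit, the same comparison also works for datasets measured in different units.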

Co-efficient of Standard
Deviation:
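The co-efficient of Standard Deviation is the ratio of
the standard deviation to the mean of the distribution
of terms.
Formula:
Co-efficient of Standard Deviation = σ / X̄
σ = √( ∑(X − X̄)² / (N − 1) ) for a sample (divide by N
for a population)
Deviation = (X − X̄)
σ = standard deviation
X̄ = mean
N = total number of observations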

Co-efficient of Quartile
Deviation:
The co-efficient of Quartile Deviation is the ratio of
the difference between the upper quartile and the lower
quartile to the sum of the upper quartile and lower
quartile.
Formula:
( Q3 – Q1) / ( Q3 + Q1)
Q3 = Upper Quartile
Q1 = Lower Quartile
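The quartile-based measures can be sketched as follows, using the data 1, 2, 3, 4, 5, 6, 7 from the Range example. The quartiles come from statistics.quantiles with its default "exclusive" method, which is an assumption; textbooks use several quartile conventions and the results can differ slightly.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]

# Lower quartile, median, upper quartile (default "exclusive" method)
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1                          # interquartile range
quartile_deviation = iqr / 2           # semi-interquartile range
coeff_qd = (q3 - q1) / (q3 + q1)       # co-efficient of quartile deviation

# Co-efficient of range for the same data: (L - S) / (L + S)
coeff_range = (max(data) - min(data)) / (max(data) + min(data))

print("Q1 =", q1, "Q3 =", q3)
print("IQR =", iqr, "Quartile deviation =", quartile_deviation)
print("Co-efficient of quartile deviation =", coeff_qd)
print("Co-efficient of range =", coeff_range)
```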

Measuring Asymmetry with Skewness

Skewness
Skewness is a measure of the asymmetry of a
distribution. It measures how far the distribution of a
random variable deviates from a symmetric distribution,
such as the normal distribution.
Types of Skewness
1. Positive Skewness (right skewed)
2. Negative Skewness (left skewed)

Skewness can be measured using several methods; however,
Pearson mode skewness and Pearson median skewness are
two frequently used methods.
The formula for Pearson mode skewness:
Sk = (X̄ − Mo) / s
The formula for Pearson median skewness:
Sk = 3(X̄ − Md) / s
Where:
X̄ = Mean value
Mo = Mode value
Md = Median value
s = Standard deviation of the sample data
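A sketch of Pearson median skewness, computed as 3 × (mean − median) / s. It reuses the ten marks from Example 1 of the Arithmetic Mean slide and takes s as the sample standard deviation (divide by N − 1). The mode-based version is not shown because these marks contain no repeated value.

```python
import statistics

# Marks of 10 students (the data set from the Arithmetic Mean example)
marks = [61, 81, 87, 78, 54, 56, 67, 65, 68, 69]

mean = statistics.mean(marks)
median = statistics.median(marks)
s = statistics.stdev(marks)            # sample standard deviation

pearson_median_skew = 3 * (mean - median) / s

print("Mean:", mean, "Median:", median)
print("Sample standard deviation:", round(s, 3))
print("Pearson median skewness:", round(pearson_median_skew, 3))
if pearson_median_skew > 0:
    print("Positive skewness (right skewed)")
elif pearson_median_skew < 0:
    print("Negative skewness (left skewed)")
else:
    print("Symmetric by this measure")
```

The mean (68.6) lies above the median (67.5), so the coefficient is positive and the marks are slightly right skewed.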

Covariance
Covariance is the measure of the joint variability
of two random variables (X, Y).

Correlation
•Covariance only shows the direction of the linear
relationship between two variables (i.e., positive,
negative, or no covariance). It cannot measure the
strength of the relationship between the two variables.
•To measure both the strength and direction of the linear
relationship between two variables, we use a statistical
measure called correlation.

•Positive correlation: the value of the correlation
coefficient (r) will be positive.
•Negative correlation: the value of the correlation
coefficient (r) will be negative.
•No correlation: the value of the correlation
coefficient (r) will be zero.
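The difference between covariance and correlation can be seen in a short sketch. The formulas used are the standard population forms, stated here as an assumption: cov(X, Y) = ∑(X − X̄)(Y − Ȳ) / N and r = cov(X, Y) / (σx σy). The paired data is hypothetical.

```python
import math

# HYPOTHETICAL paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Covariance: average product of the paired deviations (sign gives direction)
covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

# Population standard deviations of each variable
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)

# Correlation rescales covariance to lie between -1 and +1
r = covariance / (sd_x * sd_y)

print("Covariance:", covariance)
print("Correlation coefficient r:", round(r, 4))
```

Multiplying every y value by 10 would multiply the covariance by 10 but leave r unchanged, which is why r can express strength as well as direction.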

Simple Regression Analysis; Multiple
Correlation and Regression; t-test; Chi
square test; ANOVA

T Test
A t test is a statistical test that is used to compare
the means of two groups. It is often used in hypothesis
testing.
•The null hypothesis (H0) is that the true difference
between these group means is zero.
•The alternate hypothesis (Ha) is that the true
difference is different from zero.

The t test assumes your
data:
1.are independent
2.are (approximately) normally distributed
3.have a similar amount of variance within each group
being compared (a.k.a. homogeneity of variance)
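Under the assumptions listed above, the test statistic for two independent groups can be sketched as below. This is the pooled-variance (Student's) form; the function name and both groups are hypothetical. The sketch stops at the t value and the degrees of freedom. The p-value would then be read from a t table or obtained from a statistics package such as SciPy (scipy.stats.ttest_ind).

```python
import math
import statistics

def pooled_t_statistic(group_1, group_2):
    """Two-sample t statistic assuming equal variances in both groups."""
    n1, n2 = len(group_1), len(group_2)
    mean_1, mean_2 = statistics.mean(group_1), statistics.mean(group_2)
    var_1, var_2 = statistics.variance(group_1), statistics.variance(group_2)

    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    t = (mean_1 - mean_2) / std_error
    degrees_of_freedom = n1 + n2 - 2
    return t, degrees_of_freedom

# HYPOTHETICAL measurements for two groups
group_a = [2, 4, 6]
group_b = [1, 2, 3]

t_value, df = pooled_t_statistic(group_a, group_b)
print("t =", round(t_value, 3), "with", df, "degrees of freedom")
```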

Chi-square test
A Pearson’s chi-square test is a statistical test for
categorical data.
There are two types of Pearson’s chi-square tests:
•The chi-square goodness of fit test is used to
test whether the frequency distribution of a
categorical variable is different from your
expectations.
•The chi-square test of independence is used to
test whether two categorical variables are related
to each other.
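Both tests use the same statistic, the sum of (observed − expected)² / expected over all categories or cells; they differ only in where the expected counts come from. The sketch below uses hypothetical counts and illustrative function names, and stops at the statistic and its degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories or cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Goodness of fit: HYPOTHETICAL counts against equal expected frequencies
observed = [18, 22, 20]
expected = [20, 20, 20]
gof_stat = chi_square_statistic(observed, expected)
gof_df = len(observed) - 1

# Test of independence: HYPOTHETICAL 2 x 2 contingency table
table = [[10, 20], [20, 10]]
exp_table = expected_counts(table)
flat_observed = [v for row in table for v in row]
flat_expected = [v for row in exp_table for v in row]
ind_stat = chi_square_statistic(flat_observed, flat_expected)
ind_df = (len(table) - 1) * (len(table[0]) - 1)

print("Goodness of fit: chi-square =", round(gof_stat, 3), "df =", gof_df)
print("Independence: chi-square =", round(ind_stat, 3), "df =", ind_df)
```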

Regression models
•Regression models describe the relationship between
variables by fitting a line to the observed data.
•Linear regression models use a straight line,
while logistic and nonlinear regression models
use a curved line.
•Regression allows you to estimate how
a dependent variable changes as the
independent variable(s) change.

Simple linear regression
It is used to estimate the relationship
between two quantitative variables. You can use it
when you want to know:
1.How strong the relationship is between
two variables (e.g., the relationship
between rainfall and soil erosion).
2.The value of the dependent variable at
a certain value of the independent
variable (e.g., the amount of soil erosion at
a certain level of rainfall).

Assumptions of simple linear
regression
1.Homogeneity of variance: the size of the error in
our prediction doesn’t change significantly across
the values of the independent variable.
2.Independence of observations: the observations
in the dataset were collected using statistically
valid sampling methods, and there are no hidden
relationships among observations.
3.Normality: the residuals (errors) follow a normal distribution.
4.The relationship between the independent and
dependent variable is linear: the line of best fit
through the data points is a straight line (rather
than a curve or some sort of grouping factor).

Simple linear regression
formula
y = B0 + B1x + e
•y is the predicted value of the dependent variable (y)
for any given value of the independent variable (x).
•B0 is the intercept, the predicted value of y when
x is 0.
•B1 is the regression coefficient – how much we
expect y to change as x increases by one unit.
•x is the independent variable (the variable we expect
is influencing y).
•e is the error of the estimate, or how much variation
there is in our estimate of the regression coefficient.
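The intercept B0 and the regression coefficient B1 are usually estimated by ordinary least squares. The sketch below assumes that method, with B1 = ∑(x − x̄)(y − ȳ) / ∑(x − x̄)² and B0 = ȳ − B1x̄. The data and the function name are hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares estimates of the intercept (B0) and slope (B1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    b1 = s_xy / s_xx                 # change in y per unit change in x
    b0 = mean_y - b1 * mean_x        # predicted y when x is 0
    return b0, b1

# HYPOTHETICAL data: x = independent variable, y = dependent variable
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_simple_linear_regression(x, y)
prediction_at_6 = b0 + b1 * 6

print("Intercept B0:", round(b0, 3))
print("Slope B1:", round(b1, 3))
print("Predicted y at x = 6:", round(prediction_at_6, 3))
```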

Multiple Linear Regression (MLR)

The multiple regression model
is based on the following
assumptions:
•There is a linear relationship between the
dependent variable and the independent
variables
•The independent variables are not too
highly correlated with each other
•yi observations are selected independently
and randomly from the population
•Residuals should be normally distributed

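Formula and Calculation of
Multiple Linear Regression
y = B0 + B1x1 + B2x2 + … + Bpxp + e
•y is the dependent variable.
•x1, …, xp are the independent variables.
•B0 is the intercept and B1, …, Bp are the regression
coefficients.
•e is the error term.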
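A minimal sketch of how the coefficients of a multiple linear regression can be calculated. It assumes ordinary least squares and solves the normal equations (XᵀX)b = Xᵀy by Gauss-Jordan elimination with the standard library only. The function name and the data are hypothetical; the data was generated from y = 1 + 2x1 + 3x2 with no error term, so the fit should recover those values. In practice a statistics package would be used, and the elimination fails if the independent variables are perfectly correlated.

```python
def fit_multiple_regression(rows, y):
    """Least-squares coefficients [B0, B1, ..., Bp] via the normal equations."""
    # Design matrix with a leading 1 in each row for the intercept
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])

    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]

    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]

    return [A[i][k] for i in range(k)]

# HYPOTHETICAL data built from y = 1 + 2*x1 + 3*x2
rows = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 5)]
y = [6, 8, 13, 18, 26]

coefficients = fit_multiple_regression(rows, y)
print("B0, B1, B2 =", [round(c, 6) for c in coefficients])
```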
