Unit-4
Processing and Analysis of Data
Miss Tanu Shree
Data processing occurs when data is collected and translated into
usable information.
Usually performed by a data scientist or a team of data scientists, data
processing must be done correctly so as not to negatively affect the end
product, or data output.
Data processing is concerned with editing, coding, classifying,
tabulating, and charting or diagramming research data.
Data Processing in research consists of five important steps:
1. Editing of Data
2. Coding of Data
3. Classification of Data
4. Tabulation of Data
5. Data diagram
Editing of Data
*Data editing is the application of checks to detect missing, invalid
or inconsistent entries or to point to data records that are
potentially in error. No matter what type of data you are working
with, certain edits are performed at different stages or phases of
data collection and processing.
Purpose of Data Editing
1. Clarify Responses
2. Make omissions
3. Avoid biased editing
4. Make judgements
5. Logical adjustments
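The checks described above (missing, invalid or inconsistent entries) can be sketched in a few lines of Python. This is only an illustration: the records, the field names and the valid ranges are assumptions made for the example, not part of the original notes.

```python
# Hypothetical survey records; field names and valid ranges are assumptions.
records = [
    {"id": 1, "age": 34, "gender": "F", "years_employed": 10},
    {"id": 2, "age": None, "gender": "M", "years_employed": 3},   # missing entry
    {"id": 3, "age": 210, "gender": "F", "years_employed": 5},    # invalid entry
    {"id": 4, "age": 22, "gender": "M", "years_employed": 30},    # inconsistent entry
]


def edit_checks(record):
    """Return a list of problems found in one record (empty if it looks clean)."""
    problems = []
    age = record.get("age")
    if age is None:
        problems.append("missing age")
    elif not 0 <= age <= 120:
        problems.append("invalid age")
    elif record["years_employed"] > age:
        problems.append("years_employed exceeds age")
    if record.get("gender") not in {"F", "M", "Other"}:
        problems.append("invalid gender")
    return problems


# Map each problematic record id to the problems found in it.
flagged = {}
for record in records:
    problems = edit_checks(record)
    if problems:
        flagged[record["id"]] = problems

print(flagged)
```

Flagged records are only pointed out here; deciding how to correct them is still a judgement for the editor.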
Coding of Data
*Data coding in research methodology is a preliminary step to
analyzing data. Data obtained from surveys, experiments or
secondary sources is in raw form.
*This data needs to be refined and organized to evaluate and draw
conclusions.
*Data coding is not an easy job and the person or persons involved
in data coding must have knowledge and experience of it.
*Data coding is the process of converting data into a form that can
be analyzed. It involves assigning numerical or categorical codes
to data items, such as responses to survey questions or
demographic information. Coded data can then be analyzed using
statistical software or other tools.
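As a rough sketch of assigning numerical codes to survey responses, the example below uses a small codebook. The codebook, the missing-value code 9 and the sample answers are all assumptions chosen for illustration.

```python
# Hypothetical codebook for a five-point agreement question.
CODEBOOK = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
MISSING = 9  # assumed code for blank or unrecognised answers


def code_response(answer):
    """Convert a free-text answer into its numeric code."""
    key = (answer or "").strip().lower()
    return CODEBOOK.get(key, MISSING)


raw = ["Agree", " strongly agree ", "", "Neutral", "no idea"]
coded = [code_response(answer) for answer in raw]
print(coded)
```

Keeping the codebook in one place makes it easier to document the coding scheme and to apply it consistently across coders.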
Classification of Data
*Classification is the way of arranging the data in different classes in
order to give a definite form and a coherent structure to the data
collected, facilitating their use in the most systematic and
effective manner.
Objectives of classification of data
*To group heterogeneous data into homogeneous groups with
common characteristics
*To bring out the similarities among various groups
*To facilitate effective comparison
*To present complex, haphazard and scattered data in a concise,
logical, homogeneous, and intelligible form
*To maintain clarity and simplicity of complex data
*To identify independent and dependent variables and establish
their relationship
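One common form of classification is grouping numeric values into class intervals. The sketch below assumes a hypothetical list of ages and a class width of 10 starting at 20; both are illustrative choices.

```python
from collections import Counter

# Hypothetical raw data: ages of ten respondents.
ages = [23, 35, 41, 29, 52, 38, 47, 31, 26, 58]


def class_interval(value, start=20, width=10):
    """Return the label of the class interval that contains value."""
    lower = start + ((value - start) // width) * width
    return f"{lower}-{lower + width - 1}"


# Count how many observations fall into each class.
frequency = Counter(class_interval(age) for age in ages)
for interval in sorted(frequency):
    print(interval, frequency[interval])
```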
Tabulation of Data
*Tabulation is a method of presenting numeric data in rows and columns in
a logical and systematic manner to aid comparison and statistical analysis.
It allows for easier comparison by putting relevant data closer together,
and it aids in statistical analysis and interpretation.
*Tabulation, in other terms, is the process of arranging organized data into
a tabular format.
*Depending on the nature of the classification, a table might be simple,
double, or complex.
*The goal of a tabulation chart/data is to present a significant amount of
complicated data in a systematic way that allows readers to derive logical
conclusions and interpretations from it.
Objectives of Tabulation
*For the Purpose of Data Simplification
*To Draw Attention to Important Information
*To Make Comparisons Easier
*To Assist with Statistical Analysis of Data
*To Conserve Space
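A two-way table can be built from classified observations as sketched below. The departments, satisfaction levels and counts are hypothetical.

```python
from collections import Counter

# Hypothetical classified data: (department, satisfaction level) pairs.
observations = [
    ("Sales", "High"), ("Sales", "Low"), ("Sales", "High"),
    ("HR", "Low"), ("HR", "High"), ("HR", "Low"), ("HR", "Low"),
]

counts = Counter(observations)
rows = sorted({dept for dept, _ in observations})
cols = sorted({level for _, level in observations})

# One list of cell counts per row, in column order.
table = {r: [counts[(r, c)] for c in cols] for r in rows}

# Print the table with a header row and row totals.
header = "{:<8}".format("Dept") + "".join("{:>6}".format(c) for c in cols)
print(header + "{:>7}".format("Total"))
for r in rows:
    cells = "".join("{:>6}".format(n) for n in table[r])
    print("{:<8}".format(r) + cells + "{:>7}".format(sum(table[r])))
```

Placing related counts side by side in rows and columns is what makes the comparison between groups easy to read.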
Data Diagrams
*Diagrams have been used to collect data from research subjects
by asking them to either draw a diagram themselves or modify a
prototypic diagram supplied by the researcher.
*The use of diagrams in data collection has been viewed favorably
in helping to gather rich data on healthcare topics.
*Diagrams and charts are important because they present
information visually.
*The adage “a picture is worth a thousand words” applies when it
comes to diagrams and charts. This handout provides a few hints on
understanding information visually.
Creative presentation of data is possible. Data diagrams are
classified into:
*Charts: A chart is a diagrammatic form of data presentation. Bars,
rectangles, squares and circles can be used to present data.
Bar charts are one-dimensional, while rectangles, squares and
circles are two-dimensional.
*Graphs: A graph is a method of presenting numerical data in visual
form. A graph shows the relationship between two variables by
means of either a curve or a straight line. Graphs may be divided
into two categories:
(1) Graphs of Time Series and
(2) Graphs of Frequency Distribution.
In graphs of time series, one of the factors is time and the other or others
are the study factors. Graphs of frequency distribution show the distribution
of, for example, executives by income, age and so on.
*Collection of Data: The very first challenge in data processing
is collecting or acquiring the correct input data.
The result depends directly on the input data, so it is vital to
collect the correct data to get the desired result.
*Duplication of data: As the data is collected from different data
sources, it often contains duplicates. The same entries and
entities may be present a number of times during the data
encoding stage. This duplicate data is redundant and may
produce an incorrect result.
Hence, we need to check the data for duplication and
proactively remove the duplicate data.
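A minimal sketch of checking for and removing duplicates is shown below. The records and the choice of name plus email as the matching key are assumptions; real data often needs a more careful matching rule.

```python
# Hypothetical records merged from two sources.
records = [
    {"name": "Asha Rao", "email": "asha@example.com"},
    {"name": "asha rao ", "email": "ASHA@example.com"},  # same person, different formatting
    {"name": "Ben Roy", "email": "ben@example.com"},
]


def normalise(record):
    """Build a comparison key that ignores case and surrounding spaces."""
    return (record["name"].strip().lower(), record["email"].strip().lower())


seen = set()
unique = []
for record in records:
    key = normalise(record)
    if key not in seen:  # keep only the first occurrence of each key
        seen.add(key)
        unique.append(record)

print(len(records) - len(unique), "duplicate(s) removed")
```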
*Inconsistency of Data: When we collect a huge amount of data,
there is no guarantee that the data would be complete or all the
fields that we need are filled correctly. Moreover, the data may
be ambiguous.
As the input/raw data is heterogeneous in nature and is
collected from autonomous data sources, records may conflict
with one another.
*Variety of data: The input data, as it is collected from different
sources, can take different forms. It is not limited to the rows and
columns of a relational database.
The data varies from application to application and source to
source. Much of this data is unstructured and cannot fit into a
spreadsheet or a relational database.
*Data Integration: Data integration means to combine the data
from various sources and present it in a unified view.
With the increased variety of data and different formats of
data, the challenge to integrate the data becomes bigger.
Data Analysis
Data analysis is an aspect of data science that is all about analyzing
data for different kinds of purposes. It involves inspecting, cleaning,
transforming and modeling data to draw useful insights from it.
WHAT ARE THE DIFFERENT TYPES OF DATA ANALYSIS?
*Descriptive analysis
*Exploratory analysis
*Inferential analysis
*Predictive analysis
*Causal analysis
*Mechanistic analysis
1. DESCRIPTIVE ANALYSIS
The goal of descriptive analysis is to describe or summarize a set of data. Here’s
what you need to know:
Descriptive analysis is the very first analysis performed.
It generates simple summaries about samples and measurements.
It involves common, descriptive statistics like measures of central tendency,
variability, frequency, and position.
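The descriptive statistics listed above can be computed with Python's standard `statistics` module. The test scores below are hypothetical.

```python
import statistics

# Hypothetical test scores for eight students.
scores = [52, 61, 61, 67, 70, 74, 78, 85]

summary = {
    "mean": statistics.mean(scores),      # central tendency
    "median": statistics.median(scores),  # central tendency / position
    "mode": statistics.mode(scores),      # most frequent value
    "range": max(scores) - min(scores),   # variability
    "stdev": statistics.stdev(scores),    # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```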
2. EXPLORATORY ANALYSIS (EDA)
Exploratory analysis involves examining or exploring data and finding
relationships between variables that were previously unknown. Here’s what you
need to know:
EDA helps you discover relationships between measures in your data, but these
relationships are not evidence of causation, as denoted by the phrase,
“Correlation doesn’t imply causation.”
It’s useful for discovering new connections and forming hypotheses. It drives
design planning and data collection.
3. INFERENTIAL ANALYSIS
Inferential analysis involves using a small sample of data to infer information about a
larger population of data.
The goal of statistical modeling itself is all about using a small amount of information
to extrapolate and generalize information to a larger group. Here’s what you need to
know:
Inferential analysis uses a sample that is representative of a population to
produce an estimate together with a measure of its uncertainty, such as a
standard error.
The accuracy of inference depends heavily on your sampling scheme. If the sample
isn’t representative of the population, the generalization will be inaccurate. This
problem is known as sampling bias. (The central limit theorem is a separate result:
it states that the distribution of sample means approaches a normal distribution
as the sample size grows.)
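To illustrate attaching a measure of uncertainty to an estimate, the sketch below computes a sample mean, its standard error and an approximate 95% confidence interval. The sample values are hypothetical, and the normal approximation is a simplifying assumption; with a sample this small, a t-based interval would be somewhat wider.

```python
import math
import statistics

# Hypothetical sample of ten measurements drawn from a larger population.
sample = [68, 72, 75, 70, 69, 74, 71, 73, 70, 68]

n = len(sample)
mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

# Normal-approximation critical value for a 95% interval (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
interval = (mean - z * standard_error, mean + z * standard_error)

print(f"mean = {mean}, standard error = {standard_error:.3f}")
print(f"approximate 95% CI: ({interval[0]:.2f}, {interval[1]:.2f})")
```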
4. PREDICTIVE ANALYSIS
Predictive analysis involves using historical or current data to find patterns and make
predictions about the future. Here’s what you need to know:
The accuracy of the predictions depends on the input variables.
Accuracy also depends on the types of models. A linear model might work well in some
cases, and in other cases it might not.
Using a variable to predict another one doesn’t denote a causal relationship.
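A linear model is one of the simplest predictive models. The sketch below fits a straight line by ordinary least squares and uses it to predict a new value. The data (advertising spend versus sales) are hypothetical, and a good fit here would not by itself show that spend causes sales.

```python
# Hypothetical historical data: advertising spend (x) and sales (y).
spend = [1, 2, 3, 4, 5]
sales = [2.1, 3.9, 6.2, 7.8, 10.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(spend, sales)


def predict(x):
    """Predict y for a new x using the fitted line."""
    return intercept + slope * x


print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"predicted sales at spend 6: {predict(6):.2f}")
```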
5. CAUSAL ANALYSIS
Causal analysis looks at the cause and effect of relationships between variables and is
focused on finding the cause of a correlation. Here’s what you need to know:
To find the cause, you have to question whether the observed correlations driving
your conclusion are valid. Just looking at the surface data won’t help you discover
the hidden mechanisms underlying the correlations.
Causal analysis is applied in randomized studies focused on identifying causation.
6. MECHANISTIC ANALYSIS
Mechanistic analysis is used to understand exact changes in variables that lead to
other changes in other variables. Here’s what you need to know:
It’s applied in physical or engineering sciences, in situations that require high
precision and leave little room for error, where the only noise in the data is
measurement error.
It’s designed to understand a biological or behavioral process, the
pathophysiology of a disease or the mechanism of action of an intervention.
Descriptive analysis summarizes the data at hand and presents
your data in a comprehensible way.
Exploratory data analysis helps you discover correlations and
relationships between variables in your data.
Inferential analysis is for generalizing to a larger population
from a smaller sample of data.
Predictive analysis helps you make predictions about the future
with data.
Causal analysis focuses on finding the cause of a correlation
between variables.
Mechanistic analysis is for measuring the exact changes in
variables that lead to other changes in other variables.
Hypothesis
*A hypothesis is an assumption that is made based on some evidence.
*This is the initial point of any investigation that translates the research
questions into predictions.
*It includes components like variables, population and the relation
between the variables.
*A research hypothesis is a hypothesis that is used to test the relationship
between two or more variables.
Sources of Hypothesis
Following are the sources of hypothesis:
*The resemblance between phenomena.
*Observations from past studies, present-day experiences and from the
competitors.
*Scientific theories.
*General patterns that influence the thinking process of people.
Characteristics of Hypothesis
Following are the characteristics of the hypothesis:
1. The hypothesis should be clear and precise to consider it to be reliable.
2. If the hypothesis is a relational hypothesis, then it should be stating the relationship between
variables.
3. The hypothesis must be specific and should have scope for conducting more tests.
4. The way of explanation of the hypothesis must be very simple and it should also be understood
that the simplicity of the hypothesis is not related to its significance.
Examples of Hypothesis
Following are the examples of hypotheses based on their types:
1. “Consumption of sugary drinks every day leads to obesity” is an example of a simple hypothesis.
2. “All lilies have the same number of petals” is an example of a null hypothesis.
3. “If a person gets 7 hours of sleep, then he will feel less fatigue than if he sleeps less” is an
example of a directional hypothesis.
Types of Hypothesis
There are six forms of hypothesis and they are:
1. Simple hypothesis
2. Complex hypothesis
3. Directional hypothesis
4. Non-directional hypothesis
5. Null hypothesis
6. Associative and causal hypothesis
1. Simple Hypothesis
It shows a relationship between one dependent variable and a single independent variable. For
example – If you eat more vegetables, you will lose weight faster. Here, eating more vegetables
is an independent variable, while losing weight is the dependent variable.
2. Complex Hypothesis
It shows the relationship between two or more dependent variables and two or more
independent variables. For example: eating more vegetables and fruits leads to weight loss,
glowing skin, and a reduced risk of many diseases such as heart disease.
3. Directional Hypothesis
It predicts the direction of the relationship between the variables, reflecting the outcome
the researcher expects. For example, children who eat proper food over a five-year period
from the age of four have higher IQ levels than children who do not have proper meals.
This states both the effect and the direction of the effect.
4. Non-directional Hypothesis
It is used when there is no theory involved. It is a statement that a relationship exists
between two variables, without predicting the exact nature (direction) of the relationship.
5. Null Hypothesis
It states that there is no relationship between the independent and dependent variables,
and so runs contrary to the research hypothesis. It is a negative statement and is denoted
by “H0”.
6. Associative and Causal Hypothesis
An associative hypothesis states that a change in one variable is accompanied by a change in
the other variable, without claiming that one causes the other. A causal hypothesis, by
contrast, proposes a cause-and-effect interaction between two or more variables.
Hypothesis Testing
Hypothesis testing is a systematic procedure for deciding whether the
results of a research study support a particular theory which applies to a
population. Hypothesis testing uses sample data to evaluate a hypothesis
about a population.
Hypothesis testing in statistics refers to analyzing an assumption about a
population parameter. Using sample data, it assesses how plausible that
assumption is for the entire population from which the sample was drawn.
For example, you might implement protocols for performing intubation on
pediatric patients in the pre-hospital setting.
To evaluate whether these protocols were successful in improving
intubation rates, you could measure the intubation rate over time in one
group randomly assigned to training in the new protocols, and compare this
to the intubation rate over time in another control group that did not
receive training in the new protocols.
Five Steps in Hypothesis Testing:
1. Specify the Null Hypothesis
2. Specify the Alternative Hypothesis
3. Set the Significance Level (α)
4. Calculate the Test Statistic and Corresponding P-Value
5. Draw a Conclusion
Step 1: Specify the Null Hypothesis
The null hypothesis (H0) is a statement of no effect, relationship, or difference
between two or more groups or factors. In research studies, a researcher is usually
interested in disproving the null hypothesis.
Examples:
There is no difference in intubation rates across ages 0 to 5 years.
The intervention and control groups have the same survival rate (or, the intervention
does not improve survival rate).
Step 2: Specify the Alternative Hypothesis
The alternative hypothesis (H1) is the statement that there is an effect or
difference. This is usually the hypothesis the researcher is interested in proving. The
alternative hypothesis can be one-sided (only provides one direction, e.g., lower) or
two-sided.
We often use two-sided tests even when our true hypothesis is one-sided because it
requires more evidence against the null hypothesis to accept the alternative
hypothesis.
Examples: The intubation success rate differs with the age of the patient being
treated (two-sided).
The time to resuscitation from cardiac arrest is lower for the intervention group than
for the control (one-sided).
Step 3: Set the Significance Level (α)
The significance level (denoted by the Greek letter alpha, α) is generally set at
0.05. This means that there is a 5% chance that you will reject your null hypothesis,
and so accept your alternative hypothesis, when your null hypothesis is actually true.
The smaller the significance level, the greater the burden of proof needed to reject
the null hypothesis, or in other words, to support the alternative hypothesis.
Step 4: Calculate the Test Statistic and Corresponding P-Value
In another section we present some basic test statistics to evaluate a hypothesis.
Hypothesis testing generally uses a test statistic that compares groups or examines
associations between variables.
When describing a single sample without establishing relationships between
variables, a confidence interval is commonly used.
The p-value describes the probability of obtaining a sample statistic as extreme as,
or more extreme than, the one observed by chance alone if your null hypothesis is true.
This p-value is determined based on the result of your test statistic. Your conclusions
about the hypothesis are based on your p-value and your significance level.
Step 5: Drawing a Conclusion
P-value <= significance level (α) => Reject your null hypothesis in favor of your
alternative hypothesis. Your result is statistically significant.
P-value > significance level (α) => Fail to reject your null hypothesis. Your result
is not statistically significant.
Hypothesis testing is not set up so that you can absolutely prove a null
hypothesis. Therefore, when you do not find evidence against the null hypothesis,
you fail to reject the null hypothesis. When you do find strong enough evidence
against the null hypothesis, you reject the null hypothesis.
Your conclusions also translate into a statement about your alternative
hypothesis.
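The five steps above can be sketched for the intubation example as a two-sided, two-proportion z test. The counts below are invented purely for illustration, and the two-proportion z test is one reasonable choice of test statistic, not one prescribed by these notes.

```python
import math
from statistics import NormalDist

# Steps 1 and 2: H0: both groups have the same intubation success rate.
#                H1: the success rates differ (two-sided).
# Step 3: set the significance level.
ALPHA = 0.05

# Hypothetical results: successes / attempts in each group.
success_trained, n_trained = 45, 60
success_control, n_control = 33, 60

# Step 4: test statistic (two-proportion z) and p-value.
p_trained = success_trained / n_trained
p_control = success_control / n_control
pooled = (success_trained + success_control) / (n_trained + n_control)
se = math.sqrt(pooled * (1 - pooled) * (1 / n_trained + 1 / n_control))
z = (p_trained - p_control) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: draw a conclusion.
decision = "reject H0" if p_value <= ALPHA else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.4f}, decision: {decision}")
```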
Type I and Type II Errors
*In statistics, a Type I error is a false positive conclusion, while a Type II
error is a false negative conclusion.
*Making a statistical decision always involves uncertainties, so the risks of
making these errors are unavoidable in hypothesis testing.
*The probability of making a Type I error is the significance level, or alpha
(α), while the probability of making a Type II error is beta (β). These risks
can be minimized through careful planning in your study design.
*Example: Type I vs. Type II error. You decide to get tested for COVID-19 based
on mild symptoms. There are two errors that could potentially occur:
*Type I error (false positive): the test result says you have coronavirus, but
you actually don’t.
*Type II error (false negative): the test result says you don’t have
coronavirus, but you actually do.
Type I Error
A type I error occurs when the null hypothesis (H0) of an experiment is true but
is nevertheless rejected. It asserts something that is not present: a false hit.
A type I error is often called a false positive (a result that shows that a given
condition is present when it is absent). In the terms of a folk tale, a person
may see a bear when there is none (raising a false alarm), where the null
hypothesis (H0) is the statement: “There is no bear”.
The type I error rate, or significance level, is the probability of rejecting the
null hypothesis given that it is true. It is represented by the Greek letter α (alpha)
and is also known as the alpha level.
Usually, the significance level, or the probability of a type I error, is set to 0.05 (5%),
on the assumption that a 5% probability of incorrectly rejecting the null hypothesis
is acceptable.
Type II Error
A type II error occurs when the null hypothesis is false but mistakenly fails to be
rejected. It fails to assert what is present: a miss.
A type II error is also known as a false negative (where a real hit is rejected by the
test and recorded as a miss), in an experiment checking for a condition with a final
outcome of true or false.
A type II error is committed when a true alternative hypothesis is not acknowledged. In
other words, an examiner may fail to discover the bear when in fact a bear is present
(and hence fails to raise the alarm).
Again, H0, the null hypothesis, is the statement “There is no bear”; if a bear is
indeed present and the investigator fails to detect it, that is a type II error.
Here, the bear either exists or does not exist in the given circumstances. The question
is whether it is correctly identified: the investigator may miss it when it is
present, or report it when it is not present.
The type II error rate is represented by the Greek letter β (beta) and is
linked to the power of a test (which equals 1−β).
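The link between α, β and power can be made concrete with a small calculation. The sketch below assumes a one-sided z test with a known population standard deviation; the effect size, standard deviation and sample size are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist


def type_ii_error(alpha, effect, sigma, n):
    """Beta for a one-sided z test of H0: mean = m0 against H1: mean = m0 + effect.

    Assumes a known population standard deviation sigma and sample size n.
    """
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha)  # rejection threshold under H0
    shift = effect * math.sqrt(n) / sigma       # how far H1 moves the test statistic
    return std_normal.cdf(z_critical - shift)


beta = type_ii_error(alpha=0.05, effect=5, sigma=15, n=36)
power = 1 - beta
print(f"alpha = 0.05: beta = {beta:.3f}, power = {power:.3f}")

# A stricter alpha lowers the type I error risk but raises the type II error risk.
beta_strict = type_ii_error(alpha=0.01, effect=5, sigma=15, n=36)
print(f"alpha = 0.01: beta = {beta_strict:.3f}")
```

For a fixed sample size the two error risks trade off against each other, which is why reducing both generally requires a larger sample or a better study design.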
Chi-Square Test
A chi-squared test (symbolically represented as χ2) is a statistical test applied to
observed counts of categorical variables. Usually, it compares the observed
frequencies with the frequencies expected under a hypothesis.
This test was introduced by Karl Pearson in 1900 for the analysis of categorical data
and distributions, so it is known as Pearson’s chi-squared test.
The chi-square test is used to estimate how likely the observations that are made
would be if the null hypothesis were true.
A hypothesis is a proposition that a given condition or statement might be true,
which we can test afterwards.
Chi-squared test statistics are usually constructed from a sum of squared errors,
or from the sample variance.
Finding P-Value
P stands for probability here. The chi-square statistic is used to obtain the
p-value. Different values of p lead to different conclusions about the null
hypothesis, as given below:
P ≤ 0.05: null hypothesis rejected
P > 0.05: fail to reject the null hypothesis
Formula
The chi-squared test is done to check if there is any difference between the
observed value and expected value. The formula for chi-square can be written as:
\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}
where O_i is the observed value and E_i is the expected value.
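As a worked instance of the formula, the sketch below runs a goodness-of-fit test on a die. The observed counts are hypothetical, and the critical value 11.070 is the standard table value for 5 degrees of freedom at the 0.05 level.

```python
# Hypothetical counts from rolling a die 60 times; a fair die expects 10 per face.
observed = [8, 9, 12, 11, 6, 14]
expected = [10] * 6

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

degrees_of_freedom = len(observed) - 1
CRITICAL_VALUE = 11.070  # chi-square table, df = 5, significance level 0.05

reject_null = chi_square > CRITICAL_VALUE
print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}")
print("reject H0" if reject_null else "fail to reject H0 (no evidence the die is unfair)")
```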
F-Test
*An F-test is any statistical test in which the test statistic has an
F-distribution under the null hypothesis. It is most often used when
comparing statistical models that have been fitted to a data set, in
order to identify the model that best fits the population from which
the data were sampled.
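One familiar test whose statistic follows an F-distribution under the null hypothesis is one-way ANOVA. The sketch below computes its F statistic for three hypothetical groups. It is offered as a single illustrative instance rather than the general model-comparison case, and the critical value 5.14 is the table value for (2, 6) degrees of freedom at the 0.05 level.

```python
import statistics

# Hypothetical measurements for three groups.
groups = [[5, 6, 7], [8, 9, 10], [11, 12, 13]]

all_values = [value for group in groups for value in group]
grand_mean = statistics.mean(all_values)

# Variation between group means and variation within groups.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((value - statistics.mean(g)) ** 2 for g in groups for value in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F statistic: ratio of the two mean squares.
f_statistic = (ss_between / df_between) / (ss_within / df_within)

F_CRITICAL = 5.14  # F table, df = (2, 6), significance level 0.05
print(f"F = {f_statistic:.2f}, reject H0: {f_statistic > F_CRITICAL}")
```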
T-Test
A t test is a statistical test that is used to compare the means of two groups.
It is often used in hypothesis testing to determine whether a process or treatment
actually has an effect on the population of interest, or whether two groups are
different from one another.
When to use a t test?
A t test can only be used when comparing the means of two groups (a.k.a. pairwise
comparison). If you want to compare more than two groups, or if you want to do
multiple pairwise comparisons, use an ANOVA test or a post-hoc test.
The t test is a parametric test of difference, meaning that it makes the same
assumptions about your data as other parametric tests.
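A two-sample t statistic can be computed by hand as sketched below. The two groups are hypothetical, the pooled-variance (equal variances) form is an assumption of this sketch, and the critical value 2.306 is the two-sided table value for 8 degrees of freedom at the 0.05 level.

```python
import math
import statistics

# Hypothetical measurements for two independent groups.
group_a = [5, 6, 7, 8, 9]
group_b = [8, 9, 10, 11, 12]

n_a, n_b = len(group_a), len(group_b)
degrees_of_freedom = n_a + n_b - 2

# Pooled variance, assuming both groups share the same population variance.
pooled_variance = (
    (n_a - 1) * statistics.variance(group_a)
    + (n_b - 1) * statistics.variance(group_b)
) / degrees_of_freedom

standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
t_statistic = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

T_CRITICAL = 2.306  # t table, df = 8, two-sided, significance level 0.05
significant = abs(t_statistic) > T_CRITICAL
print(f"t = {t_statistic:.2f}, df = {degrees_of_freedom}, significant: {significant}")
```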
Z Test
Z test is a statistical test that is conducted on data that approximately follows a
normal distribution.
The z test can be performed on one sample, two samples, or on proportions for
hypothesis testing.
It checks if the means of two large samples are different or not when the population
variance is known.
A z test is a test that is used to check if the means of two populations are different
or not provided the data follows a normal distribution.
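A two-sample z test with known population standard deviations can be sketched as follows. The means, standard deviations and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """Two-sided z test for a difference in means with known population sigmas."""
    standard_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical summary statistics for two large samples.
z, p_value = two_sample_z(mean1=52, mean2=50, sigma1=5, sigma2=5, n1=50, n2=50)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With these numbers the p-value falls just below 0.05, so the null hypothesis of equal means would be rejected at the 5% level.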
Z Test | T-Test
A z test is a statistical test that is used to check if the means of two data sets are different when the population variance is known. | A t-test is used to check if the means of two data sets are different when the population variance is not known.
The sample size is greater than or equal to 30. | The sample size is less than 30.
The data follows a normal distribution. | The data follows a Student's t-distribution.

“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...RKavithamani
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 

Recently uploaded (20)

Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 

Research Methodology Unit-4 Notes.pptx

  • 1. Unit-4 Processing and Analysis of Data Miss Tanu Shree *
  • 2. * Data processing occurs when data is collected and translated into usable information. It is usually performed by a data scientist or a team of data scientists, and it must be done correctly so as not to negatively affect the end product, or data output. Data processing is concerned with editing, coding, classifying, tabulating, and charting and diagramming research data. Data processing in research consists of five important steps: 1. Editing of Data 2. Coding of Data 3. Classification of Data 4. Tabulation of Data 5. Diagramming of Data
  • 3. * *Data editing is the application of checks to detect missing, invalid or inconsistent entries or to point to data records that are potentially in error. No matter what type of data you are working with, certain edits are performed at different stages or phases of data collection and processing. Purpose of Data Editing 1. Clarify Responses 2. Make omissions 3. Avoid biased editing 4. Make judgements 5. Logical adjustments
  • 4. * *Data coding in research methodology is a preliminary step to analyzing data. Data obtained from surveys, experiments or secondary sources is in raw form. *This data needs to be refined and organized before it can be evaluated and conclusions drawn. *Data coding is not an easy job, and the person or persons involved must have knowledge and experience of it. *Data coding is the process of converting data into a form that can be analyzed. It involves assigning numerical or categorical codes to data items, such as responses to survey questions or demographic information. Coded data can then be analyzed using statistical software or other tools.
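As an illustration of assigning numerical codes to survey responses, here is a minimal Python sketch. The codebook labels, the numeric codes and the missing-value code -9 are assumptions made for this example; they do not come from the slides.

```python
# Hypothetical codebook for a five-point agreement scale (illustrative only).
CODEBOOK = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
MISSING = -9  # assumed code for blank or unrecognised answers


def code_responses(responses):
    """Convert raw text answers into numeric codes using CODEBOOK."""
    coded = []
    for answer in responses:
        # Normalise case and surrounding spaces before the lookup.
        key = answer.strip().lower() if isinstance(answer, str) else ""
        coded.append(CODEBOOK.get(key, MISSING))
    return coded


raw = ["Agree", " neutral ", "", "Strongly Agree", "n/a"]
coded = code_responses(raw)
print(coded)
```

Keeping an explicit code for missing answers, instead of dropping them silently, lets the editing step count how many responses could not be coded.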
  • 5. * *Classification is the way of arranging data into different classes in order to give a definite form and a coherent structure to the data collected, facilitating its use in the most systematic and effective manner. Objectives of classification of data *To group heterogeneous data into homogeneous groups with common characteristics *To bring out the similarities among various groups *To facilitate effective comparison *To present complex, haphazard and scattered data in a concise, logical, homogeneous, and intelligible form *To maintain clarity and simplicity of complex data *To identify independent and dependent variables and establish their relationship
  • 6. * *Tabulation is a method of presenting numeric data in rows and columns in a logical and systematic manner to aid comparison and statistical analysis. It allows for easier comparison by putting relevant data closer together, and it aids in statistical analysis and interpretation. *Tabulation, in other terms, is the process of arranging organized data into a tabular format. *Depending on the nature of the classification, a table may be simple, double, or complex. *The goal of tabulation is to present a significant amount of complicated data in a systematic way that allows readers to derive logical conclusions and interpretations from it. Objectives of Tabulation *To simplify data *To draw attention to important information *To make comparisons easier *To assist with the statistical analysis of data *To conserve space
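A double (two-way) table can be built from classified records as in the sketch below. The variables "area" and "owns a phone" and all the records are invented for illustration.

```python
from collections import Counter


def cross_tabulate(records):
    """Count (row, column) pairs and arrange them as a nested dict table."""
    counts = Counter(records)
    rows = sorted({row for row, _ in records})
    cols = sorted({col for _, col in records})
    # Counter returns 0 for combinations that never occur.
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}


# Hypothetical survey records: (area, owns a phone)
records = [
    ("urban", "yes"), ("urban", "no"), ("rural", "yes"),
    ("urban", "yes"), ("rural", "no"), ("rural", "no"),
]
table = cross_tabulate(records)

# Print the table in rows and columns.
cols = sorted(next(iter(table.values())))
print("area".ljust(8) + "".join(c.rjust(6) for c in cols))
for row, cells in table.items():
    print(row.ljust(8) + "".join(str(cells[c]).rjust(6) for c in cols))
```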
  • 7. * *Diagrams have been used to collect data from research subjects by asking them either to draw a diagram themselves or to modify a prototypic diagram supplied by the researcher. *The use of diagrams in data collection has been viewed favorably in helping to gather rich data on healthcare topics. *Diagrams and charts are important because they present information visually. *The adage “a picture is worth a thousand words” applies when it comes to diagrams and charts. This handout provides a few hints on understanding information visually.
  • 8. Creative presentation of data is possible. Data diagrams are classified into: *Charts: A chart is a diagrammatic form of data presentation. Bar charts, rectangles, squares and circles can be used to present data. Bar charts are one-dimensional, while rectangles, squares and circles are two-dimensional. *Graphs: A graph is a method of presenting numerical data in visual form. A graph shows the relationship between two variables by means of either a curve or a straight line. Graphs may be divided into two categories: (1) graphs of time series and (2) graphs of frequency distribution. In graphs of time series, one of the factors is time and the other factor or factors are those under study. Graphs of frequency distribution show how a group, for example executives, is distributed by income, age and so on.
  • 9. *Collection of Data: The very first challenge in data processing comes in the collection or acquisition of the correct data for the input. The challenge here is to collect exactly the data needed to get the proper result. The result depends directly on the input data. Hence, it is vital to collect the correct data to get the desired result. *Duplication of data: As the data is collected from different data sources, duplication often occurs. The same entries and entities may be present a number of times during the data encoding stage. This duplicate data is redundant and may produce an incorrect result. Hence, we need to check the data for duplication and proactively remove the duplicates.
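Checking for duplication and removing it can be sketched as follows. The field names ("name", "email") and the rule that two records match when these fields agree after trimming and lower-casing are assumptions for this example; real data may need a different matching rule.

```python
def remove_duplicates(records, key_fields):
    """Keep the first record for each distinct combination of key_fields."""
    seen = set()
    unique = []
    for record in records:
        # Normalise the key so that "A@x.com " and "a@x.com" count as equal.
        key = tuple(str(record[f]).strip().lower() for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records merged from two sources.
merged = [
    {"name": "Asha", "email": "asha@example.com", "source": "survey"},
    {"name": "asha", "email": "Asha@example.com ", "source": "crm"},
    {"name": "Ravi", "email": "ravi@example.com", "source": "survey"},
]
cleaned = remove_duplicates(merged, ["name", "email"])
print(len(merged) - len(cleaned), "duplicate(s) removed")
```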
  • 10. *Inconsistency of Data: When we collect a huge amount of data, there is no guarantee that the data would be complete or all the fields that we need are filled correctly. Moreover, the data may be ambiguous. As the input/raw data is heterogeneous in nature and is collected from autonomous data sources, the data may conflict with each other. *Variety of data: The input data, as it is collected from different sources, can contain different forms. The rows and columns of a relational database don’t limit the data. The data varies from application to application and source to source. Much of this data is unstructured and cannot fit into a spreadsheet or a relational database. *Data Integration: Data integration means to combine the data from various sources and present it in a unified view. With the increased variety of data and different formats of data, the challenge to integrate the data becomes bigger.
  • 11. * Data analysis is an aspect of data science that is all about analyzing data for different kinds of purposes. It involves inspecting, cleaning, transforming and modeling data to draw useful insights from it. WHAT ARE THE DIFFERENT TYPES OF DATA ANALYSIS? *Descriptive analysis *Exploratory analysis *Inferential analysis *Predictive analysis *Causal analysis *Mechanistic analysis
  • 12. 1. DESCRIPTIVE ANALYSIS The goal of descriptive analysis is to describe or summarize a set of data. Here’s what you need to know: Descriptive analysis is the very first analysis performed. It generates simple summaries about samples and measurements. It involves common descriptive statistics such as measures of central tendency, variability, frequency, and position. 2. EXPLORATORY ANALYSIS (EDA) Exploratory analysis involves examining or exploring data and finding relationships between variables that were previously unknown. Here’s what you need to know: EDA helps you discover relationships between measures in your data. These relationships are not evidence of causation, as denoted by the phrase “Correlation doesn’t imply causation.” EDA is useful for discovering new connections and forming hypotheses. It drives design planning and data collection.
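The summary measures named under descriptive analysis can be computed with Python's standard `statistics` module. The scores below are made up for illustration.

```python
import statistics

# Hypothetical test scores for nine respondents.
scores = [4, 8, 6, 5, 3, 8, 9, 5, 8]

summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),      # central tendency
    "median": statistics.median(scores),  # central tendency / position
    "mode": statistics.mode(scores),      # most frequent value
    "stdev": statistics.stdev(scores),    # variability (sample standard deviation)
    "range": max(scores) - min(scores),   # variability
}

for name, value in summary.items():
    print(f"{name}: {value}")
```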
  • 13. 3. INFERENTIAL ANALYSIS Inferential analysis involves using a small sample of data to infer information about a larger population. The goal of statistical modeling is to use a small amount of information to extrapolate and generalize to a larger group. Here’s what you need to know: Inferential analysis produces an estimate for the population from a representative sample, together with a measure of uncertainty, such as the standard error, for that estimate. The accuracy of inference depends heavily on your sampling scheme. If the sample isn’t representative of the population, the generalization will be inaccurate. This problem is known as sampling bias. 4. PREDICTIVE ANALYSIS Predictive analysis involves using historical or current data to find patterns and make predictions about the future. Here’s what you need to know: The accuracy of the predictions depends on the input variables. Accuracy also depends on the type of model. A linear model might work well in some cases, and in other cases it might not. Using one variable to predict another doesn’t denote a causal relationship.
  • 14. 5. CAUSAL ANALYSIS Causal analysis looks at the cause and effect of relationships between variables and is focused on finding the cause of a correlation. Here’s what you need to know: To find the cause, you have to question whether the observed correlations driving your conclusion are valid. Just looking at the surface data won’t help you discover the hidden mechanisms underlying the correlations. Causal analysis is applied in randomized studies focused on identifying causation. 6. MECHANISTIC ANALYSIS Mechanistic analysis is used to understand the exact changes in variables that lead to changes in other variables. Here’s what you need to know: It’s applied in the physical or engineering sciences, in situations that require high precision and leave little room for error, where the only noise in the data is measurement error. It’s designed to understand a biological or behavioral process, the pathophysiology of a disease or the mechanism of action of an intervention.
  • 15. Descriptive analysis summarizes the data at hand and presents your data in a comprehensible way. Exploratory data analysis helps you discover correlations and relationships between variables in your data. Inferential analysis is for generalizing from a smaller sample of data to the larger population. Predictive analysis helps you make predictions about the future with data. Causal analysis emphasizes finding the cause of a correlation between variables. Mechanistic analysis is for measuring the exact changes in variables that lead to changes in other variables.
  • 16. * *A hypothesis is an assumption that is made based on some evidence. *This is the initial point of any investigation that translates the research questions into predictions. *It includes components like variables, population and the relation between the variables. *A research hypothesis is a hypothesis that is used to test the relationship between two or more variables. Sources of Hypothesis Following are the sources of hypothesis: *The resemblance between phenomena. *Observations from past studies, present-day experiences and from competitors. *Scientific theories. *General patterns that influence the thinking process of people.
  • 17. Characteristics of Hypothesis Following are the characteristics of a hypothesis: 1. The hypothesis should be clear and precise to be considered reliable. 2. If the hypothesis is a relational hypothesis, then it should state the relationship between variables. 3. The hypothesis must be specific and should have scope for conducting more tests. 4. The hypothesis must be explained very simply, and it should be understood that the simplicity of a hypothesis is not related to its significance. Examples of Hypothesis Following are examples of hypotheses based on their types: 1. “Consumption of sugary drinks every day leads to obesity” is an example of a simple hypothesis. 2. “All lilies have the same number of petals” is an example of a null hypothesis. 3. “If a person gets 7 hours of sleep, then he will feel less fatigue than if he sleeps less” is an example of a directional hypothesis.
  • 18. Types of Hypothesis There are six forms of hypothesis and they are: 1. Simple hypothesis 2. Complex hypothesis 3. Directional hypothesis 4. Non-directional hypothesis 5. Null hypothesis 6. Associative and causal hypothesis 1. Simple Hypothesis It shows a relationship between one dependent variable and a single independent variable. For example: if you eat more vegetables, you will lose weight faster. Here, eating more vegetables is the independent variable, while losing weight is the dependent variable. 2. Complex Hypothesis It shows the relationship between two or more dependent variables and two or more independent variables. For example: eating more vegetables and fruits leads to weight loss, glowing skin, and a reduced risk of many diseases such as heart disease.
  • 19. 3. Directional Hypothesis It shows that the researcher expects a particular outcome. It predicts not only that a relationship exists between the variables but also its nature. For example: children aged four years who eat proper food over a five-year period have higher IQ levels than children who do not have proper meals. This shows the effect and the direction of the effect. 4. Non-directional Hypothesis It is used when there is no theory involved. It is a statement that a relationship exists between two variables, without predicting the exact nature (direction) of the relationship. 5. Null Hypothesis It provides a statement which is contrary to the research hypothesis. It is a negative statement that there is no relationship between the independent and dependent variables. It is denoted by the symbol “H0”. 6. Associative and Causal Hypothesis An associative hypothesis states that a change in one variable is accompanied by a change in the other variable, whereas a causal hypothesis proposes a cause-and-effect interaction between two or more variables.
  • 20. * Hypothesis testing is a systematic procedure for deciding whether the results of a research study support a particular theory which applies to a population. Hypothesis testing uses sample data to evaluate a hypothesis about a population. Hypothesis testing in statistics refers to analyzing an assumption about a population parameter. Using sample data, hypothesis testing assesses how plausible that assumption is for the entire population from which the sample is taken. For example, you might implement protocols for performing intubation on pediatric patients in the pre-hospital setting. To evaluate whether these protocols were successful in improving intubation rates, you could measure the intubation rate over time in one group randomly assigned to training in the new protocols, and compare this to the intubation rate over time in another control group that did not receive training in the new protocols.
  • 21. Five Steps in Hypothesis Testing: 1. Specify the Null Hypothesis 2. Specify the Alternative Hypothesis 3. Set the Significance Level (α) 4. Calculate the Test Statistic and Corresponding P-Value 5. Draw a Conclusion
  • 22. Step 1: Specify the Null Hypothesis The null hypothesis (H0) is a statement of no effect, relationship, or difference between two or more groups or factors. In research studies, a researcher is usually interested in disproving the null hypothesis. Examples: There is no difference in intubation rates across ages 0 to 5 years. The intervention and control groups have the same survival rate (or, the intervention does not improve survival rate). Step 2: Specify the Alternative Hypothesis The alternative hypothesis (H1) is the statement that there is an effect or difference. This is usually the hypothesis the researcher is interested in proving. The alternative hypothesis can be one-sided (only provides one direction, e.g., lower) or two-sided. We often use two-sided tests even when our true hypothesis is one-sided because it requires more evidence against the null hypothesis to accept the alternative hypothesis. Examples: The intubation success rate differs with the age of the patient being treated (two-sided). The time to resuscitation from cardiac arrest is lower for the intervention group than for the control (one-sided).
  • 23. Step 3: Set the Significance Level (α) The significance level (denoted by the Greek letter alpha, α) is generally set at 0.05. This means that, when your null hypothesis is actually true, there is a 5% chance that you will reject it and accept your alternative hypothesis. The smaller the significance level, the greater the burden of proof needed to reject the null hypothesis, or in other words, to support the alternative hypothesis. Step 4: Calculate the Test Statistic and Corresponding P-Value In another section we present some basic test statistics to evaluate a hypothesis. Hypothesis testing generally uses a test statistic that compares groups or examines associations between variables. When describing a single sample without establishing relationships between variables, a confidence interval is commonly used. The p-value describes the probability of obtaining a sample statistic as extreme or more extreme by chance alone if your null hypothesis is true. This p-value is determined based on the result of your test statistic. Your conclusions about the hypothesis are based on your p-value and your significance level.
  • 24. Step 5: Drawing a Conclusion P-value <= significance level (α) => Reject your null hypothesis in favor of your alternative hypothesis. Your result is statistically significant. P-value > significance level (α) => Fail to reject your null hypothesis. Your result is not statistically significant. Hypothesis testing is not set up so that you can absolutely prove a null hypothesis. Therefore, when you do not find evidence against the null hypothesis, you fail to reject the null hypothesis. When you do find strong enough evidence against the null hypothesis, you reject the null hypothesis. Your conclusions also translate into a statement about your alternative hypothesis.
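The decision rule in Step 5 can be written as a small Python function. The function name and the example p-values are assumptions for illustration; the rule itself is the one stated above.

```python
def draw_conclusion(p_value, alpha=0.05):
    """Apply the Step 5 rule: reject H0 when the p-value is at most alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value <= alpha:
        return "reject H0 (statistically significant)"
    return "fail to reject H0 (not statistically significant)"


# Hypothetical p-values from three different studies.
for p in (0.003, 0.05, 0.27):
    print(p, "->", draw_conclusion(p))
```

Note that the second outcome is worded "fail to reject", not "accept", because the test cannot prove the null hypothesis.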
  • 25. * *In statistics, a Type I error is a false positive conclusion, while a Type II error is a false negative conclusion. *Making a statistical decision always involves uncertainties, so the risks of making these errors are unavoidable in hypothesis testing. *The probability of making a Type I error is the significance level, or alpha (α), while the probability of making a Type II error is beta (β). These risks can be minimized through careful planning in your study design. *Example: Type I vs. Type II error You decide to get tested for COVID-19 based on mild symptoms. There are two errors that could potentially occur: *Type I error (false positive): the test result says you have coronavirus, but you actually don’t. *Type II error (false negative): the test result says you don’t have coronavirus, but you actually do.
  • 26.
  • 27. Type I Error A type I error occurs when the null hypothesis (H0) of an experiment is true but is nevertheless rejected. It asserts something that is not present: a false hit. A type I error is often called a false positive (an event that shows that a given condition is present when it is absent). In the words of folk tales, a person may see a bear when there is none (raising a false alarm), where the null hypothesis (H0) contains the statement: “There is no bear”. The type I error rate, or significance level, is the probability of rejecting the null hypothesis given that it is true. It is represented by the Greek letter α (alpha) and is also known as the alpha level. Usually, the significance level, or the probability of a type I error, is set to 0.05 (5%), assuming that it is acceptable to have a 5% probability of incorrectly rejecting the null hypothesis.
  • 28. Type II Error A type II error occurs when the null hypothesis is false but mistakenly fails to be rejected. It fails to assert something that is present: a miss. A type II error is also known as a false negative (where a real hit was rejected by the test and is observed as a miss), in an experiment checking for a condition with a final outcome of true or false. A type II error is committed when a true alternative hypothesis is not acknowledged. In other words, an examiner may miss discovering the bear when in fact a bear is present (and hence fails to raise the alarm). Again, H0, the null hypothesis, consists of the statement “There is no bear”; if a bear is indeed present and goes undetected, this is a type II error on the part of the investigator. Here, the bear either exists or does not exist in the given circumstances. The question is whether it is correctly identified: the errors are failing to detect it when it is present, or identifying it when it is not present. The rate of the type II error is represented by the Greek letter β (beta) and is linked to the power of a test (which equals 1−β).
  • 29. Chi-Square Test A chi-squared test (symbolically represented as χ²) is a data analysis based on observations of a random set of variables. Usually, it is a comparison of two statistical data sets. This test was introduced by Karl Pearson in 1900 for categorical data analysis and distribution, so it is referred to as Pearson’s chi-squared test. The chi-square test is used to estimate how likely the observations that are made would be, assuming the null hypothesis is true. A hypothesis is a supposition that a given condition or statement might be true, which we can test afterwards. Chi-squared tests are often constructed from a sum of squared errors or through the sample variance. Finding the P-Value P stands for probability here. In a chi-square test, the p-value is obtained from the chi-square statistic. The different values of p lead to different interpretations of the hypothesis, as given below: P ≤ 0.05: null hypothesis rejected. P > 0.05: null hypothesis not rejected.
  • 30. Formula The chi-squared test is done to check whether there is any difference between the observed values and the expected values. The formula for chi-square can be written as $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$, where $O_i$ is the observed value and $E_i$ is the expected value in category $i$.
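A worked instance of the chi-square formula, as a minimal Python sketch. The observed counts (100 respondents choosing among five brands) and the equal expected counts are invented for illustration.

```python
def chi_square(observed, expected):
    """Return the sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 100 respondents choosing among five brands.
observed = [18, 22, 20, 25, 15]
expected = [20, 20, 20, 20, 20]  # H0: all brands are equally popular

statistic = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1
print(statistic, degrees_of_freedom)

# The tabulated critical value for 4 degrees of freedom at alpha = 0.05 is 9.488.
# A statistic below it means H0 is not rejected.
print("reject H0" if statistic > 9.488 else "fail to reject H0")
```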
  • 31. *An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled.
  • 32.
  • 33. T- Test A t test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another. When to use a t test? A t test can only be used when comparing the means of two groups (a.k.a. pairwise comparison). If you want to compare more than two groups, or if you want to do multiple pairwise comparisons, use an ANOVA test or a post-hoc test. The t test is a parametric test of difference, meaning that it makes the same assumptions about your data as other parametric tests.
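Comparing the means of two groups can be sketched with the classical pooled-variance (Student's) two-sample t statistic. The equal-variance assumption, the function name and the sample values are choices made for this example; the slides do not specify which variant of the t test is meant.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic assuming equal population variances.

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Hypothetical measurements for a treated group and a control group.
treated = [2, 4, 6]
control = [1, 2, 3]
t_value, df = pooled_t_statistic(treated, control)
print(round(t_value, 4), df)
```

The resulting t value is then compared with a critical value from a t table for the given degrees of freedom and significance level.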
  • 34.
  • 35. Z Test Z test is a statistical test that is conducted on data that approximately follows a normal distribution. The z test can be performed on one sample, two samples, or on proportions for hypothesis testing. It checks if the means of two large samples are different or not when the population variance is known. A z test is a test that is used to check if the means of two populations are different or not provided the data follows a normal distribution.
  • 36. Z Test vs. T-Test
      Z test: A z test is a statistical test that is used to check if the means of two data sets are different when the population variance is known. The sample size is greater than or equal to 30. The data follows a normal distribution.
      T-test: A t-test is used to check if the means of two data sets are different when the population variance is not known. The sample size is less than 30. The test statistic follows a Student's t-distribution.
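A one-sample z test with a known population standard deviation can be sketched with the standard library's `statistics.NormalDist` (Python 3.8 or later). The sample mean, hypothesised mean, standard deviation and sample size below are invented for illustration.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, hypothesised_mean, population_sd, n):
    """Return (z, two-sided p-value) for a one-sample z test."""
    standard_error = population_sd / math.sqrt(n)
    z = (sample_mean - hypothesised_mean) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical study: 100 observations, known population SD of 10.
z, p = one_sample_z_test(sample_mean=52, hypothesised_mean=50,
                         population_sd=10, n=100)
print(round(z, 2), round(p, 4))
```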