DATA ANALYTICS ASSIGNMENT
NAME - SAMIR KUMAR
MTECH (INDUSTRIAL ENGINEERING AND MANAGEMENT)
Data analysis
• Data analysis is the process of inspecting, cleansing,
transforming, and modelling data in order to enhance
productivity and business growth.
• It is used to perform market analysis, uncover hidden
insights in the data, improve business strategy, and
generate reports from the available data, using tools
such as Tableau, Power BI, R, Python, and Apache Spark.
[Figure: data analysis process, with stages labelled Reality, Raw data collection, Data processing & data cleaning, Data analysis & models, Insight visualization, and Data product]
Why do we need Data Analysis?
We need data analysis mainly for the following reasons:
• To gather hidden insights.
• To generate reports based on the available data.
• To perform market analysis.
• To improve business strategy.
Decision Science
• Decision science is the collection of quantitative
techniques used to inform decision-making at the
individual and population levels.
• It includes decision analysis, risk analysis, cost-benefit and
cost-effectiveness analysis, constrained optimization,
simulation modeling, and behavioral decision theory, as
well as parts of operations research, microeconomics,
statistical inference, management control, cognitive and
social psychology, and computer science.
Data analytics
• Data analytics is the collection, transformation, and
organization of data in order to draw conclusions, make
predictions, and drive informed decision making.
• Data analytics is often confused with data analysis.
While these are related terms, they aren’t exactly the
same. In fact, data analysis is a subcategory of data
analytics that deals specifically with extracting meaning
from data. Data analytics, as a whole, includes
processes beyond analysis, including data
science (using data to theorize and forecast) and data
engineering (building data systems).
So why Data Analytics?
With data analytics, businesses can
understand hidden patterns and meanings
within the behavior of their customers.
For businesses, the main benefits are:
1. Informed decision making
2. More effective marketing
3. More efficient operations
4. Lower costs
What is Sampling?
Sampling is a method that allows us to get information about the
population based on the statistics from a subset of the population
(sample), without having to investigate every individual.
Why do we need Sampling?
Sampling is done to draw conclusions about populations
from samples, and it enables us to determine a
population’s characteristics by directly observing only a
portion (or sample) of the population.
• Studying a sample takes less time than studying every item in
a population
• Sampling is cost-efficient
• Analysis of the sample is less cumbersome and more practical
than an analysis of the entire population
Population vs sample
• The population is the entire
group that you want to draw
conclusions about.
• The sample is the specific group
of individuals that you will
collect data from.
The population can be defined in
terms of geographical location, age,
income, and many other
characteristics.
How to determine sample size
Stage 1: Consider your sample size variables
1. Population size
2. Margin of error (half the width of the confidence interval)
3. Confidence level
4. Standard deviation
Stage 2: Calculate sample size
5. Find your Z-score
6. Use the sample size formula
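The two stages above can be sketched in Python. The slides do not say which sample size formula is meant, so this sketch assumes one common choice: Cochran's formula with a finite population correction. The function name `sample_size`, the default standard deviation of 0.5 (the most conservative value for a proportion), and the example inputs are illustrative assumptions.

```python
import math
from statistics import NormalDist


def sample_size(population_size, margin_of_error,
                confidence_level=0.95, std_dev=0.5):
    """Estimate a required sample size (hypothetical helper, not from the slides)."""
    # Step 5: two-sided Z-score for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Step 6: sample size for an effectively infinite population.
    n0 = (z * std_dev / margin_of_error) ** 2
    # Finite population correction: small populations need fewer respondents.
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)


# Assumed inputs: 10,000 people, 5% margin of error, 95% confidence.
print(sample_size(population_size=10_000, margin_of_error=0.05))
```

A smaller margin of error or a higher confidence level increases the result, while a very large population makes the correction term negligible.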
Different Types of Sampling Techniques
• Probability Sampling: In probability sampling, every element of the population has a known,
non-zero chance of being selected (an equal chance in the simplest designs). Probability
sampling gives us the best chance to create a sample that is truly representative of the
population.
• Non-Probability Sampling: In non-probability sampling, the selection chances are unknown
and some elements may have no chance of being selected. Consequently, there is a significant
risk of ending up with a non-representative sample that does not produce generalizable results.
Types of Probability Sampling
1. Simple Random Sampling
This is a type of sampling technique you must have come across at some point.
Here, every individual is chosen entirely by chance and each member of the
population has an equal chance of being selected.
Simple random sampling reduces selection bias.
One big advantage of this technique is
that it is the most direct method of
probability sampling. But it comes with a
caveat – it may not select enough
individuals with our characteristics of
interest.
Monte Carlo methods use repeated
random sampling for the estimation of
unknown parameters
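As a minimal sketch of simple random sampling, the snippet below draws 5 individuals from a population numbered 1 to 20 using Python's standard `random` module. The population, the sample size, and the helper name `simple_random_sample` are assumptions made for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every subset of size n is equally likely."""
    rng = random.Random(seed)  # the seed only makes the draw reproducible
    return rng.sample(population, n)


population = list(range(1, 21))  # individuals numbered 1..20 (assumed)
sample = simple_random_sample(population, 5, seed=42)
print(sample)
```

Sampling is done without replacement, so no individual can appear twice in the sample.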
2. Systematic Sampling
In this type of sampling, the first individual is selected randomly and the others are
selected using a fixed ‘sampling interval’. Let’s take a simple example to
understand this.
Say our population size is x and we have to select a sample of size n. The sampling
interval is then k = x/n: after the first individual, we select every kth individual
until we have n of them.
Systematic sampling is more convenient than
simple random sampling. However, it can
lead to bias if the population list has an underlying
pattern that lines up with the sampling interval
(though this is uncommon in practice).
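The interval rule can be sketched as follows. This is an illustrative sketch, assuming the population is held in a list at least as long as the sample and that the interval is rounded down to a whole number; the helper name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every k-th member (k = x // n)."""
    rng = random.Random(seed)
    k = len(population) // n      # sampling interval, rounded down
    start = rng.randrange(k)      # random start inside the first interval
    return [population[start + i * k] for i in range(n)]


population = list(range(1, 21))   # x = 20 individuals (assumed)
sample = systematic_sample(population, 5, seed=7)  # n = 5, so k = 4
print(sample)
```

Only the starting point is random; once it is fixed, the rest of the sample is fully determined by the interval.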
3. Stratified Sampling
In this type of sampling, we divide the population into subgroups
(called strata) based on traits such as gender, category, etc.,
and then select a sample from each subgroup.
We use this type of sampling when we want
representation from all the subgroups of the
population. However, stratified sampling requires
proper knowledge of the characteristics of the
population.
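A minimal sketch of stratified sampling is shown below. It assumes proportional allocation (the same sampling fraction in every stratum), which is only one of several possible allocation rules; the records, the `stratum_of` callback, and the helper name are made up for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, fraction, seed=None):
    """Sample the same fraction from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)  # group records by stratum
    sample = []
    for members in strata.values():
        # Take at least one member so that every stratum is represented.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Assumed population: 12 records in stratum "F" and 8 in stratum "M".
people = [("F", i) for i in range(12)] + [("M", i) for i in range(8)]
sample = stratified_sample(people, stratum_of=lambda p: p[0], fraction=0.25, seed=1)
print(sample)
```

With a fraction of 0.25 the sample keeps the 12:8 proportion of the population, taking 3 records from the first stratum and 2 from the second.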
4. Cluster Sampling
In a cluster sample, we use subgroups of the population as the
sampling unit rather than individuals. The population is divided into
subgroups, known as clusters, and whole clusters are randomly selected
to be included in the study.
For example, suppose we have divided our population into 5
clusters of 4 individuals each and the 4th cluster is
randomly selected as our sample. We can include more clusters
as needed to reach our sample size.
This type of sampling is used
when we focus on a specific
region or area.
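The cluster example can be sketched in Python. The sketch assumes 20 individuals numbered 1 to 20, split into 5 clusters of 4; the helper name `cluster_sample` is hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster labels
    return {name: clusters[name] for name in chosen}


# Clusters 1..5, each holding 4 consecutive individuals (assumed layout).
clusters = {c: list(range(4 * (c - 1) + 1, 4 * c + 1)) for c in range(1, 6)}
picked = cluster_sample(clusters, n_clusters=1, seed=0)
print(picked)
```

Unlike stratified sampling, which takes some members from every subgroup, cluster sampling takes every member from only some subgroups.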
Types of Non-Probability Sampling
1. Convenience Sampling
This is perhaps the easiest method of sampling because individuals are selected
based on their availability and willingness to take part.
Here, let’s say individuals numbered 4, 7, 12, 15 and 20 want to be part of our
sample, and hence, we will include them in the sample.
Convenience sampling is prone to
significant bias, because the
sample may not represent the population on
characteristics such as religion or gender.
2. Quota Sampling
In this type of sampling, we choose items based on predetermined
characteristics of the population. Suppose we have to select
individuals whose numbers are multiples of four for our sample.
The individuals numbered 4, 8, 12, 16, and 20 are then
already reserved for our sample.
In quota sampling, the chosen sample might not be the best
representation of the characteristics of the population
that weren’t considered.
3. Judgment Sampling
It is also known as selective sampling. It depends on the
judgment of the experts when choosing whom to ask to
participate.
Suppose our experts believe that
people numbered 1, 7, 10, 15,
and 19 should be considered for
our sample as they may help us
to infer the population in a better
way. As you can imagine, judgment
sampling is also prone to bias by
the experts and may not
necessarily be representative.
4. Snowball Sampling
I quite like this sampling technique. Existing participants are asked to
nominate further people known to them, so that the sample grows
in size like a rolling snowball. This method of sampling is effective when
a sampling frame is difficult to identify.
Here, we randomly chose person 1 for
our sample, then person 1 recommended
person 6, person 6 recommended person
11, and so on.
1->6->11->14->19
There is a significant risk of selection bias in
snowball sampling, as the referred
individuals will tend to share common traits with the
person who recommends them.
Statistics
Statistics is the field of mathematics that deals
with the collection, tabulation, and
interpretation of numerical data. The word is also
used for the numerical data itself.
1. Descriptive Statistics :
Descriptive statistics describes a sample or population
through numerical calculations, graphs, or tables. It is
used to summarize the data at hand. There are two categories,
given below.
(a). Measure of central tendency –
A measure of central tendency is a summary
statistic that represents the center point or
a typical value of a data set or sample set.
In statistics, there are three common measures of
central tendency, as shown below:
(i) Mean :
It is the average of all values in a sample set.
For example, the mean of 2, 3, 3, 5, 7 and 10 is 30 ÷ 6 = 5.
(ii) Median :
It is the central value of a sample set.
The data set is ordered from the lowest to the
highest value and the exact middle is taken.
For example, the median of 1, 3, 4, 8 and 9 is 4.
(iii) Mode :
It is the value that occurs most frequently in the sample
set.
For example, the mode of 2, 3, 3, 5, 7 and 10 is 3.
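The three measures can be computed with Python's standard `statistics` module. The data set below is made up for illustration.

```python
from statistics import mean, median, multimode

data = [2, 3, 3, 5, 7, 10]  # illustrative values, not from the slides

print(mean(data))       # arithmetic average: sum of the values divided by their count
print(median(data))     # middle of the sorted data; averages the two middle values here
print(multimode(data))  # list of the most frequent value(s)
```

`multimode` is used instead of `mode` because a data set can have more than one most frequent value.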
(b). Measure of Variability –
Measure of Variability is also known as measure of dispersion and used to
describe variability in a sample or population. In statistics, there are three
common measures of variability as shown below:
(i) Range :
It measures how spread apart the values in a sample set or data set are.
Range = Maximum value - Minimum value
(ii) Variance :
It describes how much a random variable differs from its expected value and
is computed as the average of the squared deviations from the mean.
$S^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$
In this formula, n represents the total number of data points, $\bar{x}$ represents the mean of the data points
and $x_i$ represents an individual data point.
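(iii) Standard deviation :
It measures the dispersion of a set of data from its mean and is the square root of the variance.
$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}$
Here μ represents the mean of the data points.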
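As a short sketch, the three measures of variability can be computed with the standard `statistics` module. The data set is made up for illustration. The functions `pvariance` and `pstdev` divide by n, matching the formulas above, whereas `variance` and `stdev` in the same module divide by n - 1 and are meant for estimating from a sample.

```python
from statistics import pvariance, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not from the slides

value_range = max(data) - min(data)  # Range = maximum value - minimum value
variance = pvariance(data)           # mean of squared deviations (divides by n)
std_dev = pstdev(data)               # square root of the variance

print(value_range, variance, std_dev)
```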
2. Inferential Statistics :
• Inferential statistics makes inferences and predictions about a population based on a sample
of data taken from that population. It generalizes from the sample to the larger population and
uses probability to draw conclusions.
• It is used to explain the meaning of descriptive statistics.
• It is used to analyze data, interpret results, and draw conclusions.
• Inferential statistics is closely associated with hypothesis testing, which assesses whether
the sample provides enough evidence to reject the null hypothesis.
• Hypothesis testing is a type of inferential procedure that uses sample data to
evaluate the credibility of a hypothesis about a population.
• Inferential statistics is also used to determine how strong a relationship within the sample
is. In practice, however, it is often difficult to obtain a population list and draw a random sample.
Inferential statistics can be carried out in the following steps:
• Start with a theory.
• Generate a research hypothesis.
• Operationalize the variables.
• Identify the population to which the study applies.
• Form a null hypothesis for this population.
• Collect a sample from the population and run the study.
• Then, perform statistical tests to check whether the observed characteristics of the sample are sufficiently
different from what would be expected under the null hypothesis for us to
reject the null hypothesis.
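The final testing step can be illustrated with a small sketch. It assumes a two-sided one-sample z-test with a known population standard deviation, which is only one of many possible tests; the measurements, the hypothesized mean `mu0`, the value of `sigma`, and the 0.05 significance level are all made up for illustration.

```python
import math
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: population mean == mu0, assuming sigma is known."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))  # standardized distance from mu0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # probability of a result this extreme under H0
    return z, p_value


sample = [52, 55, 49, 58, 54, 53, 57, 56, 51, 55]  # made-up measurements
z, p = one_sample_z_test(sample, mu0=50, sigma=5)

alpha = 0.05  # assumed significance level
print("reject H0" if p < alpha else "fail to reject H0")
```

A small p-value means the sample would be unlikely if the null hypothesis were true. Failing to reject the null hypothesis does not prove that it is true.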
Types of inferential statistics –
Various types of inferential statistics are widely used.
These are
given below:
• One sample test of difference/One sample
hypothesis test
• Confidence interval
• Contingency tables and chi-square statistic
• t-test or ANOVA
• Pearson correlation
• Bivariate regression
• Multivariate regression
Prescriptive analytics: Prescriptive analytics is a process that analyzes data
and provides instant recommendations on how to
optimize business practices to suit multiple predicted
outcomes.
In essence, prescriptive analytics takes the “what we
know” (data), analyzes that data to
predict what could happen, and suggests the best steps
forward based on informed simulations.
Predictive analytics: Predictive analytics applies
mathematical models to the current data to inform
(predict) future behavior. It is the “what could happen.”
Types of Variables in Statistics
1. Quantitative Variables: Sometimes referred to as “numeric” variables,
these are variables that represent a measurable quantity. Examples
include:
• Number of students in a class
• Number of square feet in a house
• Population size of a city
• Age of an individual
• Height of an individual
2. Qualitative Variables: Sometimes referred to as “categorical” variables, these
are variables that take on names or labels and can fit into categories. Examples
include:
• Eye color (e.g. “blue”, “green”, “brown”)
• Gender (e.g. “male”, “female”)
• Breed of dog (e.g. “lab”, “bulldog”, “poodle”)
• Level of education (e.g. “high school”, “Associate’s degree”, “Bachelor’s
degree”)
• Marital status (e.g. “married”, “single”, “divorced”)
Scales of measurements
Nominal Scale
A nominal scale is the 1st level of measurement scale, in which numbers serve only as “tags” or
“labels” to classify or identify objects. A nominal scale usually deals with non-numeric
variables or with numbers that carry no quantitative value.
Characteristics of Nominal Scale
• A nominal scale variable is classified into two or more categories. In this measurement
mechanism, each answer should fall into exactly one of the classes.
• It is qualitative. The numbers are used here only to identify the objects.
• The numbers don’t define the object characteristics. The only permissible operation on
numbers in the nominal scale is “counting.”
Example:
An example of a nominal scale measurement is given below:
What is your gender?
M- Male
F- Female
Here, the variables are used as tags, and the answer to this question should be either M or F.
Ordinal Scale
The ordinal scale is the 2nd level of measurement; it reports the ordering and ranking of data
without establishing the degree of variation between them. Ordinal represents the “order.”
Ordinal data is qualitative (categorical) data. It can be grouped, named and
also ranked.
Characteristics of the Ordinal Scale
• The ordinal scale shows the relative ranking of the variables
• It identifies the relative magnitude (greater or lesser) of a variable, but not by how much
• Along with the information provided by the nominal scale, ordinal scales give the rankings
of those variables
• The interval properties are not known
• The surveyors can quickly analyse the degree of agreement concerning the identified order
of variables
Example:
Ranking of school students – 1st, 2nd, 3rd, etc.
Ratings in restaurants
Evaluating the frequency of occurrences
• Very often
• Often
Assessing the degree of agreement
• Totally agree
• Agree
• Totally disagree
Interval Scale
The interval scale is the 3rd level of measurement scale. It is defined as a
quantitative measurement scale in which the difference between
two values is meaningful. In other words, the variables are measured
in an exact manner rather than a relative one, but the zero point
is arbitrary.
Characteristics of Interval Scale:
• The interval scale is quantitative as it can quantify the difference between the values
• It allows calculating the mean and median of the variables
• To understand the difference between two values, you can subtract one from
the other
• The interval scale is widely used in statistics as it helps to assign numerical values
to arbitrary assessments such as feelings, calendar types, etc.
Example:
• Likert Scale
• Net Promoter Score (NPS)
• Bipolar Matrix Table
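To illustrate why an arbitrary zero matters, the sketch below uses temperature, which is an assumed example and not one from the slides. Celsius is an interval scale and Kelvin is a ratio scale. Differences agree on both scales, but ratios of Celsius values are not meaningful.

```python
def celsius_to_kelvin(c):
    """Shift the zero point from the freezing point of water to absolute zero."""
    return c + 273.15


a_c, b_c = 10.0, 20.0  # two temperatures in degrees Celsius (assumed values)

diff_c = b_c - a_c                                        # difference in Celsius
diff_k = celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c)  # same difference in Kelvin

ratio_c = b_c / a_c                                        # 2.0, but "twice as hot" is not meaningful
ratio_k = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)  # ratio on a scale with a true zero

print(diff_c, diff_k, ratio_c, ratio_k)
```

The difference is the same on both scales, while the ratio changes as soon as the zero point moves, which is why ratios are reserved for the ratio scale.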
Ratio Scale
The ratio scale is the 4th level of measurement scale and is quantitative. It
allows researchers to compare differences or intervals and, because it has a true
origin or zero point, ratios as well.
Characteristics of Ratio Scale:
• Ratio scale has a feature of absolute zero
• It doesn’t have negative numbers, because of its zero-point feature
• It affords unique opportunities for statistical analysis. The values can be meaningfully added,
subtracted, multiplied, and divided. Mean, median, and mode can be calculated using the ratio
scale.
• Ratio scale has unique and useful properties. One such feature is that it allows unit
conversions such as kilograms to grams.
Example:
An example of a ratio scale is:
What is your weight in kg?
Less than 55 kg
55 – 75 kg
76 – 85 kg
86 – 95 kg
More than 95 kg

DATA ANALYTICS ASSIGNMENT.pptx

  • 1. DATA ANALYTICS ASSIGNMENT NAME - SAMIR KUMAR MTECH (INDUSTRIAL ENGINEERING AND MANAGEMENT)
  • 2. Data analysis • Data analysis is defined as the technique that analyse the data to enhance the productivity and the business growth by involving process like cleansing, transforming, inspecting and modelling data to perform market analysis, to gather the hidden insight of the data, to improve business study and for the generation of the report based upon the available data using the data analysis tools such as Tableau, Power BI, R and Python, Apache Spark, etc. • It refers to the technique to analyze data to enhance productivity and grow business. It is the process of inspecting, cleansing, transforming, and modeling the data.
  • 3. Reality Raw data collection Data processing & Data cleaning Insight visualization Data product Data analysis & Models
  • 4. Why we Need Data Analysis? We need Data Analysis basically for the reasons mentioned below: • Gather hidden insights. • To generate reports based on the available data. • Perform market analysis. • Improvement of business Strategy.
  • 5. • Decision Science is the collection of quantitative techniques used to inform decision-making at the individual and population levels. • It includes decision analysis, risk analysis, cost-benefit and cost-effectiveness analysis, constrained optimization, simulation modeling, and behavioral decision theory, as well as parts of operations research, microeconomics, statistical inference, management control, cognitive and social psychology, and computer science Decision Science
  • 6.
  • 7. Data analytics • Data analytics is the collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision making. • Data analytics is often confused with data analysis. While these are related terms, they aren’t exactly the same. In fact, data analysis is a subcategory of data analytics that deals specifically with extracting meaning from data. Data analytics, as a whole, includes processes beyond analysis, including data science (using data to theorize and forecast) and data engineering (building data systems).
  • 8.
  • 9. So why Data Analytics? With Data Analytics businesses can understand hidden patterns and meanings within the behavior of the customer. For businesses, 1. Informed Decision Making. 2. More Effective Marketing 3. More Efficient Operations 4. Cutting Costs.
  • 10.
  • 11. What is Sampling? Sampling is a method that allows us to get information about the population based on the statistics from a subset of the population (sample), without having to investigate every individual.
  • 12. Why do we need Sampling? Sampling is done to draw conclusions about populations from samples, and it enables us to determine a population’s characteristics by directly observing only a portion (or sample) of the population. • Selecting a sample requires less time than selecting every item in a population • Sample selection is a cost-efficient method • Analysis of the sample is less cumbersome and more practical than an analysis of the entire population
  • 13. Population vs sample • The population is the entire group that you want to draw conclusions about. • The sample is the specific group of individuals that you will collect data from. The population can be defined in terms of geographical location, age, income, and many other characteristics.
  • 14. Learn how to determine sample size Stage 1: Consider your sample size variables 1. Population size 2. Margin of error (confidence interval) 3. Confidence level 4. Standard deviation Stage 2: Calculate sample size 5. Find your Z-score 6. Use the sample size formula
  • 15. Different Types of Sampling Techniques • Probability Sampling: In probability sampling, every element of the population has an equal chance of being selected. Probability sampling gives us the best chance to create a sample that is truly representative of the population • Non-Probability Sampling: In non-probability sampling, all elements do not have an equal chance of being selected. Consequently, there is a significant risk of ending up with a non- representative sample which does not produce generalizable results
  • 16. Types of Probability Sampling 1. Simple Random Sampling This is a type of sampling technique you must have come across at some point. Here, every individual is chosen entirely by chance and each member of the population has an equal chance of being selected. Simple random sampling reduces selection bias. One big advantage of this technique is that it is the most direct method of probability sampling. But it comes with a caveat – it may not select enough individuals with our characteristics of interest. Monte Carlo methods use repeated random sampling for the estimation of unknown parameters
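As a minimal sketch of simple random sampling, the snippet below draws individuals without replacement using Python's standard `random` module. The population of 20 numbered individuals and the helper name `simple_random_sample` are assumptions made for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)


population = list(range(1, 21))  # hypothetical individuals numbered 1..20
sample = simple_random_sample(population, 5, seed=42)
print(sample)
```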
  • 17. 2. Systematic Sampling In this type of sampling, the first individual is selected randomly and the others are selected using a fixed ‘sampling interval’. Let’s take a simple example to understand this. Say our population size is x and we have to select a sample of size n. Then the sampling interval is x/n, and the next individual we select is x/n positions away from the first. We select the rest in the same way. Systematic sampling is more convenient than simple random sampling. However, it can lead to bias if the ordering of the population has an underlying pattern that lines up with the sampling interval (though this is quite rare).
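The interval rule described above can be sketched as follows. The helper name `systematic_sample` is hypothetical, and the sketch assumes the population is held in an ordered list and that the interval is rounded down to a whole number.

```python
import random


def systematic_sample(population, n, seed=None):
    """Pick a random start, then every k-th individual, with k = x // n."""
    if n <= 0 or n > len(population):
        raise ValueError("n must be between 1 and the population size")
    k = len(population) // n   # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)   # random starting position within the first interval
    return [population[start + i * k] for i in range(n)]


population = list(range(1, 21))  # hypothetical individuals numbered 1..20
picked = systematic_sample(population, 5, seed=3)
print(picked)                    # five individuals, each 4 positions apart
```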
  • 18. 3. Stratified Sampling In this type of sampling, we divide the population into subgroups (called strata) based on traits like gender, category, etc., and then select the sample from each of these subgroups. We use this type of sampling when we want representation from all the subgroups of the population. However, stratified sampling requires proper knowledge of the characteristics of the population.
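A small sketch of stratified sampling follows. It assumes proportional allocation (the same fraction is drawn from every stratum), which the slide does not specify; the function name `stratified_sample` and the toy data are illustrative only.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, fraction, seed=None):
    """Draw the same fraction from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)  # group records by trait
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 12 people tagged "F" and 8 tagged "M"
people = [("F", i) for i in range(12)] + [("M", i) for i in range(8)]
chosen = stratified_sample(people, stratum_of=lambda p: p[0], fraction=0.25, seed=1)
print(chosen)
```

Every stratum is represented in the result, which is the property the slide highlights.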
  • 19. 4. Cluster Sampling In a clustered sample, we use the subgroups of the population as the sampling unit rather than individuals. The population is divided into subgroups, known as clusters, and a whole cluster is randomly selected to be included in the study. For example, suppose we have divided our population into 5 clusters of 4 individuals each and have taken the 4th cluster as our sample. We can include more clusters as per our sample size. This type of sampling is used when we focus on a specific region or area.
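The cluster example (5 clusters of 4 individuals) can be sketched like this. The dictionary layout and the helper name `cluster_sample` are assumptions for illustration; the key point is that whole clusters are drawn, not individuals.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster labels
    return {label: clusters[label] for label in chosen}


# Hypothetical population of 20 individuals split into 5 clusters of 4
clusters = {c: list(range(4 * (c - 1) + 1, 4 * c + 1)) for c in range(1, 6)}
selected = cluster_sample(clusters, 1, seed=0)
print(selected)
```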
  • 20. Types of Non-Probability Sampling 1. Convenience Sampling This is perhaps the easiest method of sampling because individuals are selected based on their availability and willingness to take part. Here, let’s say individuals numbered 4, 7, 12, 15 and 20 want to be part of our sample, and hence, we will include them in the sample. Convenience sampling is prone to significant bias, because the sample may not represent specific characteristics of the population, such as religion or gender.
  • 21. 2. Quota Sampling In this type of sampling, we choose items based on predetermined characteristics of the population. Suppose we have to select individuals whose number is a multiple of four for our sample. The individuals numbered 4, 8, 12, 16, and 20 are then already reserved for our sample. In quota sampling, the chosen sample might not be the best representation of the characteristics of the population that weren’t considered.
  • 22. 3. Judgment Sampling It is also known as selective sampling. It depends on the judgment of the experts when choosing whom to ask to participate. Suppose our experts believe that people numbered 1, 7, 10, 15, and 19 should be considered for our sample, as they may help us draw better inferences about the population. As you can imagine, judgment sampling is also prone to bias by the experts and may not necessarily be representative.
  • 23. 4.Snowball Sampling I quite like this sampling technique. Existing people are asked to nominate further people known to them so that the sample increases in size like a rolling snowball. This method of sampling is effective when a sampling frame is difficult to identify. Here, we had randomly chosen person 1 for our sample, and then he/she recommended person 6, and person 6 recommended person 11, and so on. 1->6->11->14->19 There is a significant risk of selection bias in snowball sampling, as the referenced individuals will share common traits with the person who recommends them.
  • 24. Statistics Statistics simply means numerical data; it is also the field of mathematics that deals with the collection, tabulation, and interpretation of numerical data.
  • 26. 1. Descriptive Statistics: Descriptive statistics describes a population or sample through numerical calculations, graphs, or tables. It is used to summarize data. It has two categories, given below. (a) Measures of central tendency: A measure of central tendency is a summary statistic that represents the center point or a typical value of a data set or sample. In statistics, there are three common measures of central tendency: (i) Mean: the average of all values in a sample set.
  • 27. (ii) Median: the central value of a sample set. The data set is ordered from lowest to highest value and the exact middle value is taken. (iii) Mode: the value that occurs most frequently in a sample set.
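The three measures of central tendency can be computed with Python's standard `statistics` module. The `scores` list below is made-up data chosen only to illustrate the definitions.

```python
from statistics import mean, median, mode

scores = [2, 3, 3, 5, 7, 10]  # hypothetical sample, already sorted

print(mean(scores))    # sum of values / number of values = 30 / 6 = 5
print(median(scores))  # even count, so average of the two middle values: (3 + 5) / 2 = 4.0
print(mode(scores))    # most frequent value: 3
```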
  • 28. (b) Measures of variability: A measure of variability, also known as a measure of dispersion, describes the variability in a sample or population. In statistics, there are three common measures of variability: (i) Range: a measure of how spread apart the values in a sample set or data set are. Range = Maximum value - Minimum value (ii) Variance: describes how much a random variable differs from its expected value; it is computed as the average of the squared deviations from the mean. $S^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ In this formula, $n$ represents the total number of data points, $\bar{x}$ the mean of the data points, and $x_i$ the individual data points. (iii) Standard deviation: a measure of the dispersion of a set of data from its mean; it is the square root of the variance. $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}$, where $\mu$ is the mean of the data points.
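A short sketch of the three measures of variability, dividing by n as in the formulas on this slide (the population form). The data values are made up for illustration; `pvariance` and `pstdev` from the standard `statistics` module use the same divide-by-n convention, so they serve as a cross-check.

```python
import math
from statistics import pvariance, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

value_range = max(data) - min(data)                    # Range = max - min
mean_value = sum(data) / len(data)                     # mean of the data points
variance = sum((x - mean_value) ** 2 for x in data) / len(data)
std_dev = math.sqrt(variance)                          # square root of the variance

print(value_range, variance, std_dev)
print(pvariance(data), pstdev(data))                   # library equivalents
```

For this data the range is 7, the variance is 4 and the standard deviation is 2.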
  • 29. 2. Inferential Statistics: • Inferential statistics makes inferences and predictions about a population based on a sample of data taken from that population. It generalizes from the sample to a larger dataset and applies probability to draw conclusions. • It is used to explain the meaning of descriptive statistics. • It is used to analyze and interpret results and to draw conclusions. • Inferential statistics is closely associated with hypothesis testing, which checks whether the sample provides enough evidence to reject the null hypothesis. • Hypothesis testing is a type of inferential procedure that uses sample data to evaluate the credibility of a hypothesis about a population. • Inferential statistics is generally used to determine how strong a relationship is within a sample. In practice, however, it is often very difficult to obtain a population list and draw a random sample. Inferential statistics can be carried out in the following steps: • Start with a theory. • Generate a research hypothesis. • Operationalize the variables. • Identify the population to which the study applies. • Form a null hypothesis for this population. • Collect a sample from the population and run the study. • Then, perform statistical tests to check whether the observed characteristics of the sample are sufficiently different from what would be expected under the null hypothesis, so that the null hypothesis can be rejected.
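As one possible illustration of the final testing step, the sketch below runs a one-sample t-test by hand. The measurements, the hypothesized mean of 50, and the helper names are all assumptions for illustration. The critical value 2.262 is the standard two-sided 5% value for 9 degrees of freedom taken from a t-table.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)  # stdev uses the n - 1 divisor
    return (mean(sample) - mu0) / standard_error, n - 1


def reject_null(t_statistic, critical_value):
    """Two-sided decision rule: reject H0 when |t| exceeds the critical value."""
    return abs(t_statistic) > critical_value


measurements = [52, 55, 49, 58, 54, 53, 57, 56, 51, 55]  # hypothetical sample
t_stat, dof = one_sample_t(measurements, mu0=50)
print(round(t_stat, 2), dof)
print(reject_null(t_stat, critical_value=2.262))
```

Here the sample mean is 54 and the t statistic is about 4.54, which exceeds 2.262, so the null hypothesis of a mean of 50 would be rejected at the 5% level.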
  • 30. Types of inferential statistics – Various types of inferential statistics are used widely nowadays and are very easy to interpret. These are given below: • One sample test of difference/One sample hypothesis test • Confidence Interval • Contingency Tables and Chi-Square Statistic • T-test or Anova • Pearson Correlation • Bi-variate Regression • Multi-variate Regression
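To make one item from this list concrete, here is a minimal sketch of the Pearson correlation coefficient, written out from its definition (covariance divided by the product of the spreads). The function name `pearson_r` and the toy data are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled to the range [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)


hours = [1, 2, 3, 4, 5]      # hypothetical x values
marks = [2, 4, 6, 8, 10]     # hypothetical y values, perfectly linear in x
print(pearson_r(hours, marks))        # close to 1 for a perfect positive relationship
print(pearson_r(hours, marks[::-1]))  # close to -1 for a perfect negative relationship
```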
  • 31. Prescriptive analytics: Prescriptive analytics is a process that analyzes data and provides instant recommendations on how to optimize business practices to suit multiple predicted outcomes. In essence, prescriptive analytics takes the “what we know” (data), comprehensively understands that data to predict what could happen, and suggests the best steps forward based on informed simulations. Predictive analytics: Predictive analytics applies mathematical models to the current data to inform (predict) future behavior. It is the “what could happen.”
  • 32. Types of Variables in Statistics 1. Quantitative Variables: Sometimes referred to as “numeric” variables, these are variables that represent a measurable quantity. Examples include: • Number of students in a class • Number of square feet in a house • Population size of a city • Age of an individual • Height of an individual 2. Qualitative Variables: Sometimes referred to as “categorical” variables, these are variables that take on names or labels and can fit into categories. Examples include: • Eye color (e.g. “blue”, “green”, “brown”) • Gender (e.g. “male”, “female”) • Breed of dog (e.g. “lab”, “bulldog”, “poodle”) • Level of education (e.g. “high school”, “Associate’s degree”, “Bachelor’s degree”) • Marital status (e.g. “married”, “single”, “divorced”)
  • 35. Nominal Scale A nominal scale is the 1st level of measurement scale in which the numbers serve as “tags” or “labels” to classify or identify the objects. A nominal scale usually deals with the non-numeric variables or the numbers that do not have any value. Characteristics of Nominal Scale • A nominal scale variable is classified into two or more categories. In this measurement mechanism, the answer should fall into either of the classes. • It is qualitative. The numbers are used here to identify the objects. • The numbers don’t define the object characteristics. The only permissible aspect of numbers in the nominal scale is “counting.” Example: An example of a nominal scale measurement is given below: What is your gender? M- Male F- Female Here, the variables are used as tags, and the answer to this question should be either M or F.
  • 36. Ordinal Scale The ordinal scale is the 2nd level of measurement that reports the ordering and ranking of data without establishing the degree of variation between them. Ordinal represents the “order.” Ordinal data is known as qualitative data or categorical data. It can be grouped, named and also ranked. Characteristics of the Ordinal Scale • The ordinal scale shows the relative ranking of the variables • It identifies and describes the magnitude of a variable • Along with the information provided by the nominal scale, ordinal scales give the rankings of those variables • The interval properties are not known • The surveyors can quickly analyse the degree of agreement concerning the identified order of variables Examples: • Ranking of school students: 1st, 2nd, 3rd, etc. • Ratings in restaurants • Evaluating the frequency of occurrences: Very often, Often • Assessing the degree of agreement: Totally agree, Agree, Totally disagree
  • 37. Interval Scale The interval scale is the 3rd level of measurement scale. It is defined as a quantitative measurement scale in which the difference between two values is meaningful. In other words, the variables are measured in exact rather than relative terms, and the zero point on the scale is arbitrary. Characteristics of Interval Scale: • The interval scale is quantitative as it can quantify the difference between the values • It allows calculating the mean and median of the variables • To understand the difference between the variables, you can subtract one value from another • The interval scale is the preferred scale in Statistics as it helps to assign numerical values to arbitrary assessments such as feelings, calendar types, etc. Example: • Likert Scale • Net Promoter Score (NPS) • Bipolar Matrix Table
  • 38. Ratio Scale The ratio scale is the 4th level of measurement scale, which is quantitative. It is a type of variable measurement scale. It allows researchers to compare the differences or intervals. The ratio scale has a unique feature: it has a true origin, or zero point. Characteristics of Ratio Scale: • Ratio scale has a feature of absolute zero • It doesn’t have negative numbers, because of its zero-point feature • It affords unique opportunities for statistical analysis. The variables can be meaningfully added, subtracted, multiplied, and divided. Mean, median, and mode can be calculated using the ratio scale. • Ratio scale has unique and useful properties. One such feature is that it allows unit conversions, such as between kilogram-calories and gram-calories. Example: An example of a ratio scale is: What is your weight in kg? • Less than 55 kg • 55 – 75 kg • 76 – 85 kg • 86 – 95 kg • More than 95 kg