SlideShare a Scribd company logo
1 of 49
“Data Science”
CS-325
Lecture 3
“Introduction to Statistical Inference”
Mr. Sana Ullah Khan
By:
Lecturer in Computer Science
Institute of Computing
KUST, Pakistan
Recommended and Reference Books
Topic: Introduction to Statistical
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Outline
• What Is Data?
• Categories Of Data
• What Is Statistics?
• Basic Terminologies In Statistics
• Sampling Techniques
• Types Of Statistics
• Descriptive Statistics
• Statistical Description of Data
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
What Is Data
• Data refers to facts and statistics collected together for reference or
analysis.
• Data can be collected, measured and analyzed. It can also be
visualized by using statistical models and graphs.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Categories Of Data…
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Categories Of Data…
• Qualitative Data: Qualitative data deals with characteristics and descriptors
that can’t be easily measured, but can be observed subjectively.
• such as smells, tastes, textures, attractiveness, and color.
• In statistics, qualitative data—sometimes referred to as categorical data—is
data that can be arranged into categories based on physical traits, gender,
colors or anything that does not have a number associated with it.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Categories Of Data…
Nominal Data: Data with no inherent order or ranking such as gender
or race.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Categories Of Data…
• Ordinal Data: Data with an ordered series of information is called ordinal
data.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Categories Of Data…
• Quantitative Data: Quantitative data deals with numbers and things you
can measure objectively. This is further divided into two:
• Discrete Data: Also known as categorical data, it can hold a finite number
of possible values.
• Example: Number of students in a class.
• Continuous Data: Data that can hold an infinite number of possible values.
• Example: Weight of a person.
• So these were the different categories of data. The upcoming sections will
focus on the basic Statistics concepts, so buckle up and get ready to do
some math.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Methods of Data Collection
In an observational study, a researcher observes and measures characteristics
of interest of part of a population.
In an experiment, a treatment is applied to part of a population, and
responses are observed.
A simulation is the use of a mathematical or physical model to reproduce the
conditions of a situation or process.
A survey is an investigation of one or more characteristics of a population.
A census is a measurement of an entire population.
A sampling is a measurement of part of a population.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Basics of Statistics
Definition: Science of collection, presentation, analysis, and reasonable interpretation of data.
• Statistics presents a rigorous scientific method for gaining insight into data.
• suppose we measure the weight of 100 patients in a study.
• With so many measurements, simply looking at the data fails to provide an informative account.
• However statistics can give an instant overall picture of data based on graphical presentation or
numerical summarization irrespective to the number of data points.
• Besides data summarization, another important task of statistics is to make inference and predict
relations of variables.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
What is Statistics
• Statistics is an area of applied mathematics concerned with data collection,
analysis, interpretation, and presentation.
• This area of mathematics deals with understanding how data can be used to
solve complex problems.
• Here are a couple of example problems that can be solved by using statistics:
• Your company has created a new drug that may cure cancer. How would you
conduct a test to confirm the drug’s effectiveness?
• You and a friend are at a baseball game, and out of the blue, he offers you a bet
that neither team will hit a home run in that game. Should you take the bet?
• The latest sales data have just come in, and your boss wants you to prepare a
report for management on places where the company could improve its
business. What should you look for? What should you not look for?
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Basic Terminologies In Statistics
• The two most important terminologies in statistics are population
and sample.
• Population: A collection or set of individuals or objects or events
whose properties are to be analyzed
• Sample: A subset of the population is called ‘Sample’. A well-chosen
sample will contain most of the information about a particular
population parameter.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Basic Terminologies In Statistics..
• Now you must be wondering how can one choose a sample that best
represents the entire population.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
A Taxonomy of Statistics
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Sampling Techniques…
• Sampling is a statistical method that deals with the selection of individual
observations within a population.
• It is performed to infer statistical knowledge about a population.
• Consider a scenario wherein you’re asked to perform a survey about the
eating habits of teenagers in the US. There are over 42 million teens in the
US at present and this number is growing as you read this blog.
• Is it possible to survey each of these 42 million individuals about their
health? Obviously not! That’s why sampling is used.
• It is a method wherein a sample of the population is studied in order to
draw inference about the entire population.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Sampling Techniques…
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Sampling Techniques…
• Probability Sampling: This is a sampling technique in which samples
from a large population are chosen using the theory of probability.
There are three types of probability sampling:
• Random Sampling: In this method, each member of the population
has an equal chance of being selected in the sample.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Sampling Techniques…
• Systematic Sampling: In Systematic sampling, every nth record is
chosen from the population to be a part of the sample. Refer the
below figure to better understand how Systematic sampling works.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Sampling Techniques…
• Stratified Sampling: In Stratified sampling, a stratum is used to form
samples from a large population. A stratum is a subset of the
population that shares at least one common characteristic. After this,
the random sampling method is used to select a sufficient number of
subjects from each stratum.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Types Of Statistics…
• There are two well-defined types of statistics:
• Descriptive Statistics
• Inferential Statistics
• Descriptive Statistics is a method used to describe and understand
the features of a specific data set by giving short summaries about
the sample and measures of the data.
• Descriptive Statistics is mainly focused upon the main characteristics
of data. It provides a graphical summary of the data.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Types Of Statistics…
• Suppose you want to gift all your classmate’s t-shirts.
• To study the average shirt size of students in a classroom, in descriptive statistics
you would record the shirt size of all students in the class and then you would find
out the maximum, minimum and average shirt size of the class.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Types Of Statistics…
• Inferential statistics makes inferences and predictions about a population
based on a sample of data taken from the population in question.
• Inferential statistics generalizes a large dataset and applies probability to
draw a conclusion.
• It allows us to infer data parameters based on a statistical model using
sample data.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Types Of Statistics…
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Types Of Statistics…
• So, if we consider the same example of finding the average shirt size
of students in a class, in Inferential Statistics, you will take a sample
set of the class, which is basically a few people from the entire class.
• You already have had grouped the class into large, medium and small.
In this method, you basically build a statistical model and expand it
for the entire population in the class.
• So that was a brief understanding of Descriptive and Inferential
Statistics.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Statistical Description of Data
• Statistics describes a numeric set of data by its
• Center
• Variability
• Shape
• Statistics describes a categorical set of data by
• Frequency, percentage or proportion of each category
• Measures of the center
• A measure of spread
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Measures of Central Tendency
• Measures of the center are statistical measures that represent the
summary of a dataset. There are three main measures of center:
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Measures of Central Tendency…
• Mean: The arithmetic mean (or, simply, mean) is computed by
summing all the observations in the sample and dividing the sum
by the number of
observations.
• For a sample of five household incomes, 6000, 10,000, 10,000,
14000, 50,000 the sample mean is,
X =
6000 + 10000 + 10000 + 14000 + 50000
5
= 18000
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Measures of Central Tendency…
• Median: In a list ranked from smallest measurement to the
highest, the median is the middle value
• In our example of five household incomes, first we rank the
measurements
• 6,000 10,000 10,000 14,000 50,000
• Sample Median is 10,000
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Measures of Central Tendency…
• Mode:
• In nominal data:
• The value which occurs with the greatest frequency
• 6,000 10,000 10,000 14,000 50,000
• Sample Mode is 10,000
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Measures of spread
• A measure of spread, sometimes also called a measure of dispersion,
is used to describe the variability in a sample or population.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Measures of spread…
• Range (present highest and lowest value in a distribution. The
difference between these values is the range)
• Range = Max(𝑥_𝑖) – Min(𝑥_𝑖)
• Variance: It describes how much a random variable differs from its
expected value. It entails
• Standard deviation (the square root of the variance)
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Sample Variance
2 i=1
n
i
2
s =
(x - x )
n -1

S = standard deviation
(square root of variance)
Measures of spread…
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Measures of spread…
• Variance of 5, 7, 3? Mean is (5+7+3)/3 = 5 and the variance is
4
1
3
)
5
7
(
)
5
3
(
)
5
5
( 2
2
2
Standard Deviation: Square root of the variance. The standard deviation
of the above example is 2.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Examples
7 7
7 7 7
7
7 8
7 7 7
6 3 2
7 8 13
9
Mean = 7
SD=0
Mean = 7
SD=0.63
Mean = 7
SD=4.04
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Examples…
Problem statement: Daenerys has 20 Dragons. They have the
numbers 9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9,
4. Work out the Standard Deviation.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Examples…
For a Normal distribution approximately,
a) 68% of the measurements fall within one standard deviation around the
mean
b) 95% of the measurements fall within two standard deviations around the
mean
c) 99.7% of the measurements fall within three standard deviations around
the mean
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Suppose the reaction time of a particular drug has a Normal distribution
with a mean of 10 minutes and a standard deviation of 2 minutes
Approximately,
a) 68% of the subjects taking the drug will have reaction time between
8 and 12 minutes
b) 95% of the subjects taking the drug will have reaction tome
between 6 and 14 minutes
c) 99.7% of the subjects taking the drug will have reaction tome
between 4 and 16 minutes
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Examples…
• Quartile: Quartiles tell us about the spread of a data set by breaking
the data set into quarters, just like the median breaks it in half.
• To better understand how quartile and the IQR are calculated, let’s
look at an example.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Examples…
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Examples…
• The first quartile (Q1) lies between the 25th and 26th observation.
• The second quartile (Q2) lies between the 50th and 51st observation.
• The third quartile (Q3) lies between the 75th and 76th observation.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Examples…
• Inter Quartile Range (IQR): It is the measure of variability, based on
dividing a data set into quartiles. The interquartile range is equal to
Q3 minus Q1, i.e. IQR = Q3 – Q1
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Frequency Distribution
Age 1 2 3 4 5 6
Frequency 5 3 7 5 4 2
Age Group 1-2 3-4 5-6
Frequency 8 12 6
Frequency Distribution of Age
Grouped Frequency Distribution of Age:
Consider a data set of 26 children of ages 1-6 years. Then the frequency distribution of
variable ‘age’ can be tabulated as follows:
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Frequency Distribution…
Age Group 1-2 3-4 5-6
Frequency 8 12 6
Cumulative Frequency 8 20 26
Age 1 2 3 4 5 6
Frequency 5 3 7 5 4 2
Cumulative Frequency 5 8 15 20 24 26
Cumulative frequency of data in previous page
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Graphical Data
• Two types of statistical presentation of data - graphical and numerical.
• Graphical Presentation: We look for the overall pattern and for striking
deviations from that pattern.
• Over all pattern usually described by shape, center, and spread of the data.
• An individual value that falls outside the overall pattern is called an outlier.
• Bar diagram and Pie charts are used for categorical variables.
• Histogram, stem and leaf and Box-plot are used for numerical variable.
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Graphical Data
Bar Diagram: Lists the categories and presents the percent or count of individuals who fall in
each category.
Treatment
Group
Frequency Proportion Percent
(%)
1 15 (15/60)=0.25 25.0
2 25 (25/60)=0.333 41.7
3 20 (20/60)=0.417 33.3
Total 60 1.00 100
0 20 40 60 80 100 120
1
2
3
Total
1 2 3 Total
Percent (%) 25 41.7 33.3 100
Proportion 0 0 0 1
Frequency 15 25 20 60
Chart Title
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Graphical Data
Pie Chart: Lists the categories and presents the percent or count of individuals who fall in each
category.
Treatment
Group
Frequency Proportion Percent
(%)
1 15 (15/60)=0.25 25.0
2 25 (25/60)=0.333 41.7
3 20 (20/60)=0.417 33.3
Total 60 1.00 100
Frequency
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Summary
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical
Thanks
Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
Topic: Introduction to Statistical

More Related Content

Similar to Lecture3 (1).pptx

Australian university teacher’s engagement with learning analytics: Still ea...
Australian university teacher’s engagement with learning analytics:  Still ea...Australian university teacher’s engagement with learning analytics:  Still ea...
Australian university teacher’s engagement with learning analytics: Still ea...Blackboard APAC
 
Carter ACSPRI July2016
Carter ACSPRI July2016Carter ACSPRI July2016
Carter ACSPRI July2016Jackie Carter
 
PSY4035 PG Literature searching for dissertation
PSY4035 PG Literature searching for dissertationPSY4035 PG Literature searching for dissertation
PSY4035 PG Literature searching for dissertationveades
 
Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014
Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014
Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014ICPSR
 
EBUS5423 Data Analytics and Reporting Bl
EBUS5423 Data Analytics and Reporting BlEBUS5423 Data Analytics and Reporting Bl
EBUS5423 Data Analytics and Reporting BlDr. Bruce A. Johnson
 
TOP UNIVERSITIES IN US FOR MS IN DATA SCIENCE
TOP UNIVERSITIES IN US FOR MS IN DATA SCIENCETOP UNIVERSITIES IN US FOR MS IN DATA SCIENCE
TOP UNIVERSITIES IN US FOR MS IN DATA SCIENCESKILL-LYNC SUPPORT
 
Auburn CSSE graduate student orientation
Auburn CSSE graduate student orientationAuburn CSSE graduate student orientation
Auburn CSSE graduate student orientationXiao Qin
 
ICPSR presentation for Research Advisory
ICPSR presentation for Research AdvisoryICPSR presentation for Research Advisory
ICPSR presentation for Research AdvisoryLynda Kellam
 
1.-Lecture-Notes-in-Statistics-POWERPOINT.pptx
1.-Lecture-Notes-in-Statistics-POWERPOINT.pptx1.-Lecture-Notes-in-Statistics-POWERPOINT.pptx
1.-Lecture-Notes-in-Statistics-POWERPOINT.pptxAngelineAbella2
 
Learning Analytics: Seeking new insights from educational data
Learning Analytics: Seeking new insights from educational dataLearning Analytics: Seeking new insights from educational data
Learning Analytics: Seeking new insights from educational dataAndrew Deacon
 
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...ICPSR
 
Case Study of Land-Grants and Retention of Students with Disabilities
Case Study of Land-Grants and Retention of Students with DisabilitiesCase Study of Land-Grants and Retention of Students with Disabilities
Case Study of Land-Grants and Retention of Students with Disabilitieslarachellesmith
 
[DSC Europe 22] Machine learning algorithms as tools for student success pred...
[DSC Europe 22] Machine learning algorithms as tools for student success pred...[DSC Europe 22] Machine learning algorithms as tools for student success pred...
[DSC Europe 22] Machine learning algorithms as tools for student success pred...DataScienceConferenc1
 
josirias_IS205_MajorAssignment
josirias_IS205_MajorAssignmentjosirias_IS205_MajorAssignment
josirias_IS205_MajorAssignmentJoshua Sirias
 
How to Design Research from Ilm Ideas on Slide Share
How to Design Research from Ilm Ideas on Slide Share How to Design Research from Ilm Ideas on Slide Share
How to Design Research from Ilm Ideas on Slide Share ilmideas
 

Similar to Lecture3 (1).pptx (20)

Australian university teacher’s engagement with learning analytics: Still ea...
Australian university teacher’s engagement with learning analytics:  Still ea...Australian university teacher’s engagement with learning analytics:  Still ea...
Australian university teacher’s engagement with learning analytics: Still ea...
 
Chapter 7 Knowing Our Data
Chapter 7 Knowing Our DataChapter 7 Knowing Our Data
Chapter 7 Knowing Our Data
 
Carter ACSPRI July2016
Carter ACSPRI July2016Carter ACSPRI July2016
Carter ACSPRI July2016
 
PSY4035 PG Literature searching for dissertation
PSY4035 PG Literature searching for dissertationPSY4035 PG Literature searching for dissertation
PSY4035 PG Literature searching for dissertation
 
Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014
Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014
Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014
 
EBUS5423 Data Analytics and Reporting Bl
EBUS5423 Data Analytics and Reporting BlEBUS5423 Data Analytics and Reporting Bl
EBUS5423 Data Analytics and Reporting Bl
 
Statistik
StatistikStatistik
Statistik
 
TOP UNIVERSITIES IN US FOR MS IN DATA SCIENCE
TOP UNIVERSITIES IN US FOR MS IN DATA SCIENCETOP UNIVERSITIES IN US FOR MS IN DATA SCIENCE
TOP UNIVERSITIES IN US FOR MS IN DATA SCIENCE
 
Auburn CSSE graduate student orientation
Auburn CSSE graduate student orientationAuburn CSSE graduate student orientation
Auburn CSSE graduate student orientation
 
ICPSR presentation for Research Advisory
ICPSR presentation for Research AdvisoryICPSR presentation for Research Advisory
ICPSR presentation for Research Advisory
 
1.-Lecture-Notes-in-Statistics-POWERPOINT.pptx
1.-Lecture-Notes-in-Statistics-POWERPOINT.pptx1.-Lecture-Notes-in-Statistics-POWERPOINT.pptx
1.-Lecture-Notes-in-Statistics-POWERPOINT.pptx
 
Learning Analytics: Seeking new insights from educational data
Learning Analytics: Seeking new insights from educational dataLearning Analytics: Seeking new insights from educational data
Learning Analytics: Seeking new insights from educational data
 
Aiec & csr presentation
Aiec & csr presentationAiec & csr presentation
Aiec & csr presentation
 
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...
 
Case Study of Land-Grants and Retention of Students with Disabilities
Case Study of Land-Grants and Retention of Students with DisabilitiesCase Study of Land-Grants and Retention of Students with Disabilities
Case Study of Land-Grants and Retention of Students with Disabilities
 
Chapter 1
Chapter 1Chapter 1
Chapter 1
 
Ca in patna
Ca in patnaCa in patna
Ca in patna
 
[DSC Europe 22] Machine learning algorithms as tools for student success pred...
[DSC Europe 22] Machine learning algorithms as tools for student success pred...[DSC Europe 22] Machine learning algorithms as tools for student success pred...
[DSC Europe 22] Machine learning algorithms as tools for student success pred...
 
josirias_IS205_MajorAssignment
josirias_IS205_MajorAssignmentjosirias_IS205_MajorAssignment
josirias_IS205_MajorAssignment
 
How to Design Research from Ilm Ideas on Slide Share
How to Design Research from Ilm Ideas on Slide Share How to Design Research from Ilm Ideas on Slide Share
How to Design Research from Ilm Ideas on Slide Share
 

Recently uploaded

CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 

Recently uploaded (20)

CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 

Lecture3 (1).pptx

  • 1. “Data Science” CS-325 Lecture 3 “Introduction to Statistical Inference” Mr. Sana Ullah Khan By: Lecturer in Computer Science Institute of Computing KUST, Pakistan
  • 2. Recommended and Reference Books Topic: Introduction to Statistical Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk
  • 3. Outline • What Is Data? • Categories Of Data • What Is Statistics? • Basic Terminologies In Statistics • Sampling Techniques • Types Of Statistics • Descriptive Statistics • Statistical Description of Data Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 4. What Is Data • Data refers to facts and statistics collected together for reference or analysis. • Data can be collected, measured and analyzed. It can also be visualized by using statistical models and graphs. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 5. Categories Of Data… Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 6. Categories Of Data… • Qualitative Data: Qualitative data deals with characteristics and descriptors that can’t be easily measured, but can be observed subjectively. • such as smells, tastes, textures, attractiveness, and color. • In statistics, qualitative data—sometimes referred to as categorical data—is data that can be arranged into categories based on physical traits, gender, colors or anything that does not have a number associated with it. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 7. Categories Of Data… Nominal Data: Data with no inherent order or ranking such as gender or race. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 8. Categories Of Data… • Ordinal Data: Data with an ordered series of information is called ordinal data. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 9. Categories Of Data… • Quantitative Data: Quantitative data deals with numbers and things you can measure objectively. This is further divided into two: • Discrete Data: Also known as categorical data, it can hold a finite number of possible values. • Example: Number of students in a class. • Continuous Data: Data that can hold an infinite number of possible values. • Example: Weight of a person. • So these were the different categories of data. The upcoming sections will focus on the basic Statistics concepts, so buckle up and get ready to do some math. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 10. Methods of Data Collection In an observational study, a researcher observes and measures characteristics of interest of part of a population. In an experiment, a treatment is applied to part of a population, and responses are observed. A simulation is the use of a mathematical or physical model to reproduce the conditions of a situation or process. A survey is an investigation of one or more characteristics of a population. A census is a measurement of an entire population. A sampling is a measurement of part of a population. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 11. Basics of Statistics Definition: Science of collection, presentation, analysis, and reasonable interpretation of data. • Statistics presents a rigorous scientific method for gaining insight into data. • suppose we measure the weight of 100 patients in a study. • With so many measurements, simply looking at the data fails to provide an informative account. • However statistics can give an instant overall picture of data based on graphical presentation or numerical summarization irrespective to the number of data points. • Besides data summarization, another important task of statistics is to make inference and predict relations of variables. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 12. What is Statistics • Statistics is an area of applied mathematics concerned with data collection, analysis, interpretation, and presentation. • This area of mathematics deals with understanding how data can be used to solve complex problems. • Here are a couple of example problems that can be solved by using statistics: • Your company has created a new drug that may cure cancer. How would you conduct a test to confirm the drug’s effectiveness? • You and a friend are at a baseball game, and out of the blue, he offers you a bet that neither team will hit a home run in that game. Should you take the bet? • The latest sales data have just come in, and your boss wants you to prepare a report for management on places where the company could improve its business. What should you look for? What should you not look for? Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 13. Basic Terminologies In Statistics • The two most important terminologies in statistics are population and sample. • Population: A collection or set of individuals or objects or events whose properties are to be analyzed • Sample: A subset of the population is called ‘Sample’. A well-chosen sample will contain most of the information about a particular population parameter. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 14. Basic Terminologies In Statistics.. • Now you must be wondering how can one choose a sample that best represents the entire population. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 15. A Taxonomy of Statistics Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 16. Sampling Techniques… • Sampling is a statistical method that deals with the selection of individual observations within a population. • It is performed to infer statistical knowledge about a population. • Consider a scenario wherein you’re asked to perform a survey about the eating habits of teenagers in the US. There are over 42 million teens in the US at present and this number is growing as you read this blog. • Is it possible to survey each of these 42 million individuals about their health? Obviously not! That’s why sampling is used. • It is a method wherein a sample of the population is studied in order to draw inference about the entire population. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 17. Sampling Techniques… Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 18. Sampling Techniques… • Probability Sampling: This is a sampling technique in which samples from a large population are chosen using the theory of probability. There are three types of probability sampling: • Random Sampling: In this method, each member of the population has an equal chance of being selected in the sample. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 19. Sampling Techniques… • Systematic Sampling: In Systematic sampling, every nth record is chosen from the population to be a part of the sample. Refer the below figure to better understand how Systematic sampling works. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 20. Sampling Techniques… • Stratified Sampling: In Stratified sampling, a stratum is used to form samples from a large population. A stratum is a subset of the population that shares at least one common characteristic. After this, the random sampling method is used to select a sufficient number of subjects from each stratum. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 21. Types Of Statistics… • There are two well-defined types of statistics: • Descriptive Statistics • Inferential Statistics • Descriptive Statistics is a method used to describe and understand the features of a specific data set by giving short summaries about the sample and measures of the data. • Descriptive Statistics is mainly focused upon the main characteristics of data. It provides a graphical summary of the data. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 22. Types Of Statistics… • Suppose you want to gift all your classmate’s t-shirts. • To study the average shirt size of students in a classroom, in descriptive statistics you would record the shirt size of all students in the class and then you would find out the maximum, minimum and average shirt size of the class. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 23. Types Of Statistics… • Inferential statistics makes inferences and predictions about a population based on a sample of data taken from the population in question. • Inferential statistics generalizes a large dataset and applies probability to draw a conclusion. • It allows us to infer data parameters based on a statistical model using sample data. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 24. Types Of Statistics… Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 25. Types Of Statistics… • So, if we consider the same example of finding the average shirt size of students in a class, in Inferential Statistics, you will take a sample set of the class, which is basically a few people from the entire class. • You already have had grouped the class into large, medium and small. In this method, you basically build a statistical model and expand it for the entire population in the class. • So that was a brief understanding of Descriptive and Inferential Statistics. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 26. Statistical Description of Data • Statistics describes a numeric set of data by its • Center • Variability • Shape • Statistics describes a categorical set of data by • Frequency, percentage or proportion of each category • Measures of the center • A measure of spread Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 27. Measures of Central Tendency • Measures of the center are statistical measures that represent the summary of a dataset. There are three main measures of center: Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 28. Measures of Central Tendency… • Mean: The arithmetic mean (or, simply, mean) is computed by summing all the observations in the sample and dividing the sum by the number of observations. • For a sample of five household incomes, 6000, 10,000, 10,000, 14000, 50,000 the sample mean is, X = 6000 + 10000 + 10000 + 14000 + 50000 5 = 18000 Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 29. Measures of Central Tendency… • Median: In a list ranked from smallest measurement to the highest, the median is the middle value • In our example of five household incomes, first we rank the measurements • 6,000 10,000 10,000 14,000 50,000 • Sample Median is 10,000 Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 30. Measures of Central Tendency… • Mode: • In nominal data: • The value which occurs with the greatest frequency • 6,000 10,000 10,000 14,000 50,000 • Sample Mode is 10,000 Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 31. Measures of spread • A measure of spread, sometimes also called a measure of dispersion, is used to describe the variability in a sample or population. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 32. Measures of spread… • Range (present highest and lowest value in a distribution. The difference between these values is the range) • Range = Max(𝑥_𝑖) – Min(𝑥_𝑖) • Variance: It describes how much a random variable differs from its expected value. It entails • Standard deviation (the square root of the variance) Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 33. Sample Variance 2 i=1 n i 2 s = (x - x ) n -1  S = standard deviation (square root of variance) Measures of spread… Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 34. Measures of spread… • Variance of 5, 7, 3? Mean is (5+7+3)/3 = 5 and the variance is 4 1 3 ) 5 7 ( ) 5 3 ( ) 5 5 ( 2 2 2 Standard Deviation: Square root of the variance. The standard deviation of the above example is 2. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 35. Examples 7 7 7 7 7 7 7 8 7 7 7 6 3 2 7 8 13 9 Mean = 7 SD=0 Mean = 7 SD=0.63 Mean = 7 SD=4.04 Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 36. Examples… Problem statement: Daenerys has 20 Dragons. They have the numbers 9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4. Work out the Standard Deviation. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 37. Examples… For a Normal distribution approximately, a) 68% of the measurements fall within one standard deviation around the mean b) 95% of the measurements fall within two standard deviations around the mean c) 99.7% of the measurements fall within three standard deviations around the mean Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 38. Suppose the reaction time of a particular drug has a Normal distribution with a mean of 10 minutes and a standard deviation of 2 minutes Approximately, a) 68% of the subjects taking the drug will have reaction time between 8 and 12 minutes b) 95% of the subjects taking the drug will have reaction tome between 6 and 14 minutes c) 99.7% of the subjects taking the drug will have reaction tome between 4 and 16 minutes Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 39. Examples… • Quartile: Quartiles tell us about the spread of a data set by breaking the data set into quarters, just like the median breaks it in half. • To better understand how quartile and the IQR are calculated, let’s look at an example. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 40. Examples… Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 41. Examples… • The first quartile (Q1) lies between the 25th and 26th observation. • The second quartile (Q2) lies between the 50th and 51st observation. • The third quartile (Q3) lies between the 75th and 76th observation. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 42. Examples… • Inter Quartile Range (IQR): It is the measure of variability, based on dividing a data set into quartiles. The interquartile range is equal to Q3 minus Q1, i.e. IQR = Q3 – Q1 Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 43. Frequency Distribution Age 1 2 3 4 5 6 Frequency 5 3 7 5 4 2 Age Group 1-2 3-4 5-6 Frequency 8 12 6 Frequency Distribution of Age Grouped Frequency Distribution of Age: Consider a data set of 26 children of ages 1-6 years. Then the frequency distribution of variable ‘age’ can be tabulated as follows: Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 44. Frequency Distribution… Age Group 1-2 3-4 5-6 Frequency 8 12 6 Cumulative Frequency 8 20 26 Age 1 2 3 4 5 6 Frequency 5 3 7 5 4 2 Cumulative Frequency 5 8 15 20 24 26 Cumulative frequency of data in previous page Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 45. Graphical Data • Two types of statistical presentation of data - graphical and numerical. • Graphical Presentation: We look for the overall pattern and for striking deviations from that pattern. • Over all pattern usually described by shape, center, and spread of the data. • An individual value that falls outside the overall pattern is called an outlier. • Bar diagram and Pie charts are used for categorical variables. • Histogram, stem and leaf and Box-plot are used for numerical variable. Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 46. Graphical Data Bar Diagram: Lists the categories and presents the percent or count of individuals who fall in each category. Treatment Group Frequency Proportion Percent (%) 1 15 (15/60)=0.25 25.0 2 25 (25/60)=0.333 41.7 3 20 (20/60)=0.417 33.3 Total 60 1.00 100 0 20 40 60 80 100 120 1 2 3 Total 1 2 3 Total Percent (%) 25 41.7 33.3 100 Proportion 0 0 0 1 Frequency 15 25 20 60 Chart Title Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 47. Graphical Data Pie Chart: Lists the categories and presents the percent or count of individuals who fall in each category. Treatment Group Frequency Proportion Percent (%) 1 15 (15/60)=0.25 25.0 2 25 (25/60)=0.333 41.7 3 20 (20/60)=0.417 33.3 Total 60 1.00 100 Frequency Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 48. Summary Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical
  • 49. Thanks Course: Data Science Course Code: CS-325 -- Instructor: Sana Ullah Khan, Lecturer. Institute of Computing, KUST -- Email: sana.ullah@kust.edu.pk Topic: Introduction to Statistical