Graph for Quantitative data
Histogram
• A histogram is a special kind of bar graph that applies to
quantitative data (discrete or continuous). The horizontal axis
represents the range of data values. The height of each bar
represents the frequency of data values falling within the
interval formed by the width of the bar. The bars are
pushed together with no spaces between them.
• Put another way, it is a diagram consisting of rectangles whose
area is proportional to the frequency of a variable and whose
width is equal to the class interval.
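The counting behind a histogram can be sketched in a few lines of Python. This is only an illustration: the scores, the starting value, and the class width are hypothetical, and the function name `histogram_counts` is made up for this example.

```python
from collections import Counter

def histogram_counts(values, start, width, n_bins):
    """Count how many values fall in each class interval [lo, lo + width)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(n_bins)]

# Hypothetical exam scores, grouped into class intervals of width 10.
scores = [52, 55, 61, 64, 67, 68, 70, 73, 75, 78, 81, 88]
bins = histogram_counts(scores, start=50, width=10, n_bins=4)

# Print a text histogram: one row per class interval, bar length = frequency.
for lo, hi, freq in bins:
    print(f"{lo}-{hi}: {'#' * freq} ({freq})")
```

Each row corresponds to one bar; because all class intervals have the same width here, bar height and bar area are both proportional to the frequency.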
Frequency polygon
• Frequency polygons are a graphical device for understanding
the shape of a distribution.
• Frequency polygons serve the same purpose as histograms
but are especially helpful for comparing sets of data
• Frequency polygons are also a good choice for displaying
cumulative frequency distributions
Describing variability
• Variability is the extent to which the data points in a
statistical distribution or data set diverge from the average
value, as well as the extent to which these data points differ
from each other
• Central tendency describes the central point of the
distribution, and variability describes how the scores are
scattered around that central point
• Variability can be measured with the range, the
interquartile range and the standard deviation/variance
Range
• The range is the total distance covered by the distribution,
from the highest score to the lowest score
• Range = maximum value – minimum value
Merits:-
• It is easy to compute.
• It can be used as a measure of variability where precision is
not required.
Demerits :-
• Its value depends on only two scores, the highest and the lowest
• It is not sensitive to the overall shape of the distribution
Variance
• Variance is the expected value of the squared deviation of a
random variable from its mean.
• Variance is used in statistics to understand how widely a
data set is spread out
Standard Deviation
• Standard deviation is simply the square root of the variance.
Standard deviation measures the typical distance
between a score and the mean
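As a worked example, the range, variance, and standard deviation can be computed directly from their definitions. The data values below are hypothetical, and the population form of the variance (dividing by n) is assumed; a sample variance would divide by n - 1 instead.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

# Range = maximum value - minimum value.
data_range = max(data) - min(data)

# Variance: the mean of the squared deviations from the mean (population form).
mean = statistics.mean(data)
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation: the square root of the variance.
std_dev = variance ** 0.5

print("range:", data_range)
print("variance:", variance)
print("standard deviation:", std_dev)
```

The standard library functions `statistics.pvariance` and `statistics.pstdev` give the same population values.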
The Interquartile Range
• The interquartile range is the distance covered by the middle
50% of the distribution, that is, the third quartile minus the
first quartile.
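A minimal sketch of the interquartile range, using hypothetical data. Textbooks differ on exactly how quartiles are interpolated; this example simply uses the default method of Python's `statistics.quantiles`, so other tools may report slightly different quartiles for the same data.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

# With n=4, statistics.quantiles returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)

# The interquartile range spans the middle 50% of the distribution.
iqr = q3 - q1
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

Unlike the range, the interquartile range ignores the most extreme scores, so a single unusually high or low value does not change it much.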
Variability for qualitative and Ranked Data
• Data is a collection of facts and figures which relay something
specific, but which are not organized in any way
• A data set is a collection of related records or information. The
information may be on some entity or some subject area
• A data set is a collection of data objects and their attributes.
Attributes capture the basic characteristics of an object
• Each row of a data set is called a record. Each data set also
has multiple attributes, each of which gives information on a
specific characteristic.
Qualitative And Quantitative data
• Qualitative :- Qualitative data provides information about the quality
of an object, or information which cannot be measured. Qualitative
data cannot be expressed as a number.
• Qualitative data is concerned with descriptions, which can be
observed but cannot be computed
• Qualitative data is also known as categorical data
• Qualitative data can be further subdivided into two types
• 1) Nominal data
• 2) Ordinal data
Scales of Measurement
Data
  Qualitative
    Numerical: Nominal, Ordinal
    Nonnumerical: Nominal, Ordinal
  Quantitative
    Numerical: Interval, Ratio
Quantitative data
• Quantitative data focuses on numbers and mathematical
calculations, and its values can be computed
• Quantitative data can be expressed as a number or
quantified
• There are two types of quantitative data :-
• 1) Interval data
• 2) Ratio data
Difference between Quantitative data and Qualitative data
Qualitative data | Quantitative data
Provides information about the quality of an object, or information which cannot be measured | Relates to information about the quantity of an object, hence it can be measured
Types: nominal data and ordinal data | Types: interval data and ratio data
Descriptive rather than numerical in nature | Expressed in numerical form
Example: The team is well prepared. | Example: The team has 7 players.
Advantages and Disadvantages of Qualitative data
• Advantages:-
• It supports in-depth analysis
• It avoids pre-judgements
• Disadvantages:-
• Time consuming
• Not easy to generalize
Advantages and Disadvantages of Quantitative data
• Advantages :-
• Easier to summarize and make comparisons
• It is often easier to obtain large sample size
• Disadvantages:-
• The cost is relatively high
• There is no accurate generalization of the data
Ranked data
• Ranked data is a variable whose values are drawn from an ordered
set and recorded in order of magnitude
• Ordinal refers to "order". Ordinal data is a kind of
qualitative (categorical) data
• Characteristics of ranked data:-
• 1) Ordinal data shows the relative ranking of the variables
• 2) The intervals between ranks are not known
• 3) It identifies and describes the magnitude of a variable
• Examples :-
• a) Level of agreement :- yes, no
• b) Time of day :- Morning, Afternoon, Evening, Night
Scale of measurement
• Scales of measurement are also known as levels of
measurement. Each scale has specific properties that
determine which statistical analyses are appropriate
• There are four types of scales of measurement
• Nominal
• Ordinal
• Interval
• Ratio
Nominal
• Data are labels or names used to identify an attribute of the
element.
• A nonnumeric label or numeric code may be used.
• Example:- Students of a university are classified by the dorm
that they live in using a nonnumeric label such as Farley,
Keenan, Zahm, Breen-Phillips, and so on.
Ordinal
• The data have the properties of nominal data and the order or
rank of the data is meaningful
• A nonnumeric label or numeric code may be used.
• Example :- Students of a university are classified by their class
standing using a nonnumeric label such as Freshman,
Sophomore, Junior, or Senior.
Interval
• The data have the properties of ordinal data, and the interval
between observations is expressed in terms of a fixed unit of
measure.
• Interval data are always numeric.
• Example :- Average Starting Salary Offer 2003
• Economics/Finance: $40,084
• History: $32,108
• Psychology: $27,454
Ratio
• The data have all the properties of interval data and the ratio of
two values is meaningful.
• Variables such as distance, height, weight, and time use the
ratio scale.
• This scale must contain a zero value that indicates that nothing
exists for the variable at the zero point.
• Example :- Econ & Finance majors' salaries are about 1.25 times
History majors' salaries and about 1.46 times Psychology
majors' salaries
Normal distribution and z-scores
• The normal distribution is a continuous probability
distribution that is symmetric about its mean.
• The normal distribution is often called the bell curve because
the graph of its probability density looks like a bell. It is also
known as the Gaussian distribution
• A normal distribution is determined by two parameters, the
mean and the variance. A normal distribution with mean 0
and standard deviation 1 is a standard normal distribution
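The properties above can be checked numerically with Python's standard library class `statistics.NormalDist`. This is a small illustration rather than part of the original slides; the variable names are arbitrary.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Symmetry: the density is equal at the same distance either side of the mean.
left = standard_normal.pdf(-1.0)
right = standard_normal.pdf(1.0)
print(left, right)

# Half of the probability lies below the mean.
below_mean = standard_normal.cdf(0.0)
print(below_mean)

# Probability of falling within one standard deviation of the mean.
within_one_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)
print(round(within_one_sd, 4))
```

The last value is the familiar figure of roughly 68% of observations lying within one standard deviation of the mean.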
Z-scores
• The z-score or standard score expresses how many standard
deviations a value lies from the mean
• A z-score consists of two parts
• a) a positive or negative sign indicating whether it is above or below the
mean
• b) a number indicating the size of its deviation from the mean in SD
units
Why are z-scores important?
• It is useful to standardize the values of a normal distribution
by converting them into z-scores
• Using the z-score technique, one can compare two different
test results based on relative performance, not on the
individual scale of each test
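A short sketch of how z-scores put two tests on a common footing. The test means, standard deviations, and scores are hypothetical, and `z_score` is an illustrative helper name.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

# Hypothetical results from two tests marked on different scales.
math_z = z_score(value=70, mean=60, std_dev=10)
english_z = z_score(value=80, mean=85, std_dev=5)

print("math z-score:", math_z)
print("english z-score:", english_z)

# The raw English mark is higher, but relative to each class the
# maths result is the better performance.
better = "math" if math_z > english_z else "english"
print("better relative performance:", better)
```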
Correlation
• Correlation refers to a relationship between two or more
objects. In statistics the word correlation refers to the
relationship between two variables
• Covariance is the extent to which a change in one variable
corresponds systematically to a change in the other
Types of correlation
• Positive and negative
• Simple and multiple
• Partial and total
• Linear and non-linear
• Positive correlation:- Association between variables such that high
scores on one variable tend to go with high scores on the other
variable. A direct relation between the variables.
• Negative correlation:- Association between variables such that high
scores on one variable tend to go with low scores on the other
variable. An inverse relation between the variables.
Simple and multiple
• Simple :- when only two variables are studied, the
relationship is described as a simple correlation
• Example:- quantity of money
• Multiple :- when more than two variables are studied
simultaneously, the relationship is described as a multiple
correlation
• Example:- the relationships of price
Partial and total correlation
• Partial correlation :- the analysis recognizes more than two
variables but considers only two of them, keeping the others
constant
• Total correlation :- Total correlation is based on all the
relevant variables, which is normally not feasible. In total
correlation, all the facts are taken into account
Linear and non-linear correlation
• Linear correlation : - correlation is said to be linear when the amount
of change in one variable tends to bear a constant ratio to the
amount of change in the other
• Non linear correlation :- correlation is said to be non linear if the
amount of change in one variable does not bear a constant ratio to
the amount of change in the other
Classification of correlation
• Two methods are used for finding relationship between variables:-
• Graphic methods
• Mathematical methods
• Graphic methods are further divided into scatter diagram and simple
graph
• Types of mathematical methods
• Karl Pearson's Coefficient of Correlation
• Spearman's Rank Correlation Coefficient
• Coefficient of concurrent deviation
• Method of least squares
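As one example of the mathematical methods listed, Spearman's rank correlation can be sketched with the textbook formula rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)), where d is the difference between the two ranks of each pair. The formula in this form assumes there are no tied values; the data and function names are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman's rank correlation via 1 - 6*sum(d^2) / (n(n^2 - 1)), no ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical paired observations.
hours = [1, 3, 5, 7, 9]
marks = [40, 55, 50, 70, 90]

rho = spearman_rho(hours, marks)
print("Spearman rho:", rho)
```

Because it works on ranks rather than raw values, this coefficient is suitable for ranked (ordinal) data as well as quantitative data.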
Coefficient of correlation
• Correlation : the degree of the relationship between the
variables under consideration is measured through
correlation analysis
• The measure of correlation is called the correlation coefficient
• If two variables vary such that movements in one are
accompanied by movements in the other, a cause-and-effect
relationship may be present, although correlation alone does not prove one
• The degree of the relationship is expressed by the coefficient r, where -1 <= r <= +1
Properties of correlation
• Correlation requires that both variables be
quantitative
• The correlation coefficient is always between -1 and
+1
• The correlation coefficient is a pure number without
units
• The correlation can be misleading in the presence of
outliers or non linear associations
• Correlation measures association
Scatter plots
• When two variables x and y have an association (or
relationship), we say there exists a correlation between them.
Alternatively, we could say x and y are correlated
• One variable is called independent (X) and the second is called
dependent(Y)
• Scatterplot is a graph in which the paired (x,y) sample data are
plotted with a horizontal x axis and vertical y axis
Advantages and Disadvantages of scatter diagram
• Advantages :-
• It is a simple and attractive method for finding out
the nature of correlation
• It is easy to understand
• The user gets a rough idea about the correlation
• It is not influenced by the size of extreme items
• It is the first step in investigating the relationship between two variables
• Disadvantages:-
• It cannot establish the exact degree of correlation
Correlation coefficient for quantitative data
• The product moment correlation, r, summarizes the strength of
association between two metric variables X and Y
• It is an index used to determine whether a linear (straight-line) relationship
exists between X and Y
• It measures the nature and strength of the relationship between two
quantitative variables
• The sign of r denotes the nature of the association, while the magnitude of r
denotes its strength
• The value of r ranges between -1 and +1
• If r = 0 -> no correlation
• If 0 < |r| < 0.25 -> weak correlation
• If 0.25 <= |r| < 0.75 -> intermediate correlation
• If 0.75 <= |r| < 1 -> strong correlation
• If |r| = 1 -> perfect correlation
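The product moment correlation and the strength bands listed above can be sketched as follows. The data are hypothetical, the function names are illustrative, and the bands are applied to the magnitude of r so that negative values are classified as well.

```python
from math import sqrt

def pearson_r(x, y):
    """Product moment correlation between paired samples x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def strength(r):
    """Classify the magnitude of r using the bands listed above."""
    size = abs(r)
    if size == 0:
        return "no correlation"
    if size < 0.25:
        return "weak"
    if size < 0.75:
        return "intermediate"
    if size < 1:
        return "strong"
    return "perfect"

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 4), strength(r))
```

The sign of `r` gives the direction of the association and `strength(r)` the band; reversing the sign of every y value flips the sign of r but leaves its band unchanged.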
Regression
• If the output to be predicted from an input X is continuous, this is
called a regression problem.
• Regression is concerned with the prediction of continuous quantities.
Linear regression is the oldest and most widely used predictive model in
the field of machine learning
• The goal of the regression is to fit a straight line to a set of data
points by minimizing the sum of the squared errors
• It is one of the supervised learning algorithms. A regression model
requires knowledge of both the dependent and the independent
variables in the training data set
• Simple linear regression is a statistical model in which there is only
one independent variable and the functional relationship between the
dependent variable and the regression coefficient is linear
• The regression line is the line which gives the best estimate of one
variable from the value of the other
Regression line of Y on X
• Y = a + bX
• Where
• a -> Y-intercept
• b -> slope of the line
• Y -> dependent variable
• X -> independent variable
Regression line
• A way of making a somewhat precise prediction based
upon the relationship between two variables
• The regression line is placed so that it minimizes the
predictive error
• A negative residual indicates that the model is
over-predicting
• A positive residual indicates that the model is
under-predicting
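The sign convention for residuals can be illustrated with a small sketch, taking residual = observed Y minus predicted Y. The line Y = 1 + 2X and the data points are hypothetical and chosen only to show one case of each sign.

```python
def residuals(x, y, a, b):
    """Residual = observed Y minus the value predicted by Y = a + bX."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical observations and an assumed regression line Y = 1 + 2X.
x = [1, 2, 3]
y = [3.5, 4.0, 7.5]
res = residuals(x, y, a=1.0, b=2.0)

for xi, r in zip(x, res):
    verdict = "under-predicting" if r > 0 else "over-predicting" if r < 0 else "exact"
    print(f"x={xi}: residual={r:+.1f} ({verdict})")
```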
Linear regression
• The simplest form of regression to visualize is linear
regression with a single predictor. A linear regression
technique can be used if the relationship between X and Y
can be approximated with a straight line
Non linear regression
• Non linear regression is used when the relationship between X and Y
cannot be approximated with a straight line
• In that case X and Y have a nonlinear relationship
• There are two important shortcomings of linear regression
• 1) Predictive ability :- the linear regression fit often has low bias but
high variance
• 2) Interpretative ability :- linear regression freely assigns a coefficient
to each predictor variable
Least Squares Regression Line
• The method of least squares estimates parameters by minimizing
the squared discrepancies between the observed data and the values
predicted by the model
• The least squares (LS) criterion states that the sum of the squares of
the errors is a minimum. When used for classification, the least-squares
solution yields outputs y(x) whose elements sum to 1, but does not
ensure that the outputs lie in the range [0,1]
• The process of obtaining parameter estimators is called estimation.
The least squares method is the estimation method used in Ordinary
Least Squares (OLS).
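For a single predictor, the least squares criterion has a well-known closed-form solution: the slope is b = Sxy / Sxx and the intercept is a = mean(Y) - b * mean(X). The sketch below applies it to hypothetical data; the function names are illustrative.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sum_squared_errors(x, y, a, b):
    """Sum of squared differences between observed and predicted Y."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
best = sum_squared_errors(x, y, a, b)
print("intercept:", round(a, 3), "slope:", round(b, 3), "SSE:", round(best, 3))

# Any other line gives a larger sum of squared errors.
print(sum_squared_errors(x, y, a + 0.5, b) > best)
print(sum_squared_errors(x, y, a, b - 0.2) > best)
```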
Disadvantages of least squares
• Lacks robustness to outliers
• Certain datasets are unsuitable for least squares classification
• Decision boundary corresponds to ML solution
Interpretation of R-square (R²)
• The following measures are used to validate the simple linear
regression models:
• Coefficient of determination (R-square)
• Outliers analysis
• Residual analysis to validate the regression model
• Hypothesis test for the regression coefficient b₁
Characteristics of R-square
• R-square (R²) is a proportion, so it is always a number between 0 and 1
• R² = 1 -> all of the data points fall perfectly on the regression
line. The predictor x accounts for all the variation in y!
• The coefficient of determination R² is a measure that assesses the
ability of a model to predict or explain an outcome in the linear
regression setting.
• A higher R² indicates that the model is a better fit
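R-square can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. In this sketch the data are hypothetical and the line Y = 2.2 + 0.6X is assumed to be the least-squares fit already obtained for them.

```python
def r_squared(y, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data and an assumed fitted line Y = 2.2 + 0.6X.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
predicted = [2.2 + 0.6 * xi for xi in x]

r2 = r_squared(y, predicted)
print("R-square:", round(r2, 3))

# If every point lay exactly on the line, R-square would be 1.
perfect = r_squared(predicted, predicted)
print("R-square for a perfect fit:", perfect)
```

Here about 60% of the variation in y is accounted for by the predictor x.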
Spurious regression
• The regression is spurious when we regress one random walk onto an
independent random walk
• The coefficient estimate will not converge toward 0
• The t value is most often significant.
• R² is typically very high
• Spurious regression is linked to serially
Hypothesis test for regression coefficient (t-test)
• The regression coefficient captures the existence of a linear
relationship between the response variable and the explanatory
variable
• Using the analysis of variance (ANOVA), we can test whether the overall
model is statistically significant
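For simple linear regression, the usual t statistic for the slope is t = b / SE(b), with SE(b) = sqrt(MSE / Sxx) and MSE = SSE / (n - 2). The sketch below computes only the statistic for hypothetical data; deciding significance still requires comparing |t| with a critical value of Student's t distribution with n - 2 degrees of freedom, which is not computed here.

```python
from math import sqrt

def slope_t_statistic(x, y):
    """t statistic for testing whether the slope of Y = a + bX is zero."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual sum of squares and the standard error of the slope.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    standard_error = sqrt(sse / (n - 2) / sxx)
    return b / standard_error

# Hypothetical data points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

t = slope_t_statistic(x, y)
print("t statistic for the slope:", round(t, 4))
```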
Residual analysis
• Residual (error) analysis is important for checking whether the
assumptions of the regression model have been satisfied. It checks whether:
• The residuals are normally distributed
• There are any outliers
• The functional form of the regression is correctly specified
• The variance of the residuals is constant
Multiple regression equation
• Multiple linear regression is an extension of linear regression which
allows a response variable y to be modelled as a linear function of
two or more predictor variables
• In a multiple regression model, two or more independent variables
(predictors) are involved in the model. Both the simple linear regression
model and the multiple regression model assume that the dependent
variable is continuous
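A minimal sketch of multiple linear regression with two predictors, fitted by solving the normal equations (X'X)b = X'y with a small Gaussian elimination routine. The data are hypothetical and constructed so that y = 1 + 2*x1 + 3*x2 exactly; the helper names are illustrative, and in practice a numerical library would normally be used instead.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution

def fit_multiple(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical predictors (x1, x2) and a response built as 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coefficients = fit_multiple(rows, y)
print([round(c, 6) for c in coefficients])
```

The result contains one intercept and one regression coefficient for each independent variable.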
Difference between Simple and Multiple regression
Simple regression | Multiple regression
One dependent variable Y predicted from one independent variable X | One dependent variable Y predicted from a set of independent variables (X1, X2, ..., Xn)
One regression coefficient | One regression coefficient for each independent variable

fundamentals of data science and analytics on descriptive analysis.pptx

  • 2. Histogram • A histogram is a special kind of bar graph that applies to quantitative data(discrete or continuous). The horizontal axis represents the range of data values. The bar height represents the frequency of data values falling within the interval formed by the width of the bar. The bars are also pushed together with no spaces between them. • A diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval.
  • 3. Frequency polygon • Frequency polygons are a graphical device for understanding the shapes of distributions. • Frequency polygons serve the same purpose as histograms but are especially helpful for comparing sets of data • Frequency polygons are also a good choice for displaying cumulative frequency distributions
  • 5. Describing variability • Variability is the extent to which data points in a statistical distribution or data set diverge from the average value, as well as the extent to which these data points differ from each other. Variability refers to the divergence of data from its mean value • Central tendency describes the central point of the distribution, and variability describes how the scores are scattered around that central point • Variability can be measured with the range, the interquartile range and the standard deviation/variance
  • 6. Range • The range is the total distance covered by the distribution, from the highest score to the lowest score • Range = maximum value – minimum value
  • 7. Merits:- • It is easy to compute. • It can be used as a measure of variability where precision is not required. Demerits :- • Its value depends on only two scores • It is not sensitive to the overall condition of the distribution
  • 8. Variance • Variance is the expected value of the squared deviation of a random variable from its mean. • Variance is used in statistics to understand how spread out a data set is
  • 9. Standard Deviation • Standard deviation is simply the square root of the variance. Standard deviation measures the typical distance between a score and the mean
  • 10. The Interquartile Range • The interquartile range is the distance covered by the middle 50% of the distribution.
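The measures of variability above can be computed with Python's standard `statistics` module. This is a sketch under stated assumptions: the data set is hypothetical, the variance is the population variance (matching the "expected value" definition), and the quartiles use the module's default "exclusive" method, so other quartile conventions may give a slightly different interquartile range.

```python
import statistics

# Hypothetical data set, chosen for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range = maximum value - minimum value
data_range = max(data) - min(data)

# Population variance: mean squared deviation from the mean
variance = statistics.pvariance(data)

# Standard deviation: square root of the variance
sd = statistics.pstdev(data)

# Interquartile range: distance covered by the middle 50% (Q3 - Q1)
q1, _median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print("range   :", data_range)
print("variance:", variance)
print("std dev :", sd)
print("IQR     :", iqr)
```

The range uses only the two extreme scores, while the variance and standard deviation use every score, which is why they are more sensitive to the overall distribution.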
  • 11. Variability for qualitative and Ranked Data • Data is a collection of facts and figures which convey something specific, but which are not organized in any way • A data set is a collection of related records or information. The information may be on some entity or some subject area • A data set is a collection of data objects and their attributes. Attributes capture the basic characteristics of an object • Each row of a data set is called a record. Each data set also has multiple attributes, each of which gives information on a specific characteristic.
  • 12. Qualitative And Quantitative data • Qualitative :- Qualitative data provides information about the quality of an object or information which cannot be measured. Qualitative data cannot be expressed as a number. • Qualitative data is concerned with descriptions which can be observed but cannot be computed • Qualitative data is also known as categorical data • Qualitative data can be further subdivided into two types • 1) Nominal data • 2) Ordinal data
  • 13. Scales of Measurement • Data • Qualitative: numerical or nonnumerical; nominal or ordinal scale • Quantitative: numerical; interval or ratio scale
  • 14. Quantitative data • Quantitative data focuses on numbers and mathematical calculations, and its values can be computed • Quantitative data can be expressed as a number or quantified • There are two types of quantitative data :- • 1) Interval data • 2) Ratio data
  • 15. Difference between Quantitative data and Qualitative data • Definition :- Qualitative data provides information about the quality of an object or information which cannot be measured / Quantitative data relates to information about the quantity of an object, hence it can be measured • Types :- Qualitative: nominal data and ordinal data / Quantitative: interval data and ratio data • Form :- Qualitative: descriptive rather than numerical in nature / Quantitative: expressed in numerical form • Example :- Qualitative: The team is well prepared / Quantitative: The team has 7 players.
  • 16. Advantages and Disadvantages of Qualitative data • Advantages:- • It helps in-depth analysis • It avoids pre-judgements • Disadvantages:- • Time consuming • Not easy to generalize
  • 17. Advantages and Disadvantages of Quantitative data • Advantages :- • Easier to summarize and make comparisons • It is often easier to obtain a large sample size • Disadvantages:- • The cost is relatively high • There is no accurate generalization of the data
  • 18. Ranked data • Ranked data is a variable whose values are taken from an ordered set and recorded in order of magnitude • Ordinal represents the “order”. Ordinal data is also known as qualitative data or categorical data • Characteristics of ranked data:- • 1) Ordinal data shows the relative ranking of the variables • 2) The interval properties are not known • 3) It identifies and describes the magnitude of a variable • Examples :- • a) Level of agreement :- yes, no • b) Time of day :- Morning, Afternoon, Evening, Night
  • 19. Scale of measurement • Scales of measurement are also known as levels of measurement. Each level of measurement has specific properties that determine which statistical analyses are appropriate • There are four types of scales of measurement • Nominal • Ordinal • Interval • Ratio
  • 20. Nominal • Data are labels or names used to identify an attribute of the element. • A nonnumeric label or numeric code may be used. • Example:- Students of a university are classified by the dorm that they live in using a nonnumeric label such as Farley, Keenan, Zahm, Breen-Phillips, and so on.
  • 21. Ordinal • The data have the properties of nominal data and the order or rank of the data is meaningful • A nonnumeric label or numeric code may be used. • Example :- Students of a university are classified by their class standing using a nonnumeric label such as Freshman, Sophomore, Junior, or Senior.
  • 22. Interval • The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure. • Interval data are always numeric. • Example :- Average Starting Salary Offer 2003 • Economics/Finance: $40,084 • History: $32,108 • Psychology: $27,454
  • 23. Ratio • The data have all the properties of interval data and the ratio of two values is meaningful. • Variables such as distance, height, weight, and time use the ratio scale. • This scale must contain a zero value that indicates that nothing exists for the variable at the zero point. • Example :- Econ & Finance major salaries are 1.24 times History major salaries and 1.46 times Psychology major salaries
  • 24. Normal distribution and z-scores • The normal distribution is a continuous probability distribution that is symmetric about its mean. • The normal distribution is often called the bell curve because the graph of its probability density looks like a bell; it is also known as the Gaussian distribution • A normal distribution is determined by two parameters: the mean and the variance. A normal distribution with mean 0 and standard deviation 1 is a standard normal distribution
  • 25. Z-scores • The z-score or standard score expresses how many standard deviations a value lies from the mean • A z-score consists of two parts • a) a positive or negative sign indicating whether it’s above or below the mean • b) a number indicating the size of its deviation from the mean in SD units
  • 26. Why are z-scores important? • It is useful to standardize the values of a normal distribution by converting them into z-scores • Using the z-score technique, one can compare two different test results based on relative performance, not on each test's individual scale
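A short sketch of how z-scores put two tests on a common footing, using the usual definition z = (x - mean) / SD. The test names, scores, means, and standard deviations below are hypothetical values for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical results: the same student on two differently scaled tests.
math_z = z_score(80, mean=70, sd=5)      # raw score 80 on the math test
reading_z = z_score(85, mean=80, sd=10)  # raw score 85 on the reading test

print("math z   :", math_z)
print("reading z:", reading_z)

# The raw reading score is higher, but relative to each class
# the math result is the stronger one.
better = "math" if math_z > reading_z else "reading"
print("stronger relative performance:", better)
```

The sign of each z-score says whether the result is above or below the mean, and the magnitude gives the distance in SD units.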
  • 27. Correlation • Correlation refers to a relationship between two or more objects. In statistics the word correlation refers to the relationship between two variables • Covariance is the extent to which a change in one variable corresponds systematically to a change in the other
  • 28. Types of correlation • Positive and negative • Simple and multiple • Partial and total • Linear and non-linear
  • 29. • Positive correlation:- Association between variables such that high scores on one variable tend to go with high scores on the other variable. A direct relation between the variables. • Negative correlation:- Association between variables such that high scores on one variable tend to go with low scores on the other variable. An inverse relation between the variables.
  • 30. Simple and multiple • Simple:- when only two variables are studied, the relationship is described as a simple correlation • Example:- quantity of money • Multiple :- when more than two variables are studied simultaneously, the relationship is described as a multiple correlation • Example:- the relationships of price
  • 31. Partial and total correlation • Partial correlation :- the analysis recognizes more than two variables but considers only two of them, keeping the others constant • Total correlation :- the analysis is based on all the relevant variables, with all the facts taken into account, which is normally not feasible
  • 32. Linear and non-linear correlation • Linear correlation : - correlation is said to be linear when the amount of change in one variable tends to bear a constant ratio to the amount of change in the other • Non linear correlation :- correlation is said to be non linear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other
  • 33. Classification of correlation • Two methods are used for finding relationship between variables:- • Graphic methods • Mathematical methods • Graphic methods are further divided into scatter diagram and simple graph • Types of mathematical methods • Karl Pearson's Coefficient of Correlation • Spearman's Rank Correlation Coefficient • Coefficient of concurrent deviation • Method of least squares
  • 34. Coefficient of correlation • Correlation : the degree of the relationship between the variables under consideration is measured through correlation analysis • The measure of correlation is called the correlation coefficient • If two variables vary so that movements in one are accompanied by movements in the other, the variables are said to be correlated; this alone does not establish a cause and effect relationship • The degree of the relationship is expressed by r, where -1 <= r <= +1
  • 35. Properties of correlation • Correlation requires that both variables be quantitative • The correlation coefficient is always between -1 and +1 • The correlation coefficient is a pure number without units • The correlation can be misleading in the presence of outliers or non linear associations • Correlation measures association
  • 36. Scatter plots • When two variables x and y have an association (or relationship), we say there exists a correlation between them. Alternatively, we could say x and y are correlated • One variable is called independent (X) and the second is called dependent(Y) • Scatterplot is a graph in which the paired (x,y) sample data are plotted with a horizontal x axis and vertical y axis
  • 37. Advantages and Disadvantages of the scatter diagram • Advantages :- • It is a simple and attractive method to find out the nature of correlation • It is easy to understand • The user gets a rough idea about the correlation • It is not influenced by the size of extreme items • It is the first step in investigating the relationship between two variables • Disadvantages:- • It cannot give an exact degree of correlation
  • 38. Correlation coefficient for quantitative data • The product moment correlation, r, summarizes the strength of association between two metric variables X and Y • It is an index used to determine whether a linear or straight-line relationship exists between X and Y • It measures the nature and strength of the association between two quantitative variables • The sign of r denotes the nature of the association, while the magnitude of r denotes its strength • The value of r ranges between -1 and +1
  • 39. • If r = 0 -> no correlation • If 0 < r < 0.25 -> weak correlation • If 0.25 <= r < 0.75 -> intermediate correlation • If 0.75 <= r < 1 -> strong correlation • If r = 1 -> perfect correlation
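A minimal sketch of the product moment correlation coefficient and the strength bands listed above. The paired data are hypothetical, and applying the bands to the absolute value of r (so that negative correlations are graded too) is an assumption, since the slide lists only positive values.

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation: co-deviation scaled by both spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Label |r| using the bands from the slide (assumption: bands apply to |r|)."""
    a = abs(r)
    if math.isclose(a, 0.0, abs_tol=1e-12):
        return "no correlation"
    if math.isclose(a, 1.0):
        return "perfect correlation"
    if a < 0.25:
        return "weak correlation"
    if a < 0.75:
        return "intermediate correlation"
    return "strong correlation"

# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(f"r = {r:.4f} -> {strength(r)}")
```

The sign of `r` gives the direction of the association and its magnitude gives the strength; `r` has no units because the units of X and Y cancel in the ratio.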
  • 40. Regression • If the output to be predicted is continuous, the task is called a regression problem. • Regression is concerned with the prediction of continuous quantities. Linear regression is one of the oldest and most widely used predictive models in the field of machine learning • The goal of regression is to fit a straight line to a set of data points by minimizing the sum of the squared errors • It is a supervised learning algorithm. A regression model requires knowledge of both the dependent and the independent variables in the training data set • Simple linear regression is a statistical model in which there is only one independent variable and the functional relationship between the dependent variable and the regression coefficients is linear • The regression line is the line which gives the best estimate of one variable for a given value of the other
  • 41. Regression line of Y on X • Y = a + bX • Where • a -> Y-intercept • b -> slope of the line • Y -> dependent variable • X -> independent variable
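The intercept a and slope b of the line Y = a + bX can be estimated from data with the standard least squares formulas. This is a sketch with hypothetical data; the function name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Least squares estimates (a, b) for the line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-deviation of x and y divided by the squared deviation of x.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b

# Hypothetical observations of an independent variable X and a dependent variable Y.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
print(f"Y = {a:.2f} + {b:.2f} X")

# Predict Y for a new X value with the fitted line.
y_hat = a + b * 6
print(f"predicted Y at X = 6: {y_hat:.2f}")
```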
  • 42. Regression line • A way of making a somewhat precise prediction based upon the relationship between two variables • The regression line is placed so that it minimizes the predictive error • A negative residual indicates that the model is over-predicting • A positive residual indicates that the model is under-predicting
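The sign convention for residuals can be checked directly, taking residual = observed value minus predicted value. In this sketch the data and the line coefficients (a = 2.2, b = 0.6) are hypothetical.

```python
# Hypothetical data and a hypothetical fitted line y = a + b * x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

residuals = []
labels = []
for x, y in zip(xs, ys):
    predicted = a + b * x
    residual = y - predicted  # observed minus predicted
    residuals.append(residual)
    # Negative residual: prediction is too high (over-predicting).
    # Positive residual: prediction is too low (under-predicting).
    labels.append("over" if residual < 0 else "under")
    print(f"x={x}  observed={y}  predicted={predicted:.1f}  "
          f"residual={residual:+.1f}  ({labels[-1]}-predicting)")

# Sum of squared residuals: the quantity the regression line minimizes.
sse = sum(r ** 2 for r in residuals)
print(f"sum of squared residuals: {sse:.2f}")
```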
  • 43. Linear regression • The simplest form of regression to visualize is linear regression with a single predictor. A linear regression technique can be used if the relationship between X and Y can be approximated with a straight line
  • 44. Non linear regression • Non linear regression is used when the relationship between X and Y cannot be approximated with a straight line • X and Y have a nonlinear relationship • There are two important shortcomings of linear regression • 1) Predictive ability :- the linear regression fit often has low bias but high variance • 2) Interpretative ability :- linear regression freely assigns a coefficient to each predictor variable
  • 45. Least Squares Regression Line • The method of least squares estimates parameters by minimizing the squared discrepancies between the observed data and the values predicted by the model • The least squares (LS) criterion states that the sum of the squares of the errors is a minimum. When used for classification, the least-squares solution yields outputs y(x) whose elements sum to 1, but it does not ensure that the outputs lie in the range [0,1] • The process of obtaining parameter estimators is called estimation. The least squares method is the estimation method of Ordinary Least Squares (OLS).
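For the simple line Y = a + bX, the least squares criterion can be written out and solved in closed form. The derivation below is the standard one; the notation (S for the sum of squared errors, bars for sample means, n observations indexed by i) is an assumption, since the slides do not fix symbols.

```latex
% Sum of squared errors for the line y = a + b x
S(a, b) = \sum_{i=1}^{n} \bigl( y_i - a - b x_i \bigr)^2

% Setting both partial derivatives to zero gives the normal equations
\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} \bigl( y_i - a - b x_i \bigr) = 0
\qquad
\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} x_i \bigl( y_i - a - b x_i \bigr) = 0

% Solving them yields the least squares estimates
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\qquad
a = \bar{y} - b \, \bar{x}
```

The first normal equation says the residuals sum to zero, which is why the fitted line always passes through the point of means.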
  • 46. Disadvantages of least squares • Lack of robustness to outliers • Certain datasets are unsuitable for least squares classification • Decision boundary corresponds to ML solution
  • 47. Interpretation of R(square) • The following measures are used to validate the simple linear regression models: • Coefficient of determination (R-square) • Outliers analysis • Residual analysis to validate the regression model • Hypothesis test for the regression coefficient b₁
  • 48. Characteristics of R-square • R-square is a proportion; it is always a number between 0 and 1 • R² = 1 -> all of the data points fall perfectly on the regression line. The predictor x accounts for all the variation in y! • The coefficient of determination R² is a measure that assesses the ability of a model to predict or explain the response in the linear regression setting. • A higher R² indicates that the model fits the data better
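R-square can be computed as one minus the ratio of unexplained variation to total variation. A minimal sketch, assuming the usual definition R² = 1 - SS_res / SS_tot; the data and the line coefficients are hypothetical.

```python
def r_squared(xs, ys, a, b):
    """Coefficient of determination for the line y = a + b * x."""
    mean_y = sum(ys) / len(ys)
    # Unexplained variation: squared residuals around the regression line.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared deviations around the mean of y.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data with a hypothetical least squares line y = 2.2 + 0.6 x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys, a=2.2, b=0.6)
print(f"R-square = {r2:.2f}")

# When every point lies on the line, R-square is exactly 1.
perfect = r_squared(xs, [2 * x for x in xs], a=0.0, b=2.0)
print(f"R-square for a perfect fit = {perfect:.2f}")
```

In simple linear regression this value equals the square of the correlation coefficient r between x and y.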
  • 49. Spurious regression • The regression is spurious when we regress one random walk onto an independent random walk • The coefficient estimate will not converge toward 0 • The t value is most often significant. • R² is typically very high • Spurious regression is linked to serially
  • 50. Hypothesis test for regression co-efficient (t-test) • The regression co-efficient captures the existence of a linear relationship between the response variable and the explanatory variable • Using the analysis of variance (ANOVA), we can test whether the overall model is statistically significant
  • 51. Residual analysis • Residual (error) analysis is important to check whether the assumptions of the regression model have been satisfied. It checks whether: • The residuals are normally distributed • There are any outliers • The functional form of the regression is correctly specified • The variance of the residuals is constant
  • 52. Multiple regression equation • Multiple linear regression is an extension of linear regression which allows a response variable y to be modelled as a linear function of two or more predictor variables • In a multiple regression model, two or more independent variables (predictors) are involved. Both the simple linear regression model and the multiple regression model assume that the dependent variable is continuous
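A sketch of multiple linear regression with two predictors, fitted by solving the least squares normal equations in pure Python. The data are hypothetical and were generated exactly from y = 1 + 2*x1 + 3*x2, so the fit should recover those coefficients; the helper names `solve` and `fit_multiple` are illustrative, and real work would normally use a numerical library instead.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_multiple(X, y):
    """Return [intercept, b1, b2, ...] minimizing the sum of squared errors."""
    rows = [[1.0] + list(r) for r in X]  # leading 1 for the intercept
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: two predictors per observation, y = 1 + 2*x1 + 3*x2.
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [9, 8, 19, 18, 26]

coef = fit_multiple(X, y)
print("intercept, b1, b2 =", [round(c, 6) for c in coef])
```

There is one regression coefficient per independent variable plus the intercept, in contrast to the single slope of simple regression.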
  • 53. Difference between Simple and Multiple regression • Simple regression :- One dependent variable Y predicted from one independent variable X • One regression coefficient • Multiple regression :- One dependent variable Y predicted from a set of independent variables (X1, X2, ..., Xn) • One regression coefficient for each independent variable