DESCRIBING DATA
ABSTRACT
• Descriptive Statistics.
• The information (data) from a sample or population can be visualized with
graphs or summarized with numbers.
• Either approach presents key information more simply than the raw data and
helps us understand how the data are distributed.
• Graphs can visually show the data distribution.
TYPES OF DATA
• Statistical data types: nominal, ordinal, discrete, and continuous.
• Integer (int): the most common numeric data type, used to store numbers without a fractional component (e.g. -707, 0, 707).
• Floating Point (float) ...
• Character (char) ...
• String (str or text) ...
• Boolean (bool) ...
• Enumerated type (enum) ...
• Array. ...
• Date.
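As a rough illustration of the storage types listed above, the sketch below holds one value of each kind in Python. The field names, the `Size` enumeration, and all values are hypothetical. Python has no separate character type, so a one-character string stands in for `char`.

```python
from datetime import date
from enum import Enum


class Size(Enum):
    """Hypothetical enumerated type with a fixed set of named values."""
    SMALL = 1
    MEDIUM = 2
    LARGE = 3


# One made-up record holding a value of each type from the list above.
record = {
    "count": 707,                  # integer: no fractional component
    "price": 7.07,                 # floating point
    "grade": "A",                  # character (a length-1 string in Python)
    "name": "sample",              # string
    "active": True,                # boolean
    "size": Size.MEDIUM,           # enumerated type
    "scores": [1, 2, 3],           # array (a Python list)
    "recorded": date(2024, 1, 1),  # date
}

# Map each field to the name of the Python type that stores it.
type_names = {field: type(value).__name__ for field, value in record.items()}
print(type_names)
```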
TYPES OF VARIABLES
• Independent ...
• Dependent variables...
• Active and attribute variables...
• Continuous...
• Discrete and categorical variables...
• Extraneous variables and Demographic …
DESCRIBING DATA WITH TABLES AND GRAPHS
• Tables and graphs are used to describe data by summarizing it and making it easier
to understand.
• Frequency distribution
• Bar chart
• Histogram
• Pie chart
• Scatter plot
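A frequency distribution, the first item in the list above, can be sketched in a few lines of Python. The survey responses and the helper name `frequency_distribution` are assumptions made for illustration. The printed `#` bars are only a crude text stand-in for a bar chart.

```python
from collections import Counter

# Hypothetical nominal data: answers to a favourite-colour survey.
responses = ["red", "blue", "red", "green", "blue", "red"]


def frequency_distribution(values):
    """Return (value, count, relative frequency) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(value, count, count / total)
            for value, count in counts.most_common()]


table = frequency_distribution(responses)
for value, count, share in table:
    # One '#' per observation gives a simple text bar chart.
    print(f"{value:<6} {count:>2} {share:6.1%} {'#' * count}")
```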
DESCRIBING DATA WITH AVERAGES
• The mean of a data set is the sum of all the data values divided by the
number of values.
• The sample mean x̄ is the point estimator of the population mean μ.
• The median of a data set is the value in the middle when the data items are
arranged in ascending order (with an even number of items, the average of the
two middle values).
• The mean is the most frequently used measure of central tendency, but it is
sensitive to extreme values, so the median is often preferred for skewed data.
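The definitions above can be checked with Python's standard `statistics` module. The sample values are made up and include one extreme value to show how differently the two averages respond to it.

```python
import statistics

# Hypothetical sample with one extreme value (400).
incomes = [30, 32, 35, 38, 40, 400]

# Mean: sum of the values divided by how many there are.
mean_value = statistics.mean(incomes)

# Median: middle of the sorted values; with six items it is the
# average of the 3rd and 4th values, 35 and 38.
median_value = statistics.median(incomes)

print(f"mean   = {mean_value:.2f}")
print(f"median = {median_value:.2f}")
```

In this sample the single large value pulls the mean well above most of the data, while the median stays near the bulk of the values.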
DESCRIBING VARIABILITY
• Variability describes how far apart data points lie from each other and from the
center of a distribution.
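• Four measures are commonly used to describe variability in a data set: the
range, the interquartile range, the variance, and the standard deviation. The
first two are defined here:
• Range – The range is the difference between the largest and smallest values
in the set.
• Interquartile range – The interquartile range is a number that indicates how
spread out the scores are: it is the range of the middle half of the scores
(the third quartile minus the first quartile).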
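A minimal sketch of the range and the interquartile range, using made-up scores. Quartile conventions differ between textbooks and libraries. The `method="inclusive"` option of `statistics.quantiles` is just one of them and is an assumption here, not something the slides specify.

```python
import statistics

# Hypothetical scores, already in ascending order for readability.
scores = [2, 4, 4, 5, 7, 9, 10, 12]

# Range: largest value minus smallest value.
data_range = max(scores) - min(scores)

# With n=4, quantiles() returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# Interquartile range: spread of the middle half of the data.
iqr = q3 - q1

print(f"range = {data_range}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```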
NORMAL DISTRIBUTIONS AND STANDARD (Z) SCORES
• Normal distribution, also known as the Gaussian distribution, is a probability
distribution that is symmetric about the mean, showing that data near the
mean are more frequent in occurrence than data far from the mean.
• The normal distribution appears as a “bell curve” when graphed.
• A normal distribution is a type of continuous probability distribution in which
most data points cluster toward the middle of the range, while the rest taper
off symmetrically toward either extreme.
• The center of the distribution is its mean, which for a normal distribution
is also its median and mode.
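The heading mentions standard (z) scores, which are conventionally defined as z = (x − μ) / σ: the number of standard deviations a value lies from the mean. The sketch below applies that standard definition to made-up heights, treating the list as a whole population (hence `pstdev`). It then uses `statistics.NormalDist` to find the share of a standard normal distribution within one standard deviation of the mean.

```python
import statistics

# Hypothetical heights in centimetres, treated as a full population.
heights = [150, 160, 170, 180, 190]

mu = statistics.mean(heights)       # population mean
sigma = statistics.pstdev(heights)  # population standard deviation

# z score: distance from the mean measured in standard deviations.
z_scores = [(x - mu) / sigma for x in heights]

# Standard normal distribution (mean 0, standard deviation 1).
standard_normal = statistics.NormalDist(mu=0.0, sigma=1.0)

# Probability that a value falls within one standard deviation of the mean.
within_one_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

print([round(z, 3) for z in z_scores])
print(f"share within 1 sd: {within_one_sd:.4f}")
```

After standardizing, the z scores have mean 0 and standard deviation 1, which is what makes values from different scales comparable.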
CONCLUSION
• Analyzing data without statistical methods fails to account for
variability and can therefore produce wrong estimates.
• Data science is the study of data to extract meaningful insights for business.
• It combines principles and practices from the fields of mathematics,
statistics, artificial intelligence, and computer engineering.
• Data processing has become a crucial aspect of modern computing and
business operations, allowing organizations to store, analyze, and manipulate
data.
THANK YOU !

Presentation.pdf describing data with foundation of data science
