This presentation will help you understand basic concepts of statistics such as data types, levels of measurement, central tendency, dispersion, graphs, univariate analysis, bivariate analysis and more. It will also help you select appropriate summary statistics and charts for your data.
2. SESSION FLOW
What is Statistics?
Population & Sample
What is Data?
Types of Data
Levels of Measurement
Summary Statistics
Types of Charts
Presentation of data
Univariate Analysis
Bivariate Analysis
3. Statistics
Statistics is the science concerned with developing and
studying methods for collecting, analysing, interpreting
and presenting data.
4. Population & Sample
Population is the entire group that you want to draw conclusions about.
Sample is a subset of a population that contains characteristics of that population.
6. What is Data?
Data is a collection of facts or information from which
conclusions may be drawn.
7. Data
Laal Singh Chaddha (Aamir Khan) is that passenger on your train who has a lot of
stories to tell, even if you don’t want to be part of it. That’s how the story starts by
Laal making the viewers the co-passengers on a train to Chandigarh and starting to
narrate his journey from a dim-witted guy wearing leg-braces to the front-page
celebrity of a famous magazine. Laal grows up with just one person Rupa (Kareena
Kapoor Khan) who actually gets him after his mother (Mona Singh).
Cust ID | Gender | Age | Region | Source | Payment | Product | Amount | Time Of Day
10001 | Male | 38 | East | TV advt | Credit Card | Books | 617 | 22:19
10002 | Female | 25 | West | Email | Paypal | Clothing | 3083 | 13:27
10003 | Male | 24 | North | Email | Net Banking | Grocery | 1762 | 14:27
10004 | Male | 33 | West | Email | Paypal | Home Kitchen | 2248 | 15:38
10005 | Male | 21 | South | TV advt | Cash On Delivery | Grocery | 1299 | 15:21
10006 | Male | 28 | West | Web | Paypal | Mobile | 13041 | 13:11
10007 | Male | 20 | East | Email | Paypal | Mobile | 14455 | 21:59
10008 | Female | 20 | West | TV advt | Credit Card | Home Kitchen | 13090 | 04:04
10009 | Female | 38 | West | TV advt | Cash On Delivery | Grocery | 16322 | 19:35
10010 | Male | 26 | South | Newspaper | Credit Card | Grocery | 11716 | 13:26
10011 | Female | 27 | South | Newspaper | Paypal | Home Kitchen | 18176 | 14:17
10012 | Male | 45 | East | Newspaper | Credit Card | Books | 15505 | 01:01
10013 | Male | 58 | North | Email | Cash On Delivery | Books | 21649 | 10:04
10014 | Male | 49 | East | Email | Debit Card | Home Kitchen | 18227 | 09:09
10015 | Female | 29 | West | Email | Net Banking | Clothing | 10971 | 05:05
10016 | Male | 19 | West | TV advt | Credit Card | Clothing | 12956 | 20:29
8. Types of Data
Qualitative or Attribute data - the characteristic being studied is nonnumeric.
E.g.: Gender, religious affiliation, state of birth, condition of patient, words, images, videos.
Quantitative data - the characteristic being studied is numeric.
E.g.: time (in seconds) for a 400 m race, age of a corona patient, no. of WBC in a blood sample.
9. Quantitative Data
Discrete variables can only assume certain values.
E.g.: no. of pregnancies, no. of missing teeth in children of a school, no. of visits made by a doctor, the number of goals in a football match, the number of wickets taken by a bowler in a cricket match.
Continuous variables can assume any value within a specified range.
E.g.: the height of an athlete, the weight of a boxer, skull circumference, diastolic blood pressure, serum cholesterol.
12. Nominal-Level Data
Properties:
• Observations of a qualitative variable can only be classified and counted.
• There is no particular order to the labels.
E.g. Blood group, Marital status, Eye colour, Gender, Religion, Favorite beverage, Group membership
13. Ordinal-Level Data
Properties:
• Data classifications are represented by sets of
labels or names (high, medium, low) that have
relative values.
• Because of the relative values, the data
classified can be ranked or ordered.
E.g. Stage of disease, Severity of pain, level of
satisfaction, Likert scale
14. Interval-Level Data
Properties:
• Data classifications are ordered according to the amount of the characteristic they possess.
• Equal differences in the characteristic are represented by equal differences in the measurements.
• The zero point is arbitrary, so ratios between two values are not meaningful.
E.g. Temperature in Celsius or Fahrenheit, SAT score, Shoe size, Dress size, distance from landmark, geographical coordinates (longitudes, latitudes)
15. Ratio-Level Data
Properties:
• Data classifications are ordered according to the amount of the characteristics they possess.
• Equal differences in the characteristic are represented by equal differences in the numbers assigned to the classifications.
• The zero point is the absence of the characteristic and the ratio between two numbers is meaningful.
E.g. Head circumference, Time until death, Height, Weight, Kelvin temperature
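The practical difference between interval-level and ratio-level data can be checked with a few lines of Python. This is only an illustrative sketch: the two temperature readings are made-up values, not data from the slides. Differences agree on the Celsius and Kelvin scales, but ratios are only meaningful on Kelvin, which has a true zero.

```python
# Illustrative readings in degrees Celsius (made-up values, not from the slides).
warm_c, cool_c = 20.0, 10.0

# Kelvin has a true zero point, so it is a ratio-level scale.
warm_k, cool_k = warm_c + 273.15, cool_c + 273.15

# Differences agree on both scales (the interval property).
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios disagree: "twice as hot" is not a valid statement in Celsius.
ratio_c = warm_c / cool_c
ratio_k = warm_k / cool_k

print(f"difference: {diff_c:.2f} (Celsius) vs {diff_k:.2f} (Kelvin)")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```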
23. Pie Chart
The pie (circle) represents 100% of the variable and is divided into sectors.
The area of each sector is proportional to the relative frequency of the category it represents.
24. Bar Chart
Bar graphs are commonly used to represent categorical variables. They can be vertical or horizontal and can show the frequency or the percentage of each category.
25. Histogram
It is similar to the bar chart, but there are no gaps between the bars as the variable is continuous.
Each bar of the histogram covers a range of values (a bin) of the variable; in most cases, all bins are given the same width.
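As a rough sketch of how the bars of a histogram are obtained, the Python snippet below groups the Age column of the customer table on the Data slide into equal-width bins and prints a text histogram. The bin width of 10 years is an assumption made only for illustration; the slides do not prescribe one.

```python
from collections import Counter

# Age column of the customer table shown on the "Data" slide.
ages = [38, 25, 24, 33, 21, 28, 20, 20, 38, 26, 27, 45, 58, 49, 29, 19]

BIN_WIDTH = 10  # assumed bin width, chosen only for illustration

# Map each age to the lower edge of its bin, e.g. 38 -> 30.
counts = Counter((age // BIN_WIDTH) * BIN_WIDTH for age in ages)

# One row per bin; the bar length is the frequency of that bin.
for lower in sorted(counts):
    upper = lower + BIN_WIDTH - 1
    print(f"{lower}-{upper}: {'#' * counts[lower]}")
```

A different bin width gives a differently shaped histogram, so it is worth trying more than one.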
26. Scatter Diagram
If we have two variables that are
numerical, the relationship between
them can be illustrated using a scatter
diagram.
It plots one variable against the other in
a two-way diagram. One variable is
represented on the horizontal axis and
the other is plotted on the vertical axis
with each dot representing one case.
27. Box-Whisker Plot
The boxplot (also called Box and Whisker plot) is used to summarize numerical
variables based on the five-number summary.
Those five numbers are minimum, maximum, median, upper quartile, and lower
quartile.
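The five-number summary behind a box plot can be sketched in Python with the standard library. The example uses the Amount column of the customer table on the Data slide. Note that quartiles can be computed under several conventions; `statistics.quantiles` with its default method is used here as one reasonable choice, and other software may report slightly different quartiles.

```python
from statistics import quantiles

# Amount column of the customer table shown on the "Data" slide.
amounts = [617, 3083, 1762, 2248, 1299, 13041, 14455, 13090,
           16322, 11716, 18176, 15505, 21649, 18227, 10971, 12956]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median) and Q3.
q1, q2, q3 = quantiles(amounts, n=4)

five_number = {
    "minimum": min(amounts),
    "lower quartile": q1,
    "median": q2,
    "upper quartile": q3,
    "maximum": max(amounts),
}

for name, value in five_number.items():
    print(f"{name}: {value}")
```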
28. Which Chart?
Variable | Only one variable | Paired with a scale variable | Paired with a categorical variable
Scale | Histogram | Scatter plot | Box-plot
Categorical | Pie / Bar | Box-plot | Multiple / Stacked
31. Univariate Analysis
Univariate analysis is the most basic kind of statistical analysis: the data contains just one variable. Its main objective is to describe the data in order to find patterns in it.
Some of the measures in Univariate Analysis:
• Central Tendency
• Dispersion
• Skewness
• Kurtosis
32. Central Tendency
Measures that indicate the approximate centre of the data are called Measures of Central Tendency.
The Mean of a variable can be computed as the sum of the observed values divided by the number of observations.
The Median is the point at the centre of the ordered data, where half of the values are above it and half are below it.
The Mode is the most frequently occurring value in the dataset.
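The three measures can be computed with Python's standard `statistics` module. The sketch below uses the Age column of the customer table on the Data slide; `multimode` is used instead of `mode` because a dataset may have more than one most frequent value.

```python
from statistics import mean, median, multimode

# Age column of the customer table shown on the "Data" slide.
ages = [38, 25, 24, 33, 21, 28, 20, 20, 38, 26, 27, 45, 58, 49, 29, 19]

centre = {
    "mean": mean(ages),                 # sum of values / number of values
    "median": median(ages),             # middle of the sorted values
    "modes": sorted(multimode(ages)),   # all most frequent values
}

print(centre)
```

Here the mean is pulled above the median by a few older customers, which is the usual sign of a right-skewed variable.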
33. Dispersion
Measures that describe the spread of the data around the central tendency are Measures of Dispersion.
The Range is simply the difference between the largest and smallest values.
The Inter-Quartile Range is simply the difference between the upper quartile and the lower quartile.
The Variance is the average of the squared deviations from the mean.
Standard deviation is calculated as the square root of the variance.
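A minimal Python sketch of these four measures, again using the Age column of the customer table on the Data slide. Two choices here are assumptions rather than something the slides specify: the variance divides by n (the population form, matching "average of squared deviations"), whereas many packages divide by n - 1 for a sample, and the quartiles follow the default convention of `statistics.quantiles`.

```python
from math import sqrt
from statistics import quantiles

# Age column of the customer table shown on the "Data" slide.
ages = [38, 25, 24, 33, 21, 28, 20, 20, 38, 26, 27, 45, 58, 49, 29, 19]

n = len(ages)
mean_age = sum(ages) / n

# Range: largest value minus smallest value.
value_range = max(ages) - min(ages)

# Inter-quartile range: upper quartile minus lower quartile.
q1, _, q3 = quantiles(ages, n=4)
iqr = q3 - q1

# Variance: average squared deviation from the mean (population form, divides by n).
variance = sum((x - mean_age) ** 2 for x in ages) / n

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(f"range={value_range}, IQR={iqr}, variance={variance:.4f}, SD={std_dev:.4f}")
```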
35. Kurtosis
Kurtosis is a statistical measure used to describe the degree to which
observations cluster in the tails or the peak of a frequency distribution.
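One common way to compute kurtosis is the moment-based form: the fourth central moment divided by the square of the second. The sketch below assumes that definition, which the slide does not spell out; statistical packages often report "excess" kurtosis (this value minus 3) or apply sample-size corrections. Both example datasets are made up for illustration. A normal distribution has a kurtosis of 3 under this definition.

```python
def kurtosis(values):
    """Moment-based kurtosis: fourth central moment / (second central moment)^2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2


# Made-up illustrative data: evenly spread values vs. values with extreme tails.
flat = [1, 2, 3, 4, 5]
heavy_tailed = [-10, -1, 0, 0, 0, 0, 0, 1, 10]

print(f"flat:         {kurtosis(flat):.3f}")          # below 3
print(f"heavy-tailed: {kurtosis(heavy_tailed):.3f}")  # above 3
```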
36. Choosing Summary Statistics
Type of Variable | Sub-type | Summary statistic (measure of spread)
Scale | Normally distributed | Mean (Standard deviation)
Scale | Skewed data | Median (Interquartile range)
Categorical | Ordinal | Median (Interquartile range)
Categorical | Nominal | Mode (None)
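The selection rule above can be sketched as a small Python function. The function name `summarise`, its parameters and the level labels are hypothetical, introduced only for this illustration. Whether a scale variable counts as skewed is left to the caller, and ordinal data is assumed to be coded as ordered numbers. The example data are the Age and Region columns of the customer table on the Data slide.

```python
from statistics import mean, median, multimode, pstdev, quantiles


def summarise(values, level, skewed=False):
    """Return the centre and spread suggested for the given type of variable.

    level is one of "scale", "ordinal" or "nominal" (hypothetical labels).
    """
    if level == "scale" and not skewed:
        # Population standard deviation; use statistics.stdev for a sample.
        return {"centre": mean(values), "spread": pstdev(values)}
    if level in ("scale", "ordinal"):
        q1, _, q3 = quantiles(values, n=4)
        return {"centre": median(values), "spread": q3 - q1}
    if level == "nominal":
        return {"centre": multimode(values), "spread": None}
    raise ValueError(f"unknown level of measurement: {level!r}")


ages = [38, 25, 24, 33, 21, 28, 20, 20, 38, 26, 27, 45, 58, 49, 29, 19]
regions = ["East", "West", "North", "West", "South", "West", "East", "West",
           "West", "South", "South", "East", "North", "East", "West", "West"]

print(summarise(ages, "scale", skewed=True))
print(summarise(regions, "nominal"))
```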
37. Bivariate Analysis
Bivariate analysis is the analysis of the relationship between two variables or attributes. It explores whether the two variables are related and how strong that relationship is, and looks for possible causes of any differences between them.
Some of the measures in Bivariate Analysis:
• Correlation
• Regression
• Time Series
38. Correlation
If there are simultaneous changes in two variables due to a direct or indirect cause-effect relationship, then there is a correlation between the variables.
Positive Correlation: the two variables change in the same direction.
E.g. Temperature and Sales of Ice-cream
Negative Correlation: the two variables change in opposite directions.
E.g. Temperature and Sales of Woollen clothes
39. Correlation Coefficient
Correlation coefficient is a statistical measure that indicates the extent to which two or more variables fluctuate in relation to each other.
Scatter Plot: A scatterplot is a type of data display that shows the relationship between two numerical variables.
Karl Pearson: It measures the linear association between two numeric variables.
Spearman: It measures the linear association between the ranks assigned to individual items of two variables.
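Both coefficients can be sketched in plain Python. Pearson's coefficient is computed from the deviations of each variable from its mean; Spearman's is obtained here by applying the same formula to the ranks, with tied values sharing their average rank. The helper names and the x, y data are made up for illustration. Because y grows with x but not in a straight line, Spearman's coefficient is 1 while Pearson's is slightly below 1.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariation divided by the product of the spreads."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up illustrative data: y increases with x, but along a curve.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print(f"Pearson:  {pearson(x, y):.4f}")
print(f"Spearman: {spearman(x, y):.4f}")
```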
40. Regression
Regression is the functional relationship between two or more variables, such that we can estimate the value of the dependent variable for given value(s) of the independent variable(s).
If this functional relationship is linear in nature, it is called Linear Regression.
The regression line is given as
$y = a + b_{yx}\,x$
$b_{yx}$ is the regression coefficient, which measures the change in the dependent variable $y$ for a unit change in the independent variable $x$, and $a$ is the intercept.
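The slide does not say how the intercept a and the regression coefficient of y on x are estimated. A common choice, assumed here, is ordinary least squares: the slope is the sum of products of deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The function name and the data are made up for illustration.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for the line y = a + b * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx      # change in y for a unit change in x
    a = my - b * mx    # the line passes through (mean of x, mean of y)
    return a, b


# Made-up illustrative data lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

a, b = fit_line(x, y)
print(f"y = {a:.2f} + {b:.2f} x")

# Estimate the dependent variable for a new value of the independent variable.
predicted = a + b * 6
print(f"estimated y at x = 6: {predicted:.2f}")
```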
41. Time Series
A time series is a time-ordered sequence of observations taken at regular intervals (e.g. hourly, daily, weekly, monthly, quarterly, annually).
Examples of Time Series
• Daily: Stock price, temperature
• Weekly: Retail sales of a departmental store
• Monthly: Unemployment rate, consumer price index
• Quarterly: GDP of a country
• Yearly: Production of crops
42. Multivariate Analysis
Multivariate analysis is the analysis of the relationships among more than two variables or attributes.
Some of the measures in Multivariate Analysis:
• Multiple Correlation
• Multiple Regression
• Discriminant Analysis
• ANOVA
• Structural Equation Modelling
44. THANK YOU
Dr Parag Shah | M.Sc., M.Phil., Ph.D. (Statistics)
pbshah@hlcollege.edu
www.paragstatistics.wordpress.com