SlideShare a Scribd company logo
1 of 25
Download to read offline
R for Statistics
Prof. Ashwini Mathur
Descriptive Statistics ..
Overview of Descriptive analysis ...
Descriptive statistics are used to summarize data in a way that provides insight
into the information contained in the data.
For Example:
Examining the mean or median of numeric data or the frequency of
observations for nominal data. Plots can be created that show the data and
indicating summary statistics.
Descriptive statistics for different types of data
Choosing which summary statistics are appropriate depend on the type of
variable being examined.
Different statistics should be used for interval/ratio, ordinal, and nominal
data.
These statistical terms are also called as Measurement scales.
Nominal Data
Nominal scales are used for labeling variables, without any quantitative value. “Nominal” scales
could simply be called “labels.”
Ordinal Data
With ordinal scales, the order of the values is what’s important and significant,
but the differences between each one is not really known.
For example, is the difference between “OK” and “Unhappy” the same as the
difference between “Very Happy” and “Happy?” We can’t say.
Ordinal scales are typically measures of non-numeric concepts like
satisfaction, happiness, discomfort, etc.
Ordinal Data
Interval Data
Interval scales are numeric scales in which we know both the order and the
exact differences between the values.
The classic example of an interval scale is Celsius temperature because the
difference between each value is the same.
For example, the difference between 60 and 50 degrees is a measurable 10
degrees, as is the difference between 80 and 70 degrees.
Interval
Ratio
Ratio scales are the mostly used when it comes to data measurement scales
because they tell us about the order, they tell us the exact value between
units, AND they also have an absolute zero–which allows for a wide range of
both descriptive and inferential statistics to be applied
Ratio scales provide a wealth of possibilities when it comes to statistical
analysis.
These variables can be meaningfully added, subtracted, multiplied, divided
(ratios).
Central tendency can be measured by mode, median, or mean; measures of
dispersion, such as standard deviation and coefficient of variation can also be
calculated from ratio scales.
Summary
Describing or examining data :
Location is also called central tendency. It is a measure of the values of the
data.
For example, are the values close to 10 or 100 or 1000?
Measures of location include mean and median, as well as somewhat more exotic
statistics like M-estimators or Winsorized means.
Cont..
Variation is also called dispersion. It is a measure of how far the data points
lie from one another. Common statistics include standard deviation and
coefficient of variation. For data that aren’t normally-distributed, percentiles or
the interquartile range might be used.
Shape refers to the distribution of values. The best tools to evaluate the
shape of data are histograms and related plots. Statistics include skewness and
kurtosis, though they are less useful than visual inspection. We can describe data
shape as normally-distributed, log-normal, uniform, skewed, bi-modal, and
others.
Descriptive statistics for interval/ratio data
For this example, imagine that Ren and Stimpy have
each held eight workshops to educating the public
about water conservation at home.
They are interested in how many people showed up
to the workshops.
Because the data formatted in a data frame, we can use the convention Data$Attendees to access the
variable Attendees
within the data frame Data.
Input = ("
Instructor Location Attendees
Ren North 7
Ren North 22
Ren North 6
Ren North 15
Ren South 12
Ren South 13
Ren South 14
Ren South 16
Stimpy North 18
Stimpy North 17
Stimpy North 15
Stimpy North 9
Stimpy South 15
Stimpy South 11
Stimpy South 19
Stimpy South 23
")
R Syntax :
Data = read.table(textConnection(Input),header=TRUE)
Data ### Will output data frame called Data
str(Data) ### but shows the structure of the data
frame
summary(Data) ### but summarizes variables in the data
frame
Functions sum and length
The sum of a variable can be found with the sum function, and the number of
observations can be found with the length function.
sum(Data$Attendees)
232
length(Data$Attendees)
16
Statistics of location for interval/ratio data
Mean
The mean is the arithmetic average, and is a common statistic used with
interval/ratio data. It is simply the sum of the values divided by the number of
values. The mean function in R will return the mean.
sum(Data$Attendees) / length(Data$Attendees)
14.5
mean(Data$Attendees)
14.5
Important :
Caution should be used when reporting mean values with skewed data, as the mean
may not be representative of the center of the data.
For example, imagine a town with 10 families, nine of whom have an income of less than $50, 000 per
year, but with one family with an income of $2,000,000 per year. The mean income for families in the
town would be $233,000, but this may not be a reasonable way to summarize the income of the
town.
Income = c(49000, 44000, 25000, 18000, 32000, 47000, 37000, 45000, 36000,
2000000)
mean(Income)
233300
Median
The median is defined as the value below which are 50% of the observations. To find this value
manually, you would order the observations, and separate the lowest 50% from the highest 50%. For data
sets with an odd number of observations, the median is the middle value. For data sets with an even
number of observations, the median falls half-way between the two middle values.
The median is a robust statistic in that it is not affected by adding extreme values. For example, if we
changed Stimpy’s last Attendees value from 23 to 1000, it would not affect the median.
median(Data$Attendees)
15
### Note that in this case the mean and median are close in value
### to one another. The mean and median will be more different
### the more the data are skewed.
The median is appropriate for either skewed or unskewed data. The median income for the town
discussed above is $40,500. Half the families in the town have an income above this amount, and half
have an income below this amount.
Income = c(49000, 44000, 25000, 18000, 32000, 47000, 37000, 45000, 36000,
2000000)
median(Income)
40500
Note that medians are sometimes reported as the “average person” or “typical family”. Saying, “The
average American family earned $54,000 last year” means that the median income for families was
$54,000. The “average family” is that one with the median income.
Mode
The mode is a summary statistic that is used rarely in practice, but is normally included in any
discussion of mean and medians. When there are discrete values for a variable, the mode is simply the
value which occurs most frequently. For example, in the Statistics Learning Center video in the
Required Readings below, Dr. Nic gives an example of counting the number of pairs of shoes each student
owns. The most common answer was 10, and therefore 10 is the mode for that data set.
For our Ren and Stimpy example, the value 15 occurs three times and so is the mode.
The Mode function can be found in the package DescTools.
library(DescTools)
Mode(Data$Attendees)
15
DOUBT ???

More Related Content

What's hot

Statistical Analysis with R -I
Statistical Analysis with R -IStatistical Analysis with R -I
Statistical Analysis with R -IAkhila Prabhakaran
 
Areas In Statistics
Areas In StatisticsAreas In Statistics
Areas In Statisticsguestc94d8c
 
Das20502 chapter 1 descriptive statistics
Das20502 chapter 1 descriptive statisticsDas20502 chapter 1 descriptive statistics
Das20502 chapter 1 descriptive statisticsRozainita Rosley
 
Is the Data Scaled, Ordinal, or Nominal Proportional?
Is the Data Scaled, Ordinal, or Nominal Proportional?Is the Data Scaled, Ordinal, or Nominal Proportional?
Is the Data Scaled, Ordinal, or Nominal Proportional?Ken Plummer
 
Statistics Class 10 CBSE
Statistics Class 10 CBSE Statistics Class 10 CBSE
Statistics Class 10 CBSE Smitha Sumod
 
Exploratory data analysis project
Exploratory data analysis project Exploratory data analysis project
Exploratory data analysis project BabatundeSogunro
 
Major types of statistics terms that you should know
Major types of statistics terms that you should knowMajor types of statistics terms that you should know
Major types of statistics terms that you should knowStat Analytica
 
Data Analysis and Statistics
Data Analysis and StatisticsData Analysis and Statistics
Data Analysis and StatisticsT.S. Lim
 
Fernandos Statistics
Fernandos StatisticsFernandos Statistics
Fernandos Statisticsteresa_soto
 
General Statistics boa
General Statistics boaGeneral Statistics boa
General Statistics boaraileeanne
 
Statistics for data scientists
Statistics for  data scientistsStatistics for  data scientists
Statistics for data scientistsAjay Ohri
 
PG STAT 531 lecture 1 introduction about statistics and collection, compilati...
PG STAT 531 lecture 1 introduction about statistics and collection, compilati...PG STAT 531 lecture 1 introduction about statistics and collection, compilati...
PG STAT 531 lecture 1 introduction about statistics and collection, compilati...Aashish Patel
 
STATISTICAL PROCEDURES (Discriptive Statistics).pptx
STATISTICAL PROCEDURES (Discriptive Statistics).pptxSTATISTICAL PROCEDURES (Discriptive Statistics).pptx
STATISTICAL PROCEDURES (Discriptive Statistics).pptxMuhammadNafees42
 

What's hot (20)

Statistical Analysis with R -I
Statistical Analysis with R -IStatistical Analysis with R -I
Statistical Analysis with R -I
 
Areas In Statistics
Areas In StatisticsAreas In Statistics
Areas In Statistics
 
Das20502 chapter 1 descriptive statistics
Das20502 chapter 1 descriptive statisticsDas20502 chapter 1 descriptive statistics
Das20502 chapter 1 descriptive statistics
 
Basic statisctis -Anandh Shankar
Basic statisctis -Anandh ShankarBasic statisctis -Anandh Shankar
Basic statisctis -Anandh Shankar
 
Panel slides
Panel slidesPanel slides
Panel slides
 
Is the Data Scaled, Ordinal, or Nominal Proportional?
Is the Data Scaled, Ordinal, or Nominal Proportional?Is the Data Scaled, Ordinal, or Nominal Proportional?
Is the Data Scaled, Ordinal, or Nominal Proportional?
 
Statistics Class 10 CBSE
Statistics Class 10 CBSE Statistics Class 10 CBSE
Statistics Class 10 CBSE
 
Statistics
StatisticsStatistics
Statistics
 
Exploratory data analysis project
Exploratory data analysis project Exploratory data analysis project
Exploratory data analysis project
 
Major types of statistics terms that you should know
Major types of statistics terms that you should knowMajor types of statistics terms that you should know
Major types of statistics terms that you should know
 
Data Analysis and Statistics
Data Analysis and StatisticsData Analysis and Statistics
Data Analysis and Statistics
 
Fernandos Statistics
Fernandos StatisticsFernandos Statistics
Fernandos Statistics
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
 
statistic
statisticstatistic
statistic
 
General Statistics boa
General Statistics boaGeneral Statistics boa
General Statistics boa
 
Statistics for data scientists
Statistics for  data scientistsStatistics for  data scientists
Statistics for data scientists
 
PG STAT 531 lecture 1 introduction about statistics and collection, compilati...
PG STAT 531 lecture 1 introduction about statistics and collection, compilati...PG STAT 531 lecture 1 introduction about statistics and collection, compilati...
PG STAT 531 lecture 1 introduction about statistics and collection, compilati...
 
STATISTICAL PROCEDURES (Discriptive Statistics).pptx
STATISTICAL PROCEDURES (Discriptive Statistics).pptxSTATISTICAL PROCEDURES (Discriptive Statistics).pptx
STATISTICAL PROCEDURES (Discriptive Statistics).pptx
 
Statistical analysis and its applications
Statistical analysis and its applicationsStatistical analysis and its applications
Statistical analysis and its applications
 
Statistical table
Statistical tableStatistical table
Statistical table
 

Similar to R for statistics session 1

Data science Course: Statistics overview
Data science Course: Statistics overviewData science Course: Statistics overview
Data science Course: Statistics overviewTahmid Bin Taslim Rafi
 
Data science 101 statistics overview
Data science 101 statistics overviewData science 101 statistics overview
Data science 101 statistics overviewAlejandro Gonzalez
 
B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2marshalkalra
 
MMW (Data Management)-Part 1 for ULO 2 (1).pptx
MMW (Data Management)-Part 1 for ULO 2 (1).pptxMMW (Data Management)-Part 1 for ULO 2 (1).pptx
MMW (Data Management)-Part 1 for ULO 2 (1).pptxPETTIROSETALISIC
 
initial postWhat are the characteristics, uses, advantages, and di.docx
initial postWhat are the characteristics, uses, advantages, and di.docxinitial postWhat are the characteristics, uses, advantages, and di.docx
initial postWhat are the characteristics, uses, advantages, and di.docxJeniceStuckeyoo
 
Quatitative Data Analysis
Quatitative Data Analysis Quatitative Data Analysis
Quatitative Data Analysis maneesh mani
 
EDUCATIONAL STATISTICS_Unit_I.ppt
EDUCATIONAL STATISTICS_Unit_I.pptEDUCATIONAL STATISTICS_Unit_I.ppt
EDUCATIONAL STATISTICS_Unit_I.pptSasi Kumar
 
Graphicalrepresntationofdatausingstatisticaltools2019_210902_105156.pdf
Graphicalrepresntationofdatausingstatisticaltools2019_210902_105156.pdfGraphicalrepresntationofdatausingstatisticaltools2019_210902_105156.pdf
Graphicalrepresntationofdatausingstatisticaltools2019_210902_105156.pdfHimakshi7
 
Topic 2 Measures of Central Tendency.pptx
Topic 2   Measures of Central Tendency.pptxTopic 2   Measures of Central Tendency.pptx
Topic 2 Measures of Central Tendency.pptxCallplanetsDeveloper
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptAkhtaruzzamanlimon1
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptJapnaamKaurAhluwalia
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptRajnishSingh367990
 
Statistics as a discipline
Statistics as a disciplineStatistics as a discipline
Statistics as a disciplineRosalinaTPayumo
 
Business statistics (Basics)
Business statistics (Basics)Business statistics (Basics)
Business statistics (Basics)AhmedToheed3
 
DATA UNIT-3.pptx
DATA UNIT-3.pptxDATA UNIT-3.pptx
DATA UNIT-3.pptxbibha737
 
Descriptive and Inferential Statistics.docx
Descriptive and Inferential Statistics.docxDescriptive and Inferential Statistics.docx
Descriptive and Inferential Statistics.docxRobertLogrono
 

Similar to R for statistics session 1 (20)

Data science Course: Statistics overview
Data science Course: Statistics overviewData science Course: Statistics overview
Data science Course: Statistics overview
 
Data science 101 statistics overview
Data science 101 statistics overviewData science 101 statistics overview
Data science 101 statistics overview
 
B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2
 
MMW (Data Management)-Part 1 for ULO 2 (1).pptx
MMW (Data Management)-Part 1 for ULO 2 (1).pptxMMW (Data Management)-Part 1 for ULO 2 (1).pptx
MMW (Data Management)-Part 1 for ULO 2 (1).pptx
 
initial postWhat are the characteristics, uses, advantages, and di.docx
initial postWhat are the characteristics, uses, advantages, and di.docxinitial postWhat are the characteristics, uses, advantages, and di.docx
initial postWhat are the characteristics, uses, advantages, and di.docx
 
Machine learning session1
Machine learning   session1Machine learning   session1
Machine learning session1
 
STATISTICS.pptx
STATISTICS.pptxSTATISTICS.pptx
STATISTICS.pptx
 
Presentation1
Presentation1Presentation1
Presentation1
 
Quatitative Data Analysis
Quatitative Data Analysis Quatitative Data Analysis
Quatitative Data Analysis
 
EDUCATIONAL STATISTICS_Unit_I.ppt
EDUCATIONAL STATISTICS_Unit_I.pptEDUCATIONAL STATISTICS_Unit_I.ppt
EDUCATIONAL STATISTICS_Unit_I.ppt
 
Graphicalrepresntationofdatausingstatisticaltools2019_210902_105156.pdf
Graphicalrepresntationofdatausingstatisticaltools2019_210902_105156.pdfGraphicalrepresntationofdatausingstatisticaltools2019_210902_105156.pdf
Graphicalrepresntationofdatausingstatisticaltools2019_210902_105156.pdf
 
Topic 2 Measures of Central Tendency.pptx
Topic 2   Measures of Central Tendency.pptxTopic 2   Measures of Central Tendency.pptx
Topic 2 Measures of Central Tendency.pptx
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.ppt
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.ppt
 
Statistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.pptStatistics-24-04-2021-20210618114031.ppt
Statistics-24-04-2021-20210618114031.ppt
 
Statistics as a discipline
Statistics as a disciplineStatistics as a discipline
Statistics as a discipline
 
Business statistics (Basics)
Business statistics (Basics)Business statistics (Basics)
Business statistics (Basics)
 
DATA UNIT-3.pptx
DATA UNIT-3.pptxDATA UNIT-3.pptx
DATA UNIT-3.pptx
 
Descriptive and Inferential Statistics.docx
Descriptive and Inferential Statistics.docxDescriptive and Inferential Statistics.docx
Descriptive and Inferential Statistics.docx
 
statistics class 11
statistics class 11statistics class 11
statistics class 11
 

More from Ashwini Mathur

S3 classes and s4 classes
S3 classes and s4 classesS3 classes and s4 classes
S3 classes and s4 classesAshwini Mathur
 
Summerization notes for descriptive statistics using r
Summerization notes for descriptive statistics using r Summerization notes for descriptive statistics using r
Summerization notes for descriptive statistics using r Ashwini Mathur
 
R programming lab 3 - jupyter notebook
R programming lab   3 - jupyter notebookR programming lab   3 - jupyter notebook
R programming lab 3 - jupyter notebookAshwini Mathur
 
R programming lab 2 - jupyter notebook
R programming lab   2 - jupyter notebookR programming lab   2 - jupyter notebook
R programming lab 2 - jupyter notebookAshwini Mathur
 
R programming lab 1 - jupyter notebook
R programming lab   1 - jupyter notebookR programming lab   1 - jupyter notebook
R programming lab 1 - jupyter notebookAshwini Mathur
 
Object oriented programming in r
Object oriented programming in r Object oriented programming in r
Object oriented programming in r Ashwini Mathur
 
Linear programming optimization in r
Linear programming optimization in r Linear programming optimization in r
Linear programming optimization in r Ashwini Mathur
 
Introduction to python along with the comparitive analysis with r
Introduction to python   along with the comparitive analysis with r Introduction to python   along with the comparitive analysis with r
Introduction to python along with the comparitive analysis with r Ashwini Mathur
 
Example 2 summerization notes for descriptive statistics using r
Example   2    summerization notes for descriptive statistics using r Example   2    summerization notes for descriptive statistics using r
Example 2 summerization notes for descriptive statistics using r Ashwini Mathur
 
Descriptive statistics assignment
Descriptive statistics assignment Descriptive statistics assignment
Descriptive statistics assignment Ashwini Mathur
 
Descriptive analytics in r programming language
Descriptive analytics in r programming languageDescriptive analytics in r programming language
Descriptive analytics in r programming languageAshwini Mathur
 
Data analysis for covid 19
Data analysis for covid 19Data analysis for covid 19
Data analysis for covid 19Ashwini Mathur
 
Correlation and linear regression
Correlation and linear regression Correlation and linear regression
Correlation and linear regression Ashwini Mathur
 
Calling c functions from r programming unit 5
Calling c functions from r programming    unit 5Calling c functions from r programming    unit 5
Calling c functions from r programming unit 5Ashwini Mathur
 
Anova (analysis of variance) test
Anova (analysis of variance) test Anova (analysis of variance) test
Anova (analysis of variance) test Ashwini Mathur
 

More from Ashwini Mathur (19)

S3 classes and s4 classes
S3 classes and s4 classesS3 classes and s4 classes
S3 classes and s4 classes
 
Summerization notes for descriptive statistics using r
Summerization notes for descriptive statistics using r Summerization notes for descriptive statistics using r
Summerization notes for descriptive statistics using r
 
Rpy2 demonstration
Rpy2 demonstrationRpy2 demonstration
Rpy2 demonstration
 
Rpy package
Rpy packageRpy package
Rpy package
 
R programming lab 3 - jupyter notebook
R programming lab   3 - jupyter notebookR programming lab   3 - jupyter notebook
R programming lab 3 - jupyter notebook
 
R programming lab 2 - jupyter notebook
R programming lab   2 - jupyter notebookR programming lab   2 - jupyter notebook
R programming lab 2 - jupyter notebook
 
R programming lab 1 - jupyter notebook
R programming lab   1 - jupyter notebookR programming lab   1 - jupyter notebook
R programming lab 1 - jupyter notebook
 
R for statistics 2
R for statistics 2R for statistics 2
R for statistics 2
 
Play with matrix in r
Play with matrix in rPlay with matrix in r
Play with matrix in r
 
Object oriented programming in r
Object oriented programming in r Object oriented programming in r
Object oriented programming in r
 
Linear programming optimization in r
Linear programming optimization in r Linear programming optimization in r
Linear programming optimization in r
 
Introduction to python along with the comparitive analysis with r
Introduction to python   along with the comparitive analysis with r Introduction to python   along with the comparitive analysis with r
Introduction to python along with the comparitive analysis with r
 
Example 2 summerization notes for descriptive statistics using r
Example   2    summerization notes for descriptive statistics using r Example   2    summerization notes for descriptive statistics using r
Example 2 summerization notes for descriptive statistics using r
 
Descriptive statistics assignment
Descriptive statistics assignment Descriptive statistics assignment
Descriptive statistics assignment
 
Descriptive analytics in r programming language
Descriptive analytics in r programming languageDescriptive analytics in r programming language
Descriptive analytics in r programming language
 
Data analysis for covid 19
Data analysis for covid 19Data analysis for covid 19
Data analysis for covid 19
 
Correlation and linear regression
Correlation and linear regression Correlation and linear regression
Correlation and linear regression
 
Calling c functions from r programming unit 5
Calling c functions from r programming    unit 5Calling c functions from r programming    unit 5
Calling c functions from r programming unit 5
 
Anova (analysis of variance) test
Anova (analysis of variance) test Anova (analysis of variance) test
Anova (analysis of variance) test
 

Recently uploaded

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts
(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts
(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad EscortsCall girls in Ahmedabad High profile
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 

Recently uploaded (20)

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts
(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts
(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 

R for statistics session 1

  • 1. R for Statistics Prof. Ashwini Mathur
  • 3. Overview of Descriptive analysis ... Descriptive statistics are used to summarize data in a way that provides insight into the information contained in the data. For Example: Examining the mean or median of numeric data or the frequency of observations for nominal data. Plots can be created that show the data and indicating summary statistics.
  • 4. Descriptive statistics for different types of data Choosing which summary statistics are appropriate depend on the type of variable being examined. Different statistics should be used for interval/ratio, ordinal, and nominal data. These statistical terms are also called as Measurement scales.
  • 5. Nominal Data Nominal scales are used for labeling variables, without any quantitative value. “Nominal” scales could simply be called “labels.”
  • 6. Ordinal Data With ordinal scales, the order of the values is what’s important and significant, but the differences between each one is not really known. For example, is the difference between “OK” and “Unhappy” the same as the difference between “Very Happy” and “Happy?” We can’t say. Ordinal scales are typically measures of non-numeric concepts like satisfaction, happiness, discomfort, etc.
  • 8. Interval Data Interval scales are numeric scales in which we know both the order and the exact differences between the values. The classic example of an interval scale is Celsius temperature because the difference between each value is the same. For example, the difference between 60 and 50 degrees is a measurable 10 degrees, as is the difference between 80 and 70 degrees.
  • 10. Ratio Ratio scales are the mostly used when it comes to data measurement scales because they tell us about the order, they tell us the exact value between units, AND they also have an absolute zero–which allows for a wide range of both descriptive and inferential statistics to be applied
  • 11. Ratio scales provide a wealth of possibilities when it comes to statistical analysis. These variables can be meaningfully added, subtracted, multiplied, divided (ratios). Central tendency can be measured by mode, median, or mean; measures of dispersion, such as standard deviation and coefficient of variation can also be calculated from ratio scales.
  • 13. Describing or examining data : Location is also called central tendency. It is a measure of the values of the data. For example, are the values close to 10 or 100 or 1000? Measures of location include mean and median, as well as somewhat more exotic statistics like M-estimators or Winsorized means.
  • 14. Cont.. Variation is also called dispersion. It is a measure of how far the data points lie from one another. Common statistics include standard deviation and coefficient of variation. For data that aren’t normally-distributed, percentiles or the interquartile range might be used. Shape refers to the distribution of values. The best tools to evaluate the shape of data are histograms and related plots. Statistics include skewness and kurtosis, though they are less useful than visual inspection. We can describe data shape as normally-distributed, log-normal, uniform, skewed, bi-modal, and others.
  • 15. Descriptive statistics for interval/ratio data
  • 16. For this example, imagine that Ren and Stimpy have each held eight workshops to educating the public about water conservation at home. They are interested in how many people showed up to the workshops.
  • 17. Because the data formatted in a data frame, we can use the convention Data$Attendees to access the variable Attendees within the data frame Data. Input = (" Instructor Location Attendees Ren North 7 Ren North 22 Ren North 6 Ren North 15 Ren South 12 Ren South 13 Ren South 14 Ren South 16 Stimpy North 18 Stimpy North 17 Stimpy North 15 Stimpy North 9 Stimpy South 15 Stimpy South 11 Stimpy South 19 Stimpy South 23 ")
  • 18. R Syntax : Data = read.table(textConnection(Input),header=TRUE) Data ### Will output data frame called Data str(Data) ### but shows the structure of the data frame summary(Data) ### but summarizes variables in the data frame
  • 19. Functions sum and length The sum of a variable can be found with the sum function, and the number of observations can be found with the length function. sum(Data$Attendees) 232 length(Data$Attendees) 16
  • 20. Statistics of location for interval/ratio data Mean The mean is the arithmetic average, and is a common statistic used with interval/ratio data. It is simply the sum of the values divided by the number of values. The mean function in R will return the mean. sum(Data$Attendees) / length(Data$Attendees) 14.5 mean(Data$Attendees) 14.5
  • 21. Important : Caution should be used when reporting mean values with skewed data, as the mean may not be representative of the center of the data. For example, imagine a town with 10 families, nine of whom have an income of less than $50, 000 per year, but with one family with an income of $2,000,000 per year. The mean income for families in the town would be $233,000, but this may not be a reasonable way to summarize the income of the town. Income = c(49000, 44000, 25000, 18000, 32000, 47000, 37000, 45000, 36000, 2000000) mean(Income) 233300
  • 22. Median The median is defined as the value below which are 50% of the observations. To find this value manually, you would order the observations, and separate the lowest 50% from the highest 50%. For data sets with an odd number of observations, the median is the middle value. For data sets with an even number of observations, the median falls half-way between the two middle values. The median is a robust statistic in that it is not affected by adding extreme values. For example, if we changed Stimpy’s last Attendees value from 23 to 1000, it would not affect the median. median(Data$Attendees) 15 ### Note that in this case the mean and median are close in value ### to one another. The mean and median will be more different ### the more the data are skewed.
  • 23. The median is appropriate for either skewed or unskewed data. The median income for the town discussed above is $40,500. Half the families in the town have an income above this amount, and half have an income below this amount. Income = c(49000, 44000, 25000, 18000, 32000, 47000, 37000, 45000, 36000, 2000000) median(Income) 40500 Note that medians are sometimes reported as the “average person” or “typical family”. Saying, “The average American family earned $54,000 last year” means that the median income for families was $54,000. The “average family” is that one with the median income.
  • 24. Mode The mode is a summary statistic that is used rarely in practice, but is normally included in any discussion of mean and medians. When there are discrete values for a variable, the mode is simply the value which occurs most frequently. For example, in the Statistics Learning Center video in the Required Readings below, Dr. Nic gives an example of counting the number of pairs of shoes each student owns. The most common answer was 10, and therefore 10 is the mode for that data set. For our Ren and Stimpy example, the value 15 occurs three times and so is the mode. The Mode function can be found in the package DescTools. library(DescTools) Mode(Data$Attendees) 15