A BRIEF INTRO TO ‘R’ – APPLIED 
STATS & TIME SERIES ANALYSIS 
- Shanmukha Sreenivas P
THE R ENVIRONMENT
• R is an integrated suite of software facilities for data manipulation, calculation and graphical display.
• An effective data handling and storage facility
• A suite of operators for calculations on arrays, in particular matrices
• A large, coherent, integrated collection of intermediate tools for data analysis
• Graphical facilities for data analysis
• A well developed, simple and effective programming language (called ‘S’) which includes conditionals, loops, user defined recursive functions and I/O facilities.
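As a small illustration of the facilities listed above, here is a minimal sketch in base R. The object names (m, fact, squares) are made up for this example and do not come from the slides.

```r
# A 2 x 2 matrix, filled column by column
m <- matrix(1:4, nrow = 2)

# Array operators: transpose and matrix product
mt  <- t(m)
mtm <- mt %*% m

# A user-defined recursive function with a conditional
fact <- function(n) {
  if (n <= 1) 1 else n * fact(n - 1)
}

# A loop that fills a vector
squares <- numeric(5)
for (i in 1:5) {
  squares[i] <- i^2
}

# Simple output facilities
print(mtm)
cat("5! =", fact(5), "\n")
cat("squares:", squares, "\n")
```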
“OPEN SOURCE”... THAT JUST MEANS I DON’T HAVE TO PAY FOR IT, RIGHT?
• No. Much more:
– Provides full access to algorithms and their implementation
– Ability to fix bugs and extend software
– Provides a forum allowing researchers to explore and expand the methods used to analyze data
– Promotes reproducible research by providing open and accessible tools
– Most of R is written in… R! This makes it quite easy to see what functions are actually doing.
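The last point can be tried directly. The sketch below uses sd() from the standard stats package purely as an example of a function whose definition is ordinary R code; the vector x is made up.

```r
# Typing a function's name without parentheses prints its definition.
sd

# The same information is available programmatically.
args(sd)   # argument list
body(sd)   # the R expression that sd() evaluates

# sd() is built on var(), so the two should agree.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
agree <- all.equal(sd(x), sqrt(var(x)))
print(agree)
```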
WHAT IS IT?
• R is an interpreted computer language.
– Most user-visible functions are written in R itself, calling upon a smaller set of internal primitives.
– It is possible to interface procedures written in C, C++, or Fortran for efficiency, and to write additional primitives.
– System commands can be called from within R
• R is used for data manipulation, statistics, and graphics. It is made up of:
– operators (+ - <- * %*% …) for calculations on arrays & matrices
– large, coherent, integrated collection of functions
– facilities for making unlimited types of publication quality graphics
– user written functions & sets of functions (packages); 800+ contributed packages so far & growing
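A short sketch of the operators and the primitive/non-primitive distinction mentioned above, using base R only. The matrices a and b and the helper cv() are hypothetical and exist only for this illustration.

```r
a <- matrix(c(1, 2, 3, 4), nrow = 2)
b <- matrix(c(0, 1, 1, 0), nrow = 2)

elementwise <- a * b     # multiplies matching entries
matprod     <- a %*% b   # matrix product

# Operators are functions too: `+` is an internal primitive,
# while var() is an ordinary function defined in R code.
plus_is_primitive <- is.primitive(`+`)
var_is_primitive  <- is.primitive(var)

# A user-written function, the building block of a package
# (cv = coefficient of variation, a hypothetical helper).
cv <- function(x) sd(x) / mean(x)

print(elementwise)
print(matprod)
print(c(plus = plus_is_primitive, var = var_is_primitive))
print(cv(c(10, 12, 9, 11)))
```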
R
ADVANTAGES
o Fast and free.
o State of the art: statistical researchers provide their methods as R packages. SPSS and SAS are years behind R!
o 2nd only to MATLAB for graphics.
o Mx, WinBUGS, and other programs use or will use R.
o Active user community
o Excellent for simulation, programming, computer intensive analyses, etc.
o Forces you to think about your analysis.
o Interfaces with database storage software (SQL)
DISADVANTAGES
o Not user friendly at the start: steep learning curve, minimal GUI.
o No commercial support; figuring out correct methods or how to use a function on your own can be frustrating.
o Easy to make mistakes and not know.
o Working with large datasets is limited by RAM
o Data prep & cleaning can be messier & more mistake prone in R vs. SPSS or SAS
TYPICAL R SESSION
• Start up R via the GUI or favorite text editor
• Two windows:
  – 1+ new or existing scripts (text files) - these will be saved
  – Terminal – output & temporary input - usually unsaved
STATISTICAL METHODS
• Statistics: “meaningful” quantities about a sample of objects, things, persons, events, phenomena, etc.
• Simple to complex issues. E.g.
  – Correlation
  – ANOVA
  – MANOVA
  – Regression – linear, multiple, logistic
  – LDA
  – PCA / Factor Analysis
  – Frequency domain analysis
  – Econometric modelling (TSA)
• Two main categories:
  – Descriptive statistics
  – Inferential statistics
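Several of the methods listed above are available in base R. The following sketch runs a few of them on the built-in mtcars data set; the choice of data set and of variables is an assumption made only for illustration.

```r
# Correlation between fuel economy and weight
r <- cor(mtcars$mpg, mtcars$wt)

# Multiple linear regression
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)

# Logistic regression: transmission type (am is 0/1) explained by weight
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)

# Principal component analysis on standardised variables
pca <- prcomp(mtcars[, c("mpg", "wt", "hp", "disp")], scale. = TRUE)

print(r)
print(coef(fit_lm))
print(coef(fit_glm))
print(summary(pca)$importance)
```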
DESCRIPTIVE STATISTICS
• Use sample information to explain/make abstraction of population “phenomena”.
• Common “phenomena”:
  – Association (e.g. σ1,2.3 = 0.75)
  – Tendency (left-skew, right-skew)
  – Causal relationship (e.g. if X, then Y)
  – Trend, pattern, dispersion, range
• Used in non-parametric analysis
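A minimal sketch of how such descriptive quantities can be computed in base R. The built-in mtcars data is used only as an example, and the skewness formula is written out by hand because base R does not ship a skewness function.

```r
x <- mtcars$mpg   # example data, not from the slides

# Location, dispersion and range
desc <- c(
  mean   = mean(x),
  median = median(x),
  sd     = sd(x),
  min    = min(x),
  max    = max(x)
)

# Sample skewness (third standardised moment);
# a positive value points to a longer right tail.
skew <- mean((x - mean(x))^3) / (mean((x - mean(x))^2))^(3 / 2)

# Association between two variables
assoc <- cor(mtcars$mpg, mtcars$wt)

print(round(desc, 2))
print(skew)
print(assoc)
```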
INFERENTIAL STATISTICS
• Using sample statistics to infer some “phenomena” of population parameters
• Hypothesis testing
• Common “phenomena”: cause-and-effect
  – One-way relationship: ANOVA
  – Multi-directional relationship: MANOVA
• Use parametric analysis
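Both tests named above exist in base R as aov() and manova(). The sketch below applies them to the built-in PlantGrowth and iris data sets, which are assumptions chosen for illustration and not data from the slides.

```r
# One-way ANOVA: does mean plant weight differ between treatment groups?
fit_aov   <- aov(weight ~ group, data = PlantGrowth)
aov_table <- summary(fit_aov)[[1]]
p_aov     <- aov_table[["Pr(>F)"]][1]   # p-value of the group effect

# MANOVA: do two sepal measurements jointly differ between species?
fit_man <- manova(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)

print(aov_table)
print(summary(fit_man))   # Pillai's trace is the default test statistic
```

The null hypothesis in both cases is that all group means are equal; a small p-value is evidence against it.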
COMMON MISTAKES (CONTD.) – “ABUSE OF STATISTICS”
Data analysis techniques, by issue:

Issue: Measure the “influence” of a variable on another
  Example of abuse: Using partial correlation (e.g. Spearman coeff.)
  Correct technique: Using a regression parameter

Issue: Finding the “relationship” between one variable and another
  Example of abuse: Multi-dimensional scaling, Likert scaling
  Correct technique: Simple regression coefficient

Issue: To evaluate whether a model fits data better than the other
  Example of abuse: Using R²
  Correct technique: Many, among other things the Box-Cox χ² test for model equivalence

Issue: To evaluate accuracy of “prediction” of a model
  Example of abuse: Using R² and/or F-value
  Correct technique: Hold-out sample’s MAPE, MAD

Issue: “Compare” whether a group is different from another
  Example of abuse: Multi-dimensional scaling, Likert scaling
  Correct technique: Many, among other things two-way ANOVA, χ², Z test

Issue: To determine whether a group of factors “significantly influence” the observed phenomenon
  Example of abuse: Multi-dimensional scaling, Likert scaling
  Correct technique: Many, among other things MANOVA, regression
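The prediction-accuracy row can be sketched as follows: R² describes the fit on the data used for estimation, while MAPE and MAD on a hold-out sample describe accuracy on data the model has not seen. The split of the built-in mtcars data and the model mpg ~ wt are hypothetical choices for this example.

```r
# Hypothetical split: first 24 rows for fitting, last 8 rows held out
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

fit <- lm(mpg ~ wt, data = train)

# In-sample fit (what R^2 measures)
r2 <- summary(fit)$r.squared

# Out-of-sample accuracy on the hold-out rows
pred <- predict(fit, newdata = test)
err  <- test$mpg - pred
mape <- mean(abs(err / test$mpg))   # mean absolute percentage error
mad  <- mean(abs(err))              # mean absolute deviation

cat("R^2 (training):", r2, "\n")
cat("Hold-out MAPE:", mape, " MAD:", mad, "\n")
```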
TIME SERIES ANALYSIS
• A time series is a collection of observations made sequentially in time.
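In R, such a collection is usually stored as a ts object, which keeps the observations together with their time index. A minimal sketch with made-up monthly values:

```r
# Monthly observations starting January 2020 (values are made up)
y <- ts(c(12, 15, 14, 18, 21, 19, 23, 25, 24, 28, 30, 29),
        start = c(2020, 1), frequency = 12)

print(start(y))       # first time point
print(end(y))         # last time point
print(frequency(y))   # observations per year

# Observations stay in time order; window() selects a sub-period
q2 <- window(y, start = c(2020, 4), end = c(2020, 6))
print(q2)
```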
STOCHASTIC PROCESSES USEFUL IN MODELING TIME SERIES
(1) a purely random process,
(2) a random walk,
(3) a moving average (MA) process,
(4) an autoregressive (AR) process,
(5) an autoregressive moving average (ARMA) process, and
(6) an autoregressive integrated moving average (ARIMA) process.
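Each of the six processes above can be simulated with base R, which may help in seeing how they differ. The seed, the series length and the coefficients 0.7 and 0.6 are arbitrary assumptions for this sketch.

```r
set.seed(1)   # arbitrary seed so the run is repeatable
n <- 200

white_noise <- rnorm(n)                                        # (1) purely random
random_walk <- cumsum(rnorm(n))                                # (2) random walk
ma_series   <- arima.sim(model = list(ma = 0.6), n = n)        # (3) MA(1)
ar_series   <- arima.sim(model = list(ar = 0.7), n = n)        # (4) AR(1)
arma_series <- arima.sim(model = list(ar = 0.7, ma = 0.6), n = n)   # (5) ARMA(1,1)
arima_series <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.7),
                          n = n)                               # (6) ARIMA(1,1,0)

# Differencing a random walk recovers its purely random increments
increments <- diff(random_walk)

# Fit an AR(1) model back to the simulated AR(1) series
fit <- arima(ar_series, order = c(1, 0, 0))
print(coef(fit))
```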
ETS(M,N,N) MODEL
M -> Multiplicative error
N -> No trend
N -> No seasonality
alpha = 0.1713
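With no trend and no seasonality, the point forecast of such a model is a single smoothed level, updated as level + alpha * (observation - level), which is the recursion of simple exponential smoothing. The sketch below is a hand-written illustration under stated assumptions: the function ses_level(), the starting level and the data are hypothetical, and the slides do not show how the original fit was produced. Only alpha = 0.1713 is taken from the slide.

```r
# Level recursion of simple exponential smoothing.
# y: numeric vector of observations, oldest first
# alpha: smoothing parameter in (0, 1]
# l0: starting level (assumed here to be the first observation)
ses_level <- function(y, alpha, l0 = y[1]) {
  level <- l0
  for (obs in y) {
    level <- level + alpha * (obs - level)
  }
  level
}

y     <- c(50, 58, 61, 55, 57, 60, 52)   # made-up observations
alpha <- 0.1713                          # value reported on the slide

final_level <- ses_level(y, alpha)

# Without trend or seasonality, every future point forecast equals the level
fc <- rep(final_level, 7)
print(fc)
```

A small alpha such as this one means each new observation moves the level only slightly, so the forecast reacts slowly to recent changes.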
VALIDATION
Date      Actual  ARIMA(1,1,2) forecast  Rel Err   ETS(M,N,N) forecast  Rel Err
13-03-12  65      60.48468               0.069466  57.33989             0.117848
12-03-12  73      55.66896               0.237412  57.33989             0.214522
11-03-12  80      58.24566               0.271929  57.33989             0.283251
10-03-12  54      56.86697               0.053092  57.33989             0.06185
09-03-12  55      57.60465               0.047357  57.33989             0.042543
08-03-12  55      57.20995               0.040181  57.33989             0.042543
07-03-12  51      57.42114               0.125905  57.33989             0.124312
MAPE                                     0.120763                       0.126696
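The relative errors and the MAPE row can be recomputed from the actual values and forecasts in the table. In this sketch the numbers are copied from the table; the variable and function names are made up. The relative error is taken as |actual - forecast| / actual, and MAPE as the mean of those values.

```r
actual   <- c(65, 73, 80, 54, 55, 55, 51)
arima_fc <- c(60.48468, 55.66896, 58.24566, 56.86697,
              57.60465, 57.20995, 57.42114)
ets_fc   <- rep(57.33989, 7)   # flat forecast from the ETS(M,N,N) column

# Relative (absolute percentage) error of each forecast
rel_err <- function(actual, forecast) abs(actual - forecast) / actual

mape_arima <- mean(rel_err(actual, arima_fc))
mape_ets   <- mean(rel_err(actual, ets_fc))

cat("MAPE ARIMA(1,1,2):", round(mape_arima, 6), "\n")
cat("MAPE ETS(M,N,N):  ", round(mape_ets, 6), "\n")
```

On this hold-out week the ARIMA(1,1,2) forecasts have the slightly lower MAPE, in line with the MAPE row of the table.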

A brief introduction to 'R' statistical package

  • 1. A BRIEF INTRO TO ‘R’ – APPLIED STATS & TIME SERIES ANALYSIS - Shanmukha Sreenivas P
  • 2. THE R ENVIRONMENT
      R is an integrated suite of software facilities for data manipulation, calculation and graphical display.
      – An effective data handling and storage facility
      – A suite of operators for calculations on arrays, in particular matrices
      – A large, coherent, integrated collection of intermediate tools for data analysis
      – Graphical facilities for data analysis
      – A well developed, simple and effective programming language (called ‘S’) which includes conditionals, loops, user defined recursive functions and I/O facilities
  • 3. “OPEN SOURCE”... THAT JUST MEANS I DON’T HAVE TO PAY FOR IT, RIGHT?
      No. Much more:
      – Provides full access to algorithms and their implementation
      – Ability to fix bugs and extend software
      – Provides a forum allowing researchers to explore and expand the methods used to analyze data
      – Promotes reproducible research by providing open and accessible tools
      – Most of R is written in… R! This makes it quite easy to see what functions are actually doing.
  • 4. WHAT IS IT?
      R is an interpreted computer language.
      – Most user-visible functions are written in R itself, calling upon a smaller set of internal primitives.
      – It is possible to interface procedures written in C, C++, or Fortran for efficiency, and to write additional primitives.
      – System commands can be called from within R.
      R is used for data manipulation, statistics, and graphics. It is made up of:
      – operators (+ - <- * %*% …) for calculations on arrays & matrices
      – a large, coherent, integrated collection of functions
      – facilities for making unlimited types of publication quality graphics
      – user written functions & sets of functions (packages); 800+ contributed packages so far & growing
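A minimal sketch of the operators listed on slide 4, using a small made-up 2x2 matrix (the values are illustrative only). It contrasts `*`, which works element by element, with `%*%`, which is the matrix product.

```r
# Element-wise vs. matrix arithmetic on a small, made-up 2x2 example
A <- matrix(c(1, 2, 3, 4), nrow = 2)   # filled column by column
B <- diag(2)                           # 2x2 identity matrix

elementwise <- A * A      # multiplies matching entries
matprod     <- A %*% B    # matrix product; multiplying by the identity returns A

print(elementwise)
print(matprod)
```

Note that `<-` is the assignment operator, so `A <- ...` stores the matrix under the name `A`.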
  • 5. R: ADVANTAGES AND DISADVANTAGES
      DISADVANTAGES
      – Not user friendly at the start: steep learning curve, minimal GUI.
      – No commercial support; figuring out correct methods or how to use a function on your own can be frustrating.
      – Easy to make mistakes and not know.
      – Working with large datasets is limited by RAM.
      – Data prep & cleaning can be messier & more mistake prone in R vs. SPSS or SAS.
      ADVANTAGES
      – Fast and free.
      – State of the art: statistical researchers provide their methods as R packages. SPSS and SAS are years behind R!
      – 2nd only to MATLAB for graphics.
      – Mx, WinBUGS, and other programs use or will use R.
      – Active user community.
      – Excellent for simulation, programming, computer intensive analyses, etc.
      – Forces you to think about your analysis.
      – Interfaces with database storage software (SQL).
  • 6. TYPICAL R SESSION
      – Start up R via the GUI or your favorite text editor
      – Two windows:
          – 1+ new or existing scripts (text files): these will be saved
          – Terminal: output & temporary input, usually unsaved
  • 7. STATISTICAL METHODS
      – Statistics: “meaningful” quantities about a sample of objects, things, persons, events, phenomena, etc.
      – Simple to complex issues. E.g.
          – Correlation
          – ANOVA
          – MANOVA
          – Regression: linear, multiple, logistic
          – LDA
          – PCA / Factor Analysis
          – Frequency domain analysis
          – Econometric modelling (TSA)
      – Two main categories:
          – Descriptive statistics
          – Inferential statistics
  • 8. DESCRIPTIVE STATISTICS
      – Use sample information to explain/make abstraction of population “phenomena”.
      – Common “phenomena”:
          – Association (e.g. σ1,2.3 = 0.75)
          – Tendency (left-skew, right-skew)
          – Causal relationship (e.g. if X, then Y)
          – Trend, pattern, dispersion, range
      – Used in non-parametric analysis
  • 9. INFERENTIAL STATISTICS
      – Using sample statistics to infer some “phenomena” of population parameters
      – Hypothesis testing
      – Common “phenomena”: cause-and-effect
          – One-way relationship: ANOVA
          – Multi-directional relationship: MANOVA
      – Uses parametric analysis
  • 10. COMMON MISTAKES (CONTD.) – “ABUSE OF STATISTICS”
      Issue | Data analysis technique: example of abuse | Data analysis technique: correct technique
      Measure the “influence” of a variable on another | Using partial correlation (e.g. Spearman coeff.) | Using a regression parameter
      Finding the “relationship” between one variable and another | Multi-dimensional scaling, Likert scaling | Simple regression coefficient
      To evaluate whether a model fits data better than the other | Using R² | Many, a.o.t. Box-Cox χ² test for model equivalence
      To evaluate accuracy of “prediction” | Using R² and/or F-value of a model | Hold-out sample’s MAPE, MAD
      “Compare” whether a group is different from another | Multi-dimensional scaling, Likert scaling | Many, a.o.t. two-way ANOVA, χ², Z test
      To determine whether a group of factors “significantly influence” the observed phenomenon | Multi-dimensional scaling, Likert scaling | Many, a.o.t. MANOVA, regression
  • 11. TIME SERIES ANALYSIS
      A time series is a collection of observations made sequentially in time.
  • 12. STOCHASTIC PROCESSES USEFUL IN MODELING TIME SERIES
      (1) a purely random process,
      (2) a random walk,
      (3) a moving average (MA) process,
      (4) an autoregressive (AR) process,
      (5) an autoregressive moving average (ARMA) process, and
      (6) an autoregressive integrated moving average (ARIMA) process.
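The six processes on slide 12 can each be simulated in base R. This is a sketch only: the seed, the series length and the coefficients 0.7 and 0.6 are arbitrary illustrative choices, not values from the slides. `arima.sim()` comes from the `stats` package that ships with R.

```r
set.seed(42)   # arbitrary seed, for reproducibility only
n <- 200       # arbitrary series length

white_noise <- rnorm(n)              # (1) purely random process
random_walk <- cumsum(white_noise)   # (2) random walk: running sum of the shocks

# Coefficients below are illustrative assumptions
ma1  <- arima.sim(model = list(ma = 0.6), n = n)             # (3) MA(1)
ar1  <- arima.sim(model = list(ar = 0.7), n = n)             # (4) AR(1)
arma <- arima.sim(model = list(ar = 0.7, ma = 0.6), n = n)   # (5) ARMA(1,1)
arima110 <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.7), n = n)  # (6) ARIMA(1,1,0)

# Differencing a random walk recovers the underlying shocks
recovered <- diff(random_walk)
```

Plotting each series with `plot()` is a quick way to see how the processes differ: the random walk and the ARIMA series wander, while the others fluctuate around zero.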
  • 15. ETS(M,N,N) MODEL
      – M -> Multiplicative error
      – N -> No trend
      – N -> No seasonality
      – alpha = 0.1713
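With no trend and no seasonality, the point forecast of an ETS(M,N,N) model is just the last smoothed level, and the level follows the same update as simple exponential smoothing. That is consistent with the constant ETS forecast column on the validation slide. The sketch below hand-codes that recursion. The function name `ses_level`, the starting level and the data are hypothetical; only alpha = 0.1713 comes from the slide. A fitted ETS model would estimate the initial level rather than take the first observation.

```r
# Level recursion behind simple exponential smoothing / ETS(M,N,N) point forecasts.
# ses_level is a hypothetical helper, not a function from any package.
ses_level <- function(y, alpha, l0 = y[1]) {
  level <- l0                                # assumed starting level
  for (obs in y) {
    level <- level + alpha * (obs - level)   # move a fraction alpha toward obs
  }
  level
}

y <- c(50, 54, 61, 58, 55)   # made-up observations, NOT the slide's data
alpha <- 0.1713              # smoothing parameter reported on the slide

flat_forecast <- ses_level(y, alpha)   # same value for every forecast horizon
print(flat_forecast)
```

A small alpha such as 0.1713 means each new observation moves the level only slightly, so the forecast reacts slowly to recent changes.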
  • 16. VALIDATION
      Date | Actual | Forecast, ARIMA(1,1,2) | Rel Err | Forecast, ETS(M,N,N) | Rel Err
      13-03-12 | 65 | 60.48468 | 0.069466 | 57.33989 | 0.117848
      12-03-12 | 73 | 55.66896 | 0.237412 | 57.33989 | 0.214522
      11-03-12 | 80 | 58.24566 | 0.271929 | 57.33989 | 0.283251
      10-03-12 | 54 | 56.86697 | 0.053092 | 57.33989 | 0.06185
      09-03-12 | 55 | 57.60465 | 0.047357 | 57.33989 | 0.042543
      08-03-12 | 55 | 57.20995 | 0.040181 | 57.33989 | 0.042543
      07-03-12 | 51 | 57.42114 | 0.125905 | 57.33989 | 0.124312
      MAPE | | | 0.120763 | | 0.126696
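The MAPE row of the validation slide can be reproduced from the listed values. The sketch below copies the actuals and forecasts from slide 16. The helper name `mape` is an assumption, and the error is kept as a fraction, as on the slide, rather than multiplied by 100.

```r
# Hold-out values copied from the validation slide
actual   <- c(65, 73, 80, 54, 55, 55, 51)
arima_fc <- c(60.48468, 55.66896, 58.24566, 56.86697,
              57.60465, 57.20995, 57.42114)
ets_fc   <- rep(57.33989, 7)   # ETS(M,N,N) forecast is the same at every horizon

# Mean absolute percentage error, expressed as a fraction
mape <- function(actual, forecast) mean(abs(actual - forecast) / actual)

mape_arima <- mape(actual, arima_fc)
mape_ets   <- mape(actual, ets_fc)

print(round(c(ARIMA = mape_arima, ETS = mape_ets), 6))
```

On this hold-out sample ARIMA(1,1,2) has the slightly lower MAPE of the two models.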