SlideShare a Scribd company logo
1 of 39
Introduction
to R
CHAPTER I. Course Overview and
Preliminary Steps
Learningobjectives
• Know what is R and how it works
• Learn basics of working with data in R
• Get familiar with basic commands/functions
• Learn how to do basic analysis on any dataset and be able to
create basic charts
What is R & Why we use it
• It’s a tool : Open-Source, cross platform, free programming
language designed to build statistical solutions
• Powerful : Gives access to CRAN repository containing over
10,000 packages with pre-defined functions for almost every
purpose
• Stays Relevant : Constantly being updated by users ( Scientists,
Statisticians, Researchers, Students!)
• More: Makes beautiful graphs, can create custom functions or
modify existing ones, can be integrated into many environments
and platforms such as Hadoop etc
Installing R
• Can be downloaded for free from
http://www.r-project.org/
• Download the version compatible with your OS
• Simple/Standard installation process
• Can be downloaded for free from:
• https://www.rstudio.com/products/rstudio/download/
• Download the free version compatible with your OS
• R needs to be installed before installing R- Studio
Installing R -Studio
R-Studio UI
Write your
code here
Global Environment-
See your datasets here
Console - see
your code run
here
See your files,
graphs, help
documentation
and installed
packages here
R Commands
• Assignments E.g.: x = 1, or x <- 1
• Functions E.g.: print(“Hello World”)
• Computations E.g.: 17 + 3 ; x + 5
• Mix E.g.: y = sqrt(16); y = 15 + 5
• Assignment queries will update objects in your
R environment
• Queries without assignment, as well as ‘call’ of
R objects will either generate an output in the
console, or in the plot tab
CHAPTER II. R Basics: DataTypes
Variable Assignment in R
• A basic construct in programming is "variable"
• A variable allows you to store a piece of data (‘datum’, e.g.
6, ‘Hello’, etc.. ) or several pieces of data of a common
type, and assign them a unique name
• You can then later ‘call’ this variable's name to easily access
the value(s) that is/are stored within this variable.
Careful, R is case sensitive: The variables ‘x’ and ‘X’ can coexist
in R environment and have different values.
Basic data types in R
• R works with numerous data types. The most common
types are:
• Decimals values like 3.5, called 'numeric'
• Natural numbers like 3 are called 'integers'. Integers
are also numeric
• Boolean variables (TRUE or FALSE) are
classified as ‘logical’
• Text (or string) values are classified as
'character’
Basic data types in R
• Categorical variables are called ‘factors’. They
have a finite and defined set of values they can
take (e.g. eye_color can take have a value
contained in {‘blue’, ‘green’, ’brown’, ‘black’})
• Other variables can contain time data such as
dates, day of the week, hours, minutes, etc..
CHAPTER III. R Basics: Data
Structures
R Objects:Vectors
• To assign multiple values to a variable, we can use an R
object called a ‘vector’
• A vector is a sequence/collection of data elements of the
same basic type. Members in a vector are officially called
components. For Example: my_vector = c(14,26,38,30)
• To access a specific element in the vector, we simply need
to call variable_name[i], ‘i’ being the element’s position in
the vector. For example: vect[3] would return 38
R Objects: Matrices
• A matrix is a sequence/collection of data elements of the
same basic type arranged in a two-dimensional
rectangular layout.
• Being a 2-dimensional object, in order to obtain a specific
value within the matrix, 2 coordinates needs to be
entered. For example: my_matrix[i,j] would return the
element on the ith row, in the jth column
• my_matrix[i,] would return the entire ith row
• my_matrix[,j] would return the entire jth column
• A data frame is used for storing data tables. It is a list of
vectors of equal length. Unlike matrices, it can gather
vectors containing different basic types
• Selection of specific elements in data frames works the
same way as for matrices. For example: my_dataframe[i,j]
would return the element on the ith row, in the jth
column
R Objects: Data Frames
R Objects: Lists
• A list in R allows you to gather a variety of objects under
one name (that is, the name of the list) in an ordered way.
These objects can be matrices, vectors, data frames, even
other lists, etc. It is not even required that these objects
are related to each other.
• To access the ith object in the list, write my_list[[i]]
• If you want to access a variable in the ith object in the list,
write my_list[[i]] [variable coordinates]. See examples in R
CHAPTER IV. Importing Packages
and Datasets,Viewing Data
R: Packages
• R Packages are collections of R functions and data sets
• Some standard ones come with R installation
• Others can be installed in a few clicks in Rstudio, or
using install.packages(“package name”) function.
You can choose the CRAN Mirror closest to your
location, but the default Rstudio is consistently good
all over the world.
• Some have to be downloaded ( from http://cran.r-
project.org/, or through Google and manually
installed
• Once installed we need to call the package in when
needed using “library(“package name”)”
R: Importing Data
• More often than not, data is already available in
different formats ready to be imported to R.
• R accepts files of many formats, we will learn importing
files of the following formats:
• Text (.txt)
• CSV (.csv)
• Excel (.xls)
R: Importing Data
• Text files: use read.table() for space separated files,
comma separated files etc..
• CSV files: use read_csv() from readr package (used by
Rstudio interface)
• Excel files: use read_excel() from readxl package (used
by Rstudio interface)
See Rstudio examples to set Working Directory and import
different datasets
R: Importing Data
• For more formats (such as SPSS, SAS, STATA files etc…)
you can visit http://cran.rproject.org/doc/manuals/R-
data.pdf , here you get information on how to import
image files as well !
Data Views
There are several ways to look at a data set:
• First, you can simply look at it entirely by double clicking
on it in the Global Environment, or by using
View(data_name) function
• You can look a specific column by calling it. E.g. data-
name$column_name
• Else, you can look at the first k rows, or the last k rows by
using head(data_name, k) or tail(data_name, k)
respectively
Data Overviews
You can also use functions to have a quick overview of the
data set you are working with:
• Try to use summary(data_name)
• You can also use str(data_name)
CHAPTERV. Data Manipulations
Filtering/Subsetting
• Use a Logical Operator
• ==, >, <, <=, >=, != are all logical operators.
• Note that the “equals” logical operator is two "==" signs, as one
"= " only is reserved for assignment.
• Result is a Logical variable
• To filter out rows in a dataset, place logic condition(s) in the
dataset’s squared brackets, before the coma
• You can filter using several conditions and separate them with
logical operators “|” (OR) and/or “&” (AND)
• See examples in Rstudio
Binding
• Binding columns: If 2 datasets, a dataset and a vector, or
2 vectors have the same number of values (rows in the
case of datasets), they can be placed together into one
same dataset using cbind()
• This is different from « merging » (see later chapter),
hence there is no row matching system: rows need to be
in the exact same order for the data to make sense.
• See example in Rstudio
• Binding rows: If 2 datasets have the same columns
(order, data types, names), one can be appended under
the other using rbind()
• See example in Rstudio
Transforming
• You can create new columns or modify existing ones by
applying transformations to them
• Transformations can be adding, subtracting,
multiplying, dividing, powering etc..
• But it can also be using functions such as log(), exp()
etc..
• See examples in R studio
Sorting
• In R, you can sort your dataset’s rows based on a
column’s alphabetical order (character variables), or
numerical order (numeric variables)
• You can apply an ascending or descending direction to
this order
• See examples in R studio
CHAPTERVI. Joins, Summary
Tables and Data Export
Joins
• Joining consists in combining 2 or more datasets’ rows based
on a common column/field between them
• For a join to happen, 2 datasets need at least one same
column. It matches rows that have identical values in this
column.
• Eg.
• Note: It is not like what the cbind() function does: cbind() fuses datasets by pasting
them one next to the other, regardless of what is in the data
Table 1 Table 2
Column A Column B Column B Column C
A1 B1 B2 C1
A2 B1 B1 C2
A3 B2 B2 C3
Joined Tables
Column A Column B Column C
A1 B1 C2
A2 B1 C2
A3 B2 C1
A3 B2 C3
Joins
• There are different types of joins :
Summary Tables
• Contingency tables: Use table(cat_var1,cat_var2)
(where cat_var1 and cat_var2 are categorical
variables) to obtain the observations count for
each combination of these variables’ levels.
• Diverse summary tables: Use data %>%
group_by(cat_var1) %>% summarise() from the
“Dplyr” package to aggregate datasets and obtain
the summary numbers you want.
• See examples in Rstudio
Export Data
• Export data to use outside of R: You can export
your datasets as .csv files using the write.csv()
function.
• Export data for later use in R: You can export your
datasets as R objects called .RDS files using
saveRDS(). You can import them into R using
readRDS(). These execute a lot faster.
• See examples in Rstudio
CHAPTERVII. Plots
Plots
• Plots (Graphs, Visualisations,..) are very powerful
tools. They allow you to quickly grasp trends and
patterns in data sets, some of which could not be
spotted by analysing summary tables only
• In R, ‘ggplot2’ package gives you endless
possibilities to create visualisations.
• In this video, we focus on qplot() function (from
‘ggplot2’), which can provide high quality graphs
with very little effort.
Plots
With qplot(), we can create:
• Histograms and Density plots to visualise Numerical
variables
• Bar plots to visualise categorical variables
• Box plots to visualise correlations between numerical and
categorical variables
• Dot Plots to visualise correlations between numerical
variables
We can also use color coding to add information to
graphs while keeping them easily interpretable. See
examples in R studio.
Plots
Finally, you can save your graphs as images.
Simply use the ggsave() function from the ggplot2
package
See examples in Rstudio
Thank you

More Related Content

Similar to Introduction to R _IMPORTANT FOR DATA ANALYTICS

Similar to Introduction to R _IMPORTANT FOR DATA ANALYTICS (20)

R training2
R training2R training2
R training2
 
Data Exploration in R.pptx
Data Exploration in R.pptxData Exploration in R.pptx
Data Exploration in R.pptx
 
محاضرة برنامج التحليل الكمي R program د.هديل القفيدي
محاضرة برنامج التحليل الكمي   R program د.هديل القفيديمحاضرة برنامج التحليل الكمي   R program د.هديل القفيدي
محاضرة برنامج التحليل الكمي R program د.هديل القفيدي
 
Introduction to R.pptx
Introduction to R.pptxIntroduction to R.pptx
Introduction to R.pptx
 
Advanced Data Analytics with R Programming.ppt
Advanced Data Analytics with R Programming.pptAdvanced Data Analytics with R Programming.ppt
Advanced Data Analytics with R Programming.ppt
 
17641.ppt
17641.ppt17641.ppt
17641.ppt
 
Slides on introduction to R by ArinBasu MD
Slides on introduction to R by ArinBasu MDSlides on introduction to R by ArinBasu MD
Slides on introduction to R by ArinBasu MD
 
17641.ppt
17641.ppt17641.ppt
17641.ppt
 
Data structure
Data structureData structure
Data structure
 
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICS
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICSHive_An Brief Introduction to HIVE_BIGDATAANALYTICS
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICS
 
Moving Data to and From R
Moving Data to and From RMoving Data to and From R
Moving Data to and From R
 
DataStructures.pptx
DataStructures.pptxDataStructures.pptx
DataStructures.pptx
 
Essentials of R
Essentials of REssentials of R
Essentials of R
 
Unit 3
Unit 3Unit 3
Unit 3
 
How to obtain and install R.ppt
How to obtain and install R.pptHow to obtain and install R.ppt
How to obtain and install R.ppt
 
R Get Started I
R Get Started IR Get Started I
R Get Started I
 
Data Structure & aaplications_Module-1.pptx
Data Structure & aaplications_Module-1.pptxData Structure & aaplications_Module-1.pptx
Data Structure & aaplications_Module-1.pptx
 
Data structure and algorithm.
Data structure and algorithm. Data structure and algorithm.
Data structure and algorithm.
 
java.pdf
java.pdfjava.pdf
java.pdf
 
Unit1_Introduction to R.pdf
Unit1_Introduction to R.pdfUnit1_Introduction to R.pdf
Unit1_Introduction to R.pdf
 

More from HaritikaChhatwal1

Visualization IN DATA ANALYTICS IN TIME SERIES
Visualization IN DATA ANALYTICS IN TIME SERIESVisualization IN DATA ANALYTICS IN TIME SERIES
Visualization IN DATA ANALYTICS IN TIME SERIESHaritikaChhatwal1
 
TS Decomposition IN data Time Series Fore
TS Decomposition IN data Time Series ForeTS Decomposition IN data Time Series Fore
TS Decomposition IN data Time Series ForeHaritikaChhatwal1
 
TIMES SERIES FORECASTING ON HISTORICAL DATA IN R
TIMES SERIES FORECASTING ON HISTORICAL DATA  IN RTIMES SERIES FORECASTING ON HISTORICAL DATA  IN R
TIMES SERIES FORECASTING ON HISTORICAL DATA IN RHaritikaChhatwal1
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
Factor Analysis-Presentation DATA ANALYTICS
Factor Analysis-Presentation DATA ANALYTICSFactor Analysis-Presentation DATA ANALYTICS
Factor Analysis-Presentation DATA ANALYTICSHaritikaChhatwal1
 
Additional Reading material-Probability.ppt
Additional Reading material-Probability.pptAdditional Reading material-Probability.ppt
Additional Reading material-Probability.pptHaritikaChhatwal1
 
Decision Tree_Loan Delinquent_Problem Statement.pdf
Decision Tree_Loan Delinquent_Problem Statement.pdfDecision Tree_Loan Delinquent_Problem Statement.pdf
Decision Tree_Loan Delinquent_Problem Statement.pdfHaritikaChhatwal1
 
Frequency Based Classification Algorithms_ important
Frequency Based Classification Algorithms_ importantFrequency Based Classification Algorithms_ important
Frequency Based Classification Algorithms_ importantHaritikaChhatwal1
 
M2W1 - FBS - Descriptive Statistics_Mentoring Presentation.pptx
M2W1 - FBS - Descriptive Statistics_Mentoring Presentation.pptxM2W1 - FBS - Descriptive Statistics_Mentoring Presentation.pptx
M2W1 - FBS - Descriptive Statistics_Mentoring Presentation.pptxHaritikaChhatwal1
 
BUSINESS ANALYTICS WITH R SOFTWARE DIAST
BUSINESS ANALYTICS WITH R SOFTWARE DIASTBUSINESS ANALYTICS WITH R SOFTWARE DIAST
BUSINESS ANALYTICS WITH R SOFTWARE DIASTHaritikaChhatwal1
 
SESSION 1-2 [Autosaved] [Autosaved].pptx
SESSION 1-2 [Autosaved] [Autosaved].pptxSESSION 1-2 [Autosaved] [Autosaved].pptx
SESSION 1-2 [Autosaved] [Autosaved].pptxHaritikaChhatwal1
 
WORKSHEET INTRO AND TVM _SESSION 1 AND 2.pdf
WORKSHEET INTRO AND TVM _SESSION 1 AND 2.pdfWORKSHEET INTRO AND TVM _SESSION 1 AND 2.pdf
WORKSHEET INTRO AND TVM _SESSION 1 AND 2.pdfHaritikaChhatwal1
 
Epigeum at ntunive singapore MED900.pptx
Epigeum at ntunive singapore MED900.pptxEpigeum at ntunive singapore MED900.pptx
Epigeum at ntunive singapore MED900.pptxHaritikaChhatwal1
 
MED 900 Correlational Studies online safety sake.pptx
MED 900 Correlational Studies online safety sake.pptxMED 900 Correlational Studies online safety sake.pptx
MED 900 Correlational Studies online safety sake.pptxHaritikaChhatwal1
 
Nw Microsoft PowerPoint Presentation.pptx
Nw Microsoft PowerPoint Presentation.pptxNw Microsoft PowerPoint Presentation.pptx
Nw Microsoft PowerPoint Presentation.pptxHaritikaChhatwal1
 
HOWs CORRELATIONAL STUDIES are performed
HOWs CORRELATIONAL STUDIES are performedHOWs CORRELATIONAL STUDIES are performed
HOWs CORRELATIONAL STUDIES are performedHaritikaChhatwal1
 
CHAPTER 4 -TYPES OF BUSINESS.pptx
CHAPTER 4 -TYPES OF BUSINESS.pptxCHAPTER 4 -TYPES OF BUSINESS.pptx
CHAPTER 4 -TYPES OF BUSINESS.pptxHaritikaChhatwal1
 
CHAPTER 3-ENTREPRENEURSHIP [Autosaved].pptx
CHAPTER 3-ENTREPRENEURSHIP [Autosaved].pptxCHAPTER 3-ENTREPRENEURSHIP [Autosaved].pptx
CHAPTER 3-ENTREPRENEURSHIP [Autosaved].pptxHaritikaChhatwal1
 

More from HaritikaChhatwal1 (20)

Visualization IN DATA ANALYTICS IN TIME SERIES
Visualization IN DATA ANALYTICS IN TIME SERIESVisualization IN DATA ANALYTICS IN TIME SERIES
Visualization IN DATA ANALYTICS IN TIME SERIES
 
TS Decomposition IN data Time Series Fore
TS Decomposition IN data Time Series ForeTS Decomposition IN data Time Series Fore
TS Decomposition IN data Time Series Fore
 
TIMES SERIES FORECASTING ON HISTORICAL DATA IN R
TIMES SERIES FORECASTING ON HISTORICAL DATA  IN RTIMES SERIES FORECASTING ON HISTORICAL DATA  IN R
TIMES SERIES FORECASTING ON HISTORICAL DATA IN R
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Factor Analysis-Presentation DATA ANALYTICS
Factor Analysis-Presentation DATA ANALYTICSFactor Analysis-Presentation DATA ANALYTICS
Factor Analysis-Presentation DATA ANALYTICS
 
Additional Reading material-Probability.ppt
Additional Reading material-Probability.pptAdditional Reading material-Probability.ppt
Additional Reading material-Probability.ppt
 
Decision Tree_Loan Delinquent_Problem Statement.pdf
Decision Tree_Loan Delinquent_Problem Statement.pdfDecision Tree_Loan Delinquent_Problem Statement.pdf
Decision Tree_Loan Delinquent_Problem Statement.pdf
 
Frequency Based Classification Algorithms_ important
Frequency Based Classification Algorithms_ importantFrequency Based Classification Algorithms_ important
Frequency Based Classification Algorithms_ important
 
M2W1 - FBS - Descriptive Statistics_Mentoring Presentation.pptx
M2W1 - FBS - Descriptive Statistics_Mentoring Presentation.pptxM2W1 - FBS - Descriptive Statistics_Mentoring Presentation.pptx
M2W1 - FBS - Descriptive Statistics_Mentoring Presentation.pptx
 
BUSINESS ANALYTICS WITH R SOFTWARE DIAST
BUSINESS ANALYTICS WITH R SOFTWARE DIASTBUSINESS ANALYTICS WITH R SOFTWARE DIAST
BUSINESS ANALYTICS WITH R SOFTWARE DIAST
 
SESSION 1-2 [Autosaved] [Autosaved].pptx
SESSION 1-2 [Autosaved] [Autosaved].pptxSESSION 1-2 [Autosaved] [Autosaved].pptx
SESSION 1-2 [Autosaved] [Autosaved].pptx
 
WORKSHEET INTRO AND TVM _SESSION 1 AND 2.pdf
WORKSHEET INTRO AND TVM _SESSION 1 AND 2.pdfWORKSHEET INTRO AND TVM _SESSION 1 AND 2.pdf
WORKSHEET INTRO AND TVM _SESSION 1 AND 2.pdf
 
Epigeum at ntunive singapore MED900.pptx
Epigeum at ntunive singapore MED900.pptxEpigeum at ntunive singapore MED900.pptx
Epigeum at ntunive singapore MED900.pptx
 
MED 900 Correlational Studies online safety sake.pptx
MED 900 Correlational Studies online safety sake.pptxMED 900 Correlational Studies online safety sake.pptx
MED 900 Correlational Studies online safety sake.pptx
 
Nw Microsoft PowerPoint Presentation.pptx
Nw Microsoft PowerPoint Presentation.pptxNw Microsoft PowerPoint Presentation.pptx
Nw Microsoft PowerPoint Presentation.pptx
 
HOWs CORRELATIONAL STUDIES are performed
HOWs CORRELATIONAL STUDIES are performedHOWs CORRELATIONAL STUDIES are performed
HOWs CORRELATIONAL STUDIES are performed
 
DIAS PRESENTATION.pptx
DIAS PRESENTATION.pptxDIAS PRESENTATION.pptx
DIAS PRESENTATION.pptx
 
FULLTEXT01.pdf
FULLTEXT01.pdfFULLTEXT01.pdf
FULLTEXT01.pdf
 
CHAPTER 4 -TYPES OF BUSINESS.pptx
CHAPTER 4 -TYPES OF BUSINESS.pptxCHAPTER 4 -TYPES OF BUSINESS.pptx
CHAPTER 4 -TYPES OF BUSINESS.pptx
 
CHAPTER 3-ENTREPRENEURSHIP [Autosaved].pptx
CHAPTER 3-ENTREPRENEURSHIP [Autosaved].pptxCHAPTER 3-ENTREPRENEURSHIP [Autosaved].pptx
CHAPTER 3-ENTREPRENEURSHIP [Autosaved].pptx
 

Recently uploaded

Vip Female Escorts Noida 9711199171 Greater Noida Escorts Service
Vip Female Escorts Noida 9711199171 Greater Noida Escorts ServiceVip Female Escorts Noida 9711199171 Greater Noida Escorts Service
Vip Female Escorts Noida 9711199171 Greater Noida Escorts Serviceankitnayak356677
 
Banana Powder Manufacturing Plant Project Report 2024 Edition.pptx
Banana Powder Manufacturing Plant Project Report 2024 Edition.pptxBanana Powder Manufacturing Plant Project Report 2024 Edition.pptx
Banana Powder Manufacturing Plant Project Report 2024 Edition.pptxgeorgebrinton95
 
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In.../:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...lizamodels9
 
2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis UsageNeil Kimberley
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessAggregage
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Call Girls in Gomti Nagar - 7388211116  - With room ServiceCall Girls in Gomti Nagar - 7388211116  - With room Service
Call Girls in Gomti Nagar - 7388211116 - With room Servicediscovermytutordmt
 
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service PuneVIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service PuneCall girls in Ahmedabad High profile
 
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,noida100girls
 
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCRsoniya singh
 
Call Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any TimeCall Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any Timedelhimodelshub1
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...lizamodels9
 
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...lizamodels9
 
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation SlidesKeppelCorporation
 
Catalogue ONG NUOC PPR DE NHAT .pdf
Catalogue ONG NUOC PPR DE NHAT      .pdfCatalogue ONG NUOC PPR DE NHAT      .pdf
Catalogue ONG NUOC PPR DE NHAT .pdfOrient Homes
 
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… AbridgedLean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… AbridgedKaiNexus
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfPaul Menig
 
Regression analysis: Simple Linear Regression Multiple Linear Regression
Regression analysis:  Simple Linear Regression Multiple Linear RegressionRegression analysis:  Simple Linear Regression Multiple Linear Regression
Regression analysis: Simple Linear Regression Multiple Linear RegressionRavindra Nath Shukla
 
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130  Available With RoomVIP Kolkata Call Girl Howrah 👉 8250192130  Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Roomdivyansh0kumar0
 

Recently uploaded (20)

KestrelPro Flyer Japan IT Week 2024 (English)
KestrelPro Flyer Japan IT Week 2024 (English)KestrelPro Flyer Japan IT Week 2024 (English)
KestrelPro Flyer Japan IT Week 2024 (English)
 
Vip Female Escorts Noida 9711199171 Greater Noida Escorts Service
Vip Female Escorts Noida 9711199171 Greater Noida Escorts ServiceVip Female Escorts Noida 9711199171 Greater Noida Escorts Service
Vip Female Escorts Noida 9711199171 Greater Noida Escorts Service
 
Banana Powder Manufacturing Plant Project Report 2024 Edition.pptx
Banana Powder Manufacturing Plant Project Report 2024 Edition.pptxBanana Powder Manufacturing Plant Project Report 2024 Edition.pptx
Banana Powder Manufacturing Plant Project Report 2024 Edition.pptx
 
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In.../:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
 
2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for Success
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Call Girls in Gomti Nagar - 7388211116  - With room ServiceCall Girls in Gomti Nagar - 7388211116  - With room Service
Call Girls in Gomti Nagar - 7388211116 - With room Service
 
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service PuneVIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
 
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
 
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR
 
Call Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any TimeCall Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any Time
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
 
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
 
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
 
Catalogue ONG NUOC PPR DE NHAT .pdf
Catalogue ONG NUOC PPR DE NHAT      .pdfCatalogue ONG NUOC PPR DE NHAT      .pdf
Catalogue ONG NUOC PPR DE NHAT .pdf
 
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… AbridgedLean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdf
 
Regression analysis: Simple Linear Regression Multiple Linear Regression
Regression analysis:  Simple Linear Regression Multiple Linear RegressionRegression analysis:  Simple Linear Regression Multiple Linear Regression
Regression analysis: Simple Linear Regression Multiple Linear Regression
 
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130  Available With RoomVIP Kolkata Call Girl Howrah 👉 8250192130  Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
 

Introduction to R _IMPORTANT FOR DATA ANALYTICS

  • 2. CHAPTER I. Course Overview and Preliminary Steps
  • 3. Learningobjectives • Know what is R and how it works • Learn basics of working with data in R • Get familiar with basic commands/functions • Learn how to do basic analysis on any dataset and be able to create basic charts
  • 4. What is R & Why we use it • It’s a tool : Open-Source, cross platform, free programming language designed to build statistical solutions • Powerful : Gives access to CRAN repository containing over 10,000 packages with pre-defined functions for almost every purpose • Stays Relevant : Constantly being updated by users ( Scientists, Statisticians, Researchers, Students!) • More: Makes beautiful graphs, can create custom functions or modify existing ones, can be integrated into many environments and platforms such as Hadoop etc
  • 5. Installing R • Can be downloaded for free from http://www.r-project.org/ • Download the version compatible with your OS • Simple/Standard installation process
  • 6. • Can be downloaded for free from: • https://www.rstudio.com/products/rstudio/download/ • Download the free version compatible with your OS • R needs to be installed before installing R- Studio Installing R -Studio
  • 7. R-Studio UI Write your code here Global Environment- See your datasets here Console - see your code run here See your files, graphs, help documentation and installed packages here
  • 8. R Commands • Assignments E.g.: x = 1, or x <- 1 • Functions E.g.: print(“Hello World”) • Computations E.g.: 17 + 3 ; x + 5 • Mix E.g.: y = sqrt(16); y = 15 + 5 • Assignment queries will update objects in your R environment • Queries without assignment, as well as ‘call’ of R objects will either generate an output in the console, or in the plot tab
  • 9. CHAPTER II. R Basics: DataTypes
  • 10. Variable Assignment in R • A basic construct in programming is "variable" • A variable allows you to store a piece of data (‘datum’, e.g. 6, ‘Hello’, etc.. ) or several pieces of data of a common type, and assign them a unique name • You can then later ‘call’ this variable's name to easily access the value(s) that is/are stored within this variable. Careful, R is case sensitive: The variables ‘x’ and ‘X’ can coexist in R environment and have different values.
  • 11. Basic data types in R • R works with numerous data types. The most common types are: • Decimals values like 3.5, called 'numeric' • Natural numbers like 3 are called 'integers'. Integers are also numeric • Boolean variables (TRUE or FALSE) are classified as ‘logical’ • Text (or string) values are classified as 'character’
  • 12. Basic data types in R • Categorical variables are called ‘factors’. They have a finite and defined set of values they can take (e.g. eye_color can take have a value contained in {‘blue’, ‘green’, ’brown’, ‘black’}) • Other variables can contain time data such as dates, day of the week, hours, minutes, etc..
  • 13. CHAPTER III. R Basics: Data Structures
  • 14. R Objects:Vectors • To assign multiple values to a variable, we can use an R object called a ‘vector’ • A vector is a sequence/collection of data elements of the same basic type. Members in a vector are officially called components. For Example: my_vector = c(14,26,38,30) • To access a specific element in the vector, we simply need to call variable_name[i], ‘i’ being the element’s position in the vector. For example: vect[3] would return 38
  • 15. R Objects: Matrices • A matrix is a sequence/collection of data elements of the same basic type arranged in a two-dimensional rectangular layout. • Being a 2-dimensional object, in order to obtain a specific value within the matrix, 2 coordinates needs to be entered. For example: my_matrix[i,j] would return the element on the ith row, in the jth column • my_matrix[i,] would return the entire ith row • my_matrix[,j] would return the entire jth column
  • 16. • A data frame is used for storing data tables. It is a list of vectors of equal length. Unlike matrices, it can gather vectors containing different basic types • Selection of specific elements in data frames works the same way as for matrices. For example: my_dataframe[i,j] would return the element on the ith row, in the jth column R Objects: Data Frames
  • 17. R Objects: Lists • A list in R allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other lists, etc. It is not even required that these objects are related to each other. • To access the ith object in the list, write my_list[[i]] • If you want to access a variable in the ith object in the list, write my_list[[i]] [variable coordinates]. See examples in R
  • 18. CHAPTER IV. Importing Packages and Datasets,Viewing Data
  • 19. R: Packages • R Packages are collections of R functions and data sets • Some standard ones come with R installation • Others can be installed in a few clicks in Rstudio, or using install.packages(“package name”) function. You can choose the CRAN Mirror closest to your location, but the default Rstudio is consistently good all over the world. • Some have to be downloaded ( from http://cran.r- project.org/, or through Google and manually installed • Once installed we need to call the package in when needed using “library(“package name”)”
  • 20. R: Importing Data • More often than not, data is already available in different formats ready to be imported to R. • R accepts files of many formats, we will learn importing files of the following formats: • Text (.txt) • CSV (.csv) • Excel (.xls)
  • 21. R: Importing Data • Text files: use read.table() for space separated files, comma separated files etc.. • CSV files: use read_csv() from readr package (used by Rstudio interface) • Excel files: use read_excel() from readxl package (used by Rstudio interface) See Rstudio examples to set Working Directory and import different datasets
  • 22. R: Importing Data • For more formats (such as SPSS, SAS, STATA files etc…) you can visit http://cran.rproject.org/doc/manuals/R- data.pdf , here you get information on how to import image files as well !
  • 23. Data Views There are several ways to look at a data set: • First, you can simply look at it entirely by double clicking on it in the Global Environment, or by using View(data_name) function • You can look a specific column by calling it. E.g. data- name$column_name • Else, you can look at the first k rows, or the last k rows by using head(data_name, k) or tail(data_name, k) respectively
  • 24. Data Overviews You can also use functions to have a quick overview of the data set you are working with: • Try to use summary(data_name) • You can also use str(data_name)
  • 26. Filtering/Subsetting • Use a Logical Operator • ==, >, <, <=, >=, != are all logical operators. • Note that the “equals” logical operator is two "==" signs, as one "= " only is reserved for assignment. • Result is a Logical variable • To filter out rows in a dataset, place logic condition(s) in the dataset’s squared brackets, before the coma • You can filter using several conditions and separate them with logical operators “|” (OR) and/or “&” (AND) • See examples in Rstudio
  • 27. Binding • Binding columns: If 2 datasets, a dataset and a vector, or 2 vectors have the same number of values (rows in the case of datasets), they can be placed together into one same dataset using cbind() • This is different from « merging » (see later chapter), hence there is no row matching system: rows need to be in the exact same order for the data to make sense. • See example in Rstudio • Binding rows: If 2 datasets have the same columns (order, data types, names), one can be appended under the other using rbind() • See example in Rstudio
  • 28. Transforming • You can create new columns or modify existing ones by applying transformations to them • Transformations can be adding, subtracting, multiplying, dividing, powering etc.. • But it can also be using functions such as log(), exp() etc.. • See examples in R studio
  • 29. Sorting • In R, you can sort your dataset’s rows based on a column’s alphabetical order (character variables), or numerical order (numeric variables) • You can apply an ascending or descending direction to this order • See examples in R studio
  • 31. Joins • Joining consists in combining 2 or more datasets’ rows based on a common column/field between them • For a join to happen, 2 datasets need at least one same column. It matches rows that have identical values in this column. • Eg. • Note: It is not like what the cbind() function does: cbind() fuses datasets by pasting them one next to the other, regardless of what is in the data Table 1 Table 2 Column A Column B Column B Column C A1 B1 B2 C1 A2 B1 B1 C2 A3 B2 B2 C3 Joined Tables Column A Column B Column C A1 B1 C2 A2 B1 C2 A3 B2 C1 A3 B2 C3
  • 32. Joins • There are different types of joins :
  • 33. Summary Tables • Contingency tables: Use table(cat_var1,cat_var2) (where cat_var1 and cat_var2 are categorical variables) to obtain the observations count for each combination of these variables’ levels. • Diverse summary tables: Use data %>% group_by(cat_var1) %>% summarise() from the “Dplyr” package to aggregate datasets and obtain the summary numbers you want. • See examples in Rstudio
  • 34. Export Data • Export data to use outside of R: You can export your datasets as .csv files using the write.csv() function. • Export data for later use in R: You can export your datasets as R objects called .RDS files using saveRDS(). You can import them into R using readRDS(). These execute a lot faster. • See examples in Rstudio
  • 36. Plots • Plots (Graphs, Visualisations,..) are very powerful tools. They allow you to quickly grasp trends and patterns in data sets, some of which could not be spotted by analysing summary tables only • In R, ‘ggplot2’ package gives you endless possibilities to create visualisations. • In this video, we focus on qplot() function (from ‘ggplot2’), which can provide high quality graphs with very little effort.
  • 37. Plots With qplot(), we can create: • Histograms and Density plots to visualise Numerical variables • Bar plots to visualise categorical variables • Box plots to visualise correlations between numerical and categorical variables • Dot Plots to visualise correlations between numerical variables We can also use color coding to add information to graphs while keeping them easily interpretable. See examples in R studio.
  • 38. Plots Finally, you can save your graphs as images. Simply use the ggsave() function from the ggplot2 package See examples in Rstudio