GET STARTED WITH R FOR DATA SCIENCE
© Copyright 2023. United States Data Science Institute. All Rights Reserved. usdsi.org
No modern organization can afford to ignore the importance of data
science. By leveraging data science, many companies have reached
heights that were once out of reach. By properly analyzing their data,
these organizations are able to make more accurate, data-driven
decisions that not only improve their business operations
but also enhance their customer experience.
The World Economic Forum’s ranking of the fastest-growing jobs,
released in May 2023, placed data science jobs 5th on the list. The
data science market is also growing rapidly and is expected to reach
a market value of $501 billion by 2032, as reported by Precedence
Research.
With millions of terabytes of data generated every day, businesses
are looking for skilled data science professionals who can process
this enormous amount of data in their organization’s best interest.
This makes it a good time to step into the domain and build a
successful career.
When it comes to performing data science tasks, several programming
languages and tools come in handy, such as Python, R, Scala, Java,
and SQL. This guide introduces one of the most popular languages
used for data science: the R programming language. If you are a
beginner, this document covers the basics you need to get started
with R.
What is R?
R is one of the most popular programming languages among
data science professionals, who use it mainly for data
analysis and statistical computing. It was created by
Ross Ihaka and Robert Gentleman at the University of
Auckland, New Zealand, in the 1990s. Since then, R has
gained immense popularity in the data science community,
and its extensive libraries and robust statistical
capabilities have earned it a large user base.
How to Install R?
Installing R is straightforward. It is available for all
major platforms, including Windows, macOS, and Linux.
Before you can start using it, download R from the
Comprehensive R Archive Network (CRAN) website
(https://cran.r-project.org/) and install it on your
system. Make sure you download the latest version so that
you get the latest features. The website also provides
installation instructions for each operating system.
Getting started with RStudio
Now that you have downloaded and installed R on your
system, it’s time to get started with RStudio. It is a
popular Integrated Development Environment (IDE) that
makes working with R more user-friendly. Even though R
can be used from the command line, many data science
professionals prefer to work in an IDE such as RStudio.
RStudio can be downloaded from its official website,
https://www.rstudio.com/products/rstudio/download/.
After installing it, you will see a user-friendly
interface with panels for writing code, viewing data,
and visualizing results.
Basic concepts of R:
Let us now get started with the basics of R and its
components. The first thing to understand is that R is
an object-oriented language, which means that every
operation you perform in R revolves around objects.
These objects, also known as the building blocks of R,
are:
Variables
In R, you can assign values to variables using the assignment operator <- or =. For example,
x <- 10 assigns the value 10 to the variable ‘x’.
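
As a minimal sketch of both assignment operators, the snippet below creates two variables and combines them. The names ‘y’ and ‘total’ are illustrative and do not come from the original text.

```r
# Assign with the arrow operator, the style most R code uses
x <- 10

# Assign with the equals sign, which also works at the top level
y = 5

# Variables can then be used in expressions
total <- x + y
print(total)
```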
Data Structures
Data structures are the nouns of programming in R: data items of different types are organized
into data structures. These data structures can take the form of:
• Vectors - one-dimensional arrays that can hold multiple values of the same data type
• Data frames - two-dimensional tables used to store data, with rows and columns, similar to
spreadsheets
• Lists - containers that can hold elements of different data types
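
The three structures listed above can be sketched as follows. All names and values here (‘ages’, ‘people’, ‘record’) are made up for illustration.

```r
# Vector: one-dimensional, every element has the same type
ages <- c(25, 32, 47)

# Data frame: a table whose columns are vectors of equal length
people <- data.frame(
  name = c("Ana", "Ben", "Cara"),
  age = ages
)

# List: elements may have different types and lengths
record <- list(id = 1L, tags = c("a", "b"), active = TRUE)

# str() prints a compact description of an object's structure
str(people)
str(record)
```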
Functions
Another thing that makes R a popular programming language is its built-in functions, which perform
a wide range of tasks. For example, you can use the ‘mean()’ function to calculate the mean of a
vector of numbers.
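
A short sketch of ‘mean()’ on a vector is shown below. The vector ‘scores’ is illustrative, and the ‘na.rm’ argument is included only to show how missing values can be skipped.

```r
# A numeric vector of example values
scores <- c(80, 90, 100)

# mean() returns the arithmetic mean of the vector
avg <- mean(scores)
print(avg)

# With missing values, na.rm = TRUE drops them before averaging
avg_no_na <- mean(c(80, NA, 100), na.rm = TRUE)
print(avg_no_na)
```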
Packages
Packages in R are collections of functions, data, and documentation. By installing and loading R
packages, you can extend R’s functionality for specific data science tasks. The
‘install.packages()’ function installs packages, while the ‘library()’ function loads them. For example,
install.packages("dplyr") installs the ‘dplyr’ package, and library(dplyr) loads it.
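
One possible way to check which packages still need installing is sketched below. It only reports missing packages and leaves the ‘install.packages()’ call commented out, since installation needs network access. The package list is an assumption based on the packages this guide mentions.

```r
# Packages mentioned in this guide
pkgs <- c("dplyr", "tidyr", "ggplot2")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install the missing packages from CRAN:
  # install.packages(missing_pkgs)
}
```

Once a package is installed, library(dplyr) (for example) makes its functions available in the current session.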
Data Manipulation in R
Data manipulation is one of the most important parts of data analysis, and R offers several packages for it.
‘dplyr’ and ‘tidyr’ are two such packages, and they make the tasks described below easy to perform:
Data Import
You need to import data before you can start working on it. R can read various data formats,
such as CSV and Excel files, as well as different databases. The ‘read.csv()’ function is commonly
used for reading CSV files; it returns the file’s contents as a data frame.
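
To keep the example self-contained, the sketch below first writes a tiny CSV to a temporary file and then reads it back with ‘read.csv()’. The file contents and the name ‘people’ are invented for illustration; in practice you would pass the path to your own file.

```r
# Create a small CSV in a temporary location (stand-in for a real data file)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Ana,25", "Ben,32"), csv_path)

# read.csv() parses the file into a data frame, using the first row as headers
people <- read.csv(csv_path)
print(people)
```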
Data Exploration
After the data has been imported, you can explore it using functions like ‘head()’,
‘tail()’, and ‘summary()’. These functions give you a quick overview of your dataset.
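
The functions above can be tried on ‘mtcars’, a dataset that ships with R. Using ‘mtcars’ is a choice made here for illustration; any data frame works the same way.

```r
# First and last three rows of the dataset
first_rows <- head(mtcars, 3)
last_rows <- tail(mtcars, 3)
print(first_rows)
print(last_rows)

# Minimum, quartiles, median, mean, and maximum of one column
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)
```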
Data Filtering
The ‘filter()’ function in ‘dplyr’ is used to subset data based on specific criteria. For example, to keep
only the rows where age is more than 30, you can write filter(data, age > 30), where ‘data’ is your
data frame and ‘age’ is one of its columns.
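
A minimal sketch of the age filter described above follows. It assumes the ‘dplyr’ package is installed; the data frame ‘people’ and its values are invented for illustration.

```r
library(dplyr)

# Example data; in practice this would be your imported dataset
people <- data.frame(
  name = c("Ana", "Ben", "Cara"),
  age = c(25, 32, 47)
)

# Keep only the rows where age is greater than 30
over_30 <- filter(people, age > 30)
print(over_30)
```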
Data Transformation
Data transformation refers to the modification of existing variables or the creation of new ones. The
‘mutate()’ function in ‘dplyr’ is commonly used for this.
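
As a sketch of ‘mutate()’, the snippet below adds a new column derived from an existing one. It assumes ‘dplyr’ is installed; the data and the column name ‘age_months’ are hypothetical.

```r
library(dplyr)

people <- data.frame(
  name = c("Ana", "Ben"),
  age = c(25, 32)
)

# mutate() returns a copy of the data with the new column appended
people_months <- mutate(people, age_months = age * 12)
print(people_months)
```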
Data Aggregation
Data aggregation means summarizing or grouping data to obtain insights.
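
In ‘dplyr’, one common way to aggregate is to combine ‘group_by()’ with ‘summarise()’. The sketch below assumes ‘dplyr’ is installed; the ‘sales’ data and its column names are invented for illustration.

```r
library(dplyr)

sales <- data.frame(
  region = c("East", "East", "West"),
  amount = c(100, 150, 200)
)

# Group rows by region, then compute one summary row per group
by_region <- sales %>%
  group_by(region) %>%
  summarise(total = sum(amount), orders = n())

print(by_region)
```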
Data Visualization in R
Data visualization is at the core of any data science project. It refers to creating beautiful and interactive
visuals of the findings from data analysis, so that complex insights can be conveyed easily to
stakeholders. In R, packages like ‘ggplot2’ can be used for data visualization. With this package, the
following types of plots can be created:
Scatter Plot
It is used to visualize the relationship between two numerical variables.
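
A scatter plot might be built as sketched below, assuming ‘ggplot2’ is installed. The built-in ‘mtcars’ dataset and the choice of columns are illustrative.

```r
library(ggplot2)

# Map car weight to the x-axis and fuel efficiency to the y-axis,
# then draw one point per car
scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(scatter)
```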
Histogram
Histograms are used to represent the distribution of a single variable.
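
A histogram could be sketched like this, again assuming ‘ggplot2’ is installed and using ‘mtcars’ for illustration. The bin count of 10 is an arbitrary choice.

```r
library(ggplot2)

# Only an x aesthetic is needed; geom_histogram() counts values per bin
histogram <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10) +
  labs(x = "Miles per gallon", y = "Number of cars")

print(histogram)
```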
Bar Chart
It is suitable for visualizing categorical data.
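
For categorical data, a bar chart might look like the sketch below. It assumes ‘ggplot2’ is installed; treating the ‘cyl’ column of ‘mtcars’ as a category is a choice made here for illustration.

```r
library(ggplot2)

# factor() turns the numeric cylinder count into a categorical variable;
# geom_bar() then counts the rows in each category
bar_chart <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  labs(x = "Number of cylinders", y = "Number of cars")

print(bar_chart)
```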
These are just a few examples of how you can use ggplot2 for data visualization. You can
explore it further and use your creativity to visualize your data in your own way. Since R and its
packages offer great flexibility, you can customize the visualizations to suit your findings.
Statistical Analysis in R
What distinguishes R from other programming languages is its statistical capabilities, which make it an
ideal choice for data scientists. With R, you can perform various statistical tests and analyses
to derive meaningful insights from your data. Some common statistical methods in R include:
• T-tests - for comparing the means of two groups
• ANOVA - analysis of variance, for comparing multiple groups
• Linear Regression - for modeling relationships between variables
• Correlation - measures the strength and direction of the relationship between two numerical
variables
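
Each of the methods listed above has a counterpart in base R. The sketch below applies them to the built-in ‘mtcars’ dataset; the dataset and the choice of variables are assumptions made for illustration.

```r
# T-test: compare mean mpg between automatic (am = 0) and manual (am = 1) cars
t_res <- t.test(mpg ~ am, data = mtcars)

# One-way ANOVA: compare mean mpg across 4-, 6-, and 8-cylinder cars
anova_fit <- aov(mpg ~ factor(cyl), data = mtcars)

# Linear regression: model mpg as a linear function of weight
lm_fit <- lm(mpg ~ wt, data = mtcars)

# Pearson correlation between mpg and weight
r <- cor(mtcars$mpg, mtcars$wt)

print(t_res$p.value)
print(summary(anova_fit))
print(coef(lm_fit))
print(r)
```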
Conclusion
These are some of the basics of R that can help you get
started with this programming language for data
science. Remember that R is a comprehensive
programming language that offers many functions and
packages. Fully understanding its functionality takes a lot
of practice and work on several different types of data
and projects.
The more you practice, the clearer the concepts will
become. So download R now, join online
communities, dive into video tutorials, enroll in a
data science certification, and master this incredible tool
for data science.
LOCATIONS
info@usdsi.org | www.usdsi.org
Arizona
1345 E. Chandler BLVD.,
Suite 111-D Phoenix,
AZ 85048,
info.az@usdsi.org
Connecticut
680 E Main Street
#699, Stamford, CT 06901
info.ct@usdsi.org
Illinois
1 East Erie St, Suite 525
Chicago, IL 60611
info.il@usdsi.org
Singapore
No 7 Temasek Boulevard #12-07
Suntec Tower One, Singapore, 038987
Singapore, info.sg@usdsi.org
United Kingdom
29 Whitmore Road, Whitnash
Leamington Spa, Warwickshire,
United Kingdom CV31 2JQ
info.uk@usdsi.org
About
The United States Data Science Institute (USDSI®) is
deemed a high-end and in-depth technical certification
provider for Data Science Professionals and leads the
global panorama in Data Science Organizational
Transformation, Innovation, and Leadership. USDSI®
researches, designs, and certifies personnel who enter
or engage in various emerging Data Science Majors.