Data Analytics with R and SQL Server

Stéphane Fréchette
Data & Business Intelligence Solutions Architect | Consultant | Big Data | NoSQL | Data Science | Data Platform MVP
Thursday March 19, 2015
Who am I?
My name is Stéphane Fréchette
SQL Server MVP | Consultant | Speaker | Data & BI Architect | Big Data
| NoSQL | Data Science. Drums, good food and fine wine.
I have a passion for architecting, designing and building solutions that
matter.
Twitter: @sfrechette
Blog: stephanefrechette.com
Email: stephanefrechette@ukubu.com
Topics
• What is R?
• Should I use R?
• Data Structures
• Graphics
• Data Manipulation in R
• Connecting to SQL Server
• Demos
• Resources
• Q&A
DISCLAIMER
This is neither a course nor a tutorial, but
an introduction, a walkthrough to
inspire you to further explore and
learn more about R and statistical computing
“ Analysis of data is a process of inspecting, cleaning,
transforming, and modeling data with the goal of
discovering useful information, suggesting conclusions,
and supporting decision-making. Data analysis has
multiple facets and approaches, encompassing diverse
techniques under a variety of names, in different business,
science, and social science domains.”
- Wikipedia
What is R?
• A programming language and environment for statistical computing and graphics
• R has its origins in the S programming language, created in the 1970s
• Best used to manipulate moderately sized datasets, do statistical analysis and
produce data-centric documents and presentations
• Its tools are distributed as packages, which any user can download to
customize the R environment
• Cross-platform: runs on Mac, Windows and Unix-based systems
Should I use R?
[Flowchart: "Are you doing statistics?" with No and Yes branches]
Where “statistics” can mean machine learning, predictive analytics, data
science, anything that falls under a rather broad umbrella…
But if you have some data that makes sense to represent in a tabular-like
structure, and you want to do some cool analytical or statistical stuff with it, R is
definitely a good choice…
Downloading and Installing R
http://www.r-project.org/ http://www.rstudio.com/
The IDE (RStudio)
1. View Files and Data
2. See Workspace and History
3. See Files, Plots, Packages and Help
4. Console
[Screenshot: RStudio window with panes numbered 1 to 4]
Installing Packages
• To use a package in R, one must first install it using the install.packages
function
• This downloads the package from CRAN and installs it, ready to be used
Loading Packages
• To use a particular package in your current R session, one must load it into the
R environment using the library or require function
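A minimal sketch of the install-then-load pattern from the two slides above. The helper name use_package is hypothetical (it is not part of R), and the package "tools" is used only because it ships with R, so nothing is downloaded when the example runs.

```r
# Hypothetical helper: install a package from CRAN only when it is missing,
# then attach it to the current session with library().
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from CRAN; needs network access
  }
  library(pkg, character.only = TRUE)
}

use_package("tools")  # "tools" ships with R, so no download happens here

# library() stops with an error when a package is missing, whereas
# require() returns FALSE (with a warning), which makes it easy to test for.
has_pkg <- suppressWarnings(
  require("surelyNotAnInstalledPackage", character.only = TRUE, quietly = TRUE)
)
print(has_pkg)
```

For a package you do not have yet, a call such as use_package("dplyr") would trigger the CRAN download on first use.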
Common Data Structures in R
To make the best of the R language, one needs a strong understanding of the
basic data types and data structures and how to operate and use them.
R has a wide variety of data types including scalars, vectors (numerical,
character, logical), matrices, data frames, and lists…
To understand computations in R, two slogans are helpful:
• Everything that exists is an object
• Everything that happens is a function call
- John Chambers, creator of the S programming language and core member of the R programming language project
Data Structures - Vectors
The simplest structure is the numeric vector, which is a single entity consisting of an ordered
collection of numbers.
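As an illustration, here is a small sketch of creating and using a numeric vector. The variable names and temperature values are made up for the example.

```r
# Build a numeric vector with c() (values are illustrative)
temps_c <- c(21.5, 19.0, 23.2, 18.4)

length(temps_c)   # number of elements
temps_c[2]        # indexing starts at 1, so this is the second element

# Arithmetic is vectorized: the formula is applied to every element
temps_f <- temps_c * 9 / 5 + 32

# Comparisons return a logical vector, which can be used to subset
warm <- temps_c > 20
temps_c[warm]

mean(temps_c)
```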
Data Structures - Matrices
Matrices are nothing more than 2-dimensional vectors. To define a matrix, use the function
matrix.
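A short sketch of the matrix function, assuming nothing beyond base R. Note that matrix fills column by column unless byrow = TRUE is given.

```r
# 2 rows x 3 columns, filled column by column (the default)
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)       # rows, then columns
m[1, 2]      # row 1, column 2

# Same values, filled row by row instead
m_byrow <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m_byrow[1, 2]

# Transpose and matrix multiplication
tm <- t(m)
prod_m <- m %*% tm   # (2 x 3) times (3 x 2) gives a 2 x 2 matrix
```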
Data Structures - Data frames
Time series are often stored in data frames. A data frame is a table, similar to a matrix
with names above the columns, except that each column may hold a different type of data.
This is nice, because you can call and use one of the columns without knowing in
which position it is.
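The following sketch shows columns being used by name. The sales data frame and its values are invented for illustration.

```r
# A small data frame with one character and two numeric columns
sales <- data.frame(
  month = c("Jan", "Feb", "Mar"),
  units = c(120, 95, 140),
  price = c(9.99, 9.99, 10.49),
  stringsAsFactors = FALSE
)

# Columns are addressed by name, not by position
sales$units

# Derive a new column from existing ones
sales$revenue <- sales$units * sales$price

# Keep only the rows that satisfy a condition
high <- sales[sales$units > 100, ]

str(sales)   # compact overview of column names and types
```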
Data Structures - Lists
An R list is an object consisting of an ordered collection of objects known as its components.
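A brief sketch of a list whose components have different types and lengths. The component names and values are illustrative only.

```r
# Components may differ in type and length
session <- list(
  title    = "Data Analytics with R and SQL Server",
  slides   = 30,
  topics   = c("Data Structures", "Graphics", "dplyr"),
  has_demo = TRUE
)

session$title           # access a component by name
session[["topics"]][2]  # [[ ]] extracts the component, then [2] indexes into it
length(session)         # number of components

# Add a new component after creation
session$year <- 2015
names(session)
```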
Data Structures - Date and Time
Sys.time() # returns the current system date time
Data Structures - Date and Time
Two main (internal) formats for date-time are: POSIXct and POSIXlt
• POSIXct: A short format of date-time, typically used to store date-time columns in a data-frame
• POSIXlt: A long format of date-time, various other sub-units of time can be extracted from here
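To make the difference concrete, here is a small sketch that converts one timestamp between the two classes. The timestamp and the UTC time zone are chosen for illustration.

```r
# POSIXct: a single number (seconds since 1970-01-01 UTC) plus attributes
ct <- as.POSIXct("2015-03-19 14:30:00", tz = "UTC")
secs <- as.numeric(ct)

# POSIXlt: a list-like structure with named components
lt <- as.POSIXlt(ct)
lt$hour          # hour of the day
lt$mday          # day of the month
lt$mon           # month, counted from 0 (so March is 2)
lt$year + 1900   # years are stored as an offset from 1900
lt$wday          # day of the week, 0 = Sunday

# Arithmetic on POSIXct works in seconds
one_hour_later <- ct + 3600
```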
Data Structures - Others
Other useful and important data types
• NULL: Typically used for initializing variables. (x = NULL) creates a variable x of length zero.
The function is.null() returns TRUE or FALSE and tells whether a variable is NULL or not.
• NA: Used for denoting missing values. (x = NA) creates a variable x holding a missing value.
The function is.na() returns TRUE or FALSE and tells whether a value is NA or not.
• NaN: NaN stands for “Not a Number”. Some operations that produce it, such as sqrt(-1), print a
warning message in the console. The function is.nan() lets you check whether a value is NaN or not.
• Inf: Inf stands for “Infinity”. (x = 10/0 ; y = -3/0) sets the value of x to Inf and y to -Inf. The
function is.finite() lets you check whether a value is finite; is.infinite() tests for Inf and -Inf directly.
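The checks above can be tried directly in the console. This is a minimal base R sketch; the variable names are arbitrary.

```r
# NULL: an empty object of length zero
x <- NULL
length(x)
is.null(x)

# NA: a missing value inside a vector
v <- c(1, NA, 3)
is.na(v)
mean(v)                # NA, because one value is missing
mean(v, na.rm = TRUE)  # drop missing values before averaging

# NaN: an undefined numeric result
nan_val <- 0 / 0
is.nan(nan_val)
is.na(nan_val)         # NaN also counts as missing

# Inf and -Inf
pos_inf <- 10 / 0
neg_inf <- -3 / 0
is.finite(pos_inf)
is.infinite(neg_inf)
```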
Graphics
One of the main reasons data analysts and data
scientists turn to R is for its strong graphic
capabilities.
Basic Graphs:
• These include density plots (histograms and kernel
density plots), dot plots, bar charts (simple,
stacked, grouped), line charts, pie charts (simple,
annotated, 3D), boxplots (simple, notched, violin
plots, bagplots) and scatter plots (simple, with fit
lines, scatterplot matrices, high density plots, and
3D plots).
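A few of these basic graphs can be sketched with base R alone. The example assumes the built-in mtcars dataset and writes to a temporary PDF file so it also runs without a screen; in RStudio you would normally skip the pdf() and dev.off() calls and look at the Plots pane.

```r
# Write the plots to a temporary PDF (one plot per page)
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# Histogram with a kernel density curve on top
hist(mtcars$mpg, freq = FALSE,
     main = "Fuel economy", xlab = "Miles per gallon")
lines(density(mtcars$mpg))

# Boxplot of mpg grouped by number of cylinders
boxplot(mpg ~ cyl, data = mtcars,
        main = "MPG by cylinders", xlab = "Cylinders", ylab = "Miles per gallon")

# Scatter plot with a linear fit line
plot(mtcars$wt, mtcars$mpg,
     main = "Weight vs MPG", xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = mtcars))

dev.off()
```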
Graphics
Advanced Graphs:
• Graphical parameters let you change a graph's symbols, fonts, colors, and
lines. Axes and text can be customized with reference lines, text
annotations and a legend. Multiple plots can be combined into a single
graph.
• The lattice package provides a comprehensive
system for visualizing multivariate data, including
the ability to create plots conditioned on one or
more variables. The ggplot2 package offers an
elegant system for generating univariate and
multivariate graphs based on a grammar of
graphics.
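The graphical parameters and plot combination mentioned above can be sketched in base R as follows. This is an illustration only: it uses the built-in mtcars dataset, the colors and label positions are arbitrary choices, and it does not cover lattice or ggplot2.

```r
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 10, height = 5)

# Combine two plots side by side; keep the old settings to restore them later
old_par <- par(mfrow = c(1, 2))

# Left: custom symbol, color, reference line, annotation and legend
plot(mtcars$wt, mtcars$mpg, pch = 19, col = "steelblue",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon", main = "Weight vs MPG")
abline(h = mean(mtcars$mpg), lty = 2, col = "grey40")
text(x = 4.5, y = mean(mtcars$mpg) + 1, labels = "mean mpg", cex = 0.8)
legend("topright", legend = "one car", pch = 19, col = "steelblue")

# Right: a second plot in the same figure
boxplot(mpg ~ cyl, data = mtcars, col = "lightgrey",
        xlab = "Cylinders", ylab = "Miles per gallon", main = "MPG by cylinders")

par(old_par)
dev.off()
```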
Data Manipulation in R
dplyr is an R package for fast and easy data manipulation.
Data manipulation often involves common tasks, such as selecting certain variables, filtering
on certain conditions, deriving new variables from existing variables, and so forth. If we
think of these tasks as “verbs”, we can define a grammar of sorts for data manipulation.
In dplyr the main verbs (or functions) are:
• filter: select a subset of the rows of a data frame
• arrange: works similarly to filter, except that instead of filtering or selecting rows, it
reorders them
• select: select columns of a data frame
• mutate: add new columns to a data frame that are functions of existing columns
• summarize: summarize values
• group_by: describe how to break a data frame into groups of rows
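The verbs listed above are usually chained together. The sketch below assumes the dplyr package is installed and uses the built-in mtcars dataset; the 100 hp threshold and the kpl column are arbitrary choices for illustration.

```r
library(dplyr)

result <- mtcars %>%
  filter(hp > 100) %>%                      # keep rows matching a condition
  select(mpg, cyl, hp, wt) %>%              # keep only these columns
  mutate(kpl = mpg * 0.425144) %>%          # derive km per litre from mpg
  group_by(cyl) %>%                         # one group per cylinder count
  summarize(n = n(), mean_kpl = mean(kpl)) %>%  # one summary row per group
  arrange(desc(mean_kpl))                   # most economical group first

print(result)
```

Each step takes a data frame and returns a data frame, which is what makes the pipeline readable from top to bottom.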
Demo
[dplyr – manipulating data]
Connecting R and SQL Server
The RODBC package provides access to databases (including Microsoft Access
and Microsoft SQL Server) through an ODBC interface
Function                                                     Description
odbcConnect(dsn, uid = "", pwd = "")                         Open a connection to an ODBC database
sqlFetch(channel, sqtable)                                   Read a table from an ODBC database into a data frame
sqlQuery(channel, query)                                     Submit a query to an ODBC database and return the results
sqlSave(channel, mydf, tablename = sqtable, append = FALSE)  Write or update (append = TRUE) a data frame to a table in the ODBC database
sqlDrop(channel, sqtable)                                    Remove a table from the ODBC database
close(channel)                                               Close the connection
RODBC Example
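The code from this slide did not survive in the transcript, so what follows is only a sketch of a typical RODBC round trip, not the presenter's original example. The function name fetch_sales_summary, the connection string, and the AdventureWorks table and column names are all assumptions; it requires the RODBC package and a SQL Server ODBC driver.

```r
# Sketch only: server, database, and table names are hypothetical.
fetch_sales_summary <- function(conn_string) {
  channel <- RODBC::odbcDriverConnect(conn_string)
  if (!inherits(channel, "RODBC")) {
    stop("Could not connect to SQL Server")
  }
  on.exit(RODBC::odbcClose(channel))  # always release the connection

  # Let SQL Server do the aggregation; R receives a data frame
  RODBC::sqlQuery(channel, "
    SELECT ProductID, SUM(LineTotal) AS TotalSales
    FROM Sales.SalesOrderDetail
    GROUP BY ProductID",
    stringsAsFactors = FALSE)
}

# Example call (not run here, since it needs a live server):
# sales <- fetch_sales_summary(
#   "driver={SQL Server};server=localhost;database=AdventureWorks;trusted_connection=yes")
# head(sales)
```

Pushing the GROUP BY to the server keeps the data transferred to R small, which fits the earlier point that R works best with moderately sized datasets.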
Other interface
The RJDBC package provides access to databases through a JDBC interface.
(requires JDBC driver from Microsoft)
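For comparison, a JDBC connection might look like the sketch below. It assumes the RJDBC and DBI packages, a Java runtime, and a downloaded Microsoft JDBC driver jar; the function name query_via_jdbc, the jar path, URL, credentials, and query are hypothetical.

```r
# Sketch only: jar path, server, database, and credentials are hypothetical.
query_via_jdbc <- function(jar_path, url, user, password, sql) {
  # Register the Microsoft SQL Server JDBC driver class from the jar file
  drv <- RJDBC::JDBC(
    driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    classPath   = jar_path
  )
  conn <- DBI::dbConnect(drv, url, user, password)
  on.exit(DBI::dbDisconnect(conn))  # close the connection when done

  DBI::dbGetQuery(conn, sql)        # returns the result set as a data frame
}

# Example call (not run here, since it needs a driver jar and a live server):
# orders <- query_via_jdbc(
#   "C:/drivers/sqljdbc41.jar",
#   "jdbc:sqlserver://localhost;databaseName=AdventureWorks",
#   "r_user", "secret",
#   "SELECT TOP 10 * FROM Sales.SalesOrderHeader")
```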
Demo
[Let’s analyze - R and SQL Server]
Resources
• The R Project for Statistical Computing http://www.r-project.org/
• RStudio http://www.rstudio.com/
• Revolution Analytics http://www.revolutionanalytics.com/
• Shiny http://shiny.rstudio.com/
• {swirl} Learn R, in R http://swirlstats.com/
• R-bloggers http://www.r-bloggers.com/
• Online R resources for Beginners http://bit.ly/1x2q6Gl
• 60+ R resources to improve your data skills http://bit.ly/1BzW4ox
• Stack Overflow - R http://stackoverflow.com/tags/r
• Cerebral Mastication - R Resources http://bit.ly/17YhZj4
• Microsoft JDBC Drivers 4.1 and 4.0 for SQL Server http://bit.ly/1kEgJ7O
What Questions Do You Have?
Thank You
For attending this session