R SCRIPTING BASICS
Ross Wickham
Senior Hydraulic Engineer
NWW, Hydrology Branch
Date: 17 March 2022
2
OUTLINE
• R Background
• Basic Syntax and Scripting
• Key Packages
• Example Applications
• Demo
• Summary
• Resources
3
R BACKGROUND
• Supports object-oriented programming, like Python, C++, Java
• Interpreted language (you don’t need to pre-compile anything)
• Free, open-source
• An implementation of S, a language developed for statistical programming
• Created for statistics, data mining, and data analysis
4
BASIC SYNTAX
Assignment operators work in both directions (<- and ->) and can be chained
a = 3, b = 4, c = 4, d = 5, e = 6
Create sequences or vectors of specified values; c() is the default concatenate function
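A minimal sketch of the assignment and sequence syntax on this slide. The variable names and values are illustrative and are not taken from the original slide code.

```r
# Assignment works in either direction
x <- 3          # leftward assignment (the most common style)
4 -> y          # rightward assignment
z = 6           # `=` also assigns at the top level of a script
p <- q <- 5     # chained assignment: p and q are both 5

# Sequences and specified values
s1 <- 1:5                      # integer sequence 1 2 3 4 5
s2 <- seq(0, 1, by = 0.25)     # 0.00 0.25 0.50 0.75 1.00
v  <- c(2.5, 7, 10)            # specified values combined with c()
v2 <- c(v, s1)                 # c() also concatenates existing vectors

print(v2)
```

Naming a variable c still works, because R skips non-function objects when it looks up c() for a call, but it is easy to confuse with the function.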
5
BASIC SYNTAX
Vectors are a key data type, similar to Python lists (but every element has the same type):
Standard (or base) R syntax wraps objects in a function call for evaluation:
Subset vectors using square brackets (indexing starts at 1):
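A short sketch of vectors, function-call evaluation, and bracket subsetting, using a hypothetical vector of daily flows:

```r
# Hypothetical daily flows (cfs) stored as a numeric vector
flows <- c(1200, 1350, 980, 1610, 1425)

# Base R style: wrap the object in a function call
mean(flows)     # 1313
max(flows)      # 1610
length(flows)   # 5

# Every element of a vector shares one type; mixing types coerces them
class(c(1, "two"))   # "character"

# Subsetting with square brackets (indexing starts at 1, not 0)
flows[1]               # 1200
flows[2:3]             # 1350 980
flows[-1]              # everything except the first element
flows[flows > 1300]    # logical subsetting: 1350 1610 1425
```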
6
BASIC SYNTAX
Data.frames are an essential storage type for tabular or paired data (e.g., time series), like a pandas DataFrame in Python:
Data.frames are easily plotted:
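A minimal sketch of building, inspecting, and plotting a data.frame with base R only. The five-day flow record is hypothetical.

```r
# Hypothetical five-day flow record stored as a data.frame
flow_df <- data.frame(
  date     = seq(as.Date("2022-03-01"), by = "day", length.out = 5),
  flow_cfs = c(1200, 1350, 980, 1610, 1425)
)

str(flow_df)                           # column names and types
flow_df$flow_cfs                       # each column is a vector
flow_df[flow_df$flow_cfs > 1300, ]     # rows that meet a condition (all columns)
summary(flow_df)                       # summary statistics for each column

# Base R plot: flow versus date drawn as a line
plot(flow_cfs ~ date, data = flow_df, type = "l",
     xlab = "Date", ylab = "Flow (cfs)")
```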
7
BASIC SYNTAX
Code is streamlined for statistical testing
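A sketch of how few lines common statistical tests take in base R. The paired before/after measurements are hypothetical; t.test(), cor.test(), and lm() are all part of the stats package that loads with R.

```r
# Hypothetical paired measurements at six sites, before and after a change
before <- c(12.1, 11.8, 13.4, 12.9, 12.5, 13.0)
after  <- c(13.2, 12.9, 14.1, 13.5, 13.8, 13.6)

# Paired t-test: one call returns the statistic, p-value, and confidence interval
tt <- t.test(after, before, paired = TRUE)
print(tt)

# Correlation test and a simple linear regression
ct  <- cor.test(before, after)
fit <- lm(after ~ before)
summary(fit)
```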
8
INSTALLING AND HELP
Loading and installing packages is very simple:
Add a “?” in front of any function name to see its help page in RStudio:
Examples and additional documentation can be found in package vignettes:
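A sketch of the install, load, and help workflow. load_or_install() is a hypothetical convenience helper, not part of any package; the help and vignette calls are shown as comments because they are meant to be typed at the console.

```r
# Hypothetical helper: load a package, first installing it from CRAN
# if it is not already installed
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

load_or_install("stats")   # ships with R, so nothing is downloaded

# Typed at the console, these open help in RStudio's Help pane:
# ?mean                        # help page for mean()
# help(package = "stats")      # index of a package's help topics
# browseVignettes("dplyr")     # vignettes for an installed package
```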
9
TIDYVERSE SYNTAX
• An alternative to base R syntax that has quickly gained popularity
• Tidyverse style has a core philosophy for data structure and analysis (“tidy data”):
– Every column is a variable
– Every row is an observation
– Every cell is a single value
• Rich library of functions to streamline data analysis using “pipelines”
• The %>% operator passes the object on its left to the function on its right, as that function’s first argument
• These can be chained, passing the object through multiple functions
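A minimal sketch of a two-step pipeline, assuming dplyr is installed. The data.frame a, its columns, and the 30 m³/s threshold are hypothetical. The nested call at the end performs the same steps, read from the inside out.

```r
library(dplyr)   # attaches mutate() and the %>% pipe (re-exported from magrittr)

# Hypothetical data.frame of site flows (cfs)
a <- data.frame(
  site     = c("A", "B", "C", "D"),
  flow_cfs = c(1200, 350, 980, 1610)
)

# a becomes the first argument of mutate(); mutate()'s output becomes the
# first argument of subset()
b <- a %>%
  mutate(flow_cms = flow_cfs * 0.0283168) %>%   # convert cfs to m^3/s
  subset(flow_cms > 30)

# The same steps written as nested calls
b_nested <- subset(mutate(a, flow_cms = flow_cfs * 0.0283168), flow_cms > 30)

print(b)
```

The piped version lists the steps in the order they run, which is the main readability argument for tidyverse pipelines.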
10
SCRIPTING IN R
My preferred method: use RStudio, an Integrated Development Environment (IDE)
• See your code
• Test evaluations
• View errors
• See plots
• Get help
• See current objects being used (your “environment”)
• See history
• Customize the user interface
• Configure R version
• Available on App Portal
11
SCRIPTING WITH RSTUDIO
[Annotated RStudio screenshot. Labeled panes and features: Source Code; Current Objects; Help, Plots, Packages, and File management; Console for testing code and viewing output; Navigate through script tabs; Code Outline]
12
SCRIPTING WITH RSTUDIO – CONFIGURING R
You can control which version of R is used: Tools > Global Options
*Use 64-bit unless you have a good reason to use 32-bit
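After changing the setting and restarting RStudio, a quick check from the console confirms which R installation the session is running and whether it is a 64-bit build. A sketch using base R only:

```r
# Which R installation and version is this session running?
R.version.string          # version and release date
R.home()                  # folder of the R installation in use

# 64-bit builds use 8-byte pointers; 32-bit builds use 4
is_64bit <- .Machine$sizeof.pointer == 8
is_64bit
```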
13
R VS PYTHON
• Use what you know and are comfortable with – both are great
• R is more portable
• Python is generally considered easier to learn
• R is better for statistics
• R is considered easier for plotting
• Python is typically faster, and better for machine learning
14
KEY PACKAGES (FOR H&H)
• tidyverse – set of packages designed to work together for data analysis under a core philosophy
– ggplot2 – plotting library
– dplyr – data manipulation
– purrr – functional programming tools (e.g., map()) for building complex data pipelines
– readr – fast, user-friendly way to read rectangular data
– tidyr – consistently organize tabular data
– and others…
• reshape2 – reshape data.frames between wide and long formats (melt, dcast)
• lubridate – parse and manipulate dates
• rgdal, sp, raster – read, write, and manipulate geospatial objects
• plotly – interactive plots
• dataRetrieval – retrieve USGS water data from the web
Specific to the Corps:
• cwms_read – read publicly available NWD CWMS data, Jeff Tilton (NWD)
• dssrip – read, manipulate, write DSS data, Evan Heisman (HEC)
Advanced Users:
• shiny, shinydashboard – develop web app user interfaces; dynamically interact with data
• leaflet – interactive maps, with interactive content
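As one example from the list, a sketch of parsing and manipulating date-times with lubridate, assuming the package is installed. The timestamps and flows are hypothetical, and tapply() (base R) is used for the monthly mean only to avoid another package dependency.

```r
library(lubridate)

# Hypothetical timestamped flow observations
stamps <- c("2022-03-15 08:00", "2022-03-15 14:30", "2022-04-02 09:15")
flow   <- c(1200, 1350, 980)

times <- ymd_hm(stamps, tz = "UTC")   # parse text into POSIXct date-times

month(times)                  # month number of each observation: 3 3 4
floor_date(times, "day")      # round down to midnight
times + hours(6)              # shift every time by six hours

# Mean flow per month
monthly <- tapply(flow, month(times), mean)
monthly                       # 1275 for March, 980 for April
```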
15
QUESTIONS?
16
RESOURCES
Simple R cheat sheet
Cheat sheets for multiple packages
Portable version of R and RStudio with example code:
<HEC share drive>
The R Manuals (written by the R Development Core Team)
The Little Book of R for Time Series
Codecademy tutorials
Translation between Python pandas and R data.frame
Stack Overflow: help forum
dssrip package: read and write DSS in R
cwms_read package: read NWW CWMS data in Python and R

Editor's Notes

  • Slide 9: %>% makes the object on its left the first argument of the function that follows. Here, mutate’s first argument is the ‘a’ data.frame, and mutate outputs the modified data.frame, which is then passed to the subset function.