The document discusses visualizing data with R and ggplot2. It recommends accessing materials through RStudio Cloud and provides links for doing so. It introduces ggplot2 as a flexible and complete graphics system for R that uses a layer-based approach. Key components of ggplot2 layers are discussed, including specifying the data, aesthetic mappings, and geometry. The document also mentions using RMarkdown for reproducible reports that combine narrative text and code.
1. Before we start!
Access today's content in one of two ways:
Recommended:
• Go to https://rstudio.cloud and create an account.
• Once that’s completed, go to https://rstudio.cloud/project/358879
Less recommended:
• Go to https://github.com/rharrington31/drexel_visualization_workshop and clone or download the repository. You will need R and RStudio installed locally.
4. Goals for today
Understand how to think about exploratory data analysis
Understand how to use R to create graphs
5. ggplot2
“Grammar of Graphics”
Plot specification at a high level of abstraction
Very flexible
Theme system for polishing plot appearance
Mature and complete graphics system
Source: http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.htm
7. ggplot2 is all about layers.
Example: https://evamaerey.github.io/ggplot_flipbook/ggplot_flipbook_xaringan.html#35
8. What makes up a layer?
• data: what data is being used to build the layer?
• mapping: what field(s) from the data are being used to build the layer?
• geometry: what shape should our data take when building the layer?
• statistic: how should our data be transformed when building the layer?
• position: where should our data be placed when building the layer?
Source: https://rpubs.com/hadley/ggplot2-layers
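The five components above can be spelled out one by one with ggplot2's `layer()` function. This is a minimal sketch, assuming a small toy data frame (`toy`, invented here for illustration) rather than the workshop data:

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration only.
toy <- data.frame(x = c(1, 2, 3), y = c(2, 4, 3))

# Build one layer by naming all five components explicitly.
point_layer <- layer(
  data     = toy,                # data: what is being plotted
  mapping  = aes(x = x, y = y),  # mapping: which fields are used
  geom     = "point",            # geometry: the shape the data takes
  stat     = "identity",         # statistic: no transformation
  position = "identity"          # position: no adjustment
)

p <- ggplot() + point_layer

# The usual shortcut: geom_point() fills in the same stat and position defaults.
p_short <- ggplot() + geom_point(data = toy, mapping = aes(x = x, y = y))
```

In day-to-day use the `geom_*()` shortcuts are far more common; the explicit form is mainly useful for seeing what each shortcut sets on your behalf.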
10. Data we’ll be working with
Last month, the City of Chicago began publishing anonymized rideshare data:
• Drivers
• Vehicles
• Trips
We’ll be specifically focused on a sample of the Trips dataset
Goal:
Can we predict whether or not a ride will be tipped?
Trips Data: https://data.cityofchicago.org/Transportation/Transportation-Network-Providers-Trips/m6dm-c72p
Background Info: https://www.chicago.gov/city/en/depts/bacp/provdrs/vehic/news/2019/april/tnpdata.html
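One way to frame the goal is to turn the tip amount into a yes/no outcome and compare it against another field. The sketch below is only an illustration: the data frame `trips`, its column names (`fare`, `tip`), and every value in it are hypothetical stand-ins, not the real Trips sample.

```r
library(ggplot2)

# Hypothetical stand-in for a sample of the Trips data; the column names
# and all values are invented for illustration.
trips <- data.frame(
  fare = c(5.0, 7.5, 10.0, 12.5, 15.0, 20.0),
  tip  = c(0, 0, 2, 0, 3, 5)
)

# Derive the outcome we want to predict: was the ride tipped?
trips$tipped <- trips$tip > 0

# Share of rides in the sample that were tipped.
tipped_share <- mean(trips$tipped)

# A first exploratory plot: how do fares differ between tipped and untipped rides?
p <- ggplot(data = trips, mapping = aes(x = tipped, y = fare)) +
  geom_boxplot()
```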
11. RMarkdown
More Information: https://rmarkdown.rstudio.com
Like Jupyter Notebooks, but for R.
R Markdown documents are fully reproducible. Use a productive notebook interface to weave together narrative text and code to produce elegantly formatted output. Use multiple languages including R, Python, and SQL.
12. YAML Header
Only occurs once, at the top of the document.
Allows you to specify metadata about your document.
13. Markdown
Text that is not evaluated as code. Basic formatting of text is possible, from bolding and italicizing text to utilizing lists.
It is possible to include code inline with the markdown that will be evaluated when the document is “knit”.
14. Code Chunks
Actual code. Each chunk can be evaluated independently.
It is possible to use a variety of languages beyond R in the chunks.
15. Output
Output from each code chunk is included immediately below the chunk. This allows for easier exploration.
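The four parts described above (YAML header, Markdown, code chunks, output) can be seen together in a very small R Markdown file. The sketch below writes such a file from R; the title, the file name `example_report.Rmd`, and the chunk contents are assumptions chosen for illustration.

```r
# Three backticks open and close a code chunk in an .Rmd file.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",                                         # YAML header starts (once, at the top)
  "title: \"Example report\"",                   # hypothetical metadata
  "output: html_document",
  "---",                                         # YAML header ends
  "",
  "Some **bold** and *italic* narrative text.",  # Markdown, not evaluated as code
  "",
  "The mean of 1 to 10 is `r mean(1:10)`.",      # inline code, evaluated when knit
  "",
  paste0(fence, "{r}"),                          # code chunk opens
  "summary(cars)",                               # cars is a dataset that ships with R
  fence                                          # code chunk closes
)

# Write the document to a temporary location.
rmd_path <- file.path(tempdir(), "example_report.Rmd")
writeLines(rmd_lines, rmd_path)

# Knitting requires the rmarkdown package and pandoc; uncomment to render.
# rmarkdown::render(rmd_path)
```

Once knit, the output of `summary(cars)` would appear directly below its chunk in the rendered document.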
17. ggplot2 Layers
More Information: https://rpubs.com/hadley/ggplot2-layers
Date     Item    Quantity  Price
3/1/19   Pants   2         $19.99
3/2/19   Shirt   1         $14.99
3/3/19   Shirt   4         $14.99
3/3/19   Belt    2         $8.99
3/4/19   Pants   1         $19.99
3/5/19   Hat     1         $12.99
3/6/19   Pants   3         $19.99
3/6/19   Belt    3         $8.99
data: data = _
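As a sketch of the `data = _` step, the example table can be entered as a data frame and handed to `ggplot()`. The name `sales` is an assumption, and the dates are written in ISO format so that `as.Date()` can parse them.

```r
library(ggplot2)

# The example table from the slide; the name `sales` is hypothetical.
sales <- data.frame(
  Date     = as.Date(c("2019-03-01", "2019-03-02", "2019-03-03", "2019-03-03",
                       "2019-03-04", "2019-03-05", "2019-03-06", "2019-03-06")),
  Item     = c("Pants", "Shirt", "Shirt", "Belt", "Pants", "Hat", "Pants", "Belt"),
  Quantity = c(2, 1, 4, 2, 1, 1, 3, 3),
  Price    = c(19.99, 14.99, 14.99, 8.99, 19.99, 12.99, 19.99, 8.99)
)

# Supplying only the data gives an empty plot: nothing is mapped or drawn yet.
p <- ggplot(data = sales)
```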
18. ggplot2 Layers
More Information: https://rpubs.com/hadley/ggplot2-layers
Example data: the same sales table (Date, Item, Quantity, Price).
data + aesthetic mapping
data: data = _
aesthetic mapping: aes(x = _, y = _, …)
Aesthetics include: x, y, alpha, colour, fill, group, linetype, size
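A minimal sketch of the aesthetic mapping step, assuming a data frame named `sales` (a hypothetical name) that holds only the two columns used here. `aes()` pairs aesthetics with column names; it does not draw anything by itself.

```r
library(ggplot2)

# Two columns of the example table; the name `sales` is hypothetical.
sales <- data.frame(
  Item     = c("Pants", "Shirt", "Shirt", "Belt", "Pants", "Hat", "Pants", "Belt"),
  Quantity = c(2, 1, 4, 2, 1, 1, 3, 3)
)

# Map columns to aesthetics: Item on x, Quantity on y, and Item again as colour.
item_mapping <- aes(x = Item, y = Quantity, colour = Item)

# Data plus mapping sets up axes, but there is still no geometry to draw.
p <- ggplot(data = sales, mapping = item_mapping)
```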
19. ggplot2 Layers
More Information: https://rpubs.com/hadley/ggplot2-layers
Cheatsheet: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
Example data: the same sales table (Date, Item, Quantity, Price).
data + aesthetic mapping + geometry
data: data = _
aesthetic mapping: aes(x = _, y = _, …)
geometry: geom_bar()
Aesthetics include: x, y, alpha, colour, fill, group, linetype, size
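Putting the three pieces together might look like the sketch below, again assuming a hypothetical data frame named `sales`. One detail worth knowing: `geom_bar()` counts rows by default, so `y` is left unmapped here and `Quantity` is passed as a `weight` so that each bar shows the total quantity per item.

```r
library(ggplot2)

# Two columns of the example table; the name `sales` is hypothetical.
sales <- data.frame(
  Item     = c("Pants", "Shirt", "Shirt", "Belt", "Pants", "Hat", "Pants", "Belt"),
  Quantity = c(2, 1, 4, 2, 1, 1, 3, 3)
)

# data + aesthetic mapping + geometry
p <- ggplot(data = sales, mapping = aes(x = Item, weight = Quantity, fill = Item)) +
  geom_bar()

# Alternative when y is mapped directly: geom_col() uses the values as bar
# heights and stacks rows that share the same Item.
p_col <- ggplot(data = sales, mapping = aes(x = Item, y = Quantity, fill = Item)) +
  geom_col()
```

Printing `p` or `p_col` at the console (or in an R Markdown chunk) draws the chart.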
20. Where can you get data?
Open Data Network (opendatanetwork.com/)
OpenDataPhilly (opendataphilly.org)
Data is Plural (tinyletter.com/data-is-plural)
Kaggle (kaggle.com)
Data.World (data.world)