
Introduction into R for historians (part 3: examine and import data)


Introduction to R for the European Historical Population Sample summer school, Cluj-Napoca, Romania, 2015. Aimed at an audience of historians with limited quantitative skills.

Published in: Data & Analytics


  1. Examining data and importing data in R
     Richard L. Zijdeman
     May 29, 2015
     Sections: Recap; Getting data in R; Do it yourself!; Plotting using ggplot2
  2. Sections:
     1 Recap
     2 Getting data in R
     3 Do it yourself!
     4 Plotting using ggplot2
  3. Recap
  4. The structure of objects
     Store just about anything in R: numbers, sentences, datasets
     Objects
     Study the structure of objects: str()
       type of object
       features of object
     ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                         inbound = c(215, 237, 237, NA),
                         outbound = c(212, 239, 260, 265))
  5. Study the structure of object “ships”
     str(ships)
     ## 'data.frame': 4 obs. of 3 variables:
     ##  $ year    : num 1850 1860 1870 1880
     ##  $ inbound : num 215 237 237 NA
     ##  $ outbound: num 212 239 260 265
  6. Characteristics of objects
     Class: class()
     Length: length()
     Dimensions: dim()
     class(ships)
     ## [1] "data.frame"
     length(ships)
     ## [1] 3
     dim(ships) # rows, columns
     ## [1] 4 3
  7. Closer inspection of data.frames
     names of columns (variables): names()
     top/bottom rows: head(), tail()
     missing data: is.na()
     names(ships)
     ## [1] "year" "inbound" "outbound"
     is.na(ships)
     ##       year inbound outbound
     ## [1,] FALSE   FALSE    FALSE
     ## [2,] FALSE   FALSE    FALSE
     ## [3,] FALSE   FALSE    FALSE
     ## [4,] FALSE    TRUE    FALSE
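The slide names head() and tail() without showing them. A minimal sketch, reusing the ships data.frame from the recap; the value n = 2 and the helper variable names are chosen here only for illustration.

```r
# Rebuild the example data.frame from the recap slides
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

# First and last two rows; n sets how many rows are returned
first_rows <- head(ships, n = 2)
last_rows  <- tail(ships, n = 2)
print(first_rows)
print(last_rows)

# is.na() on one column gives a logical vector;
# which() converts it into the row numbers that are missing
missing_rows <- which(is.na(ships$inbound))
print(missing_rows)
```

On a large file, head() and tail() are usually a quicker first check than printing the whole data.frame.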
  8. Summarizing data in data.frames
     descriptive statistics: summary()
     calculations: e.g. min(), mean(), sum()
     results in table format: table()
     summary(ships)
     ##       year         inbound         outbound
     ##  Min.   :1850   Min.   :215.0   Min.   :212.0
     ##  1st Qu.:1858   1st Qu.:226.0   1st Qu.:232.2
     ##  Median :1865   Median :237.0   Median :249.5
     ##  Mean   :1865   Mean   :229.7   Mean   :244.0
     ##  3rd Qu.:1872   3rd Qu.:237.0   3rd Qu.:261.2
     ##  Max.   :1880   Max.   :237.0   Max.   :265.0
     ##                 NA's   :1
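Functions such as mean() and max() return NA as soon as a column contains a missing value, which is easy to trip over with the inbound column. A short sketch of the usual workaround with na.rm = TRUE; the variable names are illustrative and not from the slides.

```r
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

# With a missing value present, the result is itself missing
mean_with_na <- mean(ships$inbound)

# na.rm = TRUE drops missing values before computing
mean_inbound   <- mean(ships$inbound, na.rm = TRUE)
max_inbound    <- max(ships$inbound, na.rm = TRUE)
total_outbound <- sum(ships$outbound)

# table() counts how often each value occurs;
# useNA = "ifany" adds a count for missing values when there are any
inbound_counts <- table(ships$inbound, useNA = "ifany")

print(mean_with_na)
print(mean_inbound)
print(inbound_counts)
```

summary() handles the missing value on its own and reports it in the NA's row, so the two approaches should agree on the mean of inbound.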
  9. is.na(ships)
     ##       year inbound outbound
     ## [1,] FALSE   FALSE    FALSE
     ## [2,] FALSE   FALSE    FALSE
     ## [3,] FALSE   FALSE    FALSE
     ## [4,] FALSE    TRUE    FALSE
     table(is.na(ships))
     ##
     ## FALSE  TRUE
     ##    11     1
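table(is.na(ships)) counts missing cells for the whole data.frame. To see which variable they sit in, one option is to count per column. This is a sketch using base R's colSums() and complete.cases(), neither of which appears in the slides.

```r
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

# is.na() returns a logical matrix; summing each column counts the TRUEs
na_per_column <- colSums(is.na(ships))
print(na_per_column)

# complete.cases() flags rows without any missing value
complete_rows  <- complete.cases(ships)
ships_complete <- ships[complete_rows, ]
print(ships_complete)
```

Whether dropping incomplete rows is appropriate depends on the research question; the sketch only shows how to find them.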
  10. Visualizing your data
      Not just for analyses!
      Data quality
        representativeness
        missing data
  11. plot(ships)
      [Figure: scatterplot matrix with panels for year, inbound and outbound]
  12. Getting data in R
  13. Data already in R
      The “datasets” package
        very slim datasets
        specific example data
      To obtain a list of datasets, type:
      library(help = "datasets")
      To obtain information on a specific dataset, type:
      help(swiss) # thus: help(name_of_dataset)
      or to just see the data:
      swiss
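The inspection functions from the recap work on the built-in datasets as well. A brief sketch using swiss, which ships with base R; the variable names are illustrative.

```r
# swiss is available without loading anything extra
swiss_dim   <- dim(swiss)     # rows, columns
swiss_names <- names(swiss)

print(swiss_dim)
print(swiss_names)

# Look at the first rows instead of printing the whole dataset
print(head(swiss, n = 3))
```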
  14. Reading in data
      Different functions for different files:
        Base R: read.table() (read.csv())
        foreign package: read.spss(), read.dta(), read.dbf()
        openxlsx package: read.xlsx()
        alternative packages: xlsx (Java required), gdata (Perl-based)
  15. read.xlsx() from openxlsx package
      file: your file, including directory
      sheet: name of sheet
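A possible round trip with openxlsx, assuming the package is installed: the ships data.frame is written to a temporary workbook and read back by sheet name. The temporary file and the sheet name "ships" are made up for this sketch. In the openxlsx versions this sketch assumes, the first argument of read.xlsx() is named xlsxFile, so the path is passed by position here.

```r
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

# Only run when openxlsx is available (install.packages("openxlsx") otherwise)
if (requireNamespace("openxlsx", quietly = TRUE)) {
  tmp <- tempfile(fileext = ".xlsx")   # hypothetical file, deleted below

  # Create a workbook with one sheet named "ships" and save it
  wb <- openxlsx::createWorkbook()
  openxlsx::addWorksheet(wb, "ships")
  openxlsx::writeData(wb, sheet = "ships", x = ships)
  openxlsx::saveWorkbook(wb, tmp, overwrite = TRUE)

  # Read the sheet back in as a data.frame
  ships_xlsx <- openxlsx::read.xlsx(tmp, sheet = "ships")
  str(ships_xlsx)

  unlink(tmp)
}
```

For the course files, the same call would take the path to HSN_basic.xlsx instead of the temporary file.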
  16. read.csv()
      file: your file, including directory
      header: variable names or not?
      sep: separator
        read.csv default: “,”
        read.csv2 default: “;”
      skip: number of rows to skip
      nrows: total number of rows to read
      stringsAsFactors
      encoding (e.g. “latin1” or “UTF-8”)
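The arguments above can be tried on a small file written on the spot. The sketch below is only an illustration: the file contents, the note on the first line and the temporary file are all invented, and they stand in for a semicolon-separated export that has one line of notes above the header.

```r
# Write a small semicolon-separated file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("exported from a hypothetical archive system",
             "year;inbound;outbound",
             "1850;215;212",
             "1860;237;239",
             "1870;237;260",
             "1880;;265"),
           con = tmp)

ships_csv <- read.csv(tmp,
                      header = TRUE,            # first line read holds the variable names
                      sep = ";",                # semicolon instead of the default comma
                      skip = 1,                 # skip the line of notes at the top
                      nrows = 3,                # read only the first three data rows
                      stringsAsFactors = FALSE, # keep text as character, not factor
                      encoding = "UTF-8")
str(ships_csv)

unlink(tmp)
```

Reading a limited number of rows with nrows is a cheap way to check that sep, skip and header are right before loading a large file in full.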
  17. Do it yourself!
  18. Read in the following files as data.frames:
      HSN_basic.xlsx
        check the data.frame: using dim(), length()
        check the variables: using summary(), min(), table()
      Repeat for HSN_marriages.csv: read in only 100 lines
  19. Plotting using ggplot2
  20. ggplot2
      Package by Hadley Wickham
      Generic plotting for a great range of plots
      ggplot2 website: http://ggplot2.org
      excellent tutorial: https://jofrhwld.github.io/avml2012/#Section_1.1
  21. Building your graph
      Each plot consists of multiple layers
      Think of a canvas on which you ‘paint’
        data layer
        geometries layer
        statistics layer
  22. Data layer
        data.frame and aesthetics
        ggplot(data.frame, aes(x= ..., y = ...))
      geometries layer
        ggplot(..., aes(x= ..., y = ...)) + geom_...() # e.g. geom_line
      statistics layer
        ggplot(..., aes(x= ..., y = ...)) + geom_...() + stat_...() # e.g. stat_smooth
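The three layers can be built up one at a time and stored along the way. A sketch with the ships data.frame from the recap, assuming ggplot2 is installed; the object names p_data, p_geom and p_stat and the choice of a linear smoother are illustrative only.

```r
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

# Only run when ggplot2 is available (install.packages("ggplot2") otherwise)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Data layer: data.frame plus aesthetics, nothing is drawn yet
  p_data <- ggplot(ships, aes(x = year, y = outbound))

  # Geometries layer: connect the observations with a line
  p_geom <- p_data + geom_line()

  # Statistics layer: add a fitted straight line, without a confidence band
  p_stat <- p_geom + stat_smooth(method = "lm", se = FALSE)
}
```

Typing the name of a stored plot at the console, for example p_stat, draws it. Storing each step makes it easy to see what a single layer adds.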
  23. An example
      Reading in the data
      hmar <- read.csv("./../data/derived/HSN_marriages.csv",
                       stringsAsFactors = FALSE,
                       encoding = "latin1",
                       header = TRUE,
                       nrows = 100)
  24. Plotting the data
      install.packages("ggplot2") # the package name must be quoted
      library(ggplot2)
      ggplot(hmar, aes(x= M_year, y = Age_bride)) +
        geom_point()
  25. [Figure: scatterplot of Age_bride (y-axis) against M_year (x-axis)]
  26. Improving the plot
      Specify characteristics of the geom_layer
      ggplot(hmar, aes(x= M_year, y = Age_bride)) +
        geom_point(colour = "blue", size = 3, shape = 18)
      See http://www.cookbook-r.com/Graphs/Shapes_and_line_types/
  27. Specify characteristics of the geom_layer
      [Figure: scatterplot of Age_bride (y-axis) against M_year (x-axis), drawn with the point settings from slide 26]
  28. A PTE example
      Does age at marriage depend on educational attainment?
        To marry you need resources
        the more attainment the longer it takes to acquire resources
        ergo: brides with educational attainment marry later in life
      Not a statistical test: but let’s graph this
  29. A request from yesterday
      Can I plot labels?
      ggplot(hmar, aes(x= M_year, y = Age_bride, label = SIgn_bride)) +
        geom_text()
  30. Yes you can!
      Not really useful though...
      [Figure: Age_bride (y-axis) against M_year (x-axis), each observation drawn as its SIgn_bride label, a or h]
  31. Let’s try with colours...
      ggplot(hmar, aes(x= M_year, y = Age_bride)) +
        geom_point(aes(colour = factor(SIgn_bride)), size = 3, shape = 18)
  32. [Figure: scatterplot of Age_bride (y-axis) against M_year (x-axis), points coloured by factor(SIgn_bride) with legend entries a and h]
      No real pattern, though...
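A plot that shows no pattern can be cross-checked with a simple group comparison. The sketch below uses a small invented data.frame, toy_mar, that only borrows the column names from the slides. It is not the HSN data and its values say nothing about the real marriages.

```r
# Hypothetical stand-in for hmar (NOT the HSN data)
toy_mar <- data.frame(Age_bride  = c(22, 25, 31, 24, 28, 35),
                      SIgn_bride = c("a", "a", "a", "h", "h", "h"))

# Mean age of the bride within each category of SIgn_bride
mean_age <- tapply(toy_mar$Age_bride, toy_mar$SIgn_bride, mean)

# Number of observations per category, to judge how much each mean rests on
n_group <- table(toy_mar$SIgn_bride)

print(mean_age)
print(n_group)
```

As on the slides, this is a description and not a statistical test.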
  33. Finalizing the graph
      ggplot(hmar, aes(x= M_year, y = Age_bride)) +
        geom_point(aes(colour = factor(SIgn_bride)), size = 3, shape = 18) +
        labs(title = "Age of marriage over time",
             x = "time (years since A.D.)",
             y = "age of bride (years)",
             colour = "Signature")
      # here we use colour since legend shows colour
      # labels are passed to labs() as named arguments
  34. [Figure: “Age of marriage over time”, x-axis “time (years since A.D.)”, y-axis “age of bride (years)”, legend “Signature” with entries a and h]
  35. Satisfied?
  36. Actually not... the points are plotted on top of each other...
      Solution: geom_jitter
      ggplot(hmar, aes(x= M_year, y = Age_bride)) +
        geom_jitter(aes(colour = factor(SIgn_bride)), size = 3, shape = 18) +
        labs(title = "Age of marriage over time",
             x = "time (years since A.D.)",
             y = "age of bride (years)",
             colour = "Signature")
      # here we use colour since legend shows colour
      # labels are passed to labs() as named arguments
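geom_jitter() moves points by a random amount in both directions, so the plot changes on every run and ages are shifted too. A possible refinement, assuming ggplot2 3.0.0 or later for the seed argument: jitter only along the year axis and fix the seed. The data.frame fake_mar is randomly generated for this sketch and is not the HSN data; the width of 0.3 is an arbitrary choice.

```r
# Only run when ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Hypothetical stand-in for hmar (NOT the HSN data), same column names
  set.seed(1)
  fake_mar <- data.frame(M_year     = sample(1830:1870, 100, replace = TRUE),
                         Age_bride  = sample(18:50, 100, replace = TRUE),
                         SIgn_bride = sample(c("a", "h"), 100, replace = TRUE))

  p_jit <- ggplot(fake_mar, aes(x = M_year, y = Age_bride)) +
    geom_point(aes(colour = factor(SIgn_bride)), size = 3, shape = 18,
               # shift points sideways only; height = 0 leaves ages untouched,
               # seed makes the random shifts reproducible
               position = position_jitter(width = 0.3, height = 0, seed = 1)) +
    labs(title = "Age of marriage over time",
         x = "time (years since A.D.)",
         y = "age of bride (years)",
         colour = "Signature")
}
```

Keeping height = 0 means every point still sits at the recorded age of the bride, which matters when readers take values off the y-axis.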
  37. [Figure: “Age of marriage over time” with jittered points, x-axis “time (years since A.D.)”, y-axis “age of bride (years)”, legend “Signature” with entries a and h]
  38. Final remarks on ggplot2
      We have just scratched the surface of ggplot2
      Build your graph slowly
        start with the basics
        add complexity step-wise
      Now it’s your turn!
  39. A small PTE project
      Look at the variables in the HSN files
      Think of a research question
      Provide a general mechanism and hypothesis
      Plot your results
