
# Introduction into R for historians (part 3: examine and import data)


Introduction into R for the European Historical Population Sample summer school, Cluj-Napoca, Romania, 2015. Aimed at an audience of historians with limited quantitative skills.

Published in: Data & Analytics

### Introduction into R for historians (part 3: examine and import data)

1. Examining data and importing data in R. Richard L. Zijdeman, May 29, 2015.
2. Outline: 1 Recap; 2 Getting data in R; 3 Do it yourself!; 4 Plotting using ggplot2
3. Recap
4. The structure of objects
   - Store just about anything in R as objects: numbers, sentences, datasets
   - Study the structure of objects with `str()`: type of object, features of object
   - Example: `ships <- data.frame(year = c(1850, 1860, 1870, 1880), inbound = c(215, 237, 237, NA), outbound = c(212, 239, 260, 265))`
5. Study the structure of the object "ships": `str(ships)`
   - `## 'data.frame': 4 obs. of 3 variables:`
   - `##  $ year    : num 1850 1860 1870 1880`
   - `##  $ inbound : num 215 237 237 NA`
   - `##  $ outbound: num 212 239 260 265`
6. Characteristics of objects
   - Class: `class()`; length: `length()`; dimensions: `dim()`
   - `class(ships)` returns `## [1] "data.frame"`
   - `length(ships)` returns `## [1] 3` (for a data.frame, the number of columns)
   - `dim(ships)  # rows, columns` returns `## [1] 4 3`
7. Closer inspection of data.frames
   - names of columns (variables): `names()`
   - top/bottom rows: `head()`, `tail()`
   - missing data: `is.na()`
   - `names(ships)` returns `## [1] "year" "inbound" "outbound"`
   - `is.na(ships)` returns:

   |      | year  | inbound | outbound |
   |------|-------|---------|----------|
   | [1,] | FALSE | FALSE   | FALSE    |
   | [2,] | FALSE | FALSE   | FALSE    |
   | [3,] | FALSE | FALSE   | FALSE    |
   | [4,] | FALSE | TRUE    | FALSE    |
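The slide names `head()` and `tail()` but does not show them in use. A minimal sketch on the same `ships` data; the object names `first_two` and `last_two` are only illustrative:

```r
# Recreate the example data.frame from the recap slides
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

# head() and tail() return the first / last n rows (default n = 6)
first_two <- head(ships, n = 2)
last_two  <- tail(ships, n = 2)

print(first_two)
print(last_two)
```

With large historical datasets, looking at a few rows from the top and the bottom is a quick way to spot import problems such as shifted columns or trailing empty rows.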
8. Summarizing data in data.frames
   - descriptive statistics: `summary()`
   - calculations: e.g. `min()`, `mean()`, `sum()`
   - results in table format: `table()`
   - `summary(ships)` returns:

   | statistic | year | inbound | outbound |
   |-----------|------|---------|----------|
   | Min.      | 1850 | 215.0   | 212.0    |
   | 1st Qu.   | 1858 | 226.0   | 232.2    |
   | Median    | 1865 | 237.0   | 249.5    |
   | Mean      | 1865 | 229.7   | 244.0    |
   | 3rd Qu.   | 1872 | 237.0   | 261.2    |
   | Max.      | 1880 | 237.0   | 265.0    |
   | NA's      |      | 1       |          |
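One detail worth knowing when applying `min()`, `mean()` or `sum()` to a single column: if the column contains `NA`, the result is `NA` unless missing values are dropped explicitly. A small sketch on the `ships` data; the result names (`inbound_mean` and so on) are just illustrative:

```r
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

# A single NA makes the plain mean NA as well
naive_mean <- mean(ships$inbound)

# na.rm = TRUE drops missing values before calculating
inbound_mean <- mean(ships$inbound, na.rm = TRUE)
inbound_min  <- min(ships$inbound, na.rm = TRUE)

# outbound has no missing values, so no na.rm is needed
outbound_sum <- sum(ships$outbound)

print(c(naive = naive_mean, mean = inbound_mean, min = inbound_min))
print(outbound_sum)
```

The mean with `na.rm = TRUE` corresponds to the 229.7 reported by `summary(ships)`, which drops missing values on its own and reports them separately as `NA's`.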
9. `is.na(ships)` returns a logical matrix (the same one as on slide 7) that is `TRUE` only for `inbound` in row 4. `table(is.na(ships))` counts the values: `FALSE` 11, `TRUE` 1.
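`table(is.na(ships))` gives one overall count. To see where the missing values are, base R's `colSums()` and `complete.cases()` can be used, as in this sketch (the names `missing_per_column` and `complete_rows` are illustrative):

```r
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

# TRUE counts as 1, so column sums of the logical matrix
# give the number of missing values per variable
missing_per_column <- colSums(is.na(ships))

# complete.cases() flags the rows without any missing value
complete_rows <- complete.cases(ships)

print(missing_per_column)
print(ships[complete_rows, ])
```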
10. Visualizing your data
    - Not just for analyses!
    - Data quality: representativeness, missing data
11. `plot(ships)` [Figure: scatterplot matrix of `year`, `inbound` and `outbound`]
12. Getting data in R
13. Data already in R
    - The "datasets" package: very small datasets, specific example data
    - To obtain a list of datasets, type: `library(help = "datasets")`
    - To obtain information on a specific dataset, type: `help(swiss)  # thus: help(name_of_dataset)`
    - or to just see the data: `swiss`
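The inspection functions from the recap work on the built-in datasets too. A short sketch using `swiss`, which ships with base R's datasets package; the object name `swiss_dim` is illustrative:

```r
# swiss is available without loading anything extra
swiss_dim <- dim(swiss)      # rows, columns

print(swiss_dim)
print(names(swiss))          # variable names
print(head(swiss, n = 3))    # first three rows
str(swiss)                   # structure: all variables are numeric or integer
```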
14. Reading in data. Different functions for different files:
    - Base R: `read.table()` (`read.csv()`)
    - foreign package: `read.spss()`, `read.dta()`, `read.dbf()`
    - openxlsx package: `read.xlsx()`
    - alternative packages: xlsx (Java required), gdata (Perl-based)
15. `read.xlsx()` from the openxlsx package
    - file (argument `xlsxFile`): your file, including directory
    - sheet: name of sheet
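A round-trip sketch of `read.xlsx()`, assuming the openxlsx package is installed. Since the course files are not available here, the example first writes the `ships` data to a temporary workbook; the sheet name "ships" and the object names are made up for illustration:

```r
ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

# Only run when openxlsx is installed (assumption: install.packages("openxlsx"))
if (requireNamespace("openxlsx", quietly = TRUE)) {
  xlsx_path <- tempfile(fileext = ".xlsx")

  # Write a small workbook so there is something to read back
  openxlsx::write.xlsx(ships, xlsx_path, sheetName = "ships")

  # sheet accepts either the sheet name or its index
  ships_xlsx <- openxlsx::read.xlsx(xlsx_path, sheet = "ships")
  str(ships_xlsx)
}
```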
16. `read.csv()`
    - file: your file, including directory
    - header: variable names or not?
    - sep: separator (`read.csv` default: ","; `read.csv2` default: ";")
    - skip: number of rows to skip
    - nrows: total number of rows to read
    - stringsAsFactors
    - encoding (e.g. "latin1" or "UTF-8")
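The arguments above can be tried out without any external file. This sketch writes a small semicolon-separated file to a temporary location (the note line and the file contents are invented for illustration) and reads it back in two ways:

```r
# Write a small semicolon-separated file with one leading note line
csv_path <- tempfile(fileext = ".csv")
writeLines(c("exported for the course",
             "year;inbound;outbound",
             "1850;215;212",
             "1860;237;239",
             "1870;237;260",
             "1880;;265"),
           csv_path)

# skip = 1 jumps over the note line, nrows = 3 reads only three data rows
ships_part <- read.csv(csv_path, header = TRUE, sep = ";",
                       skip = 1, nrows = 3, stringsAsFactors = FALSE)

# read.csv2() already assumes ";" as separator; the empty field becomes NA
ships_all <- read.csv2(csv_path, skip = 1)

print(ships_part)
print(ships_all)
```

Reading only a few rows with `nrows` is a cheap way to check the separator and the header before importing a large file in full.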
17. Do it yourself!
18. Read in the following files as data.frames:
    - HSN_basic.xlsx: check the data.frame using `dim()`, `length()`; check the variables using `summary()`, `min()`, `table()`
    - Repeat for HSN_marriages.csv: read in only 100 lines
19. Plotting using ggplot2
20. ggplot2
    - Package by Hadley Wickham
    - Generic plotting for a great range of plots
    - ggplot2 website: http://ggplot2.org
    - excellent tutorial: https://jofrhwld.github.io/avml2012/#Section_1.1
21. Building your graph
    - Each plot consists of multiple layers
    - Think of a canvas on which you 'paint': data layer, geometries layer, statistics layer
22. The layers in code
    - Data layer (data.frame and aesthetics): `ggplot(data.frame, aes(x = ..., y = ...))`
    - Geometries layer: `ggplot(..., aes(x = ..., y = ...)) + geom_...()  # e.g. geom_line`
    - Statistics layer: `ggplot(..., aes(x = ..., y = ...)) + geom_...() + stat_...()  # e.g. stat_smooth`
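The three layers can be stacked on the small `ships` data from the recap. This is only a sketch, assuming ggplot2 is installed; the choice of `geom_line()`, `geom_point()` and a linear `stat_smooth()` is illustrative, not prescribed by the slides:

```r
library(ggplot2)

ships <- data.frame(year = c(1850, 1860, 1870, 1880),
                    inbound = c(215, 237, 237, NA),
                    outbound = c(212, 239, 260, 265))

# Data layer: the data.frame plus the aesthetics (which variable goes where)
p <- ggplot(ships, aes(x = year, y = outbound)) +
  # Geometries layer: how the data are drawn
  geom_line() +
  geom_point() +
  # Statistics layer: a straight trend line fitted with lm(), without error band
  stat_smooth(method = "lm", se = FALSE)

print(p)
```

Because each layer is added with `+`, the plot can be built up one line at a time and checked after every addition.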
23. An example. Reading in the data: `hmar <- read.csv("./../data/derived/HSN_marriages.csv", stringsAsFactors = FALSE, encoding = "latin1", header = TRUE, nrows = 100)`
24. Plotting the data: `install.packages("ggplot2")` (needed only once), then `library(ggplot2)` and `ggplot(hmar, aes(x = M_year, y = Age_bride)) + geom_point()`
25. [Figure: scatterplot of `Age_bride` (y-axis, 20 to 50) against `M_year` (x-axis, 1830 to 1870)]
26. Improving the plot
    - Specify characteristics of the geom layer: `ggplot(hmar, aes(x = M_year, y = Age_bride)) + geom_point(colour = "blue", size = 3, shape = 18)`
    - See http://www.cookbook-r.com/Graphs/Shapes_and_line_types/
27. Specify characteristics of the geom layer [Figure: scatterplot of `Age_bride` (y-axis, 20 to 50) against `M_year` (x-axis, 1830 to 1870)]
28. A PTE example
    - Does age at marriage depend on educational attainment?
    - To marry you need resources
    - The more attainment, the longer it takes to acquire resources
    - Ergo: brides with educational attainment marry later in life
    - Not a statistical test, but let's graph this
29. A request from yesterday: can I plot labels? `ggplot(hmar, aes(x = M_year, y = Age_bride, label = SIgn_bride)) + geom_text()`
30. Yes you can! Not really useful though... [Figure: `Age_bride` (20 to 50) against `M_year` (1830 to 1870), with each observation drawn as its `SIgn_bride` label, "a" or "h"]
31. Let's try with colours... `ggplot(hmar, aes(x = M_year, y = Age_bride)) + geom_point(aes(colour = factor(SIgn_bride)), size = 3, shape = 18)`
32. [Figure: scatterplot of `Age_bride` against `M_year`, points coloured by `factor(SIgn_bride)` with levels "a" and "h"] No real pattern, though...
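When a plot shows no clear pattern, a numerical summary per group can serve as a cross-check. The HSN file is not available here, so this sketch uses randomly generated stand-in data (`toy`) that only borrows the column names `Age_bride` and `SIgn_bride` and the values "a" and "h" from the legend; the numbers say nothing about the real data:

```r
# Synthetic stand-in for hmar: 100 random brides (assumption, not HSN data)
set.seed(42)
toy <- data.frame(
  Age_bride  = sample(18:50, 100, replace = TRUE),
  SIgn_bride = sample(c("a", "h"), 100, replace = TRUE)
)

# Mean age at marriage per signature category, using the formula interface
group_means <- aggregate(Age_bride ~ SIgn_bride, data = toy, FUN = mean)

# Number of observations per category
group_sizes <- table(toy$SIgn_bride)

print(group_means)
print(group_sizes)
```

With the real data, `toy` would simply be replaced by `hmar`.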
33. Finalizing the graph: `ggplot(hmar, aes(x = M_year, y = Age_bride)) + geom_point(aes(colour = factor(SIgn_bride)), size = 3, shape = 18) + labs(title = "Age of marriage over time", x = "time (years since A.D.)", y = "age of bride (years)", colour = "Signature")  # here we use colour since legend shows colour`
34. [Figure: "Age of marriage over time"; x-axis "time (years since A.D.)", 1830 to 1870; y-axis "age of bride (years)", 20 to 50; colour legend "Signature" with levels "a" and "h"]
35. Satisfied?
36. Actually not... the points are plotted on top of each other. Solution: `geom_jitter`: `ggplot(hmar, aes(x = M_year, y = Age_bride)) + geom_jitter(aes(colour = factor(SIgn_bride)), size = 3, shape = 18) + labs(title = "Age of marriage over time", x = "time (years since A.D.)", y = "age of bride (years)", colour = "Signature")  # here we use colour since legend shows colour`
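`geom_jitter()` moves every point by a small random amount, so the plot looks slightly different on each run, and by default the amount of noise is chosen for you. A sketch of how to control both, again on randomly generated stand-in data (`toy`) rather than the HSN file; the `width` and `height` values are arbitrary choices for illustration:

```r
library(ggplot2)

# Synthetic stand-in for hmar (assumption, not HSN data)
set.seed(1)
toy <- data.frame(
  M_year     = sample(1830:1870, 100, replace = TRUE),
  Age_bride  = sample(18:50, 100, replace = TRUE),
  SIgn_bride = sample(c("a", "h"), 100, replace = TRUE)
)

# width and height limit how far points are moved horizontally and vertically
# (in data units: here at most 0.3 years in either direction)
p <- ggplot(toy, aes(x = M_year, y = Age_bride)) +
  geom_jitter(aes(colour = factor(SIgn_bride)),
              width = 0.3, height = 0.3, size = 3, shape = 18) +
  labs(title = "Age of marriage over time",
       x = "time (years since A.D.)",
       y = "age of bride (years)",
       colour = "Signature")

# Fix the random seed just before drawing to get the same jitter every time
set.seed(2015)
print(p)
```

Keeping the jitter well below one year means points still sit visibly close to their true year and age.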
37. [Figure: jittered version of "Age of marriage over time"; x-axis "time (years since A.D.)", 1830 to 1870; y-axis "age of bride (years)", 20 to 50; colour legend "Signature" with levels "a" and "h"]
38. Final remarks on ggplot2
    - We have just scratched the surface of ggplot2
    - Build your graph slowly: start with the basics, add complexity step-wise
    - Now it's your turn!
39. A small PTE project
    - Look at the variables in the HSN files
    - Think of a research question
    - Provide a general mechanism and hypothesis
    - Plot your results