From HISCO to HISCAM
Richard L. Zijdeman
6 July 2015

Using HISCO and HISCAM to code and analyze occupations


This is the lab session I provided for the European Historical Sample Network Summer School on why occupations are important in historical research and how we can deal with them appropriately using HISCO and HISCAM.

Published in: Data & Analytics


  1. From HISCO to HISCAM. Richard L. Zijdeman, 6 July 2015.
  2. Getting started. Before we start, let’s first set up our working environment:
     rm(list = ls()) # ReMove objects in memory
     setwd("~/Dropbox/Historical Demography - Reconstructing Life Cource Dynamics/ Day 5 Occupational coding systems and R Studio descriptives and logistic regression/hisco2hiscam/")
     With the rm() (remove) function we remove objects from memory, such as datasets that still remain from a previous session.
  3. Data work flow. It is good practice to organize all of your files for a project (e.g. a paper) in a specific folder. Here we set our working directory with setwd() to a particular folder. One of the advantages of working like this is efficient collaboration. After a colleague has set their working directory to the folder you shared with them, all relative paths inside that folder will be the same as yours, so no directories or file names need tweaking.
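
A minimal sketch of this folder-based workflow. The project folder name and the data/source sub-folders are assumptions for illustration (modelled on the ./data/source/ path used later in the slides), and tempdir() stands in for a real project location.

```r
# Hypothetical project folder; tempdir() stands in for your real project path
project_dir <- file.path(tempdir(), "hisco2hiscam")
source_dir  <- file.path(project_dir, "data", "source")

# Create the folder structure (recursive = TRUE also creates parent folders)
dir.create(source_dir, recursive = TRUE, showWarnings = FALSE)

# Set the working directory once; setwd() returns the previous directory
old_wd <- setwd(project_dir)

# From here on, relative paths work for anyone who shares the same folder
rel_path <- file.path("data", "source", "sample_data.csv")
path_ok  <- dir.exists(dirname(rel_path))

# Restore the previous working directory
setwd(old_wd)
```

Because every path after setwd() is relative, a collaborator only has to change the one line that points at the project folder.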
  4. Reading in the data. OK, now let’s read in the data that we supposedly coded. Actually, these data are from the Historical Sample of the Netherlands (HSN) and can only be used for this summer school. You’re free to use the HSN data (and I would recommend it), but you’d need to sign a license agreement stating that you’ll manage the data in a proper way. There are many functions (commands) to read in data. A common one is read.csv(), for reading in .csv files. Each function comes with multiple arguments that you can set, e.g. whether your file has column names (referred to as a ‘header’). Here are some of the obvious arguments for read.csv().
  5. The read.csv() function. Arguments of read.csv():
     file: your file, including directory
     header: variable names or not?
     sep: separator (read.csv default: ","; read.csv2 default: ";")
     skip: number of rows to skip
     nrows: total number of rows to read
     stringsAsFactors: convert text columns to factors or not?
     encoding (e.g. "latin1" or "UTF-8")
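
To see what these arguments do, here is a small sketch that reads from an inline string instead of a file, using the text argument that read.csv() passes on to read.table(). The rows and the junk first line are illustrative values only, not taken from the HSN data.

```r
# Illustrative semicolon-separated data with one junk line at the top
raw <- "exported on 2015-07-06
id;occupation;hisco
1;farmer;61110
2;baker;77610
3;carpenter;95410"

toy <- read.csv(text = raw,               # read from a string, not a file
                sep = ";",                # fields are separated by semicolons
                skip = 1,                 # skip the junk first line
                header = TRUE,            # next line holds the variable names
                nrows = 2,                # read only the first two data rows
                stringsAsFactors = FALSE) # keep text columns as character

str(toy)
```

With nrows = 2 the third data row is never read, which is handy for a quick look at a large file.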
  6. OK, so now let’s read in the data for our training purposes:
     df <- read.csv("./data/source/sample_data.csv",
                    stringsAsFactors = FALSE,
                    encoding = "latin1",
                    nrows = 1000)
  7. HISCAM Universal scale - male only.
     hcamU2 <- read.table("http://www.camsis.stir.ac.uk/hiscam/v1_3_1/hiscam_u2.dat",
                          sep = "\t", # tab-separated file
                          header = TRUE,
                          stringsAsFactors = FALSE)
     NOTE: on the original slide the URL was split over several lines so that it stayed visible; in real code you cannot ‘break’ the file path like that, it has to be one unbroken string. So now you should have two dataframes: df, which is our occupational data with HISCO, and hcamU2, which is the universal HISCAM scale for men.
  8. Merging the data. We now need to merge these two dataframes. There should be at least one variable that both dataframes have in common. That doesn’t mean it needs to have the same name in both datasets. But even if it does (like in our case), I like specifying the name, so I’m sure what is being merged.
     df.h <- merge(df, hcamU2, by.x = "hisco", by.y = "hisco", all.x = TRUE)
  9. So with this command, we’re saying: take two dataframes, df and hcamU2, and merge them by a variable, which is called “hisco” in the first (x) dataframe and “hisco” in the second (y) dataframe. Now, you can imagine that you have occupations without a HISCO code, or that perhaps there’s only a small number of occupations in your file and not every HISCAM value from the hcamU2 file finds a match in your occupational data. To make sure you preserve all your data, even rows for which there was no match, you specify all = TRUE. Here I specify all.x = TRUE, which preserves only the non-matches from my ‘df’ dataframe.
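
The difference between the default merge, all.x = TRUE and all = TRUE is easiest to see on toy data. This is a sketch only: the two small dataframes, the scale values and the column name hiscam_value are made up for illustration and are not the contents of the real HISCAM file.

```r
# Toy occupational data: one code missing from the scale, one row without a code
occ <- data.frame(id = 1:4,
                  hisco = c(61110, 77610, 95410, NA))

# Toy scale (made-up values); 22110 does not occur in the occupational data
scale <- data.frame(hisco = c(61110, 77610, 22110),
                    hiscam_value = c(50.1, 52.3, 60.0))

inner <- merge(occ, scale, by.x = "hisco", by.y = "hisco")               # matches only
left  <- merge(occ, scale, by.x = "hisco", by.y = "hisco", all.x = TRUE) # keep all of occ
full  <- merge(occ, scale, by.x = "hisco", by.y = "hisco", all = TRUE)   # keep everything

nrow(inner)  # 2
nrow(left)   # 4
nrow(full)   # 5
```

The all.x = TRUE version keeps every row of the occupational data and fills the scale column with NA where no match was found, which is the behaviour used for df.h in the slides.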
  10. Now if we look at the df.h dataframe (the one that is the result of our merge) with summary(), we see that the new variable HISCAM was added:
      summary(df.h)
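
Beyond summary(), it can help to count how many rows did not get a scale value after the merge. A minimal sketch on a made-up merged dataframe; the column name hiscam_value is an assumption, so substitute whatever the scale column is called in your own merged data.

```r
# Made-up result of a merge: the second row found no match in the scale
merged <- data.frame(hisco = c(61110, 77610, 95410),
                     hiscam_value = c(50.1, NA, 52.3))

# Number of rows without a scale value
n_unmatched <- sum(is.na(merged$hiscam_value))

# Share of rows that did get a scale value
share_matched <- mean(!is.na(merged$hiscam_value))

# summary() on a numeric column also reports the number of NA's
summary(merged$hiscam_value)
```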
  11. Final comments. I’m sure Ben has by now given you more info on R(Studio) and you’ll feel a bit more comfortable. Plunging in at the deep end like this (learning how to merge in R without getting to know R properly) is definitely not ideal, but you actually came a long way during class. If you’d like to practice, you could try to download more of the HISCAM files and see how they relate. E.g. you could plot the early vs. the late period, or just look at correlations between the HISCAM values for different scales. Good luck with the remainder of the course and your research projects afterwards. Best wishes, Richard
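
As a starting point for that exercise, here is a sketch of how two scales could be compared once they are read in. The two dataframes, their column names and all values are made up for illustration; real HISCAM files would be read with read.table() as shown earlier.

```r
# Two made-up scale tables, standing in for an early and a late HISCAM scale
early <- data.frame(hisco = c(61110, 77610, 95410, 22110),
                    hiscam_early = c(45, 50, 55, 60))
late  <- data.frame(hisco = c(61110, 77610, 95410, 13020),
                    hiscam_late = c(47, 49, 58, 61))

# Keep only the codes that occur in both scales
both <- merge(early, late, by = "hisco")

# Correlation between the two sets of scale values
r <- cor(both$hiscam_early, both$hiscam_late)
r
```

A scatter plot of the same two columns, e.g. plot(both$hiscam_early, both$hiscam_late), would show the early vs. late comparison graphically.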
