The IPUMS Historical
Project
Besufekad Alemu & Thu Vien
(Advised by Evan Roberts & Dave Hacker)
Introduction
Norwegian & Swedish Immigrant Men
Norwegian: 65,361 born 1885-1895
Swedish: 153,477 born 1885-1895
Populations in Censuses (born 1884-96)
Norwegian: 49,006 in 1920; 51,510 in 1930
Swedish: 79,486 in 1920; 84,698 in 1930
Goals
Link as many people as possible from the
immigration data to the censuses
Analyze their economic and social
outcomes in the 1920s & 1930s
How do we do that?
Linking Data
Use algorithm to link individuals from both
sources using their names
Focus on men
Women’s last names may change
Born in 1885-1895 (from immigration)
Steps for Linking
Extract men from both sources
Match individual to census data
Use string “distance” measure to narrow
matches to similar sounding names
Remaining data are analyzed further to
find the best matches (ongoing)
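The name-comparison step can be sketched in Python as follows. This is a minimal, self-contained illustration of the standard Jaro-Winkler measure, not the project's own code (the slides mention Stata). The helper name `names_similar`, the lowercasing, and the prefix scale of 0.1 are assumptions; the 0.8 cutoff is the one reported on the results slides. Note that the score behaves as a similarity: 1.0 means identical strings.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings (1.0 = identical, 0.0 = nothing in common)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are within this many positions.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def names_similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical helper: keep a pair when the score meets the cutoff."""
    return jaro_winkler(a.lower(), b.lower()) >= threshold
```

With this sketch, both "Langseth" and "Langetiqe" pass the 0.8 cutoff against "Langseth", which is why a further step is needed to pick the best match.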
Summary
Results for Norwegian Matching (1920)
Potential Matches   356,758   Jaro-Winkler Distance ≥ 0.8
From Immigration    40,397    61.8% of target group
From 1920 Census    31,423    65.1% of target group
Summary
Results for Swedish Matching (1920)
Potential Matches   2,748,009   Jaro-Winkler Distance ≥ 0.8
From Immigration    102,646     66.9% of target group
From 1920 Census    55,844      74% of target group
Improving Links and Training Data
Maximizing number of immigrant men
Looked up names for possible gender errors
Missing data on gender
Develop “training data”
Multiple possible matches in some cases
Choose the true and best match for names
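The gender lookup can be sketched roughly as below. The name sets are tiny hypothetical stand-ins for a real first-name dictionary, and the function name, the "M"/"F" codes, and the review flag are all assumptions made for illustration.

```python
from typing import Optional, Tuple

# Hypothetical stand-ins for a first-name dictionary.
MALE_FIRST_NAMES = {"martin", "ole", "lars", "sven", "erik"}
FEMALE_FIRST_NAMES = {"anna", "ingrid", "karin"}


def infer_sex(recorded: Optional[str], first_name: str) -> Tuple[Optional[str], bool]:
    """Return (sex, needs_review).

    A missing recorded sex is filled from the name lookup. A recorded sex
    that disagrees with the lookup is kept but flagged as a possible error.
    """
    name = first_name.strip().lower()
    if name in MALE_FIRST_NAMES:
        looked_up = "M"
    elif name in FEMALE_FIRST_NAMES:
        looked_up = "F"
    else:
        looked_up = None

    if recorded not in ("M", "F"):
        return looked_up, False  # missing data: rely on the lookup, if any
    conflict = looked_up is not None and looked_up != recorded
    return recorded, conflict
```

Flagging conflicts rather than overwriting them keeps the judgement call with a person, which fits the manual review described in the notes.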
Example
True Match  Immigration ID  Immigration First Name  Immigration Last Name  Census First Name  Census Last Name  Census ID
0           1               Martin                  Langseth               Martin             Langseth          1
0           1               Martin                  Langseth               Martin             Langetiqe         2
Example
True Match  Immigration ID  Immigration First Name  Immigration Last Name  Census First Name  Census Last Name  Census ID
1           1               Martin                  Langseth               Martin             Langseth          1
0           1               Martin                  Langseth               Martin             Langetiqe         2
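The labelling shown in the two example slides can be sketched as follows. In the project this choice is a manual judgement; the code only mimics one plausible rule (exact name agreement first, then the smallest gap in arrival year, the variable the speaker notes mention). The `Candidate` class, its field names, and the arrival years in the usage lines are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    imm_id: int
    imm_first: str
    imm_last: str
    cen_first: str
    cen_last: str
    cen_id: int
    imm_arrival: int   # arrival year in the immigration data (hypothetical field)
    cen_arrival: int   # arrival year reported in the census (hypothetical field)
    true_match: int = 0


def label_true_match(cands: List[Candidate]) -> List[Candidate]:
    """Mark exactly one candidate for a single immigration ID as the true match."""
    def rank(c: Candidate):
        exact = (c.imm_first.lower(), c.imm_last.lower()) == (
            c.cen_first.lower(), c.cen_last.lower())
        # Exact names sort first; ties are broken by the arrival-year gap.
        return (0 if exact else 1, abs(c.imm_arrival - c.cen_arrival))

    best = min(cands, key=rank)
    for c in cands:
        c.true_match = int(c is best)
    return cands


rows = [
    Candidate(1, "Martin", "Langseth", "Martin", "Langseth", 1, 1905, 1905),
    Candidate(1, "Martin", "Langseth", "Martin", "Langetiqe", 2, 1905, 1907),
]
labelled = label_true_match(rows)
```

A reviewer could still reject every candidate; this sketch always picks one, so it is only a starting point for building training data.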
Economic and Social Outcomes
Investigate the outcomes
Where did most Norwegians and Swedes live
in the 1920s & 1930s?
What are the characteristics of the places?
What are other demographics of the places?
Challenges
Historical data
No unique IDs, disorganized
Making judgement calls
Missing county and state level information
Coding
Learning Stata
Learning linking methods
Future Directions
Expand to all birth years and censuses
Use result of pilot study for grant proposals
Norwegian and Swedish researchers may
partner in linking to people in their censuses
Unique Matches
According to...          Norwegian  Rate from Remaining in Potential Matches  Swedish  Rate from Remaining in Potential Matches
Imm. ID                  11,330     28%                                       17,864   17.4%
Census ID                7,924      25.2%                                     8,405    15.1%
Both Imm ID & Census ID  2,854      7.1% & 9.1%                               2,517    2.4% & 4.5%
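One way to read this table, offered as an assumption rather than the authors' stated definition: an ID is "unique" when it appears in exactly one potential-match pair. The rates are consistent with dividing by the records remaining in the potential matches, for example 11,330 / 40,397 ≈ 28% and 7,924 / 31,423 ≈ 25.2% for the Norwegian group. A minimal Python sketch of that counting, with a hypothetical function name and made-up IDs in the usage lines:

```python
from collections import Counter
from typing import List, Set, Tuple


def unique_matches(
    pairs: List[Tuple[int, int]]
) -> Tuple[Set[int], Set[int], List[Tuple[int, int]]]:
    """pairs holds de-duplicated (immigration_id, census_id) candidate links."""
    imm_counts = Counter(imm for imm, _ in pairs)
    cen_counts = Counter(cen for _, cen in pairs)
    # IDs that occur in exactly one candidate pair.
    unique_imm = {imm for imm, n in imm_counts.items() if n == 1}
    unique_cen = {cen for cen, n in cen_counts.items() if n == 1}
    # Pairs that are the only candidate on both sides.
    unique_both = [
        (imm, cen) for imm, cen in pairs
        if imm_counts[imm] == 1 and cen_counts[cen] == 1
    ]
    return unique_imm, unique_cen, unique_both


pairs = [(1, 10), (1, 11), (2, 12), (3, 12), (4, 13)]
u_imm, u_cen, u_both = unique_matches(pairs)
```

Pairs that are unique on both sides are the safest links to accept without manual review, which may be why that row has the lowest rates.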



Editor's Notes

  1. On one hand, we have the data on Norwegian and Swedish immigrants who came through Ellis Island in the late 1800s and early 1900s. On the other hand, we have the Norwegian- and Swedish-born populations who remained in the United States and were enumerated in the 1920 and 1930 censuses. In the end, we want to say something about the immigrants who came in through Ellis Island and other ports, and about their experiences in the United States.
  2. For this study, we want to… link as many people as possible… and… analyze
  3. Pilot studies of a selected group to see how well the process works
  4. Matching means that a person born in 1885 is matched with individuals in the census who were born in the years 1884-1886. For name “distance”, use Jaro-Winkler. Repeat for all censuses and all immigration files.
  5. Because of the errors and missing information on gender, I looked at the first names of individuals to determine their gender, using the Dictionary of First Names, to potentially increase the number of male candidates. Training data is data that will be used later to automate the finding of true links.
  6. Show the two similar names (the obvious one and the complicated one). If the names are too similar, I look at other variables, such as the arrival year in the censuses and in the immigration data, to decide which is the true match.
  7. Using state- and county-level demographic and economic data, I find the characteristics of places with high Norwegian and Swedish populations.
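The birth-year rule in note 4 can be sketched in Python as below: each immigrant is paired only with census records whose birth year falls within one year of his own, which is consistent with the 1884-96 census range paired with the 1885-1895 immigration range on the introduction slide. The function name, the tuple layout, and the sample IDs are assumptions for illustration; name comparison would then be applied to the resulting pairs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def candidate_pairs(
    immigrants: List[Tuple[int, int]],   # (immigration_id, birth_year)
    census: List[Tuple[int, int]],       # (census_id, birth_year)
    window: int = 1,
) -> List[Tuple[int, int]]:
    """Pair each immigrant with census records born within +/- window years."""
    by_year: Dict[int, List[int]] = defaultdict(list)
    for cen_id, year in census:
        by_year[year].append(cen_id)

    pairs = []
    for imm_id, year in immigrants:
        for y in range(year - window, year + window + 1):
            for cen_id in by_year.get(y, []):
                pairs.append((imm_id, cen_id))
    return pairs


immigrants = [(1, 1885)]
census = [(10, 1884), (11, 1886), (12, 1887), (13, 1885)]
pairs = candidate_pairs(immigrants, census)
```

Indexing the census by birth year first avoids comparing every immigrant with every census record, which matters when the potential matches run into the millions.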