Careers, Professions, Entrepreneurs and R
The Berkeley R Language Beginner Study Group
Phoebe Wong
With Weiyi Ng and Professor Toby Stuart
Haas School of Business, UC Berkeley
My Background
• Senior Psychology student at Cal
• Research experience
1. Lab manager in an experimental social psychology lab
2. Research assistant in computational sociology
• Programming experience
• Learned R and Python a year ago out of interest, and love them!
• Used R in my honors thesis
Background of my project
• Careers, entrepreneurship
• What are we interested in?
• Characterizing employment trajectories of CS graduates from the 9 elite CS programs
• MIT, Stanford, Berkeley, Harvard, Princeton, UIUC, CMU, UTXA, UMCP
• Involves classifying employers by employment characteristics (e.g. industry, size, age)
• Why CS?
• Base rate of high-potential entrepreneurship = 10% for CS graduates (random-sample base rate < 0.05%)
• High-tech firm incorporation was 69% higher in 2010 than in 1980 (while general private-firm incorporation decreased)
People.co (my.emp.df)
• Profiles of CS graduates from the 9 elite CS programs
• MIT, Stanford, Berkeley, Harvard, Princeton, UIUC, CMU, UTXA, UMCP
• From 1995-2005
• All 3 degree levels included (BA, Masters & PhD)
• N=7160 profiles, with 6064 complete profiles
People.co (my.emp.df)
• What’s in the dataset?
• N=6064 profiles, multiple rows per person
Compustat (empmeta.df)
• Database of financial, statistical and market info on active & inactive global publicly listed companies
• 1950-2015
• What’s in the dataset?
Compustat (empmeta.df)
• Example: Bank of America
Top 40 employers of CS graduates (my.emp.df)
How much do the top 40 companies tell us?
4000
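One way to put a number on this question is the share of all employment rows that the top N employers account for. The sketch below is only an illustration: the `employer` vector is toy data, and `top.n` is set to 2 rather than 40 so that the arithmetic is easy to follow. None of it comes from the People.co data.

```r
# Toy employer column; stands in for the employer names in my.emp.df (not real data).
employer <- c("google", "google", "google", "ibm", "ibm",
              "apple", "startup a", "startup b")

# Count rows per employer and sort from most to least frequent.
counts <- sort(table(employer), decreasing = TRUE)

# Share of all rows covered by the top.n employers (40 in the talk, 2 here).
top.n <- 2
coverage <- sum(counts[seq_len(top.n)]) / sum(counts)
print(coverage)
```

A low coverage value would mean that most rows sit in the long tail of employer names, which is where the name clean-up matters most.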
Clean up (1): Basic clean up
• Lower case
• Remove non-alphanumeric characters
• Standardize white space
• Character encoding
• Remove one or more spaces at the beginning and at the end
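The steps on this slide could be sketched in base R roughly as follows. The function name `cleanup()` appears later in the deck, but its body is not shown there, so the regular expressions, the order of the steps, and the choice to replace punctuation with a space (rather than delete it) are all assumptions.

```r
# Hypothetical cleanup() helper; the regexes and step order are assumptions.
cleanup <- function(x) {
  # Character encoding: replace bytes that are not valid ASCII with a space.
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = " ")
  # Lower case.
  x <- tolower(x)
  # Replace anything that is not a letter, digit or space with a space.
  x <- gsub("[^a-z0-9 ]", " ", x)
  # Standardize white space: collapse runs of spaces into one.
  x <- gsub(" +", " ", x)
  # Remove one or more spaces at the beginning and at the end.
  x <- gsub("^ +| +$", "", x)
  x
}

cleaned <- cleanup(c("  Bank of America, Corp.  ", "GOOGLE   Inc."))
print(cleaned)
```

Replacing punctuation with a space keeps tokens such as "america," and "corp." separable, at the cost of splitting names like "AT&T" into two tokens.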
Clean up (2): Stopwords
• Assign company names from Compustat data to a variable
• cleanup()
• Tokenize
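A minimal sketch of how a stopword list might be built from company names and then applied. The slide does not show how stopwords were chosen, so the frequency threshold, the helper `remove.stopwords()`, and the toy `compustat.names` vector are all hypothetical; the names are assumed to be cleaned already.

```r
# Toy stand-ins for cleaned Compustat company names (not real data).
compustat.names <- c("bank of america corp", "apple inc", "microsoft corp",
                     "oracle corp", "intel corp", "cisco systems inc")

# Tokenize: split every name on spaces and pool the tokens.
tokens <- unlist(strsplit(compustat.names, " +"))

# Treat tokens that occur in at least 2 names as stopwords (assumed threshold).
token.freq <- sort(table(tokens), decreasing = TRUE)
stopwords <- names(token.freq)[as.vector(token.freq) >= 2]

# Hypothetical helper: drop stopword tokens from each name and re-join the rest.
remove.stopwords <- function(x, stopwords) {
  vapply(strsplit(x, " +"),
         function(tok) paste(tok[!tok %in% stopwords], collapse = " "),
         character(1))
}

stripped <- remove.stopwords(c("apple inc", "bank of america corp"), stopwords)
print(stopwords)
print(stripped)
```

With real data the threshold would need tuning, since frequent tokens such as "bank" or "systems" can carry meaning and should not necessarily be dropped.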
Clean up (2): Stopwords (cont.)
Clean up (3): MTurk
• ‘grep’ names of top 200 companies from AngelList
• Ask MTurk workers to identify synonyms of company names (2 workers per match)
• Only use synonyms agreed on by both MTurk workers
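The grep-then-agree workflow above can be sketched as below. The slide gives no data layout, so the data frame `mturk.df`, its column names, and the yes/no encoding of each worker's judgement are assumptions made for illustration.

```r
# Toy employer names; grep() pulls out candidate matches for one company name.
employers <- c("google inc", "google ventures", "apple computer", "ibm")
candidates <- grep("google", employers, value = TRUE, fixed = TRUE)

# Hypothetical MTurk results: two workers judge each (company, candidate) match.
mturk.df <- data.frame(
  company    = c("google", "google", "google", "google", "apple", "apple"),
  candidate  = c("google inc", "google inc", "google ventures",
                 "google ventures", "apple computer", "apple computer"),
  worker     = c("w1", "w2", "w1", "w2", "w1", "w2"),
  is.synonym = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Count "yes" votes per match and keep only matches both workers agreed on.
yes.votes <- aggregate(is.synonym ~ company + candidate, data = mturk.df, FUN = sum)
agreed.synonyms <- yes.votes[yes.votes$is.synonym == 2, c("company", "candidate")]
print(agreed.synonyms)
```

Requiring both workers to agree trades recall for precision: "google ventures" is dropped here because only one worker marked it as a synonym.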
Clean up (4): Alchemy API
• (show output)
Clean up (4): Alchemy API (cont.)
• (show output)
• Text: ‘University of California Berkeley’
• Text: ‘UC Berkeley’
• Text: ‘UCBerkeley’
Joining synonyms (dplyr left_join)
• Combine.final = synonyms collected from MTurk
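A minimal sketch of the join, assuming the dplyr package is installed. The data frame names `my.emp.df` and `Combine.final` come from the slides, but their columns are not shown there, so `person.id`, `employer`, `employer.recoded` and `employer.final` are hypothetical, and the rows are toy data.

```r
library(dplyr)

# Toy employment rows (hypothetical columns; not the real People.co data).
my.emp.df <- data.frame(
  person.id = c(1, 1, 2, 3),
  employer  = c("google inc", "international business machines",
                "google", "startup xyz"),
  stringsAsFactors = FALSE
)

# Toy synonym table: maps a name variant to the recoded company name.
Combine.final <- data.frame(
  employer         = c("google inc", "international business machines"),
  employer.recoded = c("google", "ibm"),
  stringsAsFactors = FALSE
)

# left_join keeps every employment row; rows without a synonym get NA.
recoded <- left_join(my.emp.df, Combine.final, by = "employer")

# Fall back to the original name where no synonym was found.
recoded$employer.final <- coalesce(recoded$employer.recoded, recoded$employer)
print(recoded)
```

A left join is the natural choice here because employers with no synonym entry must stay in the data; an inner join would silently drop them.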
Preliminary Result
• Black: Original company names
• Red: After recoding
[Figures: Top 20 companies in People.Co data, shown before recoding, after recoding, and before & after together]
Thank you!
• Future direction: Merging Compustat data into People.Co data
• Graduate students: Weiyi Ng and Katherine Ullman
• Faculty sponsor: Professor Toby Stuart
• Lab members: Cindy Mo and Gao Xian Peh
Contact: phoebe.wong@berkeley.edu
LinkedIn: www.linkedin.com/in/wphoebe
