Gender, Education, Skills, and Compensation in US Data Science
COLLEEN M. FARRELLY
Kaggle 2017 US Data Scientist Sample
4197 data scientists who identified the US as their country were included.
Factors examined include gender, machine learning knowledge, education, and job title.
Visual explorations and a series of machine learning models were run to explore how these factors
impact compensation levels.
1544 provided compensation data to compare salaries by different demographic factors, and
this subsample was examined through machine learning models.
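As a quick check on the sample sizes above, the share of respondents who reported compensation can be computed directly. This is a minimal sketch: the variable and function names are illustrative, and the 1522 figure is taken from the predictive modeling slide later in the deck.

```python
# Sample sizes reported in the deck.
total_us_respondents = 4197   # identified the US as their country
with_compensation = 1544      # provided compensation data
complete_cases = 1522         # compensation plus all predictors (modeling slide)


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


compensation_share = share(with_compensation, total_us_respondents)
complete_share = share(complete_cases, with_compensation)
dropped = with_compensation - complete_cases

print(f"Reported compensation: {compensation_share}% of US respondents")
print(f"Complete cases: {complete_share}% of the compensation subsample")
print(f"Dropped for missing predictors: {dropped}")
```

Roughly a third of the US sample enters the compensation analyses, which is worth keeping in mind when reading the salary comparisons.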
Demographics of US Data Scientists
US data scientists tend to be male and in the field for less than 5
years, though some have been in the field for more than 10 years.
Very few data scientists identify as LGBTQ in the US, despite
increasing levels of openness about this identity.
Education of US Data Scientists
Most data scientists in the US (67%) have advanced education.
Common majors include math/stat, engineering, and computer science, though other
sciences are well-represented.
Many data scientists come from well-educated families, where parents have obtained
at least a Bachelor’s degree; 45% come from families with a Master’s degree or higher.
Importance of Different Factors in Job
Considerations
Diversity is not as important a consideration as language used, salary offered, impact potential, and job industry.
Allocation of Time on Data Science
Projects
A lot of time is spent on gathering data, and this is a potential bottleneck in data science projects.
Education and Machine Learning
Knowledge
Those who are able to innovate new algorithms place the highest relative value on education; they comprise
12% of the US data scientist population.
Those who know how to run code or tune parameters place the lowest relative value on education and comprise
19% of data scientists.
About 40% can explain an algorithm to someone without technical knowledge, a crucial skill in data science positions.
Skill Disparity between Male and Female
Data Scientists
Males are more likely to be able to innovate than females (13% vs. 9%). They
are also more likely to make the code faster/code from scratch (31% vs. 23%).
Females are more likely to only have enough knowledge to tune parameters or
run a library (25% vs. 17%).
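The comparisons above can be restated as percentage-point gaps and male-to-female ratios. The sketch below uses only the percentages quoted on this slide; the dictionary layout and names are illustrative, and because group sizes are not given here, it is purely descriptive.

```python
# Percentages quoted on the slide, stored as (male %, female %).
skill_shares = {
    "innovate new algorithms": (13, 9),
    "make code faster / code from scratch": (31, 23),
    "tune parameters or run a library only": (17, 25),
}


def compare(male_pct, female_pct):
    """Return the gap in percentage points and the male/female ratio."""
    gap = male_pct - female_pct
    ratio = round(male_pct / female_pct, 2)
    return gap, ratio


comparisons = {skill: compare(m, f) for skill, (m, f) in skill_shares.items()}

for skill, (gap, ratio) in comparisons.items():
    print(f"{skill}: gap = {gap:+d} points, male/female ratio = {ratio}")
```

A ratio above 1 means the skill level is more common among male respondents; a ratio below 1 means it is more common among female respondents.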
Titles and Skills
Data scientist is the most common title (38%), but data scientists account for
only 29% of those who can innovate.
Researchers make up only 19% of titles but a whopping 40% of
those who can innovate.
Analysts make up 17% of titles but only 3% of those who can
innovate algorithms and only 9% of those who can explain the
algorithms to someone non-technical.
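One way to read these figures is as a representation ratio: a title's share among innovators divided by its share of all titles. The sketch below is an illustration using only the percentages quoted above; the term "representation ratio" and the names in the code are assumptions, not part of the original analysis.

```python
# (share of all titles %, share of those who can innovate %), from the slide.
title_shares = {
    "Data scientist": (38, 29),
    "Researcher": (19, 40),
    "Analyst": (17, 3),
}


def representation_ratio(title_pct, innovator_pct):
    """Share among innovators relative to share of titles (1.0 = proportional)."""
    return round(innovator_pct / title_pct, 2)


ratios = {
    title: representation_ratio(title_pct, innovator_pct)
    for title, (title_pct, innovator_pct) in title_shares.items()
}

for title, ratio in ratios.items():
    status = "over" if ratio > 1 else "under"
    print(f"{title}: ratio = {ratio} ({status}-represented among innovators)")
```

On these numbers, researchers appear among innovators at about twice their share of titles, while analysts appear at under a fifth of theirs.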
Education and Skills
Many more doctoral-level data scientists are able to innovate (24%) than bachelor-level (6%) or master-level data scientists (9%).
Bachelor-level data scientists are more likely to only know how to run a library (16%) than master-level (9%) or doctoral-level (5%) data scientists.
Compensation by Skill: Innovation Pays
Compensation by Education and Gender
Finishing college is essential. A professional or doctoral degree is worth the time and effort, as well.
Gender Compensation Disparities and
Compensation by Fields of Study
Females earn quite a bit less than males and LGBTQ individuals.
Engineering provides the most compensation, while humanities provides the least.
IT folks tend to earn less than those in fields of
math/physics/engineering/computer science.
Predictive Modeling of Compensation
Analyses were performed on the 1522 data scientists who provided
compensation information along with all predictors; 22
individuals were missing predictor information.
Several models were run to predict compensation using a
Tweedie distribution: random forest, conditional inference
trees, LASSO, extreme learning machines, evolved trees,
and MARS.
All models yielded similar performance (~3-10% of variance
accounted for).
Age, tenure, and industry were the largest predictors of
compensation.
Major, gender, education, and algorithm understanding
level do play a minor role in compensation, though.
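The two quantities behind this slide, a Tweedie loss and the share of variance accounted for, can be sketched in plain Python. The deck does not state the Tweedie power parameter or the software used, so the power of 1.5 below is an assumption, the function names are illustrative, and the salary values are made-up toy numbers rather than survey data.

```python
def tweedie_unit_deviance(y, mu, power=1.5):
    """Tweedie unit deviance for 1 < power < 2 (compound Poisson-gamma).

    Requires y >= 0 and mu > 0. The default power is an assumed value.
    """
    if not 1 < power < 2:
        raise ValueError("this sketch only covers 1 < power < 2")
    if y < 0 or mu <= 0:
        raise ValueError("need y >= 0 and mu > 0")
    a = y ** (2 - power) / ((1 - power) * (2 - power))
    b = y * mu ** (1 - power) / (1 - power)
    c = mu ** (2 - power) / (2 - power)
    return 2 * (a - b + c)


def mean_tweedie_deviance(observed, predicted, power=1.5):
    """Average unit deviance over paired observations and predictions."""
    pairs = list(zip(observed, predicted))
    return sum(tweedie_unit_deviance(y, mu, power) for y, mu in pairs) / len(pairs)


def r_squared(observed, predicted):
    """Share of variance accounted for: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


# Toy values (thousands of dollars), invented purely for illustration.
observed = [60.0, 90.0, 120.0, 150.0]
predicted = [100.0, 103.0, 107.0, 110.0]
baseline = [sum(observed) / len(observed)] * len(observed)  # predict the mean

print("R^2:", round(r_squared(observed, predicted), 3))
print("Model deviance:", round(mean_tweedie_deviance(observed, predicted), 3))
print("Baseline deviance:", round(mean_tweedie_deviance(observed, baseline), 3))
```

A model that predicts the overall mean for everyone scores an R² of 0, so figures of roughly 3-10% indicate that the predictors improve only modestly on that baseline.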
Conclusions
Skills vary widely according to education, gender, and role.
Different skills are associated with different pay, as well as different values of education as a
path to data science.
Tenure, age, and industry play a large role in compensation, but these factors are difficult to
change for data scientists entering the field and studying at university.
Addressing the educational and gender disparities in skill level may be a way to even the
playing field through equipping new data scientists with the most valuable skills and knowledge
levels sought in the field.
