EMPIRICAL ANALYSIS OF PROGRAMMING LANGUAGE ADOPTION

Leo A. Meyerovich, UC Berkeley
Ariel S. Rabkin, Princeton
October 2013

Why Adoption?

2
Confession of a Language Salesman
[P. Coburn]

Change Function threshold to adopt:

(perceived adoption need) / (perceived adoption pain) > 1

FP!!! new language

3
Confession of a Language Salesman
“From now on, my goal in life would be to also drive the denominator down to zero”
- Erik Meijer, Confessions of a Used Programming Language Salesman

4
Confession of a Language Salesman
[P. Coburn]

Change Function threshold to adopt:

(perceived adoption need) / (perceived adoption pain) > 1

FP!!! new language
FP!! familiar language
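
The threshold on this slide can be written as a small function. This is only a sketch: the slide does not say how "need" and "pain" are measured, so the numeric inputs and the function names `change_ratio` and `would_adopt` are assumptions made for illustration.

```python
def change_ratio(perceived_need: float, perceived_pain: float) -> float:
    """Ratio of perceived adoption need to perceived adoption pain.

    The units are hypothetical; the slide only defines the ratio itself.
    """
    if perceived_need < 0 or perceived_pain < 0:
        raise ValueError("need and pain must be non-negative")
    if perceived_pain == 0:
        # Zero pain: any positive need clears the threshold.
        return float("inf") if perceived_need > 0 else 0.0
    return perceived_need / perceived_pain


def would_adopt(perceived_need: float, perceived_pain: float) -> bool:
    """Apply the slide's threshold: adopt only when need / pain > 1."""
    return change_ratio(perceived_need, perceived_pain) > 1
```

Under this reading, raising the numerator (more perceived need) and lowering the denominator (less perceived pain) are two separate ways to cross the same threshold.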

5
Science?
Adoption literature
change function is switching costs

Data analysis
growth
decision making
acquisition
6
Our Data Sets
Viral Campaign
[McIver]: 2-year-long web survey, 13,271 respondents
[Patterson & Fox]: massive open online course (MOOC) survey, 1,142 respondents
2-week web survey: 1,679 respondents
Software repositories: 217,368 projects

7
Demographics
Age: ~30
Degree: ~BS in CS

Employment: ~programmer
8
How do languages grow?

9
Ecological model of adoption

Use language
in a niche

Grow libraries
and user base

Spread language to more niches
10
Popular Languages CDF (Ohloh data)
[Figure: cumulative use (0% to 100%) by language. Labeled languages: css, html, c, shell, java, javascript, c++, python, make, php, bat, sql, ruby, c#, xml.]
Half the projects use 5 languages
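
A cumulative curve like the one on this slide can be computed from per-language counts. The sketch below assumes a simple mapping from language name to a usage count; the counts in `example_counts` are made up and are not the Ohloh data, and the helper names are hypothetical.

```python
def cumulative_share(counts):
    """Return (language, cumulative share) pairs, most used language first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    running = 0
    result = []
    for language, count in ranked:
        running += count
        result.append((language, running / total))
    return result


def languages_to_cover(counts, fraction):
    """Smallest number of top-ranked languages whose share reaches `fraction`."""
    for k, (_, share) in enumerate(cumulative_share(counts), start=1):
        if share >= fraction:
            return k
    return len(counts)


# Hypothetical counts, for illustration only.
example_counts = {"a": 40, "b": 30, "c": 20, "d": 10}
top_for_half = languages_to_cover(example_counts, 0.5)
```

Calling `languages_to_cover(counts, 0.5)` on real data would give the kind of number the slide reports as "half the projects use 5 languages".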
11
Popular Languages CDF (Ohloh data)
[Figure: cumulative use (0% to 100%) by language. Labeled languages: css, html, c, shell, java, javascript, c++, python, make, php, bat, sql, ruby, c#, xml.]
DSLs dominate
Half the projects use 5 languages
12
Odds for Most Languages? (PDF)
[Figure: proportion of projects for each language (log scale, 0.01% to 100%) against language rank in decreasing order (log scale, 1 to 100).]
Java for 16% of projects
Processing for 0.09% of projects
Long Tail! Supports designing for niches and then growing
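
The rank-versus-proportion view on this slide can be sketched as follows. The input format (a mapping from language to project count), the sample numbers, and the function names are assumptions for illustration; `tail_share` is one simple way to quantify a long tail, not a measure taken from the talk.

```python
def rank_proportions(counts):
    """Return (rank, language, proportion of projects), rank 1 = most popular."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (rank, language, count / total)
        for rank, (language, count) in enumerate(ranked, start=1)
    ]


def tail_share(counts, head_size):
    """Combined proportion held by every language ranked below `head_size`."""
    proportions = rank_proportions(counts)
    return sum(p for _, _, p in proportions[head_size:])


# Hypothetical project counts, for illustration only.
sample = {"x": 50, "y": 25, "z": 15, "w": 10}
tail = tail_share(sample, head_size=2)
```

Plotting the proportions from `rank_proportions` on log-log axes would give a chart of the same shape as the slide's.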
13
[PLATEAU 2013]

200K+

Projects (2000-2010)

14
Popularity Across Niches
[Figure: two panels of popularity by project category (223 categories). Java panel (axis 0% to 60%), annotated with blogging: 9% and search: 29%. Scheme panel (axis 0% to 4%), annotated with build tools: 1%.]
15
Popularity Across Niches
[Figure: two panels of popularity by project category (223 categories). Upper panel (axis 0% to 60%): low dispersion. Lower panel (axis 0% to 4%): high dispersion.]
16
Dispersion Decreases as Popularity Increases
[Figure: scatter plot of popularity (log scale, 0.0001 to 1) against dispersion across niches (σ / μ, axis running from 5 down to 0). Labeled languages: Java, C#, PL/SQL, Assembly, Fortran, Prolog, Scheme, VBScript.]
Languages grow niche by niche
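
The dispersion measure on the axis, σ / μ, is the coefficient of variation of a language's popularity across niches. A minimal sketch, assuming popularity is given as one fraction per project category and using the population standard deviation (the slide does not say which variant the authors used); the data below is invented.

```python
from statistics import mean, pstdev


def dispersion(popularity_by_niche):
    """Coefficient of variation (sigma / mu) of popularity across niches."""
    values = list(popularity_by_niche.values())
    mu = mean(values)
    if mu == 0:
        raise ValueError("dispersion is undefined when mean popularity is zero")
    return pstdev(values) / mu


# Hypothetical popularity fractions per niche, for illustration only.
evenly_spread = {"blogging": 0.2, "search": 0.2, "build tools": 0.2}
concentrated = {"blogging": 0.0, "search": 0.0, "build tools": 0.03}
```

A language used at the same rate in every niche has dispersion 0, while one confined to a single niche scores high, which is the contrast the slide draws between popular and niche languages.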

17
How Do Programmers Pick Languages?

18
P(L’ | L)
p(popular): 75%
p(repeat): 30%
Shows importance of familiarity
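
One way to estimate a quantity like P(L’ | L) is to count transitions between the languages of a developer's consecutive projects. The sketch below assumes that reading, with each history given as an ordered list of languages; the data and function names are hypothetical, and the talk's actual estimation procedure may differ.

```python
from collections import Counter, defaultdict


def transition_counts(histories):
    """Count (previous language -> next language) pairs over all histories."""
    counts = defaultdict(Counter)
    for languages in histories:
        for prev, nxt in zip(languages, languages[1:]):
            counts[prev][nxt] += 1
    return counts


def conditional_probability(counts, prev, nxt):
    """Estimate P(next = nxt | previous = prev) from the transition counts."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


def repeat_probability(counts):
    """Share of all transitions where the next language equals the previous."""
    total = sum(sum(c.values()) for c in counts.values())
    repeats = sum(c[lang] for lang, c in counts.items())
    return repeats / total if total else 0.0


# Hypothetical per-developer project histories, for illustration only.
histories = [["java", "java", "python"], ["c", "python", "python"]]
counts = transition_counts(histories)
```

Here `repeat_probability` plays the role of the slide's p(repeat) under the stated assumption.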

19
How Do Languages Get Picked?
Development speed?

Performance?

strongly disagree

neutral

strongly agree

20
Relative Importance of Language Aspects (MedStrong)
[Figure: bar chart, axis 0% to 80%, one bar per aspect in the order listed.]
Open source libraries
Group legacy
Project legacy
Self familiarity
Team familiarity
Target platform
Performance
Tooling
Development speed
Hiring
Individual feature(s)
Correctness
Simplicity
Commercial libraries

Extrinsic niche-specific factors dominate!
Intrinsics: performance, correctness, …

Be Positive: Design Guides & Opportunities
Slashdot survey, Companies with 1-19 employees

21
Learning: Shelf Life of a Programmer?

“Baby Boomers and Gen Xers tend to know C# and SQL. Gen Y knows Python… and Hadoop”
- Recruiter

22
Language Users are Age-Invariant

Mean # Langs. known
[Figure: mean number of languages known (axis 0 to 8) against age (20 to 60), with series "know slightly" and "know well".]
Languages are learned and forgotten
Programmers have a working set that they refresh!

Median reported time required to “learn a language well”
Time to learn is short compared to career

25
Probability of Knowing a Language

                              All   CS Major   Not CS Major   Taught in school   Not taught in school
Functional (Scheme, ML, ...)  22%   24%        19%            40%                15%
Assembly (MIPS, …)            14%   14%        14%            20%                10%
Mathematical (Matlab, R, …)   11%   10%        11%            31%                7%

CS degree unimportant but coursework matters
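
Each cell in the table is a conditional probability: the share of respondents in a group who report knowing a language family. A minimal sketch of that calculation, assuming one record per respondent with hypothetical boolean fields `knows` and `taught_in_school`; the records below are invented and do not reproduce the survey.

```python
def knowledge_rate(responses, taught=None):
    """Share of respondents who know the language.

    If `taught` is True or False, only respondents whose (hypothetical)
    `taught_in_school` field matches are counted; None means everyone.
    Returns None when the selected group is empty.
    """
    selected = [
        r for r in responses
        if taught is None or r["taught_in_school"] == taught
    ]
    if not selected:
        return None
    return sum(1 for r in selected if r["knows"]) / len(selected)


# Hypothetical survey records, for illustration only.
responses = (
    [{"taught_in_school": True, "knows": True}] * 2
    + [{"taught_in_school": True, "knows": False}] * 3
    + [{"taught_in_school": False, "knows": True}] * 1
    + [{"taught_in_school": False, "knows": False}] * 4
)
```

Comparing `knowledge_rate(responses, taught=True)` with `knowledge_rate(responses, taught=False)` mirrors the table's last two columns; the same function with a different filter field would give the CS major split.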
26
Conclusions
Extrinsics dominate: Libraries and familiarity!
Model: Niche-by-niche growth
Intrinsics secondary:
Performance, semantics, IDEs
Fluidity = Hope: Programmers know few
languages but can refresh within 6 months.

27
Looking Ahead
Language Sociology
Programming is done by groups; big knowledge gaps

Streamline Empiricism
Surveys, experiments (mining already active)
Exploit MOOCs!
Social Language Design
Improve sharing and utilize networks
28
Socio-PLT
www.eecs.berkeley.edu/~lmeyerov

29


Editor's Notes

  • #8 David McIver
  • #11 http://www.dreamstime.com/royalty-free-stock-image-small-plant-breaking-rock-image13902286 http://jessgibbsphotography.com/wp-content/uploads/2010/05/bright_green_flowering_plants_grow_on_rocks_along_foreshore.jpg
  • #23 http://www.theaustralian.com.au/technology/legacy-languages-prove-lucractive-for-dying-breed-of-programmers/story-e6frgakx-1225993874788 http://bits.blogs.nytimes.com/2013/07/05/technology-workers-are-young-really-young/