What do we really know about the differences
between static and dynamic types?

Stefan Hanenberg
University of Duisburg-Essen, Germany
Delft, NL, 15.01.2014
Initial Notes
I like static type systems

– Elegant specification
– I have been teaching type systems since 2006

I like Squeak/Smalltalk
– Nice programming environment
– Straightforward syntax

I have no personal interest in arguing for / against
static types
I have a personal interest in understanding whether a
type system improves or worsens software
development
Personal background (1)
● PhD in Aspect-Oriented Software Development (2006)
● While doing PhD / after PhD:
  – Serious doubts about usefulness of current AO languages / AO in general
  – Personal feeling: „There is something wrong in how people argue for or against given artefacts.“
  – Started reading about scientific methods (philosophy, mainly Popper)
Personal background (2)
● Personal conclusion (1)
  – We always argue why something should in principle be good for developers.
  – We never take developers into account in our research methods.
  – The applied research methods are completely inappropriate for arguing for or against the usefulness of a given artefact (by the way... are we really applying any research method?)
Personal conclusion (3)
● I want to do „empirical studies“
  – „test whether something has a measurable effect on developers“
● Why not test type systems?
  – Not that many studies so far... (amazing! How come?)
Claim: State of the Art in Usability
● Current dominating approach
  (1) Find example
  (2) Build construct
  (3) Claim that construct helps developers
  This leads nowhere
● Research methods needed that consider developers / users … involved humans
● Empirical method!
Empirical SE
• Following the approach of
Karl Popper
– Falsification of hypothesis
(use of statically typed language
decreases development time)
– NO PROOFS / NO GENERALIZABILITY
• But always the hope that repeated observations reveal some
truth
Empirical SE - Example
• Hypothesis
• Using tool X reduces development time in
comparison to tool Y
• Approach
• Measure development time for X, measure
time for Y, do comparison
• Falsification
• ...in case development time for Y was less...
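The comparison step above can be sketched in a few lines. This is only an illustration under stated assumptions: the development times are invented, the names `permutation_test`, `times_tool_x` and `times_tool_y` are hypothetical, and a permutation test is just one possible way to do the comparison; the slides do not say which statistical procedure the experiments used.

```python
import random
from statistics import mean


def permutation_test(times_x, times_y, rounds=10_000, seed=0):
    """One-sided permutation test for the hypothesis 'X is faster than Y'.

    Returns the observed difference mean(Y) - mean(X) and the fraction of
    random relabelings whose difference is at least as large.
    """
    rng = random.Random(seed)
    observed = mean(times_y) - mean(times_x)
    pooled = list(times_x) + list(times_y)
    n_x = len(times_x)
    at_least_as_large = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        # Treat the first n_x shuffled values as "X", the rest as "Y".
        diff = mean(pooled[n_x:]) - mean(pooled[:n_x])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / rounds


# Hypothetical development times in minutes (invented for illustration).
times_tool_x = [31, 28, 35, 40, 27, 33]
times_tool_y = [45, 39, 50, 42, 47, 38]

observed, p_value = permutation_test(times_tool_x, times_tool_y)
print(f"mean(Y) - mean(X) = {observed:.1f} minutes, p = {p_value:.4f}")
if observed <= 0:
    print("Y was not slower than X: hypothesis falsified for this sample.")
```

A small p-value here only means the sample does not contradict the hypothesis; in the Popperian reading used in this talk it is never a proof.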
Context: CS Research Methods
Taken from [Hanenberg, Faith, Hope, Love, Onward'10]

Now, let's put the focus on type
systems
Type Systems.....
● … in Teaching
● Formal Approaches
  – Lambda Calculus, Featherweight Java, ...
  – Type soundness proofs, ...
● What about Usability?
  – Static Types vs. Massive Testing?
  – Complexity of Static Type System?
  – ...
Questions for Industry
● Is it a rewarding investment to migrate software to a new type system?
  – Java Generics, ...
● Should you invest money in the development of a static type system?
  – statically typed Ruby, ...
● Should you switch to a statically typed language?
  – JavaScript vs. TypeScript, Groovy vs. Java, ...
State of Discussion Static vs.
Dynamic Types
● Many fights, many arguments, lots of anecdotes
● Arguments built on „personal impressions“
● Arguments (hypotheses!) never actually tested
Overall Goal

Let's test the given arguments

(well, ok, the initial motivation was different)
Results so far....

It looks like (Java-like) static type systems
(in Java-like languages) really help in
development!
10 Tested Statements and Results (1)
Naive Experiment: [OOPSLA'10]
Dynamic type systems are great... almost...

Do type casts matter? [DLS'11]
Not really.

Are dynamic TS as quick for fixing type errors as static TS?
No, not even close! But no difference for semantic errors.
[unpublished'11, ICPC'12]
10 Tested Statements and Results (2)
Are statically typed APIs faster to use?: [OOPSLA'12, ICPC'12]
Yes

Is the previous finding only a matter of syntax? [AOSD'13]
Yes, but in case there is an error in the (unchecked) type
it is worse than having no type declaration at all!

Can documentation compensate the positive effect of
static types? No. [submitted to ICSE'14]
10 Tested Statements and Results (3)
Do generics really help?: [OOPSLA'13]
Yes, if they occur in API interface. No, if application has
additional constraints because of generics.
Do current IDEs (for dynamic TSs) compensate for the previously
measured positive effect of static types? [unpublished'14]
No
Is the previous finding only a matter of syntax? [AOSD'14]
Yes, but in case there is an error in the (unchecked) type
it is worse than having no type declaration at all!
Can documentation compensate the positive effect of
static types? No. [submitted to ICSE'14]
Summary of statements
● Don't argue with type casts – they do not matter
● Don't say that type error fixing time is the same for dynamically typed languages
● Don't say that good IDE support compensates for the positive effect of static types – it doesn't
Summary of Statements
● In case dynamic languages have a benefit, it has nothing to do with the absence of the type system.
● In case they do have a benefit, it is despite the absence of the type system!
Let's take a look at the experiments
(...and let's skip the statistical parts)
Related Work
● Two experiments available: Gannon'77, PrecheltTichy'98
● Both showed positive effect of static type systems (measured development time)
Idea
● Ok, let's do just another experiment (...still in the learning phase of experimentation...)
First Experiment - Naive (1) [OOPSLA'10]
● Idea
  – Experiment similar to Gannon'77, PrecheltTichy'98
  – Measure number of errors / time to completion
  – Make programming task larger (more generalizable?)
● How
  – ~50 subjects write parser / scanner
  – Measure time required for minimal scanner / final test case coverage for parser
  – ~40 hours / subject = 1000 subject hours
● Results
  – Opposite to Gannon'77, PrecheltTichy'98
First Experiment - Naive (2) [OOPSLA'10]
● Scanner development took less time using dynamic types
● No difference for parser...
First Experiment - Naive (3) [OOPSLA'10]
● Interpretation
  – There is at least one situation where static TS was counterproductive
  – Falsification of „run an experiment and see the benefit of TS“
● Personal conclusion
  – Relatively few insights
  – Experiment much too expensive
  – Unclear what the additional insights are
● What's next?
  – Try to identify often mentioned statements in literature: type casts are bad for programmers, type error fixing time is better with TS
Second Experiment – Casts (1) [DLS'11]
● Idea
  – Test „type casts are bad“
  – Only time to completion as dependent variable
  – More tasks, smaller tasks
● How
  – ~21 subjects write very small programs (3-10 LOC)
  – All programs in statically typed variant required type casts
  – ~4 hours / subject = 85 subject hours
● Results
  – For small tasks casts matter (decrease productivity)
  – For larger tasks (10 LOC) no difference measured
Second Experiment – Casts (2) [DLS'11]
● Results
  – Differences only for completely trivial tasks
  – Our interpretation: type casts are not that important
Second Experiment – Casts (3) [DLS'11]
● Interpretation
  – Casts are not relevant enough for further studies
● Personal conclusion
  – The more measurements the better
  – Small experiments work
  – Change in experimental design worked well
● What's next?
  – Go on with often mentioned statements in literature: type error fixing time is better with TS
Third Experiment – Type Errors (1) [Unpublished'11]
● Idea
  – Measure time until type error is fixed
  – Time to completion as dependent variable
  – Again more tasks, smaller tasks
● How
  – ~30 subjects, 120 subject hours
● Results
  – Clear benefit in fixing time
Third Experiment – Type Errors (2) [Unpublished'11]

● Results
  – Really, really large differences in favor of Java!
    (for the first task, the runtime error stops at exactly the same position as the type error!)
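To make the notion of a runtime type error concrete, here is a minimal sketch in a dynamically typed language. It is not one of the experiment's tasks; `parse_port`, `next_port` and the `config` dictionary are hypothetical names chosen for illustration. The wrong value is introduced in one place, and the error only shows up when the line that uses it actually executes.

```python
def parse_port(config):
    # Hypothetical helper: returns the port as stored, a str rather than an int.
    return config["port"]


def next_port(port):
    # Fails only when this line executes with a non-numeric value.
    return port + 1


config = {"port": "8080"}
port = parse_port(config)      # the wrong kind of value is introduced here ...
try:
    next_port(port)            # ... but the error surfaces here, at run time
except TypeError as error:
    message = str(error)
    print("runtime type error:", message)
```

With a static type system the mismatch between a string and an integer parameter would typically be reported before the program runs, which is the situation the fixing-time measurements compare.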
Third Experiment – Type Errors (3) [Unpublished'11]
● Interpretation
  – No idea how often this situation occurs in programming (controlled experiments won't help here)
● Personal conclusion
  – Type error fixing time validated without doubt
  – Fixing time considered as stable knowledge
  – Go on with a different experiment, check fixing time from time to time from now on
● What's next?
  – Go on with often mentioned statements in literature: TS as documentation
4th Experiment - API Usage (1) [OOPSLA'12]
● Idea
  – Time to completion as dependent variable
● How
  – 5 programming tasks on undocumented API (only source code)
  – ~30 subjects, 210 subject hours
● Results
  – No clear results: 3 tasks show benefit of TS, 2 show benefit of dynamic types (!?!)
4th Experiment - API Usage (2) [OOPSLA'12]

● Results
  – Tasks 2 & 3 seem to show the opposite!
4th Experiment - API Usage (3) [OOPSLA'12]
● Interpretation
  – Oops... no clear interpretation
  – What about „bad luck“?
● Personal conclusion
  – Try to build up experiment from scratch, re-run it
  – There are situations where TS seems to be counterproductive
● What's next?
  – Re-run experiment
5th Experiment – API usage (1) [ICPC'12]
● How
  – 9 programming tasks, 2 type error fixing tasks, (2 semantic error fixing tasks), 5 documentation tasks
  – ~30 subjects, 120 subject hours
● Results
  – Type error fixing time confirmed, now clear results for documentation in favor of TS
5th Experiment – API usage (2) [ICPC'12]

● Results
  – Shows what was expected (+ replication of type error + semantic error tests)
5th Experiment – API usage (3) [ICPC'12]
● Interpretation
  – Not the same as 4th experiment, maybe „something is different“
  – What about „bad luck“?
● Personal conclusion
  – Consider the positive documentation effect of TS as proven
  – Keep in mind that „there might still be something out there....“
● What's next?
  – What about different type systems?
  – What about different languages?
  – Has documentation anything to do with type systems at all?
6th Experiment – Generics (1) [OOPSLA'13]
● Idea
  – 3 programming tasks on API usage (raw vs. generic)
  – One extension task for strategy implementation
  – One type error fixing task (strategy)
● How
  – Analysis on only ~16 subjects
● Results
  – API usage better with generics, terrible extension time for generic strategy, no difference in type error fixing!
6th Experiment – Generics (2) [OOPSLA'13]

● Results
  – Task 5 is the extension task (in strategy) – almost all subjects failed to do that in 55 minutes!
7th Experiment – Type Declaration vs.
Type Checking (1)
● Idea
  – 1 programming task, where a wrong type name is in the API – code needs to be corrected
● How
  – 3 programming tasks on API usage (repetition of previous experiments, but no type checking!)
  – Analysis on only ~20 subjects
● Results
  – Type names already help... but wrong type names reduce usability
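The difference between a type declaration and type checking can be illustrated with Python annotations, which are declared but not checked at run time. This is a sketch only, not the experiment's material; `Customer`, `Order` and `greeting` are hypothetical names. The annotation below deliberately names the wrong type, so a caller who trusts it is misled.

```python
from typing import get_type_hints


class Customer:
    def __init__(self, name):
        self.name = name


class Order:
    def __init__(self, customer):
        self.customer = customer


# The annotation says Order, but the body actually needs a Customer.
# Python does not check annotations at run time, so the wrong type name
# goes unnoticed until a caller trusts it.
def greeting(recipient: Order) -> str:
    return "Hello, " + recipient.name


declared = get_type_hints(greeting)["recipient"].__name__
print("declared parameter type:", declared)

print(greeting(Customer("Ada")))          # works despite the annotation

try:
    greeting(Order(Customer("Ada")))      # follows the annotation, fails
except AttributeError as error:
    failure = str(error)
    print("following the declared type fails:", failure)
```

A correct type name would have documented the API even without a checker; the incorrect one sends the reader in the wrong direction, which matches the direction of the reported result.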
8th Experiment – Documentation (1)
● How
  – One programming task, 2 variables (static vs. dynamic type system, with vs. without documentation)
  – Analysis on only ~25 subjects
● Results
  – Type names help more than documentation!
Personal conclusion (1)
● Go on measuring
  – Hopefully, we come up with a theory
● Follow rigorous methods
● Use small sample sizes (!!!) - not convincing, but helps doing more experiments!
● Still only a few experiments so far... hopefully other people start doing experiments on type systems
Personal conclusion (2)
● Let's contribute to the type system war!
● Let's use facts as arguments!
● Let's stop collecting anecdotes!
● Let's say more aggressively that we do not accept anecdotes as arguments
Personal conclusion (3)
● There are still plenty of experiments waiting to be done
● Think about whether you would like to contribute to the experiment series – all additional measurements help!
Summary of Statements
● In case dynamic languages have a benefit, it has nothing to do with the absence of the type system.
● In case they do have a benefit, it is despite the absence of the type system!
Conclusion
● It is possible to collect data about language constructs
● Controlled experiments really are a way to extract information / gather knowledge
● Maybe small experiments are more useful than larger experiments
● Try not to do only a single experiment, but a collection of experiments in order to understand the topic
