Spreadsheets: functional programming for the masses
Simon Peyton Jones
Margaret Burnett
Alan Blackwell
Q1: What should a functional programmer
in Microsoft Research do?
A1: Persuade developers to implement
stuff in Haskell. No more C#!
Haskell is better!
A1: Ask Q2
Q2: What is the world's most widely used
functional language, by far?
Excel!
(Violent exothermic reaction)
Spreadsheets are functional programs
B1 = A1*A1
C1 = A2*A2
D1 = B1-C1
B2 = A1+A2
C2 = A1-A2
D2 = B2*C2
• Just a big bunch of equations
• No side effects
• Order of evaluation controlled by data dependencies
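The point about evaluation order can be sketched in a few lines of Python. This is only an illustration, not how Excel's recalculation engine works: the input values for A1 and A2 are made up, and the helper name `recalc` is hypothetical. Each formula cell lists the cells it reads, and a topological sort of those dependencies decides the order in which the equations are evaluated.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Input cells (values invented for illustration).
inputs = {"A1": 5, "A2": 3}

# Each formula cell: (cells it depends on, pure function of those cells).
formulas = {
    "B1": (("A1",), lambda a1: a1 * a1),
    "C1": (("A2",), lambda a2: a2 * a2),
    "D1": (("B1", "C1"), lambda b1, c1: b1 - c1),
    "B2": (("A1", "A2"), lambda a1, a2: a1 + a2),
    "C2": (("A1", "A2"), lambda a1, a2: a1 - a2),
    "D2": (("B2", "C2"), lambda b2, c2: b2 * c2),
}


def recalc(inputs, formulas):
    """Evaluate every formula cell in an order fixed only by data dependencies."""
    graph = {cell: set(deps) for cell, (deps, _) in formulas.items()}
    values = dict(inputs)
    for cell in TopologicalSorter(graph).static_order():
        if cell in formulas:  # input cells already have values
            deps, fn = formulas[cell]
            values[cell] = fn(*(values[d] for d in deps))
    return values


values = recalc(inputs, formulas)
print(values["D1"], values["D2"])
```

Because the formulas have no side effects, the order in which they are written in the dictionary does not matter; any order consistent with the dependencies gives the same values.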
Q3: What chance does a pointy-headed
researcher have of influencing the
direction of a Microsoft cash cow?
Excel’s market is tall
[Figure: market size by audience. Programmers (can use VB, C++): 2m. End users (can use Excel): 50m.]
2m “classic” programmers (write VB, C++, C#)
50m end-user programmers
• Real job is engineering, teaching, financial; NOT programming
• Use Excel formulae to build "models"
• No need to "sell" functional programming: they are already doing it!
[Figure: market size, programmers (2m) versus end users (50m), annotated with research effort expended: “Not much”.]
Tall, but narrow
[Figure: market size (programmers 2m, end users 50m) plotted against application requirements.]
When the task...
• becomes large or complex
• changes over time
• rewards re-use
• is mission-critical
... cells and formulas are not enough.
Current solution: shift programming paradigm
Use Excel + VB, C#
Our vision
[Figure: market size (programmers 2m, end users 50m) plotted against application requirements, with a region marked “New territory to colonise”.]
Increase Excel’s “reach” by empowering end users to write “programs” without hiring programmers
Research inputs
• Functional programming (Simon Peyton Jones)
• End user & visual software engineering (Margaret Burnett)
• Psychology of programming (Alan Blackwell)
[Diagram: Excel and the research inputs feed into “Excel Plus”.]
Ruthless design-time focus on usability, based on empirically-grounded research.
Our target end users
Of all Excel users
• Some just use Excel for lists
• Some can type very simple formulae, e.g. =SUM(A1:A10)
• Some use formulae, and understand copy-and-paste of formulae (absolute and relative cell references). This is our target audience: a minority of Excel users, but still extremely numerous
• Some can use Visual Basic (professional programmers)
How?
[Figure: market size (programmers 2m, end users 50m) plotted against application requirements.]
Two complementary ideas
1. Functions as ordinary spreadsheets
2. First class array values
Functions as ordinary spreadsheets
What's missing?
B1 = A1*A1
C1 = A2*A2
D1 = B1-C1
B2 = A1+A2
C2 = A1-A2
D2 = B2*C2
Functions as ordinary spreadsheets
Scenario
• teacher types formula to compute student grade
• copies and pastes down a column
• (much later) wants to change the formula
Problem
• must alter many cells to implement a single change
• impacts re-use, error-proneness, modularity
Obvious solution (to a programmer)
• Make a named function to encapsulate the formula
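A programmer's view of the teacher scenario, as a small Python sketch. The grading rule (40% coursework, 60% exam), the student names, and the marks are all hypothetical; the only point is that the rule lives in one named place, so a later change to it is a single edit instead of one edit per row.

```python
def grade(coursework, exam):
    """Hypothetical grading rule: 40% coursework, 60% exam."""
    return (40 * coursework + 60 * exam) / 100


# One row per student: (name, coursework mark, exam mark). Data is invented.
students = [("Ann", 70, 80), ("Bob", 50, 90)]

# Every row calls the same named function, the analogue of filling a call
# to the function down a column instead of copying the formula text.
grades = {name: grade(cw, ex) for name, cw, ex in students}
print(grades)
```

Changing the weights inside `grade` updates every student's result at once, which is the sharing that copy-and-paste of a formula loses.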
User is working on a formula
User brings up the right click menu
  Cut
  Copy
  Paste
  …
  Make a function
  …
A new function is automatically created in a sheet and called
[Screenshot: new function worksheet; formula replaced by call to function]
Now, fill down does not lose sharing
[Screenshot: regular fill down]
User can see/modify the function definition as desired
Functions as worksheets
• Creating a function is fast
• Understanding a function requires no new skills: no paradigm shift
• Using a function improves quality
Named abstraction is our primary weapon in the war against complexity. Imagine conventional programming with no procedures, only smart copy/paste!
Creating a function from scratch
• Build a worksheet to calculate the distance a ball will travel, when thrown at a particular angle and velocity
• Turn it into a function by identifying the input cells (a bit like “scenarios”, only callable)
• Call the function many times to see the distance the ball goes for different throwing angles
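The three steps above can be sketched in Python. The slides do not give the physics, so this assumes the textbook model of a ball thrown over level ground with no air resistance, where the range is v² · sin(2θ) / g; the function name `ball_distance` and the sample velocity are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed)


def ball_distance(angle_deg, velocity):
    """Distance travelled by an idealised ball (no drag, level ground).

    The two parameters play the role of the worksheet's input cells.
    """
    angle = math.radians(angle_deg)
    return velocity ** 2 * math.sin(2 * angle) / G


# "Call the function many times" for different throwing angles.
distances = {a: ball_distance(a, 20.0) for a in (15, 30, 45, 60, 75)}
for angle, dist in distances.items():
    print(f"{angle:2d} degrees: {dist:.2f} m")
```

Under this model the 45 degree throw travels furthest, and complementary angles such as 30 and 60 degrees give the same distance.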
Debugging
• The “call tree” becomes a tree of linked worksheets, laid out in space, not in time.
• So debugging is particularly easy. Need new mechanisms for navigating the plethora of worksheets, via the tree structure.
• First year programming courses will be taught this way!
[Diagram: Main program calls function CylVol, which calls function CircArea.]
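One way to picture a call tree "laid out in space" is as a nested record in which every call keeps its inputs, its result, and the calls it made, so all intermediate values can be inspected at once. The sketch below is an analogy in Python, not the proposed Excel design; it assumes CylVol computes a cylinder's volume from CircArea, and the helper `trace_cyl_vol` is hypothetical.

```python
import math


def circ_area(radius):
    """Area of a circle."""
    return math.pi * radius ** 2


def trace_cyl_vol(radius, height):
    """Compute a cylinder's volume and return the whole call tree as data."""
    area = circ_area(radius)
    return {
        "function": "CylVol",
        "inputs": {"radius": radius, "height": height},
        "result": area * height,
        "calls": [
            {
                "function": "CircArea",
                "inputs": {"radius": radius},
                "result": area,
                "calls": [],
            }
        ],
    }


trace = trace_cyl_vol(1.0, 2.0)
print(trace["result"], trace["calls"][0]["result"])
```

Each dictionary here stands in for one linked worksheet: nothing has to be re-run or stepped through to see what CircArea returned.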
Domain-specific libraries
• Every domain (physics, electronics, statistics, financial, marketing...) has domain-specific abstractions.
• Excel’s function libraries are an ideal way of packaging those abstractions for Excel users.
• Hence, we want to make it easy for end users to build, encapsulate, and share their own function libraries, without help from professional programmers.
First class data values
• User-defined functions need array arguments, e.g. SUM( A1:B9 )
• Simple but powerful idea: anything a scalar can do, an array can do:
  ◦ be the value of a formula
  ◦ be the value of a cell
  ◦ be the argument or result of a function
• Make Excel’s existing “array formulae” simpler and more powerful.
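A rough Python analogy for "anything a scalar can do, an array can do". The dictionary `sheet` and the function `normalise` are hypothetical stand-ins for a worksheet and a user-defined function; the sketch only shows an array sitting in a single cell and passing into and out of a function like any other value.

```python
def normalise(values):
    """Take an array argument and return an array result."""
    total = sum(values)
    return [v / total for v in values]


sheet = {}
sheet["A1"] = [2, 3, 5]               # an array is the value of a cell
sheet["B1"] = normalise(sheet["A1"])  # array as argument and as result
sheet["C1"] = sum(sheet["B1"])        # a scalar computed from an array cell
print(sheet)
```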
First class values
• Currency; units in general (unit-aware arithmetic)
• Hyperlink
• Matrix (index, add, multiply…)
• Relation (filter, select, join…)
• XML blob (query, combine)
• Picture (generate picture from numbers, combine pictures)
Each value type comes complete with a repertoire of functions over it
Bulk data operations
A1 = …connect to a database relation…
A2 = EXTEND( A1, [First Name], GetFirst( [Name] ) )
A3 = EXTEND( A2, [Last Name], GetLast( [Name] ) )
A4 = FILTER( A3, AND( [Age] > 30, [Age] < 50 ) )
A5 = SELECT( A4, [First Name], [Last Name], [Age] )
This stuff can be done today, by hand (e.g. Data/AutoFilter), but it can’t be automated robustly
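The pipeline above can be mimicked in Python with a relation modelled as a list of dictionaries. EXTEND, FILTER and SELECT are proposals on the slide, not existing Excel functions, so the Python helpers below are only an assumed reading of their meaning; the sample rows are invented, and the sketch assumes each step builds on the previous cell's result so that both name columns reach the final SELECT.

```python
def extend(relation, column, fn):
    """Add a computed column to every row."""
    return [{**row, column: fn(row)} for row in relation]


def filter_rows(relation, predicate):
    """Keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def select(relation, *columns):
    """Keep only the named columns, in the given order."""
    return [{c: row[c] for c in columns} for row in relation]


# Stand-in for "connect to a database relation" (made-up data).
a1 = [
    {"Name": "Ann Smith", "Age": 35},
    {"Name": "Bob Jones", "Age": 45},
    {"Name": "Cy Young", "Age": 62},
]
a2 = extend(a1, "First Name", lambda r: r["Name"].split()[0])
a3 = extend(a2, "Last Name", lambda r: r["Name"].split()[-1])
a4 = filter_rows(a3, lambda r: 30 < r["Age"] < 50)
a5 = select(a4, "First Name", "Last Name", "Age")
print(a5)
```

Each intermediate relation is an ordinary value, so the whole chain recomputes when the source data changes, which is what the manual AutoFilter route cannot guarantee.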
Extensible types
It should be easy for a VB or C# programmer to add a new data type. All Excel needs to know about it is:
• How to display it
• How to “drill into” it to display its full value
• Perhaps, how to downcast it to a number/string
The recalc chain and dependency analysis are completely unaffected
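The three requirements listed above amount to a very small interface. The sketch below expresses that interface in Python for illustration only (the slide has VB or C# in mind); the names `CellValue`, `Money`, `display`, `drill_into` and `downcast` are all hypothetical, and nothing here reflects an actual Excel extension API.

```python
from dataclasses import dataclass
from typing import Protocol


class CellValue(Protocol):
    """What a host spreadsheet would need to know about a new value type."""

    def display(self) -> str: ...      # short form shown in the cell
    def drill_into(self) -> str: ...   # full value, shown on request
    def downcast(self) -> float: ...   # optional fallback to a plain number


@dataclass(frozen=True)
class Money:
    """Hypothetical currency type, stored in whole cents."""

    amount_cents: int
    currency: str

    def display(self) -> str:
        return f"{self.amount_cents / 100:.2f} {self.currency}"

    def drill_into(self) -> str:
        return f"Money(amount_cents={self.amount_cents}, currency={self.currency!r})"

    def downcast(self) -> float:
        return self.amount_cents / 100


def render_cell(value: CellValue) -> str:
    """The host only calls the interface; it never inspects the type itself."""
    return value.display()


price = Money(1250, "GBP")
print(render_cell(price), "|", price.drill_into(), "|", price.downcast())
```

Since the host treats the value as opaque apart from these three operations, dependency tracking between cells needs no changes, which matches the slide's closing claim.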
Back to the supertanker
• Small crew, high-value payload, many customer requests, so systemic changes are not easy
• Excel 2003 is out -- the next version is being designed
• We’re talking to the Excel team weekly
• Next:
  ◦ higher order functions
  ◦ assertions, test generation
  ◦ static type system?
Summary
Multi-disciplinary inputs: functional programming; end user & visual software engineering; psychology of programming
Empower non-programmer end users (accountants, engineers, salesmen...) to do things they could not do before
• Control complexity through building re-usable abstractions
• Succeed in more ambitious applications
• Encapsulate domain-specific expertise in function libraries
• Crush more errors earlier
http://research.microsoft.com/~simonpj/papers/excel
