www.itu.dk 1
High-performance
sheet-defined functions
in spreadsheets
Peter Sestoft
IT University of Copenhagen
SEMS 2014-07-02
With thanks to Thomas S Iversen, Daniel Cortes, Morten Hansen,
Poul Serek, Morten Poulsen, Hui Xu, Mainul Liton, Poul Brønnum,
Tim Garbos, Kasper Videbæk, Jens Hamann, Jonas Druedahl Rask,
Simon Eikeland Timmermann
The trouble with functions
•  One cannot define functions in a spreadsheet
•  To define new functions, ”experts” use VBA
•  Often very poorly, witness newsgroup
microsoft.public.excel.programming
•  Many (Excel) built-in functions are bad:
–  Week numbers: two kinds, but not ISO standard
•  Possible answers to this mess:
–  ”People should not use spreadsheets”
–  ”Only computer scientists should define functions”
–  ”All necessary functions should be built in”
–  Or: Functions within the spreadsheet metaphor
(Nuñez 2000, Peyton-Jones et al 2003)
Problem example: Area of triangles
•  Area of triangle with sides a, b, c is
SQRT(s(s-a)(s-b)(s-c)) where s = (a+b+c)/2
Either (1) compute s in column D: annoying intermediate result
or (2) try to inline s in the area formula: horrible and error-prone
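The contrast between the two workarounds can be sketched in Python, which stands in for spreadsheet formulas here. The function names are illustrative and not taken from the talk. The first version names s once, which is what a sheet-defined function allows; the second repeats the expression for s four times, as option (2) forces.

```python
import math

def triarea(a: float, b: float, c: float) -> float:
    """Heron's formula with the semi-perimeter s computed once."""
    s = (a + b + c) / 2
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

def triarea_inlined(a: float, b: float, c: float) -> float:
    """Same formula with s inlined four times, as in option (2)."""
    return math.sqrt(
        ((a + b + c) / 2)
        * ((a + b + c) / 2 - a)
        * ((a + b + c) / 2 - b)
        * ((a + b + c) / 2 - c)
    )

print(triarea(3, 4, 5))  # 6.0
```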
A solution:
Sheet-defined function TRIAREA
[Figure: function sheet and ordinary sheet, with the function's input cells and output cell marked]
How to use sheet-defined functions
•  Assumptions
–  End-users understand spreadsheet models
–  End-users do not understand VBA, C#, VB.NET, …
•  Sheet-defined functions in the organization
–  Models are developed in ordinary spreadsheets
–  After a while functions are factored out of models
–  Functions can be further developed interactively
–  An organization can develop and share libraries
–  Without preventing further evolution by users
•  Works only if
–  Sheet-defined functions are fast enough
Dual implementation
•  Ordinary sheets, interpretive evaluation
–  Frequently edited, rarely evaluated (at "recalculation")
•  Function sheets, compiled evaluation
–  Rarely edited, frequently evaluated (at function calls)
–  Run-time code generation permits interactive editing
Runtime code generation
Spreadsheet formula:
=SQRT(a1*a1+a2*a2)
.NET bytecode, generated by my compiler:
ldloc 2
ldloc 2
mul
ldloc 3
ldloc 3
mul
add
call Math.Sqrt
x86 machine code, generated by the JIT compiler:
fldl 0xfffffff0(%ebp)
fldl 0xfffffff0(%ebp)
fmulp %st,%st(1)
fldl 0xffffffe8(%ebp)
fldl 0xffffffe8(%ebp)
fmulp %st,%st(1)
faddp %st,%st(1)
fsqrt
Result: A very fast, portable spreadsheet implementation
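The difference between interpretive and compiled evaluation can be illustrated with a small Python analogy. This is a sketch only: Funcalc generates .NET bytecode, whereas this example generates Python source and compiles it at run time. The expression classes and the names `interpret`, `to_source` and `compile_function` are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Num:
    value: float

@dataclass
class Cell:
    name: str

@dataclass
class Bin:
    op: str          # "+" or "*"
    left: object
    right: object

@dataclass
class Sqrt:
    arg: object

def interpret(e, env):
    """Interpretive evaluation: walk the expression tree on every call."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Cell):
        return env[e.name]
    if isinstance(e, Bin):
        left, right = interpret(e.left, env), interpret(e.right, env)
        return left + right if e.op == "+" else left * right
    if isinstance(e, Sqrt):
        return math.sqrt(interpret(e.arg, env))
    raise TypeError(f"unknown expression: {e!r}")

def to_source(e):
    """Code generation: translate the tree to source text once."""
    if isinstance(e, Num):
        return repr(e.value)
    if isinstance(e, Cell):
        return e.name
    if isinstance(e, Bin):
        return f"({to_source(e.left)} {e.op} {to_source(e.right)})"
    if isinstance(e, Sqrt):
        return f"math.sqrt({to_source(e.arg)})"
    raise TypeError(f"unknown expression: {e!r}")

def compile_function(params, e):
    """Compile the generated source at run time into a callable."""
    src = f"lambda {', '.join(params)}: {to_source(e)}"
    return eval(compile(src, "<sheet>", "eval"), {"math": math})

# =SQRT(a1*a1+a2*a2)
formula = Sqrt(Bin("+", Bin("*", Cell("a1"), Cell("a1")),
                        Bin("*", Cell("a2"), Cell("a2"))))

hyp = compile_function(["a1", "a2"], formula)
print(interpret(formula, {"a1": 3.0, "a2": 4.0}))  # 5.0
print(hyp(3.0, 4.0))                               # 5.0
```

The compiled function pays the translation cost once and is then cheap to call repeatedly, which matches the "rarely edited, frequently evaluated" profile of function sheets.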
New book (next month)
•  Spreadsheet Implementation Technology,
MIT Press, August 2014
[Book cover: Peter Sestoft, Spreadsheet Implementation Technology: Basics and Extensions, version 0.99.5 of 2014-05-10, The MIT Press, Cambridge, Massachusetts and London, England]
•  A standard spreadsheet
implementation
•  Sheet-defined functions
•  Examples
•  Design choices
•  Scalability and speed
•  Implementation details
•  Funcalc user manual
Example function: NORMDISTCDF
•  Normal distribution N(0,1) cumulative distribution function
•  As accurate as Excel’s built-in NORMSDIST(z), and faster
[Figure: NORMDISTCDF function sheet, with input cell and output cell marked]
NORMDISTCDF generated code
•  Approximately 118 ns/call on 2.66 GHz Intel Core 2
•  VBA: 1760 ns; Excel built-in: 1140 ns; C#: 64 ns; C: 54 ns
0000 ldarg V_0 0068 ldloc.0 0198 div
0004 call ValueToDoubleOrNan 0069 call Double.IsInfinity 0199 add
0009 stloc.s V_6 006e brtrue IL_01a0 019a div
000b ldloc.s V_6 0073 ldloc.0 019b br IL_01a1
000d call Math.Abs 0074 call Double.IsNaN 01a0 ldloc.0
0012 stloc.3 0079 brtrue IL_01a0 01a1 br IL_01a7
0013 ldc.r8 -1 007e ldloc.0 01a6 ldloc.0
001c ldloc.3 007f ldc.r8 7.071 01a7 stloc.s V_5
001d mul 0088 bge IL_0144 01a9 ldloc.s V_6
001e ldloc.3 008d ldloc.s V_4 01ab stloc.0
001f mul 008f ldc.r8 220.206867912376 01ac ldloc.0
0020 ldc.r8 2 0098 ldloc.3 01ad call Double.IsInfinity
0029 div 0099 ldc.r8 221.213596169931 01b2 brtrue IL_01f3
002a call Math.Exp 00a2 ldloc.3 01b7 ldloc.0
002f stloc.s V_4 00a3 ldc.r8 112.07929149787 01b8 call Double.IsNaN
0031 ldloc.3 00ac ldloc.3 01bd brtrue IL_01f3
0032 stloc.0 00ad ldc.r8 33.912866078383 01c2 ldloc.0
0033 ldloc.0 00b6 ldloc.3 01c3 ldc.r8 0
0034 call Double.IsInfinity 00b7 ldc.r8 6.37396220353165 01cc bge IL_01dd
0039 brtrue IL_01a6 00c0 ldloc.3 01d1 ldloc.s V_5
003e ldloc.0 00c1 ldc.r8 0.700383064443688 01d3 call NumberValue.Make
003f call Double.IsNaN 00ca ldloc.3 01d8 br IL_01ee
0044 brtrue IL_01a6 00cb ldc.r8 0.035262496599891 01dd ldc.r8 1
0049 ldloc.0 00d4 mul 01e6 ldloc.s V_5
004a ldc.r8 37 00d5 add 01e8 sub
0053 ble IL_0066 00d6 mul 01e9 call NumberValue.Make
0058 ldc.r8 0 00d7 add 01ee br IL_01f9
0061 br IL_01a1 00d8 mul 01f3 ldloc.0
0066 ldloc.3 00d9 add 01f4 call NumberValue.Make
0067 stloc.0 ... 61 lines left out ... 01f9 ret
Annotations: a single unwrapping (the call to ValueToDoubleOrNan) and three wrappings (the calls to NumberValue.Make)
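The control flow visible in the listing can be read roughly as follows: take the absolute value of the argument, return 0 for the tail beyond a cutoff of 37, compute the upper tail for |z|, and finally return either the tail or 1 minus the tail depending on the sign of z. The Python sketch below follows that shape under a clearly stated assumption: the rational approximation (whose constants are partly elided from the slide) is replaced by the standard library's `math.erfc`, so this is not the code Funcalc generates.

```python
import math

def normdistcdf(z: float) -> float:
    """Cumulative distribution function of N(0,1).

    Assumption: math.erfc stands in for the rational approximation
    used in the generated bytecode.
    """
    if math.isnan(z):
        return z                      # propagate NaN unchanged
    x = abs(z)                        # work with |z|
    if x > 37.0:
        tail = 0.0                    # beyond the cutoff the tail is 0
    else:
        tail = 0.5 * math.erfc(x / math.sqrt(2.0))   # P(Z > x)
    # For negative z the answer is the tail itself; otherwise 1 - tail.
    return tail if z < 0.0 else 1.0 - tail

print(normdistcdf(0.0))   # 0.5
```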
Examples: Calendrical functions
•  Excel’s calendar functions are poor
–  Wrong before 1900, no ISO week numbers,
cannot easily find first Monday of month, Easter, …
•  Easy to implement as sheet-defined functions
•  Example: Easter in a given year (1400 ns/call):
By MSc students Xu and Liton, following Dershowitz & Reingold, Calendrical Calculations (3rd ed, Cambridge UP)
Input: year; output: Easter fixdate
•  Some other functions:
–  Fixdate to/from day-month-year
–  Fixdate to/from ISO week and ISO year
–  Last/nth Monday (etc) before given date
–  First/nth Monday (etc) after given date
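A few of these calendrical functions can be sketched in Python. The sketch rests on two assumptions: a fixdate is taken to be the day count with 1 January of year 1 (proleptic Gregorian) as day 1, which is what Python's `date.toordinal()` returns, and Easter is computed with the widely used anonymous Gregorian algorithm rather than the formulation the students followed. All function names are illustrative.

```python
from datetime import date

def fixdate(year: int, month: int, day: int) -> int:
    """Day-month-year to fixdate (day 1 = 1 January, year 1)."""
    return date(year, month, day).toordinal()

def from_fixdate(fix: int) -> tuple:
    """Fixdate back to (year, month, day)."""
    d = date.fromordinal(fix)
    return d.year, d.month, d.day

def iso_week(fix: int) -> tuple:
    """Fixdate to (ISO year, ISO week number)."""
    iso_year, week, _weekday = date.fromordinal(fix).isocalendar()
    return iso_year, week

def weekday_on_or_after(fix: int, k: int) -> int:
    """First fixdate >= fix falling on weekday k (0 = Sunday, 1 = Monday, ...)."""
    target = fix + 6
    return target - (target - k) % 7

def easter_fixdate(year: int) -> int:
    """Gregorian Easter Sunday (anonymous Gregorian algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day0 = divmod(h + l - 7 * m + 114, 31)
    return fixdate(year, month, day0 + 1)

print(from_fixdate(easter_fixdate(2014)))   # (2014, 4, 20)
```

For example, `weekday_on_or_after(fixdate(2014, 7, 1), 1)` gives the first Monday of July 2014.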
Higher-order functions:
Sheet-defined functions as values
•  New built-ins to manipulate functions
–  CLOSURE(“name”, a1, …) evaluates to a closure:
a partially applied sheet-defined function
–  APPLY(f, b1, …) applies a function value
•  Example function “ndie”, a general n-sided die
•  Defining & rolling 6-sided and 20-sided dice
[Figure: function sheet with input cell and output cell marked]
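The idea behind CLOSURE and APPLY can be mimicked in Python with `functools.partial`. This is an analogy under assumptions, not Funcalc's implementation: the registry `SHEET_FUNCTIONS`, the helper `define`, and the body of `ndie` are all hypothetical, and this sketch only binds leading arguments.

```python
import math
import random
from functools import partial

# Hypothetical registry mapping function names to sheet-defined functions.
SHEET_FUNCTIONS = {}

def define(name, fn):
    """Register a function under a name, loosely like defining a sheet function."""
    SHEET_FUNCTIONS[name] = fn

def closure(name, *args):
    """Loosely like CLOSURE("name", a1, ...): a partially applied function value."""
    return partial(SHEET_FUNCTIONS[name], *args)

def apply(f, *args):
    """Loosely like APPLY(f, b1, ...): call a function value with more arguments."""
    return f(*args)

def ndie(n):
    """Assumed body of the n-sided die: a uniform integer in 1..n."""
    return math.floor(random.random() * n) + 1

define("ndie", ndie)
d6 = closure("ndie", 6)      # a 6-sided die as a function value
d20 = closure("ndie", 20)    # a 20-sided die as a function value

rolls = [apply(d6) for _ in range(5)]
print(rolls)
```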
Funsheet: Linking Excel and Funcalc
•  Sheet-defined functions in Excel!
•  Eikeland and Timmermann MSc, June 2014
•  Via Excel DNA, an Excel-.NET bridge
•  Generated code is as fast as Funcalc
•  Call speed Excel -> Funcalc suffers from
general Excel slowness, about 11 µs/call
•  Complete Funcalc functionality: DEFINE,
CLOSURE, APPLY, SPECIALIZE, BENCHMARK
•  Prototype, so still a number of defects
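Per-call timings such as the ns/call and µs/call figures quoted in this talk are typically obtained by calling a function many times and dividing the elapsed time by the call count. The Python sketch below shows that idea only; the signature and behaviour of Funcalc's BENCHMARK function are not shown on the slides, so the name `benchmark` and its parameters are assumptions.

```python
import time

def benchmark(fn, count: int = 100_000) -> float:
    """Return the average wall-clock time per call of fn, in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(count):
        fn()
    elapsed = time.perf_counter_ns() - start
    return elapsed / count

ns_per_call = benchmark(lambda: sum(range(10)))
print(f"{ns_per_call:.0f} ns/call")
```

The loop overhead is included in the result, so very cheap functions appear slower than they are.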
TO DO: Validation
•  Improve the Excel <-> Funcalc link
•  Demonstrate one application area
•  Fix obvious problems
•  Perform development experiments
•  Perform maintenance experiments
•  ...
•  But experiments are not my area of expertise
