Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14


Spreadsheets play a pivotal role in modern society: they are inherently multi-purpose and widely used, both by individuals to cope with simple needs and by large companies as integrators of complex systems and as support for business decisions. Spreadsheets have probably passed the point of no return in terms of importance: it is estimated that 95% of all U.S. firms use them for financial reporting, 90% of all analysts in industry perform calculations in spreadsheets, and 50% of all spreadsheets are the basis for decisions. This importance, however, has not been matched by effective mechanisms for error prevention.

To aid in error prevention, in this talk we present a methodology for spreadsheet engineering. First, we present data mining and database techniques to reason about spreadsheet data. These techniques are used to compute relationships between spreadsheet elements (cells/columns/rows). These relations are then used to infer a model defining the business logic of the spreadsheet. Such a model of spreadsheet data is a visual domain-specific language that we embed in a well-known spreadsheet system. The embedded model is the building block for model-driven spreadsheet development, where advanced techniques are used to guarantee model-instance synchronization. In this model-driven environment, any user data update has to follow the model-instance conformance relation, thus guiding spreadsheet users to introduce correct data. Bidirectional transformations are used to synchronize models and instances after users update or evolve the model or the instance.


Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14

  1. Spreadsheet Engineering
     Jácome Cunha
     jacome@di.uminho.pt
     www.di.uminho.pt/~jacome
     Researcher @ HASLab / INESC TEC & Universidade do Minho, Portugal
     Oregon State University (office 2077)
     OSU - EECS Colloquium - 02/24/14
  2. Agenda
     I. Motivation
     II. Spreadsheets Meet Models
     III. Models for Spreadsheets – ClassSheets
     IV. Inferring ClassSheets
     V. Embedding ClassSheets
     VI. Evolution!
     VII. Model-Driven Spreadsheets
     VIII. Summary
  3. I. Motivation
  5. Why do Spreadsheets matter?
     Financial intelligence firm CODA reports that 95% of all U.S. firms use spreadsheets for financial reporting.
     (Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R. Panko and Nicholas Ordway, 2008)
  6. Why do Spreadsheets matter?
     They are the programming language of choice for nonprofessional programmers, a.k.a. end users. In the U.S. alone, the number of end-user programmers is conservatively estimated at 11 million, compared to only 2.75 million other, professional programmers.
     (Estimating the numbers of end users and end-user programmers, Christopher Scaffidi, Mary Shaw, and Brad Myers, VL/HCC 2005)
  8. Why do Spreadsheets matter?
     In 2004, RevenueRecognition.com (now Softtrax) had the International Data Corporation interview 118 business leaders. IDC found that 85% were using spreadsheets in financial reporting and forecasting.
     (Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R. Panko and Nicholas Ordway, 2008)
  9. In fact, spreadsheets lack:
     ● Abstraction
     ● Encapsulation
     ● Type system
     ● Testing
     ● IDE
     ● ...
  10. And the consequences may be...
      "Around 200 people who thought their only experience of the London 2012 Olympic Games would be minor heats of synchronised swimming have received an unexpected upgrade to the men's 100m final following an embarrassing ticketing mistake. ... Locog said the error occurred in the summer, between the first and second round of ticket sales, when a member of staff made a single keystroke mistake and entered '20,000' into a spreadsheet rather than the correct figure of 10,000 remaining tickets."
      (The Telegraph, 04 January 2012)
  11. And the consequences may be...
      "In a 2010 paper* Carmen Reinhart, now a professor at Harvard Kennedy School, and Kenneth Rogoff, an economist at Harvard University...argued that GDP growth slows to a snail's pace once government-debt levels exceed 90% of GDP. The 90% figure quickly became ammunition in political arguments over austerity...This week a new piece of research poured fuel on the fire by calling the 90% finding into question."
      (The Economist, 17 April 2013)
      "Harvard University economists Carmen Reinhart and Kenneth Rogoff have acknowledged making a spreadsheet calculation mistake in a 2010 research paper, 'Growth in a Time of Debt', which has been widely cited to justify budget-cutting."
      (Business Week, 18 April 2013)
      Economy losses of $10 billion/year!
  12. II. Spreadsheets Meet Models [PEPM'09, VL/HCC'10]
  13. Why Models?
  14. Spreadsheet Example
  15. Functional Dependency? → ↛
  16. Functional Dependencies
      ● We compute the business logic from the data, by inferring functional dependencies (FDs)
      ● They are the building blocks for inferring models for (legacy) spreadsheets
      ● The better the FDs we infer, the better the model we compute!
  17. Too Many??
      ["A"] -> ["B","C","D","E","F"]
      ["C"] -> ["A","B","D","E","F"]
      ["D"] -> ["A","B","C","E","F"]
      ["E"] -> ["A","B","C","D","F"]
      ["F"] -> ["A","B","C","D","E"]
      ["G"] -> ["H","I","J"]
      ["H"] -> ["G","I","J"]
      ["I"] -> ["G","H","J"]
      ["J"] -> ["G","H","I"]
      ["K"] -> ["L","M"]
      ["L"] -> ["K","M"]
      ["M"] -> ["K","L"]
      ["B","K"] -> ["A","C","D","E","F"]
      ["B","L"] -> ["A","C","D","E","F"]
      ["B","M"] -> ["A","C","D","E","F"]
  18. Accidents happen
      ● We use a data mining algorithm which produces too many accidental FDs!
      ● We introduce some spreadsheet-specific heuristics to filter out "accidental" FDs
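To make the notion of an "accidental" FD concrete, here is a minimal brute-force sketch in Python. It is not the data mining algorithm used in the talk; the sample rows and the helper names `holds` and `single_column_fds` are assumptions made for illustration only. It checks every single-column dependency in a tiny table.

```python
from itertools import permutations

def holds(rows, lhs, rhs):
    """Return True if column `lhs` functionally determines column `rhs`."""
    seen = {}
    for row in rows:
        key, value = row[lhs], row[rhs]
        # setdefault records the first value seen for each key;
        # a later, different value for the same key violates the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True

def single_column_fds(rows, columns):
    """List every pair (a, b) of distinct columns such that a -> b holds."""
    return [(a, b) for a, b in permutations(columns, 2) if holds(rows, a, b)]

# Hypothetical sample data, loosely modelled on the pilots example.
rows = [
    {"Pilot-Id": "pl1", "Pilot-Name": "John", "Hours": 7},
    {"Pilot-Id": "pl2", "Pilot-Name": "Mike", "Hours": 9},
    {"Pilot-Id": "pl1", "Pilot-Name": "John", "Hours": 7},
]
fds = single_column_fds(rows, ["Pilot-Id", "Pilot-Name", "Hours"])
for lhs, rhs in fds:
    print(lhs, "->", rhs)
```

On this small sample every one of the six possible single-column dependencies holds, including `Hours -> Pilot-Id`, which is true of the data only by coincidence. That is the kind of dependency the heuristics are meant to filter out.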
  19. Organize them
      ● Label semantics: often keys are labeled "code" or "id"
      ● Label arrangement: we prefer FDs respecting the order of columns
      ● Antecedent size: small keys are preferable
      ● Ratio: small ratio between keys and non-keys
      ● Single value columns: columns always with the same value appear in too many FDs
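One way such heuristics could be combined is a simple scoring function, sketched below in Python. The weights, the substring test for key-like labels, and the names `score` and `constant_columns` are all assumptions for illustration; the slides do not say how the heuristics are actually weighted or combined.

```python
def constant_columns(rows, columns):
    """Single value columns: columns that hold the same value in every row."""
    return [c for c in columns if len({row[c] for row in rows}) <= 1]

def score(fd, column_order):
    """Score an FD given as (lhs, rhs) tuples of column labels; higher is better.

    All weights are arbitrary placeholders, not values from the talk.
    """
    lhs, rhs = fd
    position = {name: i for i, name in enumerate(column_order)}
    points = 0.0
    # Label semantics: keys are often labeled "code" or "id".
    # (A plain substring test; it would also match labels such as "Width".)
    if any(word in name.lower() for name in lhs for word in ("id", "code")):
        points += 2.0
    # Label arrangement: prefer FDs whose key columns precede the others.
    if rhs and max(position[c] for c in lhs) < min(position[c] for c in rhs):
        points += 1.0
    # Antecedent size: small keys are preferable.
    points -= 0.5 * len(lhs)
    # Ratio: prefer few key columns relative to non-key columns.
    points -= len(lhs) / max(len(rhs), 1)
    return points

columns = ["Pilot-Id", "Pilot-Name", "Phone"]
plausible = (("Pilot-Id",), ("Pilot-Name", "Phone"))
accidental = (("Phone",), ("Pilot-Id", "Pilot-Name"))
print(score(plausible, columns), score(accidental, columns))
```

Candidates would then be sorted by score, after first dropping any column reported by `constant_columns`, and only the best-ranked FDs kept.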
  20. Final set
      Pilot-Id → Pilot-Name, Phone
      N-Number → Model, Plane-Name
      Pilot-Id, N-Number, Depart, Destination, Date, Hours → {}
  21. The first model: a relational model
      Having computed the FDs, we can now use the FUN algorithm to produce a relational model for the spreadsheet:
      Pilots (Pilot-Id, Pilot-Name, Phone)
      Planes (N-Number, Model, Plane-Name)
      <Flights> (#Pilot-Id, #N-Number, Depart, Destination, Date, Hours)
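The step from the final FD set to the relational model can be pictured with the following Python sketch. It is a simplified illustration and not the FUN algorithm; the function name `to_schema` and the tuple encoding of FDs are assumptions. Each FD with a non-empty right-hand side becomes a table keyed by its left-hand side. An FD with an empty right-hand side becomes a relationship table, where columns that are keys of other tables are marked with `#` as foreign keys.

```python
def to_schema(fds):
    """Turn FDs, given as (lhs, rhs) tuples of labels, into lists of table columns."""
    # Single-column antecedents with dependents act as primary keys.
    keys = {lhs[0] for lhs, rhs in fds if rhs and len(lhs) == 1}
    schema = []
    for lhs, rhs in fds:
        if rhs:
            # Entity table: key columns first, then the dependent columns.
            schema.append(list(lhs) + list(rhs))
        else:
            # Relationship table: mark references to other tables with '#'.
            schema.append(["#" + c if c in keys else c for c in lhs])
    return schema

final_set = [
    (("Pilot-Id",), ("Pilot-Name", "Phone")),
    (("N-Number",), ("Model", "Plane-Name")),
    (("Pilot-Id", "N-Number", "Depart", "Destination", "Date", "Hours"), ()),
]
schema = to_schema(final_set)
for table in schema:
    print("(" + ", ".join(table) + ")")
```

Table names such as Pilots, Planes and Flights are not derived by this sketch; they would have to come from labels in the spreadsheet or be chosen by hand.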
  22. III. Models for Spreadsheets – ClassSheets [Engels and Erwig, ASE'05]
  23. ClassSheets - Models for Spreadsheets
      ClassSheets are a high-level, object-oriented formalism to specify spreadsheets
  24. ClassSheets - Models for Spreadsheets
  25. ClassSheets - Models for Spreadsheets
  26. IV. Inferring ClassSheets [VL/HCC'10]
  27. ClassSheet Inference
  28. V. Embedding ClassSheets [VL/HCC'11]
  29. Why the Embedding?
  30. Embedding...
      ● Embedding a language into another language is a recurring strategy (e.g. for DSLs)
        – Users are used to the host language and do not need to learn a (complete) new language :-)
        – Implementation effort is much reduced :-)
        – Embedded languages inherit all the power of the host language :-)
        – It may have some restrictions :-(
      ● We embedded ClassSheets in traditional spreadsheet systems
  31. Vertically Expandable Tables
  32. Horizontally Expandable Tables
  33. Relationship Tables
  35. VI. Evolution! [FASE'11, ICMT'12]
  36. Why do Spreadsheet Models Need Evolution?
      ● Suppose now you need to add new information to the spreadsheet
      ● For instance, the number of passengers of each flight
      ● It would require several error-prone tasks
      ● Add columns, labels, update formulas, etc.
      ● We can do it automatically!
  37. Why do Spreadsheet Instances Need Evolution?
      ● Some evolution steps are easier to perform on the instance
      ● For instance, to add a column to one of the repetition blocks
      ● People felt the need to evolve the data
  39. Bidirectional Transformation System
  40. (Data) Operations on Instances
  41. (Model) Operations on ClassSheets
  42. Bidirectional Transformation Functions
  43. Compositional Example: Add a Column and a Class
  44. VII. MDSheet – Model-Driven Spreadsheets [ICSE'12]
  45. MDSheet Tool
      http://youtu.be/6LNdTdCpV2U
  46. ● Available at http://ssaapp.di.uminho.pt
      ● Built out of 7886 LOC:
        – 3181 in Haskell, for the inference and evolution
        – 980 in Basic, for the embedding
        – 2884 in C++, for gluing all components
        – 340 in Perl, for compilation and setup
        – 722, for makefiles
  47. VIII. Summary
  48. Acknowledgments
      This work has been done in collaboration with many people: Martin Erwig, João Paulo Fernandes, Jorge Mendes, Hugo Pacheco, Rui Pereira, João Saraiva, Joost Visser
  49. Thanks! Questions?
  50. More?
      ● More at http://ssaapp.di.uminho.pt
      ● Querying model-driven spreadsheets
      ● Visually querying model-driven spreadsheets
      ● Detection of bad smells
      ● Edit assistance
      ● Empirical validations
      ● Variational spreadsheets (@ OSU)
      ● ...
  51. Does It Work?
  52. Empirical Study Settings
      ● 17 students from an MSc course
      ● 2 different spreadsheets
        – Microsoft budget
        – Local company responsible for water supply of Braga, Portugal - agere
  53. Study Setting
      ● Hypotheses:
        (1) In order to perform a given set of tasks, users spend less time when using model-driven spreadsheets instead of plain ones.
        (2) Spreadsheets developed in the model-driven environment hold fewer errors than plain ones.
  54. Main Results
      Number of tasks performed on the MS spreadsheet
  55. Main Results
      Error rate in the budget spreadsheet
