Your SlideShare is downloading. ×
Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14
Upcoming SlideShare
Loading in...5

Thanks for flagging this SlideShare!

Oops! An error has occurred.

Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14


Published on

Spreadsheets play a pivotal role in modern society as they are inherently multi-purpose and widely used both by individuals to cope with simple needs as well as large companies as integrators of …

Spreadsheets play a pivotal role in modern society as they are inherently multi-purpose and widely used both by individuals to cope with simple needs as well as large companies as integrators of complex systems and as support for informing business decisions. Spreadsheets have probably passed the point of no return in terms of importance: it is estimated that 95% of all U.S. firms use them for financial reporting, 90% of all analysts in industry perform calculations in spreadsheets, and 50% of all spreadsheets are the basis for decisions. This importance, however, has not been achieved together with effective mechanisms for error prevention.

To aid in error-prevention in this talk we present a methodology for spreadsheet engineering. First, we present data mining and database techniques to reason about spreadsheet data. These techniques are used to compute relationships between spreadsheet elements (cells/columns/rows). These relations are then used to infer a model defining the business logic of the spreadsheet. Such a model of a spreadsheet data is a visual domain specific language that we embed in a well-known spreadsheet system. The embedded model is the building block to define techniques for model-driven spreadsheet development, where advanced techniques are used to guarantee the model-instance synchronization. In this model-driven environment, any user data update as to follow the the model-instance conformance relation, thus, guiding spreadsheet users to introduce correct data. Bidirectional transformations are used to synchronize models and instances after users update/evolve the model or instance.

Published in: Technology

  • Be the first to comment

  • Be the first to like this

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

No notes for slide
  • Spreadsheet are widely used
    By everyone
    By many companies
  • They are grate
    Lots of features
  • Intended to be used by trained person
    Professional on Spreadsheet models/ClassSheets
    Not by instance/data end users
  • This generated spreadsheet guides users in introducing correct data
    The spreadsheet includes mechanisms that guarantee that the spreadsheet data always conforms to the model after an user update
  • Baseado em haskell
    Integrado no OO
    Clicanca-se em botoes
  • Transcript

    • 1. Spreadsheet Engineering Jácome Cunha Researcher @ HASLab / INESC TEC & Universidade do Minho, Portugal Oregon State University (office 2077) OSU - EECS Colloquium - 02/24/14
    • 2. Agenda I. Motivation II. Spreadsheets Meet Models III. Models for Spreadsheets – ClassSheets IV. Inferring ClassSheets V. Embedding ClassSheets VI. Evolution! VII. Model-Driven Spreadsheets VIII. Summary 2
    • 3. I. Motivation 3
    • 4. 4
    • 5. Why do Spreadsheets matter? Financial intelligence firm CODA reports that 95% of all U.S. firms use spreadsheets for financial reporting. Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R. Panko and Nicholas Ordway, 2008 5
    • 6. Why do Spreadsheets matter? They are the programming language of choice by nonprofessional programmers, a.k.a. end users. In the U.S. alone, the number of end-user programmers is conservatively estimated at 11 million, compared to only 2.75 million other, professional programmers. Estimating the numbers of end users and end-user programmers, Christopher Scaffidi, Mary Shaw, and Brad Myers, VL/HCC 2005 6
    • 7. 7
    • 8. Why do Spreadsheets matter? In 2004, (now Softtrax) had the International Data Corporation interview 118 business leaders. IDC found that 85% were using spreadsheets in financial reporting and forecasting. Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R. Panko and Nicholas Ordway, 2008 8
    • 9. In fact, spreadsheets lack: ● Abstraction ● ● Encapsulation Type system ● Testing ● ● IDE ... 9
    • 10. And the consequences may be... Around 200 people who thought their only experience of the London 2012 Olympic Games would be minor heats of synchronised swimming have received an unexpected upgrade to the men’s 100m final following an embarrassing ticketing mistake. ... Locog said the error occurred in the summer, between the first and second round of ticket sales, when a member of staff made a single keystroke mistake and entered ‘20,000’ into a spreadsheet rather than the correct figure of 10,000 remaining tickets. The Telegraph, 04 January 2012 10
    • 11. And the consequences may be... In a 2010 paper* Carmen Reinhart, now a professor at Harvard Kennedy School, and Kenneth Rogoff, an economist at Harvard University...argued that GDP growth slows to a snail’s pace once government-debt levels exceed 90% of GDP. The 90% figure quickly became ammunition in political arguments over austerity...This week a new piece of research poured fuel on the fire by calling the 90% finding into question.. Economy losses of $10 billion/year! The Economist, 17 April 2013 Harvard University economists Carmen Reinhart and Kenneth Rogoff have acknowledged making a spreadsheet calculation mistake in a 2010 research paper, “Growth in a Time of Debt”, which has been widely cited to justify budgetcutting. Business Week, 18 April 2013 11
    • 12. II. Spreadsheets Meet Models [PEPM'09, VL/HCC'10] 12
    • 13. Why Models? 13
    • 14. Spreadsheet Example 14
    • 15. Functional Dependency? → ↛ 15
    • 16. Functional Dependencies ● ● ● We compute the business logic from the data, by inferring FDs They are the building blocks inferring models for (legacy) spreadsheets The better the FDs we infer, the better the model we compute! 16
    • 17. Too Many?? ["A"] -> ["B","C","D","E","F"] ["C"] -> ["A","B","D","E","F"] ["D"] -> ["A","B","C","E","F"] ["E"] -> ["A","B","C","D","F"] ["F"] -> ["A","B","C","D","E"] ["G"] -> ["H","I","J"] ["H"] -> ["G","I","J"] ["I"] -> ["G","H","J"] ["J"] -> ["G","H","I"] ["K"] -> ["L","M"] ["L"] -> ["K","M"] ["M"] -> ["K","L"] ["B","K"] -> ["A","C","D","E","F"] ["B","L"] -> ["A","C","D","E","F"] ["B","M"] -> ["A","C","D","E","F"] 17
    • 18. Accidents happen ● ● We use a data mining algorithm which produces to many accidental FDs! We introduce some spreadsheet specific heuristics to filter out “accidental” FDs 18
    • 19. Organize them ● ● Label semantics: often keys are labeled “code” or “id” Label arrangement: we prefer FDs respecting the order of columns ● Antecedent size: small keys are preferable ● Ratio: small ratio between keys and non-keys ● Single value columns: columns always with the same value appear in too many FDs 19
    • 20. Final set Pilot-Id → Pilot-Name, Phone N-Number → Model, Plane-Name Pilot-Id, N-Number, Depart, Destination, Date, Hours → {} 20
    • 21. The first model: a relational model Having computed the FDs, we can now use the FUN algorithm to produce a relational model for the spreadsheet: Pilots (Pilot-Id, Pilot-Name, Phone) Planes (N-Number, Model, Plane-Name) <Flights> (#Pilot-Id,# N-Number, Depart, Destination, Date Hours) 21
    • 22. III. Models for Spreadsheet – ClassSheets Engels and Erwig ASE'05 22
    • 23. ClassSheets - Models for Spreadsheets ClassSheets are a high-level, object-oriented formalism to specify spreadsheets 23
    • 24. ClassSheets - Models for Spreadsheets 24
    • 25. ClassSheets - Models for Spreadsheets 25
    • 26. IV. Inferring ClassSheets [VL/HCC'10] 26
    • 27. ClassSheet Inference 27
    • 28. V. Embedding ClassSheets [VL/HCC'11] 28
    • 29. Why the Embedding? 29
    • 30. Embedding... ● Embedding a language into another language is a recurring strategy (e.g. for DSLs) – – Users are used to the host language and do not need to learn a (complete) new language :-) – Implementation effort is much reduced :-) – ● Embedded language inherit all the power of the host language :-) It may have some restrictions :-( We embedded ClassSheets in traditional spreadsheet systems 30
    • 31. Vertically Expandable Tables 31
    • 32. Horizontally Expandable Tables 32
    • 33. Relationship Tables 33
    • 34. 34
    • 35. VI. Evolution! [FASE'11, ICMT'12] 35
    • 36. Why do Spreadsheet Models Need Evolution? ● ● Suppose now you need to add new information to the spreadsheet For instance, the number of passengers of each flight ● It would require to do several error-prone tasks ● Add columns, labels, update formulas, etc. ● We can do it automatically! 36
    • 37. Why do Spreadsheet Instances Need Evolution? ● ● ● Some evolution steps are easier to perform on the instance For instance, to add a column to one of the repetition blocks People felt the need to evolve the data 37
    • 38. 38
    • 39. Bidirectional Transformation System 39
    • 40. (Data) Operations on Instances 40
    • 41. (Model) Operations on ClassSheets 41
    • 42. Bidirectional Transformation Functions 42
    • 43. Compositional Example: Add a Column and a Class 43
    • 44. VII. MDSheet – Model-Driven Spreadsheets [ICSE'12] 44
    • 45. MDSheet Tool 45
    • 46. ● Available at ● Built out of 7886 LOC: – 3181 in Haskell, for the inference and evolution – 980 in Basic, for the embedding – 2884 in C++, for gluing all components – 340 in Perl, for compilation and setup – 722, for makefiles 46
    • 47. VIII. Summary 47
    • 48. Acknowledgments This work has been done in collaboration with many people: Martin Erwig, João Paulo Fernandes, Jorge Mendes, Hugo Pacheco, Rui Pereira, João Saraiva, Joost Visser 48
    • 49. Thanks! Questions? 49
    • 50. More? ● More at ● Querying model-driven spreadsheet ● Visually querying model-driven spreadsheets ● Detections of bad smells ● Edit assistance ● Empirical validations ● Variational spreadsheets (@ OSU) ● ... 50
    • 51. Does It Work? 51
    • 52. Empirical Study Settings ● 17 student from a MSc course ● 2 different spreadsheets – Microsoft budget – Local company responsible for water supply of Braga, Portugal - agere 52
    • 53. Study Setting ● Hypotheses: (1) In order to perform a given set of tasks, users spend less time when using modeldriven spreadsheets instead of plain ones. (2) Spreadsheets developed in the modeldriven environment hold less errors than plain ones. 53
    • 54. Main Results Number of tasks performed on the MS spreadsheet 54
    • 55. Main Results Error rate in the budget spreadsheet 55