Summer School DSL 2013 - SpreadSheet Engineering

In this tutorial we present our work on spreadsheet engineering. We start by presenting a model-driven spreadsheet development environment (MDSDE), where a domain specific spreadsheet model is used to guide end users in introducing correct data. The business logic of spreadsheet data is modeled via domain specific ClassSheet models. End users can not only edit and update the spreadsheet data in the traditional way, but also evolve the model and/or the data. Our MDSDE automatically guarantees model/instance synchronization after a model or instance evolution.

  • This generated spreadsheet guides users in introducing correct data. The spreadsheet includes mechanisms that guarantee that the spreadsheet data always conforms to the model after a user update.
  • Intended to be used by a trained person: a professional in spreadsheet models/ClassSheets, not instance/data end users.
  • Based on Haskell. Integrated into OO. Operated by clicking buttons. Spectacular Basic.

Summer School DSL 2013 - SpreadSheet Engineering: Presentation Transcript

  • Spreadsheet Engineering, DSL 2013
    Jácome Cunha (1,2), João P. Fernandes (1,3), João Saraiva (1)
    (1) HASLab / INESC TEC & Universidade do Minho
    (2) ESTGF, Instituto Politécnico do Porto
    (3) (rel)ease, Univ. da Beira Interior
    Portugal
  • 2 This Tutorial
    ● Domain Specific Languages
    ● Visual Modeling Domain Specific Languages
    ● Embedding Domain Specific Languages
    ● Model-Driven Engineering
    ● Software Evolution
    ● Bidirectional Software Evolution
    ● Empirical Studies
  • 3 This Tutorial
    All defined in the context of spreadsheets, and implemented in the functional programming language Haskell!
  • 4 This Tutorial: Plan
    Part I
    ● Motivation
    ● Spreadsheet Analysis: Data Mining Techniques
    ● Models for Spreadsheets: ClassSheets, Embedded ClassSheet Models
    ● Model-driven Spreadsheets
  • 5 Spreadsheets are widely used
    Image taken from http://www.flickr.com/photos/cosmosfan/2414002070/
  • 6 Why do Spreadsheets matter?
    ● Probably the biggest programming language!
    ● Probably the biggest functional programming language!
    ● Probably the biggest domain specific language!
    ● Probably the biggest software system!
    ● Probably the biggest database system!
  • 7 Spreadsheets are great!
  • 8 Why do Spreadsheets matter?
    Financial intelligence firm CODA reports that 95% of all U.S. firms use spreadsheets for financial reporting.
    Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R. Panko and Nicholas Ordway, 2008
  • 9 Why do Spreadsheets matter?
    In 2004, RevenueRecognition.com (now Softrax) had the International Data Corporation interview 118 business leaders. IDC found that 85% were using spreadsheets in financial reporting and forecasting.
    Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R. Panko and Nicholas Ordway, 2008
  • 10 Why do Spreadsheets matter?
    50% of all spreadsheets are the basis for decisions.
    Supporting professional spreadsheet users by generating leveled dataflow diagrams, Felienne Hermans, Martin Pinzger and Arie van Deursen, 2011
  • 11 Why do Spreadsheets matter?
    They are the programming language of choice for non-professional programmers, a.k.a. end users. In the U.S. alone, the number of end-user programmers is conservatively estimated at 11 million, compared to only 2.75 million other, professional programmers.
    Estimating the numbers of end users and end user programmers, Christopher Scaffidi, Mary Shaw, and Brad Myers, 2005
  • 12 Why do Spreadsheets matter?
    Why are they so popular? Which characteristics make them so successful?
    First “empirical” study: fill in the inquiry!
  • 13 But, as a programming language...
    Exercise: Write a program that sums a list of integer values.
    -- Hide the Prelude's sum so this definition does not clash with it.
    import Prelude hiding (sum)
    sum :: [Int] -> Int
    sum [] = 0
    sum (h:t) = h + sum t
  • 14 In fact, spreadsheets lack:
    ● Abstraction
    ● Encapsulation
    ● Type system
    ● Testing
    ● IDE
    ● ...
  • 15 And the consequences may be...
    http://www.eusprig.org/stories.htm
    Economic losses of $10 billion/year!
  • 16 I. Spreadsheet Analysis
  • 17 Spreadsheet: An Example
    A flight scheduling spreadsheet.
  • 18 Functional Dependencies
    Informally, a functional dependency between a column A and another column B means that the values in column A determine the values in column B; that is, there are no two rows in the spreadsheet that have the same value in column A but differ in their values in column B.
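
The informal definition on slide 18 can be checked mechanically. The following is a minimal Haskell sketch, not the authors' implementation: the `Row` type, the `determines` function, and the `flights` sample data are all assumptions made for illustration, with columns addressed by position.

```haskell
-- A row is a list of cell values; columns are addressed by index.
-- Assumption: every row has a cell at both indices that are queried.
type Row = [String]

-- | True when column a functionally determines column b:
-- no two rows agree on column a but differ on column b.
determines :: Int -> Int -> [Row] -> Bool
determines a b rows =
  and [ r1 !! b == r2 !! b      -- rows that agree on a must agree on b
      | r1 <- rows
      , r2 <- rows
      , r1 !! a == r2 !! a
      ]

-- Hypothetical flight data: pilot id, pilot name, departure airport.
flights :: [Row]
flights =
  [ ["pl1", "John", "OPO"]
  , ["pl2", "Mike", "OPO"]
  , ["pl1", "John", "LIS"]
  ]
```

Under these assumptions, `determines 0 1 flights` holds because every row with the same pilot id carries the same name, while `determines 0 2 flights` fails because pilot `pl1` departs from two different airports. The pairwise comparison is quadratic in the number of rows; it mirrors the definition directly rather than aiming for efficiency.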
  • 19 Functional Dependencies
  • 20 Functional Dependencies
    Any more functional dependencies?
  • 21 Functional Dependencies
    Usually, too many!
  • 22 Functional Dependencies
    ● We compute the business logic from the data, by inferring FDs.
    ● They are the building blocks for inferring models for (legacy) spreadsheets.
    ● The better the FDs we infer, the better the model we compute!
  • 23 Functional Dependencies
    ● We use a data mining algorithm that produces too many accidental FDs!
    ● We introduce some spreadsheet-specific heuristics to filter out “accidental” FDs
  • 24 Functional Dependencies
    ● Label semantics: often keys are labeled “code” or “id”
    ● Label arrangement: we prefer FDs respecting the order of columns
    ● Antecedent size: small keys are preferable
    ● Ratio: small ratio between keys and non-keys
    ● Single value columns: columns always with the same value appear in too many FDs
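
Some of the heuristics on slide 24 can be sketched as a filter followed by a ranking. This is an illustrative Haskell sketch only: the `FD` record, the penalty weights, and the crude substring test for key-like labels are assumptions, not the scoring used by the authors. Only three of the five heuristics (label semantics, antecedent size, single value columns) are covered.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf, sortOn)

-- A functional dependency between column labels: antecedent -> consequent.
data FD = FD { antecedent :: [String], consequent :: [String] }
  deriving (Eq, Show)

-- Label semantics: keys are often labelled "code" or "id".
-- Crude substring test, so a label such as "Width" would also match.
looksLikeKey :: String -> Bool
looksLikeKey label = any (`isInfixOf` lowered) ["code", "id"]
  where lowered = map toLower label

-- Lower is better. The weights are arbitrary illustration values.
penalty :: FD -> Int
penalty fd = sizePenalty + labelPenalty
  where
    -- Antecedent size: small keys are preferable.
    sizePenalty  = length (antecedent fd)
    -- Label semantics: no key-like label in the antecedent costs extra.
    labelPenalty = if any looksLikeKey (antecedent fd) then 0 else 2

-- Single value columns: an FD that mentions a constant column is discarded.
mentionsConstant :: [String] -> FD -> Bool
mentionsConstant constants fd =
  any (`elem` constants) (antecedent fd ++ consequent fd)

-- Drop FDs over constant columns, then rank the rest by penalty.
prune :: [String] -> [FD] -> [FD]
prune constants = sortOn penalty . filter (not . mentionsConstant constants)
```

Given the labels of the constant columns and a list of candidate FDs, `prune` returns the surviving candidates with the most key-like, smallest antecedents first. A real filter would also need the column order and the key/non-key ratio, which this sketch leaves out.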
  • 25 Functional Dependencies
    Functional dependencies after pruning:
  • 26 Relational Model
    ● Having computed the FDs, we can now use the FUN algorithm to produce a relational model for the spreadsheet:
  • 27 Relational Model
    ● Discovery-based Edit Assistance for Spreadsheets, Jácome Cunha, João Saraiva, and Joost Visser. In proceedings of the 2009 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC 2009).
    ● From Spreadsheets to Relational Databases and Back, Jácome Cunha, João Saraiva, and Joost Visser. In proceedings of the 2009 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation (PEPM 2009).
  • 28 Spreadsheets with embedded Relational Model
  • 29 Spreadsheets with embedded Relational Model
  • 30 Spreadsheet Analysis
    ● Spreadsheet Querying: Rui Pereira (talk at the students workshop)
    ● Spreadsheet Smells: Pedro Martins (talk at the students workshop, not related to spreadsheets!)
  • 31 Spreadsheet Querying
  • 32 Spreadsheet Querying
    We need model-driven queries!
  • 33 Spreadsheet Smells
    ● We have implemented a full catalog of smells for spreadsheets: empty cell, pattern finder, reference to empty cells, multiple operations, multiple references, conditional complexity, long calculation chain, duplicated formulas, inappropriate intimacy, etc.
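
To make one entry of the catalog concrete, here is a minimal Haskell sketch of a detector for the "reference to empty cells" smell. It is not the authors' implementation: the `Cell` and `Sheet` types are hypothetical, and a formula is reduced to the list of cells it references, whereas a real detector would parse the formula text.

```haskell
-- Cell address such as "A1".
type Ref = String

data Cell
  = Empty
  | Number Double
  | Formula [Ref]        -- only the referenced cells are kept
  deriving (Eq, Show)

-- A sheet as an association list from address to content.
type Sheet = [(Ref, Cell)]

-- A cell counts as empty when it is absent from the sheet or stored as Empty.
isEmpty :: Sheet -> Ref -> Bool
isEmpty sheet ref = case lookup ref sheet of
  Nothing    -> True
  Just Empty -> True
  _          -> False

-- "Reference to empty cells": each smelly formula paired with the
-- empty cells it reads. Formulas with no empty references are omitted.
refsToEmpty :: Sheet -> [(Ref, [Ref])]
refsToEmpty sheet =
  [ (addr, empties)
  | (addr, Formula refs) <- sheet
  , let empties = filter (isEmpty sheet) refs
  , not (null empties)
  ]
```

Treating absent cells as empty is a design choice of this sketch; a tool working on a real spreadsheet file would take the notion of emptiness from the spreadsheet system itself.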
  • 34 Still...
    "Around 200 people who thought their only experience of the London 2012 Olympic Games would be minor heats of synchronised swimming have received an unexpected upgrade to the men’s 100m final following an embarrassing ticketing mistake. ... Locog said the error occurred in the summer, between the first and second round of ticket sales, when a member of staff made a single keystroke mistake and entered ‘20,000’ into a spreadsheet rather than the correct figure of 10,000 remaining tickets."
    The Telegraph, 04 January 2012
  • 35 II. Models in Spreadsheets
  • 36 The Contribution of Models
  • 37 ClassSheets - Models in Spreadsheets
    ● ClassSheets are a high-level, object-oriented formalism to specify the business logic of spreadsheets
    ClassSheets: automatic generation of spreadsheet applications from object-oriented specifications, Gregor Engels, Martin Erwig, ASE'05
  • 38 ClassSheets - Models in Spreadsheets
  • 39 ClassSheets - Models in Spreadsheets
  • 40 ClassSheets - Models in Spreadsheets
  • 41 ClassSheets - Models in Spreadsheets
  • 42 I. ClassSheet Model Inference
  • 43 I. ClassSheet Model Inference
    Automatically Inferring ClassSheet Models from Spreadsheets, Jácome Cunha, Martin Erwig, João Saraiva, VL/HCC'10
    ● Data mining techniques
    ● Database normalization theory
  • 44 I. ClassSheet Model Inference
  • 45 Still...
    "Harvard University economists Carmen Reinhart and Kenneth Rogoff have acknowledged making a spreadsheet calculation mistake in a 2010 research paper, “Growth in a Time of Debt”, which has been widely cited to justify budget-cutting."
    Business Week, 18 April 2013
    "In a 2010 paper* Carmen Reinhart, now a professor at Harvard Kennedy School, and Kenneth Rogoff, an economist at Harvard University...argued that GDP growth slows to a snail’s pace once government-debt levels exceed 90% of GDP. The 90% figure quickly became ammunition in political arguments over austerity...This week a new piece of research poured fuel on the fire by calling the 90% finding into question."
    The Economist, 17 April 2013
  • 46 II. Embedding ClassSheets in Spreadsheets
  • 47 Embedding ClassSheets in Spreadsheets
    ● Erwig implemented ClassSheets as a standalone language.
    ● A new processor (for ClassSheets) was developed from scratch:
  • 48 Embedding ClassSheets in Spreadsheets
    ● From a ClassSheet it produces an initial spreadsheet with the model embedded to guide users in introducing correct data.
  • 49 Embedding ClassSheets in Spreadsheets
    ● Embedding DSLs in general purpose programming languages is a recurring strategy
      – systems inherit all the power of the host language
      – implementation effort is much reduced
    ● We will present the embedding of the ClassSheet (DSL) model in traditional spreadsheet systems
  • 50 Embedding ClassSheets in Spreadsheets
    Embedding and Evolution of Spreadsheet Models in Spreadsheet Systems, Jácome Cunha, Jorge Mendes, João P. Fernandes, João Saraiva, VL/HCC '11
    ● Powerful interactive interface
    ● Single environment for spreadsheet evolution
    ● Model-instance synchronization
    ● Syntactic restrictions
  • 51 Vertically Expandable Tables
  • 52 Horizontally Expandable Tables
  • 53 Relationship Tables
  • 54 III. Model-driven Spreadsheets
  • 55 ClassSheet driven Spreadsheet Environment
  • 56 Model-driven Spreadsheets
    ● Type-safe Evolution of Spreadsheets, Jácome Cunha, Joost Visser, Tiago Alves, João Saraiva. In proceedings of Fundamental Approaches to Software Engineering (FASE 2011).
    ● MDSheet: A Framework for Model-driven Spreadsheet Engineering, Jácome Cunha, João Paulo Fernandes, Jorge Mendes and João Saraiva. 34th International Conference on Software Engineering (ICSE 2012).
  • 57 MDSheet Tool Demo Video
    http://www.youtube.com/watch?feature=player_detailpage&v=6LNdTdCpV2U
  • 58 Still...
    "Before the members of parliament, Souto Moura recalled the day on which the newspaper 24 Horas revealed the existence of a listing of phone calls made by holders of high offices of State, contained in diskettes attached to the Casa Pia case file. 'It was a day I will never forget, it was terrible, dramatic', said the now judge-counsellor of the Supreme Court of Justice. On that day (13 January 2006), he says he summoned to the PGR the prosecutor João Guerra and the deputy prosecutors Paula Ferraz and Cristina Faleiro. Having viewed the diskettes (with the Excel program, which contained a filter that hid part of the information), the conclusion was that there was nothing on that medium beyond the listings of Paulo Pedroso. 'Here there are only Dr. Paulo Pedroso's numbers' was the conclusion of the viewing."
    Diário de Notícias, 10 February 2007
  • 59 This Tutorial: Plan
    Part II
    ● Model Evolution
    ● Bidirectional Evolution
    ● Empirical Study
  • 60 Still...
    http://www.eusprig.org/horror-stories.htm
    ● Title: Report identifies lack of spreadsheet controls, pressure to approve
    ● Source: http://files.shareholder.com/downloads/ONE/2261602328x0x628656/4cb574a0-0bf5-4728-9582-625e4519b5ab/Task_Force_Report.pdf
    ● Organization: JP Morgan
    ● Region: EU
    ● Release Date: 18 January 2013
    ● Risk: Lowering estimate of VaR in Basel II models
    ● Tags: Financial
    ● Spreadsheet Causes: Logic not reviewed, manual copy/paste operations
  • 61 I. Model Evolution
    FASE 2011. Type-safe Evolution of Spreadsheets, Jácome Cunha, Joost Visser, Tiago Alves, João Saraiva
    VL/HCC 2011. Embedding and Evolution of Spreadsheet Models in Spreadsheet Systems, Jácome Cunha, João P. Fernandes, Jorge Mendes, João Saraiva
  • 62
  • 63 Why do Spreadsheets Need Evolution?
    ● Suppose now you need to add new information to the spreadsheet
    ● For instance, the number of passengers of each flight
    ● It would require several error-prone tasks
    ● Adding columns and labels, updating formulas, etc.
    ● We can do it automatically!
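
The idea of evolving model and data together can be illustrated with a deliberately small Haskell sketch. Everything here is an assumption for illustration: `Model` is reduced to a list of column labels and `Instance` to rows of strings, which is far simpler than a ClassSheet, and `addColumn` and `conforms` are hypothetical names rather than MDSheet functions.

```haskell
-- Hypothetical, much simplified stand-ins for a model and its data.
type Model    = [String]      -- column labels
type Instance = [[String]]    -- rows of cell values

-- Insert x at position i; positions outside the list are clamped
-- to the front or the back by splitAt.
insertAt :: Int -> a -> [a] -> [a]
insertAt i x xs = before ++ x : after
  where (before, after) = splitAt i xs

-- Evolve the model and co-evolve the data in one step, so that the
-- two cannot drift apart: the new column gets a label in the model
-- and a default value in every row.
addColumn :: Int -> String -> String -> (Model, Instance) -> (Model, Instance)
addColumn i label def (model, rows) =
  (insertAt i label model, map (insertAt i def) rows)

-- The data conforms to the model when every row has one cell per column.
conforms :: Model -> Instance -> Bool
conforms model = all ((== length model) . length)
```

For the passengers example on this slide, `addColumn 2 "Passengers" "0"` applied to a two-column flight sheet adds the label to the model and a `"0"` cell to every row, and `conforms` still holds afterwards. Updating formulas, which the slide also mentions, is outside the scope of this sketch.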
  • 64 Data Refinements
  • 65 Evolution Steps
    ● Combinators: defined as helper steps
    ● Semantic: steps that add information to the model
    ● Layout: steps that do not add information to the model, just change its arrangement
  • 66 Combinator Steps
    ● Pull Up All References
      – All references must be at the top level
      – For instance A×Bφ becomes (A×B)φ
    ● Apply After and friends
  • 67 Semantic Steps
    ● Insert a column
    ● Insert a block
    ● Make it expandable
    ● Split
  • 68 Layout Steps
    ● Change orientation
    ● Normalize blocks
    ● Shift
    ● Move blocks
  • 69 Haskell Implementation
    expandBlock :: String → Rule
    expandBlock str (label : clas)
      | compLabel label str = do
          let rep = Rep {to = id × tolist, from = id × head}
          return (View rep (label : (clas)↓))
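
The `Rep` record on slide 69 pairs a forward conversion with a backward one. The slide relies on types (`Rule`, `View`) that are not shown in this transcript, so the following is only a self-contained sketch of that one idea under simplified, assumed types. `cross` stands in for the × on the slide and `toList1` for `tolist`; both names and the `expandRep` value are hypothetical.

```haskell
-- A representation pair converts between an old type a and its refinement b.
data Rep a b = Rep { to :: a -> b, from :: b -> a }

-- Product of two functions, written × on the slide.
cross :: (a -> c) -> (b -> d) -> (a, b) -> (c, d)
cross f g (x, y) = (f x, g y)

-- Wrap a single value in a one-element list.
toList1 :: a -> [a]
toList1 x = [x]

-- Refinement for making a block expandable: the label is kept and
-- the single class instance becomes a list of instances.
-- `from` uses head, so it is only defined on non-empty lists.
expandRep :: Rep (l, c) (l, [c])
expandRep = Rep { to = cross id toList1, from = cross id head }
```

Migrating a value forward and then back returns the original, that is, `from expandRep . to expandRep` behaves as the identity. The converse does not hold, since `from` discards every list element after the first.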
  • 70 TOOL DEMO
  • 71 II. Bidirectional Evolution
    ICMT 2011. Bidirectional Transformation of Model-Driven Spreadsheets, Jácome Cunha, João P. Fernandes, Jorge Mendes, Hugo Pacheco, and João Saraiva
    ICSE 2012. MDSheet: A Framework for Model-driven Spreadsheet Engineering, Jácome Cunha, João P. Fernandes, Jorge Mendes, João Saraiva
  • 72 Why do Spreadsheets Need Evolution, Again?
    ● Some evolution steps are easier to perform on the instance
    ● For instance, to add a column to one of the repetition blocks
    ● People felt the need to evolve the data
  • 73
  • 74 Bidirectional Transformation System
  • 75 Example of a Transformation we Want: Add a New Column
  • 76 (Data) Operations on Instances
  • 77 (Model) Operations on ClassSheets
  • 78 Bidirectional Transformation Functions
  • 79 Composable Example: Add a Column and a Class
  • 80 TOOL DEMO
  • 81 MDSheet
    ● Available at http://ssaapp.di.uminho.pt
    ● Built out of 7886 LOC:
      – 3179 in Haskell, for the evolution and inference
      – 980 in Basic, for the embedding
      – 2665 in C++, for gluing all components
      – 340 in Perl, for compilation and setup
      – 722, for makefiles
    ICSE 2012. MDSheet: A Framework for Model-driven Spreadsheet Engineering, Jácome Cunha, João P. Fernandes, Jorge Mendes, João Saraiva
  • 82 III. Empirical Studies
  • 83 Study Settings
    ● 17 students from an MSc course
    ● 2 different spreadsheets
      – Microsoft budget
      – Local company responsible for the water supply of Braga, Portugal - agere
  • 84 Study Setting
    ● Hypotheses:
      (1) In order to perform a given set of tasks, users spend less time when using model-driven spreadsheets instead of plain ones.
      (2) Spreadsheets developed in the model-driven environment contain fewer errors than plain ones.
  • 85 Main Results
    Number of tasks performed on the MS spreadsheet
  • 86 Main Results
    Error rate in the budget spreadsheet
  • 87 Empirical Study with YOU
  • 88 Inquiry Results!
  • 89 Inquiry Results!
  • 90 Inquiry Results!
  • 91 Inquiry Results!
  • 92 Inquiry Results!
  • 93 IV. Conclusion
  • 94 MOST IMPORTANT ANNOUNCEMENT OF THIS TUTORIAL ...
  • 95 Acknowledgments
    This work is funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project FCOMP-01-0124-FEDER-010048.