1) The document discusses refactoring techniques for ClassSheets, which are spreadsheet models used to control the structure of spreadsheet instances.
2) It presents five refactoring operations for ClassSheets, including moving formulas and attributes, extracting classes, and inlining classes.
3) An initial analysis found that applying the refactorings to models and their instances reduced smells and improved readability, with instances seeing a 15-18% reduction in cells (14 to 30 fewer cells).
Refactoring Model-Driven Spreadsheet Evolution
1. Refactoring meets Model-Driven Spreadsheet Evolution
Rui Pereira, Jácome Cunha, João Paulo Fernandes,
Pedro Martins and João Saraiva
QUATIC 2014
Guimarães, Portugal
2. Spreadsheets
› Programming language and environment of choice for many people/companies
› 50% of all spreadsheets are the basis of decisions
› 85% of companies use them for financial reporting
› 11 million end-users in the USA
3. Spreadsheets
› Very error prone
› Research papers on spreadsheet problems
› Websites on spreadsheet problems
› Google: “spreadsheet errors”
4. Spreadsheet Errors
› Only recently has this been researched (~10 years!!)
› Various techniques were developed
› One of them is ClassSheets
7. ClassSheets
› Everything is according to the model
› Control of every spreadsheet
› You only have to do it right once
› What if you don’t do it right once?
8. ClassSheets Quality
› ClassSheets still suffer from traditional problems
› Readability
› Maintainability
› Extensibility
› A ClassSheet may be hard to understand
› May have complicated design
› New rules can force changes to spreadsheet structure
› Refactorings!
10. MDSheet
data ModelOperation : Model → Model =
-- add a new column
addColumnM Where Index
-- delete a column
| delColumnM Index
-- add a new row
| addRowM Where Index
-- delete a row
| delRowM Index
-- set a label
| setLabelM (Index, Index) Label
-- set a formula
| setFormulaM (Index, Index) Formula
-- replicate a class
| replicateM ClassName Direction Int Int
-- add a static class
| addClassM ClassName (Index, Index)
-- add an expandable class
| addClassExpM ClassName Direction (Index, Index) (Index, Index)
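The grammar above can be rendered as an ordinary Haskell data type. What follows is a minimal sketch, not MDSheet's actual implementation: the toy Model (a grid of strings), the 0-based (column, row) index convention, the Where and Direction constructors, and the helpers applyOp and applyAll are all assumptions made for illustration. Class-level operations are left as no-ops.

```haskell
module Main where

-- Hypothetical stand-ins for the types named on the slide.
type Index     = Int
type Label     = String
type Formula   = String
type ClassName = String

data Where     = Before | After        deriving (Show, Eq)
data Direction = Horizontal | Vertical deriving (Show, Eq)

-- One constructor per operation listed in the grammar.
data ModelOperation
  = AddColumnM Where Index
  | DelColumnM Index
  | AddRowM Where Index
  | DelRowM Index
  | SetLabelM (Index, Index) Label
  | SetFormulaM (Index, Index) Formula
  | ReplicateM ClassName Direction Int Int
  | AddClassM ClassName (Index, Index)
  | AddClassExpM ClassName Direction (Index, Index) (Index, Index)
  deriving (Show, Eq)

-- Toy model (assumption): a rectangular grid of cell contents, row-major.
type Model = [[String]]

insertAt :: Int -> a -> [a] -> [a]
insertAt i x xs = let (pre, post) = splitAt i xs in pre ++ [x] ++ post

deleteAt :: Int -> [a] -> [a]
deleteAt i xs = let (pre, post) = splitAt i xs in pre ++ drop 1 post

-- Overwrite the cell at (column, row); out-of-range indices leave the model unchanged.
setCell :: Index -> Index -> String -> Model -> Model
setCell c r v m = case splitAt r m of
  (above, row : below) -> case splitAt c row of
    (left, _ : right) -> above ++ [left ++ [v] ++ right] ++ below
    _                 -> m
  _ -> m

offset :: Where -> Int
offset Before = 0
offset After  = 1

-- Apply a single operation to the toy model.
applyOp :: ModelOperation -> Model -> Model
applyOp (AddColumnM w i) m = map (insertAt (i + offset w) "") m
applyOp (DelColumnM i)   m = map (deleteAt i) m
applyOp (AddRowM w i)    m = insertAt (i + offset w) (replicate width "") m
  where
    width = case m of
      []      -> 0
      (r : _) -> length r
applyOp (DelRowM i)            m = deleteAt i m
applyOp (SetLabelM (c, r) l)   m = setCell c r l m
applyOp (SetFormulaM (c, r) f) m = setCell c r ('=' : f) m
applyOp _                      m = m  -- class operations are not modeled here

-- Apply an ordered list of operations, left to right.
applyAll :: [ModelOperation] -> Model -> Model
applyAll ops m = foldl (flip applyOp) m ops

main :: IO ()
main = do
  let m0  = [["Client", "Name"], ["", ""]]
      ops = [ AddColumnM After 1
            , SetLabelM (2, 0) "Total"
            , SetFormulaM (2, 1) "SUM(B2)" ]
  mapM_ print (applyAll ops m0)
```

Representing operations as data rather than as functions keeps an evolution step inspectable, so the same list could in principle be replayed on an instance as well as on the model.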
11. MDSheet
data ModelRefactoring : Spreadsheet → [ModelOperation] =
-- add a formula and shift cells
AddShiftForm ClassName Value Index Label Index
-- add an attribute and shift cells
| AddShiftAtt ClassName Value Index Label Index
-- delete a cell and shift cells
| DeleteShift ClassName Value Label
-- add a reference and shift cells
| AddShiftRef ClassName ClassName Index
-- delete a reference and shift cells
| DeleteShiftRef ClassName ClassName
-- create a new class
| CreateClass ClassName Direction Index
-- delete a class and shift cells
| DeleteClassShift ClassName
12. Refactorings
1. Move Formula
2. Move Attribute
3. Extract Class
4. Inline Class
5. Remove Middle-Man
14. Refactorings: Move Formula
› When/Why
› Feature Envy
› Semantically makes more sense elsewhere
…
› Refactoring
…
15. Refactorings: Move Attribute
› When/Why
› Visually enhance readability
› Information evolution
› Incorrect normalization in a relational class
…
…
› Refactoring
16. Refactorings: Extract Class
› When/Why
› Complicated and hard to understand models
› Neglected subset of information
› Refactoring
17. Refactorings: Inline Class
› When/Why
› Insufficient justification of the existence of a class
› Not pulling its own weight
› Often consulted information
› Refactoring
18. Refactorings: Remove Middle-Man
› When/Why
› Delegator class with little responsibility or purpose
› Useless class which only complicates the structure
› Refactoring
21. Quality of Refactored Models: Quick Analysis
Model
› Removed one class
› Organized data to be semantically correct
› Readability
› Joining attributes closer to their formulas
› Placed often used attributes in easier to access areas
Instance
› 14 fewer data cells (15% reduction)
› With one more Client: 22 fewer cells (17% reduction)
› With two more Clients: 30 fewer cells (18% reduction)
› The larger the instance, the more impactful the refactorings
› Spreadsheet bad smell detector also showed a reduction in smells
22. Conclusions and Future Work
› Presented a set of refactorings for ClassSheets
› Implemented in a tool
› First analysis shows model quality improvement
› Further validation
› Quality assessment metrics
› Empirical studies with professionals who use models daily
› More info?
› http://ssaapp.di.uminho.pt
› Read the paper
› Ask me!
Editor's Notes
As most of us know, spreadsheets are very error prone, a claim also supported by much research. A simple Google search for "spreadsheet errors" will show you many examples, and there is even a site dedicated to spreadsheet error problems: Spreadsheethorrorstories.
As you can see, this is a big problem, one with huge implications for industry.
ClassSheets are the joining of models and spreadsheets. They are a high-level formalism based on object-oriented modeling languages such as UML.
After creating our ClassSheet model, we can have various instances, all conforming to the layout and structure of that model.
On top of ClassSheets, we have created MDSheet, a model-driven spreadsheet framework that provides a bidirectional ClassSheet ecosystem.
The techniques and language we developed allow transformations from models to be automatically applied to the instance and vice-versa
So any change on the model would update the instance, and any change on the instance would update the model
For the transformations, we defined the following grammar to represent the functions operating on each one. These operations on the models will reflect themselves as updates on the instance.
To express and help with our refactorings, we defined a set of auxiliary functions, ModelRefactoring. Each of these functions returns an ordered list of operations (model evolution steps) drawn from the previous set of model operations.
Each of our refactoring functions returns the concatenation of the ordered lists produced by our auxiliary functions. MDSheet uses the resulting list to evolve the ClassSheet models and instances to their refactored version.
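The idea of building a refactoring by concatenating the operation lists of auxiliary functions can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the Spreadsheet summary type, the helper signatures (simpler than the ones on the slide), the single header row, and the layout of classes side by side in columns are all hypothetical. The one subtle point it shows is that the second auxiliary function must be evaluated against the model as it looks after the first list has been applied, because deleting a column shifts everything to its right.

```haskell
module Main where

import Data.List (delete, elemIndex)

type Index     = Int
type Label     = String
type ClassName = String

data Where = Before | After deriving (Show, Eq)

-- Trimmed-down operation type: only the constructors this sketch needs.
data ModelOperation
  = AddColumnM Where Index
  | DelColumnM Index
  | SetLabelM (Index, Index) Label   -- (column, row), 0-based (assumption)
  deriving (Show, Eq)

-- Hypothetical model summary: each class starts at a column and lists its
-- attribute labels left to right; classes sit side by side.
data ClassInfo = ClassInfo { firstCol :: Index, attrs :: [Label] }
type Spreadsheet = [(ClassName, ClassInfo)]

headerRow :: Index
headerRow = 0

-- Column of an attribute, if both the class and the label exist.
attrColumn :: Spreadsheet -> ClassName -> Label -> Maybe Index
attrColumn s c l = do
  info <- lookup c s
  pos  <- elemIndex l (attrs info)
  pure (firstCol info + pos)

-- Auxiliary: delete an attribute's column; cells to its right shift left.
deleteShift :: Spreadsheet -> ClassName -> Label -> [ModelOperation]
deleteShift s c l = maybe [] (\col -> [DelColumnM col]) (attrColumn s c l)

-- Auxiliary: append an attribute to a class; cells to its right shift right.
addShiftAtt :: Spreadsheet -> ClassName -> Label -> [ModelOperation]
addShiftAtt s c l = case lookup c s of
  Nothing   -> []
  Just info ->
    let col = firstCol info + length (attrs info)
    in [AddColumnM Before col, SetLabelM (col, headerRow) l]

-- Summary after removing an attribute: classes to the right move one column left.
removeAttr :: ClassName -> Label -> Spreadsheet -> Spreadsheet
removeAttr c l s = case attrColumn s c l of
  Nothing  -> s
  Just col -> [ (n, shift col n info) | (n, info) <- s ]
  where
    shift col n info = info
      { firstCol = if firstCol info > col then firstCol info - 1 else firstCol info
      , attrs    = if n == c then delete l (attrs info) else attrs info }

-- Refactoring: the concatenation of the two auxiliary lists, in order.
moveAttribute :: Spreadsheet -> ClassName -> ClassName -> Label -> [ModelOperation]
moveAttribute s from to l
  | null del  = []
  | otherwise = del ++ addShiftAtt (removeAttr from l s) to l
  where
    del = deleteShift s from l

-- Hypothetical example data.
sample :: Spreadsheet
sample =
  [ ("Client", ClassInfo 0 ["Name", "Phone"])
  , ("Order",  ClassInfo 2 ["Id", "Total"]) ]

main :: IO ()
main = mapM_ print (moveAttribute sample "Client" "Order" "Phone")
```

Because the result is a plain ordered list, a tool could apply it step by step to the model and, through the bidirectional machinery described earlier, to the instances.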
The refactorings are either straightforwardly derived from Fowler's set or inspired by Fowler's work.
One reason would be that it suffers from the feature envy phenomenon, where a formula is more interested in, and more used by, the attributes of another class than those of the class in which it is defined.
Or something as simple as the formula making more semantic sense elsewhere. For example
Due to evolution, where we once had a class with a clear purpose, it is now doing the work of two
Since readability is important in a spreadsheet, the moment we have a subset of information that is often neglected, it is a good idea to set it aside.
Let's imagine that the users of our spreadsheet do not tend to use or look up the address, city, and country of a client. As these are a subset of the client class and make the spreadsheet too cluttered to read, we extract them into a new class placed off to the side.
There is not enough reason to keep a class around; it is simply not doing much. Or the class holds often-consulted information.
For example, let's say our users look up the client's contact information quite frequently. Instead of having to hop around a spreadsheet following references, it might be a better idea to join this with the main client class.
A middle man is a delegator class with little responsibility or purpose. It is often useless and only complicates the structure.
Here we see an occurrence with the seller class and decide to remove it. Any attributes in the seller class that are not ids or references are also moved, and we now reference sellInf from the order class.
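The effect of Remove Middle-Man on the class structure can be sketched on a toy model. This is only an illustration: the Class and Model types are hypothetical, the attribute names Date, Commission, Name, and Phone are made up, and the choice to move the middle man's remaining attributes into the referencing class is an assumption, since the notes do not say where they end up.

```haskell
module Main where

import Data.List (nub)

type ClassName = String

-- Toy class description (assumption): plain attributes plus outgoing references.
data Class = Class
  { attributes :: [String]     -- attributes that are neither ids nor references
  , references :: [ClassName]  -- classes this class refers to
  } deriving (Show, Eq)

type Model = [(ClassName, Class)]

-- Remove a delegator class: every class that referenced it now references
-- whatever the delegator referenced, and inherits its plain attributes.
removeMiddleMan :: ClassName -> Model -> Model
removeMiddleMan mid model = case lookup mid model of
  Nothing -> model
  Just m  -> [ (n, rewire m c) | (n, c) <- model, n /= mid ]
  where
    rewire m c
      | mid `elem` references c = c
          { attributes = attributes c ++ attributes m
          , references = nub (concatMap (redirect m) (references c)) }
      | otherwise = c
    redirect m r
      | r == mid  = references m
      | otherwise = [r]

-- Hypothetical example mirroring the order / seller / sellInf case.
sampleModel :: Model
sampleModel =
  [ ("Order",   Class ["Date"] ["Seller"])
  , ("Seller",  Class ["Commission"] ["SellInf"])
  , ("SellInf", Class ["Name", "Phone"] []) ]

main :: IO ()
main = mapM_ print (removeMiddleMan "Seller" sampleModel)
```

In this sketch Order ends up referencing SellInf directly, which is the shortened reference path the refactoring aims for.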
15% reduction due to the elimination of redundant cells
This reduction increases with the amount of data in the instance
Providing better models which are easier to understand and reason about
Ensuring automated application of model refactorings
While already having shown that the refactorings improve the quality of the models, we want to further validate this