Measuring Maintainability of Spreadsheets in the Wild
José Pedro Correia & Miguel Alexandre Ferreira

September 2011
T +31 20 314 0950
info@sig.eu
www.sig.eu
Introduction



Spreadsheets
     •  are widely used in all kinds of organizations
     •  contain important business logic
     •  are maintained by different people



Do all spreadsheets matter?
     •  throwaway calculations don’t
     •  some data intensive spreadsheets might
     •  “spreadsheet programs” matter the most



Measuring Maintainability of Spreadsheets in the Wild – José Pedro Correia & Miguel Alexandre Ferreira – September 2011
Pragmatic criteria for “spreadsheet programs”



     •  have formulas
     •  the formulas have references
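
The two criteria can be read as a simple filter. The following is a minimal Python sketch, not the authors' tooling. It assumes cell contents arrive as a flat list, that a formula is a string starting with `=`, and that a single formula with an A1-style reference is enough to qualify. The name `is_spreadsheet_program` and the reference pattern are illustrative assumptions.

```python
import re

# Assumption: a reference is an A1-style address such as A1, $B$7 or AB12.
# The lookarounds avoid matching function names such as LOG10( or longer identifiers.
_CELL_REF = re.compile(r"(?<![A-Za-z0-9_.])\$?[A-Za-z]{1,3}\$?\d+(?![A-Za-z0-9_(])")


def is_spreadsheet_program(cell_contents):
    """Return True if some cell holds a formula and some formula holds a reference."""
    formulas = [c for c in cell_contents if isinstance(c, str) and c.startswith("=")]
    # any() over an empty list is False, which covers the "have formulas" criterion.
    return any(_CELL_REF.search(f) for f in formulas)


print(is_spreadsheet_program(["Total", 42, "=A1+A2"]))  # formula with references
print(is_spreadsheet_program(["Total", 42, "=1+2"]))    # formula without references
print(is_spreadsheet_program(["Total", 42]))            # no formulas at all
```

Under this reading, only the first call qualifies as a “spreadsheet program”.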


Definitions

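      •  $\text{spreadsheet} = \{\text{sheet}_1, \text{sheet}_2, \ldots, \text{sheet}_i\}$

      •  sheet =
         $$
         \begin{pmatrix}
         \text{cell}_{11} & \text{cell}_{12} & \cdots & \text{cell}_{1j} \\
         \text{cell}_{21} & \text{cell}_{22} & \cdots & \text{cell}_{2j} \\
         \vdots & \vdots & \ddots & \vdots \\
         \text{cell}_{i1} & \text{cell}_{i2} & \cdots & \text{cell}_{ij}
         \end{pmatrix}
         $$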




Definitions

      •  cell types
             •  blank: no content
             •  data: non-blank and does not contain a formula
             •  proxy: contains a formula that is a direct, single reference (e.g. =A1)
             •  calculation: contains a formula and is not a proxy


      •  cell roles

                          blank        data            proxy         calculation
         not referenced   no role      label¹          data sink¹    calc. sink
         referenced       open input   data source¹    data move     calc. step

       ¹ [Hodnigg & Mittermeir – Metric-based spreadsheet visualization: Support for focused maintenance – EuSpRIG’08]
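
The cell types and roles above amount to a two-step lookup: derive the type from the cell content, then combine it with whether the cell is referenced. A minimal Python sketch follows. It assumes the content is available as a string or number, that a formula starts with `=`, and that a "direct, single reference" is an A1-style address with an optional sheet prefix. The names `CellType`, `cell_type` and `cell_role` are illustrative and not taken from the study's tooling.

```python
import re
from enum import Enum


class CellType(Enum):
    BLANK = "blank"
    DATA = "data"
    PROXY = "proxy"
    CALCULATION = "calculation"


# Assumption: a proxy formula is exactly one reference, e.g. =A1, =$A$1 or =Sheet2!B7.
_SINGLE_REF = re.compile(
    r"^=\s*(?:[A-Za-z_][\w.]*!|'[^']+'!)?\$?[A-Za-z]{1,3}\$?\d+\s*$"
)

# Role table from the slide: (cell type, is referenced) -> role.
ROLES = {
    (CellType.BLANK, False): "no role",
    (CellType.BLANK, True): "open input",
    (CellType.DATA, False): "label",
    (CellType.DATA, True): "data source",
    (CellType.PROXY, False): "data sink",
    (CellType.PROXY, True): "data move",
    (CellType.CALCULATION, False): "calc. sink",
    (CellType.CALCULATION, True): "calc. step",
}


def cell_type(content):
    """Classify a cell by its content alone."""
    text = "" if content is None else str(content).strip()
    if text == "":
        return CellType.BLANK
    if not text.startswith("="):
        return CellType.DATA
    if _SINGLE_REF.match(text):
        return CellType.PROXY
    return CellType.CALCULATION


def cell_role(content, referenced):
    """Combine the cell type with whether any formula references the cell."""
    return ROLES[(cell_type(content), referenced)]


print(cell_role("Total", referenced=False))   # label
print(cell_role("=A1", referenced=True))      # data move
print(cell_role("=A1+2", referenced=False))   # calc. sink
```

Whether a cell is referenced has to come from a separate pass over all formulas; the sketch takes it as an input.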

Definitions

      •  formula copy equivalents¹, e.g.
             =R1 + 2
             =R2 + 2
             =R3 + 2
             =R4 + 2

      •  unique formula (one per group of copy equivalents), e.g.
             =X + 2

      ¹ [Mittermeir & Clermont – Finding high-level structures in spreadsheet programs – WCRE’02]
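
One way to detect copy equivalents is to rewrite every reference relative to the cell that hosts the formula, so that copied ("dragged") formulas collapse to the same text. The Python sketch below illustrates that idea under stated assumptions. It handles only plain A1-style references without sheet prefixes, and it normalises whitespace and case. The function names are hypothetical, and this is not necessarily the procedure used in the cited work or in the study.

```python
import re
from collections import defaultdict

# Simplified A1-style reference: optional "$", 1-3 column letters, optional "$", row digits.
# The lookarounds skip function names such as LOG10( and longer identifiers.
_REF = re.compile(r"(?<![A-Z0-9_.])(\$?)([A-Z]{1,3})(\$?)(\d+)(?![A-Z0-9_(])")


def column_number(letters):
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    number = 0
    for ch in letters:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def to_relative(formula, row, col):
    """Rewrite references relative to the host cell at (row, col), R1C1-style."""
    def rewrite(match):
        col_abs, letters, row_abs, digits = match.groups()
        ref_row, ref_col = int(digits), column_number(letters)
        row_part = f"R{ref_row}" if row_abs else f"R[{ref_row - row}]"
        col_part = f"C{ref_col}" if col_abs else f"C[{ref_col - col}]"
        return row_part + col_part
    # Whitespace and case are normalised first (this would also alter string literals).
    return _REF.sub(rewrite, formula.replace(" ", "").upper())


def group_copy_equivalents(formulas):
    """Map each relative formula text to the list of cells that contain it."""
    groups = defaultdict(list)
    for (row, col), formula in formulas.items():
        groups[to_relative(formula, row, col)].append((row, col))
    return dict(groups)


# Illustrative input: column B holds a dragged formula plus one distinct formula.
column_b = {
    (1, 2): "=A1 + 2",
    (2, 2): "=A2 + 2",
    (3, 2): "=A3 + 2",
    (4, 2): "=A4 + 2",
    (5, 2): "=SUM(B1:B4)",
}
groups = group_copy_equivalents(column_b)
print(len(column_b), "formula cells,", len(groups), "unique formulas")
# prints: 5 formula cells, 2 unique formulas
```

The four dragged formulas share the relative form `=R[0]C[-1]+2`, which plays the role of the unique formula `=X + 2` on the slide.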

Research questions

1.  Which metrics can we use to assess spreadsheet maintainability?




2.  What are the typical values for the selected metrics?




Study outline

Metric selection



Measuring the EUSES Spreadsheet Corpus



Analysis of results

Goal Question Metric (mockup)

Main goal
     •  Maintainability

Sub goals
     •  Analyzability
     •  Changeability
     •  Stability
     •  Testability

Questions
     •  How large is the spreadsheet?
     •  Are there inconsistencies?
     •  How much coupling is there?
     •  How complex is the spreadsheet?

Sub questions
     •  …

Metrics
     •  …




Example metrics

Spreadsheet level
     •  # used rows / columns
     •  # formulas / unique formulas


Sheet level
     •  # data fan-in / fan-out
     •  # data move / sink cells


Formula level
     •  McCabe complexity
     •  # operators
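
To make the formula-level metrics concrete, here is a rough Python sketch. It assumes that McCabe complexity for a formula is 1 plus the number of `IF(` calls, and that operators are the usual arithmetic, concatenation and comparison symbols. Both are assumptions for illustration, since the slides do not spell out the exact counting rules. String literals are blanked first so their contents are not counted.

```python
import re

_STRING_LITERAL = re.compile(r'"[^"]*"')
_IF_CALL = re.compile(r"\bIF\s*\(", re.IGNORECASE)
# Two-character comparison operators are listed first so ">=" counts once.
_OPERATOR = re.compile(r"<=|>=|<>|[-+*/^&<>=]")


def strip_strings(formula):
    """Replace every string literal with an empty one."""
    return _STRING_LITERAL.sub('""', formula)


def mccabe_complexity(formula):
    """Assumed definition: one path plus one per IF( call."""
    return 1 + len(_IF_CALL.findall(strip_strings(formula)))


def operator_count(formula):
    """Count operator symbols, ignoring the leading '=' that marks a formula."""
    body = strip_strings(formula).lstrip()
    if body.startswith("="):
        body = body[1:]
    return len(_OPERATOR.findall(body))


print(mccabe_complexity("=A1+2"))                         # 1
print(mccabe_complexity("=IF(A1>0, IF(B1>0, 1, 2), 3)"))  # 3
print(operator_count("=A1+B1*2"))                         # 2
```

A unary minus is counted like any other operator in this sketch, and functions such as `SUMIF` are deliberately not treated as decision points.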

The EUSES Spreadsheet Corpus¹

[Figure: bar chart “Spreadsheets” (y-axis 0 to 5000) with bars for: Total; Internet search; Contain formulas; Contain formulas with references; > 25 unique formulas. Value label shown: 1609.]

¹ [Fisher II & Rothermel – The EUSES spreadsheet corpus: a shared resource for supporting experimentation with spreadsheet dependability mechanisms – WEUSE’05]

Power-law like distributions

[Figure: histogram of NON_BLANK_CELLS (up to the 99% quantile); x-axis 0 to 15000, y-axis Frequency 0 to 1000]



                  most metrics follow a power-law like distribution


Extremely skewed distributions

 Attribute            Object   Min   Q1   Med.   Q3   95%    99%      Max
 # data move cells    Sheet      0    0      0    0    12     80      672
 # data sink cells    Sheet      0    0      0    0     8     80     1188
 # data fan-out       Sheet      0    0      0    0    52   1256   964366
 # data fan-in        Sheet      0    0      0    0    76   1814   964366



                                     at least 75% of sheets have no proxy cells



               at least 75% of sheets are not referenced / have no references
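
The "at least 75%" statements follow directly from a third quartile of 0: if Q3 is 0, then three quarters of the observations are 0. The Python sketch below computes the same seven-number summary used in these tables. It uses a nearest-rank percentile, which is an assumption (the slides do not say which quantile method was used), and the sample values are synthetic, not corpus data.

```python
def percentile(sorted_values, p):
    """Nearest-rank percentile for an integer p in 1..100 (assumed method)."""
    if not sorted_values:
        raise ValueError("no values")
    # Integer ceiling of p * n / 100, so no floating-point rounding is involved.
    rank = max(1, (p * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def summarize(values):
    """Return the Min / Q1 / Med. / Q3 / 95% / 99% / Max row for one metric."""
    s = sorted(values)
    return {
        "Min": s[0],
        "Q1": percentile(s, 25),
        "Med.": percentile(s, 50),
        "Q3": percentile(s, 75),
        "95%": percentile(s, 95),
        "99%": percentile(s, 99),
        "Max": s[-1],
    }


# Synthetic, heavily skewed sample: 100 sheets, 80 of them without any proxy cell.
sample = [0] * 80 + [12] * 15 + [80] * 4 + [672]
row = summarize(sample)
zero_share = sum(1 for v in sample if v == 0) / len(sample)
print(row)
print(f"{zero_share:.0%} of the sample is zero")
```

With this sample the quartiles are all 0 while the upper percentiles and the maximum are not, which is the same shape as the rows in the table.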

Sparse distributions

 Attribute               Object    Min   Q1   Med.   Q3   95%   99%   Max
 McCabe complexity       Formula     1    1      1    1     1     5    34
 # unidentified values   Formula     0    0      0    1     3     9    51




                      conditionals and magic values are uncommon



Documentation within the spreadsheet

 Attribute       Object   Min   Q1   Med.    Q3   95%   99%   Max
 % label cells   Sheet      0   35     64   100   100   100   100




  at least 25% of sheets are purely for documentation purposes




Spreadsheet layout

 Attribute        Object        Min   Q1   Med.    Q3   95%    99%     Max
 # used columns   Spreadsheet     2   10     19    41   122    228     738
 # used rows      Spreadsheet     2   47     99   210   686   1629   40518




                            most common layout seems to be vertical


“Dragging” formulas

 Attribute           Object        Min   Q1   Med.    Q3    95%    99%     Max
 # formula cells     Spreadsheet     1   20     75   239   1153   4052   24523
 # unique formulas   Spreadsheet     1    3      8    26    103    252     961




                                         seems to be a common practice



Answering the research questions

1.  Which metrics can we use to assess spreadsheet maintainability?
      •      levels: sheets, rows/columns, cells, formulas



2.  What are the typical values for the selected metrics?
      •      most distributions resemble software metrics distributions
      •      some distributions are extremely skewed/sparse
      •      expect label only sheets/more rows than columns/copy equivalent formulas



Roadmap

1.  Study how and what to measure in spreadsheets

2.  Select a minimal set of metrics and build a quality model

3.  Gather a representative set of measurements for calibration

4.  Calibrate the thresholds in the model

5.  Validate the model


The end…

                                                  Thank you for your attention!




                                                         Q&A
                                                  Complete data set and technical report at:
                                                 http://www.sig.eu/en/spreadsheet-quality

                                                                    Miguel Ferreira
                                                              Software Improvement Group
                                                                   m.ferreira@sig.eu
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Recently uploaded (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

ERA - Measuring Maintainability of Spreadsheets in the Wild

  • 1. Measuring Maintainability of Spreadsheets in the Wild
      José Pedro Correia & Miguel Alexandre Ferreira, September 2011
      T +31 20 314 0950, info@sig.eu, www.sig.eu
  • 2. Introduction
      Spreadsheets
        •  are widely used in all kinds of organizations
        •  contain important business logic
        •  are maintained by different people
      Do all spreadsheets matter?
        •  throwaway calculations don’t
        •  some data intensive spreadsheets might
        •  “spreadsheet programs” matter the most
  • 3. Pragmatic criteria for “spreadsheet programs”
        •  have formulas
        •  the formulas have references
  • 4. Definitions
        •  spreadsheet = { sheet1, sheet2, …, sheeti }
        •  sheet =
             cell11   cell12   …   cell1j
             cell21   cell22   …   cell2j
             …        …        …   …
             celli1   celli2   …   cellij
  • 5. Definitions
        •  cell types
             •  blank: no content
             •  data: non-blank and does not contain a formula
             •  proxy: contains a formula that is a direct, single reference (e.g. =A1)
             •  calculation: contains a formula and is not a proxy
        •  cell roles
                               blank        data           proxy        calculation
             not referenced    no role      label¹         data sink¹   calc. sink
             referenced        open input   data source¹   data move    calc. step
      ¹ [Hodnigg & Mittermeir – Metric-based spreadsheet visualization: Support for focused maintenance – EuSpRIG’08]
  • 6. Definitions
        •  formula copy equivalents¹
             =R1 + 2
             =R2 + 2
             =R3 + 2
             =R4 + 2
        •  unique formula
             =X + 2
      ¹ [Mittermeir & Clermont – Finding high-level structures in spreadsheet programs – WCRE’02]
  • 7. Research questions
      1.  Which metrics can we use to assess spreadsheet maintainability?
      2.  What are the typical values for the selected metrics?
  • 8. Study outline
        •  Metric selection
        •  Measuring the EUSES Spreadsheet Corpus
        •  Analysis of results
  • 9. Goal Question Metric (mockup)
      Main goal:      Maintainability
      Sub goals:      Analyzability, Changeability, Stability, Testability
      Questions:      How large is the spreadsheet? Are there inconsistencies? How much coupling is there? How complex is the spreadsheet?
      Sub questions:  …
      Metrics:        …
  • 10. Example metrics
      Spreadsheet level
        •  # used rows / columns
        •  # formulas / unique formulas
      Sheet level
        •  # data fan-in / fan-out
        •  # data move / sink cells
      Formula level
        •  McCabe complexity
        •  # operators
  • 11. The EUSES Spreadsheet Corpus¹
      [Bar chart: number of spreadsheets (axis 0 to 5000) per category: Total, Internet search, Contain formulas, Contain formulas with references, > 25 unique formulas; one bar is labelled 1609.]
      ¹ [Fisher II & Rothermel – The EUSES spreadsheet corpus: a shared resource for supporting experimentation with spreadsheet dependability mechanisms – WEUSE’05]
  • 12. Power-law like distributions
      [Histogram: Frequency (axis 0 to 1000) against NON_BLANK_CELLS (axis 0 to 15000, up to 99 quantile).]
      most metrics follow a power-law like distribution
  • 13. Extremely skewed distributions
      Attribute           Object   Min   Q1   Med.   Q3   95%    99%      Max
      # data move cells   Sheet      0    0      0    0    12     80      672
      # data sink cells   Sheet      0    0      0    0     8     80     1188
      # data fan-out      Sheet      0    0      0    0    52   1256   964366
      # data fan-in       Sheet      0    0      0    0    76   1814   964366
        •  at least 75% of sheets have no proxy cells
        •  at least 75% of sheets are not referenced / have no references
  • 14. Sparse distributions
      Attribute               Object    Min   Q1   Med.   Q3   95%   99%   Max
      McCabe complexity       Formula     1    1      1    1     1     5    34
      # unidentified values   Formula     0    0      0    1     3     9    51
        •  conditionals and magic values are uncommon
  • 15. Documentation within the spreadsheet
      Attribute       Object   Min   Q1   Med.    Q3   95%   99%   Max
      % label cells   Sheet      0   35     64   100   100   100   100
        •  at least 25% of sheets are purely for documentation purposes
  • 16. Spreadsheet layout
      Attribute        Object        Min   Q1   Med.    Q3   95%    99%     Max
      # used columns   Spreadsheet     2   10     19    41   122    228     738
      # used rows      Spreadsheet     2   47     99   210   686   1629   40518
        •  most common layout seems to be vertical
  • 17. “Dragging” formulas
      Attribute           Object        Min   Q1   Med.    Q3    95%    99%     Max
      # formula cells     Spreadsheet     1   20     75   239   1153   4052   24523
      # unique formulas   Spreadsheet     1    3      8    26    103    252     961
        •  seems to be a common practice
  • 18. Answering the research questions
      1.  Which metrics can we use to assess spreadsheet maintainability?
            •  levels: sheets, rows/columns, cells, formulas
      2.  What are the typical values for the selected metrics?
            •  most distributions resemble software metrics distributions
            •  some distributions are extremely skewed/sparse
            •  expect label only sheets/more rows than columns/copy equivalent formulas
  • 19. Roadmap
      1.  Study how and what to measure in spreadsheets
      2.  Select a minimal set of metrics and build a quality model
      3.  Gather a representative set of measurements for calibration
      4.  Calibrate the thresholds in the model
      5.  Validate the model
  • 20. The end…
      Thank you for your attention! Q&A
      Complete data set and technical report at: http://www.sig.eu/en/spreadsheet-quality
      Miguel Ferreira, Software Improvement Group, m.ferreira@sig.eu
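
The definitions on slides 5 and 6 (cell types, cell roles, and unique formulas) can be illustrated with a small Python sketch. This is not the authors' tooling. It assumes cell contents arrive as strings, that formulas start with "=" and use A1-style references without sheet qualifiers, and that whether a cell is referenced is already known. The names CELL_REF, ROLES, cell_type, cell_role, unique_formula and count_unique_formulas are hypothetical. Replacing every reference with X mirrors the =X + 2 example on slide 6 but is a simplification: it ignores the relative offsets that copy equivalence, as defined by Mittermeir & Clermont, depends on.

```python
import re

# Assumed A1-style cell reference (e.g. A1, $B$2); sheet-qualified references
# such as Sheet1!A1 are not handled. The lookarounds keep function names such
# as LOG10( from being mistaken for references.
CELL_REF = re.compile(
    r"(?<![A-Za-z0-9_])\$?[A-Za-z]{1,3}\$?[0-9]+(?![A-Za-z0-9_(])"
)

# Role table from slide 5, keyed by (cell type, is the cell referenced?).
ROLES = {
    ("blank", False): "no role",
    ("data", False): "label",
    ("proxy", False): "data sink",
    ("calculation", False): "calc. sink",
    ("blank", True): "open input",
    ("data", True): "data source",
    ("proxy", True): "data move",
    ("calculation", True): "calc. step",
}


def cell_type(content):
    """Classify a cell as blank, data, proxy or calculation."""
    text = (content or "").strip()
    if not text:
        return "blank"
    if not text.startswith("="):
        return "data"
    # A proxy is a formula that is a direct, single reference (e.g. =A1).
    if CELL_REF.fullmatch(text[1:].strip()):
        return "proxy"
    return "calculation"


def cell_role(content, referenced):
    """Look up the role of a cell given whether other cells reference it."""
    return ROLES[(cell_type(content), referenced)]


def unique_formula(formula):
    """Replace every cell reference with X (simplified normalization)."""
    return CELL_REF.sub("X", formula)


def count_unique_formulas(formulas):
    """Count distinct formulas after normalization."""
    return len({unique_formula(f) for f in formulas})
```

Under these assumptions, the four dragged formulas on slide 6 (=R1 + 2 through =R4 + 2) all normalize to the same string, so count_unique_formulas treats them as a single unique formula, which is the kind of gap between "# formula cells" and "# unique formulas" reported on slide 17.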