SlideShare a Scribd company logo
1 of 11
Download to read offline
Reverse Engineering of Data Models from
Legacy Spreadsheets-Based Systems:
An Industrial Case Study
discussion paper
Domenico Amalfitano
Anna Rita Fasolino
Valerio Maggio
Porfirio Tramontana
Vincenzo De Simone
SEBD 2014 – Castellammare di Stabia – 16/6/2014
Spreadsheet Based Information
System Issues
 Spreadsheets are designed only for
computing purposes and commercial
applications but …
 … very often they are used as
Information Systems
◦ Very difficult to maintain
 High rate of duplicated data between different
sheets and files
 The first and more critical step of a
migration process is the Data
Reengineering
SEBD 2014 – Castellammare di Stabia – 16/6/2014
Case Study
 An automotive company collects the specification of the tests
executed on the vehicles in form of Test Patterns
◦ Test Patterns are implemented in Excel files following a common
template
 We have 30,615 different Excel files with 2,700 data cells on
average
◦ There is a high rate of replication data
 50% of data cells recurred more than 100 times
 Excel Test Patterns represent the input of an automatic test
generation process
SEBD 2014 – Castellammare di Stabia – 16/6/2014
Data Model Reverse Engineering
 Data Model Reverse Engineering is the first
step of a more general migration process
towards a Web MVC architecture
 An heuristic based approach to infer the
Data Model was proposed.
 A set of 26 heuristics were considered.
◦ 11 heuristics derived from the literature and were
adapted to work in this specific context.
SEBD 2014 – Castellammare di Stabia – 16/6/2014
Data Model Reverse Engineering
 Heuristics can be grouped in two main
classes:
◦ Structure based rules (SBRs)
◦ Information based rules (IBRs)
SEBD 2014 – Castellammare di Stabia – 16/6/2014
Structure based rules (SBRs)
 SBRs analyze the structure and the
properties of spreadsheets and their
components, such as sheets, cells,
cell headers, etc.
◦ Used to abstract the set of candidate
classes and their relationships;
◦ Applied to a single Excel File.
SEBD 2014 – Castellammare di Stabia – 16/6/2014
Example of SBR
Rule:
If the spreadsheet
contains more than one
sheet, then it is possible to
associate the spreadsheet
to a class C and each
component sheet to a
distinct class Si, where C
has a UML composition
relationship with each Si.
SEBD 2014 – Castellammare di Stabia – 16/6/2014
Example of SBR
Rule:
If a sheet S contains sets of
consecutive non-empty cells (hereafter
non-empty cell area) that are well
delimited from each other by means of
empty cells, then it is possible to
associate each non-empty cell area to
a single class Ci and the sheet S to a
candidate class CS, where S has a
UML composition relationship with
each Ci.
SEBD 2014 – Castellammare di Stabia – 16/6/2014
Information based rules
(IBRs)
 IBRs analyze the informative content
of the cells by looking for repeated
data, synonyms, and cells containing
well-defined data structures such as
array strings, integer matrixes, etc.
◦ Used to infer the attributes of classes, the
relationships between classes and their
cardinalities.
◦ Applied to all the Excel Files
SEBD 2014 – Castellammare di Stabia – 16/6/2014
Example of IBR
Rule:
If the header cells of the
columns that discriminated
the extraction of a given class
A assume the same textual
content in all the
spreadsheets, then these
values may be considered
attributes of that class.
SEBD 2014 – Castellammare di Stabia – 16/6/2014
Process Execution and
Results
 Selected groups of rules were iteratively
applied to the spreadsheets.
 Sets of candidate classes and relationships
were automatically proposed.
 The data model made by 18 classes, 27
relationships, and 95 attributes was
reconstructed at the end of the process.
 Candidates were submitted to domain
experts who chose to accept, to refine or to
reject them.
◦ Experts accepted 75% of candidates inferred by
means of SBRs and 33% of candidates inferred
by IBRs
SEBD 2014 – Castellammare di Stabia – 16/6/2014

More Related Content

What's hot

Iwsm2014 understanding functional reuse of erp (maya daneva) - public release
Iwsm2014   understanding functional reuse of erp (maya daneva) - public releaseIwsm2014   understanding functional reuse of erp (maya daneva) - public release
Iwsm2014 understanding functional reuse of erp (maya daneva) - public release
Nesma
 
Formulas in ms excel for statistics(report2 in ict math ed)
Formulas in ms excel for statistics(report2 in ict math ed)Formulas in ms excel for statistics(report2 in ict math ed)
Formulas in ms excel for statistics(report2 in ict math ed)
Caryl Mae Puertollano
 

What's hot (20)

Spreadsheet ml subject calc chain
Spreadsheet ml subject   calc chainSpreadsheet ml subject   calc chain
Spreadsheet ml subject calc chain
 
Missing data
Missing dataMissing data
Missing data
 
[PDF] Optimization Modeling with Spreadsheets
[PDF] Optimization Modeling with Spreadsheets[PDF] Optimization Modeling with Spreadsheets
[PDF] Optimization Modeling with Spreadsheets
 
Ms excel
Ms excelMs excel
Ms excel
 
ECON 306 Homework 3
ECON 306 Homework 3ECON 306 Homework 3
ECON 306 Homework 3
 
Use of-Excel
Use of-ExcelUse of-Excel
Use of-Excel
 
Data structure (basics)
Data structure (basics)Data structure (basics)
Data structure (basics)
 
CVpro
CVproCVpro
CVpro
 
Spreadsheet Purposes
Spreadsheet PurposesSpreadsheet Purposes
Spreadsheet Purposes
 
softwares in public health
softwares in public healthsoftwares in public health
softwares in public health
 
Interactive data visualization project
Interactive data visualization project Interactive data visualization project
Interactive data visualization project
 
Statistical software
Statistical softwareStatistical software
Statistical software
 
Iwsm2014 understanding functional reuse of erp (maya daneva) - public release
Iwsm2014   understanding functional reuse of erp (maya daneva) - public releaseIwsm2014   understanding functional reuse of erp (maya daneva) - public release
Iwsm2014 understanding functional reuse of erp (maya daneva) - public release
 
spreadsheet
spreadsheetspreadsheet
spreadsheet
 
Linked list
Linked listLinked list
Linked list
 
Formulas in ms excel for statistics(report2 in ict math ed)
Formulas in ms excel for statistics(report2 in ict math ed)Formulas in ms excel for statistics(report2 in ict math ed)
Formulas in ms excel for statistics(report2 in ict math ed)
 
Spreadsheet Introduction - R.D.Sivakumar
Spreadsheet Introduction - R.D.SivakumarSpreadsheet Introduction - R.D.Sivakumar
Spreadsheet Introduction - R.D.Sivakumar
 
Introduction
IntroductionIntroduction
Introduction
 
How to merge and combine rows without losing data in excel
How to merge and combine rows without losing data in excelHow to merge and combine rows without losing data in excel
How to merge and combine rows without losing data in excel
 
Resume
ResumeResume
Resume
 

Similar to Reverse Engineering of Data Models from Legacy Spreadsheets-Based Systems: An Industrial Case Study

Fundamentals of Database ppt ch02
Fundamentals of Database ppt ch02Fundamentals of Database ppt ch02
Fundamentals of Database ppt ch02
Jotham Gadot
 
02010 ppt ch02
02010 ppt ch0202010 ppt ch02
02010 ppt ch02
Hpong Js
 
Syllabus mca 2 rdbms i
Syllabus mca 2 rdbms iSyllabus mca 2 rdbms i
Syllabus mca 2 rdbms i
emailharmeet
 
DSA_Module 1-PPT for engineering students
DSA_Module 1-PPT for engineering studentsDSA_Module 1-PPT for engineering students
DSA_Module 1-PPT for engineering students
riotsush12
 

Similar to Reverse Engineering of Data Models from Legacy Spreadsheets-Based Systems: An Industrial Case Study (20)

Migrating Legacy Spreadsheets-based Systems to Web MVC architecture: an Indus...
Migrating Legacy Spreadsheets-based Systems to Web MVC architecture: an Indus...Migrating Legacy Spreadsheets-based Systems to Web MVC architecture: an Indus...
Migrating Legacy Spreadsheets-based Systems to Web MVC architecture: an Indus...
 
PPT Lecture 2.1 and 2.2 DataModels.ppt
PPT Lecture 2.1 and 2.2 DataModels.pptPPT Lecture 2.1 and 2.2 DataModels.ppt
PPT Lecture 2.1 and 2.2 DataModels.ppt
 
Fundamentals of Database ppt ch02
Fundamentals of Database ppt ch02Fundamentals of Database ppt ch02
Fundamentals of Database ppt ch02
 
DBMS.ppt
DBMS.pptDBMS.ppt
DBMS.ppt
 
02010 ppt ch02
02010 ppt ch0202010 ppt ch02
02010 ppt ch02
 
Syllabus mca 2 rdbms i
Syllabus mca 2 rdbms iSyllabus mca 2 rdbms i
Syllabus mca 2 rdbms i
 
Overview of Databases and Data Modelling-2.pdf
Overview of Databases and Data Modelling-2.pdfOverview of Databases and Data Modelling-2.pdf
Overview of Databases and Data Modelling-2.pdf
 
DSA_Module 1-PPT for engineering students
DSA_Module 1-PPT for engineering studentsDSA_Module 1-PPT for engineering students
DSA_Module 1-PPT for engineering students
 
Archetype-based data transformation with LinkEHR
Archetype-based data transformation with LinkEHRArchetype-based data transformation with LinkEHR
Archetype-based data transformation with LinkEHR
 
data-modeling-paper
data-modeling-paperdata-modeling-paper
data-modeling-paper
 
Exam – june 2010 – qp 11
Exam – june 2010 – qp 11Exam – june 2010 – qp 11
Exam – june 2010 – qp 11
 
Data models
Data modelsData models
Data models
 
Data models
Data modelsData models
Data models
 
ENHANCING KEYWORD SEARCH OVER RELATIONAL DATABASES USING ONTOLOGIES
ENHANCING KEYWORD SEARCH OVER RELATIONAL DATABASES USING ONTOLOGIES ENHANCING KEYWORD SEARCH OVER RELATIONAL DATABASES USING ONTOLOGIES
ENHANCING KEYWORD SEARCH OVER RELATIONAL DATABASES USING ONTOLOGIES
 
ENHANCING KEYWORD SEARCH OVER RELATIONAL DATABASES USING ONTOLOGIES
ENHANCING KEYWORD SEARCH OVER RELATIONAL DATABASES USING ONTOLOGIESENHANCING KEYWORD SEARCH OVER RELATIONAL DATABASES USING ONTOLOGIES
ENHANCING KEYWORD SEARCH OVER RELATIONAL DATABASES USING ONTOLOGIES
 
Enhancing keyword search over relational databases using ontologies
Enhancing keyword search over relational databases using ontologiesEnhancing keyword search over relational databases using ontologies
Enhancing keyword search over relational databases using ontologies
 
SECh78
SECh78SECh78
SECh78
 
Modularity for Automated Assessment: A Design-Space Exploration
Modularity for Automated Assessment: A Design-Space ExplorationModularity for Automated Assessment: A Design-Space Exploration
Modularity for Automated Assessment: A Design-Space Exploration
 
CS3492 - Database Management System Syallabus - Regulation 2021 for CSE.docx
CS3492 - Database Management System Syallabus - Regulation 2021 for CSE.docxCS3492 - Database Management System Syallabus - Regulation 2021 for CSE.docx
CS3492 - Database Management System Syallabus - Regulation 2021 for CSE.docx
 
Bulk ieee projects 2012 2013
Bulk ieee projects 2012 2013Bulk ieee projects 2012 2013
Bulk ieee projects 2012 2013
 

More from Porfirio Tramontana

Towards the Generation of Robust E2E Test Cases in Template-based Web Applica...
Towards the Generation of Robust E2E Test Cases in Template-based Web Applica...Towards the Generation of Robust E2E Test Cases in Template-based Web Applica...
Towards the Generation of Robust E2E Test Cases in Template-based Web Applica...
Porfirio Tramontana
 
Using Dynamic Analysis for Generating End User Documentation for Web 2.0 Appl...
Using Dynamic Analysis for Generating End User Documentation for Web 2.0 Appl...Using Dynamic Analysis for Generating End User Documentation for Web 2.0 Appl...
Using Dynamic Analysis for Generating End User Documentation for Web 2.0 Appl...
Porfirio Tramontana
 

More from Porfirio Tramontana (20)

An Approach for Model Based Testing of Augmented Reality Applications.pdf
An Approach for Model Based Testing of Augmented Reality Applications.pdfAn Approach for Model Based Testing of Augmented Reality Applications.pdf
An Approach for Model Based Testing of Augmented Reality Applications.pdf
 
Towards the Generation of Robust E2E Test Cases in Template-based Web Applica...
Towards the Generation of Robust E2E Test Cases in Template-based Web Applica...Towards the Generation of Robust E2E Test Cases in Template-based Web Applica...
Towards the Generation of Robust E2E Test Cases in Template-based Web Applica...
 
Techniques and Tools for Mobile Testing Automation
Techniques and Tools for Mobile Testing AutomationTechniques and Tools for Mobile Testing Automation
Techniques and Tools for Mobile Testing Automation
 
A technique for parallel gui testing of android applications
A technique for parallel gui testing of android applicationsA technique for parallel gui testing of android applications
A technique for parallel gui testing of android applications
 
Reverse Engineering Techniques: from Web Applications to Rich Internet Applic...
Reverse Engineering Techniques: from Web Applications to Rich Internet Applic...Reverse Engineering Techniques: from Web Applications to Rich Internet Applic...
Reverse Engineering Techniques: from Web Applications to Rich Internet Applic...
 
Web Application Testing in Fifteen Years of WSE
Web Application Testing in Fifteen Years of WSEWeb Application Testing in Fifteen Years of WSE
Web Application Testing in Fifteen Years of WSE
 
Towards a Better Comprehensibility of Web Applications: Lessons Learned from ...
Towards a Better Comprehensibility of Web Applications: Lessons Learned from ...Towards a Better Comprehensibility of Web Applications: Lessons Learned from ...
Towards a Better Comprehensibility of Web Applications: Lessons Learned from ...
 
Comprehending Web Applications by a Clustering Based Approach
Comprehending Web Applications by a Clustering Based Approach Comprehending Web Applications by a Clustering Based Approach
Comprehending Web Applications by a Clustering Based Approach
 
Reverse Engineering Web Applications
Reverse Engineering Web ApplicationsReverse Engineering Web Applications
Reverse Engineering Web Applications
 
Recovering Interaction Design Patterns in Web Applications
Recovering Interaction Design Patterns in Web Applications Recovering Interaction Design Patterns in Web Applications
Recovering Interaction Design Patterns in Web Applications
 
Improving Usability of Web Pages for Blind
Improving Usability of Web Pages for BlindImproving Usability of Web Pages for Blind
Improving Usability of Web Pages for Blind
 
Techniques and Tools for Rich Internet Applications Testing
Techniques and Tools for Rich Internet Applications TestingTechniques and Tools for Rich Internet Applications Testing
Techniques and Tools for Rich Internet Applications Testing
 
Comprehending Ajax Web Applications by the DynaRIA Tool
Comprehending Ajax Web Applications by the DynaRIA ToolComprehending Ajax Web Applications by the DynaRIA Tool
Comprehending Ajax Web Applications by the DynaRIA Tool
 
A GUI Crawling-based Technique for Android Mobile Application Testing
A GUI Crawling-based Technique for Android Mobile Application TestingA GUI Crawling-based Technique for Android Mobile Application Testing
A GUI Crawling-based Technique for Android Mobile Application Testing
 
Using Dynamic Analysis for Generating End User Documentation for Web 2.0 Appl...
Using Dynamic Analysis for Generating End User Documentation for Web 2.0 Appl...Using Dynamic Analysis for Generating End User Documentation for Web 2.0 Appl...
Using Dynamic Analysis for Generating End User Documentation for Web 2.0 Appl...
 
Considering Context Events in Event-Based Testing of Mobile Applications
Considering Context Events in Event-Based Testing of Mobile Applications Considering Context Events in Event-Based Testing of Mobile Applications
Considering Context Events in Event-Based Testing of Mobile Applications
 
Using GUI Ripping for Automated Testing of Android Apps
Using GUI Ripping for Automated Testing of Android AppsUsing GUI Ripping for Automated Testing of Android Apps
Using GUI Ripping for Automated Testing of Android Apps
 
A Toolset for GUI Testing of Android Applications
A Toolset for GUI Testing of Android ApplicationsA Toolset for GUI Testing of Android Applications
A Toolset for GUI Testing of Android Applications
 
DynaRIA: a Tool for Ajax Web Application Comprehension
DynaRIA: a Tool for Ajax Web Application ComprehensionDynaRIA: a Tool for Ajax Web Application Comprehension
DynaRIA: a Tool for Ajax Web Application Comprehension
 
Rich Internet Application Testing Using Execution Trace Data
Rich Internet Application Testing  Using Execution Trace Data Rich Internet Application Testing  Using Execution Trace Data
Rich Internet Application Testing Using Execution Trace Data
 

Recently uploaded

Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityUnlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Hung Le
 
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
ZurliaSoop
 
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
David Celestin
 

Recently uploaded (19)

"I hear you": Moving beyond empathy in UXR
"I hear you": Moving beyond empathy in UXR"I hear you": Moving beyond empathy in UXR
"I hear you": Moving beyond empathy in UXR
 
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait Cityin kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
 
2024 mega trends for the digital workplace - FINAL.pdf
2024 mega trends for the digital workplace - FINAL.pdf2024 mega trends for the digital workplace - FINAL.pdf
2024 mega trends for the digital workplace - FINAL.pdf
 
Using AI to boost productivity for developers
Using AI to boost productivity for developersUsing AI to boost productivity for developers
Using AI to boost productivity for developers
 
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdfSOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
 
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityUnlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
 
Abortion Pills Fahaheel ௹+918133066128💬@ Safe and Effective Mifepristion and ...
Abortion Pills Fahaheel ௹+918133066128💬@ Safe and Effective Mifepristion and ...Abortion Pills Fahaheel ௹+918133066128💬@ Safe and Effective Mifepristion and ...
Abortion Pills Fahaheel ௹+918133066128💬@ Safe and Effective Mifepristion and ...
 
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
 
LITTLE ABOUT LESOTHO FROM THE TIME MOSHOESHOE THE FIRST WAS BORN
LITTLE ABOUT LESOTHO FROM THE TIME MOSHOESHOE THE FIRST WAS BORNLITTLE ABOUT LESOTHO FROM THE TIME MOSHOESHOE THE FIRST WAS BORN
LITTLE ABOUT LESOTHO FROM THE TIME MOSHOESHOE THE FIRST WAS BORN
 
ICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdfICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdf
 
Digital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalDigital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of Drupal
 
Introduction to Artificial intelligence.
Introduction to Artificial intelligence.Introduction to Artificial intelligence.
Introduction to Artificial intelligence.
 
BIG DEVELOPMENTS IN LESOTHO(DAMS & MINES
BIG DEVELOPMENTS IN LESOTHO(DAMS & MINESBIG DEVELOPMENTS IN LESOTHO(DAMS & MINES
BIG DEVELOPMENTS IN LESOTHO(DAMS & MINES
 
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
 
BEAUTIFUL PLACES TO VISIT IN LESOTHO.pptx
BEAUTIFUL PLACES TO VISIT IN LESOTHO.pptxBEAUTIFUL PLACES TO VISIT IN LESOTHO.pptx
BEAUTIFUL PLACES TO VISIT IN LESOTHO.pptx
 
History of Morena Moshoeshoe birth death
History of Morena Moshoeshoe birth deathHistory of Morena Moshoeshoe birth death
History of Morena Moshoeshoe birth death
 
ECOLOGY OF FISHES.pptx full presentation
ECOLOGY OF FISHES.pptx full presentationECOLOGY OF FISHES.pptx full presentation
ECOLOGY OF FISHES.pptx full presentation
 
Ready Set Go Children Sermon about Mark 16:15-20
Ready Set Go Children Sermon about Mark 16:15-20Ready Set Go Children Sermon about Mark 16:15-20
Ready Set Go Children Sermon about Mark 16:15-20
 
The Concession of Asaba International Airport: Balancing Politics and Policy ...
The Concession of Asaba International Airport: Balancing Politics and Policy ...The Concession of Asaba International Airport: Balancing Politics and Policy ...
The Concession of Asaba International Airport: Balancing Politics and Policy ...
 

Reverse Engineering of Data Models from Legacy Spreadsheets-Based Systems: An Industrial Case Study

  • 1. Reverse Engineering of Data Models from Legacy Spreadsheets-Based Systems: An Industrial Case Study discussion paper Domenico Amalfitano Anna Rita Fasolino Valerio Maggio Porfirio Tramontana Vincenzo De Simone SEBD 2014 – Castellammare di Stabia – 16/6/2014
  • 2. Spreadsheet Based Information System Issues  Spreadsheets are designed only for computing purposes and commercial applications but …  … very often they are used as Information Systems ◦ Very difficult to maintain  High rate of duplicated data between different sheets and files  The first and more critical step of a migration process is the Data Reengineering SEBD 2014 – Castellammare di Stabia – 16/6/2014
  • 3. Case Study  An automotive company collects the specification of the tests executed on the vehicles in form of Test Patterns ◦ Test Patterns are implemented in Excel files following a common template  We have 30,615 different Excel files with 2,700 data cells on average ◦ There is a high rate of replication data  50% of data cells recurred more than 100 times  Excel Test Patterns represent the input of an automatic test generation process SEBD 2014 – Castellammare di Stabia – 16/6/2014
  • 4. Data Model Reverse Engineering  Data Model Reverse Engineering is the first step of a more general migration process towards a Web MVC architecture  An heuristic based approach to infer the Data Model was proposed.  A set of 26 heuristics were considered. ◦ 11 heuristics derived from the literature and were adapted to work in this specific context. SEBD 2014 – Castellammare di Stabia – 16/6/2014
  • 5. Data Model Reverse Engineering  Heuristics can be grouped in two main classes: ◦ Structure based rules (SBRs) ◦ Information based rules (IBRs) SEBD 2014 – Castellammare di Stabia – 16/6/2014
  • 6. Structure based rules (SBRs)  SBRs analyze the structure and the properties of spreadsheets and their components, such as sheets, cells, cell headers, etc. ◦ Used to abstract the set of candidate classes and their relationships; ◦ Applied to a single Excel File. SEBD 2014 – Castellammare di Stabia – 16/6/2014
  • 7. Example of SBR Rule: If the spreadsheet contains more than one sheet, then it is possible to associate the spreadsheet to a class C and each component sheet to a distinct class Si, where C has a UML composition relationship with each Si. SEBD 2014 – Castellammare di Stabia – 16/6/2014
  • 8. Example of SBR Rule: If a sheet S contains sets of consecutive non-empty cells (hereafter non-empty cell area) that are well delimited from each other by means of empty cells, then it is possible to associate each non-empty cell area to a single class Ci and the sheet S to a candidate class CS, where S has a UML composition relationship with each Ci. SEBD 2014 – Castellammare di Stabia – 16/6/2014
  • 9. Information based rules (IBRs)  IBRs analyze the informative content of the cells by looking for repeated data, synonyms, and cells containing well-defined data structures such as array strings, integer matrixes, etc. ◦ Used to infer the attributes of classes, the relationships between classes and their cardinalities. ◦ Applied to all the Excel Files SEBD 2014 – Castellammare di Stabia – 16/6/2014
  • 10. Example of IBR Rule: If the header cells of the columns that discriminated the extraction of a given class A assume the same textual content in all the spreadsheets, then these values may be considered attributes of that class. SEBD 2014 – Castellammare di Stabia – 16/6/2014
  • 11. Process Execution and Results  Selected groups of rules were iteratively applied to the spreadsheets.  Sets of candidate classes and relationships were automatically proposed.  The data model made by 18 classes, 27 relationships, and 95 attributes was reconstructed at the end of the process.  Candidates were submitted to domain experts who chose to accept, to refine or to reject them. ◦ Experts accepted 75% of candidates inferred by means of SBRs and 33% of candidates inferred by IBRs SEBD 2014 – Castellammare di Stabia – 16/6/2014