This document outlines the plans for a PhD research project on enhancing semantic interoperability among spreadsheets. The research builds upon previous master's work, which identified construction patterns in spreadsheets and linked labels to ontologies within a single domain. The PhD plans to address the limitations of that work by considering multiple domains, developing a model to relate elements across spreadsheets, and linking spreadsheet structure to ontologies at the concept level. The key research questions concern when spreadsheets can be said to share the same purpose, whether similar spreadsheets admit a canonical representation, and whether such representations can be used to predict a spreadsheet's purpose and domain. The overall goal is semantic interoperability across spreadsheets.
13. Which elements must be considered in this interpretation process?
Unity Interpretation
14. Related Work: isolated label
(Han et al., 2008) - RDF123: from spreadsheets to RDF. The Semantic Web, Lecture Notes in Computer Science, vol. 5318, Springer.
(Langegger & Wöß, 2009) - XLWrap: Querying and Integrating Arbitrary Spreadsheets with SPARQL. The Semantic Web, Lecture Notes in Computer Science, vol. 5823, Springer.
15. Related Work: template
(Abraham & Erwig, 2006) - Inferring Templates from Spreadsheets. Proceedings of the International Conference on Software Engineering.
16. Related Work: instances
(Zhao et al., 2010) - A spreadsheet system based on data semantic object. IEEE International Conference on Information Management and Engineering.
17. Related Work: isolated label associated to linked data
(Syed et al., 2010) - Exploiting a Web of Semantic Data for Interpreting Tables. Proceedings of the Web Science Conference.
18. Related Work: correlation of labels associated to linked data
(Venetis et al., 2011) - Recovering Semantics of Tables on the Web. Proceedings of the VLDB Endowment.
(Mulwad et al., 2010) - Using linked data to interpret tables. Proceedings of the International Workshop on Consuming Linked Data.
19. Related Work: correlation between several spreadsheet elements associated to linked data
(Limaye et al., 2010) - Annotating and Searching Web Tables Using Entities, Types and Relationships. Proceedings of the VLDB Endowment.
20. How far can the system interpret, considering labels and their correlations?
26. Research Strategy
1. To identify construction patterns followed by biologists during the creation of these spreadsheets
2. To verify whether these construction patterns can lead to the recognition of the spreadsheet's purpose
3. To achieve semantic interoperability among these spreadsheets
43. Architecture Evaluation
Automatic analysis of 11,150 spreadsheets:
● the system recognized 1,151 spreadsheets
○ 806 spreadsheets were classified as catalogue
○ 345 spreadsheets were classified as collection
Total: 748,459 records analyzed
44. Architecture Evaluation - Results
● A random subset of 1,203 spreadsheets was selected to evaluate precision/recall
○ Precision: 0.84
○ Recall: 0.76
○ Specificity: 0.95
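The precision, recall, and specificity figures above follow the standard confusion-matrix definitions. A minimal sketch of those formulas, with hypothetical TP/FP/FN/TN counts chosen only to yield ratios of the same magnitude (the slides report the final values, not the underlying counts):

```python
# Standard confusion-matrix metrics as reported on the slide.
# The counts below are hypothetical illustrations, not the study's data.

def precision(tp, fp):
    # fraction of positively-classified spreadsheets that were correct
    return tp / (tp + fp)

def recall(tp, fn):
    # fraction of true-positive spreadsheets the system recognized
    return tp / (tp + fn)

def specificity(tn, fp):
    # fraction of true negatives correctly left unclassified
    return tn / (tn + fp)

# Hypothetical counts consistent with the reported ratios
tp, fp, fn, tn = 84, 16, 26, 304
print(round(precision(tp, fp), 2))    # 0.84
print(round(recall(tp, fn), 2))       # 0.76
print(round(specificity(tn, fp), 2))  # 0.95
```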
46. Main Limitations
● Single Domain
○ specific spreadsheets (catalogue and collection)
● Lack of a model to represent construction patterns
○ models for construction patterns were created later, but in isolation from each other
● Linking labels to ontologies
○ not able to aggregate different labels belonging to the same concept
○ the ontology was selected by us; it is not necessarily the best representation for the spreadsheets' data
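The aggregation limitation above (different labels denoting the same concept) can be illustrated with a toy synonym table. Everything here is invented for illustration (the labels, the `obo:`/`dwc:` concept identifiers, and the lookup itself); the slides only state that the original system lacked such a step:

```python
# Hypothetical sketch: map spreadsheet column labels that are
# spelled differently onto a single shared concept identifier.
# The synonym table and concept URIs are illustrative only.

SYNONYMS = {
    "species": "obo:Taxon",
    "taxon": "obo:Taxon",
    "scientific name": "obo:Taxon",
    "locality": "dwc:Locality",
    "collection site": "dwc:Locality",
}

def concept_for(label):
    """Normalize a label and look up the concept it denotes, if known."""
    return SYNONYMS.get(label.strip().lower())

print(concept_for("Species"))          # obo:Taxon
print(concept_for("Scientific Name"))  # obo:Taxon (same concept, different label)
print(concept_for("Notes"))            # None (label not aggregated)
```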
47. Each limitation is matched by a PhD direction:
● Single Domain → Multiple Domains
● Lack of a model to represent construction patterns → Model as an association network
○ relates elements and concepts of several spreadsheets
● Linking labels to ontologies → Linking spreadsheet structure to ontologies
○ the link is made between concepts
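One way to picture the proposed "model as an association network" is as a graph in which columns of different spreadsheets attach to shared concept nodes. This is a speculative sketch under that reading; the sheet names, labels, and concepts are all hypothetical, and the slides do not specify the model's actual structure:

```python
# Hypothetical association network: (spreadsheet, column) pairs are
# linked to concept nodes; columns of different spreadsheets become
# interoperable when they attach to the same concept.

from collections import defaultdict

# Illustrative links: (sheet, column label) -> concept
links = [
    (("catalogue_A", "species"), "Taxon"),
    (("collection_B", "scientific name"), "Taxon"),
    (("collection_B", "site"), "Locality"),
    (("catalogue_A", "locality"), "Locality"),
]

# Invert the links so each concept collects its associated columns
network = defaultdict(set)
for (sheet, column), concept in links:
    network[concept].add((sheet, column))

for concept in sorted(network):
    print(concept, "->", sorted(network[concept]))
```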
87. Research Questions
• When can spreadsheets be considered to have the same purpose?
• Is there a canonical representation among spreadsheets of the same purpose?
• Is it possible to define a canonical representation for a spreadsheet group?
• Can this representation be used to predict spreadsheets of a given purpose?
88. Acknowledgements
● Laboratory of Information Systems (LIS)
● UNICAMP
● FAPESP
● Microsoft Research FAPESP Virtual Institute (NavScales project)
● CNPq (MuZOO Project and PRONEX-FAPESP)
● INCT in Web Science (CNPq 557.128/2009-9)
● CAPES